Multiple Personality Disorder

Human beings aren’t the only ones who face the problem of dissociative identities, you know? Numbers face this affliction as well. They can have multiple split personalities with respect to other members of their kind. When you modulo them with other numbers, they take on different personas w.r.t. each number. The problem today is to help them recover their original selves given their various personalities; help them find out who they really are !

Alternatively, more formally, say you are given the following system of congruence equations:
x \equiv c_{1} \pmod p_{1}\\  x \equiv c_{2} \pmod p_{2}\\  ...\\  ...\\  x \equiv c_{n} \pmod p_{n}\\
where the p_{i}‘s are prime numbers and c_{i}‘s are numbers belonging to the domain and you have to solve for x.

Essentially, we merge x’s identities one by one together into a whole. We find out what led to his different personalities and use that knowledge to reconstruct his whole self. In other words;
Small Primes Modular Computation is a useful technique that allows efficient computation of x by using the Chinese remainder algorithm (CRA). The implemented solution to the problem is available here and the algorithm is as follows:
Input: p_{1},...,p_{n} \in R , c_{1},...,c_{n} \in R , where R is a Euclidean domain.
Output: x \in R such that x \equiv c_{i} \pmod p_{i} for 1 \leq i \leq n
1. P \leftarrow p_{1}...p_{n}
2. for 1 \leq i \leq n do

3. return \sum_{1 \leq i \leq n} x_{i}\frac{P}{p_{i}}

To help x recover faster, there are two obvious ways, if you think about it. First, you take identities two at a time and unify them. Have a team of psychologists work simultaneously with two distinct identities to improve efficiency. Keep doing this recursively until you are left with the final unified one !!! That is…
The above algorithm can be sped up considerably by taking into account the following ideas.

  1. The number ‘n’ could be very large and as a result the product of the primes would be very large although the primes may be small. Thus the Extended Euclid would get very skewed inputs and would take longer to evaluate. Instead, we can sort the primes array and then recombine in pairs such as c_{i}, c_{n-i}. By recursively calling this recombine function, we can balance the inputs and solve faster.
  2. Since each of the individual iterations of the for loops are independent of each other, they can be evaluated in parallel. Even if the system has 2 cores (which has become common today), the time taken reduces by half.

Great ! Now, we’ve successfully helped one patient who had n split personalities. What if many more such patients showed up? What would we do then?
Alternatively, a generalization of the problem is to take as input a list of integer vectors C = \{c_{1},... c_{n}\} where c_{i} = \{c_{i1},...,c_{ik}\} and a list of coprime moduli M = \{m_{1},..., m_{n}\} . To solve this problem, instead of simply applying CRA to every pair of elements from the integer vectors and moduli, as in the function above, for each vector in the list, one can apply the vector form of the CRA. In other words, apply CRA component wise by first finding the CRA_basis E = \{e_{1},...,e_{n}\} where e_{i} \equiv 1 \pmod m_{i} and e_{i} \equiv 0 \pmod m_{j} \forall j \neq i.

This post and implementation is a result of the discussion with the mentors from fplll, Martin Albrecht and Damien Stehle.

The Bipartite Matching Problem (Part III)

Its time to design the algorithm now. While there exist many algorithms to solve this problem today, we shall take a look at the one that makes use of the algorithm for the Maximum Flow Problem mentioned earlier to find the maximum matching. The aim of this post is to explain the algorithm. No focus is placed on ways to implement the algorithm. I’ll leave that to the reader to figure out if he/she so wishes. Like I mentioned at the start of this series, the idea is to arrive at a clearer understanding of how analysis of algorithms actually happens.

With that said, lets get down to it. If you recall, the graph defining the matching problem is undirected, while the flow networks are directed. The first step is to convert the undirected graph G (fig (a) ) , an instance of the Bipartite Matching problem into the corresponding flow network G’ (fig (b) ). Consider the figure below:



First, we direct all edges in G from L to R without loss of generality. We now add two more nodes “s” and “t”. Second we add directed edges (s,l) from s to each node in L.  Accordingly, we add directed edges (r,t) from each node in R to t. Finally, we give each edge in G’ a capacity of 1.

We now compute a maximum s-t flow in this network G’. And that’s it. We are done !

I can imagine the confusion. I remember feeling it too. Its not at all apparent that the answer that we just got is actually the answer to the problem we have been reading about. In part IV, our analysis will show that the value of this maximum flow is in fact equal to the size of the maximum matching in G. Moreover, we shall bound the running time of this algorithm.

The Bipartite Matching Problem (Part II)

Now that we are acquainted with the necessary background, lets try and formulate the problem with enough mathematical precision so that we can ask concrete questions and start thinking about algorithms to solve it. We set up the Bipartite Matching Problem as follows:

Let X {=} \{x_{1}, x_{2},...,x_{n}\} and Y {=} \{y_{1}, y_{2},...,y_{n}\} be two sets and let their Cartesian product X \times Y , represent the set of all possible ordered pairs of the form (x,y) , where x\in X and y\in Y . Then, a matching S can be defined as a set of ordered pairs, each from X \times Y with the property that each member of X and each member of Y appears in at most one pair in S. A perfect matching is a set S’ such that each member of X and each member of Y appear in exactly one pair in S’.

Recall the definition of a bipartite graph  G = (V, E). It is an undirected graph whose vertex set can be partitioned as V = X \cup Y with the property that every edge e \in E has one vertex in X and one vertex in Y. Consequently, in the case of graph, the edges can be viewed as pairs of nodes and we say that a matching in a graph G = (V, E), is a set of edges M \subseteq E with the property that each vertex appears in at most one edge of M. A set of edges M is called a perfect matching if each vertex of the graph appears in exactly one edge of M.
The Bipartite Matching Problem is that of finding a matching in G of the largest possible of size.

As a side note, the Stable Matching Problem and the Gale-Shapley algorithm proposed to solve it, is particularly interesting too. It is in fact the original version of the current problem.

The Bipartite Matching Problem (Part I)

I was reading about Covering-Packing Dualities that have been found in graphs and I came across the Minimum Vertex Cover and its packing dual, Maximum Matching. I thought I’d write about the Bipartite Matching Problem here. That way I’d remember it more clearly too. This was one of the first problems I read that allowed me understand the richness of the process of algorithm design. That’s why this post is in parts each of which represents the significant steps in the design of the algorithm. I’ll start (in this one) with a short description of the background required. Part II will be a description of the problem itself. Part III will describe the algorithm used to solve it and I will conclude in the last part with a slightly detailed analysis of the algorithm itself.

So lets begin…
Often we use graphs wherein the edges are “pipes” that carry “liquid” and the nodes are junctures where the pipes are joined together. Network graphs of this type have components such as capacities on the edges signifying the maximum liquid that they can carry at any given moment, source nodes that generate the traffic, target nodes that act as the destination points and of course, the liquid or traffic itself.  Formally, a Flow Network is a directed graph  G {=} (V, E) where

  • Each edge ‘e’ is associated with a capacity which is a non-negative number denoted by c_{e} .
  • There are k pairs of source-target nodes \{(s_{i}, t_{i})\}_{i} \in V .
  • There is a single target node t \in V.

An s-t flow is defined as a function f that maps each edge e to a non-negative real number, f : E \rightarrow \textbf{R}^{+} and the value f(e) represents the amount of flow carried by edge e. Any flow f must also satisfy the following two conditions:

  • (Capacity Constraints) For each edge e\in E we have, 0\leq f(e) \leq c_{e} .
  • (Conservation constraints) For each node v, we have $latex \sum_{\text{e into v}} f(e)\text{ } {=} \sum_{\text{e out of v}} f(e) $

The value of the flow is denoted by v(f) and is defined to be the amount of flow generated at the source: v(f)\text{ } {=} \sum_{\text{e out of s}} f(e) .  The Maximum Flow problem, given a flow network, is to find a flow of maximum possible value by arranging the traffic so as to make as efficient use as possible of the available capacities.

Now there exist a two major algorithms to solve the max-flow problem namely, Ford-Fulkerson and PreFlow Push. We need these to solve our original problem. If you are already familiar with them, you can proceed directly to the next part. Otherwise, I suggest you go over them.