--------------------------------------------------------
Network Alignment Programs Based on the Hungarian Method
--------------------------------------------------------

[A] Overview of programs:

There are two programs: (1) find_one_opt_align and (2) find_opt_pairs. 
find_one_opt_align finds one optimal alignment from one network (G1) into 
another network (G2) of equal or greater number of nodes by minimizing the sum 
of G1-node-to-G2-node costs of the alignment using the Hungarian method for 
minimum-weight bipartite matching. find_opt_pairs is a follow-on program to 
find_one_opt_align which finds all G1- node-to-G2-node pairings that are 
involved in at least one optimal alignment. It does this via a dynamic variant 
of the Hungarian method that allows subsequent optimal alignments to be found 
very quickly given a known optimal alignment. The Hungarian method originated 
from the work of two Hungarian mathematicians (hence the name) and has a long 
history in mathematics, computer science and optimization research. It is 
described in most graph theory texts e.g. [1]. The dynamic variant of the 
Hungarian method is described in [2].

[B] Compiling the programs:

To compile both programs, run "make" in the source directory (source). To test 
that the programs are working properly, run "make test_all" in the source folder 
and examine the output to see if any error messages arise; the tests are based 
on test vectors found in the source/test_vectors folder. Running "make clean" 
will clear all programs and other intermediate files.

[C} Network node labels:
Nodes from each network are labeled with non-negative integers. If a network has 
N nodes, nodes will be labeled from 0 to (N-1). All inputs and outputs to/from 
both programs use these integer labels. If the networks were originally labeled 
using some other sets of labels, you must record and keep track of the 
correspondence between his/her set of labels and the integer labels. This 
correspondence should be noted when you create the node-to-node paired cost 
matrix.

[D] Running find_one_opt_align:

To see the following usage template, run the program without arguments:
Usage: find_one_opt_align <one_opt_align_file> <mod_paired_cost_matrix_file> 
<ori_paired_cost_matrix_file>

Input file: <ori_paired_cost_matrix_file>

<ori_paired_cost_matrix_file> contains the matrix of G1-node-to-G2-node costs. 
The rows and columns of the matrix correspond to nodes of G1 and G2 
respectively. File format: first line contains the number of rows followed by 
the number of columns of the matrix. Subsequent lines comprise space-separated 
entries of the matrix.

e.g.
2 3         <- 2 rows, 3 columns
0.1 0.2 0.3 <- row 0 (corresponding to one node of G1)
0.2 0.3 0.4 <- row 1 (corresponding to the other node of G1)

In the above example, G1 has 2 nodes and G2 has 3 nodes (hence 3 columns in the 
matrix).

Output files: <one_opt_align_file> <mod_paired_cost_matrix_file>

<one_opt_align_file> will contain an optimal alignment of G1 to G2. File format: 
first line contains the number of nodes of G1. This is followed by a column 
vector of G2 nodes (labeled by integers as explained in [C]) aligned to G1 nodes 
0, 1, 2 ...

e.g.
3  <- number of nodes of G1
1  <- node 0 of G1 aligned to node 1 of G2
16 <- node 1 of G1 aligned to node 16 of G2
0  <- node 2 of G1 aligned to node 0 of G2

<mod_paired_cost_matrix_file> will contain the modified cost matrix resulting 
from running the Hungarian algorithm. It is needed as an input to the 
find_opt_pairs program. However, if you do not need to run the find_opt_pairs 
program and hence do not require this file, just set it to "/dev/null" when 
running the program. File format: exactly the same as for 
<ori_paired_cost_matrix_file>.

[E] Running find_opt_pairs:

To see the following usage template, run the program without arguments:
Usage: find_opt_pairs <opt_pairs_file> <one_opt_align_file> 
<mod_paired_cost_matrix_file> <ori_paired_cost_matrix_file> <start_row> 
<end_row>

Input files: <one_opt_align_file> <mod_paired_cost_matrix_file> 
<ori_paired_cost_matrix_file>

These are the same as the files described in [D].

Input parameters: <start_row> <end_row>

<start_row> and <end_row> are non-negative integers defining a range of nodes of 
G1. The range is inclusive of <start_row> and <end_row>. For each G1-node U in 
the range, a corresponding list of G2-nodes will be produced. If V(U) is in this 
list for U, that means the pairing (U, V(U)) is in at least one optimal 
alignment. In particular, if V(U) is only element in the list, then we can say 
that (U, V(U)) must appear in every possible optimal alignment. Each element 
V(U) in the list defines an optimal pair (U, V(U)). To parallelize the 
computation of optimal pairs over G1, partition the node-range of G1 into 
subranges and run the program over each subrange.

Output file: <opt_pairs_file>_<start_row>_<end_row>

Note that the output filename is <opt_pairs_file> suffixed with the range of G1-
nodes. So for example if <opt_pairs_file> = "opt_pairs", <start_row> = 5 and 
<end_row> = 7 then the output file will be "opt_pairs_5_7".

<opt_pairs_file>_<start_row>_<end_row> contains the list of lists of G2-nodes 
corresponding to optimal pairs. File format: first line contains <start_row> 
followed by <end_row>. This is followed by rows of lists of G2 nodes optimally-
paired to G1 nodes 0, 1, 2 ...
 
e.g.
5 7    <- start_row, end_row
5 6    <- (5,5) and (5,6) are optimal pairs
8 9 11 <- (6,8), (6,9) and (6,11) are optimal pairs
4      <- (7,4) is the sole optimal pair for node 7 of G1; any optimal alignment
          must align node 7 of G1 to node 4 of G2

[F] General remarks
        
At the beginning of each program, sanity checks will be made on the input 
arguments e.g. checking if specified input files exist. If everything checks 
out, the confirmation line "Inputs are valid" will be written to standard error. 
If you are starting a long computation, make sure to stick around until the 
confirmation line is echoed so that you are sure the program will run to 
completion. Throughout the computation, progress percentages will be written to 
standard out and at the end, the time in second will be written to standard out. 
If you are running the programs in the background, use > to redirect the 
standard out information to a file.

e.g. of standard out information for find_one_opt_align

> ./find_one_opt_align one_opt_align mod_paired_cost_matrix paired_cost_matrix
Program: find_one_opt_align

Output alignment will be stored at: one_opt_align
Output modified paired cost matrix will be stored at: mod_paired_cost_matrix
Input original paired cost matrix will be read at: paired_cost_matrix

progress = 0%
progress = 50%
progress = 100%

Time = 0 seconds.

e.g. of standard out information for find_opt_pairs

> ./find_opt_pairs opt_pairs one_opt_align mod_paired_cost_natrix paired_cost_matrix 5 7
Program: find_opt_pairs_5_7

Output optimal pairs will be stored at: opt_pairs_5_7
Input optimal alignment will be read at: one_opt_align
Input modified paired cost matrix file will be read at: mod_paired_cost_natrix
Input original paired cost matrix file will be read at: paired_cost_matrix
Input start row index: 5
Input end row index: 7

progress = 33.3333%
progress = 66.6667%
progress = 100%

Time = 0 seconds.

References:
[1] Douglas B. West, Introduction to Graph Theory (Second Edition), Chapter 3.2
[2] G. Ayorkor Mills-Tettey, Anthony (Tony) Stentz, and M Bernardine Dias, "The 
Dynamic Hungarian Algorithm for the Assignment Problem with Changing Costs," 
tech. report CMU-RI-TR-07-27, Robotics Institute, Carnegie Mellon University, 
July, 2007
