###################
##### AUTHORS #####
###################

Written by Vikram Saraph (vikram_saraph@brown.edu, vsaraph@nd.edu)
And Vipin Vijayan (vvijayan@nd.edu)
In collaboration with Prof. Tijana Milenkovic (tmilenko@nd.edu)

#######################
##### DESCRIPTION #####
#######################

This README describes the usage of the command line interface of MAGNA++, which implements
our network alignment algorithm, MAGNA++. MAGNA++ is a pairwise global
aligner and a genetic algorithm. It can either start computation from precomputed alignments,
or start completely from scratch.

###################
##### OPTIONS #####
###################

-G | source network
-H | target network
Both networks are specified as LEDA (.gw) graphs. A typical .gw file might look like:

====== BEGIN ======
LEDA.GRAPH
void
void
-2
3
|{A}|
|{B}|
|{C}|
3
1 2 0 |{}|
1 3 0 |{}|
2 3 0 |{}|
======= END =======

In LEDA, the first four lines are meant to specify the format of the data to appear in the remainder
of the file, though my code does not need this information, and is therefore ignored. The fifth line
indicates the number of nodes in the network. Following this are the nodes themselves, each of
which is delimited on each end by `|{' and `}|', respectively. Edges are specified after the nodes,
in a similar format. See http://reference.wolfram.com/mathematica/ref/format/LEDA.html for more details.

------------------------------------------------------------

-i | initial population (default: random)
The initial population is specified by a plain text file. This text file should contain the file names
of each nonrandom alignment to be part of the initial population. For example, this file might look like:

====== BEGIN ======
alignment1.aln
alignment2.aln
alignment3.aln
======= END =======

If the population size (see -p below) is larger than the number of alignments given in the initial
population file, the remainder of the initial population is automatically filled with randomly
generated alignments. If, however, the population size is smaller, then the program simply selects
the first P alignments from the initial population file, where P is the population size.

Each alignment files (with extension .aln) is given as text file with two, tab-separated columns.
Each row is an aligned pair of nodes, with the node in the first column belonging to the source network,
and the second belonging to the target. Such a file may look like:

====== BEGIN ======
A	D	
B	E
C	F
======= END =======

The initial population file provides a way to give the algorithm something nonrandom to start with.
For example, one may wish to include alignments generated by another algorithm into the initial population.
If -i is not specified (default behavior), then MAGNA++ uses a completely random initial population.

------------------------------------------------------------

-o | output file
This parameter specifies the prefix each output alignment should have. Suppose we are aligning net1.gw with
net2.gw. Then we might want `net1_net2' to appear at the beginning of the name of each alignment file outputted
by MAGNA++. The -o parameter provides the means to do this.

------------------------------------------------------------

-m | optimizing measure
This implementation of MAGNA++ is capable of optimizing three different measures of alignment quality.
These are edge correctness (EC), induced conserved structure (ICS), and symmetric substructure score (S^3).
To optimize EC, pass `EC' to the -m flag; to optimize ICS, pass `ICS'; and to optimize S^3, pass `S3'. Note that
`EC' (respectively `ICS' and '3S') appear in the names of the outputted alignment files.

------------------------------------------------------------

-d | node comparison data file
This implementation of MAGNA++ allows us to add node comparison scores to the edge weights.
The input for this is the file name that contains the scores.
The scores must be similarity scores, which means the higher the number, the more similar the nodes are.
The scores must also be non-negative real numbers.
The format of the score data file is real numbers seperated by spaces. For example, with the source graph of 2 nodes and target graph of 3 nodes, the score file will look as follows.
<FILE START>
0.90 0.75 0.50
0.65 0.95 0.60
<FILE END>
The final node score is calculated by taking the average of the node pairs.

------------------------------------------------------------

-a | alpha value
This value, which is between 0 and 1 inclusive, accounts for the the weighting between the edge score and the node score. Given edge score, E, specified by the option -m, and the node score N, specified by the option -d. The final score is
alpha*E + (1-alpha)*N
By default alpha = 1, which means it only uses edge scores by default.
If alpha = 0, then it uses only the node scores, which is equivalent to solving the assignment problem.

------------------------------------------------------------

-p | population size
This parameter specifies the size of the population used by MAGNA++. Generally, a larger population
size means better alignments. This value also appears in the names of the outputted alignment files.

------------------------------------------------------------

-e | elite percentage
This value tells MAGNA++ what percentage of the population it should consider "elite," or how many of
the better alignments are passed to the next generation. By default, this value is 0.5, as MAGNA++ discards
half of the population with each new generation. The user is free to vary this parameter, though 0.5 typically
works best.

------------------------------------------------------------

-n | number of generations
This parameter specifies the number of generations for which to run MAGNA++. Generally, running MAGNA++
for more generations yields higher quality alignments.

------------------------------------------------------------

-f | frequency of output
This parameter is used to specify how many times MAGNA++ should output the best alignment in the population.
For example, suppose the user wants to run MAGNA++ for 1000 generations, with a frequency 5. Then the best
alignment from the population is written after generations 0, 200, 400, 600, 800, and 1000. (The best of the zeroth
generation is also outputted, regardless of the frequency). It is strongly recommended that the frequency of output
divide the number of generations. By default, the frequency is set to 1, which means only the best alignment from
the very last generation is outputted.

------------------------------------------------------------

-t | number of threads
This parameter specifies the number of threads used to run MAGNA++.
The recommended number of threads is the number of cores available in the computer.


##############################
##### SAMPLE EXECUTION 1 #####
##############################

Suppose we have networks test_graph1.gw and test_graph2.gw, with an initial population defined by test_population.txt.
We want to optimize edge correctness, and let MAGNA++ run for 10 generations on a population size of 10.
In addition, we want the best alignment of the population halfway through execution. The following command does this.

./magna -G test_graph1.gw -H test_graph2.gw -i test_population.txt -o test_run -m EC -p 10 -n 10 -f 2

After execution, as a result, the following three files are written in the same directory as the executable:
test_run_EC_10_10_0.aln
test_run_EC_10_10_5.aln
test_run_EC_10_10_10.aln


##############################
##### SAMPLE EXECUTION 2 #####
##############################

Suppose we have networks test_graph1.gw and test_graph2.gw, with an initial population defined by test_population.txt. And node score similarity matrix file, nodescores.txt.
We want to optimize edge correctness and node comparisons, and let MAGNA++ run for 10 generations on a population size of 10. We want to weight the edge score and node score with 0.6.
In addition, we want the best alignment of the population halfway through execution. The following command does this.

./magna -G test_graph1.gw -H test_graph2.gw -i test_population.txt -o test_run -m EC -p 10 -n 10 -f 2 -d nodescores.txt -a 0.6

After execution, as a result, the following three files are written in the same directory as the executable:
test_run_EC_10_10_0.aln
test_run_EC_10_10_5.aln
test_run_EC_10_10_10.aln


##############################
##### SAMPLE EXECUTION 3 #####
##############################
./magna -G ../data/ex1.gw -H ../data/ex2.gw -o ../results/ex -m S3 -p 20 -n 20 -i ../data/exinitpop.txt -a 0.5 -d ../data/exgdvsim.txt
 where ex1.gw and ex2.gw are the networks
exinitpop.txt is the initial population file
exgdvsim.txt is the node similarity file
