Enhancing gene co-expression network inference for the malaria parasite Plasmodium falciparum

 

Contact: Prof. Tijana Milenkovic, tmilenko AT nd DOT edu


Reference: Qi Li, Katrina A Button-Simons, Mackenzie AC Sievert, Elias Chahoud, Gabriel F Foster, Kaitlynn Meis, Michael T Ferdig, and Tijana Milenković (2023). Enhancing gene co-expression network inference for the malaria parasite Plasmodium falciparum. Under review.

Abstract: Malaria results in more than 550,000 deaths each year due to drug resistance in the most lethal Plasmodium (P.) species P. falciparum. A full P. falciparum genome was published in 2002, yet 44.6% of its genes have unknown function. Improving functional annotation of genes is important for identifying drug targets and understanding evolution of drug resistance. Genes function by interacting with one another. So, analyzing gene co-expression networks can enhance functional annotations and prioritize genes for wet lab validation. Earlier efforts to build gene co-expression networks in P. falciparum have been limited to a single network inference method and gaining biological understanding for only a single gene and its interacting partners. Here, we explore multiple inference methods and aim to systematically predict functional annotations for all P. falciparum genes. We evaluate each inferred network based on how well it predicts existing gene-Gene Ontology (GO) term annotations using network clustering and leave-one-out cross-validation. We assess overlaps of the different networks' edges (gene co-expression relationships) as well as predicted functional knowledge. The networks' edges are overall complementary: 47%-85% of all edges are unique to each network. In terms of accuracy of predicting gene functional annotations, all networks yield relatively high precision (as high as 87% for the network inferred using mutual information), but the highest recall reached is below 15%. All networks having low recall means that none of them capture a large amount of all existing gene-GO term annotations. In fact, their annotation predictions are highly complementary, with the largest pairwise overlap of only 27%. The different networks seem to capture different aspects of the P. falciparum biology in terms of both inferred interactions and predicted gene functional annotations. Thus, relying on a single network inference method should be avoided when possible.
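
The precision and recall figures quoted in the abstract compare predicted gene-GO term annotations against existing ones. The following is a minimal sketch of that comparison, assuming both are represented as sets of (gene, GO term) pairs; the function name and the toy identifiers are hypothetical and are not taken from the authors' code.

```python
def precision_recall(predicted, known):
    """Compare two sets of (gene, go_term) pairs.

    precision = |predicted AND known| / |predicted|
    recall    = |predicted AND known| / |known|
    """
    true_positives = predicted & known
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(known) if known else 0.0
    return precision, recall


# Toy data with made-up gene and GO identifiers (illustration only).
known = {
    ("geneA", "GO:1"),
    ("geneA", "GO:2"),
    ("geneB", "GO:1"),
    ("geneC", "GO:3"),
}
predicted = {("geneA", "GO:1"), ("geneB", "GO:2")}

precision, recall = precision_recall(predicted, known)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one of the two predictions is correct and one of the four known annotations is recovered, which mirrors on a small scale the high-precision, low-recall pattern described above.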


Gene expression data: GSE19468 can be downloaded from the NIH website.

  • Reference: "Hu, G. et al. (2010). Transcriptional profiling of growth perturbations of the human malaria parasite Plasmodium falciparum. Nature Biotechnology 28(1), 91–98."

  • Our processed GSE19468 (i.e., Additional File 5) can be downloaded here.

  • Our processed GSE19468 (i.e., Additional File 6) with cyclical stage variation removed can be downloaded here.


Ground truth GO annotation data: We consider gene-GO term annotations from GeneDB and PlasmoDB.

  • Our processed gene-GO term associations (i.e., Additional File 7) can be downloaded here.


Endocytosis data: We consider three endocytosis-related gene lists: Kelch13-, EPS15-, and clathrin-interacting genes.

  • Reference: "Birnbaum, J. et al. (2020). A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science 367(6473), 51–59."

  • The three gene lists are available in the supplementary file of the original publication (referenced above) and can also be downloaded here.


Gene co-expression networks:

  • Reference of ARACNe: "Margolin, A.A. et al. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, vol. 7, pp. 1–7."

    • The implementation of ARACNe can be found here.

  • Reference of RF (GENIE3): "Huynh-Thu, V.A. et al. (2010). Inferring regulatory networks from expression data using tree-based methods. PLOS ONE 5(9), 1–10."

    • The implementation of GENIE3 can be found here.

  • Reference of AdaL: "Krämer, N. et al. (2009). Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics 10(1), 384."

    • The implementation of AdaL can be found in the R package parcor, available from CRAN.

  • All of our inferred gene co-expression networks can be found here.
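
The abstract reports the share of edges that are unique to each inferred network. The following is a minimal sketch of how such a share could be computed, assuming each network is an undirected edge list and that "unique" means the edge appears in no other network; the function name and the toy edges are hypothetical, and the network labels are used only as dictionary keys.

```python
def unique_edge_fraction(networks):
    """Return, for each network, the fraction of its edges found in no other network.

    networks: dict mapping a network name to an iterable of (gene_u, gene_v) pairs.
    Edges are treated as undirected, so (u, v) and (v, u) count as the same edge.
    """
    edge_sets = {
        name: {frozenset(edge) for edge in edges}
        for name, edges in networks.items()
    }
    fractions = {}
    for name, edges in edge_sets.items():
        # Union of the edges of every other network.
        others = set().union(*(s for n, s in edge_sets.items() if n != name))
        fractions[name] = len(edges - others) / len(edges) if edges else 0.0
    return fractions


# Toy edge lists with made-up gene names (illustration only).
networks = {
    "MI": [("g1", "g2"), ("g2", "g3"), ("g3", "g4")],
    "RF": [("g2", "g1"), ("g4", "g5")],
    "AdaL": [("g3", "g2"), ("g5", "g6"), ("g6", "g7"), ("g1", "g7")],
}
fractions = unique_edge_fraction(networks)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f} of edges are unique")
```

Storing each edge as a frozenset keeps the comparison independent of the order in which the two genes are listed in a file.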


Clustering methods:

  • The implementation of BigCLAM can be found here.

    • Reference: "Yang, J. et al. (2013). Overlapping community detection at scale: a nonnegative matrix factorization approach. in Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 587–596."

  • The implementation of MCL can be found here.

    • Reference: "Enright, A.J. et al. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research 30(7), 1575–1584."


Implementations:

  • The implementation of our network validation can be found here.
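
To illustrate the general idea of cluster-based annotation prediction with leave-one-out cross-validation, the following is a simplified sketch and not the authors' pipeline. It assumes that a GO term is predicted for a gene when at least a chosen fraction of the other annotated genes in the same cluster carry that term; the voting rule, the `min_fraction` threshold, and all names are hypothetical stand-ins for whatever criterion the linked implementation actually uses.

```python
from collections import Counter


def leave_one_out_predictions(clusters, annotations, min_fraction=0.6):
    """Predict (gene, go_term) pairs from cluster co-membership.

    clusters: iterable of sets of gene names.
    annotations: dict mapping a gene to its set of known GO terms.
    For each gene, its own annotations are held out and only the annotations
    of the other genes in the same cluster are used to vote.
    """
    predicted = set()
    for cluster in clusters:
        for gene in cluster:
            others = [g for g in cluster if g != gene and annotations.get(g)]
            if not others:
                continue  # nothing to learn from in this cluster
            counts = Counter(term for g in others for term in annotations[g])
            for term, count in counts.items():
                if count / len(others) >= min_fraction:
                    predicted.add((gene, term))
    return predicted


# Toy cluster with made-up genes and GO terms; g4 has no known annotation.
clusters = [{"g1", "g2", "g3", "g4"}]
annotations = {"g1": {"GO:A"}, "g2": {"GO:A", "GO:B"}, "g3": {"GO:A"}}

known = {(g, t) for g, terms in annotations.items() for t in terms}
predicted = leave_one_out_predictions(clusters, annotations)
true_positives = predicted & known   # predictions that match known annotations
novel = predicted - known            # predictions not among known annotations
print(sorted(true_positives), sorted(novel))
```

Under these assumptions, predictions that match held-out annotations play the role of true positives, and the remaining predictions play the role of novel predictions.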


Predictions:

  • The predicted gene-GO term associations, split into true positives (TP, i.e., Additional File 2) and novel predictions (NP, i.e., Additional File 3), can be found here.

  • The predicted gene-gene interactions with confidence scores (i.e., Additional File 4) can be found here.