Network-based machine learning and graph theory algorithms for precision oncology

Wei Zhang, Jeremy Chien, Jeongsik Yong, and Rui Kuang (2017)
Paper summary by mweiss

The overarching goal of precision oncology is to use personal genomic information, like gene expression data from a tumor biopsy, to devise individualized treatment plans. Network-based analytics support this goal in three ways: first, by helping us identify malfunctioning gene pathways from patient-specific gene expression levels considered in relation to one another; second, by enabling us to predict the phenotype, or observable characteristics, of a patient's cancer; third, by recommending drugs to re-purpose as chemotherapies that target a specific gene or protein that has been dysregulated in a pathway.
## Identifying Pathological Gene Pathways
Cancer is best characterized as a disease in which frequently mutated genes dysregulate biological pathways. There are many kinds of biological pathways, such as metabolic pathways, which synthesize useful molecules or break down complex ones, and gene-regulatory pathways, in which molecules like proteins, mRNA, or DNA regulate the production, expression, and activity of other molecules, thereby affecting the behaviour of the cell.
Systems biology makes use of many types of pathways, which may be viewed as molecular networks like the gene-regulatory network, the protein-protein interaction network, and the cellular signal transduction network. One goal of network-based analysis of personal genomic profiles is to identify network modules that are causative of cancer and informative of cancer phenotype. This paper outlines three ways of integrating molecular networks to make these predictions: model-based integration, pre-processing integration, and post-analysis integration.
The authors use the phrase **pre-processing integration** to indicate a two-step process. The first step detects molecular sub-networks by using highly discriminative genes as seed genes in a greedy search to find discriminative sub-networks. The gene expression in each sub-network is then normalized as one feature value for classification with logistic regression or another method. This method allows you to customize your sub-network features, but generally does not produce optimal predictions.
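The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from the paper: the expression values are assumed to be z-scored already, the discriminative score (gap between class means divided by the overall standard deviation) is a simple stand-in for whatever score a real study would use, and the names `activity`, `discriminative_score`, and `greedy_subnetwork` as well as the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev

def activity(expr, genes, sample):
    """Aggregate one sample's z-scored expression over a gene set into one value."""
    return sum(expr[g][sample] for g in genes) / math.sqrt(len(genes))

def discriminative_score(expr, genes, labels):
    """Stand-in score: gap between class means of the activity, in units of its std."""
    acts = [activity(expr, genes, s) for s in range(len(labels))]
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    spread = pstdev(acts)
    return abs(mean(pos) - mean(neg)) / spread if spread > 0 else 0.0

def greedy_subnetwork(seed, network, expr, labels, max_size=5):
    """Grow a sub-network from a seed gene, adding the best neighbor while the score improves."""
    module = {seed}
    best = discriminative_score(expr, module, labels)
    while len(module) < max_size:
        frontier = {n for g in module for n in network.get(g, ())
                    if n not in module and n in expr}
        candidates = [(discriminative_score(expr, module | {n}, labels), n)
                      for n in sorted(frontier)]
        if not candidates:
            break
        cand_score, cand = max(candidates)
        if cand_score <= best:
            break  # no neighbor improves the score, so stop growing
        module.add(cand)
        best = cand_score
    return module, best

# Toy interaction network (adjacency lists) and z-scored expression for 4 patients.
network = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
expr = {
    "A": [1.0, 0.5, -0.5, -1.0],
    "B": [0.8, 1.2, -1.0, -1.0],
    "C": [-1.0, 1.0, 1.0, -1.0],
    "D": [0.2, -0.3, 0.1, 0.0],
}
labels = [1, 1, 0, 0]  # e.g. 1 = metastatic, 0 = non-metastatic (made-up labels)

module, module_score = greedy_subnetwork("A", network, expr, labels)
# One feature value per patient, ready to feed to a classifier such as logistic regression.
features = [activity(expr, module, s) for s in range(len(labels))]
```

The greedy search only ever looks one gene ahead, which is part of why this style of integration is easy to customize but not guaranteed to find the best sub-networks.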
An example application of this type of method comes from a paper called [Network-based classification of breast cancer metastasis](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2063581/). The authors' goal is to determine which gene pathways give rise to metastasis. They note that previous methods had identified certain genes as biomarkers correlated with metastasis. Their approach is to use these genes as seeds for an algorithm that identifies discriminative sub-networks in a protein-protein interaction network.
To conduct the **post-analysis integration** of oncogenic alterations in networks, first analyze the patient genomic profiles to generate a list of oncogenic alterations. Second, map those alterations to the network as seed genes for the module analysis. This method is highly informative of cancer mechanisms in the network, but relies on accurate identification of oncogenic alterations in the patient data.
When using a **model-based integration** approach, you incorporate the molecular network into the machine learning model as a regularizing graph Laplacian. The coefficients that the model learns concentrate on dense sub-networks, which the model then uses to predict patient survival or cancer phenotype. This kind of framework has a global optimization strategy and generally produces the best outcome predictions.
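To make the Laplacian idea concrete, here is a small sketch that adds a penalty of the form `lam * w^T L w` to a logistic regression loss, where `L = D - A` is the Laplacian of the gene network. The penalty is small when connected genes receive similar coefficients. The choice of logistic regression, the plain gradient descent loop, the function names, and the toy data are all assumptions made for illustration; the summary does not say which model the surveyed methods use.

```python
import math

def laplacian(genes, edges):
    """Unnormalized graph Laplacian L = D - A as a list of lists."""
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        i, j = idx[a], idx[b]
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def fit(X, y, L, lam=1.0, lr=0.1, steps=500):
    """Gradient descent on mean log-loss plus the network penalty lam * w^T L w."""
    n = len(L)
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(n):
                grad[j] += (p - yi) * xi[j] / len(X)
        for j in range(n):
            # Gradient of lam * w^T L w is 2 * lam * (L w).
            grad[j] += 2.0 * lam * sum(L[j][k] * w[k] for k in range(n))
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

genes = ["g1", "g2", "g3"]
edges = [("g1", "g2")]          # g1 and g2 interact; g3 is isolated
X = [[1.0, 0.1, 0.0], [0.8, -0.1, 0.2], [-1.0, 0.1, -0.1], [-0.9, -0.1, 0.1]]
y = [1, 1, 0, 0]                # made-up phenotype labels

L = laplacian(genes, edges)
w_plain = fit(X, y, L, lam=0.0)  # no network information
w_net = fit(X, y, L, lam=1.0)    # coefficients of g1 and g2 are pulled together
```

Comparing `w_plain` with `w_net` shows the intended effect: with the penalty switched on, the coefficient of the uninformative gene `g2` is drawn toward that of its informative neighbor `g1`, so the signal spreads over the connected sub-network.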
## Methods for Drug Re-purposing
If you want to re-purpose a drug for use against a cancer target, there are three networks that are useful to inform your work: drug-drug similarity, drug-target relations, and gene-gene relations. Additionally, there are three types of methods that you can use with this data to create a predictive model: graph connectivity measures, link prediction models, and network-based classification methods.
## Graph Connectivity Methods
Several studies have shown that drugs sharing similar chemical structures, side-effects, or gene-expression profiles following drug treatment can be good candidates for re-purposing. It follows that we may combine a drug-drug interaction network, a drug-target interaction network, and a target-target interaction network to estimate the likelihood that a specific drug may be re-purposed for a specific target. The proxy that is used for similarity is simply the length of the path from the query drug in the drug-drug interaction network to the target in the target-target interaction network.
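The path-length proxy described above can be sketched with a breadth-first search over the three networks merged into one graph. The edge lists, node names, and helper names below are hypothetical, and all edges are treated as unweighted and undirected, which is an assumption of this sketch.

```python
from collections import deque

def build_graph(*edge_lists):
    """Merge several undirected edge lists into one adjacency dictionary."""
    graph = {}
    for edges in edge_lists:
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def path_length(graph, source, target):
    """Number of edges on a shortest path, or None if the nodes are not connected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None

drug_drug = [("drugA", "drugB")]       # similar drugs
drug_target = [("drugB", "T1")]        # known drug-target relation
target_target = [("T1", "T2")]         # interacting targets

graph = build_graph(drug_drug, drug_target, target_target)
print(path_length(graph, "drugA", "T2"))   # 3
print(path_length(graph, "drugC", "T2"))   # None (drugC is not in any network)
```

A shorter path suggests a more plausible re-purposing candidate; a real analysis would likely weight edges by similarity rather than count them equally.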
## Link Prediction Models
The link prediction model resembles the graph connectivity method, but relies on more advanced similarity measures and the global graph structure to determine relations between drugs and targets. For example, one paper using a link prediction model for drug repurposing uses drug-drug two-dimensional structural similarity, target-target genomic sequence similarity, and a dataset of known drug-target interactions. Once these measures have been calculated, two established methods are used to predict new drug-target relations: matrix completion and random walks.
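As a rough illustration of the random walk option, the sketch below runs a random walk with restart from a query drug over a merged drug and target graph and ranks targets by the resulting visit probability. It assumes an unweighted, undirected graph and a restart probability of 0.3; the names and toy edges are hypothetical, and the cited work may weight edges by the similarity measures instead.

```python
def build_graph(*edge_lists):
    """Merge several undirected edge lists into one adjacency dictionary."""
    graph = {}
    for edges in edge_lists:
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def random_walk_with_restart(graph, seed, restart=0.3, iters=100):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until (approximately) stationary."""
    p = {n: 1.0 if n == seed else 0.0 for n in graph}
    for _ in range(iters):
        nxt = {n: (restart if n == seed else 0.0) for n in graph}
        for n, nbrs in graph.items():
            # Each node passes its probability mass to its neighbors in equal shares.
            share = (1.0 - restart) * p[n] / len(nbrs)
            for nb in nbrs:
                nxt[nb] += share
        p = nxt
    return p

drug_drug = [("drugA", "drugB")]
drug_target = [("drugB", "T1")]
target_target = [("T1", "T2"), ("T2", "T3")]

graph = build_graph(drug_drug, drug_target, target_target)
scores = random_walk_with_restart(graph, "drugA")
ranked_targets = sorted((n for n in scores if n.startswith("T")),
                        key=scores.get, reverse=True)
```

Unlike a single shortest path, the walk accounts for every route between the drug and a target, which is the sense in which link prediction uses the global graph structure.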
## Network-based Classification Methods
Here we view drug repositioning as a classification problem and apply standard classification methods such as SVM. The inputs to the classifier are network topological features (local neighborhood) and the labels are from a dataset of drug-target relations describing whether a specific drug is effective against a specific target. You use the classifier to make predictions on the gene neighborhoods present in your test set.
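The classification view can be sketched as follows. Each drug-target pair is turned into two local-neighborhood features, and a small linear SVM is trained on labeled pairs. Everything here is an assumption made for illustration: the two features, the toy networks and labels, and the hand-written subgradient trainer, which stands in for a library SVM so that the example has no dependencies.

```python
def pair_features(drug, target, gene_net, drug_sim, known):
    """Two local-neighborhood features for a (drug, target) pair."""
    # How many network neighbors of the target are already known targets of this drug?
    near_known_targets = sum((drug, nb) in known for nb in gene_net.get(target, ()))
    # How many drugs similar to this one are already known to hit the target?
    similar_drug_hits = sum((s, target) in known for s in drug_sim.get(drug, ()))
    return [float(near_known_targets), float(similar_drug_hits)]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]   # shrink step from the L2 penalty
            if margin < 1:                            # hinge loss is active for this example
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

gene_net = {"T1": {"T2"}, "T2": {"T1", "T3"}, "T3": {"T2"}, "T4": set()}
drug_sim = {"d1": {"d2"}, "d2": {"d1"}, "d3": set()}
known = {("d1", "T1"), ("d2", "T1"), ("d2", "T2")}   # known drug-target relations

# Made-up training labels: +1 = drug is effective against the target, -1 = it is not.
train = [("d1", "T2", 1), ("d2", "T3", 1), ("d3", "T4", -1), ("d3", "T1", -1)]
X = [pair_features(d, t, gene_net, drug_sim, known) for d, t, _ in train]
y = [label for _, _, label in train]

w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

In practice you would use a tested SVM implementation and a much richer feature set; the point of the sketch is only that the network enters the classifier through the features.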
## Network-Based Analysis of TCGA Mutation Data
The paper also conducts a network-based analysis of mutated genes in 31 cancer genome projects in TCGA using enriched KEGG pathways. The authors find that they are able to re-identify an important signalling pathway that requires the gene AMPK and is known to be affected in breast cancer (BRCA) and uterine corpus endometrial carcinoma (UCEC).
This article surveys precision oncology, primarily discussing drug re-positioning and algorithms for integrating molecular networks with patient data. It also contains a case study on ovarian cancer, and a network-based analysis of TCGA data using gene mutations.
## Future Work
The authors outline several directions for future work. One direction is to explore higher-resolution graphs that distinguish between isoforms of genes and capture more complex relationships between molecules. Another is to research heterogeneous cell populations in tumors using single-cell RNA sequencing to differentiate cells with different drug targets. Finally, it is essential to develop a standardized software platform that integrates biomedical data, biological network data, and analytic software to support comprehensive network-based analysis of patient genomic data and drug re-positioning.