Yongjin Tang, Jinyuan Zheng, Xiaomeng Fu, Yang Chen and Donghong Lin*
Department of Clinical Laboratory, School of Medical Technology and Engineering, Fujian Medical University, Fujian, P.R. China
Received Date: March 09, 2019; Accepted Date: May 02, 2019; Published Date: May 09, 2019
Citation: Tang Y, Zheng J, Fu X, Chen Y (2019) Bioinformatics Analysis of Differentially Expressed Genes and Their Functional Enrichment in Acute Myeloid Leukemia Bearing MLL Translocation. Biomark J Vol.5 No.1:4
Background: Chromosomal translocations of the mixed-lineage leukemia gene (MLL; tMLL) correlate with resistance to therapy and an extremely poor prognosis for individuals with Acute Myeloid Leukemia (AML). The underlying mechanisms are still unknown. This study aims to identify the key genes and the potential molecular mechanisms involved in MLL rearrangement in AML using a bioinformatics approach.
Methods: The gene expression profiles from 15 individuals with partial tandem duplication of the MLL gene (MLL-PTD)-AML and 10 tMLL-AML samples were downloaded from the Gene Expression Omnibus (GEO) database. The Differentially Expressed Genes (DEGs) were selected and functional enrichment analyses were performed. The Protein-Protein Interaction (PPI) network was established and visualized in Cytoscape. The hub genes were identified by CytoHubba and significant modules were screened out by Molecular Complex Detection (MCODE).
Results: We categorized a total of 885 DEGs comprising 330 upregulated and 555 downregulated genes. The majority of DEGs were significantly enriched for calcium ion transmembrane transport, embryonic skeletal system morphogenesis and cell proliferation processes. Several pathways were enriched, including those associated with PI3K-Akt signaling and insulin resistance. We identified 32 hub genes and screened out 2 modules.
Conclusion: The genes we have identified in this study may represent potential biomarkers for MLL-rearranged AML and contribute to the development of novel therapeutic strategies.
Differentially expressed genes; Functional enrichment analysis; Proteinprotein interactions; Biomarker; tMLL; Acute myeloid leukemia
ACTR2: Actin Related Protein 2 Homolog; AML: Acute Myeloid Leukemia; ARPC3: Actin Related Protein 2/3 Complex subunit 3; ARPC5: Actin Related Protein 2/3 Complex subunit 5; BCR: BCR activator of RhoGEF and GTPase ; BP: Biological Process; CC: Cell Component; CTTN: Cortactin; CXCR4: C-X-C motif chemokine Receptor 4; DAVID: Database for Annotation, Visualization and Integrated Discovery; DEGs: Differentially Expressed Genes; GEO: Gene Expression Omnibus; GO: Gene Ontology; HOXB: Homeobox B; ISN: Insulin Signaling Network; KEGG: Kyoto Encyclopedia of Genes and Genomes; MCODE: Molecular Complex Detection; MF: Molecular Function; MLL: Mixed-Lineage Leukemia gene; MRC2: Mannose Receptor C, type 2; mTOR: Mammalian Target protein Rapamycin; PM: Plasma Membrane; PPI: Protein-Protein Interaction; PTD: Partial Tandem Duplication; RUNX2: RUNX family transcription factor 2; STRING: Search Tool for the Retrieval of Interacting Genes/Proteins.
Acute Myeloid Leukemia (AML) belongs to a class of highly heterogeneous hematologic malignancies, involving the abnormal proliferation of protocells in peripheral blood and bone marrow. The majority of AML patients bear abnormal chromosomal karyotypes. Cytogenetic abnormalities such as chromosomal translocations involving the mixed-lineage leukemia gene (MLL) are associated with a poor prognosis for AML patients . A small subset of AML patients develops treatment-induced secondary leukemias, in which the MLL gene may be rearranged to generate Partial Tandem Duplication (PTD), amplification, or fused to a partner gene through a chromosomal translocation (tMLL) . MLL was shown to be involved in more than 100 repeated translocations, mainly implicating nine translocation partners (AF4, AF9, ENL, AF10, AF6, ELL, AF1P, AF17 and SEPT6), accounting for almost 90% of total MLL rearrangements. It was reported that 15%-20% of all pediatric AML cases were caused by chromosome 11q23 translocations . Moreover, acute leukemias harboring MLL translocations accounted for 10% of all acute leukemias in humans . The presence of an MLL rearrangement is predictive of early relapse and an extremely poor prognosis in relation to many other types of leukemia [5-8].
Armstrong SA and his colleagues demonstrated that MLL translocations are associated with a distinct gene expression profile, distinguishable from typical AML . It was reported that the gene expression characteristics common to MLL rearranged in AML patients can identify abnormal genes related to MLL translocation specificity . Based on gene expression profiling, tMLL was shown to be a distinct subtype of AML. It was demonstrated that MLL fusion genes were different in MLL-PTD that displayed molecular heterogeneity, and a clear expression signature was identified for cases with MLL chimeric fusion genes, irrespective of lineage. However, no significantly distinct clustering was determined for the MLL-PTD and tMLL subtypes of AML . Given these findings, we performed bioinformatics analysis to explore whether there were any Differentially Expressed Genes (DEGs) between the two subsets of AML described, rather than defining separate expression signatures for tMLL and MLL-PTD.
With the development and the pervasive application of microarray technology, gene expression profiling analyses have become extremely valuable in biological research. Huge data sets can be generated, capable of providing abundant high-throughput data for gene mining. Bioinformatics analysis represents a powerful tool for the comprehensive analysis of cancer-associated genes. In this study, we aimed to identify DEGs by comparing MLL-PTD and tMLL data from the GEO database. Functional and pathway enrichment analyses were applied to explain the main characteristics. By analyzing the Protein-Protein Interaction (PPI) network, hub genes were identified and the underlying signaling pathways were explored. Our data could potentially contribute to the identification of novel prognostic factors and inform new treatment options for AML patients bearing MLL translocation.
Gene expression profile data
The gene expression profile dataset (GSE15013) was downloaded from the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/). The gene expression profile was based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array, Agilent Technologies, Santa Clara, CA, USA). The GSE15013 dataset contained a total of 25 samples, including 10 taken from AML patients with tMLL, and 15 taken from AML patients with MLL-PTD. These samples were divided into two groups in order to select for DEGs using GEO2R online software.
GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) performs comparisons between original submitter-supplied processed data tables, using the GEO query and limma R packages from the Bioconductor project, to identify DEGs across experimental conditions. In this study, GEO2R was used to analyze the DEGs in the comparison between tMLL and MLL-PTD samples. P-value<0.01 and |logFC|≥1 were set as threshold conditions for DEG selection.
GO ontology and KEGG pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID, Version 6.8, https://david.ncifcrf.gov/) is a public online tool for functional annotation and enrichment analysis, used to reveal biological functions of large lists of genes. In order to determine the function of the DEGs we had identified, Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis features of the DAVID tool were used. P-values<0.05 were considered as statistically significant.
PPI network generation and module analysis
Protein-Protein Interaction (PPI) networks for the identified DEGs were generated using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, http://string-db.org/). STRING is a biological database and web resource of known and predicted protein–protein interactions, which can help to systematically deconvolute cellular processes. Herein, an interaction score of >0.7 (high confidence) was considered as a cut-off standard. The PPI network was visualized using Cytoscape (Version 3.5.0, http://www.cytoscape.org/), an open source bioinformatics platform for visualizing molecular interaction networks, and using in combination with gene expression profiles and other type of data, and re-checked the nodes in PPI network by several algorithms on CytoHubba. Hub genes were ranked by twelve different algorithms on CytoHubba. We screened out the top 32 genes with a high degree of connectivity by MCC and labeled them hub genes. To analyze the potential module relationships within the PPI network, we mapped the PPI network onto Cytoscape and used Cytoscape’s Molecular Complex Detection (MCODE) application. The MCODE analysis was used to screen out modules from the PPI network with a degree cutoff=2, node score cutoff=0.2, k-score=2 and max. depth=100. GO ontology and KEGG pathway analyses were also made to inform module discoveries.
Identification of DEGs
GEO2R analysis showed that a total of 885 DEGs were extracted from the GSE15013 dataset, comprising 330 upregulated and 555 downregulated genes. The top ten most significantly upregulated DEGs were MECOM, SENP6, AK2, F12, KCTD15, XAGE1B/// XAGE1E, APOC4-APOC2///APOC4///APOC2, PYGM, LOC401068 and ADGRG6, while the top ten most significantly downregulated DEGs were DST, KMT2A, NPR3, LCA5, LINC00982, FNBP1L, CDH4, NPR3, NFIX and TNFRSF21 (Table 1).
|ID||Gene Symbol||Log FC||P-Value|
Table 1: The most significantly upregulated and downregulated DEGs in tMLL-bearing AML samples (top ten are shown for each type of DEGs, tMLL versus MLL-PTD, p-value<0.01).
GO function and KEGG pathway enrichment analysis
The GO Biological Process (BP) function of the DAVID tool revealed that the upregulated DEGs were significantly enriched in calcium ion transmembrane transport, cilium assembly and osteoblast differentiation processes. The downregulated DEGs were mainly involved in hematopoiesis, cell proliferation, apoptosis, angiogenesis, phosphorylation of adhesion proteins, signal transduction, bone formation and gene expression regulation processes. For the GO Cell Component (CC) analysis, the upregulated DEGs were significantly enriched in cytoskeleton components, while the downregulated DEGs were distributed throughout the cell. The analysis of Molecular Function (MF) revealed that, the upregulated DEGs were significantly enriched in genes with roles in cell energy, metabolism maintenance, and calcium ion binding, while the downregulated DEGs were involved in collagen binding, calcium ion binding, metal ion binding and transcriptional activity. Data from GO functional enrichment analyses of upregulated and downregulated DEGs are shown in (Tables 2 and 3), respectively.
|GO:0070588~calcium ion transmembrane transport||8||0.000702436|
|GO:0005262~calcium channel activity||5||0.01125974|
|GO:0005509~calcium ion binding||16||0.034628231|
|GO:0004522~ribonuclease A activity||2||0.048843496|
Table 2: GO functional enrichment analysis of upregulated DEGs (p-value<0.05).
|GO:0048704~embryonic skeletal system morphogenesis||8||0.000010698|
|GO:0030198~extracellular matrix organization||14||0.000165077|
|GO:0045944~positive regulation of transcription from RNA polymerase II promoter||36||0.000659188|
|GO:0005887~integral component of plasma membrane||56||0.000020763|
|GO:0005509~calcium ion binding||25||0.012526355|
|GO:0046872~metal ion binding||57||0.016719123|
|GO:0003700~transcription factor activity, sequence-specific DNA binding||30||0.023490364|
|GO:0001077~transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding||11||0.023632956|
Table 3: GO functional enrichment analysis of downregulated DEGs (top five are shown for each category, p-value<0.05).
Additionally, KEGG pathway enrichment analysis showed that the upregulated DEGs were mainly associated with the insulin signaling pathway. The down regulated DEGs however, were enriched in twelve pathways, such as those involved in hematopoietic cell lineage development and PI3K-Akt signaling (Table 4).
|hsa04910||Insulin signaling pathway||5||0.049250182|
|hsa04640||Hematopoietic cell lineage||9||0.000780907|
|hsa05200||Pathways in cancer||18||0.007980125|
|hsa04974||Protein digestion and absorption||7||0.014962551|
|hsa04810||Regulation of actin cytoskeleton||11||0.021577969|
|hsa04151||PI3K-Akt signaling pathway||15||0.025592846|
|hsa05100||Bacterial invasion of epithelial cells||6||0.032323678|
|hsa05205||Proteoglycans in cancer||10||0.039012569|
|hsa04666||Fc gamma R-mediated phagocytosis||6||0.042460154|
Table 4: KEGG pathways enrichment analysis of DEGs (p-value<0.05).
PPI network generation and Hub gene identification
Using STRING analysis, a total of 520 PPI relationships between identified DEG products were obtained (Figure 1). A threshold degree >15 for the default filter was set to identify key genes. Using Cytoscape to visualize the PPI network analysis, 32 hub genes were selected, including ACTR2, ARPC3, ARPC5, CTTN, FNBP1L and APP (Figure 2).
Figure 1: The visualization analysis of 885 DEGs in the PPI network generated by Cytoscape.
Note: 520 PPI relationships were obtained between the 885 DEGs identified. The blue circles represent the DEGs in the PPI network, while the lines show the interactions between the DEGs. The red squares showed more significant genes, according degree more than 15 by default filter set as threshold.
Abbreviations: DEGs: Differentially Expressed Genes; PPI: Protein-Protein Interaction.
Figure 2: The 32 hub genes in the PPI network with high confidence.
Note: Hub genes were screened out in the PPI network using CytoHubba and visualized in STRING. Each circle represents a different hub gene, and the interconnecting lines represent the interaction between these hub genes.
Abbreviations: PPI: Protein-Protein Interaction; STRING: Search Tool for the Retrieval of Interacting Genes/Proteins.
In order to detect significant modules in the PPI network, we used the MCODE plug-in. The top two significant modules in MCODE with a score of ≥ 10 and nodes ≥ 10 were selected from the PPI network, including Module A (MCODE score=13.00 with 13 nodes) and Module B (MCODE score=12.11 with 19 nodes) (Figure 3). Biological process enrichment analysis performed using DAVID software revealed that Module A was involved mainly in endocytosis, the ephrin receptor signaling pathway, positive regulation of actin filament polymerization and, the cell projection morphogenesis process. Meanwhile, Module B was largely associated with signal transduction, cellular communication, and cellular response to stimuli, inflammatory responses, platelet activation and the coagulation process (Figure 4). KEGG pathway enrichment analysis showed that these two modules were for the most part connected with bacterial invasion of epithelial cells, endocytosis, and calcium and chemokine signaling pathways (Figure 5).
Figure 3: The significant modules were screened out from the PPI network.
Note: (A) Module A was selected out the following parameters (MCODE score=13.00 and nodes=13). (B) Modules B was selected out the following parameters (MCODE score=12.11 and nodes=19). The circles represented the different genes present in the two modules, and the interconnecting lines show the interactions between these genes.
Abbreviations: MCODE: Molecular Complex Detection; PPI: Protein-Protein Interaction.
Figure 4: The GO biological process enrichment analysis of modules A and B.
Note: (A) The GO biological process enrichment analysis of Module A. (B) The GO biological process enrichment analysis of Module B. The vertical axis represents the different biological process revealed by GO analysis of the two modules, and the horizontal axis shows the gene counts for the corresponding biological processes. P-value<0.05.
Abbreviation: GO: Gene Ontology.
Figure 5: The KEGG pathway enrichment analysis of modules A and B.
Note: (A) The KEGG pathway enrichment analysis of Module A. (B) The KEGG pathway enrichment analysis of Module B. The vertical axis represents the different pathways revealed by KEGG enrichment analysis in the two modules, and the horizontal axis shows gene count corresponding to the different pathways. P value<0.05.
Abbreviation: KEGG: Kyoto Encyclopedia of Genes and Genomes.
In recent years, there has been an increased emphasis on individualized and targeted therapy for Acute Myeloid Leukemia (AML) patients. However, the potential for chromosoma mutation and recombination processes (such as demonstrated by 11q23, t (6;9), and other genetic abnormalities) has meant that, the prognosis for AML patients remains poor and existing treatments still lack efficacy . Even with the arrival of improved treatment options such as allogeneic hematopoietic stem cell transplantation, the translocation of the mixed-lineage leukemia (tMLL) gene presents a huge challenge in the treatment of leukemia. AML patients harboring tMLL typically receive a relatively poor prognosis and are at an elevated risk of early relapse. It is therefore important to explore the mechanisms of tMLL in AML for the development of novel therapeutic strategies. Recently, gene expression profiling analysis has been widely used to reveal abnormal gene expression patterns related to AML, with the aim of identifying novel diagnostic and therapeutic targets.
In this study, 15 MLL-PTD and 10 tMLL-AML patient samples were selected from the GEO dataset of GSE15013. As a result, 885 DEG genes were identified, including 330 upregulated and 550 downregulated genes. To further understand the characters of these DEGs, we conducted GO functional and KEGG pathway analyses.
The functional enrichment analysis showed that the upregulated and downregulated DEGs were enriched in different roles in cell life processes. Among the genes in upregulated DEGs functional enrichment, mannose receptor C, type 2 (MRC2) is related to collagen turnover and cancer prognosis, and have been confirmed it played a vital role in Foxp3+ regulatory T cells in local dysfunctional immune environment . Kuo YH and his colleagues discovered back in 2009 that RUNX2 could be induced acute myeloid leukemia in cooperation with Cbfβ-SMMHC in mice . Subsequently, Schnerch et al.  reported RUNX2 was upregulated during leukemogenesis in an AML patient with an inherent RUNX2 haploinsufficiency and cleidocranial dysplasia. These insightful research findings along with microarray data point to an important role for RUNX2 in the development and progression of AML. Nevertheless, among the genes in downregulated DEGs functional enrichment, HOXB gene seemed to play an important role in biological processes. HOX homeobox genes play a role in both the early stem cell function as well as in later stages of hematopoietic differentiation, and that perturbations of HOX genes expression can be leukemogenic [16,17]. It has been presented those four HOXB genes (HOXB2, B3, B5 and B6) expression values were significantly differ between MLL-PTD and tMLL cases . It is reasonable to believe that HOXB genes play a crucial biological role in AML and may be a potential biomarker for prognosis and therapeutic target.
Our analyses revealed that the majority of the key genes were down-regulated. Actin related protein 2 homolog (ACTR2), actin related protein 2/3 complex subunit 3 (ARPC3) and actin related protein 2/3 complex subunit 5 (ARPC5) are known to be the major constituents of the ARP2/3 complex, which is located at the cell surface and is essential for the regulation of cell shape and motility through lamellipodial actin assembly and protrusion . The ARP2/3 complex is important for cell migration both in normal and malignant tumor cells [19,20]. In addition, alternatively spliced transcript variants of ARPC3 have been identified. It was reported that ARPC3 is linked to adipogenesis and lipid accumulation when bound to the ARPC3 promoterassociated CpG site . ARPC5 has been shown to be involved in cell migration and invasion in head and neck squamous cell carcinoma . Furthermore, ARPC5 may affect the proliferation of myeloma cells via the mammalian target protein rapamycin (mTOR) C1 signaling pathway . Numerous studies have shown that another hub gene, cortactin (CTTN), may regulate the actin cytoskeleton and thus participat in cellular movement, adhesion, polarization, contraction and other related processes [24-26]. CTTN is overexpressed in breast cancer , hepatocellular carcinoma , gastric carcinoma  and head and neck squamous cell carcinomas . The aberrant regulation of this gene also contributes to tumor cell invasion and metastasis and may serve as a prognostic factor and potential therapeutic target. In addition, a CTTN polymorphism may significantly increase cancer susceptibility. Some studies have indicated that the CTTN g.-9101C>T polymorphism may influence lymph node-positive colorectal and gastric cancer [31,32]. In Chronic Lymphocytic Leukemia (CLL) cells, CTTN is a checkpoint molecule at the intersection of BCR activator of RhoGEF and GTPase (BCR) and C-X-C motif chemokine receptor 4 (CXCR4) signaling pathways, thereby regulating malignant cell migration and the progression of leukemia . Considering that the aforementioned genes may impact on the migration, invasion or proliferation of tumor cells, we hypothesize that these genes may also be involved in the tumorigenesis and metabolism of AML bearing MLL translocations. Further experiments will be necessary to explore this relationship in more detail.
Another important gene APP, which encodes the amyloid precursor protein, was identified among the hub genes. APP is located on 21q21.3 and is involved the pathogenesis of Alzheimer's disease and Down’s syndrome . APP is mainly expressed in the brain but can also be expressed in non-neuronal tissues. An increasing number of studies have reported that APP may be involved in the growth of various cell types under physiological and pathological conditions. It has been demonstrated that APP overexpression in oral squamous cell carcinoma , pancreatic cancer , thyroid cancers , prostate cancer , and AML harboring complex karyotypes or t (8;21) [39,40], promoted cancer cells to proliferate and metastasize, negatively effecting disease prognosis. Furthermore, APP may cooperate with c-KIT mutation/overexpression in the regulation of cell apoptosis in AML1-ETO-positive leukemia via the PI3K/Akt signaling pathway, implying that APP could be considered as a new biomarker for targeted therapy . Although APP is closely related to the tumorigenesis of leukemia, its specific function and mechanism of action have not yet been completely determined in tMLL bearing AML, and further experiments will need to be performed.
Pathway analysis may provide more accurate information about genetic biological functions relationship than GO analysis. Here, we found that the KEGG pathway of upregulated DEGs was enriched in genes belonging to the insulin signaling pathway, and the downregulated DEGs were concentrated in 12 pathways, one of which was the PI3K/Akt signaling pathway. Recent studies have demonstrated that the PI3K/Akt signaling pathway is frequently activated in AML patients and plays an important role in the proliferation, survival and drug resistance of human AML cell lines . In addition, metabolic targeting of the PI3K/Akt/mTOR signaling pathway has been extensively studied as an anticancer strategy . Clinical studies of metabolic intervention in AML patients with isocitrate dehydrogenase mutations have shown promising results . The insulin signaling pathway is involved in a series of diseases, including cancer. The well-characterized Insulin Signaling Network (ISN)  comprises proteins with crucial roles in cell proliferation and death, linked with a cell population model and applied to data of a cell line of AML treated with a mammalian target of rapamycin inhibitor with antitumor activity . Currently, it is unclear whether the PI3K/Akt and insulin signaling pathways are involved in AML patients bearing tMLL. Further studies are needed to explore the roles of the DEGs associated with these signaling pathways.
In order to further our understanding of DEG function in tMLLbearing AML, two modules were determined by MCODE. Through functional enrichment, these modules were shown to be largely involved in the bacterial invasion of epithelial cells, endocytosis and calcium as well as chemokine signaling pathways. Endocytosis is defined as the process of macromolecule uptake across the Plasma Membrane (PM) and is involved in both the terminations of receptor signal transduction  and initiation of certain signaling cascades . The interplay between signal transduction and endocytosis affects cancer progression, as the changes in survival, proliferation, and migration signals are critical for metastasis. Numerous publications document relationship between "defective" endocytosis and cancerassociated mutations, translocations, or altered expression levels of genes implicated in cancer-linked endocytosis mechanism . Moreover, changes in the nature of both the calcium and the chemokine signaling pathways may be a contributing factor to the onset of many human diseases. We speculate that these mechanisms may exist in AML bearing MLL translocations, thus affecting the treatment and clinical prognosis of AML patients.
In summary, our study showcases preliminary research into the mechanisms involved in AML bearing MLL translocations. A total of 885 DEGs and 32 hub genes were selected by bioinformatics analysis. Of these, the key genes, ACTR2, ARPC3, ARPC5, CTTN and APP may be implicated in tumor progression and could potentially represent promising prognostic biomarkers and therapeutic targets for tMLL-AML patients. However, further experimental evidence is required to build on our promising data.
This study was support by the National Natural Science Foundation of China (No. 81371879).
Yongjin Tang and Donghong Lin raise the conception and design the program. Yongjin Tang and Jinyuan Zheng download and proceed the data, while Xiaomeng Fu and Yang Chen visualize the data. Yongjin Tang write the original draft and all authors revises the article.