Measuring Comparative Statistical Effectiveness of Cancer Subtype Categorization Using Gene Expression Data

  • Avila Clemenshia P. Department of Computer Science, Sri Ramakrishna College of Arts & Science, India
  • Deepa C. Department of Computer Science, Sri Ramakrishna College of Arts & Science, India


This work analyzes various gene expression-based approaches to cancer subtype classification. Correctly classifying cancer subtypes from gene expression data is critical for understanding cancer pathophysiology and for treating cancer patients effectively. When dealing with limited samples and high-dimensional biological data, most classifiers may suffer from overfitting and lower precision. The goal of this research is to develop a machine learning (ML) system capable of classifying human cancer subtypes based on gene expression data in cancer cells. The overfitting and precision issues can be addressed using ML algorithms such as Transductive Support Vector Machines (TSVM), Boosting Cascade Deep Forest (BCD Forest), Enhanced Neural Network Classifier (ENNC), Deep Flexible Neural Forest (DFN Forest), Convolutional Neural Network (CNN), and Cascade Flexible Neural Forest (CFN Forest). In weighing the benefits and drawbacks of these strategies, such as DFN Forest and CFN Forest, the findings reach 95%.


cancer subtypes, gene expression data, machine learning, Deep Flexible Neural Forest (DFN Forest) strategy


Jun 6, 2024
How to Cite
P., Avila Clemenshia; C., Deepa. Measuring Comparative Statistical Effectiveness of Cancer Subtype Categorization Using Gene Expression Data. Computer Assisted Methods in Engineering and Science, [S.l.], v. 31, n. 2, p. 261–272, June 2024. ISSN 2956-5839. Date accessed: 18 July 2024.
[CLOSED]Scientific Computing and Learning Analytics for Smart Healthcare Systems