Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data

João Carrilho
Marta B. Lopes

Abstract

Effective diagnosis and treatment of cancer remain a major challenge for the development of personalized medicine, mostly due to tumor heterogeneity. In the particular case of gliomas, brain tumors that are highly heterogeneous at the histological, cellular and molecular levels and exhibit poor prognosis, the mechanisms behind tumor heterogeneity and progression remain poorly understood. Recent advances in biomedical high-throughput technologies have allowed the generation of large amounts of molecular information from patients which, combined with statistical and machine learning techniques, can be used to define glioma subtypes and targeted therapies, an invaluable contribution to disease understanding and effective management.
In this work, sparse and robust sparse logistic regression models with the elastic net penalty were applied to glioma RNA-seq data from The Cancer Genome Atlas (TCGA) to identify transcriptomic features relevant to the separation between lower-grade glioma (LGG) subtypes and to flag putative outlying observations. In general, all classification models yielded good accuracies while selecting different sets of genes. Among the genes selected by the models, TXNDC12, TOMM20, PKIA, CARD8 and TAF12 have been reported as having a relevant role in glioma development and progression. This highlights the suitability of the present approach for disclosing relevant genes and encourages the biological validation of the selected genes not yet reported.
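As a rough illustration of how an elastic net penalty leads to feature selection in logistic regression, the following minimal sketch fits such a model by proximal gradient descent on a toy data set. It is not the authors' implementation and uses no TCGA data: the function names, the hyperparameter values (`lam`, `alpha`, `step`, `n_iter`) and the three-column matrix, whose columns stand in for hypothetical gene expression features, are all assumptions made for this example. The handling of outlying observations that distinguishes the robust models is not attempted here.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal step for the L1 part of the penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_enet_logistic(X, y, lam=0.1, alpha=0.9, step=0.1, n_iter=500):
    """Proximal-gradient fit of logistic regression with the penalty
    lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        # Gradient of the mean log-loss plus the ridge part of the penalty.
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                + lam * (1.0 - alpha) * beta[j] for j in range(p)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        # Gradient step followed by soft-thresholding (the L1 part).
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam * alpha)
                for j in range(p)]
    return b0, beta


def predict(X, b0, beta):
    """Assign class 1 when the fitted probability is at least 0.5."""
    return [1 if sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) >= 0.5 else 0
            for row in X]


# Toy data (hypothetical): column 0 tracks the class label,
# columns 1 and 2 are balanced within each class and carry no signal.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

intercept, coefs = fit_enet_logistic(X, y)
# Features with a nonzero coefficient are the ones the model "selects".
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("selected feature indices:", selected)
```

In this sketch, `alpha` balances the lasso and ridge parts of the penalty and `lam` sets the overall strength. Raising `lam` drives more coefficients to exactly zero, which is the mechanism by which different model settings can end up selecting different sets of genes.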

Article Details

How to Cite
Carrilho, J., & Lopes, M. B. (2022). Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data. Brazilian Journal of Biometrics, 40(4), 371–381. https://doi.org/10.28951/bjb.v40i4.634
Section
Articles
