CT-derived model for the diagnosis of pulmonary invasive mucinous adenocarcinoma by machine learning
Abstract:
Objective Pulmonary mucinous adenocarcinoma is a rare subtype of lung cancer with distinct molecular characteristics that influence the choice of treatment. This study builds a machine learning model based on clinical and CT features for invasive mucinous adenocarcinoma, with the aim of improving the accuracy of pre-treatment diagnosis.
Methods We retrospectively analysed 620 patients with pulmonary invasive adenocarcinoma confirmed by needle biopsy or surgical pathology at the Fourth Hospital of Hebei Medical University between January 2017 and May 2022. After 1:1 propensity score matching (PSM), the patients were randomly divided into a training set and a test set at a 7:3 ratio. Three machine learning models, namely support vector machine (SVM), random forest (RF) and logistic regression (LR), were built from the variables that differed significantly between groups, and the optimal model was selected by AUC. The AUC of the optimal model was assessed by 5-fold cross-validation, a decision curve analysis (DCA) curve was drawn to evaluate its diagnostic efficiency, and a nomogram was constructed.
Results Lesion location in the lower lobe, cystic lumen, bronchial truncation sign and ΔCTV value were independent predictors of invasive mucinous adenocarcinoma. Prediction models were built from these 4 features and compared; the logistic regression model (AUC=0.801) performed best. Of the 285 cases, 30% (n=85) were randomly selected as the test set, and the remaining samples were used as the training set for 5-fold cross-validation. The logistic regression model achieved an AUC of 0.777 in the validation set, an AUC of 0.785 and an accuracy of 0.682 in the test set, and an AUC of 0.803 and an accuracy of 0.749 in the training set. A nomogram of the logistic regression model was then constructed. The Brier score from the calibration curve of the model was 0.149, and the DCA curve also indicated that the model had good predictive ability and stability.
Conclusion Using machine learning models based on clinical and CT features, a clinical prediction model for primary pulmonary invasive mucinous adenocarcinoma was constructed, which has a potential role in guiding clinical diagnosis.
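The abstract reports a Brier score and a decision curve without defining them. The following is a minimal sketch of both quantities under their standard definitions, not the authors' code: the Brier score is the mean squared difference between predicted probability and observed outcome, and the net benefit plotted in a decision curve is TP/n - FP/n × pt/(1 - pt) at threshold probability pt. The function names and the toy labels and probabilities are hypothetical and serve only as illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a case positive when its probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy of a decision curve: every case is called positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical data for illustration only (1 = mucinous, 0 = non-mucinous).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8, 0.2]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
for pt in (0.2, 0.5):
    print(
        "threshold", pt,
        "model", round(net_benefit(y_true, y_prob, pt), 3),
        "treat-all", round(net_benefit_treat_all(y_true, pt), 3),
    )
```

A lower Brier score indicates better-calibrated probabilities, and a model is considered useful at a given threshold when its net benefit exceeds both the treat-all line and zero (the treat-none line).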
Key words:
- Primary lung cancer
- Mucinous adenocarcinoma
- CT features
- Diagnosis
- Model
- Machine learning
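Table 1. Comparison of clinical data and CT imaging characteristics between the two groups of patients with invasive pulmonary adenocarcinoma

| Item | Category | Non-mucinous adenocarcinoma (n=140) | Mucinous adenocarcinoma (n=145) | Statistic | P value |
| --- | --- | --- | --- | --- | --- |
| Sex [n (%)] | Male | 75 (53.571) | 59 (40.690) | 4.745ᵃ | 0.029 |
| | Female | 65 (46.429) | 86 (59.310) | | |
| Spiculation [n (%)] | Absent | 78 (55.714) | 83 (57.241) | 0.068ᵃ | 0.795 |
| | Present | 62 (44.286) | 62 (42.759) | | |
| Cystic lumen [n (%)] | Absent | 105 (75.000) | 71 (48.966) | 20.441ᵃ | <0.001 |
| | Present | 35 (25.000) | 74 (51.034) | | |
| Vessel penetration [n (%)] | Absent | 107 (76.429) | 93 (64.138) | 5.141ᵃ | 0.023 |
| | Present | 33 (23.571) | 52 (35.862) | | |
| Vascular convergence sign [n (%)] | Absent | 97 (69.286) | 105 (72.414) | 0.338ᵃ | 0.561 |
| | Present | 43 (30.714) | 40 (27.586) | | |
| Air bronchogram sign [n (%)] | Absent | 126 (90.000) | 119 (82.069) | 3.713ᵃ | 0.054 |
| | Present | 14 (10.000) | 26 (17.931) | | |
| Bronchial truncation [n (%)] | Absent | 85 (60.714) | 113 (77.931) | 9.955ᵃ | 0.002 |
| | Present | 55 (39.286) | 32 (22.069) | | |
| Adjacent to pleura [n (%)] | No | 46 (32.857) | 58 (40.000) | 1.568ᵃ | 0.210 |
| | Yes | 94 (67.143) | 87 (60.000) | | |
| Lower lobe [n (%)] | No | 81 (57.857) | 48 (33.103) | 17.616ᵃ | <0.001 |
| | Yes | 59 (42.143) | 97 (66.897) | | |
| Unenhanced CT value (mean ± SD, HU) | | 58.364±187.982 | 5.197±71.267 | -3.704ᵇ | <0.001 |
| Arterial-phase CT value (mean ± SD, HU) | | 29.950±180.688 | 23.134±82.044 | -3.158ᵇ | 0.002 |
| Venous-phase CT value [M (P25, P75), HU] | | 60.000 (14.000, 74.000) | 48.000 (29.000, 67.000) | 1.334ᶜ | 0.182 |
| ΔCTA value [M (P25, P75), HU] | | 20.000 (9.000, 30.000) | 18.000 (10.000, 37.000) | 0.196ᶜ | 0.845 |
| ΔCTV value [M (P25, P75), HU] | | 32.000 (22.000, 51.000) | 25.000 (14.000, 46.000) | 2.361ᶜ | 0.018 |
| Maximum diameter (mean ± SD, cm) | | 2.916±1.519 | 2.872±1.682 | 0.230ᵇ | 0.818 |
| Age [M (P25, P75), years] | | 62.000 (56.000, 66.000) | 61.000 (56.000, 67.000) | 0.104ᶜ | 0.918 |

Note: ᵃ χ² value; ᵇ t value; ᶜ Z value.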
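Table 2. Multivariate analysis of predictors of mucinous adenocarcinoma

| Predictor | B | SE | Z value | P value | OR | 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| Lower lobe | 0.878 | 0.261 | 3.361 | 0.001 | 2.406 | 1.447-4.035 |
| Cystic lumen | 1.022 | 0.270 | 3.788 | <0.001 | 2.779 | 1.646-4.751 |
| Bronchial truncation | -0.836 | 0.288 | -2.904 | 0.004 | 0.433 | 0.244-0.757 |
| ΔCTV value | -0.785 | 0.271 | -2.897 | 0.004 | 0.456 | 0.266-0.771 |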
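The odds ratios in Table 2 follow from the regression coefficients as OR = exp(B). The sketch below recomputes them from the published B and SE values and adds a Wald 95% interval, exp(B ± 1.96·SE). The Wald interval is an assumption made here for illustration: the source does not state how its intervals were obtained, and the published bounds differ slightly from the Wald values in the second or third decimal place. The function name is hypothetical.

```python
import math


def odds_ratio_with_wald_ci(b, se, z_crit=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error.

    The Wald interval is an assumed method; the source does not name one.
    """
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


# (B, SE) pairs copied from Table 2.
table2 = {
    "Lower lobe": (0.878, 0.261),
    "Cystic lumen": (1.022, 0.270),
    "Bronchial truncation": (-0.836, 0.288),
    "Delta CTV value": (-0.785, 0.271),
}

for name, (b, se) in table2.items():
    odds_ratio, lower, upper = odds_ratio_with_wald_ci(b, se)
    print(
        f"{name}: OR={odds_ratio:.3f}, "
        f"Wald 95% CI {lower:.3f}-{upper:.3f}, Z={b / se:.3f}"
    )
```

An OR above 1 (lower lobe, cystic lumen) means the feature raises the odds of mucinous adenocarcinoma, while an OR below 1 (bronchial truncation, ΔCTV value) means it lowers them.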
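Table 3. Comparison of different machine learning models

| Model | AUC | Cutoff | Accuracy | Sensitivity | Specificity | Positive predictive value | Negative predictive value | F1 score | Kappa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LR | 0.801 | 0.530 | 0.721 | 0.744 | 0.787 | 0.667 | 0.780 | 0.703 | 0.444 |
| RF | 0.672 | 0.600 | 0.605 | 0.853 | 0.481 | 0.500 | 0.725 | 0.630 | 0.221 |
| SVM | 0.385 | 0.507 | 0.465 | 0.000 | 1.000 | 0.450 | 0.470 | 0.000 | 0.057 |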
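The threshold-dependent columns of Table 3 (accuracy, sensitivity, specificity, predictive values, F1 score and kappa) are all functions of the confusion matrix obtained at the chosen cutoff. The sketch below shows the standard definitions. The source does not report confusion-matrix counts, so the counts used here are hypothetical, and returning 0.0 for an undefined ratio is a convention chosen for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn

    def ratio(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "kappa": ratio(accuracy - expected, 1 - expected),
    }


# Hypothetical counts for an 85-case test set (not taken from the source).
example = classification_metrics(tp=30, fp=10, fn=10, tn=35)
for key, value in example.items():
    print(f"{key}: {value:.3f}")

# A classifier that labels every case negative has sensitivity 0 and specificity 1.
all_negative = classification_metrics(tp=0, fp=0, fn=40, tn=45)
print(all_negative["sensitivity"], all_negative["specificity"], all_negative["f1"])
```

The second call illustrates why a sensitivity of 0.000 paired with a specificity of 1.000 also forces an F1 score of 0: no positive case is identified at that cutoff.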