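Risk assessment model of diabetic nephropathy with "same disease and different syndromes" in traditional Chinese medicine based on multi-label machine learning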

Tong Xu (佟旭); Yang Chun (杨纯); Meng Qinggang (孟庆刚)

doi:10.16766/j.cnki.issn.1674-4152.002307

Tong Xu¹, Yang Chun², Meng Qinggang³

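1. Institute of Basic Theory of Traditional Chinese Medicine, Chinese Academy of Traditional Chinese Medicine, Beijing 100700, China
2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195, China
3. Research Center for Systems Complexity of Traditional Chinese Medicine, School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China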

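Funding:

General Program of the National Natural Science Foundation of China (81473800)

Independent Research Project of the Institute of Basic Theory of Traditional Chinese Medicine, Chinese Academy of Traditional Chinese Medicine (YZ-202118)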

Corresponding author: Meng Qinggang, E-mail: mqgangzy@126.com

CLC number: R587.2 R255.4
Publication history
- Received: 2021-08-24
- Published online: 2022-03-04

Risk assessment model of diabetic nephropathy with "same disease and different syndromes" in traditional Chinese medicine based on multi-label machine learning

1. Institute of Basic Theory of Traditional Chinese Medicine, Chinese Academy of Traditional Chinese Medicine, Beijing 100700, China

Abstract

Objective To construct risk assessment models of diabetic nephropathy with "same disease and different syndromes" that reflect the characteristics of traditional Chinese medicine (TCM) using multi-label machine learning strategies, to compare the performance of these models, and to provide a more efficient way to assist TCM in preventing and treating diabetic nephropathy. Methods Feature selection was carried out on 8 795 diagnosis and treatment records of diabetic nephropathy with a complex network community detection algorithm. Under the two algorithmic strategies of "problem transformation" and "algorithm adaptation", multi-label learning models were built with support vector machine (SVM), adaptive boosting (AdaBoost), multi-label radial basis function network (ML-RBF) and multi-label k-nearest neighbor (ML-KNN), and five evaluation metrics were used to compare model performance. Results A multi-label dataset of diabetic nephropathy with 8 795 samples, 113 indicators and 15 syndrome labels was constructed. In terms of model evaluation, ML-KNN performed best on Hamming loss, ranking loss and coverage; SVM attained the minimum one-error value three times, but the mean one-error of ML-KNN was still the best. The average precision of all four models was above 90%, with ML-KNN and ML-RBF performing relatively best. All four models showed good diagnostic performance in the multi-syndrome risk assessment of diabetic nephropathy with "same disease and different syndromes", and ML-KNN was relatively the best overall. Conclusion Multi-label machine learning algorithms can be applied to risk assessment in complex situations such as multiple TCM syndromes. They provide a reference for assisting TCM in the prevention and treatment of diabetic nephropathy, and a methodological reference for applying multi-label machine learning to clinical multi-disease diagnosis and treatment in general practice.
- Multi-label machine learning /
- Diabetic nephropathy /
- Traditional Chinese medicine /
- Risk assessment /
- General practice
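
The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the ranking-based definitions commonly used in the multi-label learning literature; this page does not state the exact formulas or normalization the authors used (coverage in particular is sometimes reported in normalized form). The function names, the toy labels and the toy scores below are hypothetical, and every sample is assumed to have at least one relevant and one irrelevant label.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) entries that are predicted incorrectly."""
    n, q = len(y_true), len(y_true[0])
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (n * q)

def _ranks(scores):
    """Rank of each label, 1 = highest score (ties broken by label index)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def one_error(y_true, scores):
    """Fraction of samples whose top-ranked label is not a true label."""
    misses = 0
    for yt, s in zip(y_true, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        misses += (yt[top] == 0)
    return misses / len(y_true)

def coverage(y_true, scores):
    """Average (worst rank among true labels) minus 1; unnormalized form."""
    total = 0
    for yt, s in zip(y_true, scores):
        r = _ranks(s)
        total += max(r[j] for j, t in enumerate(yt) if t) - 1
    return total / len(y_true)

def ranking_loss(y_true, scores):
    """Average fraction of (true, false) label pairs that are ordered wrongly."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        pos = [j for j, t in enumerate(yt) if t]
        neg = [j for j, t in enumerate(yt) if not t]
        bad = sum(s[p] <= s[n] for p in pos for n in neg)
        total += bad / (len(pos) * len(neg))
    return total / len(y_true)

def average_precision(y_true, scores):
    """Average, over true labels, of the share of true labels ranked at or above them."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        r = _ranks(s)
        pos = [j for j, t in enumerate(yt) if t]
        total += sum(sum(r[k] <= r[j] for k in pos) / r[j] for j in pos) / len(pos)
    return total / len(y_true)

# Toy data (hypothetical): 3 samples, 3 labels.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
scores = [[0.9, 0.2, 0.1], [0.6, 0.7, 0.3], [0.8, 0.4, 0.5]]
y_pred = [[int(v >= 0.5) for v in row] for row in scores]  # assumed 0.5 threshold

print(hamming_loss(y_true, y_pred))
print(one_error(y_true, scores), coverage(y_true, scores))
print(ranking_loss(y_true, scores), average_precision(y_true, scores))
```

Hamming loss is computed on thresholded predictions, while the other four metrics depend only on how the scores rank the labels, which is why a model can lead on one group and not the other.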

Figure 1 Schematic diagram of the "1-vs-r" support vector machine algorithm
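
The "1-vs-r" scheme in the caption corresponds to the problem transformation strategy: a task with r labels is split into r independent binary tasks, one per label. The figure itself is not reproduced on this page, so the following is only a minimal sketch of that general idea. `CentroidBinary` is a hypothetical toy learner standing in for the SVM, and the four binary features and two labels are invented for illustration.

```python
class CentroidBinary:
    """Toy binary learner (stand-in for an SVM, not the authors' model)."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        # Column-wise means of the positive and negative training samples.
        self.mu_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.mu_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def decision(self, x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.mu_neg))
        return d_neg - d_pos  # > 0 means closer to the positive centroid

def fit_one_vs_rest(X, Y):
    """Train one binary model per label column of Y (r labels -> r models)."""
    r = len(Y[0])
    return [CentroidBinary().fit(X, [row[j] for row in Y]) for j in range(r)]

def predict_one_vs_rest(models, x):
    """Ask every per-label model independently; a sample may get several labels."""
    return [int(m.decision(x) > 0) for m in models]

# Hypothetical data: 5 samples, 4 binary features, 2 labels.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]

models = fit_one_vs_rest(X, Y)
print(predict_one_vs_rest(models, [1, 0, 0, 0]))
print(predict_one_vs_rest(models, [1, 1, 1, 1]))
```

Because each label is learned separately, this decomposition ignores correlations between labels, which is one motivation for algorithm adaptation methods such as ML-KNN.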

Figure 2 Schematic diagram of AdaBoost applied to multi-label classification

Figure 3 Architecture of the ML-RBF neural network

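Figure 4 Polar line charts of the performance of the four models

Note: A, Hamming loss of the four models in ten-fold cross-validation; B, ranking loss of the four models in ten-fold cross-validation; C, one-error of the four models in ten-fold cross-validation; D, coverage of the four models in ten-fold cross-validation; E, average precision of the four models in ten-fold cross-validation.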
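
Ten-fold cross-validation, as referred to in the note, can be sketched as follows. This is a minimal illustration under assumptions: the shuffling seed, the way samples are dealt into folds and the use of the sample standard deviation are choices made here, not details reported on this page. All function names are hypothetical.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def mean_sd(values):
    """Mean and sample standard deviation of per-fold metric values."""
    return statistics.mean(values), statistics.stdev(values)

# 8 795 samples, as in the dataset described in the abstract.
folds = k_fold_indices(8795, k=10)
print([len(f) for f in folds])  # five folds of 880 and five of 879

for train, test in train_test_splits(folds):
    pass  # fit a model on `train`, evaluate the five metrics on `test`
```

Collecting one value per fold for each metric and passing the ten values to `mean_sd` yields a summary in the "mean ± standard deviation" form.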

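Table 1 Multi-label dataset of diabetic nephropathy

No.	Number of syndrome labels	Number of indicators
1	1 syndrome label	64
2	2 syndrome labels	31
3	3 syndrome labels	11
4	4 syndrome labels	6
5	5 syndrome labels	0
6	6 syndrome labels	1
Note: total number of syndrome labels = 15; total number of features = 113.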

Table 2 Performance comparison of the models (x̄ ± s)

Evaluation metric	SVM	AdaBoost	ML-RBF	ML-KNN
Hamming loss↓	0.0394±0.0045	0.0431±0.0013	0.0405±0.0014	0.0302±0.0023^a
Ranking loss↓	0.0693±0.0189	0.0730±0.0089	0.0718±0.0104	0.0654±0.0160^a
One-error↓	0.0703±0.0184	0.0890±0.0088	0.0738±0.0104	0.0635±0.0170^a
Coverage↓	0.5691±0.0124	0.6439±0.0116	0.5951±0.0107	0.5230±0.0106^a
Average precision↑	0.9232±0.0393	0.9145±0.0385	0.9249±0.0412	0.9335±0.0335^a
Note: "↑" indicates that a larger value means better model performance; "↓" indicates that a smaller value means better model performance. The relatively best result for each metric is marked with superscript a.
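
The arrow convention can be checked mechanically against the reported means. The sketch below copies the mean values from Table 2 and picks, for each metric, the model with the smallest value (↓) or the largest value (↑). The dictionary layout and the function name are hypothetical, and the comparison uses means only, without any significance testing.

```python
# Mean values from Table 2; "min" corresponds to the down arrow, "max" to the up arrow.
TABLE2_MEANS = {
    "Hamming loss": ("min", {"SVM": 0.0394, "AdaBoost": 0.0431, "ML-RBF": 0.0405, "ML-KNN": 0.0302}),
    "Ranking loss": ("min", {"SVM": 0.0693, "AdaBoost": 0.0730, "ML-RBF": 0.0718, "ML-KNN": 0.0654}),
    "One-error": ("min", {"SVM": 0.0703, "AdaBoost": 0.0890, "ML-RBF": 0.0738, "ML-KNN": 0.0635}),
    "Coverage": ("min", {"SVM": 0.5691, "AdaBoost": 0.6439, "ML-RBF": 0.5951, "ML-KNN": 0.5230}),
    "Average precision": ("max", {"SVM": 0.9232, "AdaBoost": 0.9145, "ML-RBF": 0.9249, "ML-KNN": 0.9335}),
}

def best_model(direction, means):
    """Return the model name with the lowest ("min") or highest ("max") mean."""
    pick = min if direction == "min" else max
    return pick(means, key=means.get)

for metric, (direction, means) in TABLE2_MEANS.items():
    print(f"{metric}: {best_model(direction, means)}")
```

On these means ML-KNN is selected for all five metrics, which matches the superscript a markers in the table.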
