Article title

Examination of summarized medical records for ICD code classification via BERT

Publication languages
EN
Abstracts
EN
The International Classification of Diseases (ICD) is used by the member countries of the World Health Organization (WHO). It standardizes diagnosis codes worldwide, which enables data comparison and analysis across nations, and it supports payment systems, healthcare research, service planning, and quality and safety management. However, the intricate structure of the ICD system can lead to longer examination times, higher training expenses, a greater need for human resources, payment problems caused by inaccurate coding, and unreliable data in health research. In addition, machine learning models for automated ICD coding have difficulty with lengthy medical notes. To address this challenge, the present study uses medical notes from the Medical Information Mart for Intensive Care (MIMIC-III) that have been summarized with the term frequency-inverse document frequency (TF-IDF) method. The summarized notes are then analyzed with deep learning, specifically bidirectional encoder representations from transformers (BERT), to classify disease diagnoses by ICD code. Although the proposed methodology with summarized data achieves lower accuracy than state-of-the-art methods, the results are promising and motivate further work on extracting summarized inputs and more important features, because the approach provides real-time ICD code classification and more explainable inputs.
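The abstract does not say how TF-IDF is turned into a summary, so the following is only a minimal sketch of one common reading: score each sentence of a note by the TF-IDF weight of its tokens and keep the top-scoring sentences as a shorter input for a downstream classifier such as BERT. The function names `tokenize` and `summarize`, the `keep` parameter, the choice to treat each sentence as a "document" for the IDF statistics, and the smoothed IDF formula are all assumptions made for illustration, not details taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(note, keep=2):
    """Return the `keep` highest-scoring sentences of `note`, in original order.

    Assumption of this sketch: every sentence counts as one "document"
    when computing inverse document frequency.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        # Smoothed IDF stays positive even for terms found in every sentence.
        weight = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        )
        # Divide by sentence length so long sentences are not favored by default.
        scores.append(weight / len(tokens))

    # Pick the best-scoring sentence indices, then restore reading order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    example = (
        "Patient admitted with chest pain. Patient was stable overnight. "
        "Troponin elevated and ECG showed ST elevation. Patient discharged home."
    )
    for sentence in summarize(example, keep=2):
        print(sentence)
```

In a pipeline of the kind the abstract describes, the shortened text would then be tokenized and passed to a BERT-based classifier; that step is omitted here because the abstract gives no details about the model configuration.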
Pages
60–74
Physical description
Bibliography: 49 items, figures, tables.
Authors
  • Aalborg University, Department of Materials and Production, Operations Research Group, Denmark
  • Aalborg University, Department of Materials and Production, Operations Research Group, Denmark
  • Aalborg University, Department of Materials and Production, Operations Research Group, Denmark
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a74ffd45-8b64-458f-9111-ab4c0ba184a7