
Natural language processing: A review


Anushka Gangal, Ayush Shrivastava, Nadia Mahmood Hussien, Prabhishek Singh, Manoj Diwakar, Kapil Joshi, Sapna Bisht, Naveen Chandra Joshi; Natural language processing: A review. AIP Conf. Proc. 1 September 2023; 2771 (1): 020010. https://doi.org/10.1063/5.0153960


Natural language processing (NLP) has received a great deal of attention for enabling computers to represent and analyze human language. Machine translation, email spam detection, information extraction, summarization, medical text processing, and question answering are only a few of its applications. The article is divided into four sections: the first discusses the different levels of NLP and the components of Natural Language Generation (NLG), followed by the history and development of NLP, the current state of the art, trends and challenges, and future scope. We also examine the tools and methods used in NLP, how these techniques behave when applied, and how the individual approaches compare in performance. NLP has not yet attained perfection, but continued progress in the field brings it steadily closer. Today, many AI systems recognize and respond to users' voice commands using natural language processing algorithms.


  • Methodology
  • Open access
  • Published: 15 April 2024

Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research

  • Safoora Masoumi (ORCID: orcid.org/0000-0002-7343-1771) 1,
  • Hossein Amirkhani 2,
  • Najmeh Sadeghian 3 &
  • Saeid Shahraz 4

Systematic Reviews volume 13, Article number: 107 (2024)


Abstract review is a time- and labor-consuming step in systematic and scoping literature reviews in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model.

Scanning the PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 records, with a final selection of 12,817 English abstracts published between 2000 and 2021. We manually developed a classification of the abstracts along three variables: the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers to classify the abstracts. To evaluate the performance of the trained models, we report a micro f1-score and accuracy.

The trained models’ micro f1-scores for classifying abstracts into the three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF.

The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (a yearly increase of 72.01 articles; 95% CI: 56.80–78.30), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus, with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources, with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%).

Conclusions

BioBERT showed an acceptable performance in the abstract review. If future research shows the high performance of this language model, it can reliably replace manual abstract reviews.


The history of natural language processing (NLP) is relatively short, but it has seen rapid growth through multiple fundamental revolutions. Alan Turing invented a test in the 1950s to determine whether computers could think like humans [ 1 ]. NLP scientists then applied universal linguistic rules to textual data to understand it. During this time, Noam Chomsky’s universal theory of language dominated NLP scientists’ attention. Computer scientists replaced this linguistic approach with computational models based on statistical analysis [ 1 ]. Increasing computational power for analyzing a large amount of textual information has contributed to our current understanding of NLP and its applications due to the invention of machine learning methods, especially deep learning [ 1 , 2 , 3 ]. Our intelligent machines now need natural language processing (NLP) to decipher meanings from human languages. With the widespread availability of smart gadgets in everyone’s life, NLP has become even more advanced over the past two decades [ 4 , 5 ]. Machines cannot recognize phrases and expressions without NLP in spoken and written languages. Moreover, the enormous amount of unstructured data produced daily highlights the need for NLP to assist professionals in sorting out their information [ 2 , 4 ]. Evidence-based medicine relies on systematic literature reviews to answer specific questions from a large amount of textual data, which can be challenging and time-consuming [ 6 ].

Machine learning and natural language processing can speed up and improve the systematic literature review (SLR). In this context, text classification and data extraction are two NLP-based strategies. Abstract screening is an essential application of text classification in literature reviews. Data extraction, in turn, identifies information about a particular variable of interest; NLP can, for example, help extract the number of individuals who participated in particular clinical trials [6, 7]. BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based machine learning model for language modeling that has demonstrated significant success in various NLP tasks [8]. BioBERT, a BERT-based model pre-trained on biomedical texts, has outperformed other pre-trained language models on some biomedical datasets [9] and has been highly performant in previous studies [10, 11, 12, 13].

In this study, we analyze the evolution of medical NLP over the last two decades and benchmark some of our findings against two similar, recently published studies [14, 15]. As an example of how NLP aids abstract review, we conducted an SLR using an automated method. Based on the results of the SLR, we provide a list of data sources used in the medical NLP literature, along with the types of NLP applications and the related disease areas. We also show how the BioBERT model categorizes abstracts.

Developing training data

PubMed, Embase, PsycINFO, and CINAHL were searched using controlled vocabulary thesauri (MeSH in PubMed and Emtree in Embase) and free-text keywords. The search queries included “natural language processing” and “text mining.” Additional file 1 provides the full search queries. Editorials, case reports, commentaries, errata, replies, and studies without abstracts were excluded. Because there were few NLP studies before 2000, we included all abstracts published between January 2000 and December 2020. The study involved multiple steps. First, we classified NLP abstracts based on their text source (e.g., social media versus clinical notes). After optimizing the set of retrievable, meaningful classes of abstracts, a finalized training dataset was created. Next, we calculated the classification accuracy of the computer algorithm on the entire corpus. As a final step, we applied the algorithm to obtain the classes and visualized them. The last author (S. S.) randomly selected 100 abstracts from PubMed and classified their text sources, contexts of use (e.g., abstracts pertaining to clinical decision support vs. those related to NLP method development), and the types of medical conditions studied. Using these primary classes, the lead author (S. M.) and third author (N. S.) explored further classes and categories in additional PubMed abstracts. They continued adding abstracts and identifying new classes and subgroups until no more could be found; this saturation process was completed after reviewing 485 abstracts. All authors discussed and optimized the classification iteratively until they reached agreement on the final version. Table 1 describes the finalized classes and their definitions.

As depicted in Fig. 1, machine learning algorithms were used to classify the abstracts in the final corpus into the classes obtained from the training dataset. We followed the now-standard approach of fine-tuning pre-trained language models. BERT, or Bidirectional Encoder Representations from Transformers, is a language model developed by Google [8]. Such models are pre-trained on large corpora and then fine-tuned on task-specific training data, reusing the parameters of the pre-trained model. We used the BioBERT model [9] from the Hugging Face Transformers library [16], which was trained on abstracts from PubMed and full articles from PubMed Central. We then fine-tuned three different models, one for each of our targets: text source, context of use, and primary research field. Hyper-parameters, such as the learning rate and the number of epochs, were selected using cross-validation, and the final model was trained on the entire training data using the optimized hyper-parameters. Since we utilized a pre-trained BioBERT model, a standard GPU, such as the Nvidia Tesla K80, was sufficient for fine-tuning and inference. All experiments were conducted in a Google Colab environment using this GPU.

figure 1

Overview of the proposed machine learning approach

To improve the models’ accuracy, we additionally fine-tuned three different classifiers for each target variable. Repeating the fine-tuning process yields a different classifier each time because different data batches are used during training. The final prediction for an input article was obtained by majority voting over the base classifiers’ predictions. Afterwards, the trained models were applied to the entire corpus. A set of 362 randomly selected abstracts was manually annotated by the lead (S. M.), third (N. S.), and last (S. S.) authors to evaluate the labels provided by the trained models, and the human annotations were compared with those of the models. The evaluation showed that the trained models’ accuracy in classifying abstracts into text source, context of use, and primary research field was sufficient, particularly for tracking the time trends of the classes; we therefore assumed that misclassifications would remain constant over time. We next fit ordinary least-squares regression models to estimate publication growth rates for different study subgroups, with the annual number of citations (publications) as the dependent variable and publication year as the predictor, so that the coefficient of the predictor gives the average yearly increase in citations. A squared term for the publication year was added to the primary model to determine whether the growth was linear or nonlinear; the resulting increase in R² indicated logarithmic growth. The average annual growth rate (AAGR) was calculated by averaging all annual growth rates (AGRs) over the study period (sum of AGRs divided by the number of periods). We calculated each AGR as the difference between the current year’s value and the past year’s value, divided by the past year’s value.
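The majority-voting step described above can be sketched in a few lines; the function name and the example class labels are our own illustration, not taken from the study's code.

```python
from collections import Counter

def majority_vote(predictions):
    """Final label for one abstract: the class predicted by most base classifiers.

    predictions: class labels, one per fine-tuned classifier. With three
    classifiers, a tie can only occur when all three disagree; in that case
    Counter.most_common returns the label that was predicted first.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from the three "context of use" classifiers:
votes = ["clinical decision support", "method development", "clinical decision support"]
print(majority_vote(votes))  # clinical decision support
```

With three base classifiers, two agreeing votes always decide the label, which is why repeating the fine-tuning three times (rather than, say, twice) makes the ensemble well defined for most inputs.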

We report a micro f1-score to evaluate the trained models. The f1-score is calculated as the harmonic mean of precision and recall for the positive class in a binary classification problem.

True positive (TP) and true negative (TN) are the numbers of samples correctly assigned to the positive and negative classes, respectively. On the other hand, false positive (FP) and false negative (FN) are the numbers of samples that are wrongly assigned to the positive and negative classes, respectively. Accuracy is the ratio of the samples correctly assigned to their respective classes.

Precision (P) and recall (R) are calculated as follows, where TP, FP, and FN represent the number of true-positive, false-positive, and false-negative instances, respectively:

P = TP / (TP + FP),  R = TP / (TP + FN)

The f1-score is then the harmonic mean of the two:

f1 = 2 · P · R / (P + R)

The average of the f1-scores obtained for different classes is computed for multiclass problems, such as ours. We report the weighted average considering the number of instances in each class in order to account for label imbalance.
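As a concrete illustration of these definitions, per-class precision, recall, and f1, together with their support-weighted average, can be computed directly from predicted and true labels; the toy labels below are invented for illustration and are not drawn from the study's data.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and f1, treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def weighted_f1(y_true, y_pred):
    """Per-class f1-scores averaged with weights proportional to class support."""
    n = len(y_true)
    return sum(
        (y_true.count(c) / n) * precision_recall_f1(y_true, y_pred, c)[2]
        for c in set(y_true)
    )

# Toy multiclass example with two text-source labels:
y_true = ["EHR", "EHR", "omics", "EHR"]
y_pred = ["EHR", "omics", "omics", "EHR"]
print(weighted_f1(y_true, y_pred))  # (3/4)*0.8 + (1/4)*(2/3) ≈ 0.767
```

Weighting by support means the majority class ("EHR" here) dominates the average, which is exactly the label-imbalance correction the paragraph above describes.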

Based on the evaluation, the trained models classified abstracts into their text source, context of use, and primary research field (disease area) with accuracies of 78.5%, 77.3%, and 87.6%, respectively. The corresponding micro f1-scores were 77.35%, 76.24%, and 85.64%. We retrieved 22,294 English abstracts from the databases. There were 12,817 references left after removing 8815 duplicates, 500 articles without abstracts, 32 errata, 31 commentaries, 31 editorials, and 68 veterinary-related abstracts. The selected analyses were based on 12,161 abstracts, excluding those published in 2021. Figure 2 illustrates the abstract selection process for creating the final abstract collection. NLP publications have increased logarithmically since 2000, as shown in Fig. 3.

figure 2

PRISMA flowchart illustrating the steps of abstract selection for building the final corpus. *For most analyses, we excluded abstracts for the year 2021, leaving 12,161 abstracts in the analysis data

figure 3

Trend analysis of 12,817 abstracts showing the overall trend of the growth and the number of articles per year

Additional file 2 reports the total number of abstracts retrieved for each subgroup. Table 2 shows the AAGR and the average growth slope (coefficient) with a 95% confidence interval, and it displays the adjusted R² of the regression model with and without a squared term for the publication year. The AAGR was 20.99%, with an average increase of 72 (95% CI: 56.80–78.30) publications per year. According to the adjusted R² of 83%, the publication number is strongly affected by time; after adding a squared term for publication year, this indicator increased to 93%, indicating logarithmic growth. Among all types of NLP text sources, electronic medical or health records and similar electronic clinical notes accounted for the highest percentage (57.12%). Published articles and other sources of medical evidence together accounted for 33.84% of all NLP text sources. Social media, including websites, and databases with omics data (e.g., genomics) accounted for less than 10% of all NLP text sources (Table 2). Figure 4 displays the relative proportions and growth trends of four specific subgroups of text sources since the year 2000, along with the percentage representation of these subgroups within the totals for the “context of use” variable. Despite comprising only 4.91% of publications, the so-called omics text data exhibited the fastest growth (AAGR = 65.08%) among all text sources.
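The AAGR figures reported here follow the definition given in the Methods (the mean of the year-over-year growth rates). A minimal sketch, using invented yearly publication counts rather than the study's actual data:

```python
def aagr(counts):
    """Average annual growth rate: the mean of the year-over-year growth rates.

    counts: publication counts ordered by year. Each annual growth rate (AGR)
    is (current - previous) / previous; the AAGR is the sum of the AGRs
    divided by the number of periods.
    """
    agrs = [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]
    return sum(agrs) / len(agrs)

# Invented yearly publication counts for illustration:
counts = [100, 120, 150, 180]
print(f"AAGR = {aagr(counts):.2%}")  # AGRs are 20%, 25%, 20%, so AAGR = 21.67%
```

Note that the AAGR averages relative changes, so it can differ noticeably from the slope of the OLS regression line (which averages absolute changes); the paper reports both.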

figure 4

Proportion and growth of four selected subgroups of text source since the year 2000 and percentage of selected subgroups of the “context of use” of the total for the subgroups of text source

Changes in the dominant primary research fields since 2000, along with their expansion rates and the distribution percentages for specific subcategories within “context of use” and “text source,” are illustrated in Fig. 5. Four medical fields (neoplasms, mental conditions, infectious diseases, and circulatory diseases) accounted for slightly over 65% of all the research NLP researchers conducted and published, with neoplasms topping the list. The growth rates of these medical fields were comparable (Table 2 and Fig. 5). NLP methods for clinical decision support were the most notable identifiable application among the different aims (“contexts of use”) of NLP studies, accounting for 25.45% of all publications. In contrast, bioinformatics-related discoveries showed the highest growth (AAGR = 69.65%) among all medical NLP applications, in line with the highest growth of omics databases. Among the subgroups under “context of use,” the majority belonged to “other medical fields,” which included a wide range of medical applications. Changes in the “context of use” since the year 2000, including its proportion and growth, along with the percentage representation of specific subgroups within the “text source,” are depicted in Fig. 6 (Table 2 and Fig. 6).

figure 5

Proportion and growth of the most prevalent primary research fields since the year 2000 and the percentage of selected subgroups of the “context of use” and the “text source” of the total for the subgroups of the primary research field

figure 6

Proportion and growth of the “context of use” since the year 2000 and the percentage of selected subgroups of the “text source” of the total for the subgroups of the “context of use”

According to Fig. 5, clinical decision support applications and electronic medical/health records had the highest proportions of context of use and text source within each subgroup of primary research fields. The proportions of text source and context of use subtypes varied significantly across medical fields. For instance, papers on NLP method advancement accounted for the highest percentage (35%) of abstracts coded to the ICD-11 category of mental, behavioral, and neurodevelopmental disorders. Similarly, social media was used more frequently (17%) in certain infectious or parasitic diseases than in any other ICD-11 disease category.

Zhu et al. [13] used an output-modified BioBERT pre-trained on PubMed and PMC and obtained an f-score of 80.9, similar to ours, while Elangovan et al. [11] found a lower f-score in a similar study. Two other systematic reviews observed a similar upward trend in the use of NLP across medical fields over the last two decades [14, 15]. Medical NLP publications began to appear prominently in peer-reviewed journals in 2000. This study shows that BioBERT can reproduce a trend reported in previous studies.

We were particularly interested in the types of text sources used in medical NLP, the types of medical conditions studied, and the motivations behind performing NLP. Three published bibliographic studies shared some features with ours. Using PubMed data, Chen et al. examined 1405 papers over 10 years (2007–2016) and reported country-region, author-affiliation, and thematic NLP research areas [14]. Using PubMed data from 1999 to 2018, Wang et al. identified 3498 publications and additionally reported country-regions, author affiliations, disease areas, and text sources [15]. Similar to Wang [15] and Chen [14], Chen et al. [17] used NLP methods to explore a similar set of variables; however, the authors focused only on NLP-enhanced clinical trial research, searching PubMed, Web of Science, and Scopus for 451 published articles from 2001 to 2018. We selected 12,817 peer-reviewed citations using a different approach than typical bibliographic methods: we systematically scanned four major article databases and manually classified citations based on three variables (primary research field, text source used, and motivation for NLP, i.e., context of use). In addition, we used BioBERT, a biomedical adaptation of Google's BERT, as our preferred NLP method to assign subgroups to these variables.

Unlike typical bibliometric research, we were not interested in regional or institutional distributions. Instead, we explored the hows and whys of medical NLP research over the past two decades.

According to our results, annual medical NLP publications grew by roughly 21% on average between 2000 and 2020, similar to the nearly 18% growth Chen et al. reported between 2007 and 2016 [14]. According to Wang et al. [15] and Chen et al. [17], medical NLP publications increased rapidly between 1999 and 2017. The logarithmic progression of the citations in our study can partly be explained by the annual increase of over 65% in NLP studies using omics datasets. Nearly 27% of all NLP research was conducted on neoplasms, followed by mental conditions, infectious diseases, and circulatory diseases. Similarly, Wang et al. retrieved around 25% of their citations from neoplasms [15]. Previous authors have not explained why medical NLP citations are disproportionately concentrated in cancer and a limited number of other fields, such as mental health. The same question applies to why particular medical conditions are at the center of the medical research community's attention, even though massive EHR (electronic health record) or EMR (electronic medical record) data should be equally available for all medical conditions in proportion to their prevalence. In the case of cancers and infectious lung disease, however, unstructured text may convey more information because of pathology or radiology reports. By studying the systematic differences across medical conditions from an NLP standpoint, we can potentially apply medical NLP to broader clinical and research settings.

There are strengths and weaknesses to our approach. We began by categorizing medical conditions hierarchically using a systematic strategy, using ICD-11's top-level taxonomy to identify primary research fields. If future NLP studies follow the same procedure, the findings will remain comparable. We chose the BioBERT model from among various pre-trained language models, including ClinicalBERT and BlueBERT. BioBERT was pre-trained on 4.5 billion biomedical words from PubMed abstracts and 13.5 billion words from PMC full-text articles, and NLP researchers work with BioBERT more than with similar BERT models. Hence, we recommend comparing the performance of various BERT models before selecting one if an NLP specialist is not confident enough to choose the proper model. Finally, we report the method for developing the analysis database (NLP corpus) based on medical systematic review guidelines, so future research can use this approach to confirm whether NLP can replace systematic literature reviews.

A potential shortcoming of our study was the idiosyncratic nature of the initial classification used for training the machine. Using our experience with observational datasets, such as electronic clinical notes and NLP applications, to analyze unstructured clinical data, we began building the initial subgroups. Nevertheless, to mitigate the risk of bias, we dissected the published studies cumulatively until more studies could not update the evolving classification. Our models’ estimated classification accuracy may have been adequate because of this strategy. The model can be fine-tuned based on more annotated articles, the hyper-parameters can be tuned more thoroughly, and multi-task learning can be explored instead of training separate models for each task. Additionally, the accuracy may improve further after the training dataset is expanded. Finally, we only included abstracts written in English. Results and conclusions may be influenced if relevant research published in languages other than English is excluded.

This study aimed to evaluate the performance of BioBERT as a tool to substitute manual abstract review using a language model. BioBERT is an acceptable method for abstract selection for systematic literature searches since it uses a uniform and human-independent algorithm that reduces the time required for manual abstract selection and increases inter-study reliability.

Availability of data and materials

Data sharing is not applicable, but the authors are happy to share the abstracts publicly if that helps the reviewers or readers.

Abbreviations

NLP: Natural language processing

AAGR: Average annual growth rate

EHR: Electronic health records

EMR: Electronic medical records

ICD: International Classification of Diseases

BERT: Bidirectional Encoder Representations from Transformers

References

1. Johri P, Khatri S, Taani A, Sabharwal M, Suvanov S, Kumar A, editors. Natural language processing: history, evolution, application, and future work. 3rd International Conference on Computing Informatics and Networks; 2021. p. 365–75.

2. Zhou M, Duan N, Liu S, Shum H. Progress in neural NLP: modeling, learning, and reasoning. Engineering. 2020;6(3):275–90.

3. Jones KS. Natural language processing: a historical review. In: Zampolli A, Calzolari N, Palmer M, editors. Current Issues in Computational Linguistics: In Honour of Don Walker. Dordrecht: Springer; 1994. p. 3–16.

4. Locke S, Bashall A, Al-Adely S, Moore J, Wilson A, Kitchen G. Natural language processing in medicine: a review. Trends Anaesthesia Crit Care. 2021;38:4–9.

5. Manaris B. Natural language processing: a human-computer interaction perspective. Adv Comput. 1998;47:1–66.

6. Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8(1):163.

7. Kim SN, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support evidence based medicine. BMC Bioinformatics. 2011;12(2):S5.

8. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); 2019. p. 4171–86.

9. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40.

10. Giorgi JM, Bader GD. Transfer learning for biomedical named entity recognition with neural networks. Bioinformatics. 2018;34(23):4087–94.

11. Elangovan A, Li Y, Pires DEV, Davis MJ, Verspoor K. Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT. BMC Bioinformatics. 2022;23(1):4.

12. Ji Z, Wei Q, Xu H. BERT-based ranking for biomedical entity normalization. AMIA Jt Summits Transl Sci Proc. 2020;2020:269–77.

13. Zhu Y, Li L, Lu H, Zhou A, Qin X. Extracting drug-drug interactions from texts with BioBERT and multiple entity-aware attentions. J Biomed Inform. 2020;106:103451.

14. Chen X, Xie H, Wang FL, Liu Z, Xu J, Hao T. A bibliometric analysis of natural language processing in medical research. BMC Med Inform Decis Mak. 2018;18(1):1–14.

15. Wang J, Deng H, Liu B, Hu A, Liang J, Fan L, et al. Systematic evaluation of research progress on natural language processing in medicine over the past 20 years: bibliometric study on PubMed. J Med Internet Res. 2020;22(1):e16816.

16. Wolf T, Chaumond J, Debut L, Sanh V, Delangue C, Moi A, et al. Transformers: state-of-the-art natural language processing. In: Conference on Empirical Methods in Natural Language Processing: System Demonstrations; 2020. p. 38–45.

17. Chen X, Xie H, Cheng G, Poon LK, Leng M, Wang FL. Trends and features of the applications of natural language processing techniques for clinical trials text analysis. Appl Sci. 2020;10(6):2157.


Acknowledgements

Not applicable


Author information

Authors and affiliations.

Pediatric Infectious Diseases Research Center, Mazandaran University of Medical Sciences, Sari, Iran

Safoora Masoumi

Computer and Information Technology Department, University of Qom, Qom, Iran

Hossein Amirkhani

Student Research Committee, Mazandaran University of Medical Sciences, Sari, Iran

Najmeh Sadeghian

Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA

Saeid Shahraz


Contributions

SM drafted the manuscript, helped conduct the analysis, and contributed to the design of the study and data collection. HA conducted the analysis and contributed to the design of the study. NS contributed to data collection and critical review of the drafts. SSH designed the study, supervised the integrity and accuracy of the outputs, and critically reviewed the draft and revised it iteratively.

Corresponding author

Correspondence to Safoora Masoumi .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix 1. The three tables (a, b, and c) show the absolute number of abstracts retrieved between 2000 and 2020 (inclusive) for each of the three classes studied.

Additional file 2: Appendix 2. The search strategy used to retrieve abstracts from the four databases.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Masoumi, S., Amirkhani, H., Sadeghian, N. et al. Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research. Syst Rev 13 , 107 (2024). https://doi.org/10.1186/s13643-024-02470-y

Download citation

Received : 23 July 2022

Accepted : 28 January 2024

Published : 15 April 2024

DOI : https://doi.org/10.1186/s13643-024-02470-y


  • Natural language processing (NLP)
  • Trend analysis
  • Machine learning

Systematic Reviews

ISSN: 2046-4053


A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome


1. Introduction

  • What are the most common text mining and NLP approaches used in exposure assessment research?
  • What resources are used for this task?
  • What are the most common NLP methods used?
  • What are the main challenges and future directions of research?

2. Review Methodology

  • (“natural language processing” OR “text mining” OR “text-mining” OR “text and data mining” OR ontology OR lexic* OR corpus OR corpora) AND (exposome OR exposure OR socioexposome OR (“risk factor” AND (“work” OR “occupational” OR “environmental*”)))
Studies were included if they:
  • Are original work;
  • Study exposures concerning humans;
  • Study occupational and/or environmental exposures of humans, such as airborne agents (e.g., particulates or substances and biological agents (viruses)), stressors, psycho-social and physical (e.g., musculoskeletal) exposures as well as workplace accidents;
  • Have their full texts available;
  • Are written in English;
  • Focus on text mining or natural language processing, and their texts contain a methods, experiments and results section.
Studies were excluded if they:
  • Studied animal or plant exposures;
  • Studied drug, nutrition or dietary exposures on humans;
  • Are written in a language other than English;
  • Are commentaries, opinion papers or editorials.
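Boolean search strings like the one above are easier to keep consistent across databases when assembled programmatically. A minimal sketch follows; the helper function and the exact term lists are illustrative, drawn from the query shown, not part of the review's released tooling:

```python
# Sketch: assemble a boolean search string from term groups, mirroring
# the structure of the review's query: (method terms) AND (exposure terms).
def build_query(*groups):
    # Terms within a group are OR-joined (quoting multi-word phrases);
    # the groups themselves are AND-joined.
    return " AND ".join(
        "(" + " OR ".join(f'"{t}"' if " " in t else t for t in group) + ")"
        for group in groups
    )

method_terms = ["natural language processing", "text mining",
                "ontology", "lexic*", "corpus", "corpora"]
exposure_terms = ["exposome", "exposure", "socioexposome", "risk factor"]

query = build_query(method_terms, exposure_terms)
```

The resulting string can then be pasted into, or sent to, each database's search interface with only per-database syntax tweaks.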

3.1. Machine Learning Methods

3.2. Knowledge-Based Methods

3.3. Database Creation and Fusion

3.4. Literature Reviews and Qualitative Research

4. Discussion

  • Data volume and quality. Whilst some of the selected studies use unsupervised machine learning methods (e.g., clustering via LDA), the majority use supervised machine learning. One downside of the latter approach is that it requires human-annotated data, which usually demands expert knowledge and is therefore time-consuming and costly to produce. To overcome this issue, semi-supervised or unsupervised learning methods might be explored, because these require significantly less annotated training data or none at all. An example is the use of topic modelling techniques to cluster jobs and exposures from the existing literature. Another opportunity lies in using semi-supervised Named Entity Recognition to increase the coverage of annotated literature.
  • Novel deep learning techniques. The present studies predominantly utilise traditional machine learning techniques (e.g., SVMs); however, the field has evolved drastically over recent years, with more advanced deep learning methods producing scalable and accurate results. These include, but are not limited to, transfer learning [ 76 ] and adversarial learning [ 77 ], which span a variety of neural network structures and knowledge graphs that have been at the core of NLP research. They also include Transformer-based methods [ 78 ] (e.g., large pre-trained language models such as BERT [ 79 ]), which have made a significant impact on NLP over recent years and could prove useful in NLP for occupational exposure research. This type of deep learning method is based on attention [ 80 ], which has been shown to improve results in a variety of other domains that use NLP (e.g., healthcare). These advances could improve tasks such as Named Entity Recognition (NER) [ 81 ] and Relation Extraction (RE) [ 82 ] in occupational exposure research, which so far have relied on traditional machine learning only. Both tasks could prove useful for automatically identifying key concepts (e.g., types of exposures, jobs or work environments) as well as how these concepts relate to one another (e.g., a particular role belongs to a specific workplace). Further advances could come from unsupervised methods, which thus far have also relied on traditional machine learning only. More recent methods such as Neural Topic Models (NTM) have become increasingly popular for tasks including document summarisation and text generation [ 83 ] due to their flexibility and capability. These methods could also be applied to occupational exposure research to uncover new topics and concepts at a larger scale, or to draw new connections between exposures and work environments. Similarly, NTM methods could be coupled with pre-trained language models to further boost performance and yield more accurate representations of new topics [ 83 ].
  • Extrapolating existing research to other domains of exposure research. Most of the research explored in this review is specific to a particular type of exposure, to certain databases, or to the enhancement of literature reviews. The domain-specificity and the different needs and requirements of each exposure type therefore make it hard to extrapolate existing work to other fields, or to link and scale up existing approaches.
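Several of the traditional pipelines discussed above start from TF-IDF features (expanded in the abbreviations list). A self-contained sketch of the common tf × log(N/df) weighting, using an invented two-document corpus:

```python
import math
from collections import Counter

def tfidf(docs):
    # docs: list of token lists. Returns one {term: weight} dict per
    # document, using the plain tf * log(N / df) weighting (no smoothing,
    # no normalisation) for illustration.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out

vecs = tfidf([["silica", "exposure", "exposure"],
              ["exposure", "accident"]])
# "exposure" occurs in every document, so its weight is zero everywhere;
# document-specific terms like "silica" receive positive weight.
```

Library implementations (e.g., scikit-learn's TfidfVectorizer) add smoothing and normalisation on top of this basic scheme.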

5. Conclusions

Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

AI: Artificial Intelligence
AOP: Adverse Outcome Pathways
BERT: Bidirectional Encoder Representations from Transformers
CTD: Comparative Toxicogenomics Database
DRS: Document relevancy score
LDA: Latent Dirichlet Allocation
LSA: Latent semantic analysis
LSTM: Long Short-Term Memory
NER: Named Entity Recognition
NLP: Natural Language Processing
NLTK: Natural Language Toolkit
NTM: Neural Topic Models
PCA: Principal component analysis
RE: Relation Extraction
SVM: Support Vector Machine
TF-IDF: Term frequency–inverse document frequency
UMLS: Unified Medical Language System
  • Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning ; MIT Press: Cambridge, MA, USA, 2016. [ Google Scholar ]
  • Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach ; Prentice Hall: Hoboken, NJ, USA, 2002. [ Google Scholar ]
  • Wild, C.P. The exposome: From concept to utility. Int. J. Epidemiol. 2012 , 41 , 24–32. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Haddad, N.; Andrianou, X.D.; Makris, K.C. A scoping review on the characteristics of human exposome studies. Curr. Pollut. Rep. 2019 , 5 , 378–393. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Kreimeyer, K.; Foster, M.; Pandey, A.; Arya, N.; Halford, G.; Jones, S.F.; Forshee, R.; Walderhaug, M.; Botsis, T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J. Biomed. Inform. 2017 , 73 , 14–29. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Chowdhury, G.G. Natural language processing. Annu. Rev. Inf. Sci. Technol. 2003 , 37 , 51–89. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Qiu, X.; Sun, T.; Xu, Y.; Shao, Y.; Dai, N.; Huang, X. Pre-trained models for natural language processing: A survey. Sci. China Technol. Sci. 2020 , 63 , 1872–1897. [ Google Scholar ] [ CrossRef ]
  • Przybyła, P.; Brockmeier, A.J.; Kontonatsios, G.; Le Pogam, M.A.; McNaught, J.; von Elm, E.; Nolan, K.; Ananiadou, S. Prioritising references for systematic reviews with RobotAnalyst: A user study. Res. Synth. Methods 2018 , 9 , 470–488. [ Google Scholar ] [ CrossRef ]
  • Balasubramanian, V.; Vivekanandhan, S.; Mahadevan, V. Pandemic tele-smart: A contactless tele-health system for efficient monitoring of remotely located COVID-19 quarantine wards in India using near-field communication and natural language processing system. Med. Biol. Eng. Comput. 2021 , 60 , 61–79. [ Google Scholar ] [ CrossRef ]
  • Dong, T.; Yang, Q.; Ebadi, N.; Luo, X.R.; Rad, P. Identifying Incident Causal Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach. J. Adv. Transp. 2021 , 2021 , 5540046. [ Google Scholar ] [ CrossRef ]
  • Medina Sada, D.; Mengel, S.; Gittner, L.S.; Khan, H.; Rodriguez, M.A.P.; Vadapalli, R. A Preliminary Investigation with Twitter to Augment CVD Exposome Research. In Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Austin, TX, USA, 5–8 December 2017; pp. 169–178. [ Google Scholar ]
  • Lee, S.W.; Kwon, J.H.; Lee, B.; Kim, E.J. Scientific Literature Information Extraction Using Text Mining Techniques for Human Health Risk Assessment of Electromagnetic Fields. Sens. Mater. 2020 , 32 , 149–157. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Lamurias, A.; Jesus, S.; Neveu, V.; Salek, R.M.; Couto, F.M. Information Retrieval using Machine Learning for Biomarker Curation in the Exposome-Explorer. bioRxiv 2020 , 6 , 689264. [ Google Scholar ] [ CrossRef ]
  • Larsson, K.; Baker, S.; Silins, I.; Guo, Y.; Stenius, U.; Korhonen, A.; Berglund, M. Text mining for improved exposure assessment. PLoS ONE 2017 , 12 , e0173132. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Tewari, S.; Toledo Margalef, P.; Kareem, A.; Abdul-Hussein, A.; White, M.; Wazana, A.; Davidge, S.T.; Delrieux, C.; Connor, K.L. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. J. Pers. Med. 2021 , 11 , 1064. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Varghese, A.; Cawley, M.; Hong, T. Supervised clustering for automated document classification and prioritization: A case study using toxicological abstracts. Environ. Syst. Decis. 2018 , 38 , 398–414. [ Google Scholar ] [ CrossRef ]
  • Li, J.; Wang, J.; Xu, N.; Hu, Y.; Cui, C. Importance degree research of safety risk management processes of urban rail transit based on text mining method. Information 2018 , 9 , 26. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Leroy, G.; Harber, P.; Revere, D. Public sharing of medical advice using social media: An analysis of Twitter. Grey J. (TGJ) 2016 , 12 , 104–113. [ Google Scholar ]
  • Karystianis, G.; Buchan, I.; Nenadic, G. Mining characteristics of epidemiological studies from Medline: A case study in obesity. J. Biomed. Semant. 2014 , 5 , 22. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Karystianis, G.; Thayer, K.; Wolfe, M.; Tsafnat, G. Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews. J. Biomed. Inform. 2017 , 70 , 27–34. [ Google Scholar ] [ CrossRef ]
  • Fan, J.w.; Li, J.; Lussier, Y.A. Semantic modeling for exposomics with exploratory evaluation in clinical context. J. Healthc. Eng. 2017 , 2017 , 3818302. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ali, I.; Guo, Y.; Silins, I.; Högberg, J.; Stenius, U.; Korhonen, A. Grouping chemicals for health risk assessment: A text mining-based case study of polychlorinated biphenyls (PCBs). Toxicol. Lett. 2016 , 241 , 32–37. [ Google Scholar ] [ CrossRef ]
  • Davis, A.P.; Wiegers, T.C.; Johnson, R.J.; Lay, J.M.; Lennon-Hopkins, K.; Saraceni-Richards, C.; Sciaky, D.; Murphy, C.G.; Mattingly, C.J. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database. PLoS ONE 2013 , 8 , e58201. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Vishnyakova, D.; Pasche, E.; Gobeill, J.; Gaudinat, A.; Lovis, C.; Ruch, P. Classification and prioritization of biomedical literature for the comparative toxicogenomics database. In Proceedings of the MIE, Pisa, Italy, 26–29 August 2012; pp. 210–214. [ Google Scholar ]
  • Lu, Y.; Xu, H.; Peterson, N.B.; Dai, Q.; Jiang, M.; Denny, J.C.; Liu, M. Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches. Int. J. Data Min. Bioinform. 2012 , 6 , 447–459. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Giummarra, M.J.; Lau, G.; Gabbe, B.J. Evaluation of text mining to reduce screening workload for injury-focused systematic reviews. Inj. Prev. 2020 , 26 , 55–60. [ Google Scholar ] [ CrossRef ]
  • Warth, B.; Spangler, S.; Fang, M.; Johnson, C.H.; Forsberg, E.M.; Granados, A.; Martin, R.L.; Domingo, X.; Huan, T.; Rinehart, D.; et al. Exposing the Exposome with Global Metabolomics and Cognitive Computing. bioRxiv 2017 , 145722. [ Google Scholar ] [ CrossRef ]
  • Berrang-Ford, L.; Sietsma, A.J.; Callaghan, M.; Minx, J.C.; Scheelbeek, P.F.; Haddaway, N.R.; Haines, A.; Dangour, A.D. Systematic mapping of global research on climate and health: A machine learning review. Lancet Planet. Health 2021 , 5 , e514–e525. [ Google Scholar ] [ CrossRef ]
  • Minet, E.; Haswell, L.E.; Corke, S.; Banerjee, A.; Baxter, A.; Verrastro, I.; e Lima, F.D.A.; Jaunky, T.; Santopietro, S.; Breheny, D.; et al. Application of text mining to develop AOP-based mucus hypersecretion genesets and confirmation with in vitro and clinical samples. Sci. Rep. 2021 , 11 , 6091. [ Google Scholar ] [ CrossRef ]
  • Taboureau, O.; El M’Selmi, W.; Audouze, K. Integrative systems toxicology to predict human biological systems affected by exposure to environmental chemicals. Toxicol. Appl. Pharmacol. 2020 , 405 , 115210. [ Google Scholar ] [ CrossRef ]
  • Russ, D.E.; Ho, K.Y.; Colt, J.S.; Armenti, K.R.; Baris, D.; Chow, W.H.; Davis, F.; Johnson, A.; Purdue, M.P.; Karagas, M.R.; et al. Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies. Occup. Environ. Med. 2016 , 73 , 417–424. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Semenza, J.C.; Herbst, S.; Rechenburg, A.; Suk, J.E.; Höser, C.; Schreiber, C.; Kistemann, T. Climate change impact assessment of food-and waterborne diseases. Crit. Rev. Environ. Sci. Technol. 2012 , 42 , 857–890. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Zhao, F.; Li, L.; Chen, Y.; Huang, Y.; Keerthisinghe, T.P.; Chow, A.; Dong, T.; Jia, S.; Xing, S.; Warth, B.; et al. Risk-Based Chemical Ranking and Generating a Prioritized Human Exposome Database. Environ. Health Perspect. 2021 , 129 , 047014. [ Google Scholar ] [ CrossRef ]
  • Dong, Z.; Fan, X.; Li, Y.; Wang, Z.; Chen, L.; Wang, Y.; Zhao, X.; Fan, W.; Wu, F. A Web-Based Database on Exposure to Persistent Organic Pollutants in China. Environ. Health Perspect. 2021 , 129 , 057701. [ Google Scholar ] [ CrossRef ]
  • Rugard, M.; Coumoul, X.; Carvaillo, J.C.; Barouki, R.; Audouze, K. Deciphering adverse outcome pathway network linked to bisphenol F using text mining and systems toxicology approaches. Toxicol. Sci. 2020 , 173 , 32–40. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Barupal, D.K.; Fiehn, O. Generating the blood exposome database using a comprehensive text mining and database fusion approach. Environ. Health Perspect. 2019 , 127 , 097008. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Wishart, D.; Arndt, D.; Pon, A.; Sajed, T.; Guo, A.C.; Djoumbou, Y.; Knox, C.; Wilson, M.; Liang, Y.; Grant, J.; et al. T3DB: The toxic exposome database. Nucleic Acids Res. 2015 , 43 , D928–D934. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Zhang, H.; Hu, H.; Diller, M.; Hogan, W.R.; Prosperi, M.; Guo, Y.; Bian, J. Semantic Standards of External Exposome Data. Environ. Res. 2021 , 197 , 111185. [ Google Scholar ] [ CrossRef ]
  • Ekenga, C.C.; McElwain, C.A.; Sprague, N. Examining public perceptions about lead in school drinking water: A mixed-methods analysis of Twitter response to an environmental health hazard. Int. J. Environ. Res. Public Health 2018 , 15 , 162. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Hollister, B.M.; Restrepo, N.A.; Farber-Eger, E.; Crawford, D.C.; Aldrich, M.C.; Non, A. Development and performance of text-mining algorithms to extract socioeconomic status from de-identified electronic health records. In Pacific Symposium on Biocomputing 2017 ; World Scientific: Singapore, 2017; pp. 230–241. [ Google Scholar ]
  • Hartmann, J.; Wuijts, S.; van der Hoek, J.P.; de Roda Husman, A.M. Use of literature mining for early identification of emerging contaminants in freshwater resources. Environ. Evid. 2019 , 8 , 33. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Cawley, M.; Beardslee, R.; Beverly, B.; Hotchkiss, A.; Kirrane, E.; Sams II, R.; Varghese, A.; Wignall, J.; Cowden, J. Novel text analytics approach to identify relevant literature for human health risk assessments: A pilot study with health effects of in utero exposures. Environ. Int. 2020 , 134 , 105228. [ Google Scholar ] [ CrossRef ]
  • Jornod, F.; Rugard, M.; Tamisier, L.; Coumoul, X.; Andersen, H.R.; Barouki, R.; Audouze, K. AOP4EUpest: Mapping of pesticides in adverse outcome pathways using a text mining tool. Bioinformatics 2020 , 36 , 4379–4381. [ Google Scholar ] [ CrossRef ]
  • Kiossoglou, P.; Borda, A.; Gray, K.; Martin-Sanchez, F.; Verspoor, K.; Lopez-Campos, G. Characterising the Scope of Exposome Research: A Generalisable Approach ; IOS Press: Amsterdam, The Netherlands, 2017. [ Google Scholar ]
  • Davis, A.P.; Grondin, C.J.; Johnson, R.J.; Sciaky, D.; Wiegers, J.; Wiegers, T.C.; Mattingly, C.J. Comparative toxicogenomics database (CTD): Update 2021. Nucleic Acids Res. 2021 , 49 , D1138–D1143. [ Google Scholar ] [ CrossRef ]
  • Zgheib, E.; Kim, M.J.; Jornod, F.; Bernal, K.; Tomkiewicz, C.; Bortoli, S.; Coumoul, X.; Barouki, R.; De Jesus, K.; Grignard, E.; et al. Identification of non-validated endocrine disrupting chemical characterization methods by screening of the literature using artificial intelligence and by database exploration. Environ. Int. 2021 , 154 , 106574. [ Google Scholar ] [ CrossRef ]
  • Ayadi, A.; Auffan, M.; Rose, J. Ontology-based NLP information extraction to enrich nanomaterial environmental exposure database. Procedia Comput. Sci. 2020 , 176 , 360–369. [ Google Scholar ] [ CrossRef ]
  • Schwartz, K.L.; Achonu, C.; Buchan, S.A.; Brown, K.A.; Lee, B.; Whelan, M.; Wu, J.H.; Garber, G. Epidemiology, clinical characteristics, household transmission, and lethality of severe acute respiratory syndrome coronavirus-2 infection among healthcare workers in Ontario, Canada. PLoS ONE 2020 , 15 , e0244477. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Loper, E.; Bird, S. Nltk: The natural language toolkit. arXiv 2002 , preprint. arXiv:cs/0205028. [ Google Scholar ]
  • Rani, J.; Shah, A.R.; Ramachandran, S. pubmed.mineR: An R package with text-mining algorithms to analyse PubMed abstracts. J. Biosci. 2015 , 40 , 671–682. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Howard, J.; Ruder, S. Transfer Learning over Text Using ULMFiT. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017. [ Google Scholar ]
  • Christensen, H.E.; Luginbyhl, T.T. Registry of Toxic Effects of Chemical Substances ; Technical Report; Tracor JITCO, Inc.: Rockville, MD, USA, 1975. [ Google Scholar ]
  • Neveu, V.; Nicolas, G.; Salek, R.M.; Wishart, D.S.; Scalbert, A. Exposome-Explorer 2.0: An update incorporating candidate dietary biomarkers and dietary associations with cancer risk. Nucleic Acids Res. 2020 , 48 , D908–D912. [ Google Scholar ] [ CrossRef ]
  • Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011 , 12 , 2825–2830. [ Google Scholar ]
  • Korhonen, A.; Ó Séaghdha, D.; Silins, I.; Sun, L.; Högberg, J.; Stenius, U. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS ONE 2012 , 7 , e33427. [ Google Scholar ] [ CrossRef ]
  • Davis, A.P.; Grondin, C.J.; Johnson, R.J.; Sciaky, D.; McMorran, R.; Wiegers, J.; Wiegers, T.C.; Mattingly, C.J. The comparative toxicogenomics database: Update 2019. Nucleic Acids Res. 2019 , 47 , D948–D954. [ Google Scholar ] [ CrossRef ]
  • Settles, B. ABNER: An open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 2005 , 21 , 3191–3192. [ Google Scholar ] [ CrossRef ]
  • Aronson, A.R. Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. In Proceedings of the AMIA Symposium, Washington, DC, USA, 3–7 November 2001; American Medical Informatics Association: Bethesda, MD, USA, 2001; p. 17. [ Google Scholar ]
  • Corbett, P.; Copestake, A. Cascaded classifiers for confidence-based chemical named entity recognition. BMC Bioinform. 2008 , 9 , S4. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Carvaillo, J.C.; Barouki, R.; Coumoul, X.; Audouze, K. Linking bisphenol S to adverse outcome pathways using a combined text mining and systems biology approach. Environ. Health Perspect. 2019 , 127 , 047005. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Ananiadou, S.; Rea, B.; Okazaki, N.; Procter, R.; Thomas, J. Supporting systematic reviews using text mining. Soc. Sci. Comput. Rev. 2009 , 27 , 509–523. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Lopez-Campos, G.; Kiossoglou, P.; Borda, A.; Hawthorne, C.; Gray, K.; Verspoor, K. Characterizing the Scope of Exposome Research Through Topic Modeling and Ontology Analysis. In MEDINFO 2019: Health and Wellbeing e-Networks for All ; IOS Press: Amsterdam, The Netherlands, 2019; pp. 1530–1531. [ Google Scholar ]
  • Cunningham, H.; Tablan, V.; Roberts, A.; Bontcheva, K. Getting more out of biomedical documents with GATE’s full lifecycle open source text analytics. PLoS Comput. Biol. 2013 , 9 , e1002854. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Nenadic, G.; Ananiadou, S.; McNaught, J. Enhancing automatic term recognition through recognition of variation. In Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004, Geneva, Switzerland, 23–27 August 2004; pp. 604–610. [ Google Scholar ]
  • Cohen, W.W. Minorthird: Methods for Identifying names and Ontological Relations in Text Using Heuristics for Inducing Regularities from Data. In Proceedings of the 6th International Workshop on Knowledge Discovery on the Web, Seattle, WA, USA, 22–25 August 2004. [ Google Scholar ]
  • High, R. The era of cognitive systems: An inside look at IBM Watson and how it works. IBM Corp. Redbooks 2012 , 1 , 16. [ Google Scholar ]
  • Schultheisz, R.J. TOXLINE: Evolution of an online interactive bibliographic database. J. Am. Soc. Inf. Sci. 1981 , 32 , 421–429. [ Google Scholar ] [ CrossRef ]
  • Barupal, D.K.; Schubauer-Berigan, M.K.; Korenjak, M.; Zavadil, J.; Guyton, K.Z. Prioritizing cancer hazard assessments for IARC Monographs using an integrated approach of database fusion and text mining. Environ. Int. 2021 , 156 , 106624. [ Google Scholar ] [ CrossRef ]
  • Grondin, C.J.; Davis, A.P.; Wiegers, T.C.; King, B.L.; Wiegers, J.A.; Reif, D.M.; Hoppin, J.A.; Mattingly, C.J. Advancing exposure science through chemical data curation and integration in the Comparative Toxicogenomics Database. Environ. Health Perspect. 2016 , 124 , 1592–1599. [ Google Scholar ] [ CrossRef ]
  • Coletti, M.H.; Bleich, H.L. Medical subject headings used to search the biomedical literature. J. Am. Med Inform. Assoc. 2001 , 8 , 317–323. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene ontology: Tool for the unification of biology. Nat. Genet. 2000 , 25 , 25–29. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Maglott, D.; Ostell, J.; Pruitt, K.D.; Tatusova, T. Entrez Gene: Gene-centered information at NCBI. Nucleic Acids Res. 2010 , 39 , D52–D57. [ Google Scholar ] [ CrossRef ]
  • Davi, A.; Haughton, D.; Nasr, N.; Shah, G.; Skaletsky, M.; Spack, R. A review of two text-mining packages: SAS TextMining and WordStat. Am. Stat. 2005 , 59 , 89–103. [ Google Scholar ] [ CrossRef ]
  • Lewis, R.B.; Maas, S.M. QDA Miner 2.0: Mixed-model qualitative data analysis software. Field Methods 2007 , 19 , 87–108. [ Google Scholar ] [ CrossRef ]
  • Wallace, B.C.; Small, K.; Brodley, C.E.; Lau, J.; Trikalinos, T.A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, Miami, FL, USA, 28–30 January 2012; pp. 819–824. [ Google Scholar ]
  • Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016 , 3 , 9. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Chakraborty, A.; Alam, M.; Dey, V.; Chattopadhyay, A.; Mukhopadhyay, D. Adversarial attacks and defences: A survey. arXiv 2018 , arXiv:1810.00069. [ Google Scholar ] [ CrossRef ]
  • Singh, S.; Mahmood, A. The NLP cookbook: Modern recipes for transformer based deep learning architectures. IEEE Access 2021 , 9 , 68675–68702. [ Google Scholar ] [ CrossRef ]
  • Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018 , arXiv:1810.04805. [ Google Scholar ]
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017 , 30 , 5998–6008. [ Google Scholar ]
  • Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; Dyer, C. Neural architectures for named entity recognition. arXiv 2016 , arXiv:1603.01360. [ Google Scholar ]
  • Kumar, S. A survey of deep learning methods for relation extraction. arXiv 2017 , arXiv:1705.03645. [ Google Scholar ]
  • Zhao, H.; Phung, D.; Huynh, V.; Jin, Y.; Du, L.; Buntine, W. Topic modelling meets deep neural networks: A survey. arXiv 2021 , arXiv:2103.00498. [ Google Scholar ]

[Summary tables from the review, grouping the selected studies by: NLP toolkit used (NLTK, other tools, or not declared); data source (scientific literature, existing databases, Twitter, electronic health records, accident reports); and method category (machine learning, knowledge-based, database creation and fusion, rule-based algorithms). The per-study citation numbers are not recoverable from the source.]
MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

Schoene, A.M.; Basinas, I.; van Tongeren, M.; Ananiadou, S. A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome. Int. J. Environ. Res. Public Health 2022 , 19 , 8544. https://doi.org/10.3390/ijerph19148544



CS 685, Spring 2020, UMass Amherst

Literature Review

Research Paper Reading
General Tips for Researching the Literature
Suggested Papers

  • Open access
  • Published: 03 June 2021

A systematic review of natural language processing applied to radiology reports

  • Arlene Casey 1 ,
  • Emma Davidson 2 ,
  • Michael Poon 2 ,
  • Hang Dong 3 , 4 ,
  • Daniel Duma 1 ,
  • Andreas Grivas 5 ,
  • Claire Grover 5 ,
  • Víctor Suárez-Paniagua 3 , 4 ,
  • Richard Tobin 5 ,
  • William Whiteley 2 , 6 ,
  • Honghan Wu 4 , 7 &
  • Beatrice Alex 1 , 8  

BMC Medical Informatics and Decision Making , volume 21 , Article number: 179 (2021)


Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports.

We conduct an automated literature search yielding 4836 results, narrowed using automated filtering, metadata-enrichment steps and citation search combined with manual review. Our analysis is based on 21 variables covering radiology characteristics, NLP methodology, performance, and study and clinical application characteristics.

We present a comprehensive analysis of the 164 publications retrieved, with publications in 2019 almost triple those in 2015. Each publication is categorised into one of six clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce, and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting F1 scores greater than 0.85, it is hard to evaluate these approaches comparatively given that most of them use different datasets. Only 14 studies made their data available and 15 their code, with just 10 externally validating their results.

Conclusions

Automated understanding of the clinical narratives of radiology reports has the potential to enhance the healthcare process, and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code, enabling validation of methods on different institutional data, and to reduce heterogeneity in the reporting of study properties, allowing inter-study comparisons. Our results are significant for researchers in the field, providing a systematic synthesis of existing work to build on, identifying gaps and opportunities for collaboration, and helping to avoid duplication.


Medical imaging examinations interpreted by radiologists in the form of narrative reports are used to support and confirm diagnosis in clinical practice. Being able to accurately and quickly identify the information stored in radiologists’ narratives has the potential to reduce workloads, support clinicians in their decision processes, triage patients to get urgent care or identify patients for research purposes. However, whilst these reports are generally considered more restricted in vocabulary than other electronic health records (EHR), e.g. clinical notes, it is still difficult to access this information efficiently at scale [ 1 ]. This is due to the unstructured nature of these reports, and Natural Language Processing (NLP) is key to obtaining structured information from radiology reports [ 2 ].

NLP applied to radiology reports was shown to be a growing field in earlier reviews [ 2 , 3 ]. In recent years there has been even more extensive growth in NLP research in general, and in deep learning methods in particular, which is not captured in those earlier reviews. A more recent review of NLP applied to radiology-related research exists, but it focuses on one NLP technique only, deep learning models [ 4 ]. Our paper provides a more comprehensive review, comparing and contrasting all NLP methodologies as they are applied to radiology.

Understanding and synthesising recent developments specific to NLP in radiology research is important, as this will help researchers gain a broader understanding of the field and provide insight into methods and techniques, supporting and promoting new developments. Therefore, we carry out a systematic review of research output on NLP applications in radiology from 2015 onward, allowing for a more up-to-date analysis of the area. An additional listing of our synthesis of publications detailing their clinical and technical categories can be found in Additional file 1 , and per-publication properties can be found in Additional file 2 . Also, in contrast to existing work, we look at both the clinical application areas in which NLP is being applied and the trends in NLP methods. We describe and discuss study properties, e.g. data size, performance, and annotation details, quantifying these in relation to both the clinical application areas and NLP methods. A more detailed understanding of these properties allows us to make recommendations for future NLP research applied to radiology datasets, supporting improvements and progress in this domain.

Related work

Amongst pre-existing reviews in this area, [ 2 ] was the first that was both specific to NLP on radiology reports and systematic in methodology. Their literature search identified 67 studies published in the period up to October 2014. They examined the NLP methods used, summarised their performance and extracted the studies’ clinical applications, which they assigned to five broad categories delineating their purpose. Since Pons et al.’s paper, several reviews have emerged with the broader remit of NLP applied to electronic health data, which includes radiology reports. [ 5 ] conducted a systematic review of NLP systems with a specific focus on coding free text into clinical terminologies and structured data capture. The systematic review by [ 6 ] specifically examined machine learning approaches to NLP (2015–2019) in more general clinical text data, and a further methodical review was carried out by [ 7 ] to synthesise literature on deep learning in clinical NLP (up to April 2019), although they did not follow the PRISMA guidelines completely. With radiology reports as their particular focus, [ 3 ] published, the same year as Pons et al.’s review, an instructive narrative review outlining the fundamentals of NLP techniques applied in radiology. More recently, [ 4 ] published a systematic review focused on deep learning radiology-related research. They identified 10 relevant papers in their search (up to September 2019) and examined their deep learning models, comparing these with traditional NLP models, and also considered their clinical applications but did not employ a specific categorisation. We build on this corpus of related work, and most specifically Pons et al.’s work. In our initial synthesis of clinical applications we adopt their application categories and further expand upon these to reflect the nature of subsequent literature captured in our work. Additionally, we quantify and compare properties of the studies reviewed and provide a series of recommendations for future NLP research applied to radiology datasets in order to promote improvements and progress in this domain.

Our methodology followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) [ 8 ], and the protocol is registered on protocols.io.

Eligibility for literature inclusion and search strategy

We included studies using NLP on radiology reports of any imaging modality and anatomical region for NLP technical development, clinical support, or epidemiological research. Exclusion criteria were: (1) language not English; (2) wrong publication type (e.g., case reports, reviews, conference abstracts, comments, patents, or editorials); (3) published before 2015; (4) uses radiology images only (no NLP); (5) not radiology reports; (6) no NLP results; (7) year out of range; (8) duplicate, already in the list of publications retrieved; (9) not available in full text.

We used Publish or Perish [ 9 ], a citation retrieval and analysis software program, to search Google Scholar. Google Scholar has similar coverage to other databases [ 10 ] and is easier to integrate into search pipelines. We conducted an initial pilot search following the process described here, but the search terms were too specific and restricted the number of publications. For example, we experimented with using specific terms from medical imaging such as CT and MRI. Thirty-seven papers were found during the pilot search, but the same papers also appeared in our final search. We used the following search query, restricted to research articles published in English between January 2015 and October 2019: (“radiology” OR “radiologist”) AND (“natural language” OR “text mining” OR “information extraction” OR “document classification” OR “word2vec”) NOT patent. We automated the addition of publication metadata and applied filtering to remove irrelevant publications. These automated steps are described in Tables 1 and 2 .
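
The automated filtering step described above can be sketched as a simple predicate applied to each retrieved record. The sketch below is illustrative only: the record fields and the `keep_publication` helper are hypothetical, not the authors' actual pipeline.

```python
# Hypothetical exclusion filter mirroring the criteria listed above.
EXCLUDED_TYPES = {"review", "editorial", "comment", "patent", "case report"}

def keep_publication(pub: dict) -> bool:
    """Return True if a publication record passes the automated filters.

    `pub` is assumed to carry 'year', 'language' and 'type' fields
    harvested during the metadata-enrichment step.
    """
    if pub.get("language", "en") != "en":
        return False
    if pub.get("type", "").lower() in EXCLUDED_TYPES:
        return False
    year = pub.get("year")
    if year is None or not (2015 <= year <= 2019):
        return False
    return True

records = [
    {"title": "NLP for chest CT reports", "year": 2018, "language": "en", "type": "article"},
    {"title": "A patent on imaging", "year": 2018, "language": "en", "type": "patent"},
    {"title": "Old study", "year": 2012, "language": "en", "type": "article"},
]
kept = [r["title"] for r in records if keep_publication(r)]
print(kept)  # ['NLP for chest CT reports']
```

Applying such a filter at every retrieval stage is what keeps a 4836-result search manageable before manual review.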

In addition to the query search, another method to find papers is to conduct a citation search [ 15 ]. The citation search compiled a list of publications that cite the Pons et al. review, together with the articles cited in that review. We then used a snowballing method [ 16 ] to follow the forward citation branch for each publication in this list, i.e. finding every article that cites the publications in our list. The branching factor here is large, so we filtered at every stage and automatically added metadata. One hundred and seventy-one papers were identified as part of the snowball citation search, and of these 84 were in the final 164 papers.
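
Forward-citation snowballing is essentially a breadth-first traversal of the citation graph with filtering at every stage. A minimal sketch under assumptions: `get_citing` and `passes_filter` are hypothetical callables standing in for a citation-index lookup and the automated relevance filter.

```python
from collections import deque

def snowball(seed_ids, get_citing, passes_filter, max_rounds=2):
    """Forward-citation snowballing: starting from seed publications,
    repeatedly collect papers that cite anything already found,
    filtering at every stage to keep the branching factor manageable.

    get_citing(pid)   -> iterable of ids of papers citing `pid` (assumed lookup)
    passes_filter(pid) -> bool, the automated relevance filter
    """
    found = set(seed_ids)
    frontier = deque(seed_ids)
    for _ in range(max_rounds):
        next_frontier = deque()
        while frontier:
            pid = frontier.popleft()
            for citing in get_citing(pid):
                if citing not in found and passes_filter(citing):
                    found.add(citing)
                    next_frontier.append(citing)
        frontier = next_frontier
    return found

# Toy citation graph: A is cited by B and C; B is cited by D.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
result = snowball(["A"], lambda p: graph.get(p, []), lambda p: True)
print(sorted(result))  # ['A', 'B', 'C', 'D']
```

Without the per-stage filter, each round can multiply the frontier by the average citation count, which is why filtering early matters.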

Manual review of literature

Four reviewers (three NLP researchers [AG, DD and HD] and one epidemiologist [MTCP]) independently screened all titles and abstracts with the Rayyan online platform and discussed disagreements. Fleiss’ kappa [ 17 ] agreement between reviewers was 0.70, indicating substantial agreement [ 18 ]. After this screening process, each full-text article was reviewed by a team of eight (six NLP researchers and two epidemiologists) and double reviewed by an NLP researcher. We resolved any discrepancies by discussion in regular meetings.
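
Fleiss' kappa, used above to measure agreement among the four screeners, compares observed per-item agreement with the agreement expected by chance from the marginal category proportions. A self-contained sketch (the count table below is invented for illustration):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] = number of raters assigning item i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(counts)
    n = sum(counts[0])
    # Mean per-item agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement P_e from marginal category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    P_e = sum((t / (N * n)) ** 2 for t in totals)
    return (P_bar - P_e) / (1 - P_e)

# Two categories (include/exclude), 4 raters, 5 items:
table = [[4, 0], [3, 1], [4, 0], [0, 4], [1, 3]]
print(round(fleiss_kappa(table), 2))  # 0.58
```

Values above 0.6 are conventionally read as substantial agreement [ 18 ], which is the interpretation applied to the 0.70 reported here.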

Data extraction for analysis

We extracted data on: primary clinical application and technical objective, data source(s), study period, radiology report language, anatomical region, imaging modality, disease area, dataset size, annotated set size, training/validation/test set size, external validation performed, domain expert used, number of annotators, inter-annotator agreement, NLP technique(s) used, best-reported results (recall, precision and F1 score), availability of dataset, and availability of code.

The literature search yielded 4836 possibly relevant publications, from which our automated exclusion process removed 4402; during both our screening processes, a further 270 were removed, leaving 164 publications. See Fig. 1 for details of exclusions at each step.

figure 1

PRISMA diagram for search publication retrieval

General characteristics

2015 and 2016 saw similar numbers of publications retrieved (22 and 21, respectively), with the volume increasing almost three-fold in 2019 (55), noting that 2019 only covers 10 months (Fig. 2 ). Imaging modality (Table 3 ) varied considerably and 46 studies used reports from multiple modalities. Of studies focusing on a single modality, the most featured were CT scans (38) followed by MRI (16), X-Ray (8), Mammogram (5) and Ultrasound (4). Forty-seven studies did not specify scan modality. For the study samples (Table 4 ), 33 papers specified that they used consecutive patient images, 38 used non-consecutive image sampling and 93 did not clearly specify their sampling strategy. The anatomical regions for scans varied (Table 5 ), with mixed being the highest followed by Thorax and Head/neck. Disease categories are presented in Table 6 , with the largest disease category being Oncology. The majority of reports were in English (141) and a small number in other languages, e.g., Chinese (5), Spanish (4), German (3) (Table 7 ). Additional file 2 , CSV format, provides a breakdown of the information in Tables 3 , 4 , 5 , 6 and 7 per publication.

Clinical application categories

In synthesising the literature, each publication was classified by its primary clinical purpose. Pons’ work in 2016 categorised publications into 5 broad categories: Diagnostic Surveillance, Cohort Building for Epidemiological Studies, Query-based Case Retrieval, Quality Assessment of Radiological Practice and Clinical Support Services. We found some changes in this categorisation schema, and our categorisation consisted of six categories: Diagnostic Surveillance, Disease Information and Classification, Quality Compliance, Cohort/Epidemiology, Language Discovery and Knowledge Structure, and Technical NLP . The main difference is that we found no evidence for a category of Clinical Support Services , which described applications that had been integrated into the workflow to assist clinicians. Despite the increase in the number of publications, very few were in clinical use, with more focus on the category of Disease Information and Classification . We describe each clinical application area in more detail below and, where applicable, how our categories differ from the earlier findings. A listing of all publications and their corresponding clinical application and technical category can be found in Additional file 1 , MS Word format, and in Additional file 2 in CSV format. Table 8 shows the clinical application category by the technical classification and Fig. 2 shows the breakdown of clinical application category by publication year. There were more publications in 2019 compared with 2015 for all categories except Language Discovery & Knowledge Structure, which fell by approximately 25% (Fig. 2 ).

figure 2

Clinical application of publication by year

Diagnostic surveillance

A large proportion of studies in this category focused on extracting disease information for patient or disease surveillance e.g. investigating tumour characteristics [ 19 , 20 ]; changes over time [ 21 ] and worsening/progression or improvement/response to treatment [ 22 , 23 ]; identifying correct anatomical labels [ 24 ]; organ measurements and temporality [ 25 ]. Studies also investigated pairing measurements between reports [ 26 ] and linking reports to monitoring changes through providing an integrated view of consecutive examinations [ 27 ]. Studies focused specifically on breast imaging findings investigating aspects, such as BI-RADS MRI descriptors (shape, size, margin) and final assessment categories (benign, malignant etc.) e.g., [ 28 , 29 , 30 , 31 , 32 , 33 ]. Studies focused on tumour information e.g., for liver [ 34 ] and hepatocellular carcinoma (HPC) [ 35 , 36 ] and one study on extracting information relevant for structuring subdural haematoma characteristics in reports [ 37 ].

Studies in this category also investigated incidental findings, including on lung imaging [ 38 , 39 , 40 ], with [ 38 ] additionally extracting the nodule size; for trauma patients [ 41 ]; and looking for silent brain infarction and white matter disease [ 42 ]. Other studies focused on prioritising/triaging reports, detecting follow-up recommendations and linking a follow-up exam to the initial recommendation report, and on bio-surveillance of infectious conditions, such as invasive mould disease.

Disease information and classification

Disease Information and Classification publications use reports to identify information that may be aggregated according to classification systems. These publications focused solely on classifying a disease occurrence or extracting information about a disease with no focus on the overall clinical application. This category was not found in Pons’ work. Methods considered a range of conditions including intracranial haemorrhage [ 43 , 44 ], aneurysms [ 45 ], brain metastases [ 46 ], ischaemic stroke [ 47 , 48 ], and several classified on types and severity of conditions e.g., [ 46 , 49 , 50 , 51 , 52 ]. Studies focused on breast imaging considered aspects such as predicting lesion malignancy from BI-RADS descriptors [ 53 ], breast cancer subtypes [ 54 ], and extracting or inferring BI-RADS categories, such as [ 55 , 56 ]. Two studies focused on abdominal images and hepatocellular carcinoma (HCC) staging and CLIP scoring. Chest imaging reports were used to detect pulmonary embolism e.g., [ 57 , 58 , 59 ], bacterial pneumonia [ 60 ], and Lungs-RADS categories [ 61 ]. Functional imaging was also included, such as echocardiograms, extracting measurements to evaluate heart failure, including left ventricular ejection fractions (LVEF) [ 62 ]. Other studies investigated classification of fractures [ 63 , 64 ] and abnormalities [ 65 ] and the prediction of ICD codes from imaging reports [ 66 ].

Language discovery and knowledge structure

Language Discovery and Knowledge Structure publications investigate the structure of language in reports and how this might be optimised to facilitate decision support and communication. Pons et al. reported on applications of Query-based Case Retrieval, which has similarities to Language Discovery and Knowledge Structure but is not the same. Their category contains studies that retrieve cases and conditions that are not predefined and in some instances could be used for research purposes or are motivated by educational purposes. Our category is broader and encompasses papers that investigated different aspects of language, including variability, complexity, simplification, and normalisation to support extraction and classification tasks.

Studies focused on exploring lexicon coverage and methods to support language simplification for patients, looking at sources such as the consumer health vocabulary [ 67 ] and the French lexical network (JDM) [ 68 ]. Other works studied the variability and complexity of report language, comparing free-text and structured reports and comparing radiologists. Also investigated was how ontologies and lexicons could be combined with other NLP methods to represent knowledge that can support clinicians. This work included improving report reading efficiency [ 69 ]; finding similar reports [ 70 ]; normalising phrases to support classification and extraction tasks, such as entity recognition in Spanish reports [ 71 ]; imputing semantic classes for labelling [ 72 ]; supporting search [ 73 ]; and discovering semantic relations [ 74 ].

Quality and compliance

Quality and Compliance publications use reports to assess the quality and safety of practice and reports similar to Pons’ category. Works considered how patient indications for scans adhered to guidance e.g., [ 75 , 76 , 77 , 78 , 79 , 80 ] or protocol selection [ 81 , 82 , 83 , 84 , 85 ] or the impact of guideline changes on practice, such as [ 86 ]. Also investigated was diagnostic utilisation and yield, based on clinicians or on patients, which can be useful for hospital planning and for clinicians to study their work patterns e.g. [ 87 ]. Other studies in this category looked at specific aspects of quality, such as, classification for long bone fractures to support quality improvement in paediatric medicine [ 88 ], automatic identification of reports that have critical findings for auditing purposes [ 89 ], deriving a query-based quality measure to compare structured and free-text report variability [ 90 ], and [ 91 ] who describe a method to fix errors in gender or laterality in a report.

Cohort and epidemiology

This category is similar to Pons’ earlier review, but we treated the studies in this category differently, attempting to differentiate between papers that described methods for creating cohorts for research purposes and those that also reported the outcomes of an epidemiological analysis. Ten studies used NLP to create specific cohorts for research purposes and six reported the performance of their tools. Of these papers, the majority (n = 8) created cohorts for specific medical conditions including fatty liver disease [ 92 , 93 ], hepatocellular cancer [ 94 ], ureteric stones [ 95 ], vertebral fracture [ 96 ], traumatic brain injury [ 97 , 98 ], and leptomeningeal disease secondary to metastatic breast cancer [ 99 ]. Five papers identified cohorts focused on particular radiology findings, including ground glass opacities (GGO) [ 100 ], cerebral microbleeds (CMB) [ 101 ], pulmonary nodules [ 102 , 103 ], changes in the spine correlated to back pain [ 1 ] and radiological evidence of people having suffered a fall. One paper focused on identifying abnormalities of specific anatomical regions of the ear within an audiology imaging database [ 104 ] and another paper aimed to create a cohort of people with any rare disease (within existing ontologies: the Orphanet Rare Disease Ontology, ORDO, and the Radiology Gamuts Ontology, RGO). Lastly, one paper took a different approach of screening reports to create a cohort of people with contraindications for MRI, seeking to prevent iatrogenic events [ 105 ]. Amongst the epidemiology studies there were various analytical aims, but they primarily focused on estimating the prevalence or incidence of conditions or imaging findings and looking for associations of these conditions/findings with specific population demographics, associated factors or comorbidities.
The focus of one study differed in that it applied NLP to healthcare evaluation, investigating the association of palliative care consultations and measures of high-quality end-of-life (EOL) care [ 99 ].

Technical NLP

This category is for publications that have a primary technical aim that is not focused on a radiology report outcome, e.g. detecting negation in reports, spelling correction [ 106 ], fact checking [ 107 , 108 ], methods for sample selection, and crowdsourced annotation [ 109 ]. This category did not occur in Pons’ earlier review.

NLP methods in use

NLP methods capture the different techniques an author applied, broken down into rules, machine learning methods, deep learning, ontologies, lexicons and word embeddings. We discriminate machine learning from deep learning, using the former to represent traditional machine learning methods.

Over half of the studies applied only one type of NLP method, and just over a quarter compared or combined methods in hybrid approaches. The remaining studies either used a bespoke proprietary system or focused on building ontologies or similarity measures (Fig. 3 ). Rule-based method use remains almost constant across the period, whereas the use of machine learning decreases and the use of deep learning methods rises, from five publications in 2017 to twenty-four publications in 2019 (Fig. 4 ).

figure 3

NLP method breakdown

figure 4

NLP method by year

A variety of machine classifier algorithms were used, with SVM and Logistic Regression being the most common (Table 9 ). Recurrent Neural Network (RNN) variants were the most common type of deep learning architecture. RNN methods were split between long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM), bidirectional gated recurrent unit (Bi-GRU), and standard RNN approaches. Four of these studies additionally added a Conditional Random Field (CRF) for the final label generation step. Convolutional Neural Networks (CNN) were the second most common architecture explored. Eight studies additionally used an attention mechanism as part of their deep learning architecture. Other neural approaches included feed-forward neural networks, fully connected neural networks, a proprietary neural system, IBM Watson [ 82 ], and Snorkel [ 110 ]. Several studies proposed combined architectures, such as [ 31 , 111 ].
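
To make the dominant traditional setup concrete, the sketch below trains a bag-of-words logistic regression to flag a finding in a report, the kind of classifier many of the reviewed studies used as a baseline. The toy reports and labels are invented for illustration, and scikit-learn is assumed available; this is not any particular study's model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reports; 1 = finding present, 0 = finding absent.
reports = [
    "no acute intracranial haemorrhage identified",
    "small acute subdural haemorrhage on the left",
    "no evidence of pulmonary embolism",
    "filling defect consistent with pulmonary embolism",
]
labels = [0, 1, 0, 1]

# Bag-of-words (with bigrams) features feeding a linear classifier.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reports, labels)
print(list(clf.predict(reports)))
```

The same pipeline shape accommodates an SVM by swapping `LogisticRegression` for `sklearn.svm.LinearSVC`, which is part of why these two classifiers appear so often together in the literature.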

NLP method features

Most rule-based and machine classifying approaches used features based on bag-of-words, part-of-speech, term frequency, and phrases, with only two studies using word embeddings instead. Three studies used feature engineering with deep learning rather than word embeddings. Thirty-three studies used domain knowledge to support building features for their methods, such as developing lexicons or selecting terms and phrases. Comparison of embedding methods is difficult as many studies did not describe their embedding method. Of those that did, Word2Vec [ 112 ] was the most popular (n = 19), followed by GloVe embeddings [ 113 ] (n = 6), FastText [ 114 ] (n = 3), ELMo [ 115 ] (n = 1) and BERT [ 116 ] (n = 1). Ontologies or lexicon look-ups were used in 100 studies; however, even though publications increased over the period in real terms, 20% fewer studies employed ontologies or lexicons in 2019 compared to 2015. The most widely used resources were UMLS [ 117 ] (n = 15), RadLex [ 118 ] (n = 20), and SNOMED CT [ 119 ] (n = 14). Most studies used these as features for normalising words and phrases for classification, but this was mainly those using rule-based or machine learning classifiers, with only six studies using ontologies as input to their deep learning architecture. Three of those investigated how existing ontologies can be combined with word embeddings to create domain-specific mappings, with authors noting that this avoids the need for large amounts of annotated data. Other approaches looked to extend existing medical resources using a frequent-phrases approach, e.g. [ 120 ]. Works also used the derived concepts and relations, visualising these to support activities such as report reading and report querying (e.g. [ 121 , 122 ]).
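
To contrast count-based features with embeddings: a common embedding-based report representation is simply the average of the per-word vectors. The three-dimensional "embeddings" below are hand-invented stand-ins for vectors a model such as Word2Vec would produce; the helper names are illustrative only.

```python
import numpy as np

# Invented toy embeddings standing in for learned word vectors.
emb = {
    "nodule": np.array([0.9, 0.1, 0.0]),
    "mass": np.array([0.8, 0.2, 0.1]),
    "normal": np.array([0.0, 0.9, 0.3]),
}

def report_vector(text, emb, dim=3):
    """Average the embeddings of in-vocabulary words; zeros if none match."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = report_vector("pulmonary nodule", emb)
v2 = report_vector("soft tissue mass", emb)
v3 = report_vector("normal study", emb)
print(cosine(v1, v2) > cosine(v1, v3))  # True: 'nodule' lies closer to 'mass'
```

Unlike bag-of-words counts, this representation places semantically related reports near each other even when they share no surface tokens, which is the property the embedding-based studies rely on.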

Annotation and inter-annotator agreement

Eighty-nine studies used at least two annotators, 75 did not specify any annotation details, and only one study used a single annotator. Whilst 69 studies used a domain expert for annotation (a clinician or radiologist), only 56 studies reported the inter-annotator agreement. Some studies mention annotation but do not report on agreement or annotators. Inter-annotator agreement values for kappa range from 0.43 to perfect agreement at 1. Whilst most studies reported agreement by Cohen’s kappa [ 123 ], some reported precision and percent agreement. Studies reported annotation data sizes differently, e.g., at the sentence or patient level. Studies also considered ground-truth labels from coding schemes such as ICD or BI-RADS categories as annotated data. Of studies which detailed human annotation at the radiology report level, only 45 specified inter-annotator agreement and/or the number of annotators. Annotated report numbers for these studies vary, with 15 papers having annotated fewer than 500 reports, 12 between 500 and 1000, 15 between 1000 and 3000, and 3 between 4000 and 8288. Additional file 2 gives all annotation size information on a per-publication basis in CSV format.

Data sources and availability

Only 14 studies reported that their data is available, and 15 studies reported that their code is available. Most studies sourced their data from medical institutions, a number of studies did not specify where their data was from, and some used publicly available datasets: MIMIC-III (n = 5), MIMIC-II (n = 1), MIMIC-CXR (n = 1), RadCore (n = 5) or STRIDE (n = 2). Four studies used combined electronic health records, such as clinical notes or pathology reports.

Reporting of total data size differed across studies, with some not giving exact data sizes but percentages, and others reporting numbers of sentences, reports, patients, or a mixture of these. Where an author was not clear on the type of data they were reporting on, or on the size, we marked this as unspecified. Thirteen studies did not report on total data size. Data size summaries for those reporting at the radiology report level (n = 135, 82.32% of the studies) are given in Table 10 . The biggest variation in data size by NLP method is in studies that apply other methods or are rule-based. Machine learning also varies in size; however, the median value is lower compared to rule-based methods. The median value for deep learning is considerably higher, at 5000 reports, compared to machine learning or to studies that compare or create hybrid methods. Of the studies reporting radiology report numbers, 39.3% used over 10,000 reports, and this increases to over 48% using more than 5000 reports. However, a small number of studies, 14%, used comparatively low numbers of radiology reports, fewer than 500 (Table 11 ).

NLP performance and evaluation measures

Performance metrics applied for the evaluation of methods vary widely, with authors using precision (positive predictive value, PPV), recall (sensitivity), specificity, area under the curve (AUC) or accuracy. We observed a wide variety in the evaluation methodology employed concerning test or validation datasets. Different approaches were taken in generating splits for testing and validation, including k-fold cross-validation. Table 12 gives a summary of the number of studies reporting total data size and splits across training, validation, test, and annotation. This table is for all data types, i.e., reports, sentences, patients or mixed. Eighty-two studies reported on both training and test data splits, of which only 38 studies included a validation set. Only 10 studies validated their algorithm using an external dataset from another institution, another modality, or a different patient population. Additional file 2 gives all data size information on a per-publication basis in CSV format. The most widely used metrics for reporting performance were precision (PPV) and recall (sensitivity), reported in 47% of studies. However, even though many studies compared methods and reported on the top-performing method, very few carried out significance testing on these comparisons. Issues of heterogeneity make it difficult and unrealistic to compare performance between the methods applied; hence, we use summary measures as a broad overview (Fig. 5 ). Reported performance varies, but both the mean and median values for the F1 score appear higher for methods using rule-based only or deep learning only methods. Whilst differences are less discernible between F1 scores for application areas, Diagnostic Surveillance looks on average lower than other categories.
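
For reference, the three most-reported metrics follow directly from confusion counts. A worked example with invented counts:

```python
def prf1(tp, fp, fn):
    """Precision (PPV), recall (sensitivity) and F1 from confusion counts."""
    precision = tp / (tp + fp)          # of reports flagged positive, how many truly are
    recall = tp / (tp + fn)             # of truly positive reports, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# e.g. a classifier flags 50 reports as positive, 45 of them correctly,
# while missing 5 true positives:
p, r, f = prf1(tp=45, fp=5, fn=5)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.9 0.9
```

Because F1 ignores true negatives, two studies with the same F1 on different class balances are not directly comparable, one more reason the heterogeneity noted above makes cross-study comparison hard.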

figure 5

Application Category and NLP Method, Mean and Median Summaries. Mean value is indicated by a vertical bar, the box shows error bars and the asterisk is the median value

Discussion and future directions

Our work shows there has been a considerable increase in the number of publications using NLP on radiology reports over the recent time period. Compared to 67 publications retrieved in the earlier review of [ 2 ], we retrieved 164 publications. In this section we discuss and offer some insight into the observations and trends of how NLP is being applied to radiology and make some recommendations that may benefit the field going forward.

Clinical applications and NLP methods in radiology

The clinical applications of the publications are similar to those in the earlier review of Pons et al., but whilst we observe an increase in research output, we also highlight that there appears to be even less focus on clinical application compared to their review. Like many other fields applying NLP, the use of deep learning has increased, with RNN architectures being the most popular. This is also observed in a review of NLP on clinical text [ 7 ]. However, although deep learning use increases, rules and traditional machine classifiers are still prevalent and are often used as baselines against which to compare deep learning architectures. One reason for traditional methods remaining popular is their interpretability compared to deep learning models. Understanding the features that drive a model prediction can support decision-making in the clinical domain, but the complex layers of non-linear data transformations that deep learning is composed of do not easily support transparency [ 124 ]. This may also help explain why, in our synthesis of the literature, we observed less focus on discussing clinical application and more emphasis on the disease classification or information task only. Advances in the interpretability of deep learning models are critical to their adoption in clinical practice.

Other challenges exist for deep learning, such as only having access to small or imbalanced datasets. Chen et al. [ 125 ] review deep learning methods within healthcare and point out that these challenges result in poor performance, whereas the same datasets can perform well with traditional machine learning methods. We found that several studies highlight this: when data was scarce or datasets imbalanced, they introduced hybrid approaches of rules and deep learning to improve performance, particularly in the Diagnostic Surveillance category. Yang et al. [ 126 ] observed rules performing better for some entity types, such as time and size, which are proportionally lower in their train and test sets than some of the other entities; hence they combine a bidirectional LSTM and CRF with rules for entity recognition. Peng et al. [ 19 ] comment that combining rules and the neural architecture lets the two complement each other, with deep learning being more balanced between precision and recall, but the rule-based method having higher precision and lower recall. The authors reason that this provides better performance, as rules can capture rare disease cases, particularly when multi-class labelling is needed, whilst deep learning architectures perform worse in instances with fewer data points.
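
The hybrid pattern described above can be sketched as a high-precision rule that overrides the statistical model for rare, well-defined entities, while the model handles everything else. This is an illustrative sketch only: `model_predict` is a hypothetical stand-in for a trained classifier, not any cited system.

```python
import re

# High-precision rule for a rare, well-defined entity: a size measurement.
SIZE_RULE = re.compile(r"\b\d+(\.\d+)?\s*(mm|cm)\b")

def model_predict(sentence):
    """Placeholder for a learned model's output on a sentence."""
    return "no_finding"

def hybrid_label(sentence):
    # Rule fires first: it rarely triggers, but when it does it is precise.
    if SIZE_RULE.search(sentence):
        return "measurement"
    # Otherwise defer to the statistical model.
    return model_predict(sentence)

print(hybrid_label("nodule measuring 8 mm in the right upper lobe"))  # measurement
print(hybrid_label("lungs are clear"))  # no_finding
```

This ordering mirrors the rationale reported above: the rule contributes precision on sparsely represented cases, while the learned model covers the bulk of the distribution.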

In addition to its need for large-scale data, deep learning can be computationally costly. The use of pre-trained models and embeddings may alleviate some of this burden. Pre-trained models often only require fine-tuning, which can reduce computation cost. Language comprehension pre-learned from other tasks can then be inherited from the parent models, meaning fewer domain-specific labelled examples may be needed [ 127 ]. This use of pre-trained information also supports generalisability, e.g., [ 58 ] show that their model trained on one dataset can generalise to other institutional datasets.
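
The economics of fine-tuning can be shown in miniature. Below, toy "pretrained" word vectors stand in for a frozen parent model, and only a small logistic-regression head is trained on a handful of labelled reports; every vector and report string is invented for illustration.

```python
import math

# Toy "pretrained" word vectors -- invented stand-ins for a frozen parent
# model such as GloVe or BERT; real values would be loaded, not hand-written.
PRETRAINED = {
    "normal": [1.0, 0.0], "clear": [0.9, 0.1],
    "mass": [0.0, 1.0], "nodule": [0.1, 0.9],
}

def embed(report):
    """Mean-pool pretrained vectors; the 'encoder' itself stays frozen."""
    vecs = [PRETRAINED[w] for w in report.lower().split() if w in PRETRAINED]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def train_head(data, lr=0.5, epochs=200):
    """Fine-tune only a logistic-regression head on a handful of labels."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-z))   # sigmoid
            g = p - y                    # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Two labelled reports suffice because the representation is inherited.
train = [(embed("lungs clear normal"), 0), (embed("mass nodule"), 1)]
w, b = train_head(train)

def predict(report):
    x = embed(report)
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
```

Only the two head weights and the bias are updated, which is why fine-tuning needs far fewer domain-specific labelled examples than training end to end.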

Embedding use has increased, as expected with the application of deep learning approaches, but many rule-based and traditional machine classifiers continue to use count-based features, e.g., bag-of-words and n-grams. Recent evidence [ 128 ] suggests that the trend of continuing to use feature engineering with traditional machine learning methods does produce better performance on radiology reports than using domain-specific word embeddings.
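
A count-based representation of the kind these classifiers rely on takes only a few lines; this is a minimal sketch using naive whitespace tokenisation, not any particular study's pipeline.

```python
from collections import Counter

def ngram_features(text, n=2):
    """Bag-of-words plus word n-gram counts -- the count-based features
    still common with traditional classifiers (naive whitespace tokens)."""
    tokens = text.lower().split()
    feats = Counter(tokens)                       # unigrams
    feats.update(" ".join(tokens[i:i + n])        # n-grams
                 for i in range(len(tokens) - n + 1))
    return feats

f = ngram_features("No acute intracranial haemorrhage")
```

The resulting sparse counts can be fed directly to a linear classifier such as an SVM.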

Banerjee et al. [ 44 ] found little difference between a uni-gram approach and a Word2vec embedding, hypothesising this was due to their narrow domain, intracranial haemorrhage. However, the NLP research field has seen a move towards bidirectional encoder representations from transformers (BERT) embedding models that is not reflected in our analysis, with only one study using BERT-generated embeddings [ 46 ]. Embeddings from BERT are thought to be superior as they deliver better contextual representations and result in improved task performance. Whilst more publications since our review period have used BERT-based embeddings with radiology reports, e.g., [ 127 , 129 ], not all outperform traditional methods [ 130 ]. Recent evidence shows that embeddings generated by BERT fail to show a generalisable understanding of negation [ 131 ], an essential factor in interpreting radiology reports effectively. Specialised BERT models have been introduced, such as ClinicalBERT [ 132 ] or BlueBERT [ 129 ]. BlueBERT has been shown to outperform ClinicalBERT on chest radiology [ 133 ], but more exploration of the performance gains versus the benefits of generalisability is needed for radiology text.
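
Because contextual embeddings do not guarantee negation handling, many report-processing pipelines still apply NegEx-style rules alongside them. The sketch below is a minimal illustration: the trigger list and the five-token scope window are our own placeholders, not the published NegEx lexicon.

```python
import re

# NegEx-style sketch: the trigger list and 5-token scope window are
# illustrative placeholders, not the published NegEx lexicon.
NEG_TRIGGERS = r"\b(?:no|without|negative for|free of)\b"

def is_negated(sentence, finding):
    """True if `finding` appears within a few tokens of a negation trigger."""
    pattern = NEG_TRIGGERS + r"(?:\s+\w+){0,4}?\s+" + re.escape(finding.lower())
    return re.search(pattern, sentence.lower()) is not None
```

For example, "No evidence of pneumothorax" negates the finding, while "Large right pneumothorax is present" does not.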

All NLP models have in common that they need large amounts of labelled data for model training [ 134 ]. Several studies [ 135 , 136 , 137 ] explored combining word embeddings and ontologies to create domain-specific mappings, and they suggest this can avoid a need for large amounts of annotated data. Additionally, [ 135 , 136 ] highlight that such combinations could boost coverage and performance compared to more conventional techniques for concept normalisation.
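
The embedding-plus-ontology idea can be sketched as nearest-neighbour concept lookup in vector space. The vectors below are invented stand-ins for learned embeddings of a UMLS-style concept list; no annotated training data is consulted.

```python
import math

# Invented stand-in vectors for learned embeddings of a UMLS-style
# concept inventory; real vectors would come from a trained model.
ONTOLOGY = {
    "haemorrhage": [0.9, 0.1, 0.0],
    "fracture": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def normalise(mention_vec):
    """Map a mention's embedding to its nearest ontology concept."""
    return max(ONTOLOGY, key=lambda c: cosine(mention_vec, ONTOLOGY[c]))
```

A mention like "bleed", embedded near the haemorrhage region of the space, would normalise to that concept without any labelled mapping examples.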

The number of publications using medical lexical knowledge resources is still relatively low, even though a recent trend in the general NLP field is to enhance deep learning with external knowledge [ 138 ]. This was also observed by [ 7 ], where only 18% of the deep learning studies in their review utilised knowledge resources. Although pre-training supports learning previously known facts, it could introduce unwanted bias that hinders performance. The inclusion of domain expertise through resources such as medical lexical knowledge may help reduce this unwanted bias [ 7 ]. Exploring how this domain expertise can be incorporated into deep learning architectures could improve future performance when less labelled data is available.

Task knowledge

Knowledge about the disease area of interest, and how aspects of this disease are linguistically expressed, is useful and could promote better-performing solutions. Whilst [ 139 ] find high variability between radiologists, with metric values (e.g., number of syntactic or clinical terms based on ontology mapping) being significantly greater on free-text than structured reports, [ 140 ], who look specifically at anatomical areas, find less evidence of variability. Zech et al. [ 141 ] suggest that the highly specialised nature of each imaging modality creates different sub-languages, and the ability to discover labels (i.e. disease mentions) reflects the consistency with which those labels are referred to. For example, edema is referred to very consistently, whereas other labels, such as infarction/ischaemic, are not. Understanding the language and the context of entity mentions could help promote novel ideas on how to solve problems more effectively. For example, [ 35 ] discuss how the accuracy of predicting malignancy is affected by cues falling outside their window of consideration, and [ 142 ] observe problems of co-reference resolution within a report due to long-range dependencies. Both these studies use traditional NLP approaches, but we observed novel neural architectures being proposed to improve performance in similar tasks, specifically capturing long-range context and dependency learning, e.g., [ 31 , 111 ]. This understanding requires close cooperation between healthcare professionals and data scientists, unlike some other fields where greater disconnection is present [ 125 ].

Study heterogeneity, a need for reporting standards

Most studies reviewed could be described as proofs-of-concept that have not been trialled in a clinical setting. Pons et al. [ 2 ] hypothesised that the lack of clinical application may stem from uncertainty around minimal performance requirements hampering implementation, evidence-based practice requiring justification and transparency of decisions, and the inability to compare to human performance, as human agreement is often unknown. These hypotheses remain valid, and we see little evidence that these problems have been solved.

Human annotation is generally considered the gold standard for measuring human performance, and whilst many studies reported that they used annotated data, overall reporting was inconsistent. Steps were undertaken to measure inter-annotator agreement (IAA), but in many studies this was not directly comparable to the evaluation of the NLP methods. The size of the data used to draw experimental conclusions is important, and accurate reporting of these measures is essential to ensure reproducibility and comparison in further studies. Reporting on the training, test and validation splits varied, with some studies giving no details or not using held-out validation sets.
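
For two annotators, chance-corrected agreement is typically reported as Cohen's kappa, which is computable directly from the paired label sequences:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in ca | cb) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

# Annotators agree on 3 of 4 labels; kappa corrects for chance agreement.
kappa = cohens_kappa(["pos", "pos", "neg", "neg"],
                     ["pos", "neg", "neg", "neg"])  # kappa == 0.5 here
```

Reporting kappa alongside the same metric used to score the NLP system makes the human ceiling directly comparable to model performance.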

Most studies use retrospective data from single institutions, but this can lead to models over-fitting and thus not generalising well when applied in a new setting. Overcoming the problem of data availability is challenging due to privacy and ethics concerns, but essential to ensure that the performance of models can be investigated across institutions, modalities, and methods. Availability of data would allow agreed benchmarks to be developed within the field against which algorithm improvements can be measured. External validation of applied methods was extremely rare, although this is likely due to the limited availability of external datasets. Making code available would enable researchers to report how external systems perform on their data; however, only 15 studies reported that their code is available. To compare systems, common datasets need to be available to benchmark against.

Whilst reported precision and recall figures generally look high, more evidence is needed for accurate comparison to human performance. A wide variety of performance measures was used, with some studies only reporting one measure, e.g., accuracy or F1 score, likely representing the best performance obtained. Individual studies are often not directly comparable on such measures, but nonetheless clarity and consistency in reporting are desirable. Many studies making model comparisons did not carry out any significance testing for these comparisons.
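
For paired model comparisons on the same test set, an exact McNemar test is one straightforward choice of significance test; it needs only the counts of cases on which the two models disagree.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value for a paired model comparison.
    b: cases only model A got right; c: cases only model B got right."""
    n = b + c
    p_tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)

p = mcnemar_exact(2, 10)  # models disagree on 12 cases, split 2 vs 10
```

Under the null hypothesis the disagreements split 50/50, so a lopsided split (here 2 vs 10) yields a small p-value, while an even split does not.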

Progressing NLP in radiology

The value of NLP applied to radiology is clear: it can support clinicians in their decision-making, reduce workload, and add value through automated coding of data, finding missed diagnoses for triage, or monitoring quality. However, in recent years labelling disease phenotypes or extracting disease information from reports has been the focus, rather than real-world clinical application of NLP within radiology. We believe this is mainly due to the difficulties in accessing data for research purposes. More support is needed to bring clinicians and NLP experts together to promote innovative thinking about how such work can benefit and be trialled in the clinical environment. The challenges in doing so are significant because of the need to work within safe environments to protect patient privacy. In terms of NLP methods, we observe that the general trends of NLP are applied within this research area, but we would emphasise that, as NLP moves more to deep learning, it is particularly important in healthcare to consider how these methods can satisfy explainability. Explainability in artificial intelligence and NLP has become a hot topic in general, and it is now also being addressed in the healthcare sector [ 143 , 144 ]. The methodology used is also affected by data availability, with uncommon diseases often being hard to predict with deep learning as data is scarce. If the practical and methodological challenges of data access, privacy, and less data-demanding approaches can be met, there is much potential to increase the value of NLP within radiology. The sharing of tools, practice, and expertise could also ease the real-world application of NLP within radiology.

To help move the field forward, enable more inter-study comparisons, and increase study reproducibility we make the following recommendations for research studies:

Clarity in reporting study properties is required: (a) Data characteristics, including the size and type of dataset, should be detailed, e.g., the number of reports, sentences, and patients, and, for patients, how many reports per patient. The training, test and validation data split should be evident, as should the source of the data. (b) Annotation characteristics, including the methodology used to develop the annotation, should be reported, e.g., annotation set size and annotator details (how many, what expertise). (c) Performance metrics should include a range of measures: precision, recall, F1, and accuracy, not just one overall value.

Significance testing should be carried out when a comparison between methods is made.

Data and code availability are encouraged. While making data available will often be challenging due to privacy concerns, researchers should make code available to enable inter-study comparisons and external validation of methods.

Common datasets should be used to benchmark and compare systems.

Limitations of study

Publication search is subject to bias in search methods, and our search strategy inevitably missed some publications. Whilst we tried to be precise and objective during our review process, some of the data collection and the assignment of publications to categories was difficult to agree on and remained subjective. For example, many of the publications could have belonged to more than one category. One reason for this was the diversity of structure in the content, reflected in part by the different domains in which papers were published. It is also possible that certain keywords were missed when recording data elements due to the reviewers' own biases and research experience.

Conclusion

This paper presents a systematic review of publications using NLP on radiology reports during the period 2015 to October 2019. We show there has been substantial growth in the field, particularly in the use of deep learning methods. Whilst deep learning use has increased, as seen in NLP research in general, it faces challenges of lower performance when data is scarce or labelled data is unavailable, and it is not widely used in clinical practice, perhaps due to the difficulties in interpreting such models. Traditional machine learning and rule-based methods are, therefore, still widely in use. Domain expertise such as medical lexical knowledge must be explored further to enhance performance when data is scarce. The clinical domain faces challenges due to privacy and ethics in sharing data, but overcoming this would enable the development of benchmarks to measure algorithm performance and test model robustness across institutions. Commonly agreed datasets against which to compare the performance of tools would help support the community in inter-study comparisons and validation of systems. The work we present here has the potential to inform researchers about applications of NLP to radiology and to lead to more reliable and responsible research in the domain.

Availability of data and materials

All data generated or analysed during this study are included in this published article [and its supplementary information files].

Abbreviations

NLP: natural language processing

ICD: international classification of diseases

BI-RADS: Breast Imaging-Reporting and Data System

IAA: inter-annotator agreement

UMLS: unified medical language system

ELMo: embeddings from language models

BERT: bidirectional encoder representations from transformers

SVM: support vector machine

CNN: convolutional neural network

LSTM: long short-term memory

BiLSTM: bi-directional long short-term memory

BiGRU: bi-directional gated recurrent unit

CRF: conditional random field

GloVe: Global Vectors for Word Representation

Bates J, Fodeh SJ, Brandt CA, Womack JA. Classification of radiology reports for falls in an HIV study cohort. J Am Med Inform Assoc. 2016;23(e1):113–7. https://doi.org/10.1093/jamia/ocv155 .


Pons E, Braun LMM, Hunink MGM, Kors JA. Natural language processing in radiology: a systematic review. Radiology. 2016;279(2):329–43. https://doi.org/10.1148/radiol.16142770 .


Cai T, Giannopoulos AA, Yu S, Kelil T, Ripley B, Kumamaru KK, Rybicki FJ, Mitsouras D. Natural language processing technologies in radiology research and clinical applications. RadioGraphics. 2016;36(1):176–91. https://doi.org/10.1148/rg.2016150080 .


Sorin V, Barash Y, Konen E, Klang E. Deep learning for natural language processing in radiology-fundamentals and a systematic review. J Am Coll Radiol. 2020;17(5):639–48. https://doi.org/10.1016/j.jacr.2019.12.026 .

Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform. 2017;73:14–29. https://doi.org/10.1016/j.jbi.2017.07.012 .

Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform. 2020;8(3):17984. https://doi.org/10.2196/17984 .

Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, Soni S, Wang Q, Wei Q, Xiang Y, Zhao B, Xu H. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2020;27(3):457–70. https://doi.org/10.1093/jamia/ocz200 .

Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1):1. https://doi.org/10.1186/2046-4053-4-1 .

Harzing AW. Publish or Perish (2007). Available from https://harzing.com/resources/publish-or-perish . Accessed 1 Nov 2019.

Gehanno J-F, Rollin L, Darmoni S. Is the coverage of google scholar enough to be used alone for systematic reviews. BMC Med Inform Decis Mak. 2013;13:7. https://doi.org/10.1186/1472-6947-13-7 .

Wilkinson LJ. Crossref REST API [website]. https://www.crossref.org/education/retrieve-metadata/rest-api/. Accessed 26 Jan 2020.

Allen Institute for AI. Semantic Scholar: AI-powered research tool. https://api.semanticscholar.org/. Accessed 26 Jan 2021.

Cornell University. arXiv.org e-Print archive. https://arxiv.org/. Accessed 26 Jan 2021.

Bearden E. LibGuides: Unpaywall: home. https://library.lasalle.edu/c.php?g=982604&p=7105436. Accessed 26 Jan 2021.

Briscoe S, Bethel A, Rogers M. Conduct and reporting of citation searching in Cochrane systematic reviews: a cross-sectional study. Res Synth Methods. 2020;11(2):169–80. https://doi.org/10.1002/jrsm.1355 .

Wohlin C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering (EASE ’14). New York, NY, USA: Association for Computing Machinery; 2014. https://doi.org/10.1145/2601248.2601268 .

Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76(5):378–82. https://doi.org/10.1037/h0031619 .

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74. https://doi.org/10.2307/2529310 .


Peng Y, Yan K, Sandfort V, Summers R.M, Lu Z. A self-attention based deep learning method for lesion attribute detection from CT reports. In: 2019 IEEE international conference on healthcare informatics (ICHI), pp. 1–5. IEEE Computer Society, Xi’an, China (2019). https://doi.org/10.1109/ICHI.2019.8904668 .

Bozkurt S, Alkim E, Banerjee I, Rubin DL. Automated detection of measurements and their descriptors in radiology reports using a hybrid natural language processing algorithm. J Digit Imaging. 2019;32(4):544–53. https://doi.org/10.1007/s10278-019-00237-9 .

Hassanpour S, Bay G, Langlotz CP. Characterization of change and significance for clinical findings in radiology reports through natural language processing. J Digit Imaging. 2017;30(3):314–22. https://doi.org/10.1007/s10278-016-9931-8 .

Kehl KL, Elmarakeby H, Nishino M, Van Allen EM, Lepisto EM, Hassett MJ, Johnson BE, Schrag D. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 2019;5(10):1421–9. https://doi.org/10.1001/jamaoncol.2019.1800 .

Chen P-H, Zafar H, Galperin-Aizenberg M, Cook T. Integrating natural language processing and machine learning algorithms to categorize oncologic response in radiology reports. J Digit Imaging. 2018;31(2):178–84. https://doi.org/10.1007/s10278-017-0027-x .

Cotik V, Rodríguez H, Vivaldi J. Spanish named entity recognition in the biomedical domain. In: Lossio-Ventura JA, Muñante D, Alatrista-Salas H, editors. Information management and big data. Communications in computer and information science, vol. 898. Lima: Springer; 2018. p. 233–48. https://doi.org/10.1007/978-3-030-11680-4-23 .


Sevenster M, Buurman J, Liu P, Peters JF, Chang PJ. Natural language processing techniques for extracting and categorizing finding measurements in narrative radiology reports. Appl Clin Inform. 2015;06(3):600–10. https://doi.org/10.4338/ACI-2014-11-RA-0110 .

Sevenster M, Bozeman J, Cowhy A, Trost W. A natural language processing pipeline for pairing measurements uniquely across free-text CT reports. J Biomed Inform. 2015;53:36–48. https://doi.org/10.1016/j.jbi.2014.08.015 .

Oberkampf H, Zillner S, Overton JA, Bauer B, Cavallaro A, Uder M, Hammon M. Semantic representation of reported measurements in radiology. BMC Med Inform Decis Mak. 2016;16(1):5. https://doi.org/10.1186/s12911-016-0248-9 .

Liu Y, Zhu L-N, Liu Q, Han C, Zhang X-D, Wang X-Y. Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing. Chin Med J. 2019;132(14):1673–80. https://doi.org/10.1097/CM9.0000000000000301 .

Gupta A, Banerjee I, Rubin DL. Automatic information extraction from unstructured mammography reports using distributed semantics. J Biomed Inform. 2018;78:78–86. https://doi.org/10.1016/j.jbi.2017.12.016 .

Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, Jacobson RS. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform. 2017;69:177–87. https://doi.org/10.1016/j.jbi.2017.04.011 .

Short RG, Bralich J, Bogaty D, Befera NT. Comprehensive word-level classification of screening mammography reports using a neural network sequence labeling approach. J Digit Imaging. 2019;32(5):685–92. https://doi.org/10.1007/s10278-018-0141-4 .

Lacson R, Goodrich ME, Harris K, Brawarsky P, Haas JS. Assessing inaccuracies in automated information extraction of breast imaging findings. J Digit Imaging. 2017;30(2):228–33. https://doi.org/10.1007/s10278-016-9927-4 .

Lacson R, Harris K, Brawarsky P, Tosteson TD, Onega T, Tosteson ANA, Kaye A, Gonzalez I, Birdwell R, Haas JS. Evaluation of an automated information extraction tool for imaging data elements to populate a breast cancer screening registry. J Digit Imaging. 2015;28(5):567–75. https://doi.org/10.1007/s10278-014-9762-4 .

Yim W-W, Kwan SW, Yetisgen M. Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction. J Biomed Inform. 2016;64:179–91. https://doi.org/10.1016/j.jbi.2016.10.005 .

Yim W-W, Kwan SW, Yetisgen M. Classifying tumor event attributes in radiology reports. J Assoc Inform Sci Technol. 2017;68(11):2662–74. https://doi.org/10.1002/asi.23937 .

Yim W, Denman T, Kwan SW, Yetisgen M. Tumor information extraction in radiology reports for hepatocellular carcinoma patients. AMIA Summits Transl Sci Proc. 2016;2016:455–64.


Pruitt P, Naidech A, Van Ornam J, Borczuk P, Thompson W. A natural language processing algorithm to extract characteristics of subdural hematoma from head CT reports. Emerg Radiol. 2019;26(3):301–6. https://doi.org/10.1007/s10140-019-01673-4 .

Farjah F, Halgrim S, Buist DSM, Gould MK, Zeliadt SB, Loggers ET, Carrell DS. An automated method for identifying individuals with a lung nodule can be feasibly implemented across health systems. eGEMs. 2016;4(1):1254. https://doi.org/10.13063/2327-9214.1254 .

Karunakaran B, Misra D, Marshall K, Mathrawala D, Kethireddy S. Closing the loop-finding lung cancer patients using NLP. In: 2017 IEEE international conference on big data (big data), pp. 2452–61. IEEE, Boston, MA (2017). https://doi.org/10.1109/BigData.2017.8258203 .

Tan WK, Hassanpour S, Heagerty PJ, Rundell SD, Suri P, Huhdanpaa HT, James K, Carrell DS, Langlotz CP, Organ NL, Meier EN, Sherman KJ, Kallmes DF, Luetmer PH, Griffith B, Nerenz DR, Jarvik JG. Comparison of natural language processing rules-based and machine-learning systems to identify lumbar spine imaging findings related to low back pain. Acad Radiol. 2018;25(11):1422–32. https://doi.org/10.1016/j.acra.2018.03.008 .

Trivedi G, Hong C, Dadashzadeh ER, Handzel RM, Hochheiser H, Visweswaran S. Identifying incidental findings from radiology reports of trauma patients: an evaluation of automated feature representation methods. Int J Med Inform. 2019;129:81–7. https://doi.org/10.1016/j.ijmedinf.2019.05.021 .

Fu S, Leung LY, Wang Y, Raulli A-O, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Natural language processing for the identification of silent brain infarcts from neuroimaging reports. JMIR Med Inform. 2019;7(2):12109. https://doi.org/10.2196/12109 .

Jnawali K, Arbabshirani MR, Ulloa AE, Rao N, Patel AA. Automatic classification of radiological report for intracranial hemorrhage. In: 2019 IEEE 13th international conference on semantic computing (ICSC), pp. 187–90. IEEE, Newport Beach, CA, USA (2019). https://doi.org/10.1109/ICOSC.2019.8665578 .

Banerjee I, Madhavan S, Goldman RE, Rubin DL. Intelligent Word embeddings of free-text radiology reports. In: AMIA annual symposium proceedings, pp. 411–20 (2017). Accessed 30 Oct 2020.

Kłos M, Żyłkowski J, Spinczyk D, Automatic classification of text documents presenting radiology examinations. In: Pietka E, Badura P, Kawa J, Wieclawek W, editors. Proceedings 6th international conference information technology in biomedicine (ITIB’2018). Advances in intelligent systems and computing, pp. 495–505. Springer (2018). https://doi.org/10.1007/978-3-319-91211-0-43 .

Deshmukh N, Gumustop S, Gauriau R, Buch V, Wright B, Bridge C, Naidu R, Andriole K, Bizzo B. Semi-supervised natural language approach for fine-grained classification of medical reports. arXiv:1910.13573 [cs.LG] (2019). Accessed 30 Oct 2020.

Kim C, Zhu V, Obeid J, Lenert L. Natural language processing and machine learning algorithm to identify brain MRI reports with acute ischemic stroke. PLoS ONE. 2019;14(2):0212778. https://doi.org/10.1371/journal.pone.0212778 .

Garg R, Oh E, Naidech A, Kording K, Prabhakaran S. Automating ischemic stroke subtype classification using machine learning and natural language processing. J Stroke Cerebrovasc Dis. 2019;28(7):2045–51. https://doi.org/10.1016/j.jstrokecerebrovasdis.2019.02.004 .

Shin B, Chokshi FH, Lee T, Choi JD. Classification of radiology reports using neural attention models. In: 2017 international joint conference on neural networks (IJCNN), pp. 4363–70. IEEE, Anchorage, AK (2017). https://doi.org/10.1109/IJCNN.2017.7966408 .

Wheater E, Mair G, Sudlow C, Alex B, Grover C, Whiteley W. A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records. BMC Med Inform Decis Mak. 2019;19(1):184. https://doi.org/10.1186/s12911-019-0908-7 .

Gorinski P.J, Wu H, Grover C, Tobin R, Talbot C, Whalley H, Sudlow C, Whiteley W, Alex B. Named entity recognition for electronic health records: a comparison of rule-based and machine learning approaches. arXiv:1903.03985 [cs.CL] (2019). Accessed 30 Oct 2020.

Alex B, Grover C, Tobin R, Sudlow C, Mair G, Whiteley W. Text mining brain imaging reports. J Biomed Semant. 2019;10(1):23. https://doi.org/10.1186/s13326-019-0211-7 .

Bozkurt S, Gimenez F, Burnside ES, Gulkesen KH, Rubin DL. Using automatically extracted information from mammography reports for decision-support. J Biomed Inform. 2016;62:224–31. https://doi.org/10.1016/j.jbi.2016.07.001 .

Patel TA, Puppala M, Ogunti RO, Ensor JE, He T, Shewale JB, Ankerst DP, Kaklamani VG, Rodriguez AA, Wong STC, Chang JC. Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer. 2017;123(1):114–21. https://doi.org/10.1002/cncr.30245 .

Banerjee I, Bozkurt S, Alkim E, Sagreiya H, Kurian AW, Rubin DL. Automatic inference of BI-RADS final assessment categories from narrative mammography report findings. J Biomed Inform. 2019. https://doi.org/10.1016/j.jbi.2019.103137 .

Miao S, Xu T, Wu Y, Xie H, Wang J, Jing S, Zhang Y, Zhang X, Yang Y, Zhang X, Shan T, Wang L, Xu H, Wang S, Liu Y. Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches. Int J Med Inform. 2018;119:17–21. https://doi.org/10.1016/j.ijmedinf.2018.08.009 .

Dunne RM, Ip IK, Abbett S, Gershanik EF, Raja AS, Hunsaker A, Khorasani R. Effect of evidence-based clinical decision support on the use and yield of CT pulmonary angiographic imaging in hospitalized patients. Radiology. 2015;276(1):167–74. https://doi.org/10.1148/radiol.15141208 .

Banerjee I, Ling Y, Chen MC, Hasan SA, Langlotz CP, Moradzadeh N, Chapman B, Amrhein T, Mong D, Rubin DL, Farri O, Lungren MP. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification. Artif Intell Med. 2019;97:79–88. https://doi.org/10.1016/j.artmed.2018.11.004 .

Chen MC, Ball RL, Yang L, Moradzadeh N, Chapman BE, Larson DB, Langlotz CP, Amrhein TJ, Lungren MP. Deep learning to classify radiology free-text reports. Radiology. 2017;286(3):845–52. https://doi.org/10.1148/radiol.2017171115 .

Meystre S, Gouripeddi R, Tieder J, Simmons J, Srivastava R, Shah S. Enhancing comparative effectiveness research with automated pediatric pneumonia detection in a multi-institutional clinical repository: a PHIS+ pilot study. J Med Internet Res. 2017;19(5):162. https://doi.org/10.2196/jmir.6887 .

Beyer SE, McKee BJ, Regis SM, McKee AB, Flacke S, El Saadawi G, Wald C. Automatic lung-RADSTM classification with a natural language processing system. J Thorac Dis. 2017;9(9):3114–22. https://doi.org/10.21037/jtd.2017.08.13 .

Patterson OV, Freiberg MS, Skanderson M, Fodeh SJ, Brandt CA, DuVall SL. Unlocking echocardiogram measurements for heart disease research through natural language processing. BMC Cardiovasc Disord. 2017;17(1):151. https://doi.org/10.1186/s12872-017-0580-8 .

Lee C, Kim Y, Kim YS, Jang J. Automatic disease annotation from radiology reports using artificial intelligence implemented by a recurrent neural network. Am J Roentgenol. 2019;212(4):734–40. https://doi.org/10.2214/AJR.18.19869 .

Fiebeck J, Laser H, Winther HB, Gerbel S. Leaving no stone unturned: using machine learning based approaches for information extraction from full texts of a research data warehouse. In: Auer S, Vidal M-E, editors. 13th international conference data integration in the life sciences (DILS 2018). Lecture Notes in Computer Science, pp. 50–8. Springer, Hannover, Germany (2018). https://doi.org/10.1007/978-3-030-06016-9_5 .

Hassanzadeh H, Kholghi M, Nguyen A, Chu K. Clinical document classification using labeled and unlabeled data across hospitals. In: AMIA annual symposium proceedings 2018, pp. 545–54 (2018). Accessed 30 Oct 2020.

Krishnan GS, Kamath SS. Ontology-driven text feature modeling for disease prediction using unstructured radiological notes. Comput Sist. 2019. https://doi.org/10.13053/cys-23-3-3238 .

Qenam B, Kim TY, Carroll MJ, Hogarth M. Text simplification using consumer health vocabulary to generate patient-centered radiology reporting: translation and evaluation. J Med Internet Res. 2017;19(12):417. https://doi.org/10.2196/jmir.8536 .

Lafourcade M, Ramadier L. Radiological text simplification using a general knowledge base. In: 18th international conference on computational linguistics and intelligent text processing (CICLing 2017). CICLing 2017. Budapest, Hungary (2017). https://doi.org/10.1007/978-3-319-77116-8_46 .

Hong Y, Zhang J. Investigation of terminology coverage in radiology reporting templates and free-text reports. Int J Knowl Content Dev Technol. 2015;5:5–14. https://doi.org/10.5865/IJKCT.2015.5.1.005 .

Comelli A, Agnello L, Vitabile S. An ontology-based retrieval system for mammographic reports. In: 2015 IEEE symposium on computers and communication (ISCC), pp. 1001–6. IEEE, Larnaca (2015). https://doi.org/10.1109/ISCC.2015.7405644

Cotik V, Filippo D, Castano J. An approach for automatic classification of radiology reports in Spanish. Stud Health Technol Inform. 2015;216:634–8.

Johnson E, Baughman WC, Ozsoyoglu G. A method for imputation of semantic class in diagnostic radiology text. In: 2015 IEEE international conference on bioinformatics and biomedicine (BIBM), pp. 750–5. IEEE, Washington, DC (2015). https://doi.org/10.1109/BIBM.2015.7359780 .

Mujjiga S, Krishna V, Chakravarthi KJV. Identifying semantics in clinical reports using neural machine translation. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33(01), pp. 9552–7 (2019). https://doi.org/10.1609/aaai.v33i01.33019552 . Accessed 30 Oct 2020.

Lafourcade M, Ramadier L. Semantic relation extraction with semantic patterns: experiment on radiology report. In: Proceedings of the tenth international conference on language resources and evaluation (LREC 2016). LREC 2016 proceedings. european language resources association (ELRA), Portorož, Slovenia (2016). https://hal.archives-ouvertes.fr/hal-01382320 .

Shelmerdine SC, Singh M, Norman W, Jones R, Sebire NJ, Arthurs OJ. Automated data extraction and report analysis in computer-aided radiology audit: practice implications from post-mortem paediatric imaging. Clin Radiol. 2019;74(9):733–1173318. https://doi.org/10.1016/j.crad.2019.04.021 .

Mabotuwana T, Hombal V, Dalal S, Hall CS, Gunn M. Determining adherence to follow-up imaging recommendations. J Am Coll Radiol. 2018;15(3, Part A):422–8. https://doi.org/10.1016/j.jacr.2017.11.022 .

Dalal S, Hombal V, Weng W-H, Mankovich G, Mabotuwana T, Hall CS, Fuller J, Lehnert BE, Gunn ML. Determining follow-up imaging study using radiology reports. J Digit Imaging. 2020;33(1):121–30. https://doi.org/10.1007/s10278-019-00260-w .

Bobbin MD, Ip IK, Sahni VA, Shinagare AB, Khorasani R. Focal cystic pancreatic lesion follow-up recommendations after publication of ACR white paper on managing incidental findings. J Am Coll Radiol. 2017;14(6):757–64. https://doi.org/10.1016/j.jacr.2017.01.044 .

Kwan JL, Yermak D, Markell L, Paul NS, Shojania KJ, Cram P. Follow up of incidental high-risk pulmonary nodules on computed tomography pulmonary angiography at care transitions. J Hosp Med. 2019;14(6):349–52. https://doi.org/10.12788/jhm.3128 .

Mabotuwana T, Hall CS, Tieder J, Gunn ML. Improving quality of follow-up imaging recommendations in radiology. In: AMIA annual symposium proceedings, vol. 2017, pp. 1196–204 (2018). Accessed 30 Oct 2020.

Brown AD, Marotta TR. A natural language processing-based model to automate MRI brain protocol selection and prioritization. Acad Radiol. 2017;24(2):160–6. https://doi.org/10.1016/j.acra.2016.09.013 .

Trivedi H, Mesterhazy J, Laguna B, Vu T, Sohn JH. Automatic determination of the need for intravenous contrast in musculoskeletal MRI examinations using IBM Watson’s natural language processing algorithm. J Digit Imaging. 2018;31(2):245–51. https://doi.org/10.1007/s10278-017-0021-3 .

Zhang AY, Lam SSW, Liu N, Pang Y, Chan LL, Tang PH. Development of a radiology decision support system for the classification of MRI brain scans. In: 2018 IEEE/ACM 5th international conference on big data computing applications and technologies (BDCAT), pp. 107–15 (2018). https://doi.org/10.1109/BDCAT.2018.00021 .

Brown AD, Marotta TR. Using machine learning for sequence-level automated MRI protocol selection in neuroradiology. J Am Med Inform Assoc. 2018;25(5):568–71. https://doi.org/10.1093/jamia/ocx125 .

Yan Z, Ip IK, Raja AS, Gupta A, Kosowsky JM, Khorasani R. Yield of CT pulmonary angiography in the emergency department when providers override evidence-based clinical decision support. Radiology. 2016;282(3):717–25. https://doi.org/10.1148/radiol.2016151985 .

Kang SK, Garry K, Chung R, Moore WH, Iturrate E, Swartz JL, Kim DC, Horwitz LI, Blecker S. Natural language processing for identification of incidental pulmonary nodules in radiology reports. J Am Coll Radiol. 2019;16(11):1587–94. https://doi.org/10.1016/j.jacr.2019.04.026 .

Brown AD, Kachura JR. Natural language processing of radiology reports in patients with hepatocellular carcinoma to predict radiology resource utilization. J Am Coll Radiol. 2019;16(6):840–4. https://doi.org/10.1016/j.jacr.2018.12.004 .

Grundmeier RW, Masino AJ, Casper TC, Dean JM, Bell J, Enriquez R, Deakyne S, Chamberlain JM, Alpern ER. Identification of long bone fractures in radiology reports using natural language processing to support healthcare quality improvement. Appl Clin Inform. 2016;7(4):1051–68. https://doi.org/10.4338/ACI-2016-08-RA-0129 .

Heilbrun ME, Chapman BE, Narasimhan E, Patel N, Mowery D. Feasibility of natural language processing-assisted auditing of critical findings in chest radiology. J Am Coll Radiol. 2019;16(9, Part B):1299–304. https://doi.org/10.1016/j.jacr.2019.05.038 .

Maros ME, Wenz R, Förster A, Froelich MF, Groden C, Sommer WH, Schönberg SO, Henzler T, Wenz H. Objective comparison using guideline-based query of conventional radiological reports and structured reports. In Vivo. 2018;32(4):843–9. https://doi.org/10.21873/invivo.11318 .

Minn MJ, Zandieh AR, Filice RW. Improving radiology report quality by rapidly notifying radiologist of report errors. J Digit Imaging. 2015;28(4):492–8. https://doi.org/10.1007/s10278-015-9781-9 .

Goldshtein I, Chodick G, Kochba I, Gal N, Webb M, Shibolet O. Identification and characterization of nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2020;18(8):1887–9. https://doi.org/10.1016/j.cgh.2019.08.007 .

Redman JS, Natarajan Y, Hou JK, Wang J, Hanif M, Feng H, Kramer JR, Desiderio R, Xu H, El-Serag HB, Kanwal F. Accurate identification of fatty liver disease in data warehouse utilizing natural language processing. Dig Dis Sci. 2017;62(10):2713–8. https://doi.org/10.1007/s10620-017-4721-9 .

Sada Y, Hou J, Richardson P, El-Serag H, Davila J. Validation of case finding algorithms for hepatocellular cancer from administrative data and electronic health records using natural language processing. Med Care. 2016;54(2):9–14. https://doi.org/10.1097/MLR.0b013e3182a30373 .

Li AY, Elliot N. Natural language processing to identify ureteric stones in radiology reports. J Med Imaging Radiat Oncol. 2019;63(3):307–10. https://doi.org/10.1111/1754-9485.12861 .

Tan WK, Heagerty PJ. Surrogate-guided sampling designs for classification of rare outcomes from electronic medical records data. arXiv:1904.00412 [stat.ME] (2019). Accessed 30 Oct 2020.

Yadav K, Sarioglu E, Choi H-A, Cartwright WB, Hinds PS, Chamberlain JM. Automated outcome classification of computed tomography imaging reports for pediatric traumatic brain injury. Acad Emerg Med. 2016;23(2):171–8. https://doi.org/10.1111/acem.12859 .

Mahan M, Rafter D, Casey H, Engelking M, Abdallah T, Truwit C, Oswood M, Samadani U. tbiExtractor: a framework for extracting traumatic brain injury common data elements from radiology reports. bioRxiv 585331 (2019). https://doi.org/10.1101/585331 . Accessed 05 Dec 2020.

Brizzi K, Zupanc SN, Udelsman BV, Tulsky JA, Wright AA, Poort H, Lindvall C. Natural language processing to assess palliative care and end-of-life process measures in patients with breast cancer with leptomeningeal disease. Am J Hosp Palliat Med. 2019;37(5):371–6. https://doi.org/10.1177/1049909119885585 .

Van Haren RM, Correa AM, Sepesi B, Rice DC, Hofstetter WL, Mehran RJ, Vaporciyan AA, Walsh GL, Roth JA, Swisher SG, Antonoff MB. Ground glass lesions on chest imaging: evaluation of reported incidence in cancer patients using natural language processing. Ann Thorac Surg. 2019;107(3):936–40. https://doi.org/10.1016/j.athoracsur.2018.09.016 .

Noorbakhsh-Sabet N, Tsivgoulis G, Shahjouei S, Hu Y, Goyal N, Alexandrov AV, Zand R. Racial difference in cerebral microbleed burden among a patient population in the mid-south United States. J Stroke Cerebrovasc Dis. 2018;27(10):2657–61. https://doi.org/10.1016/j.jstrokecerebrovasdis.2018.05.031 .

Gould MK, Tang T, Liu I-LA, Lee J, Zheng C, Danforth KN, Kosco AE, Di Fiore JL, Suh DE. Recent trends in the identification of incidental pulmonary nodules. Am J Respir Crit Care Med. 2015;192(10):1208–14. https://doi.org/10.1164/rccm.201505-0990OC .

Huhdanpaa HT, Tan WK, Rundell SD, Suri P, Chokshi FH, Comstock BA, Heagerty PJ, James KT, Avins AL, Nedeljkovic SS, Nerenz DR, Kallmes DF, Luetmer PH, Sherman KJ, Organ NL, Griffith B, Langlotz CP, Carrell D, Hassanpour S, Jarvik JG. Using natural language processing of free-text radiology reports to identify type 1 modic endplate changes. J Digit Imaging. 2018;31(1):84–90. https://doi.org/10.1007/s10278-017-0013-3 .

Masino AJ, Grundmeier RW, Pennington JW, Germiller JA, Crenshaw EB. Temporal bone radiology report classification using open source machine learning and natural langue processing libraries. BMC Med Inform Decis Mak. 2016;16(1):65. https://doi.org/10.1186/s12911-016-0306-3 .

Valtchinov VI, Lacson R, Wang A, Khorasani R. Comparing artificial intelligence approaches to retrieve clinical reports documenting implantable devices posing MRI safety risks. J Am Coll Radiol. 2020;17(2):272–9. https://doi.org/10.1016/j.jacr.2019.07.018 .

Zech J, Forde J, Titano JJ, Kaji D, Costa A, Oermann EK. Detecting insertion, substitution, and deletion errors in radiology reports using neural sequence-to-sequence models. Ann Transl Med. 2019. https://doi.org/10.21037/atm.2018.08.11 .

Zhang Y, Merck D, Tsai EB, Manning CD, Langlotz CP. Optimizing the factual correctness of a summary: a study of summarizing radiology reports. arXiv:1911.02541 [cs.CL] (2019). Accessed 30 Oct 2020.

Steinkamp JM, Chambers C, Lalevic D, Zafar HM, Cook TS. Toward complete structured information extraction from radiology reports using machine learning. J Digit Imaging. 2019;32(4):554–64. https://doi.org/10.1007/s10278-019-00234-y .

Cocos A, Qian T, Callison-Burch C, Masino AJ. Crowd control: effectively utilizing unscreened crowd workers for biomedical data annotation. J Biomed Inform. 2017;69:86–92. https://doi.org/10.1016/j.jbi.2017.04.003 .

Ratner A, Hancock B, Dunnmon J, Goldman R, Ré C. Snorkel MeTaL: weak supervision for multi-task learning. In: Proceedings of the second workshop on data management for end-to-end machine learning. DEEM’18, vol. 3, pp. 1–4. ACM, Houston, TX, USA (2018). https://doi.org/10.1145/3209889.3209898 . Accessed 30 Oct 2020.

Zhu H, Paschalidis IC, Hall C, Tahmasebi A. Context-driven concept annotation in radiology reports: anatomical phrase labeling. In: AMIA summits on translational science proceedings, vol. 2019, pp. 232–41 (2019). Accessed 30 Oct 2020.

Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space (2013). http://arxiv.org/abs/1301.3781 . Accessed 7 Feb 2021.

Pennington J, Socher R, Manning CD. Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–43 (2014).

Mikolov T, Grave E, Bojanowski P, Puhrsch C, Joulin A. Advances in pre-training distributed word representations. In: Proceedings of the international conference on language resources and evaluation (LREC 2018) (2018).

Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. CoRR abs/1802.05365 (2018).

Devlin J, Chang M-W, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

National Library of Medicine: Unified medical language system (2021). https://www.nlm.nih.gov/research/umls/index.html . Accessed 7 Feb 2021.

RSNA: RadLex (2021). http://radlex.org/ . Accessed 7 Feb 2021.

National Library of Medicine: SNOMED CT, (2021). https://www.nlm.nih.gov/healthit/snomedct/index.html . Accessed 07 Feb 2021.

Bulu H, Sippo DA, Lee JM, Burnside ES, Rubin DL. Proposing new RadLex terms by analyzing free-text mammography reports. J Digit Imaging. 2018;31(5):596–603. https://doi.org/10.1007/s10278-018-0064-0 .

Hassanpour S, Langlotz CP. Unsupervised topic modeling in a large free text radiology report repository. J Digit Imaging. 2016;29(1):59–62. https://doi.org/10.1007/s10278-015-9823-3 .

Zhao Y, Fesharaki NJ, Liu H, Luo J. Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation. BMC Med Inform Decis Mak. 2018;18(1):61. https://doi.org/10.1186/s12911-018-0645-3 .

Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46. https://doi.org/10.1177/001316446002000104 .

Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2018;22(5):1589–604. https://doi.org/10.1109/JBHI.2017.2767063 .

Chen D, Liu S, Kingsbury P, Sohn S, Storlie CB, Habermann EB, Naessens JM, Larson DW, Liu H. Deep learning and alternative learning strategies for retrospective real-world clinical data. npj Digit Med. 2019;2(1):1–5. https://doi.org/10.1038/s41746-019-0122-0 .

Yang H, Li L, Yang R, Zhou Y. Towards automated knowledge discovery of hepatocellular carcinoma: extract patient information from Chinese clinical reports. In: Proceedings of the 2nd international conference on medical and health informatics. ICMHI ’18, pp. 111–6. ACM, New York, NY, USA (2018). https://doi.org/10.1145/3239438.3239445 . Accessed 30 Oct 2020.

Wood D.A, Lynch J, Kafiabadi S, Guilhem E, Busaidi A.A, Montvila A, Varsavsky T, Siddiqui J, Gadapa N, Townend M, Kiik M, Patel K, Barker G, Ourselin S, Cole JH, Booth TC. Automated labelling using an attention model for radiology reports of MRI scans (ALARM). arXiv:2002.06588 [cs.CV] (2020). Accessed 03 Dec 2020.

Ong CJ, Orfanoudaki A, Zhang R, Caprasse FPM, Hutch M, Ma L, Fard D, Balogun O, Miller MI, Minnig M, Saglam H, Prescott B, Greer DM, Smirnakis S, Bertsimas D. Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports. PLoS ONE. 2020;15(6):e0234908. https://doi.org/10.1371/journal.pone.0234908 .

Smit A, Jain S, Rajpurkar P, Pareek A, Ng A, Lungren M. Combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp. 1500–19. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.117 . https://www.aclweb.org/anthology/2020.emnlp-main.117 . Accessed 03 Dec 2020.

Grivas A, Alex B, Grover C, Tobin R, Whiteley W. Not a cute stroke: analysis of rule- and neural network-based information extraction systems for brain radiology reports. In: Proceedings of the 11th international workshop on health text mining and information analysis (2020).

Ettinger A. What BERT is not: lessons from a new suite of psycholinguistic diagnostics for language models. Trans Assoc Comput Linguist. 2020;8:34–48. https://doi.org/10.1162/tacl_a_00298 .

Alsentzer E, Murphy J, Boag W, Weng W-H, Jindi D, Naumann T, McDermott M. Publicly available clinical BERT embeddings. In: Proceedings of the 2nd clinical natural language processing workshop, pp. 72–8. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019). https://doi.org/10.18653/v1/W19-1909 . https://www.aclweb.org/anthology/W19-1909 .

Smit A, Jain S, Rajpurkar P, Pareek A, Ng AY, Lungren MP. CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. CoRR abs/2004.09167 (2020).

Yasaka K, Abe O. Deep learning and artificial intelligence in radiology: current applications and future directions. PLOS Med. 2018;15(11):e1002707. https://doi.org/10.1371/journal.pmed.1002707 .

Percha B, Zhang Y, Bozkurt S, Rubin D, Altman RB, Langlotz CP. Expanding a radiology lexicon using contextual patterns in radiology reports. J Am Med Inform Assoc. 2018;25(6):679–85. https://doi.org/10.1093/jamia/ocx152 .

Tahmasebi AM, Zhu H, Mankovich G, Prinsen P, Klassen P, Pilato S, van Ommering R, Patel P, Gunn ML, Chang P. Automatic normalization of anatomical phrases in radiology reports using unsupervised learning. J Digit Imaging. 2019;32(1):6–18. https://doi.org/10.1007/s10278-018-0116-5 .

Banerjee I, Chen MC, Lungren MP, Rubin DL. Radiology report annotation using intelligent word embeddings: applied to multi-institutional chest CT cohort. J Biomed Inform. 2018;77:11–20. https://doi.org/10.1016/j.jbi.2017.11.012 .

Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing [review article]. IEEE Comput Intell Mag. 2018;13(3):55–75. https://doi.org/10.1109/MCI.2018.2840738 .

Donnelly LF, Grzeszczuk R, Guimaraes CV, Zhang W, Bisset GS III. Using a natural language processing and machine learning algorithm program to analyze inter-radiologist report style variation and compare variation between radiologists when using highly structured versus more free text reporting. Curr Probl Diagn Radiol. 2019;48(6):524–30. https://doi.org/10.1067/j.cpradiol.2018.09.005 .

Xie Z, Yang Y, Wang M, Li M, Huang H, Zheng D, Shu R, Ling T. Introducing information extraction to radiology information systems to improve the efficiency on reading reports. Methods Inf Med. 2019;58(2–03):94–106. https://doi.org/10.1055/s-0039-1694992 .

Zech J, Pain M, Titano J, Badgeley M, Schefflein J, Su A, Costa A, Bederson J, Lehar J, Oermann EK. Natural language-based machine learning models for the annotation of clinical radiology reports. Radiology. 2018;287(2):570–80. https://doi.org/10.1148/radiol.2018171093 .

Yim W, Kwan SW, Johnson G, Yetisgen M. Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports. In: AMIA annual symposium proceedings, vol. 2017, pp. 1858–67 (2018). Accessed 30 Oct 2020.

Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, Liu X, He Z. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc. 2020;27(7):1173–85. https://doi.org/10.1093/jamia/ocaa053 .

Dong H, Suárez-Paniagua V, Whiteley W, Wu H. Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation. J Biomed Inform. 2021. https://doi.org/10.1016/j.jbi.2021.103728 .

Acknowledgements

Not applicable.

This research was supported by the Alan Turing Institute, MRC, HDR-UK and the Chief Scientist Office. B.A., A.C., D.D., A.G. and C.G. have been supported by the Alan Turing Institute via Turing Fellowships (B.A., C.G.) and Turing project funding (EPSRC Grant EP/N510129/1). A.G. was also funded by an MRC Mental Health Data Pathfinder Award (MRC-MCPC17209). H.W. is an MRC/Rutherford Fellow with HDR UK (MR/S004149/1). H.D. is supported by the HDR UK National Phenomics Resource Project. V.S-P. is supported by the HDR UK National Text Analytics Implementation Project. W.W. is supported by a Scottish Senior Clinical Fellowship (CAF/17/01).

Author information

Authors and affiliations.

School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland

Arlene Casey, Daniel Duma & Beatrice Alex

Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland

Emma Davidson, Michael Poon & William Whiteley

Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland

Hang Dong & Víctor Suárez-Paniagua

Health Data Research UK, London, UK

Hang Dong, Víctor Suárez-Paniagua & Honghan Wu

Institute for Language, Cognition and Computation, School of informatics, University of Edinburgh, Edinburgh, Scotland

Andreas Grivas, Claire Grover & Richard Tobin

Nuffield Department of Population Health, University of Oxford, Oxford, UK

William Whiteley

Institute of Health Informatics, University College London, London, UK

Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland

Beatrice Alex

Contributions

B.A., W.W. and H.W. conceptualised this study. D.D. carried out the search, including automated filtering and designing meta-enriching steps. B.A., A.G., C.G. and R.T. advised on the automatic data collection method devised by D.D. M.T.C.P., A.G., H.D. and D.D. carried out the first-stage review, and A.C., E.D., V.S-P., M.T.C.P., A.G., H.D., B.A. and D.D. carried out the second-stage review. A.C. synthesised the data and wrote the main manuscript with contributions from all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Arlene Casey.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Publication list with application and technical categories.

Additional file 2.

Individual properties for every publication.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Casey, A., Davidson, E., Poon, M. et al. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 21, 179 (2021). https://doi.org/10.1186/s12911-021-01533-7

Download citation

Received : 09 February 2021

Accepted : 17 May 2021

Published : 03 June 2021

DOI : https://doi.org/10.1186/s12911-021-01533-7


Keywords:
  • Natural language processing
  • Systematic review

BMC Medical Informatics and Decision Making

ISSN: 1472-6947

Neural natural language processing for long texts: A survey on classification and summarization


Published in: Pergamon Press, Inc., United States

Author tags:

  • Natural language processing
  • Long document
  • Document classification
  • Document summarization
  • Sentiment analysis
  • Deep neural networks
  • Short-survey

Int J Environ Res Public Health

A Narrative Literature Review of Natural Language Processing Applied to the Occupational Exposome

Annika M. Schoene

1 Department of Computer Science, University of Manchester, Manchester M13 9PL, UK

Ioannis Basinas

2 Department of Health Science, University of Manchester, Manchester M13 9PL, UK; [email protected] (I.B.); [email protected] (M.v.T.)

Martie van Tongeren

Sophia Ananiadou

Associated Data

Not applicable.

The evolution of the Exposome concept revolutionised research in exposure assessment and epidemiology by introducing the need for a more holistic approach to exploring the relationship between the environment and disease. At the same time, further, more dramatic changes have also occurred in the working environment, adding to its already dynamic nature. Natural Language Processing (NLP) refers to a collection of methods for identifying, reading, extracting and ultimately transforming large collections of language. In this work, we aim to give an overview of how NLP has successfully been applied thus far in Exposome research. Methods: We conduct a literature search on PubMed, Scopus and Web of Science for scientific articles published between 2011 and 2021. We use both quantitative and qualitative methods to screen papers and provide insights into the inclusion and exclusion criteria. We outline our approach for article selection and provide an overview of our findings. This is followed by a more detailed insight into selected articles. Results: Overall, 6420 articles were screened for suitability, of which 37 are reviewed in depth. Finally, we discuss future avenues of research and outline challenges in existing work. Conclusions: Our results show that (i) there has been an increase in articles published that focus on applying NLP to exposure and epidemiology research, (ii) most work uses existing NLP tools and (iii) traditional machine learning is the most popular approach.

1. Introduction

Natural Language Processing is an area of research within Artificial Intelligence (AI) that is concerned with giving computers the ability to understand natural language (spoken and written) in the same way a human could [1]. Computational linguistics (rule-based modelling of human language), statistics, machine learning and deep learning are used either individually or in combination to achieve this goal [2]. The term Exposome was first introduced by [3], who defined an area of research that takes systematic measurements of the exposures (e.g., occupational, physical-environment or socio-economic factors) that a person experiences throughout life (pre-birth until death) and that affect their health outcomes [3]. However, the term Exposome itself has not yet been fully integrated into all areas of exposure research; the term 'exposure research' is often used to refer to the same or similar concepts [4]. At the same time, text mining and NLP techniques are increasingly applied in a variety of exposure-related research areas. Whilst there are a variety of surveys and literature reviews in NLP and its various subtasks [5, 6, 7], there is no review of NLP and text mining techniques used in the field of occupational and environmental exposure research. This review fills that gap by providing a description of existing tools based on NLP and text mining techniques that have been applied in occupational and environmental exposure research. For this, we utilise a hybrid approach combining classical and automatic reviewing methods with RobotAnalyst [8], a recently developed web-based software system that combines text mining and machine learning algorithms. Papers published in the PubMed, Scopus and WoS databases are screened and reviewed to answer the following research questions:

  • What are the most common text mining and NLP approaches used in exposure assessment research?
  • What resources are used for this task?
  • What are the most common NLP methods used?
  • What are the main challenges and future directions of research?

2. Review Methodology

In this literature review, a search was conducted in three scientific literature databases. We include peer-reviewed articles available in full; our search returned 6420 articles, of which 5957 were selected for pre-screening after duplicates were removed. Figure 1 shows the article selection process for this review. For each search on the three platforms (PubMed, Scopus and Web of Science), we used the following query terms:

Figure 1. Overview of article selection process used in this narrative literature review.

  • (“natural language processing” OR “text mining” OR “text-mining” OR “text and data mining” OR ontology OR lexic* OR corpus OR corpora) AND (exposome OR exposure OR socioexposome OR (“risk factor” AND (“work” OR “occupational” OR “environmental*”)))
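The boolean query above can be approximated in code. The sketch below is an illustrative, simplified matcher of my own construction, not the retrieval logic of PubMed, Scopus or Web of Science: wildcards such as "lexic*" and "environmental*" are treated as prefix patterns, and database-specific field syntax is ignored.

```python
import re

# First AND-group: text-mining / NLP vocabulary ("lexic*" as a prefix wildcard).
NLP_TERMS = [
    r"natural language processing", r"text mining", r"text-mining",
    r"text and data mining", r"ontology", r"lexic\w*", r"corpus", r"corpora",
]
# Second AND-group: exposure vocabulary, plus the qualified "risk factor" clause.
EXPOSURE_TERMS = [r"exposome", r"exposure", r"socioexposome"]
RISK_QUALIFIERS = [r"work", r"occupational", r"environmental\w*"]


def _any_match(patterns, text):
    """True if any pattern occurs in the text as a whole word/phrase."""
    return any(re.search(rf"\b{p}\b", text) for p in patterns)


def matches_query(text: str) -> bool:
    """True if a record's text satisfies both AND-groups of the search query."""
    text = text.lower()
    risk = bool(re.search(r"\brisk factor\b", text)) and _any_match(RISK_QUALIFIERS, text)
    return _any_match(NLP_TERMS, text) and (_any_match(EXPOSURE_TERMS, text) or risk)
```

For example, a title such as "Text mining approaches for occupational exposure assessment" satisfies both groups, whereas a generic NLP title with no exposure vocabulary does not.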

Pre-screening was performed as a two-step process. First, to reduce human workload, we utilised RobotAnalyst [8] to identify 998 full papers. RobotAnalyst is a web-based, freely available software system that uses both text mining and machine learning methods to categorise and rank references by their relevance (free access to RobotAnalyst can be requested to reproduce this work: http://www.nactem.ac.uk/robotanalyst/ (accessed on 2 November 2021)). The system uses an iterative classification process which makes decisions based on the abstract of each reference. Next, we manually screened the titles and abstracts of those papers using the inclusion and exclusion criteria outlined below. The inclusion and exclusion criteria used to select studies relevant to occupational exposure research were provided by two experts in occupational exposure. Based on these criteria, we identified 80 papers that specifically focused on text mining and/or natural language processing in the field of exposure research. Next, the full papers were reviewed for their relevance to occupational exposure and usage of NLP or text mining methods. Finally, the full texts of 40 of these papers were retrieved and reviewed in full, resulting in a total of 37 articles that fulfilled our defined inclusion and exclusion criteria.
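The staged screening just described is, structurally, a sequence of filters over a shrinking record set. The sketch below is purely illustrative: the `Record` fields and the stand-in relevance score are hypothetical, and RobotAnalyst's actual classifier is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    duplicate: bool = False       # flagged during de-duplication
    robot_score: float = 0.0      # stand-in for RobotAnalyst's relevance ranking
    meets_criteria: bool = False  # outcome of the manual inclusion/exclusion check


def screen(records, score_threshold=0.5):
    """Apply the review funnel stage by stage; return each stage's survivors."""
    stages = {"retrieved": list(records)}
    stages["deduplicated"] = [r for r in stages["retrieved"] if not r.duplicate]
    stages["ranked"] = [r for r in stages["deduplicated"]
                        if r.robot_score >= score_threshold]
    stages["included"] = [r for r in stages["ranked"] if r.meets_criteria]
    return stages
```

Each stage is a subset of the previous one, mirroring the monotonically shrinking counts reported in the text (6420 retrieved down to 37 included).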

Inclusion criteria:

  • Original work;
  • Study exposures concerning humans;
  • Study occupational and/or environmental exposures of humans, such as airborne agents (e.g., particulates or substances and biological agents (viruses)), stressors, and psycho-social and physical (e.g., musculoskeletal) exposures, as well as workplace accidents;
  • Have their full texts available;
  • Are written in English;
  • Focus on text mining or natural language processing, and their texts contain a methods, experiments and results section.

Exclusion criteria:

  • Studied animal or plant exposures;
  • Studied drug, nutrition or dietary exposures on humans;
  • Written in a language other than English;
  • Commentaries, opinion papers or editorials.

In the following section, we summarise the findings of this literature review, where we focus on the types of resources used, computational methods and existing NLP tools. In Figure 2 , we show the number of papers published each year, where we can observe an increase in publications over time. We also categorise each paper in Table 1 based on NLP tools used, resources and computational method. Finally, we give a brief overview of the literature reviews and qualitative research in this area.


Number of NLP papers applied to occupational exposure research published each year from 2010 to 2021.

A categorisation of each paper based on tools used, resources and computational methods.

NLP tools:
  • NLTK: [ , , , , ], [ ]
  • Other: [ , , , , ], [ , , ], [ , , , ], [ , , ]
  • Not declared: [ , , , , , ], [ , , , , ], [ , , ]

Resources:
  • Scientific literature: [ , , , , , ], [ , , , ], [ , , ], [ , , , , , ], [ , , , ]
  • Existing database: [ , , , , ]
  • Twitter: [ , , ]
  • EHR: [ , , ]
  • Accident reports: [ , , ]

Computational methods:
  • Machine learning: [ , , , ], [ , , , ], [ , , , , ], [ , , , , ], [ , , , ]
  • Knowledge based: [ , , , , ]
  • Database creation and fusion: [ , , , , , ], [ , , ]
  • Rule-based algorithms: [ , ]
  • A.   Resources

There are different types of resources used, where the most common resource is the existing scientific literature (see Figure 3 ). Other data sources include databases, social media platforms, electronic health records and accident reports (see Table 1 ).


A chart showing the different types of resources used in the selected articles.

  • B.   Computational Methods

Overall, four main categories of computational approaches are used: machine learning, knowledge-based approaches, database creation and fusion, and rule-based algorithms. Figure 4 shows the split of computational approaches found in this review.


A chart showing the computational methods utilised in the selected articles.

  • C.   Existing NLP tools

There are a number of different existing NLP preprocessing tools used (see Figure 5 ), where NLTK [ 49 ] is the most commonly used for preprocessing textual data. Given the vast number of different NLP tools used in other studies, we have summarised these tools as ‘ Other ’. However, it has to be noted that a large number of studies did not declare the type of text mining tool that was used in their work.
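As an illustration of what such preprocessing typically involves, the following sketch performs lowercasing, tokenisation, stop-word removal and crude suffix stripping in pure Python. The stop-word list and suffix rules are simplified stand-ins for what tools such as NLTK provide, chosen so the example runs without NLTK's downloadable corpora:

```python
# Minimal text-preprocessing pipeline mirroring the steps these studies
# typically run through NLTK. Illustrative, not NLTK's own implementation.
import re

STOP_WORDS = {"the", "of", "in", "to", "and", "a", "is", "for", "on", "with"}

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # lowercase + tokenise
    tokens = [t for t in tokens if t not in STOP_WORDS]   # drop stop words
    # crude suffix stripping as a stand-in for stemming
    return [re.sub(r"(ing|ed|es|s)$", "", t) if len(t) > 4 else t
            for t in tokens]

print(preprocess("Workers exposed to airborne particulates in welding shops"))
```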


A chart showing a summary of the different types of NLP tools in each article.

3.1. Machine Learning Methods

Ref. [ 9 ] proposes a contactless clinical decision support system to diagnose patients with COVID-19 and monitor quarantine progression using Electronic Health Records. Relevant keywords are extracted from unstructured text using NLTK, and the results are added to a searchable database. The final steps of this work include the integration of the system with cloud services and visualisation to make results accessible to clinicians. The work by [ 28 ] proposes a computational approach of mapping the impact of climate change on global health via scientific literature. A total of 3730 papers are labelled manually and subsequently fed into an SVM (Support Vector Machine) to classify the unlabelled documents into the different label categories. Next, topic modelling is used to analyse and visualise the content of the literature. The authors of [ 15 ] propose to use scientific literature on PubMed to assess the impact of environmental exposures from early life using different unsupervised learning methods (e.g., LDA (Latent Dirichlet Allocation)) to gain insight into the different topics. The work by [ 29 ] models the impact of COPD (chronic obstructive pulmonary disease) from smoking using Adverse Outcome Pathways generated from the scientific literature. The literature is collected and filtered from PubMed to create a corpus and then clustered using the text mining approach proposed by [ 50 ]. Research by [ 10 ] classifies incident reports into two categories to improve aviation safety using an LSTM (Long Short-Term Memory) with attention. A total of 200,000 reports are preprocessed using NLTK, and word vectors are generated using ULMFiT (Universal Language Model Fine-tuning for Text Classification) [ 51 ]. Ref. [ 12 ] extracts information from the scientific literature to evaluate the impact of human exposure to electromagnetic fields, where topic modelling is used to generate domain-specific lexicons.
Work by [ 42 ] develops a computational literature review approach for in utero exposure to environmental pollutants, where they aim to identify multiple chemicals and their health effects and reduce the burden of manual literature reviews. The titles and abstracts of 54,134 papers are clustered using the DoCTER software [ 16 ]. The authors of [ 30 ] propose a network-based predictive model to assess chemical toxicity for risk assessment of environmental pollutants. The Registry of Toxic Effects of Chemical Substances (RTECS) database [ 52 ] is used, where chemicals were annotated with an identifier to represent their structure. Work by [ 13 ] introduces a supervised machine learning approach to complement a previous manual literature retrieval for the Exposome-Explorer database [ 53 ], where an extensive variety of machine learning algorithms are evaluated using Scikit-Learn [ 54 ]. Ref. [ 48 ] uses multivariable logistic regression to classify the spread of household transmission of COVID-19 in healthcare workers. As part of this work, term-frequency inverse document frequency (tf-idf) matrices are used to match confirmed cases by residential address. The authors of [ 17 ] use Chinese accident reports for safety risk analysis in the construction industry, where software called ROST is used to preprocess the documents and perform cluster and network structural analysis. Research conducted by [ 14 ] develops a corpus of over 3500 abstracts that were manually annotated by an Exposome expert for chemical exposures according to a taxonomy. The taxonomy is based on 32 nodes and was split into two categories: biomonitoring and exposure routes. Finally, the data were fed into an SVM (Support Vector Machine) to classify unseen documents.
The authors of [ 11 ] analyse the sentiment of tweets collected based on a specific geolocation (Texas counties along I-20) to determine if there is a link between CVD (cardiovascular disease) rates and factors mentioned in the tweets that may cause or increase that risk. A voting classifier is used to classify the sentiment of each tweet as positive or negative, where an accuracy of 73.69% is achieved. Ref. [ 31 ] developed an ensemble classifier, called SOCcer , to map job titles to occupational classification codes (SOC). For this, a variety of publicly available resources were used to match job titles and tasks to the US SOC-2010 code, which resulted in a knowledge base of around 62,000 linked jobs. To train the ensemble classifier, job descriptions from a bladder cancer study were used as training data, whereas an evaluation of the algorithm was conducted on job titles for personal airborne measurements during an inspection. Research conducted by [ 18 ] collected data using Twitter’s API for ‘ asthma ’, and both manual (e.g., expert annotation and evaluation) and automatic analysis (e.g., topic modelling) are conducted to identify health-related tweets. One of the dominant topics identified by experts was environmental influences and references to triggers of asthma. The work by [ 22 ] uses text mining to assess chemical health risks, where PubMed abstracts are used to identify the mode of action (MOA) of carcinogens. For this work, they use the previously developed CRAB tool [ 55 ], which uses a bag-of-words approach to convert abstracts into vectors. Then, an SVM classifier with Jensen–Shannon divergence (JSD) kernel is trained to categorise the abstracts into a predefined taxonomy. The work by [ 23 ] develops a ranking algorithm to automatically recommend scientific abstracts for curation at CTD (Comparative Toxicogenomics Database [ 56 ]).
This is completed by screening each abstract and assigning a document relevancy score (DRS), where 3583 articles are used from PubMed for this task. To analyse each abstract, a variety of text mining tools and approaches are used, which include ABNER [ 57 ], MetaMap [ 58 ] and Oscar3 [ 59 ] for gene/protein and chemical recognition. Finally, a ranking algorithm is developed that sorts abstracts by curation relevance. The authors of [ 24 ] introduce a new method to classify biomedical documents for curation using the Comparative Toxicogenomics Database (CTD). A total of 1059 previously collected articles are annotated for entities (e.g., genes, chemicals, diseases and respective interactions), and manual abstract annotation is performed for chemicals relevant to the CTD. Finally, the documents are classified using an SVM. The authors of [ 25 ] use 225 electric power causality accident reports from China to identify factors that contribute to personal injury. TF-IDF is used to obtain the word frequency in a document, and the results are subsequently visualised using word clouds. The results are then used to extract key information on the dangers described in the reports. Our results also show that the majority of papers in this section utilise existing literature or databases to extract new information or classify unseen documents into existing categories. Classification experiments are performed using a wide variety of existing supervised machine learning algorithms (e.g., SVM or logistic regression). At the same time, new information is commonly uncovered and visualised using unsupervised learning methods (e.g., LSA or PCA). NLTK is a commonly used tool for preprocessing textual data, but there are also other NLP tools utilised that may be more suitable for different languages or domains (e.g., ROST or CRAB).
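Many of the classification experiments above follow the same pattern: vectorise the text, then apply a supervised classifier. The sketch below illustrates this with a soft-voting ensemble over toy tweets; the base models, labels and data are assumptions for illustration, not those of any study reviewed here:

```python
# Illustrative soft-voting ensemble for short-text sentiment classification.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["love my morning run", "great heart healthy meal",
          "hate this smog", "terrible chest pain today"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Bag-of-words features feeding two base classifiers that vote softly
# (averaging predicted probabilities).
model = make_pipeline(
    CountVectorizer(),
    VotingClassifier([("nb", MultinomialNB()),
                      ("lr", LogisticRegression())], voting="soft"),
)
model.fit(tweets, labels)

preds = model.predict(["great run today", "terrible smog"])
print(preds)
```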

3.2. Knowledge-Based Methods

Ref. [ 43 ] investigates Adverse Outcome Pathways (AOP) of pesticide exposure based on scientific literature collected on PubMed. For this, the recently developed AOP-helpFinder [ 60 ] is extended and subsequently known as AOP-helpFinder 2. The following properties were added: (i) the tool’s ability to automatically process and screen abstracts from PubMed, (ii) link stressors with a dictionary of events and (iii) calculate scores for both systems based on the position and weighted score for all event types. The tool is then evaluated by applying it to screen for a list of pesticides that have unknown long-term exposure effects on human health. Research conducted by [ 44 ] utilises a linguistic analysis of 261 scientific abstracts related to the ‘Exposome’ to gain insight into the current range of exposome research conducted. A literature search was performed, and an analysis was conducted using a combination of Termine [ 61 ] and NLTK [ 49 ] to extract multi-word terms and compute word frequency counts. The second part of this analysis uses over 500 biomedical ontologies provided at the National Center for Biomedical Ontology to automatically map abstracts to relevant ontologies. This work was subsequently extended by [ 62 ], who use topic modelling and ontology analysis to provide an updated overview of knowledge representation tools relevant to exposure research. The work by [ 21 ] creates a new semantic resource for exposures, which is evaluated both in a clinical setting and on scientific literature. The resource combines manual annotations derived from clinical notes with knowledge from the Unified Medical Language System (UMLS) to find exposome concepts. Ref. [ 20 ] uses five corpora of epidemiological studies with different exposures and outcomes to extract exposure-related information that can aid systematic reviews and other summaries.
In this work, a rule-based system called GATE [ 63 ] is used, which relies on the development of dictionaries; a total of 21 dictionaries were manually created with domain-specific exposures and outcomes. Research conducted by [ 19 ] uses rule-based patterns to analyse 60 PubMed abstracts in the obesity domain for six semantic concepts (study design, population, exposures, outcomes, covariates and effect size). Fourteen separate dictionaries are created that contain terms related to the previously mentioned six semantic concepts using a variety of tools [ 64 , 65 ]. Research conducted by [ 27 ] enhances the existing METLIN Exposome database to include over 950,000 unique small molecules. As part of this work, IBM Watson [ 66 ] is utilised, where Watson’s NLP approach is based on both rules (e.g., dictionary) and machine learning. The authors of [ 40 ] developed a rule-based SES (socioeconomic status) algorithm ( https://github.com/vserch/SES (accessed on 12 November 2021)) to analyse Electronic Health Records using the Ruby programming language. In this work, the effects of socioeconomic factors on overall health (e.g., mortality, education, occupation) in minorities are examined to ensure that these factors will be considered as exposures in future work. In summary, we found that common knowledge sources are dictionaries, lists and ontologies, where the sources for this knowledge are often existing literature or clinical notes. Interestingly, no single text mining tool is preferred across the studies, and therefore a large variety of different NLP tools is utilised.
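The dictionary-based matching at the core of these knowledge-based systems can be sketched as follows. The dictionaries below are invented examples, not the ones developed in the cited studies:

```python
# Minimal dictionary-based annotator: hand-built term lists for exposures
# and outcomes are matched against abstract text with word boundaries.
import re

DICTIONARIES = {
    "exposure": {"asbestos", "silica dust", "benzene", "noise"},
    "outcome": {"mesothelioma", "hearing loss", "leukaemia"},
}

def annotate(text):
    """Return the dictionary terms found in the text, per category."""
    found = {}
    lowered = text.lower()
    for category, terms in DICTIONARIES.items():
        hits = sorted(t for t in terms
                      if re.search(r"\b" + re.escape(t) + r"\b", lowered))
        if hits:
            found[category] = hits
    return found

abstract = ("Long-term asbestos and silica dust exposure was associated "
            "with mesothelioma in shipyard workers.")
print(annotate(abstract))
```

Real systems such as GATE layer grammar-like rules on top of such gazetteer lookups, but the dictionary match is the central step.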

3.3. Database Creation and Fusion

One of the most popular databases created is the comparative toxicogenomics database (CTD), which was developed in 2004 and is updated annually [ 45 ]. Generally speaking, this resource is made up of three databases, which include (i) a third party database that contains data from external sources (e.g., MeSH), (ii) a manually curated database of data screened by scientists and (iii) a public web application that combines data from the curation database and third party database. The resource’s aim is to provide content that relates chemical exposures with human health to gain a better insight into diseases that are influenced by the environment. Research by [ 33 ] created an updated human exposome database for predicting the biotransformation of chemicals by using literature mining to manually identify scientific articles. For this work, PubMed was queried based on several keywords related to the exposome (e.g., human exposome, drinking water, air, and disinfection or combustion by-products), where most selected publications were review articles that contain environmental matrices (e.g., indoor air exposome, dust exposome, or waterborne chemicals). The work by [ 34 ] uses the text mining approach proposed by [ 36 ] to generate a new database of organic pollutants in China. The database is based on 2799 scientific publications and includes a total of 112,878 records. Research conducted by [ 46 ] uses the AOP-helpFinder tool as proposed by [ 36 ] to screen a PubMed corpus for exposure to endocrine-disrupting chemicals. The authors of [ 35 ] utilise text mining in combination with integrative systems biology to support decision making for the usage of BPFs (bisphenol F) in manufacturing and therefore circumvent adverse outcome pathways (AOP). To establish a connection between environmental exposures (e.g., to BPFs) and health effects, a variety of existing literature and databases such as PubMed, ToxCast, CompTox, and AOP-wiki are used.
In this work, a previously proposed text mining tool called AOP-helpFinder [ 60 ] is used to analyse abstracts for links between chemical substances and AOPs. The corpus for this work was developed using both automatic and manual searches. First, an automatic search of PubMed was conducted using the AOP-helpFinder tool to identify links between BPF and AOPs. Then, TOXLINE [ 67 ] was searched from the year 2017 for articles that contain BPF and synonyms of BPF in a toxicological context. The authors of [ 47 ] present an update of the environmental exposure to nanomaterials database by using NLP to retrieve information from textual data and integrate it into the database. The first step in this work is to use OpenNLP ( https://opennlp.apache.org/ (accessed on 19 November 2021)) to preprocess and prepare a corpus of 10 scientific articles related to environmental risk assessment. An ontology called the EXPOSEO ontology is subsequently developed and used to match the extracted information into concepts that can be integrated into the existing database. The work by [ 36 ] uses text mining to create a list of all chemicals related to ‘blood-associated chemicals’, which is then used to create a Blood Exposome Database. Several keywords were used to query PubMed, where the results were then checked manually to remove false positives and a phrase exclusion list was created. The final number of literature abstracts found is 1,085,023 ( https://exposome1.fiehnlab.ucdavis.edu/download/pmid_title_abstract_sb.zip (accessed on 19 November 2021)); these were then linked to chemicals based on chemical synonyms, existing links between PubChem and PubMed, and by mining supplementary tables for chemical synonyms using R (code in R: https://github.com/barupal/exposome (accessed on 19 November 2021)). As a result, new blood chemicals were discovered in the literature. A similar approach for assessing cancer hazards was used by [ 68 ] using the PubMed literature.
The work by [ 69 ] uses a three-step process to update the Comparative Toxicogenomics Database (CTD) with exposure data from the scientific literature sourced on PubMed. A variety of resources are used to extract vocabularies, which include but are not limited to MeSH [ 70 ], Gene Ontology [ 71 ] and NCBI Gene [ 72 ]. These resources provide vocabularies for chemical and anatomy words, disease terms, biological processes and geographic locations. Finally, the data are integrated into the CTD, creating 49 new tables that contain 239 columns. Research by [ 37 ] proposes a new database called the Toxin-Toxin-Target Database (T3DB), which consolidates multi-disciplinary data on toxic compound exposure. A taxonomy of compounds is generated using a classifier to categorise compounds into groups, and then an ontology of chemical entities is developed. In a nutshell, we find that there is a need for and high usage of databases that hold domain-specific knowledge for exposure research. Furthermore, most databases outlined in this review are generated using literature mining or existing databases, where the information commonly retrieved includes chemicals, anatomy words, disease terms, biological processes and geographic locations.
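A recurring step when populating such databases is normalising free-text chemical mentions to canonical identifiers via a synonym table. The sketch below illustrates the idea; the identifiers and synonym entries are hypothetical, not drawn from PubChem or any cited database:

```python
# Sketch of synonym-based entity linking for database construction:
# chemical mentions found in text are mapped to canonical record IDs.
SYNONYMS = {
    "bisphenol f": "CHEM:0001",
    "bpf": "CHEM:0001",
    "benzene": "CHEM:0002",
}

def link_mentions(mentions):
    """Map free-text chemical mentions to canonical IDs (None if unknown)."""
    return {m: SYNONYMS.get(m.lower()) for m in mentions}

print(link_mentions(["BPF", "Benzene", "caffeine"]))
```

Unlinked mentions (here, "caffeine") would typically be queued for manual curation before a new database record is created.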

3.4. Literature Reviews and Qualitative Research

Ref. [ 38 ] conducts a review of existing ontologies relevant to the external exposome research and argues for the future development of semantic standards. This argument is driven by the variation of exposome resources, where differences include but are not limited to variables having the same or similar names but measuring different exposures. The work by [ 26 ] produces a systematic literature review on transport-related injury, where the first reviewer used traditional methods and the second reviewer utilised text mining techniques to perform the same review. The text mining portion of this work uses WordStat [ 73 ], QDA Miner [ 74 ], and literature screening was conducted in Abstrackr [ 75 ]. Research by [ 39 ] investigates how the public reacted to reports of increased lead levels in school drinking waters. Both a quantitative and qualitative evaluation was performed, where it was found that (i) the majority of tweets were by news agencies and people holding positions in public offices, and (ii) the three most important themes of tweets were information sharing, health concerns and socio-demographic disparities. Overall, we have found that there is a small number of existing reviews that include the use of NLP methods and tools in exposure research. In addition to this, there is also a utilisation of mixed methods to better gauge public opinion on exposure-related health concerns.

4. Discussion

There are a number of challenges remaining in the field of NLP applied to occupational exposure research. In the following section, we outline some challenges and opportunities for future work in this area:

  • Data volume and quality Whilst there has been some use of unsupervised machine learning methods (e.g., clustering via LDA) in the selected studies, the majority use supervised machine learning. One downside of this is that the latter approach requires human-annotated data, which usually requires expert knowledge and is therefore a time-consuming and costly process. To overcome this issue, the use of semi-supervised or unsupervised learning methods might be explored, because they require either significantly less annotated training data or none at all. An example of this is the use of topic modelling techniques to cluster jobs and exposures from the existing literature. Another opportunity lies in using semi-supervised Named Entity Recognition to increase the coverage of annotated literature.

This also includes Transformer-based methods [ 78 ] (e.g., large pre-trained language models such as BERT [ 79 ]), which have made a significant impact on the field of NLP over recent years and could prove to be useful in NLP for occupational exposure research. This type of deep learning method is based on attention [ 80 ], which has been shown to improve results in a variety of other domains that have utilised NLP (e.g., healthcare). These advances could be used to improve tasks such as Named Entity Recognition (NER) [ 81 ] or Relation Extraction (RE) [ 82 ] in occupational exposure research, which up until this point have relied on traditional machine learning only. Both tasks could prove useful in the context of occupational exposure research to automatically identify key concepts (e.g., types of exposures, jobs or work environments) but also how they relate to one another (e.g., a particular role is in a specific work place). Other advances could be made through the use of unsupervised methods, which thus far have also relied on traditional machine learning only. More recent methods such as Neural Topic Models (NTM) have become increasingly popular for different tasks, including document summarisation and text generation [ 83 ] due to their flexibility and capability. These methods could also be applied to occupational exposure research to uncover new topics and concepts at a larger scale or draw new connections between exposures and work environments. Similarly, NTM methods could also be coupled with pre-trained language models to further boost performance and result in more accurate representations of new topics [ 83 ].
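The attention mechanism [ 80 ] underlying these Transformer-based models can be sketched as plain scaled dot-product attention in NumPy. This is a toy illustration with random matrices; real models add learned projections, multiple heads and many stacked layers:

```python
# Scaled dot-product attention: each query token attends to all key tokens
# and returns a weighted average of the corresponding value vectors.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query tokens, dimension 4
K = rng.normal(size=(5, 4))   # 5 key tokens
V = rng.normal(size=(5, 4))   # 5 value vectors

out, weights = attention(Q, K, V)
print(out.shape, weights.shape)
```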

  • Extrapolating existing research to other domains of exposure research Most of the research explored in this review is specific to a particular type of exposure, database or enhancement of literature reviews. The domain-specificity and different needs/requirements for each type of exposure therefore make it hard to extrapolate these existing works to other fields, to link them, and to scale up existing approaches.

5. Conclusions

In this work, we have manually reviewed 37 papers relevant to NLP applied to occupational exposure research. Our results show that (i) there has been an increase in articles published, (ii) most work uses existing NLP tools, and (iii) traditional machine learning is the most popular approach. Furthermore, we have outlined challenges and opportunities for future research that could further advance the field.

Acknowledgments

This research was made possible by the support of the EPHOR (Exposome Project for Health and Occupational Research) consortium.

Abbreviations

The following abbreviations are used in this manuscript:

AI: Artificial Intelligence
AOP: Adverse Outcome Pathways
BERT: Bidirectional Encoder Representations from Transformers
CTD: Comparative Toxicogenomics Database
DRS: Document relevancy score
LDA: Latent Dirichlet Allocation
LSA: Latent semantic analysis
LSTM: Long Short-Term Memory
NER: Named Entity Recognition
NLP: Natural Language Processing
NLTK: Natural Language Toolkit
NTM: Neural topic models
PCA: Principal component analysis
RE: Relation Extraction
SVM: Support Vector Machine
TF-IDF: Term frequency–inverse document frequency
UMLS: Unified Medical Language System

Funding Statement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 874703.

Author Contributions

A.M.S., I.B., M.v.T. and S.A. contributed to the conception and design of the literature review (e.g., selecting keywords). A.M.S. and S.A. retrieved relevant papers and completed pre-screening. I.B. and M.v.T. performed a final screening of the selected papers. A.M.S. wrote the manuscript and I.B., M.v.T. and S.A. provided feedback and corrections on individual sections. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Informed Consent Statement, Data Availability Statement, Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

LLAssist: Simple Tools for Automating Literature Review Using Large Language Models

  • Yoga Haryanto, Christoforus

This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.

  • Computer Science - Digital Libraries;
  • Computer Science - Artificial Intelligence

Performance Evaluation of Natural Language Processing Algorithms for Sentiment Analysis

  • Original Research
  • Published: 26 July 2024
  • Volume 5 , article number  724 , ( 2024 )



  • S. H. Annie Silviya 1 ,
  • S. Julia Faith 2 ,
  • R. Seetha 3 &
  • M. Hemalatha 4  

Sentiment analysis, one type of natural language processing, can be used to identify whether a text’s sentiment is neutral, positive, or negative. Natural language processing is an interdisciplinary field encompassing linguistics, computer science, and artificial intelligence that studies how computers and human language interact, with a focus on how to teach computers to process and analyze vast volumes of natural language data. Sentiment analysis, sometimes referred to as opinion mining or emotion artificial intelligence, is the methodical identification, extraction, quantification, and study of affective states and subjective data using natural language processing, text analysis, computational linguistics, and biometrics. Sentiment analysis is frequently used in online and social media platforms, voice of the customer materials (such as reviews and survey replies), and healthcare materials for marketing, customer service, and clinical medicine purposes. This paper discusses different sentiment analysis methods and selects the best approach for a given task.
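A minimal example of the lexicon-based family of methods surveyed in such work is sketched below; the word lists are toy examples rather than a published sentiment lexicon:

```python
# Lexicon-based sentiment scoring: count positive and negative words and
# classify by the sign of the difference.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))
print(sentiment("terrible service and awful food"))
print(sentiment("the package arrived on tuesday"))
```

Machine learning and deep learning approaches typically outperform such lexicons, but the lexicon baseline needs no training data, which is why it remains a common point of comparison.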



Kit Y, Mokji MM. Sentiment analysis using pre-trained language model with no fine-tuning and less resource. IEEE Access. 2022;10:107056–65. https://doi.org/10.1109/ACCESS.2022.3212367 .

Article   Google Scholar  

F. T. Saputra, S. H. Wijaya, Y. Nurhadryani and Defina, Lexicon Addition Effect on Lexicon-Based of Indonesian Sentiment Analysis on Twitter. 2020 International Conference on Informatics,Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia. 2020. pp. 136–141. https://doi.org/10.1109/ICIMCIS51567.2020.9354269 .

K. S. Naveenkumar, R. Vinayakumar and K. P. Soman, Amrita-CEN-SentiDB: Twitter Dataset for Sentimental Analysis and Application of Classical Machine Learning and Deep Learning; 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 2019. pp. 1522–1527, https://doi.org/10.1109/ICCS45141.2019.9065337 .

Tyagi V, Kumar A, Das S. Sentiment analysis on twitter data using deep learning approach, 2020 2nd International Conference on Advances in Computing, Communication Controland Networking (ICACCCN), Greater Noida, India. 2020. pp. 187–190, https://doi.org/10.1109/ICACCCN51052.2020.9362853 .

Ibrahim A. Forecasting the early market movement in bitcoin using twitters sentiment analysis: an ensemble-based prediction model, 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Toronto, ON, Canada, 2021. pp. 1-5. https://doi.org/10.1109/IEMTRONICS52119.2021.9422647

Sintoris K, Vergidis K. Extracting Business Process Models Using Natural Language Processing (NLP) Techniques, 2017 IEEE 19th Conference on Business Informatics (CBI), Thessaloniki, Greece. 2017. pp. 135–139. https://doi.org/10.1109/CBI.2017.41 .

Maass W. How visual salience influences natural language descriptions. IEE Colloquium on Representations: Integration of Sensory Information in Natural Language Processing, Artificial Intelligence and Neural Networks, London, UK. 1995. pp. 3/1–3/3. https://doi.org/10.1049/ic:19950663 .

Tao J, Fang Zheng, Li A, Ya Li. Advances in Chinese Natural Language Processing and Language resources, 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, Urumqi, China. 2009. pp. 13–18, https://doi.org/10.1109/ICSDA.2009.5278384 .

Peng T, Harris I, Sawa Y. Detecting Phishing Attacks Using Natural Language Processing and Machine Learning, 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA. 2018. pp. 300–301. https://doi.org/10.1109/ICSC.2018.00056 .

Jing X, Hao Y, Fei H, Li Z. Text encryption algorithm based on natural language processing, 2012 Fourth International Conference on Multimedia Information Networking and Security, Nanjing, China, 2012, pp. 670–672, https://doi.org/10.1109/MINES.2012.216 .

Karthick S, Victor RJ, Manikandan S, Goswami B. Professional chat application based on natural language processing, 2018 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), Bangalore, India. 2018. pp. 1–4. https://doi.org/10.1109/ICCTAC.2018.8370395 .

Hachaj T, Ogiela MR. Clusters of trends detection in microblogging: simple natural language processing vs hashtags—which is more informative, 2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), Fukuoka, Japan. 2016. pp. 119–121. https://doi.org/10.1109/CISIS.2016.44 .

Fatwanto A. Software requirements specification analysis using natural language processing technique, 2013 International Conference on QiR, Yogyakarta, Indonesia. 2013. pp. 105–110. https://doi.org/10.1109/QiR.2013.6632546 .

Shingala A, Virparia P. Enhancing the relevance of information retrieval by querying the database in natural form, 2013 International Conference on Intelligent Systems and Signal Processing (ISSP), Vallabh Vidyanagar, India. 2013. pp. 408–412. https://doi.org/10.1109/ISSP.2013.6526944

Soni M, Gomathi S, Bhupendra Kumar Adhyaru Y. Natural Language Processing for the Job Portal Enhancement, 2020 7th International Conference on Smart Structures and Systems (ICSSS), Chennai, India. 2020. pp. 1–4. https://doi.org/10.1109/ICSSS49621.2020.9202046 .

Kłosowski P. Deep learning for natural language processing and language modelling, 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland. 2018. pp. 223–228. https://doi.org/10.23919/SPA.2018.8563389

Tissot HC, et al. Natural language processing for mimicking clinical trial recruitment in critical care: a semi-automated simulation based on the LeoPARDS trial. IEEE J Biomed Health Inform. 2020;24(10):2950–9. https://doi.org/10.1109/JBHI.2020.2977925 .

Alansary S, Nagi M, Adly N. A suite of tools for Arabic natural language processing: A UNL approach, 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), Sharjah, United Arab Emirates. 2013. pp. 1–6. https://doi.org/10.1109/ICCSPA.2013.6487236 .

Das S, Ashrafuzzaman M, Sheldon FT, Shiva S. Network intrusion detection using natural language processing and ensemble machine learning, 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia. 2020. pp. 829–835. https://doi.org/10.1109/SSCI47803.2020.9308268

Taira RK, Bashyam V, Kangarloo H. A field theoretical approach to medical natural language processing. IEEE Trans Inf Technol Biomed. 2007;11(4):364–75. https://doi.org/10.1109/TITB.2006.884368 .

Khanaferov D, Luc C, Wang T. Social network data mining using natural language processing and density based clustering, 2014 IEEE International Conference on Semantic Computing, Newport Beach, CA, USA. 2014. pp. 250–251. https://doi.org/10.1109/ICSC.2014.48

Xing Z, Parandehgheibi M, Xiao F, Kulkarni N, Pouliot C. Content-based recommendation for podcast audio-items using natural language processing techniques, 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA. 2016. pp. 2378–2383. https://doi.org/10.1109/BigData.2016.7840872

Yeo H. A machine learning based natural language question and answering system for healthcare data search using complex queries, 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA. 2018. pp. 2467–2474. https://doi.org/10.1109/BigData.2018.8622448

Mills MT, Bourbakis NG. Graph-based methods for natural language processing and understanding—a survey and analysis. IEEE Trans Syst Man Cybern Syst. 2014;44(1):59–71. https://doi.org/10.1109/TSMCC.2012.2227472 .

Woldemariam Y. Sentiment analysis in a cross-media analysis framework, 2016 IEEE International Conference on Big Data Analysis (ICBDA), Hangzhou, China. 2016. pp. 1–5. https://doi.org/10.1109/ICBDA.2016.7509790


Acknowledgements

The authors acknowledge the Rajalakshmi Institute of Technology (RIT), Chennai, Tamilnadu, India; S.A.Engineering College, Chennai, Tamilnadu, India; Vellore Institute of Technology, Tamilnadu, India; and R.M.K Engineering College, Chennai, Tamilnadu, India for supporting this research work by providing the necessary facilities.

No funding was received for this research.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Rajalakshmi Institute of Technology (RIT), Chennai, Tamilnadu, India

S. H. Annie Silviya

Department of Information Technology, S.A.Engineering College, Chennai, Tamilnadu, India

S. Julia Faith

Department of Information Technology, Vellore Institute of Technology, Vellore, Tamilnadu, India

Department of Computer Science and Engineering, RMD Engineering College, Chennai, Tamilnadu, India

M. Hemalatha


Contributions

The research outcomes were significantly shaped by the collaborative efforts and collective contributions of all authors involved in this endeavor.

Corresponding author

Correspondence to S. H. Annie Silviya.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Data Availability

The dataset generated and analyzed during the current study is available from the corresponding author upon reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Silviya, S.H.A., Faith, S.J., Seetha, R. et al. Performance Evaluation of Natural Language Processing Algorithms for Sentiment Analysis. SN Comput Sci 5, 724 (2024). https://doi.org/10.1007/s42979-024-03094-8


Received: 29 May 2024

Accepted: 27 June 2024

Published: 26 July 2024

DOI: https://doi.org/10.1007/s42979-024-03094-8


Keywords

  • Natural language processing
  • Sentiment analysis
  • Text analysis
  • Computational linguistics

