Advancement in Lung Cancer Diagnosis: A Comprehensive Review of Deep Learning Approaches

  • First Online: 08 August 2024


  • Djamel Bouchaffra 1 , 2 ,
  • Faycal Ykhlef 1 &
  • Samir Benbelkacem 1  

Part of the book series: Interdisciplinary Cancer Research

Lung cancer continues to pose a significant global health challenge. To address it, diagnostic methodologies are continuously advancing to enhance early detection and improve patient outcomes. This chapter examines recent progress in lung cancer diagnosis through an extensive survey of deep learning approaches. Focusing on the integration of artificial intelligence (AI) techniques with medical imaging, the chapter analyzes convolutional neural networks (CNNs), recurrent neural networks (RNNs), including long short-term memory (LSTM) networks, and generative pre-trained transformers (GPTs) or large language models (LLMs). The chapter traces the evolution of deep learning models for lung cancer detection, emphasizing their performance in image classification, lesion segmentation, and overall diagnostic accuracy. We also showcase the literature that explores the integration of diverse imaging modalities, such as computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI), within deep learning frameworks to enhance the robustness and reliability of diagnostic systems. Furthermore, the review addresses the challenges inherent in applying deep learning to lung cancer diagnosis, including issues related to data quality, model interpretability, and generalizability. Strategies to address these challenges, such as transfer learning, data augmentation (based on generative adversarial networks), and transformers, are thoroughly discussed. The analysis presented in this chapter aims to provide a consolidated understanding of the current landscape of deep learning approaches in lung cancer diagnosis. By highlighting recent advancements, challenges, and potential solutions, this chapter contributes to the ongoing dialogue within the scientific community, fostering the development of more effective and reliable tools for early detection and management of lung cancer.



Author information

Authors and affiliations.

Centre de Développement des Technologies Avancées, Algiers, Algeria

Djamel Bouchaffra, Faycal Ykhlef & Samir Benbelkacem

Laboratoire LIPN, UMR CNRS 7030 Institut Galilée—Université Sorbonne Paris Nord, Paris, France

Djamel Bouchaffra


Corresponding author

Correspondence to Djamel Bouchaffra .


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Bouchaffra, D., Ykhlef, F., Benbelkacem, S. (2024). Advancement in Lung Cancer Diagnosis: A Comprehensive Review of Deep Learning Approaches. In: Interdisciplinary Cancer Research. Springer, Cham. https://doi.org/10.1007/16833_2024_302


DOI : https://doi.org/10.1007/16833_2024_302

Published : 08 August 2024

Publisher Name : Springer, Cham

  • Open access
  • Published: 20 February 2023

Deep learning ensemble 2D CNN approach towards the detection of lung cancer

  • Asghar Ali Shah 1 ,
  • Hafiz Abid Mahmood Malik 2 ,
  • AbdulHafeez Muhammad 1 ,
  • Abdullah Alourani 3 &
  • Zaeem Arif Butt 1  

Scientific Reports volume 13, Article number: 2987 (2023)


  • Biotechnology
  • Computational biology and bioinformatics
  • Health care

In recent times, deep learning has emerged as a valuable resource for research in the medical sciences. Much work has been done in computer science to detect and predict different diseases in human beings. This research uses the deep learning algorithm Convolutional Neural Network (CNN) to detect lung nodules, which can be cancerous, from the CT scan images given to the model. An ensemble approach has been developed to address the problem of lung nodule detection. Instead of using only one deep learning model, we combined the predictions of several CNNs so that they could predict the outcome more accurately. The LUNA 16 Grand Challenge dataset, which is available online on the challenge website, has been used. The dataset consists of CT scans with annotations that describe the data and provide information about each CT scan. Deep learning is based on artificial neural networks, which are loosely modeled on the way neurons in the brain work. An extensive CT scan dataset is collected to train the deep learning model. The CNNs are trained on this data set to classify cancerous and non-cancerous images. Training, validation, and testing datasets are prepared and used by our Deep Ensemble 2D CNN. The Deep Ensemble 2D CNN consists of three different CNNs with different layers, kernels, and pooling techniques. It achieved a combined accuracy of 95%, which is higher than the baseline method.

Introduction

Deep learning and machine learning algorithms provide state-of-the-art results in many fields, including wireless sensor networks, order management systems, and semantic segmentation 1 , 2 , 3 . They have had a large impact on bioinformatics, specifically cancer detection 4 , 5 . Cancer is one of the deadliest diseases known to humans. It often cannot be cured because the people suffering from it learn about it only in the later stages. It is difficult to detect at an early stage, and most cancer-related deaths are due to lung cancer. Therefore, significant research has been conducted to develop systems that can detect lung cancer from CT scan images 6 . Cancer is challenging to prevent because it shows signs only in the later stages, when recovery is unlikely. Regular checkups, for example every six months, are therefore advisable, especially for people who drink and smoke. This study aims to develop a state-of-the-art system for the early detection of lung nodules using the proposed ensemble deep learning framework.

According to the latest report of the World Health Organization, lung cancer has moved from ninth to sixth place in the list of diseases that cause the largest number of deaths 7 . Lung cancer has two main types: small cell lung cancer and non-small cell lung cancer 8 . Figure 1 shows the CT scan images used to detect the presence of a lung nodule, which may be a cancerous tumor. Not all tumors are cancerous; the primary tumor types are benign, premalignant, and malignant 6 .

figure 1

CT scan images show lung nodules with different locations and shapes in CT.

In this research, we have used a supervised deep learning model, the CNN, because we need to classify the result as cancerous or non-cancerous. The simplest way to understand deep learning is that it learns from examples, much as our brain does. Medical science is one of the fields that increasingly uses automated software based on machine learning and deep learning to balance the workload. With high-performance computing coming into the picture, deep learning is becoming an active part of the research industry. The most critical role any deep learning model can play is to increase the efficiency and quality of diagnosis, detecting certain diseases more accurately and earlier so that treatments and clinical results improve. Medicine and health care stand to benefit further as these deep learning and machine learning-based systems increase the accuracy of disease prediction and detection. Cancer is one of the most important areas of clinical research nowadays because of its high death rate and low chances of cure. Early detection of different cancer types can help reduce the number of deaths all over the world. To address these concerns about lung nodule detection for the early diagnosis of lung cancer, an ensemble of 2D CNNs has been developed to detect lung nodules. The data set used in this research is the LUNA 16 Grand Challenge.

An ensemble approach has been developed to help detect lung nodules because it is hard to differentiate between a lung nodule and lung tissue. For this purpose, a more accurate model is needed to distinguish between a lung nodule candidate and an actual lung nodule. The main issue faced by researchers is usually not the availability of image data but the acquisition of relevant annotations, that is, labeled image data. Free-text reports of radiologists' findings are stored in the PACS system. Converting these reports into accurate data labels and structured results can be a daunting task and requires text-mining methods, which are an essential field of study in their own right. Deep learning is nowadays also widely used for text mining. In this regard, developing a structured reporting system will benefit machine learning and deep learning objectives. This development can improve radiologic findings, and a CAD system for patient care can help a radiologist handle the workload of more than one doctor. The lung nodule detection process includes a detailed inspection of nodule candidates and true nodules. Lung nodule candidates consist of true nodules and false nodules that resemble true ones. A classification system should therefore be developed to select the true nodules among all possible candidate nodules. Two challenges need particular attention when building such a system.

The first is radiological heterogeneity: non-nodules may be highlighted and some nodules ignored in the CT scan, which makes it harder to differentiate between nodules and non-nodules. The second is that nodules come in different sizes and shapes. Larger nodules are more likely to be detected by the system, whereas small nodules are more easily missed, and the variety of nodule shapes is another factor the model must handle.

Related work

Many studies have used deep learning and ensemble learning for classification problems 9 . Current CAD applications that classify lung nodules for lung cancer are closest to this paper's objective. Therefore, we reviewed recently developed, state-of-the-art lung nodule classification techniques.

2D convolutional neural network

A two-dimensional CNN has been used to detect the presence of lung nodules in CT scans. In a 2D CNN, the kernels move over only two dimensions of the image to extract and learn its main features. Wangxia Zuo, Fuqiang Zhou, and Zuoxin Li 10 developed a multi-resolution CNN with knowledge transfer for candidate classification in lung nodule detection. They applied image-wise calculation with CNNs of different depths to lung nodule classification on the LUNA 16 data set and reached an accuracy of 0.9733. Sanchez and Bram van Ginneken 11 developed a CAD system for pulmonary nodules that uses multi-view convolutional networks for false-positive reduction. Yutong Xie, Yong Xia, Jianpeng Zhang, Yang Song, Dagan Feng, Michael Fulham, and Weidong Cai 12 developed MultiView-KBC, a knowledge-based collaborative deep learning model for benign-malignant lung nodule classification on chest CT. Siddharth Bhatia, Yash Sinha, and Lavika Goel presented a deep residual learning approach to cancer detection from CT scans 13 . Their method uses ResNet 14 and UNet models for feature extraction, and the machine learning algorithms XGBoost and RF (random forest) to classify cancerous images. The accuracy of this model was 84%. Muhammad Imran Faisal, Saba Bashir, Zain Sikandar Khan, and Farhan Hassan Khan used machine learning and ensemble learning methods to predict lung cancer from early symptoms. Their study uses different machine learning algorithms, including MLP (multilayer perceptron) 15 , SVM (support vector machine) 16 , Naïve Bayes, and a neural network, to classify lung cancer. The dataset was taken from the UCI repository, and the ensemble learning method reached an accuracy of 90% 17 .

3D convolutional neural network

A 3D CNN works like a 2D CNN, except that it considers three dimensions, x, y, and z, while learning the features. Two sides are considered at once, like x and y, y and z, and z and x. Chaofeng Li, Guoce Zhu, Xiaojun Wu, and Yuanquan Wang 18 developed an ensemble of CNNs for false-positive reduction in lung nodule detection on chest radiographs, evaluated with fivefold cross-validation. Qi Dou, Hao Chen, Lequan Yu, Jing Qin, and Pheng-Ann Heng 19 developed multilevel contextual encoding to detect nodules of variable size and shape. An architecture developed to reduce the number of false positives achieved 87% sensitivity at four false positives per scan. Qing Wu and Wenbing Zhao proposed a novel approach to detecting small cell lung cancer based on the entropy degradation method (EDM). Because of data set limitations, they developed their own neural network, which they referred to as EDM. They used 12 data sets: six were healthy and six were cancerous. Their approach detected small cell lung cancer with 77.8% accuracy. Wasudeo Rahane, Himali Dalvi, Yamini Magar, Anjali Kalane, and Satyajeet Jondhale 20 used machine learning techniques to detect lung cancer with the help of image processing. The data were pre-processed with different image processing techniques so that the machine learning algorithm could use them, and a support vector machine was used for classification. Allison M Rossetto and Wenjin Zhou 21 presented an approach based on convolutional neural networks (CNNs) combined with multiple pre-processing methods. Deep learning played a significant role in this research: the CNNs improved the accuracy of automated labeling of the scans. The results showed consistently high accuracy and a low percentage of false positives.

As discussed in the above section, none of these studies uses an ensemble learning approach of machine learning or deep learning to identify lung nodules. The main issue with the previous results was that detection relied on unsuitable or small datasets taken from few subjects. The above section also shows that the detection accuracy of most machine learning and deep learning algorithms is very low. The proposed study aims to close these gaps.

Proposed method

The previously presented studies did not use an ensemble learning approach of deep learning algorithms for lung cancer identification. Because ensemble learning gives the best average accuracies, this study closes that gap by applying an ensemble of CNN algorithms to CT images taken from the LUNA 16 dataset. A final solution, the Deep Ensemble 2D CNN, is developed with the help of the deep learning algorithm 22 to detect lung nodules from CT scan images. It is imperative to select which deep learning model should be used to detect lung cancer. Here, the supervised deep learning algorithm 2D CNN is used to detect lung nodules. This section explains every step the Deep Ensemble 2D CNN model performs to get the best results and help develop a CAD system for lung nodule detection. The idea of this ensemble of different CNN blocks is to extract the correct features, which are very important for classifying a true nodule among candidate nodules. In the end, we calculated accuracy, precision, and recall using the formulas below 23 , 24 .
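Assuming the standard definitions of these three metrics, and writing TPV, TNV, FPV, and FNV for the true positive, true negative, false positive, and false negative counts, the formulas can be written as:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TPV + TNV}{TPV + TNV + FPV + FNV} \\
\text{Precision} &= \frac{TPV}{TPV + FPV} \\
\text{Recall}    &= \frac{TPV}{TPV + FNV}
\end{align}
```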

In these equations, TPV is the true positive value, TNV is the true negative value, FPV is the false positive value, and FNV is the false negative value 25 , 26 .
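As a minimal sketch, the three metrics can be computed from the four confusion-matrix counts as follows. The function name and the example counts are hypothetical and are not taken from the paper; the formulas are the standard definitions of accuracy, precision, and recall.

```python
def classification_metrics(tpv, tnv, fpv, fnv):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tpv, tnv, fpv, fnv are the true positive, true negative,
    false positive, and false negative counts.
    """
    total = tpv + tnv + fpv + fnv
    accuracy = (tpv + tnv) / total if total else 0.0
    precision = tpv / (tpv + fpv) if (tpv + fpv) else 0.0
    recall = tpv / (tpv + fnv) if (tpv + fnv) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
accuracy, precision, recall = classification_metrics(tpv=90, tnv=95, fpv=5, fnv=10)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The guards against empty denominators return 0.0 instead of raising, which is one possible convention when a class never occurs in the test data.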

The model works in the following steps:

1. Access the dataset from LUNA 16.
2. Pre-process the data (data balancing, plotting, data augmentation, feature extraction).
3. Split the dataset into training and testing data.
4. Apply the deep 2D CNNs to the training and testing datasets.
5. Combine the predictions of the deep 2D CNNs.
6. Make the final prediction of lung cancer.

Figure 2 describes the research paradigm for the proposed model.

figure 2

The architecture of the proposed methodology.

Data collection

A crucial step in any research is data collection, as collecting the correct data helps get better results. The first step is organizing the enormous data set of CT scan images. The CT scan images were collected from the LUNA 16 data set, which made this research possible 27 . It is essential to collect high-quality data so that the machines can process it easily. All CT scan images are of the same quality as those a doctor would review. Images in the LUNA data set are stored as .mhd and .raw files. The .mhd files contain the header data, and the .raw files hold the multidimensional image data. We used the SimpleITK Python library to read all the .mhd files and pre-process these images.

Data pre-processing

The next step in the proposed solution is data pre-processing. It is a critical step in which the data is transformed into the format that the machines can process 28 , 29 . All the CT scans in the LUNA 16 data set consist of 512 × 512 axial slices, with 200 images in each CT scan. Only 1351 of the annotations were positive nodules, and all others were negative. Because of this imbalance between the two classes, the data had to be augmented. Training the CNN model on all original pixels would increase the computational load and the training time. Instead, we decided to crop all images around the coordinates provided in the annotations. Figure 3 shows the cropped CT scan images from the dataset.

figure 3

Cropped CT scan images.
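The cropping step can be sketched as follows, assuming a slice is held as a list of pixel rows and the annotation has already been converted to a row and column index. The function name, the default patch size of 50, and the choice to shift the window inward at the image border are illustrative assumptions, not details given in the paper.

```python
def crop_patch(slice_2d, center_row, center_col, size=50):
    """Crop a size x size patch centred on (center_row, center_col).

    The window is shifted inward when it would cross the image border,
    so the returned patch always has the requested shape.
    """
    n_rows, n_cols = len(slice_2d), len(slice_2d[0])
    if size > n_rows or size > n_cols:
        raise ValueError("patch is larger than the image")
    half = size // 2
    top = min(max(center_row - half, 0), n_rows - size)
    left = min(max(center_col - half, 0), n_cols - size)
    return [row[left:left + size] for row in slice_2d[top:top + size]]


# Synthetic 512 x 512 slice whose pixel value encodes its own position.
image = [[r * 512 + c for c in range(512)] for r in range(512)]
patch = crop_patch(image, center_row=10, center_col=500)
print(len(patch), len(patch[0]))
```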

Furthermore, all the annotations provided in the LUNA 16 data set were in Cartesian coordinates, so they were converted to voxel coordinates. The image intensity in the dataset is defined on the Hounsfield scale and must be rescaled for image processing purposes. All the images in the dataset belong to one of two classes, positive or negative. Nodule candidates with the class marked as 1 were positive, and those marked as 0 were negative. Image labels were created accordingly, so that the labeled data can be used for training and testing.

CT data is usually stored as DICOM images or MHD/RAW files. Before feeding data into any machine learning or deep learning model, it is crucial to convert the data into the required format so that the machines can learn from it. Figure 4 shows the plotted image for the proposed system.
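Both conversions can be sketched in a few lines. The sketch assumes an axis-aligned scan whose origin and voxel spacing are read from the .mhd header, and it assumes an intensity window of -1000 to 400 Hounsfield units; the paper does not state which window was used, and the function names and example values are hypothetical.

```python
def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert a world (mm) coordinate to integer voxel indices.

    Assumes the scan axes are aligned with the world axes, so each index
    is (world - origin) / spacing, rounded to the nearest voxel.
    """
    return tuple(
        int(round((w - o) / s))
        for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)
    )


def normalize_hu(value, hu_min=-1000.0, hu_max=400.0):
    """Clip a Hounsfield value to [hu_min, hu_max] and rescale it to [0, 1]."""
    clipped = min(max(value, hu_min), hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Hypothetical header values and annotation, for illustration only.
origin = (-195.0, -195.0, -378.0)
spacing = (0.7, 0.7, 2.5)
voxel = world_to_voxel((-100.0, 50.0, -200.0), origin, spacing)
print(voxel, normalize_hu(-300.0))
```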

figure 4

Input images to the proposed methodology.

Converting data into JPEG images

The next step is to convert all the pre-processed data into JPEG format. JPEG images can be viewed easily, so humans can verify whether all the images are in the desired form. Furthermore, the data was converted into small 50 × 50 images. A huge data size consumes a lot of computing power, so reducing the images to 50 × 50 reduces the size of the data and the computing power required.

Data augmentation

It is imperative to augment the data when the classes are imbalanced. Manual data augmentation was performed because the data was not balanced. Data augmentation 30 rotates the images in different directions and makes copies of them. This creates more copies of the same data from different angles, which helps solve the data imbalance issue. We also used the Keras image data generator for image pre-processing and data augmentation. It applies zoom, shear, and flip transformations to the images. These steps matter because they present the data in many different ways, so the machines can learn from each variation.
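The rotation-based copies can be sketched as follows for a patch stored as a list of pixel rows. Restricting the rotations to multiples of 90 degrees and adding a mirror image of each rotation are assumptions made for this illustration; the paper does not list the exact angles used.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the four 90-degree rotations of the image and their mirror images."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


patch = [[1, 2],
         [3, 4]]
augmented = augment(patch)
print(len(augmented))
```

Applied to the minority (positive) class only, each original patch yields up to eight variants, which is one simple way to reduce the class imbalance.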

Split the data set into training and testing

The next important step is splitting the data into training and testing data, or training and validation data. In this way, we can give the machines data to train on and then provide the validation data to check the accuracy of our model. The candidates' data is read from the CSV file and then separated into cancerous and non-cancerous data so that it can be correctly labeled. Making separate folders for cancer and non-cancer files is essential so that the machines can learn what these files are. Training data is the data from which the artificial neural network and CNN learn. Splitting the data is therefore a significant step, because it sets aside a portion of the data for training. The test data is then given to the artificial neural network and CNNs so that results can be generated and lung cancer can be detected from the CT scan images. Test data is the data on which the algorithm's accuracy is checked. If the accuracy is as required, the results are recorded. If the results are not up to the mark, changes are made to the layers of the artificial neural networks and CNN to get more accurate results.
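A minimal sketch of such a split is shown below. It keeps the proportion of cancerous and non-cancerous samples the same in both parts (a stratified split); the 80/20 ratio, the fixed seed, the function name, and the file names are assumptions for illustration, since the paper does not state them.

```python
import random


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split (sample, label) pairs into train and test lists, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)  # shuffle within the class before cutting
        n_test = int(round(len(items) * test_fraction))
        test += [(s, label) for s in items[:n_test]]
        train += [(s, label) for s in items[n_test:]]
    return train, test


# Hypothetical file names: 20 positive (1) and 80 negative (0) patches.
samples = [f"patch_{i:03d}.jpg" for i in range(100)]
labels = [1] * 20 + [0] * 80
train, test = stratified_split(samples, labels)
print(len(train), len(test))
```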

Deep ensemble 2D convolutional neural network

Figure 5 shows the different layers of the CNN model. A final solution, the Deep Ensemble 2D CNN, is developed with the help of the deep learning algorithm to detect lung nodules from CT scan images. It is essential to select which deep learning model should be used to detect lung cancer. This section explains every step our Deep Ensemble 2D CNN model performs to get the best results and help develop a CAD system for lung nodule detection. The idea of this ensemble of different CNN blocks is to extract the correct features, which are very important for classifying a true nodule among candidate nodules.

figure 5

Deep ensemble 2D CNN architecture.

A Deep Ensemble 2D CNN architecture was designed for an effective lung nodule detection CAD system. Three 2D CNNs have been designed and developed with different layers and pooling techniques. Each CNN in the Deep Ensemble 2D CNN architecture has a different number of feature maps and kernels, combined with Max Pooling, Average Pooling, or Batch Normalization. The convolutional layers in the CNN architecture do the feature extraction work. Each kernel convolves over the input and extracts the main features, producing the output feature maps that are later used for learning.

Keeping that in mind, we designed a deep CNN model with more layers and different numbers of feature maps, which helps extract true nodules among nodule candidates. The first layer in this Deep Ensemble 2D CNN architecture has 32 feature maps, and 6 in the third CNN, to learn the features of nodules with 3 × 3 and 5 × 5 kernel sizes. As the layers go deeper, we increase the number of feature maps while keeping the same kernel size. As the neural network grows with more layers, more memory blocks are created to store the information, which helps to decide whether a candidate is a nodule. Each CNN in this Deep Ensemble 2D CNN has a different number of layers and kernels.
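To make the feature extraction step concrete, the sketch below slides one 3 × 3 kernel over a small single-channel image without padding and then applies ReLU. As in most deep learning libraries, the kernel is not flipped. The kernel values and the image are hypothetical, and in a trained CNN the kernel weights are learned rather than fixed by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            row.append(sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            ))
        output.append(row)
    return output


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# A bright vertical line on a dark background, and a vertical-edge kernel.
image = [[0, 0, 1, 0, 0] for _ in range(4)]
kernel = [[-1, 0, 1] for _ in range(3)]
feature_map = conv2d_valid(image, kernel)
activated = relu(feature_map)
print(feature_map, activated)
```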

Furthermore, in the first CNN, Max Pooling 31 is used to take the maximum value within the pooling filter. In the second CNN, batch normalization is used, and in the third CNN, Average Pooling is used to take the average of all values. More layers were added to increase the accuracy, and the architecture was tuned to overcome over-fitting issues. This Deep Ensemble 2D CNN helps extract more accurate nodule features and minimize the false positives among the true nodules. In the end, the predictions of all three CNNs are combined to make a more accurate model. The final confusion matrix was built from these combined predictions and gave good results. Figure 6 illustrates the CNN architecture.
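One common way to combine the predictions of several binary classifiers is to average their predicted probabilities and threshold the result. The sketch below assumes this averaging rule and a threshold of 0.5; the paper does not state the exact combination rule, so this should be read as one plausible option, with hypothetical names and example values.

```python
def ensemble_predict(probabilities_per_model, threshold=0.5):
    """Average per-sample probabilities across models and threshold them.

    probabilities_per_model is a list with one entry per model, each a list
    of nodule probabilities for the same samples in the same order.
    """
    n_models = len(probabilities_per_model)
    averaged = [sum(p) / n_models for p in zip(*probabilities_per_model)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical sigmoid outputs of three CNNs for three candidate patches.
cnn1 = [0.9, 0.2, 0.6]
cnn2 = [0.8, 0.4, 0.3]
cnn3 = [0.7, 0.1, 0.5]
averaged, labels = ensemble_predict([cnn1, cnn2, cnn3])
print(averaged, labels)
```

Majority voting on the thresholded outputs of each model is an alternative; averaging keeps more information when the models disagree only slightly.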

figure 6

Convolutional neural network one architecture.

As mentioned in Deep Ensemble 2D CNN, this architecture is developed by developing and combining three different CNNs. This section explains the architecture of each CNN with several layers of each CNN. In CNN1 Architecture, three blocks of CNN are developed with a different number of layers and feature maps. The first CNN block has the 1st input layer of CNN, which uses a 3 × 3 kernel with 32 feature maps. In the first layer, RELU 32 is used as an activation function. The input size is given in the first layer, and we have used the same image size, 50 × 50, and RGB channels as 3. Moving into the further hidden layers, in this first block of CNN, the next CNN layer has the same 32 feature maps with the same kernel size of 3 × 3. At the end of this 1st block, a Max Pooling 2D function will sub-sample the data. Our Max pooling filter size is 2 × 2, which will convolve on the data extracted by the feature maps, and it will use a 2 × 2 filter and get the maximum value from the data. This first CNN block is essential to get the features of the nodule and non-nodules. It will extract the main features that help to distinguish between the nodule candidate and the true nodule. Moving into the 2nd of CNN, the number of feature maps increased with 64 feature maps and kept the kernel size same to 3 × 3. The activation function is the same as the above layers, RELU, and the next CNN layer has the same number of feature maps and kernel size. Moving forward toward the Max Pooling layer in this 2nd block, there is the same Max Pooling as above, which is 2 × 2. In the last and third block of CNN, only one layer of CNN has 128 feature maps with the same kernel size, which is 3 × 3, and the Max Pooling layer is the same as above, which has a 2 × 2 size. In this 3rd block of CNN, we have a dropout rate of 0.1, meaning 10% of the neurons will be dropped in this layer to increase accuracy and avoid over and under-fitting issues.
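The effect of these layers on the feature map size can be traced with a few lines of arithmetic. The sketch assumes no padding, a stride of 1 for the convolutions, and a pooling stride equal to the 2 × 2 window; none of these settings is stated in the paper, so the resulting sizes are illustrative only.

```python
def output_size(size, kernel, stride):
    """Spatial size after a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


# (kind, kernel, stride, feature maps) for CNN1 as described in the text.
CNN1_LAYERS = [
    ("conv", 3, 1, 32), ("conv", 3, 1, 32), ("pool", 2, 2, None),
    ("conv", 3, 1, 64), ("conv", 3, 1, 64), ("pool", 2, 2, None),
    ("conv", 3, 1, 128), ("pool", 2, 2, None),
]


def trace_shapes(layers, input_size=50, input_channels=3):
    """Return the (kind, height, width, channels) after every layer."""
    size, channels = input_size, input_channels
    shapes = []
    for kind, kernel, stride, feature_maps in layers:
        size = output_size(size, kernel, stride)
        if kind == "conv":
            channels = feature_maps  # pooling keeps the channel count
        shapes.append((kind, size, size, channels))
    return shapes


shapes = trace_shapes(CNN1_LAYERS)
_, height, width, channels = shapes[-1]
flattened = height * width * channels
for shape in shapes:
    print(shape)
print("flattened length:", flattened)
```

Under these assumptions the 50 × 50 input shrinks to 3 × 3 with 128 feature maps, so the flatten layer would hand a vector of length 1152 to the dense layers.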

After the above CNN blocks, a Flatten layer converts the feature maps into a one-dimensional vector that can be fed to a conventional ANN. Dense layers are then added to complete the ANN part of the architecture, with a dropout rate of 20%. The last layer has an output dimension of one because only a single result, nodule or non-nodule, needs to be predicted. Sigmoid is used as the activation function because the output is binary rather than categorical 33 . Figure 7 illustrates the conversion of the CNN feature maps into the flattened layer.

figure 7

Convolutional neural network two architecture.

The second CNN in the Deep Ensemble 2D CNN architecture has a different structure: it uses a different number of layers, and average pooling instead of max pooling to sub-sample the data. The very first convolutional layer uses 5 × 5 kernels with only six feature maps instead of 3 × 3 kernels, because we deliberately designed this CNN to differ from the first one so that we could see which kernel and feature-map sizes give more accurate results.

With a stride of 1 × 1, the kernel moves one step at a time, and ReLU is used for activation, as in the first CNN model. The input shape is again 50 × 50 with three RGB channels. After the first layer, average pooling is used instead of max pooling. Average pooling works the same way as max pooling, but the calculation differs: max pooling keeps the maximum value inside each window, whereas average pooling takes the mean of the values inside the window. This average pooling layer uses a 2 × 2 window and a stride of 1, so the window moves one step at a time. It is followed by further hidden convolutional layers. The second convolutional layer has 16 feature maps with the same 5 × 5 filter size and the same stride of 1.

After this layer, there is another average pooling layer. It uses the same 2 × 2 window, but this time with a stride of 2, which means the window moves two steps at a time instead of one. The aim is to let the network extract features in different ways and learn from them: which features it obtains by moving one step at a time, which it obtains by moving two steps, and how much each helps it to understand the data. The third and last convolutional layer uses the same 5 × 5 kernel size with 120 feature maps and a stride of 1. After this layer, a flatten layer converts the convolutional output into a one-dimensional input for the ANN. A traditional ANN then learns from the CNN features and classifies the data. The last layer uses the Sigmoid activation function because the results are binary: we only need to predict nodule or non-nodule. If more than two classes had to be predicted, SoftMax would be a good option, as explained in Fig. 8 .
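The difference between max pooling and average pooling, and the effect of the stride, can be sketched in a few lines of plain Python. The function `pool2d` and the small 4 × 4 `feature_map` are illustrative only and are not taken from the study's code or data.

```python
def pool2d(grid, window=2, stride=2, mode="max"):
    """Slide a square window over a 2D list and reduce each patch."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        pooled_row = []
        for c in range(0, cols - window + 1, stride):
            patch = [grid[r + i][c + j]
                     for i in range(window) for j in range(window)]
            if mode == "max":
                pooled_row.append(max(patch))               # max pooling
            else:
                pooled_row.append(sum(patch) / len(patch))  # average pooling
        pooled.append(pooled_row)
    return pooled


# Made-up 4 x 4 feature map used only for this illustration.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 8, 6],
    [4, 2, 6, 4],
]

max_pooled = pool2d(feature_map, mode="max")          # [[7, 4], [4, 8]]
avg_pooled = pool2d(feature_map, mode="average")      # [[4.0, 2.0], [2.0, 6.0]]
avg_stride1 = pool2d(feature_map, stride=1, mode="average")  # 3 x 3 output
```

With a stride of 2 the 4 × 4 map is reduced to 2 × 2, while a stride of 1 produces overlapping windows and a 3 × 3 output, which is the trade-off between the two average pooling layers described above.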

figure 8

Convolutional neural network three architecture.

The third and last CNN model of the Deep Ensemble 2D CNN architecture uses several layers and feature maps. Its first block has three convolutional layers with 32 feature maps each. In the first layer, the input shape matches the image size, 50 × 50, and the activation function is ReLU. Two of these layers use a 3 × 3 filter, and the third uses a 5 × 5 filter. After these three layers, a dropout rate of 0.4 is applied, meaning 40% of the neurons are dropped. This CNN uses neither average nor max pooling; instead, batch normalization is used to speed up the training of the model. In the hidden layers that follow, three convolutional layers with 64 feature maps and a 3 × 3 kernel size are used, with a 5 × 5 kernel in the last layer of the block. After this block, dropout with a rate of 0.4 is applied again, so 40% of the neurons are dropped. The third section of this CNN model has one convolutional layer with 128 feature maps and a 3 × 3 kernel. After this last layer, a flatten layer converts the convolutional output into a one-dimensional input for the ANN. A traditional ANN then learns from the CNN features and classifies the data.

Our Deep Ensemble 2D CNN uses ReLU as the activation function. The rectified linear unit (ReLU) is a well-known and widely used activation function. Krizhevsky et al. 34 showed that ReLUs enable a deep CNN to train several times faster than an equivalent network with tanh units. ReLU is used for the input layers and the hidden layers of our Deep Ensemble 2D CNN.

As mentioned earlier, we used the Sigmoid activation function in the last layer 35 . Nonlinear activation functions help the model adapt to and generalize across different types of data and differentiate between outputs. Our task is binary classification between nodule and non-nodule, so Sigmoid is a natural choice; if more than two classes had to be predicted, SoftMax would be a good choice.

Moreover, we mainly use the Sigmoid function because its output lies between 0 and 1. It is therefore well suited to tasks where the model must output a probability, since any probability lies between 0 and 1. The function is also differentiable, which means we can find the slope of the Sigmoid curve at any point. Figure 9 shows the sigmoid function.
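The two properties mentioned above, an output bounded between 0 and 1 and a slope that is defined everywhere, can be checked with a short sketch. The 0.5 decision threshold and the function names are assumptions made for illustration; the text does not state which threshold was used.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope(x):
    """Derivative of the sigmoid, s(x) * (1 - s(x)), defined at every point."""
    s = sigmoid(x)
    return s * (1.0 - s)


def classify(logit, threshold=0.5):
    """Turn the sigmoid output into a binary label (threshold is assumed)."""
    return "nodule" if sigmoid(logit) >= threshold else "non-nodule"


midpoint = sigmoid(0.0)            # 0.5
steepest_slope = sigmoid_slope(0.0)  # 0.25, the maximum slope of the curve
```

Because the output can be read as the probability of the "nodule" class, a single output unit is enough for this two-class problem.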

figure 9

The Sigmoid curve at any two points.

Experimental results and analysis

After pre-processing the data into the correct format, the next important step is to evaluate the data on our Deep Ensemble 2D CNN architecture. For this purpose, the whole dataset was split into training and validation data. Both segments contain cancer and non-cancer lung nodule files, so the CNN model sees both data types during training.

This section uses the Deep Ensemble 2D CNN architecture with a validation split of 10%, so that 90% of the data is used for training and the remaining 10% for validation. This allows the models to be trained and validated at the same time. With 70 epochs set in the model fit generator, training iterates over the dataset 70 times.

Result of CNN1

This section explains how each CNN performed on the data. We first ran the first CNN model on the training and validation data. The test data was then given to the model to predict the outcome. The first CNN model provides an accuracy of 94.5%, which would be considered an excellent result according to AUC accuracy values 36 . Figure 10 explains the results.

figure 10

Accuracy curve of CNN1.

As mentioned, the model was compiled with 70 epochs 37 . In each epoch, the validation split assigns 80% of the data to training and 20% to validation. As training progresses over the epochs, the classification accuracy increases, while the loss of the model decreases rapidly at each iteration. The loss curve gives a result of 0.14 for the first iteration of the CNN. The loss curve is shown in Fig. 11 .

figure 11

Loss curve of CNN1.

Table 1 explains the training accuracy and loss for the first CNN model.

The loss gradually decreased as the model received more training in each epoch, and in the end a loss of only 0.1891 was recorded. However, accuracy alone is not enough to judge the model's performance. We therefore gave the model around 1600 images and asked it to predict the results. The next step was to check the accuracy of these predictions, and for this purpose we built a confusion matrix 38 , 39 . The confusion matrix for the first CNN model is shown in Fig. 12 , where Nodules and Non-Nodules denote the detected and non-detected lung cancer images. Figure 13 shows the ROC curve of the CNN model for the training dataset.
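As a minimal sketch of how such a confusion matrix is built, the function below counts true positives, true negatives, false positives, and false negatives from true labels and predicted labels. The eight labels are made up for illustration and are not the study's data; the encoding 1 = nodule, 0 = non-nodule is likewise an assumption.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # predicted nodule, really a nodule
            else:
                fp += 1   # predicted nodule, really a non-nodule
        else:
            if truth == positive:
                fn += 1   # missed nodule
            else:
                tn += 1   # correctly rejected non-nodule
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels (1 = nodule, 0 = non-nodule), not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_matrix(y_true, y_pred)
```

The four counts separate the two kinds of error, which a single accuracy figure hides: a false positive sends a healthy candidate for follow-up, while a false negative misses a true nodule.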

figure 12

Confusion matrix of CNN1.

figure 13

ROC curve of CNN1.

Result of CNN2

Following the performance evaluation of CNN1, this section explains how CNN2 performed on the testing data. We first ran the second CNN model on the training and validation data and then gave it the test data to predict the outcome. The second CNN model gave good results, with an accuracy of 0.93 that increases gradually from the first iteration to the last. Figure 14 shows the accuracy of CNN2.

figure 14

Classification accuracy of CNN2.

As Table 2 shows, accuracy alone is insufficient to evaluate the model's performance. We therefore gave the model roughly 1600 images and asked it to predict the outcomes. To assess these predictions, a confusion matrix was created. The results of CNN2 for the testing images are shown in Fig. 15 . The ROC curve for the testing dataset is presented in Fig. 16 .

figure 15

Confusion matrix of CNN2.

figure 16

ROC curve of CNN2.

Result of CNN3

Following the evaluation of CNN2, this section explains how CNN3 fared on the testing data. We first ran the third CNN model on the training and validation data and then provided it with test data to predict the outcome. The third CNN model produced promising results, which are included in Table 3 . Figures 17 and 18 present the confusion matrix and ROC curve for CNN3.

figure 17

Confusion matrix of CNN 3.

figure 18

ROC curve of CNN 3.

As with the previous models, accuracy alone is not enough to judge the model's performance. We therefore gave the model around 1600 images to predict and built a confusion matrix to check the accuracy of those predictions.

Combine results of all CNN (deep ensemble 2D CNN architecture)

We then combined the predictions of all three CNNs, which we designed specifically for this lung nodule problem. The confusion matrix clearly shows a difference in TP, TN, FP, and FN, which indicates that combining all three CNNs was an excellent choice for increasing the accuracy and reducing the false positives. The results of the CNN architectures are combined using the averaging method of deep ensemble learning 40 .
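The averaging step can be sketched as follows. This is a simplified illustration that assumes each CNN outputs one sigmoid probability per image, that the three models are weighted equally, and that the averaged probability is thresholded at 0.5; the probabilities below are invented, and the names are hypothetical.

```python
def ensemble_average(prob_lists, threshold=0.5):
    """Average per-image probabilities across models, then threshold."""
    n_models = len(prob_lists)
    averaged = [sum(per_image) / n_models for per_image in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Invented sigmoid outputs of the three CNNs for three candidate images.
cnn1_probs = [0.90, 0.40, 0.20]
cnn2_probs = [0.80, 0.70, 0.10]
cnn3_probs = [0.70, 0.70, 0.30]

averaged, labels = ensemble_average([cnn1_probs, cnn2_probs, cnn3_probs])
# averaged is approximately [0.8, 0.6, 0.2]; labels == [1, 1, 0]
```

In this toy example CNN1 alone would reject the second image (0.40 is below the threshold), but the averaged probability of about 0.6 classifies it as a nodule, which shows how the ensemble can correct the error of a single model.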

Our Deep Ensemble 2D CNN comprises three different CNNs, each of which achieves an accuracy above 90%: CNN1 attained 94.07%, CNN2 achieved 94.44%, and CNN3 attained 94.23%. The overall accuracy, precision, and recall of the Deep Ensemble 2D CNN are calculated from the confusion matrix. Table 4 presents the overall results of the CNN model. Figures 19 and 20 show the combined confusion matrix and ROC curve for the CNN model.
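For reference, accuracy, precision, and recall follow from the four confusion-matrix counts using the standard definitions. The counts in this sketch are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the values reported in Table 4.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that neither denominator is zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "precision": tp / (tp + fp),     # share of predicted nodules that are real
        "recall": tp / (tp + fn),        # share of real nodules that are found
    }


# Hypothetical counts, not the study's results.
example = metrics_from_counts(tp=90, tn=80, fp=20, fn=10)
# accuracy = 170 / 200 = 0.85, precision = 90 / 110 ≈ 0.818, recall = 90 / 100 = 0.9
```

Reducing false positives, the stated goal of the ensemble, raises precision, whereas reducing false negatives raises recall.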

figure 19

Confusion matrix of deep ensemble 2D CNN.

figure 20

ROC curve of deep ensemble 2D CNN.

Comparison with other methodologies

Below we compare our proposed deep ensemble 2D CNN methodology with the baseline methodologies that we set out to improve on in terms of accuracy and performance. The comparison with the base papers of the study is given in Table 5 .

Table 5 compares the proposed study with previously presented studies. Among the previous studies, Knowledge-based Collaborative Deep Learning 11 obtained the highest accuracy, 94%. The ensemble learning method with SVM, GNB, MLP, and NN 17 , which was the base paper for this study, obtained an accuracy of 90%. The present study applies an ensemble learning approach to deep learning CNN models for the early identification of lung cancer from LUNA 16. It achieves an accuracy of 95%, which, to our knowledge, is the highest accuracy among the deep learning and ensemble learning algorithms presented to date.

Gaps and future direction

Our multilayer CNN was developed around a 2D convolutional neural network. In future work, a 3D CNN should be used, as 3D convolutions can capture more spatial information. More data should also be gathered: a larger and more diverse dataset would allow the model to be trained on new data and make it more mature and accurate.

There is always room for improvement in any research. No final product has yet been developed for the detection of any cancer, and no international standard has been established for the detection and prediction of cancers. There is therefore considerable room to increase the accuracy of predictions and detections. More work on detecting and forecasting different cancers will lead to new openings and solutions for detecting cancer in its early stages.

Cancer is a dangerous disease responsible for a massive number of deaths every year. Billions of dollars have been spent on cancer research so far, yet no final product or standard has been developed for cancer detection. This shows the need for more work to understand the causes and make early predictions, and it opens an opportunity for researchers to develop systems or conduct research that supports early cancer detection. If cancer can be detected at the very beginning, it can help millions of people. Researchers should therefore collect current data and apply different deep learning and machine learning algorithms to detect and predict cancer. It is essential to use both new and existing data in order to know whether these deep learning and machine learning models still give the same accuracy.

The number of cancer-related deaths is increasing, and cancer remains an open question that needs more attention. Research on current datasets will open gateways for new work by providing up-to-date statistics and insight into what has been achieved so far in the detection and prediction of cancer, and it will help to identify recent causes or signs of cancer.

As discussed in the related work section, researchers have presented many previous studies on identifying lung cancer. These studies suffered from low accuracy, a limited number of algorithms, and inefficient datasets. The proposed study was developed to overcome these shortcomings using the Deep 2D CNN approach. Three CNN models, CNN1, CNN2, and CNN3, are used, and their results are explained in detail in Tables 1 , 2 , and 3 . The ensemble 2D approach then combines these three deep learning models. The ensemble deep learning method gives an accuracy of 95%, which, to our knowledge, is the highest value recorded to date for a deep learning algorithm identifying lung cancer. This study shows state-of-the-art results of an ensemble learning approach for identifying lung cancer from an image dataset. In the future, a system may be developed that uses more algorithms in ensemble learning with another extensive and efficient dataset for identifying lung cancer.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

Hojjatollah Esmaeili, Vesal Hakami, Behrouz Minaei Bidgoli, M. S. Application-specific clustering in wireless sensor networks using combined fuzzy firefly algorithm and random forest. Expert Syst. Appl. Volume 210 , (2022).

Sohail, A. et al. A systematic literature review on machine learning and deep learning methods for semantic segmentation. IEEE Access https://doi.org/10.1109/ACCESS.2022.3230983 (2022).

Ilyas, S., Shah, A. A. & Sohail, A. Order management system for time and quantity saving of recipes ingredients using GPS tracking systems. IEEE Access 9 , 100490–100497 (2021).

Shah, A. A., Ehsan, M. K., Sohail, A. & Ilyas, S. Analysis of machine learning techniques for identification of post translation modification in protein sequencing: A review. in 4th International Conference on Innovative Computing, ICIC 2021 1–6 (IEEE, 2021). doi: https://doi.org/10.1109/ICIC53490.2021.9693020 .

Shah, A. A., Alturise, F., Alkhalifah, T. & Khan, Y. D. Evaluation of deep learning techniques for identification of sarcoma-causing carcinogenic mutations. Digit. Heal. 8 , (2022).

Rahane, W., Dalvi, H., Magar, Y., Kalane, A. & Jondhale, S. Lung cancer detection using image processing and machine learning healthcare. In Proceedings of the 2018 Interanational Conference on Current Trends Towards Converging Technology. ICCTCT 2018 1–5 (2018) doi: https://doi.org/10.1109/ICCTCT.2018.8551008 .

Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2021. CA. Cancer J. Clin. 71 , 7–33 (2021).

Gilad, S. et al. Classification of the four main types of lung cancer using a microRNA-based diagnostic assay. J. Mol. Diagnostics 14 , 510–517 (2012).

Ghasemi Darehnaei, Z., Shokouhifar, M., Yazdanjouei, H. & Rastegar Fatemi, S. M. J. SI-EDTL: Swarm intelligence ensemble deep transfer learning for multiple vehicle detection in UAV images. Int. J. Commun. Syst. https://doi.org/10.1002/cpe.6726 (2022).

Zuo, W., Zhou, F., Li, Z. & Wang, L. Multi-resolution cnn and knowledge transfer for candidate classification in lung nodule detection. IEEE Access 7 , 32510–32521 (2019).

Setio, A. A. A. et al. Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35 , 1160–1169 (2016).

Xie, Y. et al. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE Trans. Med. Imaging 38 , 991–1004 (2019).

Rao, G. S., Kumari, G. V., & Rao, B. P. Network for biomedical applications . vol. 2 (Springer Singapore, 2019).

Wang, W. et al. Exploring cross-image pixel contrast for semantic segmentation. In Proceedings of the. IEEE Int. Conf. Comput. Vis. 7283–7293 (2021) doi: https://doi.org/10.1109/ICCV48922.2021.00721 .

Ramchoun, H., Amine, M., Idrissi, J., Ghanou, Y. & Ettaouil, M. Multilayer perceptron: Architecture optimization and training. Int. J. Interact. Multimed. Artif. Intell. 4 , 26 (2016).

Berwick, R. An Idiot's Guide to Support vector machines (SVMs): A New Generation of Learning Algorithms Key Ideas. Village Idiot 1–28 (2003).

Faisal, M. I., Bashir, S., Khan, Z. S. & Hassan Khan, F. An evaluation of machine learning classifiers and ensembles for early stage prediction of lung cancer. In 2018 3rd International Conference on Emerging Trends Engineering Science Technology. ICEEST 2018 1–4 (2019). https://doi.org/10.1109/ICEEST.2018.8643311 .

Li, C., Zhu, G., Wu, X. & Wang, Y. False-positive reduction on lung nodules detection in chest radiographs by ensemble of convolutional neural networks. IEEE Access 6 , 16060–16067 (2018).

Dou, Q. et al. 3D deeply supervised network for automatic liver segmentation from CT volumes. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9901 LNCS , 149–157 (2016).

Al-Tawalbeh, J. et al. Classification of lung cancer by using machine learning algorithms. In IICETA 2022 - 5th Interantional Conference on Engineering Technology Its Applications 528–531 (2022). https://doi.org/10.1109/IICETA54559.2022.9888332 .

Gulhane, M. & P.S, M. Intelligent Fatigue Detection and Automatic Vehicle Control System. Int. J. Comput. Sci. Inf. Technol. 6 , 87–92 (2014).

Shrestha, A. & Mahmood, A. Review of deep learning algorithms and architectures. IEEE Access 7 , 53040–53065 (2019).

Yu, L. et al. Prediction of pathologic stage in non-small cell lung cancer using machine learning algorithm based on CT image feature analysis. BMC Cancer 19 , 1–12 (2019).

Shah, A. A., Alturise, F., Alkhalifah, T. & Khan, Y. D. Deep Learning Approaches for Detection of Breast Adenocarcinoma Causing Carcinogenic Mutations. Int. J. Mol. Sci. 23 , (2022).

Shah, A. A. & Khan, Y. D. Identification of 4-carboxyglutamate residue sites based on position based statistical feature and multiple classification. Sci. Rep. 10 , 2–11 (2020).

Mohammed, S. A., Darrab, S., Noaman, S. A. & Saake, G. Analysis of breast cancer detection using different machine learning techniques . Communications in Computer and Information Science vol. 1234 CCIS (Springer Singapore, 2020).

Chon, A. & Balachandar, N. Deep convolutional neural networks for lung cancer detection. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9887 LNCS , 533–534 (2016).

Shamim, H. I., Shamim, H. S. & Shah, A. A. Automated vulnerability detection for software using NLP techniques. 48–57.

Guyon, I., Gunn, S., Nikravesh, M. & Zadeh, L. Feature extraction foundations. 1–8 (2006).

Chlap, P. et al. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 65 , 545–563 (2021).

Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 , 2481–2495 (2017).

Agarap, A. F. Deep learning using rectified linear units (ReLU). at http://arxiv.org/abs/1803.08375 (2018).

Naz, N., Ehsan, M. K., Qureshi, M. A., Ali, A. & Rizwan, M. Prediction of covid-19 daily infected cases ( worldwide & united states ) using regression models and Neural Network. 9 , 36–43 (2021).

Gonzalez, T. F. Handbook of approximation algorithms and metaheuristics. Handb. Approx. Algorithms Metaheuristics 1–1432 (2007) doi: https://doi.org/10.1201/9781420010749 .

Han, J. & Moraga, C. The influence of the sigmoid function parameters on the speed of backpropagation learning. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 930 , 195–201 (1995).

Cortes, C. & Mohri, M. AUC optimization vs. error rate minimization. Adv. Neural Inf. Process. Syst. (2004).

Marius-Constantin, P., Balas, V. E., Perescu-Popescu, L. & Mastorakis, N. Multilayer perceptron and neural networks. WSEAS Trans. Circuits Syst. 8 , 579–588 (2009).

Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21 , 1–13 (2020).

Visa Sofia, D. Confusion matrix-based feature selection sofia visa. Confusion Matrix-based Featur. Sel. Sofia 710 , 8 (2011).

Murray, I. Averaging predictions. 1–4 (2016).

Acknowledgements

The authors would like to thank the Deanship of Scientific Research at Majmaah University, Saudi Arabia, for supporting this work under Project number R-2023-16.

Author information

Authors and affiliations.

Department of Computer Sciences, Bahria University, Islamabad, Pakistan

Asghar Ali Shah, AbdulHafeez Muhammad & Zaeem Arif Butt

Faculty of Computer Studies, Arab Open University Bahrain, A’ali, Bahrain

Hafiz Abid Mahmood Malik

Department of Computer Science and Information, College of Science in Zulfi, Majmaah University, Al-Majmaah, Saudi Arabia

Abdullah Alourani

Contributions

A.A.S. and H.A.M.M. envisioned the idea for research designed, wrote and discussed the results. A.M., Z.A.B., and A.A. worked on the literature and discussion section. All authors provided critical feedback, reviewed the paper, and approved the manuscript.

Corresponding author

Correspondence to Hafiz Abid Mahmood Malik .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Shah, A.A., Malik, H.A.M., Muhammad, A. et al. Deep learning ensemble 2D CNN approach towards the detection of lung cancer. Sci Rep 13 , 2987 (2023). https://doi.org/10.1038/s41598-023-29656-z

Received : 04 August 2022

Accepted : 08 February 2023

Published : 20 February 2023

DOI : https://doi.org/10.1038/s41598-023-29656-z

  • DOI: 10.1016/j.jtho.2024.07.022
  • Corpus ID: 271704716

From the IASLC Early Detection and Screening Committee Terminology Issues in Screening and Early Detection of Lung Cancer - IASLC Early Detection and Screening Committee Expert Group Recommendations.

  • Rudolf M. Huber , M. Cavic , +21 authors Rudolf M. Huber
  • Published in Journal of Thoracic Oncology 1 August 2024

SYSTEMATIC REVIEW article

Research trends in lung cancer and the tumor microenvironment: a bibliometric analysis of studies published from 2014 to 2023.

Zhilan Huang

  • 1 The Fourth Clinical Medical College of Guangzhou University of Chinese Medicine, Shenzhen, Guangdong, China
  • 2 Department of Respiratory Medicine, Shenzhen Traditional Chinese Medicine Hospital, Shenzhen, China

Background: Lung cancer (LC) is one of the most common malignant tumors in the world and the leading cause of cancer-related deaths. It seriously threatens human life and health and places a heavy burden on society. In recent years, the tumor microenvironment (TME) has become an emerging research field and a hotspot affecting tumor pathogenesis and therapeutic approaches. However, to date, there has been no bibliometric analysis of lung cancer and the tumor microenvironment from 2014 to 2023. This study aims to comprehensively summarize the current situation and development trends in the field from a bibliometric perspective.

Methods: Publications about lung cancer and the tumor microenvironment from 2014 to 2023 were extracted from the Web of Science Core Collection (WoSCC). Microsoft Excel, Origin, R-bibliometrix, CiteSpace, and VOSviewer were used to analyze the data.

Results: In total, 763 publications were identified in this study. A rapid increase in the number of publications was observed after 2018. The publications came from more than 400 organizations in 36 countries or regions. China and the United States have significant influence in this field. Zhou, CC and Frontiers in Immunology are the most productive author and journal, respectively. In addition, the most frequently cited references were those on lung cancer pathogenesis, clinical trials, and treatment modalities. This suggests that novel lung cancer treatment models based mainly on TME components, such as cancer-associated fibroblasts (CAFs), may shape future research trends.

Conclusions: Research on lung cancer and the tumor microenvironment is still in its early stages. Gene expression, molecular pathways, therapeutic modalities, and novel detection technologies in this field have been widely studied by researchers. This is the first bibliometric study to comprehensively summarize the research trends and developments regarding lung cancer and the tumor microenvironment over the last decade. Our results provide an updated perspective that helps scholars understand the key information and cutting-edge hotspots in this field and identify future research directions.

Introduction

Lung cancer (LC) is a global health concern and one of the leading causes of cancer-related mortality. According to the global cancer statistics report published by the International Agency for Research on Cancer (IARC), incidence and mortality rates of lung cancer remain high, accounting for 18% of global cancer deaths in 2020 ( 1 – 3 ).

Surgery, radiotherapy, and chemotherapy have been the standard of care for lung cancer treatment in recent years. However, the clinical use of targeted therapies and immunotherapy has been increasing. The focus has shifted to detecting driver genes associated with tumor development, such as EGFR, KRAS, and MET, and identifying the signaling pathways of cell growth or apoptosis regulated by these genes. Targeting treatment to these genes has significantly improved the intermediate survival of lung cancer patients. Immunotherapy is now the standard first-line treatment for patients with advanced or metastatic NSCLC that is negative for driver gene mutations. Unfortunately, tumor recurrence often leads to resistance to the initially effective drug ( 4 ).

With the emergence of the tumor microenvironment (TME) as a heated concept, increasing evidence suggests that the TME promotes cancer progression and may mediate therapeutic resistance. Lung cancer-related therapies and studies are gradually expanding from focusing solely on the tumor cells themselves to the broader field of tumor microenvironment research. The development of cancer is strongly correlated with the physiological status of the tumor microenvironment, which can regulate tumor cell multiplication and bolster resistance to therapy. The TME is a hierarchically structured ecosystem that contains a variety of cell types, including tumor-associated macrophages (TAMs), immune cells, and cancer-associated fibroblasts (CAFs), as well as blood vessels, nerve fibers, extracellular matrix, and related noncellular components ( 5 – 7 ). In particular, immune cells play important roles in the TME: they can promote tumor growth, yet they are also key to host immune surveillance and the elimination of neoplastic cells ( 8 ). The cellular composition and functional status of the TME change depending on the tumor category, intrinsic characteristics of the cancer cells, tumor stage, and the characteristics of the individual patient. These cells and the tumor affect each other, and the cells play a key role in host immunosurveillance and the elimination of neoplastic cells ( 9 ). Collectively, their interactions regulate regional immune effects and ultimately influence lung cancer outcomes. The cells in the TME and their secreted molecules are therefore now considered critical in the pathogenesis of cancer, and they serve as potential targets for novel therapeutic interventions.

TAMs are also believed to be a key factor driving the TME to promote lung cancer development. They can be directly involved in tumor invasion, migration, epithelial-to-mesenchymal transition (EMT), and angiogenesis by secreting the chemokine CCL18. Ultimately, this leads to cancer progression and activates the NF-κB pathway in CAFs, inducing stemness and drug resistance in tumor cells ( 10 , 11 ). The study by Xiang indicated that the increase in Tregs and dendritic cells (DCs) in the TME may also contribute to acquired resistance after targeted therapy and immunotherapy ( 12 ). Based on scRNA-seq analysis and in vitro experiments, Aiko showed that high levels of IL-1β in the TME may cooperate with IFN-γ to induce up-regulated expression of PD-L1 in tumor cells through activation of MAPK signaling, which in turn leads to resistance to tumor immunosuppression ( 13 ).

Besides, the TME can be subdivided into six specific categories: hypoxic ecological niche, immune microenvironment, metabolic microenvironment, acidic ecological niche, innervation ecological niche, and mechanical microenvironment ( 6 ). There is also bi-directional communication between microenvironments, so that targeting one specific microenvironment may result in a series of changes in other specific microenvironments and relevant pathways. More and more studies have demonstrated the involvement of TME components in the immune evasion and drug resistance of tumor cells ( 14 – 16 ). TAM is dynamic and subject to change due to pathogenic factors. Cigarette smoking is the most significant risk factor for lung cancer incidence and mortality ( 17 , 18 ). Using a mouse cellular model, Bianchi discovered that exhaled tobacco smoke could induce the polarization of M2 phenotypic macrophages through various mechanisms. This ultimately hinders the anti-inflammatory effects of TAM in the TME of smokers with lung cancer, leading to the development of an immunosuppressive microenvironment ( 1 ). The study of lung cancer is shifting from a cancer-centered paradigm to one that considers the tumor microenvironment (TME) as a whole. It will be meaningful to monitor how changes in the TME and lung cancer develop together over time.

Bibliometrics is an emerging method of literature analysis that integrates mathematics, statistics, and bibliography. It is now widely applied by researchers, institutions, and countries in multiple disciplines and fields to organize knowledge carriers into visual knowledge networks from quantitative and qualitative perspectives. By analyzing these large datasets, the current status and trends of a particular research area can be derived, which helps to guide policy decisions ( 19 ). Additionally, with bibliometric analysis, researchers can quickly, accurately, and comprehensively obtain detailed information, including the intellectual network, research topic evolution, potential development prospects, authors, collaborations, keywords, journals, countries, research institutes, references, and other details of the relevant research areas. Tools such as CiteSpace, VOSviewer, and R packages can then be used to visualize the results. Compared with the traditional literature review, bibliometric analysis provides a more comprehensive perspective of research trends with more objective data ( 20 ).

At present, there is no published bibliometric analysis that covers lung cancer in relation to the TME. Therefore, the research status as well as the research frontiers and hotspots in this field are still unclear. To provide a reference for further research and application, this paper identifies and collects relevant literature data from databases. Literature analysis software was used to study the annual number of publications, countries, publishing organizations, journals, authors, keywords, and references for the last 10 years at the intersection of the TME and lung cancer, in order to describe the progress, hotspots, and emerging trends of research in the field.

Materials and methods

Database and search strategies.

The data source we used was the Web of Science Core Collection ( http://wcs.webofknowledge.com ), which includes the Science Citation Index Expanded (SCIE) and the Social Science Citation Index (SSCI). A comprehensive database search was completed by both authors on January 7, 2024. The specific search formula was as follows: (TI = (“tumor microenvironment”) OR AK = (“tumor microenvironment”) OR TI = (“TME”) OR AK = (“TME”)) AND (TS = (“Pulmonary Neoplasm*”) OR TS = (“Lung Cancer*”) OR TS = (“Pulmonary Cancer*”)). After the preliminary search, two authors (XTY and HZL) independently reviewed and screened the retrieved publications based on the following inclusion criteria: (1) the publication timespan was set from 1 January 2014 to 31 December 2023; (2) only English-language publications were included; (3) the publication type was limited to articles or reviews; and (4) the publication was related to a study of both lung cancer and the tumor microenvironment. To ensure the representativeness of the selected publications, the search results underwent a title- and abstract-based filtration process, which excluded irrelevant publications. The screening process is shown in Figure 1 . A total of 763 publications were included in the final analysis and were exported in “plain text” format with “full records and references”.
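
The search formula above can also be assembled programmatically, which makes the term lists easier to audit or extend. The following is a minimal sketch, not the authors' procedure: the helper names `field_clause` and `build_query` are hypothetical, straight quotes replace the typographic quotes, and the resulting string would still have to be entered into the Web of Science advanced search by hand.

```python
def field_clause(tag, terms):
    """Join terms under one field tag, e.g. TS = ("x") OR TS = ("y")."""
    return " OR ".join(f'{tag} = ("{term}")' for term in terms)


def build_query(tme_terms, lung_terms):
    """Combine title/author-keyword clauses (TME) with topic clauses (lung cancer)."""
    # The formula in the text lists TI then AK for each TME term in turn.
    tme_parts = []
    for term in tme_terms:
        tme_parts.append(field_clause("TI", [term]))
        tme_parts.append(field_clause("AK", [term]))
    tme_clause = " OR ".join(tme_parts)
    lung_clause = field_clause("TS", lung_terms)
    return f"({tme_clause}) AND ({lung_clause})"


query = build_query(
    ["tumor microenvironment", "TME"],
    ["Pulmonary Neoplasm*", "Lung Cancer*", "Pulmonary Cancer*"],
)
print(query)
```

Keeping the two term lists separate mirrors the structure of the formula: the first group restricts by title or author keyword, the second by topic.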

www.frontiersin.org

Figure 1 Process of publications selection in lung cancer and the tumor microenvironment.

Data analysis and visualization

Three scientometric software tools and two statistical mapping tools were used in this study. CiteSpace (version 6.1.6, https://citespace.podia.com ) is a Java-based information visualization and analysis software developed by Professor Chen, who specializes in computer science and intelligence at Drexel University ( 21 ). After transforming the data using the software, we defined the analysis period as 2014–2023, the time interval as 1 year, the selection criterion as the g-index (k = 25), and the node types as “author,” “institution,” and “keyword” for the co-occurrence network analysis. Each node represents one item of the selected type. The size and color of the node circles indicate the number or frequency of publications and the year in which these items appeared. The links between nodes reflect the collaboration between different items. In addition, VOSviewer (version 1.6.19, https://www.vosviewer.com ) ( 22 ) is a software tool designed by van Eck and Waltman, mainly used to construct and visualize bibliometric networks. After importing the data, we used it to analyze and visualize the co-cited authors and co-cited publications. The bibliometrix package (version 4.2.2, http://www.bibliometrix.org ) ( 23 ) in R can also be used to analyze and visualize scientific literature. In our study, we primarily used this package to count, analyze, and visualize various aspects such as national geographies, journal trends, top citations, and authors’ publications per year. The number of publications per year and their respective countries were also exported. Finally, the trends were predicted and visualized using Microsoft Excel 2019 and Origin 2021.

Global trends of publication

The number of annual publications is an indicator of the developmental trend of scientific knowledge in a specific field. A total of 763 publications met the inclusion criteria; the selection process is illustrated in Figure 1 . During screening, 4 non-English publications and 143 publications of an ineligible type were excluded. The included publications consisted of 174 reviews (22.8%) and 589 articles (77.2%). According to the analysis of publication numbers, the annual trend can be divided into two phases. From 2014 to 2018, the number of publications fluctuated between 9 and 32. Since 2019, however, the number of publications has shown an upward trend, peaking in 2022 (186, 24.38%). The average annual number of publications was 76.3, and the annual growth rate was 17.7%. A growth trend model was constructed using Microsoft Excel 2019 with the following equation: Y = 2.6326x² − 7.4977x + 16.183 (R² = 0.9368), where x represents the publication year index (starting from 2014) and Y represents the annual number of publications ( Figure 2A ).
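
The fitted quadratic can be evaluated directly to see what it implies for individual years. The sketch below is illustrative only; it assumes the index x equals 1 for 2014, 2 for 2015, and so on, which the text does not state explicitly, and the function name `fitted_publications` is hypothetical.

```python
def fitted_publications(year, first_year=2014):
    """Evaluate the reported trend Y = 2.6326x^2 - 7.4977x + 16.183."""
    # Assumption: x = 1 corresponds to 2014, x = 2 to 2015, and so on.
    x = year - first_year + 1
    return 2.6326 * x**2 - 7.4977 * x + 16.183


for year in (2014, 2022, 2030):
    print(year, round(fitted_publications(year), 1))
```

Under this indexing the model gives roughly 650 publications for 2030, which agrees with the projection quoted later in the paper and suggests the indexing assumption is the intended one.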

Figure 2 (A) Number of annual research publications and growth trends. (B) The visualization of country. (C) The geographical distribution.

Analysis of country or region

A total of 36 countries/regions have published relevant studies. The details of the top 10 most productive countries, along with their H-indexes in the field of oncology in 2022, are presented in Figure 2B . Ranked by number of publications, China and the United States were the most productive countries. In particular, China ranked first with 402 articles (52.69%), followed by the United States with 101 (13.24%) and Japan with 44 (5.77%). The H-index is a measure of a scholar’s or country’s level of scientific research. It is often used to assess the corresponding scholar’s or country’s influence and contributions to the academic field ( 24 ). By analyzing the H-index in the field of oncology, we found that the United States has the highest overall scientific impact on oncology research (score = 848), followed by the United Kingdom, Germany, and France. Although China is ranked first in the number of publications, its H-index is relatively low (score = 327), indicating a need to enhance both the quality and the academic impact of its research. The world map in Figure 2C illustrates the overall distribution of publications across countries and regions. Darker colors indicate a higher number of publications in those regions. The map also illustrates the collaboration between countries and regions, such as China and the United States, China and Australia, and the United States and Brazil. The single-country and multinational cooperative publications for the top 10 countries are presented in Supplementary Table 1 . It can be observed that China ( 25 ) ranked first with the highest number of internationally co-authored articles, followed by the United States ( 26 ) and Germany ( 9 ). 
However, considering the variation in the total number of publications per country, we utilize statistical analysis to compare them in terms of percentages. Consequently, the rate of cooperative papers published by Australia (70%) was much higher than that of France (54.5%), India (53.8%), and Germany (34.6%). Additionally, Supplementary Table 2 presents the total citation data per country, with China ranking first (5,103), followed by the United States (3,284), and Japan (735). The average number of citations per paper for articles reflects the quality and influence of publications in each country. In sum, Australia ranks first in average citations with 94.8, while the United Kingdom comes in second with 93.
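
The country shares quoted in this section follow from the publication counts and the total of 763 included records. As a quick consistency check, the sketch below recomputes them; the variable and function names are illustrative, and only the three counts stated in the text are used.

```python
TOTAL_PUBLICATIONS = 763

# Publication counts reported in the text for the three most productive countries.
country_counts = {"China": 402, "United States": 101, "Japan": 44}


def share_percent(count, total=TOTAL_PUBLICATIONS):
    """Return a count as a percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {country: share_percent(n) for country, n in country_counts.items()}
print(shares)
```

The same helper can be applied to the journal counts reported further on, since those percentages also use 763 as the denominator.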

Analysis of institutions

More than 400 institutions engaged in research on lung cancer and the tumor microenvironment between 2014 and 2023. The top 10 most prolific institutes are shown in Supplementary Table 3 . We established an institutional collaboration network using the CiteSpace software ( Figure 3A ). The network contains a total of 297 nodes and 477 edges, which represent cooperation between established institutions. The top 5 most prolific institutions are Tongji University (24 publications), Nanjing Medical University (21), Peking Union Medical College of the Chinese Academy of Medical Sciences (17), Sichuan University (16), and Sun Yat-sen University (16). In total, 20 institutions have published more than 10 publications, and the majority of these research institutions are located in China.

Figure 3 (A) The visualization of institution. (B) The visualization of author. (C) Cumulative publication trend of the top journals. (D) The dual-map overlay of journals.

Analysis of journals

Publications related to the tumor microenvironment and lung cancer have been published in 262 journals. Details of the top 10 journals are listed in Supplementary Table 4 , including country, academic district, impact factor, H-index, and total citations. Frontiers in Immunology published the highest number of papers (n = 57, 7.47%), followed by Cancers (n = 44, 5.77%), Journal for Immunotherapy of Cancer (n = 37, 4.85%), and Frontiers in Oncology (n = 31, 4.06%). There are 11 journals with more than 10 publications. Impact factor (IF) and academic reputation are important criteria for evaluating research outcomes and academic excellence. The journal with the highest impact factor in 2022 is the Journal of Thoracic Oncology (IF = 20.4), followed by the Journal for Immunotherapy of Cancer (IF = 10.9) and Frontiers in Immunology (IF = 7.3). Although the majority of journals are primarily located in Europe, there are also a few in Asia, the Americas, and Oceania. The H-index of a journal is the largest number H such that H of its publications have each been cited at least H times. It is also an indicator of academic quality and influence. In general, a journal with a higher H-index has greater impact. The journal with the largest H-index and the largest total citations is the Journal for Immunotherapy of Cancer (H-index = 14, total citations = 877), followed in H-index by Frontiers in Immunology (H-index = 13), Cancers (H-index = 11), and Frontiers in Oncology (H-index = 10). The top 3 cited journals also include the Journal of Thoracic Oncology (total citations = 732) and Frontiers in Immunology (total citations = 628). The cumulative publications of the top 10 journals are visualized in Figure 3C using the bibliometrix package in R. A noticeable increasing trend in the publications of many journals can be observed. 
It is worth noting that Frontiers in Immunology and Cancers have demonstrated a particularly notable increase from 2020 to 2023, with growth exceeding that of other journals during the same period. The dual-map overlay of scholarly journals illustrates the relationship between citing and cited journals ( Figure 3D ). Labels indicate the subject areas of the journals, and colored lines represent different citation paths, with the width of the paths proportional to the z-score levels ( 27 ). The two main pathways were (1) molecular, biology, and immunology - molecular, biology, genetics (z = 5.8265, f = 5754); and (2) medicine, medical, clinical - molecular, biology, genetics (z = 3.0848, f = 3173).
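
Because the H-index is used in this paper for countries, journals, and authors alike, a short worked example may help. The sketch below implements the standard definition (the largest h such that h items have at least h citations each); the citation counts are hypothetical and are not data from this study.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` items all have >= rank citations
        else:
            break         # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for six papers (not data from this study).
example = [25, 8, 5, 3, 3, 1]
print(h_index(example))
```

In this example three papers have at least three citations each, but there are not four papers with at least four, so the H-index is 3.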

Analysis of authors and cited authors

The top 10 most productive authors in this field are presented in Supplementary Table 3 . Zhou, Caicun from Tongji University Affiliated Shanghai Pulmonary Hospital, and Savai, Rajkumar from the Department of Lung Development and Remodeling were the two most prolific authors, with 10 and 8 publications, respectively. Moreover, most of the top authors are from China, which suggests that China is still in the leading position in this field. The author collaboration network drawn with CiteSpace ( Figure 3B ), which contains a total of 382 nodes and 794 edges, indicates that authors with a higher number of papers typically collaborate with regular co-authors and teams.

Co-cited author analysis is a method used to assess the influence and contributions of authors within the academic community based on how frequently they have been cited in scholarly literature. This type of analysis can help researchers gain insights into academic trends, research hotspots, and academic authorities in a particular field.

The network structure of co-cited authors was analyzed using VOSviewer software, as depicted in Figure 4A . The analysis revealed a total of 26,096 co-cited authors, with 51 authors cited more than 40 times. Supplementary Table 5 lists the top 10 cited authors, their affiliations, and their H-indexes. Martin Reck from the LungenClinic Grosshansdorf had the largest total number of citations (n = 178), followed by Roy S. Herbst from Yale University (n = 172) and Rebecca L. Siegel from the American Cancer Society (n = 161). Most of these co-cited authors are from Europe or North America. Furthermore, the author with the highest H-index among the top 10 is Alberto Mantovani (187), who has made a significant impact on scholarship in this field. Following Mantovani is Ahmedin Jemal (139) in second place, and then Douglas Hanahan (109).

Figure 4 (A) The co-cited authors analysis. (B) The visualization of keyword. (C) Timeline view for keywords. (D) The three-field plot of lung cancer and the tumor microenvironment.

Analysis of keywords and timeline

The keyword co-occurrence analysis involves extracting and analyzing the keywords in the publications. It helps identify the main topics, core contents, and key points so that the information can be well understood and utilized ( 26 ). The timeline view map is typically used to illustrate the evolution of specific research subjects or keywords over time, which enables researchers to observe research hotspots, citation relationships, and trends related to a particular topic or field across distinct timeframes. Furthermore, it facilitates an enhanced understanding of research dynamics and field evolution. The relationship between keywords ( Figure 4B ) is visualized as a network constructed with the CiteSpace software, which consists of 401 nodes and 2766 edges. The size of a node is proportional to the frequency of occurrence of the associated keyword within the field: the larger the node, the greater the frequency. Contour color reflects the year of occurrence, with darker colors indicating earlier years, while centrality indicates key nodes within a network. The top 20 keywords and their centrality are shown in Supplementary Table 6 . Only the keyword “breast cancer” has a centrality greater than 0.1, while the rest have centrality values below 0.1. This indicates that there are few key nodes in the network. Keywords with higher frequencies include “tumor microenvironment,” “lung cancer,” “expression,” “cancer,” “non-small cell cancer,” and so on. As demonstrated in Figure 4C , the evolution of the tumor microenvironment and lung cancer in this field is delineated through the log-likelihood ratio (LLR) algorithm, the modularity value (Q-value), and the mean silhouette (S-value), which serve as crucial metrics for evaluating the outcomes of graph plotting. 
The Q-value of the graph reached 0.3555 (>0.3), indicating that the network divides reasonably into loosely coupled clusters. The S-value of 0.6853 indicates that the homogeneity within the clusters is credible and that the clustering configuration, which comprises nine clusters, is reasonable. These clusters are based on the keywords “macrophage,” “EGFR mutation,” “tumor progression,” “drug resistance,” “immune infiltration,” “metabolism,” “lung cancer,” “single-cell RNA sequencing,” and “machine learning.” Additionally, the association between the top 10 countries, institutions, and shared keywords was analyzed using R-bibliometrix ( Figure 4D ).
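
To make the idea of a keyword co-occurrence network concrete, the sketch below counts node frequencies and edge weights from keyword lists. It is a simplified illustration under stated assumptions, not CiteSpace's implementation: the three records are hypothetical, and clustering, centrality, and node selection are left out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists for three records (not data from this study).
records = [
    ["tumor microenvironment", "lung cancer", "macrophage"],
    ["lung cancer", "tumor microenvironment", "drug resistance"],
    ["macrophage", "lung cancer"],
]

node_freq = Counter()    # number of records mentioning each keyword (node size)
edge_weight = Counter()  # number of records mentioning both keywords (edge weight)

for keywords in records:
    unique = sorted(set(keywords))               # drop duplicates within a record
    node_freq.update(unique)
    edge_weight.update(combinations(unique, 2))  # sorted, so each pair has one key

print(node_freq["lung cancer"])
print(edge_weight[("lung cancer", "tumor microenvironment")])
```

Sorting the keywords before pairing them ensures that each unordered pair maps to a single dictionary key, so the counts correspond to undirected edges.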

Analysis of co-citation and burst reference

The 763 publications have been cited a total of 13,663 times, with an average of 17.91 citations per paper. The top 10 most frequently cited publications are presented in Supplementary Table 7 ( 28 – 37 ). The article by Koyama S et al. ( 32 ), published in Cancer Research in 2016 and titled “STK11/LKB1 Deficiency Promotes Neutrophil Recruitment and Proinflammatory Cytokine Production to Suppress T-cell Activity in the Lung Tumor Microenvironment,” ranked first in this field with a total of 374 citations. Three reviews were included in the top 10 most cited publications. The article by Koyama S et al. ( 32 ) received an average of 41.56 citations per year, followed by that of Bremnes RM et al. ( 29 ), which was cited 37.11 times per year.
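
The averages quoted in this paragraph can be reproduced with simple arithmetic. The sketch below is a consistency check under one assumption that the source does not state: citations per year are taken over an inclusive span from the publication year to 2024, the year of data retrieval. The function names are illustrative.

```python
def mean_citations(total_citations, n_publications):
    """Average citations per publication."""
    return total_citations / n_publications


def citations_per_year(total_citations, pub_year, census_year=2024):
    """Average yearly citations over an inclusive span of years."""
    # Assumption: the span is census_year - pub_year + 1 (not stated in the source).
    return total_citations / (census_year - pub_year + 1)


print(round(mean_citations(13663, 763), 2))
print(round(citations_per_year(374, 2016), 2))
```

With the inclusive nine-year span for a 2016 paper, the sketch returns the 17.91 citations per paper and 41.56 citations per year reported above.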

In this field, there are 36,868 co-cited references, of which 72 had more than 20 co-citations. As demonstrated in Figure 5A , the network graph of references with more than 35 co-citations, built with the VOSviewer software, has a total of 322 edges and a total link strength of 1882. Besides, the top 10 co-cited references are listed in Supplementary Table 8 ( 25 , 38 – 46 ). The reference “Global Cancer Statistics” by Ahmedin Jemal in CA: A Cancer Journal for Clinicians had the greatest number of citations ( 25 ), while “Hallmarks of Cancer: The Next Generation” ( 40 ) and “Cancer Statistics, 2021” ( 45 ) follow closely. As can be seen in the table, the top ten co-cited references were mainly published between 2011 and 2021. Two articles were published in the New England Journal of Medicine, two in CA: A Cancer Journal for Clinicians, and the remaining articles in various other journals.

Figure 5 (A) The co-cited references analysis. (B) The top twenty-five references with the strongest citation bursts.

A citation burst refers to a reference that is cited frequently by a wide range of researchers within a specific period. Based on a co-citation analysis of the 763 documents imported into CiteSpace, the top 25 references with the strongest citation bursts are displayed in Figure 5B . In the figure, the blue line represents the timeline, and the red segments mark the burst periods, indicating the start and end years of each burst ( 47 ). These bursts began as early as 2015 and lasted as late as 2023. The reference with the strongest citation burst was published in 2015 by Borghaei et al. ( 39 ) in the New England Journal of Medicine, with a burst strength of 12 and a burst period from 2016 to 2020. Overall, the burst strength of these 25 references ranged from 3.68 to 12. The majority of the references with the strongest citation bursts were published in the New England Journal of Medicine ( 7 ).

To the best of our knowledge, this paper is the first bibliometric article in this field. The study involves a statistical analysis of the hotspots and trends related to lung cancer and the tumor microenvironment over the past decade using bibliometrics. This analysis utilizes software tools like CiteSpace, VOSviewer, and the R programming language. As illustrated in Figure 2A , the research in this field demonstrates a persistent upward trend, with no significant growth observed between 2014 and 2018. However, the number of publications has accelerated notably since 2019, reaching comparably high levels in 2022 and 2023. The number of annual publications surpassed 100 for the first time in 2021, suggesting that this field is gradually attracting widespread attention from researchers and that the prospect of development is promising. The number of annual publications in the field is expected to reach approximately 650 by 2030, as predicted by the fitted model. By analyzing national and regional publications, it is evident that China and the United States are the primary research hubs in this field. This trend is closely linked to the concentration of research technology and top-tier talent in both countries. In addition, another important factor contributing to the annual increase in the number of articles published in China is likely the large population base in China and the continuously rising incidence and mortality rate of lung cancer ( 48 ), a topic that receives widespread attention and support for research. The data presented in Supplementary Table 1 indicate a relatively low proportion of collaborative research among countries in this field, with the majority of such research being conducted in Europe and Oceania. 
Moreover, the average citation ranking of articles suggests that academic research conducted through international collaborations is more likely to attract greater attention and have a higher academic impact. While the East Asia region shows a significant publication volume, the level of inter-country cooperation is comparatively low. Therefore, it is evident that cooperation across countries will contribute to improving resource efficiency and academic development in this field in the future.

Research institutes refer to organizations or units specialized in scientific research, technological development, and innovation. This typically includes universities, research institutes, laboratories, and similar entities. These institutions can reflect the background and sources of research findings, as well as directly affect the quality and credibility of academic research. This study finds that the primary research institutions in this field are situated in China, which is closely associated with the large number of publications in the Chinese region. Meanwhile, Tongji University in China, which ranked first, has been at the forefront of lung cancer research, strongly correlated with the institution’s investment and adequate resources in research. As shown in Supplementary Table 4 , most journals published in this field are predominantly located in the Q1 and Q2 quartiles of the Journal Citation Reports (JCR), indicating that they are highly valued by researchers worldwide. Nevertheless, the impact factor of the journals is relatively low, suggesting that scholarship in this area of research still needs improvement.

In terms of authors, Prof. Caicun Zhou from Tongji University in China is the most productive author. He has a long-term involvement in research on the molecular mechanisms and clinical effectiveness of drugs in lung cancer. In particular, Prof. Zhou has made significant contributions in the following areas: mechanisms of lung cancer drugs, immunotherapy for lung cancer, and the establishment of a model for predicting the risk of recurrence of lung cancer ( 37 , 49 – 51 ). The second most prolific author is Professor Rajkumar Savai from the Department of Lung Development and Remodeling. He specializes in researching the association between lung cancer and the tumor microenvironment through tumor signaling pathways, macrophages, immunotherapy, etc. ( 52 – 55 ).

The most cited article in this field is a paper published in 2016 by Koyama S et al. regarding the effect of STK11/LKB1 deletion on the immune microenvironment in a KRAS-driven mouse model of NSCLC. The study found that immune escape is mediated by the suppression of myeloid cells and aberrant cytokine production in LKB1-deficient tumors ( 32 ). Next is a review of tumor-infiltrating lymphocytes (TILs) and non-small cell lung cancer by Bremnes RM and colleagues in 2016 ( 29 ), as well as a review of the tumor microenvironment and metastatic mechanisms in lung cancer by Wood SL et al. in 2014 ( 35 ).

Research hotspots and frontiers

Keywords and cluster analysis are significant tools for identifying the hotspots and frontiers of research. As a result, we summarize the analysis to provide insight into the interconnection between lung cancer and the tumor microenvironment, suggesting several potential directions for future research.

The first topic to be considered is the relationship between gene expression and signaling pathways in lung cancer in the context of the tumor microenvironment. From the clustering in Figure 4C , it can be concluded that macrophages represent a significant category within the field of study. These cells commonly originate from bone marrow hematopoietic stem cells and subsequently differentiate into circulating monocytes in the peripheral blood. They then migrate to various tissues and organs within the body, including the skeletal system, lungs, and liver, where they eventually transform into tissue-specific macrophages ( 56 , 57 ). Specifically, macrophages can be divided into two subtypes depending on their activation status, function, and secreted cytokines. One subtype, M1-type macrophages, is activated by LPS, IFN-γ, or TNF and secretes high levels of pro-inflammatory cytokines. These cytokines play a role in the cellular immune response facilitated by type I helper T cells. The other subtype, M2-type macrophages, is polarized by Th2 cytokines. They induce immunosuppression, participate in pro-carcinogenic functions, and promote tumor growth and metastasis ( 58 – 60 ). Liu et al. ( 61 ) demonstrated that glucose metabolic pathways are intricately linked with polarization state shifts in tumor-associated macrophages. M1 macrophages upregulate glycolysis and the pentose phosphate pathway to trigger inflammation, while M2 macrophages rely more on the tricarboxylic acid cycle and mitochondrial metabolism to suppress anti-tumor immunity and promote tumor metastasis. In addition, cancer-associated fibroblasts (CAFs) were identified as a potential trend and direction for future research in this field, based on the citation burst analysis. CAFs influence the creation of extracellular matrix (ECM) structures and metabolism in the TME. 
They play a crucial role in regulating tumor immunity and resistance to chemotherapeutic agents ( 62 ). By constructing a mouse model of KRAS-mutant lung adenocarcinoma with STK11/LKB1 co-mutation, Qiao et al. ( 63 ) found that focal adhesion kinase (FAK) inhibitors inhibited the activation of CAFs and further promoted the infiltration of CD8 T cells, DC cells, and M1-type macrophages into the tumors, thus remodeling the tumor microenvironment. By analyzing genomics and proteomics data, Samart et al. ( 64 ) concluded that Musashi-2 (MSI2) in CAFs has a potential impact on the invasive and metastatic spread of NSCLC cells. Additionally, a new complex regulatory axis involving MSI2/IL-6 was identified, indicating an interaction between NSCLC-derived CAFs and NSCLC cells through paracrine signaling. Cords et al. ( 65 ) found differences in the spatial distribution of distinct CAF phenotypes in the TME and identified specific CAF phenotypes that were associated with good versus poor patient prognosis. Tumor metabolism essentially refers to the abnormal metabolic features of tumor cells that contribute to the proliferation and progression of tumors. This involves glucose uptake and utilization, as well as nucleotide synthesis. Metabolism-targeted therapies are now widely discussed ( 66 , 67 ). In line with the research focus of this field, lung cancer offers a promising path for exploring tumor metabolism and the tumor microenvironment in precision medicine. Tumor metabolism has a significant impact on the tumor microenvironment, thus affecting the proliferation of lung cancer cells. Moreover, this research area offers insights that can inform the treatment of lung cancer ( 68 ). Liu et al. ( 69 ) predicted the prognosis of lung adenocarcinoma (LUAD) and the efficacy of various immunotherapies by constructing a model of glutamine metabolism. 
Furthermore, one study has demonstrated that deoxypodophyllotoxin (DPT) inhibits glycolysis by preventing the overexpression of HIF-1α, which in turn suppresses cell proliferation in NSCLC ( 70 ).

Secondly, we examine the links and interactions between lung cancer treatment and the tumor microenvironment. The analysis of keywords and clusters has revealed that scholars have devoted considerable attention to research on lung cancer treatment, including studies on tumor progression and drug resistance. The conventional treatment options include surgical intervention, radiation therapy, and chemotherapy. Surgery is considered the most effective clinical treatment for lung cancer ( 71 ). Yet surgery is often limited to patients with operable stage I and II NSCLC, for whom it is recommended as the preferred local treatment modality ( 8 , 72 ). On the other hand, radiotherapy applies to all stages of lung cancer, affecting the functions of the immune system in various ways. With advances in imaging, radiotherapy is now faster and more precise in the treatment of lung cancer ( 73 ). Chemotherapy has a high frequency of keyword occurrences in this field, and platinum-based adjuvant chemotherapy is commonly used in patients diagnosed with stage II and stage III NSCLC ( 74 ). A combination of two cytotoxic drugs is recommended as first-line treatment for advanced metastatic NSCLC ( 8 ). However, chemotherapy drugs can cause central and peripheral neurotoxicity, cardiotoxicity, gastrointestinal toxicity, and hematologic toxicity in humans ( 75 , 76 ). From the perspective of immune mechanisms, chemotherapeutic agents can enhance the immune response by inducing immunogenic cell death, potentiating T-cell activation, and increasing the activity of tumor-killing immune cells ( 77 ). However, the tumor microenvironment also contributes to an increase in resistance to chemotherapy drugs ( 78 ). With ongoing research on tumor molecules, targeted therapy is becoming a crucial treatment for non-small cell lung cancer. EGFR, KRAS, and ALK are the primary susceptibility genes for common NSCLC driver mutations ( 79 , 80 ). 
Among these, EGFR mutation is a prominent term in the cluster analysis of this field. Targeted therapies act directly on tumor cells, mainly by blocking specific signaling pathways, and the immune microenvironment interacts with these pathways to affect targeted drug sensitivity ( 81 ).

The tumor microenvironment also influences targeted drug resistance. EGFR mutations may decrease the number of CD8+ cells, elevate the Treg population, and activate the STAT-3 intracellular pathway, leading to immune escape and increased resistance to targeted drugs ( 82 ). Moreover, vascular endothelial growth factor (VEGF)-targeted drugs can affect the tumor microenvironment in NSCLC by inhibiting immune escape, normalizing tumor vasculature, modulating T-cell numbers, and increasing tumor immune cells ( 83 ). The field of immunotherapy has seen significant progress in the last decade, introducing new therapies that have become a standard treatment for patients with stage III or IV NSCLC. The tumor microenvironment plays a significant role in determining both the sensitivity and resistance to immune drugs ( 84 ). Immune checkpoint inhibitors (ICIs) represent another significant cluster in this field, encompassing a range of molecules, including PD-1, PD-L1, and CTLA-4, which exert anti-tumor effects by regulating the interaction of Treg cells with antigen-presenting cells or tumor cells ( 85 ). The antitumor efficacy of PD-1/PD-L1 blockers has been shown to correlate with an increased presence of CD8+ tumor-infiltrating lymphocytes and the overexpression of chemokines and cytokines in the tumor microenvironment ( 86 ). The efficacy of immune checkpoint inhibitors is linked to the activation of effector immune cells, such as tumor-infiltrating lymphocytes, dendritic cells, and others. Conversely, resistance is mainly associated with the infiltration of immunosuppressive cells, including regulatory T (Treg) cells, myeloid-derived suppressor cells (MDSCs), and tumor-associated macrophages, as well as the recruitment of chemokines and high expression of VEGF ( 87 ). Compared with NSCLC, SCLC shows no significant response to immunotherapy. 
In addition to the low expression of PD-L1 in tumor cells, various factors such as low expression or deletion of MHC I and MHC II proteins in the tumor microenvironment, and inhibition of the proliferation of CD4+ cells can also contribute to immune escape ( 88 ).

Thirdly, the discussion will focus on the impact of new technologies and methods on the direction of research within this field. Single-cell RNA sequencing (scRNA-seq), one of the crucial research techniques in the field, is a novel approach that allows for the examination of the transcriptome of individual cells within a sequenced sample. This method facilitates the analysis of cell types and heterogeneity in gene expression ( 89 ). Hu et al. ( 90 ) demonstrated, using single-cell techniques and in vivo experiments, that tumor-associated macrophages (TAMs) promote IL-6 expression through the formation of an IL6-STAT3-C/EBPβ-IL6 positive feedback loop. This loop, in turn, induces the epithelial-to-mesenchymal transition (EMT) pathway as a mechanism to enhance migration, invasion, and metastasis in lung cancer. By analyzing tumor tissue using scRNA-seq, Han et al. ( 91 ) found that osimertinib in combination with anti-angiogenic agents increased the number of CD8 T cells and the proliferation of T cells compared with a single agent. Through the analysis of scRNA-seq data, Mao et al. ( 92 ) demonstrated that the expression of the CDC25C gene affects the invasion and migration of lung cancer cells, suggesting it may play a crucial role in the EMT pathway. Machine learning is also a significant research area in this field. It broadly encompasses a variety of models and strategies. Currently, the use of machine learning methods in medical research is gradually increasing ( 93 ). Cury et al. ( 94 ) employed machine learning modeling to predict the impact of the pectoralis major muscle region on NSCLC.

Limitations

This study is subject to certain limitations. Firstly, the research data were exclusively retrieved from the SSCI and SCI-E databases of WoSCC, which may lead to incomplete data and related results. Secondly, the data we selected were limited to publications in English. This restriction, together with the exclusion of books, conference papers, and other types of publications, may have resulted in the omission of some articles. Thirdly, although this study employed rigorous screening criteria and a comprehensive double search and review process, the search formula may not fully encompass all relevant research findings in this field, potentially leading to the omission of crucial research contributions. Fourthly, since the software tools do not analyze data in the same manner, discrepancies may exist in some of the results.

Conclusions

In this research, bibliometric methods were used to visualize articles on lung cancer and the tumor microenvironment published between 2014 and 2023. This approach enabled researchers to gain insight into the current status, frontiers, and hotspots in this field. The findings indicate that the number of publications in this field is generally increasing. The majority of these publications are authored by researchers in China and the United States. Additionally, research on the correlation between the tumor microenvironment and lung cancer molecular signaling pathways and therapy is gaining increasing attention. These fields are expected to be significant focal points for future research on lung cancer and the tumor microenvironment.

Data availability statement

The original contributions presented in the study are included in the article/ Supplementary Material . Further inquiries can be directed to the corresponding author.

Author contributions

ZH: Data curation, Formal Analysis, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. TX: Formal Analysis, Investigation, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. WX: Funding acquisition, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. ZC: Data curation, Investigation, Software, Validation, Visualization, Writing – review & editing. ZW: Investigation, Methodology, Software, Validation, Visualization, Writing – review & editing. LY: Investigation, Methodology, Software, Validation, Visualization, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Sanming Project of Medicine in Shenzhen (grant number SZZYSM202311001).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2024.1428018/full#supplementary-material

References

1. Schabath MB, Cote ML. Cancer progress and priorities: lung cancer. Cancer Epidemiol Biomarkers Prev . (2019) 28:1563–79. doi: 10.1158/1055-9965.EPI-19-0221

2. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin . (2023) 73:17–48. doi: 10.3322/caac.21763

3. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin . (2021) 71:209–49. doi: 10.3322/caac.21660

4. Muthusamy B, Berktas M, Li J, Thomas DS, Sun P, Taylor A, et al. EGFR mutation testing, treatment and survival in stage I-III non-small-cell lung cancer: CancerLinQ Discovery database retrospective analysis. Future Oncol . (2024), 1–14. doi: 10.1080/14796694.2024.2347826

5. Dai J, Su Y, Zhong S, Cong L, Liu B, Yang J, et al. Exosomes: key players in cancer and potential therapeutic strategy. Signal Transduct Target Ther . (2020) 5:145. doi: 10.1038/s41392-020-00261-0

6. Jin MZ, Jin WL. The updated landscape of tumor microenvironment and drug repurposing. Signal Transduct Target Ther . (2020) 5:166. doi: 10.1038/s41392-020-00280-x

7. Zulfiqar B, Farooq A, Kanwal S, Asghar K. Immunotherapy and targeted therapy for lung cancer: Current status and future perspectives. Front Pharmacol . (2022) 13:1035171. doi: 10.3389/fphar.2022.1035171

8. Lemjabbar-Alaoui H, Hassan OU, Yang YW, Buchanan P. Lung cancer: Biology and treatment options. Biochim Biophys Acta . (2015) 1856:189–210. doi: 10.1016/j.bbcan.2015.08.002

9. de Visser KE, Joyce JA. The evolving tumor microenvironment: From cancer initiation to metastatic outgrowth. Cancer Cell . (2023) 41:374–403. doi: 10.1016/j.ccell.2023.02.016

10. Korbecki J, Olbromski M, Dzięgiel P. CCL18 in the progression of cancer. Int J Mol Sci . (2020) 21(21):7955. doi: 10.3390/ijms21217955

11. Zeng W, Xiong L, Wu W, Li S, Liu J, Yang L, et al. CCL18 signaling from tumor-associated macrophages activates fibroblasts to adopt a chemoresistance-inducing phenotype. Oncogene . (2023) 42:224–37. doi: 10.1038/s41388-022-02540-2

12. Xiang Y, Liu X, Wang Y, Zheng D, Meng Q, Jiang L, et al. Mechanisms of resistance to targeted therapy and immunotherapy in non-small cell lung cancer: promising strategies to overcoming challenges. Front Immunol . (2024) 15:1366260. doi: 10.3389/fimmu.2024.1366260

13. Hirayama A, Tanaka K, Tsutsumi H, Nakanishi T, Yamashita S, Mizusaki S, et al. Regulation of PD-L1 expression in non-small cell lung cancer by interleukin-1β. Front Immunol . (2023) 14:1192861. doi: 10.3389/fimmu.2023.1192861

14. Kao KC, Vilbois S, Tsai CH, Ho PC. Metabolic communication in the tumour-immune microenvironment. Nat Cell Biol . (2022) 24:1574–83. doi: 10.1038/s41556-022-01002-x

15. Lim AR, Rathmell WK, Rathmell JC. The tumor microenvironment as a metabolic barrier to effector T cells and immunotherapy. Elife . (2020) 9:e55185. doi: 10.7554/eLife.55185

16. Zhou Y, Cheng L, Liu L, Li X. NK cells are never alone: crosstalk and communication in tumour microenvironments. Mol Cancer . (2023) 22:34. doi: 10.1186/s12943-023-01737-7

17. Barta JA, Powell CA, Wisnivesky JP. Global epidemiology of lung cancer. Ann Glob Health . (2019) 85(1):8. doi: 10.5334/aogh.2419

18. Clément-Duchêne C, Vignaud JM, Stoufflet A, Bertrand O, Gislard A, Thiberville L, et al. Characteristics of never smoker lung cancer including environmental and occupational risk factors. Lung Cancer . (2010) 67:144–50. doi: 10.1016/j.lungcan.2009.04.005

19. Thompson DF, Walker CK. A descriptive and historical review of bibliometrics with applications to medical sciences. Pharmacotherapy . (2015) 35:551–9. doi: 10.1002/phar.1586

20. Abouzid M, Karaźniewicz-Łada M, Abdelazeem B, Brašić JR. Research trends of vitamin D metabolism gene polymorphisms based on a bibliometric investigation. Genes (Basel) . (2023) 14(1):215. doi: 10.3390/genes14010215

21. Chen C. Science mapping: A systematic review of the literature. J Data Inf Sci . (2017) 2:1–40. doi: 10.1515/jdis-2017-0006

22. van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics . (2010) 84:523–38. doi: 10.1007/s11192-009-0146-3

23. Massimo A, Corrado C. bibliometrix: An R-tool for comprehensive science mapping analysis. J Informetrics . (2017) 11:959–75. doi: 10.1016/j.joi.2017.08.007

24. Xia DM, Wang XR, Zhou PY, Ou TL, Su L, Xu SG. Research progress of heat stroke during 1989-2019: a bibliometric analysis. Mil Med Res . (2021) 8:5. doi: 10.1186/s40779-021-00300-z

25. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer statistics. CA Cancer J Clin . (2011) 61:69–90. doi: 10.3322/caac.v61:2

26. Zhao J, Li M. Worldwide trends in prediabetes from 1985 to 2022: A bibliometric analysis using bibliometrix R-tool. Front Public Health . (2023) 11:1072521. doi: 10.3389/fpubh.2023.1072521

27. Xu Q, Zhou Y, Zhang H, Li H, Qin H, Wang H. Bibliometric analysis of hotspots and frontiers of immunotherapy in pancreatic cancer. Healthcare (Basel) . (2023) 11(3):304. doi: 10.3390/healthcare11030304

28. Badalamenti G, Fanale D, Incorvaia L, Barraco N, Listì A, Maragliano R, et al. Role of tumor-infiltrating lymphocytes in patients with solid tumors: Can a drop dig a stone? Cell Immunol . (2019) 343:103753. doi: 10.1016/j.cellimm.2018.01.013

29. Bremnes RM, Busund LT, Kilvær TL, Andersen S, Richardsen E, Paulsen EE, et al. The role of tumor-infiltrating lymphocytes in development, progression, and prognosis of non-small cell lung cancer. J Thorac Oncol . (2016) 11:789–800. doi: 10.1016/j.jtho.2016.01.015

30. Caetano MS, Zhang H, Cumpian AM, Gong L, Unver N, Ostrin EJ, et al. IL6 blockade reprograms the lung tumor microenvironment to limit the development and progression of K-ras-mutant lung cancer. Cancer Res . (2016) 76:3189–99. doi: 10.1158/0008-5472.CAN-15-2840

31. Faget J, Groeneveld S, Boivin G, Sankar M, Zangger N, Garcia M, et al. Neutrophils and snail orchestrate the establishment of a pro-tumor microenvironment in lung cancer. Cell Rep . (2017) 21:3190–204. doi: 10.1016/j.celrep.2017.11.052

32. Koyama S, Akbay EA, Li YY, Aref AR, Skoulidis F, Herter-Sprie GS, et al. STK11/LKB1 deficiency promotes neutrophil recruitment and proinflammatory cytokine production to suppress T-cell activity in the lung tumor microenvironment. Cancer Res . (2016) 76:999–1008. doi: 10.1158/0008-5472.CAN-15-1439

33. Lou Y, Diao L, Cuentas ER, Denning WL, Chen L, Fan YH, et al. Epithelial-mesenchymal transition is associated with a distinct tumor microenvironment including elevation of inflammatory signals and multiple immune checkpoints in lung adenocarcinoma. Clin Cancer Res . (2016) 22:3630–42. doi: 10.1158/1078-0432.CCR-15-1434

34. Steggerda SM, Bennett MK, Chen J, Emberley E, Huang T, Janes JR, et al. Inhibition of arginase by CB-1158 blocks myeloid cell-mediated immune suppression in the tumor microenvironment. J Immunother Cancer . (2017) 5:101. doi: 10.1186/s40425-017-0308-4

35. Wood SL, Pernemalm M, Crosbie PA, Whetton AD. The role of the tumor-microenvironment in lung cancer-metastasis and its relationship to potential therapeutic targets. Cancer Treat Rev . (2014) 40:558–66. doi: 10.1016/j.ctrv.2013.10.001

36. Zhang X, Zeng Y, Qu Q, Zhu J, Liu Z, Ning W, et al. PD-L1 induced by IFN-γ from tumor-associated macrophages via the JAK/STAT3 and PI3K/AKT signaling pathways promoted progression of lung cancer. Int J Clin Oncol . (2017) 22:1026–33. doi: 10.1007/s10147-017-1161-7

37. Zhao S, Ren S, Jiang T, Zhu B, Li X, Zhao C, et al. Low-dose apatinib optimizes tumor microenvironment and potentiates antitumor effect of PD-1/PD-L1 blockade in lung cancer. Cancer Immunol Res . (2019) 7:630–43. doi: 10.1158/2326-6066.CIR-17-0640

38. Altorki NK, Markowitz GJ, Gao D, Port JL, Saxena A, Stiles B, et al. The lung microenvironment: an important regulator of tumour growth and metastasis. Nat Rev Cancer . (2019) 19:9–31. doi: 10.1038/s41568-018-0081-9

39. Borghaei H, Paz-Ares L, Horn L, Spigel DR, Steins M, Ready NE, et al. Nivolumab versus docetaxel in advanced nonsquamous non-small-cell lung cancer. N Engl J Med . (2015) 373:1627–39. doi: 10.1056/NEJMoa1507643

40. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell . (2011) 144:646–74. doi: 10.1016/j.cell.2011.02.013

41. Herbst RS, Morgensztern D, Boshoff C. The biology and management of non-small cell lung cancer. Nature . (2018) 553:446–54. doi: 10.1038/nature25183

42. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods . (2015) 12:453–7. doi: 10.1038/nmeth.3337

43. Reck M, Rodríguez-Abreu D, Robinson AG, Hui R, Csőszi T, Fülöp A, et al. Pembrolizumab versus chemotherapy for PD-L1-positive non-small-cell lung cancer. N Engl J Med . (2016) 375:1823–33. doi: 10.1056/NEJMoa1606774

44. Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Lancet . (2017) 389:255–65. doi: 10.1016/S0140-6736(16)32517-X

45. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin . (2021) 71:7–33. doi: 10.3322/caac.21654

46. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun . (2013) 4:2612. doi: 10.1038/ncomms3612

47. Huang X, Fan X, Ying J, Chen S. Emerging trends and research foci in gastrointestinal microbiome. J Transl Med . (2019) 17:67. doi: 10.1186/s12967-019-1810-x

48. Long J, Zhai M, Jiang Q, Li J, Xu C, Chen D. The incidence and mortality of lung cancer in China: a trend analysis and comparison with G20 based on the Global Burden of Disease Study 2019. Front Oncol . (2023) 13:1177482. doi: 10.3389/fonc.2023.1177482

49. Chen P, Zhang L, Zhang W, Sun C, Wu C, He Y, et al. Galectin-9-based immune risk score model helps to predict relapse in stage I-III small cell lung cancer. J Immunother Cancer . (2020) 8(2):e001391. doi: 10.1136/jitc-2020-001391

50. Jiang M, Wu C, Zhang L, Sun C, Wang H, Xu Y, et al. FOXP3-based immune risk model for recurrence prediction in small-cell lung cancer at stages I-III. J Immunother Cancer . (2021) 9(5):e002339. doi: 10.1136/jitc-2021-002339

51. Qiao M, Jiang T, Liu X, Mao S, Zhou F, Li X, et al. Immune checkpoint inhibitors in EGFR-mutated NSCLC: dusk or dawn? J Thorac Oncol . (2021) 16:1267–88. doi: 10.1016/j.jtho.2021.04.003

52. Karger A, Mansouri S, Leisegang MS, Weigert A, Günther S, Kuenne C, et al. ADPGK-AS1 long noncoding RNA switches macrophage metabolic and phenotypic state to promote lung cancer growth. EMBO J . (2023) 42:e111620. doi: 10.15252/embj.2022111620

53. Karger A, Nandigama R, Stenzinger A, Grimminger F, Pullamsetti SS, Seeger W, et al. Hidden treasures: macrophage long non-coding RNAs in lung cancer progression. Cancers (Basel) . (2021) 13(16):4127. doi: 10.3390/cancers13164127

54. Marwitz S, Turkowski K, Nitschkowski D, Weigert A, Brandenburg J, Reiling N, et al. The multi-modal effect of the anti-fibrotic drug pirfenidone on NSCLC. Front Oncol . (2019) 9:1550. doi: 10.3389/fonc.2019.01550

55. Sarode P, Schaefer MB, Grimminger F, Seeger W, Savai R. Macrophage and tumor cell cross-talk is fundamental for lung tumor progression: we need to talk. Front Oncol . (2020) 10:324. doi: 10.3389/fonc.2020.00324

56. Mass E, Ballesteros I, Farlik M, Halbritter F, Günther P, Crozet L, et al. Specification of tissue-resident macrophages during organogenesis. Science . (2016) 353(6304):aaf4238. doi: 10.1126/science.aaf4238

57. Song Y, Hu J, Ma C, Liu H, Li Z, Yang Y. Macrophage-derived exosomes as advanced therapeutics for inflammation: current progress and future perspectives. Int J Nanomedicine . (2024) 19:1597–627. doi: 10.2147/IJN.S449388

58. Boutilier AJ, Elsawa SF. Macrophage polarization states in the tumor microenvironment. Int J Mol Sci . (2021) 22(13):6995. doi: 10.3390/ijms22136995

59. Li C, Xu X, Wei S, Jiang P, Xue L, Wang J. Tumor-associated macrophages: potential therapeutic strategies and future prospects in cancer. J Immunother Cancer . (2021) 9(1):e001341. doi: 10.1136/jitc-2020-001341

60. Orecchioni M, Ghosheh Y, Pramod AB, Ley K. Macrophage Polarization: Different Gene Signatures in M1(LPS+) vs. Classically and M2(LPS-) vs. Alternatively Activated Macrophages. Front Immunol . (2019) 10:1084. doi: 10.3389/fimmu.2019.01084

61. Liu J, Cao X. Glucose metabolism of TAMs in tumor chemoresistance and metastasis. Trends Cell Biol . (2023) 33:967–78. doi: 10.1016/j.tcb.2023.03.008

62. Kalluri R. The biology and function of fibroblasts in cancer. Nat Rev Cancer . (2016) 16:582–98. doi: 10.1038/nrc.2016.73

63. Qiao M, Zhou F, Liu X, Jiang T, Wang H, Li X, et al. Targeting focal adhesion kinase boosts immune response in KRAS/LKB1 co-mutated lung adenocarcinoma via remodeling the tumor microenvironment. Exp Hematol Oncol . (2024) 13:11. doi: 10.1186/s40164-023-00471-6

64. Samart P, Heenatigala Palliyage G, Issaragrisil S, Luanpitpong S, Rojanasakul Y. Musashi-2 in cancer-associated fibroblasts promotes non-small cell lung cancer metastasis through paracrine IL-6-driven epithelial-mesenchymal transition. Cell Biosci . (2023) 13:205. doi: 10.1186/s13578-023-01158-5

65. Cords L, Engler S, Haberecker M, Rüschoff JH, Moch H, de Souza N, et al. Cancer-associated fibroblast phenotypes are associated with patient outcome in non-small cell lung cancer. Cancer Cell . (2024) 42:396–412.e5. doi: 10.1016/j.ccell.2023.12.021

66. Martínez-Reyes I, Chandel NS. Cancer metabolism: looking forward. Nat Rev Cancer . (2021) 21:669–80. doi: 10.1038/s41568-021-00378-6

67. Stine ZE, Schug ZT, Salvino JM, Dang CV. Targeting cancer metabolism in the era of precision oncology. Nat Rev Drug Discovery . (2022) 21:141–62. doi: 10.1038/s41573-021-00339-6

68. Fahrmann JF, Vykoukal JV, Ostrin EJ. Amino acid oncometabolism and immunomodulation of the tumor microenvironment in lung cancer. Front Oncol . (2020) 10:276. doi: 10.3389/fonc.2020.00276

69. Liu J, Shen H, Gu W, Zheng H, Wang Y, Ma G, et al. Prediction of prognosis, immunogenicity and efficacy of immunotherapy based on glutamine metabolism in lung adenocarcinoma. Front Immunol . (2022) 13:960738. doi: 10.3389/fimmu.2022.960738

70. Yang Y, Liu L, Sun J, Wang S, Yang Z, Li H, et al. Deoxypodophyllotoxin inhibits non-small cell lung cancer cell growth by reducing HIF-1α-mediated glycolysis. Front Oncol . (2021) 11:629543. doi: 10.3389/fonc.2021.629543

71. da Cunha Santos G, Shepherd FA, Tsao MS. EGFR mutations and lung cancer. Annu Rev Pathol . (2011) 6:49–69. doi: 10.1146/annurev-pathol-011110-130206

72. Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ Jr., Wu YL, et al. Lung cancer: current therapies and new targeted treatments. Lancet . (2017) 389:299–311. doi: 10.1016/S0140-6736(16)30958-8

73. Vinod SK, Hau E. Radiotherapy treatment for lung cancer: Current status and future directions. Respirology . (2020) 25 Suppl 2:61–71. doi: 10.1111/resp.13870

74. Nagasaka M, Gadgeel SM. Role of chemotherapy and targeted therapy in early-stage non-small cell lung cancer. Expert Rev Anticancer Ther . (2018) 18:63–70. doi: 10.1080/14737140.2018.1409624

75. Feliu J, Heredia-Soto V, Gironés R, Jiménez-Munarriz B, Saldaña J, Guillén-Ponce C, et al. Management of the toxicity of chemotherapy and targeted therapies in elderly cancer patients. Clin Transl Oncol . (2020) 22:457–67. doi: 10.1007/s12094-019-02167-y

76. Livshits Z, Rao RB, Smith SW. An approach to chemotherapy-associated toxicity. Emerg Med Clin North Am . (2014) 32:167–203. doi: 10.1016/j.emc.2013.09.002

77. Xue Y, Gao S, Gou J, Yin T, He H, Wang Y, et al. Platinum-based chemotherapy in combination with PD-1/PD-L1 inhibitors: preclinical and clinical studies and mechanism of action. Expert Opin Drug Delivery . (2021) 18:187–203. doi: 10.1080/17425247.2021.1825376

78. Herzog BH, Devarakonda S, Govindan R. Overcoming chemotherapy resistance in SCLC. J Thorac Oncol . (2021) 16:2002–15. doi: 10.1016/j.jtho.2021.07.018

79. Leiter A, Veluswamy RR, Wisnivesky JP. The global burden of lung cancer: current status and future trends. Nat Rev Clin Oncol . (2023) 20:624–39. doi: 10.1038/s41571-023-00798-3

80. Nooreldeen R, Bach H. Current and future development in lung cancer diagnosis. Int J Mol Sci . (2021) 22(16):8661. doi: 10.3390/ijms22168661

81. Jia Y, Li X, Jiang T, Zhao S, Zhao C, Zhang L, et al. EGFR-targeted therapy alters the tumor microenvironment in EGFR-driven lung tumors: Implications for combination therapies. Int J Cancer . (2019) 145:1432–44. doi: 10.1002/ijc.32191

82. Madeddu C, Donisi C, Liscia N, Lai E, Scartozzi M, Macciò A. EGFR-mutated non-small cell lung cancer and resistance to immunotherapy: role of the tumor microenvironment. Int J Mol Sci . (2022) 23(12):6489. doi: 10.3390/ijms23126489

83. Zhao Y, Guo S, Deng J, Shen J, Du F, Wu X, et al. VEGF/VEGFR-targeted therapy and immunotherapy in non-small cell lung cancer: targeting the tumor microenvironment. Int J Biol Sci . (2022) 18:3845–58. doi: 10.7150/ijbs.70958

84. Dantoing E, Piton N, Salaün M, Thiberville L, Guisier F. Anti-PD1/PD-L1 immunotherapy for non-small cell lung cancer with actionable oncogenic driver mutations. Int J Mol Sci . (2021) 22(12):6288. doi: 10.3390/ijms22126288

85. Castellanos EH, Horn L. Immunotherapy in lung cancer. Cancer Treat Res . (2016) 170:203–23. doi: 10.1007/978-3-319-40389-2_10

86. Wu X, Gu Z, Chen Y, Chen B, Chen W, Weng L, et al. Application of PD-1 blockade in cancer immunotherapy. Comput Struct Biotechnol J . (2019) 17:661–74. doi: 10.1016/j.csbj.2019.03.006

87. Horvath L, Thienpont B, Zhao L, Wolf D, Pircher A. Overcoming immunotherapy resistance in non-small cell lung cancer (NSCLC) - novel approaches and future outlook. Mol Cancer . (2020) 19:141. doi: 10.1186/s12943-020-01260-z

88. Meijer JJ, Leonetti A, Airò G, Tiseo M, Rolfo C, Giovannetti E, et al. Small cell lung cancer: Novel treatments beyond immunotherapy. Semin Cancer Biol . (2022) 86:376–85. doi: 10.1016/j.semcancer.2022.05.004

89. Zhang J, Liu X, Huang Z, Wu C, Zhang F, Han A, et al. T cell-related prognostic risk model and tumor immune environment modulation in lung adenocarcinoma based on single-cell and bulk RNA sequencing. Comput Biol Med . (2023) 152:106460. doi: 10.1016/j.compbiomed.2022.106460

90. Hu Z, Sui Q, Jin X, Shan G, Huang Y, Yi Y, et al. IL6-STAT3-C/EBPβ-IL6 positive feedback loop in tumor-associated macrophages promotes the EMT and metastasis of lung adenocarcinoma. J Exp Clin Cancer Res . (2024) 43:63. doi: 10.1186/s13046-024-02989-x

91. Han R, Guo H, Shi J, Zhao S, Jia Y, Liu X, et al. Osimertinib in combination with anti-angiogenesis therapy presents a promising option for osimertinib-resistant non-small cell lung cancer. BMC Med . (2024) 22:174. doi: 10.1186/s12916-024-03389-w

92. Mao S, Wang Y, Chao N, Zeng L, Zhang L. Integrated analysis of single-cell RNA-seq and bulk RNA-seq reveals immune suppression subtypes and establishes a novel signature for determining the prognosis in lung adenocarcinoma. Cell Oncol (Dordr) . (2024). doi: 10.1007/s13402-024-00948-4

93. Cooray U, Watt RG, Tsakos G, Heilmann A, Hariyama M, Yamamoto T, et al. Importance of socioeconomic factors in predicting tooth loss among older adults in Japan: Evidence from a machine learning analysis. Soc Sci Med . (2021) 291:114486. doi: 10.1016/j.socscimed.2021.114486

94. Cury SS, de Moraes D, Oliveira JS, Freire PP, Dos Reis PP, Batista ML Jr., et al. Low muscle mass in lung cancer is associated with an inflammatory and immunosuppressive tumor microenvironment. J Transl Med . (2023) 21:116. doi: 10.1186/s12967-023-03901-5

Keywords: Lung cancer, the tumor microenvironment (TME), bibliometric, visualized analysis, trend

Citation: Huang Z, Xie T, Xie W, Chen Z, Wen Z and Yang L (2024) Research trends in lung cancer and the tumor microenvironment: a bibliometric analysis of studies published from 2014 to 2023. Front. Oncol. 14:1428018. doi: 10.3389/fonc.2024.1428018

Received: 05 May 2024; Accepted: 16 July 2024; Published: 31 July 2024.

Copyright © 2024 Huang, Xie, Xie, Chen, Wen and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wei Xie, [email protected]

† These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Study Puts a $43 Billion Yearly Price Tag on Cancer Screening

The estimate focused on five cancers for which there is medically recommended screening — breast, cervical, colorectal, lung and prostate — and found that colonoscopies accounted for most of the costs.

Two nurses in green-blue scrubs attend to a patient lying down for a colonoscopy in a hospital room.

By Gina Kolata

The United States spent $43 billion annually on screening to prevent five cancers, according to one of the most comprehensive estimates of medically recommended cancer testing ever produced.

The analysis, published on Monday in The Annals of Internal Medicine and based on data for the year 2021, shows that cancer screening makes up a substantial proportion of what is spent every year on cancer in the United States, which most likely exceeds $250 billion. The researchers focused their estimate on breast, cervical, colorectal, lung and prostate cancers, and found that more than 88 percent of screening was paid for by private insurance and the rest mostly by government programs.

Dr. Michael Halpern, the lead author of the estimate and a medical officer in the federally funded National Cancer Institute’s health care delivery research program, said his team was surprised by the high cost, and noted that it was likely to be an underestimate because of the limits of the analysis.

For Karen E. Knudsen, the chief executive of the American Cancer Society, the value of screening for the cancers is clear. “We are talking about people’s lives,” she said. “Early detection allows a better chance of survival. Full stop. It’s the right thing to do for individuals.”

“We screen for cancer because it works,” Dr. Knudsen added. “The cost is small compared to the cost of being diagnosed with late-stage disease.”

Other researchers say the finding supports their contentions that screening is overused, adding that there is a weak link between early detection and cancer survival and that the money invested in cancer testing is not being well spent.

  • Biomed Res Int

This article has been retracted.

Lung Cancer Classification and Prediction Using Machine Learning and Image Processing

Sharmila Nageswaran

1 Department of Sensor and Biomedical Technology, School of Electronics Engineering, Vellore Institute of Technology, Tamil Nadu, India

G. Arunkumar

2 Department of Computer Science and Engineering, Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India

Anil Kumar Bisht

3 Department of CS&IT, MJP Rohilkhand University, Bareilly, U. P., India

Shivlal Mewada

4 Department of Computer Science, Govt. College, Makdone (Vikram University), Ujjain, India

J. N. V. R. Swarup Kumar

5 Department of CSE, SR Gudlavalleru Engineering College, Gudlavalleru, India

Malik Jawarneh

6 Faculty of Computing Sciences, Gulf College, Oman

Evans Asenso

7 Department of Agricultural Engineering, University of Ghana, Ghana

Associated Data

The data shall be made available on request.

Lung cancer is a potentially lethal illness, and its detection remains a challenge for medical professionals. The true cause of cancer and a complete treatment for it have still not been discovered, but cancer that is caught early enough can be treated. Image processing methods such as noise reduction, feature extraction, identification of damaged regions, and possibly comparison with data on the medical history of lung cancer are used to locate portions of the lung affected by cancer. This research presents a classification and prediction approach for lung cancer based on machine learning and image processing. First, images are acquired: the experimental investigation used 83 CT scans from 70 distinct patients as the dataset. A geometric mean filter is applied during image preprocessing to enhance image quality. The K -means technique is then used to segment the images, which helps to identify the region of interest. Finally, machine learning classification methods are applied, namely ANN, KNN, and RF. The ANN model is found to produce more accurate results for predicting lung cancer.

1. Introduction

Lung cancer is one of the most lethal types of cancer and is responsible for about one million deaths every year. Because lung nodules are becoming increasingly common, identifying them on chest CT scans is essential in current medical practice. Computer-aided detection (CAD) systems are therefore needed to achieve early identification of lung cancer [ 1 ].

A CT scan uses sophisticated X-ray equipment to capture images of the human body from a number of different angles. A computer then processes these images to produce cross-sectional views of the internal organs and tissues of the body [ 2 ].

A CAD approach was trained and assessed in two separate studies. The first was a computer simulation with computer-generated ground truth: the extended cardiac-torso (XCAT) digital phantom was used to simulate 300 CT scans, and spherical nodules of varied sizes (3 to 10 mm in diameter) were implanted at random inside the lung region of the simulated images. The second study used human subject data with patient-based ground truth. CT images from the LIDC-IDRI dataset were used to develop the CAD technique; 888 CT scans remained after scans with a slice thickness of more than 2.5 mm were excluded. In both studies, 10-fold cross-validation was used to assess network hyperparameters and generalization. Detection sensitivity was measured as a function of the average number of false positives (FPs) per scan to assess the overall accuracy of the CAD approach. Using the free-response receiver operating characteristic (FROC) curve, the detection accuracy in the patient study was further compared with 9 previously published CAD studies. The mean and standard error of the difference between the predicted value and the ground truth were used to measure the localization and diameter estimation accuracies. In both studies, the results averaged over the 10 cross-validation folds showed that the CAD approach had a high level of detection accuracy. In the patient study, the sensitivities were 90.0 percent and 95.4 percent, showing superiority in the FROC curve analysis over many traditional and CNN-based lung nodule CAD approaches. In both studies, the nodule localization and diameter estimation errors were less than 1 mm. The resulting CAD approach was also highly efficient computationally [ 3 ].

Intravenous injection of contrast (X-ray dye) can considerably improve the quality of CT imaging, which can reveal a wide variety of organs and tissues. CT scans can also reliably detect kidney stones and gallstones, as well as abnormal fluid buildup or enlarged lymph nodes in the abdominal region or pelvis. Although a CT scan cannot provide a precise diagnosis of certain organs, such as the stomach, it can reveal abnormalities in the nearby soft tissues and thereby offer an indirect diagnosis of these organs [ 4 , 5 ].

If lung cancer is detected at an early stage, the American Cancer Society estimates that a patient has a 47 percent chance of surviving the disease. X-ray images are unlikely to reveal lung cancer incidentally in its earlier stages [ 6 ]. Round lesions with a diameter of 5–10 millimeters or less are notoriously difficult to detect. A CT scan of a patient diagnosed with lung cancer is shown in Figure 1 .

Figure 1. CT scan image for lung cancer.

Image processing is essential in a wide variety of sectors. In X-ray imaging of the lungs, it is used to find regions that contain cancerous growths. To detect areas of the lung affected by cancer, techniques such as noise reduction, feature extraction, identification of damaged regions, and possibly comparison with data on the medical history of lung cancer are used. Digital image processing typically combines a diverse set of methods to merge several distinct aspects of an image into a single coherent entity. This research focuses on a particular region of the overall lung image. The segmented region may be viewed in a variety of ways, including from different viewpoints and under different illumination. A key benefit of this method is the ability to differentiate between portions of an image that are affected by cancer and those that are not by comparing the intensities of the two sets of regions [ 6 , 7 ].

Because most patients are diagnosed at an advanced stage, when there is little chance of successful treatment, lung cancer is the primary cause of cancer death. It is consistently ranked as one of the most lethal forms of the disease in both industrialized and developing countries. The incidence of lung cancer in developing countries is rising as a result of longer life expectancy, increasing urbanization, and the adoption of Western lifestyles. Early detection and patient survival are both essential to the control of the disease [ 8 , 9 ].

The literature survey section reviews various techniques for the classification and detection of cancer using image processing and classification. The methodology section presents the classification and prediction of lung cancer using machine learning and image processing. First, images are acquired. The images are then preprocessed with a geometric mean filter, which improves image quality, and segmented with the K -means algorithm, which helps to identify the region of interest. Finally, machine learning classification techniques are applied. The results section describes the dataset and the results achieved by the various techniques.

To reduce the amount of data that has to be analyzed, one study describes a method for separating the lung tissue from a chest CT. The goal is a fully automated algorithm that segments the lung tissue and also separates the two lungs. A threshold applied to the image separates fat from low-density tissue (the lungs). A cleaning step removes noise, air, and airways. Finally, a combination of morphological operations is used to smooth the irregular boundary. The database used for the evaluation was obtained from a teaching resource for radiologists. The analysis shows that the segmentation algorithm handles a wide range of different cases. Texture features were then extracted from the segmented lungs and passed to a neural network, which is used to differentiate between the various lung diseases [ 10 ].

1.1. Literature Survey

Palani and Venkatalakshmi [ 11 ] presented predictive modeling of lung cancer through continuous monitoring, using fuzzy cluster-based augmentation with classification. The fuzzy clustering approach is essential to producing accurate image segmentation. The fuzzy C -means clustering approach was used to further separate the characteristics of the transitional region from those of the lung cancer image, and the Otsu thresholding method was applied to distinguish the transition region from the lung cancer image. In addition, the right edge image is used together with a morphological thinning procedure to improve segmentation performance. Classification is carried out incrementally by combining existing Association Rule Mining (ARM), a conventional decision tree (DT), and a CNN with a novel incremental classification technique. The experiments used standard images from the database as well as the most recent patient health data collected from IoT devices attached to the patient. The results indicate that the predictive modeling system became more accurate.

Bhatia et al. used deep residual learning to develop a method for determining whether a CT image contains lung cancer. The researchers devised a preprocessing pipeline based on the UNet and ResNet models, intended to highlight and extract features from cancerous sections of the lung. An ensemble of XGBoost and random forest classifiers is used to predict the likelihood that a CT scan is malignant; the predictions of the individual classifiers are pooled to give the final result. On the LIDC-IDRI dataset, the method reaches an accuracy of 84 percent, higher than that of typical techniques [ 12 ].

Joon et al. [ 13 ] segmented lung cancer using an active spline model applied to X-ray images of the lung. A median filter is first applied to handle noise during preprocessing. During segmentation, K -means and fuzzy C -means clustering are then used for feature capture, and the final feature extraction result is obtained once the X-ray image has been segmented. The proposed model uses the SVM approach for classification, and MATLAB is used to simulate the cancer detection system. The purpose of the study was to detect and classify lung cancer using both normal and malignant images.

Nithila and Kumar [ 14 ] developed and deployed an active contour model in which a variational level set function is used to segment the lungs. Proper segmentation of the parenchyma is essential for an appropriate diagnosis of lung disease. CT, which stands for computerized tomography, was the first imaging modality to make use of image analysis in this manner. A significant advance in CT lung image segmentation was made with the SBGF-new SPF function (selective binary and Gaussian filtering with a new signed pressure force). With this strategy, the external lung boundaries are identified and inefficient expansion at the margins is prevented. The algorithm is compared with four distinct active contour models, and the test results show that the strategy is reliable and computationally fast [ 13 ].

Lakshmanaprabu et al. [ 15 ] created an ODNN (Optimal Deep Neural Network) by reducing the number of features in lung CT scans and compared it with other classification algorithms, which allowed them to design a more accurate method. An automated classification method for lung cancer cuts down the time needed for human labeling and removes the possibility of labeling mistakes. According to the researchers, the accuracy and precision of the machine learning algorithms in detecting normal and abnormal lung images increased significantly. The study classified lung images with a specificity of 94.56 percent, an accuracy of 96.2 percent, and a sensitivity of 94.2 percent, showing that it is feasible to improve the performance of cancer detection in CT scans [ 14 ].

Talukdar and Sarma (2018) placed a strong emphasis on image processing methods for the diagnosis of lung cancer, applying deep learning methodologies to the study of the disease. Lung cancer, the most prevalent kind of cancer, takes the lives of an alarmingly high number of individuals. The likelihood of an individual developing lung cancer was evaluated with a computed tomography (CT) scan. Growths of precancerous tissue are referred to as “nodules,” and their presence is used as a general indication of cancer. Trained radiologists are able to detect nodules and often predict their relationship with cancer, but they can also produce false positive and false negative findings. Under continual pressure, a tremendous quantity of data must be evaluated and a suitable decision for the patient made in a timely manner. A computer-aided detection system that can rapidly detect features based on the input of radiologists is therefore a likely answer [ 15 ].

Yu et al. (2016) obtained hematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma. Patient images were taken from TCGA (The Cancer Genome Atlas) and the Stanford Tissue Microarray (TMA) Database, plus an additional 294 images. Even a well-intentioned human pathology assessment cannot properly predict a patient's prognosis. A total of 9,879 quantitative image features were extracted, and machine learning algorithms were used to select the most important features and to distinguish short-term from long-term survivors among patients diagnosed with stage I adenocarcinoma or squamous cell carcinoma. The researchers used the TMA cohort to validate the survival prediction framework (P < 0.036 for tumor type). According to the study, automatically derived features may be able to forecast the prognosis of a lung cancer patient and may therefore help in the development of personalized medicine. The methods outlined can also be applied to histopathology images of other organs [ 16 ].

Pol Cirueda and his colleagues used an aggregation of textures that preserves the spatial covariances across features. Local responses of texture operator pairs are usually mixed with traditional aggregation functions such as the average; the covariance-based aggregation is intended to avoid the problems of such traditional aggregation. Pretreatment computed tomography (CT) scans were used to help predict NSCLC nodule recurrence prior to the administration of treatment. The proposed methods were then used to determine the type of NSCLC nodule recurrence with a manifold regularized sparse classifier. These findings, which open new research possibilities on using morphological tissue traits to evaluate cancer invasion, need to be confirmed and investigated further. To model orthogonal information, the authors combined the textural characteristics of nodular tissue with other variables such as the size and shape of the tumor [ 17 ].

In 2016, Manasee Kurkure and Anuradha Thakare proposed a method for the early detection and accurate diagnosis of lung cancer that uses CT, PET, and X-ray images. A genetic algorithm, which permits the early identification of lung cancer nodules, is used to optimize the results. Naive Bayes and a genetic algorithm were combined to classify the various stages of cancer images accurately and quickly while avoiding the complexity of the generation process. The classification has an accuracy of up to 80 percent [ 18 ].

Sangamithraa and Govindaraju [ 19 ] used a preprocessing strategy based on median and Wiener filters to remove unwanted noise and improve the quality of the data. The CT images are segmented with the K -means method, with clustering performed by EK-mean clustering. Fuzzy EK-mean segmentation is used to extract contrast, homogeneity, area, correlation, and entropy features from the images. A back propagation neural network performs the classification [ 20 ].

Ashwini Kumar Saini et al. (2016) summarized the types of noise that can affect lung cancer images and the strategies for removing them. Because lung cancer is one of the most life-threatening kinds of cancer, with high incidence and mortality, it is essential that it be detected in its earlier stages. The quality of the digital X-ray image analysis must be significantly improved for this to succeed. Although reducing image noise is currently a primary research focus, pathological diagnosis in the clinic remains the gold standard for detecting lung cancer. Chest X-rays, cytological examination of sputum samples, optical fiber investigation of the bronchial airways, and finally CT and MRI scans are the diagnostic tools most frequently used to detect lung malignancies. Although more sensitive and accurate screening methods such as CT and MRI are available in many parts of the world, chest radiography remains the primary and most prevalent procedure. Chest X-rays and CT scans are routinely used to test for lung cancer in its early stages; however, the scans suffer from weak sensitivity and specificity [ 19 ].

Neural ensemble-based detection (NED) is the automated diagnosis method proposed in Kureshi et al.'s research [ 21 ]. The approach has three main components: feature extraction, classification, and diagnosis. The experiment used chest X-ray films taken at Bayi Hospital. The method is recommended because it has a high identification rate for needle biopsies and a reduced number of false negative identifications. As a result, accuracy improves automatically and lives are saved [ 22 ].

Kulkarni and Panditrao [ 23 ] created a novel image processing algorithm for early-stage cancer identification that is more accurate than previous methods. Time is one of the factors considered when looking for anomalies in the target images. The position of the tumor can be seen clearly in the original image. To improve results, Gabor filtering and watershed segmentation are used at the preprocessing stage. Three features extracted from the region of interest help in recognizing the various stages of lung cancer: eccentricity, area, and perimeter. The tumors were found to come in a variety of sizes. The proposed method can provide precise measurements of tumor size at an early stage [ 21 ].

Westaway et al. [ 24 ] used a radiomic approach to identify three-dimensional features from images of lung cancer in order to provide prognostic information. Classifiers were built to estimate patient survival time. The CT scans for the experiment were obtained from the Moffitt Cancer Center in Tampa, Florida. Based on image features from CT scans, which may suggest phenotypes, human analysis may be able to generate more accurate predictions. When a decision tree was used for the survival predictions, 75 percent of the outcomes were forecast accurately [ 23 ].

Chaudhary and Singh [ 25 ] described a lung cancer detection method that uses image processing to categorize CT (computed tomography) images of lung cancer. Several approaches to preprocessing, segmentation, and feature extraction have been investigated, and the authors treat segmentation, augmentation, and feature extraction each in its own section. In Stages I, II, and III, the cancer is contained inside the chest and manifests as larger, more invasive tumors. By Stage IV, the cancer has spread to other parts of the body [ 24 ].

2. Methodology

This section presents the classification and prediction of lung cancer using machine learning and image processing. First, images are acquired. A geometric mean filter is then applied to preprocess the images, which improves image quality. Next, the K -means method is used to segment the images, which facilitates identification of the region of interest. Finally, machine learning classification techniques are applied. Figure 2 illustrates the overall workflow.

Figure 2. Classification and prediction of lung cancer using machine learning and image processing-enabled technology.

Image preprocessing plays a significant role in the proper classification of disease images. CT scans contain a broad variety of artefacts, including noise, which can be removed with image filtering methods. A geometric mean filter is applied to the input images to reduce the amount of noise [ 25 ].
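The filtering step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `geometric_mean_filter`, the 3 x 3 window, the reflective border handling, and the small offset `eps` are all assumptions made for the example. Each pixel is replaced by the geometric mean of its neighborhood, computed in the log domain.

```python
import numpy as np

def geometric_mean_filter(image, size=3, eps=1e-6):
    """Replace each pixel with the geometric mean of its size x size window.

    Hypothetical helper for illustration; window size and border mode
    are assumptions, not values taken from the article.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Geometric mean = exp(mean(log(x))); eps guards against log(0).
    log_img = np.log(image.astype(np.float64) + eps)
    padded = np.pad(log_img, pad, mode="reflect")
    rows, cols = image.shape
    acc = np.zeros_like(log_img)
    # Sum the log-intensities of every shifted copy of the window.
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + rows, dx:dx + cols]
    return np.exp(acc / (size * size)) - eps
```

Working in the log domain avoids multiplying many intensities together, which would overflow quickly for larger windows.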

Feature dimensionality is reduced with linear discriminant analysis (LDA), which shrinks the initial data matrix. Principal component analysis (PCA) and LDA are both linear transformation algorithms. PCA is an unsupervised method, whereas LDA is supervised: in contrast to PCA, LDA seeks a feature subspace that maximizes class separability. Projecting the data onto a subspace with good class separability helps to avoid overfitting and also reduces processing costs [ 26 ].
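The difference between the two projections can be illustrated with a small sketch on synthetic two-class data (the data, function names, and the `fisher_score` measure are assumptions introduced for this example, not part of the article). PCA picks the direction of largest variance regardless of labels, while the two-class Fisher form of LDA picks the direction that best separates the class means relative to the within-class spread.

```python
import numpy as np

def pca_direction(X):
    """First principal component: direction of maximum variance (labels ignored)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def lda_direction(X, y):
    """Two-class Fisher discriminant: w proportional to Sw^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def fisher_score(X, y, w):
    """Squared distance between projected class means over projected spread."""
    z = X @ w
    z0, z1 = z[y == 0], z[y == 1]
    return (z1.mean() - z0.mean()) ** 2 / (z0.var() + z1.var())

# Synthetic data (assumption): large variance along axis 0,
# but the classes differ only along axis 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [5.0, 0.5], size=(200, 2)),
               rng.normal([0, 2], [5.0, 0.5], size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

w_pca = pca_direction(X)
w_lda = lda_direction(X, y)
```

On this data, `w_pca` aligns with the high-variance axis, which carries no class information, while `w_lda` aligns with the axis along which the classes differ.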

Segmentation is a key step in medical image processing. Its basic role is to separate the parts of an image that are of interest from those that are not. To do so, it divides an image into distinct regions based on how similar each element is to its surroundings, in terms of intensity as well as texture. A segmented region of interest can be used as a diagnostic tool to quickly obtain information relevant to the problem at hand. K -means clustering is the technique most often used to segment medical images. During clustering, the image is divided into a number of non-overlapping groups, or clusters. Each cluster has its own reference point, and each pixel is assigned to one of these reference points. The K -means algorithm thus partitions the available data into k separate groups based on k reference points [ 27 ].
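A minimal sketch of intensity-based K -means segmentation is given below. It assumes that clustering is done on pixel intensity alone and that the reference points are initialized evenly across the intensity range; both choices, along with the function name `kmeans_segment`, are illustrative assumptions rather than details from the article.

```python
import numpy as np

def kmeans_segment(image, k=2, n_iter=20):
    """Cluster pixel intensities into k groups; return (label map, centers)."""
    pixels = image.reshape(-1).astype(np.float64)
    # Deterministic initialization (assumption): spread centers over the range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        # Assignment step: each pixel goes to its nearest reference point.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Update step: each reference point moves to the mean of its pixels.
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if members.size:
                new_centers[j] = members.mean()
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the final centers.
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    return labels.reshape(image.shape), centers
```

With `k=2` on an image containing a bright region on a dark background, the label map acts as a binary mask of the candidate region of interest.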

Artificial neural networks (ANNs) are widely used in medicine to classify medical images for disease diagnosis. An ANN works in a way that is loosely comparable to the human brain: by examining a collection of images that have already been labeled with a category, it acquires the knowledge needed to predict the category of a new image. An ANN is built from artificial neurons, which are designed to behave analogously to their biological counterparts. Neurons communicate with one another through connections (edges). Weights are assigned to neurons and edges, and these weights are adjusted throughout the learning process. The most common architecture has three layers: an input layer, a hidden layer, and an output layer that produces the output signal. Other configurations are possible: a network may have a single hidden layer, several hidden layers, or none at all. The weights that are adjusted until the desired output is reached reside in the hidden layer [ 28 ]. The number of training iterations is closely tied to the computational efficiency of training the ANN model. Too few hidden-layer neurons reduce precision, while too many lengthen training time.
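To make the three-layer structure and the weight adjustment concrete, here is a minimal sketch of an ANN with one hidden layer, trained by gradient descent on squared error. Everything specific in it is an assumption for illustration: the `TinyANN` class, the sigmoid activation, the learning rate, and the toy data (logical OR), which stands in for real image features.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyANN:
    """Input layer -> one hidden layer -> single sigmoid output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One weight per edge and one bias per neuron.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr=0.5):
        hidden, out = self.forward(x)
        # Gradient of 0.5 * (out - y)^2 with respect to the output pre-activation.
        d_out = (out - y) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Back-propagate through w2[j] before that weight is updated.
            d_hidden = d_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out

def mse(net, data):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

# Hypothetical toy task (logical OR), used only to show the weights adapting.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = TinyANN(n_in=2, n_hidden=3)
loss_before = mse(net, data)
for _ in range(2000):          # training iterations
    for x, y in data:
        net.train_step(x, y)
loss_after = mse(net, data)
print(loss_before, loss_after)
```

The `n_hidden` parameter and the iteration count are the two knobs mentioned in the text: enlarging either increases the work done per training run.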

The k-nearest neighbors (KNN) algorithm is one of the most commonly used methods in ML and among the easiest to understand. It is a non-parametric, supervised learning technique. Its training phase is therefore much faster than that of other classifiers, whereas its testing phase takes longer and uses more memory. To classify new data points with KNN, one first needs data that is already labeled with categories. Each labeled dataset consists of training observations ( x , y ), from which the algorithm captures the relationship between x and y . KNN defers computation until a query arrives, at which point the k nearest neighbors have to be located. In both classification and regression, the contributions of neighbors may be weighted so that nearer neighbors contribute more than distant ones. A common scheme gives each neighbor a weight of 1/ d , where d is its distance to the query point [ 29 ]. Despite producing good precision on the test dataset, KNN is still slower and more expensive to run in terms of both time and memory, because it must store the whole training dataset for prediction. Additionally, because Euclidean distance is sensitive to orders of magnitude, features with large values always carry more weight than features with small values. Finally, KNN is not well suited to high-dimensional datasets.
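The 1/ d weighting and the sensitivity of Euclidean distance to feature magnitude can both be illustrated with a short sketch. The feature values (a nodule diameter and a mean intensity), the labels, and the helper names `min_max_scale` and `knn_predict` are hypothetical. Min-max scaling is one common way to keep a large-valued feature from dominating and is added here as an assumption, not as a step reported in the study.

```python
import math
from collections import defaultdict

def min_max_scale(rows):
    """Scale each feature to [0, 1]; also return a function for new rows."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]

    def scale(row):
        return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]

    return [scale(r) for r in rows], scale

def knn_predict(train_x, train_y, query, k=3):
    """Distance-weighted vote: each of the k nearest neighbors counts 1/d."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + 1e-9)   # small offset avoids division by zero
    return max(votes, key=votes.get)

# Hypothetical features: [nodule diameter in mm, mean intensity].
train = [[4.0, 300.0], [5.0, 320.0], [6.0, 310.0],
         [18.0, 900.0], [20.0, 950.0], [22.0, 920.0]]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

scaled, scale = min_max_scale(train)
pred_large = knn_predict(scaled, labels, scale([19.0, 930.0]), k=3)
pred_small = knn_predict(scaled, labels, scale([5.0, 305.0]), k=3)
print(pred_large, pred_small)
```

Note that there is no training step beyond storing `scaled` and `labels`. All the distance computation happens inside `knn_predict`, which is why prediction is the expensive phase.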

Random forest (RF) is a widely used approach for building predictive models. Regression and classification are only two of its many applications [ 30 ]. Provided the datasets are prepared appropriately, it can produce predictions with a high degree of accuracy [ 31 ]. The approach is easy to use compared with other algorithms and is widely adopted. True to its name, the model builds a forest of randomized decision trees: the technique generates an entire collection of decision trees, each trained in a different way, and their outputs are then combined to provide more accurate predictions [ 22 ].
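The idea of training many trees differently and combining their outputs can be sketched as follows. To stay short, this example uses one-split trees (decision stumps) instead of full decision trees. The data, the helper names, and the choice of one random feature per tree are assumptions for illustration, not details taken from the study.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature (fewest misclassifications)."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    return feature, t, l_lab, r_lab

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Each tree sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        feature = rng.randrange(n_features)             # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Combine the trees by majority vote."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Hypothetical features: [nodule diameter in mm, mean intensity]; 1 = malignant.
rows = [[4.0, 300.0], [5.0, 320.0], [6.0, 310.0],
        [18.0, 900.0], [20.0, 950.0], [22.0, 920.0]]
labels = [0, 0, 0, 1, 1, 1]

forest = fit_forest(rows, labels)
pred_large = predict(forest, [25.0, 1000.0])
pred_small = predict(forest, [3.5, 290.0])
print(pred_large, pred_small)
```

An individual tree can be wrong, for example when its bootstrap sample happens to contain a single class. The majority vote is what makes the combined prediction more reliable than any one tree.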

3. Result Analysis

A dataset of 83 CT images from 70 different patients was used in the experimental study [ x ]. The images are first preprocessed with a geometric mean filter, which improves image quality. They are then segmented using the K-means algorithm, which helps identify the region of interest. Finally, machine learning classification techniques are applied.
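The preprocessing step can be illustrated with a minimal geometric mean filter, which replaces each pixel with the geometric mean of its neighborhood. The 3x3 window, the clipping of the window at image borders, and the +1 offset that avoids taking the logarithm of zero are assumptions of this sketch; the study does not report these details.

```python
import math

def geometric_mean_filter(image, size=3):
    """Replace each pixel with the geometric mean of its size x size window."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the image borders.
            window = [image[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            # Geometric mean computed as exp(mean(log p)); the +1 offset
            # (an assumption of this sketch) keeps log defined for p = 0.
            log_mean = sum(math.log(p + 1.0) for p in window) / len(window)
            out[y][x] = math.exp(log_mean) - 1.0
    return out

# Hypothetical patch: uniform background with one noisy bright pixel.
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]
smoothed = geometric_mean_filter(noisy)
print(round(smoothed[1][1], 1))
```

The isolated bright pixel is pulled toward the value of its neighbors, while a uniform region passes through unchanged.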

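Three measures are used for performance comparison: accuracy, sensitivity, and specificity.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

\[
\text{Sensitivity} = \frac{TP}{TP + FN}
\]

\[
\text{Specificity} = \frac{TN}{TN + FP}
\]

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.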
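As a worked example, the three measures can be computed directly from the four counts using their standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The counts below are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of actual positives detected
        "specificity": tn / (tn + fp),   # share of actual negatives rejected
    }

# Hypothetical counts for 100 test images.
m = classification_metrics(tp=45, tn=40, fp=10, fn=5)
print(m)  # accuracy 0.85, sensitivity 0.9, specificity 0.8
```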

Results of the different machine learning predictors are shown in Figures 3–5. The accuracy of ANN is better than that of the other predictors.

Figure 3. Accuracy of machine learning techniques for lung cancer detection.

Figure 4. Sensitivity of machine learning techniques for lung cancer detection.

Figure 5. Specificity of machine learning techniques for lung cancer detection.

4. Conclusion

Lung cancer is one of the deadliest cancers, claiming approximately one million lives each year. In current medical practice, identifying lung nodules on chest CT scans is critical, which makes CAD systems crucial for the early detection of lung cancer. Image processing is an essential task employed across a wide range of domains. In X-ray imaging of the lungs, it is used to find regions that have developed malignant growths. Techniques such as noise reduction, feature extraction, identification of damaged regions, and possibly comparison with data on the medical history of lung cancer are used to locate the parts of the lung affected by cancer. This study demonstrates accurate lung cancer classification and prediction using machine learning and image processing. First, images are collected. They are then preprocessed using a geometric mean filter, which improves image quality. The K-means approach is then used to segment the images, which makes it easier to identify the region of interest. Finally, machine learning-based classification algorithms are applied. ANN predicts lung cancer with higher accuracy than the other classifiers. This research will help to increase the accuracy of lung cancer detection systems that rely on robust classification and prediction techniques, and it provides an image-based machine learning approach suitable for implementation.

Data Availability

Conflicts of Interest

The authors declare that they have no conflict of interest.
