Developing an innovative lung cancer detection model for accurate diagnosis in AI healthcare systems


Experimental setup

The proposed CNN-GRU model was evaluated through several experiments. Data augmentation techniques, namely rotation and brightness adjustment, were applied during pre-processing. The lung cancer (LC) dataset was divided into 80% for training and 20% for testing using holdout validation, and standard evaluation metrics were computed to assess model performance. Hyperparameters were tuned manually: batch sizes of 32, 64, and 120; 30, 50, and 60 epochs; and learning rates of 0.0001 and 0.001. A dropout rate of 0.2 was applied in all experiments to reduce overfitting and improve generalization. These settings were chosen from empirical observations to obtain the best training stability and accuracy. Experiments were repeated with different hyperparameters until the results were stable, and only stable results are reported. The experiments were implemented in Python with TensorFlow and Keras and run on a computer with an Intel® Core™ i5-2400 CPU and 4 GB of RAM under Windows 10.
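The manual search and the 80/20 split described above can be organized as follows. This is a minimal sketch, assuming the dataset is available as a list of (image, label) pairs; the helper names `hyperparameter_grid` and `holdout_split` are hypothetical, and stratifying the split by class is an assumption, since the setup does not say whether the split was stratified.

```python
import itertools
import random
from collections import defaultdict

# Values reported in the experimental setup; dropout is fixed at 0.2 in every run.
BATCH_SIZES = [32, 64, 120]
EPOCHS = [30, 50, 60]
LEARNING_RATES = [0.0001, 0.001]
DROPOUT = 0.2


def hyperparameter_grid():
    """Yield one config dict per combination of the reported settings."""
    for batch, epochs, lr in itertools.product(BATCH_SIZES, EPOCHS, LEARNING_RATES):
        yield {"batch_size": batch, "epochs": epochs,
               "learning_rate": lr, "dropout": DROPOUT}


def holdout_split(samples, test_fraction=0.2, seed=42):
    """Split (item, label) pairs into train/test sets, stratified by label (assumption)."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    configs = list(hyperparameter_grid())
    print(len(configs))  # 18 combinations
    # Hypothetical balanced dataset: 561 placeholder items per class.
    data = [(f"img_{c}_{i}", c) for c in ("benign", "malignant", "normal") for i in range(561)]
    train, test = holdout_split(data)
    print(len(train), len(test))
```

Enumerating the grid makes the 18 candidate configurations explicit; each one would be trained on `train` and scored on `test`.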

Results analysis

The experimental results are analyzed here.

Data pre-processing results

The original IQ-OTH/NCCD lung cancer dataset had 1097 images with imbalanced classes: 561 malignant, 120 benign, and 416 normal. To address this imbalance, data augmentation techniques (rotation and brightness adjustment) were used to increase the number of images in the underrepresented classes. The enlarged dataset has 1683 images, with each class balanced at 561 images, which improved model performance and reduced bias towards the majority class.
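The balancing arithmetic above (561 + 120 + 416 = 1097 original images; 3 × 561 = 1683 after balancing) and the two augmentation operations can be sketched in plain Python. The rotation angles (multiples of 90 degrees) and the brightness range (0.8 to 1.2) are assumptions, because the exact augmentation parameters are not reported; `rotate90`, `adjust_brightness`, and `augment` are hypothetical helpers operating on a grayscale image stored as a list of rows.

```python
import random

# Class counts of the original IQ-OTH/NCCD dataset as reported above.
ORIGINAL_COUNTS = {"malignant": 561, "benign": 120, "normal": 416}


def augmentation_targets(counts):
    """Number of synthetic images each class needs to match the largest class."""
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def rotate90(image):
    """Rotate a 2-D grayscale image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor, max_value=255):
    """Scale pixel intensities by `factor` and clip them to [0, max_value]."""
    return [[min(max_value, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng):
    """Apply a random multiple-of-90-degree rotation and a random brightness factor."""
    for _ in range(rng.randint(0, 3)):
        image = rotate90(image)
    return adjust_brightness(image, rng.uniform(0.8, 1.2))


if __name__ == "__main__":
    extra = augmentation_targets(ORIGINAL_COUNTS)
    print(extra)  # {'malignant': 0, 'benign': 441, 'normal': 145}
    print(sum(ORIGINAL_COUNTS.values()) + sum(extra.values()))  # 1683
```

Generating 441 benign and 145 normal variants is exactly what brings every class to 561 images.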

CNN model results

The predictive capability of the CNN model was assessed through experiments on the original and augmented datasets. The original LC dataset has 1097 images in three classes (malignant, normal, and benign); after augmentation with rotation and brightness adjustment, the new dataset contains 1683 images. The model was trained for 50 epochs with a batch size of 120, and the other hyperparameters for all experiments are listed in Table 2. On the original dataset, Table 2 shows that the CNN with the SGD optimizer at a learning rate of 0.0001 obtained 96.12% accuracy, 93.02% specificity, 95.24% sensitivity, 98.87% precision, and a 97.73% F1 score. The CNN with the ADAM optimizer at a learning rate of 0.0001 obtained 96.89% accuracy, 99.78% specificity, 100% sensitivity, 99.45% precision, and a 98.78% F1 score. Results of the CNN model on the original dataset are shown graphically in Fig. 6.

Table 2 CNN model results on original and augmented datasets.

The results of the CNN model on the augmented dataset are also reported in Table 2. With the SGD optimizer at a learning rate of 0.0001, the CNN obtained 96.78% accuracy, 98.54% specificity, 99.78% sensitivity, 99.23% precision, and a 98.03% F1 score. With the ADAM optimizer at a learning rate of 0.0001, it obtained 97.02% accuracy, 97.30% specificity, 100% sensitivity, 99.89% precision, and a 98.88% F1 score. Figure 7 shows the results of the CNN model on the augmented dataset.
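For a three-class problem, the per-class metrics must be combined into single scores. The averaging scheme is not stated, so the sketch below assumes one-vs-rest counts per class that are then macro-averaged, with F1 computed per class as the harmonic mean of precision and sensitivity; the function names are hypothetical.

```python
def one_vs_rest_counts(confusion, k):
    """Return (TP, FP, FN, TN) for class k; rows are true labels, columns are predictions."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def safe_div(num, den):
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def macro_metrics(confusion):
    """Overall accuracy plus macro-averaged one-vs-rest metrics, all in percent."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    spec = sens = prec = f1 = 0.0
    for k in range(n):
        tp, fp, fn, tn = one_vs_rest_counts(confusion, k)
        s = safe_div(tp, tp + fn)  # sensitivity (recall)
        p = safe_div(tp, tp + fp)  # precision
        spec += safe_div(tn, tn + fp)  # specificity
        sens += s
        prec += p
        f1 += safe_div(2 * p * s, p + s)  # per-class harmonic mean of p and s
    return {
        "accuracy": 100 * accuracy,
        "specificity": 100 * spec / n,
        "sensitivity": 100 * sens / n,
        "precision": 100 * prec / n,
        "f1": 100 * f1 / n,
    }
```

Because each per-class F1 is a harmonic mean, it never exceeds the larger of that class's precision and sensitivity, which is a useful sanity check when reading tables of these metrics.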

GRU model results

The results of the GRU model on the original dataset are reported in Table 3. With the SGD optimizer at a learning rate of 0.0001, the GRU model obtained 97.00% accuracy, 100.00% specificity, 97.07% sensitivity, 99.00% precision, and a 99.10% F1 score. With the ADAM optimizer at the same learning rate of 0.0001, it obtained 98.23% accuracy, 98.68% specificity, 100.00% sensitivity, 97.00% precision, and a 99.34% F1 score. Results of the GRU architecture on the original dataset are shown graphically in Fig. 8.

Table 3 GRU model results on original and augmented datasets.

The experimental results of the GRU model on the augmented dataset are also reported in Table 3. With the SGD optimizer at a learning rate of 0.0001, the GRU architecture obtained 97.98% accuracy, 99.78% specificity, 99.23% sensitivity, 99.23% precision, and a 98.03% F1 score. With the ADAM optimizer at a learning rate of 0.0001, it obtained 98.97% accuracy, 99.80% specificity, 100.00% sensitivity, 99.96% precision, and a 98.95% F1 score. Results of the GRU architecture on the augmented dataset are shown graphically in Fig. 9.

Proposed (CNN-GRU) model results

The proposed CNN-GRU model was evaluated experimentally on the original and augmented datasets, and the results with the SGD and ADAM optimizers are reported in Table 4. On the original dataset, the proposed architecture with the SGD optimizer obtained 98.95% accuracy, 100% specificity, 99.87% sensitivity, 97.04% precision, and a 98.99% F1 score. With the ADAM optimizer, it obtained 99.12% accuracy, 99.89% specificity, 96.34% sensitivity, 99.23% precision, and a 99.32% F1 score.

Table 4 CNN-GRU model results on original and augmented datasets. The number of trainable parameters in a model indicates its space complexity.

On the augmented dataset, Table 4 shows that the proposed model with the SGD optimizer at a learning rate of 0.0001 obtained 98.99% accuracy, 98.78% specificity, 98.54% sensitivity, 99.73% precision, and a 99.33% F1 score. With the ADAM optimizer at a learning rate of 0.0001, it obtained 99.77% accuracy, 100% specificity, 99.% sensitivity, 99.98% precision, and a 99.97% F1 score. Results of the proposed CNN-GRU model on the original and augmented datasets are shown graphically in Figs. 10 and 11.

The CNN-GRU model was further validated on an independent CT-Scan images dataset, after training on the augmented IQ-OTH/NCCD dataset with 50 epochs, a batch size of 120, and a learning rate of 0.0001. The validation results are reported in Table 5. With the SGD optimizer, the CNN-GRU model achieved 98.97% accuracy, 97.40% specificity, 99.20% sensitivity, 96.64% precision, and a 99.76% F1 score, while with the ADAM optimizer it achieved 99.68% accuracy, 99.78% specificity, 99.86% sensitivity, 99.90% precision, and a 99.94% F1 score. Performance on the independent CT-Scan images dataset is slightly lower than on the IQ-OTH/NCCD test set (Tables 4 and 5). These results suggest that the proposed model generalizes to data from a different source.

Table 5 Model (CNN-GRU) cross-validation with CT-Scan images data set.

From the experimental results above, we conclude that the CNN-GRU model with the ADAM optimizer on augmented data achieved the highest accuracy, 99.77%. We attribute this accuracy to the integration of the CNN and GRU models and to the data augmentation approach.

Comparison of proposed model with baseline models using T-Test

Table 6 and Fig. 12 compare the accuracy of the proposed model with that of the baseline models; the proposed model achieved 99.77% accuracy. To statistically validate the performance of the proposed CNN-GRU model against the baselines, we used a two-sample T-test³⁰. The T-test determines whether the means of two independent groups differ significantly. The null hypothesis \(H_0\) states that the two groups' means are equal, and the alternative hypothesis \(H_1\) states that they differ. If \(p < 0.05\), \(H_0\) is rejected and, because the proposed model has the higher mean accuracy, it significantly outperforms the baseline models; if \(p \ge 0.05\), there is no significant difference between the proposed model and the baseline models.

According to the T-test, the p-value rounds to 0.000, which is below 0.05, indicating that the proposed model significantly outperforms the baseline models. Given its high accuracy, we recommend the model for LC detection in AI-based healthcare systems.
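The T-test described above can be sketched with the Python standard library. This assumes each model is represented by several accuracy values (for example, from repeated runs), uses Student's pooled-variance test, and obtains the two-sided p-value by numerically integrating the t density with Simpson's rule; the helper names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)  # clamp tiny negative rounding error


def independent_t_test(a, b):
    """Student's two-sample t-test with pooled variance; returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, two_sided_p(t, df)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` computes the same pooled-variance test directly; the hand-rolled version only makes the steps visible.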

Table 6 Accuracy comparison with baseline models.

Model complexity computation

Table 4 reports the space and time complexity of the CNN-GRU model for lung cancer detection with the SGD and ADAM optimizers on the original and augmented data. Because the proposed model is a deep learning model, space complexity is measured by its number of trainable parameters, and time complexity is measured by its training time. According to Table 4, the configuration with the ADAM optimizer on augmented data has the highest space complexity, with 140 million trainable parameters, while the configuration with the SGD optimizer on original data has the lowest cost, with a training time of 7.4 hours. The longest training time is 10.3 hours. Thus, although the proposed CNN-GRU model achieves high predictive accuracy, it is computationally expensive because of its complex structure, large number of trainable parameters, and large training data. This cost can be mitigated with high-performance hardware such as GPUs.
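Counting trainable parameters, as used above for space complexity, follows simple per-layer formulas. The sketch assumes standard Keras-style layers (a convolution with bias, a GRU with `reset_after=True`, and a dense layer with bias); the layer sizes in the usage example are hypothetical and are not the architecture reported in Table 4.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable weights of a 2-D convolution layer."""
    return (kernel_h * kernel_w * in_channels + (1 if use_bias else 0)) * filters


def gru_params(input_dim, units, reset_after=True):
    """Trainable weights of a GRU layer with three gate blocks (update, reset, candidate).

    With reset_after=True (the Keras default) each block has separate input and
    recurrent bias vectors, hence two biases per block instead of one.
    """
    biases = 2 if reset_after else 1
    return 3 * units * (input_dim + units + biases)


def dense_params(in_features, out_features):
    """Trainable weights of a fully connected layer with bias."""
    return (in_features + 1) * out_features


if __name__ == "__main__":
    # Hypothetical stack: two conv layers, a GRU over 64x64-channel rows, 3-way output.
    layers = [
        conv2d_params(3, 3, 1, 32),
        conv2d_params(3, 3, 32, 64),
        gru_params(64 * 64, 128),
        dense_params(128, 3),
    ]
    print(sum(layers))
```

The GRU term grows with `input_dim * units`, so feeding wide flattened feature maps into the recurrent layer is where the parameter count usually explodes. In Keras, `model.count_params()` reports the total for a built model, and training time can be measured around `model.fit` with `time.perf_counter()`.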

Discussion

Lung cancer is a critical clinical issue that affects many people worldwide. Accurate and timely diagnosis of lung cancer remains a major challenge for medical professionals and researchers, and conventional diagnostic methods are often not effective enough to support reliable treatment and recovery. To address these issues, researchers increasingly use artificial intelligence for early-stage diagnosis of lung cancer from medical big data such as patient medical histories, MRI, and CT scan images. Deep learning, a major AI technology, requires large amounts of data to learn, through many layers of computation, the data patterns that classify tumors into their respective classes. Consequently, deep learning-based lung cancer diagnosis requires large, properly balanced, labeled datasets for effective training and testing. Among deep learning models, the convolutional neural network (CNN) is particularly well suited to medical image analysis³¹, because it can extract deep features from images for accurate classification³². The gated recurrent unit (GRU) is also suitable for analyzing medical imaging data: compared with LSTM, it offers greater computational efficiency, faster training, and lower memory requirements, which makes it attractive for large-scale medical imaging. With its simpler gating mechanism, the GRU captures long-term dependencies while requiring less computational power. Additional deep learning techniques, such as attention mechanisms and data augmentation, can further improve the predictive capability of CNN models for precise lung cancer diagnosis in AI-based healthcare systems³³.
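The gating mechanism mentioned above can be made concrete with a single GRU time step. This is a minimal sketch with hypothetical weight names; it applies the reset gate before the recurrent product, as in the original GRU formulation, whereas Keras's default `reset_after=True` applies it after. The GRU needs three weight blocks (update, reset, candidate) where an LSTM needs four, which is the source of its lower memory cost.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU time step.

    `params` maps "W_z", "U_z", "b_z", "W_r", "U_r", "b_r", "W_h", "U_h", "b_h"
    to input weights (W), recurrent weights (U) and biases (b).
    """
    def gate(name, h_in, act):
        wx = matvec(params["W_" + name], x)
        uh = matvec(params["U_" + name], h_in)
        return [act(a + b + c) for a, b, c in zip(wx, uh, params["b_" + name])]

    z = gate("z", h, sigmoid)  # update gate: how much of the old state to keep
    r = gate("r", h, sigmoid)  # reset gate: how much old state enters the candidate
    candidate = gate("h", [ri * hi for ri, hi in zip(r, h)], math.tanh)
    # Interpolate between the previous state and the candidate state.
    return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, candidate)]
```

With all weights at zero, both gates equal 0.5 and the candidate is 0, so the step simply halves the previous state; a large positive update bias instead keeps the old state almost unchanged, which is how the GRU carries information across long sequences.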

In this study, we proposed an integrated deep learning model (CNN-GRU) for lung cancer detection that combines CNNs and GRUs. The GRU captures temporal relationships in sequences of medical images, such as multiple CT scans acquired over time. The convolutional neural network (CNN) extracts spatial features, while the GRU tracks changes in lung tissue and tumor growth, which enhances the model's ability to identify subtle patterns and progressions. Combining the CNN's spatial analysis with the GRU's sequential learning improves the model's accuracy and efficacy. The CNN-GRU model was validated on LC image data using the hold-out validation technique, evaluated with standard metrics, and trained with the SGD and ADAM optimization algorithms. On the original data (Table 4), the CNN-GRU model with the SGD optimizer obtained 98.95% accuracy, 100% specificity, 99.87% sensitivity, 97.04% precision, and a 98.99% F1 score, and with the ADAM optimizer it obtained 99.12% accuracy, 99.89% specificity, 96.34% sensitivity, 99.23% precision, and a 99.32% F1 score.
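How the CNN's spatial features reach the GRU is not specified in detail. One common single-image arrangement, assumed here, reads the final CNN feature map row by row as a sequence. The sketch traces tensor shapes through such a stack; the input size, filter counts, and GRU width are hypothetical, and in Keras the stack would correspond to `Conv2D`, `MaxPooling2D`, `Reshape`, `GRU`, and `Dense` layers.

```python
def conv_out(size, kernel, stride=1, padding="same"):
    """Output length along one spatial axis for a convolution or pooling window."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid" padding


def trace_cnn_gru(height, width, channels, conv_filters, gru_units, num_classes=3):
    """Trace tensor shapes (without the batch axis) through a hypothetical CNN-GRU stack."""
    shapes = [("input", (height, width, channels))]
    h, w, c = height, width, channels
    for filters in conv_filters:
        # 3x3 convolution with 'same' padding keeps the spatial size.
        h, w, c = conv_out(h, 3), conv_out(w, 3), filters
        shapes.append(("conv3x3", (h, w, c)))
        # 2x2 max pooling with stride 2 halves each spatial axis.
        h, w = conv_out(h, 2, 2, "valid"), conv_out(w, 2, 2, "valid")
        shapes.append(("maxpool2x2", (h, w, c)))
    # Each feature-map row becomes one GRU time step with w * c features.
    shapes.append(("reshape_to_sequence", (h, w * c)))
    # The GRU returns only its final hidden state.
    shapes.append(("gru", (gru_units,)))
    shapes.append(("dense_softmax", (num_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_cnn_gru(256, 256, 1, [32, 64], gru_units=128):
        print(f"{name:<20} {shape}")
```

Reading rows as time steps is only one option; a multi-scan design such as the CT-series use case described above would instead apply the CNN to each scan and feed one feature vector per scan to the GRU.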

On the augmented dataset (Table 4), the model with the SGD optimizer at a learning rate of 0.0001 obtained 98.99% accuracy, 98.78% specificity, 98.54% sensitivity, 99.73% precision, and a 99.33% F1 score. With the ADAM optimizer at a learning rate of 0.0001, it obtained 99.77% accuracy, 100% specificity, 99.% sensitivity, 99.98% precision, and a 99.97% F1 score.

With the ADAM optimizer, training the integrated model on augmented data increased accuracy from 99.12% to 99.77%. This improvement suggests that the model's structure is well suited to deep pattern recognition and image classification. From the results analysis, we conclude that the proposed model is well suited to accurate diagnosis of lung cancer in AI-based healthcare systems.

Although the proposed model achieved higher accuracy, it has significant technical limitations. It requires large, well-structured datasets and is computationally intensive, which restricts its application in resource-constrained settings. Variations between images affect generalization, and the risk of overfitting persists. Furthermore, this work did not investigate more sophisticated data augmentation, such as elastic deformations, random cropping, and adversarial training, or deep learning methods such as transfer learning and federated learning, which could improve the model's robustness and flexibility. Although the CNN-GRU model's predictive accuracy is higher than that of the baseline models, it is computationally more complex because of its structure, larger number of trainable parameters, and large training data; this complexity can be mitigated with high-performance computing technologies.

Furthermore, deploying AI in healthcare faces challenges such as integration with existing workflows, data privacy compliance, and regulatory approvals. Issues like model bias, interpretability, and clinician trust must also be addressed for successful adoption.

To overcome these limitations, future research should explore the integration of advanced techniques, such as multi-modal data fusion, transfer learning, and more sophisticated model architectures. Additionally, efforts should be made to enhance the model’s generalization capabilities and interpretability, ensuring that it can be applied effectively across diverse patient populations. By addressing these challenges, a more reliable, accurate, and scalable model for lung cancer diagnosis can be developed, ultimately advancing healthcare systems and improving patient outcomes.
