Convolutional Neural Network Mediated Detection of Pneumonia

November 16, 2021
Rohan Ghotra, Syosset High School

Abstract: Pneumonia, a potentially fatal lung disease most commonly caused by infection with Streptococcus pneumoniae, is detected through chest x-rays that reveal inflammation of the alveoli. However, the efficiency with which it is diagnosed can be improved through the use of artificial intelligence. Convolutional neural networks (CNNs), a form of artificial intelligence, have recently demonstrated enhanced accuracy when classifying images. This study used CNNs to analyze chest x-rays and predict the probability that a patient has pneumonia. Furthermore, a comprehensive investigation was conducted examining the function of various components of the CNN in the context of pneumonia x-rays. The proposed model achieved high performance, making it viable for clinical implementation. Its architecture is also applicable to various other diseases and can thus be used to streamline disease diagnosis more broadly.

Keywords: artificial intelligence, disease diagnosis, pneumonia, convolutional neural networks, machine learning


I. Introduction

Streptococcus pneumoniae is an infectious bacterium that causes pneumonia, a disease characterized by inflammation of the alveoli in the lungs. Causing over 2,500 deaths each year, this disease can be lethal if not treated. Furthermore, the mortality rate rises steeply as the patient's age increases, reaching as high as 2.2 percent. Pneumonia has also been found to be prevalent in many patients infected with SARS-CoV-2. Nevertheless, pneumonia is easily treated with antibiotics if it is diagnosed at an early stage.

Currently, pneumonia is diagnosed with the help of x-rays, which produce a two-dimensional image of the subject's chest. A modified form of this technique also exists, in which a computed tomography (CT) scan compiles many such images to generate a three-dimensional map. In both cases, the resulting image is then manually examined for symptoms of pneumonia. Although this process has proven to be effective, it can take up to 20 minutes to complete. Automating pneumonia diagnosis would streamline the process and remove the need for specialized expertise in the field.

In this study, a convolutional neural network was designed and trained to analyze chest radiographs and produce an appropriate diagnosis. The proposed model was shown to achieve high accuracy at a low time cost.

II. Convolutional Neural Networks

Although feed-forward neural networks perform well on most datasets, larger networks are needed as the number of inputs grows. As a result, the computation time increases, as does the chance of overfitting, a condition in which the model memorizes the training data and cannot generalize to new data. In the 1990s, a new architecture for deep learning, dubbed the convolutional neural network, was proposed. This model specializes in image classification, employing feature extraction followed by analysis.

Prior to convolutional neural networks, modified forms of neural networks had been proposed to improve the performance of image classifiers. Algorithms such as frequency modulation, biorthogonal wavelet transform, and logistic regression had previously been used to reduce the computational load and counteract the large input sizes. However, each of these techniques requires preprocessing to extract and validate features from the image. Convolutional neural networks resolve this issue by implementing feature extraction in their early layers; later layers of the model are then used to analyze the detected attributes, as demonstrated in Figure 2 [11].
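
To make the feature-extraction idea concrete, the sketch below (written in Python with NumPy purely for illustration; it is not taken from the study's code) slides a single hand-crafted edge filter over a small image patch. A real convolutional layer learns many such filters from the training data rather than using a fixed one.

```python
import numpy as np

# A 5x5 grayscale patch: 0 = dark background, 1 = bright vertical feature.
patch = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

# A 3x3 filter that responds strongly to vertical edges.
kernel = np.array([
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
], dtype=float)

# Slide the filter across the patch (stride 1, no padding) and record the
# dot product at each position; large values mark where the feature occurs.
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(patch[i:i + 3, j:j + 3] * kernel)

print(feature_map)  # the middle column lights up where the vertical line sits
```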

III. Pooling

The three types of pooling are applicable to different situations, depending on the task. Stochastic pooling is typically used to prevent overfitting. When a convolutional neural network overfits, its filters look for features unique to each image in the training dataset; when one of these features is identified, the model pairs it with an output. By stochastically determining the representative elements, the model cannot associate filters with individual images, as there is a chance an unrepresentative element will be chosen, causing the model to produce a false prediction.

Average pooling uses the average of each pool when generating the pooled feature map; as a result, this pooling technique is influenced by every element, including outliers. This works well in many situations, as the features sway the representative elements toward themselves, allowing their shape to be retained. However, it is important to note that the contrast between the foreground and the background decreases, as the background pixels also pull the average toward themselves. This is illustrated in Figure 7: in both scenarios, the feature remains visible in the pooled feature map, albeit with less contrast [13].

Max pooling differs from average pooling in that it takes the largest element from each subsection. This technique is more appropriate than average pooling when the background is dark and the foreground is light; since the lighter elements have higher values, they are selected during pooling. However, when the image's features are dark, background elements are selected instead, as their values are larger. As shown in Figure 6, the feature is maintained in image B but lost in image A [13].

In this study, max pooling was used, as it is the most appropriate technique for the dataset of chest x-rays. As shown in Figure 10, the chest x-rays used in this study consist of a dark background with lighter features. Since max pooling preserves these types of features better than average pooling, it was used to improve model performance.
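
The NumPy sketch below, provided for illustration rather than taken from the study's code, compares max and average pooling over non-overlapping 2x2 windows of a small feature map with a bright feature on a dark background, mirroring the behavior described above.

```python
import numpy as np

# A 4x4 feature map: bright (high-valued) feature on a dark background,
# analogous to the light features on the dark background of a chest x-ray.
feature_map = np.array([
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.9, 0.7],
    [0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])

def pool(fm, size=2, mode="max"):
    """Pool non-overlapping size x size windows using max or average."""
    h, w = fm.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = fm[i:i + size, j:j + size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

print(pool(feature_map, mode="max"))      # bright feature kept at full contrast
print(pool(feature_map, mode="average"))  # feature still visible, but contrast reduced
```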

IV. Fully Connected Layers

Once all the features from the input image have been extracted, they must be analyzed for the model to produce a prediction. Fully connected layers (FCLs) are employed to perform this task. They function very similarly to feed-forward neural networks (Figure 1) in that they employ a system of neurons and synapses to analyze a series of inputs. In a convolutional neural network, the fully connected layers are placed after the convolution and pooling layers; they receive the pooled feature maps as input [10].
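
As a rough illustration of this step, the Keras-style sketch below flattens a set of pooled feature maps and passes them through fully connected (Dense) layers; the pooled-map shape and layer widths are assumptions made for the example, not values reported in this study.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Flatten the pooled feature maps into one vector, then analyze it with
# fully connected layers that output the probability of pneumonia.
fcl_head = keras.Sequential([
    layers.Flatten(input_shape=(16, 16, 32)),  # assumed pooled-map shape
    layers.Dense(64, activation="relu"),       # analyzes the extracted features
    layers.Dense(1, activation="sigmoid"),     # probability of pneumonia
])
```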

V. Results and Discussion

Using the GPU provided by Google Colaboratory, each model was trained in roughly ten minutes. This is a relatively low time cost, making the model cheap and easy to train, an important characteristic for implementation in a clinical environment.

The training periods of the five models are illustrated in Figure 9. Graphs 9a and 9b display a steady upward trend in validation AUROC and AUPR as the models trained. The best-performing model achieved a maximum validation AUROC of 0.9856 and a maximum validation AUPR of 0.9832, indicating that it performed very well. Moreover, in Figure 9b, the AUPR statistics had not yet completely plateaued, suggesting that additional training could further improve performance. Figure 9c shows a consistent decrease in validation loss during training. The models exhibited no signs of overfitting, as demonstrated by the unidirectional trend in all three graphs; if overfitting had occurred, the graphs would resemble a parabola, with the models' performance beginning to worsen after some initial improvement.

After training, each model was evaluated on the test images; the results are summarized in Table 3. The five models had an average AUROC of 0.9728, with the best model reaching 0.9754. The AUROC statistic measures the discriminatory behavior of a model (its ability to produce outputs close to 0 and 1); the high AUROC value of the proposed model indicates that it distinguishes well between normal and pneumonia x-rays. In contrast to AUROC, the AUPR statistic measures the frequency of correct classifications; the proposed model's mean AUPR of 0.9710 indicates that it classifies pneumonia and normal x-rays accurately. The difference between AUROC and AUPR can best be understood as quality versus quantity: AUROC represents the quality of predictions, whereas AUPR represents the quantity of correct predictions. The proposed model was successful in achieving high values in both AUROC and AUPR.
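
For readers who wish to reproduce these metrics, AUROC and AUPR can be computed from a model's predicted probabilities with scikit-learn, as in the minimal sketch below; the labels and probabilities shown are illustrative placeholders, not the study's data.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# y_true: ground-truth labels (1 = pneumonia, 0 = normal).
# y_prob: the model's predicted probabilities of pneumonia.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.10, 0.40, 0.85, 0.90, 0.70]

auroc = roc_auc_score(y_true, y_prob)           # area under the ROC curve
aupr = average_precision_score(y_true, y_prob)  # area under the precision-recall curve

print(f"AUROC: {auroc:.4f}, AUPR: {aupr:.4f}")
```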

VI. Conclusion

In this study, we designed a convolutional neural network to diagnose patients with pneumonia by analyzing chest x-rays. The architecture of the proposed model is notable in that it uses only two convolutional layers with 32 filters each. The model is significantly smaller than most conventional convolutional networks, giving it a computational advantage and allowing it to be trained in a relatively short period of time. In addition, the model achieved high performance on both the receiver operating characteristic curve and the precision-recall curve.
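
A minimal Keras sketch of a model matching this description is given below. The two convolutional layers with 32 filters each follow the text; the kernel sizes, input resolution, dense-layer width, and choice of optimizer are assumptions made only for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two convolutional layers with 32 filters each, max pooling, and a small
# fully connected head; details beyond the layer counts are assumptions.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of pneumonia
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[keras.metrics.AUC(curve="ROC", name="auroc"),
             keras.metrics.AUC(curve="PR", name="aupr")],
)
```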

Its success in distinguishing normal patients from those with pneumonia makes it viable for clinical use. Implementing an artificial intelligence tool in medical facilities can help streamline the process by which pneumonia is detected; the model constructed in this study diagnoses an x-ray in less than 3 milliseconds, compared to the 20 minutes required by human analysis. Furthermore, this architecture is extendable to other diseases that use medical imaging for diagnosis, including cancer, arthritis, and multiple sclerosis. As such, when an x-ray is obtained, it can be fed through several neural networks, each trained to detect a different disease, resulting in a distribution depicting the patient's probability of having each disease.

In the future, modifications of the proposed architecture can be researched. Residual connections can be employed to link the output of the first convolutional layer to the fully connected layers. In addition, different optimization algorithms and activation functions can be tested for their impact on performance. Conducting more experiments before clinical implementation is important because a false negative can be fatal; thus, it is necessary to research methods that can further increase accuracy.
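
A hypothetical sketch of the residual-connection modification is shown below using the Keras functional API; every shape and layer choice here is an assumption, intended only to illustrate how the first convolutional layer's output could be routed directly to the fully connected layers.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(150, 150, 1))
conv1 = layers.Conv2D(32, (3, 3), activation="relu")(inputs)
pool1 = layers.MaxPooling2D((2, 2))(conv1)
conv2 = layers.Conv2D(32, (3, 3), activation="relu")(pool1)
pool2 = layers.MaxPooling2D((2, 2))(conv2)

skip = layers.GlobalAveragePooling2D()(conv1)  # summary of the skipped features
deep = layers.Flatten()(pool2)
merged = layers.Concatenate()([deep, skip])    # residual-style merge before the FCLs

hidden = layers.Dense(64, activation="relu")(merged)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
model = keras.Model(inputs, outputs)
```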


References

  1. Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET) (pp. 1-6). doi: 10.1109/ICEngTechnol.2017.8308186
  2. Bebis, G., & Georgiopoulos, M. (1994). Feed-forward neural networks. IEEE Potentials, 13(4), 27-31. doi: 10.1109/45.329294
  3. Bjorck, J., Gomes, C., Selman, B., & Weinberger, K. Q. (2018). Understanding batch normalization. arXiv preprint arXiv:1806.02375.
  4. Eckle, K., & Schmidt-Hieber, J. (2019). A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Networks, 110, 232-242.
  5. Himavathi, S., Anitha, D., & Muthuramalingam, A. (2007). Feedforward neural network implementation in FPGA using layer multiplexing for effective resource utilization. IEEE Transactions on Neural Networks, 18(3), 880-888. doi: 10.1109/TNN.2007.891626
  6. Ho, Y., & Wookey, S. (2019). The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access, 8, 4806-4813.
  7. Huss-Lederman, S., Jacobson, E. M., Johnson, J. R., Tsao, A., & Turnbull, T. (1996). Implementation of Strassen's algorithm for matrix multiplication. In Supercomputing '96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (pp. 32-32).
  8. Kermany, D., Zhang, K., Goldbaum, M., et al. (2018). Labeled optical coherence tomography (OCT) and chest x-ray images for classification. Mendeley Data, 2(2).
  9. LeCun, Y., Haffner, P., Bottou, L., & Bengio, Y. (1999). Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision (pp. 319-345). Springer.
  10. Liu, K., Kang, G., Zhang, N., & Hou, B. (2018). Breast cancer classification based on fully-connected layer first convolutional neural networks. IEEE Access, 6, 23722-23732. doi: 10.1109/ACCESS.2018.2817593
  11. Nagi, J., Ducatelle, F., Di Caro, G. A., Cireşan, D., Meier, U., Giusti, A., ... Gambardella, L. M. (2011). Max-pooling convolutional neural networks for vision-based hand gesture recognition. In 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) (pp. 342-347). doi: 10.1109/ICSIPA.2011.6144164
  12. Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
  13. Yu, D., Wang, H., Chen, P., & Wei, Z. (2014). Mixed pooling for convolutional neural networks. In International Conference on Rough Sets and Knowledge Technology (pp. 364-375).