Nhat Anh Phung Tuan

Master's Thesis

Evaluation of NeVA in Predicting Radiologist’s Eye Movement on Chest X-ray Data


Dr. Dario Zanca, Thomas Altstidl (M. Sc.), Prof. Dr. Björn Eskofier


03 / 2022 – 09 / 2023


Chest radiography (chest X-ray, CXR) is the most commonly performed diagnostic examination in the world [1]. It is typically used to identify acute and chronic cardiopulmonary conditions, to verify that devices such as pacemakers, central lines, and tubes are correctly positioned, and to assist in related medical workups. Eye tracking in radiology has been studied extensively for education, the understanding of perception, and fatigue measurement [2], [3], [4], [5].
In deep learning, CXR data are used for tasks such as multi-disease classification (e.g., pneumonia, tuberculosis), segmentation (e.g., lung, thorax, heart), localization of abnormalities, and transfer learning for tuberculosis detection [6]. Eye-gaze data have also been combined with CXR images to improve segmentation [8] and disease classification [7], [9]. Recently, a public dataset, Eye-gaze Data for Chest X-ray [10], was released on PhysioNet [11]; it contains the eye-gaze movements of a professional radiologist interpreting frontal chest radiographs from MIMIC-CXR [12]. The dataset focuses on two clinically prevalent and high-impact diseases, pneumonia and congestive heart failure (CHF), together with normal cases as a comparison class. The publication also presents preliminary models showcasing the effectiveness of combining eye-gaze data with CXR images to improve classification accuracy.
Neural Visual Attention (NeVA) [13] is a neural network model that generates visual scanpaths in a top-down manner: human-like scanpaths emerge without the model being trained directly for that objective. NeVA consists of three main building blocks: a differentiable foveation mechanism, a task model, and an attention mechanism. The foveation mechanism simulates human foveated vision (i.e., high visual acuity at the center of gaze and coarse resolution in the periphery); the task model is a neural network pre-trained on a visual downstream task; and the attention mechanism selects the next location of interest as the one whose perceived stimulus best solves the task model's objective. By applying NeVA to CXR data, we aim to evaluate how similar (or dissimilar) NeVA scanpaths are to a professional radiologist's eye movements when reading CXR. Specifically, we intend to use two of the three models proposed in the same publication [10] in our experiments:
• Baseline model (BM): a convolutional neural network trained on CXR images for classification.
• Temporal heatmap model (THM): the baseline model enriched with features extracted from the radiologist's eye-tracking data.
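The kind of gaze feature the THM consumes can be illustrated with a minimal numpy sketch. The function name and the (row, column, duration) fixation format below are illustrative assumptions, not the dataset's actual schema:

```python
import numpy as np

def temporal_heatmap(fixations, shape, sigma=2.0):
    """Rasterise a fixation sequence into a single heatmap channel,
    weighting each Gaussian blob by fixation duration.
    Hedged sketch of a gaze feature the THM could consume; the
    (row, col, duration) input format is an assumption."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    heat = np.zeros(shape)
    for (y, x, dur) in fixations:
        # each fixation contributes a duration-weighted Gaussian blob
        heat += dur * np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat  # normalise to [0, 1]
```

Such a map could then be stacked as an extra input channel alongside the CXR image.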
Because NeVA can use gradient information from any model to generate scanpaths, we expect that paths generated with a specialized model such as the BM could behave similarly to those of an expert. Furthermore, we intend to test whether injecting NeVA's attention as temporal information into the THM improves the accuracy of the classification network.
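The scanpath-generation principle can be sketched in a few lines of numpy. This is a toy, search-based stand-in: a mean-squared error against a `target` image replaces the pre-trained task model, and an exhaustive candidate search replaces NeVA's gradient-driven attention mechanism; all names and parameters are illustrative, not the actual NeVA implementation:

```python
import numpy as np

def gaussian_mask(shape, fix, sigma):
    """Gaussian acuity mask centred at a fixation (toy foveation)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - fix[0]) ** 2 + (xs - fix[1]) ** 2) / (2 * sigma ** 2))

def generate_scanpath(image, target, n_fixations=5, grid_step=4, sigma=2.0):
    """Greedy scanpath in the spirit of NeVA: each step picks the candidate
    fixation whose foveated percept best reduces a toy task loss."""
    revealed = np.zeros(image.shape)                 # acuity gathered so far
    periphery = np.full(image.shape, image.mean())   # coarse background
    candidates = [(y, x) for y in range(0, image.shape[0], grid_step)
                         for x in range(0, image.shape[1], grid_step)]
    path = []
    for _ in range(n_fixations):
        best_fix, best_loss = None, np.inf
        for fix in candidates:
            # percept: sharp where already fixated or currently fixating,
            # coarse (mean intensity) elsewhere
            mask = np.maximum(revealed, gaussian_mask(image.shape, fix, sigma))
            percept = mask * image + (1 - mask) * periphery
            loss = np.mean((percept - target) ** 2)
            if loss < best_loss:
                best_fix, best_loss = fix, loss
        revealed = np.maximum(revealed, gaussian_mask(image.shape, best_fix, sigma))
        path.append(best_fix)
    return path
```

In the real model, the task loss is differentiable, so the next fixation follows from gradients through the foveation mechanism rather than from a grid search.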


[1] Mettler F. A., et al.: Radiologic and Nuclear Medicine Studies in the United States and Worldwide: Frequency, Radiation Dose, and Comparison with Other Radiation Sources, 1950-2007. Radiology 253, 520-531, 2009.
[2] Waite S., et al.: Analysis of perceptual expertise in radiology - Current knowledge and a new perspective. Frontiers in Human Neuroscience 13, 213, 2019.
[3] van der Gijp A., et al.: How visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology. Advances in Health Sciences Education, 1-23, 2017.
[4] Krupinski E. A., et al.: Current perspectives in medical image perception. Attention, Perception, & Psychophysics 72(5), 1205-1217, 2010.
[5] Tourassi G., et al.: Investigating the link between radiologists' gaze, diagnostic decision, and image content. J Am Med Inform Assoc 20(6), 1067-1075, 2013.
[6] Çallı E., et al.: Deep learning for chest X-ray analysis: A survey. Med Image Anal 72, 102125, 2021.
[7] Khosravan N., et al.: A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med Image Anal 51, 101-115, 2019.
[8] Stember J. N., et al.: Eye Tracking for Deep Learning Segmentation Using Convolutional Neural Networks. J Digit Imaging 32, 597-604, 2019.
[9] Aresta G., et al.: Automatic lung nodule detection combined with gaze information improves radiologists' screening performance. IEEE Journal of Biomedical and Health Informatics, 2020.
[10] Karargyris A., et al.: Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development. Sci Data 8, 92, 2021.
[11] Goldberger A., et al.: PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101(23), e215-e220, 2000.
[12] Johnson A. E. W., et al.: MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317, 2019.
[13] Schwinn L., et al.: Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention. Transactions on Machine Learning Research, 2022.