Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization
Published in Ophthalmology Science, 2022
The paper describes a study that evaluates the diagnostic accuracy and explainability of a Vision Transformer deep learning technique, the Data-efficient image Transformer (DeiT), compared with ResNet-50 in detecting primary open-angle glaucoma (POAG) from fundus photographs. POAG is a leading cause of irreversible blindness worldwide, and early diagnosis is essential to prevent vision loss.
The study used a dataset of 66,715 fundus photographs from 1,636 participants in the Ocular Hypertension Treatment Study (OHTS), plus 5 additional external datasets comprising 16,137 photographs of healthy eyes and eyes with glaucoma. DeiT models were trained to detect 5 ground-truth OHTS POAG classifications: OHTS end point committee POAG determinations because of disc changes (model 1), visual field (VF) changes (model 2), or either disc or VF changes (model 3), and Reading Center determinations based on disc (model 4) and VFs (model 5).
Compared with the best-performing ResNet-50 models, the DeiT models demonstrated similar diagnostic performance on the OHTS test sets for all 5 ground-truth POAG labels, with area under the receiver operating characteristic curve (AUROC) ranging from 0.82 (model 5) to 0.91 (model 1). On the 5 external datasets, however, DeiT AUROC was consistently higher than ResNet-50 AUROC, suggesting that DeiT generalizes better. For example, AUROC for the main OHTS end point (model 3) was between 0.08 and 0.20 higher for the DeiT models than for the ResNet-50 models.
The study also evaluated the explainability of the DeiT and ResNet-50 models by comparing the attention maps derived directly from DeiT with 3 gradient-weighted class activation map strategies. The saliency maps from DeiT highlight localized areas of the neuroretinal rim, suggesting that rim features are important for classification, while the ResNet-50 maps show a more diffuse, generalized distribution around the optic disc.
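For readers unfamiliar with the metric, the AUROC comparisons summarized above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `auroc`, the toy labels, and the two score lists are hypothetical and chosen only to show how the AUROC of two classifiers can be computed and compared on the same test set. It uses the pairwise (Mann-Whitney) definition, in which AUROC is the probability that a randomly chosen glaucoma image receives a higher score than a randomly chosen healthy image.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = glaucoma, 0 = healthy (not from the study).
labels = [1, 1, 0, 0]
scores_model_a = [0.9, 0.4, 0.5, 0.1]  # one glaucoma eye scored below a healthy eye
scores_model_b = [0.9, 0.8, 0.3, 0.1]  # every glaucoma eye scored above every healthy eye

auc_a = auroc(labels, scores_model_a)
auc_b = auroc(labels, scores_model_b)
print(f"model A AUROC = {auc_a:.2f}, model B AUROC = {auc_b:.2f}, "
      f"difference = {auc_b - auc_a:.2f}")
```

In practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used instead of the quadratic loop, which is written out here only to make the definition explicit.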
The authors conclude that Vision Transformers, such as DeiT, have the potential to improve generalizability and explainability in deep learning models for detecting eye disease and possibly other medical conditions that rely on imaging for clinical diagnosis and management. The study’s findings suggest that DeiT is a promising deep learning model for detecting POAG from fundus photographs and may have important clinical implications for the early diagnosis and management of this condition.
Recommended citation: Fan, Rui, Kamran Alipour, Christopher Bowd, Mark Christopher, Nicole Brye, James A. Proudfoot, Michael H. Goldbaum et al. Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization. Ophthalmology Science 3, no. 1 (2023): 100233. https://www.sciencedirect.com/science/article/pii/S2666914522001221