Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization

Published in Ophthalmology Science, 2022

Recommended citation: Fan, Rui, Kamran Alipour, Christopher Bowd, Mark Christopher, Nicole Brye, James A. Proudfoot, Michael H. Goldbaum et al. Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization. Ophthalmology Science 3, no. 1 (2023): 100233. https://www.sciencedirect.com/science/article/pii/S2666914522001221

The paper compares the diagnostic accuracy and explainability of a Vision Transformer, the Data-efficient image Transformer (DeiT), with those of the convolutional network ResNet-50 in detecting primary open-angle glaucoma (POAG) from fundus photographs. DeiT performed similarly to ResNet-50 on the Ocular Hypertension Treatment Study (OHTS) test sets but consistently better on external datasets. Saliency maps from DeiT highlight localized areas of the neuroretinal rim, suggesting that rim features are important for classification, whereas the ResNet-50 maps show a more diffuse, generalized distribution around the optic disc. These results suggest that Vision Transformers can improve generalizability and explainability in deep learning models for detecting eye disease and possibly other medical conditions that rely on imaging for clinical diagnosis and management.
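The comparison of in-distribution versus external performance can be illustrated with a small sketch. It assumes the area under the ROC curve (AUROC) as the performance metric, which the summary above does not specify, and the helper names `auroc` and `generalization_gap` are hypothetical. The labels and scores are made-up toy values, not results from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def generalization_gap(internal, external):
    """Drop in AUROC from an internal test set to an external one.

    Each argument is a (labels, scores) pair; a smaller gap means the
    model transfers better to data it was not developed on.
    """
    return auroc(*internal) - auroc(*external)


# Toy data only (1 = glaucoma, 0 = healthy); not taken from the paper.
internal = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
external = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

print(auroc(*internal))                        # 1.0
print(auroc(*external))                        # 0.75
print(generalization_gap(internal, external))  # 0.25
```

Under this framing, a model that generalizes well would show a small gap between its internal and external AUROC, which is the pattern the summary attributes to DeiT relative to ResNet-50.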