Selected Publications

Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization

Published in Ophthalmology Science, 2022

The paper describes a study that evaluates the diagnostic accuracy and explainability of a Vision Transformer deep learning technique, the Data-efficient image Transformer (DeiT), compared with ResNet-50, in detecting primary open-angle glaucoma (POAG) from fundus photographs. POAG is a leading cause of irreversible blindness worldwide, and early diagnosis is essential to prevent vision loss.

The study used a dataset of 66,715 fundus photographs from 1,636 participants in the Ocular Hypertension Treatment Study (OHTS), plus 5 external datasets totaling 16,137 photographs of healthy and glaucomatous eyes. DeiT models were trained to detect 5 ground-truth OHTS POAG classifications: OHTS end point committee POAG determinations based on disc changes (model 1), visual field (VF) changes (model 2), or either disc or VF changes (model 3), and Reading Center determinations based on discs (model 4) and VFs (model 5).

Compared with the best-performing ResNet-50 models, the DeiT models achieved similar diagnostic performance on the OHTS test sets for all 5 ground-truth POAG labels, with AUROC ranging from 0.82 (model 5) to 0.91 (model 1). However, DeiT AUROC was consistently higher than ResNet-50 AUROC on the 5 external datasets, suggesting that DeiT generalizes better. For example, AUROC for the main OHTS end point (model 3) was 0.08 to 0.20 higher for the DeiT models than for the ResNet-50 models.

The study also evaluated explainability by comparing the attention maps derived directly from DeiT with 3 gradient-weighted class activation map strategies applied to ResNet-50. The DeiT saliency maps highlight localized areas of the neuroretinal rim, suggesting that rim features are important for classification, whereas the ResNet-50 maps show a more diffuse, generalized distribution around the optic disc.
The authors conclude that Vision Transformers, such as DeiT, have the potential to improve generalizability and explainability in deep learning models for detecting eye disease and possibly other medical conditions that rely on imaging for clinical diagnosis and management. The study’s findings suggest that DeiT is a promising deep learning model for detecting POAG from fundus photographs and may have important clinical implications for the early diagnosis and management of this condition.
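The attention maps "derived directly from DeiT" mentioned above come from aggregating the transformer's self-attention weights. A common way to do this is attention rollout; the NumPy sketch below is a minimal illustration of that idea with toy layer counts and token dimensions — it is not the paper's exact procedure.

```python
import numpy as np

def attention_rollout(attentions):
    """Combine per-layer attention matrices into one token-relevance map.

    attentions: list of (num_tokens, num_tokens) head-averaged attention
    matrices, one per transformer layer. Residual connections are modeled
    by adding the identity before row-normalizing; layers are then
    composed by matrix multiplication.
    """
    num_tokens = attentions[0].shape[0]
    rollout = np.eye(num_tokens)
    for attn in attentions:
        attn = attn + np.eye(num_tokens)                 # residual stream
        attn = attn / attn.sum(axis=-1, keepdims=True)   # re-normalize rows
        rollout = attn @ rollout
    return rollout

rng = np.random.default_rng(0)
num_layers, num_tokens = 4, 5   # 1 class token + 4 patch tokens (toy sizes)

# Random head-averaged attention matrices with rows summing to 1.
layers = []
for _ in range(num_layers):
    a = rng.random((num_tokens, num_tokens))
    layers.append(a / a.sum(axis=-1, keepdims=True))

rollout = attention_rollout(layers)
# Relevance of each patch token to the class token (row 0, class token excluded):
saliency = rollout[0, 1:]
print(saliency)
```

In a real DeiT, reshaping `saliency` back to the patch grid gives the kind of saliency map the paper overlays on the fundus photograph.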

Recommended citation: Fan, Rui, Kamran Alipour, Christopher Bowd, Mark Christopher, Nicole Brye, James A. Proudfoot, Michael H. Goldbaum et al. Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization. Ophthalmology Science 3, no. 1 (2023): 100233. https://www.sciencedirect.com/science/article/pii/S2666914522001221

Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces

Published in arXiv, 2022

Despite their high accuracies, modern complex image classifiers cannot be trusted for sensitive tasks, because their decision-making process is opaque and may encode biases. Counterfactual explanations are effective at providing transparency for these black-box algorithms. Nevertheless, generating counterfactuals that have a consistent impact on classifier outputs while exposing interpretable feature changes is a very challenging task. We introduce a novel method to generate causal yet interpretable counterfactual explanations for image classifiers using pretrained generative models, without any re-training or conditioning. The generative models in this technique need not be trained on the same data as the target classifier. We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers. On the task of face attribute classification, we show how different attributes influence the classifier output by providing both causal and contrastive feature attributions, along with the corresponding counterfactual images.
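The core search — perturbing a generative model's latent code until the classifier's decision flips, while a proximity penalty keeps the counterfactual close to the original — can be sketched with toy linear stand-ins. Everything below (the linear "generator" and "classifier", the penalty weight, the step rule) is an illustrative assumption, not the paper's actual method, which operates on pretrained image GANs and black-box classifiers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: a linear "generator" G(z) = Wz and a linear "classifier"
# whose decision is score(x) = w.x + b > 0.
W = rng.standard_normal((8, 4))   # latent dim 4 -> "image" dim 8
w = rng.standard_normal(8)
b = -0.5

def score(z):
    """Classifier logit of the generated sample; decision flips at 0."""
    return w @ (W @ z) + b

z0 = rng.standard_normal(4)        # original latent code
z = z0.copy()

# Gradient steps on the classifier logit, pushed toward the opposite
# decision, plus a proximity penalty pulling z back toward z0.
sign = 1.0 if score(z0) < 0 else -1.0
lam, lr = 1.0, 0.1
for _ in range(300):
    z += lr * (sign * (W.T @ w) - lam * (z - z0))

print(score(z0), score(z))   # logits before and after the edit
```

Decoding the edited latent code through the generator would yield the counterfactual image; with a face-attribute classifier, the small latent change corresponds to an interpretable attribute edit.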

Recommended citation: Alipour, K., Lahiri, A., Adeli, E., Salimi, B., Pazzani, M. (2022). Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces. https://arxiv.org/abs/2206.05257

Improving users’ mental model with attention‐directed counterfactual edits

Published in Applied AI Letters, 2021

In the domain of visual question answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain image-question (IQ) pairs. In this work, we show that presenting controlled counterfactual IQ examples is more effective at improving users' mental model than simply showing random examples. We compare a generative approach and a retrieval-based approach for producing counterfactual examples. We use recent advances in generative adversarial networks to generate counterfactual images by deleting and inpainting certain regions of interest in the image.
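The delete-and-inpaint editing step can be sketched on a toy grayscale array. In the paper a GAN inpainter fills the removed region; here it is replaced by a trivial fill with the mean of the unmasked pixels, purely to keep the sketch self-contained — the region of interest and fill strategy are assumptions.

```python
import numpy as np

def counterfactual_edit(image, roi_mask):
    """Delete a region of interest and 'inpaint' it.

    image:    2-D float array (grayscale).
    roi_mask: boolean array, True inside the region to remove.
    A real system would use a GAN inpainter; the mean fill below is a
    placeholder standing in for that model.
    """
    edited = image.copy()
    edited[roi_mask] = image[~roi_mask].mean()
    return edited

image = np.arange(16, dtype=float).reshape(4, 4)
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True   # "object" occupying the central 2x2 patch

edited = counterfactual_edit(image, roi)
print(edited)
```

Pairing the original and edited images with the same question then yields the controlled counterfactual IQ examples shown to users.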

Recommended citation: Alipour, K., Ray, A., Lin, X., Cogswell, M., Schulze, J. P., Yao, Y., & Burachas, G. T. (2021). Improving users' mental model with attention‐directed counterfactual edits. Applied AI Letters, e47. https://onlinelibrary.wiley.com/doi/pdf/10.1002/ail2.47

The impact of explanations on AI competency prediction in VQA

Published in 2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI), 2020

In this paper, we evaluate the impact of explanations on the user’s mental model of AI agent competency within the task of visual question answering (VQA).

Recommended citation: Alipour, K., Ray, A., Lin, X., Schulze, J. P., Yao, Y., & Burachas, G. T. (2020, September). The impact of explanations on AI competency prediction in VQA. In 2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI) (pp. 25-32). IEEE. https://ieeexplore.ieee.org/abstract/document/9230378

Deep learning improves contrast in low-fluence photoacoustic imaging

Published in Biomedical optics express, 2020

This paper proposes a denoising method that uses a multi-level wavelet convolutional neural network to map low-fluence illumination source images to their corresponding high-fluence excitation maps.
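The "multi-level wavelet" part of such a network replaces pooling with an invertible wavelet decomposition. The NumPy sketch below shows a single level of the 2-D Haar transform and its exact inverse — only the lossless downsampling step the architecture is built on, with the convolutional layers omitted.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar discrete wavelet transform.

    Splits an even-sized image into four half-resolution subbands:
    LL (approximation) and LH/HL/HH (detail). Unlike pooling, this
    downsampling loses no information.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: perfectly reconstructs the original image."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

rng = np.random.default_rng(2)
img = rng.random((8, 8))
ll, lh, hl, hh = haar_dwt2(img)
recon = haar_idwt2(ll, lh, hl, hh)
print(np.abs(recon - img).max())  # reconstruction error
```

In the full network, convolutional blocks process the stacked subbands at each level, and the inverse transform restores resolution on the way back up.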

Recommended citation: Hariri, A., Alipour, K., Mantri, Y., Schulze, J. P., & Jokerst, J. V. (2020). Deep learning improves contrast in low-fluence photoacoustic imaging. Biomedical optics express, 11(6), 3360-3373. https://opg.optica.org/DirectPDFAccess/21EB20FE-AFEF-45DD-B143A1B1EBE3ECFC_432226/boe-11-6-3360.pdf?da=1&id=432226&seq=0&mobile=no

A study on multimodal and interactive explanations for visual question answering

Published in SafeAI @ AAAI 2020, 2020

This paper evaluates multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors.

Recommended citation: Alipour, K., Schulze, J. P., Yao, Y., Ziskind, A., & Burachas, G. (2020). A study on multimodal and interactive explanations for visual question answering. arXiv preprint arXiv:2003.00431. https://arxiv.org/pdf/2003.00431.pdf