A study on multimodal and interactive explanations for visual question answering

Published in the SafeAI Workshop @ AAAI, 2020

Recommended citation: Alipour, K., Schulze, J. P., Yao, Y., Ziskind, A., & Burachas, G. (2020). A study on multimodal and interactive explanations for visual question answering. arXiv preprint arXiv:2003.00431. https://arxiv.org/pdf/2003.00431.pdf

This paper evaluates multimodal explanations in the setting of a Visual Question Answering (VQA) task by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors.