Neural Networks for Detecting Irrelevant Questions during Visual Question Answering
International Conference on Artificial Neural Networks (ICANN),
Editors: Igor Farkaš, Paolo Masulli, Stefan Wermter,
Volume LNCS 12397,
pages 786-797,
doi: 10.1007/978-3-030-61616-8_63
- Oct 2020
Visual question answering (VQA) is the task of producing correct answers to questions about images. When given a question that is irrelevant to an image, existing VQA models still produce an answer rather than predicting that the question is irrelevant. This shows that current VQA models do not truly understand images and questions. Moreover, producing answers to irrelevant questions can be misleading in real-world application scenarios. To tackle this problem, we hypothesize that the abilities required for detecting irrelevant questions are similar to those required for answering questions. Based on this hypothesis, we study what performance a state-of-the-art VQA network can achieve when trained on irrelevant question detection. We then analyze the influence of reasoning and relational modeling on the task of irrelevant question detection. Our experimental results indicate that a VQA network trained on an irrelevant question detection dataset outperforms existing state-of-the-art methods by a large margin on this task. Ablation studies show that explicit reasoning and relational modeling benefit irrelevant question detection. Finally, we investigate a straightforward idea for integrating the ability to detect irrelevant questions into VQA models: joint training with extended VQA data containing irrelevant cases. The results suggest that joint training has a negative impact on the model's performance on the VQA task, while the accuracy on relevance detection is maintained. In this paper, we claim that an efficient neural network designed for VQA can achieve high accuracy on detecting relevance; however, integrating the ability to detect relevance into a VQA model by joint training leads to degraded performance on the VQA task.
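The abstract does not specify how the joint training objective is set up. As a purely hypothetical illustration (not the authors' formulation), one way to combine the two tasks is a multi-task loss: a binary relevance loss on every (image, question) pair, plus an answer loss only on relevant pairs, since irrelevant pairs have no ground-truth answer. The function names, the `weight` parameter, the softmax cross-entropy over answers (a simplification; VQA models often use soft-score targets instead), and the choice to skip the answer loss for irrelevant pairs are all assumptions.

```python
import math
from typing import Optional, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of softmax(logits) against the class index `target`."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy_with_logit(logit: float, label: int) -> float:
    """BCE for one relevance logit; label 1 = relevant, 0 = irrelevant."""
    # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def joint_loss(
    answer_logits: Sequence[float],
    relevance_logit: float,
    is_relevant: bool,
    answer_index: Optional[int] = None,
    weight: float = 1.0,
) -> float:
    """Hypothetical multi-task loss for a single (image, question) pair.

    `weight` scales the relevance term; it is an assumed hyperparameter.
    """
    loss = weight * binary_cross_entropy_with_logit(relevance_logit, int(is_relevant))
    if is_relevant:
        if answer_index is None:
            raise ValueError("relevant pairs need a ground-truth answer index")
        # Only relevant pairs contribute to the answer (VQA) loss in this sketch.
        loss += softmax_cross_entropy(answer_logits, answer_index)
    return loss
```

In this sketch an irrelevant pair trains only the relevance head, while a relevant pair trains both heads; how the two terms are balanced is exactly the kind of design choice a real implementation would have to settle.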
Keywords: Visual question answering · Irrelevant question detection · Multimodality · Deep neural networks.
@InProceedings{LWW20,
  author    = {Li, Mengdi and Weber, Cornelius and Wermter, Stefan},
  title     = {Neural Networks for Detecting Irrelevant Questions during Visual Question Answering},
  booktitle = {International Conference on Artificial Neural Networks (ICANN)},
  editor    = {Farkaš, Igor and Masulli, Paolo and Wermter, Stefan},
  series    = {Lecture Notes in Computer Science},
  volume    = {12397},
  pages     = {786--797},
  year      = {2020},
  month     = oct,
  publisher = {Springer},
  doi       = {10.1007/978-3-030-61616-8_63},
}