Pointing-Guided Target Estimation via Transformer-Based Attention
Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions,
Editors: Walter Senn, Marcello Sanguineti, Ausra Saudargiene, Igor V. Tetko, Alessandro E. P. Villa, Viktor Jirsa, Yoshua Bengio,
pages 85–97,
doi: 10.1007/978-3-032-04552-2_10
Sep 2025
Deictic gestures, like pointing, are a fundamental form of non-verbal communication, enabling humans to direct attention to specific objects or locations. This capability is essential in Human-Robot Interaction (HRI), where robots should be able to predict human intent and anticipate appropriate responses. In this work, we propose the Multi-Modality Inter-TransFormer (MM-ITF), a modular architecture to predict objects in a controlled tabletop scenario with the NICOL robot, where humans indicate targets through natural pointing gestures. Leveraging inter-modality attention, MM-ITF maps 2D pointing gestures to object locations, assigns a likelihood score to each, and identifies the most likely target. Our results demonstrate that the method can accurately predict the intended object using monocular RGB data, thus enabling intuitive and accessible human-robot collaboration. To evaluate the performance, we introduce a patch confusion matrix, providing insights into the model's predictions across candidate object locations. Code available at: https://github.com/lucamuellercode/MMITF.

@InProceedings{MAAGW25,
  author    = {Müller, Luca and Ali, Hassan and Allgeuer, Philipp and Gajdošech, Lukáš and Wermter, Stefan},
  title     = {Pointing-Guided Target Estimation via Transformer-Based Attention},
  booktitle = {Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions},
  editor    = {Walter Senn and Marcello Sanguineti and Ausra Saudargiene and Igor V. Tetko and Alessandro E. P. Villa and Viktor Jirsa and Yoshua Bengio},
  pages     = {85--97},
  year      = {2025},
  month     = sep,
  doi       = {10.1007/978-3-032-04552-2_10},
}