Where Is My Eye? Spatial-Temporal Modeling for Event-Based Eye Tracking (IJHCI 2026)

Abstract

Eye tracking is critical for VR/AR applications, which demand low-latency, high-frequency tracking to keep pace with rapid eye movements. Event cameras, with their high temporal resolution and high dynamic range, are well suited to this task. However, most existing event-based eye tracking (EET) methods focus on discrete event features while ignoring the structural information of the eye, limiting both accuracy and practicality. This paper therefore presents a global-local spatiotemporal modeling scheme that leverages both eye structure and event characteristics. Specifically, we first propose a multi-scale information extraction module that derives rich local structural features, such as eyelid morphology, and facilitates adaptation to scale variations of the eye. Given the intrinsic discreteness of events, we further propose a module that models long-range feature dependencies to mitigate the challenges posed by sparse spatial information. These two modules can be seamlessly integrated with an LSTM to extract spatiotemporal features. Extensive experiments on several event-based datasets validate that the proposed approach surpasses existing state-of-the-art methods.
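To make the global-local scheme concrete, here is a minimal NumPy sketch, not the authors' implementation: multi-scale average pooling stands in for the local structural-feature module, single-head self-attention stands in for the long-range dependency module, and a standard LSTM cell fuses the resulting features over time. All shapes, scale choices, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_scale_local(x, scales=(1, 2, 4)):
    # x: (H, W, C) per-frame event features; pool at several scales to
    # capture local eye structure at different sizes (a stand-in for the
    # multi-scale information extraction module).
    H, W, C = x.shape
    feats = []
    for s in scales:
        h, w = H // s, W // s
        pooled = x[:h * s, :w * s].reshape(h, s, w, s, C).mean(axis=(1, 3))
        # Upsample back by repetition so the scales can be concatenated.
        feats.append(np.repeat(np.repeat(pooled, s, axis=0), s, axis=1)[:H, :W])
    return np.concatenate(feats, axis=-1)  # (H, W, C * len(scales))

def global_attention(tokens):
    # tokens: (N, D); single-head self-attention as a stand-in for
    # long-range dependency modeling over sparse spatial locations.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ tokens

def lstm_step(x, h, c, Wx, Wh, b):
    # Standard LSTM cell; gates packed as [input, forget, output, cell].
    z = x @ Wx + h @ Wh + b
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

# Toy pipeline over T event frames (all sizes are illustrative).
H, W, C = 8, 8, 4
D = C * 3                       # feature dim after fusing 3 scales
T = 5
Wx = rng.normal(scale=0.1, size=(D, 4 * D))
Wh = rng.normal(scale=0.1, size=(D, 4 * D))
b = np.zeros(4 * D)
h, c = np.zeros(D), np.zeros(D)
for _ in range(T):
    frame = rng.normal(size=(H, W, C))            # fake event frame
    local = multi_scale_local(frame)              # local, multi-scale
    tokens = local.reshape(-1, local.shape[-1])   # flatten spatial grid
    glob = global_attention(tokens).mean(axis=0)  # global context vector
    h, c = lstm_step(glob, h, c, Wx, Wh, b)       # temporal fusion
print(h.shape)  # (12,)
```

The final hidden state `h` would feed a small regression head that outputs the pupil coordinates; that head is omitted here for brevity.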

Publication
IJHCI 2026