Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking (The Visual Computer 2021, Proc. CGI 2021)


Jointly exploiting multiple different yet complementary sources of domain information has proven to be an effective way to perform robust object tracking. This paper focuses on effectively representing and utilizing complementary features from the frame domain and the event domain to boost object tracking performance in challenging scenarios. Specifically, we propose a Common Features Extractor (CFE) to learn potential common representations from the RGB domain and the event domain. To learn the features unique to each domain, we use a Unique Extractor for Event (UEE), based on Spiking Neural Networks, to extract edge cues from the event domain that may be missed in RGB under some challenging conditions, and a Unique Extractor for RGB (UER), based on Deep Convolutional Neural Networks, to extract texture and semantic information from the RGB domain. Extensive experiments on a standard RGB benchmark and a real event tracking dataset demonstrate the effectiveness of the proposed approach. Our approach outperforms all compared state-of-the-art tracking algorithms, verifying that event-based data is a powerful cue for tracking in challenging scenes.
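The three-branch idea above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: `cfe` applies the same shared kernel to both modalities, `uee` stands in for an SNN layer with a leaky integrate-and-fire accumulation over event time slices, and `uer` is a plain convolution on the RGB frame. All function names, thresholds, and shapes here are assumptions for illustration.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation (single channel), shared by all branches."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def cfe(rgb_gray, event_img, shared_k):
    """Common Features Extractor: the SAME kernel processes both domains,
    encouraging a shared representation."""
    return conv2d(rgb_gray, shared_k), conv2d(event_img, shared_k)

def uee(event_slices, thresh=1.0, leak=0.5):
    """Unique Extractor for Event: toy leaky integrate-and-fire neurons over
    event time slices, emitting a binary spike map (a stand-in for an SNN)."""
    v = np.zeros_like(event_slices[0])      # membrane potential
    spikes = np.zeros_like(v)
    for s in event_slices:
        v = leak * v + s                    # leaky integration of event input
        fired = v >= thresh
        spikes = np.maximum(spikes, fired.astype(float))
        v[fired] = 0.0                      # reset membrane after a spike
    return spikes

def uer(rgb_gray, k):
    """Unique Extractor for RGB: ordinary convolution for texture/semantics."""
    return conv2d(rgb_gray, k)

def fuse(rgb_gray, event_img, event_slices, shared_k, rgb_k):
    """Concatenate common and unique features into one descriptor."""
    c_rgb, c_ev = cfe(rgb_gray, event_img, shared_k)
    return np.concatenate([c_rgb.ravel(), c_ev.ravel(),
                           uee(event_slices).ravel(),
                           uer(rgb_gray, rgb_k).ravel()])
```

A real tracker would learn the kernels end to end and use deep multi-channel layers; the point here is only the data flow: one shared path plus one domain-specific path per modality, fused into a joint feature.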


RGB and Corresponding Stacked Event images

Results on OTB

Results on EED



We use ESIM on GOT-10k to generate an event dataset for training and testing.
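Simulators like ESIM emit asynchronous events as (x, y, timestamp, polarity) tuples; before feeding a frame-based branch, these are typically accumulated into a stacked event image. The sketch below is an illustrative accumulation scheme (two channels, one per polarity), not ESIM's actual API or the paper's exact stacking method.

```python
import numpy as np

def stack_events(events, height, width):
    """Accumulate (x, y, t, polarity) events into a 2-channel event image:
    channel 0 counts positive-polarity events, channel 1 negative ones."""
    img = np.zeros((2, height, width))
    for x, y, t, p in events:
        ch = 0 if p > 0 else 1
        img[ch, y, x] += 1           # simple per-pixel event count
    return img

# Toy event stream: two positive events at (x=1, y=2), one negative at (x=3, y=0).
events = [(1, 2, 0.00, +1), (1, 2, 0.05, +1), (3, 0, 0.10, -1)]
stacked = stack_events(events, height=4, width=4)
```

Count-based stacking discards exact timing; time-surface or voxel-grid stackings keep more temporal structure at the cost of more channels.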
Jiqing Zhang
Ph.D. student