Spatio-temporal Video Re-localization by Warp LSTM

Feng, Yang; Ma, Lin; Liu, Wei; Luo, Jiebo

Abstract:The need for efficiently finding the video content a user wants is increasing because of the erupting of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video but not when and where. In this paper, we make an answer to the question of when and where by formulating a new task, namely spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video such that the tubelets semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates the spatio-temporal information for a long period and thereby captures the corresponding long-term dependencies. Another issue for spatio-temporal video re-localization is the lack of properly labeled video datasets. Therefore, we reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performances over the designed baselines on the spatio-temporal video re-localization task.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1905.03922 [cs.CV]
	(or arXiv:1905.03922v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1905.03922

Computer Science > Computer Vision and Pattern Recognition

Title:Spatio-temporal Video Re-localization by Warp LSTM

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators