Temporal Attentive Alignment for Video Domain Adaptation
Motivation
Although many image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos remains under-explored. Furthermore, there are no well-organized datasets for evaluating and benchmarking video DA algorithms. New datasets and approaches for video DA are therefore needed.
Challenges
Videos can suffer from domain shift along both the spatial and temporal directions, so the embedded feature spaces need to be aligned along both. However, most DA approaches focus only on the spatial direction.
Our Approaches
We propose the Temporal Attentive Adversarial Adaptation Network (TA3N) to simultaneously attend to, align, and learn temporal dynamics across domains. TA3N contains the following components (a minimal sketch follows the list):
- TemRelation module: learns local temporal features that embed relation information at multiple temporal scales
- Domain attention mechanism: focuses alignment on the local features with larger domain discrepancy (i.e., the ones that contribute more to the overall domain shift), assigning each local feature an attention weight computed from the entropy of the domain predictions
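For concreteness, here is a minimal PyTorch-style sketch of these two components. It assumes TRN-style relation MLPs, a two-class domain discriminator output per temporal scale, and a residual attention form (1 + w); all names, shapes, and hyperparameters are illustrative assumptions, not the official implementation (see our GitHub for the actual code).

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemRelation(nn.Module):
    """Multi-scale temporal relation: for each scale n, fuse an ordered
    n-frame subset of frame-level features with a small MLP (TRN-style)."""
    def __init__(self, feat_dim, num_frames, hidden_dim=256):
        super().__init__()
        self.num_frames = num_frames
        self.scales = list(range(2, num_frames + 1))  # relation scales 2..T
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(n * feat_dim, hidden_dim), nn.ReLU())
            for n in self.scales
        ])

    def forward(self, x):  # x: (B, T, D) frame-level features
        relations = []
        for n, mlp in zip(self.scales, self.mlps):
            # one evenly spaced n-frame subset per scale
            # (the paper samples several subsets and aggregates them)
            idx = torch.linspace(0, self.num_frames - 1, n).long()
            relations.append(mlp(x[:, idx].flatten(1)))   # (B, hidden_dim)
        return torch.stack(relations, dim=1)              # (B, S, hidden_dim)

def domain_attention(local_feats, domain_logits):
    """Weight each local temporal feature by how confidently a domain
    discriminator separates it: low entropy means large domain discrepancy,
    so that feature receives more attention. The residual form (1 + w)
    keeps attention from zeroing out any feature."""
    p = F.softmax(domain_logits, dim=-1)              # (B, S, 2)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=-1)  # (B, S)
    w = 1.0 - entropy                                 # attention weights
    return (1.0 + w).unsqueeze(-1) * local_feats      # (B, S, feat_dim)

In the full model, domain_logits come from scale-wise domain classifiers trained through gradient reversal, so the attended local features are aligned adversarially across domains.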
Results
We evaluate on both small- and large-scale datasets and achieve state-of-the-art performance:
- Quantitative results: see the comparison tables in our paper
- Qualitative results: see the visualization figures in our paper
Please check our paper for more results.
Videos
ICCV'19 Oral Presentation (please turn on closed captions):
ICCV'19 Oral Presentation (officially recorded):
Resources
Papers & Code
Presentations
Other Links
- CVF Open Access [ ICCV'19 ]
- IEEE Xplore [ ICCV'19 ]
- The workshop on Learning from Unlabeled Videos (LUV) [ CVPRW'19 ]
- ML@GT [ Blog ][ Article ]
Datasets
We propose two large-scale cross-domain action recognition datasets: UCF-HMDB_full and Kinetics-Gameplay.
To download the datasets, please visit our GitHub. Feel free to check our paper for more dataset details.
Related Publications
If you find this project useful, please cite our papers:
- Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, and Jian Zheng, “Temporal Attentive Alignment for Large-Scale Video Domain Adaptation”, IEEE International Conference on Computer Vision (ICCV), 2019 [Oral (acceptance rate: 4.6%), travel grant awarded].
- Min-Hung Chen, Zsolt Kira, and Ghassan AlRegib, “Temporal Attentive Alignment for Video Domain Adaptation”, CVPR Workshop on Learning from Unlabeled Videos (LUV), 2019.
BibTex
@inproceedings{chen2019temporal,
  title={Temporal attentive alignment for large-scale video domain adaptation},
  author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan and Yoo, Jaekwon and Chen, Ruxin and Zheng, Jian},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}

@article{chen2019taaan,
  title={Temporal Attentive Alignment for Video Domain Adaptation},
  author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={CVPR Workshop on Learning from Unlabeled Videos},
  year={2019},
  url={https://arxiv.org/abs/1905.10861}
}
Members
¹Georgia Institute of Technology  ²Sony Interactive Entertainment LLC  ³Binghamton University
*Work partially done as a SIE intern