Min-Hung Chen

Senior Research Scientist

NVIDIA Research

About Me

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on biometric research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and before that on Edge-AI research as a Senior AI Engineer at MediaTek.

My research interests are mainly in Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, and X-supervised learning.

[Recruiting] NVIDIA Taiwan is hiring Research Scientists (full-time and internship). I am also open to research collaboration. Please drop me an email if you are interested.


  • Transfer Learning
  • Unsupervised Learning
  • Video Understanding
  • Vision Transformer
  • Computer Vision
  • Deep Learning
  • Machine Learning


  • PhD in Electrical and Computer Engineering, 2020

    Georgia Institute of Technology

  • MSc in Integrated Circuits and Systems, 2012

    National Taiwan University

  • BSc in Electrical Engineering, 2010

    National Taiwan University


Work Experience


Senior Research Scientist

NVIDIA Research

Oct 2022 – Present Taipei, Taiwan
Vision+X Multi-Modal AI

Research Engineer II

Microsoft Azure AI

Jan 2022 – Oct 2022 Taipei, Taiwan
Cutting-edge AI research for generalizable and explainable facial liveness approaches
Deploy research approaches to next-generation cloud service solutions

Senior AI Engineer

MediaTek Inc.

Oct 2020 – Dec 2021 Hsinchu, Taiwan
Research and develop cutting-edge methodologies for Edge-AI
Coordinate academic-industry collaboration for the ecosystem (e.g., co-hosting a CVPR'21 workshop)

Research Intern

Baidu USA

May 2019 – Dec 2019 Sunnyvale, CA, US
Cross-domain action segmentation with self-supervised learning

Research Intern


May 2018 – Aug 2018 San Mateo, CA, US
Cross-domain action recognition

Deep Learning Engineer Intern


Aug 2017 – Dec 2017 San Francisco, CA, US
Vision-based autonomous retail store

Ph.D. Research

Georgia Institute of Technology

Aug 2014 – Aug 2020 Atlanta, GA, US
Video understanding beyond full supervision
Human action understanding
Robust machine learning for autonomous vehicles

Research Assistant

Academia Sinica

Jul 2013 – Jul 2014 Taipei City, Taiwan
Multi-modal action recognition



Ultimate Awesome Transformer Attention

A comprehensive list of Vision Transformer and Attention work, including papers, code, and related websites.

Vision-based Autonomous Retail Store

Deep Learning and Computer Vision system for real-time autonomous retail stores using only RGB cameras.

Deep Learning for Smartphone ISP

The Learned Smartphone ISP Challenge for the CVPR 2021 MAI Workshop.

Action Segmentation with Temporal Domain Adaptation

Cross-domain action segmentation by aligning temporal feature spaces.

Activity Recognition with RNN and Temporal-ConvNet

Two methods (TS-LSTM and Temporal-Inception) to exploit spatiotemporal dynamics for activity recognition.

Temporal Attentive Alignment for Video Domain Adaptation

Cross-domain action recognition with new datasets and novel video-based DA approaches.

Traffic Sign Detection under Challenging Conditions

A large-scale traffic sign detection dataset with various challenging conditions.

Professional Activities

Competition committees

Professional Talks

Conference reviewers

  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), including Workshop (CVPRW)
  • International Conference on Learning Representations (ICLR)
  • Advances in Neural Information Processing Systems (NeurIPS)
  • IEEE/CVF International Conference on Computer Vision (ICCV)
  • International Conference on Machine Learning (ICML)
  • European Conference on Computer Vision (ECCV), including Workshop (ECCVW)
  • Association for the Advancement of Artificial Intelligence (AAAI)
  • IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  • British Machine Vision Conference (BMVC)
  • IEEE International Conference on Image Processing (ICIP)
  • Asian Conference on Computer Vision (ACCV)
  • IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
  • IAPR International Conference on Pattern Recognition (ICPR)
  • IAPR International Conference on Image Analysis and Processing (ICIAP)
  • IEEE International Workshop on Multimedia Signal Processing (MMSP)
  • European Signal Processing Conference (EUSIPCO)

Journal reviewers

  • Elsevier Pattern Recognition (PR)
  • Springer International Journal of Computer Vision (IJCV)
  • IEEE Transactions on Intelligent Transportation Systems (TITS)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Access

Recent & Upcoming Talks

Learned Smartphone ISP Challenge

10-min invited presentation for the MAI workshop at CVPR 2021

Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding

Talk invited by Dr. Jun-Cheng Chen at Academia Sinica

My Research Journey for Video Understanding

Talk invited by Prof. Yen-Yu Lin at NYCU

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

5-min invited presentation for the WebVision workshop at CVPR 2020

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

5-min video for the oral presentation at ICCV 2019

Selected Publications

Please see my Google Scholar for the complete publication list.

Learned Smartphone ISP on Mobile NPUs With Deep Learning, Mobile AI 2021 Challenge: Report

[CVPRW 2021] The report paper for The Learned Smartphone ISP challenge in the CVPR 2021 MAI Workshop.

Network Space Search for Pareto-Efficient Spaces

[CVPRW 2021 (Oral)] A novel AutoML paradigm to directly search for favorable network spaces automatically instead of a single architecture.

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

[CVPR 2020] Cross-domain action segmentation by aligning feature spaces across multiple temporal scales with self-supervised learning to reduce spatio-temporal variability.

Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding

[ICASSP 2020] Driving behavior classification based on temporal and causal reasoning.

Action Segmentation with Mixed Temporal Domain Adaptation

[WACV 2020] Cross-domain action segmentation by aligning temporal feature spaces to reduce spatio-temporal variability.

Color learning

[US Patent] Color-component spatio-temporal learning for traffic sign detection.

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

[ICCV 2019 (Oral)] Cross-domain action recognition with new datasets and novel attention-based DA approaches.

Traffic Sign Detection Under Challenging Conditions: A Deeper Look into Performance Variations and Spectral Characteristics

[TITS 2019] A large-scale traffic sign detection dataset with various challenging conditions.

TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition

[SPIC 2019] Simple but effective CNN- and RNN-based approaches to exploit temporal dynamics for videos.

Depth and Skeleton Associated Action Recognition without Online Accessible RGB-D Cameras

[CVPR 2014] Multi-modal adaptation with multiple kernel learning for action recognition.

Honors & Awards

  • 2023 EURASIP Best Paper Award for Image Communication Journal (Fall 2023)
  • Outstanding Reviewer for ICML (Summer 2022)
  • Outstanding Reviewer for ICCV (Fall 2021)
  • Outstanding Reviewer for CVPR (Summer 2021)
  • Student Travel Grant Award for ICCV (Fall 2019)
  • Ministry of Education Technologies Incubation Scholarship, Taiwan (Fall 2014 - Spring 2017)
  • Otto F. and Jenny H. Krauss Fellowship, Georgia Institute of Technology (Fall 2014 - Spring 2015)

Teaching Experience

Graduate Teaching Assistant

Georgia Institute of Technology

National Taiwan University

  • Statistical Image Processing (Spring 2012)
  • Computer Programming (Fall 2011)