Min-Hung Chen

Senior Research Scientist

NVIDIA Research

About Me

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on Biometric Research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and before that on Edge-AI Research as a Senior AI Engineer at MediaTek.

My research interest is mainly Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.

[Recruiting] NVIDIA Taiwan is hiring Research Scientists (full-time & internship). I am also open to research collaboration. Please drop me an email if you are interested.

[Note] The Projects, Talks, and Publications Sections are out of date. Please mainly check the News Section.

Interests

Transfer Learning
Unsupervised Learning
Video Understanding
Vision Transformer
Computer Vision
Deep Learning
Machine Learning

Education

PhD in Electrical and Computer Engineering, 2020
Georgia Institute of Technology
MSc in Integrated Circuits and Systems, 2012
National Taiwan University
BSc in Electrical Engineering, 2010
National Taiwan University

News

Jun. 2025: Our papers "HERMES" (Code) and "LongSplat" are accepted to ICCV 2025!! See you in Hawaii!!
May 2025: I am serving as an organizer for The Workshop on Ego-Exo Sensing for Smart Mobility (X-Sense) @ ICCV 2025.
Apr. 2025: I will be serving as a workshop reviewer for Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM) @ ICML 2025 and Representation Learning with Very Limited Resources: When Data, Modalities, Labels, and Computing Resources are Scarce (LIMIT) @ ICCV 2025.
Apr. 2025: Our "V2V-LLM" work is accepted to CVPR 2025 Workshops (Best Paper in T4V and Oral in DriveX)!!
Mar. 2025: I am serving as an organizer for The Workshop on Transformers for Vision (T4V) @ CVPR 2025.
Mar. 2025: I am selected as an outstanding reviewer for the SCOPE workshop @ ICLR 2025.
Feb. 2025: Our papers "Omni-RGPT" (Website) and "AuraFusion360" (Website) are accepted to CVPR 2025!!
Feb. 2025: I will be serving as a workshop reviewer for Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) @ CVPR 2025.
Jan. 2025: Our papers "SANER" (Website) and "Hymba" (Code & Hugging Face & NV Blog) are accepted to ICLR 2025!!
Jan. 2025: I am serving as a journal reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Jan. 2025: I will be serving as a workshop reviewer for Scalable Optimization for Efficient and Adaptive Foundation Models (SCOPE) @ ICLR 2025.
Oct. 2024: Four papers, "SemPLeS", "ST-CLIP", "CorrFill" and "ORFormer", are accepted to WACV 2025!!
Sep. 2024: Our paper "DRAIL" (Code & Website) is accepted to NeurIPS 2024 and ICLR 2024 Workshop (Gen4AIDM)!!
Jun. 2024: Our "GroPrompt" paper is accepted to CVPR 2024 Workshop (CVinW)!!
May 2024: Our paper "DoRA" (Code & Website & NV Blog) is accepted to ICML 2024 as Oral (acceptance rate: 1.5%)!!!
Apr. 2024: I will be serving as a workshop reviewer for Transformers for Vision (T4V) Workshop @ CVPR 2024.
Feb. 2024: Two papers, "CoDe" and "PartDistill", are accepted to CVPR 2024!!
Feb. 2024: The paper list for Vision Transformer/Attention has obtained 4000+ stars!!
Sep. 2023: Our paper "TS-LSTM and temporal-inception" (Blog & arXiv & Code) received the 2023 EURASIP Best Paper Award for Image Communication Journal!!
Jul. 2023: Two papers, "CEVR" and "MIT", are accepted to ICCV 2023!! See you in Paris!
Jun. 2023: Our "QuAVF" work is selected as the 1st place winner of the CVPR 2023 Ego4D Challenge in the Audio-Visual Social Understanding: Talking to me track!!
Apr. 2023: I will be serving as a workshop reviewer for Transformers for Vision (T4V) Workshop @ CVPR 2023.
Apr. 2023: Our "GAIN" paper is accepted to CVPR 2023 Workshop (Biometrics) with the Best Paper Award!!
Nov. 2022: I joined the Taipei Team at NVIDIA Research as a Senior Research Scientist, working on Vision+X Multi-Modal AI.
Oct. 2022: Our "HIT" paper is accepted to WACV 2023!!
Oct. 2022: Our "ROGUE" paper is accepted to BMVC 2022!!
Jul. 2022: I am selected as an outstanding reviewer for ICML 2022.
Apr. 2022: I released a comprehensive paper list for Vision Transformer/Attention to facilitate related research.
Jan. 2022: I joined the Face Science Team at Microsoft Azure AI as a Research Engineer II, working on Cutting-edge AI Research for Cognitive Services.
Sep. 2021: I am selected as an outstanding reviewer for ICCV 2021.
May 2021: I am selected as an outstanding reviewer for CVPR 2021.
Jan. 2021: I am co-organizing the Learned Smartphone ISP Challenge in the Mobile AI (MAI) Workshop at CVPR 2021 with ETHZ! Please check the Project page for more details.
Oct. 2020: I joined the AI team at MediaTek Taiwan as a Senior AI Engineer, working on Deep Learning Research for Edge-AI.
Aug. 2020: I officially obtained my Ph.D. degree from Georgia Tech!!! (Feel free to check my Ph.D. Dissertation for more details)

Work Experience

Senior Research Scientist

NVIDIA Research

Oct 2022 – Present Taipei, Taiwan

Vision+X Multi-Modal AI

Research Engineer II

Microsoft

Jan 2022 – Oct 2022 Taipei, Taiwan

Cutting-edge AI research for generalizable and explainable facial liveness approaches
Deploy research approaches to next-generation cloud service solutions

Senior AI Engineer

MediaTek Inc.

Oct 2020 – Dec 2021 Hsinchu, Taiwan

Research and develop cutting-edge methodologies for Edge-AI
Coordinate academic-industry collaboration for EcoSystem (e.g., co-hosting a CVPR'21 workshop)

Research Intern

Baidu USA

May 2019 – Dec 2019 Sunnyvale, CA, US

Cross-domain action segmentation with self-supervised learning

Research Intern

PlayStation

May 2018 – Aug 2018 San Mateo, CA, US

Cross-domain action recognition

Deep Learning Engineer Intern

Aipoly

Aug 2017 – Dec 2017 San Francisco, CA, US

Vision-based autonomous retail store

Ph.D. Research

Georgia Institute of Technology

Aug 2014 – Aug 2020 Atlanta, GA, US

Video understanding beyond full supervision
Human action understanding
Robust machine learning for autonomous vehicles

Research Assistant

Academia Sinica

Jul 2013 – Jul 2014 Taipei City, Taiwan

Multi-modal action recognition

Projects

Ultimate Awesome Transformer Attention

A comprehensive paper list for Vision Transformer and Attention, including papers, code, and related websites.

Vision-based Autonomous Retail Store

Deep Learning and Computer Vision system for real-time autonomous retail stores using only RGB cameras.

Deep Learning for Smartphone ISP

The Learned Smartphone ISP Challenge for the CVPR 2021 MAI Workshop.

Action Segmentation with Temporal Domain Adaptation

Cross-domain action segmentation by aligning temporal feature spaces.

Activity Recognition with RNN and Temporal-ConvNet

Two methods (TS-LSTM and Temporal-Inception) to exploit spatiotemporal dynamics for activity recognition.

Temporal Attentive Alignment for Video Domain Adaptation

Cross-domain action recognition with new datasets and novel video-based DA approaches.

Traffic Sign Detection under Challenging Conditions

A large-scale traffic sign detection dataset with various challenging conditions.

Professional Activities

Organizers

Professional Talks

May 2025: Invited talk at NTHU, Taiwan (Topic: Multimodal AI Research at NVIDIA Taiwan).
May 2024: Invited talk at NTHU, Taiwan (Topic: Multimodal AI Research at NVIDIA Taiwan).
May 2023: Invited talk at NYCU, Taiwan (Topic: My Research Journey: TW x US x Academics x Industry).
Jun. 2021: Invited talk at CVPR MAI Workshop 2021 (Topic: Learned Smartphone ISP Challenge: Results and Top Solutions).
May 2021: Invited talk at Academia Sinica, Taiwan (Topic: Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding).
Jan. 2021: Invited talk at NYCU, Taiwan (Topic: My Research Journey for Video Understanding).
Publication talks at CVPR 2020 and ICCV 2019.

Conference reviewers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), including Workshop (CVPRW)
International Conference on Learning Representations (ICLR), including Workshop (ICLRW)
Advances in Neural Information Processing Systems (NeurIPS)
IEEE/CVF International Conference on Computer Vision (ICCV), including Workshop (ICCVW)
International Conference on Machine Learning (ICML), including Workshop (ICMLW)
European Conference on Computer Vision (ECCV), including Workshop (ECCVW)
Association for the Advancement of Artificial Intelligence (AAAI)
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
British Machine Vision Conference (BMVC)
IEEE International Conference on Image Processing (ICIP)
Asian Conference on Computer Vision (ACCV)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
IAPR International Conference on Pattern Recognition (ICPR)
IAPR International Conference on Image Analysis and Processing (ICIAP)
IEEE International Workshop on Multimedia Signal Processing (MMSP)
European Signal Processing Conference (EUSIPCO)

Journal reviewers

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Elsevier Pattern Recognition (PR)
Springer International Journal of Computer Vision (IJCV)
IEEE Transactions on Intelligent Transportation Systems (TITS)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
IEEE Access

Recent & Upcoming Talks

Learned Smartphone ISP Challenge

10-min invited presentation for the MAI workshop at CVPR 2021

Jun 20, 2021 8:50 AM — 9:00 AM Virtual

Project Video

Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding

Talk at Academia Sinica, invited by Dr. Jun-Cheng Chen

May 3, 2021 2:00 PM — 4:00 PM Auditorium 122 at CITI

Slides Poster @ AS

My Research Journey for Video Understanding

Talk at NYCU, invited by Prof. Yen-Yu Lin

Jan 7, 2021 10:10 AM — 11:00 AM EC114 at NYCU

Slides Poster @ NYCU

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

5-min invited presentation for the WebVision workshop at CVPR 2020

Jun 15, 2020 10:41 AM — 10:45 AM Virtual

Project Slides Video

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

5-min video for the Oral presentation in ICCV 2019

Oct 31, 2019 3:05 PM — 3:10 PM COEX Convention Center

Project Slides Video

Featured Publications

Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira

June 2020 In CVPR

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

[CVPR 2020] Cross-domain action segmentation by aligning feature spaces across multiple temporal scales with self-supervised learning to reduce spatio-temporal variability.

PDF Code Project Poster Slides DOI ArXiv Overview Talk 1-min Video 5-min Video CVF Open Access IEEE Xplore

Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, Jian Zheng

October 2019 In ICCV [Oral]

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

[ICCV 2019 (Oral)] Cross-domain action recognition with new datasets and novel attention-based DA approaches.

PDF Code Dataset Project Poster Slides DOI ArXiv Blog Talk Oral Video CVF Open Access IEEE Xplore

Selected Publications

Please see my Google Scholar for the complete publication list.

Learned Smartphone ISP on Mobile NPUs With Deep Learning, Mobile AI 2021 Challenge: Report

[CVPRW 2021] The report paper for The Learned Smartphone ISP challenge in the CVPR 2021 MAI Workshop.

Andrey Ignatov, Cheng-Ming Chiang, Hsien-Kai Kuo, Anastasia Sycheva, Radu Timofte, Min-Hung Chen, Man-Yu Lee, Yu-Syuan Xu, Yu Tseng

PDF Project ArXiv CVF Open Access IEEE Xplore

Network Space Search for Pareto-Efficient Spaces

[CVPRW 2021 (Oral)] A novel AutoML paradigm to directly search for favorable network spaces automatically instead of a single architecture.

Min-Fong Hong, Hao-Yun Chen, Min-Hung Chen, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Hung-Jen Chen, Kevin Jou

PDF Video ArXiv CVF Open Access IEEE Xplore

Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding

[Thesis] My Ph.D. Dissertation at Georgia Tech.

Min-Hung Chen

Link @ GaTech

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

[CVPR 2020] Cross-domain action segmentation by aligning feature spaces across multiple temporal scales with self-supervised learning to reduce spatio-temporal variability.

Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, Zsolt Kira