CGU AICV Lab Computer Vision Lab of the Department of Artificial Intelligence at Chang Gung University

Advanced Computer Vision 2026

Spring 2026, Friday 1:10pm to 4:00pm, Classroom: Second Medical Science Building, Room 104
Instructor: Chih-Yuan Yang

Course Information

This course is titled Advanced Computer Vision, but the field is vast: in 2025 alone, CVPR published over 2,800 papers, not to mention other computer vision conferences such as ICCV and WACV, or major AI venues such as NeurIPS, ICLR, ICML, and AAAI. At an advanced level, we must focus on specific areas of expertise rather than attempt to cover fundamental knowledge.

In this class, I want to guide students through the latest research papers to explore their new ideas, the problems they aim to solve, their current limitations, and their relevance to students’ own research. I expect students to answer these core questions about each paper: What does the paper propose? Is the source code available? Are the results reproducible? How does the approach benefit your research? And finally, are there ways to improve upon the solution?

As this is a literature-heavy course, I assume students already have foundational knowledge of computer vision. I do not plan to lecture on basic concepts such as pixels, color spaces, filters, or neural networks, but I will step in to clarify concepts or provide the necessary background if discussions become confusing or technical gaps arise.

Prerequisites

In this course, students read the latest papers from top computer vision conferences and journals, which report state-of-the-art research. Students present their findings, their understanding of the methods, their reproduced experimental results, and their ideas for improvement in the classroom. By understanding these cutting-edge methods, students should gain knowledge and get ideas for their own research. This course requires programming experience and fundamental knowledge of computer vision.

Syllabus

Week Date Topic Slides Recording Action
1 2/27       Holiday: Peace Memorial Day Compensation Day
2 3/6 Introduction to this course and the top computer vision conferences. Presented paper: 1: 2025 ICCVW Towards Proactive Social Robots: Distilling Visual Knowledge from Large Vision-Language Models 1:pptx 1:YouTube  
3 3/13 1: 2025 ICCV GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer 2: 2026 arXiv LoopViT: Scaling Visual ARC with Looped Transformers 1:pptx 2:pdf    
4 3/20 1: 2025 IEEE TGRS OWRT-DETR: A Novel Real-Time Transformer Network for Small-Object Detection in Open-Water Search and Rescue From UAV Aerial Imagery 2: 2025 ICCV From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning 1:pptx 2:Google Slides    
5 3/27 1: 2026 ICLR Sapiens2 2: 2025 WACV Memoire: Learning User Personas from Gallery Tags for Personalized Photo Curation 1:pptx 2:pptx 1:YouTube  
6 4/3       Holiday: Children’s Day
7 4/10 1: 2025 CVPR Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized? 2: 2025 ICLR VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning 1:pptx 2:pdf   Term project proposal
8 4/17 1: 2025 CVPR Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection 2: 2025 ICCV Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths 1:pptx 2:Google Slides    
9 4/24 1-1: 2025 ICCV Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation 1-2: 2026 CVPR Findings Track TP2-DETR: Unlocking Deformable DETR for Zero-Shot Temporal Action Proposal Generation with Temporal Feature Pyramids 2: 2025 CVPR Free Lunch Enhancements for Multi-modal Crowd Counting 1:pptx 2:pptx 1:YouTube  
10 5/1       Holiday: Labor Day
11 5/8 Midterm presentations      
12 5/14 1: 2025 ICCV Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections 2: 2026 ICML Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency 1:pptx 2:pdf    
12 5/15 1: LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models 2: 2025 CVPR Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues 1:pptx 2:Google Slides    
13 5/22 1: 2025 CVPR MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps 2: Introduction to research tools and skills 1:pptx 2:pptx    
14 5/29 Introduction to LaTeX, discussion of term project challenges and solutions 1:pptx 1:YouTube  
16 6/12 Term project presentation and final report      


Term Project Topics, Slides, and Reports

Topic Slides Report Code
       


Textbook

We do not use a textbook because no textbook can cover the findings reported in the latest research papers. An evolving large language model is more useful than a textbook for retrieving new knowledge.

Reference Books

Existing Full-length Course Lecture Recordings

Existing Online Lecture Videos for Computer Vision Knowledge Points

Existing Computer Vision Course Slides for Self-Learning

Grading

Your final grade is made up of:

  • 50% Your paper presentations in the classroom
  • 10% Discussion participation in the classroom
  • 40% Term project, including proposal (5%), midterm presentation (10%), term project presentation (15%), and term project report (10%). Each group may have at most 5 members.
  • Late policy
    I do not have a strict late policy because only a few students are taking this course. If a submission is missing, I will message the student directly on Teams to ask why.
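
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the final grade is a plain weighted sum; the dictionary keys, the function name, and the example scores are hypothetical and are not part of the official grading procedure.

```python
# Weights copied from the grading list above.
# Assumption: each component is scored from 0 to 100.
WEIGHTS = {
    "paper_presentations": 0.50,
    "discussion_participation": 0.10,
    # The four term project parts together make up 40%.
    "project_proposal": 0.05,
    "project_midterm_presentation": 0.10,
    "project_final_presentation": 0.15,
    "project_report": 0.10,
}


def final_grade(scores):
    """Return the weighted sum of the component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Sanity check: the weights cover the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

# Hypothetical scores for one student.
example_scores = {
    "paper_presentations": 90,
    "discussion_participation": 80,
    "project_proposal": 85,
    "project_midterm_presentation": 70,
    "project_final_presentation": 95,
    "project_report": 88,
}
print(round(final_grade(example_scores), 2))
```

Because paper presentations carry half of the total, a 10-point change there moves the final grade by 5 points, while the same change in the proposal moves it by only 0.5.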

Contact Info and Office Hour

Chih-Yuan Yang: cyyang@cgu.edu.tw
Office hours: Tue 10:30-11:30, Management Building Room 1416