Course Overview
From LLMs to Multimodal Foundation Models and Large Physical Models
This course introduces multimodal foundation models as a key stage in the evolution of modern AI. Large language models mainly take text as input and generate text as output. Multimodal foundation models extend this paradigm by connecting language with images, audio, video, and other sensory signals, enabling AI systems to understand, generate, and reason across multiple forms of information.
Looking forward, multimodal foundation models are expected to evolve into large physical models that understand geometry, motion, actions, and real-world constraints. The course covers frontier techniques not yet found in existing textbooks, including recent architectures, training methods, and future directions.
Course Team
Lectures
Lecture 14: Advanced Multimodal Large Models
This lecture covers advanced multimodal large models, including large models for quantitative finance and physical AI models.
Download Slides
Lecture 15: Large Models and Agent Skills: Practice and Experience
This lecture introduces practical applications of, and experience with, foundation models and agent skills.
Download Slides
Final Project
Final Term Project
This section introduces the final project.
Final Project Guideline: The report must be written using one of the provided templates, in either English or Chinese.
Final Project Assignment
Download English Template
Download Chinese Template
Assignments
Course Assignments
This section provides all homework assignments for the course.
Download Assignment 1: Transitioning to World Action Models
Download Assignment 2: Stability in Non-Contrastive Learning
Download Assignment 3: Instance Discrimination and Queue Dynamics
Download Assignment 4: In-Context Learning as Optimization
Download Assignment 5: Navigating the IQ Triangle in Transformers
Download Assignment 6: Evaluating Fine-Tuning Paradigms
Download Assignment 7: Masking Strategies in Self-Supervised Vision
Reviews
Review Materials
This section provides review materials for exams and course preparation.
Rebuttal Guideline: The rebuttal must be written in English and must not exceed one page.
Download Review
Download Rebuttal Template