Multimodal Foundation Models

Course Slides and Learning Materials

Towards Large Physical Models

Course Overview

From LLMs to Multimodal Foundation Models and Large Physical Models

Development path from large language models to multimodal foundation models and large physical models

This course introduces multimodal foundation models as a key stage in the evolution of modern AI. Large language models mainly take text as input and generate text as output. Multimodal foundation models extend this paradigm by connecting language with images, audio, video, and other sensory signals, enabling AI systems to understand, generate, and reason across multiple forms of information.

Looking forward, multimodal foundation models are expected to evolve into large physical models that understand geometry, motion, actions, and real-world constraints. The course covers frontier techniques that go beyond existing textbooks, including recent architectures, training methods, and future directions.

Course Team

Guangrun Wang

Course instructor

Xiao Li

Teaching assistant

Xiaoxin Lin

Teaching assistant

Jiaying Zhou

Teaching assistant

Lectures

Lecture 7: Next-Generation AI Architectures

This lecture introduces next-generation AI architectures.

Download Slides

Lecture 14: Advanced Multimodal Large Models

This lecture covers advanced multimodal large models, including large models for quantitative finance and physical AI models.

Download Slides

Lecture 15: Large Models and Agent Skills: Practice and Experience

This lecture covers practical applications of, and hands-on experience with, foundation models and agent skills.

Download Slides

Lecture 17: The Beauty of Mathematics: When Large Models Lack Compute, Math Makes Up the Difference

This lecture shows how elegant mathematics can help foundation models make up for limited compute.

June 26: Final lecture

Download Slides

Final Project

Final Course Project

This section introduces the final project.

Final Project Guideline: The report must be written using one of the provided templates, in either English or Chinese.

Final Project Assignment
Download English Template
Download Chinese Template
Grading Rubric

Assignments

Course Assignments

This section provides all homework assignments for the course.

Download Assignment 1: Transitioning to World Action Models

Download Assignment 2: Stability in Non-Contrastive Learning

Download Assignment 3: Instance Discrimination and Queue Dynamics

Download Assignment 4: In-Context Learning as Optimization

Download Assignment 5: Navigating the IQ Triangle in Transformers

Download Assignment 6: Evaluating Fine-Tuning Paradigms

Download Assignment 7: Masking Strategies in Self-Supervised Vision

Download Homework Assignment

Reviews

Review Materials

This section provides review materials for exams and course preparation.

Rebuttal Guideline: The rebuttal must be written in English and must not exceed one page.

Download Review
Download Rebuttal Template