Students should work in teams of at most five members. Each project should involve at least two modalities. Vision-language-action (VLA) models count as multimodal large models.
Students may contact the instructor to request GPU resources. Each team is limited to fewer than 16 A100 GPUs (based on the global assignment).