CS525
Data for AI
Course Description
CS525 surveys the landscape of data for AI, with a focus on contemporary learning problems such as training language models or multimodal models. Students will learn about important datasets and common data processing methods, including filtering and deduplication. Further topics will include synthetic data, data attribution, and environments for reinforcement learning. The course will also cover ethical and legal aspects of training data, such as copyright and privacy. Over the course of the class, students will build a training set for a learning problem of their choice. The class will consist of faculty lectures, student presentations, and guest lectures.
Grading Basis: ROP - Letter or Credit/No Credit
Min: 3
Max: 3
Course Repeatable for Degree Credit? No
Course Component: Seminar
Enrollment Optional? No
Programs
CS525 is a completion requirement for:
- (from the following course set: )
- (from the following course set: )