The rise of massive self-supervised (pre-trained) models has transformed various data-driven fields such as natural language processing, computer vision, robotics, and medical imaging. This advanced graduate course aims to provide a holistic view of the issues related to these models: We will start with the history of how we got here, and then delve into the latest success stories. We will then focus on the implications of these technologies: social harms, security risks, legal issues, and environmental impacts. The class ends with reflections on the future implications of this trajectory.
Prerequisites: Students must have extensive experience with deep learning, machine learning, artificial intelligence, and natural language processing. Familiarity with linear algebra, statistics, and probability is necessary, as is familiarity with the design and implementation of learning models (via one of the learning libraries, such as PyTorch, TensorFlow, Keras, JAX). Students must be comfortable with reading papers and extracting their key concepts and ideas.
For much of the semester, each class will involve the presentation and discussion of recent important papers on pre-trained (self-supervised) statistical models. The objective of the course is to instill a holistic view of the latest developments in various fields (NLP, computer vision, biology, etc.), and help the participants understand their broad implications.
Each paper will be presented by a group of students each with an assigned "role". This role defines the lens through which they read the paper and determines what they prepare for the group in-class discussion. Here are the roles we will experiment with:
The presentation of each role can be done individually or in groups of ≤3. If done as a group, you and your partners should decide how to divide the work equally for a given paper presentation session.
Who presents what role and when? At the beginning of the semester, students will be divided into two halves, one half presenting on Tuesdays and the other on Thursdays. In a given class session, the students in the presenting half will each be given a random role (determined the week before, at the end of class). Each role group (irrespective of how many students are assigned to it) should aim to stay within the time budget specified for its role. You're encouraged to have slides for your role, though it is not mandatory. If you do, I recommend no more than 7-10 slides to make sure you stay within our time budget.
What slides? To minimize time spent context switching or fighting with screen sharing/projector dongles, we will have a shared pool of slides (hosted on Google Slides, to be shared a week before). Each role group is encouraged to title its slides with "[role emoji]: [student name]" (as in "🏺: Jane,John") so that the slides can be quickly identified during the session. If you choose to make slides, you're not expected to prepare a full-blown presentation; slides are meant as a visual aid to facilitate the presentation.
Before the class: Please provide a short answer to a prompt posed by the instructor a few days before the class.
At the beginning of each class: Come up with one question about the paper (either something you're confused about or something you'd like to hear discussed more).
During the class: While only a subset of the class will participate in presenting a paper, the rest of the class is expected to come to class ready to participate in the discussions.
The current class schedule is below (subject to change):
Here are the topics we wanted to cover but didn't have time for:
All students in the class will write a "mini-paper" as a final project. The topic of this project is open-ended. For example, the project can focus on demonstrating systemic limitations of prior work or on suggesting improvements to methods or benchmarks discussed in class.
Here are several resources available for free:
Besides these resources, we will try our best to satisfy individual needs through discussion.
Since this is a discussion class, it's especially important that we respect everyone's perspective and input. In particular, I value the perspectives of individuals from all backgrounds, reflecting the diversity of our students. I will strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve.
This course will have a zero-tolerance philosophy regarding plagiarism or other forms of cheating, and incidents of academic dishonesty will be reported. A student who has doubts about how the Honor Code applies to this course should obtain specific guidance from the course instructor before submitting the respective assignment.
The Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and the University’s prompt and equitable response to such complaints.