Large self-supervised (pre-trained) models (such as Large Language Models or LLMs) have transformed various data-driven fields, such as natural language processing (NLP). This advanced course aims to provide a holistic view of the issues related to these models. The class will mainly involve reading and discussing recent papers in the field.

This class will focus on various issues regarding "scaling": data efficiency, model social, long context, multi-modality, reasoning grounded in the web or the physical world, and security/legal/privacy issues.

Note: This course differs from 601.471/671 (offered in the spring semesters), which focuses on building the foundations for self-supervised models.

Prerequisites: Natural Language Processing (CS 465/665), NLP: Self-Supervised Models (CS 471/671), or instructor consent.

Relevant Courses at Hopkins: This course has some overlap with "Natural Language Processing" (EN.601/665), and "Artificial Agents" (EN.601.470/670), though the courses have different focuses.

Logistics




Assignment

The course has ONE assignment to measure your understanding of the foundational concepts of self-supervised learning. This is to make sure that, coming in, you know all the prerequisites needed for the class. It will be released on this website, and submissions should be uploaded to Gradescope.


Pre/in-class Participation

TBD


Final project

The objective of the final project is to make use of what you have learned during this course to solve a hard problem.

The final project milestones include: (1) a project proposal, (2) a project midway report, (3) a progress update presentation, (4) a final report, and (5) a final project poster summarizing the technical aspects of the project. See the course calendar for the due dates.


What papers should we read?

Which papers should we read? What are the important topics in the field? Use this form to suggest papers and topics for the class.


Content Schedule

The current class schedule is below (subject to change):

Date Topic Course Materials Events Deadlines
#1 - Tue Jan 23 Course introduction:
  • Course overview
  • Plan and expectations
[slides: pptx, pdf]
Suggested Reading: Dive into Deep Learning: Linear Algebra in PyTorch
Additional Reading:
  1. Python / Numpy Tutorial (with Jupyter and Colab)
  2. Optimization: Stochastic Gradient Descent
HW1 is released! [tex]
#2 - Thu Jan 25 Reviewing the foundations: Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#3 - Tue Jan 30 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
HW1 due
#4 - Thu Feb 1 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#5 - Tue Feb 6 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#6 - Thu Feb 8 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#8 - Thu Feb 15 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#9 - Tue Feb 20 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#10 - Thu Feb 22 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#11 - Tue Feb 27 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#12 - Thu Feb 29 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#13 - Tue Mar 5 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#14 - Thu Mar 7 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#15 - Tue Mar 12 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
Apr 1 Project proposals deadline
#17 - Tue Mar 19 No Class - Spring Break
#18 - Thu Mar 21 No Class - Spring Break
#19 - Tue Mar 26 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#20 - Thu Mar 28 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#21 - Tue Apr 2 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#22 - Thu Apr 4 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
TBD Midway reports deadline
#23 - Tue Apr 9 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#24 - Thu Apr 11 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#25 - Tue Apr 16 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#26 - Thu Apr 18 TBD Suggested Reading: TBD
Additional Reading:
  1. TBD
  2. TBD
#27 - Tue Apr 23 Project progress presentation
#28 - Thu Apr 25 Project progress presentation
#29 - Tue Apr 30 No Class - Reading Days
#30 - Thu May 2 No Class - Reading Days
May 13 Final project reports
May 13 Final project poster session (6-9pm)


Relevant Resources

Here are several resources available for free:

Besides these resources, we will try our best to satisfy individual needs through discussion.


Code of Conduct

The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful, abiding by the Computer Science Academic Integrity Policy:

Cheating is wrong. Cheating hurts our community by undermining academic integrity, creating mistrust, and fostering unfair competition. The university will punish cheaters with failure on an assignment, failure in a course, permanent transcript notation, suspension, and/or expulsion. Offenses may be reported to medical, law or other professional or graduate schools when a cheater applies. Violations can include cheating on exams, plagiarism, reuse of assignments without permission, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition. Ignorance of these rules is not an excuse.

Academic honesty is required in all work you submit to be graded. Except where the instructor specifies group work, you must solve all homework and programming assignments without the help of others. For example, you must not look at anyone else’s solutions (including program code) to your homework problems. However, you may discuss assignment specifications (not solutions) with others to be sure you understand what is required by the assignment. If your instructor permits using fragments of source code from outside sources, such as your textbook or on-line resources, you must properly cite the source. Not citing it constitutes plagiarism. Similarly, your group projects must list everyone who participated.

In the above paragraph, "outside sources" also includes content that was produced by an AI assistant like ChatGPT. This follows either by treating the AI assistant as a person for the purposes of this policy (controversial) or acknowledging that the AI assistant was trained directly on people's original work. Thus, while you are not forbidden from using these tools, you should consider the above policy carefully and quote where appropriate. Assignments that are in large part quoted from an AI assistant are very unlikely to be evaluated positively. In addition, if a student's work is substantially identical to another student's work, that will be grounds for an investigation of plagiarism regardless of whether the prose was produced by an AI assistant.

Falsifying program output or results is prohibited. Your instructor is free to override parts of this policy for particular assignments. To protect yourself: (1) Ask the instructor if you are not sure what is permissible. (2) Seek help from the instructor, TA or CAs, as you are always encouraged to do, rather than from other students. (3) Cite any questionable sources of help you may have received.

Report any violations you witness to the instructor. You can find more information about university misconduct policies on the web for undergraduate and graduate students.

Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and the University’s prompt and equitable response to such complaints.