Large self-supervised (pre-trained) models have transformed various data-driven fields such as natural language processing (NLP). In this course, students will gain a thorough introduction to self-supervised learning techniques for NLP applications. Through lectures, assignments, and a final project, students will learn the necessary skills to design, implement, and understand their own self-supervised neural network models, using the PyTorch framework.

Note: This course is different from 601.771 (offered in the fall semesters), which involves reading recent papers and is geared toward graduate students who want to specialize in the latest developments in self-supervised models.

Prerequisites: (1) Data Structures (601.226). (2) Python/PyTorch: all the class assignments will be in Python/PyTorch. If you don’t know Python or PyTorch but have experience in other programming languages (Java, C++, etc.), you can probably pick up Python/PyTorch pretty quickly. (3) Calculus and linear algebra: you should be comfortable with matrix operations (matrix multiplication, transpose, inverse, dot product) and gradients. (4) Probability: basic probability properties (conditionals, marginals, mean, standard deviation) and distributions (Gaussian, categorical). (5) Background in natural language processing and machine learning, or completion of one of the relevant courses such as Machine Learning (475.675), Artificial Intelligence (464.664), Natural Language Processing (600.465), Machine Translation (600.468), or Introduction to HLT (601.467/667).

Relevant Courses at Hopkins: This course has some overlap with "Natural Language Processing" (EN.601/665), "Introduction to Human Language Technology" (601.467/667), and "Artificial Agents" (EN.601.470/670), though the courses have different focuses.



The homework is your opportunity to practice doing the thing. The lectures and office hours hopefully provide good intuition, motivation, and justification for the skills we want you to develop, but the best way to develop those skills is by trying to solve the problems yourself. The practice is far more important than the solution.

The course has ~12 weekly assignments which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts (mainly in Python). They will be released on this website, and submissions should be uploaded to Gradescope.

Here is a tentative list of topics for the assignments:

# Focus
#1 Algebra, calculus, probability recap, implementing Skip-Gram model, classification, evaluation, comparison to basic features (unigrams, bigrams) and existing word embeddings.
#2 Understanding softmax function, classification via vector representations, playing with gradient descent.
#3 PyTorch introduction, automatic differentiation, computation graph, how to use PyTorch on GPUs, basic feedforward network and backpropagation, Word2vec as a feedforward net with automatic differentiation
#4 Neural language model with feedforward network, evaluating language modeling, comparison to count-based model, comparing the representation to Word2vec
#5 Recurrent neural language model and evaluation, comparison to all the earlier models
#6 Transformer-based language model, comparison to the earlier models, distributed training
#7 Decoding language models: greedy, nucleus, typical; various issues related to text generation.
#8 Fine-tuning, prefix-tuning, adapting existing models, comparison to the earlier model; [mis]-Interpreting continuous prompts.
#9 Modifying Transformer for long contexts
#10 In-context learning; dealing with uncertainties
#11 Retrieval-augmented language models
#12 Alignment with human feedback
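
To give a flavor of topic #2 (the softmax function and gradient descent), here is a minimal sketch in plain Python. It is not taken from the actual assignment: the function names, the learning rate, the number of steps, and the toy scores are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(scores, target):
    """Gradient of -log softmax(scores)[target] with respect to the scores."""
    probs = softmax(scores)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def gradient_descent(scores, target, lr=0.5, steps=100):
    """Repeatedly move the scores against the gradient of the loss."""
    scores = list(scores)  # copy so the caller's list is not modified
    for _ in range(steps):
        grad = cross_entropy_grad(scores, target)
        scores = [s - lr * g for s, g in zip(scores, grad)]
    return scores


# Toy example (hypothetical values): three classes, all scored equally at first.
start = [0.0, 0.0, 0.0]
before = softmax(start)[2]
after = softmax(gradient_descent(start, target=2))[2]
print(f"p(target) before: {before:.3f}, after: {after:.3f}")
```

The gradient used here, the predicted probabilities minus the one-hot target, is the standard result for softmax combined with cross-entropy loss. Each step therefore raises the score of the target class and lowers the others.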

Midterm exam

There will be one in-class midterm. The details of this exam are TBD.

Final project

The objective of the final project is to make use of what you have learned during this course to solve a hard problem.

The project deliverables are: (1) a project proposal, (2) a final report, and (3) a final project poster summarizing the technical aspects of the project.

Content Schedule

Each session will involve an instructor-led presentation on a focused topic in self-supervised models. There will be weekly assignments related to class presentations, a midterm exam, and a final project.

The current class schedule is below (subject to change):

Date Topic Course Materials Events Deadlines
#1 - Tue Jan 24 Course overview
Plan and expectations
HW1 released! [tex] [pdf] [colab]
⬇️ -- Self-supervised Word Representations
#2 - Thu Jan 26 Human language and word meaning
Word2vec overview
Word2vec objective function
Suggested Reading: Jurafsky & Martin Chapter 6
Additional Reading:
  1. Python / Numpy Tutorial (with Jupyter and Colab)
  2. Learning Word Embeddings
  3. Dive into Deep Learning: Word Embeddings
  4. Dive into Deep Learning: Gradient Descent
  5. Efficient Estimation of Word Representations in Vector Space (Original W2V paper)
Fri Jan 27 TA Review Session (virtual over Zoom): Math background + Python [zoom link] [slides] Time: 9 - 9:50 AM
#3 - Tue Jan 31 Word2vec objective function (continued)
Inspecting the resulting word vectors
Evaluating word vectors
Suggested Reading: Jurafsky & Martin Chapter 6
Additional Reading:
  1. Optimization: Stochastic Gradient Descent
  2. Distributed Representations of Words and Phrases and their Compositionality (Negative Sampling paper)
HW2 released! [tex] [pdf] [colab] HW1 due
⬇️ -- Self-Supervised Representation of Feedforward Neural Language Models
#4 - Thu Feb 2 Word2vec limitations
Feedforward networks
Neural nets: brief history
Word2vec as simple feedforward net
Suggested Reading: Jurafsky & Martin Chapter 7
Additional Reading:
  1. Neural Networks: the Architecture
  2. Neural Networks: data and loss
  3. Dive into Deep Learning: Multilayer Perceptron
  4. Dive into Deep Learning: Practitioners Guide to Neural Networks
  5. Dive into Deep Learning: Linear Algebra in PyTorch
Fri Feb 3 TA Review Session (virtual over Zoom): Backpropagation and PyTorch Time: 9 - 9:50 AM
#5 - Tue Feb 7 Analytical backpropagation
Automatic differentiation
Practical tips for training neural networks
Suggested Reading: Jurafsky & Martin Chapter 7
Additional Reading:
  1. Neural Networks: Backpropagation
  2. Neural Networks: Training and empirical tips
  3. Computing Neural Network Gradients
  4. Learning representations by back-propagating errors (the original backpropagation paper)
HW2 due
#6 - Thu Feb 9 Extending Word2Vec with FFNs Language modeling objective
N-gram language modeling
Measuring LM quality
Language Modeling with FFN
Suggested Reading: Jurafsky & Martin Chapter 7
Additional Reading:
  1. Jurafsky & Martin Chapter 3
  2. Revisiting Simple Neural Probabilistic Language Models
⬇️ -- Self-Supervised Representation of Recurrent Neural Language Models
#7 - Tue Feb 14 Limitations of feedforward language models
Recurrent neural networks
Sequence-to-sequence model
Training Seq2Seq models
Aside: Input units: words, subwords, characters, bytes
Suggested Reading: CS231N course notes on RNNs
Additional Reading:
  1. CS224N course notes on RNNs
  2. Dive into Deep Learning: Recurrent Neural Networks
  3. The Unreasonable Effectiveness of Recurrent Neural Networks (blog post overview)
  4. Learning long-term dependencies with gradient descent is difficult (one of the original vanishing gradient papers)
#8 - Thu Feb 16 Generation and surprisal
Text generation algorithms:
- greedy decoding,
- stochastic (top-k, nucleus)
- exhaustive search
- beam search
⬇️ -- Self-Supervised Representation of Transformer Language Models
#9 - Tue Feb 21 Self-attention: how it works
Positional embeddings
Computational complexity
Self-attention: hacks and variants
Suggested Reading: Attention Is All You Need
Additional Reading:
  1. Dive into Deep Learning: Attention Mechanism
  2. The Annotated Transformer
  3. The Illustrated Transformer
#10 - Thu Feb 23 Transformer architecture
Various pre-training objective functions
Existing models: BERT, RoBERTa, T5, GPT
Scaling laws
Multilingual properties
Pre-training hacks
Suggested Reading: TBD
Additional Reading:
  1. The Illustrated BERT, ELMo, and co.
  2. Exploring the limits of transfer learning with a unified text-to-text transformer
  3. BART: Denoising Sequence-to-Sequence Pre-training
TBD PyTorch + Backprop Review Session (virtual) Time: TBD
⬇️ -- Doing Things with Language Models
#11 - Tue Feb 28 Adapting models with parameter change
#12 - Thu Mar 2 Adapting models with prompting
In-context learning
Multi-step reasoning
Connection to supervised learning
Failure modes of in-context learning
#13 - Tue Mar 7 Hallucination issue
Calibrating model uncertainties
- Consistency
- Sensitivity
- Mutual information
- Flatness
⬇️ -- Language Models and Long Context
#14 - Thu Mar 9 Modifying self-attention for long context
- Review of retrieval
- Dense retrieval
- Case study: answering factual questions
#15 - Tue Mar 14 Retrieval augmented language models
  1. Tips on course project
⬇️ -- Social Concerns and Alignment with Human Values
#16 - Thu Mar 16 Bias, fairness and toxic language
Moral frameworks and reasoning
Truthfulness and veracity
  1. Illustrating Reinforcement Learning from Human Feedback
#17 - Tue Mar 21 No Class - Spring Break
#18 - Thu Mar 23 No Class - Spring Break
#19 - Tue Mar 28 The alignment problem and challenges
Alignment as human-in-loop feedback
Supervised learning with human judgement
Reinforcement learning with human judgements
#20 - Thu Mar 30 Reinforcement learning with human judgements (continued)
Evaluating alignment
Remaining challenges
⬇️ -- Memorization, Security and Privacy
#21 - Tue Apr 4 Memorization in language models
Quantifying memorization
Concerns for applications
#22 - Thu Apr 6 Mitigating memorization of private information
⬇️ -- Self-Supervised Learning Text and Other Modalities
#23 - Tue Apr 11 Joint representation of language and programming languages
#24 - Thu Apr 13 Language grounded in structured layouts (web pages, phone apps, ...)
#25 - Tue Apr 18 Joint representation of language and visual information
#26 - Thu Apr 20 Joint representation of text and speech signals
⬇️ -- Future of Self-Supervised Models
#27 - Tue Apr 25 Bias/fairness concerns and feedback loops
Environmental concerns
Privacy and security issues
Legal issues
#28 - Thu Apr 27 Future of self-supervised models
Availability of data
Availability of compute
Limitations and open problems
#29 - Tue May 2 No Class - Reading Days
#30 - Thu May 4 No Class - Reading Days
TBD Final project presentation
TBD Final report submission deadline

Reference text

There is no required text, though the following can be useful:

Relevant Resources

Here are several resources available for free:

Besides these resources, we will try our best to satisfy individual needs through discussion.


The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful, abiding by the Computer Science Academic Integrity Policy:

Cheating is wrong. Cheating hurts our community by undermining academic integrity, creating mistrust, and fostering unfair competition. The university will punish cheaters with failure on an assignment, failure in a course, permanent transcript notation, suspension, and/or expulsion. Offenses may be reported to medical, law or other professional or graduate schools when a cheater applies. Violations can include cheating on exams, plagiarism, reuse of assignments without permission, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition. Ignorance of these rules is not an excuse.

Academic honesty is required in all work you submit to be graded. Except where the instructor specifies group work, you must solve all homework and programming assignments without the help of others. For example, you must not look at anyone else’s solutions (including program code) to your homework problems. However, you may discuss assignment specifications (not solutions) with others to be sure you understand what is required by the assignment. If your instructor permits using fragments of source code from outside sources, such as your textbook or on-line resources, you must properly cite the source. Not citing it constitutes plagiarism. Similarly, your group projects must list everyone who participated.

In the above paragraph "outside sources" also include content that was produced by an AI assistant like ChatGPT. This follows either by treating the AI assistant as a person for the purposes of this policy (controversial) or acknowledging that the AI assistant was trained directly on people's original work. Thus, while you are not forbidden from using these tools, you should consider the above policy carefully and quote where appropriate. Assignments that are in large part quoted from an AI assistant are very unlikely to be evaluated positively. In addition, if a student's work is substantially identical to another student's work, that will be grounds for an investigation of plagiarism regardless of whether the prose was produced by an AI assistant.

Falsifying program output or results is prohibited. Your instructor is free to override parts of this policy for particular assignments. To protect yourself: (1) Ask the instructor if you are not sure what is permissible. (2) Seek help from the instructor, TA or CAs, as you are always encouraged to do, rather than from other students. (3) Cite any questionable sources of help you may have received.

Report any violations you witness to the instructor. You can find more information about university misconduct policies on the web for undergraduate and graduate students.

Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and the University’s prompt and equitable response to such complaints.