Large self-supervised (pre-trained) models, such as Large Language Models (LLMs), have transformed various data-driven fields, including natural language processing (NLP). This advanced course aims to provide a holistic view of the issues surrounding these models. The class will mainly involve reading and discussing recent papers in the field.

The class will focus on a range of issues: data efficiency, robustness, long context, multi-modality, reasoning grounded in the web or the physical world, and security, legal, and privacy concerns.

Note: This course is different from (and more advanced than) 601.471/671 (offered in the spring semesters), which focuses on building the foundational concepts.

Prerequisites: Natural Language Processing (CS 465/665), NLP: Self-Supervised Models (CS 471/671), or instructor consent.

Relevant Courses at Hopkins: This course has some overlap with "Natural Language Processing" (EN.601.465/665) and "Artificial Agents" (EN.601.470/670), though the courses have different focuses.

Logistics

Expectations and Deliverables




Final project

The objective of the final project is to use what you have learned during this course to solve a hard problem.

The final project milestones include: (1) a project proposal, (2) a project midway report, (3) a progress-update presentation, (4) a final report, and (5) a final project poster summarizing the technical aspects of the project. See the course calendar for the due dates.



Content Schedule

The current class schedule is below (subject to change). You can also see this spreadsheet containing a larger set of papers we considered.

Date Topic Course Materials
#1 - Tue Aug 27 Reviewing the foundations Course introduction:
  • Course overview
  • Plan and expectations
[slides: pptx, pdf]
Reviewing the foundations:
  • Language modeling
[slides: pptx, pdf]
#2 - Thu Aug 29 Reviewing the foundations
  • Transformers
  • Pre-training
[slides: pptx, pdf]
#3 - Tue Sept 3 Reviewing the foundations
  • Prompting
  • Tuning
[slides: pptx, pdf]
#4 - Thu Sept 5 Reviewing the foundations
  • Retrieval
  • Alignment
[slides: pptx, pdf]
#5 - Tue Sept 10 Pre-training Main Reading(s): The Llama 3 Herd of Models (Sec 3 and the relevant portion of Sec 5)
Additional Suggested Reading:
  1. OLMo: Accelerating the Science of Language Models
  2. Apple Intelligence Foundation Language Models (up to Sec 4 and the relevant portions of Sec 6)
[slides: pptx, pdf]
#6 - Thu Sept 12 Alignment Main Reading(s): The Llama 3 Herd of Models (Sec 4 and the relevant portion of Sec 5)
Additional Suggested Reading:
  1. Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  2. Fundamental Limitations of Alignment in Large Language Models
[slides: pptx, pdf]
#7 - Tue Sept 17 Data for Pre-training Main Reading(s): The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Additional Suggested Reading:
  1. Dated Data: Tracing Knowledge Cutoffs in Large Language Models
  2. A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
[slides: pptx, pdf]
#8 - Thu Sept 19 Evaluation Main Reading(s): Are Emergent Abilities of Large Language Models a Mirage?
Additional Suggested Reading:
  1. Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
  2. MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
  3. Understanding Emergent Abilities of Language Models from the Loss Perspective
[slides: pptx, pdf]
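For intuition on the "mirage" argument in the main reading, here is a toy numerical illustration (all numbers synthetic): a smooth power-law improvement in per-token accuracy looks far more abrupt when measured with an all-or-nothing metric such as exact match over a multi-token answer.

```python
import numpy as np

# Synthetic numbers for illustration only.
scales = np.array([1e8, 1e9, 1e10, 1e11, 1e12])   # model sizes (params)
token_acc = 1 - 0.45 * scales ** -0.08            # smooth power-law gain
exact_match = token_acc ** 20                     # all-or-nothing, 20-token answers
for s, t, e in zip(scales, token_acc, exact_match):
    print(f"params={s:.0e}  token_acc={t:.3f}  exact_match={e:.3f}")
```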
#9 - Tue Sept 24 Scalable oversight Main Reading(s): Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Additional Suggested Reading:
  1. On scalable oversight with weak LLMs judging strong LLMs
  2. Measuring Progress on Scalable Oversight for Large Language Models
  3. Prover-Verifier Games improve legibility of LLM outputs
[slides: pptx, pdf]
#10 - Thu Sept 26 Reasoning and inference-scaling Main Reading(s): Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Additional Suggested Reading:
  1. An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
  2. Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  3. Are More LM Calls All You Need? Towards the Scaling Properties of Compound AI Systems
  4. Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
[slides: pptx, pdf]
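A unifying theme of these readings is spending more compute at inference time, most simply by sampling many candidates and selecting one with a scorer. Below is a minimal best-of-N sketch; `generate` and `score` are hypothetical stand-ins for a model sampler and a verifier/reward model.

```python
import random

def best_of_n(prompt, generate, score, n=16):
    """Sample n candidates and return the one the scorer prefers."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Toy usage: the "model" guesses integers; the "verifier" prefers 42.
answer = best_of_n(
    "Guess a number",
    generate=lambda p: random.randint(0, 100),
    score=lambda y: -abs(y - 42),
)
print(answer)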
#11 - Tue Oct 1 Interpreting LLM activations via SAEs Main Reading(s): Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Section 1-4, i.e. From the top to "Features as Computational Intermediates")
Additional Suggested Reading:
  1. Scaling and evaluating sparse autoencoders
  2. Disentangling Dense Embeddings with Sparse Autoencoders
[slides: pptx, pdf]
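As background for the discussion, here is a minimal sparse autoencoder of the kind used in this line of work: an overcomplete dictionary trained to reconstruct residual-stream activations under an L1 sparsity penalty. The sizes and loss coefficient below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_dict=4096):   # overcomplete dictionary
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f         # reconstruction, features

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # reconstruction error + L1 penalty encouraging sparse features
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()

sae = SparseAutoencoder()
x = torch.randn(8, 512)                   # stand-in for model activations
x_hat, f = sae(x)
print(sae_loss(x, x_hat, f).item())
```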
Oct 3 Project proposals deadline
#12 - Thu Oct 3 Security Main Reading(s): Stealing Part of a Production Language Model (summary)
Additional Suggested Reading:
  1. Logits of API-protected LLMs leak proprietary information
[slides: pptx, pdf]
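The core observation behind this attack can be simulated in a few lines: because the final logits are a linear map of the hidden state, logit vectors collected over many queries span a subspace whose rank reveals the (supposedly secret) hidden dimension. A toy simulation with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 1000, 64                  # toy sizes; `hidden` is "secret"
W = rng.normal(size=(vocab, hidden))      # unembedding matrix of the victim
H = rng.normal(size=(hidden, 500))        # hidden states for 500 queries
logits = W @ H                            # what a logit-exposing API returns
print(np.linalg.matrix_rank(logits))      # -> 64: the hidden size leaks
```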
#13 - Tue Oct 8 Guest Speaker Ziang Xiao, Assistant Professor of Computer Science at JHU
Title: Towards Human-Centered Evaluation of Generative Models
#14 - Thu Oct 10 Weight Quantization Main Reading(s): AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Additional Suggested Reading:
  1. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  2. Optimal brain compression: A framework for accurate post-training quantization and pruning
  3. Extreme Compression of Large Language Models via Additive Quantization
  4. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[slides: pptx, pdf]
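For orientation before the discussion, a baseline round-to-nearest, per-channel weight quantizer is sketched below; AWQ's contribution is to rescale salient channels using activation statistics before this step. All sizes are toy values.

```python
import numpy as np

def quantize_per_channel(w, bits=4):
    """Symmetric round-to-nearest quantization per output channel."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)         # toy weight matrix
q, s = quantize_per_channel(w)
print(np.abs(w - dequantize(q, s)).max())            # quantization error
```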
#15 - Tue Oct 15 Fast n-gram membership/counting Main Reading(s): Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Additional Suggested Reading:
  1. Data Portraits: Recording Foundation Model Training Data
  2. N-gram Is Back: Residual Learning of Neural Text Generation with n-gram Language Model
[slides: pptx, pdf]
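The trick that makes unbounded-n counting feasible is binary search over a suffix array rather than storing explicit n-gram tables. Below is a naive toy version; the real system uses heavily engineered suffix arrays over trillions of tokens.

```python
import bisect

def build_suffix_array(tokens):
    # O(n^2 log n) construction; fine for a toy corpus
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def count_ngram(tokens, sa, query):
    # Occurrences of `query` = width of its interval in the suffix array.
    prefixes = [tokens[i:i + len(query)] for i in sa]   # sorted prefixes
    return bisect.bisect_right(prefixes, query) - bisect.bisect_left(prefixes, query)

corpus = "the cat sat on the mat".split()
sa = build_suffix_array(corpus)
print(count_ngram(corpus, sa, ["the"]))   # -> 2
```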
Oct 15 Project proposal revision deadline
#16 - Thu Oct 17 No Class - Fall break
#17 - Tue Oct 22 Efficient decoding Main Reading(s): Fast Inference from Transformers via Speculative Decoding
Additional Suggested Reading:
  1. Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
  2. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  3. Few-Shot Semantic Parsing with Language Models Trained on Code
  4. A semi-comprehensive collection of papers around "speculative decoding"
[slides: pptx, pdf]
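As a primer on the main reading, the sketch below shows the draft-then-verify structure of speculative decoding, simplified to greedy verification. Here `draft_next` and `target_next` are hypothetical single-token callables; the real algorithm verifies all draft tokens in one batched target pass and accepts them probabilistically so the output distribution matches the target model exactly.

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=24):
    seq = list(prompt)
    while len(seq) < max_len:
        # 1) draft k tokens cheaply with the small model
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) verify against the target model, keeping the agreeing prefix
        #    (a real system scores all k positions in one batched pass)
        for tok in draft:
            if target_next(seq) != tok:
                break
            seq.append(tok)
        # 3) always gain at least one token from the target model
        seq.append(target_next(seq))
    return seq

# Toy usage: both "models" count upward, so every draft token is accepted.
print(speculative_decode(lambda s: s[-1] + 1, lambda s: s[-1] + 1, [0]))
```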
#18 - Thu Oct 24 Long-context training Main Reading(s): How to Train Long-Context Language Models (Effectively)
Additional Suggested Reading:
  1. Effective Long-Context Scaling of Foundation Models
  2. Data Engineering for Scaling Language Models to 128K Context
[slides: pptx, pdf]
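One recurring ingredient in these recipes is rescaling rotary position embeddings so a short-context model can be adapted to longer inputs. Here is a minimal sketch of RoPE angle computation with simple position interpolation (toy sizes; the readings use more refined frequency-scaling schemes).

```python
import torch

def rope_angles(seq_len=8, d_head=64, base=10000.0, scale=1.0):
    # Standard RoPE inverse frequencies...
    inv_freq = 1.0 / (base ** (torch.arange(0, d_head, 2).float() / d_head))
    # ...with position interpolation: dividing positions by scale > 1
    # squeezes long sequences into the position range seen in pre-training.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, inv_freq)          # (seq_len, d_head / 2)

print(rope_angles(scale=4.0).shape)                  # torch.Size([8, 32])
```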
#19 - Tue Oct 29 Model merging Main Reading(s): TIES-Merging: Resolving Interference When Merging Models
Additional Suggested Reading:
  1. RE-Adapt: Reverse Engineered Adaptation of Large Language Models
  2. TBD
[slides: pptx, pdf]
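For intuition, here is a heavily simplified version of the TIES-Merging recipe over flat parameter vectors: trim each task vector to its largest-magnitude entries, elect a per-parameter sign, and average only the entries that agree with it. This is a toy sketch, not a faithful reimplementation.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.3):
    task_vecs = [ft - base for ft in finetuned]          # per-model deltas
    trimmed = []
    for tv in task_vecs:
        k = max(1, int(density * tv.size))               # keep top-k entries
        thresh = np.sort(np.abs(tv))[-k]
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    stacked = np.stack(trimmed)
    sign = np.sign(stacked.sum(axis=0))                  # elected sign
    agree = (np.sign(stacked) == sign) & (stacked != 0)  # sign-consistent entries
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + (stacked * agree).sum(axis=0) / counts

base = np.zeros(6)
ft_a = base + np.array([1.0, -0.5, 0.0, 2.0, 0.0, 0.1])
ft_b = base + np.array([0.8, 0.4, 0.0, -1.0, 0.0, 0.0])
print(ties_merge(base, [ft_a, ft_b]))
```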
#20 - Thu Oct 31 Representing the world Main Reading(s): The Platonic Representation Hypothesis
Additional Suggested Reading:
  1. The Linear Representation Hypothesis and the Geometry of Large Language Models
  2. Emergent Linear Representations in World Models of Self-Supervised Sequence Models
[slides: pptx, pdf]
#21 - Tue Nov 5 Weight Adaptation Main Reading(s): DoRA: Weight-Decomposed Low-Rank Adaptation
Additional Suggested Reading:
  1. QLoRA: Efficient Finetuning of Quantized LLMs
  2. RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
[slides: pptx, pdf]
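As a baseline for the discussion, a minimal LoRA layer is sketched below: the pretrained weight is frozen and a low-rank update B·A is learned. DoRA, the main reading, further decomposes the weight into magnitude and direction and applies the low-rank update to the direction component. Sizes and initialization scales are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)                   # torch.Size([2, 64])
```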
#22 - Thu Nov 7 Representation Adaptation Main Reading(s): Steering Language Models With Activation Engineering
Additional Suggested Reading:
  1. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
  2. ReFT: Representation Finetuning for Language Models
[slides: pptx, pdf]
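Here is a minimal sketch of activation steering, assuming the common contrastive construction in which the steering vector is the difference of mean activations on two sets of prompts. All tensors below are random placeholders for real model activations.

```python
import torch

# Contrastive construction: random tensors stand in for activations
# collected at one layer on "positive" vs. "negative" prompts.
acts_pos = torch.randn(16, 512)
acts_neg = torch.randn(16, 512)
steer_vec = acts_pos.mean(0) - acts_neg.mean(0)

def apply_steering(hidden, vec, coeff=4.0):
    # Added to the hidden state at the chosen layer during generation.
    return hidden + coeff * vec

print(apply_steering(torch.randn(1, 512), steer_vec).shape)
```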
#23 - Tue Nov 12 Episodic memory Main Reading(s): CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Additional Suggested Reading:
  1. Language Modeling with Editable External Knowledge
  2. HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
[slides: pptx, pdf]
#24 - Thu Nov 14 Compression Main Reading(s): Training LLMs over Neurally Compressed Text
Additional Suggested Reading:
  1. Rethinking LLM Memorization through the Lens of Adversarial Compression
[slides: pptx, pdf]
Nov 14 Midway reports deadline
#25 - Tue Nov 19 Tool use Main Reading(s): LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Additional Suggested Reading:
  1. CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
  2. WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
[slides: pptx, pdf]
#26 - Thu Nov 21 Mixture of Experts Main Reading(s): Mixtral of Experts
Additional Suggested Reading:
  1. ST-MoE: Designing Stable and Transferable Sparse Expert Models
  2. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
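For reference during the discussion, here is a minimal Mixtral-style sparse MoE layer: a linear router selects the top-2 experts per token and mixes their outputs with renormalized router weights. The per-token loop is for clarity; real implementations batch by expert.

```python
import torch
import torch.nn as nn

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        weights, idx = self.router(x).topk(2, dim=-1)    # top-2 router logits
        weights = torch.softmax(weights, dim=-1)         # renormalize over the pair
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):                      # per-token loop, for clarity
            for slot in range(2):
                expert = self.experts[int(idx[t, slot])]
                out[t] += weights[t, slot] * expert(x[t])
        return out

moe = Top2MoE()
print(moe(torch.randn(4, 64)).shape)                     # torch.Size([4, 64])
```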
#27 - Tue Nov 26 No Class - Fall Recess
#28 - Thu Nov 28 No Class - Fall Recess
#29 - Tue Dec 3 Robotic planning Main Reading(s): VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Additional Suggested Reading:
  1. TBD
  2. TBD
#30 - Thu Dec 5 TBD Main Reading(s): TBD
Additional Suggested Reading:
  1. TBD
  2. TBD
Dec 9-10 Reading Days
Dec 17 Final project reports
Dec 17 Final project poster session (6-9pm; per the final exam schedule)

Relevant Resources

Here are several resources available for free:

Besides these resources, we will try our best to accommodate individual needs through discussion.


Code of Conduct

The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful, abiding by the Computer Science Academic Integrity Policy:

Cheating is wrong. Cheating hurts our community by undermining academic integrity, creating mistrust, and fostering unfair competition. The university will punish cheaters with failure on an assignment, failure in a course, permanent transcript notation, suspension, and/or expulsion. Offenses may be reported to medical, law or other professional or graduate schools when a cheater applies. Violations can include cheating on exams, plagiarism, reuse of assignments without permission, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition. Ignorance of these rules is not an excuse.

Academic honesty is required in all work you submit to be graded. Except where the instructor specifies group work, you must solve all homework and programming assignments without the help of others. For example, you must not look at anyone else’s solutions (including program code) to your homework problems. However, you may discuss assignment specifications (not solutions) with others to be sure you understand what is required by the assignment. If your instructor permits using fragments of source code from outside sources, such as your textbook or on-line resources, you must properly cite the source. Not citing it constitutes plagiarism. Similarly, your group projects must list everyone who participated.

In the above paragraph, "outside sources" also includes content produced by an AI assistant such as ChatGPT. This follows either from treating the AI assistant as a person for the purposes of this policy (controversial) or from acknowledging that the AI assistant was trained directly on people's original work. Thus, while you are not forbidden from using these tools, you should consider the above policy carefully and quote where appropriate. Assignments that are in large part quoted from an AI assistant are very unlikely to be evaluated positively. In addition, if a student's work is substantially identical to another student's work, that will be grounds for a plagiarism investigation regardless of whether the prose was produced by an AI assistant.

Falsifying program output or results is prohibited. Your instructor is free to override parts of this policy for particular assignments. To protect yourself: (1) Ask the instructor if you are not sure what is permissible. (2) Seek help from the instructor, TA or CAs, as you are always encouraged to do, rather than from other students. (3) Cite any questionable sources of help you may have received.

Report any violations you witness to the instructor. You can find more information about university misconduct policies on the web for undergraduate and graduate students.

Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and the University’s prompt and equitable response to such complaints.