Rejection Sampling
Rejection sampling is a technique used in large language model (LLM) training, particularly in reinforcement learning from human feedback (RLHF) and related alignment methods. It is a simple but effective approach to generating high-quality training data: candidate outputs are accepted or rejected according to a scoring criterion, typically a learned reward model.
Core Concept
The process involves:
- Generating multiple candidate responses from the LLM for a given prompt
- Evaluating each candidate against a quality metric (typically using a reward model or human raters)
- Rejecting low-quality samples and accepting only the best candidates for training
Mathematically, given a reward function \(R(x, y)\) that scores a candidate response \(y\) to a prompt \(x\), rejection sampling keeps the candidates satisfying \(R(x, y) > \tau\) for some threshold \(\tau\); the closely related best-of-n rule instead keeps only the top-scoring candidate \(\arg\max_{y} R(x, y)\).
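To make the selection rules concrete, here is a minimal Python sketch of both variants. The `generate` and `reward` callables are hypothetical placeholders for the policy model's sampler and the reward model; they are not part of any particular library.

```python
from typing import Callable, List, Optional

def rejection_sample(
    prompt: str,
    generate: Callable[[str, int], List[str]],  # hypothetical: draws n candidate responses
    reward: Callable[[str, str], float],        # hypothetical: scores a (prompt, response) pair
    n: int = 8,
    tau: float = 0.0,
) -> List[str]:
    """Threshold rule: keep every candidate whose reward exceeds tau."""
    return [resp for resp in generate(prompt, n) if reward(prompt, resp) > tau]

def best_of_n(
    prompt: str,
    generate: Callable[[str, int], List[str]],
    reward: Callable[[str, str], float],
    n: int = 8,
) -> Optional[str]:
    """Best-of-n rule: keep only the single highest-scoring candidate."""
    candidates = generate(prompt, n)
    if not candidates:
        return None
    return max(candidates, key=lambda resp: reward(prompt, resp))
```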
Applications in LLM Training
Rejection Sampling Fine-Tuning
Once a reward model has been trained, rejection sampling can turn the policy's own generations into high-quality demonstration data: low-scoring responses are filtered out, so the policy model is fine-tuned only on the best examples.
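One way such a fine-tuning set might be assembled is sketched below; as before, `generate` and `reward` are hypothetical stand-ins for the policy sampler and the trained reward model rather than a specific API.

```python
from typing import Callable, Dict, List

def build_finetuning_set(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # hypothetical policy-model sampler
    reward: Callable[[str, str], float],        # hypothetical reward-model scorer
    n: int = 8,
    tau: float = 0.0,
) -> List[Dict[str, str]]:
    """Keep the best-scoring response per prompt, but only if it clears the threshold."""
    dataset: List[Dict[str, str]] = []
    for prompt in prompts:
        scored = [(reward(prompt, resp), resp) for resp in generate(prompt, n)]
        if not scored:
            continue
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score > tau:
            dataset.append({"prompt": prompt, "response": best})
    return dataset
```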
Data Augmentation
By generating diverse candidates and selecting the highest-quality ones, rejection sampling creates additional training data that aligns better with human preferences.
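One hedged sketch of the "diverse candidates" step: sample at several temperatures and drop exact duplicates before handing the pool to the reward model. The `sample` callable is a hypothetical single-response sampler, not a library function.

```python
from typing import Callable, List, Sequence

def diverse_candidates(
    prompt: str,
    sample: Callable[[str, float], str],  # hypothetical: one sampled response at a given temperature
    temperatures: Sequence[float] = (0.7, 1.0, 1.3),
    per_temperature: int = 4,
) -> List[str]:
    """Sample across several temperatures and drop exact duplicates before scoring."""
    seen: set = set()
    candidates: List[str] = []
    for temperature in temperatures:
        for _ in range(per_temperature):
            resp = sample(prompt, temperature)
            if resp not in seen:
                seen.add(resp)
                candidates.append(resp)
    return candidates
```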
Quality Control
During inference or data generation, rejection sampling acts as a filter to ensure only acceptable outputs are used or presented.
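As an illustration of this filtering role, here is a small sketch of an inference-time quality gate, again assuming hypothetical `sample` and `reward` helpers. A real deployment would need a policy for the case where no attempt passes; this sketch simply falls back to the best-scoring reject.

```python
from typing import Callable, Optional

def filtered_generate(
    prompt: str,
    sample: Callable[[str], str],         # hypothetical: draws one response from the model
    reward: Callable[[str, str], float],  # hypothetical reward-model scorer
    tau: float = 0.0,
    max_attempts: int = 4,
) -> Optional[str]:
    """Return the first response that clears the threshold; otherwise fall back to the best attempt."""
    best_score, best = float("-inf"), None
    for _ in range(max_attempts):
        resp = sample(prompt)
        score = reward(prompt, resp)
        if score > tau:
            return resp           # accept as soon as a candidate passes the filter
        if score > best_score:    # remember the best reject as a fallback
            best_score, best = score, resp
    return best
```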
Advantages and Limitations
Advantages:
- Simple to implement and understand
- Requires no gradient computations
- Can be highly effective when the reward model is accurate
Limitations:
- Computationally expensive (requires generating multiple candidates)
- Sample inefficient (many generations are discarded)
- Performance depends heavily on the quality of the reward function
Rejection sampling represents a fundamental building block in modern LLM alignment pipelines. It is essentially the filtering counterpart of best-of-n sampling, and it often serves as a simple, gradient-free baseline against which policy-gradient methods such as proximal policy optimization (PPO) are compared.