Introducing Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training
Overview
Today I'm excited to release Pivotal Token Search (PTS), a new technique for identifying and optimizing critical decision points in language model generations. Inspired by the recent Phi-4 paper from Microsoft, PTS helps identify specific "pivotal tokens" that dramatically shift the probability of a successful generation.
The Problem
Traditional Direct Preference Optimization (DPO) treats all tokens equally when learning from preferences. However, in many complex reasoning tasks, the success of a generation often hinges on just a handful of critical decisions. For example, when solving a math problem, choosing "cross-multiplying" versus "multiplying both sides" can dramatically affect whether the model reaches the correct solution, even though both approaches are mathematically valid.
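For reference, the standard DPO objective is defined over whole-sequence log-probabilities, so every token of the chosen response y_w and the rejected response y_l contributes to the gradient, regardless of how decisive it actually was:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$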
How PTS Works
PTS uses a binary search algorithm to identify tokens that cause significant shifts in the probability of generation success (a minimal sketch follows the steps below):
- For a given prompt and completion, we estimate P(success|prefix) at various points in the generation
- We recursively subdivide the sequence to find points where adding a single token causes a large change in this probability
- We then create preference pairs focused specifically on these pivotal tokens
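Here is a minimal sketch of that recursive subdivision in Python. Note that `sample_completion` and `is_correct` are hypothetical placeholders for a sampling call and a task-specific verifier (e.g. an answer checker), not the actual API of the PTS repository:

```python
def estimate_success_prob(sample_completion, is_correct, prompt, prefix, n=16):
    """Estimate P(success | prefix) by sampling n completions from the prefix
    and scoring each one with the verifier."""
    wins = sum(is_correct(sample_completion(prompt + prefix)) for _ in range(n))
    return wins / n

def find_pivotal_tokens(sample_completion, is_correct, prompt, tokens,
                        lo, hi, p_lo, p_hi, threshold=0.2):
    """Recursively subdivide tokens[lo:hi] to find single tokens whose addition
    shifts the estimated success probability by more than `threshold`."""
    if abs(p_hi - p_lo) < threshold:
        return []                                  # no large shift inside this span
    if hi - lo == 1:
        return [(lo, tokens[lo], p_hi - p_lo)]     # one token accounts for the shift
    mid = (lo + hi) // 2
    p_mid = estimate_success_prob(sample_completion, is_correct,
                                  prompt, "".join(tokens[:mid]))
    left = find_pivotal_tokens(sample_completion, is_correct, prompt, tokens,
                               lo, mid, p_lo, p_mid, threshold)
    right = find_pivotal_tokens(sample_completion, is_correct, prompt, tokens,
                                mid, hi, p_mid, p_hi, threshold)
    return left + right
```

A whole completion would be searched by calling `find_pivotal_tokens` over the full token range, with `p_lo` estimated from the empty prefix and `p_hi` from the complete response.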
This targeted approach offers several advantages over standard DPO:
- More efficient learning signal by focusing on critical decisions
- Better handling of cases where both the preferred and the rejected response contain valid reasoning
- The ability to improve reasoning without collecting additional training data
Released Resources
Implementation
Our GitHub repository contains:
- Full implementation of the PTS algorithm
- Data generation pipelines
- Evaluation tools and metrics
- Usage examples
Datasets
We're releasing several PTS-generated datasets for different domains:
- pts: The pivotal tokens discovered using PTS
- pts-dpo-pairs: Preference pairs built from pivotal tokens, ready to use for DPO
- pts-steering-vectors: Activation vectors extracted from pivotal tokens that can be used for steering during inference
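If you'd like to explore the data programmatically, the datasets can be loaded with the Hugging Face `datasets` library. A quick sketch, with the caveat that the repository IDs below are illustrative; browse the "pts" tag on the Hub (links at the end of the post) for the exact names:

```python
from datasets import load_dataset

# Illustrative repository IDs, not necessarily the exact names on the Hub.
pivotal_tokens = load_dataset("codelion/pts")
dpo_pairs = load_dataset("codelion/pts-dpo-pairs")
steering_vectors = load_dataset("codelion/pts-steering-vectors")

print(dpo_pairs)  # inspect the available splits and columns
```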
Models
Try our fine-tuned models that have been optimized using PTS:
- Base models fine-tuned with PTS preference pairs
- Specialized models for DeepSeek-R1 and Qwen3
Example Use Case
When fine-tuning a model to solve math problems, traditional approaches might provide an entire correct solution as a preference example. With PTS, we can identify that the critical decision was choosing to factor a quadratic rather than complete the square. By creating a preference pair focused on just that pivotal token, we provide a cleaner learning signal.
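As a made-up illustration of that math example, a pivotal-token preference pair might look like the following; the field names here are hypothetical and not necessarily the schema of the released datasets:

```python
# Hypothetical pivotal-token preference pair for the quadratic example above.
# Field names are illustrative, not the exact schema of the released datasets.
preference_pair = {
    "prompt": "Solve x^2 + 5x + 6 = 0. Step by step: to find the roots we can",
    "chosen": " factor",        # pivotal token that increases P(success)
    "rejected": " complete",    # pivotal token that decreases P(success)
    "prob_delta": 0.35,         # estimated shift in success probability
}
```

Because the pair differs only at the pivotal token, the DPO gradient concentrates on the decision that actually mattered rather than on the entire solution.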
Future Directions
We're actively exploring:
- Multi-token pivotal sequences
- Applications to agent trajectory optimization
- Using PTS for model interpretability
- Combining PTS with other alignment techniques
Get Involved
We welcome community contributions! Try out PTS on your own tasks, experiment with our datasets, or contribute to the codebase.
- GitHub: https://github.com/codelion/pts
- Datasets: https://huggingface.co/datasets?other=pts
- Models: https://huggingface.co/models?other=pts
Let me know if you have questions or feedback in the comments!