Philosophy

Transformers is a PyTorch-first library. It provides models that are faithful to their papers, easy to use, and easy to hack.

A longer, in-depth article with examples, visualizations and timelines is available here as our canonical reference.

Our philosophy evolves through practice. What follows are our current, stable principles.

Who this library is for

  • Researchers and educators exploring or extending model architectures.
  • Practitioners fine-tuning, evaluating, or serving models.
  • Engineers who want a pretrained model that “just works” with a predictable API.

What you can expect

  • Each model is built from three core classes: a configuration, a model, and a preprocessing class. Tokenizers handle text, image processors handle images, video processors handle videos, feature extractors handle audio, and processors handle multimodal inputs.

  • All of these classes can be initialized in a simple, unified way from pretrained instances with a common from_pretrained() method. It downloads (if needed), caches, and loads the class instance and its associated data (a configuration’s hyperparameters, a tokenizer’s vocabulary, a processor’s parameters, and a model’s weights) from a checkpoint on the Hugging Face Hub or from your own saved checkpoint.

  • On top of those three base classes, the library provides two APIs: pipeline() for quickly using a model for inference on a given task, and Trainer for quickly training or fine-tuning a PyTorch model. A minimal pipeline() call is sketched below.
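
As an illustration, a minimal pipeline() call might look like the sketch below; the checkpoint name is just a widely used example, and any compatible Hub checkpoint works the same way.

```python
from transformers import pipeline

# "sentiment-analysis" is a built-in pipeline task; the checkpoint name is only
# an example and can be swapped for any compatible model on the Hub.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("Transformers keeps each model readable in a single file."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```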

Core tenets

The following tenets solidified over time, and they’re detailed in our new philosophy blog post. They guide maintainer decisions when reviewing PRs and contributions.

  • Source of Truth. Implementations must be faithful to official results and intended behavior.
  • One Model, One File. Core inference/training logic is visible top-to-bottom in the model file users read.
  • Code is the Product. Optimize for reading and diff-ing. Prefer explicit names over clever indirection.
  • Standardize, Don’t Abstract. Keep model-specific behavior in the model. Use shared interfaces only for generic infra.
  • DRY* (Repeat when it helps users). End-user modeling files remain self-contained. Infra is factored out.
  • Minimal User API. Few codepaths, predictable kwargs, stable methods.
  • Backwards Compatibility. Public surfaces should not break. Old Hub artifacts must keep working.
  • Consistent Public Surface. Naming, outputs, and optional diagnostics are aligned and tested.

Main classes

  • Configuration classes store the hyperparameters required to build a model. These include the number of layers and hidden size. You don’t always need to instantiate these yourself. When using a pretrained model without modification, creating the model automatically instantiates the configuration.

  • Model classes are PyTorch models (torch.nn.Module) that also inherit from PreTrainedModel, which adds loading, saving, and Hub utilities.

  • Modular transformers. Contributors write a small modular_*.py shard that declares reuse from existing components. The library auto-expands this into the visible modeling_*.py file that users read/debug. Maintainers review the shard; users hack the expanded file. This preserves “One Model, One File” without boilerplate drift. See the contributing documentation for more information.

  • Preprocessing classes convert raw data into the format a model accepts. A tokenizer stores the vocabulary for each model and provides methods for encoding strings into lists of token indices and decoding them back. Image processors preprocess vision inputs, video processors preprocess video inputs, feature extractors preprocess audio inputs, and processors preprocess multimodal inputs. A short sketch of how these classes fit together follows this list.
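
As a rough sketch of how these classes fit together, the snippet below loads a configuration, tokenizer, and model from the same checkpoint; the checkpoint name is only an illustrative example.

```python
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Illustrative checkpoint; any checkpoint on the Hub works the same way.
checkpoint = "bert-base-uncased"

config = AutoConfig.from_pretrained(checkpoint)        # hyperparameters only, no weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # vocabulary + encode/decode helpers
model = AutoModel.from_pretrained(checkpoint)          # pretrained weights (a torch.nn.Module)

print(config.num_hidden_layers, config.hidden_size)

inputs = tokenizer("Hello, Transformers!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```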

All these classes can be instantiated from pretrained instances, saved locally, and shared on the Hub with three methods:

  • from_pretrained() lets you instantiate a model, configuration, and preprocessing class from a pretrained version either provided by the library itself (the supported models can be found on the Model Hub) or stored locally (or on a server) by the user.
  • save_pretrained() lets you save a model, configuration, and preprocessing class locally so that it can be reloaded using from_pretrained().
  • push_to_hub() lets you share a model, configuration, and preprocessing class on the Hub so they are easily accessible to everyone. A minimal round trip through these three methods is sketched below.
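
Putting the three methods together, a minimal round trip might look like the following sketch; the local directory and repository name are placeholders, and push_to_hub() additionally requires being authenticated with the Hub.

```python
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint and local directory; both are placeholders.
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save locally so the exact same objects can be reloaded later with from_pretrained().
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

reloaded_model = AutoModel.from_pretrained("./my-model")
reloaded_tokenizer = AutoTokenizer.from_pretrained("./my-model")

# Sharing on the Hub requires authentication; the repository name is a placeholder.
# model.push_to_hub("your-username/my-model")
# tokenizer.push_to_hub("your-username/my-model")
```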