ExecuTorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch ecosystem and supports the deployment of PyTorch models with a focus on portability, productivity, and performance.
ExecuTorch introduces well-defined entry points to perform model, device, and/or use-case specific optimizations such as backend delegation, user-defined compiler transformations, memory planning, and more. The first step in preparing a PyTorch model for execution on an edge device using ExecuTorch is to export the model. This is done with the PyTorch API torch.export.
ExecuTorch Integration
An integration point is being developed to ensure that 🤗 Transformers models can be exported using torch.export. The goal of this integration is not only to enable export but also to ensure that the exported artifact can be further lowered and optimized to run efficiently in ExecuTorch, particularly for mobile and edge use cases.
transformers.TorchExportableModuleWithStaticCache
A wrapper module designed to make a PreTrainedModel exportable with torch.export,
specifically for use with static caching. This module ensures that the exported model
is compatible with further lowering and execution in ExecuTorch.
Note:
This class is specifically designed to support the export process using torch.export
in a way that ensures the model can be further lowered and run efficiently in ExecuTorch.
forward
( input_ids: Tensor, cache_position: Tensor ) → torch.Tensor
Forward pass of the module, which is compatible with the ExecuTorch runtime.
This forward adapter serves two primary purposes:
- Making the Model torch.export-Compatible: The adapter hides unsupported objects, such as the Cache, from the graph inputs and outputs, enabling the model to be exportable using torch.export without encountering issues.
- Ensuring Compatibility with ExecuTorch runtime: The adapter matches the model’s forward signature with that in executorch/extension/llm/runner, ensuring that the exported model can be executed in ExecuTorch out-of-the-box.
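The adapter idea can be sketched with a toy example. This is not the library's implementation: ToyDecoderWithCache and StaticCacheAdapter are hypothetical names, and the "cache" is a plain preallocated tensor standing in for a real Cache object. The sketch only shows how a wrapper can own the cache internally, so that the forward signature exposes nothing but input_ids and cache_position.

```python
import inspect

import torch
from torch import nn


class ToyDecoderWithCache(nn.Module):
    """Hypothetical model whose forward expects an explicit cache argument."""

    def __init__(self, vocab_size=16, hidden=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids, cache, cache_position):
        hidden = self.embed(input_ids)  # (batch, seq, hidden)
        # Write the new hidden states into the preallocated cache slots.
        cache.index_copy_(1, cache_position, hidden)
        # Toy stand-in for attention: sum over all slots (unused ones are zero).
        context = cache.sum(dim=1, keepdim=True)
        return self.head(hidden + context)  # (batch, seq, vocab_size)


class StaticCacheAdapter(nn.Module):
    """Hypothetical adapter: keeps the cache as a buffer, hiding it from callers."""

    def __init__(self, model, max_cache_len=32, hidden=8):
        super().__init__()
        self.model = model
        # Fixed-size cache owned by the module instead of being a graph input.
        self.register_buffer(
            "cache", torch.zeros(1, max_cache_len, hidden), persistent=False
        )

    def forward(self, input_ids, cache_position):
        return self.model(input_ids, self.cache, cache_position)


adapter = StaticCacheAdapter(ToyDecoderWithCache()).eval()

# Only tensors appear in the public forward signature.
forward_params = list(inspect.signature(adapter.forward).parameters)

with torch.no_grad():
    logits = adapter(torch.tensor([[3]]), torch.tensor([0]))
```

Because the cache lives inside the module as a fixed-size buffer, callers pass only tensors, which is the shape of interface the two points above describe.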
transformers.convert_and_export_with_cache
( model: PreTrainedModel, example_input_ids: Tensor = None, example_cache_position: Tensor = None ) → Exported program (torch.export.ExportedProgram)
Parameters
- model (PreTrainedModel) — The pretrained model to be exported.
- example_input_ids (torch.Tensor) — Example input token id used by torch.export.
- example_cache_position (torch.Tensor) — Example current cache position used by torch.export.
Returns
Exported program (torch.export.ExportedProgram)
The exported program generated via torch.export.
Convert a PreTrainedModel into an exportable module and export it using torch.export,
ensuring the exported model is compatible with ExecuTorch.