Seed-X-PPO-7B
Introduction
We are excited to introduce Seed-X, a powerful series of open-source multilingual translation language models, including an instruction model, a reinforcement learning model, and a reward model. The series pushes the boundaries of translation capability within 7 billion parameters.
This repository provides https://huggingface.co/ByteDance-Seed/Seed-X-PPO-7B with ONNX weights.
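To fetch the converted weights locally, you can mirror the repository with huggingface_hub. This is a minimal sketch; the local directory path is a placeholder you should adapt:

```python
# Minimal sketch: download this ONNX repository with huggingface_hub.
# The target directory ("./seed-x-onnx") is a placeholder, not part of this repo.
from huggingface_hub import snapshot_download

work_dir = snapshot_download(
    repo_id="Fhrozen/Seed-X-PPO-7B-ONNX",
    local_dir="./seed-x-onnx",
)
print(work_dir)  # pass this path as work_dir in the example below
```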
Usage
Python with Optimum and ONNX Runtime (ORT):
```python
import os

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig, PretrainedConfig


def main():
    # Directory containing the downloaded repository (config, tokenizer, onnx/).
    work_dir = "[huggingface/dir]"
    config = PretrainedConfig.from_pretrained(work_dir)
    gen_config = GenerationConfig.from_pretrained(work_dir)

    # Pick a weight variant from the onnx/ folder, e.g. "_q4" for 4-bit quantization.
    suffix = "_q4"
    model_path = os.path.join(work_dir, "onnx", f"model{suffix}.onnx")

    # Prefer CUDA when available, with CPU as a fallback.
    use_gpu = True
    providers = [("CUDAExecutionProvider", {"device_id": 0})] if use_gpu else []
    providers.append("CPUExecutionProvider")

    sess_options = ort.SessionOptions()
    session = ort.InferenceSession(model_path, sess_options, providers=providers)

    llm_model = ORTModelForCausalLM(
        session=session,
        config=config,
        generation_config=gen_config,
        use_io_binding=True,
        use_cache=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(work_dir)

    # Seed-X prompts end with a target-language tag, here <es> for Spanish.
    prompt = (
        "Translate the following English sentence into Spanish:\n"
        "You are using a model of type mistral to instantiate a model of type . <es>"
    )
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda" if use_gpu else "cpu")
    print("Input prompt: ", prompt)

    generated_ids = llm_model.generate(
        **inputs,
        max_new_tokens=512,
        num_beams=4,
        do_sample=False,
        temperature=1.0,
    )
    # Decode, then strip the echoed prompt so only the translation is printed.
    predicts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Output: ", predicts[len(prompt):])


if __name__ == "__main__":
    main()
```
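The prompt follows the upstream Seed-X convention: an instruction line, the source sentence, then a target-language tag such as `<es>`. A small helper can make this explicit; `build_prompt` and the `<de>` tag below are illustrative assumptions, not part of this repository:

```python
# Hypothetical helper mirroring the prompt pattern used above.
# The <de> tag for German is an assumption; check the upstream
# Seed-X model card for the full list of supported language tags.
def build_prompt(text: str, src_lang: str, tgt_lang: str, tag: str) -> str:
    return f"Translate the following {src_lang} sentence into {tgt_lang}:\n{text} <{tag}>"


print(build_prompt("How are you today?", "English", "German", "de"))
```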
Base model: ByteDance-Seed/Seed-X-PPO-7B