
QuantFactory/DRT-o1-7B-GGUF

This is a quantized version of Krystalan/DRT-o1-7B, created using llama.cpp.
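As a GGUF build, it can be run locally with llama.cpp. A minimal sketch (the Q4_K_M filename is illustrative; substitute whichever quant variant you downloaded):

./llama-cli -m DRT-o1-7B.Q4_K_M.gguf -cnv -p "You are a philosopher skilled in deep thinking, accustomed to exploring complex problems with profound insight."

In llama-cli's conversation mode (-cnv), the -p text is used as the system prompt, and you then type the user prompt interactively.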

Original Model Card

DRT-o1

🤗 DRT-o1-7B   |   🤗 DRT-o1-8B   |   🤗 DRT-o1-14B   |   📑 Paper

This repository contains the resources for our paper "DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought".

Updates:

  • 2024.12.31: We updated our paper with more details and analyses. Check it out!
  • 2024.12.31: We released the testing set of our work; please refer to data/test.jsonl.
  • 2024.12.30: We released a new model checkpoint using Llama-3.1-8B-Instruct as the backbone, i.e., 🤗 DRT-o1-8B.
  • 2024.12.24: We released our paper. Check it out!
  • 2024.12.23: We released our model checkpoints: 🤗 DRT-o1-7B and 🤗 DRT-o1-14B.

If you find this work useful, please consider citing our paper:

@article{wang2024drt,
  title={DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought},
  author={Wang, Jiaan and Meng, Fandong and Liang, Yunlong and Zhou, Jie},
  journal={arXiv preprint arXiv:2412.17498},
  year={2024}
}


Introduction

In this work, we introduce DRT-o1, an attempt to bring the success of long thought reasoning to neural machine translation (MT). To this end,

  • 🌟 We mine English sentences with similes or metaphors from existing literature books, which are suitable for translation via long thought.
  • 🌟 We design a multi-agent framework with three agents (i.e., a translator, an advisor and an evaluator) to synthesize MT samples with long thought; 22,264 samples are synthesized in total.
  • 🌟 We train DRT-o1-8B, DRT-o1-7B and DRT-o1-14B, using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones, respectively.

Our goal is not to achieve performance competitive with OpenAI's o1 in neural machine translation. Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it facilitates the corresponding research in this direction.

Models

Model Access

| Model | Backbone | Model Access |
|---|---|---|
| DRT-o1-7B | 🤗 Qwen2.5-7B-Instruct | 🤗 DRT-o1-7B |
| DRT-o1-8B | 🤗 Llama-3.1-8B-Instruct | 🤗 DRT-o1-8B |
| DRT-o1-14B | 🤗 Qwen2.5-14B-Instruct | 🤗 DRT-o1-14B |

Model Performance

| Model | GRF | CometKiwi | GRB | BLEU | CometScore |
|---|---|---|---|---|---|
| Llama-3.1-8B-Instruct | 79.25 | 70.14 | 73.30 | 18.55 | 74.58 |
| Qwen2.5-7B-Instruct | 81.53 | 70.36 | 77.92 | 27.02 | 76.78 |
| Qwen2.5-14B-Instruct | 84.74 | 72.01 | 80.85 | 30.23 | 78.84 |
| Marco-o1-7B | 82.41 | 71.62 | 77.50 | 29.48 | 77.41 |
| QwQ-32B-preview | 86.31 | 71.48 | 83.08 | 27.46 | 78.68 |
| DRT-o1-8B | 84.49 | 70.85 | 80.80 | 32.67 | 78.81 |
| DRT-o1-7B | 85.57 | 71.78 | 82.38 | 35.54 | 80.19 |
| DRT-o1-14B | 87.19 | 72.11 | 83.20 | 36.46 | 80.64 |

Model Prompts

During model inference, please use the following prompts:

  • System prompt: You are a philosopher skilled in deep thinking, accustomed to exploring complex problems with profound insight.
  • User prompt: Please translate the following text from English to Chinese:\n[An English text]

DRT-o1 models will first generate the thought and then provide the final translation, with the following format:

<thought>
[Reasoning process]
</thought>
<output>
[Final translation]
</output>
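Downstream applications usually need only the final translation, so it is convenient to strip the thought block from the response. A minimal post-processing sketch (the extract_translation helper is our illustration, not part of the official resources):

import re

def extract_translation(response: str) -> str:
    # Return the text inside <output>...</output>; fall back to the raw response
    match = re.search(r"<output>(.*?)</output>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()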

Quickstart

  • โ›ท๏ธ Huggingface Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Krystalan/DRT-o1-7B"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Please translate the following text from English to Chinese:\nThe mother, with her feet propped up on a stool, seemed to be trying to get to the bottom of that answer, whose feminine profundity had struck her all of a heap."
messages = [
    {"role": "system", "content": "You are a philosopher skilled in deep thinking, accustomed to exploring complex problems with profound insight."},
    {"role": "user", "content": prompt}
]
# Build the chat-formatted prompt string, then tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 2048 new tokens (the long thought can be lengthy)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
  • โ›ท๏ธ vllm:

Deploying LLMs:

python3 -m vllm.entrypoints.openai.api_server --model [model_ckpt] --served-model-name [model_name]
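For example, to serve this checkpoint under the model name used in the client snippet below (both values are our choices for illustration):

python3 -m vllm.entrypoints.openai.api_server --model Krystalan/DRT-o1-7B --served-model-name DRT-o1-7B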

Calling LLMs:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="DRT-o1-7B",  # must match the --served-model-name passed to vLLM
    messages=[
        {"role": "system", "content": "You are a philosopher skilled in deep thinking, accustomed to exploring complex problems with profound insight."},
        {"role": "user", "content": "Please translate the following text from English to Chinese:\nThe mother, with her feet propped up on a stool, seemed to be trying to get to the bottom of that answer, whose feminine profundity had struck her all of a heap."},
    ],
    temperature=0.1,
    top_p=0.8,
    max_tokens=2048,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)

Translation Cases

| Source Sentence | Qwen2.5-14B-Instruct | QwQ-32B-preview | DRT-o1-14B |
|---|---|---|---|
| Women were weeping and children crying, and all were going as fast as seemingly lay in their power, looking behind now and then as if pursued by some deadly enemy. | 妇女在哭泣，孩子们在哀号，所有人都似乎在尽全力快速前行，不时回头张望，仿佛背后有某个致命的敌人在追赶。 | 妇女们哭泣着，孩子们哭喊着，每个人都在尽力加快脚步，不时回头张望，仿佛背后有致命的敌人在追击。 | 妇女悲泣，孩童哀号，众人疾奔，不时回首，仿佛有致命之敌在后紧追。 |
| Wait through this season of the rain Till suns of autumn dry the plain, Then shall thy giant foe, and all His host and realm, before thee fall. | 等待这雨季过去，直到秋日的阳光晒干原野，那时你的巨敌及其部众和王国都将倒在你面前。 | 忍过雨季漫长夜，待秋阳升空，晒干大地湿润色，那时，巨敌垮台，他的军队和王国，全在你脚下。 | 历尽这雨季，待秋日阳光驱散平原上的最后一丝湿润，那时，你那巨敌，连同他的军队与王国，都将匍匐在你脚下。 |
| And if their words had taken another turn...if he himself had been less fastidious about intruding on another man's secrets...it was cruel to think how thin a film had shut out rescue from all this guilt and misery. | 而如果他们的言辞转向了另一个方向……如果他自己对侵扰他人秘密不那么介意……想到正是那层薄薄的隔阂将救援拒之门外，使得这一切的罪恶和痛苦无法避免，真是残酷。 | 而且，假如他们的谈话走向不同……假如他自己不对干涉他人私事如此谨慎……那时他会意识到，只要一层薄薄的屏障就阻隔了救援，使得所有这些罪恶和痛苦无法解脱，这真是残忍。 | 若他们的言辞稍作转变……若他本人不那么介意侵入他人的心扉……想到这层薄薄的隔阂竟将所有内疚与痛苦隔离于救援之外，实为残忍。 |
| This cold officer upon a monument, who dropped epithets unconcernedly down, would be finer as a dead man, he thought. | 他认为，这个站在纪念碑上的冷漠官员，若死了会更好，他不带任何感情地抛下了一些称呼。 | 这个冷冰冰的官员站在纪念碑上，毫不在意地抛下一些称号，他想，如果作为一个死人会更出色。 | 纪念碑上的冷淡官员，漫不经心地吟咏那些修饰语，他心想，若化为亡者，或许更显尊贵。 |

Data

We release the testing set of our work; please refer to data/test.jsonl, where en indicates the English source sentences and zh denotes the corresponding Chinese translations.
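Each line of the file is a standalone JSON object, so the set can be loaded as below; a minimal sketch assuming the data/test.jsonl path and the en/zh fields described above:

import json

# Load the released test set: one JSON object per line with "en"/"zh" fields
with open("data/test.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(examples[0]["en"])  # English source sentence
print(examples[0]["zh"])  # corresponding Chinese translation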

We will release the long-thought MT data as well as the data collection codes soon!

License

This work is licensed under CC BY-NC-SA 4.0.

GGUF
Model size: 7.62B params
Architecture: qwen2
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit


Model tree for QuantFactory/DRT-o1-7B-GGUF

Base model: Qwen/Qwen2.5-7B