Add files using upload-large-folder tool

Browse files

Files changed (14) hide show

.gitattributes +3 -0
README.md +197 -0
added_tokens.json +28 -0
chat_template.jinja +89 -0
genai_config.json +53 -0
hfbadge.svg +100 -0
merges.txt +0 -0
model.onnx +3 -0
model.onnx.data +3 -0
output.gif +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +239 -0
vocab.json +0 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+output.gif filter=lfs diff=lfs merge=lfs -text
+model.onnx.data filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,197 @@

+---
+license: apache-2.0
+library_name: onnxruntime_genai
+base_model:
+- Prince-1/Osmosis-Structure-0.6B
+tags:
+- sml
+- onnx
+- onnxruntime_genai
+---
+# `Osmosis-Structure-0.6B`: Small Language Model for Structured Outputs
+![huggingface badge](hfbadge.svg)
+<div align="center">
+</div>
+`Osmosis-Structure-0.6B` is a specialized small language model (SLM) designed to excel at structured output generation. Despite its compact 0.6B parameter size, this model demonstrates remarkable performance on extracting structured information when paired with supported frameworks.
+Our approach leverages structured output during training, forcing our model to only focus on the value for each key declared by the inference engine, which significantly improves the accuracy of the model's ability to produce well-formatted, structured responses across various domains, particularly in mathematical reasoning and problem-solving tasks.
+<div align="center">
+![Osmosis Structure Demo](output.gif)
+</div>
+## Results
+We evaluate the effectiveness of osmosis-enhanced structured generation on challenging mathematical reasoning benchmarks. The following results demonstrate the dramatic performance improvements achieved through structured outputs with osmosis enhancement across different model families - the same technique that powers `Osmosis-Structure-0.6B`.
+### Math DAPO 17K Dataset
+<div align="center">
+| Model | Structured Output | Structured w/ Osmosis | Performance Gain |
+|-------|:-------------:|:-------------:|:-------------:|
+| Claude 4 Sonnet | 15.52% | **69.40%** | +347% |
+| Claude 4 Opus | 15.28% | **69.91%** | +357% |
+| GPT-4.1 | 10.53% | **70.03%** | +565% |
+| OpenAI o3 | 91.14% | **94.05%** | +2.9% |
+<em>Table 1: Performance on Math DAPO 17K.</em>
+</div>
+### AIME 1983-2024 Dataset
+<div align="center">
+| Model | Structured Output | Structured w/ Osmosis | Performance Gain |
+|-------|:-------------:|:-------------:|:-------------:|
+| Claude 4 Sonnet | 16.29% | **62.59%** | +284% |
+| Claude 4 Opus | 22.94% | **65.06%** | +184% |
+| GPT-4.1 | 2.79% | **39.66%** | +1322% |
+| OpenAI o3 | 92.05% | **93.24%** | +1.3% |
+<em>Table 2: Performance on AIME 1983-2024.</em>
+</div>
+> **Key Insight**: These results demonstrate that by allowing models to think freely and leverage test time compute, we are able to increase performance and still maintain the structured guarantee after the fact with a SLM. `Osmosis-Structure-0.6B` is specifically designed and optimized to maximize these benefits in a compact 0.6B parameter model.
+## Model Training
+`Osmosis-Structure-0.6B` is built on top of `Qwen3-0.6B`. We first established a baseline format using 10 samples of randomly generated text and their JSON interpretations. We then applied reinforcement learning to approximately 500,000 examples of JSON-to-natural language pairs, consisting of either reasoning traces with their final outputs, or natural language reports with their expected structured formats.
+We used [verl](https://github.com/volcengine/verl) as the framework to train our model and [SGLang](https://github.com/sgl-project/sglang) as the rollout backend. To enable structured training, we modified parts of the verl codebase to allow for *per sample schema* to be passed into the training data.
+## Usage
+### SGLang
+We recommend an engine like SGLang to be used to serve the model, to serve, run the following:
+`python3 -m sglang.launch_server --model-path osmosis-ai/Osmosis-Structure-0.6B --host 0.0.0.0 --api-key osmosis`
+And to use the endpoint:
+```python
+import json
+from openai import OpenAI
+api_key = "osmosis"
+api_base_url = "http://0.0.0.0:30000/v1"
+client = OpenAI(
+    api_key=api_key,
+    base_url=api_base_url,
+)
+# Schema for extracting structured output from reasoning traces
+json_schema = json.dumps(
+    {
+        "type": "object",
+        "properties": {
+            "answer": {"type": "string"}
+        },
+        "required": ["answer"]
+    }
+)
+# You can also dump pydantic models to json schema as well
+# Example reasoning trace input
+reasoning_trace = """
+Problem: Solve for x in the equation 2x + 5 = 13
+Let me work through this step by step:
+First, I need to isolate the term with x. I'll subtract 5 from both sides:
+2x + 5 - 5 = 13 - 5
+2x = 8
+Next, I'll divide both sides by 2 to solve for x:
+2x ÷ 2 = 8 ÷ 2
+x = 4
+Let me verify this answer by substituting back into the original equation:
+2(4) + 5 = 8 + 5 = 13 ✓
+Ok, which means I got the correct answer, and I'm confident about my answer.
+"""
+response = client.chat.completions.create(
+    model="osmosis-ai/Osmosis-Structure-0.6B",
+    messages=[
+        {
+            "role": "system",
+            "content": f"You are a helpful assistant that understands and translates text to JSON format according to the following schema. {json_schema}"
+        },
+        {
+            "role": "user",
+            "content": reasoning_trace,
+        },
+    ],
+    temperature=0,
+    max_tokens=512,
+    response_format={
+        "type": "json_schema",
+        "json_schema": {"name": "reasoning_extraction", "schema": json.loads(json_schema)},
+    },
+)
+print(json.dumps(json.loads(response.choices[0].message.content), indent=2))
+```
+### Ollama
+You can also use Ollama as an inference provider on local machines, here is a sample code of the setup:
+```python
+from ollama import chat
+from pydantic import BaseModel
+class Answer(BaseModel):
+  answer: int
+reasoning_trace = """
+Problem: Solve for x in the equation 2x + 5 = 13
+Let me work through this step by step:
+First, I need to isolate the term with x. I'll subtract 5 from both sides:
+2x + 5 - 5 = 13 - 5
+2x = 8
+Next, I'll divide both sides by 2 to solve for x:
+2x ÷ 2 = 8 ÷ 2
+x = 4
+Let me verify this answer by substituting back into the original equation:
+2(4) + 5 = 8 + 5 = 13 ✓
+Ok, which means I got the correct answer, and I'm confident about my answer.
+"""
+response = chat(
+  messages=[
+    {
+        "role": "system",
+        "content": f"You are a helpful assistant that understands and translates text to JSON format according to the following schema. {Answer.model_json_schema()}"
+    },
+    {
+      'role': 'user',
+      'content': reasoning_trace,
+    }
+  ],
+  model='Osmosis/Osmosis-Structure-0.6B',
+  format=Answer.model_json_schema(),
+)
+answer = Answer.model_validate_json(response.message.content)
+print(answer)
+```

added_tokens.json ADDED Viewed

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED Viewed

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

genai_config.json ADDED Viewed

	@@ -0,0 +1,53 @@

+{
+    "model": {
+        "bos_token_id": 151643,
+        "context_length": 40960,
+        "decoder": {
+            "session_options": {
+                "log_id": "onnxruntime-genai",
+                "provider_options": []
+            },
+            "filename": "model.onnx",
+            "head_size": 128,
+            "hidden_size": 1024,
+            "inputs": {
+                "input_ids": "input_ids",
+                "attention_mask": "attention_mask",
+                "position_ids": "position_ids",
+                "past_key_names": "past_key_values.%d.key",
+                "past_value_names": "past_key_values.%d.value"
+            },
+            "outputs": {
+                "logits": "logits",
+                "present_key_names": "present.%d.key",
+                "present_value_names": "present.%d.value"
+            },
+            "num_attention_heads": 16,
+            "num_hidden_layers": 28,
+            "num_key_value_heads": 8
+        },
+        "eos_token_id": [
+            151645,
+            151643
+        ],
+        "pad_token_id": 151643,
+        "type": "qwen3",
+        "vocab_size": 151936
+    },
+    "search": {
+        "diversity_penalty": 0.0,
+        "do_sample": true,
+        "early_stopping": true,
+        "length_penalty": 1.0,
+        "max_length": 40960,
+        "min_length": 0,
+        "no_repeat_ngram_size": 0,
+        "num_beams": 1,
+        "num_return_sequences": 1,
+        "past_present_share_buffer": false,
+        "repetition_penalty": 1.0,
+        "temperature": 0.6,
+        "top_k": 1,
+        "top_p": 0.95
+    }
+}

hfbadge.svg ADDED Viewed

merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

model.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60696305ffd5f220fbf5e220da84e6b103a9b9695e92067bd5684d92cf809853
+size 728062

model.onnx.data ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:055ddb88d47d1c98488558a6b175f11fe84a212d60039bd44367f0bb227f0b24
+size 1513750528

output.gif ADDED Viewed

Git LFS Details

SHA256: 53738442a23c3b5b65732d2513d6d3426923058aafb755a4da2e86981dea46cb
Pointer size: 132 Bytes
Size of remote file: 8.47 MB

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,239 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff