# Osmosis-Apply-1.7B
Osmosis-Apply-1.7B is a specialized language model, fine-tuned from Qwen3-1.7B, designed to perform code merges similar to the "apply" feature of modern AI code editors. Given an original code snippet and an edit snippet, the model applies the edit to the original code, producing the updated snippet.
Here's an example. Let's say we prompt an LLM to fill out the body of this binary search function.
```python
def binary_search(arr, x):
    left = 0
    right = len(arr)
    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]
assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```
With a custom prompt, the LLM produces an edit snippet that includes the binary search code and some surrounding context.
```
// ... existing code ...
    left = 0
    right = len(arr)
    while(left < right):
        mid = left + (right - left) // 2
        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...
```
Osmosis-Apply-1.7B can apply this edit snippet to the original code, producing the updated, final code.
```python
def binary_search(arr, x):
    left = 0
    right = len(arr)
    while(left < right):
        mid = left + (right - left) // 2
        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```
## Benchmarks
We benchmarked our model against several large language models on 10,000 random samples from commitpackft. Rewards are calculated according to our reward function (see the Reward function section).
| Model | Average reward |
|---|---|
| Osmosis-Apply-1.7B | 0.98046 |
| Claude 4 Sonnet | 0.93284 |
| OpenAI o3 | 0.86394 |
| Gemini-2.5-Flash | 0.77452 |
Table 1: Performance on 10k samples from commitpackft.
## Methodology
Osmosis-Apply-1.7B was trained on about 100k randomly sampled commits from the commitpackft dataset, which is less than 15% of the entire dataset. A unified diff was generated between `old_contents` and `new_contents`, and the unified diff was then parsed into a natural-language diff similar to those output by LLMs.
```python
import difflib

unified_diff = difflib.unified_diff(old_code, new_code)
natural_language_diff = generate_from_unified_diff(unified_diff)
```
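The `generate_from_unified_diff` helper is not shown here; one plausible sketch, under the assumption that removed lines are dropped, context and added lines are kept, and hunk boundaries become the collapse marker, might look like:

```python
import difflib

MARKER = "// ... existing code ..."

def generate_from_unified_diff(diff_lines):
    # Hypothetical implementation: drop removed lines, keep context and
    # added lines, and turn hunk boundaries into the collapse marker.
    edit_lines = []
    for line in diff_lines:
        if line.startswith(("---", "+++")):
            continue  # skip the file headers
        if line.startswith("@@"):
            edit_lines.append(MARKER)  # unchanged code between hunks
        elif line.startswith("-"):
            continue  # deleted lines vanish from the merged result
        else:
            # context (" ") or added ("+") line: strip the one-char diff prefix
            edit_lines.append(line[1:].rstrip("\n"))
    edit_lines.append(MARKER)
    return "\n".join(edit_lines)

old_code = "a\nb\nc\nd\n".splitlines(True)
new_code = "a\nb\nX\nd\n".splitlines(True)
print(generate_from_unified_diff(difflib.unified_diff(old_code, new_code)))
```

The real training pipeline may differ, for example in how much surrounding context it keeps per hunk.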
The original code and the edit snippet were provided as input to the model, along with a custom system prompt:

```
<code>
{ORIGINAL CODE}
</code>

<edit>
{EDIT SNIPPET}
</edit>
```
## Infrastructure
We used verl as the framework to train our model and SGLang as the rollout backend.
## Model system prompt
Below is the system prompt we trained our model with.
```python
SYSTEM_PROMPT = \
'''
You are a helpful assistant for a code editor that applies an edit to code to merge them together. That is, you will be given code wrapper in <code> tags and an edit wrapped in <edit> tags, and you will apply the edit to the code.

For example:

<code>
CODE_SNIPPET
</code>

<edit>
EDIT_SNIPPET
</edit>

The code is any type of code and the edit is in the form of:

// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...

The merged code must be exact with no room for any errors. Make sure all whitespaces are preserved correctly. A small typo in code will cause it to fail to compile or error out, leading to poor user experience.

Output the code wrapped in <code> tags.
'''
```
## Edit format
The edit format is designed to be mostly natural language, with `// ... existing code ...` condensing original code that remains unchanged between edits. When prompting the LLM, it is important to also instruct it to provide some additional context (unchanged lines from the original code surrounding the edit) so that Osmosis-Apply-1.7B can locate where to insert the edit.
```
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
```
We find that the simple, sequential nature of this edit format makes it easier for smaller models to work with and for larger models to output, at the cost of some parsability and exactness.
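For illustration, a consumer of this format can recover the concrete edit chunks by splitting on the marker (`split_edit` is a hypothetical helper, not part of the model or its tooling):

```python
MARKER = "// ... existing code ..."

def split_edit(edit: str):
    # Split an edit snippet into its concrete chunks, dropping the
    # collapse markers and any blank lines around each chunk.
    return [chunk.strip("\n") for chunk in edit.split(MARKER) if chunk.strip()]

edit = """// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ..."""
print(split_edit(edit))  # ['FIRST_EDIT', 'SECOND_EDIT']
```

The hard part, which is what the model learns, is deciding where each chunk's context lines align in the original file.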
## Reward function
We use a simple reward function that looks for exactness in the model outputs.
TL;DR:
- If the new code is exactly correct including whitespaces, then give a large reward (1.0).
- If the new code is correct when excluding empty lines, then give a small reward (0.2).
- Otherwise, give no reward (0.0).
Below is the entire reward function.
```python
import re

def extract_solution(solution_str):
    matches = list(re.finditer(r'<code>(.*?)</code>', solution_str, re.DOTALL))
    # If nonempty matches and exactly one <code> block exists
    if(matches and len(matches) == 1):
        return matches[0].group(1).strip()
    return None

def filter_empty_lines(lines):
    return list(filter(lambda line : line.strip() != "", lines))

def calc_score(answer, ground_truth):
    answer = answer.strip()
    ground_truth = ground_truth.strip()
    if(answer == ground_truth):
        return 1.0
    else:
        answer_lines = filter_empty_lines(answer.splitlines(True))
        ground_truth_lines = filter_empty_lines(ground_truth.splitlines(True))
        # Give small positive reward if lines are almost correct
        if(answer_lines == ground_truth_lines):
            return 0.2
        return 0

def compute_score(data_source, solution_str, ground_truth, extra_info=None, format_score=0.0, score=1.0):
    answer = extract_solution(solution_str=solution_str)
    if answer is None:
        return 0
    else:
        return calc_score(answer, ground_truth)
```
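As a quick sanity check of the three reward tiers, here are the scoring helpers exercised on small inputs (`filter_empty_lines` and `calc_score` are restated so the snippet runs standalone):

```python
def filter_empty_lines(lines):
    return [line for line in lines if line.strip() != ""]

def calc_score(answer, ground_truth):
    # 1.0 for an exact match, 0.2 if only empty lines differ, else 0
    answer = answer.strip()
    ground_truth = ground_truth.strip()
    if answer == ground_truth:
        return 1.0
    if filter_empty_lines(answer.splitlines(True)) == filter_empty_lines(ground_truth.splitlines(True)):
        return 0.2
    return 0

print(calc_score("x = 1\ny = 2", "x = 1\ny = 2"))    # 1.0: exact match
print(calc_score("x = 1\n\ny = 2", "x = 1\ny = 2"))  # 0.2: extra blank line
print(calc_score("x = 1", "y = 2"))                  # 0: wrong code
```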
## Usage
### LLM prompt
Since edits should be generated in a specific format, we have provided an example prompt to give to a coding LLM. This prompt is by no means perfect and can be tweaked to get better results.
You are an AI coding assistant that takes in original code and responds with an edit snippet to the user.
```
<edit>
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
</edit>
```
Your response must strictly follow this format.
Guidelines for creating the edit snippet:
1. Regardless of programming language, collapse unchanged lines of code with this exact literal (ignoring backticks): `// ... existing code ...`
2. Provide 2-3 lines of context above and below your changes in the edit to help indicate where it is in the file. If the change is at the start or end of the file, just provide what you can.
3. You do not need to begin or end with `// ... existing code ...` for edits that include the beginning or end of file.
4. Make sure whitespaces, indentation, and formatting matches the original code.
5. You may make as many edits as you would like, but condense edits so that there are not too many, similar to a unified diff.
6. Wrap your final output in <edit> tags.
Here is an example.
Original code:
```
def binary_search(arr, x):
    left = 0
    right = len(arr)
    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]
assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```
Generated edit:
```
<edit>
// ... existing code ...
    left = 0
    right = len(arr)
    while(left < right):
        mid = left + (right - left) // 2
        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...
</edit>
```
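Once the coding LLM responds, the edit snippet can be pulled out of the `<edit>` tags before being handed to Osmosis-Apply-1.7B. `extract_edit` below is a hypothetical helper for that step, mirroring the `extract_solution` pattern used for `<code>` tags:

```python
import re

def extract_edit(response: str):
    # Pull the edit snippet out of the <edit> tags in an LLM response
    match = re.search(r"<edit>(.*?)</edit>", response, re.DOTALL)
    return match.group(1).strip() if match else None

response = "<edit>\n// ... existing code ...\nx = 1\n// ... existing code ...\n</edit>"
print(extract_edit(response))
```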
### Serving
During development, we used SGLang to serve the model, though it should be straightforward to do something similar with Ollama.
Below is an example using SGLang.
```shell
python3 -m sglang.launch_server --model-path osmosis-ai/Osmosis-Apply-1.7B --host 0.0.0.0 --api-key osmosis
```
```python
from openai import OpenAI
import re

def create_query(old_code, edit):
    return f"<code>\n{old_code}\n</code>\n\n<edit>\n{edit}\n</edit>"

def extract_solution(solution_str):
    matches = list(re.finditer(r'<code>(.*?)</code>', solution_str, re.DOTALL))
    # If nonempty matches and exactly one <code> block exists
    if(matches and len(matches) == 1):
        return matches[0].group(1).strip()
    return None

SYSTEM_PROMPT = \
'''
You are a helpful assistant for a code editor that applies an edit to code to merge them together. That is, you will be given code wrapper in <code> tags and an edit wrapped in <edit> tags, and you will apply the edit to the code.

For example:

<code>
CODE_SNIPPET
</code>

<edit>
EDIT_SNIPPET
</edit>

The code is any type of code and the edit is in the form of:

// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...

The merged code must be exact with no room for any errors. Make sure all whitespaces are preserved correctly. A small typo in code will cause it to fail to compile or error out, leading to poor user experience.

Output the code wrapped in <code> tags.
'''

api_key = "osmosis"
api_base_url = "http://0.0.0.0:30000/v1"

client = OpenAI(
    api_key=api_key,
    base_url=api_base_url,
)

def generate_completion(query: str, system_prompt: str) -> str:
    # System message first, so the chat template renders it ahead of the user turn
    messages = [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": query,
        },
    ]
    response = client.chat.completions.create(
        model="",  # the server hosts a single model, so the name can be left empty
        messages=messages,
        temperature=0,
        max_tokens=3072,
    )
    completion = response.choices[0].message.content
    return completion

original_code = \
'''
def binary_search(arr, x):
    left = 0
    right = len(arr)
    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]
assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
'''

edit = \
'''
// ... existing code ...
    left = 0
    right = len(arr)
    while(left < right):
        mid = left + (right - left) // 2
        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...
'''

completion = generate_completion(create_query(original_code, edit), SYSTEM_PROMPT)
updated_code = extract_solution(completion)
print(updated_code)
```