---
base_model: Writer/Palmyra-Local-1.7B
tags:
- instruct
- finetune
- DPO
- distillation
- small
- local
- On Device
- Transformers.js
- Enterprise LLM
- Enterprise
- Enterprise ready
model_type: palmyra
model-index:
- name: Palmyra-Local-1.7B
  results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/) and acknowledge
  Writer's [Privacy Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
- en
---

# Palmyra-local-1.7B-Instruct

## Introduction

Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7-billion-parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

- **Stronger domain reasoning in code and math**, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction following**, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation, especially JSON (a hedged example follows the benchmark results below).
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- **Multilingual capabilities**, supporting more than 29 languages, including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.

This repository includes the **instruction-tuned Palmyra-local 1.7B model**, with the following architecture details:

- **Type**: Causal Language Model
- **Training Stages**: Pretraining + Instruction Tuning
- **Architecture**: Transformer with RoPE positional encoding
- **Total Parameters**: 1.7B
- **Number of Layers**: 28
- **Attention**: Grouped-query attention (GQA)

## Training Details

- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: Grouped-query attention (GQA)
- Training Data: ~1T-token packed dataset

## Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.8 | 57.33 | N/A | N/A | N/A |

**Notes:**

- **HumanEval** and **MBPP**: benchmark figures for **GPT-4 mini**, **Llama-3.2-1B-Instruct**, and **Llama-3.2-3B-Instruct** were not available in the sources published by those models' creators.
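As an illustration of the structured-output capability noted above, here is a minimal sketch of prompting the model for JSON. It assumes the dependencies and access token described under Usage below, and the system prompt and schema are illustrative assumptions of this sketch, not an official output contract of the model.

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token, as in the Usage example below

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Illustrative schema: this system prompt is an assumption of this sketch,
# not an official output contract of the model.
messages = [
    {
        "role": "system",
        "content": (
            "Respond only with a JSON object of the form "
            '{"title": string, "summary": string, "tags": [string]}.'
        ),
    },
    {"role": "user", "content": "Summarize the benefits of running LLMs on device."},
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

raw = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Validate before passing downstream; even JSON-tuned models can
# occasionally emit malformed output.
try:
    print(json.loads(raw)["summary"])
except (json.JSONDecodeError, KeyError):
    print("Non-conforming output:", raw)
```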
## Usage

### Install dependencies

`requirements.txt`:

```txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

```bash
pip install -r requirements.txt
```

---

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision (float16) for lower memory usage (optional)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available; fall back to plain text if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,  # required for temperature/top_p to take effect
}

# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)
```

A streaming variant of this example appears at the end of this card.

### Citation and Related Information

To cite this model:

```
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = mar
}
```

Contact: Hello@writer.com
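For interactive local use, it can help to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal streaming sketch using the `TextStreamer` utility from `transformers`; it reuses the `model`, `tokenizer`, and `input_ids` objects from the Inference example above.

```python
import torch
from transformers import TextStreamer

# Reuses `model`, `tokenizer`, and `input_ids` from the Inference example above.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        input_ids,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        streamer=streamer,  # decoded text is printed to stdout as tokens arrive
    )
```

If the tokens need to go somewhere other than stdout (e.g., a UI), `TextIteratorStreamer` can be used instead, with `generate` running in a background thread.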