---
base_model: Writer/Palmyra-Local-1.7B
tags:
- instruct
- finetune
- DPO
- distillation
- small
- local
- On Device
- Transformers.js
- Enterprise LLM
- Enterprise
- Enterprise ready
model_type: palmyra
model-index:
- name: Palmyra-Local-1.7B
results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
By clicking "Agree", you agree to the [License
Agreement](https://writer.com/legal/open-model-license/)
and acknowledge Writer's [Acceptable Use
Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
Name: text
Email: text
Organization or Affiliation: text
Receive email updates and promotions on Writer products, services, and research?:
type: select
options:
- 'Yes'
- 'No'
I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
- en
---
# Palmyra-local-1.7B-Instruct

## Introduction
Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7 billion parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.
Compared to earlier versions, Palmyra-local brings the following enhancements:
- **Stronger domain reasoning in code and math**, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction-following**, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation (especially JSON).
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows (see the sketch after this list).
- **Extended context support**, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- **Multilingual capabilities**, supporting over 29 languages including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.
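To make the prompt-handling claims concrete, here is a minimal sketch of formatting a multi-turn, role-play style conversation with the tokenizer's chat template. Assumptions: the repository ships a standard chat template with system/user/assistant roles, and you have access to the gated repo; the conversation content itself is illustrative.

```python
from transformers import AutoTokenizer

# Gated repository: pass token="..." if authentication is required.
tokenizer = AutoTokenizer.from_pretrained("Writer/Palmyra-local-1_7B", trust_remote_code=True)

# A multi-turn, role-play style conversation of the kind described above.
messages = [
    {"role": "system", "content": "You are a meticulous enterprise support agent."},
    {"role": "user", "content": "My order arrived damaged. What should I do?"},
    {"role": "assistant", "content": "I'm sorry to hear that. Could you share the order number?"},
    {"role": "user", "content": "It's order #123."},
]

# Render the conversation into the exact prompt string the model will see.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```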
This repository includes the **instruction-tuned Palmyra-local 1.7B model**, with the following architecture details (see the config-check sketch after this list):
- **Type**: Causal Language Model
- **Training Stages**: Pretraining + Instruction Tuning
- **Architecture**: Transformer with RoPE positional encoding
- **Total Parameters**: 1.7B
- **Number of Layers**: 28
- **Attention**: Grouped-Query Attention (GQA)
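These details can be sanity-checked against the published config. A minimal sketch, assuming the model exposes the standard `transformers` config fields (a custom `palmyra` model type may name them differently, hence the defensive `getattr` calls):

```python
from transformers import AutoConfig

# Gated repository: pass token="..." if required.
config = AutoConfig.from_pretrained("Writer/Palmyra-local-1_7B", trust_remote_code=True)

print(config.num_hidden_layers)                           # expected: 28
print(getattr(config, "num_attention_heads", None),       # GQA: fewer key/value heads
      getattr(config, "num_key_value_heads", None))       # than query heads
print(getattr(config, "max_position_embeddings", None))   # context window (up to 128K)
```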
## Training Details
- **Architecture**: Palmyra
- **Training Method**: Trained from scratch
- **Attention Mechanism**: Grouped-Query Attention (GQA)
- **Training Data**: ~1T tokens (packed dataset)
## Benchmark Results
| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|----------------------|------------|----------------------|----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.8 | 57.33 | N/A | N/A | N/A |
**Notes:**
- **HumanEval** and **MBPP**: benchmark results for these tasks were not available for **GPT-4 mini**, **Llama-3.2-1B-Instruct**, or **Llama-3.2-3B-Instruct** in the sources consulted for this card.
## Usage
### Install dependencies
`requirements.txt`:
```txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```
```bash
pip install -r requirements.txt
```
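A quick check that the pinned versions are the ones actually active in your environment:

```python
import torch
import transformers

# Should match the versions pinned in requirements.txt (4.51.0 and 2.6.0).
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```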
---
### Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision (float16) for lower memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available; fall back to plain encoding if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config (do_sample=True so temperature and top_p take effect)
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Generate output
with torch.inference_mode():
    output_ids = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)
```
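Building on the snippet above, here is a short follow-up sketch for the JSON structured-output use case highlighted in the introduction. It reuses `model`, `tokenizer`, and `gen_conf` from the previous block; the prompt wording and field names are illustrative:

```python
import json

messages = [
    {"role": "user", "content": (
        "Extract the fields from this sentence as JSON with keys "
        "order_id, city, and date: 'Order #123 shipped to Berlin on May 2.'"
    )},
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, **gen_conf)

raw = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Check that the model actually produced parseable JSON before using it downstream.
try:
    print(json.dumps(json.loads(raw), indent=2))
except json.JSONDecodeError:
    print("Output was not valid JSON:\n", raw)
```

With sampling enabled, an explicit "respond with JSON only" instruction or a lower temperature tends to make the parse succeed more reliably.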
### Citation and Related Information
To cite this model:
```
@misc{Palmyra-Local-1.7B,
author = {Writer Engineering team},
title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
howpublished = {\url{https://dev.writer.com}},
year = 2025,
month = mar
}
```
### Contact
[email protected]