---
base_model: Writer/Palmyra-Local-1.7B
tags:
- instruct
- finetune
- DPO
- distillation
- small
- local
- On Device
- Transformers.js
- Enterprise LLM
- Enterprise
- Enterprise ready
model_type: palmyra
model-index:
- name: Palmyra-Med-70B
  results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/)
  and acknowledge Writer's [Privacy
  Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
- en
---

**Palmyra-local-1.7B-Instruct**

**Introduction**

Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release is a 1.7-billion-parameter, instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

- **Stronger domain reasoning in code and math**, from targeted expert tuning and curated domain datasets.
- **Improved instruction following**, including long-form generation (up to 8K tokens), accurate handling of structured data such as tables, and consistent structured output, especially JSON.
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation of up to 8K tokens.
- **Multilingual capabilities**, supporting over 29 languages, including English, Spanish, French, German, Chinese, Arabic, and Japanese.
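
Structured output is only useful if downstream code can parse it, so a reply is usually checked before it is consumed. The sketch below shows one way to do that with the standard library, assuming the reply is a plain string that may wrap a JSON object in extra prose. `extract_json_object` and the sample reply are hypothetical illustrations, not part of any Palmyra or Transformers API.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object found in a model reply (hypothetical helper)."""
    decoder = json.JSONDecoder()
    # Try every "{" as a possible start; raw_decode tolerates trailing text.
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            value, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    raise ValueError("no JSON object found in reply")


# Made-up reply text for illustration; this is not real model output.
sample_reply = 'Sure, here it is: {"invoice_id": "A-17", "total": 42.5} Anything else?'
parsed = extract_json_object(sample_reply)
print(parsed["invoice_id"], parsed["total"])
```

Raising an error when no object is found lets the caller decide whether to retry the prompt or fail the request.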

This repository contains the **instruction-tuned Palmyra-local 1.7B model**, with the following architecture details:

- **Type**: Causal language model
- **Training Stages**: Pretraining + instruction tuning
- **Architecture**: Transformer with RoPE positional encoding
- **Total Parameters**: 1.7B
- **Number of Layers**: 28
- **Attention**: Grouped-query attention (GQA)
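
Grouped-query attention lets several query heads share one key/value head, which shrinks the key/value cache that must stay in memory during generation. The sketch below estimates that saving. Only the layer count (28) comes from this card; the head counts, head dimension, and sequence length are hypothetical placeholders, because the card does not state them.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the separate key and value tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Index of the shared key/value head that a query head reads from."""
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


NUM_LAYERS = 28        # from the architecture list above
NUM_QUERY_HEADS = 16   # hypothetical: not stated in the card
NUM_KV_HEADS = 8       # hypothetical: not stated in the card
HEAD_DIM = 128         # hypothetical: not stated in the card
SEQ_LEN = 8192         # example sequence length

# Cache size if every query head had its own key/value head, versus GQA sharing.
full_cache = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa_cache = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"One KV head per query head: {full_cache / 2**20:.0f} MiB")
print(f"Grouped-query attention:    {gqa_cache / 2**20:.0f} MiB")
```

With these placeholder values the cache shrinks by the ratio of query heads to key/value heads; the real saving depends on the model's actual configuration.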

## Training Details

- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: GQA
- Training Data: ~1T packed dataset

## Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.8 | 57.33 | N/A | N/A | N/A |

**Notes:**

- **HumanEval** and **MBPP**: Scores for **GPT-4 mini**, **Llama-3.2-1B-Instruct**, and **Llama-3.2-3B-Instruct** were not available from the model creators' published sources.
- **Average**: The mean of the six benchmark scores, reported only for models that have a score on every benchmark.
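
The Average row can be reproduced from the six scores listed for each model. The short check below recomputes it for the two models with complete results, using only values copied from the table; the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the table, in row order:
# HumanEval, MBPP, GSM8K, MATH, MMLU, MMLU Pro.
scores = {
    "Palmyra-local-1.7B": [74.10, 66.86, 81.0, 60.94, 59.82, 34.10],
    "Qwen2.5-1.5B-Instruct": [61.60, 63.20, 73.20, 55.20, 58.37, 32.40],
}

# Unweighted mean over the six benchmarks, rounded to two decimals.
averages = {model: round(mean(values), 2) for model, values in scores.items()}

for model, average in averages.items():
    print(f"{model}: {average}")
```

The average is unweighted, so each benchmark counts equally regardless of its size or difficulty.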

## Usage

### Install dependencies

Save the following pinned versions as `requirements.txt`:

```txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

Then install them:

```bash
pip install -r requirements.txt
```

---

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # replace with your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision (float16) for lower memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if the tokenizer defines one; otherwise fall back to the raw prompt
if getattr(tokenizer, "chat_template", None):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,  # temperature and top_p only take effect when sampling
    "temperature": 0.7,
    "top_p": 0.9,
}

# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens, skipping the prompt
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)

print(output_text)
```
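
The inference example requests 256 new tokens, while the introduction states a 128K-token context window and up to 8K generated tokens. A small guard can keep a request inside both limits. The sketch below is a hypothetical helper, and it assumes "128K" means 131,072 tokens and "8K" means 8,192 tokens, which this card does not confirm.

```python
MAX_CONTEXT_TOKENS = 128 * 1024   # assumption: "128K" read as 131,072 tokens
MAX_GENERATION_TOKENS = 8 * 1024  # assumption: "8K" read as 8,192 tokens


def clamp_max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Largest generation length that fits both stated limits (hypothetical helper)."""
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt already fills the context window")
    # Room left in the context window once the prompt is counted.
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    return max(0, min(requested, MAX_GENERATION_TOKENS, remaining))


# A short prompt leaves the request unchanged; an oversized request is capped.
print(clamp_max_new_tokens(prompt_tokens=100, requested=256))
print(clamp_max_new_tokens(prompt_tokens=100, requested=20_000))
```

In the inference example, this could be applied as `gen_conf["max_new_tokens"] = clamp_max_new_tokens(input_ids.shape[1], 256)` before calling `model.generate`.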

### Citation and Related Information

To cite this model:

```
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for On device run}},
  howpublished = {\url{https://dev.writer.com}},
  year = {2025},
  month = {March}
}
```

Contact

[email protected]