---
base_model: Writer/Palmyra-Local-1.7B
tags:
  - instruct
  - finetune
  - DPO
  - distillation
  - small
  - local
  - On Device
  - Transformers.js
  - Enterprise LLM
  - Enterprise
  - Enterprise ready
model_type: palmyra
model-index:
  - name: Palmyra-Local-1.7B
    results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/) and acknowledge
  Writer's [Privacy Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
  - en
---

# Palmyra-local-1.7B-Instruct

## Introduction

Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7 billion parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

- **Stronger domain reasoning** in code and math, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction-following**, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation, especially JSON (see the sketch after this list).
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- **Multilingual capabilities**, supporting over 29 languages including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.
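
The structured-output behavior can be exercised by asking for JSON through the chat template and validating the reply. A minimal sketch, assuming the model is loaded from `Writer/Palmyra-local-1_7B` as in the Usage section below; the prompt wording and JSON keys are illustrative assumptions, and a gated repository may also require passing `token=...`:

```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Ask for JSON only, with an explicit set of keys (illustrative prompt)
messages = [
    {
        "role": "user",
        "content": (
            "Extract the fields from this sentence and answer with JSON only, "
            'using the keys "name", "company", and "role": '
            "Maria Lopez joined Acme Corp as head of data engineering."
        ),
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

reply = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Validate that the reply parses as JSON; fall back to the raw text if it does not
try:
    print(json.loads(reply))
except json.JSONDecodeError:
    print("Model did not return valid JSON:", reply)
```

If the reply does not parse, tightening the instruction or using greedy decoding (as above, with `do_sample=False`) usually helps.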

This repository includes the instruction-tuned Palmyra-local 1.7B model, with the following architecture details:

- Type: Causal Language Model
- Training Stages: Pretraining + Instruction Tuning
- Architecture: Transformer with RoPE positional encoding
- Total Parameters: 1.7B
- Number of Layers: 28
- Attention: Grouped-Query Attention (GQA); see the config sketch below for how to read the head counts
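
These details can be checked against the released checkpoint by inspecting its config. A minimal sketch; the field names are assumptions based on common `transformers` configs for GQA decoder models and may differ for this repository, and a gated repository may also require `token=...`:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Writer/Palmyra-local-1_7B", trust_remote_code=True)

# Typical field names for decoder-only GQA models in transformers
print("layers:         ", getattr(config, "num_hidden_layers", None))
print("query heads:    ", getattr(config, "num_attention_heads", None))
print("key/value heads:", getattr(config, "num_key_value_heads", None))  # fewer than query heads => GQA
print("max positions:  ", getattr(config, "max_position_embeddings", None))
```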

## Training Details

- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: GQA
- Training Data: ~1T-token packed dataset

## Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP      | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K     | 81.0  | 73.20 | 88.6 | N/A | 75.6 |
| MATH      | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU      | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro  | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average   | 62.8  | 57.33 | N/A | N/A | N/A |

Notes:

- HumanEval and MBPP: benchmark figures for these tasks were not available for GPT-4 mini, Llama-3.2-1B-Instruct, or Llama-3.2-3B-Instruct in the sources used for this comparison.

## Usage

### Install dependencies

`requirements.txt`:

```text
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

```bash
pip install -r requirements.txt
```

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in float16 for lower memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available, fall back to plain text if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config (do_sample=True so temperature and top_p take effect)
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Generate output
with torch.inference_mode():
    output_ids = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

print(output_text)
```
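
For interactive use you may prefer tokens to be printed as they are generated. A minimal streaming sketch using `transformers.TextStreamer`; the prompt and sampling settings are illustrative, and `token=auth_token` may be needed as in the block above if the repository is gated:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = "Writer/Palmyra-local-1_7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize the benefits of running LLMs on device."}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        streamer=streamer,
    )
```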

## Citation and Related Information

To cite this model:

```bibtex
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = mar
}
```

Contact [email protected]