---
base_model: Writer/Palmyra-Local-1.7B
tags:
  - instruct
  - finetune
  - DPO
  - distillation
  - small
  - local
  - On Device
  - Transformers.js
  - Enterprise LLM
  - Enterprise
  - Enterprise ready
model_type: palmyra
model-index:
  - name: Palmyra-Local-1.7B
    results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/) and acknowledge
  Writer's [Privacy Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
  - en
---

# Palmyra-local-1.7B-Instruct

## Introduction

Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7 billion parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

- **Stronger domain reasoning** in code and math, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction-following**, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation, especially JSON (see the sketch after this list).
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- **Multilingual capabilities**, supporting over 29 languages including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.
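
The structured-output behavior can be exercised by asking for JSON through the chat template and validating the reply. A minimal sketch, assuming the model is loaded from `Writer/Palmyra-local-1_7B` as in the Usage section below; the prompt wording and JSON keys are illustrative assumptions, and a gated repository may also require passing `token=...`:

```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Ask for JSON only, with an explicit set of keys (illustrative prompt)
messages = [
    {
        "role": "user",
        "content": (
            "Extract the fields from this sentence and answer with JSON only, "
            'using the keys "name", "company", and "role": '
            "Maria Lopez joined Acme Corp as head of data engineering."
        ),
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

reply = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Validate that the reply parses as JSON; fall back to the raw text if it does not
try:
    print(json.loads(reply))
except json.JSONDecodeError:
    print("Model did not return valid JSON:", reply)
```

If the reply does not parse, tightening the instruction or using greedy decoding (as above, with `do_sample=False`) usually helps.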

This repository includes the instruction-tuned Palmyra-local 1.7B model, with the following architecture details:

- Type: Causal Language Model
- Training Stages: Pretraining + Instruction Tuning
- Architecture: Transformer with RoPE positional encoding
- Total Parameters: 1.7B
- Number of Layers: 28
- Attention: Grouped-Query Attention (GQA); see the config sketch below for how to read the head counts
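
These details can be checked against the released checkpoint by inspecting its config. A minimal sketch; the field names are assumptions based on common `transformers` configs for GQA decoder models and may differ for this repository, and a gated repository may also require `token=...`:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Writer/Palmyra-local-1_7B", trust_remote_code=True)

# Typical field names for decoder-only GQA models in transformers
print("layers:         ", getattr(config, "num_hidden_layers", None))
print("query heads:    ", getattr(config, "num_attention_heads", None))
print("key/value heads:", getattr(config, "num_key_value_heads", None))  # fewer than query heads => GQA
print("max positions:  ", getattr(config, "max_position_embeddings", None))
```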

## Training Details

- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: GQA
- Training Data: ~1T-token packed dataset

## Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP      | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K     | 81.0  | 73.20 | 88.6 | N/A | 75.6 |
| MATH      | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU      | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro  | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average   | 62.8  | 57.33 | N/A | N/A | N/A |

Notes:

- HumanEval and MBPP: benchmark figures for these tasks were not available for GPT-4 mini, Llama-3.2-1B-Instruct, or Llama-3.2-3B-Instruct in the sources used for this comparison.

## Usage

### Install dependencies

`requirements.txt`:

```text
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

```bash
pip install -r requirements.txt
```

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in float16 for lower memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available, fall back to plain text if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config (do_sample=True so temperature and top_p take effect)
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Generate output
with torch.inference_mode():
    output_ids = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

print(output_text)
```
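
For interactive use you may prefer tokens to be printed as they are generated. A minimal streaming sketch using `transformers.TextStreamer`; the prompt and sampling settings are illustrative, and `token=auth_token` may be needed as in the block above if the repository is gated:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = "Writer/Palmyra-local-1_7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize the benefits of running LLMs on device."}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        streamer=streamer,
    )
```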

## Citation and Related Information

To cite this model:

```bibtex
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = mar
}
```

Contact [email protected]