---
base_model: Writer/Palmyra-Local-1.7B
tags:
- instruct
- finetune
- DPO
- distillation
- small
- local
- On Device
- Transformers.js
- Enterprise LLM
- Enterprise
- Enterprise ready
model_type: palmyra
model-index:
- name: Palmyra-Local-1.7B
  results: []
license: other
license_name: writer-open-model-license
license_link: https://writer.com/legal/open-model-license/
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://writer.com/legal/open-model-license/) and acknowledge
  Writer's [Privacy Policy](https://writer.com/legal/acceptable-use/).
extra_gated_fields:
  Name: text
  Email: text
  Organization or Affiliation: text
  Receive email updates and promotions on Writer products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Writer: checkbox
language:
- en
---

# Palmyra-local-1.7B-Instruct

## Introduction

Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7-billion-parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.

Compared to earlier versions, Palmyra-local brings the following enhancements:

- **Stronger domain reasoning in code and math**, powered by targeted expert tuning and curated domain datasets.
- **Improved instruction following**, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation, especially JSON (a hedged example follows the benchmark results below).
- **Robust prompt handling**, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- **Extended context support**, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- **Multilingual capabilities**, supporting more than 29 languages, including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.

This repository includes the **instruction-tuned Palmyra-local 1.7B model**, with the following architecture details:

- **Type**: Causal Language Model
- **Training Stages**: Pretraining + Instruction Tuning
- **Architecture**: Transformer with RoPE positional encoding
- **Total Parameters**: 1.7B
- **Number of Layers**: 28
- **Attention**: Grouped-query attention (GQA)

## Training Details

- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: Grouped-query attention (GQA)
- Training Data: ~1T-token packed dataset

## Benchmark Results

| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct |
|-----------|--------------------|-----------------------|------------|-----------------------|-----------------------|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A |
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A |
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 |
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 |
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 |
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A |
| Average | 62.8 | 57.33 | N/A | N/A | N/A |

**Notes:**

- **HumanEval** and **MBPP**: benchmark figures for **GPT-4 mini**, **Llama-3.2-1B-Instruct**, and **Llama-3.2-3B-Instruct** were not available in the sources published by those models' creators.
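As an illustration of the structured-output capability noted above, here is a minimal sketch of prompting the model for JSON. It assumes the dependencies and access token described under Usage below, and the system prompt and schema are illustrative assumptions of this sketch, not an official output contract of the model.

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token, as in the Usage example below

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Illustrative schema: this system prompt is an assumption of this sketch,
# not an official output contract of the model.
messages = [
    {
        "role": "system",
        "content": (
            "Respond only with a JSON object of the form "
            '{"title": string, "summary": string, "tags": [string]}.'
        ),
    },
    {"role": "user", "content": "Summarize the benefits of running LLMs on device."},
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

raw = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Validate before passing downstream; even JSON-tuned models can
# occasionally emit malformed output.
try:
    print(json.loads(raw)["summary"])
except (json.JSONDecodeError, KeyError):
    print("Non-conforming output:", raw)
```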
## Usage

### Install dependencies

`requirements.txt`:

```txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
```

```bash
pip install -r requirements.txt
```

---

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # your Hugging Face access token

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)

# Load model in half precision (float16) for lower memory usage (optional)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)

# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]

# Use the chat template if available; fall back to plain text if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)

# Generation config
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,  # required for temperature/top_p to take effect
}

# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)

# Decode only the newly generated tokens
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)
```

A streaming variant of this example appears at the end of this card.

### Citation and Related Information

To cite this model:

```
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for on-device use}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = mar
}
```

Contact: Hello@writer.com
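For interactive local use, it can help to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal streaming sketch using the `TextStreamer` utility from `transformers`; it reuses the `model`, `tokenizer`, and `input_ids` objects from the Inference example above.

```python
import torch
from transformers import TextStreamer

# Reuses `model`, `tokenizer`, and `input_ids` from the Inference example above.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        input_ids,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        streamer=streamer,  # decoded text is printed to stdout as tokens arrive
    )
```

If the tokens need to go somewhere other than stdout (e.g., a UI), `TextIteratorStreamer` can be used instead, with `generate` running in a background thread.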