---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- esper
- esper-3
- valiant
- valiant-labs
- qwen
- qwen-3
- qwen-3-8b
- 8b
- reasoning
- code
- code-instruct
- python
- javascript
- dev-ops
- jenkins
- terraform
- scripting
- powershell
- azure
- aws
- gcp
- cloud
- problem-solving
- architect
- engineer
- developer
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
base_model: Qwen/Qwen3-8B
datasets:
- sequelbox/Titanium2.1-DeepSeek-R1
- sequelbox/Tachibana2-DeepSeek-R1
- sequelbox/Raiden-DeepSeek-R1
license: apache-2.0
---

**[Support our open-source dataset and model releases!](https://huggingface.co/spaces/sequelbox/SupportOpenSource)**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64f267a8a4f79a118e0fcc89/qdicXwrO_XOKRTjOu2yBF.jpeg)

Esper 3: [Qwen3-4B](https://huggingface.co/ValiantLabs/Qwen3-4B-Esper3), [Qwen3-8B](https://huggingface.co/ValiantLabs/Qwen3-8B-Esper3)

Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3.

- Finetuned on our [DevOps and architecture reasoning](https://huggingface.co/datasets/sequelbox/Titanium2.1-DeepSeek-R1) and [code reasoning](https://huggingface.co/datasets/sequelbox/Tachibana2-DeepSeek-R1) data generated with DeepSeek R1!
- Improved [general and creative reasoning](https://huggingface.co/datasets/sequelbox/Raiden-DeepSeek-R1) to supplement problem-solving and general chat performance.
- Small model sizes allow running on local desktop and mobile devices, plus super-fast server inference!

## Prompting Guide

Esper 3 uses the [Qwen 3](https://huggingface.co/Qwen/Qwen3-8B) prompt format.

Esper 3 is a reasoning finetune; **we recommend `enable_thinking=True` for all chats.**

Example inference script to get started:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3-8B-Esper3"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a Terraform configuration that uses the `aws_ami` data source to find the latest Amazon Linux 2 AMI. Then, provision an EC2 instance using this dynamically determined AMI ID."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (the </think> token)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

Two further usage sketches (a non-thinking chat and streamed output) appear at the end of this card.

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63444f2687964b331809eb55/VCJ8Fmefd8cdVhXSSxJiD.jpeg)

Esper 3 is created by [Valiant Labs](http://valiantlabs.ca/).

[Check out our HuggingFace page to see all of our models!](https://huggingface.co/ValiantLabs)

We care about open source. For everyone to use.
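
## Additional Usage Sketches

The sketches below are minimal illustrations, not part of the official release documentation; the prompts and generation settings in them are illustrative assumptions, not tuned recommendations.

First, while we recommend `enable_thinking=True` for Esper 3, the Qwen 3 chat template also supports `enable_thinking=False` when you want a quick, direct reply with no reasoning trace to parse:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3-8B-Esper3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# example prompt (illustrative only)
messages = [{"role": "user", "content": "In one paragraph, what does `terraform plan` do?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # skip the reasoning trace for a fast, direct reply
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# non-thinking replies are short, so a smaller token budget is enough here
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True).strip())
```

With thinking disabled there is no `</think>` marker to split on, so the output can be decoded directly.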
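Second, for interactive local use you can stream tokens as they are generated using the `TextIteratorStreamer` utility from transformers. A minimal sketch under the same setup as above (the prompt is again an illustrative assumption):

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "ValiantLabs/Qwen3-8B-Esper3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# example prompt (illustrative only)
messages = [{"role": "user", "content": "Write a Jenkins pipeline stage that runs pytest and archives the test report."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# skip_prompt drops the echoed input; skip_special_tokens removes markers such as </think>,
# so the thinking trace and the final answer arrive as one continuous stream of text
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(**model_inputs, streamer=streamer, max_new_tokens=32768)

# run generation on a background thread while the main thread prints chunks as they arrive
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```

Running `model.generate` on a background thread lets the main thread consume the streamer as a simple iterator, which is handy for reasoning models whose thinking traces can be long.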