---
license: other
license_name: deepseek-license
license_link: LICENSE
base_model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
pipeline_tag: text-generation
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

# QuantFactory/DeepSeek-Coder-V2-Lite-Instruct-GGUF

This is a quantized (GGUF) version of [deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct), created using llama.cpp.
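
Since this repository ships GGUF quantizations, the files can be run locally without the original PyTorch weights. Below is a minimal sketch using the `llama-cpp-python` bindings to llama.cpp; the bindings, the quant filename, and the context size shown are assumptions rather than part of this repository, so substitute whichever quant file you actually download.

```python
# Minimal sketch: load a downloaded GGUF quant via llama-cpp-python
# (assumed installed with `pip install llama-cpp-python`).
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window for this session; the model supports up to 128K
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```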

# Model Description

<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V2" />
</div>

<p align="center">
  <a href="https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/paper.pdf"><b>Paper Link</b>👁️</a>
</p>

# DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

## 1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo on code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. This continued pre-training substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advances across code-related tasks, as well as in reasoning and general capabilities. In addition, it expands its supported programming languages from 86 to 338 and extends the context length from 16K to 128K tokens.

<p align="center">
  <img width="100%" src="https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/figures/performance.png?raw=true">
</p>

In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages can be found [here](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/supported_langs.txt).

## 2. Model Downloads

We release DeepSeek-Coder-V2 to the public in 16B and 236B parameter sizes, based on the [DeepSeekMoE](https://arxiv.org/pdf/2401.06066) framework, with only 2.4B and 21B active parameters respectively. Both base and instruct models are available.

<div align="center">

| **Model** | **#Total Params** | **#Active Params** | **Context Length** | **Download** |
| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Base) |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct) |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base) |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct) |

</div>

## 3. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: [coder.deepseek.com](https://coder.deepseek.com/sign_in).

## 4. API Platform

We also provide an OpenAI-compatible API on the DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/), with pay-as-you-go pricing.

<p align="center">
  <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/figures/model_price.jpg?raw=true">
</p>

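Because the API is OpenAI-compatible, the standard `openai` Python client (v1+) can be pointed at it. A minimal sketch; the base URL `https://api.deepseek.com` and the model identifier `deepseek-coder` are assumptions here, so check the platform documentation for the current values.

```python
# Minimal sketch of calling the OpenAI-compatible endpoint with the
# standard `openai` client. Base URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="<your DeepSeek API key>",
    base_url="https://api.deepseek.com",  # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-coder",  # assumed model identifier
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```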

## 5. How to run locally

**Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. Note that running DeepSeek-Coder-V2 in BF16 format for inference requires 8×80GB GPUs.**

### Inference with Hugging Face's Transformers

You can directly employ [Hugging Face's Transformers](https://github.com/huggingface/transformers) for model inference.

#### Code Completion

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### Code Insertion

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<|fim▁begin|>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<|fim▁hole|>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<|fim▁end|>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
```

#### Chat Completion

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of the <|end▁of▁sentence|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```

The complete chat template can be found in `tokenizer_config.json` in the Hugging Face model repository.

An example of the chat template is shown below:

```bash
<|begin▁of▁sentence|>User: {user_message_1}

Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}

Assistant:
```

You can also add an optional system message:

```bash
<|begin▁of▁sentence|>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}

Assistant:
```

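Rather than formatting prompts by hand, you can have the tokenizer render this template for you. A minimal sketch; the example messages are placeholders:

```python
# Small sketch: render the chat template shipped in tokenizer_config.json
# instead of building the prompt string manually.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "write a quick sort algorithm in python."},
]

# tokenize=False returns the formatted prompt string, so you can inspect
# exactly what the model will see before generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```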

### Inference with vLLM (recommended)

To utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

## 6. License

This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-CODE). The use of the DeepSeek-Coder-V2 Base/Instruct models is subject to [the Model License](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-MODEL). The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

## 7. Original Model Contact

If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).