Model Overview

Description:

CodeLlama-13B-QML is a large language model customized by the Qt Company for Fill-In-The-Middle code completion tasks in the QML programming language, especially for Qt Quick Controls compliant with Qt 6 releases. The CodeLlama-13B-QML model is designed for companies and individuals that want to self-host their LLM for HMI (Human Machine Interface) software development instead of relying on third-party hosted LLMs. It can be run via cloud services or locally, via Ollama.

This model scores 89% on the QML100 Fill-In-The-Middle code completion benchmark for Qt 6-compliant code. For comparison, other models scored:

DeepSeek V3: 87%
Claude 4 Sonnet: 81%
CodeLlama-7B-QML: 80%
Claude 3.7 Sonnet: 76%
Codestral: 69%
CodeLlama 13B: 66%
GPT-4o: 62%
CodeLlama 7B: 61%

This model was fine-tuned on raw data from over 5500 human-created QML code snippets using the LoRA fine-tuning method. CodeLlama-13B-QML is not optimised for creating Qt 5-compliant, C++, or Python code.

Terms of use:

By accessing this model, you agree to the Llama 2 license terms and conditions, the acceptable use policy, and Meta’s privacy policy. By using this model, you also agree to the Qt AI Model terms & conditions.

Usage:

CodeLlama-13B-QML is a medium-sized language model that requires significant computing resources to achieve inference (response) times suitable for automatic code completion. It should therefore be used with a GPU accelerator, either in a cloud environment such as AWS, Google Cloud, or Microsoft Azure, or locally.
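
To give a rough sense of the resources involved, the following back-of-the-envelope sketch estimates the memory needed for the weights alone. It assumes roughly 13 billion parameters (taken from the model name) stored in a 16-bit format such as bfloat16; the function name is illustrative, and the estimate ignores the KV cache, activations, and framework overhead, so real requirements will be higher.

```python
# Rough estimate of the GPU memory needed just to hold the model weights.
# Assumptions (not stated in the model card): about 13e9 parameters,
# each stored in 2 bytes (a 16-bit format such as bfloat16).
PARAM_COUNT = 13e9
BYTES_PER_PARAM = 2


def weight_memory_gib(param_count: float, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in GiB (2**30 bytes)."""
    return param_count * bytes_per_param / 2**30


if __name__ == "__main__":
    size = weight_memory_gib(PARAM_COUNT, BYTES_PER_PARAM)
    print(f"Weights only: about {size:.1f} GiB")
```

Under these assumptions the weights alone come to roughly 24 GiB, which is why a GPU accelerator with ample memory is the practical choice.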

Large Language Models, including CodeLlama-13B-QML, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building AI systems.

The repository contains multiple files with adapters.

How to run CodeLlama-13B-QML in cloud deployment:

The configuration depends on the chosen cloud technology.

Running CodeLlama-13B-QML in the cloud requires Docker and vLLM for optimal performance. Make sure the required dependencies (the transformers, accelerate, and peft modules) are installed, and use bfloat16 precision. The setup combines the base model from Hugging Face (which requires an access token) with the adapter weights from the repository. vLLM provides efficient inference behind an OpenAI-compatible API endpoint, which makes integration straightforward; as a serving backend, it implements request batching and queuing. Run the Docker container on an instance with a GPU accelerator. The configuration has been tested on Ubuntu 22.04 LTS with the NVIDIA driver and A100 80GB GPUs, where it ran stably and efficiently.
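
Once such a server is running, a client could talk to the OpenAI-compatible completions endpoint roughly as sketched below. This is a minimal illustration using only the Python standard library, not the Qt Company's reference client. The URL (vLLM's default port 8000 on localhost), the served model name `codellama-13b-qml`, and the helper names are assumptions; substitute whatever your deployment exposes.

```python
import json
import urllib.request

# Assumptions: vLLM listening on its default port, and a hypothetical
# served model name. Replace both with the values of your deployment.
VLLM_URL = "http://localhost:8000/v1/completions"
SERVED_MODEL = "codellama-13b-qml"


def build_completion_request(prompt: str, max_tokens: int = 500) -> dict:
    """Build a request body for the OpenAI-style /v1/completions endpoint."""
    return {
        "model": SERVED_MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
        "stop": ["<SUF>", "<PRE>", "<MID>"],
    }


def extract_text(response: dict) -> str:
    """Return the generated text of the first choice in a completions response."""
    return response["choices"][0]["text"]


def complete(prompt: str, url: str = VLLM_URL, timeout: float = 60.0) -> str:
    """POST the prompt to the server and return the completion text."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_text(json.load(reply))


if __name__ == "__main__":
    # Requires a running server; the prompt uses the model's <PRE>/<MID> format.
    print(complete("<PRE>import QtQuick\n\nWindow {\n<MID>"))
```

Keeping request construction and response parsing in separate functions makes it possible to check both without a running server.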

How to run CodeLlama-13B-QML in Ollama:

We have preloaded the model to Ollama for your convenience.

1. Download and install Ollama from Ollama's web page (if you are not using it yet):

https://ollama.com/download

2. Run the model with the following command in Ollama's CLI:

ollama run theqtcompany/codellama-13b-qml

You can now select and use CodeLlama-13B-QML as the LLM for code completions in the Qt AI Assistant or other coding assistants. If you want to test the model in Ollama, you can send curl requests to Ollama's local API from a terminal, as shown below.

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "theqtcompany/codellama-13b-qml",
  "prompt": "<SUF>\n    title: qsTr(\"Hello World\")\n}<PRE>import QtQuick\n\nWindow {\n    width: 640\n    height: 480\n    visible: true\n<MID>",
  "stream": false,
  "options": {
    "temperature": 0,
    "top_p": 1,
    "repeat_penalty": 1.05,
    "num_predict": 500,
    "stop": ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "< EOT >", "\\end", "<MID>", "</MID>", "##"]
  }
}'
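
The same request can be sent from a script. The sketch below is one possible Python equivalent using only the standard library; the helper names are illustrative. It nests the sampling parameters under an `options` object, which is where Ollama's API documentation places model parameters, and it reads the generated text from the `response` field of the non-streaming reply.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "theqtcompany/codellama-13b-qml"
# Stop sequences copied from the curl example.
STOP = ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "< EOT >", "\\end",
        "<MID>", "</MID>", "##"]


def build_payload(prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": 0,
            "top_p": 1,
            "repeat_penalty": 1.05,
            "num_predict": 500,
            "stop": STOP,
        },
    }


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to a local Ollama server and return the completion."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)["response"]


if __name__ == "__main__":
    # Requires Ollama to be running locally with the model pulled.
    prompt = (
        "<SUF>\n    title: qsTr(\"Hello World\")\n}"
        "<PRE>import QtQuick\n\nWindow {\n    width: 640\n"
        "    height: 480\n    visible: true\n<MID>"
    )
    print(generate(prompt))
```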

In general, the prompt format for CodeLlama-13B-QML is:

"<SUF>{suffix}<PRE>{prefix}<MID>"

If there is no suffix, please use:

"<PRE>{prefix}<MID>"

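The two prompt formats above can be produced by a small helper. The following is a minimal sketch; the function names `build_fim_prompt` and `split_at_cursor` are illustrative and not part of any Qt or Ollama API.

```python
def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Assemble a Fill-In-The-Middle prompt in the format described above.

    With a suffix:    <SUF>{suffix}<PRE>{prefix}<MID>
    Without a suffix: <PRE>{prefix}<MID>
    """
    if suffix:
        return f"<SUF>{suffix}<PRE>{prefix}<MID>"
    return f"<PRE>{prefix}<MID>"


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a document into the text before and after the cursor offset."""
    return source[:cursor], source[cursor:]


if __name__ == "__main__":
    document = "import QtQuick\n\nWindow {\n    visible: true\n}\n"
    # Place the cursor just before the closing brace of the Window.
    cursor = document.index("}")
    prefix, suffix = split_at_cursor(document, cursor)
    print(build_fim_prompt(prefix, suffix))
```

The model's completion is the text that belongs between the prefix and the suffix, so an editor integration would typically insert it at the cursor position.
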
Modify and Adapt CodeLlama-13B-QML:

The Hugging Face repository contains all necessary components, including the .safetensors files and tokenizer configurations, so you can modify the model for different environments, adapt it to your specific requirements, or train it on your own dataset.

Model Version:

v3.0

Attribution:

Model tree for QtGroup/CodeLlama-13B-QML:

Base model: meta-llama/CodeLlama-13b-hf
Adapter: this model