Uploaded model

  • Developed by: pmking27
  • License: apache-2.0
  • Finetuned from model: pmking27/PrathameshLLM-2B

Provided Quant Files

| Name | Quant method | Bits | Size |
|------|--------------|------|------|
| PrathameshLLM-2B.IQ3_M.gguf | IQ3_M | 3 | 1.31 GB |
| PrathameshLLM-2B.IQ3_S.gguf | IQ3_S | 3 | 1.29 GB |
| PrathameshLLM-2B.IQ3_XS.gguf | IQ3_XS | 3 | 1.24 GB |
| PrathameshLLM-2B.IQ4_NL.gguf | IQ4_NL | 4 | 1.56 GB |
| PrathameshLLM-2B.IQ4_XS.gguf | IQ4_XS | 4 | 1.5 GB |
| PrathameshLLM-2B.Q2_K.gguf | Q2_K | 2 | 1.16 GB |
| PrathameshLLM-2B.Q3_K_L.gguf | Q3_K_L | 3 | 1.47 GB |
| PrathameshLLM-2B.Q3_K_M.gguf | Q3_K_M | 3 | 1.38 GB |
| PrathameshLLM-2B.Q3_K_S.gguf | Q3_K_S | 3 | 1.29 GB |
| PrathameshLLM-2B.Q4_0.gguf | Q4_0 | 4 | 1.55 GB |
| PrathameshLLM-2B.Q4_K_M.gguf | Q4_K_M | 4 | 1.63 GB |
| PrathameshLLM-2B.Q4_K_S.gguf | Q4_K_S | 4 | 1.56 GB |
| PrathameshLLM-2B.Q5_0.gguf | Q5_0 | 5 | 1.8 GB |
| PrathameshLLM-2B.Q5_K_M.gguf | Q5_K_M | 5 | 1.84 GB |
| PrathameshLLM-2B.Q5_K_S.gguf | Q5_K_S | 5 | 1.8 GB |
| PrathameshLLM-2B.Q6_K.gguf | Q6_K | 6 | 2.06 GB |
| PrathameshLLM-2B.Q8_0.gguf | Q8_0 | 8 | 2.67 GB |
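
If you want to check which quant files are currently available without opening the repo page, you can list them programmatically. A minimal sketch using huggingface_hub (install it with pip install huggingface_hub if needed):

from huggingface_hub import list_repo_files

# List every GGUF quant file in the repo
files = list_repo_files("pmking27/PrathameshLLM-2B-GGUF")
print([f for f in files if f.endswith(".gguf")])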

First, install the package

Run one of the following commands, according to your system:

# Base llama-cpp-python with no GPU acceleration
pip install llama-cpp-python
# With NVIDIA CUDA acceleration
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# Or with OpenBLAS acceleration
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
# Or with CLBlast acceleration
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
# Or with AMD ROCm GPU acceleration (Linux only)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
# Or with Metal GPU acceleration (macOS only)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# On Windows, set CMAKE_ARGS as an environment variable in PowerShell; e.g. for NVIDIA CUDA:
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python
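
After installation, a quick import check confirms the build works. A minimal sanity check, assuming a recent llama-cpp-python release (which exposes a __version__ attribute):

import llama_cpp

# Verify the package imports and report the installed version
print(llama_cpp.__version__)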

Model Download Script

import os
from huggingface_hub import hf_hub_download

# Specify model details
model_repo_id = "pmking27/PrathameshLLM-2B-GGUF"  # Replace with the desired model repo
filename = "PrathameshLLM-2B.Q4_K_M.gguf"  # Replace with the specific GGUF filename
local_folder = "."  # Replace with your desired local storage path

# Create the local directory if it doesn't exist
os.makedirs(local_folder, exist_ok=True)

# Download the model file directly into the specified local folder
filepath = hf_hub_download(repo_id=model_repo_id, filename=filename, local_dir=local_folder)

print(f"GGUF model downloaded and saved to: {filepath}")

Replace model_repo_id and filename with the desired model repository ID and GGUF filename, and set local_folder to the directory where the downloaded file should be saved.
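
If you need several quant files at once, snapshot_download from huggingface_hub with a filename pattern is a convenient alternative. A minimal sketch, assuming you want every Q4 quant (adjust allow_patterns to the quants you need):

from huggingface_hub import snapshot_download

# Download all Q4 quant files from the repo into the current directory
snapshot_download(
    repo_id = "pmking27/PrathameshLLM-2B-GGUF",
    allow_patterns = ["*Q4*.gguf"],
    local_dir = ".",
)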

Simple llama-cpp-python inference example code

from llama_cpp import Llama

llm = Llama(
  model_path = filepath,    # Path to the GGUF file downloaded above
  n_ctx = 8192,             # The max sequence length to use; Gemma supports up to 8192 tokens of context
  n_threads = 8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers = 35         # The number of layers to offload to GPU, if you have GPU acceleration available
)
# Defining the Alpaca prompt template
alpaca_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

output = llm(
  alpaca_prompt.format(
        '''
        You're an assistant trained to answer questions using the given context.

        context:

        General elections will be held in India from 19 April 2024 to 1 June 2024 to elect the 543 members of the 18th Lok Sabha. The elections will be held in seven phases and the results will be announced on 4 June 2024. This will be the largest-ever election in the world, surpassing the 2019 Indian general election, and will be the longest-held general elections in India with a total span of 44 days (excluding the first 1951–52 Indian general election). The incumbent prime minister Narendra Modi who completed a second term will be contesting elections for a third consecutive term.

        Approximately 960 million individuals out of a population of 1.4 billion are eligible to participate in the elections, which are expected to span a month for completion. The Legislative assembly elections in the states of Andhra Pradesh, Arunachal Pradesh, Odisha, and Sikkim will be held simultaneously with the general election, along with the by-elections for 35 seats among 16 states.
        ''', # instruction
        "In how many phases will the general elections in India be held?", # input
        "", # output - leave this blank for generation!
    ), # Formatted Alpaca prompt
  max_tokens = 512,  # Generate up to 512 tokens
  stop = ["<eos>"],  # Stop at Gemma's end-of-sequence token
  echo = True        # Whether to echo the prompt
)

output_text = output['choices'][0]['text']
start_marker = "### Response:"
end_marker = "<eos>"
start_pos = output_text.find(start_marker) + len(start_marker)
end_pos = output_text.find(end_marker, start_pos)

# Extracting the response text
response_text = output_text[start_pos:end_pos].strip()

print(response_text)
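
llama-cpp-python can also stream tokens as they are generated, which is handy for interactive use. A minimal sketch, reusing the llm and alpaca_prompt objects above (the instruction and input strings here are placeholders):

for chunk in llm(
    alpaca_prompt.format(
        "You're a helpful assistant.",               # instruction (placeholder)
        "Name three South American camelids.",       # input (placeholder)
        "",                                          # output - leave blank for generation
    ),
    max_tokens = 256,
    stop = ["<eos>"],
    stream = True,  # Yield partial completions as they are generated
):
    print(chunk['choices'][0]['text'], end="", flush=True)
print()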

Simple llama-cpp-python Chat Completion API Example Code

from llama_cpp import Llama
llm = Llama(model_path = filepath, chat_format="gemma")  # Set chat_format according to the model you are using
message = llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "Write a story about llamas."
        }
    ]
)
print(message["choices"][0]["message"]["content"])
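
create_chat_completion also accepts stream=True, in which case it yields OpenAI-style delta chunks instead of one complete response. A minimal sketch, reusing the llm instance above:

for chunk in llm.create_chat_completion(
    messages = [{"role": "user", "content": "Write a haiku about llamas."}],
    stream = True,  # Yield incremental delta chunks
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:  # The first chunk carries only the role
        print(delta["content"], end="", flush=True)
print()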