---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- gemma
- gguf
- llama.cpp
base_model: pmking27/PrathameshLLM-2B
---

# Uploaded model

- **Developed by:** pmking27
- **License:** apache-2.0
- **Finetuned from model:** pmking27/PrathameshLLM-2B

## Provided Quant Files

| Name | Quant method | Bits | Size |
| ---- | ---- | ---- | ---- |
| [PrathameshLLM-2B.IQ3_M.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.IQ3_M.gguf) | IQ3_M | 3 | 1.31 GB |
| [PrathameshLLM-2B.IQ3_S.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.IQ3_S.gguf) | IQ3_S | 3 | 1.29 GB |
| [PrathameshLLM-2B.IQ3_XS.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.IQ3_XS.gguf) | IQ3_XS | 3 | 1.24 GB |
| [PrathameshLLM-2B.IQ4_NL.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.IQ4_NL.gguf) | IQ4_NL | 4 | 1.56 GB |
| [PrathameshLLM-2B.IQ4_XS.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.IQ4_XS.gguf) | IQ4_XS | 4 | 1.5 GB |
| [PrathameshLLM-2B.Q2_K.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q2_K.gguf) | Q2_K | 2 | 1.16 GB |
| [PrathameshLLM-2B.Q3_K_L.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q3_K_L.gguf) | Q3_K_L | 3 | 1.47 GB |
| [PrathameshLLM-2B.Q3_K_M.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q3_K_M.gguf) | Q3_K_M | 3 | 1.38 GB |
| [PrathameshLLM-2B.Q3_K_S.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q3_K_S.gguf) | Q3_K_S | 3 | 1.29 GB |
| [PrathameshLLM-2B.Q4_0.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q4_0.gguf) | Q4_0 | 4 | 1.55 GB |
| [PrathameshLLM-2B.Q4_K_M.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q4_K_M.gguf) | Q4_K_M | 4 | 1.63 GB |
| [PrathameshLLM-2B.Q4_K_S.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q4_K_S.gguf) | Q4_K_S | 4 | 1.56 GB |
| [PrathameshLLM-2B.Q5_0.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q5_0.gguf) | Q5_0 | 5 | 1.8 GB |
| [PrathameshLLM-2B.Q5_K_M.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q5_K_M.gguf) | Q5_K_M | 5 | 1.84 GB |
| [PrathameshLLM-2B.Q5_K_S.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q5_K_S.gguf) | Q5_K_S | 5 | 1.8 GB |
| [PrathameshLLM-2B.Q6_K.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q6_K.gguf) | Q6_K | 6 | 2.06 GB |
| [PrathameshLLM-2B.Q8_0.gguf](https://huggingface.co/pmking27/PrathameshLLM-2B-GGUF/blob/main/PrathameshLLM-2B.Q8_0.gguf) | Q8_0 | 8 | 2.67 GB |

#### First install the package

Run one of the following commands, according to your system:

```shell
# Base llama-cpp-python with no GPU acceleration
pip install llama-cpp-python

# With NVIDIA CUDA acceleration
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

# Or with OpenBLAS acceleration
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

# Or with CLBlast acceleration
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python

# Or with AMD ROCm GPU acceleration (Linux only)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python

# Or with Metal GPU acceleration (macOS only)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# On Windows, set CMAKE_ARGS in PowerShell using this format, e.g. for NVIDIA CUDA:
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python
```

## Model Download Script

```python
import os
from huggingface_hub import hf_hub_download

# Specify model details
model_repo_id = "pmking27/PrathameshLLM-2B-GGUF"  # Replace with the desired model repo
filename = "PrathameshLLM-2B.Q4_K_M.gguf"  # Replace with the specific GGUF filename
local_folder = "."  # Replace with your desired local storage path

# Create the local directory if it doesn't exist
os.makedirs(local_folder, exist_ok=True)

# Download the model file, using the local folder as the cache directory
filepath = hf_hub_download(repo_id=model_repo_id, filename=filename, cache_dir=local_folder)

print(f"GGUF model downloaded and saved to: {filepath}")
```

Replace `model_repo_id` with the model repository ID and `filename` with the GGUF file you want from the table above. Set `local_folder` to the directory where the download should be stored. Because the folder is passed as `cache_dir`, the file is placed in the Hugging Face cache layout inside that folder, and `filepath` holds the full path to the downloaded file.

#### Simple llama-cpp-python inference example code

```python
from llama_cpp import Llama

llm = Llama(
    model_path = filepath,  # Download the model file first
    n_ctx = 8192,  # The max sequence length to use; Gemma 2B supports up to 8192 tokens, and longer sequences require much more resources
    n_threads = 8,  # The number of CPU threads to use; tailor to your system and the resulting performance
    n_gpu_layers = 35  # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Define the Alpaca prompt template
alpaca_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

output = llm(
    alpaca_prompt.format(
        '''
        You're an assistant trained to answer questions using the given context.

        context:

        General elections will be held in India from 19 April 2024 to 1 June 2024 to elect the 543 members of the 18th Lok Sabha. The elections will be held in seven phases and the results will be announced on 4 June 2024. This will be the largest-ever election in the world, surpassing the 2019 Indian general election, and will be the longest-held general elections in India with a total span of 44 days (excluding the first 1951–52 Indian general election). The incumbent prime minister Narendra Modi who completed a second term will be contesting elections for a third consecutive term.

        Approximately 960 million individuals out of a population of 1.4 billion are eligible to participate in the elections, which are expected to span a month for completion. The Legislative assembly elections in the states of Andhra Pradesh, Arunachal Pradesh, Odisha, and Sikkim will be held simultaneously with the general election, along with the by-elections for 35 seats among 16 states.
        ''',  # instruction
        "In how many phases will the general elections in India be held?",  # input
        "",  # output - leave this blank for generation!
    ),  # Alpaca prompt
    max_tokens = 512,  # Generate up to 512 tokens
    stop = ["<eos>"],  # Stop at Gemma's end-of-sequence token
    echo = True  # Include the prompt in the returned text
)

output_text = output['choices'][0]['text']

# Extract the response text: everything after the response marker,
# up to the end marker if it appears
start_marker = "### Response:"
end_marker = "<eos>"
start_pos = output_text.find(start_marker) + len(start_marker)
end_pos = output_text.find(end_marker, start_pos)
if end_pos == -1:  # The stop sequence is normally not part of the returned text
    end_pos = len(output_text)
response_text = output_text[start_pos:end_pos].strip()

print(response_text)
```

#### Simple llama-cpp-python Chat Completion API Example Code

```python
from llama_cpp import Llama

llm = Llama(model_path = filepath, chat_format="gemma")  # Set chat_format according to the model you are using

message = llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "Write a story about llamas."
        }
    ]
)

# Print the assistant's reply
print(message['choices'][0]["message"]["content"])
```
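#### Prompt building and response extraction (sketch)

The inference example above fills the Alpaca template and then slices the reply out of the echoed text. The same two steps can be isolated into small helpers and checked without loading a model. This is a minimal sketch, not part of llama-cpp-python: the names `build_alpaca_prompt` and `extract_response` are hypothetical, and `<eos>` is assumed to be the end-of-sequence marker for this Gemma-based model.

```python
# Alpaca prompt template with three slots: instruction, input, response
ALPACA_TEMPLATE = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""


def build_alpaca_prompt(instruction: str, user_input: str) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_TEMPLATE.format(instruction, user_input, "")


def extract_response(
    text: str,
    start_marker: str = "### Response:",
    end_marker: str = "<eos>",  # assumed end-of-sequence marker
) -> str:
    """Return the text after the first start_marker, cut at end_marker if present."""
    before, found, after = text.partition(start_marker)
    # If the prompt was not echoed, there is no marker: treat the whole text as the reply
    tail = after if found else before
    if end_marker:
        # partition leaves the text unchanged when end_marker is absent
        tail = tail.partition(end_marker)[0]
    return tail.strip()
```

With a loaded model, `extract_response(output['choices'][0]['text'])` would take the place of the manual `find` and slicing steps. Using `partition` avoids the off-by-one that `str.find` can cause when it returns `-1` for a missing end marker.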