Description

This is "Lemnos" , a new Instruction Tuned model based on the Llama 2 model architecture.

It was trained on a general Wikipedia corpus and then fine-tuned on a custom instruction dataset.

It is an experimental release, intended for use prior to the launch of a newer version that will also support Greek.

Usage:

Prerequisite packages:

  • transformers
  • accelerate
  • bitsandbytes 0.43.1

Minimum environment: a T4 GPU (the free-of-charge Google Colab T4 should run it fine), or just run this Colab notebook end to end (make sure you select the T4 GPU runtime): https://colab.research.google.com/drive/1lp-JygPxsaQp-NdB7Mh_uVVYeIIXcAlt?usp=sharing
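
Before loading the model, you can optionally confirm that a CUDA GPU is actually available in your runtime; this is just an illustrative sanity check, not part of the original example:

# Optional: verify a CUDA GPU is visible (e.g. the Colab T4)
import torch
print(torch.cuda.is_available())        # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))    # e.g. "Tesla T4"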

Note: since this is a 7B-parameter model stored in FP32, it takes some time to load all the safetensors. An alternative 4-bit quantized version will be uploaded soon; in the meantime, the example below quantizes to 4-bit on the fly.

# Install the prerequisites (-U upgrades them in case bitsandbytes is already installed)
pip install transformers accelerate bitsandbytes -U
# or from Colab
!pip install transformers accelerate bitsandbytes -U
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Specify the model hub
hub_model = 'gsar78/Lemnos_it_en_v2'

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hub_model, trust_remote_code=True)

# Configure the BitsAndBytesConfig for 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False
)

# Load the model with the specified configuration
model = AutoModelForCausalLM.from_pretrained(
    hub_model,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto"
)
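
# Optional alternative (a sketch, not part of the original example): if you would
# rather skip 4-bit quantization and have enough GPU memory (~14 GB for FP16
# weights, ~28 GB for the stored FP32), the model can be loaded in half precision
# instead. Uncomment to use; the rest of this example keeps using the quantized model.
# model_fp16 = AutoModelForCausalLM.from_pretrained(
#     hub_model,
#     torch_dtype=torch.float16,
#     trust_remote_code=True,
#     device_map="auto"
# )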

# Function to generate text based on a prompt using the Alpaca format
def generate_text(prompt, max_length=512):
    # Format the prompt according to the Alpaca format
    alpaca_prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    
    # Tokenize the input prompt
    inputs = tokenizer(alpaca_prompt, return_tensors="pt").to(model.device)
    
    # Generate text using the model
    # Generate text using the model (passing the attention mask avoids a warning)
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_length=max_length,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Decode the generated tokens to text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Remove the prompt part from the output to get only the response
    response = generated_text[len(alpaca_prompt):]
    
    return response


# Example question
prompt = "What are the three basic colors?"
generated_text = generate_text(prompt)
print(generated_text)

# Output:
# Red, blue, and yellow.
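
The example above uses greedy decoding. If you want more varied responses, sampling parameters can be passed to generate(); the values below are only illustrative and not tuned for this model:

# Optional: sampled generation for more varied output (illustrative settings)
prompt = "What are the three basic colors?"
alpaca_prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
inputs = tokenizer(alpaca_prompt, return_tensors="pt").to(model.device)
sampled = model.generate(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(sampled[0], skip_special_tokens=True))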