L-Mul Optimized: meta-llama/Meta-Llama-3-8B-Instruct

This is a modified version of Meta's Llama-3-8B-Instruct model. The modification consists of replacing the standard attention mechanism with one that uses a custom, approximate matrix multiplication algorithm termed "L-Mul".

This work was performed as part of a research project to evaluate the performance and accuracy trade-offs of algorithmic substitutions in transformer architectures.

This model is intended strictly for educational and scientific purposes.

Model Description

The core architecture of meta-llama/Meta-Llama-3-8B-Instruct is preserved. However, the standard LlamaAttention modules have been dynamically replaced with a custom version that utilizes the l_mul_attention function for its core computations. This function is defined in the lmul.py file included in this repository.

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Modification: Replacement of standard attention with L-Mul approximate attention.
  • Primary Use-Case: Research and educational analysis of algorithmic impact on LLMs.
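
For intuition, the sketch below shows where the substitution happens inside the attention computation. It is a conceptual illustration only: the approx_matmul and l_mul_style_attention names are placeholders invented for this sketch, and approx_matmul simply delegates to torch.matmul so the snippet runs; it does not implement the actual L-Mul approximation shipped in lmul.py.

import torch
import torch.nn.functional as F

def approx_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Placeholder for the approximate multiplication; the real L-Mul
    # routine lives in lmul.py. Exact matmul is used here only so the
    # sketch runs end to end.
    return torch.matmul(a, b)

def l_mul_style_attention(q, k, v, scale):
    # Scaled dot-product attention in which both matrix products go
    # through the (approximate) multiplication hook instead of the
    # exact kernel.
    scores = approx_matmul(q, k.transpose(-2, -1)) * scale
    weights = F.softmax(scores, dim=-1)
    return approx_matmul(weights, v)

# Toy tensors shaped (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = l_mul_style_attention(q, k, v, scale=64 ** -0.5)
print(out.shape)  # torch.Size([1, 8, 16, 64])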

How to Get Started

To use this model, you must use the trust_remote_code=True flag when loading it. This is required to execute the custom lmul.py file that defines the new attention mechanism.

You can load the model with the transformers library. Because the model files live in a subfolder of a larger shared repository, you first need to download that specific subfolder.

from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import snapshot_download
import torch

# Define the repository and the specific model subfolder
repo_id = "Peacemann/LMUL-Optimized-Models"
model_name = "meta-llama_Meta-Llama-3-8B-Instruct"

# Download the specific model snapshot
# Note: on Windows, passing local_dir together with local_dir_use_symlinks=False can help if symlinks cause issues
local_model_path = snapshot_download(
    repo_id=repo_id,
    allow_patterns=f"{model_name}/*",
)
# Construct the full path to the model files within the snapshot
local_model_path = f"{local_model_path}/{model_name}"


# Load the tokenizer and model, trusting the remote code to load lmul.py
tokenizer = AutoTokenizer.from_pretrained(local_model_path)
model = AutoModelForCausalLM.from_pretrained(
    local_model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
prompt = "The L-Mul algorithm is an experimental method for..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
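
Because the base model is instruction-tuned, raw text prompts like the one above work, but chat-formatted prompts generally behave better. Below is a short sketch using the tokenizer's chat template; the message content is illustrative.

messages = [{"role": "user", "content": "Explain the L-Mul algorithm in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=50)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))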

For high-throughput inference, you can use vLLM:

from vllm import LLM

# The local_model_path is the same as downloaded above
llm = LLM(model=local_model_path, trust_remote_code=True)
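
A minimal generation call with vLLM could then look like the following; the sampling settings are illustrative.

from vllm import SamplingParams

sampling_params = SamplingParams(temperature=0.7, max_tokens=50)
outputs = llm.generate(["The L-Mul algorithm is an experimental method for..."], sampling_params)
print(outputs[0].outputs[0].text)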

Intended Uses & Limitations

This model is intended for researchers and students exploring the internal workings of LLMs. It is a tool for visualizing and analyzing the effects of fundamental algorithmic changes.

This model is NOT intended for any commercial or production application.

The modification is experimental. The impact on the model's performance, safety alignment, accuracy, and potential for generating biased or harmful content is unknown and untested. It inherits all limitations and biases of the original Llama-3-8B-Instruct model, and its behavior may be altered in unpredictable ways.

Licensing Information

The use of this model is subject to the original Llama 3 Community License Agreement. By using this model, you agree to the terms outlined in that license, which is available in the base model repository (meta-llama/Meta-Llama-3-8B-Instruct).
