---
license: apache-2.0
language:
- id
- en
library_name: transformers
tags:
- text-embedding
- retrieval
- matryoshka
- sea-lion
pipeline_tag: feature-extraction
base_model:
- aisingapore/Llama-SEA-LION-v3.5-8B-R
---

# Matryoshka Embedding Model (Merged) for SEA-LION 8B

This repository contains the **full, standalone** fine-tuned weights for a Matryoshka-style text embedding model based on `aisingapore/Llama-SEA-LION-v3.5-8B-R`. The LoRA adapters have been merged into the base model for simpler deployment.

**Note:** This is the full model, resulting in a large repository size (16GB+). For a much more lightweight version (~50MB) that uses LoRA adapters, please see the [adapter-only repository here](https://huggingface.co/evoreign/sea-lion-8b-mrl-embedding).

### Model Features

- **Base Model:** `aisingapore/Llama-SEA-LION-v3.5-8B-R`
- **Latent Attention Pooling:** A pooling mechanism that uses cross-attention to summarize a token sequence into a single vector (an illustrative sketch follows the loading code below).
- **Matryoshka Representation Learning (MRL):** Trained to produce nested embeddings. You can use the full 4096-dimension embedding for maximum quality, or slice it to a smaller prefix (e.g., 1024, 512, or 128 dimensions) to trade a little accuracy for faster search and lower storage.

## Intended Use

This model is ideal for generating fixed-size embeddings for tasks such as:

- Semantic Search & Information Retrieval
- Retrieval-Augmented Generation (RAG)
- Clustering and Text Similarity

## How to Use

Because the LoRA adapters are already merged, the model loads directly with `from_pretrained`; no PEFT step is needed. You still need the custom code from `modeling.py` and the weights for the pooling and projection heads.

```python
import importlib.util

import torch
import torch.nn.functional as F
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- 1. Setup and Load Components ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
repo_id = "evoreign/sea-lion-8b-mrl-embedding-merged"

# --- 2. Dynamically Load Custom Classes ---
print("Downloading custom modeling code...")
modeling_path = hf_hub_download(repo_id=repo_id, filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", modeling_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

LatentAttentionPooling = modeling.LatentAttentionPooling
MatryoshkaProjection = modeling.MatryoshkaProjection
print("Custom classes loaded successfully.")

# --- 3. Load Merged Model ---
print("Loading the full merged model (this may take time and memory)...")
# No PeftModel needed; load directly from the repo ID.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # Use float16 for memory efficiency
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
if tokenizer.pad_token is None:
    # Safeguard for Llama-style tokenizers that ship without a pad token.
    tokenizer.pad_token = tokenizer.eos_token

# --- 4. Load Custom Pooling and Projection Heads ---
HIDDEN_SIZE = model.config.hidden_size
MAX_DIM = 4096

print("Loading custom pooling and projection heads...")
# Note: with device_map="auto" on multiple GPUs, make sure the final hidden
# states end up on the same device as these heads.
pooler = LatentAttentionPooling(hidden_size=HIDDEN_SIZE).to(device).to(torch.float16)
projection = MatryoshkaProjection(hidden_size=HIDDEN_SIZE, max_embed_dim=MAX_DIM).to(device).to(torch.float16)

pooler_path = hf_hub_download(repo_id=repo_id, filename="pooler.pt")
projection_path = hf_hub_download(repo_id=repo_id, filename="projection.pt")
pooler.load_state_dict(torch.load(pooler_path, map_location=device))
projection.load_state_dict(torch.load(projection_path, map_location=device))

model.eval()
pooler.eval()
projection.eval()
```
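For intuition about what the two custom heads do, here is a minimal, illustrative sketch of a latent-attention pooler and a Matryoshka projection. This is **not** the code shipped in this repo's `modeling.py` (the number of latents, heads, and layer layout below are assumptions); it only shows the general pattern: learned latent queries cross-attend over the token hidden states, the attended latents are reduced to one vector, and a linear layer maps that vector to the maximum embedding width.

```python
import torch
import torch.nn as nn

class SketchLatentAttentionPooling(nn.Module):
    """Illustrative sketch only; see modeling.py for the real class."""

    def __init__(self, hidden_size, num_latents=64, num_heads=8):
        super().__init__()
        # Learned latent queries that attend over the token states.
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_size) * 0.02)
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden_states, attention_mask=None):
        batch = hidden_states.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # True marks padding positions, which are excluded from attention.
        key_padding_mask = (attention_mask == 0) if attention_mask is not None else None
        attended, _ = self.cross_attn(
            queries, hidden_states, hidden_states, key_padding_mask=key_padding_mask
        )
        return attended.mean(dim=1)  # (batch, hidden_size)

class SketchMatryoshkaProjection(nn.Module):
    """Illustrative sketch only; maps the pooled vector to the full width."""

    def __init__(self, hidden_size, max_embed_dim=4096):
        super().__init__()
        self.proj = nn.Linear(hidden_size, max_embed_dim)

    def forward(self, pooled):
        # Smaller Matryoshka embeddings are prefixes of this output.
        return self.proj(pooled)
```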
With the model and both heads loaded, define the embedding function. Note that it slices the projection output to the requested Matryoshka dimension **before** L2-normalizing, so every prefix width yields a unit-length vector:

```python
# --- 5. Create the Inference Function ---
def embed_texts_mrl(texts, out_dim=None):
    with torch.no_grad():
        inputs = tokenizer(
            texts,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=4096,
        ).to(device)

        # Call model() directly; there is no PeftModel wrapper here.
        out = model(**inputs, output_hidden_states=True)
        hidden = out.hidden_states[-1]
        mask = inputs.attention_mask

        pooled = pooler(hidden, attention_mask=mask)
        z_max = projection(pooled)
        # Slice first, then normalize.
        z = z_max[:, :out_dim] if out_dim else z_max
        return F.normalize(z, p=2, dim=1)

# --- 6. Example Usage ---
my_texts = ["Contoh kalimat untuk di-embed.", "Another sentence to embed."]
emb_256 = embed_texts_mrl(my_texts, out_dim=256)
print("Sliced embedding shape:", emb_256.shape)
# Expected output: torch.Size([2, 256])
```

A fuller demonstration of slicing at several widths appears at the end of this card.

### Training Details

- **Loss Function:** In-batch contrastive loss with hard negatives.
- **MRL Objective:** The loss was averaged across embedding dimensions [128, 256, 512, 1024, 2048, 4096] (see the schematic sketch at the end of this card).
- **Dataset:** Fine-tuned on a private triplet dataset (`query`, `positive`, `hard_negative`).

---

**Author:** Edbert Khovey
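### Matryoshka Slicing in Practice

The snippet below is a hypothetical usage example (the query and documents are made up for illustration) showing how to compare retrieval at several prefix widths. Because `embed_texts_mrl` normalizes at full width when `out_dim` is omitted, renormalize after slicing a cached full-width vector; since normalization is just scalar scaling, `normalize(normalize(v)[:d])` equals `normalize(v[:d])`, so this matches slicing before normalization.

```python
# Hypothetical example data; any texts work.
query = embed_texts_mrl(["cara membuat rendang"])           # "how to make rendang"
docs = embed_texts_mrl([
    "Resep rendang daging sapi khas Padang.",               # a rendang recipe (relevant)
    "Jadwal pertandingan sepak bola akhir pekan ini.",      # football schedule (irrelevant)
])

for dim in (128, 512, 4096):
    # Vectors were normalized at full width, so renormalize after slicing.
    q = F.normalize(query[:, :dim], p=2, dim=1)
    d = F.normalize(docs[:, :dim], p=2, dim=1)
    scores = (q @ d.T).squeeze(0)  # cosine similarities
    print(f"dim={dim}: similarities = {scores.tolist()}")
```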
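### Schematic Sketch of the MRL Objective

The following is a schematic reconstruction of the training objective described above (an in-batch contrastive loss with hard negatives, averaged over the Matryoshka dimensions), not the actual training code; the temperature, any per-dimension weighting, and the exact negative handling are assumptions.

```python
import torch
import torch.nn.functional as F

MRL_DIMS = [128, 256, 512, 1024, 2048, 4096]

def mrl_contrastive_loss(q, pos, neg, temperature=0.05):
    """q, pos, neg: (batch, 4096) raw query/positive/hard-negative embeddings."""
    losses = []
    for d in MRL_DIMS:
        qd = F.normalize(q[:, :d], dim=1)
        # Candidate pool: all in-batch positives plus all hard negatives.
        cand = F.normalize(torch.cat([pos[:, :d], neg[:, :d]], dim=0), dim=1)
        logits = qd @ cand.T / temperature                 # (batch, 2 * batch)
        labels = torch.arange(q.size(0), device=q.device)  # i-th positive matches
        losses.append(F.cross_entropy(logits, labels))
    return torch.stack(losses).mean()  # average over all MRL widths
```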