VoRA Command R

This is a VoRA (Vision as LoRA) adaptation of Command R 7B that adds vision-language understanding to the base model.

Model Details

  • Base Model: CohereForAI/c4ai-command-r7b-12-2024
  • Vision Adapter: LoRA with rank 32 applied to attention layers
  • Image Resolution: 224x224
  • Vision Placeholder Token: «
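
To give a rough sense of what a rank-32 adapter costs, the sketch below counts the parameters that one LoRA pair adds to a single weight matrix. LoRA replaces the update to a `d_out x d_in` weight with two low-rank factors, so the added parameter count is `rank * (d_in + d_out)`. The 4096 x 4096 projection size is an assumption chosen only for illustration; it is not taken from this model's configuration, and `lora_extra_params` is a hypothetical helper.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int = 32) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical square attention projection; the real layer sizes may differ.
d = 4096
full = d * d                              # parameters in the frozen weight
extra = lora_extra_params(d, d, rank=32)  # parameters in the trainable adapter
ratio = extra / full

print(f"adapter params: {extra}, frozen params: {full}, ratio: {ratio:.4%}")
```

Under these assumed dimensions the adapter is a small fraction of the frozen weight, which is the usual motivation for training a LoRA instead of the full matrix.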

Quickstart

The model can be used as follows:

import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_name = "your-username/cmd-r-vora-4"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "{image path or url}"
            },
            {
                "type": "text", 
                "text": "« Describe this image."
            }
        ]
    }
]

# Tokenize the chat-formatted prompt and move the tensors to the model's device.
model_inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
gen_kwargs = {"max_new_tokens": 1024, "eos_token_id": processor.tokenizer.eos_token_id}

with torch.inference_mode():
    # model_inputs is a dict-like batch, so unpack it into keyword arguments.
    outputs = model.generate(**model_inputs, **gen_kwargs)
    output_text = processor.tokenizer.batch_decode(
        outputs, skip_special_tokens=True
    )
    print(output_text)
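
The conversation structure in the Quickstart can be wrapped in a small helper so the vision placeholder token is never forgotten. This is a minimal sketch: `build_conversation` is a hypothetical function, not part of the model's code, and it assumes the placeholder should lead the text prompt, as it does in the Quickstart example.

```python
PLACEHOLDER = "«"  # vision placeholder token listed under Model Details


def build_conversation(image: str, prompt: str) -> list:
    """Build a single-turn user message with one image and one text prompt.

    The placeholder is prepended only if the prompt does not already start
    with it (assumption: the token belongs at the start of the text).
    """
    if prompt.lstrip().startswith(PLACEHOLDER):
        text = prompt
    else:
        text = f"{PLACEHOLDER} {prompt}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image},
                {"type": "text", "text": text},
            ],
        }
    ]


# "photo.jpg" is a made-up path used only for illustration.
conversation = build_conversation("photo.jpg", "Describe this image.")
print(conversation[0]["content"][1]["text"])
```

The returned list has the same shape as the `conversation` variable above, so it could be passed to `processor.apply_chat_template` in its place.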