Framework for on-device AI inference.

A cross-platform framework for deploying LLMs, VLMs, Embedding Models, TTS models and more locally on smartphones.

Features

  • Available in Flutter and React-Native for cross-platform developers.
  • Supports any GGUF model on Hugging Face: Qwen, Gemma, Llama, DeepSeek, etc.
  • Accommodates precisions from FP32 down to 2-bit quantized models, for smaller memory footprints and less device strain.
  • MCP tool calls that let models act on the device (set reminders, search the gallery, reply to messages, etc.).
  • iOS XCFramework and JNILibs for native setups.
  • Neat and tiny C++ build for custom hardware.
  • Chat templates with Jinja2 support.
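
The FP32-to-2-bit range maps directly onto memory footprint: halving the bits per weight roughly halves the RAM the weights need. A back-of-envelope sketch (illustrative only; real GGUF quantization blocks carry extra scale metadata, so actual files run slightly larger):

```typescript
// Rough weight-memory estimate at a given quantization level.
// Illustrative only: real GGUF quant blocks add per-block scale metadata.
function approxWeightMemoryGB(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

console.log(approxWeightMemoryGB(1, 32)); // FP32, 1B params: 4 GB
console.log(approxWeightMemoryGB(1, 4));  // Q4: 0.5 GB
console.log(approxWeightMemoryGB(1, 2));  // 2-bit: 0.25 GB
```

This is why a 1B model at Q4 fits comfortably in a phone's memory where its FP32 counterpart would not.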

Flutter

  1. Update pubspec.yaml: Add cactus to your project's dependencies. Ensure you have flutter: sdk: flutter (usually present by default).
    dependencies:
      flutter:
        sdk: flutter
      cactus: ^0.1.3
    
  2. Install dependencies: Execute the following command in your project terminal:
    flutter pub get
    
  3. Flutter Text Completion
    import 'package:cactus/cactus.dart';
    
    // Initialize
    final lm = await CactusLM.init(
        modelUrl: 'huggingface/gguf/link',
        nCtx: 2048,
    );
    
    // Completion
    final messages = [CactusMessage(role: CactusMessageRole.user, content: 'Hello!')];
    final completionParams = CactusCompletionParams(nPredict: 100, temperature: 0.7);
    final response = await lm.completion(messages, completionParams);

    // Embedding
    final text = 'Your text to embed';
    final embeddingParams = CactusEmbeddingParams(normalize: true);
    final result = await lm.embedding(text, embeddingParams);
    
  4. Flutter VLM Completion
    import 'package:cactus/cactus.dart';
    
    // Initialize (Flutter handles downloads automatically)
    final vlm = await CactusVLM.init(
        modelUrl: 'huggingface/gguf/link',
        mmprojUrl: 'huggingface/gguf/mmproj/link',
    );
    
    // Multimodal Completion (can add multiple images)
    final messages = [CactusMessage(role: CactusMessageRole.user, content: 'Describe this image')];
    
    final params = CactusVLMParams(
        images: ['/absolute/path/to/image.jpg'],
        nPredict: 200,
        temperature: 0.3,
    );
    
    final response = await vlm.completion(messages, params);
    

N.B.: See the Flutter Docs. They cover chat design, embeddings, multimodal models, text-to-speech, and more.

React Native

  1. Install the cactus-react-native package: Using npm:

    npm install cactus-react-native
    

    Or using yarn:

    yarn add cactus-react-native
    
  2. Install iOS Pods (if not using Expo): For bare React Native iOS projects, link the native dependencies by running the following from your project root:

    npx pod-install
    
  3. React-Native Text Completion

    import { CactusLM } from 'cactus-react-native';

    // Initialize
    const lm = await CactusLM.init({
        model: '/path/to/model.gguf',
        n_ctx: 2048,
    });

    // Completion
    const messages = [{ role: 'user', content: 'Hello!' }];
    const completionParams = { n_predict: 100, temperature: 0.7 };
    const response = await lm.completion(messages, completionParams);

    // Embedding
    const text = 'Your text to embed';
    const embeddingParams = { normalize: true };
    const result = await lm.embedding(text, embeddingParams);
    
  4. React-Native VLM

    import { CactusVLM } from 'cactus-react-native';

    // Initialize
    const vlm = await CactusVLM.init({
        model: '/path/to/vision-model.gguf',
        mmproj: '/path/to/mmproj.gguf',
    });
    
    // Multimodal Completion (can add multiple images)
    const messages = [{ role: 'user', content: 'Describe this image' }];
    
    const params = {
        images: ['/absolute/path/to/image.jpg'],
        n_predict: 200,
        temperature: 0.3,
    };
    
    const response = await vlm.completion(messages, params);
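
The embedding call above returns a numeric vector; a common next step is comparing two texts by cosine similarity. A minimal sketch that works on plain number arrays (the exact shape of the library's embedding result is not specified here, so extract the vector first):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction: 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal: 0
```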
    

N.B.: See the React Docs. They cover chat design, embeddings, multimodal models, text-to-speech, and various options.

C++

The Cactus backend is written in C/C++ and runs directly on ARM/x86 hardware: phones, smart TVs, watches, speakers, cameras, laptops, Raspberry Pis, etc.

  1. Setup: You need CMake 3.14+ installed. On macOS, install it with brew install cmake; on Linux, use your distribution's package manager.

  2. Build from Source

    git clone https://github.com/cactus-compute/cactus.git
    cd cactus
    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j$(nproc)
    
  3. CMake Integration: Add the following to your CMakeLists.txt:

    # Add Cactus as subdirectory
    add_subdirectory(cactus)
    
    # Link to your target
    target_link_libraries(your_target cactus)
    target_include_directories(your_target PRIVATE cactus)
    
    # Requires C++17 or higher 
    
  4. Basic Text Completion

    #include "cactus/cactus.h"
    #include <iostream>
    
    int main() {
        cactus::cactus_context context;
        
        // Configure parameters
        common_params params;
        params.model.path = "model.gguf";
        params.n_ctx = 2048;
        params.n_threads = 4;
        params.n_gpu_layers = 99; // Use GPU acceleration
        
        // Load model
        if (!context.loadModel(params)) {
            std::cerr << "Failed to load model" << std::endl;
            return 1;
        }
        
        // Set prompt
        context.params.prompt = "Hello, how are you?";
        context.params.n_predict = 100;
        
        // Initialize sampling
        if (!context.initSampling()) {
            std::cerr << "Failed to initialize sampling" << std::endl;
            return 1;
        }
        
        // Generate response
        context.beginCompletion();
        context.loadPrompt();
        
        while (context.has_next_token && !context.is_interrupted) {
            auto token_output = context.doCompletion();
            if (token_output.tok == -1) break;
        }
        
        std::cout << "Response: " << context.generated_text << std::endl;
        return 0;
    }
    

To learn more, see the C++ Docs. They cover chat design, embeddings, multimodal models, text-to-speech, and more.

Performance

| Device                   | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
|--------------------------|-------------------------|------------------------|
| iPhone 16 Pro Max        | 54                      | 18                     |
| iPhone 16 Pro            | 54                      | 18                     |
| iPhone 16                | 49                      | 16                     |
| iPhone 15 Pro Max        | 45                      | 15                     |
| iPhone 15 Pro            | 45                      | 15                     |
| iPhone 14 Pro Max        | 44                      | 14                     |
| OnePlus 13 5G            | 43                      | 14                     |
| Samsung Galaxy S24 Ultra | 42                      | 14                     |
| iPhone 15                | 42                      | 14                     |
| OnePlus Open             | 38                      | 13                     |
| Samsung Galaxy S23 5G    | 37                      | 12                     |
| Samsung Galaxy S24       | 36                      | 12                     |
| iPhone 13 Pro            | 35                      | 11                     |
| OnePlus 12               | 35                      | 11                     |
| Galaxy S25 Ultra         | 29                      | 9                      |
| OnePlus 11               | 26                      | 8                      |
| iPhone 13 mini           | 25                      | 8                      |
| Redmi K70 Ultra          | 24                      | 8                      |
| Xiaomi 13                | 24                      | 8                      |
| Samsung Galaxy S24+      | 22                      | 7                      |
| Samsung Galaxy Z Fold 4  | 22                      | 7                      |
| Xiaomi Poco F6 5G        | 22                      | 6                      |
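
Decode speed converts to user-facing latency as output length divided by toks/sec (prompt processing adds further time on top). For example, a 100-token reply on an iPhone 16 Pro Max takes about 1.9 s with Gemma3 1B Q4 versus about 5.6 s with Qwen3 4B Q4:

```typescript
// Approximate generation time from the decode speeds in the table above.
// Ignores prompt-processing time, which adds to total latency.
function secondsToGenerate(tokens: number, toksPerSec: number): number {
  return tokens / toksPerSec;
}

console.log(secondsToGenerate(100, 54).toFixed(1)); // "1.9"
console.log(secondsToGenerate(100, 18).toFixed(1)); // "5.6"
```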

We are completely open-source and would appreciate feedback!

Repo: https://github.com/cactus-compute/cactus
