
Framework for on-device AI inference.
A cross-platform framework for deploying LLMs, VLMs, embedding models, TTS models, and more locally on smartphones.
- Available in Flutter and React-Native for cross-platform developers.
- Supports any GGUF model you can find on Hugging Face: Qwen, Gemma, Llama, DeepSeek, etc.
- Runs models from FP32 down to 2-bit quantization, for efficiency and lower device strain.
- MCP tool calls that let models act on the device (set reminders, search the gallery, reply to messages, etc.).
- iOS XCFramework and Android JNI libs for native setups.
- Neat and tiny C++ build for custom hardware.
- Chat templates with Jinja2 support.
- Update `pubspec.yaml`: add `cactus` to your project's dependencies. Ensure you have `flutter: sdk: flutter` (usually present by default).

  ```yaml
  dependencies:
    flutter:
      sdk: flutter
    cactus: ^0.1.3
  ```
- Install dependencies by executing the following command in your project terminal:

  ```shell
  flutter pub get
  ```
- Flutter Text Completion

  ```dart
  import 'package:cactus/cactus.dart';

  // Initialize
  final lm = await CactusLM.init(
    modelUrl: 'huggingface/gguf/link',
    nCtx: 2048,
  );

  // Completion
  final messages = [CactusMessage(role: CactusMessageRole.user, content: 'Hello!')];
  final params = CactusCompletionParams(nPredict: 100, temperature: 0.7);
  final response = await lm.completion(messages, params);

  // Embedding (note: a distinct variable name, since `params` is already declared above)
  final text = 'Your text to embed';
  final embedParams = CactusEmbeddingParams(normalize: true);
  final result = await lm.embedding(text, embedParams);
  ```
- Flutter VLM Completion

  ```dart
  import 'package:cactus/cactus.dart';

  // Initialize (Flutter handles downloads automatically)
  final vlm = await CactusVLM.init(
    modelUrl: 'huggingface/gguf/link',
    mmprojUrl: 'huggingface/gguf/mmproj/link',
  );

  // Multimodal completion (multiple images can be added)
  final messages = [CactusMessage(role: CactusMessageRole.user, content: 'Describe this image')];
  final params = CactusVLMParams(
    images: ['/absolute/path/to/image.jpg'],
    nPredict: 200,
    temperature: 0.3,
  );
  final response = await vlm.completion(messages, params);
  ```
N.B.: See the Flutter docs; they cover chat design, embeddings, multimodal models, text-to-speech, and more.
- Install the `cactus-react-native` package:

  ```shell
  npm install cactus-react-native
  # or
  yarn add cactus-react-native
  ```

- Install iOS pods (if not using Expo). For native iOS projects, ensure you link the native dependencies by navigating to your `ios` directory and running:

  ```shell
  npx pod-install
  ```
- React-Native Text Completion

  ```typescript
  // Initialize
  const lm = await CactusLM.init({
    model: '/path/to/model.gguf',
    n_ctx: 2048,
  });

  // Completion
  const messages = [{ role: 'user', content: 'Hello!' }];
  const params = { n_predict: 100, temperature: 0.7 };
  const response = await lm.completion(messages, params);

  // Embedding (note: a distinct variable name, since `params` is already declared above)
  const text = 'Your text to embed';
  const embedParams = { normalize: true };
  const result = await lm.embedding(text, embedParams);
  ```
- React-Native VLM

  ```typescript
  // Initialize
  const vlm = await CactusVLM.init({
    model: '/path/to/vision-model.gguf',
    mmproj: '/path/to/mmproj.gguf',
  });

  // Multimodal completion (multiple images can be added)
  const messages = [{ role: 'user', content: 'Describe this image' }];
  const params = {
    images: ['/absolute/path/to/image.jpg'],
    n_predict: 200,
    temperature: 0.3,
  };
  const response = await vlm.completion(messages, params);
  ```
N.B.: See the React Native docs; they cover chat design, embeddings, multimodal models, text-to-speech, and various options.
The Cactus backend is written in C/C++ and runs directly on any ARM/x86 hardware: Raspberry Pi boards, phones, smart TVs, watches, speakers, cameras, laptops, etc.
Setup: you need CMake 3.14+ installed. Install it with `brew install cmake` on macOS or via the standard package managers on Linux.

Build from source:

```shell
git clone https://github.com/your-org/cactus.git
cd cactus
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```
CMake integration: add to your `CMakeLists.txt`:

```cmake
# Add Cactus as a subdirectory
add_subdirectory(cactus)

# Link to your target (requires C++17 or higher)
target_link_libraries(your_target cactus)
target_include_directories(your_target PRIVATE cactus)
```
Basic Text Completion:

```cpp
#include "cactus/cactus.h"
#include <iostream>

int main() {
    cactus::cactus_context context;

    // Configure parameters
    common_params params;
    params.model.path = "model.gguf";
    params.n_ctx = 2048;
    params.n_threads = 4;
    params.n_gpu_layers = 99; // Use GPU acceleration

    // Load model
    if (!context.loadModel(params)) {
        std::cerr << "Failed to load model" << std::endl;
        return 1;
    }

    // Set prompt
    context.params.prompt = "Hello, how are you?";
    context.params.n_predict = 100;

    // Initialize sampling
    if (!context.initSampling()) {
        std::cerr << "Failed to initialize sampling" << std::endl;
        return 1;
    }

    // Generate response
    context.beginCompletion();
    context.loadPrompt();
    while (context.has_next_token && !context.is_interrupted) {
        auto token_output = context.doCompletion();
        if (token_output.tok == -1) break;
    }

    std::cout << "Response: " << context.generated_text << std::endl;
    return 0;
}
```
To learn more, see the C++ docs; they cover chat design, embeddings, multimodal models, text-to-speech, and more.
| Device | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
|---|---|---|
| iPhone 16 Pro Max | 54 | 18 |
| iPhone 16 Pro | 54 | 18 |
| iPhone 16 | 49 | 16 |
| iPhone 15 Pro Max | 45 | 15 |
| iPhone 15 Pro | 45 | 15 |
| iPhone 14 Pro Max | 44 | 14 |
| OnePlus 13 5G | 43 | 14 |
| Samsung Galaxy S24 Ultra | 42 | 14 |
| iPhone 15 | 42 | 14 |
| OnePlus Open | 38 | 13 |
| Samsung Galaxy S23 5G | 37 | 12 |
| Samsung Galaxy S24 | 36 | 12 |
| iPhone 13 Pro | 35 | 11 |
| OnePlus 12 | 35 | 11 |
| Galaxy S25 Ultra | 29 | 9 |
| OnePlus 11 | 26 | 8 |
| iPhone 13 mini | 25 | 8 |
| Redmi K70 Ultra | 24 | 8 |
| Xiaomi 13 | 24 | 8 |
| Samsung Galaxy S24+ | 22 | 7 |
| Samsung Galaxy Z Fold 4 | 22 | 7 |
| Xiaomi Poco F6 5G | 22 | 6 |
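The throughput figures above map directly onto perceived latency: dividing reply length by decode speed gives a rough wall-clock estimate. The sketch below applies that arithmetic to two rows of the table; it deliberately ignores prompt-processing (prefill) time, so real first-token latency is somewhat higher.

```python
def reply_latency_s(tokens: int, toks_per_sec: float) -> float:
    """Approximate seconds to decode `tokens` at a steady rate
    (ignores prompt prefill, so real end-to-end latency is higher)."""
    return tokens / toks_per_sec

# Figures taken from the benchmark table above (Gemma3 1B Q4):
fast = reply_latency_s(100, 54)  # iPhone 16 Pro Max: ~1.9 s for a 100-token reply
slow = reply_latency_s(100, 22)  # Xiaomi Poco F6 5G: ~4.5 s
print(fast, slow)
```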
We are completely open-source and would appreciate feedback!
Models:

- Cactus-Compute/OuteTTS-0.2-500m-GGUF
- Cactus-Compute/Gemma3-4B-Instruct-GGUF
- Cactus-Compute/Qwen2.5-Omni-3B-GGUF
- Cactus-Compute/Qwen3-4B-Instruct-GGUF
- Cactus-Compute/Gemma3-1B-Instruct-GGUF
- Cactus-Compute/Qwen3-1.7B-Instruct-GGUF
- Cactus-Compute/Qwen3-600m-Instruct-GGUF
- Cactus-Compute/Qwen3-embedding-600m-GGUF