Run LLMs, ASR, and TTS natively in apps and games.
Rust core · iOS · Android · Flutter · Unity
Private, offline, no cloud required.
Documentation · SDKs · Models · Join Discord · Follow on X · Issues
| Goal | Path |
|---|---|
| Fastest demo (2 min) | Download CLI → |
| Build a mobile or desktop app | Flutter SDK → |
| Add AI NPCs to your game | Unity SDK → and try the 3D tavern demo |
| Android native | Kotlin SDK → |
| Rust / embedded | Core crate → |
Xybrid is a Rust-powered runtime with native bindings for every major platform.
| SDK | Platforms | Install | Status | Sample |
|---|---|---|---|---|
| Flutter | iOS, Android, macOS, Linux, Windows | pub.dev | Available | README |
| Unity | macOS, Windows, Linux, iOS, Android | See below | Available | Unity 3D AI tavern |
| Swift | iOS, macOS | Swift Package Manager | Coming Soon | README |
| Kotlin | Android | Maven Central | Available | README |
| CLI | macOS, Linux, Windows | Download binary | Available | — |
| Rust | All | xybrid-core / xybrid-sdk | Available | — |
Every SDK wraps the same Rust core — identical model support and behavior across all platforms.
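The cross-SDK envelope is easiest to picture as a small tagged union: one type carries every modality, so each binding only marshals that one shape. A minimal, hypothetical sketch — not the actual `xybrid-core` definition — assuming only the `Envelope::text` / `Envelope::audio` constructors that appear in the examples below:

```rust
// Hypothetical sketch of a cross-SDK envelope type — NOT the actual
// xybrid-core definition. A single tagged union carries every modality,
// so each language binding only has to marshal this one type.
pub enum Envelope {
    Text(String),
    Audio { bytes: Vec<u8>, sample_rate: u32 },
}

impl Envelope {
    pub fn text(s: &str) -> Self {
        Envelope::Text(s.to_string())
    }

    pub fn audio(bytes: Vec<u8>, sample_rate: u32) -> Self {
        Envelope::Audio { bytes, sample_rate }
    }
}
```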
Unity — Package Manager → Add from git URL:

```
https://github.com/xybrid-ai/xybrid.git#upm
```

The `upm` branch contains pre-built native libraries for all platforms. To pin a specific version:

```
https://github.com/xybrid-ai/xybrid.git#upm/v0.1.0-beta8
```
Flutter — add to your pubspec.yaml:

```yaml
dependencies:
  xybrid_flutter: ^0.1.0
```

Kotlin (Android) — add to your build.gradle.kts:

```kotlin
dependencies {
    implementation("ai.xybrid:xybrid-kotlin:0.1.0-beta8")
}
```

See each SDK's README for platform-specific setup: Flutter · Unity · Swift · Kotlin · Rust
Run a model in one line from the CLI, or three lines from any SDK:
CLI:

```sh
xybrid run kokoro-82m --input "Hello world" -o output.wav
```

Flutter:

```dart
final model = await Xybrid.model('kokoro-82m').load();
final result = await model.run(XybridEnvelope.text('Hello world'));
// result → 24kHz WAV audio
```

Kotlin:

```kotlin
val model = XybridModelLoader.fromRegistry("kokoro-82m").load()
val result = model.run(Envelope.text("Hello world"))
// result → 24kHz WAV audio
```

Swift:

```swift
let model = try ModelLoader.fromRegistry(modelId: "kokoro-82m").load()
let result = try model.run(envelope: Envelope.text("Hello world"))
// result → 24kHz WAV audio
```

Unity (C#):

```csharp
var model = XybridClient.LoadModel("kokoro-82m");
var result = model.Run(Envelope.Text("Hello world"));
// result → 24kHz WAV audio
```

Rust:

```rust
let model = Xybrid::model("kokoro-82m").load()?;
let result = model.run(&Envelope::text("Hello world"))?;
// result → 24kHz WAV audio
```

Chain models together — build a voice assistant in 3 lines of YAML:
```yaml
# voice-assistant.yaml
name: voice-assistant
stages:
  - model: whisper-tiny   # Speech → text
  - model: qwen2.5-0.5b   # Process with LLM
  - model: kokoro-82m     # Text → speech
```

CLI:

```sh
xybrid run voice-assistant.yaml --input question.wav -o response.wav
```

Flutter:

```dart
final pipeline = Xybrid.pipeline(yaml: yamlString);
final result = await pipeline.run(XybridEnvelope.audio(bytes: audioBytes, sampleRate: 16000));
```

Kotlin:

```kotlin
// Pipeline support coming soon — use single model loading for now
```

Swift:

```swift
// Pipeline support coming soon — use single model loading for now
```

Unity (C#):

```csharp
// Pipeline support coming soon — use single model loading for now
```

Rust:

```rust
let pipeline = Xybrid::pipeline(&yaml_string).load()?;
pipeline.load_models()?;
let result = pipeline.run(&Envelope::audio(audio_bytes))?;
```

All models run entirely on-device. No cloud, no API keys required. Browse the full registry with `xybrid models list`.
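Conceptually, a pipeline is just a fold: each stage consumes the previous stage's output. A toy sketch of that chaining — stages here are plain string functions for illustration, whereas the real runtime loads models from the YAML spec and passes envelopes:

```rust
// Toy illustration of stage chaining, not the Xybrid scheduler:
// each stage's output becomes the next stage's input.
fn run_pipeline(stages: &[fn(String) -> String], input: String) -> String {
    stages.iter().fold(input, |acc, stage| stage(acc))
}
```

This is why the YAML needs no explicit wiring between stages: order alone determines data flow.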
| Model | Type | Params | Why start here |
|---|---|---|---|
| SmolLM2 360M | LLM | 360M | Best quality-to-size ratio for any device |
| Kokoro 82M | TTS | 82M | High-quality speech, 24 voices, fast |
| Whisper Tiny | ASR | 39M | Accurate multilingual transcription |
| Model | Params | Format | Description |
|---|---|---|---|
| Whisper Tiny | 39M | SafeTensors | Multilingual transcription (Candle runtime) |
| Wav2Vec2 Base | 95M | ONNX | English ASR with CTC decoding |
| Model | Params | Format | Description |
|---|---|---|---|
| Kokoro 82M | 82M | ONNX | High-quality, 24 natural voices |
| KittenTTS Nano | 15M | ONNX | Ultra-lightweight, 8 voices |
| Model | Params | Format | Description |
|---|---|---|---|
| Gemma 3 1B | 1B | GGUF Q4_K_M | Google's mobile-optimized LLM |
| Llama 3.2 1B | 1B | GGUF Q4_K_M | Meta's general purpose, 128K context |
| Qwen 2.5 0.5B | 500M | GGUF Q4_K_M | Compact on-device chat |
| Qwen 3.5 0.8B | 800M | GGUF Q4_K_M | Latest Qwen with reasoning (thinking mode) |
| Qwen 3.5 2B | 2B | GGUF Q4_K_M | Larger Qwen 3.5 with extended reasoning |
| SmolLM2 360M | 360M | GGUF Q4_K_M | Best tiny LLM, excellent quality/size ratio |
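To judge which of these fits a device, a back-of-envelope weight-memory estimate helps: parameter count × bits per weight ÷ 8. Q4_K_M averages roughly 4.5–5 bits per weight (mixed 4/6-bit blocks plus scales); the exact figure varies per model, so treat this as a rough sizing sketch:

```rust
// Rough weight-memory estimate: params * bits_per_weight / 8 bytes.
// Ignores KV cache, activations, and runtime overhead.
fn approx_weight_bytes(params: u64, bits_per_weight: f64) -> u64 {
    (params as f64 * bits_per_weight / 8.0) as u64
}
```

For example, a 1B model at ~4.85 bits/weight lands near 600 MB of weights, which is why 4-bit quants are the default for phones.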
| Model | Type | Params | Priority | Status |
|---|---|---|---|---|
| Phi-4 Mini | LLM | 3.8B | P2 | Spec Ready (first multi-quant: Q4, Q8, FP16) |
| Qwen3 0.6B | LLM | 600M | P2 | Planned |
| Trinity Nano | LLM (MoE) | 6B (1B active) | P2 | Planned |
| LFM2 700M | LLM | 700M | P2 | Planned |
| Nomic Embed Text v1.5 | Embeddings | 137M | P1 | Blocked (needs Tokenize/MeanPool steps) |
| LFM2-VL 450M | Vision | 450M | P2 | Planned |
| Whisper Tiny CoreML | ASR | 39M | P2 | Planned |
| Qwen3-TTS 0.6B | TTS | 600M | P2 | Blocked (needs custom SafeTensors runtime) |
| Chatterbox Turbo | TTS | 350M | P3 | Blocked (needs ModelGraph template) |
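For context on the Nomic Embed blocker above: a MeanPool step averages per-token embeddings into a single sentence vector. A minimal sketch of that operation — the function name and signature are illustrative, not the planned Xybrid step API:

```rust
// Average token embeddings (each a vector of the same dimension) into
// one pooled sentence embedding. Illustrative only — not the planned
// Xybrid pipeline-step API. Assumes a non-empty input.
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let n = token_embeddings.len() as f32;
    let mut pooled = vec![0.0f32; dim];
    for tok in token_embeddings {
        for (p, v) in pooled.iter_mut().zip(tok) {
            *p += v;
        }
    }
    for p in pooled.iter_mut() {
        *p /= n;
    }
    pooled
}
```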
| Capability | iOS | Android | macOS | Linux | Windows |
|---|---|---|---|---|---|
| Speech-to-Text | ✅ | ✅ | ✅ | ✅ | ✅ |
| Text-to-Speech | ✅ | ✅ | ✅ | ✅ | ✅ |
| Language Models | ✅ | ✅ | ✅ | ✅ | ✅ |
| Vision Models | 🔜 | 🔜 | 🔜 | 🔜 | 🔜 |
| Embeddings | 🔜 | 🔜 | 🔜 | 🔜 | 🔜 |
| Pipeline Orchestration | ✅ | ✅ | ✅ | ✅ | ✅ |
| Model Download & Caching | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hardware Acceleration | Metal, ANE | CPU | Metal, ANE | CUDA | CUDA |
SDK pipeline support: Flutter ✅ · Rust ✅ · Kotlin 🔜 · Swift 🔜 · Unity 🔜
- Privacy first — All inference runs on-device. Your data never leaves the device.
- Offline capable — No internet required after initial model download.
- Cross-platform — One API across iOS, Android, macOS, Linux, and Windows.
- Pipeline orchestration — Chain models together (ASR → LLM → TTS) in a single call.
- Automatic optimization — Hardware acceleration on Apple Neural Engine, Metal, and CUDA.
| | Xybrid | Ollama | llama.cpp | ONNX Runtime |
|---|---|---|---|---|
| Mobile (iOS/Android) | ✅ | ❌ | ❌ | ✅ |
| Game engine (Unity) | ✅ | ❌ | ❌ | ❌ |
| Multi-stage pipelines | ✅ | ❌ | ❌ | ❌ |
| ASR + TTS + LLM in one SDK | ✅ | ❌ | ❌ | ❌ |
| Runs in-process (no server) | ✅ | ❌ | ✅ | ✅ |
| No cloud required | ✅ | ✅ | ✅ | ✅ |
We welcome contributions! See CONTRIBUTING.md for guidelines on setting up your development environment, submitting pull requests, and adding new models.
Apache License 2.0 — see LICENSE for details.



