AI running entirely in your browser — zero API calls
No servers. No API costs. This runs an LLM directly on your GPU via WebGPU. Your data never leaves your computer.
How fast? The first load downloads the model (200-900 MB, depending on which model you choose); after that it is cached. Responses take 5-30 seconds, depending on the model and your GPU.
Requirements: A modern GPU (NVIDIA, AMD, or Apple Silicon) and a WebGPU-capable browser: Chrome 113+, Edge 113+, or Safari 18+.
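If you want to check whether a browser meets these requirements before downloading a model, a minimal sketch follows. It relies on the standard WebGPU entry point `navigator.gpu` and its `requestAdapter()` method, which resolves to `null` when no suitable GPU is available. The names `checkWebGpu`, `WebGpuStatus`, `GpuLike`, and `NavigatorLike` are hypothetical helpers introduced for illustration; they are not part of this app or of any library.

```typescript
// Minimal structural types so the sketch compiles without extra WebGPU typings.
// These are illustrative stand-ins, not the official WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// "no-api": the browser does not expose WebGPU at all.
// "no-adapter": WebGPU exists, but no usable GPU adapter was returned.
type WebGpuStatus = "ok" | "no-api" | "no-adapter";

async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) {
    return "no-api";
  }
  try {
    // requestAdapter() resolves to null when no suitable adapter is found.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "ok" : "no-adapter";
  } catch {
    // Treat an unexpected rejection the same as having no adapter.
    return "no-adapter";
  }
}

// In a browser you might call it like this (assumption: run at page load):
//   checkWebGpu(navigator as NavigatorLike).then((status) => console.log(status));
```

Passing the navigator object in as a parameter, rather than reading the global directly, keeps the check easy to exercise outside a browser.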
Tip: Start with Qwen2.5-0.5B for fast responses. Switch to 1.5B for smarter answers if your GPU handles it.