How it works

No servers. No API costs. This runs an LLM directly on your GPU via WebGPU. Your data never leaves your computer.

How fast? The first load downloads the model (200-900 MB, depending on which model you choose) and caches it for later visits. After that, responses take 5-30 s, depending on the model and your GPU.
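To put the first-load figure in perspective, here is a rough back-of-the-envelope sketch of download time for the size range quoted above. The 50 Mbit/s connection speed and the helper name `downloadSeconds` are assumptions for illustration only; real times depend on your network and the host serving the model.

```typescript
// Estimated seconds to download `sizeMB` megabytes at `mbps` megabits per second.
// Ignores latency, protocol overhead, and throttling, so treat it as a lower bound.
function downloadSeconds(sizeMB: number, mbps: number): number {
  if (mbps <= 0) throw new RangeError("mbps must be positive");
  return (sizeMB * 8) / mbps; // 8 bits per byte
}

// Assumed connection speed (hypothetical, not a measured value).
const ASSUMED_MBPS = 50;

console.log(downloadSeconds(200, ASSUMED_MBPS)); // 32  (smallest model)
console.log(downloadSeconds(900, ASSUMED_MBPS)); // 144 (largest model)
```

Under that assumption the first load takes roughly half a minute to two and a half minutes; later visits skip the download because the model is cached.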

Requirements: A modern GPU (NVIDIA, AMD, or Apple Silicon) and a WebGPU-capable browser: Chrome 113+, Edge 113+, or Safari 18+.
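A page can check these requirements at startup before attempting a download. The sketch below is one possible way to do it, not necessarily how this app does: it looks for `navigator.gpu` and then asks for an adapter via `requestAdapter()`, both of which are part of the standard WebGPU API. The `NavigatorLike` and `GpuLike` types, the status strings, and the function name `checkWebGpu` are hypothetical, introduced so the example compiles and runs without browser typings.

```typescript
// Minimal structural types (assumptions) so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "supported" | "no-api" | "no-adapter";

// Reports whether WebGPU looks usable.
// `nav` is injected so the function can also be exercised outside a browser.
async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  // Browsers without WebGPU do not expose `navigator.gpu` at all.
  if (!nav.gpu) return "no-api";
  // The API can exist while no suitable GPU is available; then the adapter is null.
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "supported" : "no-adapter";
}
```

In a page you would call it as `checkWebGpu(navigator as NavigatorLike)` and show an explanatory message for anything other than `"supported"`.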

Tip: Start with Qwen2.5-0.5B for fast responses. Switch to the 1.5B model for smarter answers if your GPU can handle it.
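One way to decide whether your GPU "handles it" is to measure decoding speed on the small model first. The following is only an illustrative heuristic: the 10 tokens-per-second cutoff, the helper names, and the label `Qwen2.5-1.5B` for the larger model are assumptions, not values taken from this app.

```typescript
// Labels follow the tip above; the full name of the 1.5B model is an assumption.
const SMALL_MODEL = "Qwen2.5-0.5B";
const LARGE_MODEL = "Qwen2.5-1.5B";

// Hypothetical cutoff: below this decode speed, stay on the small model.
const MIN_TOKENS_PER_SECOND = 10;

// Decode speed from a generated-token count and the elapsed wall-clock time.
function tokensPerSecond(tokens: number, elapsedMs: number): number {
  return elapsedMs > 0 ? (tokens * 1000) / elapsedMs : 0;
}

// Suggests the larger model only when the small one already runs comfortably.
function suggestModel(measuredTps: number, minTps: number = MIN_TOKENS_PER_SECOND): string {
  return measuredTps >= minTps ? LARGE_MODEL : SMALL_MODEL;
}

// Example: 100 tokens generated in 5 seconds on the small model.
console.log(suggestModel(tokensPerSecond(100, 5000))); // Qwen2.5-1.5B
```

A larger model usually decodes more slowly than a smaller one on the same GPU, so a comfortable margin on the small model is a reasonable, if rough, signal before switching.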

WebLLM Chat