Cloudflare adds built-in AI inference to Workers with pay-per-token pricing

Cloudflare has been steadily building its AI infrastructure over the past year, and this release ties it all together. Workers AI lets you call inference models directly from your edge function code with a single API call: no external provider needed, no API keys to manage, and no cold starts waiting for a distant GPU. The models run on Cloudflare's own network, so inference happens geographically close to your users.
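In practice, a Worker reaches a model through an AI binding configured in wrangler.toml. Here is a minimal sketch; the binding name, the model identifier, and the `buildChatInput` helper are illustrative, not prescribed by Cloudflare:

```typescript
// Minimal sketch of a Worker calling Workers AI through its AI binding.
// The binding name ("AI") is set in wrangler.toml; the model id follows
// Cloudflare's catalog naming and is used here illustratively.

interface Ai {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: Ai;
}

// Pure helper that builds the chat payload; kept separate so it is easy to test.
export function buildChatInput(prompt: string) {
  return {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: prompt },
    ],
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };
    // One call, no external provider and no API key to manage:
    // the model runs on Cloudflare's network.
    const result = await env.AI.run("@cf/meta/llama-3-8b-instruct", buildChatInput(prompt));
    return Response.json(result);
  },
};
```

Because the payload builder is a pure function, the AI call itself stays a one-liner inside the handler.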

The model selection covers the most common use cases. Llama 3 handles text generation and chat. Mistral offers a lighter alternative for simpler tasks. Stable Diffusion handles image generation. Whisper does speech-to-text. For most founders, this covers 80% of AI feature needs without the complexity of managing multiple AI provider relationships.
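With a catalog this small, model choice can collapse into a simple lookup rather than a provider-routing layer. The identifiers below follow Cloudflare's "@cf/<vendor>/<model>" naming, but treat the exact ids as assumptions and verify them against the current catalog:

```typescript
// Illustrative task-to-model lookup. The ids follow Cloudflare's
// "@cf/<vendor>/<model>" catalog naming; check the live model catalog
// before depending on any specific id.
const MODELS: Record<string, string> = {
  chat: "@cf/meta/llama-3-8b-instruct",
  lightweight: "@cf/mistral/mistral-7b-instruct-v0.1",
  imageGeneration: "@cf/stabilityai/stable-diffusion-xl-base-1.0",
  speechToText: "@cf/openai/whisper",
};

export function modelFor(task: string): string {
  const id = MODELS[task];
  if (!id) throw new Error(`No model configured for task: ${task}`);
  return id;
}
```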

Pricing is refreshingly simple: you pay per token for text models and per request for image and audio models. There are no monthly minimums, no reserved capacity commitments, and no surprise bills from idle infrastructure. A small SaaS app generating a few thousand AI responses per day would pay single-digit dollars per month. For comparison, the same workload on a dedicated GPU instance would cost 50-100x more. The trade-off is that you are limited to the models Cloudflare hosts, so if you need Claude or GPT-4 specifically, you still need those providers.
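The single-digit-dollar claim is easy to sanity-check with back-of-the-envelope arithmetic. The per-token rate below is a hypothetical placeholder, not a published Cloudflare price; substitute the current rates from the pricing page:

```typescript
// Back-of-the-envelope monthly cost for pay-per-token pricing.
// HYPOTHETICAL blended input+output rate, not Cloudflare's published price.
const PRICE_PER_MILLION_TOKENS_USD = 0.5;

export function estimateMonthlyCost(
  responsesPerDay: number,
  tokensPerResponse: number,
  pricePerMillionTokens: number = PRICE_PER_MILLION_TOKENS_USD,
): number {
  const tokensPerMonth = responsesPerDay * tokensPerResponse * 30;
  return (tokensPerMonth / 1_000_000) * pricePerMillionTokens;
}

// e.g. 1,000 responses/day at ~400 tokens each is 12M tokens/month:
// estimateMonthlyCost(1000, 400) → 6 (dollars, at the placeholder rate)
```

At a few thousand responses per day the bill stays in the single-to-low-double digits, which is where the 50-100x gap versus an always-on GPU instance comes from: you pay nothing while idle.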

Founder Takeaway

For adding basic AI features like summarization, chat, or image generation to your app, Cloudflare Workers AI is the cheapest and simplest option to start with.
