Blazing fast inference: sub-100ms time-to-first-token

AI Inference API built for speed.

OpenAI-compatible API serving the fastest open-source models. It's a drop-in replacement: just change the base URL and API key. Includes metered billing, usage analytics, and rate limiting.

quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://your-api.workers.dev/v1",
    api_key="inf-your-api-key-here"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Why Inference

The fastest way to ship AI.

OpenAI-compatible endpoints powered by the fastest open-source models. Per-token metering, real-time analytics, and built-in rate limiting. No vendor lock-in.

Blazing Fast

Sub-100ms time-to-first-token and up to ~1,200 tokens/second on our fastest model (Llama 3.1 8B). Hardware-optimized inference.
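You can check these latency figures against your own prompts by timing a streamed response. Below is a minimal sketch. It assumes the endpoint honors the OpenAI `stream=True` option, which this page does not state. The helper name `measure_stream` is hypothetical, and it reports chunks per second, which only approximates tokens per second when the server sends roughly one token per chunk.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    deltas: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
    start: Optional[float] = None,
) -> Tuple[float, float, str]:
    """Return (time_to_first_token_s, chunks_per_s, full_text) for a stream.

    `deltas` is any iterable of text pieces. Pass `start` (taken just before
    the request is sent) so time-to-first-token includes connection time.
    Chunks/second only approximates tokens/second (assumes ~1 token per chunk).
    """
    t0 = clock() if start is None else start
    first: Optional[float] = None
    count = 0
    parts = []
    for piece in deltas:
        if not piece:
            continue  # skip empty deltas (e.g. role-only or final chunks)
        now = clock()
        if first is None:
            first = now  # arrival time of the first non-empty chunk
        count += 1
        parts.append(piece)
    end = clock()
    if first is None:
        return float("nan"), 0.0, ""  # stream produced no text
    gen_time = end - first
    rate = count / gen_time if gen_time > 0 else float("inf")
    return first - t0, rate, "".join(parts)


# Usage with the OpenAI SDK (needs network access and a valid key; assumes
# the endpoint honors stream=True):
# t0 = time.perf_counter()
# stream = client.chat.completions.create(
#     model="llama-3.3-70b-versatile",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# ttft, rate, text = measure_stream(
#     (c.choices[0].delta.content or "" for c in stream if c.choices), start=t0
# )
```

Taking `start` before `create()` matters because the SDK sends the request inside that call. If you started the timer only when iteration began, it would under-report time-to-first-token.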

OpenAI Compatible

Drop-in replacement for any OpenAI-compatible SDK, whether Python, Node, Go, or Rust. Just change the base URL and API key.

Usage Analytics

Real-time dashboards showing token usage, latency, costs, and per-key breakdowns. Export data anytime.
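You may also want a local tally next to the dashboard. In the standard OpenAI response shape, each non-streamed chat completion includes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. We assume this API returns that shape unchanged. Below is a minimal sketch that sums those fields per API key. The key labels and the record format are hypothetical. With the Python SDK, `response.usage.model_dump()` produces a dict in this form.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (key_label, usage_dict), where usage_dict
# mirrors the `usage` field of a chat completion response.
UsageRecord = Tuple[str, Dict[str, int]]
FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def tally_by_key(records: Iterable[UsageRecord]) -> Dict[str, Dict[str, int]]:
    """Sum token counts and request counts per API key label."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {**{f: 0 for f in FIELDS}, "requests": 0}
    )
    for key_label, usage in records:
        entry = totals[key_label]
        for field in FIELDS:
            entry[field] += usage.get(field, 0)  # missing fields count as 0
        entry["requests"] += 1
    return dict(totals)


records = [
    ("backend", {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}),
    ("backend", {"prompt_tokens": 8, "completion_tokens": 20, "total_tokens": 28}),
    ("staging", {"prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10}),
]
print(tally_by_key(records))
```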

Metered Billing

Pay only for what you use. Per-token pricing with credit-based plans. No surprise bills.

Rate Limiting

Built-in rate limiting protects your budget with a configurable requests-per-minute (RPM) limit on each API key.
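Limits apply per key, so a client that sends many concurrent requests on one key may want to pace itself below its plan's RPM instead of hitting the server-side limit. Below is a minimal client-side sketch using a rolling 60-second window. `SlidingWindowLimiter` is a hypothetical helper, not part of any SDK. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side pacing: at most `rpm` calls in any rolling `window` seconds."""

    def __init__(self, rpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rpm < 1:
            raise ValueError("rpm must be at least 1")
        self.rpm = rpm
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of calls still in the window

    def wait_time(self) -> float:
        """Seconds until another call fits in the window (0.0 if it fits now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # this call has aged out of the window
        if len(self.sent) < self.rpm:
            return 0.0
        return self.window - (now - self.sent[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self.sent.append(self.clock())
        return True

    def acquire(self) -> None:
        """Block (sleeping) until a call fits, then record it."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

For example, on the Free plan you could create `SlidingWindowLimiter(rpm=30)` and call `acquire()` before each `client.chat.completions.create(...)`. A sliding window never allows more than `rpm` calls in any 60-second span. A fixed per-minute counter can briefly allow up to twice that around a minute boundary.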

Global Edge

Requests are routed through Cloudflare's edge network for low latency worldwide. 99.9% uptime.

Models

Access the fastest models.

Multiple open-source models through a single API. All optimized for speed.

Llama 3.3 70B

Versatile reasoning

~330 tok/s

Llama 3.1 8B

Ultra-fast tasks

~1200 tok/s

Mixtral 8x7B

Great for code

~480 tok/s

Gemma 2 9B

Compact & efficient

~900 tok/s
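These throughput figures make generation time easy to estimate. At about 330 tokens/s, a 500-token answer from Llama 3.3 70B takes roughly 500 / 330 ≈ 1.5 s to generate. At about 1,200 tokens/s, Llama 3.1 8B needs about 0.42 s. Below is a minimal sketch of that arithmetic using the approximate rates listed above. It ignores time-to-first-token and network overhead. The dictionary keys are display names, not confirmed API model IDs; only `llama-3.3-70b-versatile` appears in the examples on this page.

```python
# Approximate decode speeds (tokens/second) from the model list above.
# Keys are display names, not necessarily the API model IDs.
APPROX_TOKENS_PER_SEC = {
    "Llama 3.3 70B": 330,
    "Llama 3.1 8B": 1200,
    "Mixtral 8x7B": 480,
    "Gemma 2 9B": 900,
}


def estimate_generation_seconds(model: str, output_tokens: int) -> float:
    """Rough generation time, ignoring time-to-first-token and network latency."""
    return output_tokens / APPROX_TOKENS_PER_SEC[model]


for name in APPROX_TOKENS_PER_SEC:
    print(f"{name}: ~{estimate_generation_seconds(name, 500):.2f} s for 500 tokens")
```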

Pricing

Simple, transparent pricing.

Start free. Scale when you need to. No hidden fees.

Free

$0/mo

Try it out

1,000 credits/month
30 requests/min
2 API keys
Core models
Usage dashboard

Starter (Most Popular)

$19.99/mo

For building apps

50,000 credits/month
120 requests/min
5 API keys
All models
Usage dashboard
Priority support

Pro

$49.99/mo

For production

200,000 credits/month
300 requests/min
20 API keys
All models
Usage dashboard
Priority support

Enterprise

$199.99/mo

For scale

1,000,000 credits/month
1,000 requests/min
100 API keys
All models
Usage dashboard
Dedicated support
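To choose a tier, compare your expected monthly credit use, peak request rate, and number of keys against each plan's allowances. Below is a minimal sketch using the figures listed above. This page doesn't say how many credits a request consumes, so `monthly_credits` is an input you would estimate from your own usage. The model-access difference (core vs. all models) is not modeled.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    price_usd: float
    credits_per_month: int
    requests_per_min: int
    api_keys: int


# Figures copied from the pricing cards above, cheapest first.
PLANS = [
    Plan("Free", 0.00, 1_000, 30, 2),
    Plan("Starter", 19.99, 50_000, 120, 5),
    Plan("Pro", 49.99, 200_000, 300, 20),
    Plan("Enterprise", 199.99, 1_000_000, 1_000, 100),
]


def cheapest_plan(monthly_credits: int, peak_rpm: int, keys: int = 1) -> Optional[Plan]:
    """Return the cheapest listed plan that covers all three needs, or None."""
    for plan in PLANS:
        if (plan.credits_per_month >= monthly_credits
                and plan.requests_per_min >= peak_rpm
                and plan.api_keys >= keys):
            return plan
    return None


print(cheapest_plan(monthly_credits=40_000, peak_rpm=100))
```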

Quick Start

Get running in under a minute.

Three steps to your first API call.

01

Get your API key

Sign up for free and grab your API key from the dashboard. No credit card required.

02

Install the OpenAI SDK

Run pip install openai, or use any OpenAI-compatible SDK in your language of choice.

03

Make your first request

Point the SDK at our base URL, pass your API key, and call chat completions.

curl https://your-api.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer inf-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
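You can also make the same call without any SDK, for example where adding a dependency isn't an option. Below is a minimal sketch using only Python's standard library. It mirrors the curl request above, with the same placeholder URL, bearer-token header, and JSON body. `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://your-api.workers.dev/v1"  # placeholder from the examples above
API_KEY = "inf-your-api-key-here"  # placeholder; use your real key


def build_chat_request(model: str, content: str) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions request as the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Needs network access and a valid key:
# print(send(build_chat_request("llama-3.3-70b-versatile", "Hello!")))
```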