Updated Jun 25, 2026. Cheapest of 153 models tracked: GLM-4.7-Flash at $0.000/Mtok.
ModelPriceWatch

Fastest LLM APIs for Low-Latency Apps

This page ranks LLM APIs optimized for fast inference and low latency, and compares pricing for models available on fast inference providers such as Groq, Together, and Fireworks.

29 models qualify. Showing the top 15, sorted by blended cost.
1. Granite 4.0 Micro (IBM): $0.017 in, $0.112 out; $0.065/Mtok blended; 128K ctx
2. Llama 3.1 8B (Meta): $0.050 in, $0.080 out; $0.065/Mtok blended; 128K ctx
3. LFM2 24B A2B (Together): $0.030 in, $0.120 out; $0.075/Mtok blended; 128K ctx


Full ranking — top 15 models

| # | Model | Provider | Input $/Mtok | Output $/Mtok | Blended $/Mtok | Context |
|---|---|---|---|---|---|---|
| 1 | Granite 4.0 Micro | IBM | $0.017 | $0.112 | $0.065 | 128K |
| 2 | Llama 3.1 8B | Meta | $0.050 | $0.080 | $0.065 | 128K |
| 3 | LFM2 24B A2B | Together | $0.030 | $0.120 | $0.075 | 128K |
| 4 | Nova Micro | Amazon | $0.035 | $0.140 | $0.088 | 128K |
| 5 | Ministral 3 3B | Mistral | $0.100 | $0.100 | $0.100 | 128K |
| 6 | Reka Edge | Reka | $0.100 | $0.100 | $0.100 | 66K |
| 7 | Qwen-Turbo | Alibaba | $0.050 | $0.200 | $0.125 | 1M |
| 8 | Mistral Small 3.2 24B | Mistral | $0.080 | $0.200 | $0.140 | 128K |
| 9 | Gemini 2.5 Flash | Google | $0.075 | $0.300 | $0.188 | 1M |
| 10 | DeepSeek V4 Flash | DeepSeek | $0.140 | $0.280 | $0.210 | 1M |
| 11 | DeepSeek V4 Flash | Fireworks | $0.140 | $0.280 | $0.210 | 1M |
| 12 | GLM-4.7-FlashX | Zhipu | $0.070 | $0.400 | $0.235 | 128K |
| 13 | Gemini 2.5 Flash-Lite | Google | $0.100 | $0.400 | $0.250 | 1M |
| 14 | Qwen-Flash | Alibaba | $0.115 | $0.460 | $0.288 | 1M |
| 15 | Grok 4.1 Fast | xAI | $0.200 | $0.500 | $0.350 | 2M |

How models are selected

Models qualify if they are tagged as fast or speed-optimized. Qualifying models are then sorted by blended cost, lowest first.
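
The selection rule can be sketched as a filter followed by a sort. This is only an illustration: the page does not describe its data model, so the `Model` fields, the `"fast"` tag string, and the `top_n` parameter are all hypothetical. The three real rows reuse prices from the ranking table; the fourth row is made up to show the filter.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    input_price: float   # $/Mtok
    output_price: float  # $/Mtok
    tags: set = field(default_factory=set)  # assumed tagging scheme

    @property
    def blended(self) -> float:
        # Unrounded average of input and output price
        return (self.input_price + self.output_price) / 2

def rank_fast_models(models, top_n=15, tag="fast"):
    """Keep models carrying the (assumed) speed tag, cheapest blended cost first."""
    qualifying = [m for m in models if tag in m.tags]
    return sorted(qualifying, key=lambda m: m.blended)[:top_n]

models = [
    Model("Nova Micro", 0.035, 0.140, {"fast"}),
    Model("LFM2 24B A2B", 0.030, 0.120, {"fast"}),
    Model("Qwen-Turbo", 0.050, 0.200, {"fast"}),
    Model("hypothetical-untagged-model", 0.010, 0.010),  # made up; filtered out
]

for rank, m in enumerate(rank_fast_models(models), start=1):
    print(rank, m.name)
```

Sorting on the unrounded average would also be consistent with Granite 4.0 Micro (0.0645) appearing ahead of Llama 3.1 8B (0.0650) even though both display as $0.065. That is an inference from the published numbers, not something the page states.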

Prices are per million tokens (Mtok), sourced directly from official provider pricing pages and verified by our automated scraper pipeline, which runs three times daily. "Blended cost" is the average of the input and output prices, a quick proxy for workloads that split tokens roughly 50/50 between input and output.
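
The blended figure can be reproduced from the input and output prices, and the same prices give the cost of a workload that is not split 50/50. The sketch below is illustrative only. The function names `blended_cost` and `workload_cost` are hypothetical, half-up rounding to three decimals is an assumption inferred from the published figures, and the token counts in the usage lines are invented.

```python
from decimal import Decimal, ROUND_HALF_UP

def blended_cost(input_price: str, output_price: str) -> Decimal:
    """Average of input and output $/Mtok, rounded half-up to 3 decimals (assumed)."""
    avg = (Decimal(input_price) + Decimal(output_price)) / 2
    return avg.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

def workload_cost(input_price: str, output_price: str,
                  input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of a workload, given per-million-token prices."""
    million = Decimal(1_000_000)
    return (Decimal(input_price) * input_tokens
            + Decimal(output_price) * output_tokens) / million

# Granite 4.0 Micro: (0.017 + 0.112) / 2 = 0.0645, displayed as 0.065
print(blended_cost("0.017", "0.112"))

# Invented workload: 3M input tokens, 1M output tokens
print(workload_cost("0.017", "0.112", 3_000_000, 1_000_000))  # Granite 4.0 Micro
print(workload_cost("0.050", "0.080", 3_000_000, 1_000_000))  # Llama 3.1 8B
```

Two models with the same blended figure can cost different amounts once the input/output mix departs from 50/50, so it may be worth running the numbers for the expected token ratio rather than relying on the blended column alone.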