Prompt Token Cost Calculator
Paste your prompt, set output tokens, and instantly compare API costs across GPT, Claude, Gemini, and Llama, with caching and batch pricing.
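Llama 3.1 8B (Meta, 128K context)
Gemini 2.0 Flash-Lite (Google, 1M context)
Gemini 2.5 Flash-Lite (Google, 1M context)
Gemini 2.0 Flash (Google, 1M context)
Llama 3.3 70B (Meta, 128K context)
GPT-5 mini (OpenAI, 400K context)
Gemini 2.5 Flash (Google, 1M context)
Gemini 3 Flash (Google, 1M context)
Claude Haiku 3.5 (Anthropic, 200K context)
Claude Haiku 4.5 (Anthropic, 200K context)
Gemini 2.5 Pro (Google, 2M context)
Gemini 3 Pro (Google, 1M context)
GPT-5.4 (OpenAI, 1M context)
Claude Sonnet 4.6 (Anthropic, 200K context)
Claude Opus 4.6 (Anthropic, 200K context)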
Production Scale: 1,000 calls/day
| Model | Provider | Per Call | Daily Cost | Monthly Cost (×30 days) |
|---|---|---|---|---|
| Llama 3.1 8B | Meta | $0.00004 | $0.04 | $1.20 |
| Gemini 2.0 Flash-Lite | Google | $0.00015 | $0.15 | $4.50 |
| Gemini 2.5 Flash-Lite | Google | $0.0002 | $0.20 | $6.00 |
| Gemini 2.0 Flash | Google | $0.0002 | $0.20 | $6.00 |
| Llama 3.3 70B | Meta | $0.0002 | $0.20 | $6.00 |
| GPT-5 mini | OpenAI | $0.001 | $1.00 | $30.00 |
| Gemini 2.5 Flash | Google | $0.00125 | $1.25 | $37.50 |
| Gemini 3 Flash | Google | $0.0015 | $1.50 | $45.00 |
| Claude Haiku 3.5 | Anthropic | $0.002 | $2.00 | $60.00 |
| Claude Haiku 4.5 | Anthropic | $0.0025 | $2.50 | $75.00 |
| Gemini 2.5 Pro | Google | $0.005 | $5.00 | $150.00 |
| Gemini 3 Pro | Google | $0.006 | $6.00 | $180.00 |
| GPT-5.4 | OpenAI | $0.0075 | $7.50 | $225.00 |
| Claude Sonnet 4.6 | Anthropic | $0.0075 | $7.50 | $225.00 |
| Claude Opus 4.6 | Anthropic | $0.0125 | $12.50 | $375.00 |
How to Use
Paste your prompt
Type or paste your system message, user prompt, or full conversation. Token count updates instantly in your browser; nothing is uploaded.
Set expected output
Enter the average number of output tokens the model will generate. Default is 500 tokens (≈ 375 words). Adjust to match your typical response length.
Filter, sort & compare
Use provider tabs to focus on one vendor. Toggle "Cheapest first" to rank models by cost. Enable "Batch API" to see 50%-off async pricing. Toggle caching to simulate cached input.
Project production cost
Enter your daily API call volume. The scale table shows projected daily and monthly spend (×30 days) per model so you can budget before committing.
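The projection in the last step is plain multiplication. A minimal Python sketch, assuming a hypothetical helper named `project_spend` (not part of the calculator) and the 30-day month the scale table uses:

```python
def project_spend(per_call_usd: float, calls_per_day: int,
                  days_per_month: int = 30) -> tuple[float, float]:
    """Return (daily, monthly) spend in USD for a fixed per-call cost.

    Hypothetical helper for illustration; the 30-day month matches
    the scale table's "×30 days" column.
    """
    daily = per_call_usd * calls_per_day
    monthly = daily * days_per_month
    return daily, monthly


# Per-call figure taken from the scale table (Claude Sonnet 4.6, 1,000 calls/day).
daily, monthly = project_spend(0.0075, 1_000)
print(f"daily ${daily:.2f}, monthly ${monthly:.2f}")
# prints: daily $7.50, monthly $225.00
```

Swapping in your own per-call cost and call volume gives the same two figures the scale table reports for each model.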
How LLM API pricing works
Every major LLM provider bills on a pay-per-token model. You pay separately for input tokens (your prompt, system message, conversation history) and output tokens (the model's generated response). Output tokens are typically priced 3–10× higher than input because generating each token requires a full sequential forward pass, whereas input tokens are read in parallel in a single pass.
The formula: Total cost = (input tokens × input rate) + (output tokens × output rate). Rates are expressed per million tokens ($/1M). A 1,000-token prompt at $3/1M costs $0.003, which is small per call, but 10,000 calls/day at $0.01 each is $3,000/month. Use the scale table above to see how costs compound at your volume.
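The formula can be sketched in a few lines of Python. The function name `call_cost` is an assumption made for illustration. The $3/1M input rate comes from the example above, and the $15/1M output rate is the Claude Sonnet 4.6 figure from the pricing reference on this page:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost in USD of one call; both rates are in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 1,000-token prompt at $3/1M input with no output: the $0.003 from the text.
prompt_only = call_cost(1_000, 0, input_rate=3.00, output_rate=15.00)

# Same prompt plus a 500-token reply at $15/1M output.
with_reply = call_cost(1_000, 500, input_rate=3.00, output_rate=15.00)

print(f"prompt only: ${prompt_only:.4f}, with reply: ${with_reply:.4f}")
# prints: prompt only: $0.0030, with reply: $0.0105
```

In this example the 500-token reply costs more than the 1,000-token prompt, which is why output length tends to dominate the bill.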
Looking to just count tokens without cost analysis? Try our AI Token Calculator, which shows words, characters, and a token-to-cost reference table alongside the count.
2026 LLM Pricing Reference
Current pricing per 1 million tokens. Batch pricing is 50% of standard for async workloads. Always verify with your provider before committing to a budget.
| Model | Provider | Context | Input / 1M | Output / 1M | Out/In Ratio | Type |
|---|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 1M | $2.50 | $15.00 | 6.0× | Standard |
| GPT-5.4 (Batch) | OpenAI | 1M | $1.25 | $7.50 | 6.0× | Batch |
| GPT-5 mini | OpenAI | 400K | $0.250 | $2.00 | 8.0× | Standard |
| GPT-5 mini (Batch) | OpenAI | 400K | $0.125 | $1.00 | 8.0× | Batch |
| Claude Opus 4.6 | Anthropic | 200K | $5.00 | $25.00 | 5.0× | Standard |
| Claude Sonnet 4.6 | Anthropic | 200K | $3.00 | $15.00 | 5.0× | Standard |
| Claude Sonnet 4.6 (Batch) | Anthropic | 200K | $1.50 | $7.50 | 5.0× | Batch |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | 5.0× | Standard |
| Claude Haiku 4.5 (Batch) | Anthropic | 200K | $0.500 | $2.50 | 5.0× | Batch |
| Claude Haiku 3.5 | Anthropic | 200K | $0.800 | $4.00 | 5.0× | Standard |
| Gemini 3 Pro | Google | 1M | $2.00 | $12.00 | 6.0× | Standard |
| Gemini 3 Flash | Google | 1M | $0.500 | $3.00 | 6.0× | Standard |
| Gemini 2.5 Pro | Google | 2M | $1.25 | $10.00 | 8.0× | Standard |
| Gemini 2.5 Flash | Google | 1M | $0.300 | $2.50 | 8.3× | Standard |
| Gemini 2.5 Flash-Lite | Google | 1M | $0.100 | $0.400 | 4.0× | Standard |
| Gemini 2.0 Flash | Google | 1M | $0.100 | $0.400 | 4.0× | Standard |
| Gemini 2.0 Flash-Lite | Google | 1M | $0.075 | $0.300 | 4.0× | Standard |
| Llama 3.3 70B | Meta | 128K | $0.230 | $0.400 | 1.7× | Standard |
| Llama 3.1 8B | Meta | 128K | $0.050 | $0.080 | 1.6× | Standard |
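The Out/In Ratio and Batch rows in the table can be derived from the standard rates. A short Python sketch, assuming hypothetical helper names (`out_in_ratio`, `batch_rates`) and the 50% batch factor stated above; the rates are copied from table rows that also list a Batch variant:

```python
BATCH_FACTOR = 0.5  # this page states batch pricing is 50% of standard


def out_in_ratio(input_rate: float, output_rate: float) -> float:
    """Output rate divided by input rate, rounded to one decimal like the table."""
    return round(output_rate / input_rate, 1)


def batch_rates(input_rate: float, output_rate: float) -> tuple[float, float]:
    """Apply the batch factor to both rates (USD per 1M tokens)."""
    return input_rate * BATCH_FACTOR, output_rate * BATCH_FACTOR


# (input, output) standard rates in USD per 1M tokens, copied from the table.
standard = {
    "GPT-5.4": (2.50, 15.00),
    "GPT-5 mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

for model, (inp, out) in standard.items():
    b_in, b_out = batch_rates(inp, out)
    print(f"{model}: ratio {out_in_ratio(inp, out)}x, batch ${b_in}/${b_out} per 1M")
```

Because both rates are halved, the ratio is the same for a model's Standard and Batch rows.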
Cost Optimization Strategies
Practical techniques to reduce API spend without sacrificing output quality.
| Strategy | Typical Savings | Notes |
|---|---|---|
| Prompt Caching | Up to 90% on input | Cache static system prompts or documents. Supported by OpenAI and Anthropic. Toggle in calculator above. |
| Batch API | 50% overall | OpenAI Batch and Anthropic Batches process requests within 24 hours at 50% discount. Enable in calculator above. |
| Shorten system prompt | 10–40% on input | Every request sends the system prompt. A 500-token reduction × 10K calls/day = 5M fewer input tokens/day. |
| Limit max_tokens | 20–60% on output | Set max_tokens to the minimum your use case needs. Output is typically your most expensive line item. |
| Route to smaller model | 80–95% overall | Use Haiku / Flash-Lite for classification, extraction, and summarization. Reserve Opus / Pro for complex reasoning. |
| Compress conversation history | 30–70% on input | Summarize older turns instead of appending the full chat history to every new request. |
Savings percentages are estimates. Actual results depend on your specific prompts, output length, and provider plan. Always measure with your real workload.
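To see how the caching and batch rows change a single call, here is a rough Python sketch. Everything in it is an assumption for illustration: the function name `effective_cost`, the flat 90% discount on cached input (the table's "up to" figure), and the choice to apply caching and batch discounts independently. It ignores any cache-write surcharge, and whether the two discounts stack depends on the provider.

```python
def effective_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float,
                   cached_tokens: int = 0, cache_discount: float = 0.90,
                   batch: bool = False) -> float:
    """USD per call; rates are USD per 1M tokens.

    cache_discount is the assumed fraction taken off cached input tokens.
    batch=True halves the total, per the 50% figure in the table.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens * input_rate
                  + cached_tokens * input_rate * (1 - cache_discount))
    total = (input_cost + output_tokens * output_rate) / 1_000_000
    return total * 0.5 if batch else total


# Example: 2,000 input tokens (1,500 of them a static system prompt),
# 500 output tokens, at $3 / $15 per 1M.
base = effective_cost(2_000, 500, 3.00, 15.00)
cached = effective_cost(2_000, 500, 3.00, 15.00, cached_tokens=1_500)
batched = effective_cost(2_000, 500, 3.00, 15.00, batch=True)
print(f"base ${base:.5f}, cached ${cached:.5f}, batch ${batched:.5f}")

# Shorten-system-prompt row: 500 fewer tokens × 10K calls/day = 5M tokens/day.
tokens_saved_per_day = 500 * 10_000
usd_saved_per_day = tokens_saved_per_day * 3.00 / 1_000_000  # at $3/1M input
print(tokens_saved_per_day, usd_saved_per_day)
```

In this example caching trims only the input side, so the total drops by about 30% even though cached input is discounted by 90%; the output tokens still cost the same.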