Prompt Token Cost Calculator

Paste your prompt, set output tokens, and instantly compare API costs across GPT, Claude, Gemini, and Llama, with caching and batch pricing.

Cost Calculator
Input Tokens: 0
Output Tokens (est.): 500 (≈ 375 words)
Total Tokens: 500
Characters: 0


| Model | Context | Provider | Cost per call | Input cost | Output cost |
|---|---|---|---|---|---|
| Llama 3.1 8B (cheapest) | 128K | Meta | $0.00004 | $0.00 | $0.00004 |
| Gemini 2.0 Flash-Lite | 1M | Google | $0.00015 | $0.00 | $0.00015 |
| Gemini 2.5 Flash-Lite | 1M | Google | $0.0002 | $0.00 | $0.0002 |
| Gemini 2.0 Flash | 1M | Google | $0.0002 | $0.00 | $0.0002 |
| Llama 3.3 70B | 128K | Meta | $0.0002 | $0.00 | $0.0002 |
| GPT-5 mini | 400K | OpenAI | $0.001 | $0.00 | $0.001 |
| Gemini 2.5 Flash | 1M | Google | $0.00125 | $0.00 | $0.00125 |
| Gemini 3 Flash | 1M | Google | $0.0015 | $0.00 | $0.0015 |
| Claude Haiku 3.5 | 200K | Anthropic | $0.002 | $0.00 | $0.002 |
| Claude Haiku 4.5 | 200K | Anthropic | $0.0025 | $0.00 | $0.0025 |
| Gemini 2.5 Pro | 2M | Google | $0.005 | $0.00 | $0.005 |
| Gemini 3 Pro | 1M | Google | $0.006 | $0.00 | $0.006 |
| GPT-5.4 | 1M | OpenAI | $0.0075 | $0.00 | $0.0075 |
| Claude Sonnet 4.6 | 200K | Anthropic | $0.0075 | $0.00 | $0.0075 |
| Claude Opus 4.6 | 200K | Anthropic | $0.0125 | $0.00 | $0.0125 |

Production Scale: 1,000 calls/day

| Model | Provider | Per Call | Daily Cost | Monthly Cost (×30 days) |
|---|---|---|---|---|
| Llama 3.1 8B | Meta | $0.00004 | $0.04 | $1.20 |
| Gemini 2.0 Flash-Lite | Google | $0.00015 | $0.15 | $4.50 |
| Gemini 2.5 Flash-Lite | Google | $0.0002 | $0.20 | $6.00 |
| Gemini 2.0 Flash | Google | $0.0002 | $0.20 | $6.00 |
| Llama 3.3 70B | Meta | $0.0002 | $0.20 | $6.00 |
| GPT-5 mini | OpenAI | $0.001 | $1.00 | $30.00 |
| Gemini 2.5 Flash | Google | $0.00125 | $1.25 | $37.50 |
| Gemini 3 Flash | Google | $0.0015 | $1.50 | $45.00 |
| Claude Haiku 3.5 | Anthropic | $0.002 | $2.00 | $60.00 |
| Claude Haiku 4.5 | Anthropic | $0.0025 | $2.50 | $75.00 |
| Gemini 2.5 Pro | Google | $0.005 | $5.00 | $150.00 |
| Gemini 3 Pro | Google | $0.006 | $6.00 | $180.00 |
| GPT-5.4 | OpenAI | $0.0075 | $7.50 | $225.00 |
| Claude Sonnet 4.6 | Anthropic | $0.0075 | $7.50 | $225.00 |
| Claude Opus 4.6 | Anthropic | $0.0125 | $12.50 | $375.00 |

How to Use

Paste your prompt

Type or paste your system message, user prompt, or full conversation. Token count updates instantly in your browser; nothing is uploaded.

Set expected output

Enter the average number of output tokens the model will generate. Default is 500 tokens (≈ 375 words). Adjust to match your typical response length.

Filter, sort & compare

Use provider tabs to focus on one vendor. Toggle "Cheapest first" to rank models by cost. Enable "Batch API" to see 50%-off async pricing. Toggle caching to simulate cached input.

Project production cost

Enter your daily API call volume. The scale table shows projected daily and monthly spend (×30 days) per model so you can budget before committing.
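The projection step is plain multiplication, and a minimal sketch makes it concrete. The function name `project_spend` is hypothetical, and the example reuses the $0.0075 per-call figure listed for Claude Sonnet 4.6 at 1,000 calls/day; a fixed 30-day month is assumed, as in the scale table.

```python
def project_spend(cost_per_call: float, calls_per_day: int, days: int = 30):
    """Return (daily, monthly) spend in USD for a fixed per-call cost.

    Assumes every call costs the same and that a month is `days` days long.
    """
    daily = cost_per_call * calls_per_day
    monthly = daily * days
    return daily, monthly


# Per-call cost taken from the scale table (500 output tokens, no input).
daily, monthly = project_spend(0.0075, 1_000)
print(f"daily ${daily:.2f}, monthly ${monthly:.2f}")
```

Real traffic rarely has a constant per-call cost, so treat the result as a budgeting estimate rather than a forecast.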

How LLM API pricing works

Every major LLM provider bills on a pay-per-token model. You pay separately for input tokens (your prompt, system message, conversation history) and output tokens (the model's generated response). Output tokens are typically priced 3–10× higher than input because generating each token requires a full sequential forward pass, whereas input tokens are read in parallel in a single pass.

The formula: Total cost = (input tokens × input rate) + (output tokens × output rate). Rates are expressed per million tokens ($/1M), so divide each token count by 1,000,000 before multiplying. A 1,000-token prompt at $3/1M costs $0.003. That is small per call, but 10,000 calls/day at $0.01 each is $3,000/month. Use the scale table above to see how costs compound at your volume.
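The formula can be sketched in a few lines of Python. The function name `call_cost` and its parameter names are illustrative, not part of any provider SDK; rates are passed in dollars per 1M tokens, matching the pricing table on this page.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the USD cost of one API call.

    Rates are in dollars per 1 million tokens, so the weighted token
    total is divided by 1,000,000.
    """
    weighted = (input_tokens * input_rate_per_m
                + output_tokens * output_rate_per_m)
    return weighted / 1_000_000


# The 1,000-token prompt at $3/1M from the text, counting input only.
prompt_only = call_cost(1_000, 0, 3.00, 15.00)
print(f"${prompt_only:.3f}")
```

Passing both token counts gives the full per-call figure, which is the number the scale table multiplies by call volume.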

Looking to just count tokens without cost analysis? Try our AI Token Calculator, which shows words, characters, and a token-to-cost reference table alongside the count.

2026 LLM Pricing Reference

Current pricing per 1 million tokens. Batch pricing is 50% of standard for async workloads. Always verify with your provider before committing to a budget.

| Model | Provider | Context | Input / 1M | Output / 1M | Out/In Ratio | Type |
|---|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 1M | $2.50 | $15.00 | 6.0× | Standard |
| GPT-5.4 (Batch) | OpenAI | 1M | $1.25 | $7.50 | 6.0× | Batch |
| GPT-5 mini | OpenAI | 400K | $0.250 | $2.00 | 8.0× | Standard |
| GPT-5 mini (Batch) | OpenAI | 400K | $0.125 | $1.00 | 8.0× | Batch |
| Claude Opus 4.6 | Anthropic | 200K | $5.00 | $25.00 | 5.0× | Standard |
| Claude Sonnet 4.6 | Anthropic | 200K | $3.00 | $15.00 | 5.0× | Standard |
| Claude Sonnet 4.6 (Batch) | Anthropic | 200K | $1.50 | $7.50 | 5.0× | Batch |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | 5.0× | Standard |
| Claude Haiku 4.5 (Batch) | Anthropic | 200K | $0.500 | $2.50 | 5.0× | Batch |
| Claude Haiku 3.5 | Anthropic | 200K | $0.800 | $4.00 | 5.0× | Standard |
| Gemini 3 Pro | Google | 1M | $2.00 | $12.00 | 6.0× | Standard |
| Gemini 3 Flash | Google | 1M | $0.500 | $3.00 | 6.0× | Standard |
| Gemini 2.5 Pro | Google | 2M | $1.25 | $10.00 | 8.0× | Standard |
| Gemini 2.5 Flash | Google | 1M | $0.300 | $2.50 | 8.3× | Standard |
| Gemini 2.5 Flash-Lite | Google | 1M | $0.100 | $0.400 | 4.0× | Standard |
| Gemini 2.0 Flash | Google | 1M | $0.100 | $0.400 | 4.0× | Standard |
| Gemini 2.0 Flash-Lite | Google | 1M | $0.075 | $0.300 | 4.0× | Standard |
| Llama 3.3 70B | Meta | 128K | $0.230 | $0.400 | 1.7× | Standard |
| Llama 3.1 8B | Meta | 128K | $0.050 | $0.080 | 1.6× | Standard |
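The Batch rows and the Out/In Ratio column can both be derived from the standard rates. The sketch below shows how, using a few rows copied from the table; the helper names `batch_rates` and `out_in_ratio` are hypothetical, and the flat 50% discount is the figure stated on this page rather than something queried from a provider.

```python
# (input $/1M, output $/1M) for models that have a Batch row in the table.
STANDARD_RATES = {
    "GPT-5.4": (2.50, 15.00),
    "GPT-5 mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}

BATCH_DISCOUNT = 0.50  # assumption: flat 50% off, as stated on this page


def batch_rates(input_rate: float, output_rate: float,
                discount: float = BATCH_DISCOUNT) -> tuple[float, float]:
    """Apply the batch discount to both the input and the output rate."""
    return input_rate * (1 - discount), output_rate * (1 - discount)


def out_in_ratio(input_rate: float, output_rate: float) -> float:
    """Output price divided by input price, rounded to one decimal place."""
    return round(output_rate / input_rate, 1)


for model, (inp, out) in STANDARD_RATES.items():
    b_in, b_out = batch_rates(inp, out)
    print(f"{model}: batch ${b_in:.3f} in / ${b_out:.2f} out, "
          f"ratio {out_in_ratio(inp, out)}x")
```

Because the discount applies equally to both rates, the out/in ratio is the same for a standard row and its batch counterpart.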

Cost Optimization Strategies

Practical techniques to reduce API spend without sacrificing output quality.

| Strategy | Typical Savings | Notes |
|---|---|---|
| Prompt Caching | Up to 90% on input | Cache static system prompts or documents. Supported by OpenAI and Anthropic. Toggle in calculator above. |
| Batch API | 50% overall | OpenAI Batch and Anthropic Batches process requests within 24 hours at 50% discount. Enable in calculator above. |
| Shorten system prompt | 10–40% on input | Every request sends the system prompt. A 500-token reduction × 10K calls/day = 5M fewer input tokens/day. |
| Limit max_tokens | 20–60% on output | Set max_tokens to the minimum your use case needs. Output is typically your most expensive line item. |
| Route to smaller model | 80–95% overall | Use Haiku / Flash-Lite for classification, extraction, and summarization. Reserve Opus / Pro for complex reasoning. |
| Compress conversation history | 30–70% on input | Summarize older turns instead of appending the full chat history to every new request. |

Savings percentages are estimates. Actual results depend on your specific prompts, output length, and provider plan. Always measure with your real workload.
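To see how caching and batch discounts change a per-call figure, here is a rough sketch. Everything in it is an assumption for illustration: the function names are hypothetical, the 90% cache discount is the upper bound quoted above, cache-write surcharges are ignored, and the two discounts are assumed to stack, which may not hold for every provider.

```python
def effective_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float,
                   cached_fraction: float = 0.0,
                   cache_discount: float = 0.90,
                   batch: bool = False) -> float:
    """Estimate USD cost per call; rates are in dollars per 1M tokens.

    cached_fraction: share of input tokens served from cache (0.0 to 1.0).
    cache_discount:  assumed discount on cached input (0.90 = 90% off).
    batch:           if True, apply an assumed flat 50% batch discount.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable_input = fresh + cached * (1 - cache_discount)
    total = (billable_input * input_rate
             + output_tokens * output_rate) / 1_000_000
    return total * 0.5 if batch else total


def tokens_saved_per_day(tokens_trimmed: int, calls_per_day: int) -> int:
    """Input tokens no longer sent each day after shortening a prompt."""
    return tokens_trimmed * calls_per_day


# Illustrative call: 2,000 input and 500 output tokens at $3 / $15 per 1M.
base = effective_cost(2_000, 500, 3.00, 15.00)
cached = effective_cost(2_000, 500, 3.00, 15.00, cached_fraction=0.75)
print(f"base ${base:.5f}, with 75% of input cached ${cached:.5f}")
print(f"{tokens_saved_per_day(500, 10_000):,} fewer input tokens/day")
```

The cached figure falls less than the headline discount suggests because output tokens are unaffected by caching, which is why limiting output length is listed as a separate strategy.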
