Token Generation Speed Simulator

Simulate token generation speed for large language models and see how throughput feels in a streaming UI.

[Simulator panel: streamed output, elapsed time readout, and target speed (default 100 tok/s)]

How to Use

Set speed

Choose tokens per second to mimic a GPU or API tier. Higher values approximate H100-class decode; lower values resemble CPU or edge devices.

Set length

Select how many tokens of output to simulate. The panel caps the sample text length so the demo stays responsive in the browser.

Start simulation

Press Start to stream characters at a rate proportional to the chosen token throughput. Elapsed time and effective speed appear below.
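The pacing behind the Start button can be sketched as follows. This is a minimal Python illustration (the demo itself runs in the browser); the roughly-four-characters-per-token ratio is a common heuristic, not a tokenizer guarantee, and names like `stream_text` are ours, not the demo's.

```python
import time

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and text

def stream_text(text: str, tokens_per_sec: float, emit=print) -> float:
    """Reveal text one character at a time, paced so the effective
    token rate approximates tokens_per_sec. Returns elapsed seconds."""
    delay = 1.0 / (tokens_per_sec * CHARS_PER_TOKEN)  # seconds per character
    start = time.perf_counter()
    for ch in text:
        emit(ch)          # in the browser this would append to the output panel
        time.sleep(delay)
    return time.perf_counter() - start
```

Lowering `tokens_per_sec` stretches the same text over more wall-clock time, which is exactly what the elapsed-time readout makes visible.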

Compare scenarios

Re-run with different slider settings to see how streaming speed shapes perceived latency when long answers are revealed progressively.

Throughput vs time-to-first-token

Users care about both how fast the first characters appear and how quickly the rest streams. APIs often report these separately. This simulator focuses on steady-state decode pacing using your chosen tokens per second.

At 100 tokens/s, generating 1000 tokens would take roughly 10.0 s in an idealized steady decode with no prefill stalls—compare that to the elapsed timer while the demo runs on a short excerpt.
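The arithmetic above generalizes to a simple idealized model: total latency is time-to-first-token plus steady decode time. A small sketch, with `total_latency` as an illustrative name and prefill stalls ignored:

```python
def total_latency(n_tokens: int, tokens_per_sec: float, ttft: float = 0.0) -> float:
    """Idealized response time: time-to-first-token plus steady decode."""
    return ttft + n_tokens / tokens_per_sec

# 1000 tokens at 100 tok/s with no prefill stall, as in the example above:
print(total_latency(1000, 100))            # 10.0 s
# The same run with a 0.5 s time-to-first-token:
print(total_latency(1000, 100, ttft=0.5))  # 10.5 s
```

Because the simulator models only the steady decode term, its elapsed timer corresponds to the `ttft=0` case.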
