Token Generation Speed Simulator
Simulate token generation speed for large language models and see how throughput feels in a streaming UI.
How to Use
Set speed
Choose tokens per second to mimic a GPU or API tier. Higher values approximate H100-class decode; lower values resemble CPU or edge devices.
Set length
Select how many tokens of output to simulate. The panel caps the sample text length so the demo stays responsive in the browser.
Start simulation
Press Start to stream characters at a rate proportional to the chosen token throughput. Elapsed time and effective speed appear below.
Compare scenarios
Re-run with different slider settings to see how pacing shapes perceived latency as a long answer is revealed progressively.
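The pacing the steps above describe can be sketched as a simple loop: convert tokens per second into a per-character delay and emit one character at a time. This is a minimal Python sketch, not the demo's actual implementation, and the 4 characters-per-token figure is an assumed rough English average.

```python
import time
from typing import Iterator

def stream_text(text: str, tokens_per_second: float,
                chars_per_token: float = 4.0) -> Iterator[str]:
    """Yield text one character at a time, paced to mimic steady-state decode.

    chars_per_token ~ 4 is an assumption (rough English average), not a
    value taken from the simulator itself.
    """
    chars_per_second = tokens_per_second * chars_per_token
    delay = 1.0 / chars_per_second  # seconds between characters
    for ch in text:
        time.sleep(delay)
        yield ch

# Usage: stream a short excerpt at ~100 tokens/s.
# for ch in stream_text("Hello, streaming world!", tokens_per_second=100):
#     print(ch, end="", flush=True)
```

Per-character sleeps are the simplest pacing strategy; a production UI would more likely batch characters per animation frame to avoid timer overhead.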
Throughput vs time-to-first-token
Users care about both how fast the first characters appear and how quickly the rest streams. APIs often report these separately. This simulator focuses on steady-state decode pacing using your chosen tokens per second.
At 100 tokens/s, generating 1000 tokens would take roughly 10.0 s in an idealized steady decode with no prefill stalls—compare that to the elapsed timer while the demo runs on a short excerpt.
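The arithmetic above generalizes to a simple model: idealized decode time is tokens divided by tokens per second, and total perceived latency adds time-to-first-token on top. A small sketch, where the 0.5 s TTFT in the usage line is a hypothetical value, not one reported by the simulator:

```python
def decode_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Idealized steady-state decode time, ignoring prefill stalls."""
    return num_tokens / tokens_per_second

def total_latency_s(num_tokens: int, tokens_per_second: float,
                    ttft_s: float) -> float:
    """Total perceived latency: time to first token plus steady decode."""
    return ttft_s + decode_time_s(num_tokens, tokens_per_second)

print(decode_time_s(1000, 100))         # 10.0, matching the example above
print(total_latency_s(1000, 100, 0.5))  # 10.5 with a hypothetical 0.5 s TTFT
```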