However, the actual throughput they measure is significantly lower, at around 32 tokens per second, and the author is unsure what could explain such a large gap between the expected and observed figures, asking for clarification or likely causes.
Key takeaways:
- The user is trying to estimate how many tokens per second the "llama 7b" model should produce when deployed on an A10G GPU.
- The formula used is: tokens per second = peak FLOPs per second / (2 × number of model parameters), reflecting the rule of thumb that generating one token costs roughly 2 FLOPs per parameter (see the sketch after this list).
- By this calculation, they expect roughly 2,251 tokens per second.
- However, the actual output they are getting from the model is approximately 32 tokens per second, leading them to ask whether they are missing something in their calculation.
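The arithmetic in the post can be reproduced with a minimal sketch. Assumptions: the 31.52 TFLOPS figure is not stated explicitly but is the A10G's peak FP32 rate and is the only value that yields the quoted 2251.43 result; the parameter count and observed throughput come from the post. This is an illustration, not the poster's actual code.

```python
# A minimal sketch reproducing the poster's arithmetic. The 31.52 TFLOPS value
# is an assumption (it is the A10G's peak FP32 rate and reproduces the number
# quoted in the post); the parameter count and observed throughput come from
# the post itself.

peak_flops = 31.52e12   # assumed A10G peak compute, in FLOPs per second
num_params = 7e9        # "llama 7b" parameter count

# Rule of thumb: generating one token costs roughly 2 FLOPs per parameter,
# so the compute-bound ceiling is FLOPs/s divided by 2 * N.
expected_tok_per_s = peak_flops / (2 * num_params)
print(f"expected: {expected_tok_per_s:.1f} tokens/s")  # -> 2251.4

observed_tok_per_s = 32  # throughput reported in the post
print(f"observed: {observed_tok_per_s} tokens/s "
      f"(~{expected_tok_per_s / observed_tok_per_s:.0f}x below the estimate)")
```

One common explanation for a gap of this size, though not settled in the post itself, is that single-batch token generation tends to be bound by memory bandwidth rather than by peak compute, so the compute-only formula overstates achievable throughput.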