The author also details the experimental setup: a linear regression was fitted to estimate the latency per generated token. The measurements were taken in May 2023 over a 500 Mbit network in Estonia, using a paid OpenAI account and Azure's US-East endpoints. The post concludes with the best linear fits and the raw data points for each model.
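The post does not include the fitting code itself, but a minimal sketch of such a per-token fit might look like the following. The measurement values below are invented purely for illustration; the real raw data points are in the original post.

```python
# Hypothetical reconstruction of the latency fit (not the author's code).
# Model: latency = slope * n_tokens + intercept, where slope is the
# per-token latency and intercept absorbs fixed overhead (network, queuing).
import numpy as np

# Illustrative measurements only: (output token count, total latency in seconds).
n_tokens = np.array([16, 64, 128, 256, 512])
latency_s = np.array([1.4, 4.9, 9.6, 19.1, 37.8])

# polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(n_tokens, latency_s, deg=1)
print(f"per-token latency: {slope * 1000:.0f} ms, fixed overhead: {intercept:.2f} s")
```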
Key takeaways:
- The response time of GPT APIs largely depends on the number of output tokens generated by the model.
- Azure's GPT-3.5 model is more than twice as fast as OpenAI's GPT-3.5 model, with a latency of 34 ms per generated token compared to OpenAI's 73 ms.
- OpenAI's GPT-4 model is almost three times slower than its GPT-3.5 model, with a latency of 196 ms per generated token.
- To make GPT API responses faster, generate as few output tokens as possible (a minimal sketch of one way to do this follows below).
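This advice is not tied to any particular client library; as an illustration, here is a minimal sketch using the OpenAI Python client (the v1.x API, which is newer than the May 2023 experiments), showing the two usual levers: prompting for brevity and capping `max_tokens`.

```python
# A sketch of the "generate fewer tokens" advice, not the author's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        # Asking for a terse answer reduces output tokens and thus latency.
        {"role": "system", "content": "Answer in one short sentence."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    max_tokens=50,  # hard cap on output tokens; the response is cut off if exceeded
)
print(response.choices[0].message.content)
```

Note that `max_tokens` truncates rather than summarizes, so prompting the model for a concise answer is usually the better lever for output quality, with the cap as a safety net.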