The author also notes that while GPT-4 remains more expensive, it is no longer slower for the majority of requests, and hints at a follow-up investigation into whether OpenAI deliberately slows down users as they approach their rate limits. The article closes with a surprising finding: 99th-percentile latencies have more than halved in just three months.
Key takeaways:
- Over the past few months, GPT-4 has been catching up in speed, closing the latency gap with GPT-3.5.
- Latency comes from round-trip time, queuing time, and processing time, each of which can vary significantly with the complexity and length of the prompt; the sketch after this list shows one way to measure these from the client side.
- High token count doesn’t always translate to a slower response. For instance, a simple 204-token prompt can come back in a brisk 4.5 seconds, while a complex 33-token prompt might take 32 seconds to process.
- While GPT-4 is costlier, it is no longer slower for the majority of requests. The team is also exploring whether latency increases as users approach their rate limits.
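
To make these latency components concrete, here is a minimal sketch of timing a single request from the client side, separating time-to-first-token (which bundles round-trip, queuing, and prompt-processing time) from total latency. It assumes the `openai` Python package (v1+) with an `OPENAI_API_KEY` in the environment, plus `tiktoken` for token counting; the prompt and model name are illustrative, and this is not the article's own benchmarking harness:

```python
import time

import tiktoken
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
prompt = "Summarize the plot of Hamlet in one paragraph."  # illustrative prompt

# Prompt length alone doesn't determine latency, but it's worth recording.
enc = tiktoken.encoding_for_model("gpt-4")
print(f"prompt tokens: {len(enc.encode(prompt))}")

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        if first_token_at is None:
            # Time to first token ~ round trip + queuing + prompt processing.
            first_token_at = time.perf_counter()
        chunks += 1  # roughly one token per streamed chunk

total = time.perf_counter() - start
ttft = (first_token_at or time.perf_counter()) - start
print(f"time to first token: {ttft:.2f}s")
print(f"total latency: {total:.2f}s for ~{chunks} output tokens")
```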
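
Collecting such timings over many requests is what makes the percentile comparison meaningful. Here is a sketch of computing p50 and p99 from a batch of recorded latencies using only the standard library; the values below are made up for illustration:

```python
from statistics import quantiles

# Hypothetical per-request latencies in seconds (made-up values).
latencies = [1.2, 0.9, 2.4, 1.1, 31.8, 1.3, 4.5, 1.0, 1.6, 2.2]

# quantiles(n=100) returns the 99 cut points p1..p99:
# index 49 is the median (p50), index 98 is p99.
cuts = quantiles(latencies, n=100)
print(f"p50 = {cuts[49]:.2f}s, p99 = {cuts[98]:.2f}s")
```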