The author also speculates that GPT-3.5-turbo may be a MoE model, citing its speed, its non-deterministic outputs, and the removal of logprobs from the API. They suggest that this non-determinism should be obvious to anyone working with MoE models, and that the lack of public discussion points to a knowledge gap in the AI community. The author concludes by calling for more research into, and a better understanding of, MoE models.
Key takeaways:
- The author presents the hypothesis that the non-determinism of GPT-4, and potentially GPT-3.5-turbo, is primarily caused by batched inference in sparse Mixture of Experts (MoE) models, rather than by non-deterministic, CUDA-optimized floating-point operations (see the routing sketch after this list).
- Through empirical testing, the author demonstrates that completions from GPT-4, and potentially some 3.5 models, are significantly less deterministic than those from other OpenAI models (a minimal version of such a test is sketched after this list).
- The author speculates that GPT-3.5-turbo may also be a MoE model due to its speed, non-determinism, and removal of logprobs.
- The author suggests that the lack of widespread understanding of MoE models and their inherent non-determinism indicates a knowledge gap in the AI community.
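The batched-inference hypothesis is easier to see with a toy model. The sketch below is my own illustration, not the author's code, and all names, sizes, and the capacity rule are made up: a top-1 router assigns tokens to experts with a fixed per-batch capacity, and overflow tokens fall back to a pass-through. Because admission to an expert depends on which other tokens share the batch, the same token can produce different outputs across batches even though every weight is fixed and no randomness is added at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4
CAPACITY = 2      # hypothetical per-batch expert capacity
D = 8             # toy hidden size

# Fixed router and expert weights, standing in for a trained MoE layer.
router_w = rng.normal(size=(D, NUM_EXPERTS))
expert_w = rng.normal(size=(NUM_EXPERTS, D, D))

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Top-1 routing with a per-batch expert capacity.

    Tokens are admitted to their chosen expert in batch order; once an
    expert is full, later tokens fall back to a pass-through. A token's
    output therefore depends on which other tokens share its batch.
    """
    choice = (tokens @ router_w).argmax(axis=1)   # top-1 expert per token
    out = tokens.copy()                           # fallback: identity
    load = np.zeros(NUM_EXPERTS, dtype=int)
    for i, e in enumerate(choice):
        if load[e] < CAPACITY:
            out[i] = tokens[i] @ expert_w[e]
            load[e] += 1
    return out

# The same fixed "user token", appended to 100 randomly composed batches.
my_token = rng.normal(size=D)
outputs = set()
for _ in range(100):
    batch = np.vstack([rng.normal(size=(8, D)), my_token])
    outputs.add(moe_layer(batch)[-1].round(6).tobytes())

print(f"distinct outputs for the identical token: {len(outputs)}")
```

Real MoE implementations handle capacity and overflow in various ways (score-based priority, token dropping, auxiliary balancing losses), but the batch-dependence shown here is the common thread the hypothesis relies on.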
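A minimal version of the kind of empirical test the author describes might look like the following. This is a sketch, not the author's actual script: the prompt, sample count, and model list are placeholders, and it assumes the `openai` Python package (v1+) with an `OPENAI_API_KEY` in the environment. The idea is simply to send the same prompt repeatedly at temperature 0 and count how many distinct completions come back; a fully deterministic model should return exactly one.

```python
from collections import Counter

from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
PROMPT = "List the first ten prime numbers."  # placeholder prompt
N_SAMPLES = 30                                # arbitrary sample count

def count_unique_completions(model: str) -> Counter:
    """Send the same prompt N times at temperature 0 and tally the
    distinct completions returned by the given model."""
    completions = Counter()
    for _ in range(N_SAMPLES):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,
            max_tokens=256,
        )
        completions[resp.choices[0].message.content] += 1
    return completions

for model in ("gpt-3.5-turbo", "gpt-4"):      # models discussed in the post
    unique = count_unique_completions(model)
    print(f"{model}: {len(unique)} distinct completions out of {N_SAMPLES}")
```

The author's methodology may differ in detail (for example, which models and endpoints were compared), but any variant of this loop makes the relative non-determinism of the models directly measurable.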