The author also speculates that GPT-3.5-turbo may be a MoE model, citing its speed, its non-deterministic outputs, and the removal of logprobs from the API. They suggest that this non-determinism should be obvious to anyone working with MoE models, and that the lack of public discussion points to a knowledge gap in the AI community. The author concludes by calling for more research into, and a better understanding of, MoE models.
Key takeaways:
- The author presents the hypothesis that the non-determinism of GPT-4, and potentially GPT-3.5-turbo, is primarily caused by batched inference in sparse Mixture of Experts (MoE) models, rather than by non-deterministic, CUDA-optimized floating-point operations (see the routing sketch after this list).
- Through empirical testing, the author demonstrates that completions from GPT-4, and potentially some 3.5 models, are significantly less deterministic than those from other OpenAI models (a minimal version of such a test is sketched after this list).
- The author speculates that GPT-3.5-turbo may also be a MoE model due to its speed, non-determinism, and removal of logprobs.
- The author suggests that the lack of widespread understanding of MoE models and their inherent non-determinism indicates a knowledge gap in the AI community.
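The batched-inference hypothesis is easier to see with a toy model. The sketch below is my own illustration, not the author's code, and all names, sizes, and the capacity rule are made up: a top-1 router assigns tokens to experts with a fixed per-batch capacity, and overflow tokens fall back to a pass-through. Because admission to an expert depends on which other tokens share the batch, the same token can produce different outputs across batches even though every weight is fixed and no randomness is added at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4
CAPACITY = 2      # hypothetical per-batch expert capacity
D = 8             # toy hidden size

# Fixed router and expert weights, standing in for a trained MoE layer.
router_w = rng.normal(size=(D, NUM_EXPERTS))
expert_w = rng.normal(size=(NUM_EXPERTS, D, D))

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Top-1 routing with a per-batch expert capacity.

    Tokens are admitted to their chosen expert in batch order; once an
    expert is full, later tokens fall back to a pass-through. A token's
    output therefore depends on which other tokens share its batch.
    """
    choice = (tokens @ router_w).argmax(axis=1)   # top-1 expert per token
    out = tokens.copy()                           # fallback: identity
    load = np.zeros(NUM_EXPERTS, dtype=int)
    for i, e in enumerate(choice):
        if load[e] < CAPACITY:
            out[i] = tokens[i] @ expert_w[e]
            load[e] += 1
    return out

# The same fixed "user token", appended to 100 randomly composed batches.
my_token = rng.normal(size=D)
outputs = set()
for _ in range(100):
    batch = np.vstack([rng.normal(size=(8, D)), my_token])
    outputs.add(moe_layer(batch)[-1].round(6).tobytes())

print(f"distinct outputs for the identical token: {len(outputs)}")
```

Real MoE implementations handle capacity and overflow in various ways (score-based priority, token dropping, auxiliary balancing losses), but the batch-dependence shown here is the common thread the hypothesis relies on.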
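A minimal version of the kind of empirical test the author describes might look like the following. This is a sketch, not the author's actual script: the prompt, sample count, and model list are placeholders, and it assumes the `openai` Python package (v1+) with an `OPENAI_API_KEY` in the environment. The idea is simply to send the same prompt repeatedly at temperature 0 and count how many distinct completions come back; a fully deterministic model should return exactly one.

```python
from collections import Counter

from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
PROMPT = "List the first ten prime numbers."  # placeholder prompt
N_SAMPLES = 30                                # arbitrary sample count

def count_unique_completions(model: str) -> Counter:
    """Send the same prompt N times at temperature 0 and tally the
    distinct completions returned by the given model."""
    completions = Counter()
    for _ in range(N_SAMPLES):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,
            max_tokens=256,
        )
        completions[resp.choices[0].message.content] += 1
    return completions

for model in ("gpt-3.5-turbo", "gpt-4"):      # models discussed in the post
    unique = count_unique_completions(model)
    print(f"{model}: {len(unique)} distinct completions out of {N_SAMPLES}")
```

The author's methodology may differ in detail (for example, which models and endpoints were compared), but any variant of this loop makes the relative non-determinism of the models directly measurable.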