GPT-4 Turbo with Vision is a step backwards for coding

OpenAI's newly released GPT-4 Turbo with Vision performs worse on Aider's coding benchmark suites than all previous GPT-4 models. It scored only 62% on the benchmark, the lowest of any GPT-4 model, and showed a stronger tendency towards "lazy coding": omitting necessary code and leaving a comment in its place.
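To make "lazy coding" concrete, here is a minimal Python sketch that contrasts a lazy reply with a complete one and flags placeholder comments. The regular expression, the `find_lazy_comments` helper, and both sample replies are hypothetical illustrations. They are not taken from Aider's benchmarks, which may measure laziness quite differently.

```python
import re

# Hypothetical heuristic: phrases that often mark elided code in a model reply.
# Illustration only; this is not how Aider's benchmarks detect laziness.
LAZY_COMMENT = re.compile(
    r"^\s*#.*(\.\.\.|rest of|remains? the same|implement.*here|existing code)",
    re.IGNORECASE,
)


def find_lazy_comments(source: str) -> list[str]:
    """Return comment lines that look like placeholders for omitted code."""
    return [
        line.strip()
        for line in source.splitlines()
        if LAZY_COMMENT.search(line)
    ]


# A "lazy" reply: the validation step is replaced by a comment.
lazy_reply = '''\
def total_price(items):
    # ... existing validation logic here ...
    return sum(item.price for item in items)
'''

# A complete reply: the validation step is actually written out.
complete_reply = '''\
def total_price(items):
    # Reject negative prices before summing.
    for item in items:
        if item.price < 0:
            raise ValueError("negative price")
    return sum(item.price for item in items)
'''

print(find_lazy_comments(lazy_reply))
print(find_lazy_comments(complete_reply))
```

A keyword heuristic like this will miss many cases and can misfire on legitimate comments, so treat it as a way to picture the failure mode rather than a way to measure it.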

Although Aider fully supports the new GPT-4 Turbo with Vision model, it will continue to use `gpt-4-1106-preview` as the default model because of its superior coding performance. The new model also scored only 34% on Aider's refactoring benchmark, making it the "laziest" coder among all GPT-4 Turbo models.

Key takeaways:

OpenAI's newly released GPT-4 Turbo with Vision performs worse on Aider's coding benchmark suites than all previous GPT-4 models, scoring only 62%.
The new model is more prone to "lazy coding", often omitting necessary code and leaving comments instead.
On Aider's refactoring benchmark, GPT-4 Turbo with Vision scores only 34%, making it the laziest coder of all the GPT-4 Turbo models.
Although Aider fully supports the new model, it will continue to use `gpt-4-1106-preview` by default, as it is the strongest coder of the GPT-4 models.
