GPT-4V's performance varies widely across tasks: some tests pass 100% of the time, while others fail consistently, and the cost per request also varies. The article notes that the prompts were chosen because they produced accurate results, and that other prompts may yield different outcomes. The tests are intended as a reference point for GPT-4V's capabilities.
Key takeaways:
- The article presents a series of tests conducted on GPT-4 Vision (GPT-4V) to evaluate its performance over time.
- Each test runs the same prompt and image through GPT-4V and compares the output to a human-written reference answer.
- The tests are designed to monitor core features of GPT-4V, including its ability to count objects in an image, read text from an image, identify objects and their positions, and extract relevant data from images.
- Tests are run daily at 1am PT and the results are updated on the website once all tests are complete.
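The daily check described above can be sketched as a small harness that runs each prompt/image pair and compares the model's answer to the human-written reference. This is a minimal illustration, not the article's actual code: the names (`VisionTest`, `query_gpt4v`, the image path) are hypothetical, and the model call is stubbed out rather than hitting the real GPT-4V API.

```python
from dataclasses import dataclass


@dataclass
class VisionTest:
    name: str
    prompt: str
    image_path: str
    expected: str  # human-written reference answer


def query_gpt4v(prompt: str, image_path: str) -> str:
    """Stub for the real model call. A production harness would send the
    prompt and a base64-encoded image to the GPT-4V API and return the
    text of the response; here we return a fixed answer for illustration."""
    return "4"


def run_tests(tests: list[VisionTest]) -> dict[str, bool]:
    """Run each prompt/image pair and record pass/fail per test.

    Comparison is a normalized exact match against the reference answer,
    which suits short factual outputs like counts or extracted text."""
    results = {}
    for t in tests:
        answer = query_gpt4v(t.prompt, t.image_path)
        results[t.name] = answer.strip().lower() == t.expected.strip().lower()
    return results


tests = [
    VisionTest(
        name="count_coins",
        prompt="How many coins are in this image? Reply with a number only.",
        image_path="images/coins.jpg",
        expected="4",
    ),
]
print(run_tests(tests))  # → {'count_coins': True}
```

A scheduler (e.g. a daily cron job at 1am PT) would run this harness over the full test suite and publish the pass/fail results once every test has completed, matching the cadence described above.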