GPT-4V's performance varies widely across tasks: some tests pass 100% of the time, while others fail consistently, and the cost per request also varies. The article notes that the prompts were chosen because they produced accurate results, and that other prompts may yield different outcomes. The tests are intended as a reference point for GPT-4V's capabilities.
Key takeaways:
- The article presents a series of tests conducted on GPT-4 Vision (GPT-4V) to evaluate its performance over time.
- Each test runs the same prompt and image through GPT-4V and compares the output to a human-written reference answer.
- The tests are designed to monitor core features of GPT-4V, including its ability to count objects in an image, read text from an image, identify objects and their positions, and extract relevant data from images.
- Tests are run daily at 1am PT and the results are updated on the website once all tests are complete.
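The daily check described above can be sketched as a small harness that runs each prompt/image pair and compares the model's answer to the human-written reference. This is a minimal illustration, not the article's actual code: the names (`VisionTest`, `query_gpt4v`, the image path) are hypothetical, and the model call is stubbed out rather than hitting the real GPT-4V API.

```python
from dataclasses import dataclass


@dataclass
class VisionTest:
    name: str
    prompt: str
    image_path: str
    expected: str  # human-written reference answer


def query_gpt4v(prompt: str, image_path: str) -> str:
    """Stub for the real model call. A production harness would send the
    prompt and a base64-encoded image to the GPT-4V API and return the
    text of the response; here we return a fixed answer for illustration."""
    return "4"


def run_tests(tests: list[VisionTest]) -> dict[str, bool]:
    """Run each prompt/image pair and record pass/fail per test.

    Comparison is a normalized exact match against the reference answer,
    which suits short factual outputs like counts or extracted text."""
    results = {}
    for t in tests:
        answer = query_gpt4v(t.prompt, t.image_path)
        results[t.name] = answer.strip().lower() == t.expected.strip().lower()
    return results


tests = [
    VisionTest(
        name="count_coins",
        prompt="How many coins are in this image? Reply with a number only.",
        image_path="images/coins.jpg",
        expected="4",
    ),
]
print(run_tests(tests))  # → {'count_coins': True}
```

A scheduler (e.g. a daily cron job at 1am PT) would run this harness over the full test suite and publish the pass/fail results once every test has completed, matching the cadence described above.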