GitHub - nickandbro/chatGPT_Vision_To

This repository describes a method for locating objects in images more precisely with ChatGPT Vision. The image is divided into nine sections, and ChatGPT Vision is asked to classify the objects within those sections. If an object is identified in a section, that section is divided into nine subsections and checked again for a more precise location. The process repeats until no further section is identified or the image becomes too small to divide.
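The subdivision loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: `split_into_grid`, `refine`, `min_side`, and the `pick_section` callback are hypothetical names, and `section_containing` is a stand-in for the ChatGPT Vision call so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_grid(box: Box, n: int = 3) -> List[Box]:
    """Split a box into n*n sections, numbered row by row from the top-left."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    sections = []
    for row in range(n):
        for col in range(n):
            sections.append((
                left + col * width // n,
                top + row * height // n,
                left + (col + 1) * width // n,
                top + (row + 1) * height // n,
            ))
    return sections


def refine(box: Box,
           pick_section: Callable[[List[Box]], Optional[int]],
           min_side: int = 30) -> Box:
    """Narrow the box until no section is picked or it is too small to split."""
    while True:
        left, top, right, bottom = box
        # Assumption: stop once a further split would give sides under min_side.
        if min(right - left, bottom - top) < 3 * min_side:
            return box
        sections = split_into_grid(box)
        choice = pick_section(sections)  # a vision-model call would go here
        if choice is None:
            return box
        box = sections[choice]


def section_containing(x: int, y: int) -> Callable[[List[Box]], Optional[int]]:
    """Offline stand-in for the model: pick the section that contains (x, y)."""
    def pick(sections: List[Box]) -> Optional[int]:
        for i, (left, top, right, bottom) in enumerate(sections):
            if left <= x < right and top <= y < bottom:
                return i
        return None
    return pick


result = refine((0, 0, 900, 900), section_containing(700, 100))
print(result)  # (700, 100, 733, 133)
```

In a real run, `pick_section` would crop the image to each section, send it to the model, and return the index the model reports. The stub only shows how each chosen section becomes the next box to divide.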

The method has some limitations. Objects that span multiple sections may not be identified correctly. Each subdivision also leaves fewer pixels in the crop, and this loss of resolution can hurt the model's performance. Finally, ChatGPT functions sometimes fail to correctly identify the sections mentioned.
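To make the resolution loss concrete, the sketch below lists the crop size at each level of a 3x3 subdivision. It is only arithmetic on image dimensions. The `crop_sizes` name and the `min_side` threshold are assumptions chosen for illustration, not values taken from the repository.

```python
from typing import List, Tuple


def crop_sizes(width: int, height: int, min_side: int, n: int = 3) -> List[Tuple[int, int]]:
    """List the crop size at each level until a side drops below min_side."""
    sizes = []
    while min(width, height) >= min_side:
        sizes.append((width, height))
        # Each level keeps one cell of the n*n grid, so both sides shrink by n.
        width //= n
        height //= n
    return sizes


print(crop_sizes(1920, 1080, min_side=100))
# [(1920, 1080), (640, 360), (213, 120)]
```

Under these assumptions a 1920x1080 image supports only two subdivisions before the crop falls below 100 pixels on its shorter side, since each level keeps roughly one ninth of the pixels.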

Key takeaways:

ChatGPT Vision To Coords is a method to classify objects in an image by breaking the image into 9 sections and identifying objects in those sections.
The process involves breaking the image into sections, sending it to ChatGPT Vision for processing, identifying the sections with objects, and repeating the process for better precision.
To use it, one needs to clone the repository, install the requirements, insert the API key into config.py, and change the image path in main.py to the desired image.
Some issues with this method include difficulty in correctly identifying objects that span multiple sections, loss of resolution when subdividing sections, and occasional failure of ChatGPT functions to correctly identify the mentioned sections.
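The multi-section issue can be illustrated with a small overlap check. This is a hedged sketch with a hypothetical helper, `sections_overlapping`, which reports the grid sections that an object's bounding box touches. An object that touches more than one section has no single section that contains it fully.

```python
from typing import List, Tuple


def sections_overlapping(obj: Tuple[int, int, int, int],
                         image_size: Tuple[int, int],
                         n: int = 3) -> List[int]:
    """Return row-major indices of the grid sections an object box overlaps."""
    left, top, right, bottom = obj
    width, height = image_size
    hits = []
    for row in range(n):
        for col in range(n):
            s_left, s_right = col * width // n, (col + 1) * width // n
            s_top, s_bottom = row * height // n, (row + 1) * height // n
            # Standard rectangle-overlap test on both axes.
            if left < s_right and right > s_left and top < s_bottom and bottom > s_top:
                hits.append(row * n + col)
    return hits


print(sections_overlapping((10, 10, 100, 100), (900, 900)))    # [0]
print(sections_overlapping((250, 250, 350, 350), (900, 900)))  # [0, 1, 3, 4]
```

The second box straddles a grid corner, so choosing any one of its four sections would cut part of the object out of the next crop.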

GitHub - nickandbro/chatGPT_Vision_To_Coords
