The Ferret-UI technology could allow Siri to perform actions for users within apps by selecting graphical elements on its own. This could be particularly beneficial for visually impaired users, as the model could describe what is on the screen in detail and carry out actions on the user's behalf from a simple voice command. However, it is not yet confirmed whether the technology will be incorporated into systems like Siri.
Key takeaways:
- Apple's Ferret LLM could expand Siri's capabilities by understanding the layout of apps on an iPhone's display.
- Ferret-UI, a new multimodal large language model (MLLM), is designed to understand the user interfaces of mobile displays, with referring, grounding, and reasoning capabilities.
- Ferret-UI uses an "any resolution" magnification approach that upscales screen details so icons and text remain legible, dividing each screenshot into two smaller sub-images for processing and training (see the sketch after this list).
- Ferret-UI could eventually be incorporated into systems like Siri, offering advanced control over a device such as an iPhone and useful accessibility applications for visually impaired users.
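To make the split-and-magnify idea concrete, here is a minimal sketch of how a screenshot might be divided into two sub-images based on its aspect ratio and then upscaled before being fed to a fixed-size vision encoder. This is only an illustration of the concept described above, not Apple's implementation; the function names and the assumed encoder input size are placeholders.

```python
from PIL import Image

# Illustrative sketch only: split a screenshot into two sub-images based on
# its aspect ratio, then upscale each half so small icons and text occupy
# more pixels than they would if the whole screen were squeezed into a
# single low-resolution image. The 336-pixel input size is an assumption.

ENCODER_INPUT = 336  # assumed square input size of the vision encoder


def split_and_upscale(screenshot: Image.Image) -> list[Image.Image]:
    """Divide a screen capture into two sub-images and upscale each one."""
    width, height = screenshot.size

    if height >= width:
        # Portrait screen (e.g. iPhone): split into top and bottom halves.
        halves = [
            screenshot.crop((0, 0, width, height // 2)),
            screenshot.crop((0, height // 2, width, height)),
        ]
    else:
        # Landscape screen (e.g. iPad): split into left and right halves.
        halves = [
            screenshot.crop((0, 0, width // 2, height)),
            screenshot.crop((width // 2, 0, width, height)),
        ]

    # Upscale each half to the encoder's input size to magnify fine detail.
    return [
        half.resize((ENCODER_INPUT, ENCODER_INPUT), Image.Resampling.LANCZOS)
        for half in halves
    ]


if __name__ == "__main__":
    screen = Image.open("screenshot.png")                  # any app screenshot
    full_view = screen.resize((ENCODER_INPUT, ENCODER_INPUT))  # global view
    sub_images = split_and_upscale(screen)                 # two magnified halves
    # A model could then encode the global view together with the sub-images.
```

In this sketch the global, low-resolution view preserves the overall layout, while the two magnified halves preserve the fine detail needed to read labels and identify small icons.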