The research highlights the potential of focused language models to handle tasks like reference resolution in production systems. However, the researchers caution that automated screen parsing has limitations, and that handling more complex visual references may require incorporating computer vision and multi-modal techniques. Despite trailing its tech rivals in AI, Apple is making significant strides in AI research and is expected to unveil new AI-powered features at its Worldwide Developers Conference in June.
Key takeaways:
- Apple researchers have developed a new artificial intelligence system called ReALM (Reference Resolution As Language Modeling) that can understand ambiguous references to on-screen entities and context, improving interactions with voice assistants.
- The system reconstructs the screen by using parsed on-screen entities and their locations to generate a purely textual representation that captures the visual layout (see the sketch after this list), and it outperforms GPT-4 on the task.
- Despite these advances, the researchers caution that relying on automated screen parsing has limitations, and that handling more complex visual references would likely require incorporating computer vision and multi-modal techniques.
- Apple is making significant strides in artificial intelligence research and is expected to unveil a new large language model framework, an "Apple GPT" chatbot, and other AI-powered features at its Worldwide Developers Conference in June.
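To make the screen-reconstruction idea concrete, here is a minimal sketch of how parsed on-screen entities and their positions might be flattened into the kind of textual layout a language model can consume. The `ScreenEntity` fields, the `line_tolerance` grouping threshold, and the tab/newline layout are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
from dataclasses import dataclass

@dataclass
class ScreenEntity:
    """A parsed on-screen entity with its text and position.

    The fields here are assumptions for illustration; a real screen
    parser may expose richer bounding-box and type information.
    """
    text: str
    top: float   # vertical position of the entity's bounding box
    left: float  # horizontal position of the entity's bounding box

def screen_to_text(entities: list[ScreenEntity],
                   line_tolerance: float = 10.0) -> str:
    """Render parsed entities as plain text that preserves rough layout.

    Entities whose vertical positions fall within `line_tolerance` are
    treated as one visual line; lines are emitted top-to-bottom and
    entities within a line left-to-right, separated by tabs.
    """
    # Sort top-to-bottom first, then left-to-right.
    ordered = sorted(entities, key=lambda e: (e.top, e.left))

    lines: list[list[ScreenEntity]] = []
    for entity in ordered:
        # Extend the current visual line if this entity is vertically
        # close to it; otherwise start a new line.
        if lines and abs(entity.top - lines[-1][0].top) <= line_tolerance:
            lines[-1].append(entity)
        else:
            lines.append([entity])

    # Join entities on one line with tabs, and lines with newlines.
    return "\n".join(
        "\t".join(e.text for e in sorted(line, key=lambda e: e.left))
        for line in lines
    )

# Example: a contacts screen as a voice assistant might "see" it.
screen = [
    ScreenEntity("Contacts", top=0, left=0),
    ScreenEntity("Alice", top=40, left=0),
    ScreenEntity("555-0100", top=42, left=200),
    ScreenEntity("Bob", top=80, left=0),
    ScreenEntity("555-0199", top=81, left=200),
]
print(screen_to_text(screen))
# Contacts
# Alice     555-0100
# Bob       555-0199
```

Grouping vertically adjacent entities onto one text line is what lets a purely textual model recover associations from layout, for example that "555-0100" is Alice's number when the user says "call Alice."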