
GitHub - rounak/PhoneAgent

Jun 02, 2025 - github.com
PhoneAgent is an iPhone application, built during an OpenAI hackathon, that uses OpenAI models to carry out tasks across multiple apps much as a human user would. Users can issue commands such as sending messages, downloading apps, and controlling phone settings through text or voice prompts. The app reads the accessibility tree of apps, which lets it tap, swipe, scroll, type, and open applications. An optional "Always On" mode listens for commands that start with a wake word, even when the app is in the background.
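
To make the accessibility-tree idea concrete, here is a minimal sketch in Python (not the project's own code). It assumes a heavily simplified tree and a made-up action format: the `Node` class, the `flatten` and `validate_action` helpers, and the action names are all hypothetical and only illustrate how a tree might be turned into prompt text and how a model-proposed action might be checked against it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified stand-in for one accessibility tree node.
@dataclass
class Node:
    role: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)

def flatten(node, depth=0):
    """Return one indented text line per node, suitable for a model prompt."""
    lines = ["  " * depth + f"{node.role}: {node.label}".rstrip()]
    for child in node.children:
        lines.extend(flatten(child, depth + 1))
    return lines

def labels(node):
    """Collect every non-empty label found in the tree."""
    found = {node.label} if node.label else set()
    for child in node.children:
        found |= labels(child)
    return found

# Assumed action vocabulary, mirroring the verbs listed in the summary.
ALLOWED_ACTIONS = {"tap", "swipe", "scroll", "type", "open_app"}

def validate_action(action, tree):
    """Accept an action only if its kind is known and its target is on screen."""
    kind = action.get("kind")
    if kind not in ALLOWED_ACTIONS:
        return False
    if kind == "open_app":
        return bool(action.get("app"))
    return action.get("target") in labels(tree)
```

Checking a proposed action against the current tree before acting is one plausible way to reject targets that are not actually on screen; whether PhoneAgent does this is not stated in the summary.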

The app is powered by OpenAI's GPT-4.1 model and uses Xcode's UI testing harness to interact with iOS apps without requiring a jailbreak. It communicates with the UI test through a TCP server to trigger prompts. Known limitations include imperfect keyboard input, trouble capturing view hierarchies while animations are running, and not waiting for long-running tasks to complete. The app is experimental, and because app contents are sent to OpenAI's API, users are advised to run it in an isolated environment.
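
The TCP hand-off described above can be illustrated with a small loopback sketch in Python. The summary does not describe the wire format or say which side hosts the server, so everything here is an assumption: the newline-delimited JSON message, the `prompt` and `status` fields, and the `PromptHandler`, `start_server`, and `send_prompt` names are hypothetical.

```python
import json
import socket
import socketserver
import threading

class PromptHandler(socketserver.StreamRequestHandler):
    """Reads one newline-terminated JSON message and acknowledges it."""
    def handle(self):
        message = json.loads(self.rfile.readline())
        # Record the prompt before replying so callers can inspect it.
        self.server.received.append(message["prompt"])
        self.wfile.write(b'{"status": "accepted"}\n')

def start_server():
    """Start a loopback TCP server on an OS-chosen free port."""
    server = socketserver.TCPServer(("127.0.0.1", 0), PromptHandler)
    server.received = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_prompt(port, prompt):
    """Send one prompt and return the decoded acknowledgement."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(json.dumps({"prompt": prompt}).encode() + b"\n")
        reply = conn.makefile("rb").readline()
    return json.loads(reply)
```

A caller would start the server, read the port from `server.server_address[1]`, and then call `send_prompt(port, "Open Settings")`. Binding to `127.0.0.1` keeps the sketch local to one machine.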

Key takeaways:

  • PhoneAgent is an iPhone app that uses OpenAI models to perform tasks across multiple apps, similar to a human user.
  • The app can perform actions like tapping, swiping, scrolling, typing, and opening apps using the accessibility tree of iOS apps.
  • It features an optional Always On mode that listens for prompts starting with a wake word, even when the app is in the background.
  • The project uses Xcode's UI testing harness to interact with apps and the system without requiring a jailbreak.
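
The wake-word behavior of Always On mode can be sketched as a simple filter over transcribed speech. This is an illustrative Python sketch, not the project's implementation: the summary does not name the wake word, so the default `"agent"` and the `extract_prompt` helper are hypothetical.

```python
def extract_prompt(transcript, wake_word="agent"):
    """Return the command after the wake word, or None if it is absent.

    The wake word must be the first word; trailing punctuation on it
    (for example "Agent,") is ignored.
    """
    words = transcript.strip().split()
    if not words:
        return None
    first = words[0].strip(",.!?:;").lower()
    if first != wake_word.lower():
        return None
    prompt = " ".join(words[1:]).strip()
    return prompt or None
```

Requiring the wake word at the start of the utterance, as the summary describes, means ordinary speech that merely mentions the word later in a sentence is ignored.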