GitHub - elfvingralf/macOSpilot-ai-assistant: Voice + Vision powered AI assistant that answers questions about any application, in context and in audio.

Dec 12, 2023 - github.com
macOSpilot is a personal AI assistant for macOS that answers questions in any application. It works by taking a screenshot of the active window and sending it to OpenAI GPT Vision along with a transcript of the user's question. The answer is then displayed in text and converted into audio using OpenAI TTS (text to speech). The assistant can be triggered with a keyboard shortcut and works with any application on macOS.

To use macOSpilot, users install the Node.js project and its dependencies, configure the application in `index.js`, and run it in the background. When a question is asked, the spoken question is transcribed with OpenAI's Whisper API, and the transcript is sent together with a screenshot of the active window to OpenAI's Vision API. The response is displayed in a small notification window and read out loud using OpenAI's TTS API. A history of answers is available in a separate window. Several settings can be changed, including the keyboard shortcut, the OpenAI Vision prompt, the image size, and the window sizes.
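
The Vision step described above can be sketched as a function that pairs the transcript with the screenshot in one request body. This is a minimal sketch, not the project's actual code: the function name `buildVisionRequest`, the `DEFAULTS` object, the prompt text, the token limit, and the model name `gpt-4-vision-preview` are all assumptions for illustration. Only the message layout (a text part plus an `image_url` part carrying a base64 data URL) follows OpenAI's Chat Completions format. No network call is made.

```javascript
// Sketch only: builds the JSON body for a Chat Completions request that
// pairs a transcribed question with a screenshot. Nothing is sent anywhere.

// Hypothetical defaults; the real values configured in index.js may differ.
const DEFAULTS = {
  model: "gpt-4-vision-preview", // assumed vision-capable model name
  systemPrompt: "You are helping a user with the application shown in the screenshot.",
  maxTokens: 300,
};

function buildVisionRequest(transcript, screenshotPng, options = {}) {
  // Caller-supplied options override the defaults, mirroring a configurable prompt.
  const { model, systemPrompt, maxTokens } = { ...DEFAULTS, ...options };

  // Encode the image as a data URL so it can travel inside the JSON body.
  const dataUrl = `data:image/png;base64,${screenshotPng.toString("base64")}`;

  return {
    model,
    max_tokens: maxTokens,
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        content: [
          { type: "text", text: transcript },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Usage with a stand-in buffer instead of a real screenshot.
const body = buildVisionRequest("How do I export this?", Buffer.from("fake-png-bytes"));
console.log(JSON.stringify(body, null, 2));
```

Keeping the request builder free of I/O makes the configurable parts (prompt, model, token limit) easy to change and check without spending API credits.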

Key takeaways:

  • macOSpilot is a personal AI assistant for macOS that answers questions about any application. A keyboard shortcut triggers the assistant, which answers in context, both as text and as audio.
  • The assistant works by taking a screenshot of the active window and sending it to OpenAI GPT Vision along with a transcript of the user's question. The answer is then displayed in text and converted into audio using OpenAI TTS (text to speech).
  • To use macOSpilot, users need to install the Node.js project and its dependencies, configure the application in `index.js`, and add their OpenAI API key. The assistant is triggered with a keyboard shortcut, and the user speaks the question into the microphone.
  • Possible improvements for macOSpilot include enabling optional conversation state between sessions, using buffers instead of writing and reading screenshot and audio files to and from disk, making the assistant audio and the always-on-top window configurable in the UI, making screenshot settings configurable in the UI, and fixing the microphone not working when the application runs as a .app.
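
The buffer-based improvement in the last takeaway can be illustrated with a small comparison. This is a sketch under stated assumptions: the function names `encodeViaDisk` and `encodeInMemory` are hypothetical, the four bytes stand in for a real capture, and how the project would obtain the screenshot as a `Buffer` is not described in the source. It only shows that encoding in memory yields the same base64 payload as a write-then-read round trip through a temporary file.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Disk round trip: write the capture to a temp file, read it back, then clean up.
function encodeViaDisk(imageBytes) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "shot-"));
  const file = path.join(dir, "screenshot.png");
  fs.writeFileSync(file, imageBytes);
  const encoded = fs.readFileSync(file).toString("base64");
  fs.rmSync(dir, { recursive: true, force: true });
  return encoded;
}

// In-memory path: encode the Buffer directly, with no file system access.
function encodeInMemory(imageBytes) {
  return imageBytes.toString("base64");
}

// PNG magic bytes used as stand-in data for a real screenshot.
const fakeCapture = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
console.log(encodeViaDisk(fakeCapture) === encodeInMemory(fakeCapture)); // prints true
```

The in-memory variant avoids temporary files and the cleanup they require, which is the motivation the takeaway gives for the change.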