
Indexing iCloud Photos with AI Using LLaVA and pgvector

Jan 22, 2024 - medium.com
The author describes his fascination with AI and a project that uses a multi-modal large language model (LLM) to add semantic search to his iCloud photo archive. He runs a 4-bit quantized (Q4) LLaVA model with llamafile, whose built-in REST API lets him submit images for description, and he experiments with different prompts to generate detailed captions for each photo. He then generates embeddings for those descriptions with a SentenceTransformer model and stores them in a Postgres database using the pgvector extension.
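The description step can be sketched roughly as follows. This is a minimal sketch, not the author's actual code: it assumes llamafile's default local port (8080) and the llama.cpp-style `/completion` endpoint that llamafile exposes, and the prompt wording and the `[img-12]` placeholder id are illustrative.

```python
import base64
import json
import urllib.request

# Assumption: llamafile serving LLaVA locally on its default port.
LLAMAFILE_URL = "http://localhost:8080/completion"

# Hypothetical prompt; the author experimented with several variants.
PROMPT = "Describe this photo in detail."


def build_payload(image_path: str, prompt: str = PROMPT) -> dict:
    """Base64-encode an image and wrap it in a llama.cpp-style request body.

    The prompt references the image via an [img-<id>] placeholder that
    matches the id in image_data.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "prompt": f"USER:[img-12] {prompt}\nASSISTANT:",
        "image_data": [{"data": b64, "id": 12}],
        "n_predict": 256,
        "temperature": 0.1,  # keep descriptions deterministic-ish
    }


def describe_image(image_path: str) -> str:
    """POST the image to the local model server and return its description."""
    req = urllib.request.Request(
        LLAMAFILE_URL,
        data=json.dumps(build_payload(image_path)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

Because everything runs against a local HTTP endpoint, the photo data never leaves the machine, which fits the author's preference for local models.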

The author also voices concern about over-reliance on OpenAI and encourages support for open-source LLMs. He argues that while OpenAI has contributed significantly to advancing the field, there are specialized LLMs that can run on edge devices and should not be overlooked. He concludes by sharing the Python code he used to generate descriptions and embeddings and to query them.

Key takeaways:

  • The author is exploring local AI models, specifically large language models (LLMs), to improve semantic search over his iCloud photo archive.
  • He uses a multi-modal LLM that can understand images and describe them in detail. The descriptions are then embedded as vectors with a sentence-embedding model, allowing the photos to be searched by meaning.
  • He uses the pgvector extension for Postgres to store the application's state and to run similarity queries over the embedded descriptions.
  • He encourages support for open-source LLMs and stresses the importance of not depending on a single company for AI development.
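The storage and query side described above can be sketched with pgvector as below. This is a hedged sketch, not the author's schema: the table name `photos`, the column names, and the 384-dimension embedding (the size produced by a common SentenceTransformer model such as all-MiniLM-L6-v2) are all assumptions.

```python
# Assumption: sentence-transformers' all-MiniLM-L6-v2, which outputs
# 384-dimensional embeddings. Adjust to match the model actually used.
EMBEDDING_DIM = 384

# Hypothetical schema; requires the pgvector extension in Postgres.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS photos (
    path        text PRIMARY KEY,
    description text,
    embedding   vector({EMBEDDING_DIM})
);
"""

INSERT_SQL = "INSERT INTO photos (path, description, embedding) VALUES (%s, %s, %s)"

# pgvector's <=> operator is cosine distance: smaller means more similar,
# so ordering ascending returns the closest matches first.
QUERY_SQL = """
SELECT path, description
FROM photos
ORDER BY embedding <=> %s::vector
LIMIT 10
"""


def to_pgvector_literal(embedding: list) -> str:
    """Format a Python list of floats as pgvector's text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"
```

To search, embed the query text with the same sentence-embedding model used for the descriptions, pass the resulting vector literal as the parameter to `QUERY_SQL`, and execute it through any Postgres driver (e.g. psycopg).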