GitHub - unum-cloud/uform: Multi-Modal AI inference library for Multi-Lingual Text, Image, and Video Search, Recommendations, and other Vision-Language tasks, up to 5x faster than OpenAI CLIP 🖼️ & 🖋️

Aug 18, 2023 - github.com
UForm is a multi-modal inference library designed to encode multi-lingual texts, images, and soon audio, video, and documents into a shared vector space. It comes with pre-trained networks and is available on HuggingFace. The library supports three types of multi-modal encoding: late-fusion, early-fusion, and mid-fusion models. Late-fusion models encode each modality independently, so item embeddings can be computed once and reused, which suits retrieval over extensive collections. Early-fusion models encode both modalities jointly, which makes them better suited to re-ranking a relatively small set of retrieved results. Mid-fusion models combine the two: they encode each modality separately and enrich the resulting representations with a cross-attention mechanism.
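To make the trade-off between late and early fusion concrete, here is a rough cost sketch that simply counts model forward passes. It is not part of UForm; the function names and the example collection sizes are assumptions chosen for illustration.

```python
# Illustrative cost model only; these helpers are hypothetical, not UForm API.

def late_fusion_passes(num_items: int, num_queries: int) -> int:
    """Each item and each query is encoded once, independently.
    Comparing the resulting vectors is a cheap vector operation."""
    return num_items + num_queries


def early_fusion_passes(num_items: int, num_queries: int) -> int:
    """Every (query, item) pair needs its own joint forward pass."""
    return num_items * num_queries


def two_stage_passes(num_items: int, num_queries: int, shortlist: int) -> int:
    """Late fusion over the full collection, then early fusion
    only over a shortlist of candidates per query."""
    rerank = num_queries * min(shortlist, num_items)
    return late_fusion_passes(num_items, num_queries) + rerank


if __name__ == "__main__":
    items, queries = 1_000_000, 100  # assumed sizes
    print(late_fusion_passes(items, queries))    # 1000100
    print(early_fusion_passes(items, queries))   # 100000000
    print(two_stage_passes(items, queries, 50))  # 1005100
```

Under these assumed sizes, scoring every pair jointly costs roughly a hundred times more passes than encoding independently, which is why joint scoring is usually reserved for a short candidate list.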

UForm provides a range of models with different architectures and languages. The multilingual models were trained on a language-balanced dataset. The library also provides additional tools to calculate semantic compatibility between an image and a text, such as Cosine Similarity and Matching Score. Cosine Similarity is computationally cheap and suitable for retrieval in large collections, while Matching Score captures fine-grained features and is suitable for re-ranking.
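The retrieve-then-re-rank pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not UForm's implementation: the embeddings are taken as already computed, and `matching_score` is a hypothetical callable standing in for whatever joint model produces the fine-grained score.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query_vec: Vector,
    item_vecs: Sequence[Vector],
    matching_score: Callable[[int], float],  # hypothetical: item index -> joint score
    top_k: int,
) -> List[int]:
    """Stage 1: shortlist the top_k items by cosine similarity (cheap).
    Stage 2: reorder only the shortlist by the costlier matching score."""
    by_cosine = sorted(
        range(len(item_vecs)),
        key=lambda i: cosine_similarity(query_vec, item_vecs[i]),
        reverse=True,
    )
    shortlist = by_cosine[:top_k]
    return sorted(shortlist, key=matching_score, reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    # Made-up joint scores for illustration only.
    fake_scores = {0: 0.2, 1: 0.1, 2: 0.9, 3: 0.0}
    print(retrieve_then_rerank(query, items, fake_scores.__getitem__, top_k=2))
```

The expensive scorer is called only on the shortlist, so its cost scales with `top_k` rather than with the size of the collection.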

Key takeaways:

  • UForm is a multi-modal inference library designed to encode multi-lingual texts, images, and soon audio, video, and documents into a shared vector space.
  • It offers three types of multi-modal encoding: late-fusion models, early-fusion models, and mid-fusion models, each with different capabilities and use cases.
  • The UForm library is efficient and can be run on various platforms, from large servers to mobile phones, and is available on HuggingFace.
  • It also provides tools to calculate semantic compatibility between an image and a text, namely Cosine Similarity and Matching Score.