Jina AI Launches World's First Open-Source 8K Text Embedding, Rivaling OpenAI

Jina AI, a Berlin-based AI company, has announced the launch of its second-generation text embedding model, `jina-embeddings-v2`. This open-source model supports an 8K (8192 tokens) context length, matching the capabilities of OpenAI's proprietary model, `text-embedding-ada-002`. In a direct comparison, `jina-embeddings-v2` outperformed the OpenAI model in several areas including Classification Average, Reranking Average, Retrieval Average, and Summarization Average.

The new model was built from scratch over three months. Its 8K context length allows for applications in legal document analysis, medical research, literary analysis, financial forecasting, and conversational AI. The model is available in two versions: a base model for tasks requiring higher accuracy and a small model for lightweight applications. Jina AI's future plans include publishing an academic paper detailing the technical aspects of `jina-embeddings-v2`, developing an OpenAI-like embeddings API platform, and launching German-English models.

Key takeaways:

Jina AI has launched its second-generation text embedding model, `jina-embeddings-v2`, an open-source model that supports an 8K (8192 tokens) context length, matching the capabilities of OpenAI's proprietary `text-embedding-ada-002`.
The `jina-embeddings-v2` model outperforms its OpenAI counterpart in several areas, including Classification Average, Reranking Average, Retrieval Average, and Summarization Average.
The new model's 8K context length enables it to be used in various industry applications such as legal document analysis, medical research, literary analysis, financial forecasting, and conversational AI.
Jina AI plans to publish an academic paper detailing the technical intricacies and benchmarks of `jina-embeddings-v2`, develop an OpenAI-like embeddings API platform, and launch German-English models.
