GitHub - JigsawStack/insanely-fast-whisper-api

Feb 20, 2024 - github.com
The article introduces the Insanely Fast Whisper API, a tool for transcribing audio with OpenAI's Whisper Large v3. The API is powered by Transformers, Optimum, and flash-attn, and is deployable on GPU-equipped cloud infrastructure for scalable production use cases. It is tailored to Fly.io's recently launched GPU service, but it can also be deployed on any VM environment with GPU and Docker support. The API has been benchmarked on an Nvidia A100 (80 GB) on Fly.io's GPU infrastructure, posting strong times for transcribing 150 minutes of audio.

The article provides detailed instructions for deploying the API on Fly.io and other cloud providers, as well as for running it locally. It also notes that the API will soon be available as a fully managed service on JigsawStack, a platform that provides powerful APIs for various use cases. The API supports several features, including language auto-detection, parallel batch computation, speaker diarization, and webhook calls on completion or error. The article concludes by acknowledging the contributions of Vaibhav Srivastav and OpenAI Whisper to the project.
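To make that feature list concrete, the sketch below shows roughly how such an API might be called from Python once deployed. The endpoint path, field names (url, task, batch_size, diarise_audio, webhook), and the x-admin-api-key header are illustrative assumptions rather than the project's documented interface; the repository should be checked for the actual schema.

```python
# Hypothetical client sketch -- endpoint path, field names, and auth header
# are illustrative assumptions, not the project's documented API.
import requests

API_BASE = "https://your-whisper-api.fly.dev"   # assumed deployment URL
ADMIN_KEY = "your-auth-token"                   # assumed auth token value

payload = {
    "url": "https://example.com/meeting.mp3",   # audio file to transcribe
    "task": "transcribe",                        # language is auto-detected if unspecified
    "batch_size": 24,                            # parallel batch computation
    "diarise_audio": True,                       # speaker diarization
    "webhook": {"url": "https://example.com/hooks/transcripts"},  # called on completion or error
}

resp = requests.post(
    f"{API_BASE}/",
    json=payload,
    headers={"x-admin-api-key": ADMIN_KEY},
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```

In a setup like this, the webhook is what lets long transcriptions finish without holding an HTTP connection open: the service posts the result (or an error) back to the caller when the job completes.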

Key takeaways:

  • The Insanely Fast Whisper API uses OpenAI's Whisper Large v3 to transcribe audio quickly and efficiently, and it can be deployed on any cloud infrastructure that supports GPUs and Docker (a rough sketch of the kind of Transformers pipeline behind this approach follows the list).
  • The API has been benchmarked on an Nvidia A100 (80 GB) on Fly.io's GPU infrastructure, with strong transcription times.
  • The project can be deployed on Fly.io's GPU service; instructions for deployment, including setting up speaker diarization and an auth token, are provided.
  • The API will soon be available as a fully managed API on JigsawStack, a platform that provides powerful APIs for various use cases while keeping costs low.
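For context on how this kind of speed is typically achieved, the following minimal sketch shows a Transformers ASR pipeline configured along the same lines: Whisper Large v3 in half precision, flash-attention, chunked audio, and parallel batch decoding. It is a simplified illustration of the approach, not the project's actual serving code; the parameter values are assumptions.

```python
# Minimal sketch of a chunked, batched Whisper pipeline (not the project's
# actual serving code). Assumes a CUDA GPU, ffmpeg, and flash-attn installed.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",              # the model the API is built around
    torch_dtype=torch.float16,                    # half precision on GPU
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},  # flash-attn kernels
)

result = asr(
    "meeting.mp3",              # any audio file ffmpeg can decode
    chunk_length_s=30,          # split long audio into 30 s chunks
    batch_size=24,              # decode chunks in parallel batches
    return_timestamps=True,     # language is auto-detected when not specified
)
print(result["text"])
```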