Federated Finetuning of OpenAI's Whisper

Nov 16, 2023 - flower.dev
The blog post introduces a code example that uses Federated Learning to fine-tune OpenAI's Whisper, a state-of-the-art ASR model, for the task of keyword spotting. The approach takes a pre-trained Whisper encoder from the Hugging Face Transformers library, freezes its parameters, and federates the training of a classification head that maps 1-second audio waveforms to one of twelve possible classes. The example uses the Google SpeechCommands dataset and the Flower framework, preserving client privacy by bringing the training to the data source.
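To make that setup concrete, here is a minimal PyTorch sketch of such a model. It assumes the openai/whisper-tiny checkpoint and a simple average-pooling head; the class name WhisperKeywordSpotter and the head architecture are illustrative, not the blog post's exact code.

```python
import torch
import torch.nn as nn
from transformers import WhisperModel

class WhisperKeywordSpotter(nn.Module):
    """Frozen Whisper encoder with a small trainable classification head."""

    def __init__(self, num_classes: int = 12, checkpoint: str = "openai/whisper-tiny"):
        super().__init__()
        # Load a pre-trained Whisper model and keep only its encoder.
        self.encoder = WhisperModel.from_pretrained(checkpoint).get_encoder()
        # Freeze the encoder: only the head is trained (and federated).
        for p in self.encoder.parameters():
            p.requires_grad = False
        hidden = self.encoder.config.d_model  # 384 for whisper-tiny
        self.head = nn.Sequential(
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_features: torch.Tensor) -> torch.Tensor:
        # input_features: batch of log-mel spectrograms as produced by
        # WhisperProcessor, shape (batch, 80, 3000).
        with torch.no_grad():
            states = self.encoder(input_features).last_hidden_state
        return self.head(states.mean(dim=1))  # average-pool over time
```

Because the encoder is frozen, each training round only has to compute gradients for the small head, which is what makes on-device fine-tuning feasible.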

The post also discusses running the example on the new Raspberry Pi 5, which outperforms the previous Raspberry Pi 4 across all stages of the workload, making it suitable for demanding on-device training. It provides a detailed comparison of the time taken at each stage of the process on both boards, showing significant improvements in the newer model.

Key takeaways:

  • The blog post introduces a code example that uses Federated Learning to fine-tune OpenAI's Whisper, a state-of-the-art ASR model, for the task of keyword spotting.
  • Federated Learning allows large models trained on publicly available data to be fine-tuned on private data without copying that data to a central server, which preserves client privacy (see the client sketch after this list).
  • The example uses a pre-trained Whisper encoder to classify 1-second audio waveforms into one of twelve possible classes. The Google SpeechCommands dataset is used for this purpose.
  • The blog post also includes a benchmark of the new Raspberry Pi 5, showing its superior performance across tasks compared to the previous Raspberry Pi 4, making it suitable for demanding on-device training workloads.
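As a rough illustration of the federated side, below is a minimal Flower NumPyClient sketch in which only the classification head's weights are exchanged with the server, so raw audio never leaves the device. It assumes the WhisperKeywordSpotter model sketched above and a local PyTorch DataLoader; the names and hyperparameters are illustrative, not the blog post's exact code.

```python
import flwr as fl
import torch

class SpeechClient(fl.client.NumPyClient):
    """Trains only the classification head on this client's local audio."""

    def __init__(self, model, train_loader):
        self.model = model              # e.g. the WhisperKeywordSpotter above
        self.train_loader = train_loader
        self.criterion = torch.nn.CrossEntropyLoss()

    def get_parameters(self, config):
        # Only the head's weights ever leave the device; the audio does not.
        return [p.detach().cpu().numpy() for p in self.model.head.parameters()]

    def set_parameters(self, parameters):
        for p, new in zip(self.model.head.parameters(), parameters):
            p.data = torch.tensor(new, dtype=p.dtype)

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        optimizer = torch.optim.Adam(self.model.head.parameters(), lr=1e-3)
        self.model.train()
        for features, labels in self.train_loader:
            optimizer.zero_grad()
            loss = self.criterion(self.model(features), labels)
            loss.backward()
            optimizer.step()
        return self.get_parameters(config), len(self.train_loader.dataset), {}
```

A client process would typically connect to the server via fl.client.start_numpy_client(...), while the server aggregates the heads with a standard strategy such as FedAvg.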