Lab Notes: Transcription with Vibe and Buzz

By Nelson Garcia

Vibe and Buzz are free audio/video transcription programs. 

Vibe

Vibe is an on-device transcription program that takes an audio or video file and converts the speech in it into text, which you can then export in several formats. It is simple to use, and it runs on Mac, Windows, and Linux. Both the application and the engine it relies on are open source. It uses OpenAI’s Whisper engine to do the transcription.

Useful features include:

  • It can function offline.
  • It can transcribe over 90 different languages.
  • It has diarization (speaker recognition) and lets you edit or change speakers.
  • Transcripts can be saved in various formats, including .txt, .srt, .docx, and .pdf.
  • It can summarize transcripts with Ollama or Claude (additional steps required).
  • It is optimized for GPU (Graphics Processing Unit).
  • It will NOT translate from English to another language, but it can translate into English from over 90 languages.
  • Note: Like other transcription programs, it doesn’t handle crosstalk very well.

Where to download it:

https://thewh1teagle.github.io/vibe

How to Get Started with Vibe

Step | What You Do | Notes / Tips
1. Download / Install | Go to Vibe’s GitHub Releases page and download the version for your OS. | Windows users can get the .exe file, macOS users the .dmg or .app file.
2. Launch the App | Run Vibe. It will open a GUI where you can load files or use a microphone. | The interface is intuitive and cross-platform.
3. Add an Audio / Video File | Import an audio or video file to transcribe. | Supports common formats like MP3, WAV, MP4, MOV, and more.
4. Configure Settings | Choose your transcription model, set language, enable GPU if available, etc. | GPU acceleration can speed up transcription significantly.
5. Run Transcription | Click the ‘Transcribe’ button to begin. | The app displays a live progress indicator while transcribing.
6. Review & Edit | Review the text and correct errors or punctuation. | Transcripts may require manual cleanup for accuracy.
7. Export / Save | Save your output in TXT, SRT, PDF, or DOCX formats. | SRT or VTT formats are ideal for captions.
8. (Optional) Summarize / Process | Use summarization or analysis features if enabled. | Integrates with local AI models like Ollama.
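
To make the SRT export in step 7 less abstract, here is a minimal Python sketch of what an SRT caption file contains: a running index, a start and end time written as HH:MM:SS,mmm, and the caption text. This is not Vibe's own code. The function names and the (start, end, text) tuple layout are assumptions chosen for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is a hypothetical stand-in for whatever a
    transcription tool produces internally.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Caption blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")]
    print(segments_to_srt(demo))
```

Knowing this layout helps when you need to fix a caption by hand in a text editor after exporting.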

Tips for Getting Good Results

  • Use clear, high-quality audio with minimal background noise.
  • Choose a model size that balances speed and accuracy. Enable GPU acceleration if available for faster processing.
  • Split very long files into smaller segments for better accuracy.
  • Adjust timestamp and segmentation settings for subtitles.
  • Always review and correct transcripts before final use.
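
One way to split a long recording before loading it into a transcription program is ffmpeg's segment muxer. The sketch below only builds and prints the command, so you can inspect it before running it. It assumes ffmpeg is installed separately; the function name, the 10-minute default, and the output file pattern are illustrative choices, not settings from Vibe.

```python
import shlex


def build_split_command(input_path, segment_seconds=600,
                        output_pattern="part_%03d.mp3"):
    """Return an ffmpeg command that cuts a file into fixed-length pieces.

    "-c copy" copies the audio without re-encoding it, and
    "%03d" in the pattern numbers the pieces 000, 001, and so on.
    """
    return [
        "ffmpeg",
        "-i", input_path,
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-c", "copy",
        output_pattern,
    ]


if __name__ == "__main__":
    command = build_split_command("interview.mp3")
    # Print the command for review; to execute it, pass the list to
    # subprocess.run(command, check=True).
    print(shlex.join(command))
```

Fixed-length cuts can land in the middle of a sentence, so it is worth checking the text around each boundary when you review the transcripts.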

Buzz

Buzz transcribes and translates audio offline on your personal computer, so you don’t need an internet connection for it to work. It is free and open source, and the interface is simple. It uses OpenAI’s Whisper to do the transcription. However, translation requires additional setup with either Ollama or Groq.com, which is more complicated. Despite the translation issue, Buzz does transcription very well and the program is very easy to use. It runs on Windows, Mac, and Linux systems.

Useful features include:

  • It can import many audio and video formats, such as .mp3, .wav, .m4a, .mp4, .avi, .mov, and several others.
  • It has a “Live Recording” feature, so you can transcribe using your PC’s microphone.
  • You can save your transcript in .srt, .txt, and .vtt formats.
  • It will transcribe in 100 different languages.
  • Many models are available (tiny, medium, large, etc.).
  • It was very accurate with the large model, although that model took the longest to complete.
  • Note: The translation portion is not very intuitive and requires advanced user setup.
  • Note: It does not offer diarization.

Where to download it:

https://github.com/chidiwilliams/buzz/releases/tag/v1.2.0

How to Get Started with Buzz

Step | What You Do | Notes / Tips
1. Download / Install | Go to the GitHub ‘Releases’ page and download the version for your OS. | Windows users can get the .exe file, macOS users the .dmg or .app file.
2. Launch the App | Run Buzz. It will open a GUI where you can load files or use a microphone. | The interface is intuitive and cross-platform.
3. Add an Audio / Video File | Import an audio or video file to transcribe. | Supports common formats like MP3, WAV, MP4, MOV, and more.
4. Set Options / Choose Mode | Choose Transcribe (speech-to-text) or Translate (speech-to-English). Select the model size, input language, and export option. | Advanced settings include an option for AI translation, but setting it up is very complicated for the average user.
5. Run Transcription | Click the ‘Run’ button to begin. | The app displays status progress. Larger models take longer but are more accurate.
6. Review & Edit | Review the text and correct errors or punctuation. | Transcripts may require manual cleanup for accuracy.
7. Export / Save | Export your final text in formats like TXT, SRT, or VTT for subtitles or documentation. | SRT or VTT formats are ideal for captions.
8. (Optional) Batch Processing | Buzz supports transcribing multiple files simultaneously for efficiency. | You need a robust computer for batch processing.
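
SRT and VTT, the two caption formats named in step 7, are close relatives. As a rough illustration, the Python sketch below converts SRT text to VTT by adding the WEBVTT header and switching the millisecond separator in the timing lines from a comma to a period. It is not part of Buzz, which can export VTT directly; the function name is an assumption, and the sketch ignores optional VTT features such as styling and positioning.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to a basic WebVTT document."""
    lines = []
    for line in srt_text.strip().splitlines():
        # Only touch timing lines, so commas in the captions are kept.
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # A VTT file starts with a WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nHello, there.\n"
    print(srt_to_vtt(sample))
```

The numeric caption indexes from the SRT file are left in place, since VTT allows an identifier line before each timing line.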

Tips for Getting Good Results

  • Use clear, high-quality audio to minimize transcription errors.
  • Choose larger models for higher accuracy if your system can handle them.
  • Buzz is best for transcription; translation setup is complicated.
  • Split long files into smaller segments for smoother performance.
  • Always review and correct transcripts for proper nouns and technical terms.
  • Use playback and looping to align text with audio accurately.
  • Export in SRT or VTT formats for captioning and subtitle use.
  • Keep your model and app updated for improved accuracy and stability.
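
If the same proper nouns or technical terms are misheard in every transcript, part of the review step can be scripted. The Python sketch below applies a small correction glossary to exported transcript text. It is a hypothetical helper, not a Buzz feature, and the glossary entries are made-up examples; you would replace them with the mistakes you actually see.

```python
import re


def apply_glossary(text, glossary):
    """Replace known mis-transcriptions with the preferred spelling.

    Matching is case-insensitive and limited to whole words or phrases,
    so short entries do not alter the inside of longer words.
    """
    for wrong, right in glossary.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts the text literally, even if
        # it contains backslashes.
        text = pattern.sub(lambda _match, fixed=right: fixed, text)
    return text


if __name__ == "__main__":
    # Hypothetical glossary of recurring errors.
    corrections = {"open ai": "OpenAI", "olama": "Ollama"}
    print(apply_glossary("We used open ai whisper with olama.", corrections))
```

A script like this only catches errors you already know about, so it supplements a read-through rather than replacing it.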
