Lab Notes: Transcription with Vibe and Buzz

Vibe and Buzz are free audio/video transcription programs.

Vibe

Vibe is an on-device transcription program that converts the speech in an audio or video file into text you can export in several formats. It is simple to use and runs on Mac, Windows, and Linux. Both the program and the transcription engine it uses, OpenAI's Whisper, are open source.

Useful features include:

It can function offline.
It can transcribe speech in over 90 languages.
It has diarization (speaker recognition) and lets you edit or rename speakers.
It can save transcripts in various formats, including .txt, .srt, .docx, and .pdf.
It can summarize transcripts with Ollama or Claude (additional setup required).
It is optimized for GPU (graphics processing unit) acceleration.
It will NOT translate from English into another language, but it can translate into English from over 90 languages.
Note: Like other transcription programs, it does not handle crosstalk very well.

Where to download it:

https://thewh1teagle.github.io/vibe

How to Get Started with Vibe

Step	What You Do	Notes / Tips
1. Download / Install	Go to Vibe’s GitHub Releases page and download the version for your OS.	Windows users can get the .exe file, macOS users the .dmg or .app file.
2. Launch the App	Run Vibe. It will open a GUI where you can load files or use a microphone.	The interface is intuitive and cross-platform.
3. Add an Audio / Video File	Import an audio or video file to transcribe.	Supports common formats like MP3, WAV, MP4, MOV, and more.
4. Configure Settings	Choose your transcription model, set the language, and enable GPU acceleration if available.	GPU acceleration can speed up transcription significantly.
5. Run Transcription	Click the ‘Transcribe’ button to begin.	The app displays a live progress indicator while transcribing.
6. Review & Edit	Review the text and correct errors or punctuation.	Transcripts may require manual cleanup for accuracy.
7. Export / Save	Save your output in TXT, SRT, PDF, or DOCX formats.	SRT or VTT formats are ideal for captions.
8. (Optional) Summarize / Process	Use summarization or analysis features if enabled.	Integrates with local AI models like Ollama.
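Both tools can export SRT captions. If a web video player needs WebVTT instead, the conversion is mostly mechanical: a WebVTT file begins with a `WEBVTT` header line, and its timestamps use a period instead of a comma before the milliseconds. The Python sketch below assumes a well-formed SRT export; the function name `srt_to_vtt` is hypothetical and not part of Vibe or Buzz, and it does not handle styling tags or other edge cases.

```python
import re

# Matches the start of an SRT timing line such as "00:01:02,500 --> 00:01:05,250".
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT (a minimal sketch)."""
    # WebVTT requires the "WEBVTT" header, followed by a blank line.
    out = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        # Only timing lines change: the millisecond comma becomes a period.
        # Cue numbers and caption text pass through unchanged; SRT cue
        # numbers are valid WebVTT cue identifiers.
        out.append(TIMING.sub(r"\1.\2 --> \3.\4", line.strip()))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, court.\n"
    print(srt_to_vtt(sample))
```

Save the returned text with a .vtt extension. Commas inside the caption text itself are left alone, because the pattern only matches lines that start with a timestamp.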

Tips for Getting Good Results

Use clear, high-quality audio with minimal background noise.
Choose a model size that balances speed and accuracy.
Enable GPU acceleration, if available, for faster processing.
Split very long files into smaller segments for better accuracy.
Adjust timestamp and segmentation settings for subtitles.
Always review and correct transcripts before final use.
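One way to split a long recording before loading it into Vibe (or Buzz) is with ffmpeg, a separate free command-line tool. The Python sketch below only builds the commands. It assumes ffmpeg is installed and on your PATH; the function name `split_commands`, the 10-minute chunk length, the 5-second overlap, and the output file naming are illustrative choices, not settings from either program. Copying streams with `-c copy` avoids re-encoding, but cut points can shift slightly, especially for video.

```python
import shlex
from pathlib import Path


def split_commands(src: str, total_seconds: int, chunk_seconds: int = 600,
                   overlap_seconds: int = 5) -> list[list[str]]:
    """Build ffmpeg commands that cut `src` into overlapping chunks.

    The small overlap means a word spoken at a cut point appears in both
    neighboring chunks instead of being split in half.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    path = Path(src)
    commands = []
    start, part = 0, 1
    while True:
        out = path.with_name(f"{path.stem}_part{part:02d}{path.suffix}")
        # -ss/-t before -i: seek to `start` and read `chunk_seconds` of input.
        commands.append([
            "ffmpeg", "-ss", str(start), "-t", str(chunk_seconds),
            "-i", str(path), "-c", "copy", str(out),
        ])
        end = start + chunk_seconds
        if end >= total_seconds:  # this chunk already reaches the end
            break
        start = end - overlap_seconds
        part += 1
    return commands


if __name__ == "__main__":
    # A 25-minute (1,500-second) recording, cut into roughly 10-minute pieces.
    for cmd in split_commands("interview.mp3", total_seconds=1500):
        print(shlex.join(cmd))
```

Run each printed command in a terminal, then import the resulting part files. Because of the overlap, expect the last few seconds of one transcript to repeat at the start of the next.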

Buzz

Buzz transcribes and translates audio offline on your own computer, so it does not need an internet connection. It is free and open source, has a simple interface, and uses OpenAI's Whisper for transcription. However, setting up translation requires additional configuration with either Ollama or Groq.com, which is more complicated. Despite the translation issue, Buzz transcribes very well and is easy to use. It runs on Windows, Mac, and Linux.

Useful features include:

It can import many audio and video formats, such as .mp3, .wav, .m4a, .mp4, .avi, .mov, and several others.
It has a “Live Recording” feature, so you can transcribe using your PC’s microphone.
It can save transcripts in .srt, .txt, and .vtt formats.
It can transcribe in 100 different languages.
Many models are available (tiny, medium, large, etc.).
It is very accurate with the large model, although that model took the longest to complete.
Note: The translation portion is not very intuitive and requires advanced user setup.
Note: It does not offer diarization.

Where to download it:

https://github.com/chidiwilliams/buzz/releases/tag/v1.2.0

How to Get Started with Buzz

Step	What You Do	Notes / Tips
1. Download / Install	Go to Buzz’s GitHub Releases page and download the version for your OS.	Windows users can get the .exe file, macOS users the .dmg or .app file.
2. Launch the App	Run Buzz. It will open a GUI where you can load files or use a microphone.	The interface is intuitive and cross-platform.
3. Add an Audio / Video File	Import an audio or video file to transcribe.	Supports common formats like MP3, WAV, MP4, MOV, and more.
4. Set Options / Choose Mode	Choose Transcribe (speech-to-text) or Translate (speech-to-English). Select the model size, input language, and export option.	Advanced settings include an option for AI translation, but setting it up is complicated for the average user.
5. Run Transcription	Click the ‘Run’ button to begin.	The app shows progress under Status. Larger models take longer but are more accurate.
6. Review & Edit	Review the text and correct errors or punctuation.	Transcripts may require manual cleanup for accuracy.
7. Export / Save	Export your final text in formats like TXT, SRT, VTT for subtitles or documentation.	SRT or VTT formats are ideal for captions.
8. (Optional) Batch Processing	Buzz supports transcribing multiple files at once for efficiency.	Batch processing requires a robust computer.

Tips for Getting Good Results

Use clear, high-quality audio to minimize transcription errors.
Choose larger models for higher accuracy if your system can handle them.
Buzz is best for transcription; the translation setup is complicated.
Split long files into smaller segments for smoother performance.
Always review and correct transcripts for proper nouns and technical terms.
Use playback and looping to align text with audio accurately.
Export in SRT or VTT formats for captioning and subtitle use.
Keep your model and app updated for improved accuracy and stability.
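If you notice the same name or term being mis-transcribed throughout an exported .txt file, a small correction list can speed up the review step without replacing it. The Python sketch below assumes a plain-text transcript; the `apply_glossary` name and the example entries are hypothetical, and every change should still be checked against the audio.

```python
import re


def apply_glossary(text: str, corrections: dict[str, str]) -> tuple[str, int]:
    """Replace whole-word, case-insensitive matches; return new text and count."""
    total = 0
    for wrong, right in corrections.items():
        # \b word boundaries keep "jon" from matching inside "Jonathan".
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function replacement inserts `right` literally (no backslash escapes).
        text, n = pattern.subn(lambda _m: right, text)
        total += n
    return text, total


if __name__ == "__main__":
    transcript = "Agent jon met Mr. Gonzales and Jonathan."
    fixed, count = apply_glossary(transcript, {"jon": "John", "gonzales": "Gonzalez"})
    print(fixed)  # Agent John met Mr. Gonzalez and Jonathan.
    print(count)  # 2
```

The returned count is a quick check that a correction actually fired; a count of zero usually means the misspelling in the list does not match what the transcript contains.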

nlsblog.org

National Litigation Support Blog for Federal/Community Defenders and CJA Practitioners
