Meta has introduced NotebookLlama, an open-source podcast-generation feature inspired by Google’s NotebookLM. Like NotebookLM, NotebookLlama processes text files, such as PDFs of news articles, into podcast-style audio. Using Meta’s Llama models, it first creates a transcript, adds 'dramatisation,' and then converts it into speech through open-source text-to-speech models.
However, early feedback suggests that NotebookLlama’s audio sounds noticeably robotic, with voices sometimes overlapping, which disrupts the flow. Meta’s researchers acknowledge these limitations and cite the text-to-speech model as the main obstacle to natural-sounding audio. “The text-to-speech model is the limitation of how natural this will sound,” they wrote on NotebookLlama’s GitHub page.
Wow! Meta dropped an open NotebookLM recipe: NotebookLlama 🔥
It uses L3.2 1B/ 3B for pre-processing the PDF, L3.1 70B for Transcript creation, L3.1 8B for re-writes and Parler TTS for Text to Speech ⚡
Step 1: Pre-process PDF: Use Llama-3.2-1B-Instruct to pre-process the PDF…
(Vaibhav (VB) Srivastav, @reach_vb, October 27, 2024)
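The four stages named in the post above can be pictured as a simple chain in which each stage's output feeds the next. The Python sketch below is only an illustration of that flow, not Meta's code: the `Stage` class, `build_pipeline`, `run_pipeline` and the placeholder lambdas are hypothetical, and the model labels are copied from the post rather than being verified model identifiers. Real model and text-to-speech calls would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    model: str  # label as quoted in the post, not a verified model ID
    run: Callable[[str], str]  # hypothetical stand-in for a real model call


def build_pipeline() -> List[Stage]:
    # Each lambda is a placeholder that only mimics the shape of the step.
    return [
        Stage("pre-process PDF", "Llama 3.2 1B/3B",
              lambda text: " ".join(text.split())),  # tidy whitespace
        Stage("write transcript", "Llama 3.1 70B",
              lambda text: f"Speaker 1: {text}"),
        Stage("rewrite / dramatise", "Llama 3.1 8B",
              lambda text: text + "\nSpeaker 2: Tell me more!"),
        Stage("text-to-speech", "Parler TTS",
              lambda text: f"<audio for {len(text.splitlines())} lines>"),
    ]


def run_pipeline(text: str, stages: List[Stage]) -> str:
    # Feed the output of each stage into the next one, in order.
    for stage in stages:
        text = stage.run(text)
    return text


if __name__ == "__main__":
    print(run_pipeline("Some   extracted\nPDF text", build_pipeline()))
```

Because the final stage depends entirely on what the text-to-speech step produces, this layout also shows why the researchers single out that model as the limit on how natural the result sounds.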
Like other AI tools of this kind, NotebookLlama is also prone to hallucinations, in which the model generates inaccurate information. Despite its promise, the project highlights the ongoing challenges in AI audio generation, particularly around audio quality and content accuracy.