
Meta’s AI podcast generation feature NotebookLlama debuts with room to improve

NotebookLlama first creates a transcript, adds 'dramatisation,' and then converts it into speech through open-source text-to-speech models. Early feedback suggests that its audio sounds noticeably robotic, with voices sometimes overlapping.

Social Samosa

28 Oct 2024 12:23 IST


Meta has introduced NotebookLlama, an open-source podcast-generation feature inspired by Google’s NotebookLM. Like NotebookLM, NotebookLlama processes text files, such as PDFs of news articles, into podcast-style audio. Using Meta’s Llama models, it first creates a transcript, adds 'dramatisation,' and then converts it into speech through open-source text-to-speech models.

However, early feedback suggests that NotebookLlama’s audio sounds noticeably robotic, with voices sometimes overlapping, which disrupts the flow. Meta's researchers acknowledge these limitations, citing the text-to-speech model as the main obstacle to natural-sounding audio. “The text-to-speech model is the limitation of how natural this will sound,” they wrote on NotebookLlama’s GitHub page.

Wow! Meta dropped an open NotebookLM recipe: NotebookLlama 🔥

It uses L3.2 1B/ 3B for pre-processing the PDF, L3.1 70B for Transcript creation, L3.1 8B for re-writes and Parler TTS for Text to Speech ⚡

Step 1: Pre-process PDF: Use Llama-3.2-1B-Instruct to pre-process the PDF… pic.twitter.com/L7hb5GsMtl
— Vaibhav (VB) Srivastav (@reach_vb) October 27, 2024
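
For readers who want a picture of how the four stages described in the post fit together, the following is a minimal Python sketch and not Meta's actual code. The class name `PodcastPipeline`, its fields, and the `demo_pipeline` helper are hypothetical names made up for illustration. The model labels are copied from the post as written and are not exact checkpoint identifiers. Each stage is a plain function passed in by the caller, so the demo uses simple stand-ins rather than real Llama or Parler TTS calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Model labels as reported in the post above (assumption: these are
# shorthand names, not exact checkpoint identifiers).
STAGE_MODELS = {
    "preprocess": "Llama 3.2 1B/3B",
    "transcript": "Llama 3.1 70B",
    "rewrite": "Llama 3.1 8B",
    "tts": "Parler TTS",
}

Turn = Tuple[str, str]  # (speaker label, spoken line)


@dataclass
class PodcastPipeline:
    """Hypothetical wrapper that chains the four stages in order."""
    preprocess: Callable[[str], str]        # clean up text extracted from the PDF
    write_transcript: Callable[[str], str]  # draft a podcast transcript
    dramatise: Callable[[str], List[Turn]]  # rewrite the draft into speaker turns
    synthesise: Callable[[Turn], bytes]     # turn one speaker turn into audio

    def run(self, raw_text: str) -> List[bytes]:
        clean = self.preprocess(raw_text)
        transcript = self.write_transcript(clean)
        turns = self.dramatise(transcript)
        # One audio clip per turn, kept in transcript order.
        return [self.synthesise(turn) for turn in turns]


def demo_pipeline() -> PodcastPipeline:
    """Build a pipeline from trivial stand-ins so the sketch runs offline."""
    return PodcastPipeline(
        preprocess=lambda text: " ".join(text.split()),
        write_transcript=lambda text: f"Today we discuss: {text}",
        dramatise=lambda t: [("Speaker 1", t), ("Speaker 2", "Interesting!")],
        # Stand-in "audio": the encoded text of the turn, not real speech.
        synthesise=lambda turn: f"{turn[0]}: {turn[1]}".encode("utf-8"),
    )


clips = demo_pipeline().run("Meta   releases\nNotebookLlama")
```

In a real setup, each stand-in would be replaced by a call to the corresponding model. Keeping the stages as separate functions mirrors the recipe's structure, in which a different model handles each step.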

Additionally, like other AI tools in this space, NotebookLlama is prone to hallucinations, in which the model generates inaccurate information. Despite its promise, the project highlights the ongoing challenges in AI audio generation, particularly around audio quality and content accuracy.
