
Meta’s AI podcast generation feature NotebookLlama debuts with room to improve

NotebookLlama first creates a transcript, adds 'dramatisation,' and then converts it into speech through open-source text-to-speech models. Early feedback suggests that its audio sounds noticeably robotic, with voices sometimes overlapping.

Social Samosa

Meta has introduced NotebookLlama, an open-source podcast-generation feature inspired by Google’s NotebookLM. Like NotebookLM, NotebookLlama processes text files, such as PDFs of news articles, into podcast-style audio. Using Meta’s Llama models, it first creates a transcript, adds 'dramatisation,' and then converts it into speech through open-source text-to-speech models.
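The three stages described above can be pictured as a simple chain of functions. The sketch below is illustrative only and is not Meta's code: every function name, the "Host"/"Guest" speakers and the voice labels are hypothetical, and each stage is a plain-Python stub standing in for what would really be a Llama model call or a text-to-speech call.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str
    text: str


def write_transcript(source_text: str) -> list[Line]:
    """Stage 1 (stub): split the source into sentences and alternate two speakers.
    A real system would prompt a language model here instead."""
    sentences = [s.strip() for s in source_text.split(".") if s.strip()]
    speakers = ("Host", "Guest")
    return [Line(speakers[i % 2], s + ".") for i, s in enumerate(sentences)]


def dramatise(transcript: list[Line]) -> list[Line]:
    """Stage 2 (stub): add conversational colour to the guest's lines.
    Stands in for a second model pass that rewrites the transcript."""
    return [
        Line(l.speaker, ("Right, " if l.speaker == "Guest" else "") + l.text)
        for l in transcript
    ]


def synthesise(transcript: list[Line]) -> list[tuple[str, str]]:
    """Stage 3 (stub): return (voice, text) pairs in order instead of audio.
    A real system would send each pair to a text-to-speech model."""
    voices = {"Host": "voice_a", "Guest": "voice_b"}  # hypothetical voice labels
    return [(voices[l.speaker], l.text) for l in transcript]


def make_podcast(source_text: str) -> list[tuple[str, str]]:
    # Chain the stages: transcript, then dramatisation, then speech.
    return synthesise(dramatise(write_transcript(source_text)))


if __name__ == "__main__":
    demo = "Meta released NotebookLlama. It turns documents into audio."
    for voice, text in make_podcast(demo):
        print(voice, text)
```

Keeping the speech stage separate from the two text stages reflects the order the article describes: the script is finished before any audio is produced, so the voice model could in principle be swapped without touching the transcript steps.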

NotebookLlama (Image: Meta)

However, early feedback suggests that NotebookLlama’s audio sounds noticeably robotic, with voices sometimes overlapping, which disrupts the flow. Meta researchers acknowledge these limitations and point to the text-to-speech model as the main obstacle to natural-sounding audio. “The text-to-speech model is the limitation of how natural this will sound,” they wrote on NotebookLlama’s GitHub page.

Additionally, like other AI tools in this space, NotebookLlama is prone to hallucinations, in which the model generates inaccurate information. For all its promise, the project highlights the ongoing challenges in AI audio generation, particularly around audio quality and content accuracy.
