It uses youtube_transcript_api.
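As a rough illustration of the kind of input such a script works with, here is a minimal sketch that joins transcript segments into one block of text. It assumes each segment is a dict with `text`, `start`, and `duration` keys, which is the shape older releases of youtube_transcript_api returned; newer releases may return objects instead. The helper name `segments_to_text` and the sample data are hypothetical, and no network call is made.

```python
from typing import Iterable, List, Mapping


def segments_to_text(segments: Iterable[Mapping[str, object]]) -> str:
    """Join transcript segments into a single whitespace-normalized string.

    Assumption: each segment is a mapping with a "text" key, as in the
    dict-based output of older youtube_transcript_api releases.
    """
    parts: List[str] = []
    for seg in segments:
        # Collapse newlines and repeated spaces inside a segment.
        text = " ".join(str(seg.get("text", "")).split())
        if text:  # skip segments that are empty after normalization
            parts.append(text)
    return " ".join(parts)


# Hypothetical sample data standing in for a fetched transcript.
sample = [
    {"text": "hello and welcome", "start": 0.0, "duration": 1.5},
    {"text": "to the\nchannel", "start": 1.5, "duration": 1.2},
    {"text": "  ", "start": 2.7, "duration": 0.3},
]

print(segments_to_text(sample))
```

The result is still unpunctuated running text, which is the gap the script is meant to fill.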
The product is a script that improves the formatting of YouTube transcripts by adding punctuation and structure. Users point out that YouTube offers no officially supported API for transcripts, so the script has to rely on scraping, which is inefficient and can cause bandwidth issues. Some speculate about YouTube's motives, suggesting that it wants to limit competition and keep training data costly. Positive feedback appears alongside technical discussion of tools like yt-dlp, error handling, and content-to-markdown conversion. Questions also come up about copyright, how to obtain transcripts, and how to search subtitles. The code is open source under the MIT license.
Criticism focused on poorly formatted auto-generated subtitles, the lack of a clear API endpoint from Google, and YouTube's cultural resistance to exposing a transcript API. Inefficient scraping methods were likened to pseudo-DDoS attacks. Users noted the absence of a clear competitor and behavior that limits competition, and complained that YouTube restricts data access. Others pointed out that raw transcripts lack punctuation, said it was unclear how the added punctuation is derived, and suggested retranscribing with Whisper instead. Reported technical issues included --trim-filenames not working, rg failing when timestamp metadata is present, and gibberish subtitles. There were also mentions of YouTube blocking IPs and advice to avoid API requests for transcripts.
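One way to work around searches tripping over timestamp metadata is to strip it from the subtitle file before searching. The following is a minimal sketch, assuming WebVTT input with cue timing lines, inline `<...>` timing tags, and rolling captions that repeat the previous line. The function name `vtt_to_plain_lines` and the sample text are hypothetical, and real subtitle files may need more cases handled.

```python
import re
from typing import List

# Cue timing line, e.g. "00:00:00.000 --> 00:00:02.000 align:start"
TIMESTAMP_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")
# Inline tags such as <00:00:00.500> or <c>...</c>
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_plain_lines(vtt: str) -> List[str]:
    """Return caption text lines with headers, timings, and inline tags removed."""
    lines: List[str] = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blank lines and header/metadata lines.
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue
        if TIMESTAMP_LINE.match(line):
            continue
        line = INLINE_TAG.sub("", line).strip()
        # Drop consecutive duplicates produced by rolling captions.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return lines


# Hypothetical sample in the assumed format.
sample_vtt = "\n".join([
    "WEBVTT",
    "Kind: captions",
    "Language: en",
    "",
    "00:00:00.000 --> 00:00:02.000 align:start position:0%",
    "hello<00:00:00.500><c> and</c><00:00:01.000><c> welcome</c>",
    "",
    "00:00:02.000 --> 00:00:04.000 align:start position:0%",
    "hello and welcome",
    "to<00:00:02.500><c> the</c><00:00:03.000><c> channel</c>",
])

for text_line in vtt_to_plain_lines(sample_vtt):
    print(text_line)
```

Writing the cleaned lines to a plain text file leaves something a line-oriented search tool can match against without timing tags splitting the words.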