Too Long, Didn't Watch (TL/DW): Your Personal Research Multi-Tool - Open Source NotebookLM (eventually)


TL/DW: Too Long, Didn't Watch

Download, Transcribe & Summarize: Video + Audio + Documents + Articles & Books (WIP). All automated.

More: full-text search across everything ingested, built-in local LLM inference for those who don't want to mess with setting up an LLM, and a WebApp for interacting with the script in a more user-friendly manner (all features are exposed through it).

The demo should be working, but it may occasionally break. If it's not working, let me know. (HF dev spaces is touchy...)



Table of Contents


What is this (TL/DW)?

  • Don't care, give me code
    • Take a URL, a single video (or several), a list of URLs, or a list of local videos (one per line in a text file), feed it into the script (or GUI), and have each video transcribed (faster-whisper), summarized (your LLM of choice), and ingested into a SQLite DB.
    • git clone https://github.com/rmusser01/tldw -> cd tldw -> python -m venv .\ -> . .\scripts\activate.ps1 -> pip install -r requirements.txt
      • CLI usage: python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api openai -k tag_one tag_two tag_three
      • GUI usage: python summarize.py -gui
      • GUI with local LLM: python summarize.py -gui --local_llama (will ask you questions about which model to download)
  • Short Summary
    • Take a URL, single video, list of URLs, or list of local videos + URLs and feed it into the script and have each video transcribed (and audio downloaded if not local) using faster-whisper.
    • Transcriptions can then be shuffled off to an LLM API endpoint of your choice, whether that be local or remote.
    • Rolling summaries (i.e. chunking up the input and doing a chain of summaries) are currently supported only through OpenAI, though the included scripts will let you run the entire pipeline with exllama or vLLM.
  • Longer Summary/Goal
    • To act as an ingestion tool for a personal database. The idea is that we come across so much data, and it can all be stored as text.
    • Imagine being able to keep a copy of every talk, research paper, or article you've ever read, and to have it at your fingertips at a moment's notice.
    • Now imagine being able to ask questions about that data/information (LLM), and to string it together with other pieces of data to make sense of it all (RAG).
    • The end goal of this project is to be a personal data assistant that ingests recorded audio, videos, articles, free-form text, documents, and books as text into a SQLite DB (for now; I'd like to build a shim for ElasticSearch or similar), so that you can search across it at any time, retrieve/extract that information, and ask questions about it. (Plus it acts as a nice way of personally tagging data for possible future training of your personal AI agent :P) A small sketch of querying the DB directly follows this list.
    • And of course, this is all open-source/free, with the idea that it can massively help people in their research and learning.
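
Because everything lands in a local SQLite file, you can also poke at the data directly, outside of tldw. Below is a minimal Python sketch: the filename media_summary.db comes from this repo, but the table and column names in the search query are assumptions for illustration only, so list the real tables first and adjust.

    import sqlite3

    # Open the DB that tldw writes to (media_summary.db ships with the repo).
    conn = sqlite3.connect("media_summary.db")

    # List the actual tables first -- the query below uses an assumed schema.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    print("Tables:", tables)

    # Hypothetical search over ingested content; swap in the real table/column
    # names from the list printed above.
    for (title,) in conn.execute(
            "SELECT title FROM Media WHERE content LIKE ?", ("%transformer%",)):
        print(title)

    conn.close()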

For commercial API usage with this project, I'd suggest Claude Sonnet 3.5, Cohere Command R+, or DeepSeek. On the flip side, honestly, I would say none of them: the largest players will gaslight you and charge you money for it. Fun. From @nrose 05/08/2024 on Threads:

No, it’s a design. First they train it, then they optimize it. Optimize it for what - better answers? No. For efficiency.
Per watt. Because they need all the compute they can get to train the next model. So it’s a sawtooth.
The model declines over time, then the optimization makes it somewhat better, then in a sort of reverse asymptote,
they dedicate all their “good compute” to the next bigger model. Which they then trim down over time, so they can train
the next big model… etc etc.
None of these companies exist to provide AI services in 2024. They’re only doing it to finance the things they want to
build in 2025 and 2026 and so on, and the goal is to obsolete computing in general and become a hidden monopoly like
the oil and electric companies.
2024 service quality is not a metric they want to optimize; they’re forced to, only to maintain some directional income.

For offline LLM usage, I recommend the following fine-tuned Mistral-Instruct v0.2 model:

Alternatively, there is https://huggingface.co/microsoft/Phi-3-mini-4k-instruct, which you can get in a GGUF format from here: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf

  • Or you can let the script download and run a local server for you, using llama.cpp/llamafile and one of the above models.
    • (It'll ask you if you want to download one, and if so, which one out of a choice of 3)
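
Once the local model is up, the llamafile/llama.cpp server also exposes an OpenAI-style HTTP API, so you can query it from your own code while tldw uses it. A rough sketch, assuming the server is listening on llama.cpp's default port 8080 (adjust the host/port to whatever the script actually launches):

    import requests

    # Ask the locally running llamafile/llama.cpp server a question.
    # Port 8080 is the llama.cpp server default; change it if the script picks another.
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "user",
                 "content": "Summarize the key points of the attached transcript."}
            ],
            "temperature": 0.7,
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])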

CLI Screenshot

GUI Screenshot (tldw-summarization-gui-demo)


Quickstart

  1. Update your drivers (i.e. CUDA for Nvidia GPUs, or ROCm drivers for AMD GPUs).
  2. Install Python3 for your platform - https://www.python.org/downloads/
  3. Download the repo: git clone https://github.com/rmusser01/tldw or manually download it (Green code button, upper right corner -> Download ZIP) and extract it to a folder of your choice.
  4. Open a terminal, navigate to the directory you cloned the repo to, or unzipped the downloaded zip file to, and run the following commands:
    • Create a virtual env: python -m venv .\
    • Launch/activate your virtual env: . .\scripts\activate.ps1
      • If you don't already have CUDA installed (Nvidia): py -m pip install --upgrade pip wheel & pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
      • Or AMD (Windows): pip install torch-directml
      • Or CPU Only: pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
    • pip install -r requirements.txt - may take a bit of time...
  5. You are Ready to Go! Check out the below sample commands:
  • Transcribe audio from a Youtube URL:

    • python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s
  • Transcribe audio from a Youtube URL & Summarize it using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:

    • python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>
      • Make sure to put your API key into config.txt under the appropriate API variable
  • Transcribe a list of Youtube URLs & Summarize them using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:

    • python summarize.py ./ListofVideos.txt -api <your choice of API>
      • Make sure to put your API key into config.txt under the appropriate API variable
  • Transcribe & Summarize a List of Videos on your local filesystem with a text file:

    • python summarize.py -v ./local/file_on_your/system
  • Download a Video with Audio from a URL:

    • python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s
  • Perform a summarization of a longer transcript using 'Chunking'

    • python summarize.py -roll -detail 0.01 https://www.youtube.com/watch?v=4nd1CDZP21s
      • Detail can go from 0.01 to 1.00, in increments of 0.01.
  • Run it as a WebApp

    • python summarize.py -gui - This requires you to either stuff your API keys into the config.txt file, or pass them into the app every time you want to use it.
      • It exposes every CLI option, and has a nice toggle to make it 'simple' vs 'Advanced'
      • Has an option to download the generated transcript, and summary as text files from the UI.
      • Can also download video/audio as files if selected in the UI (WIP - doesn't currently work)
      • Gives you access to the whole SQLite DB backing it, with search, tagging, and export functionality
        • Yes, that's right. Everything you ingest, transcribe and summarize is tracked through a local(!) SQLite DB.
        • So everything you consume along your research path gets tracked, assimilated, and tagged.
        • All into a shareable, single-file DB that is open source and extremely well documented. (The DB format, not this project :P)
  • Convert an epub book to text and ingest it into the DB

    1. Download/Install pandoc for your platform: https://pandoc.org/installing.html
    2. Convert your epub to a text file:
      • $ pandoc -f epub -t plain -o filename.txt filename.epub
    3. Ingest your converted epub into the DB:
      • python summarize.py path/to/your/textfile.txt --ingest_text_file --text_title "Book Title" --text_author "Author Name" -k additional,keywords
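
If you have a whole folder of epubs, those two steps are easy to batch. A small sketch that loops pandoc and the ingest command over a directory; the folder paths are placeholders, and it assumes pandoc and the venv are already set up as described above:

    import subprocess
    from pathlib import Path

    BOOKS = Path("./books")        # placeholder: folder of .epub files
    OUT = Path("./converted")      # placeholder: where the plain-text copies go
    OUT.mkdir(exist_ok=True)

    for epub in BOOKS.glob("*.epub"):
        txt = OUT / (epub.stem + ".txt")
        # Step 1: epub -> plain text via pandoc
        subprocess.run(["pandoc", "-f", "epub", "-t", "plain",
                        "-o", str(txt), str(epub)], check=True)
        # Step 2: ingest the converted text into the tldw DB
        subprocess.run(["python", "summarize.py", str(txt),
                        "--ingest_text_file",
                        "--text_title", epub.stem,
                        "--text_author", "Unknown",
                        "-k", "books"], check=True)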

Setting it up

  • Requirements

  • Linux

    1. Download necessary packages (Python3, ffmpeg - sudo apt install ffmpeg / dnf install ffmpeg, Update your GPU Drivers/CUDA drivers if you'll be running an LLM locally)
    2. Open a terminal, navigate to the directory you want to install the script in, and run the following commands:
    3. git clone https://github.com/rmusser01/tldw
    4. cd tldw
    5. Create a virtual env: python -m venv ./
    6. Launch/activate your virtual environment: source ./bin/activate
    7. Setup the necessary python packages:
    8. Then see Linux && Windows
  • Windows

    1. Download necessary packages (Python3, Update your GPU drivers/CUDA drivers if you'll be running an LLM locally, ffmpeg will be installed by the script)
    2. Open a terminal, navigate to the directory you want to install the script in, and run the following commands:
    3. git clone https://github.com/rmusser01/tldw
    4. cd tldw
    5. Create a virtual env: python -m venv ./
    6. Launch/activate your virtual env: PowerShell: . .\scripts\activate.ps1 or for CMD: .\scripts\activate.bat
    7. Setup the necessary python packages:
    8. See Linux && Windows
  • Linux && Windows

    1. pip install -r requirements.txt - may take a bit of time...
    2. Script Usage:
      • Put your API keys and settings in the config.txt file.
        • This is where you'll put your API keys for the LLMs you want to use, as well as any other settings you want to have set by default. (Like the IP of your local LLM to use for summarization)
      • (make sure you're in the python venv - run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
      • Run python ./summarize.py <video_url> - The video URL does not have to be a YouTube URL. It can be any site that yt-dlp supports.
      • You'll then be asked if you'd like to run the transcription through GPU(1) or CPU(2).
        • Next, the video will be downloaded to the local directory by yt-dlp.
        • Then the video will be transcribed by faster_whisper. (You can see this in the console output)
          • The resulting transcription output will be stored as both a json file with timestamps, as well as a txt file with no timestamps.
      • Finally, you can have the transcription summarized through feeding it into an LLM of your choice.
    3. GUI Usage:
      • Put your API keys and settings in the config.txt file.
        • This is where you'll put your API keys for the LLMs you want to use, as well as any other settings you want to have set by default. (Like the IP of your local LLM to use for summarization)
      • (make sure you're in the python venv - Run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
      • Run python ./summarize.py -gui - This will launch a webapp that will allow you to interact with the script in a more user-friendly manner.
        • You can pass in the API keys for the LLMs you want to use in the config.txt file, or pass them in when you use the GUI.
        • You can also download the generated transcript and summary as text files from the UI.
        • You can also download the video/audio as files from the UI. (WIP - doesn't currently work)
        • You can also access the SQLite DB that backs the app, with search, tagging, and export functionality.
    4. Local LLM with the Script Usage:
      • (make sure you're in the python venv - Run source ./bin/activate or .\scripts\activate.ps1 or .\scripts\activate.bat from the tldw directory)
      • I recognize some people may like the functionality and idea of it all, but don't necessarily know/want to know about LLMs/getting them working, so you can also have the script download and run a local model, using system RAM and llamafile/llama.cpp.
      • Simply pass --local_llm to the script (python summarize.py --local_llm), and it'll ask you if you want to download a model, and which one you'd like to download.
      • Then, after downloading and selecting a model, it'll launch the model using llamafile, so you'll have a browser window/tab opened with a frontend to the model/llama.cpp server.
      • You'll also have the GUI open in another tab a couple of seconds after the model is launched, like normal.
      • You can then interact with both at the same time, being able to ask questions directly to the model, or have the model ingest output from the transcript/summary and use it to ask questions you don't necessarily care to have stored within the DB. (All transcripts, URLs processed, prompts used, and summaries generated, are stored in the DB, so you can always go back and review them or re-prompt with them)
  • Setting up Epub to Markdown conversion with Pandoc

  • Converting Epub to markdown

    • pandoc -f epub -t markdown -o output.md input.epub
  • Setting up PDF to Markdown conversion with Marker

    • Linux
      1. sudo apt install python3-venv
      2. python3 -m venv ./Helper_Scripts/marker_venv
      3. source ./Helper_Scripts/marker_venv/bin/activate
      4. pip install marker-pdf
    • Windows
      1. Install python3 from https://www.python.org/downloads/
      2. Create the venv: python -m venv .\Helper_Scripts\marker_venv
      3. Activate it: .\Helper_Scripts\marker_venv\Scripts\activate.ps1
      4. pip install marker-pdf
  • Converting PDF to markdown

    • Convert a Single PDF to Markdown:
      • marker_single /path/to/file.pdf /path/to/output/folder --batch_multiplier 2 --langs English
    • Convert a Folder of PDFs to Markdown:
      • marker /path/to/folder/with/pdfs /path/to/output/folder --batch_multiplier 2 --langs English
  • Ingest Converted text files en-masse

    • python summarize.py <path_to_text_file> --ingest_text_file --text_title "Title" --text_author "Author Name" -k additional,keywords
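
The same en-masse idea works for Marker output: convert a folder of PDFs once, then ingest every generated markdown file. A hedged sketch; the paths are placeholders, and it assumes Marker writes one .md per PDF somewhere under the output folder:

    import subprocess
    from pathlib import Path

    PDF_DIR = Path("./pdfs")          # placeholder: folder of source PDFs
    MD_DIR = Path("./pdf_markdown")   # placeholder: Marker output folder

    # Convert the whole folder in one go (same flags as the Marker command above).
    subprocess.run(["marker", str(PDF_DIR), str(MD_DIR),
                    "--batch_multiplier", "2", "--langs", "English"], check=True)

    # Ingest each resulting markdown file into the tldw DB.
    for md in MD_DIR.rglob("*.md"):
        subprocess.run(["python", "summarize.py", str(md),
                        "--ingest_text_file",
                        "--text_title", md.stem,
                        "-k", "pdf,marker"], check=True)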

Using tldw

  • Single file (remote URL) transcription
    • Single URL: python summarize.py https://example.com/video.mp4
  • Single file (local) transcription
    • Transcribe a local file: python summarize.py /path/to/your/localfile.mp4
  • Multiple files (local & remote)
    • List of Files (can be URLs and local files mixed): python summarize.py ./path/to/your/text_file.txt
  • Download and run an LLM using only your system RAM! (Need at least 8GB Ram, realistically 12GB)
    • python summarize.py -gui --local_llama

Save time and use the config.txt file; it allows you to set these settings and have them applied whenever the script is run.

positional arguments:
  input_path            Path or URL of the video

options:
  -h, --help            show this help message and exit
  -v, --video           Download the video instead of just the audio
  -api API_NAME, --api_name API_NAME
                        API name for summarization (optional)
  -key API_KEY, --api_key API_KEY
                        API key for summarization (optional)
  -ns NUM_SPEAKERS, --num_speakers NUM_SPEAKERS
                        Number of speakers (default: 2)
  -wm WHISPER_MODEL, --whisper_model WHISPER_MODEL
                        Whisper model (default: small)| Options: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3
  -off OFFSET, --offset OFFSET
                        Offset in seconds (default: 0)
  -vad, --vad_filter    Enable VAD filter
  -log {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Log level (default: INFO)
  -gui, --user_interface
                        Launch the Gradio user interface
  -demo, --demo_mode    Enable demo mode
  -prompt CUSTOM_PROMPT, --custom_prompt CUSTOM_PROMPT
                        Pass in a custom prompt to be used in place of the existing one.
                         (Probably should just modify the script itself...)
  -overwrite, --overwrite
                        Overwrite existing files
  -roll, --rolling_summarization
                        Enable rolling summarization
  -detail DETAIL_LEVEL, --detail_level DETAIL_LEVEL
                        Mandatory if rolling summarization is enabled, defines the chunk  size.
                         Default is 0.01(lots of chunks) -> 1.00 (few chunks)
                         Currently only OpenAI works.
  -model LLM_MODEL, --llm_model LLM_MODEL
                        Model to use for LLM summarization (only used for vLLM/TabbyAPI)
  -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...]
                        Keywords for tagging the media, can use multiple separated by spaces (default: cli_ingest_no_tag)
  --log_file LOG_FILE   Where to save logfile (non-default)
  --local_llm           Use a local LLM from the script(Downloads llamafile from github and 'mistral-7b-instruct-v0.2.Q8' - 8GB model from Huggingface)
  --server_mode         Run in server mode (This exposes the GUI/Server to the network)
  --share_public SHARE_PUBLIC
                        This will use Gradio's built-in ngrok tunneling to share the server publicly on the internet. Specify the port to use (default: 7860)
  --port PORT           Port to run the server on (default: 7860)


Sample commands:
    1. Simple Sample command structure:
        summarize.py <path_to_video> -api openai -k tag_one tag_two tag_three

    2. Rolling Summary Sample command structure:
        summarize.py <path_to_video> -api openai -prompt "custom_prompt_goes_here-is-appended-after-transcription" -roll -detail 0.01 -k tag_one tag_two tag_three

    3. FULL Sample command structure:
        summarize.py <path_to_video> -api openai -ns 2 -wm small.en -off 0 -vad -log INFO -prompt "custom_prompt" -overwrite -roll -detail 0.01 -k tag_one tag_two tag_three

    4. Sample command structure for UI:
        summarize.py -gui -log DEBUG
  • Download Audio only from URL -> Transcribe audio:

    python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s

  • Transcribe audio from a Youtube URL & Summarize it using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:

    python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>

    • Make sure to put your API key into config.txt under the appropriate API variable
  • Download Video with audio from URL -> Transcribe audio from Video:

    python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s

  • Download Audio+Video from a list of videos in a text file (can be file paths or URLs) and have them all summarized:

    python summarize.py --video ./local/file_on_your/system --api_name <API_name>

  • Transcribe & Summarize a List of Videos on your local filesystem with a text file:

    python summarize.py -v ./local/file_on_your/system

  • Run it as a WebApp:

    python summarize.py -gui

By default, videos, transcriptions and summaries are stored in a folder with the video's name under './Results', unless otherwise specified in the config file.


Setting up a Local LLM Inference Engine


Pieces & What's in the original repo?

  • What's in the Repo currently?
    1. summarize.py - Main script for downloading, transcribing, and summarizing videos, audio files, books and documents.
    2. config.txt - Config file used for settings for main app.
    3. requirements.txt - Packages to install for Nvidia GPUs
    4. AMD_requirements.txt - Packages to install for AMD GPUs
    5. llamafile - Llama.cpp wrapper for local LLM inference, is multi-platform and multi-LLM compatible.
    6. media_summary.db - SQLite DB that stores all the data ingested, transcribed, and summarized.
    7. prompts.db - SQLite DB that stores all the prompts.
    8. App_Function_Libraries Folder - Folder containing all of the application's function libraries
    9. Tests Folder - Folder containing tests for the application (ha.)
    10. Helper_Scripts - Folder containing helper scripts for the application
    11. HF - Docker file and requirements.txt for Huggingface Spaces hosting
    12. models - Folder containing the models used for speaker diarization
    13. tldw-original-scripts - Original scripts from the original repo
  • What's in the original repo?
    • summarize.py - download, transcribe and summarize audio
      1. First uses yt-dlp to download audio(optionally video) from supplied URL
      2. Next, it uses ffmpeg to convert the resulting .m4a file to .wav
      3. Then it uses faster_whisper to transcribe the .wav file to .txt
      4. After that, it uses pyannote to perform 'diarization'
      5. Finally, it'll send the resulting txt to an LLM endpoint of your choice for summarization of the text. (A stripped-down sketch of steps 1-3 follows after this list.)
    • chunker.py - break text into parts and prepare each part for LLM summarization
    • roller-*.py - rolling summarization
      • can-ai-code - interview executors to run LLM inference
    • compare.py - prepare LLM outputs for webapp
    • compare-app.py - summary viewer webapp
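
For anyone curious what that pipeline looks like stripped to its parts, here is a rough, hedged sketch of the same download -> convert -> transcribe flow using yt-dlp and faster-whisper directly. The model size ("small", this project's default) and the file names are assumptions, and diarization plus the final LLM call are left out:

    import subprocess
    import yt_dlp
    from faster_whisper import WhisperModel

    URL = "https://www.youtube.com/watch?v=4nd1CDZP21s"

    # 1. Download the audio track with yt-dlp (saved as audio.m4a regardless of
    #    which format the site actually serves; ffmpeg sniffs the real container).
    with yt_dlp.YoutubeDL({"format": "bestaudio[ext=m4a]/bestaudio",
                           "outtmpl": "audio.m4a"}) as ydl:
        ydl.download([URL])

    # 2. Convert to 16 kHz mono WAV with ffmpeg for Whisper.
    subprocess.run(["ffmpeg", "-y", "-i", "audio.m4a",
                    "-ar", "16000", "-ac", "1", "audio.wav"], check=True)

    # 3. Transcribe with faster-whisper.
    model = WhisperModel("small", compute_type="auto")
    segments, _info = model.transcribe("audio.wav", vad_filter=True)
    with open("audio.txt", "w", encoding="utf-8") as f:
        for seg in segments:
            f.write(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}\n")

    # 4. Hand audio.txt to the LLM endpoint of your choice for summarization.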

Similar/Other projects:


Credits
