Extract your docs (CSV, PDF, JSON, HTML, DOCS, Sheets and more) for your own GPT and LLM projects using Unstructured.io via streamlit

rririanto/unstructured-demo-streamlit

Extract your docs using Unstructured-IO

Description

This Streamlit app is designed to help you analyze and extract valuable insights from challenging data formats commonly found in enterprise settings, such as HTML, PDF, CSV, PNG, PPTX, and more.

This app uses unstructured.io as a base library, providing an easy way to extract and convert unstructured data into a format compatible with popular vector databases and LLM frameworks. With this tool, you can streamline complex data handling and ensure compatibility with your preferred data analysis pipelines.
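As a rough illustration of that extract-and-convert step, the sketch below turns a file into a list of plain dictionaries. It is not the app's actual code. It assumes the `unstructured` Python package is installed and uses its `partition` function; the helper names `elements_to_records` and `extract_records` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def elements_to_records(elements: Iterable[Any]) -> List[Dict[str, str]]:
    """Flatten extracted elements into {"type", "text"} dicts, skipping empty text.

    Assumes each element exposes `category` and `text` attributes,
    as elements returned by unstructured do.
    """
    records = []
    for element in elements:
        text = str(getattr(element, "text", "")).strip()
        if not text:
            continue  # drop whitespace-only elements
        records.append({"type": getattr(element, "category", "Unknown"), "text": text})
    return records


def extract_records(path: str) -> List[Dict[str, str]]:
    """Partition a local file and return its records (hypothetical helper)."""
    # Imported lazily so the module loads even where unstructured is not installed.
    from unstructured.partition.auto import partition

    return elements_to_records(partition(filename=path))
```

Records in this shape are easy to serialize to JSON or to pass on to an embedding step, which is the kind of downstream use the paragraph above describes.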

Supported file types:

| Category | Document Types |
| --- | --- |
| Plaintext | .txt, .eml, .msg, .xml, .html, .md, .rst, .json, .rtf |
| Images | .jpeg, .png |
| Documents | .doc, .docx, .ppt, .pptx, .pdf, .odt, .epub, .csv, .tsv, .xlsx |

Find out more at unstructured.io.
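
The supported file types listed above can be encoded as a small lookup, for example to check an upload before processing it. This is a minimal sketch using only the standard library; the names `SUPPORTED_TYPES` and `category_for` are hypothetical and not taken from the app.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported file types listed above.
SUPPORTED_TYPES = {
    "Plaintext": {".txt", ".eml", ".msg", ".xml", ".html", ".md", ".rst", ".json", ".rtf"},
    "Images": {".jpeg", ".png"},
    "Documents": {".doc", ".docx", ".ppt", ".pptx", ".pdf", ".odt", ".epub", ".csv", ".tsv", ".xlsx"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if the extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED_TYPES.items():
        if suffix in extensions:
            return category
    return None


print(category_for("report.PDF"))  # Documents
print(category_for("setup.exe"))   # None
```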

To get started, upload any supported document and it will be shown in the preview. You can also adjust the parameters to fine-tune your tests.

Accessing the App

You can access the app on Streamlit Community Cloud at https://unstructured-demo.streamlit.app/.

Getting Started

The app does not require an API key to function: extractions are processed on the Streamlit Cloud server unless you choose to process them on the unstructured.io server.

However, if you choose to use the unstructured.io API, I have included a temporary key in the app, but it may be limited. Create your own at unstructured. After obtaining your API key, select unstructured.io API, enter your own key, and upload your file.
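
The choice between local processing and the unstructured.io API might look roughly like the sketch below. It is not the app's actual logic. The helper names `choose_backend` and `run_extraction` are hypothetical, and falling back to local processing when no key is supplied is an assumption. It also assumes the `unstructured` package is installed and that your version provides `partition_via_api`.

```python
from typing import Optional


def choose_backend(use_api: bool, api_key: Optional[str]) -> str:
    """Return "api" only if the API option is selected and a non-empty key is given."""
    if use_api and api_key and api_key.strip():
        return "api"
    return "local"  # assumption: fall back to local processing without a key


def run_extraction(path: str, use_api: bool = False, api_key: Optional[str] = None):
    """Extract elements from a file with the selected backend (hypothetical helper)."""
    # unstructured is imported lazily so the selection logic works without it.
    if choose_backend(use_api, api_key) == "api":
        from unstructured.partition.api import partition_via_api

        return partition_via_api(filename=path, api_key=api_key.strip())

    from unstructured.partition.auto import partition

    return partition(filename=path)
```

Keeping the decision in a small pure function makes it easy to test without network access or an API key.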

Feedback

If you have any feedback or questions about this app, please reach out to me on Twitter at @rririanto.

Thank you for checking out the tool!
