Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · CatchTheTornado/text-extract-api

text-extract-api

Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

65.7/100
3.0K · Forks: 252

Similar Projects

MinerU

91

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python · 55.7K

skyvern

88

Automate browser based workflows with AI

Python · 20.7K

gorilla

86

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python · 12.7K

h2ogpt

66

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

Python · 12.0K