⚠ Notice: This resource is provided by a third-party author. Please review the code, either with AI tools or manually, before use to ensure security and compatibility.

Python · CatchTheTornado/text-extract-api

text-extract-api

Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

Score: 62.3/100

★ 3.1K · Forks: 271



Similar Projects

MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python · ★ 60.9K

skyvern

Automate browser based workflows with AI

Python · ★ 21.3K

gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python · ★ 12.8K

XHS-Downloader

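XiaoHongShu (RedNote) link extraction and post collection tool: extracts links to an account's published, favorited, liked, and album posts; extracts post and user links from search results; collects XiaoHongShu post information; extracts XiaoHongShu post download URLs; downloads XiaoHongShu post files.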

Python · ★ 10.9K
