Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · CatchTheTornado/text-extract-api

text-extract-api

Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

62.3/100
3.1K · Forks: 271

Similar Projects

MinerU

93

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Python · 60.9K

skyvern

88

Automate browser based workflows with AI

Python · 21.3K

gorilla

84

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python · 12.8K

XHS-Downloader

90

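Xiaohongshu (XiaoHongShu, RedNote) link extraction and post collection tool: extract links to an account's published, favorited, liked, and album posts; extract post and user links from search results; collect Xiaohongshu post information; extract Xiaohongshu post download URLs; download Xiaohongshu post files.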

Python · 10.9K