Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · opendatalab/MinerU

MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

93.1/100
67.0K · Forks: 5.6K

Similar Projects

wdoc

81

Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, etc

Python · 519

json_repair

87

Repair malformed JSON from LLMs, APIs, logs, and user input in Python.

Python · 5.0K

docext

67

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

Python · 2.0K

Hands-On-AI-Engineering

69

A curated collection of practical AI projects implementing OCR systems, RAG, AI agents, and other AI use cases.

Python · 1.9K