Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python | JudgmentLabs/judgeval

judgeval

The Continuous-Improvement Stack for Agents. Our environment data and evals power agent improvement and monitoring.

84.2/100
1.0K | Forks: 93

Similar Projects

opik

93

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

Python | 19.5K

DeepGit

65

Deep research agent to help you find the best GitHub repositories 🕵️!

Python | 880

deer-flow

84

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles tasks of varying difficulty that can take minutes to hours.

Python | 70.8K

Decepticon

87

Autonomous Hacking Agent for Red Team

Python | 4.3K