Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to ensure security and compatibility.

Python · JudgmentLabs/judgeval

judgeval

The Continuous-Improvement Stack for Agents. Our environment data and evals power agent improvement and monitoring.

84.4/100

★ 1.0K · Forks: 94

Similar Projects

opik

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

Python · ★ 20.8K

DeepGit

Deep research agent to help you find the best GitHub repositories 🕵️!

Python · ★ 893

deer-flow

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles tasks of varying complexity that can take minutes to hours.

Python · ★ 77.8K

Decepticon

Autonomous Hacking Agent for Red Team

Python · ★ 4.9K
