⚠

Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to check security and compatibility.

Python · claw-eval/claw-eval

claw-eval

Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.

48.4/100

★ 683 · Forks: 59


Similar Projects

ClawProBench

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

Python · ★ 800

nexent

Nexent is a zero-code platform for auto-generating production-grade AI agents using Harness Engineering principles — unified tools, skills, memory, and orchestration with built-in constraints, feedback loops, and control planes.

Python · ★ 5.3K

opensquilla

OpenSquilla — Token-Efficient AI Agent with same budget, higher intelligence density

Python · ★ 4.8K

PPTAgent

An Agentic Framework for Reflective PowerPoint Generation

Python · ★ 4.7K
