Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · sierra-research/tau2-bench

tau2-bench

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

74.3/100
816 · Forks: 203

Similar Projects

TheAgentCompany

58

An agent benchmark with tasks in a simulated software company.

Python · 648

InferenceX

69

Open Source Continuous Inference Benchmarking: Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3

Python · 638

AI_Diplomacy

59

Frontier models playing the board game Diplomacy.

Python · 633

langchain

94

The agent engineering platform

Python · 128.7K