Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · sierra-research/tau2-bench

tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

76.8/100
1.1K · Forks: 271

Similar Projects

InferenceX

69

Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3

Python · 868

TheAgentCompany

57

An agent benchmark with tasks in a simulated software company.

Python · 689

AI_Diplomacy

52

Frontier Models playing the board game Diplomacy.

Python · 656

AutoGPT

96

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python · 183.7K