⚠ Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.

Python · sierra-research/tau2-bench

tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

81.0/100

★ 1.7K · Forks: 419

Similar Projects

InferenceX

Open Source Continuous Inference Benchmark Research Platform: Kimi K2.7-Code, MiniMax M3, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 & soon™ TPUv6e/v7/Trainium2/3

Python · ★ 1.3K

AI_Diplomacy

Frontier Models playing the board game Diplomacy.

Python · ★ 687

meta-agents-research-environments

Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.

Python · ★ 531

hermes-agent

The agent that grows with you

Python · ★ 220.0K
