Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
Python · microsoft/promptbench

promptbench

A unified evaluation framework for large language models

68.3/100
2.8K · Forks: 219

Similar Projects

promptflow

89

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.

Python · 11.0K

opencompass

86

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python · 6.7K

llm-guard

60

The Security Toolkit for LLM Interactions

Python · 2.6K

evalplus

64

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python · 1.7K