Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
natolambert/rlhf-book (Python)

rlhf-book

Textbook on reinforcement learning from human feedback

Score: 80.3/100
Stars: 1.7K · Forks: 153
View on GitHub · Homepage

Similar Projects

distilabel

Score: 85

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python · 3.1K stars

HALOs

Score: 56

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python · 907 stars

oat

Score: 71

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python · 637 stars

AutoGPT

Score: 97

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python · 182.3K stars