Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
C++ · andrewkchan/yalm

yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

Score: 38.7/100
Stars: 555 · Forks: 56
View on GitHub

Similar Projects

ZhiLight

Similarity: 73

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 905 stars

lemonade

Similarity: 85

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

C++ · 2.3K stars

PowerInfer

Similarity: 62

High-speed Large Language Model Serving for Local Deployment

C++ · 8.8K stars

aphrodite-engine

Similarity: 81

Large-scale LLM inference engine

C++ · 1.7K stars