Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
C++ · Tiiny-AI/PowerInfer

PowerInfer

High-speed Large Language Model Serving for Local Deployment

Score: 62.1/100
Stars: 8.8K · Forks: 498
View on GitHub

Similar Projects

lemonade

Score: 85

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk

C++ · 2.3K stars

ZhiLight

Score: 73

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 905 stars

yalm

Score: 39

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ · 555 stars

distributed-llama

Score: 76

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference: more devices mean faster inference.

C++ · 2.9K stars