Notice: This resource is provided by a third-party author. Please review the code with AI tools or manually before use to ensure security and compatibility.
C++ · Tiiny-AI/PowerInfer

PowerInfer

High-speed Large Language Model Serving for Local Deployment

Score: 57.2/100
Stars: 9.4K · Forks: 563

Similar Projects

lemonade

Score: 86

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

C++ · Stars: 3.6K

ZhiLight

Score: 65

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · Stars: 904

deeplake

Score: 79

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

C++ · Stars: 9.1K

distributed-llama

Score: 80

Distributed LLM inference: connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.

C++ · Stars: 2.9K