Notice: This resource is provided by a third-party author. Review the code manually or with AI tools before use to ensure security and compatibility.
C++ | vectorch-ai/ScaleLLM

ScaleLLM

A high-performance inference system for large language models, designed for production environments.

60.6/100
500 | Forks: 41

Similar Projects

cactus

86

Low-latency AI engine for mobile devices & wearables

C++ | 5.3K

vllm-ascend

78

Community maintained hardware plugin for vLLM on Ascend

C++ | 2.2K

ZhiLight

58

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ | 905

tiny-vllm

52

Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

C++ | 776