Large-scale LLM inference engine
Yet Another Language Model: LLM inference in C++/CUDA, with no external libraries except for I/O
A highly optimized LLM inference acceleration engine for Llama and its variants.
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
Database for AI: store, query, version, and visualize any AI data (vectors, images, text, video, and more). Use with LLMs/LangChain. Stream data in real time to PyTorch/TensorFlow. https://activeloop.ai