Fast Multimodal LLM on Mobile Devices
Production-ready toolkit to run AI locally
High-speed Large Language Model Serving for Local Deployment
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk
A highly optimized LLM inference acceleration engine for Llama and its variants.