aniketmaurya/llm-inference
Large Language Model (LLM) Inference API and Chatbot
Seamlessly integrate LLMs into scikit-learn.
A high-throughput and memory-efficient inference and serving engine for LLMs
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLMs, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output and function calls.
Running Llama 2 and other open-source LLMs locally on CPU for document Q&A
Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA 2, LLaMA 3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.