GreyDGL/PentestGPT
Automated Penetration Testing Agentic Framework Powered by Large Language Models
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Collection of evals for Inspect AI