FranxYao/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients. Published in Nature.
Prompt Engineering | Prompt Versioning | Use GPT or other prompt-based models to get structured output. Join our Discord for prompt engineering, LLMs, and the latest research.
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent.
A framework for prompt tuning using Intent-based Prompt Calibration
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024