
ARBML/Taqyim

Python interface for evaluation on ChatGPT models

  • Stars: 19
  • Forks: 4
  • Commits: 46
  • Language: Jupyter Notebook
  • Awesome lists: 1

Similar repositories

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

4191 stars · Python · 1 awesome list

claw-eval/claw-eval

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.

632 stars · Python · 2 awesome lists

sierra-research/tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

1284 stars · Python · 1 awesome list

google/BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

3242 stars · Python · 2 awesome lists

Tracked growth

1 capture since 2026-05-27

Latest capture 2026-05-27 12:23


Metadata

  • Created: 2023-05-27
  • First commit: 2023-05-27
  • Last pushed: 2024-02-13
  • Archived: no
  • Stack detected: —
  • License: MIT

AI development signals

No AI development config files detected.