StatAI Lab

StatEval — Benchmarking Statistical Reasoning in Large Language Models

We are excited to announce StatEval, the first benchmark for evaluating the statistical reasoning of large language models that is systematically organized along both difficulty and disciplinary axes. StatEval was developed by Professor Fan Zhou's team at Shanghai University of Finance and Economics.

StatEval includes two carefully curated datasets:

Both test sets are publicly available and can be accessed on Hugging Face:
