StatAI Lab

StatEval — Benchmarking Statistical Reasoning in Large Language Models

2026-02-15

We are excited to announce StatEval, a benchmark for evaluating the statistical reasoning of large language models. Developed by the team of Professor Fan Zhou at Shanghai University of Finance and Economics, it is the first such benchmark systematically organized along both difficulty and disciplinary axes.

StatEval includes two carefully curated datasets:

Foundational Knowledge Dataset — over 13,000 problems sourced from 50+ textbooks, covering the full spectrum of foundational statistical knowledge.
Statistical Research Dataset — over 2,000 proof-based questions collected from 18 top-tier journals in statistics, probability, econometrics, and machine learning.

Both test sets are publicly available on Hugging Face.

New paper accepted at JCGS

2026-01-15

Our paper, “Spatio-Temporal Prediction of Fine-Grained Origin-Destination Matrices with Applications to Ridesharing” (Authors: Run Yang, Runpeng Dai, Siran Gao, Xiaocheng Tang, Fan Zhou, Hongtu Zhu), has been accepted by the Journal of Computational and Graphical Statistics (JCGS). This work develops new spatio-temporal prediction methods for fine-grained origin-destination matrices, with applications to optimizing ridesharing platforms. Congratulations to all authors!

New paper accepted at EACL 2026

2026-01-10

Our paper, “Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models” (Authors: Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu), has been accepted at the 2026 Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026). This work investigates the security vulnerabilities of large language models, with the goal of supporting more robust LLM deployments. Congratulations to all authors!