What is the Data Agent Benchmark?
Users across enterprises increasingly rely on AI agents to query their data through natural language. However, building reliable data agents remains difficult because real-world data is often fragmented across multiple heterogeneous database systems, with inconsistent references and information buried in unstructured text.
DAB is the first benchmark that tests agents on these challenges. It covers 12 real-world datasets across 9 domains and 4 database systems (PostgreSQL, MongoDB, SQLite, DuckDB).
| Datasets | Queries | Domains | DBMSes |
|---|---|---|---|
| 12 | 54 | 9 | 4 |
From EPIC Data Lab, UC Berkeley and Hasura PromptQL.
Submit to the Leaderboard
Run your agent on all 54 queries with at least n = 5 trials/query and open a pull request with your results JSON.
- Collect one result per dataset × query × trial.
- Package into one JSON file.
- Open a PR with your JSON and agent details.
```json
[
  {
    "dataset": "bookreview",
    "query": "1",
    "run": 0,
    "answer": "2020s"
  },
  ...
]
```
Full instructions in the README.
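The collection steps above can be sketched in Python. This is a minimal sketch, not the official harness: `run_agent` is a hypothetical callable standing in for your agent's entry point, and the record fields mirror the JSON schema shown above.

```python
def collect_results(run_agent, datasets, n_trials=5):
    """Build one record per dataset x query x trial, matching the
    submission schema (dataset, query, run, answer).

    run_agent: hypothetical callable (dataset, query_id) -> answer string.
    datasets: mapping of dataset name -> list of query ids.
    """
    return [
        {"dataset": ds, "query": q, "run": run, "answer": run_agent(ds, q)}
        for ds, queries in datasets.items()
        for q in queries
        for run in range(n_trials)
    ]
```

Dump the returned list with `json.dump(results, f, indent=2)` to produce the single results file for the pull request.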
Leaderboard
Pass@1 is the fraction of queries answered correctly on the first attempt, averaged across the n trials per query. The overall score is stratified: Pass@1 is computed per dataset first, then averaged across datasets, so every dataset carries equal weight regardless of how many queries it contains.
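The stratified scoring just described can be sketched as follows. This is an illustrative sketch, not the official grader; it assumes a simplified exact-string-match grading rule and a `gold` mapping of correct answers, neither of which is specified here.

```python
from collections import defaultdict

def stratified_pass_at_1(results, gold):
    """Per-query Pass@1 = fraction of that query's trials answered
    correctly. Dataset score = mean over its queries; overall score
    = mean over dataset scores (stratified).

    results: records like {"dataset", "query", "run", "answer"}.
    gold: maps (dataset, query) -> correct answer (assumed exact match).
    """
    trials = defaultdict(list)  # (dataset, query) -> [bool per trial]
    for r in results:
        key = (r["dataset"], r["query"])
        trials[key].append(r["answer"] == gold[key])

    per_dataset = defaultdict(list)  # dataset -> [per-query Pass@1]
    for (dataset, _), outcomes in trials.items():
        per_dataset[dataset].append(sum(outcomes) / len(outcomes))

    dataset_scores = [sum(qs) / len(qs) for qs in per_dataset.values()]
    return sum(dataset_scores) / len(dataset_scores)
```

Because datasets are averaged last, a dataset with many queries cannot dominate the overall score.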
| # | Agent | Team | n | Pass@1 | Date |
|---|---|---|---|---|---|
Per-Dataset Pass@1
Sort by agent to compare dataset difficulty.