NeurIPS Datasets & Benchmarks: Raising the Bar for Dataset Submissions
Authors:
DB Track chairs: Lora Aroyo, Francesco Locatello, Konstantina Palla,
DB Track resource and metadata chairs: Meg Risdal, Joaquin Vanschoren
The NeurIPS Datasets & Benchmarks Track exists to highlight the crucial role that high-quality datasets and benchmarks play in advancing machine learning research. While algorithmic innovation often takes center stage, the progress of AI depends just as much on the quality, accessibility, and rigor of the datasets that fuel these models. Our goal is to ensure that impactful dataset contributions receive the recognition and scrutiny they deserve.
This blog post accompanies the release of the Call for Papers for the 2025 Datasets & Benchmarks Track (https://neurips.cc/Conferences/2025/CallForDatasetsBenchmarks), outlining key updates to submission requirements and best practices. Please note that this year the Datasets & Benchmarks Track will follow the NeurIPS 2025 Main Track Call for Papers, with the addition of three track-specific points: (1) single-blind submissions, (2) required dataset and benchmark code submission, and (3) a specific scope for dataset and benchmark paper submissions. See the Call for Papers for details.
The Challenge of Assessing High-Quality Datasets
Unlike traditional research papers, which have well-established peer review standards, dataset and benchmark papers require unique considerations in how they are reviewed and evaluated. A high-quality dataset must be well-documented, reproducible, and accessible while adhering to best practices in data collection and ethical considerations. Without clear guidelines and automated validation, reviewers face inconsistencies in their assessments, and valuable contributions risk being overlooked.
To address these challenges, we have developed submission and review guidelines that align with widely recognized frameworks in the research community and the open-source movement. For instance, in 2024, we encouraged authors to use established documentation standards such as datasheets for datasets, dataset nutrition labels, data statements for NLP, data cards, and accountability frameworks. By promoting these frameworks, we aim to ensure that dataset contributions are well-documented and transparent, making it easier for researchers to assess their reliability and impact.
Raising the Bar: Machine-Readable Metadata with Croissant
A persistent challenge has been the lack of a standardized, reliable way for reviewers to assess datasets against industry best practices. Unlike the main track, where paper submissions follow commonly accepted standards, dataset review practices still have to mature in this respect.
In 2024, we took a significant step toward improving dataset review by encouraging authors to generate a Croissant machine-readable metadata file to document their datasets. Croissant is an open community effort created because existing standards for dataset metadata lack ML-specific support and lag behind AI’s dynamically evolving requirements. Croissant records ML-specific metadata that enables datasets to be loaded directly into ML frameworks and tools, streamlines usage and community sharing independent of hosting platforms, and includes responsible AI metadata. At that time, Croissant tooling was still in its early stages, and many authors found the process burdensome. Since then, Croissant has matured significantly and gained industry and community adoption. Platforms like Hugging Face, Kaggle, OpenML, and Dataverse now natively support Croissant, making metadata generation effortless.
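To give a concrete sense of what "loading a dataset directly from its metadata" looks like, here is a minimal, illustrative sketch using the open-source mlcroissant Python library. The metadata URL and record-set name are placeholders, not a dataset referenced in this post; adapt them to your own dataset and hosting platform.

```python
# Illustrative sketch only: the metadata URL and record-set name below are
# placeholders, not a real dataset referenced in this post.
import mlcroissant as mlc

# Point at a Croissant JSON-LD file, e.g. the one your hosting platform
# exposes alongside the dataset.
ds = mlc.Dataset(jsonld="https://example.org/my-dataset/croissant.json")

# The metadata (name, description, license, record sets, ...) is machine-readable.
metadata = ds.metadata.to_json()
print(metadata["name"], "-", metadata.get("license"))

# Records can be streamed directly into an ML pipeline.
for record in ds.records(record_set="default"):
    print(record)
    break
```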
Making High-Quality Dataset Submissions the Standard
With these improvements in tooling and ecosystem support, we are now requiring dataset authors to ensure that their datasets are properly hosted. This means releasing the dataset via a data repository (e.g., Hugging Face, Kaggle, OpenML, or Dataverse) or providing a custom hosting solution that ensures long-term access and includes a Croissant description. We also provide detailed guidelines for authors to make the process as smooth as possible. This requirement ensures that:
- Datasets are easily accessible and discoverable through widely used research platforms over long periods of time.
- Standard interfaces (e.g., via Python client libraries) simplify dataset retrieval for both researchers and reviewers (see the sketch after this list).
- Metadata is automatically validated to streamline the review process.
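As an example of what these standard interfaces make possible, the sketch below fetches a dataset's Croissant metadata over HTTP and inspects a few fields a reviewer might check. The endpoint pattern and dataset identifier are illustrative assumptions based on Hugging Face's current Croissant support; other platforms expose equivalent endpoints, so consult your hosting platform's documentation.

```python
# Illustrative sketch: the endpoint pattern and dataset identifier are
# assumptions for demonstration; consult your hosting platform's docs.
import requests

dataset_id = "username/my-dataset"  # hypothetical dataset identifier
url = f"https://huggingface.co/api/datasets/{dataset_id}/croissant"

response = requests.get(url, timeout=30)
response.raise_for_status()
croissant = response.json()

# Quick sanity check on fields that reviewers typically look for.
for field in ("name", "description", "license", "citeAs"):
    print(f"{field}: {croissant.get(field, '<missing>')}")
```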
By enforcing this requirement, we are lowering the barriers to high-quality dataset documentation while improving the overall transparency and reproducibility of dataset contributions.
Looking Ahead
The NeurIPS Datasets & Benchmarks Track is committed to evolving alongside the broader research community. By integrating best practices and leveraging industry standards like Croissant, we aim to enhance the visibility, impact, and reliability of dataset contributions. These changes will help ensure that machine learning research is built on a foundation of well-documented, high-quality datasets that drive meaningful progress.
If you are preparing a dataset submission for NeurIPS, we encourage you to explore Croissant-integrated repositories today and take advantage of the powerful tools available to streamline your metadata generation. Let’s work together to set a new standard for dataset contributions.