Announcing the NeurIPS 2022 Datasets & Benchmarks Track
By Rachel Thomas, Deepti Ghadiyaram, Joaquin Vanschoren
We are happy to announce that NeurIPS 2022 will continue the Datasets and Benchmarks track introduced last year. We’ll have a single submission deadline this year to allow more discussion, and we still welcome nominations for new PC members.
The use of datasets and benchmarks is foundational to machine learning, yet work on datasets and benchmarks is often still undervalued, unsupported, and taken for granted (Sambasivan et al., 2021). Good models require good data, as well as critical inspection of how datasets are constructed and used. Machine learning conference incentives and processes have traditionally been designed for research centring algorithms and models, making it harder to publish data work. For example, datasets often cannot be reviewed in a double-blind fashion, and instead require additional checks on whether the data was collected responsibly, shows bias, and will remain accessible. In response, NeurIPS launched a Datasets and Benchmarks track in 2021 as a venue for outstanding research of the same calibre as the main track.
The inaugural Datasets and Benchmarks track at NeurIPS 2021 was well received, with 484 papers submitted and 174 accepted. It has quickly become a prestigious, high-quality venue in which data work is treated as first-class work on par with algorithms work. Research appropriate for this track includes challenging new datasets and insightful new benchmarks, as well as meta-analyses of how authors use datasets, pitfalls, and ethical issues; the construction of datasets or benchmarks for new tasks; dataset variants that address ethical issues; and software such as data generators or benchmarking systems. Ethical and responsible data sourcing is a key part of the evaluation in this track.
Best practices around reproducibility should again be followed for new benchmarks, and public availability is required for new datasets (this can include credentialized access for sensitive data). Since double-blind review is often not possible for datasets, single-blind review is used where needed. Where the work can be reviewed equally well anonymously, authors can choose to submit double-blind. Alongside the scientific paper, authors must submit supplementary materials describing how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, and how it will be made available and maintained. Authors are free to describe this in whatever form they find most suitable, but we encourage the use of dataset documentation frameworks, such as datasheets for datasets, dataset nutrition labels, data statements for NLP, and accountability frameworks.
Please consider submitting to the NeurIPS 2022 Datasets and Benchmarks track. The abstract submission deadline is June 5th, 2022 (Anywhere on Earth). This year there will be a single submission deadline, in order to allow a longer interactive discussion phase. Submissions to the track will be part of the NeurIPS conference, presented alongside the main conference papers, and published in associated proceedings hosted on the NeurIPS website, next to the main conference proceedings. More details are available in the call for proposals. You can view the accepted papers from 2021 here, and see the two best paper award winners here. Finally, since we are still growing the Datasets and Benchmarks programme committee, we welcome nominations for new PC members. Please send all questions to datasetsbenchmarks@neurips.cc.
We are looking forward to another great edition of the NeurIPS Datasets and Benchmarks track, and hope to see you at the conference.