Reflections on the 2025 Review Process from the Program Committee Chairs
NeurIPS has grown at an unprecedented pace in recent years, fundamentally reshaping how the conference operates. To provide transparency and share insights into how NeurIPS continues to refine its processes, we—the Program Committee (PC) Chair Team—are publishing this blog post on our experience managing the review process at scale, with support from the Communication Chairs.
NeurIPS has multiple tracks, each with its own chairs and program committee, as well as its own processes and criteria for selecting papers. This year, the three tracks that constitute the poster sessions in the main conference are the Main Program track, the Datasets and Benchmarks track, and the Position Paper track. While some efforts have been made to align these tracks, each track is distinct and should not adopt exactly the same processes. This blog post focuses specifically on the Main Program, but the Datasets and Benchmarks chairs have released a blog post, as have the Position Paper Track chairs.
How the scale of NeurIPS submissions impacts the calibration of decision-making
Recruiting SACs, ACs, and Reviewers at Scale
Like most AI conferences, NeurIPS has seen a substantial increase in the number of submissions in the past few years, growing from 9,467 submissions in 2020 to 21,575 in 2025. Critically, this increase is not only a challenge from the perspective of ensuring that there are enough reviewers, area chairs (ACs), and senior area chairs (SACs) to give papers a fair review (thank you to all of the 20,518 reviewers, 1,663 ACs, and 199 SACs who worked tirelessly to make this happen); it also introduces effects that make the review process noisier. For example, recruiting has to happen at a larger scale, which increases the chance that papers are reviewed and handled by less experienced reviewers, ACs, or SACs. This year, we had to recruit more ACs through self-nomination forms, so we had a larger number of first-time NeurIPS ACs. Another example is that the NeurIPS community continues to expand in topical breadth, with researchers from backgrounds ranging from statistical physics to psychology joining the field, and new topics such as LLMs becoming highly popular as existing fields evolve. This means that the existing pool of reviewers may not be qualified to handle these topics, and even experienced reviewers may struggle to cover them when the relevant expert communities are still emerging.
Calibration Process
For this reason, PC chairs need to focus more on the calibration of decision-making, doing their best to ensure that the reviewing process remains reliable in the presence of this noise. This year, we relied on a range of signals for this: for example, when there was disagreement between reviewers, or between reviewers and the AC, we often looked for consensus between the AC and SAC to calibrate. We found it extremely helpful when ACs actively communicated and provided feedback in this process. Many ACs fought for papers they thought were good even when reviewers disagreed, and we often followed their lead, ending in an accept decision for the paper. Conversely, ACs sometimes flagged significant negative issues, with evidence, after the rebuttal that reviewers did not catch, and we asked SACs to carefully discuss these issues with the ACs to ensure the evidence was convincing. This then led to reject decisions even when papers had high ratings. We understand, and are truly regretful, that this may have severely disappointed some authors. However, we trusted the ACs and SACs to be experienced and professional members of the community whose consensus guards the quality of the program. Finally, we note that in some cases decisions had to be reverted for reasons that could not be disclosed to the AC. This included situations like the 11 unfortunate cases in which one of the co-authors was confirmed to be grossly negligent in their reviews under our new responsible reviewing policies, in line with practices at CVPR and other sister conferences that have adopted similar policies.
At the same time, we acknowledge that the calibration process is not, and cannot be, perfect. Calibrating decisions is a constrained optimization process. Everyone involved has limited resources: authors do not have infinite time to comment, and reviewers, ACs, SACs, and PCs do not have infinite bandwidth to read and respond. Even if some individual decisions could be improved in retrospect, we cannot revisit them as if the original constraints driving the decision did not exist. For example, some authors asked why they could not submit rebuttals when ACs raised new issues independently of the reviewers; applying this fairly across all reviews would require adding several weeks to the reviewing timeline, which would derail conference planning given the fixed schedule.
Upholding Scientific Standards Despite Challenges
Nonetheless, we are optimistic that, on balance, NeurIPS decisions were consistent with the standards for NeurIPS papers established in prior years. The main track this year received 21,575 valid paper submissions, of which 5,290 were accepted. That is, the main track had an acceptance rate of 24.52%, on par with previous years. In general, accept decisions were consistent with reviewer impressions of papers (Figure 1), with the PCs manually reviewing many outliers (but not all, due to time constraints).

Figure 1. A histogram of scores across papers in the main program track, color-coded by accept versus reject decisions.
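For readers who want to reproduce the acceptance-rate figure above, it follows directly from the submission and acceptance counts; a minimal, purely illustrative sketch of the arithmetic in Python is:

```python
# Illustrative check of the acceptance rate reported above:
# 5,290 accepted papers out of 21,575 valid Main Program submissions.
submissions = 21_575
accepted = 5_290
rate = accepted / submissions
print(f"Acceptance rate: {rate:.2%}")  # Acceptance rate: 24.52%
```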
Finally, we want to address the topic of space constraints, since some community members were concerned that papers meeting the NeurIPS standard were rejected for this reason. In principle, space is a constraint we have to be mindful of, as our physical venues can only accommodate so many papers. We were transparent about this with our SACs and monitored it closely as part of our duty. For the main track, however, as we focused on resolving boundary and outlier decisions using the consensus of ACs and SACs during calibration, the number of accepted papers came in comfortably below the capacity of our venue. That is, space ultimately played a negligible role compared to scientific merit.
An initiative on responsible reviewing this year
Review processes at NeurIPS are always evolving, and not just because the PC chairs have to keep up with the increased scale of the conference and new contexts: we also want to improve the reviewing process from year to year and help build a sustainable conference process that is responsive to community needs and concerns. For this reason, in addition to our duties directly managing the review process, we focus on specific areas for improvement each year. This year, we focused on the topic of responsible reviewing.
Among other initiatives, we announced new policies under the responsible reviewing initiative (see our previous blog post). Towards the goal of improving the quality of reviews, this policy allows PCs and other tracks’ chairs to withhold reviews from authors who did not meet their own reviewing obligations on time. In addition, when reviewers are grossly negligent during the review process, their co-authored papers are desk-rejected. We also paid attention to the issue of collusion rings and fake peer-review accounts, which had been topics of intense discussion in the community in previous years. This led to a lower reliance on bidding, as we could not assume everyone was a good actor in the face of evidence that some were not. In turn, this led to a greater reliance on the matching system used by OpenReview, which likely introduced different types of noise into reviews this year and increased our workload in calibration. However, we believe this is an important trade-off to disrupt these threats to academic integrity.
A call for feedback
Overall, we would like to encourage the community to continue to provide feedback on how to improve the review process at NeurIPS. This feedback is critical for us to understand how the changes we implement each year to evolve NeurIPS may affect the community. For example, one new source of noise we observed this year was reviewers increasing their scores just to end the back-and-forth discussion with authors, while remaining silent or critical in private discussions with the AC. This leads us to believe that increasing levels of author engagement in rebutting their papers is now contributing to reviewer fatigue. We encourage the community to raise issues like this with us, so we can factor them into planning for future years. In particular, we encourage community members to attend the Town Hall, which will be hosted on-site at the conference this year, where they can ask questions and raise comments to the organizers in real time.