Communications Chairs 2025

About Communications Chairs 2025

Posts by Alex Lu:

November 26 2025

Announcing the 2025 Sejnowski-Hinton Prize

Communications Chairs 2025 2025 Conference

The NeurIPS organizing committee is pleased to announce the winner of the 2025 Sejnowski-Hinton Prize: Timothy Lillicrap, Daniel Cownden, Douglas Tweed, and Colin Akerman for their groundbreaking 2016 paper “Random synaptic feedback weights support error backpropagation for deep learning” published in Nature Communications (arXiv version).

The 2025 Winner: Feedback Alignment

Theories of the brain have long sought to explain how neural circuits learn efficiently using only local information—how synapses adjust based on signals available at each connection rather than through explicit, global error propagation. Inspired by this challenge, researchers have explored ways to train artificial neural networks that perform gradient descent using local learning rules.

This paper is recognized for its contribution to that pursuit through the discovery of “feedback alignment”. The authors demonstrated that multi-layer networks can learn effectively using fixed, random feedback weights, rather than requiring exact backward weight symmetry as in backpropagation. Remarkably, the network’s forward weights naturally align with these random feedback signals over the course of learning, yielding a biased yet useful estimate of the loss gradient. This insight provided the first concrete, biologically grounded solution to the long-standing weight transport problem, the question of how real neurons might follow loss gradients without non-local information transfer.

The work had a significant impact, helping to establish a new sub-field of “biologically plausible” learning rules in the NeurIPS community and beyond.

About the Prize

The Sejnowski-Hinton Prize is awarded annually to a paper published within the past ten years that has made major contributions to computational theories of the brain drawing on insights from artificial intelligence, and has had a significant impact on the NeurIPS community. A seven-person committee—Blake Richards, Sebastian Seung, Eva Dyer, Razvan Pascanu, Maneesh Sahani, Terry Sejnowski, and Geoffrey Hinton—selected this year’s winner. The prize carries a $10,000 award to be shared among the authors.

Origin of the Prize

First awarded in 2025, the Sejnowski-Hinton Prize is rooted in collaboration and generosity. In 1983, Geoffrey Hinton and Terry Sejnowski made a pact: if one of them received a Nobel Prize for their work on Boltzmann machines and the other didn’t, they would share the prize money. In 2024, when John Hopfield and Geoffrey Hinton were awarded the Nobel Prize in Physics—with Boltzmann machines cited among the contributions—Terry declined his share. Geoffrey then donated these funds to NeurIPS to establish the Sejnowski-Hinton Prize, honoring their longstanding partnership and commitment to advancing computational theories of the brain and collaborative work in the community.

November 26 2025

Announcing the NeurIPS 2025 Best Paper Awards

Communications Chairs 2025 2025 Conference

The Best Paper Award Committee members were nominated by the Program Chairs and the Database and Benchmark track chairs, who selected leading researchers across machine learning topics. These nominations were approved by the General Chairs and Next Generation and Accessibility Chairs.

The best paper award committees were tasked with selecting a handful of highly impactful papers from the Main Track and the Datasets & Benchmark Track of the conference.

With that, we are excited to share the news that the best and runner-up paper awards this year go to seven groundbreaking papers, including four best papers (one of which is from the datasets and benchmarks track) and three runner-ups. The seven papers highlight advances in diffusion model theory, self-supervised reinforcement learning, attention mechanisms for large language models, reasoning capabilities in LLMs, online learning theory, neural scaling laws, and benchmarking methodologies for language model diversity.

The winners are presented here in alphabetical order by title.

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Nouha Dziri, Yulia Tsvetkov, Maarten Sap, Yejin Choi

Abstract

Large language models (LMs) often struggle to generate diverse, human-like creative content, raising concerns about the long-term homogenization of human thought through repeated exposure to similar outputs. Yet scalable methods for evaluating LM output diversity remain limited, especially beyond narrow tasks such as random number or name generation, or beyond repeated sampling from a single model. To address this gap, we introduce Infinity-Chat, a large-scale dataset of 26K diverse, real-world, open-ended user queries that admit a wide range of plausible answers with no single ground truth. We introduce the first comprehensive taxonomy for characterizing the full spectrum of open-ended prompts posed to LMs, comprising 6 top-level categories (e.g., creative content generation, brainstorm & ideation) that further breaks down to 17 subcategories. Using Infinity-Chat, we present a large-scale study of mode collapse in LMs, revealing a pronounced Artificial Hivemind effect in open-ended generation of LMs, characterized by (1) intra-model repetition, where a single model consistently generates similar responses, and more so (2) inter-model homogeneity, where different models produce strikingly similar outputs. Infinity-Chat also includes 31,250 human annotations, across absolute ratings and pairwise preferences, with 25 independent human annotations per example. This enables studying collective and individual-specific human preferences in response to open-ended queries. Our findings show that state-of-the-art LMs, reward models, and LM judges are less well calibrated to human ratings on model generations that elicit differing idiosyncratic annotator preferences, despite maintaining comparable overall quality. Overall, INFINITY-CHAT presents the first large-scale resource for systematically studying real-world open-ended queries to LMs, revealing critical insights to guide future research for mitigating long-term AI safety risks posed by the Artificial Hivemind.

Reflections from the Selection Committee

This paper makes a substantial and timely contribution to the understanding of diversity, pluralism, and societal impact in modern language models. The authors introduce Infinity-Chat, a rigorously constructed benchmark of 26K real-world open-ended queries paired with 31K dense human annotations, enabling systematic evaluation of creative generation, ideation, and subjective preference alignment, dimensions historically underexamined in AI evaluation. Beyond releasing a valuable dataset, the paper provides deep analytical insights through the first comprehensive taxonomy of open-ended prompts and an extensive empirical study across more than 70 models, revealing the Artificial Hivemind effect: pronounced intra- and inter-model homogenization that raises serious concerns about long-term risks to human creativity, value plurality, and independent thinking. The findings expose critical miscalibration between current reward models, automated judges, and diverse human preferences, highlighting the tension between alignment and diversity and establishing a foundation for future work on preserving heterogeneity in AI systems. Overall, this work sets a new standard for datasets and benchmarks that advance scientific understanding and address pressing societal challenges rather than solely improving technical performance.

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin

Abstract

Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention. Yet, existing literature rarely examines the specific effects of gating. In this work, we conduct comprehensive experiments to systematically investigate gating-augmented softmax attention variants. Specifically, we perform a comprehensive comparison over 30 variants of 15B Mixture-of-Experts (MoE) models and 1.7B dense models trained on a 3.5 trillion token dataset. Our central finding is that a simple modification—applying a head-specific sigmoid gate after the Scaled Dot-Product Attention (SDPA)—consistently improves performance. This modification also enhances training stability, tolerates larger learning rates, and improves scaling properties. By comparing various gating positions and computational variants, we attribute this effectiveness to two key factors: (1) introducing non-linearity upon the low-rank mapping in the softmax attention, and (2) applying query-dependent sparse gating scores to modulate the SDPA output. Notably, we find this sparse gating mechanism mitigates massive activation, attention sink and enhances long-context extrapolation performance. We also release related codes (https://github.com/qiuzh20/gated_attention}) and models (https://huggingface.co/QwQZh/gated_attention) to facilitate future research. Furthermore, the most effective SDPA output gating is used in the Qwen3-Next models (https://huggingface.co/collections/Qwen/qwen3-next).

Reflections from the Selection Committee

The main finding of this paper is that the performance of large language models using softmax attention can be consistently improved by introducing head-specific sigmoid gating after the scaled dot product attention operation in both dense and mixture-of-experts (MoE) Transformer models. This finding is backed up by more than thirty experiments on different variants of gated softmax attention using 15B MoE and 1.7B dense models trained on large-scale datasets of 400B, 1T, or 3.5T tokens. The paper also includes careful analyses showing that the introduction of the authors’ recommended form of gating improves the training stability of large language models, reduces the “attention sink” phenomenon that has been widely reported in attention models, and enhances the performance of context length extension. The main recommendation of the paper is easily implemented, and given the extensive evidence provided in the paper for this modification to LLM architecture, we expect this idea to be widely adopted. This paper represents a substantial amount of work that is possible only with access to industrial scale computing resources, and the authors’ sharing of the results of their work, which will advance the community’s understanding of attention in large language models, is highly commendable, especially in an environment where there has been a move away from open sharing of scientific results around LLMs.

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Kevin Wang , Ishaan Javali, Michał Bortkiewicz, Tomasz Trzcinski, Benjamin Eysenbach

Abstract

Scaling up self-supervised learning has driven breakthroughs in language and vision, yet comparable progress has remained elusive in reinforcement learning (RL). In this paper, we study building blocks for self-supervised RL that unlock substantial improvements in scalability, with network depth serving as a critical factor. Whereas most RL papers in recent years have relied on shallow architectures (around 2 — 5 layers), we demonstrate that increasing the depth up to 1024 layers can significantly boost performance. Our experiments are conducted in an unsupervised goal-conditioned setting, where no demonstrations or rewards are provided, so an agent must explore (from scratch) and learn how to maximize the likelihood of reaching commanded goals. Evaluated on simulated locomotion and manipulation tasks, our approach increases performance on the self-supervised contrastive RL algorithm by — , outperforming other goal-conditioned baselines. Increasing the model depth not only increases success rates but also qualitatively changes the behaviors learned.

Reflections from the Selection Committee
This paper challenges the conventional assumption that the information provided by reinforcement learning (RL) is insufficient to effectively guide the numerous parameters of deep neural networks, hence suggesting that large AI systems be predominantly trained through self-supervision, with RL reserved solely for fine-tuning. The work introduces a novel and easy-to-implement RL paradigm for the effective training of very deep neural networks, employing self-supervised and contrastive RL. The accompanying analysis demonstrates that RL can scale efficiently with increasing network depth, leading to the emergence of more sophisticated capabilities. In addition to presenting compelling results, the study includes several useful analyses, for example, for highlighting the important role of batch size scaling for deeper networks within contrastive RL.

Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training

Tony Bonnaire, Raphaël Urfin, Giulio Biroli, Marc Mezard

Abstract

Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data and allow generalization. In this work, we investigate the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time at which models begin to generate high-quality samples, and a later time beyond which memorization emerges. Crucially, we find that increases linearly with the training set size , while remaining constant. This creates a growing window of training times where models generalize effectively, despite showing strong memorization if training continues beyond it. It is only when it becomes larger than a model-dependent threshold that overfitting disappears at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics, which allows to avoid memorization even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.

Reflections from the Selection Committee

This paper presents foundational work on the implicit regularization dynamics of diffusion models, delivering a powerful result by unifying empirical observation with formal theory. The critical finding is the quantitative identification of two distinct, predictable timescales, an early, dataset-independent generalization phase followed by a linear, dataset-size-dependent memorization phase . This demonstration of an expanding window for effective generalization is not merely an empirical finding but is rigorously explained by deriving the spectral properties of the random features model using random matrix theory. By linking the practical success of diffusion models directly to a provable dynamical property (the implicit postponement of overfitting), the paper provides fundamental, actionable insight into the mechanisms governing modern generative AI, setting a new standard for analytical depth in the study of generalization.

Runners Up

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly in mathematics and programming tasks. It is widely believed that, similar to how traditional RL helps agents to explore and learn new strategies, RLVR enables LLMs to continuously self-improve, thus acquiring novel reasoning abilities that exceed the capacity of the corresponding base models. In this study, we take a critical look at \textit{the current state of RLVR} by systematically probing the reasoning capability boundaries of RLVR-trained LLMs across diverse model families, RL algorithms, and math/coding/visual reasoning benchmarks, using pass@\textit{k} at large \textit{k} values as the evaluation metric. While RLVR improves sampling efficiency towards the correct path, we surprisingly find that current training does \emph{not} elicit fundamentally new reasoning patterns. We observe that while RLVR-trained models outperform their base models at smaller values of (\eg, =1), base models achieve higher pass@ score when is large. Moreover, we observe that the reasoning capability boundary of LLMs often narrows as RLVR training progresses. Further coverage and perplexity analysis shows that the reasoning paths generated by RLVR models are already included in the base models’ sampling distribution, suggesting that their reasoning abilities originate from and are \textit{bounded} by the base model. From this perspective, treating the base model as an upper bound, our quantitative analysis shows that six popular RLVR algorithms perform similarly and remain far from optimal in fully leveraging the potential of the base model. In contrast, we find that distillation can introduce new reasoning patterns from the teacher and genuinely expand the model’s reasoning capabilities. Taken together, our findings suggest that current RLVR methods have not fully realized the potential of RL to elicit genuinely novel reasoning abilities in LLMs. This underscores the need for improved RL paradigms—such as continual scaling and multi-turn agent-environment interaction—to unlock this potential.

Reflections from the Selection Committee

This paper delivers a masterfully executed and critically important negative finding on a widely accepted, foundational assumption in Large Language Model (LLM) research: that Reinforcement Learning with Verifiable Rewards (RLVR) elicits genuinely new reasoning capabilities. The paper shows that RLVR training, across various model families, tasks, and algorithms, enhances sampling efficiency without expanding the reasoning capacity already present in base models. RL narrows exploration, rewarded trajectories are amplified, but the broader solution space shrinks, revealing that RLVR optimizes within, rather than beyond, the base distribution. This is an important finding which will hopefully incentivize fundamentally new RL paradigms able to navigate the vast action space and genuinely expand LLM reasoning capabilities.

Optimal Mistake Bounds for Transductive Online Learning

Zachary Chase, Steve Hanneke, Shay Moran, Jonathan Shafer

Abstract

We resolve a 30-year-old open problem concerning the power of unlabeled data in online learning by tightly quantifying the gap between transductive and standard online learning. We prove that for every concept class with Littlestone dimension , the transductive mistake bound is at least . This establishes an exponential improvement over previous lower bounds of , , and , respectively due to Ben-David, Kushilevitz, and Mansour (1995, 1997) and Hanneke, Moran, and Shafer (2023). We also show that our bound is tight: for every , there exists a class of Littlestone dimension with transductive mistake bound . Our upper bound also improves the previous best known upper bound from Ben-David et al. (1997). These results demonstrate a quadratic gap between transductive and standard online learning, thereby highlighting the benefit of advanced access to the unlabeled instance sequence. This stands in stark contrast to the PAC setting, where transductive and standard learning exhibit similar sample complexities.

Reflections from the Selection Committee

This paper presents a breakthrough in learning theory, deserving the NeurIPS Best Paper Runner-Up award for its elegant, comprehensive, and definitive resolution of a 30-year-old open problem. The authors have not only precisely quantified the optimal mistake bound for transductive online learning as Ω(√d), but they have also achieved a tight match with an O(√d) upper bound. This establishes a quadratic gap between transductive and standard online learning, a result that represents an exponential leap beyond all previous logarithmic lower bounds and dramatically highlights the theoretical value of unlabeled data in this setting—a crucial insight distinct from its more limited role in PAC learning.

The novelty and ingenuity of their proof techniques are quite remarkable. For the lower bound, the adversary employs a sophisticated strategy that balances forcing mistakes with carefully managing the shrinking of the version space, leveraging the concept of “paths in trees” as a fundamental underlying structure. The upper bound, demonstrating the learnability within O(√d) mistakes, introduces an innovative hypothesis class construction that embeds a “sparse encoding” for off-path nodes – a probabilistic design where most off-path labels are zero, but the rare ones carry immense information. The learner’s strategy to exploit this class is equally brilliant, integrating several non-standard sophisticated techniques: “Danger Zone Minimization” to control the instance sequence presented by the adversary, “Splitting Experts” via a multiplicative weights approach to handle uncertainty about a node’s on-path status, and a strategic “Transition to Halving” once sufficient information is gathered from the sparsely encoded off-path labels. This intricate interplay between a cleverly constructed hypothesis class and a highly adaptive learning algorithm showcases a masterclass in theoretical analysis and design.

Superposition Yields Robust Neural Scaling

Yizhou Liu, Ziming Liu, Jeff Gore

Abstract

The success of today’s large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law, that loss decreases as a power law with model size, remains unclear. We propose that representation superposition, meaning that LLMs represent more features than they have dimensions, can be a key contributor to loss and cause neural scaling. Based on Anthropic’s toy model, we use weight decay to control the degree of superposition, allowing us to systematically study how loss scales with model size. When superposition is weak, the loss follows a power law only if data feature frequencies are power-law distributed. In contrast, under strong superposition, the loss generically scales inversely with model dimension across a broad class of frequency distributions, due to geometric overlaps between representation vectors. We confirmed that open-sourced LLMs operate in the strong superposition regime and have loss scaling inversely with model dimension, and that the Chinchilla scaling laws are also consistent with this behavior. Our results identify representation superposition as a central driver of neural scaling laws, providing insights into questions like when neural scaling laws can be improved and when they will break down.

Reflections from the Selection Committee:

This paper moves beyond observation of neural scaling laws—the empirically established phenomenon in which model loss exhibits a power-law decrease as model size, dataset size, or computational resources are increased—to demonstrate that representation superposition constitutes the primary mechanism governing these laws. Authors introduce a controlled “toy model” to examine how superposition and data structure affect the scaling of loss with model size and demonstrate that under strong superposition where features are overlapping, the loss scales consistently as an inverse power law with respect to the model dimension. The core findings are supported by a series of carefully designed experiments and offer fresh insights into an important research area.

The selection of these papers reflects the remarkable breadth of research presented at NeurIPS 2025, spanning generative modeling, reinforcement learning, natural language processing, learning theory, neural scaling, and benchmarking methodologies. The diversity of topics among the awarded papers demonstrates the vibrant and multifaceted nature of machine learning research.

We extend our congratulations to all the award recipients and look forward to seeing these works presented at the conference this December! Please note that the award certificates will be given out during the paper’s respective oral sessions by the session chairs.

We would also like to extend our gratitude and appreciation to the members of the Best Paper Award Committee listed here.

Best Paper Award Committee for Main Track and Database and Benchmark Tracks

Jacob Andreas (MIT, United States)
Sander Dieleman (Google DeepMind, UK) 
Dilek Hakkani-Tur (University of Illinois Urbana-Champaign, United States) 
Brian Kingsbury (IBM, United States) 
Mirella Lapata (University of Edinburgh, Scotland) 
Vincent Lepetit (Ecole des Ponts ParisTech, France) 
Ulrich Paquet (AIMES & Google DeepMind, Africa) 
Violet Peng (UCLA, United States) 
Doina Precup (McGill University, Canada) 
Masashi Sugiyama (RIKEN & University of Tokyo, Japan) 
Vincent Tan (National University of Singapore, Singapore) 
Yee Whye Teh (University of Oxford, United Kingdom) 
Xing Xie (Microsoft, China) 
Luke Zettlemoyer (University of Washington/Meta, United States)

November 10 2025

NeurIPS CDMX 2025: Accepted Workshops and Tutorials

Communications Chairs 2025 2025 Conference

We are thrilled to announce the accepted workshops and tutorials happening at NeurIPS Mexico City. Workshops will be held on November 30th and December 1st, and Tutorials on December 2nd.

Mexico City Workshops on Sunday 30 Nov

Vision Language Models: Challenges of Real World Deployment

Vision language models (VLMs) have demonstrated remarkable capabilities in integrating visual perception with natural language understanding, powering applications such as multimodal assistants, robotics, autonomous systems, and accessibility tools. However, their real-world deployment faces significant challenges in efficiency, scalability, and reliability. This workshop will bring together researchers and practitioners from academia and industry to highlight cutting-edge research, systems-level optimizations, and evaluation methodologies that are often overlooked yet pivotal for robust real-world integration. Efficiency, robustness, and reliability will be emphasized as core design principles, essential to advancing VLMs from experimental systems to dependable deployed technologies. By convening researchers at the intersection of multimodal learning, efficient inference and training, robustness and uncertainty estimation, and large-scale systems design, the workshop aims to establish concrete pathways toward building VLMs that can operate reliably under practical constraints. We hope this workshop will serve as a venue for exchanging insights on model design, efficiency techniques, and robustness evaluation that bridge the gap between research and real-world systems.

NeurIPS 2025 Workshop on Embodied and Safe-Assured Robotic Systems

This workshop focuses on advancing safe and quality-assured embodied robotic systems. Embodied systems—including autonomous robots, self-driving vehicles, robotic arms, and humanoid robots—are increasingly deployed in safety-critical real-world scenarios. Ensuring their trustworthiness—encompassing safety, reliability, and predictable behavior—remains a pressing challenge. Despite notable progress in perception, reasoning, and control, many AI-based robotic systems still operate as “black boxes,” often exhibiting unpredictable behaviors. Failures can emerge from complex sensorimotor interactions, adversarial inputs, or novel environments, leading to safety incidents and diminished user trust.

NeurIPS 2025 Workshop Research Development AI Mexico

The Research Development of AI in Mexico: Main Applications workshop seeks to showcase, strengthen, and connect the most impactful developments in Artificial Intelligence (AI) and Data Science emerging from Mexico and the broader Latin American region. Over the past four decades, Mexico has cultivated a robust research community in AI through pioneering contributions in areas such as computational intelligence, autonomous robotics, fuzzy systems, and natural language processing, led by institutions including CIC–IPN, INAOE, UNAM, ITESM, CINVESTAV, and Universidad Veracruzana.Today, the region is undergoing a strategic transformation, shifting from foundational research to the development of applied AI technologies addressing real-world needs in healthcare, education, agriculture, smart cities, cybersecurity, and sustainability. This evolution has been further propelled by increased access to open data, advances in computing infrastructure, and growing collaborations between academia, government, and industry.Despite these advances, Latin America faces distinctive challenges in the development and deployment of AI. These include limited funding, underrepresentation in global AI initiatives, digital inequality, and the need for responsible, inclusive, and culturally relevant AI systems. Additionally, emerging concerns related to AI ethics, algorithmic bias, and regulatory frameworks must be addressed proactively to ensure equitable and trustworthy technology adoption.This workshop aims to create a forum for researchers, students, practitioners, and policymakers to engage in meaningful dialogue about the current landscape and future directions of AI in Mexico and Latin America. By promoting interdisciplinary collaboration, the workshop will highlight impactful case studies, emerging research trajectories, and opportunities for cross-border cooperation, while fostering a shared vision for AI that is ethical, sustainable, and aligned with regional priorities.

NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models

The Socially Responsible and Trustworthy Foundation Models (ResponsibleFM) Workshop at NeurIPS 2025 Mexico City is envisioned as a vital interdisciplinary forum dedicated to advancing ethical, inclusive, and socially conscious research practices in the rapidly evolving field of foundation models, including language models and multimodal models. As foundation models are tremendously reshaping human communication, decision-making, and societal infrastructures, there is a growing recognition of the profound impacts these systems can have, both positive and negative, on individuals and communities. In particular, previous research has documented a wide range of risks and harms associated with foundation models, including but not limited to bias and discrimination, misinformation propagation, privacy violations, environmental concerns, and unintended social consequences.

Mexico City Workshops on Monday 1 Dec

First Workshop on LLM Persona Modeling

Large language models (LLMs) are increasingly used to simulate human-like personas for applications in research, education, healthcare, and interactive AI systems. While such persona modeling creates opportunities for interdisciplinary innovation, it raises challenges around authenticity, consistency, bias, and ethical deployment. This workshop brings together perspectives from AI, psychology, cognitive science, and human–computer interaction to advance robust methods, standardized evaluation frameworks, and responsible practices for persona modeling in LLMs. Through invited talks, panels, posters, and discussions, the event will chart a roadmap for interdisciplinary collaboration and future research in this emerging area.

Centering Low-Resource Languages and Cultures in the Age of Large Language Models

Large Language Models (LLMs) have transformed NLP research and applications, yet they are still predominantly trained on high-resource, globally dominant languages. This imbalance leads to poor performance and limited applicability for low-resource languages, which are rich in tone, morphology, and cultural meaning. As a result, current AI systems risk reinforcing linguistic inequality, cultural erasure, and lack of accessibility in critical domains like education and healthcare.This workshop aims to reframe language technology by centering low-resource languages, cultures, and epistemologies in the age of LLMs. We seek to bring together researchers, linguists, developers, healthcare professionals, and technologists to share insights and develop strategies for building inclusive, culturally grounded, and linguistically robust language models. The workshop emphasizes collaboration across disciplines and regions to ensure both technical advancement and social relevance.Key areas of focus include developing LLM architectures tailored to low-resource linguistic features, ethical and community-centered dataset collection, and multilingual benchmarks designed specifically for underrepresented languages. We also highlight the importance of healthcare and medical machine translation to support equitable access to information and improve public health outcomes. Ultimately, this workshop aims to advance responsible AI innovation that empowers low-resource language communities and shapes a more inclusive future for global language technologies.

NORA: The First Workshop on Knowledge Graphs & Agentic Systems Interplay

Agents have experienced significant growth in recent years, largely due to the rapid technological advancements of Large Language Models (LLMs). Although these agents benefit from LLMs’ advanced generation proficiency, they still suffer from catastrophic forgetting and a limited context window size compared to the agents’ needs in terms of contextual information. Knowledge Graphs (KGs) are a powerful paradigm for structuring and managing connected pieces of information while unlocking deeper insights than traditional methods. Their value is immense for tasks that require context, integration, and reasoning. However, this power comes at the cost of significant upfront and ongoing investment in construction, curation, and specialized expertise. The first version of this workshop aims at analyzing and discussing emerging and novel practices, ongoing research and validated or deployed innovative solutions that showcase the growing synergy between LLMs agents and KGs.

7th International Workshop on Large Scale Holistic Video Understanding: Toward Video Foundation Models

This workshop aims to advance the field of video understanding by fostering discussions around holistic and generalist video foundation models. Building upon the Holistic Video Understanding (HVU) initiative and dataset introduced in 2019, we have successfully organized eight HVU workshops and tutorials at top-tier venues such as CVPR and ICCV, uniting researchers, practitioners, and students from around the world. These efforts have played a central role in moving the community beyond narrow action recognition tasks toward multi-faceted, semantic, and generalist video understanding.With the emergence of large-scale foundation models and video large language models (Video-LLMs), the landscape of video understanding is rapidly evolving. These models enable unified reasoning across spatial, temporal, and multimodal dimensions, yet introduce new challenges in scalability, efficiency, interpretability, and responsible deployment.The HVU Workshop 2025 will provide a platform to explore these frontiers, discussing topics such as multimodal representation learning, long-context reasoning, evaluation of general-purpose video systems, efficient adaptation and scaling laws, and the ethical and societal implications of video AI. Our goal is to bring together a diverse and inclusive community to define the next chapter of holistic, generalist, and responsible video understanding.

Mexico City Tutorials on Tue 2 Dec

Efficient Transformers: State of the art in pruning, sparse attention, and transformer funneling

Transformer architectures consume the lionshare of computational budgets associated with today’s most powerful language and vision models, making research into greater computational efficiency a hot and essential direction. Our proposed tutorial surveys the bleeding edge of three complementary research threads that together comprise a significant part of the current industrial toolkit for achieving computational efficiency in Transformers: (1) pruning, the structured or unstructured removal of weights, layers and heads; (2) sparse attention & routing, including block, sliding-window, locality-sensitive hashing; and (3) funneling, which pools intermediate representations to shorten sequences through depth. We will then feature an expert industrial and academic panel of speakers from Caltech, MIT, Anthropic, Google Deepmind, and Microsoft, hearing about the latest trends seen in top industrial labs. Attendees will leave with actionable recipes for building sub-10 B-parameter models that match or exceed dense baselines on language, vision and multi-modal benchmarks. The tutorial targets researchers and practitioners who build or deploy Transformer models and assumes familiarity with basic deep-learning concepts but not with any specific efficiency method. All slides and publication materials will be released under a permissive license.

Geospatial Foundation Models: Overview, Application and Benchmarking

Geospatial foundation models (GeoFMs) are a class of large-scale deep learning models, typically based on the transformer architecture, that are pre-trained on vast, diverse datasets of Earth Observation data to learn a general, transferable understanding of the Earth’s surface. These models help address long-standing challenges in Earth Observation by dramatically reducing the need for manually labeled data, handling vast and diverse data streams (e.g., optical, SAR, multispectral, LiDAR), and enabling robust performance across time, space, and sensor types. In this tutorial, we will give an overview of the recent advancements in GeoFMs, highlighting the main challenges in developing these models and differences from foundation models developed for other domains. We will also show practical examples of fine-tuning and inferencing GeoFMs for different downstream tasks using the TerraTorch open-source framework, which facilitates the use to publicly available GeoFMs such as SatMAE, Prithvi-EO, DOFA, Galileo and TerraMind. Finally, we will introduce best practices for systematic and reproducible benchmarking of GeoFMs using the TerraTorch Iterate plug-in and its integration with GEO-Bench.

From Tuning to Guarantees: Statistically Valid Hyperparameter Selection

The performance and reliability of modern machine learning systems depend critically on hyperparameter selection. Whether tuning a large language model, configuring a vision pipeline, or deploying AI in safety-critical environments, the choice of hyperparameters is decisive. Current tuning strategies such as grid or random search and Bayesian optimization are powerful for empirical optimization but they do not provide statistical guarantees on the reliability of the selected configuration after deployment. This gap becomes critical when models must satisfy strict performance, safety, or fairness requirements. This tutorial introduces a rigorous and practical framework that treats hyperparameter selection as a statistical testing problem. By constructing valid p- or e-values for candidate configurations and applying multiple hypothesis testing (MHT) procedures, practitioners can control deployment risk with finite-sample guarantees. We begin with the Learn-Then-Test (LTT) methodology for average-risk control and build up to multiple key extensions, such as controlling the quantile risk using quantile LTT (QLTT), multi-objective optimization through Pareto Testing (PT), incorporating prior information through the concept of reliability graphs, and data-efficient selection through adaptive LTT (aLTT). Throughout the tutorial, we emphasize conceptual clarity, plain-language explanations of assumptions, and hands-on demonstrations with minimal, reproducible notebooks. Attendees will gain a drop-in toolkit for augmenting existing tuning workflows with statistically valid selection. They will learn how to formalize relevant risk functions, generate valid evidence, choose appropriate error-rate controls (FWER/FDR), and navigate the trade-offs between statistical conservatism and power under limited data. No prior expertise in multiple hypothesis testing is required.

How to Build Agents to Generate Kernels for Faster LLMs (and Other Models!)

The compute demanded by modern AI has been exploding since 2016; the FLOPs used to train frontier models have grown at a rate of 2.4x per year, and the inference side is growing even faster—already an estimated 80% of total AI electricity use. Large language models and other deep networks rely on highly tuned GPU kernels to achieve state-of-the-art performance; these efficient kernels directly translate to cost and energy savings. In this 2.5-hour in-person tutorial, we demonstrate how LLM-powered agents can generate and optimize GPU kernels for CUDA, HIP/ROCm, and Triton. We begin with a unified primer on GPU‐programming fundamentals and common tooling (memory hierarchy, occupancy, profilers), then introduce an agentic loop: prompt engineering, compiler/profiler feedback as tools, iterative kernel refinement, correctness validation, and automated benchmarking. We will provide additional benchmarking examples on HIP and Triton, on top of Stanford’s KernelBench that covers CUDA, KernelBot as reliable source of human curated dataset for heterogenous GPU code, and show how to turn runtime and profiler metrics into reward signals that drive kernel optimizations. On top of this loop, we build an inference-scaling framework in which the LLM proposes candidate kernels, compiles them, measures latency/throughput/energy, and feeds those signals back as rewards. By combining test-time scaling techniques the agent iteratively discovers increasingly accurate and efficient kernels. Attendees will compare generated code against expert kernels, inspect wins and losses. By the end, participants will walk away with a reproducible pipeline for LLM-driven GPU‐kernel optimization.

Positional Encoding: Past, Present, and Future

Positional Encoding is a foundational yet often opaque component of Transformer architectures, underpinning how self-attention mechanisms capture sequence order in language, vision, and multimodal models. Despite its centrality to the success of modern LLMs, and other attention-reliant architectures, the mathematical in- tuition behind positional encoding remains challenging and inaccessible to many researchers and practitioners. This workshop aims to demystify positional encoding by bridging formal theory with intuitive understanding and practical experimentation. Through a series of guided lectures, visual demonstrations, and hands-on coding sessions, participants will explore the operational principles behind effective positional representations, the evolution of key methods (from sinusoidal and learned embeddings to rotary and relative encodings), and open challenges that motivate current research directions. We will also provide open-source code implementations, mathematical visualizations, and collaborative ideation sessions for fostering new positional encoding concepts. By easing the barrier to entry for this mathematically intensive, yet crucial topic, the workshop seeks to foster deeper understanding, interdisciplinary exchange, and novel contributions to the future of Positional Encoding, and Transformer design

Science of Trustworthy Generative Foundation Models

We are living through a moment that once belonged to science fiction: generative foundation models can write, reason, design, diagnose, and increasingly, decide. They are no longer just predicting the next word — they are shaping knowledge, influencing choices, and becoming collaborators in science, medicine, education, and daily life. But here’s the tension: as their capabilities accelerate, our ability to trust them has not kept pace. Trustworthiness can’t remain a “patch after the failure” or a moral hope layered on top of engineering. It must evolve into a science—a discipline as rigorous as the one that created these models in the first place. In this tutorial, we explore what that science looks like: how we understand model behaviors, measure and stress-test trust, and design systems that earn it. We’ll build the foundations together, then step into the frontier—where models begin to exhibit human-like cognitive behaviors that inspire wonder, but also demand responsibility and new forms of alignment. This session is an invitation: to move beyond building models that impress us, toward building models we can trust with what matters.

November 3 2025

NeurIPS Newsletter – October 2025

Communications Chairs 2025 2025 Conference

The Newsletter is an easy way to keep up to date with NeurIPS events and planning progress. As a reminder, NeurIPS 2025 will be held in San Diego, from Tuesday, Dec 2 to Sunday, Dec 7, 2025.

Welcome to the October edition of the NeurIPS monthly Newsletter!

This Newsletter includes:

NeurIPS 2025 Journal Track : accepted papers from TMLR and AoS
Announcing the affinity events
Announcing the accepted NeurIPS socials
Reflecting on the review and acceptance process this year
NeurIPS Foundation Launches “Bridging the Future” Grants Program
Invited speakers: details coming soon!

NeurIPS 2025 Journal Track – accepted papers from TMLR and AoS

The NeurIPS 2025 Journal Track continues to strengthen the bridge between the machine learning and statistics communities by bringing together leading research from premier journals and the vibrant NeurIPS conference. This year, we are proud to feature 15 papers from the Journal of Machine Learning Research and 20 papers from the Annals of Statistics, all of which will be presented as posters in San Diego. These works exemplify the rigor, depth, and cross-disciplinary innovation that lie at the heart of NeurIPS, fostering meaningful dialogue between journal publications and the broader conference community. Read more on the accepted papers: https://blog.neurips.cc/2025/10/22/bridging-journals-and-the-neurips-community-journal-track-at-neurips-2025/
Announcing the affinity events

At NeurIPS, affinity groups play a crucial role in promoting and supporting the ideas and voices of various communities that are defined by some axis of joint identity and raise awareness of issues that affect their members. In addition, they provide members of these affinity groups with increased opportunities to showcase their work, engage in discussions, and build connections during NeurIPS events, promoting diversity and inclusion at NeurIPS. Read more on the affinity events for the this year:
https://blog.neurips.cc/2025/10/08/announcing-the-neurips-2025-affinity-events/
Announcing the accepted NeurIPS socials

We’re thrilled to announce the eight accepted social events taking place in San Diego, and the three accepted social events taking place in Mexico City (CDMX).

The NeurIPS Socials bring our community together to discuss, debate, and celebrate the latest advances in AI and machine learning. From interdisciplinary exchanges to focused thematic sessions, the NeurIPS social events serve as vital spaces for connection and dialogue, extending the intellectual and collaborative spirit of the conference beyond its technical program.

Accepted NeurIPS 2025 socials:
https://blog.neurips.cc/2025/11/03/neurips-2025-socials-join-the-conversations/

Accepted NeurIPS 2025 Mexico City Socials:
https://blog.neurips.cc/2025/11/03/neurips-2025-socials-cdmx-join-the-conversations/
Reflecting on the review and acceptance process this year

Chairs from across the main program, dataset and benchmarks, and position paper tracks have released blogs reflecting upon their track’s review and acceptance process this year.

Reflecting on the review process from the Program Committee Chairs
https://blog.neurips.cc/2025/09/30/reflections-on-the-2025-review-process-from-the-program-committee-chairs/

Reflecting on the review process for the position track papers:
https://blog.neurips.cc/2025/10/02/reflecting-on-the-inaugural-neurips-position-paper-track-a-pilot-year-journey/

Reflecting on the 2025 Review Process from the Datasets and Benchmarks Chairs
https://blog.neurips.cc/2025/09/30/reflecting-on-the-2025-review-process-from-the-datasets-and-benchmarks-chairs/
NeurIPS Foundation Launches “Bridging the Future” Grants Program

The Neural Information Processing Systems Foundation is excited to announce the opening of the Bridging the Future grants program for this year. The program supports innovative activities that broaden participation in Artificial Intelligence and Machine Learning.

This initiative reflects NeurIPS’s commitment to expanding access to AI/ML education and research opportunities across all levels—from grade school students taking their first steps in computational thinking to senior faculty developing new pedagogical approaches. By investing in the next generation of AI researchers and practitioners, we aim to build a more diverse, inclusive, and vibrant research community.

Find out more in our blog post: https://blog.neurips.cc/2025/10/30/neurips-foundation-opens-bridging-the-future-grants-program/
Look for an announcement on what our invited speakers will address soon!

Find a full list of our invited speakers here: https://neurips.cc/virtual/2025/eventlistwithbios/invited%20talk

Danielle Belgrave, Cheng Zhang, Jean Kossaifi, Alex Lu and Mengye Ren

NeurIPS 2025 General Chairs and Communication Chairs

You are receiving this newsletter as per your subscription preferences in your NeurIPS profile. As you prepare to attend NeurIPS, we hope that you will find the following information valuable. To unsubscribe from the NeurIPS newsletter, unselect the “Subscribe to Newsletter” checkbox in your profile: https://neurips.cc/Profile/subscribe. To update your email preference, visit: https://neurips.cc/FAQ/EmailPreferences

November 3 2025

NeurIPS 2025 Socials CDMX : Join the Conversations!

Communications Chairs 2025 2025 Conference

We are pleased to announce the three social events for NeurIPS 2025 in Mexico City (CDMX). Designed to foster connection and dialogue, these gatherings extend the conference’s collaborative spirit. They provide a vibrant setting for the community to engage in interdisciplinary exchange and celebrate the field’s advances outside the technical program.

Accepted Socials at NeurIPS 2025 CDMX

SomosNLP at NeurIPS: Building Principles for Inclusive Spanish & Portuguese NLP

This social aims to create a space for academic and technical reflection on the challenges of developing Natural Language Processing (NLP) for languages beyond English, with a special focus on Spanish and Portuguese.

They propose a forum to debate methodological, conceptual, and practical tensions, specifically:

• How benchmarks and datasets shape progress in non-English NLP.

• What biases and omissions today’s AI ecosystem imposes.

• What strategies can advance inclusive, multilingual NLP.

• How to navigate tensions between industry-driven approaches (fast, large-scale, performance-oriented) and community/academia-driven approaches (careful, linguistically grounded).

Day: Monday, December 1, 18:30

Women in AI Social — Amplifying Voices in AI

The “Women in AI Social – Amplifying Voices in AI” is an interactive community event designed to foster meaningful connections, peer dialogue, and collective reflection among women and allies in the AI research ecosystem. The event embraces inclusive, participatory formats, such as rotating roundtables, speed networking, and creative group activities, that center the voices of all attendees.

With a focus on mentorship, inclusion, and global-local perspectives (especially from Latin America), the event invites participants to share experiences, build supportive networks, and envision a more equitable AI future. It is open to all registered NeurIPS CDMX participants, regardless of gender or career stage, and culminates in an optional post-event dinner or social outing to continue the conversations in an informal setting.

Day: Wednesday, December 3, 18:30

Claiming Your True Market Value as an AI Researcher – Negotiation Workshop & Fireside Chat

As AI reshapes industries, compensation is changing faster than researchers can track. New labs, startups, and top companies are competing for talent with vastly different pay structures, currencies, and cultural norms. Yet most researchers are never formally taught how to understand their worth or navigate these systems. The result is an uneven landscape where brilliant minds often make life-changing decisions without the information they need.

The session begins with a concise, data-driven talk on current AI compensation and negotiation trends, grounded in real stories and case studies. From there, a fireside chat and open Q&A invite candid, experience-driven insights from researchers who have navigated these conversations firsthand.

Day: Thursday, December 4, 18:30

NeuripsCDMX social chairs

Lourdes Martínez and Ulises Moya

November 3 2025

NeurIPS 2025 Socials : Join the Conversations!

Communications Chairs 2025 2025 Conference

As NeurIPS 2025 approaches, we’re thrilled to announce the eight accepted social events taking place in San Diego. These gatherings bring our community together to discuss, debate, and celebrate the latest advances in AI and machine learning. From interdisciplinary exchanges to focused thematic sessions, the NeurIPS social events serve as vital spaces for connection and dialogue, extending the intellectual and collaborative spirit of the conference beyond its technical program.

Accepted Socials at NeurIPS 2025

Learning Theory Alliance

https://let-all.com/neurips25.html

This social event, organized by the Learning Theory Alliance, will feature a fireside chat and ask-me-anything (AMA) with a senior community member followed by informal round-table discussions. The session aims to provide mentorship, share insights on research and career development, and foster connections among researchers in learning theory. Building on the success of last year’s NeurIPS social, “Theory in the Age of LLMs,” this event will strengthen engagement across the theoretical and applied machine learning communities and support the Alliance’s mission to cultivate an inclusive, collaborative, and globally connected learning theory community.

Day: Wednesday, December 3

When Errors Dream: Exploring Collective Creativity through AI Hallucination

When Errors Dream reframes AI hallucination as creative material. In a two-hour open-space jam in festival style, attendees rotate through small groups to generate surprising AI outputs—text, image, or sound—and transform them into collaborative artworks and interactive experiences using digital and analog media. No prior skills are required; emphasis is on the joy of making with machines, not polish. Designed to be drop-in friendly, the format scales to 150–200 participants through science-fair-style stations and quick exquisite-corpse creation cycles, fostering inclusive networking through shared play rather than one-directional talks.

Day: Thursday, December 4

The Role of AI in Scientific Peer Review

https://ai-scientific-peer-review.github.io

This social event will explore the role of Artificial Intelligence (AI) in addressing the current challenges and shaping the future of scientific peer review. We will examine how AI can be applied across the entire scholarly publishing process, from authoring to reviewing, editing, and even readership. The event will foster critical discussion on the ethical implications, potential benefits, and practical implementation of AI in this critical scientific process. Our goal is to bring together researchers, practitioners, and stakeholders from diverse fields in an interactive format to build community and explore actionable solutions for a more efficient, fair, and transparent peer review system.

Day: Wednesday, December 3

NeuroAI: From Neurons to Transformers

https://neuroai-social-websi-xk4i.bolt.host

As artificial intelligence grows ever more brain like, the dialogue between neuroscience and machine learning has never been more important. NeuroAI: From Neurons to Transformers explores how ideas from biological cognition are inspiring next-generation AI, and how, in turn, large-scale foundation models are reshaping how we study the brain and teach about it. The social invites participants to discuss neural computation, cognitive modeling, and educational technology—from brain-inspired architectures that mimic learning to AI-driven tools that personalize education and accelerate scientific discovery. In an inclusive setting, attendees will connect across disciplines, neuroscience, AI, EdTech, ethics, and beyond, to share ideas, debate implications, and imagine a future where understanding minds, human or artificial, transforms how we learn, create, and collaborate.

Day: Thursday, December 4

Agents Safety Panel

As AI systems become increasingly capable and widely deployed, ensuring their safety and reliability is more important than ever. Join us for a 30-minute panel discussion on the safety of agents from development to deployment, followed by a brief Q&A session. The rest of the event will consist of discussion and mingling among attendees. We will provide drinks and snacks. This event is co-organized by the Center for AI Safety (CAIS) and UK AI Security Institute (AISI).

Day: Wednesday, December 3

Nonprofits Working on Openness and Trust in AI

https://enterprise.wikimedia.com/blog/neurips-event

Join us for an in-person social event at NeurIPS 2025, to explore the intersections between generative AI data and open trusted datasets. This session will feature representatives from the Wikimedia Foundation, ML Commons and AI Alliance, offering an opportunity to connect with nonprofits committed to using technology for academic and social missions. The event will begin with presentations from both organizations, highlighting their goals, projects and research (e.g., Wikipedia, AI Alliance’s Open Trusted Data Initiative, MLCommon’s Croissant data standards) and challenges with trust and responsible data usage in AI. Following the presentations, the session will transition into roundtable discussions focused on current initiatives and an open Q&A.

Day: Wednesday, December 3

Value Chain from Research to ROI
womeninai.co/wailabs

We aim to understand: How can research be ready to accelerate into products/applied solutions? What kind of environments, processes & catalysts are needed to establish pathways from research to larger ecosystems consisting of products & businesses? We will have a panel discussion on these topics followed by a hands-on group activity where attendees will ideate sector-specific product applications from their own research. The event is structured to help researchers articulate their research’s broader potential, emphasizing ecosystem thinking and practical steps needed to scale research impact.

Day: Thursday, December 4

Evaluating Agentic Systems: Bridging Research Benchmarks and Real-World Impact

https://luma.com/mkyyvypm

Agentic AI systems—LLM-driven agents capable of autonomous planning, tool use, and multi-step task execution—are rapidly advancing, yet methods for evaluating them remain underdeveloped. Traditional metrics for static or single-turn tasks fail to capture the complexity of open-ended, long-horizon interactions where goals evolve and behaviors emerge dynamically. This social aims to bridge research and industry perspectives on designing frameworks, simulation environments, and metrics that assess reliability, alignment, and safety in autonomous agents. Through lightning talks, panel discussions, and networking, the event fosters an interactive exchange on how to meaningfully evaluate and benchmark the next generation of agentic AI systems.

Day: Thursday, December 4

Social Co-Chairs

Ehsan Adeli (Stanford University)

Alessandra Tosi (Mind Foundry)

Saining Xie (New York University)

October 2 2025

Reflecting on the Inaugural NeurIPS Position Paper Track: A Pilot Year Journey

Communications Chairs 2025 2025 Conference

As we announce decisions for the first-ever NeurIPS Position Paper Track, we find ourselves reflecting on an extraordinary experimental year. When NeurIPS launched this pilot track, we ventured into uncharted territory. The community response was overwhelming and inspiring. Every submission represented not just research, but a vision for how position papers could contribute to broader conversations in our field.

We want to take a moment to share our reflections, including lessons we’ll take forward into the future. This blog post is specific to the position paper track, which is just one of several tracks at NeurIPS: the Main Program track and Datasets and Benchmarks track are also releasing blog posts reflecting on their processes. To all authors who trusted us with their ideas and hard work, thank you. To all Reviewers, Emergency Reviewers, and Area Chairs who dedicated their time, thank you, too!

The Numbers

We received nearly 700 initial submissions—a response that both humbled and energized us. After accounting for withdrawals and desk reviews, we had 496 viable candidates for full consideration. From this impressive pool, we selected 40 papers for presentation at the conference. That’s an acceptance rate of approximately 8%. We know that this number is lower than other tracks in the conference, but it reflects the specific affordances of position papers and the conference strategy for piloting this track.

The NeurIPS 2025 Position Paper Track envisions giving each accepted author more focused attention: we are planning to provide accepted authors with new formats of presentation and tailored communication guidance to further evolve their message. We feel this better fits with the NeurIPS position paper track goal for this year of widespread visibility to serve as a springboard for public discussion. This does mean that we had to accept a limited number of papers.

If your paper received high scores but it wasn’t ultimately accepted for presentation at the conference, you’re in good company. We had a wealth of great papers this year! Additionally, some excellent papers with high technical merit weren’t the best fit for this inaugural community discussion—either requiring specialized expertise that might limit broader engagement or being more focused on technical implementations and methods than we envision for this year’s track.

We encourage authors who received positive reviews to incorporate that feedback for submissions to other venues and platforms. These conversations are too important to end here.

Clarifying Our Process

We’ve received questions about our process, and we want to be transparent as we continue to adapt and learn from this pilot year. We’re grateful for all the suggestions authors provided in the author survey, which included proposals for new review mechanisms, specific feedback on the call for submissions, and thoughtful ideas about track themes. We will continue to digest your proposals and translate these ideas into learnings for future conference position paper tracks.

One thing we piloted this year that we want to provide extra context on is the adjudication process. From the outset, adjudication was designed to improve our review process and address procedural concerns about reviewer behavior—not to alter paper scores after the fact. That said, all author feedback, including adjudication requests, was available to Area Chairs and Program Chairs to inform metareviews and decision-making.

We have several ongoing investigations into reviewer conduct concerns that will continue until appropriate resolution. For every author who requested adjudication, we’re corresponding individually.

Addressing Community Concerns

We’ve received thoughtful feedback that has highlighted some unintended consequences of our pilot approach. We want to address these directly:

Exclusivity and Limited Discussion: Our small acceptance pool, while intended to provide focused attention to selected papers, has understandably raised concerns about exclusivity and limited diversity of viewpoints. We stand by our decision to focus on fewer papers this year towards a goal of more amplification and resources on each, and the community feedback is something that we will take into account for future years.

Responses to Reviews: We experimented with changes to the review process, moving from rebuttals to an author survey and creating a space for adjudication where possible misbehavior might be in question. We were hoping to create more civil exchanges and provide mechanisms for addressing review concerns within the more constrained timeline of the position paper track. Our decision was Informed by a growing body of research that shows that the process of rebuttals can be biased and expose reviewers to “peer review bullying” Author survey responses, including how the paper would change based on reviewer feedback, were read and considered carefully in metareviews and the final decisions. In other words, we looked at everything when making the decisions.

Timeline Challenges: We apologize that our ambitious design was not accompanied by punctual logistics execution when we encountered storms along the way. We initially synchronized with other track deadlines within NeurIPS. Unfortunately, we found ourselves needing a significant number of emergency reviewers and additional time to ensure every paper received proper evaluation and a minimum of three reviews. This was a result of the novel format of the position paper track, uncertainty around reviewer expectations, and a greater number of reviewers who were unable to complete their work or became unresponsive than we had planned for.

We did not anticipate the scale of reviewer coordination required and failed to meet our original timeline commitments. When it became clear this would happen, we sent an email to authors suggesting that they prepare their papers for re-submission to other conferences before our decisions came out. In future iterations of our track, we will work to plan deadlines more strategically.

Lessons Learned

This pilot year has been a learning experience, including some challenging community feedback that prompted important reflection.

What We’ve Learned:

We need to provide much clearer guidance about appropriate submissions and review expectations including examples that people can reference;
We need to overrecruit and overassign reviewers to papers at a higher rather than dis-similar tracks to ensure a minimum of three reviews for every paper by the deadline;
The position paper format works best when facilitating broad, meaningful discussions.

Addressing Community Concerns:

On Exclusivity: Our small acceptance pool raised concerns about limited diversity of viewpoints. While we have a small number of papers, we believe they touch on many of the topics we saw voiced by the community. We believe future position paper tracks should be resourced to target a higher acceptance rate so that more authors have the opportunity to present.
On Review Process: Our track’s different processes were difficult for some authors who followed prior venue guidance for improving paper acceptance, meanwhile some authors and reviewers took the changes at face value. Future chairs should be more creative with the strategies we use not only to communicate our ideas, but to listen to feedback and constructive criticism.
On Timeline Challenges: We needed more time to recruit, assign, and complete emergency reviews. This resulted in a final decision date that overlapped with other conferences. In future years, chairs should be more cognizant of deadlines for our sister conferences and try to provide further advance notice to authors on how to handle the conflict.

We put out blog posts and communicated via OpenReview for our decision-making standards and changes to timelines regularly and at important timepoints. We would love the community’s advice on how to better communicate, our current channels (including blog posts!) may not be effectively reaching the audience for our messages.

Gratitude and Next Steps

The success of this experiment hinges on your willingness to engage with something new and trust us with your research. Whether your paper was selected or not, your contribution to this inaugural year has been meaningful and appreciated. Thank you.

The conversations that matter most in our field—about ethics, impact, and AI’s future—don’t end with a single conference track. They continue in labs, journals, online discussions, and countless ways our community engages with challenges and opportunities ahead.

Thank you for making the first NeurIPS Position Paper Track possible! We hope you enjoy all of the discussions sparked by these papers at NeurIPS in December!

September 30 2025

Reflections on the 2025 Review Process from the Program Committee Chairs

Communications Chairs 2025 2025 Conference

NeurIPS has grown at an unprecedented pace in recent years, fundamentally reshaping how the conference operates. To provide transparency and share insights into how NeurIPS continues to refine its processes, we—the Program Committee (PC) Chair Team—are publishing this blog post on our experience managing the review process at scale, with support from the Communication Chairs.

NeurIPS has multiple tracks, each with their own chairs and program committees, as well as their own processes and criteria for selecting papers. This year the three tracks that constitute the poster sessions in the main conference include the Main Program track, Datasets and Benchmarks track and the Position Paper track. While some attempts have been made to align some of these tracks, each track is unique and thus should not adopt the exact same processes. This blog post focuses specifically on the Main Program, but the Datasets and Benchmarks chairs have released a blog post, as have the Position Paper Track chairs.

How the scale of NeurIPS submissions impacts calibration on decision-making

Recruiting SACs, ACs, and Reviewers at Scale

Like most AI conferences, NeurIPS has seen a substantial increase in the number of submissions in the past few years, growing from 9,467 submissions in 2020 to 21,575 in 2025. Critically, this increase in submissions not only is a challenge from the perspective of ensuring that there are enough reviewers, area chairs (ACs), and senior area chairs (SACs) to give papers fair review (thank you to all of the 20,518 reviewers, 1,663 ACs and 199 SACs who work tirelessly to make this happen), but it also introduces effects that make the review process noisier. For example, recruiting has to happen at a larger scale, but this creates a greater possibility that papers might be reviewed and handled by less experienced reviewers/ACs/SACs. For this year, we had to recruit a larger number of ACs from self-nomination forms, so we had a larger number of first-time NeurIPS ACs. Another example is that the NeurIPS community continues to expand in topic breadth over time, with backgrounds ranging from statistical physics to psychology joining the field, and new topics such as LLMs becoming highly popular as existing fields evolve. This means that the existing pool of reviewers may not be topically qualified to handle these topics – and even experienced reviewers may struggle to cover some topics when the relevant expert communities are still emerging.

Calibration Process
For this reason, PC chairs need to focus more on calibration of decision-making, doing their best to ensure that the reviewing process remains reliable in the presence of this noise. This year, we relied on a range of signals for this – for example, if there is disagreement between reviewers, or between reviewers and the AC, we often look for consensus between the AC and SAC to calibrate. We found it extremely helpful when ACs actively communicated and provided feedback in this process. Many ACs fought for papers they thought were good even if reviewers disagreed, and often we followed their lead, ending in an accept decision for the paper. Conversely, sometimes ACs flagged significant negative issues with evidence after the rebuttal that reviewers did not catch, and we ask SACs to carefully discuss these issues with the ACs to ensure the evidence is convincing. This then led to reject decisions even when papers had high ratings. We understand and we are truly regretful that this might have severely disappointed some authors. However, we trusted the ACs and SACs to be experienced and professional members of the community whose consensus guards the quality of the program. Finally, we note in some cases decisions had to be reverted for reasons outside of what could be disclosed to the AC. This included situations like the 11 unfortunate cases where one of the co-authors are confirmed to be grossly negligent in their reviews under the our new responsible reviewing policies, in line with practices by CVPR and other sister conferences who have adopted similar policies.

At the same time, we acknowledge that the calibration process is not perfect, and it is impossible for it to be perfect. Calibrating decisions is a constrained optimization process. Everyone involved has limited resources: authors do not have infinite time to comment, and reviewers, ACs, SACs and PCs do not have infinite bandwidth to read and respond. Even if for some individual cases, the decision could be retrospectively improved, we cannot revisit these decisions as if the original constraints driving the decision did not exist. For example, some authors asked why they could not make rebuttals if ACs raised new issues independently of the reviewers, but applying this fairly across reviews would require adding additional weeks to the reviewing timeline, which would derail conference planning due to fixed timeframes.

Upholding Scientific Standards Despite Challenges

Nonetheless, we are optimistic that on balance, NeurIPS decisions were consistent with the standards expected from NeurIPS papers established by prior years. The main track this year received 21,575 valid paper submissions, of which 5,290 were accepted. That is, the main track had an acceptance rate of 24.52%, on par with that of previous years. In general, accepted papers were consistent with reviewer impressions of papers (Figure 1), with the PCs manually reviewing many outliers (but not all, due to time resource constraints.)

Figure 1. A histogram of scores across papers in the main program track, color coded by accept versus reject decisions.

Finally, we want to address the topic of space constraints, since some community members were concerned that papers meeting the NeurIPS standard were rejected for this reason. In principle, space is a constraint we have to be mindful of, as our physical venues can only permit so many papers. We were transparent about this with our SACs and monitored it closely as part of our duty. For the main track, however, as we put our attention on resolving boundary and outlier decisions using the consensus of ACs and SACs during calibration, we reached a comfortably lower number of accepted papers than the capacity of our venue holds. That is, space ultimately played a negligible role compared to scientific merit.

An initiative on responsible reviewing this year

Review processes at NeurIPS are always evolving, and not just because the PC chairs have to keep up with the increased scale of the conference and new contexts – we also want to improve the reviewing process from year-to-year and ensure we help build a sustainable conference process that is responsive to community needs and concerns. For this reason, in addition to our duties directly managing the review process, we often focus on specific areas for improvement from year to year. For this year, we have focused on the topic of responsible reviewing.

Among other initiatives, we announced new policies under the responsible reviewing initiative (see our previous blog post). Towards the goal of improving the quality of reviews, this policy allows PCs and other tracks’ chairs to withhold reviews from authors whose own reviewing obligations were unmet on time. In addition, when the reviewers are grossly negligent during the review process, their co-authored papers need to be desk-rejected. In addition, we paid attention to the issue of collusion rings and fake peer review accounts, which had been hot areas of discussion in the community in previous years. This has led to a lower reliance on bidding, as we could not assume everyone was a good actor in face of evidence some were not. In turn, this led to a greater reliance on the matching system used by OpenReview, which likely introduced different types of noise in reviews this year, increasing our workload in calibration. However, we feel this to be an important trade-off to disrupt these academic integrity challenges.

A call for feedback

Overall, we would like to encourage the community to continue to provide feedback on how to improve the review process at NeurIPS. This feedback is critical for us to understand how the changes we implement from year to year to evolve NeurIPS may affect the community. For example, one new source of noise we observed this year was reviewers increasing their scores just to end back-and-forth discussion with the authors, but being silent or critical in private discussions with the AC, which leads us to believe that increasing levels of author engagement in rebutting their papers is now leading to reviewer fatigue. We encourage the community to raise issues like this with us, so we can factor this into future years’ planning. In particular, we encourage community members to attend the Town Hall, which will be hosted on-site at the conference this year, where they can ask questions and raise comments to the organizers in real-time.

September 30 2025

Reflecting on the 2025 Review Process from the Datasets and Benchmarks Chairs

Communications Chairs 2025 2025 Conference

The Datasets and Benchmarks (DB) track, since introduced in 2021, has played a pivotal role in raising the profile of datasets and benchmarks, which is foundational to machine learning, in the NeurIPS community. As this track continues to grow and refine its processes from year to year, we, the DB Chairs, decided to write this blog post to make our process this year transparent. This blog post focuses specifically on the DB track, but the Program Committee chairs have also written a blog post, as have the Position Track chairs.

Raising the standards for dataset and benchmark submissions

As the NeurIPS Datasets and Benchmarks (DB) track continues to mature, its growth in submissions is beginning to stabilize, as is also its establishment within the community. After three years of exponential growth, the track received 1,995 submissions this year. While this is a large number, the increase from the 1,820 submissions received last year is smaller than in previous years (for comparison, there were 987 submissions in 2023), suggesting that the track has begun to stabilize. In light of this maturity, significant steps were taken for the second consecutive year to align its standards and processes with the main track. This strategic alignment aims to ensure that papers involving datasets and benchmarks are held to the same rigorous criteria as main track papers, and maintain the reputation for quality that NeurIPS proceedings papers have. This is also particularly relevant for papers that blur the lines across tracks (most frequently benchmarks and evaluations).

A first step in the direction of alignment with the main track began in 2024, when DB track saw an acceptance rate of 25.3% closely mirroring the main program’s 25.8% rate. For 2025, our goal was to align even more with the main track by working together from recruiting reviewers and ACs, adapting the majority of the main track’s reviewing processes, and finally jointly implementing the responsible reviewing initiative. This initiative is designed to safeguard the quality of reviews, addressing issues of late or low-quality feedback that can hinder the peer-review process. All this was aimed at ensuring a consistent and high level of scrutiny for all submissions.

Building on the lessons learned from previous years, the DB track chairs have also focused on streamlining the submission and review process for datasets. The objective was to create a more standardized submission process for authors and a more efficient evaluation method for reviewers, and in this to allow for easier access, comparison, and assessment of datasets. We elaborated on these changes in our blog post titled NeurIPS Datasets & Benchmarks: Raising the Bar for Dataset Submissions. We highlighted the evolving nature of AI research, where the distinction between a “dataset paper” and a “main track” paper can be nuanced. To address this, the updated best practices and requirements for 2025 include more stringent criteria for dataset submission, hosting, and reproducibility. These measures are intended to ensure that datasets are not only useful and accessible at the time of publication but also remain so over time, thereby improving their accessibility for the review process.

Many of these enhancements are reflected in the updated Call for Papers for the DB track, which this year directly references the main track’s call for papers while providing specific guidelines for dataset and benchmark submissions. This change sends a clear message about the intent to elevate the quality of the DB track submissions to the well-established standards of the main conference.

DB Track calibration process for ensuring fairness and consistency

A critical component of our commitment to maturing the DB track is the careful calibration of our decision-making process. Like all tracks at NeurIPS, we must ensure that reviewing standards are applied consistently and fairly across all submissions, accounting for the natural variance that exists among reviewers, Area Chairs (ACs), and Senior Area Chairs (SACs). As the Program Committee Chairs have noted for the main track in their blog post, many factors can introduce noise into reviewer scores and feedback. It is our responsibility as the DB track chairs to mitigate this and ensure that every paper receives a fair evaluation.

In line with our primary goal of raising the standards and streamlining processes for the DB track, this year, we collaborated closely with the PC chairs to parallel many of the main track reviewing processes. This marks a significant step forward from previous years, where the DB track largely defined its own standards. While our processes were not identical, as DB papers have unique considerations, we adopted similar protocols to the main track for resolving disagreements, such as supporting ACs and SACs throughout the process of calibration for the final decision making.

This year, we implemented two significant changes to the review process, hand in hand with the main track. First, we introduced a revised scoring system. Second, we rolled out the responsible reviewing initiative, which focused on safeguarding review quality and timeliness.

The new scoring system may have influenced how reviewers assessed submissions. In particular, we have observed two distinct trends in the DB track during this process:

Increased average score: Unlike papers in the main track, which one can expect to be more method and algorithm oriented, it is less common for a dataset submission to be fundamentally “technically incorrect.”
Subjective nature of contributions: Evaluating the merit of a dataset or benchmark can be highly subjective. A dataset that fills a critical gap for a smaller, long-tail research area might be just as valuable or novel as one that targets a well-established “head” problem.

We think that these factors can lead to reviewer scores that skew higher and have a tighter distribution compared to the papers in the main track. As a result, it can be difficult for the ACs to differentiate clearly between two papers that may have the same average score but vastly different merits and trade-offs.

To address this, this year, at the end of the calibration process, which was mirrored from the main track, we implemented an additional step that allowed us to have a nuanced evaluation system, rather than relying solely on numerical scores. Thus, we asked our SACs, based on their discussion with the ACs, to produce a relative ranking of the papers within their stack of papers. The same procedure was also done last year. The main difference is that last year, we did not have a template for the rankings, and SACs were asked to explain their ranking in a live meeting. To be mindful of the increased effort, this year, we structured this interaction using a ranking form. For any paper ranked below the track-wide average score of 4.25, SACs were required to provide a detailed description of its merits and motivation. This combination of relative ranking and qualitative justification provided a much richer signal.

Provide us with feedback

As with all tracks, community feedback is essential for us to continue to improve our processes: please reach out if you have any feedback. This year, we advanced many changes to improve the quality of datasets and benchmarks papers at NeurIPS, under the idea that a NeurIPS DB paper should not just be correct, but also meet standards for impact and scientific relevance. In future years, we would like to better describe these criteria to reviewers/ACs/SACs, and would invite feedback on this, especially. In particular, we encourage community members to attend our Town Hall at the conference venue, where they can ask questions and provide comments to the organizers in real-time.

September 10 2025

2025 Invited Speaker Lineup Announced

Communications Chairs 2025 2025 Conference

The NeurIPS 2025 organizing committee is pleased to announce the invited speakers for this years conference. The lineup includes six researchers representing areas spanning theoretical machine learning, reinforcement learning, AI and society, cognitive science, natural language processing, and deep learning applications.

Kyunghyun Cho

Glen de Vries Professor of Health Statistics, NYU; Executive Director of Frontier Research, Prescient Design, Genentech
Website: kyunghyuncho.me

Cho’s work spans machine learning and natural language processing. He co-developed the Gated Recurrent Unit (GRU) architecture and has contributed to neural machine translation and sequence-to-sequence learning. He is a CIFAR Fellow of Learning in Machines & Brains and received the 2021 Samsung Ho-Am Prize in Engineering. He served as program chair for ICLR 2020, NeurIPS 2022, and ICML 2022.

Yejin Choi

Professor of Computer Science, Stanford University; Dieter Schwarz Foundation Senior Fellow, Stanford HAI; Distinguished Scientist, NVIDIA
Website: yejinc.github.io

Choi’s research focuses on natural language processing, with emphasis on commonsense reasoning and language understanding. She is a 2022 MacArthur Fellow and was named to Time’s Most Influential People in AI in 2023. She has received multiple Test of Time Awards from ACL and CVPR, and Best Paper Awards at venues including ACL, EMNLP, ICML, and NeurIPS. She previously held positions at the University of Washington and Allen Institute for AI.

Melanie Mitchell

Professor, Santa Fe Institute
Website: melaniemitchell.me

Mitchell’s research areas include AI, cognitive science, and complex systems, with focus on conceptual abstraction and analogy-making in humans and AI systems. She authored “Complexity: A Guided Tour,” which won the 2010 Phi Beta Kappa Science Book Award, and “Artificial Intelligence: A Guide for Thinking Humans,” which was named one of the five best books on AI by both the New York Times and the Wall Street Journal. She received her PhD from the University of Michigan under Douglas Hofstadter, with whom she developed the Copycat cognitive architecture.

Andrew Saxe

Professor of Theoretical Neuroscience & Machine Learning, Gatsby Computational Neuroscience Unit and Sainsbury Wellcome Centre, UCL
Website: saxelab.org

Saxe’s research focuses on mathematical theories of learning in neural networks. He has developed exact solutions for learning dynamics in deep linear networks and studies connections between artificial and biological learning systems. He is a CIFAR Fellow of Learning in Machines & Brains and recipient of the 2019 Wellcome Trust Beit Prize. His work includes theoretical analyses of semantic development and the dynamics of representation learning.

Richard Sutton

Research Scientist, Keen Technologies; Professor, University of Alberta; Chief Scientific Advisor, Amii Chief Scientific Officer, ExperienceFlow.ai.
Website: incompleteideas.net

Sutton co-developed temporal difference learning and policy gradient methods in reinforcement learning. He received the 2024 Turing Award with Andrew Barto for foundational contributions to reinforcement learning. He is co-author of the textbook “Reinforcement Learning: An Introduction” and is a Fellow of the Royal Society and the Royal Society of Canada. His research focuses on computational principles underlying learning and decision-making.

Zeynep Tufekci

Henry G. Bryant Professor of Sociology and Public Affairs, Princeton University; New York Times Columnist
Website: zeynep.me

Tufekci examines the interplay of science, technology and society through a sociological framework and complex systems lens, focusing especially on digital, computational, and artificial intelligence technologies. She was a 2022 Pulitzer Prize finalist for commentary on the COVID-19 pandemic. Her book “Twitter and Tear Gas: The Power and Fragility of Networked Protest” examines the dynamics of social movements in the digital age. She is also faculty associate at the Berkman Klein Center for Internet & Society at Harvard University.