Alina Beygelzimer, Yann Dauphin, Percy Liang, Jennifer Wortman Vaughan
NeurIPS 2021 Program Chairs
Joaquin Vanschoren and Serena Yeung
NeurIPS 2021 Datasets & Benchmarks Chairs
NeurIPS 2021 will begin next week! As we prepare for the conference, we are delighted to take a moment to announce the recipients of the 2021 Outstanding Paper Awards, the Test of Time Award, and the new Datasets and Benchmarks Track Best Paper Awards.
First, we would like to say a huge thank you to the members of the community who led the award selection process. The Outstanding Paper Award committee consisted of Alice Oh, Daniel Hsu, Emma Brunskill, Kilian Weinberger, and Yisong Yue. The Test of Time Award committee consisted of Joelle Pineau, Léon Bottou, Max Welling, and Ulrike von Luxburg. We would also like to thank Nati Srebro, who helped set up the process for the Outstanding Paper Awards, and the members of the community who provided subject-matter expertise on specific papers and topics.
Outstanding Paper Awards
This year six papers were chosen as recipients of the Outstanding Paper Award. The committee selected these papers due to their excellent clarity, insight, creativity, and potential for lasting impact. Additional details about the paper selection process are provided below. While there is of course no perfect process for choosing award papers, we believe the NeurIPS community will appreciate the extremely strong contributions of these papers.
The award recipients are (in order of paper ID):
- A Universal Law of Robustness via Isoperimetry
By Sébastien Bubeck and Mark Sellke.
This paper proposes a theoretical model to explain why many state-of-the-art deep networks require many more parameters than are necessary to smoothly fit the training data. In particular, under certain regularity conditions about the training distribution, the number of parameters needed for an O(1)-Lipschitz function to interpolate training data below the label noise scales as nd, where n is the number of training examples, and d is the dimensionality of the data. This result stands in stark contrast to conventional results stating that one needs n parameters for a function to interpolate the training data, and this extra factor of d appears necessary in order to smoothly interpolate. The theory is simple and elegant, and consistent with some empirical observations about the size of models that have robust generalization on MNIST classification. This work also offers a testable prediction about the model sizes needed to develop robust models for ImageNet classification.
This paper will be presented Tuesday, December 7 at 08:20 GMT (12:20 am PST) in the session on Deep Learning Theory and Causality.
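To give a rough sense of the gap between the two bounds, the sketch below plugs MNIST-scale numbers into them. The specific figures are our own illustration; the theorem's constants and regularity conditions are omitted.

```python
# Back-of-envelope comparison of the classical interpolation bound (~n
# parameters) with the paper's smooth-interpolation bound (~n*d parameters),
# using MNIST-scale numbers. Illustrative figures only.
n = 60_000  # number of training examples
d = 784     # input dimension (28x28 pixels)

classical = n      # parameters needed merely to fit the data
robust = n * d     # parameters suggested for a smooth (O(1)-Lipschitz) fit

print(f"fit only:     ~{classical:,} parameters")
print(f"fit smoothly: ~{robust:,} parameters ({robust // classical}x more)")
```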
- On the Expressivity of Markov Reward
By David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael Littman, Doina Precup, and Satinder Singh.
Markov reward functions are the dominant framework for sequential decision making under uncertainty and reinforcement learning. This paper provides a careful, clear exposition of when Markov rewards are, or are not, sufficient to enable a system designer to specify a task, in terms of their preference for a particular behavior, preferences over behaviors, or preferences over state and action sequences. The authors demonstrate with simple, illustrative examples that there exist tasks for which no Markov reward function induces the desired behavior. Fortunately, they also show that it is possible in polynomial time to decide if a compatible Markov reward exists for a desired setting, and if it does, there also exists a polynomial time algorithm to construct such a Markov reward in the finite decision process setting. This work sheds light on the challenge of reward design and may open up future avenues of research into when and how the Markov framework is sufficient to achieve performance desired by human stakeholders.
This paper will be presented Tuesday, December 7 at 09:20 GMT (1:20 am PST) in the session on Reinforcement Learning.
- Deep Reinforcement Learning at the Edge of the Statistical Precipice
By Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, and Marc G. Bellemare.
Rigorous comparison of methods can accelerate meaningful scientific advances. This paper suggests practical approaches to improve the rigor of deep reinforcement learning algorithm comparison: specifically, that the evaluation of new algorithms should provide stratified bootstrap confidence intervals, performance profiles across tasks and runs, and interquartile means. The paper highlights that standard approaches for reporting results in deep RL across many tasks and multiple runs can make it hard to assess if a new algorithm represents a consistent and sizable advance over past methods, and illustrates this with empirical examples. The proposed performance summaries are designed to be feasible to compute with a small number of runs per task, which may be necessary for many research labs with limited computational resources.
This paper will be presented Wednesday, December 8 at 16:20 GMT (8:20 am PST) in the session on Deep Learning.
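Two of the recommended tools, the interquartile mean and a stratified bootstrap confidence interval, can be sketched in a few lines. This is our own simplified illustration (function names and the percentile construction are ours); the authors released an open-source library with reference implementations.

```python
import random
import statistics

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of values,
    more robust to outlier runs than the plain mean."""
    s = sorted(scores)
    n = len(s)
    return statistics.mean(s[n // 4 : n - n // 4])

def stratified_bootstrap_ci(scores_per_task, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the IQM over all runs, resampling runs
    within each task (stratified bootstrap)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = []
        for runs in scores_per_task:  # stratify: resample within each task
            resampled.extend(rng.choices(runs, k=len(runs)))
        stats.append(iqm(resampled))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With only a handful of runs per task, the interval is typically wide, which is exactly the point: it makes the uncertainty in the reported aggregate visible.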
- MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
By Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui.
This paper presents MAUVE, a divergence measure to compare the distribution of model-generated text with the distribution of human-generated text. The idea is simple and elegant: MAUVE compares the two distributions through a continuous family of (soft) KL divergence measures computed on quantized embeddings of the texts. The proposed MAUVE measure is essentially an integration over this continuous family of measures, and aims to capture both Type I error (generating unrealistic text) and Type II error (not capturing all possible human text). The empirical experiments demonstrate that MAUVE identifies the known patterns of model-generated text and correlates better with human judgements compared to previous divergence metrics. The paper is well-written, the research question is important in the context of rapid progress of open-ended text generation, and the results are clear.
This paper will be presented Tuesday, December 7 at 8:00 GMT (midnight PST) in the session on Deep Learning.
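The underlying divergence frontier can be sketched on toy inputs. This assumes the human and model texts have already been quantized into histograms over shared bins (in MAUVE this is done by clustering neural embeddings); the function names are our own, and MAUVE itself summarizes the frontier as an area under a scaled curve.

```python
import math

def kl(p, q):
    """KL(p || q) for two histograms over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence_frontier(human, model, n_points=25):
    """Trace soft KL divergences against mixtures r = lam*human + (1-lam)*model.

    Each point pairs a Type I-style term, KL(model || r), penalizing model
    text that is unlike human text, with a Type II-style term,
    KL(human || r), penalizing human text the model fails to cover."""
    pts = []
    for i in range(1, n_points):
        lam = i / n_points
        r = [lam * h + (1 - lam) * m for h, m in zip(human, model)]
        pts.append((kl(model, r), kl(human, r)))
    return pts
```

When the two histograms coincide, every point on the frontier is zero; the more the distributions differ, the further the curve moves from the origin.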
- Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
By Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, and Adrien Taylor.
This paper describes a “continuized” version of Nesterov’s accelerated gradient method in which the two separate vector variables evolve jointly in continuous-time—much like previous approaches that use differential equations to understand acceleration—but uses gradient updates that occur at random times determined by a Poisson point process. This new approach leads to a (randomized) discrete-time method that: (1) enjoys the same accelerated convergence as Nesterov’s method; (2) comes with a clean and transparent analysis that leverages continuous-time arguments, which is arguably easier to understand than prior analyses of accelerated gradient methods; and (3) avoids additional errors from discretizing a continuous-time process, which stands in stark contrast to several previous attempts to understand accelerated methods using continuous-time processes.
This paper will be presented Wednesday, December 8 at 16:00 GMT (8:00 am PST) in the session on Optimization.
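The Poisson-timing device at the heart of the method can be illustrated on a toy one-dimensional problem. The sketch below is our own drastic simplification (plain gradient steps taken at Poisson arrival times, with clipped step sizes), not the paper's exact continuized iteration, which couples two evolving variables.

```python
import random

def poisson_time_gd(x0=5.0, rate=10.0, horizon=20.0, seed=0):
    """Minimize f(x) = x^2 / 2 with gradient steps at Poisson arrival times.

    Inter-arrival gaps are Exp(rate), as in a Poisson point process; each
    gap (clipped for stability) is used as the step size. Toy sketch only."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    while True:
        gap = rng.expovariate(rate)  # inter-arrival time ~ Exp(rate)
        t += gap
        if t > horizon:
            return x
        x -= min(gap, 1.0) * x       # gradient of x^2/2 at x is x

print(abs(poisson_time_gd()))  # prints a value near zero
```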
- Moser Flow: Divergence-based Generative Modeling on Manifolds
By Noam Rozen, Aditya Grover, Maximilian Nickel, and Yaron Lipman.
This paper proposes a method for training continuous normalizing flow (CNF) generative models over Riemannian manifolds. The key idea is to leverage a result by Moser (1965) that characterizes the solution of a CNF (which Moser called an orientation preserving automorphism on manifolds) using a restricted class of ODEs that enjoys geometric regularity conditions, and is explicitly defined using the divergence of the target density function. The proposed Moser Flow method uses this solution concept to develop a CNF approach based on a parameterized target density estimator (which can be a neural network). Training amounts to simply optimizing the divergence of the density estimator, which side-steps running an ODE solver (required for standard backpropagation training). The experiments show faster training times and superior test performance compared to prior CNF work, as well as the ability to model densities on implicit surfaces with non-constant curvature such as the Stanford Bunny model. More generally, this concept of exploiting geometric regularity conditions to side-step expensive backpropagation training may be of broader interest.
This paper will be presented Saturday, December 11 at 00:00 GMT (Friday, December 10 at 4:00 pm PST) in the session on Generative Modeling.
The Outstanding Paper Committee determined a selection process with the goal of identifying an equivalence class of outstanding papers that represent some of the breadth of excellent research being conducted by the NeurIPS community.
The committee was given an initial batch of 62 papers including all papers that received an Oral slot and papers explicitly nominated by an Area Chair or Senior Area Chair. The committee used three phases of down-selection. In Phase 1, each paper in this initial batch was assigned one primary reader who determined if the paper should move on to Phase 2. In Phase 2, each paper was assigned an additional secondary reader. In Phase 3, all remaining papers were considered by the entire committee, and the primary and secondary readers were charged with articulating why a paper should be deserving of an award. In each subsequent phase, the committee made increasingly critical assessments and also made sharper comparisons across papers. In the later phases, the committee occasionally sought external input from subject matter experts. Thirty-two papers remained after Phase 1, thirteen after Phase 2, and the final six after Phase 3.
The committee identified two types of conflict of interest. Committee members with domain conflicts (e.g., papers whose authors are from the same institution as the committee member) were not assigned as the primary or secondary readers on a paper. Committee members with personal conflicts (e.g., advisor/advisee relationships, previous co-authorship) were not assigned as readers and were additionally not allowed to provide input on whether the paper belonged in the final equivalence class.
Test of Time Award
Last but certainly not least, we are thrilled to announce that the recipient of the NeurIPS 2021 Test of Time Award is Online Learning for Latent Dirichlet Allocation by Matthew Hoffman, David Blei, and Francis Bach.
This paper introduces a stochastic variational gradient-based inference procedure for training Latent Dirichlet Allocation (LDA) models on very large text corpora. On the theoretical side, the authors show that the training procedure converges to a local optimum and that, surprisingly, the simple stochastic gradient updates correspond to a stochastic natural gradient of the evidence lower bound (ELBO) objective. On the empirical side, the authors show that, for the first time, LDA can be comfortably trained on text corpora of several hundred thousand documents, making it a practical technique for “big data” problems. The idea has made a large impact in the ML community because it represented the first stepping stone for general stochastic gradient variational inference procedures for a much broader class of models. After this paper, there would be no good reason to ever use full-batch training procedures for variational inference anymore.
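The core online update can be sketched in generic form. The parameter names and the Robbins–Monro step-size schedule follow the paper's setup, but the code below is our own simplification that leaves out the LDA-specific local E-step used to compute the minibatch estimate.

```python
def svi_step(lam, lam_hat, t, tau=1.0, kappa=0.7):
    """One stochastic natural-gradient update of the global variational
    parameters.

    lam     : current global parameters (e.g., topic Dirichlet parameters)
    lam_hat : intermediate estimate computed from a single document or
              minibatch, as if the whole corpus looked like that minibatch
    t       : iteration counter

    The step size rho_t = (t + tau)^(-kappa) satisfies the Robbins-Monro
    conditions for kappa in (0.5, 1], which the analysis requires for
    convergence to a local optimum of the ELBO.
    """
    rho = (t + tau) ** (-kappa)
    return [(1 - rho) * l + rho * lh for l, lh in zip(lam, lam_hat)]
```

Early iterations move the global parameters aggressively toward each minibatch estimate; as t grows, the step size decays and the iterates settle down, which is what makes single-pass training on very large corpora feasible.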
Historically, the Test of Time Award has been awarded to a paper from the NeurIPS conference 10 years back. In 2020, the committee considered a broader range of papers and ended up selecting a recipient from 2011 instead of 2010. Because of this, this year, we gave the Test of Time Award Committee the option of choosing any paper from 2010 or 2011. After some discussion, the committee decided to focus specifically on 2010 since no paper published at that conference had previously been honored.
The committee first ranked all NeurIPS 2010 papers according to citation count. They defined a cutoff threshold at roughly 500 citations and considered all papers that achieved at least this citation count. This resulted in 16 papers. The committee took two weeks to read those papers (with each paper read by one or more committee members) and then met to discuss.
In this discussion, there was exactly one paper supported by all four members of the committee: Online Learning for Latent Dirichlet Allocation. Each of the committee members ranked this paper higher than all the other candidate papers and there was no strong runner-up, so the decision was easy and unanimous.
The Test of Time Award talk will take place in the final session of the conference, Saturday, December 11 at 01:00 GMT (Friday, December 10 at 5:00 pm PST).
Datasets & Benchmarks Best Paper Awards
This year NeurIPS launched the new Datasets & Benchmarks track, to serve as a venue for data-oriented work. We are pleased to announce two best paper awards from this track. A short list of papers was selected based on reviewer scores. The final selected papers were chosen from this list based on a vote from all members of the advisory board. Both papers will be presented in the Datasets and Benchmarks Track 2 Session on Wednesday, December 8 at 16:00 GMT (8:00 am PST).
The award recipients are:
- Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
By Bernard Koch, Emily Denton, Alex Hanna, and Jacob Gates Foster.
This paper analyzes thousands of papers and studies the evolution of dataset use within different machine learning subcommunities, as well as the interplay between dataset adoption and creation. It finds that in most communities, there is an evolution towards using fewer different datasets over time, and that these datasets come from a handful of elite institutions. This evolution is problematic, since benchmarks become less generalizable, biases that exist within the sources of these datasets may be amplified, and it becomes harder for new datasets to be accepted by the research community. This is an important ‘wake up call’ for the machine learning community as a whole, to think more critically about which datasets are used for benchmarking, and to put more emphasis on the creation of new and more varied datasets.
- ATOM3D: Tasks on Molecules in Three Dimensions
By Raphael John Lamarre Townshend, Martin Vögele, Patricia Adriana Suriana, Alexander Derry, Alexander Powers, Yianni Laloudakis, Sidhika Balachandar, Bowen Jing, Brandon M. Anderson, Stephan Eismann, Risi Kondor, Russ Altman, and Ron O. Dror.
This paper introduces a collection of benchmark datasets with 3D representations of small molecules and/or biopolymers for solving a wide range of problems, spanning single molecular structure prediction and interactions between biomolecules as well as molecular functional and design/engineering tasks. Simple yet robust implementations of 3D models are then benchmarked against state-of-the-art models with 1D or 2D representations, and show better performance than their lower-dimensional counterparts. This work provides important insight about how to choose and design models for a given task. Not only does this work provide benchmarking datasets, it also provides baseline models and open source tools to leverage these datasets and models, dramatically lowering the barrier to entry for machine learning researchers moving into computational biology and molecule design.
Congratulations to all the award recipients!