Here are the highlights for the first day of NeurIPS 2021, which is dedicated to Tutorials!
There will be 10 tutorials in total. Each tutorial is four hours long, and two tutorials run in parallel at any given time. Tutorials start on Monday, December 6 at 9:00 UTC and end on Tuesday, December 7 at 5:00 UTC. The full list of tutorials and further details are available on the schedule page. Note that registration is not required to attend tutorials (only login is required); however, registration is required to interact with tutorial presenters and to access most of the other content of this year’s virtual NeurIPS conference.
Socials. We also have our first social gathering today, ML in Korea.
Opportunities. Remember to take advantage of our career website and mentorship opportunities, hang out at the café, and send us your feedback via email or in the RocketChat channel #townhall.
Tomorrow, Day 2. We start the main conference program, along with more socials, demonstrations, competitions, and presentations from authors in the new Datasets and Benchmarks track!
Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jennifer Wortman Vaughan, NeurIPS 2021 Program Chairs
As the impact of machine learning research grows, so does the risk that this research will lead to harmful outcomes. Several machine learning conferences—including ACL, CVPR, ICLR, and EMNLP—have taken steps to establish ethical expectations for the research community, including introducing ethics review processes. Last year, NeurIPS piloted its own ethics review process, chaired by Iason Gabriel. This year, we aimed to expand the ethics review process and ensure that it is in line with ongoing efforts to establish NeurIPS Ethics Guidelines that have been spearheaded by Marc’Aurelio Ranzato as General Chair.
At this early stage of adoption, ethics reviews are meant to be educational, not prohibitive. Our goal is not to police submissions, but instead to prompt reflection. The process that we implemented was intended to support this goal.
In some ways, the process has been a success. We were able to recruit qualified Ethics Reviewers with diverse areas of expertise; many Reviewers, ACs, and authors engaged constructively with these Ethics Reviewers; and authors improved their papers based on the feedback they received. However, there are several ongoing challenges with this new process, including how to surface the right set of papers to undergo ethics review; how to fit the ethics review process into the overall paper review timeline without overburdening Ethics Reviewers; and how to set clear expectations in order to achieve more alignment among Ethics Reviewers on what constitutes an ethical issue and when an ethical issue has been properly addressed. In this blog post, we discuss the ethics review process and share some of the lessons we learned and our recommendations going forward.
Overview of the Ethics Review Process
We view ethics reviews as a way of obtaining an additional expert perspective to inform paper decisions and provide feedback to authors. To implement this, we allowed Reviewers and Area Chairs (ACs) to flag papers for ethics review, just as ACs sometimes solicit external expert perspectives on technical issues. When a paper was flagged, Ethics Reviewers were added to the committee for the paper, included in discussions about the paper, and given the opportunity to interact with the authors during the rolling discussion period, just like other members of the committee. Ethics Reviewers were encouraged to provide constructive criticism that could lead to project improvement and maturity, similar to standard paper reviews.
Because this process was for educational purposes first, Ethics Reviewers were assigned to each flagged paper, regardless of how likely it was that the paper would be accepted on technical grounds. This was a change from last year and required us to recruit a larger pool of Ethics Reviewers. However, we believe that all authors can benefit from feedback on the ethical aspects of their research and the opportunity to integrate this feedback into future iterations of the work. While it would reduce the burden on Ethics Reviewers, only soliciting ethics reviews for papers likely to be accepted would counteract our goal of making this a constructive process for everyone in the community, rather than just operating as a filter to catch unethical content prior to publication.
Before the process began, we recruited 105 Ethics Reviewers with a wide range of disciplinary backgrounds. Ethics considerations in machine learning research are quite diverse—ranging from standard research ethics issues, like obtaining appropriate consent from human subjects, all the way to substantially thornier issues concerning the downstream negative societal impact of the work—so we understood the importance of working with a range of experts and allowing them to weigh in on issues aligned with their expertise. The breakdown of expertise is in the table below; some Ethics Reviewers had expertise in more than one area.
| Area of expertise | Number of Ethics Reviewers with this expertise | Number of papers flagged with issues |
|---|---|---|
| Discrimination / Bias / Fairness Concerns | 92 | 34 |
| Inadequate Data and Algorithm Evaluation | 43 | 22 |
| Inappropriate Potential Applications & Impact (e.g., human rights concerns) | 47 | 52 |
| Legal Compliance (e.g., GDPR, copyright, terms of use) | 13 | 28 |
| Privacy and Security (e.g., consent) | 34 | 51 |
| Responsible Research Practice (e.g., IRB, documentation, research ethics) | 45 | 30 |
| Research Integrity Issues (e.g., plagiarism) | 24 | 47 |
During the review process, Reviewers had the chance to flag papers for ethics review by checking a box on the review form. For flagged papers, Reviewers could specify the areas of expertise required to properly assess the paper from the list in the table. (We note that the specific list of areas in the table was created based on common issues that arose last year and our own expectations about the types of ethics issues that might be flagged. However, it is not perfect, as we discuss more below.) In total, Reviewers flagged 265 papers out of 9122 submissions. The breakdown of issues flagged is in the last column of the table; note that some papers were flagged for more than one area of concern.
We designed an algorithm to assign papers to Ethics Reviewers with the right expertise while minimizing the maximum number of papers each Ethics Reviewer was assigned. Each flagged paper was reviewed by 2 Ethics Reviewers and each Ethics Reviewer had on average 4–5 papers to review.
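For readers curious what such a load-balanced, expertise-aware assignment step can look like, here is a minimal greedy sketch in Python. This is an illustration only, not the algorithm we actually used, and the papers, reviewers, and area names in the example are hypothetical.

```python
# Minimal greedy sketch of expertise-aware paper assignment that keeps the
# maximum reviewer load low. Illustration only: not the actual assignment
# algorithm used; the papers and reviewers below are hypothetical.
from collections import defaultdict

def assign_papers(papers, reviewer_expertise, reviewers_per_paper=2):
    """papers: paper_id -> set of flagged expertise areas.
    reviewer_expertise: reviewer_id -> set of expertise areas."""
    load = defaultdict(int)         # papers currently assigned to each reviewer
    assignment = defaultdict(list)  # paper_id -> assigned reviewer_ids

    for paper_id, needed in papers.items():
        # Reviewers qualified in at least one of the flagged areas.
        qualified = [r for r, areas in reviewer_expertise.items() if areas & needed]
        # Prefer the least-loaded qualified reviewers to balance workloads.
        for r in sorted(qualified, key=lambda r: load[r])[:reviewers_per_paper]:
            assignment[paper_id].append(r)
            load[r] += 1
    return dict(assignment)

# Hypothetical example.
papers = {
    "paper_17": {"Privacy and Security"},
    "paper_42": {"Legal Compliance", "Privacy and Security"},
}
reviewer_expertise = {
    "rev_a": {"Privacy and Security"},
    "rev_b": {"Legal Compliance"},
    "rev_c": {"Discrimination / Bias / Fairness Concerns"},
}
print(assign_papers(papers, reviewer_expertise))
```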
Since Ethics Reviewers could not be assigned until initial reviews were submitted (and therefore papers flagged), the ethics review period coincided with the initial author response period. During this period, authors could see if their paper had been flagged or not, but ethics reviews only became available later during the rolling discussion period. Once available, authors, other Reviewers, and ACs were able to respond to the ethics reviews and engage in discussion with the assigned Ethics Reviewers.
In most cases, the issues raised in ethics reviews were not significant enough to prevent publication. In a small number of cases in which more serious issues were raised, papers were escalated to the Ethics Review Chairs and Program Chairs to deliberate and make the final decision on the paper, taking into consideration feedback from all parties as well as any responses from the authors. This resulted in a small number of papers being conditionally accepted and one paper being rejected on ethical grounds, as discussed below.
Ethical Issues Identified
The ethics review process brought to the surface a variety of ethical issues that regularly appear in research papers submitted to NeurIPS. The most common types of issues encountered at NeurIPS involve:
A lack of sufficient reflection around topics that involve thorny ethical considerations. For example:
Generative models that could be used to generate realistic fake content for misinformation campaigns and that may exhibit bias in their generated content. These include text generation and image generation (notably including face generation) applications, as well as voice conversion.
Biometric surveillance projects that raise privacy concerns and could be used in sensitive contexts such as criminal justice. These include facial recognition and voice recognition projects.
The continued use of deprecated datasets that had already been explicitly removed from circulation by their authors. Such datasets include DukeMTMC, MS-Celeb-1M, and Tiny Images, whose use has been discouraged by their authors for ethical reasons.
Inappropriate communication and publication practices around identified security vulnerabilities. For example, adversarial attacks applied to publicly deployed systems that were not adequately disclosed to the involved agency before publication.
A lack of transparency on model or data details and decision-making, as it relates to ethical concerns. For example:
Failing to discuss potential biases arising from the use of a method or dataset, and a lack of documentation or acknowledgement of such issues, often manifesting as a lack of diverse examples featured in the paper or as evaluation on a homogeneous demographic.
Not providing adequate details on data provenance or distribution.
A lack of detail about annotator working conditions.
Issues with appropriately handling or sourcing data involving humans. For example:
Collecting information about individuals with no concern for privacy and consent.
Violating copyright restrictions.
Indications of mistreatment of MTurk workers, or annotation and data collection practices that seem exploitative.
Failing to send the project through an Institutional Review Board (IRB) in situations clearly involving human subjects.
Uncritically emphasising explicitly harmful applications, such as police profiling.
In many cases in which issues were identified, Ethics Reviewers simply recommended that authors reflect on the issues and include a discussion of them in the paper, either by expanding the discussion of potential negative societal impacts or by being more explicit about limitations of the work. In other cases, Ethics Reviewers recommended more substantial modifications, such as running additional experiments, using a different dataset, restricting data/code distribution, or adding transparency measures like model or dataset documentation.
In some cases, the concerns raised were so critical that the acceptance of the paper was made conditional on the authors implementing the suggested mitigations. All such cases were discussed by the Program Chairs and Ethics Review Chairs, and the Ethics Reviewers were consulted in determining conditions for acceptance. Of eight papers conditionally accepted for ethical reasons, all were eventually accepted.
In a single case, the Program Chairs and Ethics Review Chairs jointly determined that the required mitigations would be so challenging to execute that they were beyond the scope of what the authors could realistically accomplish within the time frame for the camera-ready. In this case, the Program Chairs made the call to reject the paper on ethical grounds.
It should be noted that Ethics Reviewers were not always in agreement with each other. For 61% of submissions reviewed by two Ethics Reviewers, at least one Ethics Reviewer checked the box in their review form to indicate the paper had no ethical issue; in 42% of these cases, the Ethics Reviewers were split, with one saying there was an issue and the other saying there was not. Additionally, for 82% of submissions reviewed by two Ethics Reviewers, at least one Ethics Reviewer checked the box to indicate that the authors had not acknowledged the issue; in 43% of these cases, the other Ethics Reviewer indicated that the issue had been adequately acknowledged by the authors. We should not expect perfect agreement among Ethics Reviewers, but it is worth considering whether better guidance on what constitutes an ethical issue and how to appropriately address one could be helpful.
Challenges Surfacing the Right Papers for Review
As implemented, the success of the ethics review process hinges on Reviewers and ACs appropriately flagging papers for ethics review. The biggest challenge that we faced—and one area that we as a community will need to work hard to improve if we want ethics review to be a success—is that there was a lot of uncertainty around which papers to flag, leading to inconsistency in which papers received ethics reviews.
97% of papers flagged for ethics review were flagged by only a single Reviewer; across 9122 submissions, only 8 papers were flagged by more than one Reviewer. Among the 882 papers that were part of the broader consistency experiment (and therefore assigned to two independent committees for review), there were 23 papers for which the original copy was flagged and 22 papers for which the duplicate was flagged, but the overlap between these two sets was only 3 papers. (Another blog post containing the full results of the consistency experiment is coming soon!)
Still, there were some notable differences between papers that were flagged and those that were not. 29% of all flagged submissions were withdrawn compared with 20% of submissions overall. 16% of flagged papers were ultimately accepted compared with 25.6% of papers overall. And these differences are more stark than they appear since the 25.6% acceptance rate includes papers that were desk rejected for formatting violations or withdrawn before they received reviews.
Some of this inconsistency was due to “false positives”—papers that did not actually have issues of concern to Ethics Reviewers, but that were erroneously flagged anyway. As mentioned above, for 61% of flagged submissions with two ethics reviews, at least one Ethics Reviewer checked the box in their review form to indicate there was no ethical issue, and both Ethics Reviewers checked the box for 58% of these. False positives often involved:
Papers that Reviewers didn’t like. For example, some Reviewers flagged papers because the results were poorly presented.
Plagiarism and other serious Code of Conduct violations that were out of scope for the ethics review process and should instead have been escalated to the Program Chairs. We note that plagiarism was erroneously included as an example in the list of ethical issues that could be checked, which was likely the cause of this problem and easy to fix in future years.
In addition to this, there were “false negatives”—papers that were not flagged, even though they should have been. These are difficult to quantify since we don’t know what we missed. However, some false negatives were later surfaced through other means. These included:
Papers that made use of deprecated datasets that had been retracted for ethical reasons. These cases were surfaced by running a search of submissions for mentions of common deprecated datasets late in the review process.
Papers on biometric data generation (e.g., generating face or voice) or surveillance. These cases were again surfaced by running a keyword search late in the review process.
We recommend that in future years, NeurIPS should provide more extensive guidance and training for Reviewers on how to identify which papers to flag. We also recommend that future Program Chairs implement a systematic search for papers with common ethics issues so that these papers can be automatically included for ethics review without the need to rely on Reviewers to flag them.
Highlights and Lessons Learned
Overall, we consider the following to be highlights of the process this year:
We were able to recruit over 100 qualified Ethics Reviewers with diverse areas of expertise, including many from disciplines outside of machine learning. Ethics Reviewers took their role seriously, and their engagement led to high-quality ethics reviews.
The scale of the operation this year gave us confidence that ethics review can be expanded to accommodate the growing number of flagged cases at a conference of this size. Of the 264 papers flagged, 250 received at least one ethics review: 202 received two ethics reviews and 48 received exactly one. This means that in total there were at least 452 submitted ethics reviews. In reality, even more reviews were submitted due to additional reviews completed for papers flagged during the discussion period and as part of the Datasets and Benchmarks track.
Many Reviewers, ACs, and authors engaged constructively with Ethics Reviewers. Some authors indicated that they benefited from the feedback in their ethics reviews and made significant improvements to their papers as a result. Of the 452 submitted ethics reviews, 140 (31%) had responses from the authors. All eight papers that were conditionally accepted due to ethical issues were ultimately accepted.
As this process is still relatively new, there were also lessons learned, which suggest improvements to the process for future years:
As described above, there were challenges in surfacing the right set of papers to undergo ethics review, with many false positives and false negatives. This culminated with the Program Chairs running keyword searches over submissions late in the review process to identify papers with ethical issues that had been overlooked. We expect that reconsidering the set of ethical areas listed in the review form and providing better guidance on which papers to flag would help. We would additionally encourage future organizers to plan for a more systematic automated flagging of papers early in the review process.
While Ethics Reviewers were not required to read the full details of the papers they were assigned, in practice it was difficult for them to assess whether or not the authors had appropriately reflected on ethical issues without reading the whole paper, which was burdensome given the short period of time allotted for ethics reviews. The pointers that authors included as part of the NeurIPS Paper Checklist were not enough. This is difficult to address; having designated sections on potential negative societal impact and limitations makes this easier, but to catch all potential issues, a full read-through may still be necessary.
There was a fair amount of disagreement among Ethics Reviewers about whether flagged papers had ethical issues and whether these issues were addressed, as discussed above. We should not expect perfect agreement among Ethics Reviewers, but better guidance on what constitutes an ethical issue may be valuable here too.
Since the review process for the new NeurIPS Datasets & Benchmarks track was entirely independent of the review process for the main track, this track was initially omitted from the ethics review process and only incorporated after the review process was underway; 10 papers from that track were then flagged for ethics review and one was rejected in part due to ethical concerns. Since many ethical issues are related to data, this track should be included in the ethics review process going forward.
The ethics review process is still quite new, and both community norms and official conference guidelines are still evolving. We are grateful for the opportunity to contribute to this evolution as we all work to ensure that this community operates in the best interests of those impacted by our research. To learn more about how to incorporate ethical practices in your own research, attend the plenary panel on this topic Friday, December 10 at 11pm UTC (3pm PST).
NeurIPS 2021 will begin next week! As we prepare for the conference, we are delighted to take a moment to announce the recipients of the 2021 Outstanding Paper Awards, the Test of Time Award, and the new Datasets and Benchmarks Track Best Paper Awards.
First, we would like to say a huge thank you to the members of the community who led the award selection process. The Outstanding Paper Award committee consisted of Alice Oh, Daniel Hsu, Emma Brunskill, Kilian Weinberger, and Yisong Yue. The Test of Time Award committee consisted of Joelle Pineau, Léon Bottou, Max Welling, and Ulrike von Luxburg. We would also like to thank Nati Srebro, who helped set up the process for the Outstanding Paper Awards, and the members of the community who provided subject-matter expertise on specific papers and topics.
Outstanding Paper Awards
This year six papers were chosen as recipients of the Outstanding Paper Award. The committee selected these papers due to their excellent clarity, insight, creativity, and potential for lasting impact. Additional details about the paper selection process are provided below. While there is of course no perfect process for choosing award papers, we believe the NeurIPS community will appreciate the extremely strong contributions of these papers.
The award recipients are (in order of paper ID):
A Universal Law of Robustness via Isoperimetry By Sébastien Bubeck and Mark Sellke. This paper proposes a theoretical model to explain why many state-of-the-art deep networks require many more parameters than are necessary to smoothly fit the training data. In particular, under certain regularity conditions about the training distribution, the number of parameters needed for an O(1)-Lipschitz function to interpolate training data below the label noise scales as nd, where n is the number of training examples, and d is the dimensionality of the data. This result stands in stark contrast to conventional results stating that one needs n parameters for a function to interpolate the training data, and this extra factor of d appears necessary in order to smoothly interpolate. The theory is simple and elegant, and consistent with some empirical observations about the size of models that have robust generalization on MNIST classification. This work also offers a testable prediction about the model sizes needed to develop robust models for ImageNet classification. This paper will be presented Tuesday, December 7 at 08:20 GMT (12:20 am PST) in the session on Deep Learning Theory and Causality.
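Stated very informally (notation ours; constants and the regularity conditions on the data distribution are omitted, so see the paper for the precise theorem), the contrast is:

```latex
% Informal contrast: p = number of parameters, n = number of training
% examples, d = data dimension. Constants and regularity conditions omitted.
\[
\underbrace{p \;\gtrsim\; n}_{\text{interpolation alone}}
\qquad \text{vs.} \qquad
\underbrace{p \;\gtrsim\; n\,d}_{\text{smooth ($O(1)$-Lipschitz) interpolation below the label noise}}
\]
```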
On the Expressivity of Markov Reward By David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael Littman, Doina Precup, and Satinder Singh. Markov reward functions are the dominant framework for sequential decision making under uncertainty and reinforcement learning. This paper provides a careful, clear exposition of when Markov rewards are, or are not, sufficient to enable a system designer to specify a task, in terms of their preference for a particular behavior, preferences over behaviors, or preferences over state and action sequences. The authors demonstrate with simple, illustrative examples that there exist some tasks for which no Markov reward function can be specified that induces the desired task and result. Fortunately, they also show that it is possible in polynomial time to decide if a compatible Markov reward exists for a desired setting, and if it does, there also exists a polynomial time algorithm to construct such a Markov reward in the finite decision process setting. This work sheds light on the challenge of reward design and may open up future avenues of research into when and how the Markov framework is sufficient to achieve performance desired by human stakeholders. This paper will be presented Tuesday, December 7 at 09:20 GMT (1:20 am PST) in the session on Reinforcement Learning.
Deep Reinforcement Learning at the Edge of the Statistical Precipice By Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, and Marc G. Bellemare. Rigorous comparison of methods can accelerate meaningful scientific advances. This paper suggests practical approaches to improve the rigor of deep reinforcement learning algorithm comparison: specifically, that the evaluation of new algorithms should provide stratified bootstrap confidence intervals, performance profiles across tasks and runs, and interquartile means. The paper highlights that standard approaches for reporting results in deep RL across many tasks and multiple runs can make it hard to assess if a new algorithm represents a consistent and sizable advance over past methods, and illustrates this with empirical examples. The proposed performance summaries are designed to be feasible to compute with a small number of runs per task, which may be necessary for many research labs with limited computational resources. This paper will be presented Wednesday, December 8 at 16:20 GMT (8:20 am PST) in the session on Deep Learning.
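As a rough illustration of two of these recommended summaries, here is a small Python sketch of the interquartile mean with a stratified bootstrap confidence interval. The score matrix is synthetic, and the authors’ rliable library is the reference implementation.

```python
# Sketch of the interquartile mean (IQM) with a stratified bootstrap confidence
# interval, in the spirit of the paper's recommendations. Synthetic scores;
# the authors' rliable library is the reference implementation.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((5, 20))  # 5 runs x 20 tasks of normalized scores

def iqm(x):
    """Mean of the middle 50% of all run-task scores."""
    x = np.sort(np.ravel(x))
    n = len(x)
    return x[n // 4 : n - n // 4].mean()

def stratified_bootstrap_ci(scores, stat=iqm, n_boot=2000, alpha=0.05):
    """Resample runs with replacement, independently for each task."""
    n_runs, n_tasks = scores.shape
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        stats.append(stat(scores[idx, np.arange(n_tasks)]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

print("IQM:", iqm(scores), "95% bootstrap CI:", stratified_bootstrap_ci(scores))
```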
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers By Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui. This paper presents MAUVE, a divergence measure to compare the distribution of model-generated text with the distribution of human-generated text. The idea is simple and elegant, and it basically uses a continuous family of (soft) KL divergence measures of quantized embeddings of the two texts being compared. The proposed MAUVE measure is essentially an integration over the continuous family of measures, and aims to capture both Type I error (generating unrealistic text) and Type II error (not capturing all possible human text). The empirical experiments demonstrate that MAUVE identifies the known patterns of model-generated text and correlates better with human judgements compared to previous divergence metrics. The paper is well-written, the research question is important in the context of rapid progress of open-ended text generation, and the results are clear. This paper will be presented Tuesday, December 7 at 8:00 GMT (midnight PST) in the session on Deep Learning.
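To give a flavor of the divergence-frontier idea, here is a hedged toy sketch that computes an area under a KL-based trade-off curve between two histograms. The histograms are hypothetical stand-ins for cluster assignments of quantized human-text and model-text embeddings, and the authors’ mauve-text package is the reference implementation.

```python
# Toy sketch of a divergence frontier between two histograms, loosely in the
# spirit of MAUVE (hedged illustration; the authors' mauve-text package is the
# reference implementation). The histograms stand in for cluster assignments
# of quantized human-text and model-text embeddings.
import numpy as np

def kl(a, b, eps=1e-12):
    return float(np.sum(a * np.log((a + eps) / (b + eps))))

def frontier_area(p, q, scale=5.0, num_lambdas=200):
    """Area under the curve of (exp(-scale*KL(q||r)), exp(-scale*KL(p||r)))
    traced over mixtures r = lam*p + (1-lam)*q."""
    xs, ys = [], []
    for lam in np.linspace(1e-3, 1 - 1e-3, num_lambdas):
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-scale * kl(q, r)))  # penalizes unrealistic model text
        ys.append(np.exp(-scale * kl(p, r)))  # penalizes missing human modes
    order = np.argsort(xs)
    return float(np.trapz(np.array(ys)[order], np.array(xs)[order]))

p = np.array([0.40, 0.30, 0.20, 0.10])     # hypothetical "human" histogram
q = np.array([0.25, 0.25, 0.25, 0.25])     # hypothetical "model" histogram
print(frontier_area(p, q))
```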
Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms By Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, and Adrien Taylor. This paper describes a “continuized” version of Nesterov’s accelerated gradient method in which the two separate vector variables evolve jointly in continuous-time—much like previous approaches that use differential equations to understand acceleration—but uses gradient updates that occur at random times determined by a Poisson point process. This new approach leads to a (randomized) discrete-time method that: (1) enjoys the same accelerated convergence as Nesterov’s method; (2) comes with a clean and transparent analysis that leverages continuous-time arguments, which is arguably easier to understand than prior analyses of accelerated gradient methods; and (3) avoids additional errors from discretizing a continuous-time process, which stands in stark contrast to several previous attempts to understand accelerated methods using continuous-time processes. This paper will be presented Wednesday, December 8 at 16:00 GMT (8:00 am PST) in the session on Optimization.
Moser Flow: Divergence-based Generative Modeling on Manifolds By Noam Rozen, Aditya Grover, Maximilian Nickel, and Yaron Lipman. This paper proposes a method for training continuous normalizing flow (CNF) generative models over Riemannian manifolds. The key idea is to leverage a result by Moser (1965) that characterizes the solution of a CNF (which Moser called an orientation preserving automorphism on manifolds) using a restricted class of ODEs that enjoys geometric regularity conditions, and is explicitly defined using the divergence of the target density function. The proposed Moser Flow method uses this solution concept to develop a CNF approach based on a parameterized target density estimator (which can be a neural network). Training amounts to simply optimizing the divergence of the density estimator, which side-steps running an ODE solver (required for standard backpropagation training). The experiments show faster training times and superior test performance compared to prior CNF work, as well as the ability to model densities on implicit surfaces with non-constant curvature such as the Stanford Bunny model. More generally, this concept of exploiting geometric regularity conditions to side-step expensive backpropagation training may be of broader interest. This paper will be presented Saturday, December 11 at 00:00 GMT (Friday, December 10 at 4:00 pm PST) in the session on Generative Modeling.
Selection Process:
The Outstanding Paper Committee determined a selection process with the goal of identifying an equivalence class of outstanding papers that represent some of the breadth of excellent research being conducted by the NeurIPS community.
The committee was given an initial batch of 62 papers including all papers that received an Oral slot and papers explicitly nominated by an Area Chair or Senior Area Chair. The committee used three phases of down-selection. In Phase 1, each paper in this initial batch was assigned one primary reader who determined if the paper should move on to Phase 2. In Phase 2, each paper was assigned an additional secondary reader. In Phase 3, all remaining papers were considered by the entire committee, and the primary and secondary readers were charged with articulating why a paper should be deserving of an award. In each subsequent phase, the committee made increasingly critical assessments and also made sharper comparisons across papers. In the later phases, the committee occasionally sought external input from subject matter experts. Thirty-two papers remained after Phase 1, thirteen after Phase 2, and the final six after Phase 3.
The committee identified two types of conflict of interest. Committee members with domain conflicts (e.g., authors from the same institution as the committee member) were not assigned as the primary or secondary readers on a paper. Committee members with personal conflicts (e.g., advisor/advisee relationships, previous co-authorship) were not assigned as readers and were additionally not allowed to provide input on whether the paper belonged in the final equivalence class.
Test of Time Award
Last but certainly not least, we are thrilled to announce that the recipient of the NeurIPS 2021 Test of Time Award is Online Learning for Latent Dirichlet Allocation by Matthew Hoffman, David Blei, and Francis Bach.
This paper introduces a stochastic variational gradient based inference procedure for training Latent Dirichlet Allocation (LDA) models on very large text corpora. On the theoretical side it is shown that the training procedure converges to a local optimum and that, surprisingly, the simple stochastic gradient updates correspond to a stochastic natural gradient of the evidence lower bound (ELBO) objective. On the empirical side the authors show that for the first time LDA can be comfortably trained on text corpora of several hundreds of thousands of documents, making it a practical technique for “big data” problems. The idea has made a large impact in the ML community because it represented the first stepping stone for general stochastic gradient variational inference procedures for a much broader class of models. After this paper, there would be no good reason to ever use full batch training procedures for variational inference anymore.
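For readers who want to experiment with the idea today, scikit-learn ships an online variational Bayes implementation of LDA in the spirit of this paper; a minimal usage sketch on a toy corpus (not the authors’ original code) follows.

```python
# Minimal usage sketch: online (mini-batch) variational inference for LDA with
# scikit-learn, which implements an online variational Bayes method in the
# spirit of Hoffman et al. Toy corpus; not the authors' original code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "stochastic variational inference for topic models",
    "online learning scales latent dirichlet allocation to huge corpora",
    "gradients estimated from mini batches of documents",
    "topic proportions inferred for each document",
]
X = CountVectorizer().fit_transform(corpus)

lda = LatentDirichletAllocation(
    n_components=2, learning_method="online", batch_size=2, random_state=0
)
# Stream the corpus in mini-batches, as one would for a very large collection.
for start in range(0, X.shape[0], 2):
    lda.partial_fit(X[start:start + 2])

print(lda.transform(X))  # per-document topic proportions
```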
Selection Process:
Historically, the Test of Time Award has been awarded to a paper from the NeurIPS conference 10 years back. In 2020, the committee considered a broader range of papers and ended up selecting a recipient from 2011 instead of 2010. Because of this, this year, we gave the Test of Time Award Committee the option of choosing any paper from 2010 or 2011. After some discussion, the committee decided to focus specifically on 2010 since no paper published at that conference had previously been honored.
The committee first ranked all NeurIPS 2010 papers according to citation count. They defined a cutoff threshold at roughly 500 citations and considered all papers that achieved at least this citation count. This resulted in 16 papers. The committee took two weeks to read those papers (with each paper read by one or more committee members) and then met to discuss.
In this discussion, there was exactly one paper supported by all four members of the committee: Online Learning for Latent Dirichlet Allocation. Each of the committee members ranked this paper higher than all the other candidate papers and there was no strong runner-up, so the decision was easy and unanimous.
The Test of Time Award talk will take place in the final session of the conference, Saturday, December 11 at 01:00 GMT (Friday, December 10 at 5:00 pm PST).
Datasets & Benchmarks Best Paper Awards
This year NeurIPS launched the new Datasets & Benchmarks track to serve as a venue for data-oriented work. We are pleased to announce two best paper awards from this track. A shortlist of papers was selected based on reviewer scores, and the final selected papers were chosen from this list based on a vote from all members of the advisory board. Both papers will be presented in the Datasets and Benchmarks Track 2 Session on Wednesday, December 8 at 16:00 GMT (8:00 am PST).
The award recipients are:
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research By Bernard Koch, Emily Denton, Alex Hanna, and Jacob Gates Foster. This paper analyzes thousands of papers and studies the evolution of dataset use within different machine learning subcommunities, as well as the interplay between dataset adoption and creation. It finds that in most communities, there is an evolution towards using fewer different datasets over time, and that these datasets come from a handful of elite institutions. This evolution is problematic, since benchmarks become less generalizable, biases that exist within the sources of these datasets may be amplified, and it becomes harder for new datasets to be accepted by the research community. This is an important ‘wake up call’ for the machine learning community as a whole, to think more critically about which datasets are used for benchmarking, and to put more emphasis on the creation of new and more varied datasets.
ATOM3D: Tasks on Molecules in Three Dimensions By Raphael John Lamarre Townshend, Martin Vögele, Patricia Adriana Suriana, Alexander Derry, Alexander Powers, Yianni Laloudakis, Sidhika Balachandar, Bowen Jing, Brandon M. Anderson, Stephan Eismann, Risi Kondor, Russ Altman, and Ron O. Dror. This paper introduces a collection of benchmark datasets with 3D representations of small molecules and/or biopolymers for solving a wide range of problems, spanning single molecular structure prediction and interactions between biomolecules, as well as molecular functional and design/engineering tasks. Simple yet robust implementations of 3D models are then benchmarked against state-of-the-art models with 1D or 2D representations, and show better performance than their lower-dimensional counterparts. This work provides important insight about how to choose and design models for a given task. Not only does this work provide benchmarking datasets, it also provides baseline models and open source tools to leverage these datasets and models, dramatically lowering the entry barrier for machine learning researchers to get into computational biology and molecule design.
I am very pleased to announce that the schedule of NeurIPS 2021 is now available here. As mentioned in my earlier preview, the conference starts on Monday, December 6, a day entirely dedicated to Tutorials on a broad range of topics. The subsequent four days will showcase the main program, which is structured in three sessions a day, each about three hours long. A major highlight of the main program is the fantastic set of invited speakers and discussion panels, and of course, the 2000 or so contributed papers whose selection took more than four months of hard work by our pool of expert reviewers, area chairs, and senior area chairs.
Concurrent with the main program, there will also be demos and presentations by authors of our new track on Datasets & Benchmarks. During the intermissions, there will be lots of other events to choose from. There will be several Socials, Affinity Workshops, and presentations by participants of Competitions.
The following week, on Monday December 13 and Tuesday December 14 the conference will close with Workshops, typically focussed on highly technical topics.
Like in previous years, there will be about 40 Meetups around the world, to enable people from local communities to attend, either virtually or physically, and enjoy the conference together.
Please have fun browsing the schedule and planning your attendance. Remember that you can bookmark events and sync them to your calendar. Note that registration is required to access interactive content like poster sessions and to watch video broadcasts; only tutorials and invited talks do not require registration.
Because there are no good models without good data, and only robust benchmarks measure true progress, NeurIPS launched the new Datasets and Benchmarks track to serve as a venue for exceptional work focused on creating high-quality datasets, insightful benchmarks, and discussions on how to improve dataset development and data-oriented work more broadly. Further details about the motivation and setup are discussed in our earlier blog post here.
In this inaugural year, we organized two rounds of submissions to get timely feedback from the community. Over the two rounds, we received 484 papers. We were pleasantly surprised by the quality and breadth of these submissions, out of which 174 have been accepted for publication. Please explore the final list of accepted papers.
The reviewing process involved a set of specific attention points, such as the long-term accessibility, ethics, and documentation quality of datasets, and the reproducibility of benchmarks. We are immensely grateful for the tremendous contributions of the 33 area chairs and 548 reviewers to make this new endeavor a success.
Of the 174 accepted papers, approximately 20% are related to computer vision; 15% about natural language processing; 15% about reinforcement learning and simulation environments; 7% about speech recognition; and 6% about multimodal data. In addition, 15% of papers covered meta-analyses, ethics, and explainability, and 22% covered various other topics. Overall, 55% of papers were identified as introducing new datasets, 20% benchmarks, and 25% a combination of both. While these are rough estimates, we hope they provide a sense of the distribution of topics in this year’s track.
The accepted papers will be presented in four oral and poster sessions alongside the main NeurIPS conference orals and posters. Since this is the first year that this track is organized, we will also hold a special symposium event on Thursday of the main conference where the impact and open challenges in creating datasets and running benchmarks will be openly discussed. For the symposium, we are excited to welcome as keynote speakers Olga Russakovsky (Princeton University), Raquel Urtasun (University of Toronto and Waabi), Erin LeDell (H2O.ai), and Douwe Kiela (FAIR).
The full schedule of NeurIPS Datasets and Benchmarks Track events can be found below. Please register for NeurIPS (if you haven’t already) and join us in this exciting new track!
Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jennifer Wortman Vaughan, NeurIPS 2021 Program Chairs
Machine learning is developing at a rapid pace which can leave little time for the community to reflect on where we are headed. Much of this development is fueled by empirical progress on benchmarks, but are these benchmarks measuring the right thing, and what are the scientific merits and limitations of leaderboard-climbing? We have seen the rise of massive models like GPT-3 that reshape our notion of what a machine learning model can do. How should the community respond to this costly trend? Finally, the ethical consequences of machine learning are ever more apparent. The societal impact of machine learning stems from the accumulation of contributions across the entire community and can feel distant from the day-to-day work of an individual researcher. How should we as researchers incorporate ethics into our own work?
These are big, open questions that have no right answers and there are many legitimate viewpoints within the NeurIPS community. To encourage dialogue, in addition to our traditional keynote talks and oral presentations of individual papers, we are excited to introduce a set of three plenary panel discussions at NeurIPS this year, in which experts can engage on the respective topics interactively.
We also encourage everyone to submit questions in advance on Rocket.Chat, so that we can better tailor the panels to the interests of the community.
The Consequences of Massive Scaling in Machine Learning
When: December 7 @ 7am UTC (pre-recorded)
Machine learning research has always prized algorithmic contributions. However, many recent big breakthroughs have been driven by scaling up the same basic algorithms and architectures. The most recent example is OpenAI’s massive language model GPT-3, which won a best paper award at NeurIPS in 2020. GPT-3 was based on the same Transformer architecture as its predecessors, but when scaled up, it resulted in remarkable unexpected behaviors, which had a massive impact on the way we think about language models. As more progress becomes driven by scaling, how should we adapt as a community? Should it affect what problems are considered interesting? Should publication norms take scale into account, or de-emphasize algorithmic contributions? How do we ensure that smaller institutions or academic labs can meaningfully research and audit large-scale systems? From a safety perspective, if behaviors appear emergently at scale, how can we ensure that systems behave as intended? In this panel, we will explore these critical questions so that the NeurIPS community at large can continue to make fundamental advances in the era of massive scaling.
Moderator: Jacob Steinhardt (University of California, Berkeley)
Panelists: Noah Goodman (Stanford University), Jared Kaplan (Anthropic), Melanie Mitchell (Santa Fe Institute), Joelle Pineau (Facebook), Oriol Vinyals (DeepMind)
The Role of Benchmarks in the Scientific Progress of Machine Learning
When: December 8 @ 3pm UTC (live)
Benchmark datasets have played a crucial role in driving empirical progress in machine learning, leading to an interesting dynamic between those on a quest for state-of-the-art performance and those creating new challenging benchmarks. In this panel, we reflect on how benchmarks can lead to scientific progress, both in terms of new algorithmic innovations and improved scientific understanding. First, what qualities of a machine learning system should a good benchmark dataset seek to measure? How well can benchmarks assess performance in dynamic and novel environments, or in tasks with an open-ended set of acceptable answers? Benchmarks can also raise significant ethical concerns including poor data collection practices, under- and misrepresentation of subjects, as well as misspecification of objectives. Second, even given high-quality, carefully constructed benchmarks, which research questions can we hope to answer from leaderboard-climbing, and which ones are deprioritized or impossible to answer due to the limitations of the benchmark paradigm? In general, we hope to deepen the community’s awareness of the important role of benchmarks for advancing the science of machine learning.
Moderator: Moritz Hardt (University of California, Berkeley)
Panelists: Lora Aroyo (Google), Sam Bowman (New York University), Isabelle Guyon (University of Paris-Saclay), Joaquin Vanschoren (Eindhoven University of Technology)
How should a machine learning researcher think about AI ethics?
When: December 10 @ 11pm UTC (live)
As machine learning becomes increasingly widespread in the real world, there has been a growing set of well-documented potential harms that need to be acknowledged and addressed. In particular, valid concerns about data privacy, algorithmic bias, automation risk, potential malicious uses, and more have highlighted the need for the active consideration of critical ethical issues in the field. In light of this, there have been calls for machine learning researchers to actively consider not only the potential benefits of their research but also its potential negative societal impact, and to adopt measures that enable positive trajectories to unfold while mitigating the risk of harm. However, grappling with ethics is still a difficult and unfamiliar problem for many in the field. A common difficulty with assessing ethical impact is its indirectness: most papers focus on general-purpose methodologies (e.g., optimization algorithms), whereas ethical concerns are more apparent when considering downstream applications (e.g., surveillance systems). Also, real-world impact (both positive and negative) often emerges from the cumulative progress of many papers, so it is difficult to attribute the impact to an individual paper. Furthermore, regular research ethics mechanisms such as an Institutional Review Board (IRB) are not always a good fit for machine learning, and problematic research practices (such as those involving extensive environmental and labor costs or inappropriate data use) are so ingrained in community norms that it can be difficult to articulate where to draw the line as expectations evolve. How should machine learning researchers wrestle with these topics in their own research? In this panel, we invite the NeurIPS community to contribute questions stemming from their own research and other experiences, so that we can develop community norms around AI ethics and provide concrete guidance to individual researchers.
Moderator: Deborah Raji (University of California, Berkeley)
Panelists: Amanda Askell (Anthropic), Abeba Birhane (University College Dublin), Jesse Dodge (Allen Institute for AI), Casey Fiesler (University of Colorado, Boulder), Pascale Fung (Hong Kong University of Science and Technology), Hanna Wallach (Microsoft)
Freddie Kalaitzis and Gautam Kamath, NeurIPS 2021 Social Chairs
NeurIPS Socials are back for their third year! The social events can provide an excellent opportunity for those with a common interest to meet up, discuss, collaborate, debate, or celebrate. The community heard our call, and we’ve got an exciting lineup for NeurIPS 2021 — check them out below, or see them on the NeurIPS website here.
ML in Korea – A Social focused on individuals part of, or interested in, the ML research scene in Korea.
Latinx in AI Social – A social event with the goal of creating new connections between the NeurIPS community and Latinx researchers across the world.
Women in Research Roundtable – This roundtable aims to highlight the experiences of women researchers in tech and promote an engaging discussion around how women are excelling in the research science field.
Can Technology Be Used to Help Combat Maternal Mortality? – The United States has one of the highest maternal mortality rates among developed nations. This Social is meant to brainstorm and discuss strategies for combating maternal mortality in the United States and beyond.
Shine in Your Technical Presentation – This Social will cover how to design and deliver an effective presentation, featuring an opportunity to present and get feedback in a small group discussion.
Lapsed Physicists Wine-and-Cheese – This BYOWC (Bring Your Own Wine and Cheese) event is an informal opportunity to connect with members of the former Physicists community.
Space & ML – An opportunity to convene a community around ML applications for space exploration, planetary stewardship of Earth, and opportunities for ML in science.
BigScience – A Social related to the BigScience community, a year-long research workshop on large multilingual datasets and large language models.
ML in India: A Billion Opportunities – This Social aims to facilitate interaction between those interested in exploring career opportunities in India and researchers who are currently based in India, and also create more awareness around ML problems that can have significant impact in the Indian context.
Queer in AI – A safe and inclusive casual networking and socializing space for LGBTQIA+ individuals involved with AI that celebrates all queer people around the world.
Women in AI Ignite – Ignite is a fast-paced presentation format, with five-minute talks consisting of 20 slides displayed for 15 seconds each. This is the third annual Women in AI Ignite event at NeurIPS.
Roundtable Chatroom – A community that fosters communication and sharing of great ideas dedicated to AI and ML practitioners. Will consist of Zoom roundtable chats, where participants discuss a topic for some time before moving on to the next one. Topics will include “The next big thing in AI,” “How to develop a research idea?,” and “Do we publish too much?”
We look forward to seeing you at NeurIPS 2021. Join us and register today!
By Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jennifer Wortman Vaughan, NeurIPS 2021 Program Chairs
We are thrilled to announce an outstanding lineup of keynote speakers for NeurIPS 2021! We have chosen speakers with a diverse set of research interests and backgrounds, from both within and outside the NeurIPS community, who will push us to think deeply about both the technical foundations of machine learning and the increasing impact of machine learning on society.
NeurIPS 2021 Speakers
Luis von Ahn is currently the co-founder and CEO of Duolingo, a language-learning platform created to bring free language education to the world. With over 500 million users, it is now the most popular way to learn languages and the most downloaded education app in the world. Previously, von Ahn co-invented CAPTCHAs and founded the company reCAPTCHA, which was sold to Google in 2009. Von Ahn is considered one of the pioneers of human computation and crowdsourcing. He has been named one of the 10 Most Brilliant Scientists by Popular Science Magazine, one of the 50 Best Brains in Science by Discover, one of the Top Young Innovators Under 35 by MIT Technology Review, and one of the 100 Most Innovative People in Business by Fast Company Magazine. He is the recipient of the Lemelson-MIT Prize and has been named a MacArthur Fellow. In 2021, von Ahn joined the Executive Committee of the Partnership for Central America.
Peter Bartlett is a professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California at Berkeley, Associate Director of the Simons Institute for the Theory of Computing, Director of the Foundations of Data Science Institute, and Director of the Collaboration on the Theoretical Foundations of Deep Learning. Bartlett is a leading researcher in machine learning and statistical learning theory. He is the co-author, with Martin Anthony, of the book “Neural Network Learning: Theoretical Foundations.” He was awarded the Malcolm McIntosh Prize for Physical Scientist of the Year in Australia, and has been chosen as an Institute of Mathematical Statistics Medallion Lecturer, an IMS Fellow and Australian Laureate Fellow, and a Fellow of the ACM. He was elected to the Australian Academy of Science in 2015. He will give the annual Posner Lecture, named in honor of Ed Posner, the first president of the NeurIPS Foundation, and delivered by a long-time contributor to the NeurIPS conference.
Meredith Broussard is an associate professor at the Arthur L. Carter Journalism Institute of New York University, research director at the NYU Alliance for Public Interest Technology, and the author of “Artificial Unintelligence: How Computers Misunderstand the World.” She is a data journalist and her academic research focuses on AI in investigative reporting and ethical AI, with a particular interest in using data analysis for social good. She appeared in the 2020 documentary Coded Bias, an official selection of the Sundance Film Festival that was nominated for an NAACP Image Award. She is an affiliate faculty member at the Moore Sloan Data Science Environment at the NYU Center for Data Science, a 2019 Reynolds Journalism Institute Fellow, and her work has been supported by New America, the Institute of Museum & Library Services, and the Tow Center at Columbia Journalism School. Her features and essays have appeared in The Atlantic, The New York Times, Slate, and other outlets.
Alessio Figalli is a chaired professor and director of the FIM-Institute for Mathematical Research at ETH Zürich. Figalli’s research is in the broad areas of Calculus of Variations and Partial Differential Equations, with a particular emphasis on optimal transport, Monge-Ampère equations, functional and geometric inequalities, elliptic PDEs of local and non-local type, free boundary problems, Hamilton-Jacobi equations, transport equations with rough vector-fields, and random matrix theory. Before joining ETH Zürich, he was faculty at the University of Texas-Austin. Among his many honors and awards, he received the Fields Medal in 2018 for “his contributions to the theory of optimal transport, and its application to partial differential equations, metric geometry, and probability.”
Mary L. Gray is a Senior Principal Researcher at Microsoft Research and Faculty Associate at Harvard University’s Berkman Klein Center for Internet and Society. She maintains a faculty position in the Luddy School of Informatics, Computing, and Engineering with affiliations in Anthropology and Gender Studies at Indiana University. An anthropologist and media scholar by training, Gray’s work focuses on how people’s everyday uses of technologies transform labor, identity, and human rights. Gray is the co-author (with computer scientist Siddharth Suri) of the award-winning book “Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass.” She chairs the Microsoft Research Ethics Review Program—the only federally-registered institutional review board of its kind in Tech—and is a member of Stanford University’s One-Hundred-Year Study on Artificial Intelligence (AI100) Standing Committee. Gray was named a MacArthur Fellow for her contributions to anthropology and the study of technology, digital economies, and society.
Gábor Lugosi is an ICREA research professor at the Department of Economics and Business, Pompeu Fabra University, Barcelona. His research focuses on the mathematical aspects of machine learning and related topics in probability and mathematical statistics, including combinatorial statistics, the analysis of random structures, and information theory. He is a co-author of several monographs on pattern recognition, density estimation, online learning, and concentration inequalities, including “Prediction, Learning, and Games” and “Concentration Inequalities: A Nonasymptotic Theory of Independence.” Lugosi will give the annual Breiman Lecture, named in honor of statistician Leo Breiman, who served on the NeurIPS Board for more than 10 years, and dedicated to work in statistics relevant to the NeurIPS community.
Radhika Nagpal is currently the Kavli Professor of Computer Science at Harvard University and a founding faculty member of the Wyss Institute for Biologically Inspired Engineering. Starting January 2022, she will be moving to Princeton University to lead new robotics initiatives. Nagpal leads the Self-organizing Systems Research Group (SSR) and her research interests span computer science, robotics, and biology. Nagpal has been the recipient of a Microsoft New Faculty Fellowship, NSF Career Award, Borg Early Career Award, Radcliffe Fellowship, and the McDonald Mentoring Award. She was named an AAAI and ACM Fellow, has been an invited TED speaker, and was chosen by the journal Nature as one of the top ten influential scientists and engineers of the year. Nagpal is the co-founder of ROOT Robotics, an educational robotics company aimed at democratizing AI and robotics through early education; her lab’s Kilobots have been commercialized with over 8000 robots sold worldwide. She is also the author of an influential Scientific American blog article on tenure-track life titled “The Awesomest 7-year Postdoc,” and is dedicated to creating a diverse and inclusive culture in STEM and academia.
Although keynotes will be pre-recorded, each will be streamed at a specified time as part of the conference program and followed by a live, moderated Q&A session with the audience. For those who prefer to watch the talks on-demand, they will be available to registered conference attendees at the start of the conference.
We are extremely excited to announce that the program will also feature a plenary interview with Daniel Kahneman. Daniel Kahneman is a Professor of Psychology and Public Affairs Emeritus at the Princeton School of Public and International Affairs, the Eugene Higgins Professor of Psychology Emeritus at Princeton University, and a fellow of the Center for Rationality at the Hebrew University in Jerusalem. He is a member of the National Academy of Sciences, the Philosophical Society, the American Academy of Arts and Sciences, and a fellow of the American Psychological Association, the American Psychological Society, the Society of Experimental Psychologists, and the Econometric Society. He has been the recipient of too many awards to name, among them the Nobel Prize in Economic Sciences, the Lifetime Contribution Award of the American Psychological Association, and the Presidential Medal of Freedom. He is the author of “Thinking, Fast and Slow” and co-author of “Noise: A Flaw in Human Judgment.”
In lieu of a traditional keynote talk, Kahneman will be interviewed by Josh Tenenbaum, Professor of Computational Cognitive Science in MIT’s Department of Brain and Cognitive Sciences and a scientific director with the MIT Quest for Intelligence.
More details on the program will be announced soon. We look forward to seeing you virtually in December!
By Marc’Aurelio Ranzato, General Chair NeurIPS 2021
As of September 22, registration for NeurIPS 2021 is open! Please visit our registration page. Given the virtual nature of the conference, registration prices are greatly reduced: $25 for students, $100 for academics, and $175 for all others. Registration aid is available to support greater participation in the conference, and details about how to apply for registration support can be found here. Volunteer applications are now also being considered.
Registration will provide access to all interactive elements of this year’s program, including the ability to ask speakers questions, to participate in live poster sessions, to participate in the mentorship program, to access the Careers Website, and to chat and network with other attendees. Similar to last year, we will stream tutorials and keynote talks for free, and make all recordings available some time after the conference is over.
This year’s virtual event includes invited and contributed talks from the main program, tutorials, workshops, demos, competitions, a whole new track on datasets and benchmarks, and more. A bird’s-eye view of this year’s schedule is shown below.
Day 1: Tutorials. The conference will start on Monday, December 6 with Tutorial presentations on a wide array of topics; see the Tutorial Chairs’ post for more details. Tutorials will run back to back for the whole day to serve the various time zones of the world.
Days 2-5: Main conference. The main program will start on Tuesday, December 7 and run through Friday, December 10. There will be three sessions a day, spaced eight hours apart and each lasting about three hours. The main program will consist of an exciting mix of invited talks, live panel discussions, and oral and poster presentations. Compared to last year, there will be more live events, which will offer a more engaging experience! The Program Chairs are now finalizing the program (author notifications are coming out next week). Stay tuned for a future blog post with more details, coming soon.
Concurrent with the main program there will also be presentations from the new track on Datasets and Benchmarks, as well as various demos. During the intermissions, there will be presentations from the organizers and participants about Competitions as well as numerous social gatherings.
Days 6-7: Workshops. New this year, we have scheduled a weekend break and we will resume activities Monday December 13 and Tuesday December 14, with two days dedicated to Workshops focusing on technical topics and emerging themes in the field.
The conference program is still being finalized, so please stay tuned for more updates from our amazing team of organizing chairs! Check back here for regular updates, and follow us on Twitter @neurips_conf.
By Freddie Kalaitzis, Gautam Kamath, NeurIPS 2021 Social Chairs
NeurIPS Socials are an excellent opportunity for those with a common interest to meet up, discuss, collaborate, debate, or celebrate. We recently released the Call for Socials, and would like to remind you that the deadline of September 8 is fast approaching! We solicit proposals from all registered NeurIPS attendees on any scientific or non-scientific topic aimed at enriching the community.
We want your imagination to run wild — Socials can be on almost any topic! Here are some example Social topics from recent past conferences: