We are thrilled to announce that the NeurIPS Competition Track is now accepting proposals for the upcoming conference. NeurIPS hosts a competition track to promote innovative research and foster collaboration across different scientific disciplines.
We are looking for proposals for competitions that tackle important and challenging open problems in machine learning and AI. We encourage proposals that involve real-world datasets and have clear scientific questions in areas such as healthcare, biology, education, RL and robotics, NLP, and more.
The NeurIPS Competition Track offers a unique opportunity to showcase your research and compete with other leading researchers in the field. Winning teams will have the chance to present their work at the conference and receive recognition for their achievements.
We encourage all researchers, academics, and industry professionals working in machine learning and AI to submit proposals and take part in this exciting opportunity. We look forward to seeing the innovative ideas and solutions that will emerge from this year’s competition track.
To improve scientific rigour, transparency, and reproducibility, we introduce new publication requirements for competition reports co-authored by both organizers and participants.
Accepted competitions in 2023 will be required to submit their post-competition analyses as papers to the 2024 NeurIPS Datasets and Benchmarks track.
In order to ensure that the competition proposal and results are consistent, reviewers in 2024 will have access to the 2023 proposals. To minimize experimental bias, any deviations from the proposal will need to be justified.
This move is expected to encourage a more systematic and rigorous approach to scientific competitions, leading to more meaningful contributions to the field of AI and machine learning.
To submit a proposal, check the call on the NeurIPS website and follow the instructions. The submission deadline is April 27th, 2023, 23:59 AOE, so be sure to get your proposals in before then!
If you have any questions or concerns, please do not hesitate to contact us at email@example.com
With NeurIPS still fresh in our memory, planning is underway for NeurIPS 2023, which will be held once again in New Orleans, Dec 10-16, 2023. We are excited to join this upcoming year as co-General Chairs, and we look forward to contributing to a meeting that holds such significance for us and our field.
NeurIPS is a large conference, and its organization continues to be driven largely by volunteers from our community committed to its success. As we start this process, we hope to draw from as wide a pool of candidates as possible when selecting chairs for the conference. Please consider nominating yourself, or someone you know, for one of the roles in the conference, or for a generalist role that you think might serve the conference better.
Please send nominations by Jan 13, 2023, using this form.
Serving as an organizer is a great way to build experience in crafting large scientific meetings and balancing the many tradeoffs involved, to build new networks, and to give back to the community in a way that differs from reviewing and workshops.
We look forward to your nominations, and we hope to share updates as planning progresses in the new year.
The Expo featured contributions from a wide variety of industry participants and 77 exhibitors. Please see the Expo schedule for more details.
The halls were full of enthusiastic students from 11 local New Orleans high schools as NeurIPS hosted 240 students at its first Education Outreach Day, organized by Matt Wang and Jessica Forde. A special thanks to Mary Ellen Perry for spearheading the idea!
The conference officially kicked off in Hall H at 5:00 pm CST with the Opening Remarks from our General Chairs, Shakir Mohamed and Sanmi Koyejo, and Program Chairs, Alice Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, followed at 5:15 pm by an opening invited talk by David Chalmers on Are Large Language Models Sentient?, which was streamed for our virtual attendees.
Following the talk, everyone socialized at the tasty NeurIPS Reception from 6:00 pm to 8:00 pm.
If you have feedback and questions for the organizers, please send them via email to firstname.lastname@example.org ahead of Wednesday's Town Hall (Theatre B, 30 November, 6:00-7:00 pm).
by Charvi Rastogi, Ivan Stelmakh, Hal Daumé III, Emma Pierson, and Nihar B. Shah, and the NeurIPS 2021 Program Chairs Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, and Zhenyu Xue (NeurIPS 2021 Workflow Manager)
There is a considerable body of research on peer review. Within the machine learning community, there have been experiments establishing significant disagreement across reviewers and across reviewer panels —including at NeurIPS 2021— and active discussions about the state of peer review. But how do author perceptions about their submitted papers match up to the outcomes of the peer-review process and perceptions of other authors? We investigate this question by asking authors who submitted papers to NeurIPS 2021 three questions:
(Q1) [At the time of paper submission] What is your best estimate of the probability (as a percentage) that this submission will be accepted?
(Q2) [At the time of paper submission; to authors submitting two or more papers] Rank your submissions in terms of your own perception of their scientific contributions to the NeurIPS community, if published in their current form.
(Q3) [After preliminary reviews were available to authors] After you read the reviews of this paper, how did your perception of the value of its scientific contribution to the NeurIPS community change (assuming it was published in its initially submitted form)?
Here are five key findings.
How well do authors estimate the probability of acceptance of their papers? Authors significantly overestimate their papers’ chances of acceptance. When answering Q1, authors were informed that the acceptance rate at NeurIPS over the last 4 years had been about 21%. The acceptance rate at NeurIPS 2021 turned out to be 25.8%. The authors’ responses had a nearly three-fold overestimate, with a median prediction of 70%.
Are some sub-groups better calibrated than others? We examined calibration error across sub-groups, measuring this error in terms of the Brier score (squared loss) and controlling for other confounders. We find that the calibration error of female authors is slightly (but statistically significantly) higher than that of male authors. We also see a trend of miscalibration decreasing with seniority, with authors who were invited to serve as (meta-)reviewers better calibrated than the rest. All sub-groups we examined over-predicted their papers’ chances of acceptance.
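The Brier score used here is simply the mean squared difference between each predicted acceptance probability and the 0/1 acceptance outcome. A minimal sketch of the computation, with made-up numbers rather than the survey data (the 0.26 base rate roughly matches the NeurIPS 2021 acceptance rate):

```python
# Illustrative sketch (hypothetical numbers, not the study's data or code):
# measuring calibration of acceptance-probability predictions with the
# Brier score, i.e. mean squared error against binary outcomes.

def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    assert len(predictions) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Hypothetical example: authors predict high acceptance probabilities,
# but only about 1 in 4 papers is accepted.
preds    = [0.70, 0.80, 0.60, 0.90, 0.50]
accepted = [1,    0,    0,    0,    0]

overconfident = brier_score(preds, accepted)        # overconfident predictions
baseline      = brier_score([0.26] * 5, accepted)   # always predict base rate

print(f"overconfident authors: {overconfident:.3f}")
print(f"base-rate predictor:   {baseline:.3f}")
```

Lower is better; overconfident predictions score markedly worse than simply predicting the historical base rate for every paper.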
Among authors with multiple papers, how much do their predictions of acceptance probabilities agree with their own perceived scientific contribution? These two sets of responses are largely in agreement: The strict ranking provided by authors about their perceived scientific contribution (Q2) and the strict ranking induced by their predicted acceptance probabilities (Q1) agree for 93% of responses. However, there is a noticeable 7% of responses where the authors think that the peer review is more likely to reject the better of their two papers.
How much do co-authors agree on the relative quality of their joint papers? Strikingly, the amount of disagreement between co-authors in terms of the perceived relative scientific contribution of their papers (Q2) is similar to the amount of disagreement between authors and reviewers! In cases where one paper from an author was ultimately accepted and another rejected, authors rated the rejected paper higher about a third of the time. But looking at pairs of papers with overlapping authors in which both authors provided rankings, the co-authors also disagreed with each other about a third of the time. While there are discussions in the literature about inter-reviewer disagreements, this result suggests that there is a similar disagreement in co-authors’ views of their papers as well.
Does peer review change authors’ perception of their own papers? Question Q3 was a multiple-choice question with five choices: much more positive (“++”), slightly more positive (“+”), did not change (“0”), slightly more negative (“-”), and much more negative (“–”).
We find that among both accepted and rejected papers, about 50% of authors report that their perception of their own paper changed after seeing the initial reviews (Q3). Moreover, among both accepted and rejected papers, over 30% of authors report that their perception became more positive.
The fact that authors vastly overestimated the probability that their papers will be accepted suggests it would be useful for conference organizers and research mentors to attempt to recalibrate expectations prior to each conference. The disagreements we document around paper quality — between co-authors as well as between authors and reviewers — taken together with the disagreement among committees of reviewers observed in the complementary NeurIPS 2021 consistency experiment, suggest that assessing paper quality is not only an extremely noisy process but may be a fundamentally challenging task with no objectively right answer. The outcomes of paper submissions should thus be taken with a grain of salt. More broadly, as a community, we may take these findings into account when deciding on our policies and perceptions pertaining to the peer-review process and its outcomes. We hope the results of our experiment encourage discussion and introspection in the community.
We would like to thank all the participants for the time they took to provide survey responses. We are grateful to the OpenReview team, especially Melisa Bok, for their support in running the survey on the OpenReview.net platform.
Last year, NeurIPS launched the new Datasets and Benchmarks track to serve as a venue for exceptional work focused on creating high-quality datasets, insightful benchmarks, and discussions on improving dataset development and data-oriented work more broadly. Further details about the motivation and setup are discussed in our earlier blog post here.
This year, we received 447 submissions on a breadth of topics, out of which 163 have been accepted for publication. The acceptance rate was 36.46%. Please explore the list of accepted papers. The reviewing standards were again set very high, and the process involved a set of specific attention points, such as the impact and documentation quality of datasets, the reproducibility of benchmarks, ethics, and long-term accessibility.
We are immensely grateful for the tremendous contributions of the 92 area chairs, 1064 reviewers, and 39 ethics reviewers to make this new endeavor a success. Different from last year, we organized a single reviewing round, more closely following the main NeurIPS review cycle, albeit with a longer rebuttal period which allowed many submissions to be substantially improved.
Of the 163 accepted papers, about half of the papers were identified as introducing new datasets, while the other half presented new benchmarks. They covered a broad range of topics. Approximately 23% of papers were related to computer vision; 8% natural language processing; 7% reinforcement learning and simulation environments; and 6% multimodal data. The remainder covered various other topics, such as speech processing, explainable AI, and ethics. While these are rough estimates, we hope they provide a sense of the distribution of topics in this year’s track.
This year, the Datasets and Benchmarks track also truly became a standard component of the NeurIPS conference. Datasets and Benchmarks papers are blended with the main conference papers in the poster sessions, panels, and on the virtual conference site. They will still be easily discoverable via a virtual site highlight page and stickers in the poster session. We are also delighted that the NeurIPS board has agreed to publish a single NeurIPS proceedings this year. The Datasets and Benchmarks papers will appear in the same proceedings as the other NeurIPS papers, with an indication that they belong to the Datasets and Benchmarks track to make them easy to find.
We are looking forward to another great edition of the NeurIPS Datasets and Benchmarks track, and hope to see you at the conference!
By Marco Ciccone, Gustavo Stolovitzky, and Jake Albrecht
NeurIPS is here and we will have a dedicated Competition Track for the sixth time!
Social Event and Poster Session
We are glad to invite you to our social event in New Orleans at the Conference on 29th November at 6 PM (Ballroom C).
The event will be opened by an invited talk from two pioneers of challenges in ML, Isabelle Guyon and Evelyne Viegas, who will discuss the role of competitions at NeurIPS, their evolution, and opportunities.
The invited talk will be followed by a poster session with the competition organizers presenting their challenges and the highlights of the past few months.
Competitions have a valuable place in research and in solving complex problems.
We encourage you to take advantage of this social opportunity to learn more about ML challenges and application trends.
This year we selected competitions covering a broad spectrum of challenges and disciplines such as AutoML, Graph Representation Learning, Security, Machine Learning for Physical and Life Sciences, Natural Language Processing and Understanding, Robotics, and Multi-Agent Systems.
We are excited to finally meet new and old faces of both organizers and participants who made the competition track a success during these years. There will be pizza, salad, and soft drinks for everyone!
After the success of the past editions, in addition to the physical event, the Competition Track will feature online workshops during the virtual week.
The online workshops aim to reach a larger audience and allow researchers worldwide interested in specific ML challenges to foster collaboration, exchange ideas, and grow a sense of community.
Each workshop is a focused session with invited talks from winners and experts of the specific competition. Check the schedule of each workshop on the conference competitions page or look at the general virtual program below.
We want to thank all the reviewers, organizers, and competition participants for their hard work and integrity over the past months of preparation. We look forward to meeting you all in NOLA and virtually for two inspiring weeks of science.
by the General Chairs, Sanmi Koyejo and Shakir Mohamed
The two weeks of NeurIPS 2022 are close, and we are excited to meet everyone in person in New Orleans during the first week and then to continue our interaction during the virtual week. There is a lot to look forward to, and this post is meant to help navigate the various events and activities. In our previous updates we described the steps we took for safety and facilities, and the overall format of the conference.
A highlight of every NeurIPS is the set of keynotes from leading academic and industry figures. This year's topics are:
The in-person conference prioritizes in-person interaction and discussion, and this is centered around the poster sessions. There are two poster sessions each day of the main conference (Tues/Wed/Thurs). Poster boards have been placed with sufficient space for social distancing, we provide face shields for poster presenters, and we encourage mask-wearing for all attendees.
Posters come from three different streams:
Main Conference track. The main conference has 2,672 accepted papers. In addition to learning from the authors about their work directly, each paper has an individual page on the website where you can find a 5-minute video and a chat channel to discuss the work asynchronously.
Journal Showcase. This year, we introduced a journal-to-conference track, where you can learn about the work of papers accepted into journals in our field. There are 41 papers from JMLR and 33 papers from ReScience in this track.
Affinity Group Workshops and Expo
The first day of the meeting, Monday 28 Nov, includes most of the Affinity Group workshops as well as the Expo. This day is an opportunity to reconnect and make new connections. If you are attending NeurIPS for the first time, then consider joining the New in ML workshop.
Affinity events. You can find the schedule for the Affinity Groups here. This year’s affinity events include: Global South in AI; Women in ML (in both weeks); North Africans in ML; LatinX in AI; Queer in AI; Black in AI; Indigenous in AI. The joint Affinity Poster Session in the early evening of the 28th is an opportunity for members across the Affinity Groups to showcase their work.
Expo. The Expo is an opportunity to hear about research and work from industry representatives from some of the platinum or gold exhibitors. There are expo talks, demonstrations, and workshops to experience; see the full expo schedule for locations and topics.
Competitions, Socials and Discussions
The evenings (6 PM onwards) of the in-person week provide further activities to get involved in the NeurIPS community. Some of the highlights are:
Competitions. On Tuesday 29 Nov at 6pm, connect with other attendees to learn about this year's competitions. There will be 22 competitions for you to interact with in an exhibition demo-style setup, and there will be pizza, salads, and soft drinks to keep you fed while you visit all the competition stands.
Ethics Review Open Discussion, also on Tuesday 29 Nov at 7pm. If you are interested in talking about ethics review processes and ways to improve them, join this moderated discussion led by the NeurIPS 2022 Ethics Review Chairs.
Town Hall on Wednesday 30 Nov, 6pm. As members of the NeurIPS community, your thoughts on building a stronger NeurIPS community and wider considerations are essential to the health of the conference. Join this moderated discussion hosted by this year's Communications chairs, and with updates from the NeurIPS board, the general and program chairs, the diversity, inclusion and accessibility chairs, and other members of this year's organizing committee.
Socials. Find a social to make new connections. This year’s socials are broad and include: the negotiations social, K-Pop in NeurIPS; Women in AI Ignite; Un-Bookclub Haben: The Deafblind Woman Who Conquered Harvard Law; Interdisciplinary ML Mixer; ML&Space Social; Data Excellence; Industry, Academia, and the In-Betweens; Gulf coast AI. Check out the webpage listing all socials here.
Tutorials in 2022 are all virtual and held on Monday 5 December, covering time zones across the world. Catch up with the state-of-the-art across 13 tutorials, covering a broad range of subject areas in machine learning research. There are several times; see the tutorials blog and the website.
Spotlights and Paper Deep Dives
Since few of us can stay attentive for an all-day virtual conference, the virtual conference week keeps content focused in two 2-hour blocks each day. In these sessions, you will see 1-minute spotlight presentations from authors of accepted papers followed by mini panel discussions, where 2 papers are grouped and discussed together. You can ask questions through RocketChat.
Each two-hour block repeats the following structure:
[15 mins] 1-minute paper spotlights
[15 mins] Paper panel with authors of 2 papers.
The times for these sessions: 9–11 AM UTC-8 and 9–11 AM UTC+8. There are 2 or 3 tracks in parallel. Make sure to block the applicable times in your schedule—add this from the website.
There are three days of workshops this year: two days during the in-person conference and one during the virtual week. There is a range of workshops; you can see the full list on the website and read more about the workshops on our blog.
We encourage you to join the conference both in-person and online, and register if you have not yet done so. All that’s left is to thank our organizing committees for the dedication they have given. And a special thanks to Mary-Ellen Perry, Lee Campbell, Brad Brockmeyer, Brian Nettleton, Terri Auricchio, Max Wiesner and other members of our logistics and organizing staff, without whom the conference would not be possible—when you bump into them online or in-person, please take a minute to share your thanks.
See you at the conference soon.
P.S. Our best wishes for a weekend ahead full of gratitude and grace. This post was written while listening to Jambalaya. And Tweet and Toot our content to help everyone plan for the two weeks ahead.
by Alekh Agarwal, Alice Oh, Danielle Belgrave, Kyunghyun Cho, Deepti Ghadiyaram, Joaquin Vanschoren
We are excited to announce the award-winning papers for NeurIPS 2022! The three categories of awards are Outstanding Main Track Papers, Outstanding Datasets and Benchmark Track papers, and the Test of Time paper. We thank the awards committee for the main track, Anima Anandkumar, Phil Blunsom, Naila Murray, Devi Parikh, Rajesh Ranganath, and Tong Zhang. For the Datasets and Benchmarks track, we thank Hugo Jair Escalante, Sergio Escalera, Isabelle Guyon, Neil Lawrence, Olga Russakovsky, and Serena Yeung.
Congratulations to all authors!
Is Out-of-distribution Detection Learnable? by Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, Feng Liu This work provides a theoretical study of out-of-distribution (OOD) detection, focusing on the conditions under which such models are learnable. The work uses probably approximately correct (PAC) learning theory to show that OOD detection models are PAC learnable only for some conditions on the space of data distributions and the space of prediction models. It provides three concrete impossibility theorems, which can be easily applied to determine the feasibility of OOD detection in practical settings, and which the authors use to provide a theoretical grounding for existing OOD detection approaches. This work also raises new theoretical questions, for example, about the learnability of near-OOD detection. As such, it has the potential for broad theoretical and practical impact in this important research area. Tues Nov 29 — Poster Session 1
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding by Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Raphael Gontijo-Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi High-quality generative models of images based on diffusion processes are having a huge impact both within and beyond machine learning. This work represents the state of the art of such models, and it also innovates in demonstrating the effective combination of an independently trained large language model with an image decoder at scale. This inherently practical decoupling is likely to be a dominant paradigm for large-scale text-to-image models. The results are impressive and of interest to a broad audience. Thurs Dec 1 — Poster Session 5
Elucidating the Design Space of Diffusion-Based Generative Models by Tero Karras, Miika Aittala, Timo Aila, Samuli Laine This paper is an excellent demonstration of how a well-thought-through survey, one that seeks not just to list but to organise prior research into a coherent common framework, can provide insights that then lead to new modelling improvements. In this case the focus of the paper is generative models of images that incorporate some form of diffusion process, which have become extremely popular recently despite the difficulties of training such models. This paper is likely to be an important contribution to the evolution of both the understanding and the implementation of diffusion-based models. Wed Dec 7 — Featured Papers Panels 3B
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation by Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi This work provides a framework for training embodied AI agents on large quantities of data, creating the potential for such agents to benefit from scaling, as language and image generation models have. The core of the framework is an engine for building procedurally-generated, physics-enabled environments with which agents can interact. This engine, in combination with provided digital assets and environmental controls, allows for generating a combinatorially large number of diverse environments. The authors demonstrate that this framework can be used to train SoTA models for several embodied AI tasks. The framework and code used in this work will be open-sourced, providing a valuable asset for the research community. Wed Nov 30 — Poster Session 3
A Neural Corpus Indexer for Document Retrieval by Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang This work proposes a neural indexer that takes as input a query and outputs, via a decoder combined with beam search, a list of IDs corresponding to relevant documents in the index. It joins a small but growing line of research that departs from the dominant high recall-sparse retrieval paradigm. Notably, this new paradigm allows for gradient-based optimization of the indexer for target applications using standard deep learning algorithms and frameworks. The proposed approach introduces architectural and training choices that result in significant improvements compared to prior work, demonstrating the promise of neural indexers as a viable alternative. The paper is well-written and discusses the limitations and open questions following from this work, which can serve as inspiration for future research. Thurs Dec 1 — Poster Session 5
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling by Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath This work studies the scaling limits of SGD with constant step size in the high-dimensional regime. It shows how complex the dynamics of SGD can be when the step size is large. Characterizing the limiting SDE in this regime, and comparing it to the ODE obtained for small step sizes, gives insights into the nonconvex optimization landscape.
Gradient Descent: The Ultimate Optimizer by Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer This paper reduces sensitivity to hyperparameters in gradient descent by developing a method to optimize with respect to hyperparameters and recursively optimize *hyper*-hyperparameters. Since gradient descent is everywhere, the potential impact is tremendous. Wed Nov 30 — Poster Session 4
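The core hypergradient idea the paper builds on can be sketched in a few lines. This is an assumed simplification with a single level of hyperparameter optimization, not the paper's full recursive *hyper*-hyperparameter tower, and `hypergrad_descent` is a hypothetical helper name: since the last update was theta_t = theta_{t-1} - lr * grad_{t-1}, the chain rule gives d(loss_t)/d(lr) = -grad_t · grad_{t-1}, so the learning rate gets its own gradient-descent step.

```python
# Minimal sketch of one-level hypergradient descent (an illustrative
# simplification of the paper's recursive scheme). The learning rate is
# itself updated by gradient descent, using
#   d(loss_t)/d(lr) = -grad_t . grad_{t-1}.

def hypergrad_descent(grad_fn, theta, lr=0.01, hyper_lr=1e-4, steps=200):
    prev_grad = [0.0] * len(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # descend on the loss w.r.t. the learning rate used at the last step
        lr += hyper_lr * sum(a * b for a, b in zip(g, prev_grad))
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        prev_grad = g
    return theta, lr

# Toy problem: minimize f(x) = x^2 starting from a deliberately small lr;
# the learning rate adapts upward on its own as optimization proceeds.
theta, lr = hypergrad_descent(lambda th: [2 * th[0]], [5.0])
print(f"x ~ {theta[0]:.2e}, adapted lr ~ {lr:.3f}")
```

Note the self-correcting behavior: if the learning rate overshoots and consecutive gradients point in opposite directions, their inner product turns negative and the learning rate shrinks again.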
Riemannian Score-Based Generative Modelling by Valentin De Bortoli, Emile Mathieu, Michael John Hutchinson, James Thornton, Yee Whye Teh, Arnaud Doucet The paper generalizes score-based generative model (SGM) from Euclidean space to Riemannian manifolds by identifying major components that contribute to the success of SGMs. The method is both a novel and technically useful contribution. Wed Nov 30 — Poster Session 4
Gradient Estimation with Discrete Stein Operators by Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis Titsias, Lester Mackey This paper considers gradient estimation when the distribution is discrete. Most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, they introduce a variance reduction technique based on Stein operators for discrete distributions. Even though Stein operator is classical, this work provides a nice interpretation of it for gradient estimation and also shows practical improvement in experiments. Tues Nov 29 — Poster Session 1
An empirical analysis of compute-optimal large language model training by Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katherine Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Oriol Vinyals, Jack William Rae, Laurent Sifre The work asks “Given a fixed FLOPs budget, how should one trade-off model size and the number of training tokens?”. The work models this trade off, makes a prediction based on this model, and trains a model corresponding to that prediction. The resultant model, that is significantly smaller but is trained on significantly more tokens, outperforms its counterpart, while also being more practical to use downstream due to its smaller size. All in all, this work sheds new light on the way the community thinks about scale in the context of language models, which may be useful in other domains of AI as well. Wed Nov 30 — Poster Session 4
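As a rough illustration of the trade-off the paper models, here is a back-of-the-envelope sketch using two widely quoted approximations associated with this work, not its exact fitted laws: training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the compute-optimal regime comes out near 20 tokens per parameter.

```python
# Back-of-the-envelope sketch of compute-optimal model sizing (approximate
# rule of thumb, not the paper's exact fitted scaling laws):
#   C ≈ 6 * N * D   (training FLOPs for N parameters, D tokens)
#   D ≈ 20 * N      (compute-optimal tokens per parameter)

import math

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A budget of ~5.8e23 FLOPs lands near a 70B-parameter model trained on
# ~1.4T tokens, i.e. much smaller but trained on far more data than a model
# that spends the same budget on parameters alone.
n, d = compute_optimal(5.8e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e9:.0f}B")
```

Under this rule of thumb, both N and D grow like the square root of the compute budget, which is why doubling compute should be split between model size and data rather than spent entirely on a bigger model.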
Beyond neural scaling laws: beating power law scaling via data pruning by Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos The importance of high quality data in order to achieve good results in machine learning is well known. Recent work on scaling laws has treated data quality as uniform and focussed on the relationship between computation and data. This work renews our focus on the importance of selecting high quality data as a means to achieve optimal scaling. It does so through a nicely designed analytic investigation that develops a theoretical model of the impact of data quality in concert with empirical instantiation of a range of data filtering metrics on ImageNet. This work is both insightful and timely and will shape the debate about the tradeoffs in the many dimensions of scale in machine learning. Wed Nov 30 — Poster Session 3
LAION-5B: An open large-scale dataset for training next generation image-text models by Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade W Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa R Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev Studying the training and capabilities of language-vision architectures, such as CLIP and DALL-E, requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. This work presents LAION-5B, a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, aimed at democratizing research on large-scale multi-modal models. Moreover, the authors use this data to successfully replicate foundational models such as CLIP, GLIDE and Stable Diffusion, and provide several nearest neighbor indices, an improved web-interface, and detection scores for watermark, NSFW, and toxic content. Wed Nov 30 — Poster Session 4
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge by Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar Autonomous agents have made great strides in specialist domains like Atari games and Go, but typically fail to generalize across a wide spectrum of tasks and capabilities. This work introduces MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. It also proposes a novel agent learning algorithm that is able to solve a variety of open-ended tasks specified in free-form language. It provides an open-source simulation suite, knowledge bases, algorithm implementation, and pretrained models to promote research on generally capable embodied agents. Tue Nov 29 — Poster Session 2
Test of Time Award
This year, following the usual practice, we chose a NeurIPS paper from 10 years ago, and “ImageNet Classification with Deep Convolutional Neural Networks” by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, known as the “AlexNet paper,” was unanimously selected by the Program Chairs. In 2012, it was presented as the first CNN trained on the ImageNet Challenge, far surpassing the state of the art at the time, and it has since had a huge impact on the machine learning community. Geoff will be giving an invited talk on this and more recent research on Thursday, Dec. 1, at 2:30 pm. https://neurips.cc/Conferences/2022/ScheduleMultitrack?event=55869
We again congratulate the award winners and thank the award committee members and the reviewers, ACs, and SACs for nominating the papers. We are looking forward to hearing from the authors of these and all other NeurIPS 2022 papers in New Orleans and on our virtual platform.
Alekh Agarwal, Alice Oh, Danielle Belgrave, Kyunghyun Cho
by Adji Bousso Dieng, Andrew Gordon Wilson, Jessica Schrouff
We are excited to announce the tutorials selected for presentation at the NeurIPS 2022 conference! We look forward to an engaging program, spanning many exciting topics, including Lifelong Learning, Bayesian Optimization, Algorithmic Discrimination, Neurosymbolic Programming, Data Compression, NLP in Healthcare, and others. In this blog post, we detail our selection process, the program, reflections on submissions, and considerations for future tutorials.
Each virtual tutorial will consist of:
A presentation by the speakers (1h50)
Live Q&A with the speakers, answering technical or clarifying questions (10 minutes)
Live Panel with further researchers in the field to discuss challenges and promises (30 minutes)
There are two notable differences from last year’s programme: a mix of contributed and invited tutorials (rather than only invited), and a live panel.
On the Role of Meta-learning for Few-shot Learning Speaker: Eleni Triantafillou
Foundational Robustness of Foundation Models Speakers: Pin-Yu Chen, Sijia Liu, Sayak Paul
Lifelong Learning Machines Speakers: Tyler Hayes, Dhireesha Kudithipudi, Gido van de Ven
Neurosymbolic Programming Speakers: Swarat Chaudhuri, Armando Solar-Lezama, Jennifer Sun
Advances in NLP and their Applications to Healthcare Speaker: Ndapa Nakashole
Probabilistic Circuits: Representations, Inference, Learning and Applications Speakers: Antonio Vergari, YooJung Choi, Robert Peharz
Advances in Bayesian Optimization Speakers: Virginia Aglietti, Jacob Gardner, Jana Doppa
Algorithmic discrimination at the intersection Speakers: Golnoosh Farnadi, Vera Liao, Elliot Creager
Incentive-Aware Machine Learning: A Tale of Robustness, Fairness, Improvement, and Performativity Speaker: Chara Podimata
Data Compression with Machine Learning Speakers: Karen Ullrich, Yibo Yang, Stephan Mandt
Creative Culture and Machine Learning Speakers: Negar Rostamzadeh, Anna Huang, Mark Riedl
Theory and Practice of Efficient and Accurate Dataset Construction Speakers: Frederic Sala, Ramya Korlakai Vinayak
Fair and Socially Responsible ML for Recommendations: Challenges and Perspectives Speakers: Hannah Korevaar, Manish Raghavan, Ashudeep Singh
This year, we have experimented with a “contributed only” design (see related blog post). Our hope was to obtain a “community-led” selection of topics and speakers while emphasising diversity both across and within tutorials. Our call for proposals had clear guidelines for the selection of topics, speakers, panellists, format, etc.
We received 34 submissions by the (strict) deadline. Each submission was reviewed by two Tutorial Chairs based on interest and expertise in the topic. Each chair gave a score from 1 (strong reject) to 10 (strong accept) to encapsulate their overall impression of the proposal. We then shortlisted the submissions that had received a 6 or higher from at least one chair (14 proposals out of 34) and discussed cases where the two initial reviews disagreed. A third review was then obtained from a different Tutorial Chair to finalise the decision to accept or reject a proposal. We accepted 9 proposals with this process.
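The shortlisting rule described above is simple enough to express in code. The sketch below (with hypothetical function and variable names, not part of any actual NeurIPS tooling) illustrates the "at least one initial score of 6 or higher" criterion:

```python
# Hypothetical sketch of the shortlisting rule described above:
# each proposal receives two initial scores on a 1-10 scale, and
# a proposal is shortlisted if at least one score is 6 or higher.

def shortlist(scores_by_proposal):
    """Return the ids of proposals with at least one score >= 6."""
    return [
        proposal_id
        for proposal_id, scores in scores_by_proposal.items()
        if any(score >= 6 for score in scores)
    ]

# Illustrative (made-up) scores for three proposals:
reviews = {"P1": (7, 4), "P2": (5, 5), "P3": (6, 9)}
print(shortlist(reviews))  # → ['P1', 'P3']
```

Disagreements between the two initial reviews (e.g. P1 above, with a 7 and a 4) would then be discussed and sent to a third Tutorial Chair, as described.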
Some relatively common reasons for low scores included (but were not limited to):
The topic is too niche for a very broad audience.
The topic has been presented in recent tutorials in major machine learning conferences.
The speakers have recently contributed to major ML conferences as tutorial and/or keynote speakers.
Low diversity, broadly construed.
Guidelines were not followed (e.g. no panel included).
While no feature in particular would guarantee acceptance, certain features were often present in proposals that were favourably reviewed:
The proposal was highly polished. Significant effort and thought had gone into carefully organising and planning the tutorial, paying close attention to instructions, with few loose ends. These features suggested that the presentation itself would be carefully planned, avoiding last minute organisation, logistical mishaps, etc.
The presenters had demonstrated significant commitment, contributions, and expertise in the chosen topic.
The topic would be both fresh as a tutorial and have a relatively broad appeal.
Diversity, broadly construed. For example, speakers and panels with diverse perspectives on the material.
Given the manageable workload, we did not desk-reject proposals for not following guidelines. In the future, desk rejections might be considered (e.g. multiple proposals were 10-12 pages long instead of the required 5).
Diversity in submitted proposals
Each proposal included 1 to 3 speakers, and up to 6 panellists. Across all submissions, there were a total of 85 speakers and 160 confirmed panellists (with some overlap with speakers). We had explicitly asked tutorial presenters to consider diversity in terms of, among other dimensions, gender, race, geographical location, institution, background and expertise, and to write a statement. Our goals were (1) to ensure that a diverse set of opinions were considered, and (2) to include members from underrepresented groups in the field in this program.
According to the diversity statements, researchers mostly focused on background, expertise and geographical location as diversity dimensions. We note that geographical locations in proposals were mostly limited to Western Europe, the US and Canada. It therefore seems that researchers only partly understood our first goal, and only a few proposals satisfactorily considered the second goal.
Aspects of gender and race or ethnicity were rarely addressed explicitly in diversity statements, and sometimes diversity was highlighted where it was not clearly present. This “lip-service” diversity led to the following results:
Men were overrepresented as speakers, with 17 out of 34 proposals including only male speakers.
Asian (South and East) and White researchers were overrepresented.
Diversity was more often addressed in the panel than among the speakers.
To illustrate these impressions, we tried to identify each speaker and confirmed panellist according to their perceived gender (based on pronouns in the proposal), perceived race (from the proposal where available, otherwise from a combination of CV, online information and picture as last resort), institution and seniority level (early career: PhD student or postdoc 0-3 years post-PhD; mid-career: Assistant Prof, 3-10 years post-PhD; senior: Prof, 10+ years post-PhD). We acknowledge that this classification is somewhat arbitrary and does not fully reflect the gender and racial identities of the speakers and panellists. However, we believe it is important to provide an approximate quantification of the consequences of our design choices.
Figure 2: Perceived gender (pie chart) and race (bar plot) distribution of speakers across all submitted proposals. MENA stands for Middle East and North Africa. Note that percentages in the bar plot might not sum to 100 as race information might not be identifiable from the proposal or online information.
We observe that men (he/his pronouns) represent more than 75% of the proposed speakers. White and Asian speakers represent more than 90% of the speakers. Proposals were mostly submitted by academics and included more academics than industry researchers (26 of the 85 speakers were from industry). Seniority levels were well balanced (early: 26, mid: 33, senior: 26).
Diversity restricted to the panel
Diversity in terms of perceived gender slightly improved in the panel (Figure 3), but remained dominated by men. Similarly, Asian and White researchers still represented more than 80% of the panellists. Interestingly, the panels included fewer researchers from industry (31 out of 160, i.e. ~19%) and skewed more towards senior researchers (early: 26, mid: 47, senior: 81). Overall, we see that the diversity improves relative to speakers, but remains low.
Figure 3: Perceived gender and race distribution in the panels of submitted proposals.
As a note, we would like to highlight that 3 proposals had made particular efforts in terms of diversity. These efforts, combined with strong proposals and timely topics, led 2 of these proposals to be accepted (the third being ineligible). These authors show that it is possible to propose a diverse set of speakers and panellists across all dimensions.
Finally, we assessed diversity across other dimensions, such as disability or being part of the LGBTQIA+ community. For privacy reasons, we do not communicate these numbers.
Improving on the quality of the program
We contacted the authors of accepted tutorials and worked with them in cases where we believed the program could be improved, in terms of organisation, scientific content, and diversity. Where appropriate, we also encouraged the speakers to rethink the format of the proposed tutorial to account for the online edition.
As we had initially planned for ~12-15 tutorials, we had the opportunity to invite tutorial speakers. We invited speakers by identifying researchers who have demonstrated excellence and expertise in a specific topic and who would benefit from the opportunity. We considered aspects of diversity in our selection to prioritise researchers from under-represented groups and maximised the diversity in topics.
Thanks to the responsiveness of invited speakers and to the work of the authors of submitted proposals, we are able to provide an exciting list of topics. While we also increased the diversity of speakers and panellists, there is still room for improvement.
Figure 4: Perceived gender and race distribution across speakers and panellists after proposals were revised and speakers were invited.
Considerations for future editions
Carefully review the topics of tutorials at major ML conferences in the past 3 years. Topics that are overlapping are unlikely to be selected unless the tutorial brings a significantly novel point of view or extension.
Read and follow the guidelines. This might seem obvious, but multiple proposals were rejected because they included speakers who were ineligible, did not include a panel, etc. While we did not desk-reject proposals this year, we did notice that the proposals we accepted mostly followed the guidelines. This simply highlights that the authors carefully considered the different requirements, wrote and proof-read their submission, and submitted on time, which increases the chances of acceptance.
Diversity should be considered across all aspects, and proposals should include voices from under-represented groups. Proposals with 6 or 7 participants that all identify with masculine pronouns are unlikely to be accepted. Refer to directories from affinity groups (e.g. https://www.directory.wimlworkshop.org/, https://lxai.app/PUBLIC-DIRECTORY), request recommendations from more senior researchers in the field, and consider non-Western institutions.
If you would like to propose a topic for a tutorial but are ineligible, please pass the opportunity to someone else! Consider encouraging others in the field to submit a proposal.
Speakers and panellists
A proposal might stem from one or a couple of researchers who will then invite other speakers and panellists. These co-presenters and panellists also have an important role to play:
We observed that some panellists were listed as confirmed in multiple proposals. These were often more senior researchers. If you are invited to multiple opportunities, please consider suggesting other researchers instead. We had strict guidance that every speaker and panellist could only be considered for one tutorial.
Similarly, if you cannot participate, please suggest other researchers and think about researchers from under-represented groups who would benefit from the opportunity.
If you see that the list of speakers and panellists is not very diverse or dominated by groups that are already over-represented in the field, reach out to the authors and ask them to modify the list.
Members from under-represented groups
While the burden of defining a diverse program should not fall on members of under-represented groups, there are a couple of steps that can be taken to increase visibility:
Create a website, fill in (and update) your personal page on your institution’s or company’s website or make public profiles on LinkedIn, DBLP, ResearchGate, Google Scholar, …
We recommend that affinity groups create open-source repositories of their members (where feasible) so that organisers can identify potential speakers to invite.
Submit a proposal!
Web presence helps authors, speakers, panellists and tutorial organisers find your profile to consider you for the opportunity. Without this information, it is difficult to estimate whether someone has the breadth of experience and communication skills that would make a tutorial successful.
We are of course not exempt from improvements. Some of the learnings we take for future editions include:
More proactively reach out to potential speakers to encourage them to submit a proposal. This includes repeated postings on mailing lists such as those from affinity groups, as well as directly reaching out to researchers.
Earlier invitations of invited tutorials.
Clearer guidelines, e.g. explaining the goals we are trying to achieve with the diversity statement.
Having a clear set of expectations and benefits for tutorial speakers and panellists.
NeurIPS organisers and board
Tutorial speakers provide significant content for the conference. Speakers from under-represented groups could be better supported in submitting a proposal or accepting this opportunity. Bottlenecks we have identified include:
No funding opportunity for tutorial speakers if they wanted to attend the in-person component of the conference.
No honorarium for speakers. For some speakers from under-represented groups, the exposure that a tutorial provides does not compensate for the toll that building such a program takes as this is time not devoted to research or grant applications.
No opportunity for NeurIPS contributors to self-report their demographic characteristics.
The NeurIPS organisers and Board have been receptive to these requests and have now granted:
Tutorial speakers receive an in-person or virtual registration.
Thanks to the DIA Chairs, tutorial speakers will also be prioritised when applying for NeurIPS travel funding (previously limited to students and authors).
Each tutorial will receive an honorarium (to be split across speakers).
Self-reporting requires more consideration, especially given the laws regarding demographic surveys in different countries. It is being discussed for future meetings.
We are thankful to the organisers (in particular the General and DIA Chairs) and the Board for these measures. We believe they will help in providing an exciting and diverse set of tutorials in future editions.
We are extremely excited about the programme, and look forward to seeing you at the tutorials!
By the NeurIPS 2022 ethics review chairs: Sasha Luccioni, Inioluwa Deborah Raji, Cherie Poland, and William Isaac
TL;DR: The 2022 ethics review process is done – come discuss the process and related considerations with us at the Ethics Review Open Discussion on Tuesday, November 29th at NeurIPS!
With the 2022 decision process behind us and as this year’s conference approaches, we wanted to take this opportunity to reflect on the 2022 NeurIPS ethics review process.
The ethics review process was first introduced at the 2020 NeurIPS conference as a step towards improving the ethical awareness and engagement of NeurIPS authors and reviewers, with the aim of inspiring improvements to ethical research conduct, practice and reflection throughout the field, especially among those participating and presenting at the conference.
While the first year of the process was a pilot, last year focused on operating the process at scale. This year’s main objective was consistency: incorporating the successful components from previous editions of the ethics review process to reinforce its reliability and applicability for a conference of this size, and further solidifying concrete policies to establish a coherent process moving forward.
Updates from the 2022 Ethics Review Process
The process saw updated Ethics Review Guidelines, which included new considerations regarding the misuse of ML algorithms to produce contradictory results, as well as the addition of a list of deprecated datasets to allow both authors and reviewers to check the status of training datasets and understand the different issues that may arise. The ethics reviews were not designed to be punitive or exclusionary. Rather, they were designed to inform, educate, and shed light on ethical concerns so authors could address these issues through an open discussion.
This year also saw the release of the first draft of the NeurIPS Provisional Code of Ethics, which aims to provide the community with more thorough ethical guidelines and expectations for the conference. As such, research integrity issues, including plagiarism, that were identified during the review process were referred to the Program Chairs.
Overview of the Ethics Review Process
Main NeurIPS track
We allowed technical reviewers and area chairs (ACs) to flag papers that they found to have ethical issues based on a list provided for guidance.
To help review these papers, we invited 328 individuals with diverse backgrounds and expertise in AI ethics to take part in the ethics review process. In total, 128 people agreed to participate.
The categories of ethics reviewer expertise included:
Discrimination / Bias / Fairness Concerns
Data and Algorithm Evaluation
Inappropriate Potential Applications & Impact (e.g. human rights concerns)
Privacy and Security (consent, etc.)
Research Integrity Issues (e.g. plagiarism)
Responsible Research Practice (e.g. IRB approval, documentation, research ethics)
Paper reviews were conducted in Open Review and reviewers were assigned algorithmically in a blinded fashion after preliminary conflict checks were performed, with many of the reviewers reviewing in multiple categories. Once the ethics reviews were completed, they were made available to the authors and technical reviewers so that discussions of concerns could be handled through open dialogue.
Handling false positives (flagged papers that did not have ethical issues): Of the 419 main track papers flagged for ethics review, 115 had no sub-category flagged. This necessitated a manual review of all 115 papers to identify the potential ethical concerns. Of these, 103 papers had no apparent ethical issues.
Handling false negatives (papers with ethical concerns unflagged by technical reviewers): There were also papers with clear ethical issues that were not surfaced by the primary reviewers. As in the previous year, these papers were identified through a keyword search of especially challenging topics that had required additional ethical scrutiny in the past (e.g. keywords such as surveillance, facial recognition, and biometric data generation).
Figure: Number of papers flagged for ethics review in the main track and the Datasets and Benchmarks (D&B) track, including those with no sub-category flagged and those with no apparent ethical issue.
The Datasets and Benchmarks Track
While fewer papers were submitted to the Datasets and Benchmarks Track compared to the main track, a greater percentage of them raised ethical concerns: 81 papers were flagged for ethics review, with 31 having confirmed ethical issues. The ethical challenges arising from datasets include participant consent, privacy, anonymity, biometrics, data storage, and web scraping, all of which were raised in this year’s ethics review process. Concerns about the risk of harm and deprecation of the datasets, due to historical problems, were discussed at length among the technical and ethics reviewers. The ethics reviewers often recommended improved datasheet documentation, shedding light on issues such as consent, privacy, and third-party oversight of data collection processes and procedures.
Of the 31 submissions with confirmed ethical issues, two were minor, 25 were serious, and four were severe enough that, after additional review and deliberation, the ethics chairs recommended rejection or conditional acceptance. The decision to recommend rejection or conditional acceptance on ethical grounds was not taken lightly and was made only after considerable open discussion (as seen on the Open Review pages of papers whose authors opted for their reviews to remain public). In these cases, the ethics chairs provided the area chairs with written justifications in support of their joint recommendations. The final decision on whether to accept or reject these papers was left to the track chairs.
Cross-cutting ethical issues
Issues pertaining to the utilization of Institutional Review Boards (IRBs) arose on several occasions in both the Main Track and the Datasets and Benchmarks Track. Concerns were related to ethical oversight of data collection, informed consent, ability to withdraw from participation in the dataset, data privacy, cross-border uses of data (global public availability), licensing, and copyright law issues. However, no requirements were made to use IRBs as a method of third-party oversight because the availability and access to IRBs as an oversight mechanism varies greatly between countries.
Moving forward, it is crucial for the conference and the larger community to consider diverging international ethical standards, as reviewers raised concerns about how to equally and equitably address ethical standards for future conferences when laws, regulations, and ethics differ by country. The focus should continue to be placed on the technical merits with attention to ethical implications and impact, without unnecessarily burdening authors to follow rigid ethical protocols. It is therefore important to establish norms to guide and educate about potential ethical harms and dual uses, rather than to impose penalties on authors for their work in advancing technically relevant and important research.
During the ethics review process, we received insightful feedback from technical reviewers, ethics reviewers and ACs that we deemed relevant to share with the NeurIPS community:
Lack of clarity around the proper use of the ethics flag: technical and ethics reviewers noted that more information was needed regarding the purpose of the ethics flag and how to use it. The way in which the process was set up did not enable technical reviewers to add comments or reasons for flagging beyond the categories provided, which sometimes made it difficult for ethics reviewers to pinpoint the issue.
Providing more support for technical reviewers: We also received several suggestions to add an FAQ to the technical reviewer guidelines with examples of how or when to use the ethics flag. Making the text box for comments a required field would also prompt technical reviewers to provide their reason for flagging a paper. This year, over 112 papers in the main track were marked with an ethics flag without explanation or justification.
The difference between ethical guidelines and norms: While we endeavored to cover a large scope of potential issues and problems in the updated ethical guidelines, they remained high-level and did not address all possible use cases and applications. For instance, several debates cropped up regarding whether it was acceptable to download data from public repositories to train ML models and how copyright can/should be enforced.
The potential influence of paper visibility/promotion on reviewers’ evaluation: Given that NeurIPS doesn’t have an explicit policy regarding the use of social media and press releases before/during the review period, it was proposed that paper visibility could potentially influence reviewer evaluation.
Other minor points that were raised:
Visibility of ethics reviews: Determining whether ethics reviews should be made visible to the public, authors and reviewers from the outset is a non-obvious challenge, as it dictates the level of awareness and quality of communication between these stakeholders. There is a clear educational benefit in making reviews publicly available to the community, but doing so could also lead to unhelpful public engagement in the review process.
Communication between ethics review chairs and reviewers: Guidelines on what reviewers are expected to flag should be clearly articulated, and an established communication channel between ACs and reviewers is needed so that they understand the role of ethics reviews and can appropriately solicit and respond to their content.
Ethics “office hours”: Given the lack of clarity around the scope and purpose of ethics reviews, having regular “office hours” to answer questions from both the ethical and technical reviewers could be useful to both groups.
Better matching/reasons for ethics reviews: While the categories provided for flagging the papers aimed to cover many different ethical aspects, they did not enable easy matching between papers and reviewers. Adding more categories (and modifying existing ones) could help better match reviewer expertise to the concrete issues present in a paper in both conference tracks.
Lack of clarity around the role of ethics reviews: It was brought to our attention that the purpose of the ethics review wasn’t always clear to both technical reviewers and the broader community. For instance, whether ethical issues alone could constitute a reason for rejecting an otherwise technically sound submission was a question that was raised on Open Review. We hope to discuss this question with conference attendees at our Ethics Open Discussion that we will organize on Tuesday, November 29th.
Retrospectives on the ethics review process
Since starting this review process, the broader community has shown enthusiasm and interest in these efforts, including some academic reflection about the process and possible improvements. Similar major institutional ethics review efforts have been launched at places such as Microsoft, DeepMind and Stanford to mediate project approval and ethical consideration, citing the NeurIPS ethics review process as an impetus for their internal efforts.
We hope these efforts serve as evidence that since the introduction of ethical oversight practices at NeurIPS, there has been a growing interest and uptake of further ethical reflections as part of the research process in machine learning. In addition, these specific efforts and others highlight the broader community’s participation in actively informing the next steps as it relates to this process through their feedback and adoption of recommended practices.