Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jennifer Wortman Vaughan
NeurIPS 2021 Program Chairs
Machine learning is developing at a rapid pace, which can leave little time for the community to reflect on where we are headed. Much of this development is fueled by empirical progress on benchmarks, but are these benchmarks measuring the right thing, and what are the scientific merits and limitations of leaderboard-climbing? We have seen the rise of massive models like GPT-3 that reshape our notion of what a machine learning model can do. How should the community respond to this costly trend? Finally, the ethical consequences of machine learning are ever more apparent. The societal impact of machine learning stems from the accumulation of contributions across the entire community and can feel distant from the day-to-day work of an individual researcher. How should we as researchers incorporate ethics into our own work?
These are big, open questions that have no right answers, and there are many legitimate viewpoints within the NeurIPS community. To encourage dialogue, in addition to our traditional keynote talks and oral presentations of individual papers, we are excited to introduce a set of three plenary panel discussions at NeurIPS this year, in which experts can engage with each topic interactively.
We also encourage everyone to submit questions in advance on Rocket.Chat, so that we can better tailor the panels to the interests of the community.
The Consequences of Massive Scaling in Machine Learning
When: December 7 @ 7am UTC (pre-recorded)
Machine learning research has always prized algorithmic contributions. However, many recent big breakthroughs have been driven by scaling up the same basic algorithms and architectures. The most recent example is OpenAI’s massive language model GPT-3, which won a best paper award at NeurIPS in 2020. GPT-3 was based on the same Transformer architecture as its predecessors, but when scaled up, it exhibited remarkable, unexpected behaviors that had a major impact on the way we think about language models. As more progress becomes driven by scaling, how should we adapt as a community? Should it affect what problems are considered interesting? Should publication norms take scale into account, or de-emphasize algorithmic contributions? How do we ensure that smaller institutions or academic labs can meaningfully research and audit large-scale systems? From a safety perspective, if behaviors emerge only at scale, how can we ensure that systems behave as intended? In this panel, we will explore these critical questions so that the NeurIPS community at large can continue to make fundamental advances in the era of massive scaling.
Moderator: Jacob Steinhardt (University of California, Berkeley)
Noah Goodman (Stanford University)
Jared Kaplan (Anthropic)
Melanie Mitchell (Santa Fe Institute)
Joelle Pineau (Facebook)
Oriol Vinyals (DeepMind)
The Role of Benchmarks in the Scientific Progress of Machine Learning
When: December 8 @ 3pm UTC (live)
Benchmark datasets have played a crucial role in driving empirical progress in machine learning, leading to an interesting dynamic between those on a quest for state-of-the-art performance and those creating new challenging benchmarks. In this panel, we reflect on how benchmarks can lead to scientific progress, both in terms of new algorithmic innovations and improved scientific understanding. First, what qualities of a machine learning system should a good benchmark dataset seek to measure? How well can benchmarks assess performance in dynamic and novel environments, or in tasks with an open-ended set of acceptable answers? Benchmarks can also raise significant ethical concerns, including poor data collection practices, under- and misrepresentation of subjects, and misspecification of objectives. Second, even given high-quality, carefully constructed benchmarks, which research questions can we hope to answer from leaderboard-climbing, and which ones are deprioritized or impossible to answer due to the limitations of the benchmark paradigm? In general, we hope to deepen the community’s awareness of the important role of benchmarks in advancing the science of machine learning.
Moderator: Moritz Hardt (University of California, Berkeley)
Lora Aroyo (Google)
Sam Bowman (New York University)
Isabelle Guyon (University of Paris-Saclay)
Joaquin Vanschoren (Eindhoven University of Technology)
How should a machine learning researcher think about AI ethics?
When: December 10 @ 11pm UTC (live)
As machine learning becomes increasingly widespread in the real world, a growing set of well-documented potential harms needs to be acknowledged and addressed. In particular, valid concerns about data privacy, algorithmic bias, automation risk, potential malicious uses, and more have highlighted the need for the active consideration of critical ethical issues in the field. In light of this, there have been calls for machine learning researchers to actively consider not only the potential benefits of their research but also its potential negative societal impact, and to adopt measures that enable positive trajectories to unfold while mitigating the risk of harm. However, grappling with ethics is still a difficult and unfamiliar problem for many in the field. A common difficulty with assessing ethical impact is its indirectness: most papers focus on general-purpose methodologies (e.g., optimization algorithms), whereas ethical concerns are more apparent when considering downstream applications (e.g., surveillance systems). Also, real-world impact (both positive and negative) often emerges from the cumulative progress of many papers, so it is difficult to attribute that impact to an individual paper. Furthermore, standard research ethics mechanisms such as an Institutional Review Board (IRB) are not always a good fit for machine learning, and problematic research practices, such as those involving extensive environmental and labor costs or inappropriate data use, are so ingrained in community norms that it can be difficult to articulate where to draw the line as expectations evolve. How should machine learning researchers wrestle with these topics in their own research? In this panel, we invite the NeurIPS community to contribute questions stemming from their own research and other experiences, so that we can develop community norms around AI ethics and provide concrete guidance to individual researchers.
Moderator: Deborah Raji (University of California, Berkeley)
Amanda Askell (Anthropic)
Abeba Birhane (University College Dublin)
Jesse Dodge (Allen Institute for AI)
Casey Fiesler (University of Colorado, Boulder)
Pascale Fung (Hong Kong University of Science and Technology)
Hanna Wallach (Microsoft)