By the NeurIPS 2022 ethics review chairs: Sasha Luccioni, Inioluwa Deborah Raji, Cherie Poland, and William Isaac
TL;DR: The 2022 ethics review process is done – come discuss the process and related considerations with us at the Ethics Review Open Discussion on Tuesday, November 29th at NeurIPS!
With the 2022 decision process behind us and as this year’s conference approaches, we wanted to take this opportunity to reflect on the 2022 NeurIPS ethics review process.
The ethics review process was first introduced at the 2020 NeurIPS conference, implemented as a step towards improving the ethical awareness and engagement of NeurIPS authors and reviewers in order to inspire overall improvements to ethical research conduct, practice and reflection throughout the field, especially for those participating and presenting at the conference.
While the first year of the process was a pilot, last year was focused on operating the process at scale. This year’s main objective was consistency: incorporating the successful components from previous editions of the ethics review process to reinforce its reliability and applicability for a conference of this size, and further solidifying concrete policies to establish a coherent process moving forward.
Updates from the 2022 Ethics Review Process
The process saw updated Ethics Review Guidelines, which included new considerations regarding the misuse of ML algorithms to produce contradictory results, as well as the addition of a list of deprecated datasets to allow both authors and reviewers to check the status of training datasets and understand the different issues that may arise. The ethics reviews were not designed to be punitive or exclusionary. Rather, they were designed to inform, educate, and shed light on ethical concerns so authors could address these issues through an open discussion.
This year also saw the release of the first draft of the NeurIPS Provisional Code of Ethics, which aims to provide the community with more thorough ethical guidelines and expectations for the conference. Accordingly, research integrity issues, including plagiarism, that were identified during the review process were referred to the Program Chairs.
Overview of the Ethics Review Process
Main NeurIPS track
We allowed technical reviewers and area chairs (ACs) to flag papers that they found to have ethical issues based on a list provided for guidance.
To help review these papers, we invited 328 individuals with diverse backgrounds and expertise in AI ethics to take part in the ethics review process. In total, 128 people agreed to participate.
The categories of ethics reviewer expertise included:
- Discrimination / Bias / Fairness Concerns
- Data and Algorithm Evaluation
- Inappropriate Potential Applications & Impact (e.g., human rights concerns)
- Privacy and Security (consent, etc.)
- Research Integrity Issues (e.g., plagiarism)
- Responsible Research Practice (e.g., IRB approval, documentation, research ethics)
Paper reviews were conducted on OpenReview, and reviewers were assigned algorithmically in a blinded fashion after preliminary conflict checks, with many reviewers covering multiple categories. Once the ethics reviews were completed, they were made available to the authors and technical reviewers so that concerns could be discussed through open dialogue.
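The assignment step can be sketched roughly as follows. This is a minimal illustration only: the reviewer names, expertise categories, and conflict data below are invented, and the actual process ran on OpenReview with far richer metadata.

```python
import random

# Hypothetical reviewer pool: each reviewer has expertise categories and a
# set of institutional conflicts (all values here are made up for illustration).
reviewers = {
    "R1": {"categories": {"Privacy and Security"}, "conflicts": {"inst_a"}},
    "R2": {"categories": {"Privacy and Security", "Data and Algorithm Evaluation"},
           "conflicts": set()},
}

def eligible(paper, reviewer):
    """A reviewer is eligible if they cover the paper's flagged category
    and share no institution with the paper's authors (conflict check)."""
    return (paper["category"] in reviewer["categories"]
            and not paper["institutions"] & reviewer["conflicts"])

paper = {"id": 42, "category": "Privacy and Security", "institutions": {"inst_a"}}
pool = [name for name, r in reviewers.items() if eligible(paper, r)]
# Blinded assignment: a random pick from the eligible pool; the reviewer
# never sees who flagged the paper or who else was considered.
assigned = random.choice(pool)
```

Here R1 is excluded by the conflict check, so the paper is assigned to R2.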
Handling false positives (flagged papers that did not have ethical issues): Of the 419 main-track papers flagged for ethics review, 115 had no sub-category issue flagged. This necessitated a manual review of all 115 papers to identify the potential ethical concerns. Of these, 103 papers had no apparent ethical issues.
Handling false negatives (papers with ethical concerns not flagged by technical reviewers): There were also papers with clear ethical issues that were not surfaced by the primary reviewers. As in the previous year, these papers were identified through a keyword search for especially challenging topics that had required additional ethical scrutiny in the past (e.g., keywords such as surveillance, facial recognition, and biometric data generation).
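A keyword screen of this kind can be sketched in a few lines. The keyword list, paper fields, and example papers below are illustrative assumptions, not the actual ones used by the ethics chairs.

```python
# Hypothetical list of sensitive topics drawn from past ethical scrutiny.
SENSITIVE_KEYWORDS = {"surveillance", "facial recognition", "biometric"}

def needs_ethics_screen(title: str, abstract: str) -> bool:
    """Return True if the paper's title or abstract mentions a sensitive topic."""
    text = f"{title} {abstract}".lower()
    return any(kw in text for kw in SENSITIVE_KEYWORDS)

# Two made-up submissions: only the second mentions a sensitive topic.
papers = [
    {"title": "Scaling Laws for Language Models", "abstract": "We study model size."},
    {"title": "Robust Facial Recognition at Scale", "abstract": "A new benchmark."},
]
flagged = [p["title"] for p in papers
           if needs_ethics_screen(p["title"], p["abstract"])]
```

Papers surfaced this way would then go to a human ethics reviewer; the keyword match is only a first-pass filter, not a verdict.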
[Figure: number of papers flagged in the main track and the Datasets and Benchmarks (D&B) track, broken down by “no sub-category flagged” and “no apparent ethical issue”.]
The Datasets and Benchmarks Track
While fewer papers were submitted to the Datasets and Benchmarks Track than to the main track, a greater percentage of them raised ethical concerns: 81 papers were flagged for ethics review, with 31 having confirmed ethical issues. The ethical challenges raised in this year’s process included participant consent, privacy, anonymity, biometrics, data storage, and web scraping of data. Concerns about the risk of harm and the deprecation of datasets due to historical problems were discussed at length among the technical and ethics reviewers. The ethics reviewers often recommended improved datasheet documentation, shedding light on issues such as consent, privacy, and third-party oversight of data collection processes and procedures.
Of the 31 submissions with confirmed ethical issues, two were minor, 25 were serious, and four were severe enough for the ethics chairs to recommend rejection or conditional acceptance after additional review and deliberation among the ethics chairs. The decision to recommend rejection or conditional acceptance on ethical grounds was not taken lightly and was made only after considerable open discussion (as seen on the OpenReview pages of papers whose authors opted for their reviews to remain public). In these cases, the ethics chairs provided the area chairs with written justifications in support of their joint recommendations. The final decision on whether to accept or reject these papers was left to the track chairs.
Cross-cutting ethical issues
Issues pertaining to the utilization of Institutional Review Boards (IRBs) arose on several occasions in both the Main Track and the Datasets and Benchmarks Track. Concerns related to ethical oversight of data collection, informed consent, the ability to withdraw from participation in the dataset, data privacy, cross-border uses of data (global public availability), licensing, and copyright law. However, IRB review was not required as a method of third-party oversight, because the availability of and access to IRBs as an oversight mechanism vary greatly between countries.
Moving forward, it is crucial for the conference and the larger community to consider diverging international ethical standards: reviewers raised concerns about how to equally and equitably apply ethical standards at future conferences when laws, regulations, and ethics differ by country. The focus should continue to be placed on technical merit, with attention to ethical implications and impact, without unnecessarily burdening authors with rigid ethical protocols. It is therefore important to establish norms that guide and educate about potential ethical harms and dual uses, rather than to impose penalties on authors for advancing technically relevant and important research.
During the ethics review process, we received insightful feedback from technical reviewers, ethics reviewers and ACs that we deemed relevant to share with the NeurIPS community:
- Lack of clarity around the proper use of the ethics flag: technical and ethics reviewers noted that more information was needed regarding the purpose of the ethics flag and how to use it. The way in which the process was set up did not enable technical reviewers to add comments or reasons for flagging beyond the categories provided, which sometimes made it difficult for ethics reviewers to pinpoint the issue.
- Providing more support for technical reviewers: We also received several suggestions, such as adding an FAQ to the technical reviewer guidelines with examples of how and when to use the ethics flag, and making the comment text box a required field so that technical reviewers must state their reason for flagging a paper. This year, over 112 papers in the main track were marked with an ethics flag without explanation or justification.
- The difference between ethical guidelines and norms: While we endeavored to cover a large scope of potential issues and problems in the updated ethical guidelines, they remained high-level and did not address all possible use cases and applications. For instance, several debates cropped up regarding whether it was acceptable to download data from public repositories to train ML models and how copyright can/should be enforced.
- The potential influence of paper visibility/promotion on reviewers’ evaluation: Given that NeurIPS doesn’t have an explicit policy regarding the use of social media and press releases before/during the review period, it was proposed that paper visibility could potentially influence reviewer evaluation.
Other minor points that were raised:
- Visibility of ethics reviews: Determining whether ethics reviews should be made visible to the public, authors, and reviewers from the outset is a non-obvious challenge, as it dictates the level of awareness and the quality of communication between these stakeholders. There is a clear educational benefit in making reviews publicly available to the community; however, it could also invite unhelpful public engagement in the review process.
- Communication between ethics review chairs and reviewers: There should be clearly articulated guidelines on what reviewers are expected to flag, and an established communication channel between ACs and reviewers, so that they understand the role of ethics reviews and can appropriately solicit and respond to their content.
- Ethics “office hours”: Given the lack of clarity around the scope and purpose of ethics reviews, having regular “office hours” to answer questions from both the ethical and technical reviewers could be useful to both groups.
- Better matching/reasons for ethics reviews: While the categories provided for flagging the papers aimed to cover many different ethical aspects, they did not enable easy matching between papers and reviewers. Adding more categories (and modifying existing ones) could help better match reviewer expertise to the concrete issues present in a paper in both conference tracks.
- Lack of clarity around the role of ethics reviews: It was brought to our attention that the purpose of the ethics review wasn’t always clear to technical reviewers or the broader community. For instance, whether ethical issues alone could constitute grounds for rejecting an otherwise technically sound submission was a question raised on OpenReview. We hope to discuss this question with conference attendees at the Ethics Review Open Discussion we will organize on Tuesday, November 29th.
Retrospectives on the ethics review process
Since starting this review process, the broader community has shown enthusiasm and interest in these efforts, including some academic reflection about the process and possible improvements. Similar major institutional ethics review efforts have been launched at places such as Microsoft, DeepMind, and Stanford to mediate project approval and ethical consideration, citing the NeurIPS ethics review process as an impetus for their internal efforts.
Furthermore, several great retrospective studies on the design of past NeurIPS ethics review processes have now been published, including an examination of past broader impact statements, and a review of last year’s move to the checklist, in addition to how-to guides for the research community to inform their thinking and taxonomize research ethics challenges.
We hope these efforts serve as evidence that since the introduction of ethical oversight practices at NeurIPS, there has been a growing interest and uptake of further ethical reflections as part of the research process in machine learning. In addition, these specific efforts and others highlight the broader community’s participation in actively informing the next steps as it relates to this process through their feedback and adoption of recommended practices.
Further questions? Come to our Ethics Review Open Discussion at NeurIPS on November 29, or send an email to firstname.lastname@example.org!