BALab seminars — Guidelines for Improving Paper and Peer Review Quality

Abstract

Redesigning the peer review of software engineering studies can improve the process's fairness, inclusiveness, transparency, and effectiveness. group led by Paul Ralph, in which your presenter participates, is developing empirical standards under the auspices of ACM SIGSOFT. Today, when a scholar peer reviews a manuscript, they must simultaneously generate and apply a set of evaluation criteria. The problem is that generating an appropriate rubric for judging research quality is mind-numbingly difficult. So reviewers tend to generate incomplete, oversimplified and inappropriate criteria. It’s not because reviewers are stupid; it’s because no one person can generate good rubrics for all of the different kinds of studies SE researchers review. Furthermore, the reviewers, the editor, and the authors may construct totally different rubrics, and these rubrics may depart wildly from published methodological guidance, or the norms of their scientific community. Most frustration with the peer review process comes from authors and reviewers disagreeing on what makes a study good. The power imbalance between authors and reviewers makes correcting some reviewers impossible. We need to change the process so that reviewers all use the right criteria in the first place. The solution is for the software engineering community to decide together what “good” means. For each common methodology, we create a one-page checklist of specific expectations. To prevent reviewers from using the standards in an inflexible, gotcha-like manner, each criterion is paired with a simple decision tree. When a reviewer indicates that a criterion is not satisfied, they will be explicitly asked whether there is a good reason. Providing the same, specific criteria to authors and reviewers will improve research quality, simplify reviewing, reduce conflict and increase acceptance rates. The presentation will introduce the taxonomy of provided empirical standards, overview the general standard, which applies to all reviews, and focus on one particular standard as an example.