Alexandria3k: Reproducible publication research on the desktop

Presenter: Diomidis Spinellis
Date: 21 November 2022


Sustained exponential advances in computing power, drops in associated costs, and Open Science initiatives allow us to process on a personal computer the metadata from most major international academic publishers as well as corresponding author, funder, and journal details. Alexandria3k is an open-source software library and command-line tool that builds on this capability to support sophisticated bibliometric and scientometric studies as well as systematic literature reviews in a transparent, repeatable, reproducible, and efficient manner. In total, the Alexandria3k system provides relational access to 1.5 PB of data comprising 134 million publication records, of which 60 million contain full citation data, 15 million author records, 109 thousand journal records, and 32 thousand research funding bodies. The system supports two modes of use: running simple ad hoc queries directly over the publication dataset, or selectively populating a database on which more complex relational queries can then be run. Application examples include the independently verifiable calculation of bibliometric figures, such as the journal impact factor and the h-index, the creation of detailed research data subsets based on the publication topic, funder, institution, publication outlet, or author country, and the study of the publication citation graph.
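
To illustrate the kind of relational query mentioned above, the following minimal sketch computes an author's h-index from a small in-memory SQLite database. The table names (works, citations), their columns, the sample rows, and the helper functions are hypothetical and chosen only for this illustration; they do not reflect Alexandria3k's actual schema or API.

```python
import sqlite3


def h_index(citation_counts):
    """Return the largest h such that h works have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def author_h_index(conn, author):
    """Count citations per work of one author, then derive the h-index."""
    rows = conn.execute(
        """
        SELECT w.id, COUNT(c.citing_id)
        FROM works AS w
        LEFT JOIN citations AS c ON c.cited_id = w.id
        WHERE w.author = ?
        GROUP BY w.id
        """,
        (author,),
    ).fetchall()
    return h_index([count for _, count in rows])


# Hypothetical toy schema and data (an assumption, not Alexandria3k's tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE works (id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE citations (citing_id INTEGER, cited_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO works VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B")],
)
conn.executemany(
    "INSERT INTO citations VALUES (?, ?)",
    [(2, 1), (3, 1), (4, 1), (3, 2), (4, 2), (4, 3)],
)

if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, author_h_index(conn, name))
```

The LEFT JOIN keeps works that have never been cited, so they enter the calculation with a count of zero instead of being dropped. Because the figure is derived from openly stated data and a plain SQL query, anyone with the same data can rerun it and check the result.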