Application domains in the Research Papers at BENEVOL: a retrospective

Capiluppi, A, Ajienka, N (ORCID: https://orcid.org/0000-0002-8792-282X) and Romo, BA, 2019. Application domains in the Research Papers at BENEVOL: a retrospective. In: BENEVOL 2019: The 18th Belgium-Netherlands Software Evolution Workshop, Brussels, Belgium, 28-29 November 2019.

1292544_Ajienka.pdf - Post-print


Abstract

Research on empirical software engineering has increasingly used the data that is made available in online repositories, specifically Free/Libre/Open Source Software (FLOSS) projects. The latest trend among researchers is to gather "as much data as possible" to (i) prevent the bias inherent in a small sample, (ii) work with a sample as close as possible to the population itself, and (iii) showcase the performance of existing or new tools in handling vast amounts of data. The effects of harvesting enormous amounts of data have been only marginally considered so far: data could be corrupted; repositories could be forked; and developer identities could be duplicated. In this paper we posit that there is a fundamental flaw in harvesting large amounts of data and then generalising the conclusions: the application domain, or context, of the analysed systems must be the primary factor for the cluster sampling of FLOSS projects. This paper presents two contributions: first, we analyse a collection of 100 papers that appeared at BENEVOL, showing whether (and how much) FLOSS data has been harvested, and how many times the authors flagged an issue in their different application domains. Second, we discuss the implications of using 'application domain' as the clustering factor in FLOSS sampling, and the generalisations within and outside the clusters.
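
As a rough illustration of domain-aware sampling, the sketch below groups projects by an application-domain label and draws a fixed number of projects from each group, so that findings can be reported per domain rather than pooled. It is not the authors' procedure: the function `sample_by_domain`, the project names, and the domain labels are all hypothetical, and drawing within each group is, strictly speaking, a stratified-style draw used here only to make the idea concrete.

```python
import random
from collections import defaultdict


def sample_by_domain(projects, per_domain, seed=0):
    """Group (name, domain) pairs by domain and draw up to per_domain names from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for name, domain in projects:
        groups[domain].append(name)
    # Sort domains and names so the result depends only on the seed, not on input order.
    return {
        domain: sorted(rng.sample(sorted(names), min(per_domain, len(names))))
        for domain, names in sorted(groups.items())
    }


# Hypothetical project list; names and domain labels are invented for illustration.
projects = [
    ("proj-a", "web"), ("proj-b", "web"), ("proj-c", "web"),
    ("proj-d", "database"), ("proj-e", "database"),
    ("proj-f", "compiler"),
]

sample = sample_by_domain(projects, per_domain=2)
for domain, names in sample.items():
    print(domain, names)
```

Any conclusion drawn from such a sample would then be stated per domain, with generalisation across domains treated as a separate question.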

Item Type: Conference contribution
Creators: Capiluppi, A., Ajienka, N. and Romo, B.A.
Date: November 2019
Identifiers: 1292544 (Type: Other)
Divisions: Schools > School of Science and Technology
Record created by: Linda Sullivan
Date Added: 20 Feb 2020 10:26
Last Modified: 20 Feb 2020 10:26
URI: https://irep.ntu.ac.uk/id/eprint/39242
