Title: Choosing a profile length in the SCAP method of source code authorship attribution
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Tennyson, M.F., Mitropoulos, F.J.
Conference Name: SOUTHEASTCON 2014, IEEE
Date Published: March
Keywords: authorship attribution, C++ language, data set, frequency control, Frequency measurement, information retrieval, Java, Java language, plagiarism detection, profile length, RNA, SCAP method, software forensics, source code (software), source code authorship attribution

Source code authorship attribution is the task of determining the author of source code whose author is not explicitly known. One method that has been shown to be extremely effective is the SCAP method. This method, however, relies on a parameter L, and how to choose its value has so far been unclear. In the SCAP method, each candidate author's known work is represented as a profile of that author, and L defines the profile's maximum length. This study investigated alternative approaches for selecting a value for L. Several of them performed better than the baseline approach used in the SCAP method. The best-performing approach was empirically shown to improve accuracy from 91.0% to 97.2%, measured as the percentage of documents correctly attributed, on a data set of 7,231 programs written in Java and C++.
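
To make the role of L concrete, the following is a minimal sketch of profile-based attribution. It assumes, as SCAP is usually described, that a profile is the set of the L most frequent n-grams in an author's known code and that similarity is the size of the intersection between two profiles. The function names `build_profile` and `attribute`, the use of character rather than byte n-grams, and the default `n=6` are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional, Set


def build_profile(text: str, n: int = 6, max_len: Optional[int] = None) -> Set[str]:
    """Return the set of the `max_len` most frequent character n-grams in `text`.

    `max_len` plays the role of the parameter L; None keeps every n-gram.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {gram for gram, _ in counts.most_common(max_len)}


def attribute(unknown: str, known: Dict[str, str], n: int = 6,
              max_len: Optional[int] = None) -> str:
    """Return the candidate whose profile shares the most n-grams with `unknown`.

    `known` maps each candidate author to the concatenation of their known code.
    """
    target = build_profile(unknown, n, max_len)
    scores = {
        author: len(build_profile(source, n, max_len) & target)
        for author, source in known.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    known = {
        "alice": "for (int i = 0; i < n; i++) { total += values[i]; }\n" * 3,
        "bob": "while(idx<len){sum=sum+arr[idx];idx=idx+1;}\n" * 3,
    }
    unknown = "for (int j = 0; j < m; j++) { count += items[j]; }\n"
    print(attribute(unknown, known, n=3))
```

In this sketch a small `max_len` keeps only an author's most habitual n-grams, while a large one admits rare n-grams that may reflect the task more than the author. That trade-off is why the choice of L can affect attribution accuracy.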

Citation Key: 6950705