Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

Eberhart, Zachary; LeClair, Alexander; McMillan, Collin

arXiv:1912.10198 (cs)

[Submitted on 21 Dec 2019]

Abstract: Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as Javadoc and Doxygen generate documentation built around them. And yet, extracting summaries from unstructured source code repositories remains a difficult research problem: it is very difficult to generate clean structured documentation unless the summaries are annotated by programmers. This becomes a problem in large repositories of legacy code, since it is cost-prohibitive to retroactively annotate summaries in dozens or hundreds of old programs. Likewise, it is a problem for creators of automatic documentation generation algorithms, since these algorithms usually must learn from large annotated datasets, which do not exist for many programming languages. In this paper, we present a semi-automated approach via crowdsourcing and a fully automated approach for annotating summaries from unstructured code comments. We present experiments validating the approaches, and provide recommendations and cost estimates for automatically annotating large repositories.

Comments:	10 pages, plus references. Accepted for publication in the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, London, Ontario, Canada, February 18-21, 2020
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:1912.10198 [cs.SE]
	(or arXiv:1912.10198v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.1912.10198

Submission history

From: Zachary Eberhart
[v1] Sat, 21 Dec 2019 05:03:10 UTC (2,731 KB)
