Multitask-based Evaluation of Open-Source LLM on Software Vulnerability

Yin, Xin; Ni, Chao; Wang, Shaohua


arXiv:2404.02056 (cs)

[Submitted on 2 Apr 2024 (v1), last revised 26 Apr 2024 (this version, v2)]



Abstract: This paper proposes a pipeline for quantitatively evaluating interactive LLMs using publicly available datasets. We carry out an extensive technical evaluation of LLMs using Big-Vul, covering four common software vulnerability tasks. We evaluate the multitask and multilingual aspects of LLMs based on this dataset. We find that the existing state-of-the-art methods are generally superior to LLMs in software vulnerability detection. Although LLMs improve in accuracy when provided with context information, they still have limitations in accurately predicting severity ratings for certain CWE types. In addition, LLMs demonstrate some ability to locate vulnerabilities, but their performance varies across CWE types. Finally, LLMs show uneven performance in generating CVE descriptions for various CWE types, with limited accuracy in a few-shot setting. Overall, though LLMs perform well in some aspects, they still need to improve both their understanding of subtle differences between code vulnerabilities and their ability to describe vulnerabilities before they can fully realize their potential. Our evaluation pipeline provides valuable insights for further enhancing LLMs' software vulnerability handling capabilities.

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2404.02056 [cs.SE]
	(or arXiv:2404.02056v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2404.02056

Submission history

From: Xin Yin
[v1] Tue, 2 Apr 2024 15:52:05 UTC (796 KB)
[v2] Fri, 26 Apr 2024 03:01:48 UTC (796 KB)
