Software Engineering
- [1] arXiv:2405.08848 [pdf, ps, html, other]
-
Title: Automated Repair of AI Code with Large Language Models and Formal VerificationSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
The next generation of AI systems requires strong safety guarantees. This report looks at the software implementation of neural networks and related memory safety properties, including NULL pointer deference, out-of-bound access, double-free, and memory leaks. Our goal is to detect these vulnerabilities, and automatically repair them with the help of large language models. To this end, we first expand the size of NeuroCodeBench, an existing dataset of neural network code, to about 81k programs via an automated process of program mutation. Then, we verify the memory safety of the mutated neural network implementations with ESBMC, a state-of-the-art software verifier. Whenever ESBMC spots a vulnerability, we invoke a large language model to repair the source code. For the latest task, we compare the performance of various state-of-the-art prompt engineering techniques, and an iterative approach that repeatedly calls the large language model.
- [2] arXiv:2405.09074 [pdf, ps, html, other]
-
Title: See to Believe: Using Visualization To Motivate Updating Third-party DependenciesChaiyong Ragkhitwetsagul, Vipawan Jarukitpipat, Raula Gaikovina Kula, Morakot Choetkiertikul, Klinton Chhun, Wachirayana Wanprasert, Thanwadee SunetnantaSubjects: Software Engineering (cs.SE)
Security vulnerabilities introduced by applications using third-party dependencies are on the increase, caused by the emergence of large ecosystems of libraries such as the NPM packages for JavaScript. Nowadays, libraries depend on each other. Relying on these large ecosystems thus means that vulnerable dependencies are not only direct but also indirect (transitive) dependencies. There are automated tool supports to manage these complex dependencies but recent work still shows that developers are wary of library updates, even to fix vulnerabilities, citing that being unaware, or that the migration effort to update outweighs the decision.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update, especially when convincing developers. To test this hypothesis, we performed a user study involving 20 participants divided equally into experimental and control groups, comparing the state-of-the-art tools with the tasks of reviewing vulnerabilities with complexities and vulnerabilities with indirect dependencies.
We find that 70% of the participants who saw the visualization did re-prioritize their updates in both tasks. This is higher than the 30% and 60% of the participants who used the npm audit tool in both tasks, respectively. - [3] arXiv:2405.09075 [pdf, ps, html, other]
-
Title: Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter NotebooksChaiyong Ragkhitwetsagul, Veerakit Prasertpol, Natanon Ritta, Paphon Sae-Wong, Thanapon Noraset, Morakot ChoetkiertikulSubjects: Software Engineering (cs.SE)
At present, code recommendation tools have gained greater importance to many software developers in various areas of expertise. Having code recommendation tools has enabled better productivity and performance in developing the code in software and made it easier for developers to find code examples and learn from them.
This paper proposes Typhon, an approach to automatically recommend relevant code cells in Jupyter notebooks. Typhon tokenizes developers' markdown description cells and looks for the most similar code cells from the database using text similarities such as the BM25 ranking function or CodeBERT, a machine-learning approach. Then, the algorithm computes the similarity distance between the tokenized query and markdown cells to return the most relevant code cells to the developers.
We evaluated the Typhon tool on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy. The approach and results in this paper can lead to further improvements in code cell recommendations in Jupyter notebooks. - [4] arXiv:2405.09112 [pdf, ps, html, other]
-
Title: Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task LearningComments: 24 pages, 10 figures, ACM ESEC/FSE 2024Journal-ref: Proc. ACM Softw. Eng. 1,FSE, Article 75 (July 2024), 24 pagesSubjects: Software Engineering (cs.SE)
Reverse engineers would acquire valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches encounter difficulties in capturing function semantics in diverse optimized binaries and fail to reserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored for different compilation optimization binaries. Epitome learns comprehensive function semantics by pre-trained assembly language model and graph neural network, incorporating function semantics similarity prediction task, to maximize the similarity of function semantics in the context of different compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate the performance of Epitome using 2,597,346 functions extracted from binaries compiled with 5 optimizations (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 44.34%, 64.16%, and 54.44% in precision, recall, and F1 score, while also exhibiting superior generalizability.
- [5] arXiv:2405.09132 [pdf, ps, html, other]
-
Title: EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary LiftingSubjects: Software Engineering (cs.SE)
Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in static binary translation, which eventually results in program crashes. Notably, existing tools struggle to recover the prototypes of mangled EXFs originating from binaries compiled from C++. Moreover, they require time-consuming manual processing to support new libraries.
This paper presents EFACT, an External Function Auto-Completion Tool for static binary lifting. Our EXF recovery algorithm better recovers the prototypes of mangled EXFs, particularly addressing the template specialization mechanism in C++. EFACT is designed as a lightweight plugin to strengthen other static binary rewriting frameworks in EXFC. Our evaluation shows that EFACT outperforms RetDec and McSema in mangled EXF recovery by 96.4% and 97.3% on SPEC CPU 2017.
Furthermore, we delve deeper into static binary translation and address several cross-ISA EXFC problems. When integrated with McSema, EFACT correctly translates 36.7% more benchmarks from x86-64 to x86-64 and 93.6% more from x86-64 to AArch64 than McSema alone on EEMBC. - [6] arXiv:2405.09165 [pdf, ps, html, other]
-
Title: An Empirical Study of Token-based Micro CommitsSubjects: Software Engineering (cs.SE)
In software development, developers frequently apply maintenance activities to the source code that change a few lines by a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), as it is likely that small changes are addressing deficiencies in other changes; thus, understanding the reasons for creating small changes can help understand the types of errors introduced. Eventually, these reasons and the types of errors can be used to enhance quality assurance approaches for improving code quality. While prior studies used code churns to characterize and investigate the small changes, such a definition has a critical limitation. Specifically, it loses the information of changed tokens in a line. For example, this definition fails to distinguish the following two one-line changes: (1) changing a string literal to fix a displayed message and (2) changing a function call and adding a new parameter. These are definitely maintenance activities, but we deduce that researchers and practitioners are interested in supporting the latter change. To address this limitation, in this paper, we define micro commits, a type of small change based on changed tokens. Our goal is to quantify small changes using changed tokens. Changed tokens allow us to identify small changes more precisely. In fact, this token-level definition can distinguish the above example. We investigate defined micro commits in four OSS projects and understand their characteristics as the first empirical study on token-based micro commits. We find that micro commits mainly replace a single name or literal token, and micro commits are more likely used to fix bugs. Additionally, we propose the use of token-based information to support software engineering approaches in which very small changes significantly affect their effectiveness.
- [7] arXiv:2405.09178 [pdf, ps, html, other]
-
Title: Testing and Debugging Quantum Programs: The Road to 2030Comments: Submitted to FSE 2024 (SE2030. Software Engineering in 2030 Workshop)Subjects: Software Engineering (cs.SE); Quantum Physics (quant-ph)
Quantum Computing has existed in the theoretical realm for several decades. Recently, given the latest developments in hardware, quantum computing has re-emerged as a promising technology with the potential to solve problems that a classical computer could take hundreds of years to solve. With the rising interest in the field, there are challenges and opportunities for academics and practitioners in terms of software engineering practices, particularly in testing and debugging quantum programs. This paper presents a roadmap for addressing these challenges, pointing out the existing gaps in the literature and suggesting research directions. We present the current state-of-the-art testing and debugging strategies, including classical techniques applied to quantum programs, the development and implementation of quantum-specific assertions, and the identification and classification of bug patterns unique to quantum computing. Additionally, we introduce a conceptual model to illustrate the main concepts regarding the testing and debugging of quantum programs as well as the relationship between them. Those concepts are then used to identify and discuss the main research challenges to cope with quantum programs through 2030, focusing on the interfaces between classical and quantum computing and on creating testing and debugging techniques that take advantage of the unique quantum computing characteristics.
- [8] arXiv:2405.09181 [pdf, ps, html, other]
-
Title: StateGuard: Detecting State Derailment Defects in Decentralized Exchange Smart ContractComments: 5 pages,2 figures, prepared for Conference WWW 2024Journal-ref: WWW '24, May 2024, Pages 810-813Subjects: Software Engineering (cs.SE)
Decentralized Exchanges (DEXs), leveraging blockchain technology and smart contracts, have emerged in decentralized finance. However, the DEX project with multi-contract interaction is accompanied by complex state logic, which makes it challenging to solve state defects. In this paper, we conduct the first systematic study on state derailment defects of DEXs. These defects could lead to incorrect, incomplete, or unauthorized changes to the system state during contract execution, potentially causing security threats. We propose StateGuard, a deep learning-based framework to detect state derailment defects in DEX smart contracts. StateGuard constructs an Abstract Syntax Tree (AST) of the smart contract, extracting key features to generate a graph representation. Then, it leverages a Graph Convolutional Network (GCN) to discover defects. Evaluating StateGuard on 46 DEX projects with 5,671 smart contracts reveals its effectiveness, with a precision of 92.24%. To further verify its practicality, we used StateGuard to audit real-world smart contracts and successfully authenticated multiple novel CVEs.
- [9] arXiv:2405.09269 [pdf, ps, other]
-
Title: Preconceptual Modeling in Software Engineering: Metaphysics of Diagrammatic RepresentationsComments: 14 pages, 27 figuresSubjects: Software Engineering (cs.SE)
According to many researchers, conceptual model (CM) development is a hard task, and system requirements are difficult to collect, causing many miscommunication problems. CMs require more than modeling ability alone - they first require an understanding of the targeted domain that the model attempts to represent. Accordingly, a preconceptual modeling (pre-CM) stage is intended to address ontological issues before typical CM development is initiated. It involves defining a portion of reality when entities and processes are differentiated and integrated as unified wholes. This pre-CM phase forms the focus of research in this paper. The purpose is not show how to model; rather, it is to demonstrate how to establish a metaphysical basis of the involved portion of reality. To demonstrate such a venture, we employ the so-called thinging machine (TM) modeling that has been proposed as a high-level CM. A TM model integrates staticity and dynamism grounded in a fundamental construct called a thimac (things/machine). It involves two modes of reality, existence (events) and subsistence (regions - roughly, specifications of things and processes). Currently, the dominant approach in CM has evolved to limit its scope of application to develop ontological categorization (types of things). In the TM approach, pre-CM metaphysics is viewed as a part and parcel of CM itself. The general research problem is how to map TM constructs to what is out there in the targeted domain. Discussions involve the nature of thimacs (things and processes) and subsistence and existence as they are superimposed over each other in reality. Specifically, we make two claims, (a) the perceptibility of regions as a phenomenon and (b) the distinctiveness of existence as a construct for events. The results contribute to further the understanding of TM modeling in addition to introducing some metaphysical insights.
- [10] arXiv:2405.09314 [pdf, ps, html, other]
-
Title: Themis: Automatic and Efficient Deep Learning System Testing with Strong Fault Detection CapabilitySubjects: Software Engineering (cs.SE)
Deep Learning Systems (DLSs) have been widely applied in safety-critical tasks such as autopilot. However, when a perturbed input is fed into a DLS for inference, the DLS often has incorrect outputs (i.e., faults). DLS testing techniques (e.g., DeepXplore) detect such faults by generating perturbed inputs to explore data flows that induce faults. Since a DLS often has infinitely many data flows, existing techniques require developers to manually specify a set of activation values in a DLS's neurons for exploring fault-inducing data flows. Unfortunately, recent studies show that such manual effort is tedious and can detect only a tiny proportion of fault-inducing data flows.
In this paper, we present Themis, the first automatic DLS testing system, which attains strong fault detection capability by ensuring a full coverage of fault-inducing data flows at a high probability. Themis carries a new workflow for automatically and systematically revealing data flows whose internal neurons' outputs vary substantially when the inputs are slightly perturbed, as these data flows are likely fault-inducing. We evaluated Themis on ten different DLSs and found that on average the number of faults detected by Themis was 3.78X more than four notable DLS testing techniques. By retraining all evaluated DLSs with the detected faults, Themis also increased (regained) these DLSs' accuracies on average 14.7X higher than all baselines. - [11] arXiv:2405.09330 [pdf, ps, other]
-
Title: BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point DetectionComments: This paper has been accepted to FSE'24Subjects: Software Engineering (cs.SE)
Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimation with anomaly detection techniques. Inaccurate anomaly detection results can significantly affect the root cause localization results. To address this challenge, we propose BARO, an end-to-end approach that integrates anomaly detection and root cause analysis for effectively troubleshooting failures in microservice systems. BARO leverages the Multivariate Bayesian Online Change Point Detection technique to model the dependency within multivariate time-series metrics data, enabling it to detect anomalies more accurately. BARO also incorporates a novel nonparametric statistical hypothesis testing technique for robustly identifying root causes, which is less sensitive to the accuracy of anomaly detection compared to existing works. Our comprehensive experiments conducted on three popular benchmark microservice systems demonstrate that BARO consistently outperforms state-of-the-art approaches in both anomaly detection and root cause analysis.
New submissions for Thursday, 16 May 2024 (showing 11 of 11 entries )
- [12] arXiv:2405.09115 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Hybrid Meta-Solving for Practical Quantum ComputingDomenik Eichhorn, Maximilian Schweikart, Nick Poser, Frederik Fiand, Benedikt Poggel, Jeanette Miriam LorenzComments: Submitted to the 2024 IEEE International Conference on Quantum Computing and Engineering (QCE)Subjects: Quantum Physics (quant-ph); Software Engineering (cs.SE)
The advent of quantum algorithms has initiated a discourse on the potential for quantum speedups for optimization problems. However, several factors still hinder a practical realization of the potential benefits. These include the lack of advanced, error-free quantum hardware, the absence of accessible software stacks for seamless integration and interaction, and the lack of methods that allow us to leverage the theoretical advantages to real-world use cases. This paper works towards the creation of an accessible hybrid software stack for solving optimization problems, aiming to create a fundamental platform that can utilize quantum technologies to enhance the solving process. We introduce a novel approach that we call Hybrid Meta-Solving, which combines classical and quantum optimization techniques to create customizable and extensible hybrid solvers. We decompose mathematical problems into multiple sub-problems that can be solved by classical or quantum solvers, and propose techniques to semi-automatically build the best solver for a given problem. Implemented in our ProvideQ toolbox prototype, Meta-Solving provides interactive workflows for accessing quantum computing capabilities. Our evaluation demonstrates the applicability of Meta-Solving in industrial use cases. It shows that we can reuse state-of-the-art classical algorithms and extend them with quantum computing techniques. Our approach is designed to be at least as efficient as state-of-the-art classical techniques, while having the potential to outperform them if future advances in the quantum domain are made.
- [13] arXiv:2405.09255 (cross-list from cs.HC) [pdf, ps, html, other]
-
Title: Reinforcement Learning-Based Framework for the Intelligent Adaptation of User InterfacesComments: To be published in Companion of the16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS Companion '24). 9 pages, 2 figures, 28 referencesSubjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Adapting the user interface (UI) of software systems to meet the needs and preferences of users is a complex task. The main challenge is to provide the appropriate adaptations at the appropriate time to offer value to end-users. Recent advances in Machine Learning (ML) techniques may provide effective means to support the adaptation process. In this paper, we instantiate a reference framework for Intelligent User Interface Adaptation by using Reinforcement Learning (RL) as the ML component to adapt user interfaces and ultimately improving the overall User Experience (UX). By using RL, the system is able to learn from past adaptations to improve the decision-making capabilities. Moreover, assessing the success of such adaptations remains a challenge. To overcome this issue, we propose to use predictive Human-Computer Interaction (HCI) models to evaluate the outcome of each action (ie adaptations) performed by the RL agent. In addition, we present an implementation of the instantiated framework, which is an extension of OpenAI Gym, that serves as a toolkit for developing and comparing RL algorithms. This Gym environment is highly configurable and extensible to other UI adaptation contexts. The evaluation results show that our RL-based framework can successfully train RL agents able to learn how to adapt UIs in a specific context to maximize the user engagement by using an HCI model as rewards predictor.
- [14] arXiv:2405.09398 (cross-list from cs.DC) [pdf, ps, html, other]
-
Title: Encrypted Container File: Design and Implementation of a Hybrid-Encrypted Multi-Recipient File StructureComments: 7 pages, for associated implementation etc., see this https URLJournal-ref: Proc of the 14th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2023), Nice, France, June 2023, pp. 1-7, ISSN 2308-4294Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Modern software engineering trends towards Cloud-native software development by international teams of developers. Cloud-based version management services, such as GitHub, are used for the source code and other artifacts created during the development process. However, using such a service usually means that every developer has access to all data stored on the platform. Particularly, if the developers belong to different companies or organizations, it would be desirable for sensitive files to be encrypted in such a way that these can only be decrypted again by a group of previously defined people. In this paper, we examine currently available tools that address this problem, but which have certain shortcomings. We then present our own solution, Encrypted Container Files (ECF), for this problem, eliminating the deficiencies found in the other tools.
Cross submissions for Thursday, 16 May 2024 (showing 3 of 3 entries )
- [15] arXiv:2104.02933 (replaced) [pdf, ps, html, other]
-
Title: Does the First Response Matter for Future Contributions? A Study of First ContributionsNoppadol Assavakamhaenghan, Supatsara Wattanakriengkrai, Naomichi Shimada, Raula Gaikovina Kula, Takashi Ishio, Kenichi MatsumotoSubjects: Software Engineering (cs.SE)
Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling, and subsequently lead to the discontinuity of any future contribution. We execute protocols of a registered report to analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first response as being positive, but less responsive, and exhibiting sentiments of fear, joy and love. Results also indicate that negative first responses have the literal intention to arouse emotions of being either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that predicting future interactions is low (F1 score of 0.6171), but relatively better than baselines. Furthermore, an analysis of these models show that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.
- [16] arXiv:2212.11629 (replaced) [pdf, ps, html, other]
-
Title: Graph-Based Specification and Automated Construction of ILP ProblemsSebastian Ehmes (Technical University of Darmstadt, Real-Time Systems Lab, Germany), Maximilian Kratz (Technical University of Darmstadt, Real-Time Systems Lab, Germany), Andy Schürr (Technical University of Darmstadt, Real-Time Systems Lab, Germany)Comments: In Proceedings GCM 2022, arXiv:2212.10975; ILP section updated, acknowledgement updatedJournal-ref: EPTCS 374, 2022, pp. 3-22Subjects: Software Engineering (cs.SE)
In the Model-Driven Software Engineering (MDSE) community, the combination of techniques operating on graph-based models (e.g., Pattern Matching (PM) and Graph Transformation (GT)) and Integer Linear Programming (ILP) is a common occurrence, since ILP solvers offer a powerful approach to solve linear optimization problems and help to enforce global constraints while delivering optimal solutions. However, designing and specifying complex optimization problems from more abstract problem descriptions can be a challenging task. A designer must be an expert in the specific problem domain as well as the ILP optimization domain to translate the given problem into a valid ILP problem. Typically, domain-specific ILP problem generators are hand-crafted by experts, to avoid specifying a new ILP problem by hand for each new instance of a problem domain. Unfortunately, the task of writing ILP problem generators is an exercise, which has to be repeated for each new scenario, tool, and approach. For this purpose, we introduce the GIPS (Graph-Based ILP Problem Specification Tool) framework that simplifies the development of ILP problem generators for graph-based optimization problems and a new Domain-Specific Language (DSL) called GIPSL (Graph-Based ILP Problem Specification Language) that integrates GT and ILP problems on an abstract level. Our approach uses GIPSL specifications as a starting point to derive ILP problem generators for a specific application domain automatically. First experiments show that the derived ILP problem generators can compete with hand-crafted programs developed by ILP experts.
- [17] arXiv:2308.11082 (replaced) [pdf, ps, html, other]
-
Title: PrAIoritize: Automated Early Prediction and Prioritization of Vulnerabilities in Smart ContractsSubjects: Software Engineering (cs.SE)
Context:Smart contracts are prone to numerous security threats due to undisclosed vulnerabilities and code weaknesses. In Ethereum smart contracts, the challenges of timely addressing these code weaknesses highlight the critical need for automated early prediction and prioritization during the code review process. Efficient prioritization is crucial for smart contract security. Objective:Toward this end, our research aims to provide an automated approach, PrAIoritize, for prioritizing and predicting critical code weaknesses in Ethereum smart contracts during the code review process. Method: To do so, we collected smart contract code reviews sourced from Open Source Software (OSS) on GitHub and the Common Vulnerabilities and Exposures (CVE) database. Subsequently, we developed PrAIoritize, an innovative automated prioritization approach. PrAIoritize integrates advanced Large Language Models (LLMs) with sophisticated natural language processing (NLP) techniques. PrAIoritize automates code review labeling by employing a domain-specific lexicon of smart contract weaknesses and their impacts. Following this, feature engineering is conducted for code reviews, and a pre-trained DistilBERT model is utilized for priority classification. Finally, the model is trained and evaluated using code reviews of smart contracts. Results: Our evaluation demonstrates significant improvement over state-of-the-art baselines and commonly used pre-trained models (e.g. T5) for similar classification tasks, with 4.82\%-27.94\% increase in F-measure, precision, and recall. Conclusion: By leveraging PrAIoritize, practitioners can efficiently prioritize smart contract code weaknesses, addressing critical code weaknesses promptly and reducing the time and effort required for manual triage.
- [18] arXiv:2312.08945 (replaced) [pdf, ps, html, other]
-
Title: A Comparative Gas Cost Analysis of Proxy and Diamond Patterns in EVM Blockchains for Trusted Smart Contract EngineeringSubjects: Software Engineering (cs.SE)
Blockchain applications are witnessing rapid evolution, necessitating the integration of upgradeable smart contracts. Software patterns have been proposed to summarize upgradeable smart contract best practices. However, research is missing on the comparison of these upgradeable smart contract patterns, especially regarding gas costs related to deployment and execution. This study aims to provide an in-depth analysis of gas costs associated with two prevalent upgradeable smart contract patterns: the Proxy and diamond patterns. The Proxy pattern utilizes a Proxy pointing to a logic contract, while the diamond pattern enables a Proxy to point to multiple logic contracts. We conduct a comparative analysis of gas costs for both patterns in contrast to a traditional non-upgradeable smart contract. We derive from this analysis a theoretical contribution in the form of two consolidated blockchain patterns and a corresponding decision model. By so doing we hope to contribute to the broader understanding of upgradeable smart contract patterns.
- [19] arXiv:2402.13823 (replaced) [pdf, ps, html, other]
-
Title: Using Large Language Models for Natural Language Processing Tasks in Requirements Engineering: A Systematic GuidelineSubjects: Software Engineering (cs.SE)
Large Language Models (LLMs) are the cornerstone in automating Requirements Engineering (RE) tasks, underpinning recent advancements in the field. Their pre-trained comprehension of natural language is pivotal for effectively tailoring them to specific RE tasks. However, selecting an appropriate LLM from a myriad of existing architectures and fine-tuning it to address the intricacies of a given task poses a significant challenge for researchers and practitioners in the RE domain. Utilizing LLMs effectively for NLP problems in RE necessitates a dual understanding: firstly, of the inner workings of LLMs, and secondly, of a systematic approach to selecting and adapting LLMs for NLP4RE tasks. This chapter aims to furnish readers with essential knowledge about LLMs in its initial segment. Subsequently, it provides a comprehensive guideline tailored for students, researchers, and practitioners on harnessing LLMs to address their specific objectives. By offering insights into the workings of LLMs and furnishing a practical guide, this chapter contributes towards improving future research and applications leveraging LLMs for solving RE challenges.
- [20] arXiv:2404.05388 (replaced) [pdf, ps, html, other]
-
Title: An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle MappingComments: 1st ACM International Conference on AI-powered Software (AIware)Subjects: Software Engineering (cs.SE)
The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems-of which models are only a part-and environmental affordances (e.g., access to tools), obstruct effective communication and comprehensive evaluation. This paper proposes a framework for AI system evaluation comprising three components: 1) harmonised terminology to facilitate communication across communities involved in AI safety evaluation; 2) a taxonomy identifying essential elements for AI system evaluation; 3) a mapping between AI lifecycle, stakeholders, and requisite evaluations for accountable AI supply chain. This framework catalyses a deeper discourse on AI system evaluation beyond model-centric approaches.
- [21] arXiv:2405.02213 (replaced) [pdf, ps, html, other]
-
Title: Automatic Programming: Large Language Models and BeyondSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs, can help produce higher assurance code from LLMs, along with evidence of assurance
- [22] arXiv:2405.04600 (replaced) [pdf, ps, html, other]
-
Title: Contextual API Completion for Unseen Repositories Using LLMsSubjects: Software Engineering (cs.SE)
Large language models have made substantial progress in addressing diverse code-related tasks. However, their adoption is hindered by inconsistencies in generating output due to the lack of real-world, domain-specific information, such as for intra-repository API calls for unseen software projects. We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks. Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions. We examine relevant import statements during API completion to derive insights into local APIs, drawing from their method signatures. For API token completion, we analyze the inline variables and correlate them with the appropriate imported modules, thereby allowing our approach to rank the most contextually relevant suggestions from the available local APIs. Further, for conversational API completion, we gather APIs that are most relevant to the developer query with a retrieval-based search across the project. We employ our tool, LANCE, within the framework of our proposed benchmark, APIEval, encompassing two different programming languages. Our evaluation yields an average accuracy of 82.6% for API token completion and 76.9% for conversational API completion tasks. On average, LANCE surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively. The implications of our findings are substantial for developers, suggesting that our lightweight context analysis can be applied to multilingual environments without language-specific training or fine-tuning, allowing for efficient implementation with minimal examples and effort.