Computer Science

New submissions

Submissions received from Thu 25 Apr 24 to Fri 26 Apr 24, announced Mon, 29 Apr 24

New submissions for Mon, 29 Apr 24

[1] arXiv:2404.16833 [pdf, other]: Title: Leaf-Based Plant Disease Detection and Explainable AI

Authors: Saurav Sagar, Mohammed Javed, David S Doermann

Comments: To appear in a Journal/Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The agricultural sector plays an essential role in the economic growth of a country. Specifically, in an Indian context, it is the critical source of livelihood for millions of people living in rural areas. Plant Disease is one of the significant factors affecting the agricultural sector. Plants get infected with diseases for various reasons, including synthetic fertilizers, archaic practices, environmental conditions, etc., which impact the farm yield and subsequently hinder the economy. To address this issue, researchers have explored many applications based on AI and Machine Learning techniques to detect plant diseases. This research survey provides a comprehensive understanding of common plant leaf diseases, evaluates traditional and deep learning techniques for disease detection, and summarizes available datasets. It also explores Explainable AI (XAI) to enhance the interpretability of deep learning models' decisions for end-users. By consolidating this knowledge, the survey offers valuable insights to researchers, practitioners, and stakeholders in the agricultural sector, fostering the development of efficient and transparent solutions for combating plant diseases and promoting sustainable agricultural practices.
[2] arXiv:2404.16835 [pdf, ps, other]: Title: Quantifying Lifetime Productivity Changes: A Longitudinal Study of 325,000 Late-Career Scientists

Authors: Marek Kwiek, Lukasz Szymula

Comments: 31 pages, 7 figures, 4 tables plus Electronic Supplementary Materials

Subjects: Digital Libraries (cs.DL)

This study focuses on persistence in research productivity over the course of an individual's entire scientific career. We track 'late-career' scientists (N=324,463) in 16 STEMM disciplines (science, technology, engineering, mathematics, and medicine) from 38 OECD countries for up to five decades. We examine the details of their mobility patterns between the top, middle, and bottom productivity classes. Methodologically, we turn a large-scale publication and citation bibliometric dataset into a comprehensive, longitudinal data source for research on careers in science. The global science system emerges as highly immobile: 60% of global top performers continue their careers as top performers and half of global bottom performers as bottom performers. Jumpers-Up and Droppers-Down are extremely rare in science. Our regression analyses show that productivity is highly path dependent: for all disciplines examined, there is a single most important predictor of being a top performer: being a top performer at an earlier career stage.
[3] arXiv:2404.16836 [pdf, other]: Title: The Division Problem of Chances

Authors: Rasoul Ramezanian

Comments: 52 pages, 5 figures

Subjects: Computer Science and Game Theory (cs.GT)

In frequently repeated matching scenarios, individuals may require diversification in their choices. Therefore, when faced with a set of potential outcomes, each individual may have an ideal lottery over outcomes that represents their preferred option. This suggests that, as people seek variety, their favorite choice is not a particular outcome but rather a lottery over outcomes, which serves as the peak of their preferences.
We explore matching problems in situations where agents' preferences are represented by ideal lotteries. Our focus lies in addressing the challenge of dividing chances in matching, where agents express their preferences over a set of objects through ideal lotteries that reflect their single-peaked preferences.
We discuss properties such as strategy proofness, replacement monotonicity, (Pareto) efficiency, in-betweenness, non-bossiness, envy-freeness, and anonymity in the context of dividing chances, and propose a class of mechanisms called URC mechanisms that satisfy these properties. Subsequently, we prove that if a mechanism for dividing chances is strategy proof, (Pareto) efficient, replacement monotonic, in-between, non-bossy, and anonymous (or envy free), then it is equivalent in terms of welfare to a URC mechanism.
[4] arXiv:2404.16837 [pdf, ps, other]: Title: The Security Performance Analysis of Blockchain System Based on Post-Quantum Cryptography -- A Case Study of Cryptocurrency Exchanges

Authors: Abel C. H. Chen

Comments: in Chinese language

Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Software Engineering (cs.SE)

The current blockchain system for cryptocurrency exchanges primarily employs elliptic curve cryptography (ECC) for generating key pairs in wallets, and elliptic curve digital signature algorithms (ECDSA) for generating signatures in transactions. Consequently, with the maturation of quantum computing technology, the current blockchain system faces the risk of quantum computing attacks. Quantum computers may potentially counterfeit signatures produced by ECDSA. Therefore, this study analyzes the vulnerabilities of the current blockchain system to quantum computing attacks and proposes a post-quantum cryptography (PQC)-based blockchain system to enhance security by addressing and improving each identified weakness. Furthermore, this study proposes PQC-based wallets and PQC-based transactions, utilizing PQC digital signature algorithms to generate PQC-based signatures for the inputs in PQC-based transactions, thereby preventing signatures from being counterfeited by quantum computing. Experimental results demonstrate that the efficiency of the Dilithium algorithm, a PQC digital signature algorithm, in producing wallets, generating signatures, and verifying signatures surpasses that of ECDSA in the current blockchain system. Furthermore, the Dilithium algorithm also exhibits a higher security level.
[5] arXiv:2404.16838 [pdf, other]: Title: Predicting SSH keys in Open SSH Memory dumps

Authors: Florian Rascoussier

Comments: The report contains 148 pages, 22 figures, 17 tables and 34 listings. The GitHub of the project can be accessed here: this https URL. This work is part of an ongoing effort for publication.

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

As the digital landscape evolves, cybersecurity has become an indispensable focus of IT systems. Its ever-escalating challenges have amplified the importance of digital forensics, particularly in the analysis of heap dumps from main memory. In this context, the Secure Shell protocol (SSH), designed for encrypted communications, serves as both a safeguard and a potential veil for malicious activities. This research project focuses on predicting SSH keys in OpenSSH memory dumps, aiming to enhance protective measures against illicit access and enable the development of advanced security frameworks or tools like honeypots. This master's thesis is situated within the broader SmartVMI project, and seeks to build upon existing research on key prediction in OpenSSH heap dumps. Utilizing machine learning (ML) and deep learning models, the study aims to refine features for embedding techniques and explore innovative methods for effective key detection based on recent advancements in Knowledge Graph and ML. The objective is to accurately predict the presence and location of SSH keys within memory dumps. This work builds upon, and aims to enhance, the foundations laid by SSHkex and SmartKex, enriching both the methodology and the results of the original research while exploring the untapped potential of newly proposed approaches. The current thesis dives into memory graph modeling from raw binary heap dump files. Each memory graph can support a range of embeddings that can be used directly for model training with classic ML models and graph neural networks. It offers an in-depth discussion of the current state of the art in key prediction for OpenSSH memory dumps, research questions, experimental setups, program development, and results, and discusses potential future directions.
[6] arXiv:2404.16839 [pdf, ps, other]: Title: Immersed in Reality Secured by Design -- A Comprehensive Analysis of Security Measures in AR/VR Environments

Authors: Sameer Chauhan, Luv Sachdeva

Comments: Cybersecurity. Augmented Reality on, Virtual Reality Implementation errors, Data security and efficiency

Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

Virtual reality and related technologies such as mixed and augmented reality have received extensive coverage in both mainstream and fringe media outlets. When the subject turns to a new AR headset, another AR device, or AR glasses, the talk swiftly shifts to the technical and design details. Unfortunately, no one seems to care about security. Data theft and other forms of cyberattack pose serious threats to virtual reality systems. Virtual reality goggles are just specialist versions of computers or Internet of Things devices, whereas virtual reality experiences are software packages. As a result, AR systems are just as vulnerable as any other Internet of Things (IoT) device we use on a daily basis, such as computers, tablets, and phones. Preventing and responding to common cybersecurity threats and attacks is crucial. Cybercriminals can exploit virtual reality headsets just like any other computer system. This paper analyzes how the data breaches induced by these attacks could result in a variety of concerns, including but not limited to identity theft, the unauthorized acquisition of personal information or network credentials, and damage to hardware and software. Augmented reality (AR) allows for real-time monitoring and visualization of network activity, system logs, and security alerts. This allows security professionals to immediately identify threats, monitor suspicious activities, and fix any issues that develop. This data can be displayed in an aesthetically pleasing and intuitively structured format using augmented reality interfaces, enabling faster analysis and decision-making.
[7] arXiv:2404.16840 [pdf, ps, other]: Title: Biometrics Employing Neural Network

Authors: Sajjad Bhuiyan

Comments: 14 Pages, 10 figures, Survey Paper

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Biometrics involves using unique human traits, both physical and behavioral, for the digital identification of individuals to provide access to systems, devices, or information. Within the field of computer science, it acts as a method for identifying and verifying individuals and controlling access. While the conventional method for personal authentication involves passwords, the vulnerability arises when passwords are compromised, allowing unauthorized access to sensitive actions. Biometric authentication presents a viable answer to this problem and is the most secure and user-friendly authentication method. Today, fingerprints, iris and retina patterns, facial recognition, hand shapes, palm prints, and voice recognition are frequently used forms of biometrics. Despite the diverse nature of these biometric identifiers, the core objective remains consistent: ensuring security, recognizing authorized users, and rejecting impostors. Hence, it is crucial to determine accurately whether the characteristics belong to the rightful person. For systems to be effective and widely accepted, the error rate in recognition and verification must approach zero. It is acknowledged that current biometric techniques, while advanced, are not infallible and require continuous improvement. A more refined classifier is deemed necessary to classify patterns accurately. Artificial Neural Networks, which simulate the human brain's operations, present themselves as a promising approach. The survey presented herein explores various biometric techniques based on neural networks, emphasizing the ongoing quest for enhanced accuracy and reliability. It concludes that the utilization of neural networks along with biometric features not only enhances accuracy but also contributes to overall better security.
[8] arXiv:2404.16841 [pdf, other]: Title: Machine Unlearning in Large Language Models

Authors: Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen

Subjects: Cryptography and Security (cs.CR)

Recently, large language models (LLMs) have emerged as a notable field, attracting significant attention for their ability to automatically generate intelligent content for various application domains. However, LLMs still suffer from significant security and privacy issues. For example, LLMs might expose user privacy through hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework into LLMs. Our objectives are to make LLMs not produce harmful, hallucinatory, or privacy-compromising responses, while retaining their standard output capabilities. To accomplish this, we use an evaluative model to pinpoint dialogues needing unlearning. We also establish a distance loss to function as the model's negative loss, diverting it from previous undesirable outputs. Furthermore, we determine the expected output's cluster mean to formulate a positive loss, directing the model's outputs toward preferable outcomes without compromising its reasoning abilities and performance. Experimental results show that our approach effectively meets unlearning objectives without substantially compromising model performance.
[9] arXiv:2404.16842 [pdf, ps, other]: Title: Cybersecurity Threat Analysis And Attack Simulations For Unmanned Aerial Vehicle Networks

Authors: Charles Abdulrazak

Comments: MSc thesis

Subjects: Cryptography and Security (cs.CR)

Drones, also known as unmanned aerial vehicles (UAVs), have revolutionised various industries, from farming to national security (Wexler., Lesley. 2016). However, their broad use has revealed a severe weakness in cybersecurity (Jean-Paul Yaacoub 2020). The urgent necessity to defend UAV networks from new cyber threats is explored in depth in this research, making it a crucial subject for both technological development and national security. The two essential areas of our study are attack simulation and threat analysis in cybersecurity. This work demonstrates how easy it is to hack a drone mid-flight using only a Raspberry Pi 3 and open-source online tools. This work illustrates the ability to penetrate a DJI drone currently used by the mercenary soldiers in the Ukraine war (Greg Myre March, 2023). This research examines strategies used to attack UAV networks, such as the deauthentication attack and the man-in-the-middle attack. This work investigates the weaknesses in these networks through attack simulations with a Raspberry Pi 3 and the Alpha network adaptor from Amazon, showing that only basic tools are needed to perform cyberattacks on drones. This research proposes creative solutions and preventative methods for protecting UAV operations and highlights the seriousness of the problem. As drones become more prevalent daily, maintaining their security becomes crucial. This work provides a compelling perspective on protecting vital infrastructure and preserving our skies by bridging the gap between the latest technologies and cybersecurity.
[10] arXiv:2404.16843 [pdf, other]: Title: Enhancing Data Security through Rainbow Antimagic Graph Coloring for Secret-Share Distribution and Reconstruction

Authors: Raul M. Falcon, K. Abirami, N. Mohanapriya, Dafik

Subjects: Cryptography and Security (cs.CR)

Nowadays, ensuring data security has become an increasingly formidable challenge in safeguarding individuals' sensitive information. Secret sharing has evolved into one of the most successful cryptographic techniques: it allows a secret to be divided or distributed among a group of participants in such a way that only a subset of those participants can reconstruct the original secret. This provides a safe level of security and redundancy, ensuring that no single individual possesses the complete secret. The implementation of Rainbow Antimagic coloring within these schemes not only safeguards the data but also ensures an advanced level of information security among multi-participant groups. Additionally, the retrieved data is reconstructed and can be disseminated to all group participants via multiple rounds of communication.
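The general idea of threshold secret sharing described in this abstract can be illustrated with a minimal sketch. Note that this uses Shamir's polynomial scheme, a standard textbook construction, and not the rainbow antimagic coloring construction of the paper, whose details the abstract does not give. The function names, the prime modulus, and the parameter choices are assumptions made for illustration only.

```python
import secrets

# Assumption: a Mersenne prime chosen as the field modulus for this sketch.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not (1 <= threshold <= num_shares < PRIME):
        raise ValueError("need 1 <= threshold <= num_shares < PRIME")
    if not (0 <= secret < PRIME):
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
recovered = reconstruct_secret(shares[:3])
```

In this sketch any three of the five shares reconstruct the secret, while fewer than three reveal nothing about it in the information-theoretic sense.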
[11] arXiv:2404.16844 [pdf, other]: Title: Sugarcane Health Monitoring With Satellite Spectroscopy and Machine Learning: A Review

Authors: Ethan Kane Waters, Carla Chia-Ming Chen, Mostafa Rahimi Azghadi

Comments: 22 pages, 6 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)

Research into large-scale crop monitoring has flourished due to increased accessibility to satellite imagery. This review delves into previously unexplored and under-explored areas in sugarcane health monitoring and disease/pest detection using satellite-based spectroscopy and Machine Learning (ML). It discusses key considerations in system development, including relevant satellites, vegetation indices, ML methods, factors influencing sugarcane reflectance, optimal growth conditions, common diseases, and traditional detection methods. Many studies highlight how factors like crop age, soil type, viewing angle, water content, recent weather patterns, and sugarcane variety can impact spectral reflectance, affecting the accuracy of health assessments via spectroscopy. However, these variables have not been fully considered in the literature. In addition, the current literature lacks comprehensive comparisons between ML techniques and vegetation indices. We address these gaps in this review. We discuss that, while current findings suggest the potential for an ML-driven satellite spectroscopy system for monitoring sugarcane health, further research is essential. This paper offers a comprehensive analysis of previous research to aid in unlocking this potential and advancing the development of an effective sugarcane health monitoring system using satellite technology.
[12] arXiv:2404.16845 [pdf, other]: Title: HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections

Authors: Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor

Comments: Eurographics 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In constrained 3D domains, recent methods have leveraged vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of SOTA vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our project page is at https://tau-vailab.github.io/HaLo-NeRF/.
[13] arXiv:2404.16846 [pdf, other]: Title: Securing Bluetooth Low Energy: A Literature Review

Authors: Zhe Wang

Subjects: Cryptography and Security (cs.CR)

Bluetooth Low Energy (BLE) technology, operating within the widely used 2.4 GHz ISM band, stands as a cornerstone in modern wireless communication frameworks alongside its classic Bluetooth counterpart. This paper delves into the foundational aspects of BLE, excluding niche components, to explore its core functionalities and pivotal role in diverse connectivity needs. BLE's specialization in catering to low-power devices ensures optimal energy utilization, making it indispensable in IoT applications where energy efficiency is paramount. Its versatility finds applications across consumer electronics, industrial automation, and healthcare, ensuring reliability and efficiency in safety-critical systems and enhancing user convenience through remote control capabilities. However, the wireless nature of BLE interfaces exposes them to cybersecurity threats, necessitating robust security measures for mitigating risks such as sniffing, DoS attacks, and message injection. Continuous research and development efforts are essential to stay ahead of emerging threats and safeguard BLE-enabled systems and data.
[14] arXiv:2404.16847 [pdf, other]: Title: State-of-the-Art Approaches to Enhancing Privacy Preservation of Machine Learning Datasets: A Survey

Authors: Chaoyu Zhang

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

This paper examines the evolving landscape of machine learning (ML) and its profound impact across various sectors, with a special focus on the emerging field of Privacy-preserving Machine Learning (PPML). As ML applications become increasingly integral to industries like telecommunications, financial technology, and surveillance, they raise significant privacy concerns, necessitating the development of PPML strategies. The paper highlights the unique challenges in safeguarding privacy within ML frameworks, which stem from the diverse capabilities of potential adversaries, including their ability to infer sensitive information from model outputs or training data.
We delve into the spectrum of threat models that characterize adversarial intentions, ranging from membership and attribute inference to data reconstruction. The paper emphasizes the importance of maintaining the confidentiality and integrity of training data, outlining current research efforts that focus on refining training data to minimize privacy-sensitive information and enhancing data processing techniques to uphold privacy.
Through a comprehensive analysis of privacy leakage risks and countermeasures in both centralized and collaborative learning settings, this paper aims to provide a thorough understanding of effective strategies for protecting ML training data against privacy intrusions. It explores the balance between data privacy and model utility, shedding light on privacy-preserving techniques that leverage cryptographic methods, Differential Privacy, and Trusted Execution Environments. The discussion extends to the application of these techniques in sensitive domains, underscoring the critical role of PPML in ensuring the privacy and security of ML systems.
[15] arXiv:2404.16848 [pdf, ps, other]: Title: Cyber Security issues and Blockchain-Deep Learning based solutions for UAV and Internet of Drones (FANETs)

Authors: Partha Protim Datta

Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)

Safety-critical systems such as automated embedded or industrial systems have a strong dependency on the trustworthiness of data collection. As sensors are the critical component for those systems, it is imperative to address the attack resilience of sensors.
[16] arXiv:2404.16849 [pdf, ps, other]: Title: Smart Grids Secured By Dynamic Watermarking: How Secure?

Authors: Kate Davis, Laszlo B. Kish, Chanan Singh

Comments: Accepted for publication in Fluct. Noise Lett

Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Unconditional security for smart grids is defined. Cryptanalyses of the watermarked security of smart grids indicate that watermarking cannot guarantee unconditional security unless the communication within the grid system is unconditionally secure. The successful attack against the dynamically watermarked smart grid remains valid even with the presence of internal noise from the grid. An open question arises: whether unconditionally authenticated secure communications within the grid, together with tamper resistance of the critical elements, are satisfactory conditions to provide unconditional security for the grid operation.
[17] arXiv:2404.16850 [pdf, other]: Title: Membership Information Leakage in Federated Contrastive Learning

Authors: Kongyang Chen, Wenfeng Wang, Zixin Wang, Wangjun Zhang, Zhipeng Li, Yao Huang

Subjects: Cryptography and Security (cs.CR)

Federated Contrastive Learning (FCL) represents a burgeoning approach for learning from decentralized unlabeled data while upholding data privacy. In FCL, participant clients collaborate in learning a global encoder using unlabeled data, which can serve as a versatile feature extractor for diverse downstream tasks. Nonetheless, FCL is susceptible to privacy risks, such as membership information leakage, stemming from its distributed nature, an aspect often overlooked in current solutions. This study delves into the feasibility of executing a membership inference attack on FCL and proposes a robust attack methodology. The attacker's objective is to determine if the data signifies training member data by accessing the model's inference output. Specifically, we concentrate on attackers situated within a client framework, lacking the capability to manipulate server-side aggregation methods or discern the training status of other clients. We introduce two membership inference attacks tailored for FCL: the passive membership inference attack and the active membership inference attack, contingent on the attacker's involvement in local model training. Experimental findings across diverse datasets validate the effectiveness of our attacks and underscore the inherent privacy risks associated with the federated contrastive learning paradigm.
[18] arXiv:2404.16851 [pdf, other]: Title: EdgeLeakage: Membership Information Leakage in Distributed Edge Intelligence Systems

Authors: Kongyang Chen, Yi Lin, Hui Luo, Bing Mi, Yatie Xiao, Chao Ma, Jorge Sá Silva

Subjects: Cryptography and Security (cs.CR)

In contemporary edge computing systems, decentralized edge nodes aggregate unprocessed data and facilitate data analytics to uphold low transmission latency and real-time data processing capabilities. Recently, these edge nodes have evolved to facilitate the implementation of distributed machine learning models, utilizing their computational resources to enable intelligent decision-making, thereby giving rise to an emerging domain referred to as edge intelligence. However, within the realm of edge intelligence, susceptibility to numerous security and privacy threats against machine learning models becomes evident. This paper addresses the issue of membership inference leakage in distributed edge intelligence systems. Specifically, our focus is on an autonomous scenario wherein edge nodes collaboratively generate a global model. The utilization of membership inference attacks serves to elucidate the potential data leakage in this particular context. Furthermore, we delve into the examination of several defense mechanisms aimed at mitigating the aforementioned data leakage problem. Experimental results affirm that our approach is effective in detecting data leakage within edge intelligence systems, and the implementation of our defense methods proves instrumental in alleviating this security threat. Consequently, our findings contribute to safeguarding data privacy in the context of edge intelligence systems.
[19] arXiv:2404.16852 [pdf, other]: Title: A Disease Labeler for Chinese Chest X-Ray Report Generation

Authors: Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metric is commonly used to evaluate the similarity between generated and ground-truth reports, while the clinical accuracy and effectiveness of the generated reports rely on an accurate disease labeler (classifier). To address the issues, this study proposes a disease labeler tailored for the generation of Chinese chest X-ray reports. This labeler leverages a dual BERT architecture to handle diagnostic reports and clinical information separately and constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to enhance text classification performance. Utilizing this disease labeler, a Chinese chest X-ray report dataset comprising 51,262 report samples was established. Finally, experiments and analyses were conducted on a subset of expert-annotated Chinese chest X-ray reports, validating the effectiveness of the proposed disease labeler.
[20] arXiv:2404.16853 [pdf, other]: Title: Expectation Entropy as a Password Strength Metric

Authors: Khan Reaz, Gerhard Wunder

Subjects: Cryptography and Security (cs.CR)

The classical combinatorics-based password strength formula provides a result in tens of bits, whereas the NIST Entropy Estimation Suite gives a result between 0 and 1 for Min-entropy. In this work, we present a newly developed metric, Expectation entropy, that can be applied to estimate the strength of any random or random-like password. Expectation entropy provides the strength of a password on the same scale as an entropy estimation tool. Having an 'Expectation entropy' of a certain value, for example, 0.4, means that an attacker has to exhaustively search at least 40% of the total number of guesses to find the password.
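As a rough illustration of the two scales mentioned in this abstract, the sketch below computes the classical combinatorial estimate (length times log2 of the alphabet size) and the number of guesses that a given percentage of the search space corresponds to. It does not implement Expectation entropy itself, which the abstract does not define; the function names and the example alphabet of 94 printable characters are assumptions for illustration.

```python
import math


def combinatorial_entropy_bits(length, alphabet_size):
    """Classical estimate: log2(alphabet_size ** length) = length * log2(alphabet_size)."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return length * math.log2(alphabet_size)


def guesses_for_percent(length, alphabet_size, percent):
    """Guesses needed to cover `percent` (0..100) of the full search space, rounded up."""
    if not (0 <= percent <= 100):
        raise ValueError("percent must lie in [0, 100]")
    total = alphabet_size ** length
    return -(-percent * total // 100)  # integer ceiling division, exact for big ints


# Hypothetical example: a 10-character password over 94 printable ASCII characters.
bits = combinatorial_entropy_bits(10, 94)
guesses_at_40_percent = guesses_for_percent(10, 94, 40)
```

Here `bits` lands in the "tens of bits" range that the abstract attributes to the classical formula, while `guesses_at_40_percent` shows the kind of guess count that a value of 0.4 on a 0-to-1 scale would imply under the abstract's interpretation.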
[21] arXiv:2404.16854 [pdf, other]: Title: Dynamic Vulnerability Criticality Calculator for Industrial Control Systems

Authors: Pavlos Cheimonidis, Kontantinos Rantos

Subjects: Cryptography and Security (cs.CR)

The convergence of information and communication technologies has introduced new and advanced capabilities to Industrial Control Systems. However, concurrently, it has heightened their vulnerability to cyber attacks. Consequently, the imperative for new security methods has emerged as a critical need for these organizations to effectively identify and mitigate potential threats. This paper introduces an innovative approach by proposing a dynamic vulnerability criticality calculator. Our methodology encompasses the analysis of environmental topology and the effectiveness of deployed security mechanisms, coupled with the utilization of the Common Vulnerability Scoring System framework to adjust detected vulnerabilities based on the specific environment. Moreover, it evaluates the quantity of vulnerabilities and their interdependencies within each asset. Additionally, our approach integrates these factors into a comprehensive Fuzzy Cognitive Map model, incorporating attack paths to holistically assess the overall vulnerability score. To validate the efficacy of our proposed method, we present a relative case study alongside several modified scenarios, demonstrating its effectiveness in practical applications.
[22] arXiv:2404.16856 [pdf, ps, other]: Title: HookChain: A new perspective for Bypassing EDR Solutions

Authors: Helvio Carvalho Junior

Comments: 46 pages, 22 figures, HookChain, Bypass EDR, Evading EDR, IAT Hook, Halo's Gate

Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)

In the current digital security ecosystem, where threats evolve rapidly and with complexity, companies developing Endpoint Detection and Response (EDR) solutions are in constant search for innovations that not only keep up but also anticipate emerging attack vectors. In this context, this article introduces HookChain, a look from another perspective at widely known techniques which, when combined, provide an additional layer of sophisticated evasion against traditional EDR systems. Through a precise combination of IAT Hooking techniques, dynamic SSN resolution, and indirect system calls, HookChain redirects the execution flow of Windows subsystems in a way that remains invisible to the vigilant eyes of EDRs that only act on Ntdll.dll, without requiring changes to the source code of the applications and malware involved. This work not only challenges current conventions in cybersecurity but also sheds light on a promising path for future protection strategies, leveraging the understanding that continuous evolution is key to the effectiveness of digital security. By developing and exploring the HookChain technique, this study significantly contributes to the body of knowledge in endpoint security, stimulating the development of more robust and adaptive solutions that can effectively address the ever-changing dynamics of digital threats. This work aspires to inspire deep reflection and advancement in the research and development of security technologies that are always several steps ahead of adversaries.
[23] arXiv:2404.16857 [pdf, ps, other]: Title: Implementation of Entropically Secure Encryption: Securing Personal Health Data

Authors: Mehmet Hüseyin Temel, Boris Skoric, Idelfonso Tafur Monroy

Comments: 4 pages

Subjects: Cryptography and Security (cs.CR)

Entropically Secure Encryption (ESE) offers unconditional security with shorter keys compared to the One-Time Pad. In this paper, we present the first implementation of ESE for bulk encryption. The main computational bottleneck for bulk ESE is a multiplication in a very large finite field. This involves multiplication of polynomials followed by modular reduction. We have implemented polynomial multiplication based on the gf2x library, with some modifications that avoid inputs of vastly different length, thus improving speed. Additionally, we have implemented a recently proposed efficient reduction algorithm that works for any polynomial degree. We investigate two use cases: X-ray images of patients and human genome data. We conduct entropy estimation using compression methods whose results determine the key lengths required for ESE. We report running times for all steps of the encryption. We discuss the potential of ESE to be used in conjunction with Quantum Key Distribution (QKD), in order to achieve full information-theoretic security of QKD-protected links for these use cases.
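To make the "polynomial multiplication followed by modular reduction" step concrete, here is a minimal schoolbook sketch of multiplication in a binary field, with polynomials over GF(2) stored as Python integers (bit i is the coefficient of x^i). It is not the gf2x-based multiplication or the efficient reduction algorithm the abstract refers to, and the function names are hypothetical; the fields used for bulk ESE are far larger than the small AES field used in the usage example.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial multiplication of a and b."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, modulus: int) -> int:
    """Reduce polynomial a modulo a nonzero polynomial modulus over GF(2)."""
    deg_m = modulus.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        shift = a.bit_length() - 1 - deg_m
        a ^= modulus << shift  # cancel the current leading term
    return a


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Field multiplication: multiply, then reduce by the field polynomial."""
    return poly_mod(clmul(a, b), modulus)


if __name__ == "__main__":
    # Example in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1.
    print(hex(gf_mul(0x57, 0x83, 0x11B)))
```

The bit-by-bit loops keep the sketch short; for operands with millions of bits, the cost of this quadratic approach is what motivates specialised multiplication and reduction routines.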
[24] arXiv:2404.16859 [pdf, other]: Title: Rumour Evaluation with Very Large Language Models

Authors: Dahlia Shehata, Robin Cohen, Charles Clarke

Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Conversational prompt-engineering-based large language models (LLMs) have enabled targeted control over the output creation, enhancing versatility, adaptability and ad hoc retrieval. From another perspective, digital misinformation has reached alarming levels. The anonymity, availability and reach of social media offer fertile ground for rumours to propagate. This work proposes to leverage the advancement of prompting-dependent LLMs to combat misinformation by extending the research efforts of the RumourEval task on its Twitter dataset. To this end, we employ two prompting-based LLM variants (GPT-3.5-turbo and GPT-4) to extend the two RumourEval subtasks: (1) veracity prediction, and (2) stance classification. For veracity prediction, three classification schemes are tested per GPT variant. Each scheme is tested in zero-, one- and few-shot settings. Our best results outperform the previous ones by a substantial margin. For stance classification, prompting-based approaches show comparable performance to prior results, with no improvement over finetuning methods. The rumour stance subtask is also extended beyond the original setting to allow multiclass classification. All of the generated predictions for both subtasks are equipped with confidence scores determining their trustworthiness degree according to the LLM, and post-hoc justifications for explainability and interpretability purposes. Our primary aim is AI for social good.
[25] arXiv:2404.16860 [pdf, ps, other]: Title: To what extent are multiple pendulum systems viable in pseudo-random number generation?

Authors: Matthew Sigit

Comments: 21 Pages

Subjects: Cryptography and Security (cs.CR)

This paper explores the development and viability of an alternative pseudorandom number generator (PRNG) that leverages the chaotic dynamics of multiple pendulum systems. Some traditional PRNGs, notably the one implemented in Java's java.util.Random class, suffer from predictability, which gives rise to exploitability. This study identifies these vulnerabilities and proposes a novel PRNG designed using ordinary differential equations, physics modeling, and chaos theory. The performance of the new PRNG is then tested against Java's standard PRNGs using the NIST Statistical Test Suite, which evaluates randomness through comprehensive statistical testing. Results indicate that the multiple pendulum-based PRNG not only offers enhanced security by generating less predictable number sequences but also demonstrates potential for efficiency improvements in applications requiring high levels of entropy. The findings suggest that integrating chaotic physics-based systems into PRNGs, such as the double-pendulum system tested in this study, could strengthen cryptographic practices and security protocols for applications that do not require the level of security created by true random number generators, which is useful in fields such as gaming.
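The predictability mentioned above can be illustrated with a small re-implementation of the 48-bit linear congruential generator documented for java.util.Random. This is a sketch for illustration only: the class name JavaLikeRandom is hypothetical, and it models just the seed scrambling and nextInt(), not the full Java API or the pendulum-based generator proposed in the paper. Because each output is a fixed function of the internal state, anyone who learns the state can reproduce every later output.

```python
MULTIPLIER = 0x5DEECE66D  # constants from the java.util.Random specification
INCREMENT = 0xB
MASK = (1 << 48) - 1      # the generator keeps 48 bits of state


class JavaLikeRandom:
    """Illustrative model of the LCG behind java.util.Random."""

    def __init__(self, seed: int) -> None:
        # Java scrambles the seed with the multiplier before first use.
        self.state = (seed ^ MULTIPLIER) & MASK

    def next_bits(self, bits: int) -> int:
        # One LCG step, then return the top `bits` bits of the state.
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK
        return self.state >> (48 - bits)

    def next_int(self) -> int:
        # Reinterpret the 32-bit result as a signed Java int.
        value = self.next_bits(32)
        return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    victim = JavaLikeRandom(2024)
    victim.next_int()

    # An attacker who recovers the internal state can clone the generator.
    attacker = JavaLikeRandom(0)
    attacker.state = victim.state
    print([victim.next_int() for _ in range(3)])
    print([attacker.next_int() for _ in range(3)])
```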
[26] arXiv:2404.16865 [pdf, other]: Title: Improving Privacy-Preserving Techniques for Smart Grid using Lattice-based Cryptography

Authors: Saleh Darzi, Bahareh Akhbari, Hassan Khodaiemehr

Comments: 103 pages, 8 figures

Subjects: Cryptography and Security (cs.CR)

Advancements in communication and information tech birthed the Smart Grid, optimizing energy and data transmission. Yet, user privacy is at risk due to frequent data collection. Existing privacy schemes face vulnerability with quantum machines. To tackle this, the LPM2DA scheme is introduced, utilizing lattice-based encryption and signatures for secure data aggregation. It ensures privacy, integrity, and authentication, enabling statistical analysis while preserving user privacy. Traditional aggregation schemes suffer from weak network models and centralization issues. Enter SPDBlock, a blockchain-based solution ensuring privacy, integrity, and resistance to attacks. It detects and prosecutes malicious entities while efficiently handling multi-dimensional data transmission. Through distributed decryption and secret sharing, only valid data can be decrypted with minimal involvement from smart meters. Performance tests reveal SPDBlock's superiority in communication and computational efficiency over traditional schemes.
[27] arXiv:2404.16870 [pdf, ps, other]: Title: LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems

Authors: Ali Ghubaish, Zebo Yang, Aiman Erbad, Raj Jain

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Intrusion detection systems (IDS) for the Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of data with high dimensionality, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability and computational efficiency. Feature engineering can solve these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score performance of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.
[28] arXiv:2404.16872 [pdf, ps, other]: Title: Mitigating Data Sharing in Public Cloud using Blockchain

Authors: Pratik Patil, Prerna Tulsiani, Dr. Sunil Mane

Subjects: Cryptography and Security (cs.CR)

Public Cloud Computing has become a fundamental part of modern IT infrastructure as its adoption has transformed the way businesses operate. However, cloud security concerns introduce new risks and challenges related to data protection, sharing, and access control. A synergistic integration of blockchain with the cloud holds immense potential. Blockchain's distributed ledger ensures transparency, immutability, and efficiency as it reduces the reliance on centralized authorities. Motivated by this, our framework proposes a secure data ecosystem in the cloud with the key aspects being Data Rights, Data Sharing, and Data Validation. Also, this approach aims to increase its interoperability and scalability by eliminating the need for data migration. This will ensure that existing public cloud-based systems can easily deploy blockchain enhancing trustworthiness and non-repudiation of cloud data.
[29] arXiv:2404.16873 [pdf, other]: Title: AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Authors: Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian

Comments: 32 pages, 9 figures, 7 tables

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

While recently Large Language Models (LLMs) have achieved remarkable successes, they are vulnerable to certain jailbreaking attacks that lead to generation of inappropriate or harmful content. Manual red-teaming requires finding adversarial prompts that cause such jailbreaking, e.g. by appending a suffix to a given instruction, which is inefficient and time-consuming. On the other hand, automatic adversarial prompt generation often leads to semantically meaningless attacks that can easily be detected by perplexity-based filters, may require gradient information from the TargetLLM, or do not scale well due to time-consuming discrete optimization processes over the token space. In this paper, we present a novel method that uses another LLM, called the AdvPrompter, to generate human-readable adversarial prompts in seconds, $\sim800\times$ faster than existing optimization-based approaches. We train the AdvPrompter using a novel algorithm that does not require access to the gradients of the TargetLLM. This process alternates between two steps: (1) generating high-quality target adversarial suffixes by optimizing the AdvPrompter predictions, and (2) low-rank fine-tuning of the AdvPrompter with the generated adversarial suffixes. The trained AdvPrompter generates suffixes that veil the input instruction without changing its meaning, such that the TargetLLM is lured to give a harmful response. Experimental results on popular open source TargetLLMs show state-of-the-art results on the AdvBench dataset, that also transfer to closed-source black-box LLM APIs. Further, we demonstrate that by fine-tuning on a synthetic dataset generated by AdvPrompter, LLMs can be made more robust against jailbreaking attacks while maintaining performance, i.e. high MMLU scores.
[30] arXiv:2404.16876 [pdf, other]: Title: AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Authors: Cédric Gernigon (TARAN), Silviu-Ioan Filip (TARAN), Olivier Sentieys (TARAN), Clément Coggiola (CNES), Mickael Bruno (CNES)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios. Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
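The idea of keeping a real-valued bit-width while quantizing with an integer one can be sketched as follows. This is only an illustration of uniform quantization with a discretized bit-width: rounding the relaxed value up with ceil, the clipping range [-1, 1], and the function name are all assumptions, and the gradient-based bit-width update described in the abstract is not modelled.

```python
import math
from typing import List, Sequence


def uniform_quantize(values: Sequence[float], real_bits: float,
                     lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Uniformly quantize values to 2**bits levels on [lo, hi].

    real_bits is a relaxed (real-valued) bit-width; it is discretized
    here with ceil, which is an assumed rule for this sketch.
    """
    bits = max(1, math.ceil(real_bits))
    steps = 2 ** bits - 1            # number of intervals between levels
    step = (hi - lo) / steps
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)        # clip to the representable range
        index = round((clipped - lo) / step)  # nearest level index
        quantized.append(lo + index * step)
    return quantized


if __name__ == "__main__":
    weights = [-0.9, -0.2, 0.05, 0.4, 0.9]
    # A relaxed bit-width of 2.3 is treated as 3 bits (8 levels) here.
    print(uniform_quantize(weights, 2.3))
```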
[31] arXiv:2404.16877 [pdf, other]: Title: Rapid Deployment of DNNs for Edge Computing via Structured Pruning at Initialization

Authors: Bailey J. Eccles, Leon Wong, Blesson Varghese

Comments: The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Edge machine learning (ML) enables localized processing of data on devices and is underpinned by deep neural networks (DNNs). However, DNNs cannot be easily run on devices due to their substantial computing, memory and energy requirements for delivering performance that is comparable to cloud-based ML. Therefore, model compression techniques, such as pruning, have been considered. Existing pruning methods are problematic for edge ML since they: (1) Create compressed models that have limited runtime performance benefits (using unstructured pruning) or compromise the final model accuracy (using structured pruning), and (2) Require substantial compute resources and time for identifying a suitable compressed DNN model (using neural architecture search). In this paper, we explore a new avenue, referred to as Pruning-at-Initialization (PaI), using structured pruning to mitigate the above problems. We develop Reconvene, a system for rapidly generating pruned models suited for edge deployments using structured PaI. Reconvene systematically identifies and prunes DNN convolution layers that are least sensitive to structured pruning. Reconvene rapidly creates pruned DNNs within seconds that are up to 16.21x smaller and 2x faster while maintaining the same accuracy as an unstructured PaI counterpart.
[32] arXiv:2404.16878 [pdf, other]: Title: tinygarden -- A java package for testing properties of spanning trees

Authors: Manuel Dubinsky, César Massri, Gabriel Taubin

Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Spanning trees are fundamental objects in graph theory. The spanning tree set size of an arbitrary graph can be very large. This limitation discourages its analysis. However, interesting patterns can emerge in small cases. In this article we introduce tinygarden, a Java package for validating hypotheses, testing properties and discovering patterns from the spanning tree set of an arbitrary graph.
[33] arXiv:2404.16879 [pdf, ps, other]: Title: Learning Control Barrier Functions and their application in Reinforcement Learning: A Survey

Authors: Maeva Guerrier, Hassan Fouad, Giovanni Beltrame

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)

Reinforcement learning is a powerful technique for developing new robot behaviors. However, typical lack of safety guarantees constitutes a hurdle for its practical application on real robots. To address this issue, safe reinforcement learning aims to incorporate safety considerations, enabling faster transfer to real robots and facilitating lifelong learning. One promising approach within safe reinforcement learning is the use of control barrier functions. These functions provide a framework to ensure that the system remains in a safe state during the learning process. However, synthesizing control barrier functions is not straightforward and often requires ample domain knowledge. This challenge motivates the exploration of data-driven methods for automatically defining control barrier functions, which is highly appealing. We conduct a comprehensive review of the existing literature on safe reinforcement learning using control barrier functions. Additionally, we investigate various techniques for automatically learning the Control Barrier Functions, aiming to enhance the safety and efficacy of Reinforcement Learning in practical robot applications.
[34] arXiv:2404.16881 [pdf, other]: Title: On uncertainty-penalized Bayesian information criterion

Authors: Pongpisit Thanasutives, Ken-ichi Fukui

Comments: 4 pages, 2 figures

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)

The uncertainty-penalized information criterion (UBIC) has been proposed as a new model-selection criterion for data-driven partial differential equation (PDE) discovery. In this paper, we show that using the UBIC is equivalent to applying the conventional BIC to a set of overparameterized models derived from the potential regression models of different complexity measures. The result indicates that the asymptotic property holds for the UBIC and the BIC alike.
[35] arXiv:2404.16882 [pdf, other]: Title: ThermoPore: Predicting Part Porosity Based on Thermal Images Using Deep Learning

Authors: Peter Myung-Won Pak, Francis Ogoke, Andrew Polonsky, Anthony Garland, Dan S. Bolintineanu, Dan R. Moser, Michael J. Heiden, Amir Barati Farimani

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present a deep learning approach for quantifying and localizing ex-situ porosity within Laser Powder Bed Fusion fabricated samples utilizing in-situ thermal image monitoring data. Our goal is to build the real-time porosity map of parts based on thermal images acquired during the build. The quantification task builds upon the established Convolutional Neural Network model architecture to predict pore count and the localization task leverages the spatial and temporal attention mechanisms of the novel Video Vision Transformer model to indicate areas of expected porosity. Our model for porosity quantification achieved an $R^2$ score of 0.57 and our model for porosity localization produced an average IoU score of 0.32 and a maximum of 1.0. This work is setting the foundations of part porosity "Digital Twins" based on additive manufacturing monitoring data and can be applied downstream to reduce time-intensive post-inspection and testing activities during part qualification and certification. In addition, we seek to accelerate the acquisition of crucial insights normally only available through ex-situ part evaluation by means of machine learning analysis of in-situ process monitoring data.
[36] arXiv:2404.16883 [pdf, other]: Title: Myopically Verifiable Probabilistic Certificates for Safe Control and Learning

Authors: Zhuoyuan Wang, Haoming Jing, Christian Kurniawan, Albert Chern, Yorie Nakahira

Comments: arXiv admin note: substantial text overlap with arXiv:2110.13380

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed `probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
[37] arXiv:2404.16884 [pdf, other]: Title: Aligning Knowledge Graphs Provided by Humans and Generated from Neural Networks in Specific Tasks

Authors: Tangrui Li, Jun Zhou

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper develops an innovative method that enables neural networks to generate and utilize knowledge graphs, which describe their concept-level knowledge and optimize network parameters through alignment with human-provided knowledge. This research addresses a gap where traditionally, network-generated knowledge has been limited to applications in downstream symbolic analysis or enhancing network transparency. By integrating a novel autoencoder design with the Vector Symbolic Architecture (VSA), we have introduced auxiliary tasks that support end-to-end training. Our approach eschews traditional dependencies on ontologies or word embedding models, mining concepts from neural networks and directly aligning them with human knowledge. Experiments show that our method consistently captures network-generated concepts that align closely with human knowledge and can even uncover new, useful concepts not previously identified by humans. This plug-and-play strategy not only enhances the interpretability of neural networks but also facilitates the integration of symbolic logical reasoning within these systems.
[38] arXiv:2404.16885 [pdf, ps, other]: Title: Adapting an Artificial Intelligence Sexually Transmitted Diseases Symptom Checker Tool for Mpox Detection: The HeHealth Experience

Authors: Rayner Kay Jin Tan, Dilruk Perera, Salomi Arasaratnam, Yudara Kularathne

Comments: 15 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox through AI approaches. Prior to the global outbreak of Mpox, the team developed a smartphone app, where app users can use their own smartphone cameras to take pictures of their own penises to screen for symptomatic STDs. The AI model was initially developed using 5000 cases and uses a modified convolutional neural network to output prediction scores across visually diagnosable penis pathologies including Syphilis, Herpes Simplex Virus, and Human Papilloma Virus. From June 2022 to October 2022, a total of about 22,000 users downloaded the HeHealth app, and about 21,000 images were analyzed using HeHealth AI technology. We then engaged in formative research, stakeholder engagement, rapid consolidation of images, a validation study, and implementation of the tool from July 2022. From July 2022 to October 2022, a total of 1000 Mpox-related images were used to train the Mpox symptom checker tool. Our digital symptom checker tool showed accuracy of 87% to rule in Mpox and 90% to rule out symptomatic Mpox. Several hurdles identified included issues of data privacy and security for app users, initial lack of data to train the AI tool, and the potential generalizability of input data. We offer several suggestions to help others get started on similar projects in emergency situations, including engaging a wide range of stakeholders, having a multidisciplinary team, prioritizing pragmatism, as well as the concept that big data in fact is made up of small data.
[39] arXiv:2404.16886 [pdf, other]: Title: Review of Data-centric Time Series Analysis from Sample, Feature, and Period

Authors: Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong

Comments: 9 pages, 1 figure

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. Even though time-series data processing methods frequently come up in a wide range of research fields, they have not been well investigated as a specific topic. To fill the gap, in this paper, we systematically review different data-centric methods in time series analysis, covering a wide range of research topics. Based on the time-series data characteristics at sample, feature, and period, we propose a taxonomy for the reviewed data selection methods. In addition to discussing and summarizing their characteristics, benefits, and drawbacks targeting time-series data, we also introduce the challenges and opportunities by proposing recommendations, open problems, and possible research topics.
[40] arXiv:2404.16887 [pdf, other]: Title: Anomaly Detection for Incident Response at Scale

Authors: Hanzhang Wang, Gowtham Kumar Tangirala, Gilkara Pranav Naidu, Charles Mayville, Arighna Roy, Joanne Sun, Ramesh Babu Mandava

Comments: ASPLOS 2024 AIOps workshop

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We present a machine learning-based anomaly detection product, AI Detect and Respond (AIDR), that monitors Walmart's business and system health in real-time. During the validation over 3 months, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams, covering 63% of major incidents and reducing the mean-time-to-detect (MTTD) by more than 7 minutes. Unlike previous anomaly detection methods, our solution leverages statistical, ML and deep learning models while continuing to incorporate rule-based static thresholds to incorporate domain-specific knowledge. Both univariate and multivariate ML models are deployed and maintained through distributed services for scalability and high availability. AIDR has a feedback loop that assesses model quality with a combination of drift detection algorithms and customer feedback. It also offers self-onboarding capabilities and customizability. AIDR has achieved success with various internal teams with lower time to detection and fewer false positives than previous methods. As we move forward, we aim to expand incident coverage and prevention, reduce noise, and integrate further with root cause recommendation (RCR) to enable an end-to-end AIDR experience.
[41] arXiv:2404.16890 [pdf, other]: Title: NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer

Authors: Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While deep neural networks are highly effective at solving complex tasks, their computational demands can hinder their usefulness in real-time applications and with limited-resources systems. Besides, for many tasks it is known that these models are over-parametrized: neoteric works have broadly focused on reducing the width of these networks, rather than their depth. In this paper, we aim to reduce the depth of over-parametrized deep neural networks: we propose an eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate deep neural networks' computational burden. Based on our theoretical finding, NEPENTHE focuses on un-structurally pruning connections in layers with low entropy to remove them entirely. We validate our approach on popular architectures such as MobileNet and Swin-T, showing that when encountering an over-parametrization regime, it can effectively linearize some layers (hence reducing the model's depth) with little to no performance loss. The code will be publicly available upon acceptance of the article.
[42] arXiv:2404.16891 [pdf, other]: Title: Attacks on Third-Party APIs of Large Language Models

Authors: Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane

Comments: ICLR 2024 Workshop on Secure and Trustworthy Large Language Models

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.
[43] arXiv:2404.16893 [pdf, other]: Title: Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge

Authors: Neha Kumari, Sumit Kumar, Sneha Priya, Ayush Kumar, Akash Fogla

Comments: arXiv admin note: substantial text overlap with arXiv:2303.08187

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In safety-critical systems that interface with the real world, the role of uncertainty in decision-making is pivotal, particularly in the context of machine learning models. For the secure functioning of Cyber-Physical Systems (CPS), it is imperative to manage such uncertainty adeptly. In this research, we focus on the development of a vehicle's lateral control system using a machine learning framework. Specifically, we employ a Bayesian Neural Network (BNN), a probabilistic learning model, to address uncertainty quantification. This capability allows us to gauge the level of confidence or uncertainty in the model's predictions. The BNN based controller is trained using simulated data gathered from the vehicle traversing a single track and subsequently tested on various other tracks. We want to share two significant results: firstly, the trained model demonstrates the ability to adapt and effectively control the vehicle on multiple similar tracks. Secondly, the quantification of prediction confidence integrated into the controller serves as an early-warning system, signaling when the algorithm lacks confidence in its predictions and is therefore susceptible to failure. By establishing a confidence threshold, we can trigger manual intervention, ensuring that control is relinquished from the algorithm when it operates outside of safe parameters.
[44] arXiv:2404.16894 [pdf, other]: Title: On TinyML and Cybersecurity: Electric Vehicle Charging Infrastructure Use Case

Authors: Fatemeh Dehrouyeh, Li Yang, Firouz Badrkhani Ajaei, Abdallah Shami

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As technology advances, the use of Machine Learning (ML) in cybersecurity is becoming increasingly crucial to tackle the growing complexity of cyber threats. While traditional ML models can enhance cybersecurity, their high energy and resource demands limit their applications, leading to the emergence of Tiny Machine Learning (TinyML) as a more suitable solution for resource-constrained environments. TinyML is widely applied in areas such as smart homes, healthcare, and industrial automation. TinyML focuses on optimizing ML algorithms for small, low-power devices, enabling intelligent data processing directly on edge devices. This paper provides a comprehensive review of common challenges of TinyML techniques, such as power consumption, limited memory, and computational constraints; it also explores potential solutions to these challenges, such as energy harvesting, computational optimization techniques, and transfer learning for privacy preservation. On the other hand, this paper discusses TinyML's applications in advancing cybersecurity for Electric Vehicle Charging Infrastructures (EVCIs) as a representative use case. It presents an experimental case study that enhances cybersecurity in EVCI using TinyML, evaluated against traditional ML in terms of reduced delay and memory usage, with a slight trade-off in accuracy. Additionally, the study includes a practical setup using the ESP32 microcontroller in the PlatformIO environment, which provides a hands-on assessment of TinyML's application in cybersecurity for EVCI.
[45] arXiv:2404.16895 [pdf, other]: Title: QuERLoc: Towards Next-Generation Localization with Quantum-Enhanced Ranging

Authors: Entong He, Yuxiang Yang, Chenshu Wu

Comments: Submitted for review to MobiHoc 2024

Subjects: Emerging Technologies (cs.ET)

Remarkable advances have been achieved in localization techniques in past decades, rendering it one of the most important technologies indispensable to our daily lives. In this paper, we investigate a novel localization approach for future computing by presenting QuERLoc, the first study on localization using quantum-enhanced ranging. By fine-tuning the evolution of an entangled quantum probe, quantum ranging can output the information integrated in the probe as a specific mapping of distance-related parameters. QuERLoc is inspired by this unique property to measure a special combination of distances between a target sensor and multiple anchors within one single physical measurement. Leveraging this capability, QuERLoc settles two drawbacks of classical localization approaches: (i) the target-anchor distances must be measured individually and sequentially, and (ii) the resulting optimization problems are non-convex and sensitive to noise. We first present the theoretical formulation of preparing the probing quantum state and controlling its dynamics to induce a convexified localization problem, and then solve it efficiently via optimization. We conduct extensive numerical analysis of QuERLoc under various settings. The results show that QuERLoc consistently outperforms classical approaches in accuracy and closely follows the theoretical lower bound, while maintaining low time complexity. It achieves a minimum reduction of 73% in RMSE and 97.6% in time consumption compared to baselines. By introducing range-based quantum localization to the mobile computing community and showing its superior performance, QuERLoc sheds light on next-generation localization technologies and opens up new directions for future research.
[46] arXiv:2404.16896 [pdf, other]: Title: A Neural-Network-Based Approach for Loose-Fitting Clothing

Authors: Yongxu Jin, Dalton Omens, Zhenglin Geng, Joseph Teran, Abishek Kumar, Kenji Tashiro, Ronald Fedkiw

Subjects: Graphics (cs.GR); Machine Learning (cs.LG)

Since loose-fitting clothing contains dynamic modes that have proven to be difficult to predict via neural networks, we first illustrate how to coarsely approximate these modes with a real-time numerical algorithm specifically designed to mimic the most important ballistic features of a classical numerical simulation. Although there is some flexibility in the choice of the numerical algorithm used as a proxy for full simulation, it is essential that the stability and accuracy be independent from any time step restriction or similar requirements in order to facilitate real-time performance. In order to reduce the number of degrees of freedom that require approximations to their dynamics, we simulate rigid frames and use skinning to reconstruct a rough approximation to a desirable mesh; as one might expect, neural-network-based skinning seems to perform better than linear blend skinning in this scenario. Improved high frequency deformations are subsequently added to the skinned mesh via a quasistatic neural network (QNN). In contrast to recurrent neural networks that require a plethora of training data in order to adequately generalize to new examples, QNNs perform well with significantly less training data.
[47] arXiv:2404.16897 [pdf, other]: Title: Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models

Authors: Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In practice, we usually need to build variable-sized models adapted to diverse resource constraints in different application scenarios, where weight initialization is an important step prior to training. The recently introduced Learngene framework first learns one compact part, termed the learngene, from a large well-trained model, after which the learngene is expanded to initialize variable-sized models. In this paper, we start by analysing the importance of guidance for the expansion of well-trained learngene layers, which inspires the design of a simple but highly effective Learngene approach termed SWS (Stage-wise Weight Sharing), where both learngene layers and their learning process critically contribute to providing knowledge and guidance for initializing models at varying scales. Specifically, to learn learngene layers, we build an auxiliary model comprising multiple stages where the layer weights in each stage are shared, after which we train it through distillation. Subsequently, we expand these learngene layers containing stage information at their corresponding stage to initialize models of variable depths. Extensive experiments on ImageNet-1K demonstrate that SWS achieves consistently better performance compared to many models trained from scratch, while reducing total training costs by around 6.6x. In some cases, SWS performs better after only 1 epoch of tuning. When initializing variable-sized models adapted to different resource constraints, SWS achieves better results while reducing the parameters stored to initialize these models by around 20x and pre-training costs by around 10x, in contrast to the pre-training and fine-tuning approach.
[48] arXiv:2404.16898 [pdf, other]: Title: How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training

Authors: Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag Patel, Markus Nage

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.
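To make the first two parameterizations concrete, here is a minimal sketch of asymmetric uniform quantization under one common convention, which is assumed here and may differ from the paper's: q = clamp(round(x / scale) + offset, 0, 2^b - 1), dequantized as scale * (q - offset). The function names are illustrative, and the beta/gamma parameterization is not shown because the abstract does not define it.

```python
def minmax_to_scale_offset(x_min, x_max, bits):
    """Convert a (min, max) range into (scale, offset) for a b-bit grid."""
    levels = 2 ** bits - 1
    scale = (x_max - x_min) / levels
    # Integer grid index that represents the real value 0.
    offset = round(-x_min / scale)
    return scale, offset


def fake_quantize(x, scale, offset, bits):
    """Quantize x onto the integer grid, then map it back to a real value."""
    levels = 2 ** bits - 1
    q = round(x / scale) + offset
    q = max(0, min(levels, q))  # clamp to the representable range
    return scale * (q - offset)


scale, offset = minmax_to_scale_offset(-1.0, 3.0, bits=8)
```

In quantization-aware training either pair, (scale, offset) or (min, max), can be the learnable quantity. The two describe the same grid up to the rounding of the offset, but gradients reach them differently, which is the kind of behavior the paper compares.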
[49] arXiv:2404.16899 [pdf, other]: Title: mlr3summary: Concise and interpretable summaries for machine learning models

Authors: Susanne Dandl, Marc Becker, Bernd Bischl, Giuseppe Casalicchio, Ludwig Bothmann

Comments: 9 pages

Subjects: Machine Learning (cs.LG)

This work introduces a novel R package for concise, informative summaries of machine learning models.
We take inspiration from the summary function for (generalized) linear models in R, but extend it in several directions:
First, our summary function is model-agnostic and provides a unified summary output also for non-parametric machine learning models;
Second, the summary output is more extensive and customizable -- it comprises information on the dataset, model performance, model complexity, model's estimated feature importances, feature effects, and fairness metrics;
Third, models are evaluated based on resampling strategies for unbiased estimates of model performances, feature importances, etc.
Overall, the clear, structured output should help to enhance and expedite the model selection process, making it a helpful tool for practitioners and researchers alike.
[50] arXiv:2404.16903 [pdf, other]: Title: Fiper: a Visual-based Explanation Combining Rules and Feature Importance

Authors: Eleonora Cappuccio, Daniele Fadda, Rosa Lanzilotti, Salvatore Rinzivillo

Comments: 15 pages, 4 figures, to be published in ECML PKDD International Workshop on eXplainable Knowledge Discovery in Data Mining

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Artificial Intelligence algorithms have now become pervasive in multiple high-stakes domains. However, their internal logic can be obscure to humans. Explainable Artificial Intelligence aims to design tools and techniques to illustrate the predictions of the so-called black-box algorithms. The Human-Computer Interaction community has long stressed the need for a more user-centered approach to Explainable AI. This approach can benefit from research in user interface, user experience, and visual analytics. This paper proposes a visual-based method to illustrate rules paired with feature importance. A user study with 15 participants was conducted comparing our visual method with the original output of the algorithm and textual representation to test its effectiveness with users.
[51] arXiv:2404.16905 [pdf, other]: Title: Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

Authors: Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the emotion causal pairs given the target emotion. In the first stage, Llama-2-based InstructERC is utilized to extract the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model is employed to extract the emotion causal pairs given the target emotion for subtask 2 while MuTEC is employed to extract causal span for subtask 1. Our approach achieved first place for both of the two subtasks in the competition.
[52] arXiv:2404.16906 [pdf, other]: Title: Evolve Cost-aware Acquisition Functions Using Large Language Models

Authors: Yiming Yao, Fei Liu, Ji Cheng, Qingfu Zhang

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Many real-world optimization scenarios involve expensive evaluation with unknown and heterogeneous costs. Cost-aware Bayesian optimization stands out as a prominent solution in addressing these challenges. To approach the global optimum within a limited budget in a cost-efficient manner, the design of cost-aware acquisition functions (AFs) becomes a crucial step. However, the traditional manual design paradigm typically requires extensive domain knowledge and involves a labor-intensive trial-and-error process. This paper introduces EvolCAF, a novel framework that integrates large language models (LLMs) with evolutionary computation (EC) to automatically design cost-aware AFs. Leveraging the crossover and mutation in the algorithm space, EvolCAF offers a novel design paradigm that significantly reduces the reliance on domain expertise and model training. The designed cost-aware AF maximizes the utilization of available information from historical data, surrogate models and budget details. It introduces novel ideas not previously explored in the existing literature on acquisition function design, allowing for clear interpretations to provide insights into its behavior and decision-making process. In comparison to the well-known EIpu and EI-cool methods designed by human experts, our approach showcases remarkable efficiency and generalization across various tasks, including 12 synthetic problems and 3 real-world hyperparameter tuning test sets.
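For context on the hand-designed baselines, the sketch below shows expected improvement per unit cost (EIpu) in its usual textbook form for minimization: the standard Gaussian expected improvement divided by the predicted evaluation cost. It illustrates the baseline only, not the acquisition function evolved by EvolCAF, and the function and argument names are illustrative.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement (minimization) under a Gaussian posterior."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def ei_per_unit_cost(mu, sigma, best, cost):
    """EIpu: expected improvement divided by the predicted cost."""
    return expected_improvement(mu, sigma, best) / cost
```

Two candidates with identical posterior mean and variance are ranked by cost alone, so the cheaper one is evaluated first.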
[53] arXiv:2404.16908 [pdf, other]: Title: Closing the gap: Optimizing Guidance and Control Networks through Neural ODEs

Authors: Sebastien Origer, Dario Izzo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We improve the accuracy of Guidance & Control Networks (G&CNETs), trained to represent the optimal control policies of a time-optimal transfer and a mass-optimal landing, respectively. In both cases we leverage the dynamics of the spacecraft, described by Ordinary Differential Equations which incorporate a neural network on their right-hand side (Neural ODEs). Since the neural dynamics are differentiable, the ODEs' sensitivities to the network parameters can be computed using the variational equations, allowing the G&CNET parameters to be updated based on the observed dynamics. We start with a straightforward regression task, training the G&CNETs on datasets of optimal trajectories using behavioural cloning. These networks are then refined using the Neural ODE sensitivities by minimizing the error between the final states and the target states. We demonstrate that for the orbital transfer, the final error to the target can be reduced by 99% on a single trajectory and by 70% on a batch of 500 trajectories. For the landing problem the reduction in error is around 98-99% (position) and 40-44% (velocity). This step significantly enhances the accuracy of G&CNETs, which instills greater confidence in their reliability for operational use. We also compare our results to the popular Dataset Aggregation method (DaGGER) and allude to the strengths and weaknesses of both methods.
[54] arXiv:2404.16913 [pdf, other]: Title: DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks

Authors: Matthew Squires, Xiaohui Tao, Soman Elangovan, Raj Gururajan, Haoran Xie, Xujuan Zhou, Yuefeng Li, U Rajendra Acharya

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Repetitive Transcranial Magnetic Stimulation (rTMS) is a well-supported, evidence-based treatment for depression. However, patterns of response to this treatment are inconsistent. Emerging evidence suggests that artificial intelligence can predict rTMS treatment outcomes for most patients using fMRI connectivity features. While these models can reliably predict treatment outcomes for many patients, for some underrepresented fMRI connectivity measures DNN models are unable to reliably predict treatment outcomes. As such, we propose a novel method, Diversity Enhancing Conditional Generative Adversarial Network (DE-CGAN), for oversampling these underrepresented examples. DE-CGAN creates synthetic examples in difficult-to-classify regions by first identifying these data points and then creating conditioned synthetic examples to enhance data diversity. Through empirical experiments, we show that a classification model trained using a diversity-enhanced training set outperforms traditional data augmentation techniques and existing benchmark results. This work shows that increasing the diversity of a training dataset can improve classification model performance. Furthermore, this work provides evidence for the utility of synthetic patients providing larger, more robust datasets for both AI researchers and psychiatrists to explore variable relationships.
[55] arXiv:2404.16914 [pdf, other]: Title: Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

Authors: Peizhuang Cong, Aomufei Yuan, Shimao Chen, Yuxuan Tian, Bowen Ye, Tong Yang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

MoE facilitates the development of large models by making the computational complexity of the model no longer scale linearly with increasing parameters. The learned sparse gating network selects a set of experts for each token to be processed; however, this may lead to differences in the number of tokens processed by each expert over several successive iterations, i.e., expert load fluctuations, which reduce computational parallelization and resource utilization. To this end, we traced and analyzed the loads of each expert in the training iterations for several large language models in this work, and defined the transient state with "obvious load fluctuation" and the stable state with "temporal locality". Moreover, given the characteristics of these two states and the computational overhead, we deployed three classical prediction algorithms that achieve accurate expert load prediction results. For the GPT3 350M model, the average error rates for predicting the expert load proportion over the next 1,000 and 2,000 steps are approximately 1.3% and 1.8%, respectively. This work can provide valuable guidance for expert placement or resource allocation for MoE model training. Based on this work, we will propose an expert placement scheme for transient and stable states in future work.
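The abstract does not name its three prediction algorithms, so the following is only an assumed stand-in: a sliding-window average over per-expert load proportions, which suits the "temporal locality" of the stable state. The helper names and the window size are hypothetical.

```python
def load_proportions(token_counts):
    """Fraction of the tokens in one iteration routed to each expert."""
    total = sum(token_counts)
    return [count / total for count in token_counts]


def predict_next(history, window=3):
    """Predict the next load proportions as the mean of the last steps."""
    recent = history[-window:]
    n_experts = len(recent[0])
    return [
        sum(step[e] for step in recent) / len(recent)
        for e in range(n_experts)
    ]


# Three iterations of token counts for two experts (made-up numbers).
history = [
    load_proportions([10, 30]),
    load_proportions([20, 20]),
    load_proportions([30, 10]),
]
prediction = predict_next(history)
```

A predictor of this kind costs almost nothing per step, which matters when its output drives expert placement or resource allocation during training.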
[56] arXiv:2404.16915 [pdf, other]: Title: Servicifying zk-SNARKs Execution for Verifiable Off-chain Computations

Authors: Alvaro Alonso Domenech, Jonathan Heiss, Stefan Tai

Comments: 2 pages, 3 figures

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Zk-SNARKs help scale blockchains with Verifiable Off-chain Computations (VOC). zk-SNARK DSL toolkits are key when designing arithmetic circuits but fall short of automating the subsequent proof-generation step. We emphasize the need for portability, interoperability, and manageability in VOC-based solutions and introduce a Proving Service that is designed to provide a scalable and reusable solution for generating zk-SNARK proofs leveraging clouds.
[57] arXiv:2404.16917 [pdf, other]: Title: Grad Queue : A probabilistic framework to reinforce sparse gradients

Authors: Irfan Mohammad Al Hasib

Comments: 15 pages, 6 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Informative gradients are often lost in large batch updates. We propose a robust mechanism to reinforce the sparse components within a random batch of data points. A finite queue of online gradients is used to determine their expected instantaneous statistics. We propose a function to measure the scarcity of incoming gradients using these statistics and establish the theoretical ground of this mechanism. To minimize conflicting components within large mini-batches, samples are grouped with aligned objectives by clustering based on inherent feature space. Sparsity is measured for each centroid and weighted accordingly. A strong intuitive criterion to squeeze out redundant information from each cluster is the backbone of the system. It makes rare information indifferent to aggressive momentum and also exhibits superior performance with a larger mini-batch horizon. The effective length of the queue is kept variable to follow the local loss pattern. The contribution of our method is to restore intra-mini-batch diversity while at the same time widening the optimal batch boundary. Both of these collectively drive it deeper towards the minima. Our method has shown superior performance on the CIFAR10, MNIST, and Reuters News category datasets compared to mini-batch gradient descent.
[58] arXiv:2404.16918 [pdf, other]: Title: On-the-fly Data Augmentation for Forecasting with Deep Learning

Authors: Vitor Cerqueira, Moisés Santos, Yassine Baghoussi, Carlos Soares

Subjects: Machine Learning (cs.LG)

Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied before fitting a model. However, these approaches create a single augmented dataset, potentially limiting their effectiveness. This work introduces OnDAT (On-the-fly Data Augmentation for Time series) to address this issue by applying data augmentation during training and validation. Contrary to traditional methods that create a single, static augmented dataset beforehand, OnDAT performs augmentation on-the-fly. By generating a new augmented dataset on each iteration, the model is exposed to constantly changing augmented data variations. We hypothesize this process enables a better exploration of the data space, which reduces the potential for overfitting and improves forecasting performance. We validated the proposed approach using a state-of-the-art deep learning forecasting method and 8 benchmark datasets containing a total of 75797 time series. The experiments suggest that OnDAT leads to better forecasting performance than a strategy that applies data augmentation before training as well as a strategy that does not involve data augmentation. The method and experiments are publicly available.
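The contrast between a static augmented dataset and on-the-fly augmentation can be sketched in a few lines. This is not the OnDAT implementation: Gaussian jitter is an assumed stand-in for whatever synthetic-series generator the method uses, and `on_the_fly_epochs` is a hypothetical helper.

```python
import random


def jitter(series, rng, scale=0.05):
    """Return a noisy copy of a series (assumed augmentation operator)."""
    return [x + rng.gauss(0.0, scale) for x in series]


def on_the_fly_epochs(dataset, epochs, seed=0):
    """Yield (epoch, training_set) with fresh augmented copies each epoch."""
    rng = random.Random(seed)
    for epoch in range(epochs):
        # New synthetic variants are drawn every epoch, not precomputed.
        augmented = [jitter(series, rng) for series in dataset]
        yield epoch, dataset + augmented


dataset = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
epochs = list(on_the_fly_epochs(dataset, epochs=2))
```

The original series are identical across epochs while the synthetic ones differ, which is the property the abstract hypothesizes to reduce overfitting.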
[59] arXiv:2404.16920 [pdf, other]: Title: Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

Authors: Shufan Wang, Guojun Xiong, Shichen Zhang, Huacheng Zeng, Jian Li, Shivendra Panwar

Comments: IEEE Transactions on Wireless Communications

Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

We study the data packet transmission problem (mmDPT) in dense cell-free millimeter wave (mmWave) networks, i.e., users sending data packet requests to access points (APs) via uplinks and APs transmitting requested data packets to users via downlinks. Our objective is to minimize the average delay in the system due to APs' limited service capacity and unreliable wireless channels between APs and users. This problem can be formulated as a restless multi-armed bandits problem with fairness constraint (RMAB-F). Since finding the optimal policy for RMAB-F is intractable, existing learning algorithms are computationally expensive and not suitable for practical dynamic dense mmWave networks. In this paper, we propose a structured reinforcement learning (RL) solution for mmDPT by exploiting the inherent structure encoded in RMAB-F. To achieve this, we first design a low-complexity and provably asymptotically optimal index policy for RMAB-F. Then, we leverage this structure information to develop a structured RL algorithm called mmDPT-TS, which provably achieves an $\tilde{O}(\sqrt{T})$ Bayesian regret. More importantly, mmDPT-TS is computation-efficient and thus amenable to practical implementation, as it fully exploits the structure of the index policy for making decisions. Extensive emulation based on data collected in realistic mmWave networks demonstrates significant gains of mmDPT-TS over existing approaches.
[60] arXiv:2404.16921 [pdf, other]: Title: A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs

Authors: Christian N. Mayemba, D'Jeff K. Nkashama, Jean Marie Tshimula, Maximilien V. Dialufuma, Jean Tshibangu Muabila, Mbuyi Mukendi Didier, Hugues Kanda, René Manassé Galekwa, Heber Dibwe Fita, Serge Mundele, Kalonji Kalala, Aristarque Ilunga, Lambert Mukendi Ntobo, Dominique Muteba, Aaron Aruna Abedi

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

This paper provides a comprehensive survey of recent advancements in leveraging machine learning techniques, particularly Transformer models, for predicting human mobility patterns during epidemics. Understanding how people move during epidemics is essential for modeling the spread of diseases and devising effective response strategies. Forecasting population movement is crucial for informing epidemiological models and facilitating effective response planning in public health emergencies. Predicting mobility patterns can enable authorities to better anticipate the geographical and temporal spread of diseases, allocate resources more efficiently, and implement targeted interventions. We review a range of approaches utilizing both pretrained language models like BERT and Large Language Models (LLMs) tailored specifically for mobility prediction tasks. These models have demonstrated significant potential in capturing complex spatio-temporal dependencies and contextual patterns in textual data.
[61] arXiv:2404.16924 [pdf, other]: Title: A Survey of Generative Search and Recommendation in the Era of Large Language Models

Authors: Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

With the information explosion on the Web, search and recommendation are foundational infrastructures for satisfying users' information needs. As two sides of the same coin, both revolve around the same core research problem: matching queries with documents or users with items. In recent decades, search and recommendation have experienced synchronous technological paradigm shifts, including machine learning-based and deep learning-based paradigms. Recently, the superintelligent generative large language models have sparked a new paradigm in search and recommendation, i.e., generative search (retrieval) and recommendation, which aims to address the matching problem in a generative manner. In this paper, we provide a comprehensive survey of the emerging paradigm in information systems and summarize the developments in generative search and recommendation from a unified perspective. Rather than simply categorizing existing works, we abstract a unified framework for the generative paradigm and break down the existing works into different stages within this framework to highlight the strengths and weaknesses. We then distinguish generative search and recommendation by their unique challenges, identify open problems and future directions, and envision the next information-seeking paradigm.
[62] arXiv:2404.16944 [pdf, other]: Title: Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Authors: Mehmet Kerem Turkcan, Sanjeev Narasimhan, Chengbo Zang, Gyung Hyun Je, Bo Yu, Mahshid Ghasemi, Javad Ghaderi, Gil Zussman, Zoran Kostic

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.
[63] arXiv:2404.16947 [pdf, other]: Title: Fuzzing MLIR by Synthesizing Custom Mutations

Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

Subjects: Software Engineering (cs.SE)

Multi-Level Intermediate Representation (MLIR) is an effort to enable faster compiler development by providing an extensible framework for downstream developers to define custom IRs with MLIR dialects. MLIR dialects define new IRs that are tailored for specific domains. The diversity and rapid evolution of these IRs make it impractical to pre-define custom generator logic for every available dialect. We design a new approach called SynthFuzz that automatically infers and applies custom mutations from existing tests. Inferred custom mutations are parameterized and context-dependent such that they can be concretized depending on the target context. By doing this, we obviate the need to manually write custom mutations for newly introduced MLIR dialects. Further, SynthFuzz increases the chance of finding effective edit locations and reduces the chance of inserting invalid edit content by performing k-ancestor-prefix and l-sibling-postfix matching. We compare SynthFuzz to three baselines: Grammarinator -- a grammar-based fuzzer without custom mutators, MLIRSmith -- a custom test generator for MLIR, and NeuRI -- a custom test generator with support for parameterized generation. We conduct this comparison on 4 different MLIR projects where each project defines a new set of MLIR dialects that would take months of effort to manually write custom input generation and mutation logic. We show that SynthFuzz on average improves input diversity by 1.51$\times$, which increases branch coverage by 1.16$\times$. Further, we show that our context dependent custom mutation increases the proportion of valid tests by up to 1.11$\times$, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Mutation parameterization reduces the fraction of tests violating general MLIR constraints by 0.57$\times$, increasing the time spent fuzzing dialect-specific code.
[64] arXiv:2404.16952 [pdf, other]: Title: Simultaneous Estimation of Shape and Force along Highly Deformable Surgical Manipulators Using Sparse FBG Measurement

Authors: Yiang Lu, Bin Li, Wei Chen, Junyan Yan, Shing Shin Cheng, Jiangliu Wang, Jianshu Zhou, Qi Dou, Yun-hui Liu

Comments: Accepted to ICRA 2024

Subjects: Robotics (cs.RO)

Recently, fiber optic sensors such as fiber Bragg gratings (FBGs) have been widely investigated for shape reconstruction and force estimation of flexible surgical robots. However, most existing approaches need precise model parameters of FBGs inside the fiber and their alignments with the flexible robots for accurate sensing results. Another challenge lies in acquiring external forces online at arbitrary locations along the flexible robots, which is essential under the large deflections encountered in robotic surgery. In this paper, we propose a novel data-driven paradigm for simultaneous estimation of shape and force along highly deformable flexible robots by using sparse strain measurement from a single-core FBG fiber. A thin-walled soft sensing tube helically embedded with FBG sensors is designed for a robotic-assisted flexible ureteroscope with large deflection up to 270 degrees and a bend radius under 10 mm. We introduce and study three learning models by incorporating spatial strain encoders, and compare their performances in both free space and constrained environments with contact forces at different locations. The experimental results in terms of dynamic shape-force sensing accuracy demonstrate the effectiveness and superiority of the proposed methods.
[65] arXiv:2404.16954 [pdf, other]: Title: Taming False Positives in Out-of-Distribution Detection with Human Feedback

Authors: Harit Vishwakarma, Heguang Lin, Ramya Korlakai Vinayak

Comments: Appeared in the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

Journal-ref: PMLR 238:1486-1494, 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Robustness to out-of-distribution (OOD) samples is crucial for safely deploying machine learning models in the open world. Recent works have focused on designing scoring functions to quantify OOD uncertainty. Setting appropriate thresholds for these scoring functions for OOD detection is challenging as OOD samples are often unavailable up front. Typically, thresholds are set to achieve a desired true positive rate (TPR), e.g., 95% TPR. However, this can lead to very high false positive rates (FPR), ranging from 60 to 96%, as observed in the Open-OOD benchmark. In safety-critical real-life applications, e.g., medical diagnosis, controlling the FPR is essential when dealing with various OOD samples dynamically. To address these challenges, we propose a mathematically grounded OOD detection framework that leverages expert feedback to safely update the threshold on the fly. We provide theoretical results showing that it is guaranteed to meet the FPR constraint at all times while minimizing the use of human feedback. Another key feature of our framework is that it can work with any scoring function for OOD uncertainty quantification. Empirical evaluation of our system on synthetic and benchmark OOD datasets shows that our method can maintain FPR at most 5% while maximizing TPR.
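To make the threshold problem in this abstract concrete, the following is a minimal sketch of the conventional baseline it criticizes (fixing the threshold at a target TPR), not the authors' feedback-driven method. It assumes that higher scores mean "more in-distribution", that in-distribution is the positive class, and that FPR is the share of OOD samples accepted as in-distribution; the function names and toy scores are hypothetical.

```python
import math


def threshold_at_tpr(id_scores, target_tpr=0.95):
    """Largest threshold that still accepts at least target_tpr of the ID scores."""
    ranked = sorted(id_scores, reverse=True)
    keep = math.ceil(target_tpr * len(ranked))  # number of ID samples to accept
    return ranked[keep - 1]


def accept_rate(scores, threshold):
    """Fraction of samples classified as in-distribution (score >= threshold)."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy, heavily overlapping score distributions (illustrative values only).
id_scores = [i / 100 for i in range(1, 101)]
ood_scores = [i / 200 for i in range(1, 101)]

threshold = threshold_at_tpr(id_scores, target_tpr=0.95)
tpr = accept_rate(id_scores, threshold)   # share of ID samples accepted
fpr = accept_rate(ood_scores, threshold)  # share of OOD samples wrongly accepted
print(f"threshold={threshold:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Because the threshold is chosen from in-distribution scores alone, nothing in this procedure bounds the FPR, which is the gap the abstract's adaptive threshold is meant to close.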
[66] arXiv:2404.16956 [pdf, other]: Title: A Notion of Uniqueness for the Adversarial Bayes Classifier

Authors: Natalie S. Frank

Comments: 46 pages, 7 figures

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We propose a new notion of uniqueness for the adversarial Bayes classifier in the setting of binary classification. Analyzing this notion of uniqueness produces a simple procedure for computing all adversarial Bayes classifiers for a well-motivated family of one dimensional data distributions. This characterization is then leveraged to show that as the perturbation radius increases, certain notions of regularity improve for adversarial Bayes classifiers. We demonstrate with various examples that the boundary of the adversarial Bayes classifier frequently lies near the boundary of the Bayes classifier.
[67] arXiv:2404.16957 [pdf, other]: Title: Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability

Authors: Yunfei Ge, Quanyan Zhu

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The pervasive integration of Artificial Intelligence (AI) has introduced complex challenges in the responsibility and accountability in the event of incidents involving AI-enabled systems. The interconnectivity of these systems, ethical concerns of AI-induced incidents, coupled with uncertainties in AI technology and the absence of corresponding regulations, have made traditional responsibility attribution challenging. To this end, this work proposes a Computational Reflective Equilibrium (CRE) approach to establish a coherent and ethically acceptable responsibility attribution framework for all stakeholders. The computational approach provides a structured analysis that overcomes the limitations of conceptual approaches in dealing with dynamic and multifaceted scenarios, showcasing the framework's explainability, coherence, and adaptivity properties in the responsibility attribution process. We examine the pivotal role of the initial activation level associated with claims in equilibrium computation. Using an AI-assisted medical decision-support system as a case study, we illustrate how different initializations lead to diverse responsibility distributions. The framework offers valuable insights into accountability in AI-induced incidents, facilitating the development of a sustainable and resilient system through continuous monitoring, revision, and reflection.
[68] arXiv:2404.16958 [pdf, other]: Title: A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice

Authors: Juri Opitz

Comments: to appear in TACL, this is a pre-MIT Press publication version

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Classification systems are evaluated in a countless number of papers. However, we find that evaluation practice is often nebulous. Frequently, metrics are selected without arguments, and blurry terminology invites misconceptions. For instance, many works use so-called 'macro' metrics to rank systems (e.g., 'macro F1') but do not clearly specify what they would expect from such a 'macro' metric. This is problematic, since picking a metric can affect paper findings as well as shared task rankings, and thus any clarity in the process should be maximized.
Starting from the intuitive concepts of bias and prevalence, we perform an analysis of common evaluation metrics, considering expectations as found expressed in papers. Equipped with a thorough understanding of the metrics, we survey metric selection in recent shared tasks of Natural Language Processing. The results show that metric choices are often not supported with convincing arguments, an issue that can make any ranking seem arbitrary. This work aims at providing overview and guidance for more informed and transparent metric selection, fostering meaningful evaluation.
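The ambiguity around 'macro' metrics can be seen in a few lines. This sketch is an illustration, not taken from the paper: it computes two formulas that are both commonly called "macro F1", the mean of per-class F1 scores and the F1 of macro-averaged precision and recall. The toy labels and function names are assumptions, as is the convention of scoring an undefined ratio as 0.

```python
def class_stats(gold, pred, label):
    """Precision, recall and F1 for one class (0.0 when a ratio is undefined)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_f1_mean_of_f1(gold, pred):
    """Variant 1: unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold))
    return sum(class_stats(gold, pred, lab)[2] for lab in labels) / len(labels)


def macro_f1_of_means(gold, pred):
    """Variant 2: F1 computed from macro-averaged precision and recall."""
    labels = sorted(set(gold))
    stats = [class_stats(gold, pred, lab) for lab in labels]
    precision = sum(s[0] for s in stats) / len(labels)
    recall = sum(s[1] for s in stats) / len(labels)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


gold = ["a", "a", "a", "a", "b", "b"]
pred = ["a", "a", "b", "b", "b", "b"]
variant_1 = macro_f1_mean_of_f1(gold, pred)
variant_2 = macro_f1_of_means(gold, pred)
print(variant_1, variant_2)
```

On this toy input the two variants give different scores, so a paper reporting "macro F1" without naming the formula leaves its numbers underspecified.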
[69] arXiv:2404.16966 [pdf, other]: Title: Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks

Authors: Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono

Subjects: Computation and Language (cs.CL)

Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from a real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.
[70] arXiv:2404.16967 [pdf, other]: Title: ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain

Authors: Zhikai Li, Steve Vott, Bhaskar Krishnamachar

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

With growing concern about AI safety, there is a need to trust the computations done by machine learning (ML) models. Blockchain technology, known for recording data and running computations transparently and in a tamper-proof manner, can offer this trust. One significant challenge in deploying ML classifiers on-chain is that while ML models are typically written in Python using an ML library such as PyTorch, smart contracts deployed on EVM-compatible blockchains are written in Solidity. We introduce Machine Learning to Smart Contract (ML2SC), a PyTorch to Solidity translator that can automatically translate multi-layer perceptron (MLP) models written in PyTorch to Solidity smart contract versions. ML2SC uses a fixed-point math library to approximate floating-point computation. After deploying the generated smart contract, we can train our models off-chain using PyTorch and then transfer the acquired weights and biases to the smart contract using a function call. Finally, the model inference can also be done with a function call providing the input. We mathematically model the gas costs associated with deploying, updating model parameters, and running inference on these models on-chain, showing that the gas costs increase linearly in various parameters associated with an MLP. We present empirical results matching our modeling. We also evaluate the classification accuracy, showing that the outputs obtained by our transparent on-chain implementation are identical to the original off-chain implementation with PyTorch.
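The idea of approximating floating-point MLP inference with fixed-point integers can be sketched in Python as follows. This is an illustration only, not ML2SC's generated Solidity or its math library: the scale factor, the ReLU activation, the weights, and all function names are assumptions made for the example.

```python
SCALE = 10**6  # hypothetical fixed-point scale: 1.0 is stored as 1_000_000


def to_fixed(x):
    """Convert a float to its scaled-integer representation."""
    return int(round(x * SCALE))


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale.

    Python's // floors, whereas Solidity integer division truncates toward
    zero, so results can differ by one unit for negative products.
    """
    return (a * b) // SCALE


def mlp_forward(layers, inputs, mul):
    """Forward pass of an MLP with ReLU between layers (assumed activation)."""
    hidden = list(inputs)
    for idx, (weights, biases) in enumerate(layers):
        hidden = [
            bias + sum(mul(w, x) for w, x in zip(row, hidden))
            for row, bias in zip(weights, biases)
        ]
        if idx < len(layers) - 1:
            hidden = [max(0, v) for v in hidden]
    return hidden


# Toy two-layer model; in the workflow described above, weights would come
# from off-chain training.
layers_float = [
    ([[0.5, -0.25], [1.0, 0.75]], [0.1, -0.2]),
    ([[1.5, -0.5], [-1.0, 2.0]], [0.05, 0.0]),
]
layers_fixed = [
    ([[to_fixed(w) for w in row] for row in weights], [to_fixed(b) for b in biases])
    for weights, biases in layers_float
]

x = [0.2, 0.4]
float_out = mlp_forward(layers_float, x, lambda a, b: a * b)
fixed_out = mlp_forward(layers_fixed, [to_fixed(v) for v in x], fixed_mul)
float_class = float_out.index(max(float_out))
fixed_class = fixed_out.index(max(fixed_out))
print(float_out, [v / SCALE for v in fixed_out], float_class, fixed_class)
```

Comparing the predicted class rather than raw outputs mirrors the abstract's accuracy claim: small rounding differences in the logits matter only if they change the argmax.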
[71] arXiv:2404.16969 [pdf, other]: Title: COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

Authors: Ruben Ciranni, Emilian Postolache, Giorgio Mariani, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

Comments: Demo page: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of stems (or their combinations) composing music tracks and allows the objective evaluation of compositional models for music in the task of accompaniment generation. We also introduce a new baseline for compositional music generation called CompoNet, based on ControlNet \cite{zhang2023adding}, generalizing the tasks of MSDM, and quantify it against the latter using COCOLA. We release all models trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
[72] arXiv:2404.16970 [pdf, other]: Title: CarbonCP: Carbon-Aware DNN Partitioning with Conformal Prediction for Sustainable Edge Intelligence

Authors: Hongyu Ke, Wanxin Jin, Haoxin Wang

Subjects: Networking and Internet Architecture (cs.NI); Performance (cs.PF)

This paper presents a solution to address carbon emission mitigation for end-to-end edge computing systems, including the computing at battery-powered edge devices and servers, as well as the communications between them. We design and implement CarbonCP, a context-adaptive, carbon-aware, and uncertainty-aware AI inference framework built upon conformal prediction theory, which balances operational carbon emissions, end-to-end latency, and battery consumption of edge devices through DNN partitioning under varying system processing contexts and carbon intensity. Our experimental results demonstrate that CarbonCP is effective in substantially reducing operational carbon emissions, by up to 58.8%, while maintaining key user-centric performance metrics with only a 9.9% error rate.
[73] arXiv:2404.16972 [pdf, other]: Title: CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching

Authors: Samia Shafique, Shu Kong, Charless Fowlkes

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Shoeprints are a common type of evidence found at crime scenes and are used regularly in forensic investigations. However, existing methods cannot effectively employ deep learning techniques to match noisy and occluded crime-scene shoeprints to a shoe database due to a lack of training data. Moreover, all existing methods match crime-scene shoeprints to clean reference prints, yet our analysis shows matching to more informative tread depth maps yields better retrieval results. The matching task is further complicated by the necessity to identify similarities only in corresponding regions (heels, toes, etc) of prints and shoe treads. To overcome these challenges, we leverage shoe tread images from online retailers and utilize an off-the-shelf predictor to estimate depth maps and clean prints. Our method, named CriSp, matches crime-scene shoeprints to tread depth maps by training on this data. CriSp incorporates data augmentation to simulate crime-scene shoeprints, an encoder to learn spatially-aware features, and a masking module to ensure only visible regions of crime-scene prints affect retrieval results. To validate our approach, we introduce two validation sets by reprocessing existing datasets of crime-scene shoeprints and establish a benchmarking protocol for comparison. On this benchmark, CriSp significantly outperforms state-of-the-art methods in both automated shoeprint matching and image retrieval tailored to this task.
[74] arXiv:2404.16974 [pdf, ps, other]: Title: DRL2FC: An Attack-Resilient Controller for Automatic Generation Control Based on Deep Reinforcement Learning

Authors: Vasileios Dimitropoulos, Andreas D. Syrmakesis, Nikos Hatziargyriou

Comments: 2 pages, 2 figures, submitted to the 14th Mediterranean Conference on Power Generation, Transmission, Distribution and Energy Conversion

Subjects: Systems and Control (eess.SY)

Power grids heavily rely on Automatic Generation Control (AGC) systems to maintain grid stability by balancing generation and demand. However, the increasing digitization and interconnection of power grid infrastructure expose AGC systems to new vulnerabilities, particularly from cyberattacks such as false data injection attacks (FDIAs). These attacks aim at manipulating sensor measurements and control signals by injecting tampered data into the communication media. As such, it is necessary to develop innovative approaches that enhance the resilience of AGC systems. This paper addresses this challenge by exploring the potential of deep reinforcement learning (DRL) to enhance the resilience of AGC systems against FDIAs. To this end, a DRL-based controller is proposed that dynamically adjusts generator setpoints in response to both load fluctuations and potential cyber threats. The controller learns these optimal control policies by interacting with a simulated power system environment that incorporates the AGC dynamics under cyberattacks. Extensive experiments on test power systems subjected to various FDIAs demonstrate the effectiveness of the presented approach in mitigating the impact of cyberattacks.
[75] arXiv:2404.16978 [pdf, ps, other]: Title: A Three-Field Multiscale Method

Authors: Franklin de Barros, Alexandre L. Madureira, Frédéric Valentin

Subjects: Numerical Analysis (math.NA)

"A Three-Field Domain Decomposition Method" is the title of a seminal paper by F. Brezzi and L. D. Marini which introduces a three-field formulation for elliptic partial differential equations. Based on that, we propose the Multiscale-Hybrid-Hybrid Method (MH$^2$M) for the Darcy model, a multiscale finite element method that yields, after a series of formal manipulations, a symmetric positive definite formulation that depends only on the trace of the solution. We show stability and convergence results for a family of finite element spaces and establish relationships with other multiscale finite element methods.
[76] arXiv:2404.16980 [pdf, other]: Title: Reduced and All-at-Once Approaches for Model Calibration and Discovery in Computational Solid Mechanics

Authors: Ulrich Römer, Stefan Hartmann, Jendrik-Alexander Tröger, David Anton, Henning Wessels, Moritz Flaschel, Laura De Lorenzis

Subjects: Computational Engineering, Finance, and Science (cs.CE)

In the framework of solid mechanics, the task of deriving material parameters from experimental data has recently re-emerged with the progress in full-field measurement capabilities and the renewed advances of machine learning. In this context, new methods such as the virtual fields method and physics-informed neural networks have been developed as alternatives to the already established least-squares and finite element-based approaches. Moreover, model discovery problems are starting to emerge and can also be addressed in a parameter estimation framework. These developments call for a new unified perspective, which is able to cover both traditional parameter estimation methods and novel approaches in which the state variables or the model structure itself are inferred as well. Adopting concepts discussed in the inverse problems community, we distinguish between all-at-once and reduced approaches. With this general framework, we are able to structure a large portion of the literature on parameter estimation in computational mechanics - and we can identify combinations that have not yet been addressed, two of which are proposed in this paper. We also discuss statistical approaches to quantify the uncertainty related to the estimated parameters, and we propose a novel two-step procedure for identification of complex material models based on both frequentist and Bayesian principles. Finally, we illustrate and compare several of the aforementioned methods with mechanical benchmarks based on synthetic and real data.
[77] arXiv:2404.16985 [pdf, other]: Title: Humans prefer interacting with slow, less realistic butterfly simulations

Authors: Paige L. Reiter, Talia Y. Moore

Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)

How should zoomorphic, or bio-inspired, robots indicate to humans that interactions will be safe and fun? Here, a survey is used to measure how human willingness to interact with a simulated butterfly robot is affected by different flight patterns. Flapping frequency, flap to glide ratio, and flapping pattern were independently varied based on a literature review of butterfly and moth flight. Human willingness to interact with these simulations and demographic information were self-reported via an online survey. Low flapping frequency and greater proportion of gliding were preferred, and prior experience with butterflies strongly predicted greater interaction willingness. The preferred flight parameters correspond to migrating butterfly flight patterns that are rarely directly observed by humans and do not correspond to the species that inspired the wing shape of the robot model. The most realistic butterfly simulations were among the least preferred. An analysis of animated butterflies in popular media revealed a convergence on slower, less realistic flight parameters. This iterative and interactive artistic process provides a model for determining human preferences and identifying functional requirements of robots for human interaction. Thus, the robotic design process can be streamlined by leveraging animated models and surveys prior to construction.
[78] arXiv:2404.16986 [pdf, other]: Title: Piecewise Stochastic Barrier Functions

Authors: Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, Morteza Lahijanian

Subjects: Robotics (cs.RO)

This paper presents a novel stochastic barrier function (SBF) framework for safety analysis of stochastic systems based on piecewise (PW) functions. We first outline a general formulation of PW-SBFs. Then, we focus on PW-Constant (PWC) SBFs and show how their simplicity yields computational advantages for general stochastic systems. Specifically, we prove that synthesis of PWC-SBFs reduces to a minimax optimization problem. Then, we introduce three efficient algorithms to solve this problem, each offering distinct advantages and disadvantages. The first algorithm is based on dual linear programming (LP), which provides an exact solution to the minimax optimization problem. The second is a more scalable algorithm based on iterative counter-example guided synthesis, which involves solving two smaller LPs. The third algorithm solves the minimax problem using gradient descent, which admits even better scalability. We provide an extensive evaluation of these methods on various case studies, including neural network dynamic models, nonlinear switched systems, and high-dimensional linear systems. Our benchmarks demonstrate that PWC-SBFs outperform state-of-the-art methods, namely sum-of-squares and neural barrier functions, and can scale to eight-dimensional systems.
[79] arXiv:2404.16989 [pdf, other]: Title: IDIL: Imitation Learning of Intent-Driven Expert Behavior

Authors: Sangwon Seo, Vaibhav Unhelkar

Comments: Extended version of an identically-titled paper accepted at AAMAS 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

When faced with accomplishing a task, human experts exhibit intentional behavior. Their unique intents shape their plans and decisions, resulting in experts demonstrating diverse behaviors to accomplish the same task. Due to the uncertainties encountered in the real world and their bounded rationality, experts sometimes adjust their intents, which in turn influences their behaviors during task execution. This paper introduces IDIL, a novel imitation learning algorithm to mimic these diverse intent-driven behaviors of experts. Iteratively, our approach estimates expert intent from heterogeneous demonstrations and then uses it to learn an intent-aware model of their behavior. Unlike contemporary approaches, IDIL is capable of addressing sequential tasks with high-dimensional state representations, while sidestepping the complexities and drawbacks associated with adversarial training (a mainstay of related techniques). Our empirical results suggest that the models generated by IDIL either match or surpass those produced by recent imitation learning benchmarks in metrics of task performance. Moreover, as it creates a generative model, IDIL demonstrates superior performance in intent inference metrics, crucial for human-agent interactions, and aptly captures a broad spectrum of expert behaviors.
[80] arXiv:2404.16990 [pdf, other]: Title: Record Acceleration of the Two-Dimensional Ising Model Using High-Performance Wafer Scale Engine

Authors: Dirk Van Essendelf, Hayl Almolyki, Wei Shi, Terry Jordan, Mei-Yu Wang, Wissam A. Saidi

Comments: 13 pages, 5 figures, plus supplementary information

Subjects: Hardware Architecture (cs.AR); Materials Science (cond-mat.mtrl-sci)

The versatility and wide-ranging applicability of the Ising model, originally introduced to study phase transitions in magnetic materials, have made it a cornerstone in statistical physics and a valuable tool for evaluating the performance of emerging computer hardware. Here, we present a novel implementation of the two-dimensional Ising model on a Cerebras Wafer-Scale Engine (WSE), a revolutionary processor that is opening new frontiers in computing. In our deployment of the checkerboard algorithm, we optimized the Ising model to take advantage of the unique WSE architecture. Specifically, we employed a compressed bit representation storing 16 spins on each int16 word, and efficiently distributed the spins over the processing units, enabling seamless weak scaling and limiting communications to only immediately neighboring units. Our implementation can handle up to 754 simulations in parallel, achieving an aggregate of over 61.8 trillion flip attempts per second for Ising models with up to 200 million spins. This represents a gain of up to 148 times over previously reported single-device performance with a highly optimized implementation on NVIDIA V100 and up to 88 times in productivity compared to NVIDIA H100. Our findings highlight the significant potential of the WSE in scientific computing, particularly in the field of materials modeling.
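For readers unfamiliar with the two techniques named in this abstract, the sketch below shows a plain-Python checkerboard Metropolis sweep and a 16-spins-per-word bit packing. It is a generic textbook-style illustration, not the authors' WSE implementation: the coupling J = 1, zero external field, periodic boundaries, an even lattice size, and every function name are assumptions, and the sweep here works on unpacked spins for readability.

```python
import math
import random


def pack_spins(spins16):
    """Pack 16 spins (+1/-1) into one 16-bit word: bit k is set when spin k is +1."""
    word = 0
    for k, spin in enumerate(spins16):
        if spin == 1:
            word |= 1 << k
    return word


def unpack_spins(word):
    """Inverse of pack_spins: recover the 16 spins from a 16-bit word."""
    return [1 if (word >> k) & 1 else -1 for k in range(16)]


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep: update every site of one color, then the other.

    With periodic boundaries and an even side length, sites of the same color
    are never neighbors, so each half-sweep could run fully in parallel.
    """
    n = len(spins)
    for color in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != color:
                    continue
                neighbors = (
                    spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                    + spins[i][(j - 1) % n] + spins[i][(j + 1) % n]
                )
                delta_e = 2 * spins[i][j] * neighbors  # energy cost of a flip
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


rng = random.Random(0)
size = 8  # must be even for a consistent checkerboard coloring
cold = [[1] * size for _ in range(size)]
checkerboard_sweep(cold, beta=100.0, rng=rng)  # near zero temperature

row = [1, -1] * 8
round_trip = unpack_spins(pack_spins(row))
```

At very low temperature the all-up lattice is left unchanged, since every flip raises the energy and its acceptance probability underflows to zero.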
[81] arXiv:2404.16992 [pdf, other]: Title: A Catalog of Transformations to Remove Smells From Natural Language Tests

Authors: Manoel Aranda, Naelson Oliveira, Elvys Soares, Márcio Ribeiro, Davi Romão, Ullyanne Patriota, Rohit Gheyi, Emerson Souza, Ivan Machado

Comments: Distinguished Paper Award at International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition

Subjects: Software Engineering (cs.SE)

Test smells can pose difficulties during testing activities, such as poor maintainability, non-deterministic behavior, and incomplete verification. Existing research has extensively addressed test smells in automated software tests, but little attention has been given to smells in natural language tests. While some research has identified and catalogued such smells, there is a lack of systematic approaches for their removal. Consequently, there is also a lack of tools to automatically identify and remove natural language test smells. This paper introduces a catalog of transformations designed to remove seven natural language test smells and a companion tool implemented using Natural Language Processing (NLP) techniques. Our work aims to enhance the quality and reliability of natural language tests during software development. The research employs a two-fold empirical strategy to evaluate its contributions. First, a survey involving 15 software testing professionals assesses the acceptance and usefulness of the catalog's transformations. Second, an empirical study evaluates our tool to remove natural language test smells by analyzing a sample of real-practice tests from the Ubuntu OS. The results indicate that software testing professionals find the transformations valuable. Additionally, the automated tool demonstrates a good level of precision, as evidenced by an F-measure of 83.70%.
[82] arXiv:2404.16994 [pdf, other]: Title: PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Authors: Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straightforward, highly efficient, and resource-light approach to adapting an existing image-language pre-trained model for dense video understanding. Our preliminary experiments reveal that directly fine-tuning pre-trained image-language models with multiple frames as inputs on video datasets leads to performance saturation or even a drop. Our further investigation reveals that it is largely attributed to the bias of learned high-norm visual features. Motivated by this finding, we propose a simple but effective pooling strategy to smooth the feature distribution along the temporal dimension and thus reduce the dominant impacts from the extreme features. The new model is termed Pooling LLaVA, or PLLaVA in short. PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks. Notably, on the recent popular Video ChatGPT benchmark, PLLaVA achieves a score of 3.48 out of 5 averaged over five evaluated dimensions, exceeding the previous SOTA results from GPT4V (IG-VLM) by 9%. On the latest multi-choice benchmark MVBench, PLLaVA achieves 58.1% accuracy on average across 20 sub-tasks, 14.5% higher than GPT4V (IG-VLM). Code is available at https://github.com/magic-research/PLLaVA.
[83] arXiv:2404.16997 [pdf, ps, other]: Title: Probabilistic Interval Analysis of Unreliable Programs

Authors: Dibyendu Das, Soumyajit Dey

Subjects: Programming Languages (cs.PL); Discrete Mathematics (cs.DM)

Advances in chip technology will make future computer chips faster and will also reduce their power consumption. But this speed gain will not come free of cost: there will be a trade-off between speed and efficiency, i.e., the accuracy of the computation. To achieve this extra speed, we will simply have to let our computers make more mistakes in computations. Consequently, systems built with these types of chips will possess an innate unreliability. Programs written for these systems will also have to account for this unreliability. Researchers have already started developing programming frameworks for such unreliable architectures.
In the present work, we use a restricted version of C-type languages to model programs written for unreliable architectures. We propose a technique for statically analyzing code written for these kinds of architectures. Our technique, which primarily focuses on interval/range analysis of such programs, uses the well-established theory of abstract interpretation. Hardware unreliability implicitly raises the possibility that hardware components fail. There are two types of failure models: 1) the permanent failure model, where the hardware stops execution on failure, and 2) the transient failure model, where, on failure, the hardware continues subsequent operations with wrong operand values. In this paper, we consider only the transient failure model. The goal of this analysis is to predict the probability with which a program variable assumes values from a given range at a given program point.
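The flavor of such an analysis can be conveyed with a toy example. The sketch below is not the paper's abstract domain: it pairs a standard interval abstraction with a deliberately simple, hypothetical transient-fault model in which an addition fails with probability `p_fail` and a failed result may be off by at most `err`. It also assumes the two operands are independent. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Standard interval addition: add the bounds pairwise.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def widen_by(self, err):
        # Over-approximate a faulty result that is off by at most err.
        return Interval(self.lo - err, self.hi + err)

    def within(self, other):
        return other.lo <= self.lo and self.hi <= other.hi


def unreliable_add(xs, ys, p_fail, err):
    """Abstract addition on lists of (probability, Interval) pairs.

    Each combination of operand intervals yields the exact sum with
    probability 1 - p_fail and a widened sum with probability p_fail
    (hypothetical fault model, operands assumed independent).
    """
    result = []
    for px, ix in xs:
        for py, iy in ys:
            exact = ix + iy
            result.append((px * py * (1 - p_fail), exact))
            result.append((px * py * p_fail, exact.widen_by(err)))
    return result


def prob_lower_bound(dist, target):
    """Sound lower bound on P(value in target): mass of intervals inside it."""
    return sum(p for p, interval in dist if interval.within(target))


x = [(1.0, Interval(0, 10))]
y = [(1.0, Interval(5, 5))]
z = unreliable_add(x, y, p_fail=0.01, err=100)
bound = prob_lower_bound(z, Interval(0, 20))
print(bound)
```

The query at the end corresponds to the stated goal of the analysis: bounding the probability that a variable lies in a given range at a given program point.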
[84] arXiv:2404.17000 [pdf, other]: Title: Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models

Authors: Bradley P. Allen, Paul T. Groth

Comments: 11 pages, 1 figure, 2 tables, accepted at the European Semantic Web Conference Special Track on Large Language Models for Knowledge Engineering, Hersonissos, Crete, GR, May 2024, for associated code and data, see this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Class membership relations, which assign entities to a given class, are a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that uses a natural language intensional definition of a class. We evaluate the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and 7 large language models. Using the gpt-4-0125-preview large language model, the method's classification performance achieves a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors shows that 40.9% of errors were due to the knowledge graphs, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations. These results show how large language models can assist knowledge engineers in the process of knowledge graph refinement. The code and data are available on GitHub.
[85] arXiv:2404.17009 [pdf, other]: Title: What You Use is What You Get: Unforced Errors in Studying Cultural Aspects in Agile Software Development

Authors: Michael Neumann, Klaus Schmid, Lars Baumann

Subjects: Software Engineering (cs.SE)

Context: Cultural aspects are of high importance as they guide people's behaviour and thus, influence how people apply methods and act in projects. In recent years, software engineering research emphasized the need to analyze the challenges of specific cultural characteristics. Investigating the influence of cultural characteristics is challenging due to the multi-faceted concept of culture. People's behaviour, their beliefs and underlying values are shaped by different layers of culture, e.g., regions, organizations, or groups. In this study, we focus on agile methods, which are agile approaches that focus on underlying values, collaboration and communication. Thus, cultural and social aspects are of high importance for their successful use in practice. Objective: In this paper, we address challenges that arise when using the model of cultural dimensions by Hofstede to characterize specific cultural values. This model is often used when discussing cultural influences in software engineering. Method: As a basis, we conducted an exploratory, multiple case study, consisting of two cases in Japan and two in Germany. Contributions: In this study, we observed that cultural characteristics of the participants differed significantly from cultural characteristics that would typically be expected for people from the respective country. This drives our conclusion that for studies in empirical software engineering that address cultural factors, a case-specific analysis of the characteristics is needed.
[86] arXiv:2404.17010 [pdf, other]: Title: Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models

Authors: Eren Dogan, M. Egemen Uzun, Atahan Uz, H. Emre Seyrek, Ahmed Zeer, Ezgi Sevi, H. Toprak Kesgin, M. Kaan Yuce, M. Fatih Amasyali

Comments: In Turkish. Not accepted at a conference because of a reviewer comment stating that it omitted certain studies. However, the studies the reviewer mentioned had not yet been published at the paper submission deadline.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The progress that language models have made in handling almost all kinds of tasks has attracted the attention not only of researchers but also of society at large, and has enabled these models to become products. There are commercially successful language models available. However, users may prefer open-source language models due to cost, data privacy, or regulations. Yet, despite the increasing number of these models, there is no comprehensive comparison of their performance for Turkish. This study aims to fill this gap in the literature. A comparison is made among seven selected language models based on their in-context learning and question-answering abilities. Turkish datasets for in-context learning and question-answering were prepared, and both automatic and human evaluations were conducted. The results show that, for question-answering, continuing pretraining before fine-tuning with instructional datasets is more successful in adapting multilingual models to Turkish, and that in-context learning performance is not strongly related to question-answering performance.
[87] arXiv:2404.17011 [pdf, other]: Title: First-Fit Coloring of Forests in Random Arrival Model

Authors: Bartłomiej Bosek, Grzegorz Gutowski, Michał Lasoń, Jakub Przybyło

Subjects: Discrete Mathematics (cs.DM)

We consider a graph coloring algorithm that processes vertices in an order chosen uniformly at random and assigns colors to them using the First-Fit strategy. We show that this algorithm uses, in expectation, at most $(\frac{1}{2} + o(1))\cdot \ln n \,/\, \ln\ln n$ different colors to color any forest with $n$ vertices. We also construct a family of forests showing that this bound is best possible.
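The procedure described in this abstract can be sketched as follows. This is a minimal illustration, assuming the forest is given as an adjacency dictionary; the function names and the example path are hypothetical and do not come from the paper.

```python
import random


def first_fit_coloring(adjacency, order):
    """Color vertices in the given order, giving each the smallest
    non-negative integer not already used by a colored neighbor."""
    colors = {}
    for v in order:
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def random_order_first_fit(adjacency, rng):
    """Run First-Fit on a uniformly random arrival order of the vertices."""
    order = list(adjacency)
    rng.shuffle(order)
    return first_fit_coloring(adjacency, order)


# Hypothetical example: the path 0 - 1 - 2 - 3, which is a tree.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(first_fit_coloring(path, [0, 1, 2, 3]))  # arrival along the path
print(first_fit_coloring(path, [0, 3, 1, 2]))  # endpoints arrive first
print(random_order_first_fit(path, random.Random(0)))
```

The two fixed orders show that the number of colors depends on the arrival order even on a small path, which is why the result is stated in expectation over a random order.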
[88] arXiv:2404.17012 [pdf, other]: Title: Computational hardness of detecting graph lifts and certifying lift-monotone properties of random regular graphs

Authors: Dmitriy Kunisky, Xifan Yu

Comments: 64 pages, 1 table, 4 figures

Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR)

We introduce a new conjecture on the computational hardness of detecting random lifts of graphs: we claim that there is no polynomial-time algorithm that can distinguish between a large random $d$-regular graph and a large random lift of a Ramanujan $d$-regular base graph (provided that the lift is corrupted by a small amount of extra noise), and likewise for bipartite random graphs and lifts of bipartite Ramanujan graphs. We give evidence for this conjecture by proving lower bounds against the local statistics hierarchy of hypothesis testing semidefinite programs. We then explore the consequences of this conjecture for the hardness of certifying bounds on numerous functions of random regular graphs, expanding on a direction initiated by Bandeira, Banks, Kunisky, Moore, and Wein (2021). Conditional on this conjecture, we show that no polynomial-time algorithm can certify tight bounds on the maximum cut of random 3- or 4-regular graphs, the maximum independent set of random 3- or 4-regular graphs, or the chromatic number of random 7-regular graphs. We show similar gaps asymptotically for large degree for the maximum independent set and for any degree for the minimum dominating set, finding that naive spectral and combinatorial bounds are optimal among all polynomial-time certificates. Likewise, for small-set vertex and edge expansion in the limit of very small sets, we show that the spectral bounds of Kahale (1995) are optimal among all polynomial-time certificates.
[89] arXiv:2404.17013 [pdf, ps, other]: Title: Two-Source and Affine Non-Malleable Extractors for Small Entropy

Authors: Xin Li, Yan Zhong

Comments: To appear in ICALP 24. Abstract shortened due to arXiv requirement

Subjects: Computational Complexity (cs.CC); Combinatorics (math.CO)

Non-malleable extractors are generalizations and strengthenings of standard randomness extractors that are resilient to adversarial tampering. Such extractors have wide applications in cryptography and the explicit construction of extractors. In the well-studied models of two-source and affine non-malleable extractors, the previous best constructions, by Li (FOCS' 23), only work for entropy rate $>2/3$ and $1-\gamma$, respectively.
We present explicit constructions of two-source and affine non-malleable extractors that match the state-of-the-art constructions of standard ones for small entropy. Our main results include two-source and affine non-malleable extractors (over $\mathsf{F}_2$) for sources on $n$ bits with min-entropy $k \ge \log^C n$ and polynomially small error, matching the parameters of standard extractors by Chattopadhyay and Zuckerman (STOC' 16, Annals of Mathematics' 19) and Li (FOCS' 16), as well as those with min-entropy $k = O(\log n)$ and constant error, matching the parameters of standard extractors by Li (FOCS' 23).
Our constructions significantly improve previous results, and the parameters (entropy requirement and error) are the best possible without first improving the constructions of standard extractors. In addition, our improved affine non-malleable extractors give strong lower bounds for a certain kind of read-once linear branching programs, recently introduced by Gryaznov, Pudlák, and Talebanfard (CCC' 22) as a generalization of several well-studied computational models. These bounds match the previously best-known average-case hardness results given by Chattopadhyay and Liao (CCC' 23) and Li (FOCS' 23), where the branching program size lower bounds are close to optimal, but the explicit functions we use here are different. Our results also suggest a possible deeper connection between non-malleable extractors and standard ones.
[90] arXiv:2404.17017 [pdf, ps, other]: Title: AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks

Authors: Jeremy Harper

Subjects: Multiagent Systems (cs.MA)

The proliferation of large language models (LLMs) and their integration into multi-agent systems has paved the way for sophisticated automation in various domains. This paper introduces AutoGenesisAgent, a multi-agent system that autonomously designs and deploys other multi-agent systems tailored for specific tasks. AutoGenesisAgent comprises several specialized agents, including System Understanding, System Design, Agent Generator, and several others that collectively manage the lifecycle of creating functional multi-agent systems from initial concept to deployment. Each agent in AutoGenesisAgent has distinct responsibilities ranging from interpreting input prompts to optimizing system performance, culminating in the deployment of a ready-to-use system. This proof-of-concept study discusses the design, implementation, and lessons learned from developing AutoGenesisAgent, highlighting its capability to generate and refine multi-agent systems autonomously, thereby reducing the need for extensive human oversight in the initial stages of system design. Keywords: multi-agent systems, large language models, system design automation, agent architecture, autonomous systems, software deployment
[91] arXiv:2404.17018 [pdf, other]: Title: Leveraging AI to Generate Audio for User-generated Content in Video Games

Authors: Thomas Marrinan, Pakeeza Akram, Oli Gurmessa, Anthony Shishkin

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

In video game design, audio (both environmental background music and object sound effects) plays a critical role. Sounds are typically pre-created assets designed for specific locations or objects in a game. However, user-generated content is becoming increasingly popular in modern games (e.g. building custom environments or crafting unique objects). Since the possibilities are virtually limitless, it is impossible for game creators to pre-create audio for user-generated content. We explore the use of generative artificial intelligence to create music and sound effects on the fly based on user-generated content. We investigate two avenues for audio generation: 1) text-to-audio: using a text description of user-generated content as input to the audio generator, and 2) image-to-audio: using a rendering of the created environment or object as input to an image-to-text generator, then piping the resulting text description into the audio generator. In this paper we discuss ethical implications of using generative artificial intelligence for user-generated content and highlight two prototype games where audio is generated for user-created environments and objects.
[92] arXiv:2404.17020 [pdf, other]: Title: Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach

Authors: Cristopher McIntyre-Garcia, Adrien Heymans, Beril Borali, Won-Sook Lee, Shiva Nejati

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Deep Learning (DL) models excel in computer vision tasks but can be susceptible to adversarial examples. This paper introduces Triple-Metric EvoAttack (TM-EVO), an efficient algorithm for evaluating the robustness of object-detection DL models against adversarial attacks. TM-EVO utilizes a multi-metric fitness function to guide an evolutionary search efficiently in creating effective adversarial test inputs with minimal perturbations. We evaluate TM-EVO on widely-used object-detection DL models, DETR and Faster R-CNN, and open-source datasets, COCO and KITTI. Our findings reveal that TM-EVO outperforms the state-of-the-art EvoAttack baseline, leading to adversarial tests with less noise while maintaining efficiency.
[93] arXiv:2404.17022 [pdf, ps, other]: Title: Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard recording method (Zoom H6 Recorder with an external microphone) to three remote recording methods: (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the "Turn on Original Sound" setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long recording-session files. However, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected for simply. Overall, the AVR files were most similar to the H6's, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw.
[94] arXiv:2404.17023 [pdf, other]: Title: Out-of-Distribution Detection using Maximum Entropy Coding

Authors: Mojtaba Abolfazli, Mohammad Zaeri Amirani, Anders Høst-Madsen, June Zhang, Andras Bratincsak

Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper seeks to answer the question of whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and with this a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties.
For real-world data, $P$ is usually unknown. We transform data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.
[95] arXiv:2404.17025 [pdf, other]: Title: How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

Authors: Shih-Hong Huang, Ya-Fang Lin, Zeyu He, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

Subjects: Human-Computer Interaction (cs.HC)

Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary conversation lengths and conducted a user study. Participants asked the chatbots both highly and less conversable questions, engaging in dialogues with 0, 3, 5, and 7 conversational turns. We found that the conversation quality does not differ drastically across different conditions, while participants had mixed reactions. Our study demonstrates LLMs' ability to change conversation length and the potential benefits for users resulting from such changes, but we caution that changes in text form may not necessarily imply changes in quality or content.
[96] arXiv:2404.17027 [pdf, other]: Title: Player-Driven Emergence in LLM-Driven Game Narrative

Authors: Xiangyu Peng, Jessica Quaye, Weijia Xu, Chris Brockett, Bill Dolan, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Sudha Rao

Journal-ref: IEEE Conference on Games 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.
[97] arXiv:2404.17028 [pdf, ps, other]: Title: Generative AI in Color-Changing Systems: Re-Programmable 3D Object Textures with Material and Design Constraints

Authors: Yunyi Zhu, Faraz Faruqi, Stefanie Mueller

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Advances in Generative AI tools have allowed designers to manipulate existing 3D models using text or image-based prompts, enabling creators to explore different design goals. Photochromic color-changing systems, on the other hand, allow for the reprogramming of the surface texture of 3D models, enabling easy customization of physical objects and opening up the possibility of using object surfaces for data display. However, existing photochromic systems require the user to manually design the desired texture, inspect the simulation of the pattern on the object, and verify the efficacy of the generated pattern. These manual design, inspection, and verification steps prevent the user from efficiently exploring the design space of possible patterns. Thus, by designing an automated workflow for an end-to-end texture application process, we can allow rapid iteration on different practicable patterns.
In this workshop paper, we discuss the possibilities of extending generative AI systems, with material and design constraints for reprogrammable surfaces with photochromic materials. By constraining generative AI systems to colors and materials possible to be physically realized with photochromic dyes, we can create tools that would allow users to explore different viable patterns, with text and image-based prompts. We identify two focus areas in this topic: photochromic material constraints and design constraints for data-encoded textures. We highlight the current limitations of using generative AI tools to create viable textures using photochromic material. Finally, we present possible approaches to augment generative AI methods to take into account the photochromic material constraints, allowing for the creation of viable photochromic textures rapidly and easily.
[98] arXiv:2404.17029 [pdf, other]: Title: Dr-SAM: An End-to-End Framework for Vascular Segmentation, Diameter Estimation, and Anomaly Detection on Angiography Images

Authors: Vazgen Zohranyan, Vagner Navasardyan, Hayk Navasardyan, Jan Borggrefe, Shant Navasardyan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recent advancements in AI have significantly transformed medical imaging, particularly in angiography, by enhancing diagnostic precision and patient care. However, existing works are limited in analyzing the aorta and iliac arteries, above all for vascular anomaly detection and characterization. To close this gap, we propose Dr-SAM, a comprehensive multi-stage framework for vessel segmentation, diameter estimation, and anomaly analysis, aiming to examine the peripheral vessels through angiography images. For segmentation, we introduce a customized positive/negative point selection mechanism applied on top of the Segment Anything Model (SAM), specifically for medical (angiography) images. Then we propose a morphological approach to determine the vessel diameters, followed by our histogram-driven anomaly detection approach. Moreover, we introduce a new benchmark dataset for the comprehensive analysis of peripheral vessel angiography images, which we hope can boost upcoming research in this direction, leading to enhanced diagnostic precision and ultimately better health outcomes for individuals facing vascular issues.
[99] arXiv:2404.17031 [pdf, other]: Title: Motor Focus: Ego-Motion Prediction with All-Pixel Matching

Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these solutions to human users, since human navigation introduces unique challenges, including the unpredictable nature of human movement, limited processing capabilities of portable devices, and the need for directional responsiveness due to the limited perception range of humans. In this project, we introduce an image-only method that applies motion analysis using optical flow with ego-motion compensation to predict Motor Focus, that is, where and how humans or machines focus their movement intentions. This paper also addresses the camera shaking issue in handheld and body-mounted devices, which can severely degrade performance and accuracy, by applying a Gaussian aggregation to stabilize the predicted motor focus area and enhance the prediction accuracy of movement direction. This also provides a robust, real-time solution that adapts to the user's immediate environment. Furthermore, in the experiments, we present a qualitative comparison of motor focus estimation between the conventional dense optical flow-based method and the proposed method. In quantitative tests, we show the performance of the proposed method on a small collected dataset that is specialized for motor focus estimation tasks.
[100] arXiv:2404.17033 [pdf, other]: Title: Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segmentation

Authors: Tanvi Deshpande, Eva Prakash, Elsie Gyang Ross, Curtis Langlotz, Andrew Ng, Jeya Maria Jose Valanarasu

Comments: Accepted at MIDL 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The high cost of creating pixel-by-pixel gold-standard labels, limited expert availability, and presence of diverse tasks make it challenging to generate segmentation labels to train deep learning models for medical imaging tasks. In this work, we present a new approach to overcome the hurdle of costly medical image labeling by leveraging foundation models like Segment Anything Model (SAM) and its medical alternative, MedSAM. Our pipeline has the ability to generate weak labels for any unlabeled medical image and subsequently use them to augment label-scarce datasets. We perform this by leveraging a model trained on a few gold-standard labels and using it to intelligently prompt MedSAM for weak label generation. This automation eliminates the manual prompting step in MedSAM, creating a streamlined process for generating labels for both real and synthetic images, regardless of quantity. We conduct experiments on label-scarce settings for multiple tasks pertaining to modalities including ultrasound, dermatology, and X-rays to demonstrate the usefulness of our pipeline. The code is available at https://github.com/stanfordmlgroup/Auto-Generate-WLs/.
[101] arXiv:2404.17034 [pdf, other]: Title: Learning Actionable Counterfactual Explanations in Large State Spaces

Authors: Keziah Naggita, Matthew R. Walter, Avrim Blum

Subjects: Machine Learning (cs.LG)

Counterfactual explanations (CFEs) are sets of actions that an agent with a negative classification could take to achieve a (desired) positive classification, for consequential decisions such as loan applications, hiring, admissions, etc. In this work, we consider settings where optimal CFEs correspond to solutions of weighted set cover problems. In particular, there is a collection of actions that agents can perform that each have their own cost and each provide the agent with different sets of capabilities. The agent wants to perform the cheapest subset of actions that together provide all the needed capabilities to achieve a positive classification. Since this is an NP-hard optimization problem, we are interested in the question: can we, from training data (instances of agents and their optimal CFEs) learn a CFE generator that will quickly provide optimal sets of actions for new agents?
In this work, we provide a deep-network learning procedure that we show experimentally is able to achieve strong performance at this task. We consider several problem formulations, including formulations in which the underlying "capabilities" and effects of actions are not explicitly provided, and so there is an informational challenge in addition to the computational challenge. Our problem can also be viewed as one of learning an optimal policy in a family of large but deterministic Markov Decision Processes (MDPs).
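To make the weighted set cover formulation concrete, the following is a minimal brute-force sketch that finds the cheapest set of actions covering all needed capabilities. It is not the learned generator the abstract describes; the action names, costs, and the function `optimal_cfe` are hypothetical. The exhaustive search is exponential in the number of actions, which illustrates why a fast learned generator is attractive.

```python
from itertools import combinations


def optimal_cfe(actions, needed):
    """Exhaustively search for the cheapest subset of actions whose
    combined capabilities include every needed capability.

    actions: dict mapping action name -> (cost, set of capabilities)
    needed:  set of capabilities required for a positive classification
    Returns (cost, set of action names), or (inf, None) if infeasible.
    """
    best_cost, best_set = float("inf"), None
    names = sorted(actions)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            covered = set().union(*(actions[a][1] for a in subset))
            cost = sum(actions[a][0] for a in subset)
            if needed <= covered and cost < best_cost:
                best_cost, best_set = cost, set(subset)
    return best_cost, best_set


# Hypothetical loan-application example.
actions = {
    "raise_income": (5.0, {"income"}),
    "pay_debt": (3.0, {"debt"}),
    "add_cosigner": (6.0, {"income", "debt"}),
}
print(optimal_cfe(actions, {"income", "debt"}))
```

Here the single action `add_cosigner` (cost 6.0) beats the pair `raise_income` plus `pay_debt` (cost 8.0), so the optimal explanation is not simply the union of the cheapest per-capability actions.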
[102] arXiv:2404.17036 [pdf, other]: Title: Understanding the Career Mobility of Blind and Low Vision Software Professionals

Authors: Yoonha Cha, Victoria Jackson, Isabela Figueira, Stacy M. Branham, André van der Hoek

Comments: 12 pages, 1 table, conference paper, 2024 ACM / IEEE 17th International Conference on Cooperative and Human Aspects of Software Engineering

Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Context: Scholars in the software engineering (SE) research community have investigated career advancement in the software industry. Research topics have included how individual and external factors can impact career mobility of software professionals, and how gender affects career advancement. However, the community has yet to look at career mobility from the lens of accessibility. Specifically, there is a pressing need to illuminate the factors that hinder the career mobility of blind and low vision software professionals (BLVSPs). Objective: This study aims to understand aspects of the workplace that impact career mobility for BLVSPs. Methods: We interviewed 26 BLVSPs with different roles, years of experience, and industry sectors. Thematic analysis was used to identify common factors related to career mobility. Results: We found four factors that impacted the career mobility of BLVSPs: (1) technical challenges, (2) colleagues' perceptions of BLVSPs, (3) BLVSPs' own perceptions on managerial progression, and (4) BLVSPs' investment in accessibility at the workplace. Conclusion: We suggest implications for tool designers, organizations, and researchers towards fostering more accessible workplaces to support the career mobility of BLVSPs.
[103] arXiv:2404.17038 [pdf, other]: Title: Evaluating Collaborative Autonomy in Opposed Environments using Maritime Capture-the-Flag Competitions

Authors: Jordan Beason, Michael Novitzky, John Kliem, Tyler Errico, Zachary Serlin, Kevin Becker, Tyler Paine, Michael Benjamin, Prithviraj Dasgupta, Peter Crowley, Charles O'Donnell, John James

Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

Subjects: Robotics (cs.RO)

The objective of this work is to evaluate multi-agent artificial intelligence methods when deployed on teams of unmanned surface vehicles (USV) in an adversarial environment. Autonomous agents were evaluated in real-world scenarios using the Aquaticus test-bed, which is a Capture-the-Flag (CTF) style competition involving teams of USV systems. Cooperative teaming algorithms of various foundations in behavior-based optimization and deep reinforcement learning (RL) were deployed on these USV systems in two-versus-two teams and tested against each other during a competition period in the fall of 2023. Deep reinforcement learning applied to USV agents was achieved via the Pyquaticus test bed, a lightweight gymnasium environment that allows simulated CTF training in a low-level environment. The results of the experiment demonstrate that behavior-based agents using rule-based cooperation outperformed agents trained in deep reinforcement learning paradigms as implemented in these competitions. Further integration of the Pyquaticus gymnasium environment for RL with MOOS-IvP in terms of configuration and control schema will allow for more competitive CTF games in future studies. As the development of experimental deep RL methods continues, the authors expect that the competitive gap between behavior-based autonomy and deep RL will be reduced. As such, this report outlines the overall competition, methods, and results, with an emphasis on future work such as reward shaping, sim-to-real methodologies, and extending rule-based cooperation among agents to react to safety and security events in accordance with human experts' intent/rules for executing safety and security processes.
[104] arXiv:2404.17039 [pdf, other]: Title: Differentiating Through Linear Solvers

Authors: Paul Hovland, Jan Hückelheim

Subjects: Mathematical Software (cs.MS); Numerical Analysis (math.NA)

Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?
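The high-level approach mentioned in this abstract rests on a standard identity: differentiating $A(\theta)\,x(\theta) = b$ with respect to $\theta$ gives $A\,x' = -A'\,x$, so the derivative is obtained with a second solver call. The sketch below illustrates only that identity on a hypothetical 2x2 system and checks it against central finite differences. It does not reproduce the paper's experiments, and the matrix `A_of`, the right-hand side `B`, and all function names are assumptions.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def A_of(theta):
    """Hypothetical parameter-dependent matrix."""
    return [[2.0 + theta, 1.0], [1.0, 3.0]]


def dA_of(theta):
    """Entrywise derivative of A_of with respect to theta."""
    return [[1.0, 0.0], [0.0, 0.0]]


B = [1.0, 2.0]  # right-hand side, independent of theta in this sketch


def dx_dtheta(theta):
    """High-level derivative: solve A x' = -(dA/dtheta) x with a second call."""
    A = A_of(theta)
    x = solve2(A, B)
    dA = dA_of(theta)
    rhs = [-(dA[0][0] * x[0] + dA[0][1] * x[1]),
           -(dA[1][0] * x[0] + dA[1][1] * x[1])]
    return solve2(A, rhs)


def finite_difference(theta, h=1e-6):
    """Central-difference reference for the same derivative."""
    xp = solve2(A_of(theta + h), B)
    xm = solve2(A_of(theta - h), B)
    return [(p - m) / (2 * h) for p, m in zip(xp, xm)]


print(dx_dtheta(0.0))
print(finite_difference(0.0))
```

Differentiating through the solver would instead apply automatic differentiation to the arithmetic inside `solve2` itself; the sketch does not attempt that comparison.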
[105] arXiv:2404.17041 [pdf, other]: Title: Nuclei-Location Based Point Set Registration of Multi-Stained Whole Slide Images

Authors: Adith Jeyasangar, Abdullah Alsalemi, Shan E Ahmed Raza

Comments: 15 pages, 5 figures, Submitted to Medical Image Understanding and Analysis Conference 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Whole Slide Images (WSIs) provide exceptional detail for studying tissue architecture at the cell level. To study the tumour microenvironment (TME) in the context of various protein biomarkers and cell sub-types, analysis and registration of features using multi-stained WSIs is often required. Multi-stained WSI pairs normally suffer from rigid and non-rigid deformities in addition to slide artefacts and control tissue, which present challenges for precise registration. Traditional registration methods mainly focus on global rigid/non-rigid registration but struggle with aligning slides with complex tissue deformations at the nuclei level. However, nuclei-level non-rigid registration is essential for downstream tasks such as cell sub-type analysis in the context of protein biomarker signatures. This paper focuses on local-level non-rigid registration using a nuclei-location based point set registration approach for aligning multi-stained WSIs. We exploit the spatial distribution of nuclei that is prominent and consistent (to a large extent) across different stains to establish a spatial correspondence. We evaluate our approach using the HYRECO dataset consisting of 54 re-stained images of H&E and PHH3 image pairs. The approach can be extended to other IHC and IF stained WSIs provided that a good nuclei detection algorithm is available. The performance of the model is tested against established registration algorithms and is shown to outperform them for nuclei-level registration.
[106] arXiv:2404.17042 [pdf, other]: Title: Reassessing Relationality for Bipolar Data

Authors: Manuel Cuerno, Fernando Galaz-García, Sergio Galaz-García, Telmo Pérez-Izquierdo

Subjects: Social and Information Networks (cs.SI)

Methods for clustering people into construals--social affinity groups of individuals who share similarities in how they organize their outlooks on a collection of issues--have recently gained traction. Relational Class Analysis (RCA) is currently the most commonly used method for construal clustering. RCA has been applied to identify affinity groups in social spheres as varied as politics, musical preferences, and attitudes towards science. In this study, we highlight limitations in RCA's ability to accurately identify the number and underlying structure of construals. These limitations stem from RCA's mathematical underpinnings and its insensitivity to the bipolar structure of the survey items, which require respondents to place themselves in a support or rejection space and then express the intensity of their support or rejection. We develop an alternative method, which we call Bipolar Class Analysis (BCA), that aims to address this foundational limitation. BCA conceptualizes people's attitudinal positions as moving along support/rejection semispaces and assesses similarity in opinion organization by taking into account position switches across these semispaces. We conduct extensive simulation analyses, with data organized around different construals, to demonstrate that BCA clusters individuals more accurately than RCA and other available alternatives. We also replicate previous analyses to show that BCA leads to substantively different empirical results than those produced by RCA in its original and later versions, and by Correlational Clustering Analysis (CCA), a method that has been proposed as an alternative to RCA.
[107] arXiv:2404.17044 [pdf, other]: Title: A new Taxonomy for Automated Driving: Structuring Applications based on their Operational Design Domain, Level of Automation and Automation Readiness

Authors: Johannes Betz, Melina Lutwitzi, Steven Peters

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The aim of this paper is to investigate the relationship between operational design domains (ODD), automated driving SAE Levels, and Technology Readiness Level (TRL). The first highly automated vehicles, like robotaxis, are in commercial use, and the first vehicles with highway pilot systems have been delivered to private customers. It has emerged as a crucial issue that these automated driving systems differ significantly in their ODD and in their technical maturity. Consequently, any approach to compare these systems is difficult and requires a deep dive into defined ODDs, specifications, and technologies used. Therefore, this paper challenges current state-of-the-art taxonomies and develops a new and integrated taxonomy that can structure automated vehicle systems more efficiently. We use the well-known SAE Levels 0-5 as the "level of responsibility", and link and describe the ODD at an intermediate level of abstraction. Finally, a new maturity model is explicitly proposed to improve the comparability of automated vehicles and driving functions. This method is then used to analyze today's existing automated vehicle applications, which are structured into the new taxonomy and rated by the new maturity levels. Our results indicate that this new taxonomy and maturity level model will help to differentiate automated vehicle systems in discussions more clearly and to discover white fields more systematically and upfront, e.g. for research but also for regulatory purposes.
[108] arXiv:2404.17045 [pdf, other]: Title: Toward Automated Formation of Composite Micro-Structures Using Holographic Optical Tweezers

Authors: Tommy Zhang, Nicole Werner, Ashis G. Banerjee

Comments: To appear in the Proceedings of the 2024 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS)

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Holographic Optical Tweezers (HOT) are powerful tools that can manipulate micro and nano-scale objects with high accuracy and precision. They are most commonly used for biological applications, such as cellular studies, and more recently, micro-structure assemblies. Automation has been of significant interest in the HOT field, since human-run experiments are time-consuming and require skilled operator(s). Automated HOTs, however, commonly use point traps, which focus high intensity laser light at specific spots in fluid media to attract and move micro-objects. In this paper, we develop a novel automated system of tweezing multiple micro-objects more efficiently using multiplexed optical traps. Multiplexed traps enable the simultaneous trapping of multiple beads in various alternate multiplexing formations, such as annular rings and line patterns. Our automated system is realized by augmenting the capabilities of a commercially available HOT with real-time bead detection and tracking, and wavefront-based path planning. We demonstrate the usefulness of the system by assembling two different composite micro-structures, comprising 5 $\mu m$ polystyrene beads, using both annular and line shaped traps in obstacle-rich environments.
[109] arXiv:2404.17046 [pdf, other]: Title: Unraveling Code Clone Dynamics in Deep Learning Frameworks

Authors: Maram Assi, Safwat Hassan, Ying Zou

Comments: 37 pages

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Deep Learning (DL) frameworks play a critical role in advancing artificial intelligence, and their rapid growth underscores the need for a comprehensive understanding of software quality and maintainability. DL frameworks, like other systems, are prone to code clones. Code clones refer to identical or highly similar source code fragments within the same project or even across different projects. Code cloning can have positive and negative implications for software development, influencing maintenance, readability, and bug propagation. In this paper, we aim to address the knowledge gap concerning the evolutionary dimension of code clones in DL frameworks and the extent of code reuse across these frameworks. We empirically analyze code clones in nine popular DL frameworks, i.e., TensorFlow, Paddle, PyTorch, Aesara, Ray, MXNet, Keras, Jax and BentoML, to investigate (1) the characteristics of the long-term code cloning evolution over releases in each framework, (2) the short-term, i.e., within-release, code cloning patterns and their influence on the long-term trends, and (3) the file-level code clones within the DL frameworks. Our findings reveal that DL frameworks adopt four distinct cloning trends and that these trends present some common and distinct characteristics. For instance, bug-fixing activities persistently happen in clones irrespective of the clone evolutionary trend but occur more in the "Serpentine" trend. Moreover, the within release level investigation demonstrates that short-term code cloning practices impact long-term cloning trends. The cross-framework code clone investigation reveals the presence of functional and architectural adaptation file-level cross-framework code clones across the nine studied frameworks. We provide insights that foster robust clone practices and collaborative maintenance in the development of DL frameworks.
[110] arXiv:2404.17047 [pdf, other]: Title: Near to Mid-term Risks and Opportunities of Open Source Generative AI

Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H.S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

Subjects: Machine Learning (cs.LG)

In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.
[111] arXiv:2404.17048 [pdf, other]: Title: Transductive Spiking Graph Neural Networks for Loihi

Authors: Shay Snyder (1), Victoria Clerico (1, 2), Guojing Cong (3), Shruti Kulkarni (3), Catherine Schuman (4), Sumedh R. Risbud (5), Maryam Parsa (1) ((1) George Mason University, (2) Universidad Politecnica de Madrid, (3) Oak Ridge National Laboratory, (4) University of Tennessee - Knoxville, (5) Intel Labs)

Comments: 6 pages, 4 figures, 3 tables

Subjects: Emerging Technologies (cs.ET); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Graph neural networks have emerged as a specialized branch of deep learning, designed to address problems where pairwise relations between objects are crucial. Recent advancements utilize graph convolutional neural networks to extract features within graph structures. Despite promising results, these methods face challenges in real-world applications due to sparse features, resulting in inefficient resource utilization. Recent studies draw inspiration from the mammalian brain and employ spiking neural networks to model and learn graph structures. However, these approaches are limited to traditional Von Neumann-based computing systems, which still face hardware inefficiencies. In this study, we present a fully neuromorphic implementation of spiking graph neural networks designed for Loihi 2. We optimize network parameters using Lava Bayesian Optimization, a novel hyperparameter optimization system compatible with neuromorphic computing architectures. We showcase the performance benefits of combining neuromorphic Bayesian optimization with our approach for citation graph classification using fixed-precision spiking neurons. Our results demonstrate the capability of integer-precision, Loihi 2 compatible spiking neural networks in performing citation graph classification with comparable accuracy to existing floating point implementations.
[112] arXiv:2404.17051 [pdf, other]: Title: Toward Improving Binary Program Comprehension via Embodied Immersion: A Survey

Authors: Dennis Brown, Emily Mulder, Samuel Mulder

Comments: 27 pages, 4 figures, Submitted to ACM Computing Surveys

Subjects: Human-Computer Interaction (cs.HC)

Binary program comprehension (PC) is critical for many use cases but is difficult, suffering from compounded uncertainty and lack of full automation. We seek methods to improve the effectiveness of the human-machine joint cognitive system performing binary PC. We survey three research areas to perform an indirect cognitive task analysis: cognitive models of the PC process, related elements of cognitive theory, and applicable affordances of virtual reality (VR). Based on common elements in these areas, we identify three overarching themes: enhancing abductive iteration, augmenting working memory, and supporting information organization. These themes spotlight several affordances of VR to exploit in future studies of immersive tools for binary PC.
[113] arXiv:2404.17052 [pdf, other]: Title: Asynchronous Neuromorphic Optimization with Lava

Authors: Shay Snyder (1), Sumedh R. Risbud (2), Maryam Parsa (1) ((1) George Mason University, (2) Intel Labs)

Comments: 2 figures

Subjects: Emerging Technologies (cs.ET)

Performing optimization with event-based asynchronous neuromorphic systems presents significant challenges. Intel's neuromorphic computing framework, Lava, offers an abstract application programming interface designed for constructing event-based computational graphs. In this study, we introduce a novel framework tailored for asynchronous Bayesian optimization that is also compatible with Loihi 2. We showcase the capability of our asynchronous optimization framework by connecting it with a graph-based satellite scheduling problem running on physical Loihi 2 hardware.
[114] arXiv:2404.17053 [pdf, other]: Title: Agentive Permissions in Multiagent Systems

Authors: Qi Shi

Comments: The 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

This paper proposes to distinguish four forms of agentive permissions in multiagent settings. The main technical results are the complexity analysis of model checking, the semantic undefinability of modalities that capture these forms of permissions through each other, and a complete logical system capturing the interplay between these modalities.
[115] arXiv:2404.17059 [pdf, other]: Title: CyNetDiff -- A Python Library for Accelerated Implementation of Network Diffusion Models

Authors: Eliot W. Robson, Dhemath Reddy, Abhishek K. Umrawal

Comments: 4 pages, 3 figures, and 2 tables

Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these models requires a large number of simulations conducted on large graphs, a computationally expensive task suited for low-level languages. However, many researchers prefer the use of higher-level languages (such as Python) for their flexibility and shorter development times. Moreover, in many research tasks, these simulations are the most computationally intensive task, so it would be desirable to have a library for these with an interface to a high-level language with the performance of a low-level language. To fill this niche, we introduce CyNetDiff, a Python library with components written in Cython to provide improved performance for these computationally intensive diffusion tasks.
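To make the independent cascade model mentioned above concrete, here is a minimal pure-Python sketch of a single simulation and a Monte Carlo spread estimate. It does not use or reproduce the CyNetDiff API; the function names, the adjacency-dict graph format, and the single uniform edge probability `p` are assumptions chosen for illustration.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, p, rng=None):
    """Run one independent cascade simulation; return the set of activated nodes.

    graph: dict mapping each node to a list of out-neighbours (assumed format)
    seeds: iterable of initially active nodes
    p:     activation probability, applied uniformly to every edge (assumption)
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v in graph.get(u, []):
            # A newly active node gets exactly one chance to activate
            # each neighbour that is still inactive.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def expected_spread(graph, seeds, p, trials=1000, seed=0):
    """Estimate the expected number of activated nodes by repeated simulation."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(trials))
    return total / trials
```

The nested loop over many trials is the part that dominates running time on large graphs, which is the kind of workload the abstract describes as better suited to a compiled implementation.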
[116] arXiv:2404.17063 [pdf, other]: Title: WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users

Authors: William Huang, Sam Ghahremani, Siyou Pei, Yang Zhang

Comments: Published for ACM CHI 2024. For source files, see this https URL

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

Existing pose estimation models perform poorly on wheelchair users due to a lack of representation in training data. We present a data synthesis pipeline to address this disparity in data collection and subsequently improve pose estimation performance for wheelchair users. Our configurable pipeline generates synthetic data of wheelchair users using motion capture data and motion generation outputs simulated in the Unity game engine. We validated our pipeline by conducting a human evaluation, investigating perceived realism, diversity, and an AI performance evaluation on a set of synthetic datasets from our pipeline that synthesized different backgrounds, models, and postures. We found our generated datasets were perceived as realistic by human evaluators, had more diversity than existing image datasets, and had improved person detection and pose estimation performance when fine-tuned on existing pose estimation models. Through this work, we hope to create a foothold for future efforts in tackling the inclusiveness of AI in a data-centric and human-centric manner with the data synthesis techniques demonstrated in this work. Finally, for future works to extend upon, we open source all code in this research and provide a fully configurable Unity Environment used to generate our datasets. In the case of any models we are unable to share due to redistribution and licensing policies, we provide detailed instructions on how to source and replace said models.
[117] arXiv:2404.17065 [pdf, ps, other]: Title: DeLaM: A Dependent Layered Modal Type Theory for Meta-programming

Authors: Jason Z. S. Hu, Brigitte Pientka

Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

We scale layered modal type theory to dependent types, introducing DeLaM, dependent layered modal type theory. This type theory is novel in that we have one uniform type theory in which we can not only compose and execute code, but also intensionally analyze the code of types and terms. The latter in particular allows us to write tactics as meta-programs and use regular libraries when writing tactics. DeLaM provides a sound foundation for proof assistants to support a type-safe tactic mechanism.
[118] arXiv:2404.17068 [pdf, other]: Title: Complete Boolean Algebra for Memristive and Spintronic Asymmetric Basis Logic Functions

Authors: Vaibhav Vyas, Joseph S. Friedman

Comments: 8 pages, 5 figures

Subjects: Emerging Technologies (cs.ET)

The increasing advancement of emerging device technologies that provide alternative basis logic sets necessitates the exploration of innovative logic design automation methodologies. Specifically, emerging computing architectures based on the memristor and the bilayer avalanche spin-diode offer non-commutative or 'asymmetric' operations, namely the inverted-input AND (IAND) and implication, as basis logic gates. Existing logic design techniques inadequately leverage the unique characteristics of asymmetric logic functions, resulting in insufficiently optimized logic circuits. This paper presents a complete Boolean algebraic framework specifically tailored to asymmetric logic functions, introducing fundamental identities, theorems and canonical normal forms that lay the groundwork for efficient synthesis and minimization of such logic circuits without relying on conventional Boolean algebra. Further, this paper establishes a logical relationship between implication and IAND operations. A previously proposed modified Karnaugh map method based on a subset of the presented algebraic principles demonstrated a 28% reduction in computational steps for an algorithmically designed memristive full adder; the presently proposed algebraic framework lays the foundation for much greater future improvements.
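As a small illustration of why these gates are called asymmetric, the sketch below enumerates truth tables for IAND and implication. The input convention is an assumption (here the first input of IAND is the inverted one; the paper may use the opposite order), and the complement identity checked at the end is the standard De Morgan consequence, not necessarily the specific relationship the paper derives.

```python
from itertools import product


def iand(a: bool, b: bool) -> bool:
    # Inverted-input AND. Assumed convention: the FIRST input is inverted.
    return (not a) and b


def imp(a: bool, b: bool) -> bool:
    # Material implication a -> b.
    return (not a) or b


def is_commutative(gate) -> bool:
    """True if gate(a, b) == gate(b, a) for every Boolean input pair."""
    return all(gate(a, b) == gate(b, a) for a, b in product([False, True], repeat=2))


def iand_is_complement_of_reversed_imp() -> bool:
    """Check IAND(a, b) == NOT (b -> a) over the full truth table."""
    return all(iand(a, b) == (not imp(b, a)) for a, b in product([False, True], repeat=2))
```

Under this convention, `is_commutative(iand)` and `is_commutative(imp)` are both false, while the complement check holds for all four input pairs.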
[119] arXiv:2404.17069 [pdf, other]: Title: Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

The upper mid-band (FR3) has recently been attracting interest for the new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub-6 GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples, and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of GAN-based FR3 channel modeling. In our comparison, the model's output closely matches the ray-tracing simulated data.
[120] arXiv:2404.17070 [pdf, other]: Title: Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey

Authors: Lingfan Bao, Joseph Humphreys, Tianhu Peng, Chengxu Zhou

Comments: 14 pages, 4 figures

Subjects: Robotics (cs.RO)

Bipedal robots are garnering increasing global attention due to their potential applications and advancements in artificial intelligence, particularly in Deep Reinforcement Learning (DRL). While DRL has driven significant progress in bipedal locomotion, developing a comprehensive and unified framework capable of adeptly performing a wide range of tasks remains a challenge. This survey systematically categorizes, compares, and summarizes existing DRL frameworks for bipedal locomotion, organizing them into end-to-end and hierarchical control schemes. End-to-end frameworks are assessed based on their learning approaches, whereas hierarchical frameworks are dissected into layers that utilize either learning-based methods or traditional model-based approaches. This survey provides a detailed analysis of the composition, capabilities, strengths, and limitations of each framework type. Furthermore, we identify critical research gaps and propose future directions aimed at achieving a more integrated and efficient framework for bipedal locomotion, with potential broad applications in everyday life.
[121] arXiv:2404.17080 [pdf, other]: Title: Solving the Graph Burning Problem for Large Graphs

Authors: Felipe de Carvalho Pereira, Pedro Jussieu de Rezende, Tallys Yunes, Luiz Fernando Batista Morato

Comments: 10 pages, 1 figure and 2 tables

Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

We propose an exact algorithm for the Graph Burning Problem ($\texttt{GBP}$), an NP-hard optimization problem that models the spread of influence on social networks. Given a graph $G$ with vertex set $V$, the objective is to find a sequence of $k$ vertices in $V$, namely, $v_1, v_2, \dots, v_k$, such that $k$ is minimum and $\bigcup_{i = 1}^{k} \{u\! \in\! V\! : d(u, v_i) \leq k - i\} = V$, where $d(u,v)$ denotes the distance between $u$ and $v$. We formulate the problem as a set covering integer programming model and design a row generation algorithm for the $\texttt{GBP}$. Our method exploits the fact that a very small number of covering constraints is often sufficient for solving the integer model, allowing the corresponding rows to be generated on demand. To date, the most efficient exact algorithm for the $\texttt{GBP}$, denoted here by $\texttt{GDCA}$, is able to obtain optimal solutions for graphs with up to 14,000 vertices within two hours of execution. In comparison, our algorithm finds provably optimal solutions approximately 236 times faster, on average, than $\texttt{GDCA}$. For larger graphs, memory space becomes a limiting factor for $\texttt{GDCA}$. Our algorithm, however, solves real-world instances with almost 200,000 vertices in less than 35 seconds, increasing the size of graphs for which optimal solutions are known by a factor of 14.
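The covering condition in the abstract can be checked directly for a candidate sequence. The sketch below is only a feasibility checker built from breadth-first search, not the row generation algorithm the paper proposes; the adjacency-dict representation and function names are assumptions for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict of shortest-path distances (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_burning_sequence(adj, sequence):
    """Check whether v_1, ..., v_k burns the whole graph.

    The i-th vertex (1-indexed) covers every vertex u with d(u, v_i) <= k - i,
    and the sequence is valid when the union of these balls equals V.
    """
    k = len(sequence)
    burned = set()
    for i, v in enumerate(sequence, start=1):
        radius = k - i
        dist = bfs_distances(adj, v)
        burned.update(u for u, d in dist.items() if d <= radius)
    return burned == set(adj)
```

For example, on the path 0-1-2-3 the sequence `[1, 3]` is valid (vertex 1 covers 0, 1, 2 with radius 1 and vertex 3 covers itself), whereas `[0, 3]` leaves vertex 2 uncovered.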
[122] arXiv:2404.17092 [pdf, other]: Title: Defending Spiking Neural Networks against Adversarial Attacks through Image Purification

Authors: Weiran Chen, Qi Sun, Qi Xu

Comments: 8 pages, 5 figures, ECAI2024 under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Spiking Neural Networks (SNNs) aim to bridge the gap between neuroscience and machine learning by emulating the structure of the human nervous system. However, like convolutional neural networks, SNNs are vulnerable to adversarial attacks. To tackle the challenge, we propose a biologically inspired methodology to enhance the robustness of SNNs, drawing insights from the visual masking effect and filtering theory. First, an end-to-end SNN-based image purification model is proposed to defend against adversarial attacks, including a noise extraction network and a non-blind denoising network. The former network extracts noise features from noisy images, while the latter component employs a residual U-Net structure to reconstruct high-quality noisy images and generate clean images. Simultaneously, a multi-level firing SNN based on Squeeze-and-Excitation Network is introduced to improve the robustness of the classifier. Crucially, the proposed image purification network serves as a pre-processing module, avoiding modifications to classifiers. Unlike adversarial training, our method is highly flexible and can be seamlessly integrated with other defense strategies. Experimental results on various datasets demonstrate that the proposed methodology outperforms state-of-the-art baselines in terms of defense effectiveness, training time, and resource consumption.
[123] arXiv:2404.17094 [pdf, other]: Title: TIUP: Effective Processor Verification with Tautology-Induced Universal Properties

Authors: Yufeng Li, Yiwei Ci, Qiusong Yang

Comments: Accepted by ASP-DAC 2024, please note that this is not the final camera-ready version

Subjects: Logic in Computer Science (cs.LO); Hardware Architecture (cs.AR); Systems and Control (eess.SY)

Design verification is a complex and costly task, especially for large and intricate processor projects. Formal verification techniques provide advantages by thoroughly examining design behaviors, but they require extensive labor and expertise in property formulation. Recent research focuses on verifying designs using the self-consistency universal property, reducing verification difficulty as it is design-independent. However, the single self-consistency property faces false positives and scalability issues due to exponential state space growth. To tackle these challenges, this paper introduces TIUP, a technique using tautologies as universal properties. We show how TIUP effectively uses tautologies as abstract specifications, covering processor data and control paths. TIUP simplifies and streamlines verification for engineers, enabling efficient formal processor verification.
[124] arXiv:2404.17095 [pdf, other]: Title: The Web unpacked: a quantitative analysis of global Web usage

Authors: Henrique S. Xavier

Comments: 12 pages, 10 figures, 3 tables

Subjects: Computers and Society (cs.CY)

This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. Our analysis scrutinizes various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a diminutive number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.
[125] arXiv:2404.17097 [pdf, other]: Title: Rank-Preference Consistency as the Appropriate Metric for Recommender Systems

Authors: Tung Nguyen, Jeffrey Uhlmann

Subjects: Information Retrieval (cs.IR)

In this paper we argue that conventional unitary-invariant measures of recommender system (RS) performance based on measuring differences between predicted ratings and actual user ratings fail to assess fundamental RS properties. More specifically, posing the optimization problem as one of predicting exact user ratings provides only an indirect suboptimal approximation for what RS applications typically need, which is an ability to accurately predict user preferences. We argue that scalar measures such as RMSE and MAE with respect to differences between actual and predicted ratings are only proxies for measuring RS ability to accurately estimate user preferences. We propose what we consider to be a measure that is more fundamentally appropriate for assessing RS performance, rank-preference consistency, which simply counts the number of prediction pairs that are inconsistent with the user's expressed product preferences. For example, if an RS predicts the user will prefer product A over product B, but the user's withheld ratings indicate s/he prefers product B over A, then rank-preference consistency has been violated. Our test results conclusively demonstrate that methods tailored to optimize arbitrary measures such as RMSE are not generally effective at accurately predicting user preferences. Thus, we conclude that conventional methods used for assessing RS performance are arbitrary and misleading.
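The pair-counting idea described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' reference implementation: the function name is hypothetical, and the handling of ties (a pair is counted only when both the actual and predicted ratings differ and disagree in direction) is an assumption.

```python
from itertools import combinations


def rank_preference_violations(actual, predicted):
    """Count item pairs whose predicted order contradicts the user's actual order.

    actual, predicted: equal-length sequences of ratings for the same items.
    Assumption: pairs tied in either sequence are not counted as violations.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    violations = 0
    for i, j in combinations(range(len(actual)), 2):
        # A negative product means the two orderings point in opposite directions.
        if (actual[i] - actual[j]) * (predicted[i] - predicted[j]) < 0:
            violations += 1
    return violations
```

With withheld ratings `[5, 3, 4]` and predictions `[4.0, 4.5, 3.0]`, two of the three pairs are ordered the wrong way round, so the function returns 2.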
[126] arXiv:2404.17098 [pdf, other]: Title: CLARE: Cognitive Load Assessment in REaltime with Multimodal Data

Authors: Anubhav Bhatti, Prithila Angkan, Behnam Behinaein, Zunayed Mahmud, Dirk Rodenburg, Heather Braund, P.James Mclellan, Aaron Ruberto, Geoffery Harrison, Daryl Wilson, Adam Szulewski, Dan Howes, Ali Etemad, Paul Hungler

Comments: 12 pages, 10 figures, 6 tables

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

We present a novel multimodal dataset for Cognitive Load Assessment in REaltime (CLARE). The dataset contains physiological and gaze data from 24 participants with self-reported cognitive load scores as ground-truth labels. The dataset consists of four modalities, namely, Electrocardiography (ECG), Electrodermal Activity (EDA), Electroencephalogram (EEG), and Gaze tracking. To map diverse levels of mental load on participants during experiments, each participant completed four nine-minute sessions on a computer-based operator performance and mental workload task (the MATB-II software) with varying levels of complexity in one-minute segments. During the experiment, participants reported their cognitive load every 10 seconds. For the dataset, we also provide benchmark binary classification results with machine learning and deep learning models on two different evaluation schemes, namely, 10-fold and leave-one-subject-out (LOSO) cross-validation. Benchmark results show that for 10-fold evaluation, the convolutional neural network (CNN) based deep learning model achieves the best classification performance with ECG, EDA, and Gaze. In contrast, for LOSO, the best performance is achieved by the deep learning model with ECG, EDA, and EEG.
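The difference between the two evaluation schemes lies in how samples are split. A minimal sketch of leave-one-subject-out splitting is shown below; the function name and the flat list of per-sample subject IDs are assumptions for illustration, not part of the CLARE release.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: one entry per sample, giving the subject it came from.
    Every sample of the held-out subject goes to the test fold, so no
    subject contributes data to both training and testing in the same fold.
    """
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test
```

Unlike plain 10-fold splitting over samples, this scheme measures generalization to unseen participants, which may explain why the best-performing modality combination differs between the two schemes.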
[127] arXiv:2404.17099 [pdf, other]: Title: Unleashing the Potential of Fractional Calculus in Graph Neural Networks with FROND

Authors: Qiyu Kang, Kai Zhao, Qinxu Ding, Feng Ji, Xuhao Li, Wenfei Liang, Yang Song, Wee Peng Tay

Comments: The Twelfth International Conference on Learning Representations

Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We introduce the FRactional-Order graph Neural Dynamical network (FROND), a new continuous graph neural network (GNN) framework. Unlike traditional continuous GNNs that rely on integer-order differential equations, FROND employs the Caputo fractional derivative to leverage the non-local properties of fractional calculus. This approach enables the capture of long-term dependencies in feature updates, moving beyond the Markovian update mechanisms in conventional integer-order models and offering enhanced capabilities in graph representation learning. We offer an interpretation of the node feature updating process in FROND from a non-Markovian random walk perspective when the feature updating is particularly governed by a diffusion process. We demonstrate analytically that oversmoothing can be mitigated in this setting. Experimentally, we validate the FROND framework by comparing the fractional adaptations of various established integer-order continuous GNNs, demonstrating their consistently improved performance and underscoring the framework's potential as an effective extension to enhance traditional continuous GNNs. The code is available at \url{https://github.com/zknus/ICLR2024-FROND}.
[128] arXiv:2404.17100 [pdf, other]: Title: Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting

Authors: Yuanyuan Liu, Yuxuan Huang, Shuyang Liu, Yibing Zhan, Zijing Chen, Zhe Chen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In Video-based Facial Expression Recognition (V-FER), models are typically trained on closed-set datasets with a fixed number of known classes. However, these V-FER models cannot deal with unknown classes that are prevalent in real-world scenarios. In this paper, we introduce a challenging Open-set Video-based Facial Expression Recognition (OV-FER) task, aiming at identifying not only known classes but also new, unknown human facial expressions not encountered during training. While existing approaches address open-set recognition by leveraging large-scale vision-language models like CLIP to identify unseen classes, we argue that these methods may not adequately capture the nuanced and subtle human expression patterns required by the OV-FER task. To address this limitation, we propose a novel Human Expression-Sensitive Prompting (HESP) mechanism to significantly enhance CLIP's ability to model video-based facial expression details effectively, thereby presenting a new CLIP-based OV-FER approach. Our proposed HESP comprises three components: 1) a textual prompting module with learnable prompt representations to complement the original CLIP textual prompts and enhance the textual representations of both known and unknown emotions, 2) a visual prompting module that encodes temporal emotional information from video frames using expression-sensitive attention, equipping CLIP with a new visual modeling ability to extract emotion-rich information, 3) a delicately designed open-set multi-task learning scheme that facilitates prompt learning and encourages interactions between the textual and visual prompting modules. Extensive experiments conducted on four OV-FER task settings demonstrate that HESP can significantly boost CLIP's performance (a relative improvement of 17.93% on AUROC and 106.18% on OSCR) and outperform other state-of-the-art open-set video understanding methods by a large margin.
[129] arXiv:2404.17101 [pdf, other]: Title: PASGAL: Parallel And Scalable Graph Algorithm Library

Authors: Xiaojun Dong, Yan Gu, Yihan Sun, Letong Wang

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

In this paper, we introduce PASGAL (Parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graph sizes. One special focus of PASGAL is efficiency on large-diameter graphs, a common challenge for existing parallel graph processing systems: many of them can be even slower than the standard sequential algorithm on large-diameter graphs due to the lack of parallelism. Such performance degeneration is caused by the high overhead of scheduling and synchronizing threads when traversing the graph in breadth-first order.
The core technique PASGAL uses to achieve high parallelism is vertical granularity control (VGC), which hides synchronization overhead, combined with a careful redesign of parallel graph algorithms and data structures. In our experiments, we compare PASGAL with state-of-the-art parallel implementations on BFS, SCC, BCC, and SSSP. PASGAL achieves competitive performance on small-diameter graphs compared to the parallel baselines, and is significantly faster on large-diameter graphs.
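To make the large-diameter issue concrete, the following is a minimal sequential sketch of level-synchronous BFS that counts frontier rounds. It is not PASGAL code and does not implement VGC; the helper names `bfs_rounds`, `chain`, and `star` are hypothetical. In a parallel implementation each round would typically end with a global synchronization, so the round count is a rough proxy for synchronization overhead.

```python
def bfs_rounds(adj, source):
    """Level-synchronous BFS; returns (distances, number of frontier rounds)."""
    dist = {source: 0}
    frontier = [source]
    rounds = 0
    while frontier:
        next_frontier = []
        # In a parallel setting, this loop over the frontier is the parallel part;
        # the end of each round acts as a synchronization point.
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
        rounds += 1
    return dist, rounds


def chain(n):
    """Path graph 0 - 1 - ... - (n-1): large diameter."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star(n):
    """Star graph with center 0 and n-1 leaves: small diameter."""
    adj = {i: [0] for i in range(1, n)}
    adj[0] = list(range(1, n))
    return adj


_, chain_rounds = bfs_rounds(chain(1000), 0)
_, star_rounds = bfs_rounds(star(1000), 0)
print("chain rounds:", chain_rounds, "star rounds:", star_rounds)
```

Both graphs have the same number of vertices, but the chain needs one round per vertex with a frontier of size one, so there is almost no work to parallelize between synchronizations.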
[130] arXiv:2404.17102 [pdf, other]: Title: An explicit construction of optimized interpolation points on the 4-simplex

Authors: Trenton J. Gobel, David M. Williams

Subjects: Numerical Analysis (math.NA)

In this work, a family of symmetric interpolation points is generated on the four-dimensional simplex (i.e., the pentatope). These points are optimized in order to minimize the Lebesgue constant. The process of generating these points closely follows that outlined by Warburton in "An explicit construction of interpolation nodes on the simplex," Journal of Engineering Mathematics, 2006. There, Warburton generated optimal interpolation points on the triangle and tetrahedron by formulating explicit geometric warping and blending functions, and applying these functions to equidistant nodal distributions. The locations of the resulting points were Lebesgue-optimized. In our work, we extend this procedure to four dimensions, and construct interpolation points on the pentatope up to order ten. The Lebesgue constants of our nodal sets are calculated, and are shown to outperform those of equidistant nodal distributions.
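As a rough one-dimensional illustration of the quantity being minimized, the sketch below estimates the Lebesgue constant (the maximum over the domain of the sum of absolute Lagrange basis polynomials) on the interval [-1, 1]. This is an assumption-laden analogy, not the authors' construction: it works on a line segment rather than the pentatope, and it uses Chebyshev-Gauss-Lobatto nodes as a stand-in for optimized points rather than warp-and-blend nodes. All function names are hypothetical.

```python
import math


def lebesgue_constant(nodes, samples=2001):
    """Approximate max over [-1, 1] of sum_i |l_i(x)| on a uniform sample grid."""
    best = 0.0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        total = 0.0
        for i, xi in enumerate(nodes):
            # Lagrange basis polynomial l_i evaluated at x
            term = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += abs(term)
        best = max(best, total)
    return best


def equidistant(order):
    """order + 1 equally spaced nodes on [-1, 1]."""
    return [-1.0 + 2.0 * i / order for i in range(order + 1)]


def chebyshev_lobatto(order):
    """order + 1 Chebyshev-Gauss-Lobatto nodes on [-1, 1] (stand-in for optimized nodes)."""
    return [-math.cos(math.pi * i / order) for i in range(order + 1)]


order = 10
print("equidistant:      ", lebesgue_constant(equidistant(order)))
print("Chebyshev-Lobatto:", lebesgue_constant(chebyshev_lobatto(order)))
```

At order ten the equidistant nodes give a noticeably larger estimate than the clustered nodes, which is the same kind of comparison the abstract reports for the pentatope.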
[131] arXiv:2404.17104 [pdf, other]: Title: Don't Look at the Camera: Achieving Perceived Eye Contact

Authors: Alice Gao, Samyukta Jayakumar, Marcello Maniglia, Brian Curless, Ira Kemelmacher-Shlizerman, Aaron R. Seitz, Steven M. Seitz

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

We consider the question of how to best achieve the perception of eye contact when a person is captured by camera and then rendered on a 2D display. For single subjects photographed by a camera, conventional wisdom tells us that looking directly into the camera achieves eye contact. Through empirical user studies, we show that it is instead preferable to look just below the camera lens. We quantitatively assess where subjects should direct their gaze relative to a camera lens to optimize the perception that they are making eye contact.
[132] arXiv:2404.17105 [pdf, other]: Title: Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis

Authors: Shivangi Yadav, Arun Ross

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Biometric systems based on iris recognition are currently being used in border control applications and mobile devices. However, research in iris recognition is stymied by various factors such as limited datasets of bonafide irides and presentation attack instruments; restricted intra-class variations; and privacy concerns. Some of these issues can be mitigated by the use of synthetic iris data. In this paper, we present a comprehensive review of state-of-the-art GAN-based synthetic iris image generation techniques, evaluating their strengths and limitations in producing realistic and useful iris images that can be used for both training and testing iris recognition systems and presentation attack detectors. In this regard, we first survey the various methods that have been used for synthetic iris generation and specifically consider generators based on StyleGAN, RaSGAN, CIT-GAN, iWarpGAN, StarGAN, etc. We then analyze the images generated by these models for realism, uniqueness, and biometric utility. This comprehensive analysis highlights the pros and cons of various GANs in the context of developing robust iris matchers and presentation attack detectors.
[133] arXiv:2404.17110 [pdf, other]: Title: Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance the performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages - Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance of function-level and line-level SV prediction significantly declines in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT; whereas, ChatGPT showcases promising results, substantially enhancing predictive performance by up to 34.4% for the function level and up to 53.5% for the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction.
[134] arXiv:2404.17113 [pdf, other]: Title: MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate labels), current systems still cannot meet the demands of practical applications. Therefore, we plan to organize a series of challenges around emotion recognition to further promote the development of this field. Last year, we launched MER2023, focusing on three topics: multi-label learning, noise robustness, and semi-supervised learning. This year, we continue to organize MER2024. In addition to expanding the dataset size, we introduce a new track around open-vocabulary emotion recognition. The main consideration for this track is that existing datasets often fix the label space and use majority voting to enhance annotator consistency, but this process may limit the model's ability to describe subtle emotions. In this track, we encourage participants to generate any number of labels in any category, aiming to describe the character's emotional state as accurately as possible. Our baseline is based on MERTools and the code is available at: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024.
[135] arXiv:2404.17118 [pdf, ps, other]: Title: Localization of Pallets on Shelves Using Horizontal Plane Projection of a 360-degree Image

Authors: Yasuyo Kita, Yudai Fujieda, Ichiro Matsuda, Nobuyuki Kita

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

In this paper, we propose a method for calculating the three-dimensional (3D) position and orientation of a pallet placed on a shelf on the side of a forklift truck using a 360-degree camera. By using a 360-degree camera mounted on the forklift truck, it is possible to observe both the pallet at the side of the forklift and one several meters ahead. However, the pallet on the obtained image is observed with different distortion depending on its 3D position, so that it is difficult to extract the pallet from the image. To solve this problem, a method [1] has been proposed for detecting a pallet by projecting a 360-degree image on a vertical plane that coincides with the front of the shelf to calculate an image similar to the image seen from the front of the shelf. At the same time as the detection, the approximate position and orientation of the detected pallet can be obtained, but the accuracy is not sufficient for automatic control of the forklift truck. In this paper, we propose a method for accurately detecting the yaw angle, which is the angle of the front surface of the pallet in the horizontal plane, by projecting the 360-degree image on a horizontal plane including the boundary line of the front surface of the detected pallet. The position of the pallet is also determined by moving the vertical plane having the detected yaw angle back and forth, and finding the position at which the degree of coincidence between the projection image on the vertical plane and the actual size of the front surface of the pallet is maximized. Experiments using real images taken in a laboratory and an actual warehouse have confirmed that the proposed method can calculate the position and orientation of a pallet within a reasonable calculation time and with the accuracy necessary for inserting the fork into the hole in the front of the pallet.
[136] arXiv:2404.17120 [pdf, other]: Title: Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs

Authors: Valeriia Cherepanova, James Zou

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) exhibit excellent ability to understand human languages, but do they also understand their own language that appears gibberish to us? In this work we delve into this question, aiming to uncover the mechanisms underlying such behavior in LLMs. We employ the Greedy Coordinate Gradient optimizer to craft prompts that compel LLMs to generate coherent responses from seemingly nonsensical inputs. We call these inputs LM Babel, and this work systematically studies the behavior of LLMs manipulated by these prompts. We find that the manipulation efficiency depends on the target text's length and perplexity, with the Babel prompts often located in lower loss minima compared to natural prompts. We further examine the structure of the Babel prompts and evaluate their robustness. Notably, we find that guiding the model to generate harmful texts is not more difficult than guiding it to generate benign texts, suggesting a lack of alignment for out-of-distribution prompts.
[137] arXiv:2404.17122 [pdf, other]: Title: 2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion

Authors: Dongsheng Wang, Xiaoqin Feng, Zeming Liu, Chuan Wang

Comments: 20 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Named entity recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying entities in sentences into pre-defined types. It plays a crucial role in various research fields, including entity linking, question answering, and online product recommendation. Recent studies have shown that incorporating multilingual and multimodal datasets can enhance the effectiveness of NER. This is due to language transfer learning and the presence of shared implicit features across different modalities. However, the lack of a dataset that combines multilingualism and multimodality has hindered research exploring the combination of these two aspects, as multimodality can help NER in multiple languages simultaneously. In this paper, we aim to address a more challenging task: multilingual and multimodal named entity recognition (MMNER), considering its potential value and influence. Specifically, we construct a large-scale MMNER dataset with four languages (English, French, German and Spanish) and two modalities (text and image). To tackle this challenging MMNER task on the dataset, we introduce a new model called 2M-NER, which aligns the text and image representations using contrastive learning and integrates a multimodal collaboration module to effectively depict the interactions between the two modalities. Extensive experimental results demonstrate that our model achieves the highest F1 score in multilingual and multimodal NER tasks compared to representative baselines. Additionally, in a challenging analysis, we discovered that sentence-level alignment interferes substantially with NER models, indicating the higher level of difficulty in our dataset.
[138] arXiv:2404.17123 [pdf, ps, other]: Title: Text Sentiment Analysis and Classification Based on Bidirectional Gated Recurrent Units (GRUs) Model

Authors: Wei Xu, Jianlong Chen, Zhicheng Ding, Jinyin Wang

Subjects: Computation and Language (cs.CL)

This paper explores the importance of text sentiment analysis and classification in the field of natural language processing, and proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model. The study first analyses the word cloud model of the text with six sentiment labels, and then carries out data preprocessing, including the steps of removing special symbols, punctuation marks, numbers, stop words and non-alphabetic parts. Subsequently, the data set is divided into a training set and a test set. Through model training and testing, it is found that the accuracy on the validation set increases from 85% to 93% with training, an increase of 8 percentage points; at the same time, the loss value on the validation set decreases from 0.7 to 0.1 and tends to be stable, and the model's predictions gradually approach the actual values, so that it can effectively classify the text emotions. The confusion matrix shows that the accuracy of the model on the test set reaches 94.8%, the precision is 95.9%, the recall is 99.1%, and the F1 score is 97.4%, which indicates that the model has good generalisation ability and classification effect. Overall, the study demonstrates an effective method for text sentiment analysis and classification with satisfactory results.
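As a quick sanity check on the reported metrics, the sketch below recomputes the F1 score from the stated precision and recall using the standard harmonic-mean formula, and the validation-accuracy gain in percentage points. The function name `f1_score` is a local helper written for this check, not a library call; the input values are the rounded figures quoted in the abstract, so small discrepancies in the last digit are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


precision = 0.959  # reported test-set precision
recall = 0.991     # reported test-set recall
f1 = f1_score(precision, recall)

# Validation accuracy is reported to rise from 85% to 93%.
gain_points = 93 - 85

print(f"F1 from rounded precision/recall: {f1:.4f}")
print(f"Validation accuracy gain: {gain_points} percentage points")
```

The recomputed value lands close to the reported 97.4%; the remaining difference is within what rounding of the inputs could explain.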
[139] arXiv:2404.17125 [pdf, other]: Title: Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation

Authors: Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan

Journal-ref: 2020 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

In this paper, we present Misaka, a visualized swarm testbed for smart grid algorithm evaluation and an extendable open-source, open-hardware platform for developing tabletop tangible swarm interfaces. The platform consists of a collection of custom-designed robots with three omnidirectional wheels, each 10 cm in diameter; high-accuracy localization through a microdot pattern overlaid on top of the activity sheets; and a software framework for application development and control, while remaining affordable (per-unit cost of about 30 USD at the prototype stage). We illustrate the potential of tabletop swarm user interfaces through a set of smart grid algorithm application scenarios developed with Misaka.
[140] arXiv:2404.17126 [pdf, other]: Title: Deep Evidential Learning for Dose Prediction

Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

Comments: 24 pages, 8 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable to or stronger than those for Monte-Carlo Dropout and Deep Ensemble methods, (ii) the median error varied with uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning relative to these other two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors, and (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach that can endow deep-learning models in radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how we can use such a model to construct the predicted Dose-Volume-Histograms' confidence intervals.
[141] arXiv:2404.17129 [pdf, other]: Title: Process Mining Embeddings: Learning Vector Representations for Petri Nets

Authors: Juan G. Colonna, Ahmed A. Fares, Márcio Duarte, Ricardo Sousa

Subjects: Artificial Intelligence (cs.AI)

Process mining offers powerful techniques for discovering, analyzing, and enhancing real-world business processes. In this context, Petri nets provide an expressive means of modeling process behavior. However, directly analyzing and comparing intricate Petri nets presents challenges. This study introduces PetriNet2Vec, a novel unsupervised methodology based on Natural Language Processing concepts inspired by Doc2Vec and designed to facilitate the effective comparison, clustering, and classification of process models represented as embedding vectors. These embedding vectors allow us to quantify similarities and relationships between different process models. Our methodology was experimentally validated using the PDC Dataset, featuring 96 diverse Petri net models. We performed cluster analysis, created UMAP visualizations, and trained a decision tree to provide compelling evidence for the capability of PetriNet2Vec to discern meaningful patterns and relationships among process models and their constituent tasks. Through a series of experiments, we demonstrated that PetriNet2Vec was capable of learning the structure of Petri nets, as well as the main properties used to simulate the process models of our dataset. Furthermore, our results showcase the utility of the learned embeddings in two crucial downstream tasks within process mining enhancement: process classification and process retrieval.
[142] arXiv:2404.17136 [pdf, other]: Title: Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

Authors: Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations for a grounded table, enabling users to gain insights from vast amounts of data. Recently, many deep learning-based approaches have been developed for NL2Vis. Despite the considerable efforts made by these approaches, challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Taking inspiration from the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential in generating visualizations, and explores the effectiveness of in-context learning prompts for enhancing this task. In particular, we first explore ways of transforming structured tabular data into sequential text prompts so as to feed them into LLMs, and analyze which table content contributes most to NL2Vis. Our findings suggest that transforming structured tabular data into programs is effective, and it is essential to consider the table schema when formulating prompts. Furthermore, we evaluate two types of LLMs, fine-tuned models (e.g., T5-Small) and inference-only models (e.g., GPT-3.5), against state-of-the-art methods, using the NL2Vis benchmarks (i.e., nvBench). The experimental results reveal that LLMs outperform baselines, with inference-only models consistently exhibiting performance improvements, at times even surpassing fine-tuned models when provided with certain few-shot demonstrations through in-context learning. Finally, we analyze when the LLMs fail in NL2Vis, and propose to iteratively update the results using strategies such as chain-of-thought, role-playing, and code-interpreter. The experimental results confirm the efficacy of iterative updates and hold great potential for future study.
[143] arXiv:2404.17140 [pdf, other]: Title: Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Authors: Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang

Subjects: Computation and Language (cs.CL)

Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether smaller (<= 13B) language models (LMs) have the ability to self-correct on reasoning tasks with minimal inputs from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing its incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier, though limitations are identified when using a weak self-verifier for determining when to correct.
[144] arXiv:2404.17143 [pdf, other]: Title: Quantifying Memorization of Domain-Specific Pre-trained Language Models using Japanese Newspaper and Paywalls

Authors: Shotaro Ishihara

Comments: TrustNLP: Fourth Workshop on Trustworthy Natural Language Processing (Non-Archival)

Subjects: Computation and Language (cs.CL)

Dominant pre-trained language models (PLMs) have been successful in high-quality natural language generation. However, the analysis of their generation is not mature: do they acquire generalizable linguistic abstractions, or do they simply memorize and recover substrings of the training data? In particular, few studies focus on domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models using a limited corpus of Japanese newspaper articles and quantified memorization of training data by comparing them with general Japanese GPT-2 models. Our experiments revealed that domain-specific PLMs sometimes "copy and paste" on a large scale. Furthermore, we replicated the empirical finding that memorization is related to duplication, model size, and prompt length, in Japanese just as in previous English studies. Our evaluations are relieved from data contamination concerns by focusing on newspaper paywalls, which prevent their use as training data. We hope that our paper encourages a sound discussion of issues such as the security and copyright of PLMs.
[145] arXiv:2404.17144 [pdf, ps, other]: Title: Sensor Response-Time Reduction using Long-Short Term Memory Network Forecasting

Authors: Simon J. Ward, Muhamed Baljevic, Sharon M. Weiss

Comments: 9 pages, 3 figures

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

The response time of a biosensor is a crucial metric in safety-critical applications such as medical diagnostics where an earlier diagnosis can markedly improve patient outcomes. However, the speed at which a biosensor reaches a final equilibrium state can be limited by poor mass transport and long molecular diffusion times that increase the time it takes target molecules to reach the active sensing region of a biosensor. While optimization of system and sensor design can promote molecules reaching the sensing element faster, a simpler and complementary approach for response time reduction that is widely applicable across all sensor platforms is to use time-series forecasting to predict the ultimate steady-state sensor response. In this work, we show that ensembles of long short-term memory (LSTM) networks can accurately predict equilibrium biosensor response from a small quantity of initial time-dependent biosensor measurements, allowing for significant reduction in response time by a mean and median factor of improvement of 18.6 and 5.1, respectively. The ensemble of models also provides simultaneous estimation of uncertainty, which is vital to provide confidence in the predictions and subsequent safety-related decisions that are made. This approach is demonstrated on real-time experimental data collected by exposing porous silicon biosensors to buffered protein solutions using a multi-channel fluidic cell that enables the automated measurement of 100 porous silicon biosensors in parallel. The dramatic improvement in sensor response time achieved using LSTM network ensembles and associated uncertainty quantification opens the door to trustworthy and faster responding biosensors, enabling more rapid medical diagnostics for improved patient outcomes and healthcare access, as well as quicker identification of toxins in food and the environment.
[146] arXiv:2404.17147 [pdf, other]: Title: On the Federated Learning Framework for Cooperative Perception

Authors: Zhenrong Zhang, Jianan Liu, Xi Zhou, Tao Huang, Qing-Long Han, Jingxin Liu, Hongbin Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cooperative perception (CP) is essential to enhance the efficiency and safety of future transportation systems, requiring extensive data sharing among vehicles on the road, which raises significant privacy concerns. Federated learning offers a promising solution by enabling data privacy-preserving collaborative enhancements in perception, decision-making, and planning among connected and autonomous vehicles (CAVs). However, federated learning is impeded by significant challenges arising from data heterogeneity across diverse clients, potentially diminishing model accuracy and prolonging convergence periods. This study introduces a specialized federated learning framework for CP, termed the federated dynamic weighted aggregation (FedDWA) algorithm, facilitated by a dynamic adjusting loss (DALoss) function. This framework employs dynamic client weighting to direct model convergence and integrates a novel loss function that utilizes Kullback-Leibler divergence (KLD) to counteract the detrimental effects of non-independent and identically distributed (non-IID) and unbalanced data. Utilizing the BEV transformer as the primary model, our rigorous testing on the OpenV2V dataset, augmented with FedBEVT data, demonstrates significant improvements in the average intersection over union (IoU). These results highlight the substantial potential of our federated learning framework to address data heterogeneity challenges in CP, thereby enhancing the accuracy of environmental perception models and facilitating more robust and efficient collaborative learning solutions in the transportation sector.
[147] arXiv:2404.17148 [pdf, other]: Title: Direct Regression of Distortion Field from a Single Fingerprint Image

Authors: Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou

Journal-ref: 2022 IEEE International Joint Conference on Biometrics (IJCB), Abu Dhabi, United Arab Emirates, 2022, pp. 1-8

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Skin distortion is a long-standing challenge in fingerprint matching, which causes false non-matches. Previous studies have shown that the recognition rate can be improved by estimating the distortion field from a distorted fingerprint and then rectifying it into a normal fingerprint. However, existing rectification methods are based on a principal component representation of distortion fields, which is not accurate and is very sensitive to finger pose. In this paper, we propose a rectification method where a self-reference based network is utilized to directly estimate the dense distortion field of a distorted fingerprint instead of its low-dimensional representation. This method can output accurate distortion fields of distorted fingerprints with various finger poses. Considering the limited number and variety of distorted fingerprints in the existing public dataset, we collected more distorted fingerprints with diverse finger poses and distortion patterns as a new database. Experimental results demonstrate that our proposed method achieves state-of-the-art rectification performance in terms of distortion field estimation and rectified fingerprint matching.
[148] arXiv:2404.17149 [pdf, other]: Title: Pose-Specific 3D Fingerprint Unfolding

Authors: Xiongjun Guan, Jianjiang Feng, Jie Zhou

Journal-ref: 15th Chinese Conference on Biometric Recognition (CCBR), Shanghai, China, 2021, pp. 185-194

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In order to make 3D fingerprints compatible with traditional 2D flat fingerprints, a common practice is to unfold the 3D fingerprint into a 2D rolled fingerprint, which is then matched with the flat fingerprints by traditional 2D fingerprint recognition algorithms. The problem with this method is that there may be large elastic deformation between the unfolded rolled fingerprint and flat fingerprint, which affects the recognition rate. In this paper, we propose a pose-specific 3D fingerprint unfolding algorithm to unfold the 3D fingerprint using the same pose as the flat fingerprint. Our experiments show that the proposed unfolding algorithm improves the compatibility between 3D fingerprint and flat fingerprint and thus leads to higher genuine matching scores.
[149] arXiv:2404.17151 [pdf, other]: Title: MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

Authors: Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Comments: Accepted by Transaction on Multimedia

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a novel approach, named "MorphText", to capture the regularity of texts by embedding deep morphology for arbitrary-shape text detection. Towards this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that our proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches.
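The DMOP and DMCL modules are named after classical morphological opening and closing. As a rough illustration of those classical operators only (not the paper's learned modules, whose design the abstract does not specify), the following sketch applies them to a 1D binary mask. The function names, the window size `k`, and the toy mask are assumptions made for this example.

```python
def erode(bits, k=3):
    """Keep a 1 only where the whole centered length-k window is 1.

    Positions outside the mask are treated as 0 (an assumption of this sketch).
    """
    r = k // 2
    n = len(bits)
    return [
        int(all(0 <= j < n and bits[j] for j in range(i - r, i + r + 1)))
        for i in range(n)
    ]


def dilate(bits, k=3):
    """Set a 1 wherever any element of the centered length-k window is 1."""
    r = k // 2
    n = len(bits)
    return [
        int(any(bits[j] for j in range(max(0, i - r), min(n, i + r + 1))))
        for i in range(n)
    ]


def opening(bits, k=3):
    """Erosion followed by dilation: removes runs shorter than k."""
    return dilate(erode(bits, k), k)


def closing(bits, k=3):
    """Dilation followed by erosion: fills gaps shorter than k."""
    return erode(dilate(bits, k), k)


# Toy mask: one isolated false detection (index 2) and two segments
# separated by a one-element gap (index 8).
mask = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0]
opened = opening(mask)   # the isolated 1 is removed
linked = closing(opened) # the gap between the two segments is bridged
```

In this toy setting, opening plays the role of discarding spurious segments and closing plays the role of connecting neighboring segments, which mirrors the two problems the abstract lists.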
[150] arXiv:2404.17152 [pdf, other]: Title: CSCO: Connectivity Search of Convolutional Operators

Authors: Tunhou Zhang, Shiyu Li, Hsin-Pai Cheng, Feng Yan, Hai Li, Yiran Chen

Comments: To appear on Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Exploring dense connectivity of convolutional operators establishes critical "synapses" to communicate feature vectors from different levels and enriches the set of transformations on Computer Vision applications. Yet, even with heavy-machinery approaches such as Neural Architecture Search (NAS), discovering effective connectivity patterns requires tremendous efforts due to either constrained connectivity design space or a sub-optimal exploration process induced by an unconstrained search space. In this paper, we propose CSCO, a novel paradigm that fabricates effective connectivity of convolutional operators with minimal utilization of existing design motifs and further utilizes the discovered wiring to construct high-performing ConvNets. CSCO guides the exploration via a neural predictor as a surrogate of the ground-truth performance. We introduce Graph Isomorphism as data augmentation to improve sample efficiency and propose a Metropolis-Hastings Evolutionary Search (MH-ES) to evade locally optimal architectures and advance search quality. Results on ImageNet show ~0.6% performance improvement over hand-crafted and NAS-crafted dense connectivity. Our code is publicly available.
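The abstract does not describe how MH-ES works internally. As a hedged sketch of the general idea of a Metropolis-Hastings style acceptance rule guided by a surrogate predictor, the code below uses a single-candidate chain rather than a population, with a symmetric mutation proposal. The names `mh_accept`, `search`, `mutate`, `predictor`, and the `temperature` parameter are all hypothetical and are not taken from CSCO.

```python
import math
import random


def mh_accept(parent_score, child_score, temperature, rng):
    """Metropolis-style rule: always accept improvements, and accept a
    worse candidate with probability exp((child - parent) / temperature)."""
    if child_score >= parent_score:
        return True
    return rng.random() < math.exp((child_score - parent_score) / temperature)


def search(init, mutate, predictor, steps, temperature=0.05, seed=0):
    """Single-chain search over candidates scored by a surrogate predictor.

    `mutate(candidate, rng)` proposes a neighbor; `predictor(candidate)`
    stands in for a learned performance predictor (hypothetical here).
    """
    rng = random.Random(seed)
    current, best = init, init
    for _ in range(steps):
        candidate = mutate(current, rng)
        if mh_accept(predictor(current), predictor(candidate), temperature, rng):
            current = candidate
        if predictor(current) > predictor(best):
            best = current
    return best
```

Occasionally accepting a lower-scoring candidate is what lets such a chain leave a locally optimal architecture, which is the motivation the abstract gives for MH-ES.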
[151] arXiv:2404.17153 [pdf, other]: Title: A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

Authors: Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu

Subjects: Software Engineering (cs.SE)

Tremendous efforts have been devoted to automating software debugging, a time-consuming process involving fault localization and repair generation. Recently, Large Language Models (LLMs) have shown great potential in automated debugging. However, we identified three challenges posed to traditional and LLM-based debugging tools: 1) the upstream imperfection of fault localization affects the downstream repair, 2) the deficiency in handling complex logic errors, and 3) the ignorance of program contexts. In this context, we propose the first automated, unified debugging framework, FixAgent, via LLM agent synergy. FixAgent can perform end-to-end localization, repair, and analysis of bugs. Our insight is that LLMs can benefit from general software engineering principles recognized by human developers in debugging, such as rubber duck debugging, enabling a better understanding of program functionality and logic bugs. Hence, we create three designs inspired by rubber ducking to address these challenges. They are agent specialization and synergy, key variable tracking, and program context comprehension, which request LLMs to provide explicit explanations and force them to focus on crucial program logic information. Experiments on the widely used dataset QuixBugs show that FixAgent correctly fixes 79 out of 80 bugs, 9 of which have never been fixed. It also plausibly patches 1.9X more defects than the best-performing repair tool on CodeFlaws, even with no bug location information and fewer than 0.6% sampling times. On average, FixAgent yields about 20% more plausible and correct fixes than its base model across different LLMs, showing the effectiveness of our designs. Moreover, the correctness rate of FixAgent reaches a remarkable 97.26%, indicating that FixAgent can potentially overcome the overfitting issue of the existing approaches.
[152] arXiv:2404.17157 [pdf, other]: Title: Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation

Authors: Nanxu Gong, Wangyang Ying, Dongjie Wang, Yanjie Fu

Subjects: Machine Learning (cs.LG)

Feature selection aims to identify the optimal feature subset for enhancing downstream models. Effective feature selection can remove redundant features, save computational resources, accelerate the model learning process, and improve the model's overall performance. However, existing works are often time-intensive when identifying an effective feature subset within high-dimensional feature spaces. Meanwhile, these methods mainly utilize a single downstream task performance as the selection criterion, leading to selected subsets that are not only redundant but also lack generalizability. To bridge these gaps, we reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework aimed at identifying short and effective feature subsets. More specifically, we found that feature ID tokens of the selected subset can be formulated as symbols to reflect the intricate correlations among features. Thus, in this framework, we first create a data collector to automatically collect numerous feature selection samples consisting of feature ID tokens, model performance, and the measurement of feature subset redundancy. Building on the collected data, an encoder-decoder-evaluator learning paradigm is developed to preserve the intelligence of feature selection into a continuous embedding space for efficient search. Within the learned embedding space, we leverage a multi-gradient search algorithm to find more robust and generalized embeddings with the objective of improving model performance and reducing feature subset redundancy. These embeddings are then utilized to reconstruct the feature ID tokens for executing the final feature selection. Ultimately, comprehensive experiments and case studies are conducted to validate the effectiveness of the proposed framework.
[153] arXiv:2404.17158 [pdf, ps, other]: Title: Online $\mathrm{L}^{\natural}$-Convex Minimization

Authors: Ken Yokoyama, Shinji Ito, Tatsuya Matsuoka, Kei Kimura, Makoto Yokoo

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

An online decision-making problem is a learning problem in which a player repeatedly makes decisions in order to minimize the long-term loss. Such problems arising in applications often have nonlinear combinatorial objective functions, and developing algorithms for them has attracted considerable attention. An existing general framework for dealing with such objective functions is online submodular minimization. However, practical problems are often out of the scope of this framework, since the domain of a submodular function is limited to a subset of the unit hypercube. To address this limitation of the existing framework, in this paper we introduce online $\mathrm{L}^{\natural}$-convex minimization, where an $\mathrm{L}^{\natural}$-convex function generalizes a submodular function so that the domain is a subset of the integer lattice. We propose computationally efficient algorithms for online $\mathrm{L}^{\natural}$-convex function minimization in two major settings: the full information and the bandit settings. We analyze the regrets of these algorithms and show in particular that our algorithm for the full information setting obtains a tight regret bound up to a constant factor. We also demonstrate several motivating examples that illustrate the usefulness of online $\mathrm{L}^{\natural}$-convex minimization.
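For readers unfamiliar with the terms, the following block recalls the standard characterization of $\mathrm{L}^{\natural}$-convexity by discrete midpoint convexity and the usual notion of regret. The symbols $f_t$, $x_t$, $K$, and $T$ are notation assumed here for illustration and may differ from the paper's.

```latex
% Discrete midpoint convexity (rounding is componentwise):
% f : \mathbb{Z}^n \to \mathbb{R} \cup \{+\infty\} is L^\natural-convex iff
f(p) + f(q) \;\ge\;
  f\!\left(\left\lceil \tfrac{p+q}{2} \right\rceil\right)
  + f\!\left(\left\lfloor \tfrac{p+q}{2} \right\rfloor\right)
  \qquad \text{for all } p, q \in \mathbb{Z}^n .

% Restricted to p, q \in \{0,1\}^n, the rounded midpoints are the
% componentwise max and min, which recovers submodularity:
f(p) + f(q) \;\ge\; f(p \vee q) + f(p \wedge q).

% Regret of decisions x_1, \dots, x_T \in K against the best fixed decision:
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x).
```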
[154] arXiv:2404.17159 [pdf, other]: Title: Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration

Authors: Xiongjun Guan, Jianjiang Feng, Jie Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Fingerprint dense registration aims to finely align fingerprint pairs at the pixel level, thereby reducing intra-class differences caused by distortion. Unfortunately, traditional methods exhibit subpar performance when dealing with low-quality fingerprints and suffer from slow inference speed. Although deep learning based approaches show significant improvement in these aspects, their registration accuracy is still unsatisfactory. In this paper, we propose a Phase-aggregated Dual-branch Registration Network (PDRNet) to aggregate the advantages of both types of methods. A dual-branch structure with multi-stage interactions is introduced between correlation information at high resolution and texture features at low resolution, to perceive local fine differences while ensuring global stability. Extensive experiments are conducted on more comprehensive databases compared to previous works. Experimental results demonstrate that our method reaches state-of-the-art registration performance in terms of accuracy and robustness, while maintaining considerable competitiveness in efficiency.
[155] arXiv:2404.17161 [pdf, other]: Title: An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu

Comments: arXiv admin note: text overlap with arXiv:2311.14957

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in the Short-Time Fourier Transform (STFT), which has a constant Time-Frequency (TF) resolution, linearly scaled center frequencies, and a fixed decomposition basis, making it incompatible with signals like singing voices that require dynamic attention for different frequency bands and different time intervals. Motivated by that, we propose a Multi-Scale Sub-Band Constant-Q Transform CQT (MS-SB-CQT) discriminator and a Multi-Scale Temporal-Compressed Continuous Wavelet Transform CWT (MS-TC-CWT) discriminator. Both CQT and CWT have a dynamic TF resolution for different frequency bands. In contrast, CQT has a better modeling ability in pitch information, and CWT has a better modeling ability in short-time transients. Experiments conducted on both speech and singing voices confirm the effectiveness of our proposed discriminators. Moreover, the STFT, CQT, and CWT-based discriminators can be used jointly for better performance. The proposed discriminators can boost the synthesis quality of various state-of-the-art GAN-based vocoders, including HiFi-GAN, BigVGAN, and APNet.
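To make the contrast between "linearly scaled center frequencies" and a constant-Q layout concrete, the sketch below computes both sets of bin frequencies from their textbook definitions. It is not the discriminators' implementation; the function names and the parameter values in the example are assumptions chosen for illustration.

```python
def stft_center_frequencies(sample_rate, n_fft):
    """STFT bins are linearly spaced: f_k = k * sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """CQT bins are geometrically spaced: f_k = f_min * 2 ** (k / bins_per_octave).

    The ratio of center frequency to bandwidth (the "Q") is therefore the
    same for every bin, so low bins are narrow and high bins are wide.
    """
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


# Example values (hypothetical): an 8-point FFT at 16 kHz, and a CQT with
# one bin per octave starting at 100 Hz.
stft_bins = stft_center_frequencies(16000, 8)
cqt_bins = cqt_center_frequencies(100.0, 3, 1)
```

With geometric spacing, each octave receives the same number of bins, which is why a constant-Q layout lines up naturally with musical pitch.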
[156] arXiv:2404.17163 [pdf, ps, other]: Title: Intractability results for integration in tensor product spaces

Authors: Erich Novak, Friedrich Pillichshammer

Subjects: Numerical Analysis (math.NA)

We study lower bounds on the worst-case error of numerical integration in tensor product spaces. As reference we use the $N$-th minimal error of linear rules that use $N$ function values. The information complexity is the minimal number $N$ of function evaluations that is necessary such that the $N$-th minimal error is less than a factor $\varepsilon$ times the initial error. We are interested in the extent to which the information complexity depends on the number $d$ of variables of the integrands. If the information complexity grows exponentially fast in $d$, then the integration problem is said to suffer from the curse of dimensionality.
Under the assumption of the existence of a worst-case function for the uni-variate problem we present two methods for providing good lower bounds on the information complexity. The first method is based on a suitable decomposition of the worst-case function. This method can be seen as a generalization of the method of decomposable reproducing kernels, which is often successfully applied when integration in Hilbert spaces with a reproducing kernel is studied. The second method, although only applicable for positive quadrature rules, has the advantage that it does not require a suitable decomposition of the worst-case function. Rather, it is based on a spline approximation of the worst-case function and can be used for analytic functions.
The methods presented can be applied to problems beyond the Hilbert space setting. For demonstration purposes we apply them to several examples, notably to uniform integration over the unit-cube, weighted integration over the whole space, and integration of infinitely smooth functions over the cube. Some of these results have interesting consequences in discrepancy theory.
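The verbal definitions in this abstract can be written compactly as follows. The symbols $e(N,d)$ for the $N$-th minimal worst-case error in dimension $d$, and $C$, $\tau$, $\varepsilon_0$ for the constants, are notation assumed here; the paper may use different symbols or a strict inequality.

```latex
% Information complexity: fewest function values needed to reduce the
% initial error e(0,d) by a factor \varepsilon \in (0,1).
N(\varepsilon, d) \;=\; \min\bigl\{\, N \in \mathbb{N}_0 \;:\; e(N,d) \le \varepsilon \, e(0,d) \,\bigr\}.

% Curse of dimensionality: exponential growth in d, i.e. there exist
% C > 0, \tau > 0 and \varepsilon_0 \in (0,1) such that
N(\varepsilon, d) \;\ge\; C \,(1+\tau)^{d}
  \qquad \text{for all } \varepsilon \le \varepsilon_0 \text{ and infinitely many } d \in \mathbb{N}.
```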
[157] arXiv:2404.17164 [pdf, other]: Title: DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

Authors: Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

Comments: 9 pages

Subjects: Machine Learning (cs.LG)

Missing data imputation poses a paramount challenge when dealing with graph data. Prior works typically rely on feature propagation or graph autoencoders to address this issue. However, these methods usually encounter the over-smoothing issue when dealing with missing data, as the graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a novel framework, called Dual-Path Generative Adversarial Network (DPGAN), that can deal simultaneously with missing data and avoid over-smoothing problems. The crux of our work is that it admits both global and local representations of the input graph signal, which can capture the long-range dependencies. It is realized via our proposed generator, consisting of two key components, i.e., MLPUNet++ and GraphUNet++. Our generator is trained with a designated discriminator via an adversarial process. In particular, to avoid assessing the entire graph as done in the literature, our discriminator focuses on the local subgraph fidelity, thereby boosting the quality of the local imputation. The subgraph size is adjustable, allowing for control over the intensity of adversarial regularization. Comprehensive experiments across various benchmark datasets substantiate that DPGAN consistently rivals, if not outperforms, existing state-of-the-art imputation algorithms. The code is provided at https://github.com/momoxia/DPGAN.
[158] arXiv:2404.17168 [pdf, ps, other]: Title: On the invertibility of matrices with a double saddle-point structure

Authors: Fatemeh P. A. Beik, Chen Greif, Manfred Trummer

Subjects: Numerical Analysis (math.NA)

We establish necessary and sufficient conditions for invertibility of symmetric three-by-three block matrices having a double saddle-point structure that guarantee the unique solvability of double saddle-point systems. We consider various scenarios, including the case where all diagonal blocks are allowed to be rank deficient. Under certain conditions related to the ranks of the blocks and intersections of their kernels, an explicit formula for the inverse is derived.
[159] arXiv:2404.17169 [pdf, other]: Title: FairGT: A Fairness-aware Graph Transformer

Authors: Renqiang Luo, Huafei Huang, Shuo Yu, Xiuzhen Zhang, Feng Xia

Journal-ref: IJCAI2024

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

The design of Graph Transformers (GTs) generally neglects considerations for fairness, resulting in biased outcomes against certain sensitive subgroups. Since GTs encode graph information without relying on message-passing mechanisms, conventional fairness-aware graph learning methods cannot be directly applied to address these issues. To tackle this challenge, we propose FairGT, a Fairness-aware Graph Transformer explicitly crafted to mitigate fairness concerns inherent in GTs. FairGT incorporates a meticulous structural feature selection strategy and a multi-hop node feature integration method, ensuring independence of sensitive features and bolstering fairness considerations. These fairness-aware graph information encodings seamlessly integrate into the Transformer framework for downstream tasks. We also prove that the proposed fair structural topology encoding with adjacency matrix eigenvector selection and multi-hop integration is theoretically effective. Empirical evaluations conducted across five real-world datasets demonstrate FairGT's superiority in fairness metrics over existing graph transformers, graph neural networks, and state-of-the-art fairness-aware graph learning approaches.
[160] arXiv:2404.17170 [pdf, other]: Title: S-IQA Image Quality Assessment With Compressive Sampling

Authors: Ronghua Liao, Chen Hui, Lang Yuan, Feng Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

No-Reference Image Quality Assessment (NR-IQA) aims at estimating image quality in accordance with subjective human perception. However, most existing NR-IQA methods focus on exploring increasingly complex networks or components to improve the final performance. Such practice imposes great limitations and complexity on IQA methods, especially when they are applied to high-resolution (HR) images in the real world. In fact, most images have high spatial redundancy, especially HR images. To further exploit this characteristic and alleviate the issue above, we propose a new framework for Image Quality Assessment with compressive Sampling (dubbed S-IQA), which consists of three components: (1) The Flexible Sampling Module (FSM) samples the image to obtain measurements at an arbitrary ratio. (2) Vision Transformer with the Adaptive Embedding Module (AEM) makes measurements of uniform size and extracts deep features. (3) Dual Branch (DB) allocates weight for every patch and predicts the final quality score. Experiments show that our proposed S-IQA achieves state-of-the-art results on various datasets with less data usage.
[161] arXiv:2404.17171 [pdf, ps, other]: Title: The rise of Indo-German collaborative research: 1990-2022

Authors: Aasif Ahmad Mir, Nina Smirnova, Jeyshankar Ramalingam, Philipp Mayr

Comments: 37 pages, 9 figures, accepted paper Global Knowledge, Memory and Communication

Subjects: Digital Libraries (cs.DL)

The study aims to highlight the growth and development of Indo-German collaborative research over the past three decades. Moreover, this study encompasses an in-depth examination of funding acknowledgements to gain valuable insights into the financial support that underpins these collaborative endeavors. Together with this paper, we provide an openly accessible dataset of Indo-German research articles for further and reproducible research activities (the "Indo-German Literature Dataset"). The data were retrieved from the Web of Science (WoS) database from the year 1990 till the 30th of November 2022. A total of 36,999 records were retrieved against the employed query. Acknowledged entities were extracted using a NER model specifically trained for this task. Interrelations between the extracted entities and scientific domains, lengths of acknowledgement texts, number of authors and affiliations, number of citations, and gender of the first author, as well as collaboration patterns between Indian and German funders were examined. The study brings to light that Physics, Chemistry, Materials Science, Astronomy and Astrophysics, and Engineering prominently dominate the Indo-German collaborative research. The United States, followed by England and France, are the most active collaborators in Indian and German research. Additionally, relations between entity, entity type, and scientific domain, were discovered. The study highlights a deeper understanding of the composition of the Indo-German collaborative research landscape of the last 30 years and its significance in advancing scientific knowledge and fostering international partnerships. Furthermore, we provide an open version of the original WoS dataset. The Indo-German Literature Dataset consists of 22,844 articles from OpenAlex and is available for related studies like literature studies and Scientometrics.
[162] arXiv:2404.17173 [pdf, other]: Title: Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification

Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Lingling Li, Shuyuan Yang, Xu Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In semi-supervised learning, methods that rely on confidence learning to generate pseudo-labels have been widely proposed. However, increasing research finds that when faced with noisy and biased data, the model's representation network is more reliable than the classification network. Additionally, label generation methods based on model predictions often show poor adaptability across different datasets, necessitating customization of the classification network. Therefore, we propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels. We also introduce an adaptive method for selecting hyperparameters in HDL, enhancing its versatility. Moreover, HDL can be combined with general image encoders (e.g., CLIP) to serve as a fundamental data processing module. We extract embeddings from datasets with class-balanced and long-tailed distributions using pre-trained semi-supervised models. Subsequently, samples are re-labeled using HDL, and the re-labeled samples are used to further train the semi-supervised models. Experiments demonstrate improved model performance, validating the motivation that representation networks are more reliable than classifiers or predictors. Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
[163] arXiv:2404.17174 [pdf, other]: Title: Optimizing Cycle Life Prediction of Lithium-ion Batteries via a Physics-Informed Model

Authors: Constantin-Daniel Nicolae, Sara Sameer, Nathan Sun, Karena Yan

Subjects: Machine Learning (cs.LG)

Accurately measuring the cycle lifetime of commercial lithium-ion batteries is crucial for performance and technology development. We introduce a novel hybrid approach combining a physics-based equation with a self-attention model to predict the cycle lifetimes of commercial lithium iron phosphate graphite cells via early-cycle data. After fitting capacity loss curves to this physics-based equation, we then use a self-attention layer to reconstruct entire battery capacity loss curves. Our model exhibits comparable performances to existing models while predicting more information: the entire capacity loss curve instead of cycle life. This provides more robustness and interpretability: our model does not need to be retrained for a different notion of end-of-life and is backed by physical intuition.
[164] arXiv:2404.17175 [pdf, ps, other]: Title: Over-the-Air Modulation for RIS-assisted Symbiotic Radios: Design, Analysis, and Optimization

Authors: Hu Zhou, Ying-Chang Liang, Chau Yuen

Comments: 13 pages, 9 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR), an RIS is exploited to assist the primary system and to simultaneously operate as a secondary transmitter by modulating its own information over the incident primary signal from the air. Such an operation is called over-the-air modulation. The existing modulation schemes such as on-off keying and binary phase-shift keying suffer from two problems for joint detection of the primary and secondary signals in RIS-assisted SR, i.e., one is the detection ambiguity problem when the direct link is blocked, and the other is the bit error rate (BER) error-floor problem when the direct link is weak. To address the two problems, we propose a novel modulation scheme by dividing the phase-shift matrix into two parts: one is the assistance beamforming matrix for assisting the primary system and the other is the transmission beamforming matrix for delivering the secondary signal. To optimize the assistance and transmission beamforming matrices, we first introduce an assistance factor that describes the performance requirement of the primary system and then formulate a problem to minimize the BER of the secondary system, while guaranteeing the BER requirement of the primary system controlled by the assistance factor. To solve this non-convex problem, we resort to the successive convex approximation technique to obtain a suboptimal solution. Furthermore, to draw more insights, we propose a low-complexity assistance-transmission beamforming structure by borrowing the idea from the classical maximum ratio transmission and zero forcing techniques. Finally, simulation results reveal an interesting tradeoff between the BER performance of the primary and secondary systems by adjusting the assistance factor.
[165] arXiv:2404.17176 [pdf, other]: Title: MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

Authors: Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing methods either employ complex spatial-temporal modules or rely heavily on additional perception models to extract temporal features for video understanding, and they only perform well on short videos. For long videos, the computational complexity and memory costs associated with long-term temporal connections are significantly increased, posing additional challenges. Taking advantage of the Atkinson-Shiffrin memory model, with tokens in Transformers being employed as the carriers of memory in combination with our specially designed memory mechanism, we propose MovieChat to overcome these challenges. We lift pre-trained multi-modal large language models for understanding long videos without incorporating additional trainable temporal modules, employing a zero-shot approach. MovieChat achieves state-of-the-art performance in long video understanding, along with the released MovieChat-1K benchmark with 1K long videos, 2K temporal grounding labels, and 14K manual annotations for validation of the effectiveness of our method. The code along with the dataset can be accessed at https://github.com/rese1f/MovieChat.
[166] arXiv:2404.17177 [pdf, other]: Title: RE-RFME: Real-Estate RFME Model for customer segmentation

Authors: Anurag Kumar Pandey, Anil Goyal, Nikhil Sikka

Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)

Marketing is one of the high-cost activities for any online platform. With the increase in the number of customers, it is crucial to understand customers based on their dynamic behaviors to design effective marketing strategies. Customer segmentation is a widely used approach to group customers into different categories and design the marketing strategy targeting each group individually. Therefore, in this paper, we propose an end-to-end pipeline RE-RFME for segmenting customers into 4 groups: high value, promising, need attention, and need activation. Concretely, we propose a novel RFME (Recency, Frequency, Monetary and Engagement) model to track behavioral features of customers and segment them into different categories. Finally, we train the K-means clustering algorithm to cluster the user into one of the 4 categories. We show the effectiveness of the proposed approach on real-world Housing.com datasets for both website and mobile application users.
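As a minimal sketch of the final clustering step, the code below scales four RFME-style features and runs a plain K-means with k = 4. It is written under stated assumptions: the feature values are made up, the min-max scaling and the helper names `min_max_scale` and `kmeans` are choices for this illustration, and the abstract does not say how clusters are mapped to the four named segments, so that step is omitted.

```python
import random


def min_max_scale(rows):
    """Rescale each feature column to [0, 1] so no feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical rows: (recency in days, frequency, monetary value, engagement score)
customers = [
    (2, 30, 900.0, 0.9), (3, 28, 850.0, 0.8),
    (10, 12, 300.0, 0.6), (12, 10, 280.0, 0.5),
    (40, 4, 90.0, 0.3), (45, 3, 80.0, 0.2),
    (200, 1, 10.0, 0.0), (220, 1, 5.0, 0.0),
]
scaled = min_max_scale(customers)
segments = kmeans(scaled, k=4)  # one cluster index in {0, 1, 2, 3} per customer
```

Scaling before clustering matters here because recency in days and monetary value live on very different ranges, and Euclidean distance would otherwise be dominated by the largest-valued feature.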
[167] arXiv:2404.17178 [pdf, other]: Title: A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition

Authors: Haojie Zhang, Yimeng Zhuang

Subjects: Computation and Language (cs.CL)

Few-shot Named Entity Recognition (NER) aims to extract named entities using only a limited number of labeled examples. Existing contrastive learning methods often suffer from insufficient distinguishability in context vector representation because they either solely rely on label semantics or completely disregard them. To tackle this issue, we propose a unified label-aware token-level contrastive learning framework. Our approach enriches the context by utilizing label semantics as suffix prompts. Additionally, it simultaneously optimizes context-context and context-label contrastive learning objectives to enhance generalized discriminative contextual representations. Extensive experiments on various traditional test domains (OntoNotes, CoNLL'03, WNUT'17, GUM, I2B2) and the large-scale few-shot NER dataset (FEWNERD) demonstrate the effectiveness of our approach. It outperforms prior state-of-the-art models by a significant margin, achieving an average absolute gain of 7% in micro F1 scores across most scenarios. Further analysis reveals that our model benefits from its powerful transfer capability and improved contextual representations.
[168] arXiv:2404.17179 [pdf, other]: Title: Meta-Object: Interactive and Multisensory Virtual Object Learned from the Real World for the Post-Metaverse

Authors: Dooyoung Kim, Taewook Ha, Jinseok Hong, Seonji Kim, Selin Choi, Heejeong Ko, Woontack Woo

Comments: 12 pages, 4 figures, under review in the IEEE CG&A magazine

Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET)

With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics learned from the real world. Current virtual objects differ significantly from real-world objects due to restricted sensory feedback based on limited physical properties. To leverage meta-objects in the metaverse, three key components are needed: meta-object modeling and property embedding, interaction-adaptive multisensory feedback, and an intelligence simulation-based post-metaverse platform. Utilizing meta-objects that enable both on-site and remote users to interact as if they were engaging with real objects could contribute to the advent of the post-metaverse era through wearable AR/VR devices.
[169] arXiv:2404.17183 [pdf, other]: Title: Prevalent Frequency of Emotional and Physical Symptoms in Social Anxiety using Zero Shot Classification: An Observational Study

Authors: Muhammad Rizwan, Jure Demšar

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Social anxiety represents a prevalent challenge in modern society, affecting individuals across personal and professional spheres. Left unaddressed, this condition can yield substantial negative consequences, impacting social interactions and performance. Further understanding its diverse physical and emotional symptoms becomes pivotal for comprehensive diagnosis and tailored therapeutic interventions. This study analyzes the prevalence and frequency of social anxiety symptoms taken from Mayo Clinic, exploring diverse human experiences by utilizing a large Reddit dataset dedicated to this issue. Leveraging these platforms, the research aims to extract insights and examine a spectrum of physical and emotional symptoms linked to social anxiety disorder. Upholding ethical considerations, the study maintains strict user anonymity within the dataset. By employing a novel approach, the research utilizes BART-based multi-label zero-shot classification to identify and measure symptom prevalence and significance in the form of a probability score for each symptom under consideration. Results uncover distinctive patterns: "Trembling" emerges as a prevalent physical symptom, while emotional symptoms like "Fear of being judged negatively" exhibit high frequencies. These findings offer insights into the multifaceted nature of social anxiety, aiding clinical practices and interventions tailored to its diverse expressions.
[170] arXiv:2404.17184 [pdf, other]: Title: Low-Rank Knowledge Decomposition for Medical Foundation Models

Authors: Yuhang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The popularity of large-scale pre-training has promoted the development of medical foundation models. However, some studies have shown that although foundation models exhibit strong general feature extraction capabilities, their performance on specific tasks is still inferior to task-specific methods. In this paper, we explore a new perspective called ``Knowledge Decomposition'' to improve the performance on specific medical tasks, which deconstructs the foundation model into multiple lightweight expert models, each dedicated to a particular task, with the goal of improving specialization while concurrently mitigating resource expenditure. To accomplish the above objective, we design a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients by incorporating low-rank expert modules and the efficient knowledge separation convolution. Extensive experimental results demonstrate that the decomposed models achieve strong performance and transferability, even surpassing the original foundation models.
[171] arXiv:2404.17186 [pdf, other]: Title: MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information

Authors: Jiajun Liang, Baoquan Zhang, Yunming Ye, Xutao Li, Chuyao Luo, Xukai Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The accurate detection of Mesoscale Convective Systems (MCS) is crucial for meteorological monitoring due to their potential to cause significant destruction through severe weather phenomena such as hail, thunderstorms, and heavy rainfall. However, existing methods for MCS detection mostly target single-frame detection, which considers only static characteristics and ignores the temporal evolution over the life cycle of an MCS. In this paper, we propose a novel encoder-decoder neural network for MCS detection (MCSDNet). MCSDNet has a simple architecture and is easy to extend. Unlike previous models, MCSDNet targets multi-frame detection and leverages multi-scale spatiotemporal information for the detection of MCS regions in remote sensing imagery (RSI). As far as we know, it is the first work to utilize multi-scale spatiotemporal information to detect MCS regions. First, we design a multi-scale spatiotemporal information module to extract multi-level semantics from different encoder levels, which enables our model to extract more detailed spatiotemporal features. Second, a Spatiotemporal Mix Unit (STMU) is introduced into MCSDNet to capture both intra-frame features and inter-frame correlations; it is a scalable module and can be replaced by other spatiotemporal modules, e.g., CNN, RNN, Transformer, and our proposed Dual Spatiotemporal Attention (DSTA). This means that future work on spatiotemporal modules can be easily integrated into our model. Finally, we present MCSRSI, the first publicly available dataset for multi-frame MCS detection based on visible channel images from the FY-4A satellite. We also conduct several experiments on MCSRSI and find that our proposed MCSDNet achieves the best performance on the MCS detection task compared with other baseline methods.
[172] arXiv:2404.17187 [pdf, other]: Title: An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging

Authors: Sadjad Anzabi Zadeh, W. Nick Street, Barrett W. Thomas

Subjects: Machine Learning (cs.LG)

Deep Reinforcement Learning is an effective tool for drug dosing for chronic condition management. However, the final protocol is generally a black box without any justification for its prescribed doses. This paper addresses this issue by proposing an explainable dosing protocol for warfarin using a Proximal Policy Optimization method combined with Policy Distillation. We introduce Action Forging as an effective tool to achieve explainability. Our focus is on the maintenance dosing protocol. Results show that the final model is as easy to understand and deploy as the current dosing protocols and outperforms the baseline dosing algorithms.
[173] arXiv:2404.17194 [pdf, ps, other]: Title: TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya

Authors: Hailay Teklehaymanot, Dren Fazlija, Niloy Ganguly, Gourab K. Patro, Wolfgang Nejdl

Comments: 9 pages, 3 figures, 7 tables, 2 listings

Journal-ref: LREC-COLING 2024

Subjects: Computation and Language (cs.CL)

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format. As a result, we present TIGQA, an expert annotated educational dataset consisting of 2.68K question-answer pairs covering 122 diverse topics such as climate, water, and traffic. These pairs are from 537 context paragraphs in publicly accessible Tigrinya and Biology books. Through comprehensive analyses, we demonstrate that the TIGQA dataset requires skills beyond simple word matching, requiring both single-sentence and multiple-sentence inference abilities. We conduct experiments using state-of-the-art MRC methods, marking the first exploration of such models on TIGQA. Additionally, we estimate human performance on the dataset and juxtapose it with the results obtained from pretrained models. The notable disparities between human performance and best model performance underscore the potential for further enhancements to TIGQA through continued research. Our dataset is freely accessible via the provided link to encourage the research community to address the challenges in Tigrinya MRC.
[174] arXiv:2404.17195 [pdf, other]: Title: Distributed computation of temporal twins in periodic undirected time-varying graphs

Authors: Lina Azerouk (SU), Binh-Minh Bui-Xuan (NPA, SU, CNRS), Camille Palisoc (SU), Maria Potop-Butucaru (NPA, SU), Massinissa Tighilt (SU, NPA)

Subjects: Data Structures and Algorithms (cs.DS)

Twin nodes in a static network capture the idea of being substitutes for each other for maintaining paths of the same length anywhere in the network. In dynamic networks, we model twin nodes over a time-bounded interval, denoted $(\Delta,d)$-twins, as follows. A periodic undirected time-varying graph $\mathcal G=(G_t)_{t\in\mathbb N}$ of period $p$ is an infinite sequence of static graphs where $G_t=G_{t+p}$ for every $t\in\mathbb N$. For two integers $\Delta$ and $d$, two distinct nodes $u$ and $v$ in $\mathcal G$ are $(\Delta,d)$-twins if, starting at some instant, the outside neighbourhoods of $u$ and $v$ have non-empty intersection and differ by at most $d$ elements for $\Delta$ consecutive instants. In particular when $d=0$, $u$ and $v$ can act during the $\Delta$ instants as substitutes for each other in order to maintain journeys of the same length in the time-varying graph $\mathcal G$. We propose a distributed deterministic algorithm enabling each node to enumerate its $(\Delta,d)$-twins in $2p$ rounds, using messages of size $O(\delta_\mathcal G\log n)$, where $n$ is the total number of nodes and $\delta_\mathcal G$ is the maximum degree of the graphs $G_t$. Moreover, using randomized techniques borrowed from distributed hash function sampling, we reduce the message size down to $O(\log n)$ w.h.p.
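The $(\Delta,d)$-twin definition above can be made concrete with a small centralized brute-force check. This is only an illustrative sketch and not the distributed algorithm of the paper. It assumes that the "outside neighbourhood" of $u$ with respect to $v$ means the neighbours of $u$ excluding $v$, and that "differ by at most $d$ elements" means the symmetric difference has size at most $d$. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def outside_neighbourhood(adj, u, v):
    """Neighbours of u in one snapshot, excluding v (assumed reading of the definition)."""
    return adj.get(u, set()) - {v}

def are_twins(snapshots, u, v, delta, d):
    """Brute-force test of the (delta, d)-twin condition on a periodic graph.

    snapshots is the list [G_0, ..., G_{p-1}] of symmetric adjacency dicts;
    G_t is taken to be snapshots[t % p], so trying start instants 0..p-1 suffices.
    """
    p = len(snapshots)
    for start in range(p):
        holds = True
        for k in range(delta):
            adj = snapshots[(start + k) % p]
            nu = outside_neighbourhood(adj, u, v)
            nv = outside_neighbourhood(adj, v, u)
            # Assumption: "differ by at most d elements" = symmetric difference <= d.
            if not (nu & nv) or len(nu ^ nv) > d:
                holds = False
                break
        if holds:
            return True
    return False

def enumerate_twins(snapshots, nodes, delta, d):
    """All unordered pairs of nodes that satisfy the twin condition."""
    return [(u, v) for u, v in combinations(sorted(nodes), 2)
            if are_twins(snapshots, u, v, delta, d)]

# Hypothetical toy instance with period p = 2 on nodes a, b, c, x.
G0 = {"a": {"x", "c"}, "b": {"x"}, "c": {"a"}, "x": {"a", "b"}}
G1 = {"a": {"x"}, "b": {"x"}, "x": {"a", "b"}}
SNAPSHOTS = [G0, G1]
NODES = {"a", "b", "c", "x"}

print(are_twins(SNAPSHOTS, "a", "b", 1, 0))    # twins for one instant (at t = 1)
print(are_twins(SNAPSHOTS, "a", "b", 2, 0))    # the extra edge a-c in G0 breaks d = 0
print(enumerate_twins(SNAPSHOTS, NODES, 2, 1))
```

In this toy instance, nodes a and b share the neighbour x at every instant, and only the edge a-c in the first snapshot separates their neighbourhoods, so they qualify as twins over two instants once one differing element is tolerated.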
[175] arXiv:2404.17196 [pdf, other]: Title: Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Authors: Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Presently, with the assistance of advanced LLM application development frameworks, more and more LLM-powered applications can effortlessly augment the LLMs' knowledge with external content using the retrieval augmented generation (RAG) technique. However, these frameworks' designs do not have sufficient consideration of the risk of external content, thereby allowing attackers to undermine the applications developed with these frameworks. In this paper, we reveal a new threat to LLM-powered applications, termed retrieval poisoning, where attackers can guide the application to yield malicious responses during the RAG process. Specifically, through the analysis of LLM application frameworks, attackers can craft documents visually indistinguishable from benign ones. Despite the documents providing correct information, once they are used as reference sources for RAG, the application is misled into generating incorrect responses. Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in the real-world application, demonstrating the potential impact of retrieval poisoning.
[176] arXiv:2404.17198 [pdf, ps, other]: Title: Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

Authors: C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

Journal-ref: IEEE Transactions on Vehicular Technology 2024 Pages 1-14

Subjects: Robotics (cs.RO)

Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.
[177] arXiv:2404.17199 [pdf, other]: Title: Few-shot Calligraphy Style Learning

Authors: Fangda Chen, Jiacheng Nie, Lichuan Jiang, Zhuoer Zeng

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce "Presidifussion," a novel approach to learning and replicating the unique style of calligraphy of President Xu, using a pretrained diffusion model adapted through a two-stage training process. Initially, our model is pretrained on a diverse dataset containing works from various calligraphers. This is followed by fine-tuning on a smaller, specialized dataset of President Xu's calligraphy, comprising just under 200 images. Our method introduces innovative techniques of font image conditioning and stroke information conditioning, enabling the model to capture the intricate structural elements of Chinese characters. The effectiveness of our approach is demonstrated through a comparison with traditional methods like zi2zi and CalliGAN, with our model achieving comparable performance using significantly smaller datasets and reduced computational resources. This work not only presents a breakthrough in the digital preservation of calligraphic art but also sets a new standard for data-efficient generative modeling in the domain of cultural heritage digitization.
[178] arXiv:2404.17202 [pdf, other]: Title: Self-supervised visual learning in the low-data regime: a comparative evaluation

Authors: Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Yuki M. Asano, Efstratios Gavves, Georgios Th. Papadopoulos

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a `pretext task' that does not require ground-truth labels/annotation. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy in a `downstream task' by exploiting supervised transfer learning. Despite the relatively straightforward conceptualization and applicability of SSL, it is not always feasible to collect and/or to utilize very large pretraining datasets, especially when it comes to real-world application settings. In particular, in cases of specialized and domain-specific application scenarios, it may not be achievable or practical to assemble a relevant image pretraining dataset in the order of millions of instances or it could be computationally infeasible to pretrain at this scale. This motivates an investigation on the effectiveness of common SSL pretext tasks, when the pretraining dataset is of relatively limited/constrained size. In this context, this work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches, and, subsequently, conducts a thorough comparative experimental evaluation in the low-data regime, targeting to identify: a) what is learnt via low-data SSL pretraining, and b) how do different SSL categories behave in such training scenarios. Interestingly, for domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining on general datasets. Grounded on the obtained results, valuable insights are highlighted regarding the performance of each category of SSL methods, which in turn suggest straightforward future research directions in the field.
[179] arXiv:2404.17205 [pdf, other]: Title: Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

Authors: Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues. Current approaches typically follow a two-stage pipeline: first localize subjects by off-the-shelf detectors, then perform emotion classification through the late fusion of subject and context features. However, this complicated paradigm suffers from disjoint training stages and limited interaction between fine-grained subject-context elements. To address the challenge, we present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT), for simultaneous subject localization and emotion classification. Rather than compartmentalizing training stages, we jointly leverage box and emotion signals as supervision to enrich subject-centric feature learning. Furthermore, we introduce DSCT to facilitate interactions between fine-grained subject-context cues in a decouple-then-fuse manner. The decoupled query tokens--subject queries and context queries--gradually intertwine across layers within DSCT, during which spatial and semantic relations are exploited and aggregated. We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC. Our approach surpasses two-stage alternatives with fewer parameters, achieving a 3.39% accuracy improvement and a 6.46% average precision gain on CAER-S and EMOTIC datasets, respectively.
[180] arXiv:2404.17212 [pdf, ps, other]: Title: Scrutinizing Data from Sky: An Examination of Its Veracity in Area Based Traffic Contexts

Authors: Yawar Ali (1), Krishnan K N (1), Debashis Ray Sarkar (1), K. Ramachandra Rao (1), Niladri Chatterjee (1), Ashish Bhaskar (2) ((1) Indian Institute of Technology Delhi, New Delhi, India (2) Queensland University of Technology, Brisbane, Australia)

Subjects: Emerging Technologies (cs.ET); Computer Vision and Pattern Recognition (cs.CV)

Traffic data collection has been an overwhelming task for researchers as well as authorities over the years. With the advancement in technology and the introduction of various tools for processing and extracting traffic data, the task has become significantly more convenient. Data from Sky (DFS) is one such tool, based on image processing and artificial intelligence (AI), that provides output for macroscopic as well as microscopic variables of the traffic streams. The company claims to provide 98 to 100 percent accuracy on the data exported using the DFS tool. The tool is widely used in developed countries where the traffic is homogeneous and has lane-based movements. In this study, the authors have checked the veracity of the DFS tool in the heterogeneous and area-based traffic movement that prevails in most developing countries. To verify the DFS claim, the validation is done with several methods, using Classified Volume Count (CVC), Space Mean Speeds (SMS) of individual vehicle classes, and the microscopic trajectory of a probe vehicle. The error for CVCs for each vehicle class present in the traffic stream is estimated. Mean Absolute Percentage Error (MAPE) values are calculated for average speeds of each vehicle class between manually and DFS extracted space mean speeds (SMSs), and the microscopic trajectories are validated using a GPS-based tracker put on probe vehicles. The results are fairly accurate, with the least errors, in the case of data taken from a bird's-eye view. The other configurations of data collection have some significant errors, which are mainly caused by the varied traffic composition, the camera view angle, and the direction of traffic.
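As a reminder of how the MAPE comparison mentioned above works, the following minimal sketch computes $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{r_i - e_i}{r_i}\right|$ between reference values $r_i$ and estimates $e_i$. It assumes the manually extracted speeds serve as the reference. The function name and the speed values are hypothetical illustrations and are not data from the study.

```python
def mape(reference, estimate):
    """Mean Absolute Percentage Error, in percent, of estimate against reference.

    Assumes all reference values are non-zero (a zero would raise ZeroDivisionError).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - e) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)

# Hypothetical space mean speeds (km/h) for three vehicle classes.
manual_sms = [40.0, 50.0, 20.0]   # assumed reference: manual extraction
tool_sms = [38.0, 55.0, 20.0]     # assumed estimate: tool output

print(f"MAPE = {mape(manual_sms, tool_sms):.2f}%")
```

Because each error is divided by its reference value, MAPE is scale-free, which makes it convenient for comparing vehicle classes that travel at very different speeds.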
[181] arXiv:2404.17214 [pdf, other]: Title: Set Selection with Uncertain Weights: Non-Adaptive Queries and Thresholds

Authors: Christoph Dürr, Arturo Merino, José A. Soto, José Verschae

Subjects: Data Structures and Algorithms (cs.DS)

We study set selection problems where the weights are uncertain. Instead of its exact weight, only an uncertainty interval containing its true weight is available for each element. In some cases, some solutions are universally optimal; i.e., they are optimal for every weight that lies within the uncertainty intervals. However, it may be that no universally optimal solution exists unless additional information on the precise values of some elements is revealed to us.
In the minimum cost admissible query problem, we are tasked with (non-adaptively) finding a minimum-cost subset of elements that, no matter how they are revealed, guarantees the existence of a universally optimal solution.
We introduce thresholds under uncertainty to analyze problems of minimum cost admissible queries. Roughly speaking, for every element e, there is a threshold for its weight, below which e is included in all optimal solutions and a second threshold above which e is excluded from all optimal solutions.
We show that computing thresholds and finding minimum cost admissible queries are essentially equivalent problems. Thus, the analysis of the minimum admissible query problem reduces to the problem of computing thresholds.
We provide efficient algorithms for computing thresholds in the settings of minimum spanning trees, matroids, and matchings in trees; and NP-hardness results in the settings of s-t shortest paths and bipartite matching. By making use of the equivalence between the two problems these results translate into efficient algorithms for minimum cost admissible queries in the settings of minimum spanning trees, matroids, and matchings in trees; and NP-hardness results in the settings of s-t shortest paths and bipartite matching.
[182] arXiv:2404.17215 [pdf, other]: Title: SLAM for Indoor Mapping of Wide Area Construction Environments

Authors: Vincent Ress, Wei Zhang, David Skuddis, Norbert Haala, Uwe Soergel

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Simultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. Meanwhile, large scale applications aiming at data collection in complex environments like factory halls or construction sites are becoming feasible. However, in contrast to small scale scenarios with building interiors separated into single rooms, shop floors or construction areas require measurements at larger distances in potentially textureless areas under difficult illumination. Pose estimation is further aggravated since no GNSS measurements are available, as is usual for such indoor applications. In our work, we realize data collection in a large factory hall by a robot system equipped with four stereo cameras as well as a 3D laser scanner. We apply our state-of-the-art LiDAR and visual SLAM approaches and discuss the respective pros and cons of the different sensor types for trajectory estimation and dense map generation in such an environment. Additionally, dense and accurate depth maps are generated by 3D Gaussian splatting, which we plan to use in the context of our project aiming at automatic construction and site monitoring.
[183] arXiv:2404.17216 [pdf, other]: Title: Prompting Towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot

Authors: Michelle Terblanche, Kayode Olaleye, Vukosi Marivate

Comments: To be published in the Proceedings of SIGUL 2024: 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages

Subjects: Computation and Language (cs.CL)

Many multilingual communities, including numerous in Africa, frequently engage in code-switching during conversations. This behaviour stresses the need for natural language processing technologies adept at processing code-switched text. However, data scarcity, particularly in African languages, poses a significant challenge, as many are low-resourced and under-represented. In this study, we prompted GPT 3.5 to generate Afrikaans--English and Yoruba--English code-switched sentences, enhancing diversity using topic-keyword pairs, linguistic guidelines, and few-shot examples. Our findings indicate that the quality of generated sentences for languages using non-Latin scripts, like Yoruba, is considerably lower when compared with the high Afrikaans-English success rate. There is therefore a notable opportunity to refine prompting guidelines to yield sentences suitable for the fine-tuning of language models. We propose a framework for augmenting the diversity of synthetically generated code-switched data using GPT and propose leveraging this technology to mitigate data scarcity in low-resourced languages, underscoring the essential role of native speakers in this process.
[184] arXiv:2404.17217 [pdf, other]: Title: Cycling into the workshop: predictive maintenance for Barcelona's bike-sharing system

Authors: Jordi Grau-Escolano, Aleix Bassolas, Julian Vicens

Comments: 25 pages, 9 figures, 7 tables

Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)

Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona's bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Due to their complexity, and with the objective of bolstering confidence in the system's predictions, interpretability techniques explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.
[185] arXiv:2404.17218 [pdf, other]: Title: Prompting Techniques for Reducing Social Bias in LLMs through System 1 and System 2 Cognitive Processes

Authors: Mahammed Kamruzzaman, Gene Louis Kim

Subjects: Computation and Language (cs.CL)

Dual process theory posits that human cognition arises via two systems: System 1, a quick, emotional, and intuitive process that is subject to cognitive biases, and System 2, a slow, onerous, and deliberate process. NLP researchers often compare zero-shot prompting in LLMs to System 1 reasoning and chain-of-thought (CoT) prompting to System 2. In line with this interpretation, prior research has found that using CoT prompting in LLMs leads to reduced gender bias. We investigate the relationship between bias, CoT prompting, and dual process theory in LLMs directly. We compare zero-shot, CoT, and a variety of dual process theory-based prompting strategies on two bias datasets spanning nine different social bias categories. We also use human and machine personas to determine whether the effects of dual process theory in LLMs are based on modeling human cognition or inherent to the system. We find that a human persona, System 2, and CoT prompting all tend to reduce social biases in LLMs, though the best combination of features depends on the exact model and bias category -- resulting in up to a 13 percent drop in stereotypical judgments by an LLM.
[186] arXiv:2404.17221 [pdf, other]: Title: SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval

Authors: Marco Peer, Florian Kleber, Robert Sablatnig

Comments: accepted for ICDAR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing involves the application of the Segment Anything technique to extract handwriting from various datasets, ending up with about 24k documents, followed by training a vision transformer on reconstructing masked patches of the handwriting. SAGHOG is then finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation of our approach on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2% - a margin of 11.6% to the current state of the art, showcasing its robustness on challenging data, and is competitive even on small datasets, e.g. GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
[187] arXiv:2404.17223 [pdf, ps, other]: Title: Maximizing Minimum Cycle Bases Intersection

Authors: Dimitri Watel (SAMOVAR, ENSIIE), Marc-Antoine Weisser (GALaC), Dominique Barth (UVSQ, DAVID), Ylène Aboulfath (UVSQ, DAVID), Thierry Mautor (UVSQ, DAVID)

Subjects: Computational Complexity (cs.CC)

We address a specific case of the matroid intersection problem: given a set of graphs sharing the same set of vertices, select a minimum cycle basis for each graph to maximize the size of their intersection. We provide a comprehensive complexity analysis of this problem, which finds applications in chemoinformatics. We establish a complete partition of subcases based on intrinsic parameters: the number of graphs, the maximum degree of the graphs, and the size of the longest cycle in the minimum cycle bases. Additionally, we present results concerning the approximability and parameterized complexity of the problem.
[188] arXiv:2404.17224 [pdf, other]: Title: Scene-Extrapolation: Generating Interactive Traffic Scenarios

Authors: Maximilian Zipfl, Barbara Schütt, J. Marius Zöllner

Subjects: Robotics (cs.RO)

Verifying highly automated driving functions can be challenging, requiring identifying relevant test scenarios. Scenario-based testing will likely play a significant role in verifying these systems, predominantly occurring within simulation. In our approach, we use traffic scenes as a starting point (seed-scene) to address the individuality of various highly automated driving functions and to avoid the problems associated with a predefined test traffic scenario. Different highly autonomous driving functions, or their distinct iterations, may display different behaviors under the same operating conditions. To make a generalizable statement about a seed-scene, we simulate possible outcomes based on various behavior profiles. We utilize our lightweight simulation environment and populate it with rule-based and machine learning behavior models for individual actors in the scenario. We analyze resulting scenarios using a variety of criticality metrics. The density distributions of the resulting criticality values enable us to make a profound statement about the significance of a particular scene, considering various eventualities.
[189] arXiv:2404.17225 [pdf, other]: Title: Enhancing Privacy and Security of Autonomous UAV Navigation

Authors: Vatsal Aggarwal, Arjun Ramesh Kaushik, Charanjit Jutla, Nalini Ratha

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Autonomous Unmanned Aerial Vehicles (UAVs) have become essential tools in defense, law enforcement, disaster response, and product delivery. These autonomous navigation systems require a wireless communication network, and of late are deep learning based. In critical scenarios such as border protection or disaster response, ensuring the secure navigation of autonomous UAVs is paramount. However, these autonomous UAVs are susceptible to adversarial attacks through the communication network or the deep learning models, such as eavesdropping, man-in-the-middle, membership inference, and reconstruction attacks. To address this susceptibility, we propose an innovative approach that combines Reinforcement Learning (RL) and Fully Homomorphic Encryption (FHE) for secure autonomous UAV navigation. This end-to-end secure framework is designed for real-time video feeds captured by UAV cameras and utilizes FHE to perform inference on encrypted input images. While FHE allows computations on encrypted data, certain computational operators are yet to be implemented. Convolutional neural networks, fully connected neural networks, activation functions, and the OpenAI Gym library are meticulously adapted to the FHE domain to enable encrypted data processing. We demonstrate the efficacy of our proposed approach through extensive experimentation. Our proposed approach ensures security and privacy in autonomous UAV navigation with negligible loss in performance.
[190] arXiv:2404.17229 [pdf, other]: Title: Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision

Authors: Cong Fan, Shengkai Zhang, Kezhong Liu, Shuai Wang, Zheng Yang, Wei Wang

Comments: This paper has been accepted by ICRA 2024

Subjects: Robotics (cs.RO)

Complementary to prevalent LiDAR and camera systems, millimeter-wave (mmWave) radar is robust to adverse weather conditions like fog, rainstorms, and blizzards but offers sparse point clouds. Current techniques enhance the point cloud under the supervision of LiDAR data. However, high-performance LiDAR is notably expensive and is not commonly available on vehicles. This paper presents mmEMP, a supervised learning approach that enhances radar point clouds using a low-cost camera and an inertial measurement unit (IMU), enabling the crowdsourcing of training data from commercial vehicles. Introducing visual-inertial (VI) supervision is challenging due to the spatial ambiguity of dynamic objects. Moreover, spurious radar points caused by RF multipath make robots misunderstand the scene. mmEMP first devises a dynamic 3D reconstruction algorithm that restores the 3D positions of dynamic features. Then, we design a neural network that densifies radar data and eliminates spurious radar points. We build a new dataset in the real world. Extensive experiments show that mmEMP achieves competitive performance compared with the SOTA approach trained with LiDAR data. In addition, we use the enhanced point cloud to perform object detection, localization, and mapping to demonstrate mmEMP's effectiveness.
[191] arXiv:2404.17230 [pdf, other]: Title: ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion

Authors: Ziyue Zhang, Mingbao Lin, Rongrong Ji

Comments: 12 pages, submitted to ECCV2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce ObjectAdd, a training-free diffusion modification method that adds user-expected objects into a user-specified area. The motivation for ObjectAdd is twofold: first, describing everything in one prompt can be difficult, and second, users often need to add objects to the generated image. To suit real-world use, ObjectAdd maintains accurate image consistency after adding objects through technical innovations in: (1) embedding-level concatenation to ensure that the text embeddings coalesce correctly; (2) object-driven layout control with latent and attention injection to ensure that objects land in the user-specified area; (3) prompted image inpainting in an attention refocusing & object expansion fashion to ensure that the rest of the image stays the same. Given a text-prompted image, ObjectAdd allows users to specify a box and an object, and achieves: (1) adding the object inside the box area; (2) exactly preserved content outside the box area; (3) flawless fusion between the two areas.
[192] arXiv:2404.17238 [pdf, other]: Title: TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

Authors: Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

Subjects: Information Retrieval (cs.IR)

Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers have aimed to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews and images. This content inevitably contains noise. Some studies attempt to reduce noise interference by suppressing cross-modal inconsistent information. However, doing so can constrain the capture of personalized user preferences. In addition, it is almost impossible to entirely eliminate noise in diverse user-generated multi-modal content. To solve these problems, we propose a trustworthy sequential recommendation method via noisy user-generated multi-modal content. Specifically, we explicitly capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference. We also model the user's multi-modal sequential preferences. In addition, we design a trustworthy decision mechanism that integrates the subjective user perspective and the objective item perspective to dynamically evaluate the uncertainty of prediction results. Experimental evaluation on four widely-used datasets demonstrates the superior performance of our model compared to state-of-the-art methods. The code is released at https://github.com/FairyMeng/TrustSR.
[193] arXiv:2404.17241 [pdf, ps, other]: Title: Synchronized Stepwise Control of Firing and Learning Thresholds in a Spiking Randomly Connected Neural Network toward Hardware Implementation

Authors: Kumiko Nomura, Yoshifumi Nishi

Comments: 18 pages, 9 figures, 1 table

Subjects: Neural and Evolutionary Computing (cs.NE)

We propose hardware-oriented models of intrinsic plasticity (IP) and synaptic plasticity (SP) for a spiking randomly connected recursive neural network (RNN). Although the potential of RNNs for temporal data processing has been demonstrated, the randomness of the network architecture often causes performance degradation. A self-organization mechanism using IP and SP can mitigate this degradation; therefore, we incorporate these functions into a spiking neuronal model. To implement the function of IP, a variable firing threshold that changes stepwise in accordance with the neuron's activity is introduced to each excitatory neuron in the RNN. We also define other thresholds for SP that are synchronized with the firing threshold and that determine the direction of the stepwise synaptic update executed on receiving a pre-synaptic spike. We demonstrate the effectiveness of our model through simulations of temporal data learning and anomaly detection with a spiking RNN using publicly available electrocardiograms. Considering hardware implementation, we employ discretized thresholds and synaptic weights and show that these parameters can be reduced to binary if the RNN architecture is appropriately designed. This contributes to minimizing the circuit of a neuronal system having IP and SP.
[194] arXiv:2404.17243 [pdf, other]: Title: Binarizing Documents by Leveraging both Space and Frequency

Authors: Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Comments: Accepted at ICDAR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
[195] arXiv:2404.17244 [pdf, other]: Title: Automated Configuration Synthesis for Machine Learning Models: A git-Based Requirement and Architecture Management System

Authors: Abdullatif AlShriaf, Hans-Martin Heyn, Eric Knauss

Comments: Accepted at 32nd IEEE International Requirements Engineering Conference (RE24), Posters and Tool Demos Track, Reykjavik, Iceland, 2024

Subjects: Software Engineering (cs.SE)

This work introduces a tool for generating runtime configurations automatically from textual requirements stored as artifacts in git repositories (a.k.a. T-Reqs) alongside the software code. The tool leverages a T-Reqs-modelled architectural description to identify relevant configuration properties for the deployment of artificial intelligence (AI)-enabled software systems. This enables traceable configuration generation, taking into account both functional and non-functional requirements. The resulting configuration specification also includes the dynamic properties that need to be adjusted and the rationale behind their adjustment. We show that this intermediary format can be directly used by the system or adapted for specific targets, for example in order to achieve runtime optimisations in terms of ML model size before deployment.
[196] arXiv:2404.17245 [pdf, other]: Title: Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting

Authors: Reza Akbarian Bafghi, Nidhin Harilal, Claire Monteleoni, Maziar Raissi

Comments: Accepted at eLVM Workshop, CVPR, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks often suffer from catastrophic forgetting, where learning new concepts leads to a complete loss of previously acquired knowledge. We observe that this issue is particularly magnified in vision transformers (ViTs), where post-pre-training and fine-tuning on new tasks can significantly degrade the model's original general abilities. For instance, a DINO ViT-Base/16 pre-trained on ImageNet-1k loses over 70% accuracy on ImageNet-1k after just 10 iterations of fine-tuning on CIFAR-100. Overcoming this stability-plasticity dilemma is crucial for enabling ViTs to continuously learn and adapt to new domains while preserving their initial knowledge. In this work, we study two new parameter-efficient fine-tuning strategies: (1) Block Expansion, and (2) Low-rank adaptation (LoRA). Our experiments reveal that using either Block Expansion or LoRA on self-supervised pre-trained ViTs surpasses fully fine-tuned ViTs in new domains while offering significantly greater parameter efficiency. Notably, we find that Block Expansion experiences only a minimal performance drop in the pre-training domain, thereby effectively mitigating catastrophic forgetting in pre-trained ViTs.
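The parameter-efficiency claim for LoRA can be made concrete with a short count. LoRA keeps a pre-trained weight matrix frozen and trains only a low-rank update, the product of two small matrices, so a layer with d_out x d_in weights gains rank * (d_in + d_out) trainable parameters. The sketch below is illustrative only: the function name, the rank of 8, and the choice of a single 768 x 768 projection (768 being the hidden width of a ViT-Base) are assumptions, not settings reported in the abstract.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters updated when the whole weight matrix is fine-tuned.
    lora: parameters in the two low-rank factors B (d_out x rank) and
          A (rank x d_in) whose product B @ A is added to the frozen weight.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical example: one 768 x 768 projection with an assumed rank of 8.
full, lora = lora_param_counts(768, 768, 8)
print(f"full fine-tuning: {full} parameters, LoRA: {lora} parameters")
```

With these assumed values the low-rank factors hold 48 times fewer trainable parameters than the full matrix; the actual ratio depends on the rank and on which layers are adapted.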
[197] arXiv:2404.17249 [pdf, other]: Title: Making Better Use of Unlabelled Data in Bayesian Active Learning

Authors: Freddie Bickford Smith, Adam Foster, Tom Rainforth

Comments: Published at AISTATS 2024

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed solution is a simple framework for semi-supervised Bayesian active learning. We find it produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data. It is also easier to scale up than the conventional approach. As well as supporting a shift towards semi-supervised models, our findings highlight the importance of studying models and acquisition methods in conjunction.
[198] arXiv:2404.17251 [pdf, other]: Title: Camera Motion Estimation from RGB-D-Inertial Scene Flow

Authors: Samuel Cerezo, Javier Civera

Comments: Accepted to CVPR2024 Workshop on Visual Odometry and Computer Vision Applications

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow. Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU). Our proposed method offers the flexibility to operate as a multi-frame optimization or to marginalize older data, thus effectively utilizing past measurements. To assess the performance of our method, we conducted evaluations using both synthetic data from the ICL-NUIM dataset and real data sequences from the OpenLORIS-Scene dataset. Our results show that the fusion of these two sensors enhances the accuracy of camera motion estimation when compared to using only visual data.
[199] arXiv:2404.17252 [pdf, ps, other]: Title: Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition

Authors: Houtan Ghaffari, Paul Devos

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

Transferring the weights of a pre-trained model to assist another task has become a crucial part of modern deep learning, particularly in data-scarce scenarios. Pre-training refers to the initial step of training models outside the current task of interest, typically on another dataset. It can be done via supervised models using human-annotated datasets or self-supervised models trained on unlabeled datasets. In both cases, many pre-trained models are available to fine-tune for the task of interest. Interestingly, research has shown that pre-trained models from ImageNet can be helpful for audio tasks despite being trained on image datasets. Hence, it's unclear whether in-domain models would be advantageous compared to competent out-domain models, such as convolutional neural networks from ImageNet. Our experiments will demonstrate the usefulness of in-domain models and datasets for bird species recognition by leveraging VICReg, a recent and powerful self-supervised method.
[200] arXiv:2404.17253 [pdf, other]: Title: Weakly Supervised Training for Hologram Verification in Identity Documents

Authors: Glen Pouliquen (1 and 2), Guillaume Chiron (1), Joseph Chazalon (2), Thierry Géraud (2), Ahmad Montaser Awal (1) ((1) IDnow AI & ML Center of Excellence, France, (2) EPITA Research Lab. (LRE), EPITA, France)

Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a method to remotely verify the authenticity of Optically Variable Devices (OVDs), often referred to as "holograms", in identity documents. Our method processes video clips captured with smartphones under common lighting conditions, and is evaluated on two public datasets: MIDV-HOLO and MIDV-2020. Thanks to weakly-supervised training, we optimize a feature extraction and decision pipeline which achieves a new leading performance on MIDV-HOLO, while maintaining a high recall on documents from MIDV-2020 used as attack samples. It is also the first method, to date, to effectively address the photo replacement attack task, and can be trained on either genuine samples, attack samples, or both for increased performance. By enabling the verification of OVD shapes and dynamics with very little supervision, this work opens the way towards the use of massive amounts of unlabeled data to build robust remote identity document verification systems on commodity smartphones. Code is available at https://github.com/EPITAResearchLab/pouliquen.24.icdar
[201] arXiv:2404.17254 [pdf, other]: Title: Trinity Detector: text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection

Authors: Jiawei Song, Dengpan Ye, Yunming Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial Intelligence Generated Content (AIGC) techniques, represented by text-to-image generation, have led to the malicious use of deep forgeries, raising concerns about the trustworthiness of multimedia content. Adapting traditional forgery detection methods to diffusion models proves challenging. Thus, this paper proposes a forgery detection method explicitly designed for diffusion models called Trinity Detector. Trinity Detector incorporates coarse-grained text features through a CLIP encoder, coherently integrating them with fine-grained artifacts in the pixel domain for comprehensive multimodal detection. To heighten sensitivity to diffusion-generated image features, a Multi-spectral Channel Attention Fusion Unit (MCAF) is designed, extracting spectral inconsistencies through adaptive fusion of diverse frequency bands and further integrating spatial co-occurrence of the two modalities. Extensive experiments validate that our Trinity Detector outperforms several state-of-the-art methods: its performance is competitive across all datasets, with up to a 17.6% improvement in transferability on the diffusion datasets.
[202] arXiv:2404.17255 [pdf, other]: Title: SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

Authors: Georgia Baltsou, Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos

Comments: 2024 18th International Conference on Automatic Face and Gesture Recognition (FG)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets primarily focus on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes like hairstyle and accessories. This narrow focus limits the diversity of the data and consequently the robustness of AI systems trained on them. This work aims to address this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy, encompassing not only demographics and biometrics but also non-permanent traits like make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality realistic images, which can be used as an evaluation set in face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size.
[203] arXiv:2404.17263 [pdf, ps, other]: Title: Multiple-Target Detection in Cell-Free Massive MIMO-Assisted ISAC

Authors: Mohamed Elfiatoure, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou

Comments: The manuscript has been submitted to IEEE TWC

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a distributed implementation for integrated sensing and communication (ISAC) backed by a cell-free massive multiple-input multiple-output (CF-mMIMO) architecture. Distributed multi-antenna access points (APs) simultaneously serve communication users (UEs) and emit probing signals towards multiple specified zones for sensing. The APs can switch between communication and sensing modes, and adjust their transmit power based on the network settings and the requirements of the sensing and communication operations. By considering local partial zero-forcing and maximum-ratio-transmit precoding at the APs for communication and sensing, respectively, we first derive closed-form expressions for the spectral efficiency (SE) of the UEs and the mainlobe-to-average-sidelobe ratio (MASR) of the sensing zones. Then, a joint operation mode selection and power control design problem is formulated to maximize the SE fairness among the UEs, while ensuring specific levels of MASR for the sensing zones. The complicated mixed-integer problem is relaxed and solved via a successive convex approximation approach. We further propose a low-complexity design, where AP mode selection is performed by a greedy algorithm and power control is then designed based on the chosen modes. Our findings reveal that the proposed scheme can consistently ensure a sensing success rate of $100\%$ for different network setups with satisfactory fairness among all UEs.
[204] arXiv:2404.17264 [pdf, ps, other]: Title: Beyond Efficiency and Convenience. Using Post-growth Values as a Nucleus to Transform Design Education and Society

Authors: Matthias Laschke, Lenneke Kuijer

Subjects: Human-Computer Interaction (cs.HC)

In this position paper, we present Municipan, an artefact resulting from a post-growth design experiment, applied in a student design project. In contrast to mainstream human-centered design directed at efficiency and convenience, which we argue leads to deskilling, dependency, and the progression of the climate crisis, we challenged students to envision an opposite user who is willing to invest time and effort and learn new skills. While Municipan is not a direct step towards a post-growth society, integrating the way it was created into design education can act as a nucleus, bringing forth design professionals inclined to create technologies with the potential to gradually transform society towards post-growth living. Bringing in examples from our own research, we illustrate that designs created in this mindset, such as heating systems that train cold resistance or navigation systems that train orientation, have the potential to reskill users, reduce technological dependency, and steer consumption within planetary limits.
[205] arXiv:2404.17269 [pdf, other]: Title: Clustering of Motion Trajectories by a Distance Measure Based on Semantic Features

Authors: Christoph Zelch, Jan Peters, Oskar von Stryk

Comments: Published in: 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids). Code available at: this https URL

Journal-ref: 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), Austin, TX, USA, 2023

Subjects: Robotics (cs.RO)

Clustering of motion trajectories is highly relevant for human-robot interactions as it allows the anticipation of human motions, fast reaction to those, as well as the recognition of explicit gestures. Further, it allows automated analysis of recorded motion data. Many clustering algorithms for trajectories build upon distance metrics that are based on pointwise Euclidean distances. However, our work indicates that focusing on salient characteristics is often sufficient. We present a novel distance measure for motion plans consisting of state and control trajectories that is based on a compressed representation built from their main features. This approach allows a flexible choice of feature classes relevant to the respective task. The distance measure is used in agglomerative hierarchical clustering. We compare our method with the widely used dynamic time warping algorithm on test sets of motion plans for the Furuta pendulum and the Manutec robot arm and on real-world data from a human motion dataset. The proposed method demonstrates slight advantages in clustering and strong advantages in runtime, especially for long trajectories.
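For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of classic dynamic time warping, not the feature-based distance the authors propose. It assumes one-dimensional sequences and an absolute-difference local cost; the function name `dtw_distance` is hypothetical. The nested loops show the quadratic cost in trajectory length that is typical of the pointwise approach.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Runs in O(len(a) * len(b)) time; the local cost is |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A time-stretched copy of a sequence has zero DTW distance to the original.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A distance of this kind can be fed into agglomerative hierarchical clustering as a precomputed pairwise distance matrix, which is how such a baseline would typically be compared with another distance measure.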
[206] arXiv:2404.17270 [pdf, other]: Title: Empirical Studies of Propagation Characteristics and Modeling Based on XL-MIMO Channel Measurement: From Far-Field to Near-Field

Authors: Haiyang Miao, Jianhua Zhang, Pan Tang, Lei Tian, Weirang Zuo, Qi Wei, Guangyi Liu

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In sixth-generation (6G) systems, extremely large-scale multiple-input-multiple-output (XL-MIMO) is considered a promising enabling technology. With the further expansion of the number of array elements and of the frequency bands, near-field effects will be more likely to occur in 6G communication systems, and near-field radio communications (NFRC) will become crucial. Channel research is very important for the development and performance evaluation of communication systems. In this paper, we systematically investigate channel measurements and modeling for the emerging NFRC. First, the principal design problems of a massive MIMO channel measurement platform are solved. Second, an indoor XL-MIMO channel measurement campaign with 1600 array elements is conducted, and the channel characteristics are extracted and validated in the near-field region. Then, an outdoor XL-MIMO channel measurement campaign with 320 array elements is conducted, and the channel characteristics are extracted and modeled across the near-field to far-field (NF-FF) region. The spatial non-stationary characteristics of angular spread at the transmitting end are more important in modeling. We hope that this work will serve as a reference for near-field and far-field research for 6G.
[207] arXiv:2404.17273 [pdf, other]: Title: 3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

Authors: Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

Comments: Accepted by Information Processing and Management (IP&M), 10 pages, 9 figures and 8 tables

Journal-ref: Information Processing & Management, Volume 61, Issue 4, July 2024, 103716

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions while maintaining independence between the two modalities. This integration effectively combines object regions with the corresponding semantic and position layouts derived from segmentation to enhance the visual representation, and the modality independence guarantees efficiency and generalization. Additionally, 3SHNet utilizes the structured contextual visual scene information from segmentation to conduct local (region-based) or global (grid-based) guidance and achieve accurate hybrid-level retrieval. Extensive experiments conducted on the MS-COCO and Flickr30K benchmarks substantiate the superior performance, inference efficiency and generalization of the proposed 3SHNet compared with contemporary state-of-the-art methods. Specifically, on the larger MS-COCO 5K test set, we achieve 16.3%, 24.8%, and 18.3% improvements in terms of rSum score, respectively, compared with the state-of-the-art methods using different image representations, while maintaining optimal retrieval efficiency. Moreover, our performance on cross-dataset generalization improves by 18.6%. Data and code are available at https://github.com/XuriGe1995/3SHNet.
[208] arXiv:2404.17274 [pdf, other]: Title: Exact and Approximate High-Multiplicity Scheduling on Identical Machines

Authors: Klaus Jansen, Kai Kahler, Esther Zwanger

Comments: 42 pages, 2 figures

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

Goemans and Rothvoss (SODA'14) gave a framework for solving problems in time $enc(P)^{2^{O(N)}}enc(Q)^{O(1)}$ that can be described as finding a point in $\text{int.cone}(P\cap\mathbb{Z}^N)\cap Q$, where $P,Q\subset\mathbb{R}^N$ are (bounded) polyhedra. This framework can be used to solve various scheduling problems, but the encoding length $enc(P)$ usually involves large parameters like the makespan. We describe three tools to improve the framework by Goemans and Rothvoss: Problem-specific preprocessing, LP relaxation techniques and a new bound for the number of vertices of the integer hull.
In particular, applied to the classical scheduling problem $P||C_{\max}$, these tools each improve the running time from $(\log(C_{\max}))^{2^{O(d)}} enc(I)^{O(1)}$ to the possibly much better $(\log(p_{\max}))^{2^{O(d)}}enc(I)^{O(1)}$. Here, $p_{\max}$ is the largest processing time, $d$ is the number of different processing times, $C_{\max}$ is the makespan and $enc(I)$ is the encoding length of the instance. This running time is FPT w.r.t. parameter $d$ if $p_{\max}$ is given in unary. We obtain similar results for various other problems. Moreover, we show how a balancing result by Govzmann et al. can be used to speed up an additive approximation scheme by Buchem et al. (ICALP'21) in the high-multiplicity setting.
On the complexity side, we use reductions from the literature to provide new parameterized lower bounds for $P||C_{\max}$ and to show that the improved running time of the additive approximation algorithm is probably optimal. Finally, we show that the big open question asked by Mnich and van Bevern (Comput. Oper. Res. '18) whether $P||C_{\max}$ is FPT w.r.t. the number of job types $d$ has the same answer as the question whether $Q||C_{\max}$ is FPT w.r.t. the number of job and machine types $d+\tau$ (all in high-multiplicity encoding). The same holds for objective $C_{\min}$.
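To make the problem statement concrete, here is a small sketch of $P||C_{\max}$ in high-multiplicity encoding: each job type is a (processing time, multiplicity) pair, so the number of pairs plays the role of $d$. The solver below is a plain exhaustive search with pruning, written only to illustrate the objective; it is not the Goemans-Rothvoss framework or any algorithm from the paper, and the function name `min_makespan` is hypothetical. It expands every job explicitly, which takes time exponential in the encoding length when multiplicities are stored in binary, and that blow-up is exactly what high-multiplicity algorithms aim to avoid.

```python
def min_makespan(job_types, machines):
    """Exact optimum of P||C_max by exhaustive search (tiny instances only).

    job_types: list of (processing_time, multiplicity) pairs,
               i.e. the high-multiplicity encoding of the job set.
    machines:  number of identical machines.
    """
    # Expand the compact encoding into individual jobs, longest first.
    jobs = sorted((p for p, count in job_types for _ in range(count)), reverse=True)
    best = sum(jobs)  # trivial upper bound: every job on one machine
    loads = [0] * machines

    def place(k):
        nonlocal best
        if max(loads) >= best:
            return  # this partial schedule cannot beat the incumbent
        if k == len(jobs):
            best = max(loads)
            return
        tried = set()
        for i in range(machines):
            if loads[i] in tried:
                continue  # machines with equal load are interchangeable
            tried.add(loads[i])
            loads[i] += jobs[k]
            place(k + 1)
            loads[i] -= jobs[k]

    place(0)
    return best


# Two job types (d = 2): two jobs of length 3 and three jobs of length 2.
print(min_makespan([(3, 2), (2, 3)], machines=2))
```

In the example, the total work is 12 on two machines, and the schedule {3, 3} and {2, 2, 2} reaches the lower bound of 6.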
[209] arXiv:2404.17275 [pdf, other]: Title: Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

Authors: Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

Comments: To appear in IJCV

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $\alpha$-Power Maximization (ARPM), for PDA, where the source domain contains private classes absent in the target domain. In ARPM, we propose a novel adversarial reweighting model that adversarially learns to reweight source domain data so as to identify source-private class samples by assigning smaller weights to them, mitigating potential negative transfer. Based on the adversarial reweighting, we train the transferable recognition model on the reweighted source distribution so that it can classify common class data. To reduce the prediction uncertainty of the recognition model on the target domain for PDA, we present an $\alpha$-power maximization mechanism in ARPM, which enriches the family of losses for reducing the prediction uncertainty for PDA. Extensive experimental results on five PDA benchmarks, i.e., Office-31, Office-Home, VisDA-2017, ImageNet-Caltech, and DomainNet, show that our method is superior to recent PDA methods. Ablation studies also confirm the effectiveness of the components in our approach. To theoretically analyze our method, we deduce an upper bound on the target domain expected error for PDA, which is approximately minimized in our approach. We further extend ARPM to open-set DA, universal DA, and test-time adaptation, and verify its usefulness through experiments.
[210] arXiv:2404.17276 [pdf, other]: Title: Efficient Deterministic Renewable Energy Forecasting Guided by Multiple-Location Weather Data

Authors: Charalampos Symeonidis, Nikos Nikolaidis

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Electricity generated from renewable energy sources has been established as an efficient remedy for both energy shortages and the environmental pollution stemming from conventional energy production methods. Solar and wind power are two of the most dominant renewable energy sources. The accurate forecasting of the energy generation of those sources facilitates their integration into electric grids, by minimizing the negative impact of uncertainty regarding their management and operation. This paper proposes a novel methodology for deterministic wind and solar energy generation forecasting for multiple generation sites, utilizing multi-location weather forecasts. The method employs a U-shaped Temporal Convolutional Auto-Encoder (UTCAE) architecture for temporal processing of weather-related and energy-related time-series across each site. The Multi-sized Kernels convolutional Spatio-Temporal Attention (MKST-Attention), inspired by the multi-head scaled-dot product attention mechanism, is also proposed aiming to efficiently transfer temporal patterns from weather data to energy data, without a priori knowledge of the locations of the power stations and the locations of provided weather data. The conducted experimental evaluation on a day-ahead solar and wind energy forecasting scenario on five datasets demonstrated that the proposed method achieves top results, outperforming all competitive time-series forecasting state-of-the-art methods.
[211] arXiv:2404.17280 [pdf, other]: Title: Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.
[212] arXiv:2404.17283 [pdf, other]: Title: Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Authors: Xuan Zhang, Wei Gao

Comments: Accepted by COLING 2024

Subjects: Computation and Language (cs.CL)

Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, due to the black-box nature of advanced large language models (LLMs) and the non-retrieval-oriented supervision signal of specific tasks, the training of the retrieval model faces significant challenges in the black-box LLM setting. We propose an approach leveraging Fine-grained Feedback with Reinforcement Retrieval (FFRR) to enhance fact-checking on news claims using a black-box LLM. FFRR adopts a two-level strategy to gather fine-grained feedback from the LLM, which serves as a reward for optimizing the retrieval policy, by rating the retrieved documents based on the non-retrieval ground truth of the task. We evaluate our model on two public datasets for real-world news claim verification, and the results demonstrate that FFRR achieves significant improvements over strong LLM-enabled and non-LLM baselines.
[213] arXiv:2404.17284 [pdf, ps, other]: Title: Machine Learning based prediction of Vanadium Redox Flow Battery temperature rise under different charge-discharge conditions

Authors: Anirudh Narayan D, Akshat Johar, Divye Kalra, Bhavya Ardeshna, Ankur Bhattacharjee

Comments: 21 pages, 5 figures

Subjects: Machine Learning (cs.LG)

Accurate prediction of battery temperature rise is essential for designing an efficient thermal management scheme. In this paper, machine learning (ML) based prediction of Vanadium Redox Flow Battery (VRFB) thermal behavior during charge-discharge operation has been demonstrated for the first time. Considering different currents with a specified electrolyte flow rate, the temperature of a kW scale VRFB system is studied through experiments. Three different ML algorithms, Linear Regression (LR), Support Vector Regression (SVR) and Extreme Gradient Boost (XGBoost), have been used for the prediction work. The ML algorithms have been trained and validated on the practical dataset of a 1kW 6kWh VRFB storage under 40A, 45A, 50A and 60A charge-discharge currents and a flow rate of 10 L min$^{-1}$. A comparative analysis among the ML algorithms is done in terms of performance metrics such as correlation coefficient (R$^2$), mean absolute error (MAE) and root mean square error (RMSE). It is observed that XGBoost shows the highest prediction accuracy, around 99%. The ML based prediction results obtained in this work can be very useful for controlling the VRFB temperature rise during operation and act as an indicator for further development of an optimized thermal management system.
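The three comparison metrics named in the abstract can be computed as in the sketch below. It assumes that R2 denotes the usual coefficient of determination, which the abstract does not spell out. The function name `regression_metrics` and the temperature values are hypothetical and do not come from the paper's dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return mae, rmse, r2

# Hypothetical temperature readings in degrees Celsius (not from the paper).
measured = [25.0, 26.0, 27.0, 28.0]
predicted = [25.5, 25.5, 27.5, 27.5]
mae, rmse, r2 = regression_metrics(measured, predicted)
```

For these made-up values every prediction is off by 0.5 degrees, so MAE and RMSE are both 0.5 and R^2 is 0.8.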
[214] arXiv:2404.17287 [pdf, other]: Title: When to Trust LLMs: Aligning Confidence with Response Quality

Authors: Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

Subjects: Computation and Language (cs.CL)

Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods, which rely on verbalizing confidence to indicate reliability by inducing top-k responses and sampling-aggregating multiple responses, often fail due to the lack of objective guidance on confidence. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), leveraging reinforcement learning with a tailored dual-component reward function. This function encompasses quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality to align the order of confidence and quality. Experiments demonstrate that our CONQORD significantly improves the alignment performance between confidence levels and response accuracy, without causing the model to become over-cautious. Furthermore, the aligned confidence provided by CONQORD informs when to trust LLMs, and acts as a determinant for initiating the retrieval process of external knowledge. Aligning confidence with response quality ensures more transparent and reliable responses, providing better trustworthiness.
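The idea of aligning the order of confidence and quality can be illustrated with a simple pairwise check. The sketch below is not the CONQORD reward, whose exact form the abstract does not give. It only measures, for a batch of responses, the fraction of pairs in which the more confident response is also the higher-quality one. The function name `order_agreement` and the scores are hypothetical.

```python
from itertools import combinations

def order_agreement(confidences, qualities):
    """Fraction of response pairs ranked the same way by confidence and quality.

    Pairs tied in either score are skipped. Returns a value in [0, 1],
    or None when no comparable pair exists.
    """
    concordant = 0
    comparable = 0
    for (c1, q1), (c2, q2) in combinations(zip(confidences, qualities), 2):
        if c1 == c2 or q1 == q2:
            continue  # ties carry no ordering information
        comparable += 1
        if (c1 - c2) * (q1 - q2) > 0:
            concordant += 1
    return concordant / comparable if comparable else None

# Hypothetical verbalized confidences and quality scores for three responses.
conf = [0.9, 0.6, 0.3]
qual = [0.8, 0.7, 0.1]
aligned = order_agreement(conf, qual)            # every pair agrees
misaligned = order_agreement(conf, qual[::-1])   # every pair disagrees
```

A value near 1 means confidence and quality are ordered consistently, which is the behavior an order-preserving reward would encourage.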
[215] arXiv:2404.17288 [pdf, other]: Title: ExcluIR: Exclusionary Neural Information Retrieval

Authors: Wenhao Zhang, Mengqi Zhang, Shiguang Wu, Jiahuan Pei, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Pengjie Ren

Subjects: Information Retrieval (cs.IR)

Exclusion is an important and universal linguistic skill that humans use to express what they do not want. However, in the information retrieval community, there is little research on exclusionary retrieval, where users express what they do not want in their queries. In this work, we investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval, consisting of an evaluation benchmark and a training set for helping retrieval models to comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses, obtaining three main observations: (1) Existing retrieval models with different architectures struggle to effectively comprehend exclusionary queries; (2) Although integrating our training data can improve the performance of retrieval models on exclusionary retrieval, there still exists a gap compared to human performance; (3) Generative retrieval models have a natural advantage in handling exclusionary queries. To facilitate future research on exclusionary retrieval, we share the benchmark and evaluation scripts at https://github.com/zwh-sdu/ExcluIR.
[216] arXiv:2404.17290 [pdf, ps, other]: Title: Efficient Orthogonal Decomposition with Automatic Basis Extraction for Low-Rank Matrix Approximation

Authors: Weijie Shen, Weiwei Xu, Lei Zhu

Subjects: Numerical Analysis (math.NA)

Low-rank matrix approximation plays a ubiquitous role in various applications such as image processing, signal processing, and data analysis. Recently, randomized algorithms for low-rank matrix approximation have gained widespread adoption due to their speed, accuracy, and robustness, particularly in their improved implementation on modern computer architectures. Existing low-rank approximation algorithms often require prior knowledge of the rank of the matrix, which is typically unknown. To address this bottleneck, we propose a low-rank approximation algorithm termed efficient orthogonal decomposition with automatic basis extraction (EOD-ABE) tailored for the scenario where the rank of the matrix is unknown. Notably, we introduce a randomized algorithm to automatically extract the basis that reveals the rank. The efficacy of the proposed algorithms is theoretically and numerically validated, demonstrating superior speed, accuracy, and robustness compared to existing methods. Furthermore, we apply the algorithms to image reconstruction, achieving remarkable results.
[217] arXiv:2404.17292 [pdf, other]: Title: The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version

Authors: Gabriel Kronberger, Fabricio Olivetti de Franca, Harry Desmond, Deaglan J. Bartlett, Lukas Kammerer

Comments: This is an extended version of the article submitted to Parallel Problem Solving from Nature (PPSN) Conference 2024

Subjects: Neural and Evolutionary Computing (cs.NE); Astrophysics of Galaxies (astro-ph.GA); Instrumentation and Methods for Astrophysics (astro-ph.IM)

We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings, allowing exhaustive enumeration of all solutions. This enables us to quantify the success probability of finding the best possible expressions, and to compare the search efficiency of genetic programming to random search in the space of semantically unique expressions. This analysis is made possible by improved algorithms for equality saturation, which we use to improve the Exhaustive Symbolic Regression algorithm; this produces the set of semantically unique expression structures, orders of magnitude smaller than the full symbolic regression search space. We compare the efficiency of random search in the set of unique expressions and genetic programming. For our experiments we use two real-world datasets where symbolic regression has been used to produce well-fitting univariate expressions: the Nikuradse dataset of flow in rough pipes and the Radial Acceleration Relation of galaxy dynamics. The results show that genetic programming in such limited settings explores only a small fraction of all unique expressions, and evaluates expressions repeatedly that are congruent to already visited expressions.
[218] arXiv:2404.17293 [pdf, other]: Title: Lazy Data Practices Harm Fairness Research

Authors: Jan Simson, Alessandro Fabris, Christoph Kern

Comments: Accepted for publication at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP); Machine Learning (stat.ML)

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
[219] arXiv:2404.17297 [pdf, ps, other]: Title: Denotation-based Compositional Compiler Verification

Authors: Zheng Cheng, Jiyang Wu, Di Wang, Qinxiang Cao

Comments: 38 pages, 8 figures

Subjects: Programming Languages (cs.PL)

A desired but challenging property of compiler verification is compositionality, in the sense that the compilation correctness of a program can be deduced incrementally from that of its substructures, ranging from statements to functions to modules. Previously proposed approaches have devoted extensive effort to module-level compositionality based on small-step semantics and simulation theories. This paper proposes a novel compiler verification framework based on denotational semantics for better compositionality. Specifically, our denotational semantics is defined by semantic functions that map a syntactic component to a semantic domain composed of multiple behavioral sets, and compiler correctness is defined by the behavioral refinement between semantic domains of the source and the target programs. Therefore, when proving compiler correctness, we can extensively leverage the algebraic properties of sets. Another important contribution is that our formalization of denotational semantics captures the full meaning of a program and bridges the gap between those based on conventional powerdomains and what realistic compiler verification actually needs. We demonstrate that our denotation-based framework is viable and practical by applying it to the verification of the front-end of CompCert and showing that the compositionality from the compilation correctness of sub-statements to statements, from functions to modules, and from modules to the whole program (i.e., module-level compositionality) can be achieved similarly.
[220] arXiv:2404.17298 [pdf, other]: Title: Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Authors: Kürsat Petek, Niclas Vödisch, Johannes Meyer, Daniele Cattaneo, Abhinav Valada, Wolfram Burgard

Subjects: Robotics (cs.RO)

Sensor setups of robotic platforms commonly include both camera and LiDAR as they provide complementary information. However, fusing these two modalities typically requires a highly accurate calibration between them. In this paper, we propose MDPCalib, a novel method for camera-LiDAR calibration that requires neither human supervision nor any specific target objects. Instead, we utilize sensor motion estimates from visual and LiDAR odometry as well as deep learning-based 2D-pixel-to-3D-point correspondences that are obtained without in-domain retraining. We represent the camera-LiDAR calibration as a graph optimization problem and minimize the costs induced by constraints from sensor motion and point correspondences. In extensive experiments, we demonstrate that our approach yields highly accurate extrinsic calibration parameters and is robust to random initialization. Additionally, our approach generalizes to a wide range of sensor setups, which we demonstrate by employing it on various robotic platforms including a self-driving perception car, a quadruped robot, and a UAV. To make our calibration method publicly accessible, we release the code on our project website.
[221] arXiv:2404.17302 [pdf, other]: Title: Part-Guided 3D RL for Sim2Real Articulated Object Manipulation

Authors: Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su

Comments: 9 pages

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. To improve the stability of the policy on real robots, we design a Frame-consistent Uncertainty-aware Sampling (FUS) strategy to get a condensed and hierarchical 3D representation. In addition, a single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation and shows great generalizability to novel categories and instances. Experimental results demonstrate the effectiveness of our framework in both simulation and real-world settings. Our code is available at https://github.com/THU-VCLab/Part-Guided-3D-RL-for-Sim2Real-Articulated-Object-Manipulation.
[222] arXiv:2404.17310 [pdf, other]: Title: Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning

Authors: Yuanman Li, Yingjie He, Changsheng Chen, Li Dong, Bin Li, Jiantao Zhou, Xia Li

Comments: 16 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in deep learning algorithms have shown impressive progress in image copy-move forgery detection (CMFD). However, these algorithms lack generalizability in practical scenarios where the copied regions are not present in the training images, or the cloned regions are part of the background. Additionally, these algorithms utilize convolution operations to distinguish source and target regions, leading to unsatisfactory results when the target regions blend well with the background. To address these limitations, this study proposes a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods. Specifically, the study develops a deep cross-scale PatchMatch (PM) method that is customized for CMFD to locate copy-move regions. Unlike existing deep models, our approach utilizes features extracted from high-resolution scales to seek explicit and reliable point-to-point matching between source and target regions. Furthermore, we propose a novel pairwise rank learning framework to separate source and target regions. By leveraging the strong prior of point-to-point matches, the framework can identify subtle differences and effectively discriminate between source and target regions, even when the target regions blend well with the background. Our framework is fully differentiable and can be trained end-to-end. Comprehensive experimental results highlight the remarkable generalizability of our scheme across various copy-move scenarios, significantly outperforming existing methods.
[223] arXiv:2404.17313 [pdf, other]: Title: Towards Group-aware Search Success

Authors: Haolun Wu, Bhaskar Mitra, Nick Craswell

Subjects: Information Retrieval (cs.IR)

Traditional measures of search success often overlook the varying information needs of different demographic groups. To address this gap, we introduce a novel metric, named Group-aware Search Success (GA-SS). GA-SS redefines search success to ensure that all demographic groups achieve satisfaction from search outcomes. We introduce a comprehensive mathematical framework to calculate GA-SS, incorporating both static and stochastic ranking policies and integrating user browsing models for a more accurate assessment. In addition, we propose the Group-aware Most Popular Completion (gMPC) ranking model to account for demographic variances in user intent, aligning more closely with the diverse needs of all user groups. We empirically validate our metric and approach with two real-world datasets: one focusing on query auto-completion and the other on movie recommendations, where the results highlight the impact of stochasticity and the complex interplay among various search success metrics. Our findings advocate for a more inclusive approach in measuring search success and aim to inspire future investigations into the quality of service of search.
[224] arXiv:2404.17316 [pdf, other]: Title: Certified MaxSAT Preprocessing

Authors: Hannes Ihalainen, Andy Oertel, Yong Kiam Tan, Jeremias Berg, Matti Järvisalo, Jakob Nordström

Subjects: Artificial Intelligence (cs.AI)

Building on the progress in Boolean satisfiability (SAT) solving over the last decades, maximum satisfiability (MaxSAT) has become a viable approach for solving NP-hard optimization problems, but ensuring correctness of MaxSAT solvers has remained an important concern. For SAT, this is largely a solved problem thanks to the use of proof logging, meaning that solvers emit machine-verifiable proofs of (un)satisfiability to certify correctness. However, for MaxSAT, proof logging solvers have started being developed only very recently. Moreover, these nascent efforts have only targeted the core solving process, ignoring the preprocessing phase where input problem instances can be substantially reformulated before being passed on to the solver proper. In this work, we demonstrate how pseudo-Boolean proof logging can be used to certify the correctness of a wide range of modern MaxSAT preprocessing techniques. By combining and extending the VeriPB and CakePB tools, we provide formally verified, end-to-end proof checking that the input and preprocessed output MaxSAT problem instances have the same optimal value. An extensive evaluation on applied MaxSAT benchmarks shows that our approach is feasible in practice.
[225] arXiv:2404.17317 [pdf, other]: Title: Colosseum: The Open RAN Digital Twin

Authors: Michele Polese, Leonardo Bonati, Salvatore D'Oro, Pedram Johari, Davide Villa, Sakthivel Velumani, Rajeev Gangula, Maria Tsampazi, Clifton Paul Robinson, Gabriele Gemmi, Andrea Lacava, Stefano Maxenti, Hai Cheng, Tommaso Melodia

Comments: 13 pages, 8 figures, 1 table, submitted to IEEE for publication

Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Recent years have witnessed the Open Radio Access Network (RAN) paradigm transforming the fundamental ways cellular systems are deployed, managed, and optimized. This shift is led by concepts such as openness, softwarization, programmability, interoperability, and intelligence of the network, all of which had never been applied to the cellular ecosystem before. The realization of the Open RAN vision into practical architectures, intelligent data-driven control loops, and efficient software implementations, however, is a multifaceted challenge, which requires (i) datasets to train Artificial Intelligence (AI) and Machine Learning (ML) models; (ii) facilities to test models without disrupting production networks; (iii) continuous and automated validation of the RAN software; and (iv) significant testing and integration efforts. This paper poses itself as a tutorial on how Colosseum - the world's largest wireless network emulator with hardware in the loop - can provide the research infrastructure and tools to fill the gap between the Open RAN vision, and the deployment and commercialization of open and programmable networks. We describe how Colosseum implements an Open RAN digital twin through a high-fidelity Radio Frequency (RF) channel emulator and end-to-end softwarized O-RAN and 5G-compliant protocol stacks, thus allowing users to reproduce and experiment upon topologies representative of real-world cellular deployments. Then, we detail the twinning infrastructure of Colosseum, as well as the automation pipelines for RF and protocol stack twinning. Finally, we showcase a broad range of Open RAN use cases implemented on Colosseum, including the real-time connection between the digital twin and real-world networks, and the development, prototyping, and testing of AI/ML solutions for Open RAN.
[226] arXiv:2404.17318 [pdf, other]: Title: Performance Bounds of Near-Field Sensing with Circular Arrays

Authors: Zhaolin Wang, Xidong Mu, Yuanwei Liu

Comments: 6 pages, 6 figures. arXiv admin note: text overlap with arXiv:2404.05076

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The performance bounds of near-field sensing are studied for circular arrays, focusing on the impact of bandwidth and array size. The closed-form Cramér-Rao bounds (CRBs) for angle and distance estimation are derived, revealing the scaling laws of the CRBs with bandwidth and array size. Contrary to expectations, enlarging array size does not always enhance sensing performance. Furthermore, the asymptotic CRBs are analyzed under different conditions, unveiling that the derived expressions include the existing results as special cases. Finally, the derived expressions are validated through numerical results.
[227] arXiv:2404.17323 [pdf, other]: Title: A Deep Dive into Effects of Structural Bias on CMA-ES Performance along Affine Trajectories

Authors: Niki van Stein, Sarah L. Thomson, Anna V. Kononova

Comments: 15 pages, 5 figures, submitted to PPSN 2024

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

To guide the design of better iterative optimisation heuristics, it is imperative to understand how inherent structural biases within algorithm components affect the performance on a wide variety of search landscapes. This study explores the impact of structural bias in the modular Covariance Matrix Adaptation Evolution Strategy (modCMA), focusing on the roles of various modules within the algorithm. Through an extensive investigation involving 435,456 configurations of modCMA, we identified key modules that significantly influence structural bias of various classes. Our analysis utilized the Deep-BIAS toolbox for structural bias detection and classification, complemented by SHAP analysis for quantifying module contributions. The performance of these configurations was tested on a sequence of affine-recombined functions, maintaining fixed optimum locations while gradually varying the landscape features. Our results demonstrate an interplay between module-induced structural bias and algorithm performance across different landscape characteristics.
[228] arXiv:2404.17324 [pdf, other]: Title: Dense Road Surface Grip Map Prediction from Multimodal Image Data

Authors: Jyri Maanpää, Julius Pesonen, Heikki Hyyti, Iaroslav Melekhov, Juho Kannala, Petri Manninen, Antero Kukko, Juha Hyyppä

Comments: 17 pages, 7 figures (supplementary material 1 page, 1 figure). Submitted to 27th International Conference of Pattern Recognition (ICPR 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Slippery road weather conditions are prevalent in many regions and pose a regular risk to traffic. Still, there has been little research on how autonomous vehicles could detect slippery driving conditions on the road to drive safely. In this work, we propose a method to predict a dense grip map from the area in front of the car, based on postprocessed multimodal sensor data. We trained a convolutional neural network to predict pixelwise grip values from fused RGB camera, thermal camera, and LiDAR reflectance images, based on weakly supervised ground truth from an optical road weather sensor.
The experiments show that it is possible to predict dense grip values with good accuracy from the used data modalities, as the produced grip map follows both ground truth measurements and local weather conditions, such as snowy areas on the road. The models using only the RGB camera or only the LiDAR reflectance modality provided good baseline results for grip prediction accuracy, while models fusing the RGB camera, thermal camera, and LiDAR modalities improved the grip predictions significantly.
[229] arXiv:2404.17325 [pdf, ps, other]: Title: Towards Scalable Multi-Chip Wireless Networks with Near-Field Time Reversal

Authors: Ama Bandara, Fátima Rodríguez-Galán, Pau Talarn, Elana Pereira de Santana, Peter Haring Bolívar, Eduard Alarcón, Sergi Abadal

Subjects: Emerging Technologies (cs.ET); Signal Processing (eess.SP)

The concept of Wireless Network-on-Chip (WNoC) has emerged as a potential solution to address the escalating communication demands of modern computing systems due to its low latency, versatility, and reconfigurability. However, for WNoC to fulfill its potential, it is essential to establish multiple high-speed wireless links across chips. Unfortunately, the compact and enclosed nature of computing packages introduces significant challenges in the form of Co-Channel Interference (CCI) and Inter-Symbol Interference (ISI), which not only hinder the deployment of multiple spatial channels but also severely restrict the symbol rate of each individual channel. In this paper, we posit that Time Reversal (TR) could be effective in addressing both impairments in this static scenario thanks to its spatiotemporal focusing capabilities even in the near field. Through comprehensive full-wave simulations and bit error rate analysis in multiple scenarios and at multiple frequency bands, we provide evidence that TR can increase the symbol rate by an order of magnitude, enabling the deployment of multiple concurrent links and achieving aggregate speeds exceeding 100 Gb/s. Finally, we evaluate the impact of reducing the sampling rate of the TR filter on the achievable speeds, paving the way to practical TR-based wireless communications at the chip scale.
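The temporal focusing that Time Reversal relies on can be seen in a toy example. In the standard formulation, the transmitter prefilters the signal with the time-reversed channel impulse response, so the receiver sees the autocorrelation of the channel, which peaks at its center with a height equal to the channel energy. The sketch below uses a made-up real-valued channel. The tap values and variable names are hypothetical and unrelated to the paper's full-wave simulations.

```python
def convolve(a, b):
    """Full discrete convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Hypothetical multipath channel impulse response (real-valued taps).
channel = [1.0, 0.0, 0.6, -0.4, 0.2]

# Time-reversal prefilter: the channel response played backwards.
# (For complex taps one would also take the complex conjugate.)
tr_filter = channel[::-1]

# Effective response at the intended receiver: prefilter followed by channel.
effective = convolve(tr_filter, channel)
peak_index = max(range(len(effective)), key=lambda i: abs(effective[i]))
energy = sum(h * h for h in channel)
```

The effective response concentrates its energy in the single center tap, while the remaining taps are the residual inter-symbol interference that limits the symbol rate.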
[230] arXiv:2404.17331 [pdf, ps, other]: Title: Finite Sample Analysis for a Class of Subspace Identification Methods

Authors: Jiabao He, Ingvar Ziemann, Cristian R. Rojas, Håkan Hjalmarsson

Subjects: Systems and Control (eess.SY)

While subspace identification methods (SIMs) are appealing due to their simple parameterization for MIMO systems and robust numerical realizations, a comprehensive statistical analysis of SIMs remains an open problem, especially in the non-asymptotic regime. In this work, we provide a finite sample analysis for a class of SIMs, which reveals that the convergence rates for estimating Markov parameters and system matrices are $\mathcal{O}(1/\sqrt{N})$, in line with classical asymptotic results. Based on the observation that the model format in classical SIMs becomes non-causal because of a projection step, we choose a parsimonious SIM that bypasses the projection step and strictly enforces a causal model to facilitate the analysis, where a bank of ARX models is estimated in parallel. Leveraging recent results from finite sample analysis of an individual ARX model, we obtain an overall error bound for an array of ARX models and proceed to derive error bounds for system matrices via robustness results for the singular value decomposition.
[231] arXiv:2404.17332 [pdf, other]: Title: Managing Security Evidence in Safety-Critical Organizations

Authors: Mazen Mohamad, Jan-Philipp Steghöfer, Eric Knauss, Riccardo Scandariato

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

With the increasing prevalence of open and connected products, cybersecurity has become a serious issue in safety-critical domains such as the automotive industry. As a result, regulatory bodies have become more stringent in their requirements for cybersecurity, necessitating security assurance for products developed in these domains. In response, companies have implemented new or modified processes to incorporate security into their product development lifecycle, resulting in a large amount of evidence being created to support claims about the achievement of a certain level of security. However, managing evidence is not a trivial task, particularly for complex products and systems. This paper presents a qualitative interview study conducted in six companies on the maturity of managing security evidence in safety-critical organizations. We find that the current maturity of managing security evidence is insufficient for the increasing requirements set by certification authorities and standardization bodies. Organizations currently fail to identify relevant artifacts as security evidence and manage this evidence on an organizational level. One part of the reason is educational gaps; the other is a lack of processes. The impact of AI on the management of security evidence is still an open question.
[232] arXiv:2404.17335 [pdf, other]: Title: A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation

Authors: Xin Zhang, Liangxiu Han, Tam Sobeih, Lianghao Han, Darren Dancey

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Depth estimation is crucial for interpreting complex environments, especially in areas such as autonomous vehicle navigation and robotics. Nonetheless, obtaining accurate depth readings from event camera data remains a formidable challenge. Event cameras operate differently from traditional digital cameras, continuously capturing data and generating asynchronous binary spikes that encode time, location, and light intensity. Yet, the unique sampling mechanisms of event cameras render standard image-based algorithms inadequate for processing spike data. This necessitates the development of innovative, spike-aware algorithms tailored for event cameras, a task compounded by the irregularity, continuity, noise, and spatial and temporal characteristics inherent in spiking data. Harnessing the strong generalization capabilities of transformer neural networks for spatiotemporal data, we propose a purely spike-driven spike transformer network for depth estimation from spiking camera data. To address performance limitations with Spiking Neural Networks (SNN), we introduce a novel single-stage cross-modality knowledge transfer framework leveraging knowledge from a large vision foundational model of artificial neural networks (ANN) (DINOv2) to enhance the performance of SNNs with limited data. Our experimental results on both synthetic and real datasets show substantial improvements over existing models, with notable gains in Absolute Relative and Square Relative errors (49% and 39.77% improvements over the benchmark model Spike-T, respectively). Besides accuracy, the proposed model also demonstrates reduced power consumption, a critical factor for practical applications.
[233] arXiv:2404.17336 [pdf, other]: Title: Introducing cosmosGPT: Monolingual Training for Turkish Language Models

Authors: H. Toprak Kesgin, M. Kaan Yuce, Eren Dogan, M. Egemen Uzun, Atahan Uz, H. Emre Seyrek, Ahmed Zeer, M. Fatih Amasyali

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The number of open source language models that can produce Turkish is increasing day by day, as in other languages. In order to create the basic versions of such models, the training of multilingual models is usually continued with Turkish corpora. The alternative is to train the model with only Turkish corpora. In this study, we first introduce the cosmosGPT models that we created with this alternative method. Then, we introduce new finetune datasets for basic language models to fulfill user requests and new evaluation datasets for measuring the capabilities of Turkish language models. Finally, a comprehensive comparison of the adapted Turkish language models on different capabilities is presented. The results show that the language models we built with the monolingual corpus have promising performance despite being about 10 times smaller than the others.
[234] arXiv:2404.17337 [pdf, other]: Title: Metronome: tracing variation in poetic meters via local sequence alignment

Authors: Ben Nagy, Artjoms Šeļa, Mirella De Sisto, Petr Plecháč

Subjects: Computation and Language (cs.CL)

All poetic forms come from somewhere. Prosodic templates can be copied for generations, altered by individuals, imported from foreign traditions, or fundamentally changed under the pressures of language evolution. Yet these relationships are notoriously difficult to trace across languages and times. This paper introduces an unsupervised method for detecting structural similarities in poems using local sequence alignment. The method relies on encoding poetic texts as strings of prosodic features using a four-letter alphabet; these sequences are then aligned to derive a distance measure based on weighted symbol (mis)matches. Local alignment allows poems to be clustered according to emergent properties of their underlying prosodic patterns. We evaluate method performance on a meter recognition task against strong baselines and show its potential for cross-lingual and historical research using three short case studies: 1) mutations in quantitative meter in classical Latin, 2) European diffusion of the Renaissance hendecasyllable, and 3) comparative alignment of modern meters in 18--19th century Czech, German and Russian. We release an implementation of the algorithm as a Python package with an open license.
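The idea of aligning prosodic strings and turning the alignment score into a distance can be illustrated with a minimal sketch. This is not the released package: it uses a plain Smith-Waterman local alignment, and the function names, the scoring weights (`match`, `mismatch`, `gap`), the normalisation, and the example symbols `-` and `+` are all assumptions made for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    The weights are hypothetical defaults, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def alignment_distance(a, b, **weights):
    """Map the local alignment score to a distance in [0, 1].

    Normalising by the smaller self-alignment score is an assumed choice.
    """
    if not a or not b:
        return 1.0
    norm = min(local_alignment_score(a, a, **weights),
               local_alignment_score(b, b, **weights))
    if norm <= 0:
        return 1.0
    return 1.0 - local_alignment_score(a, b, **weights) / norm


# Two hypothetical prosodic strings ('-' weak position, '+' strong position).
iambic = "-+-+"
trochaic = "+-+-"
print(alignment_distance(iambic, iambic))
print(alignment_distance(iambic, trochaic))
```

A pairwise distance matrix built this way could then be passed to any standard clustering routine.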
[235] arXiv:2404.17338 [pdf, other]: Title: Towards an Approach to Pattern-based Domain-Specific Requirements Engineering

Authors: T.Chuprina, D.Méndez (1,2), V.Nigam, M.Reich (3,4), A.Schweiger (3) ((1) fortiss GmbH, (2) Blekinge Institute of Technology, (3) Airbus Defence and Space GmbH, (4) Technische University of Chemnitz)

Comments: 6 pages with 3 figures

Subjects: Software Engineering (cs.SE)

Requirements specification patterns have received much attention as they promise to guide the structured specification of natural language requirements. By using them, the intention is to reduce quality problems related to requirements artifacts. Patterns may need to vary in their syntax (e.g. domain details/parameter incorporation) and semantics according to the particularities of the application domain. However, pattern-based approaches, such as EARS, are designed domain-independently to facilitate their wide adoption across several domains. Little is yet known about how to adopt the principal idea of pattern-based requirements engineering to cover domain-specificity in requirements engineering and, ideally, integrate requirements engineering activities into quality assurance tasks. In this paper, we propose the Pattern-based Domain-specific Requirements Engineering Approach for the specification of functional and performance requirements in a holistic manner. This approach emerges from an academia-industry collaboration and is our first attempt to frame an approach which allows for analyzing domain knowledge and incorporating it into the requirements engineering process, enabling automated checks for requirements quality assurance and computer-aided support for system verification. Our contribution is two-fold: First, we present a solution to pattern-based domain-specific requirements engineering and its exemplary integration into quality assurance techniques. Second, we showcase a proof of concept using a tool implementation for the domain of flight controllers for Unmanned Aerial Vehicles. Both shall allow us to outline next steps in our research agenda and foster discussions in this direction.
[236] arXiv:2404.17340 [pdf, other]: Title: Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Authors: Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

Comments: Accepted at NeurIPS 2023. Email: liucl1996@163.com

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic property of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss, helping the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of encoders. Finally, it is important to emphasize that our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data. We have conducted sufficient and convincing experiments to confirm the effectiveness and advancement of our model.
[237] arXiv:2404.17342 [pdf, other]: Title: Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

Authors: Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash

Comments: Paper 8 pages, Appendix 12 pages. Submitted to ARR

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when it is task-specific. Here, we explore the feasibility of repurposing existing datasets for a new NLP task: we repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA), to enable extractive QA (EQA) in the style of machine reading comprehension. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA). We also present QA evaluation results for several monolingual and cross-lingual QA pairs including English, MSA, and five Arabic dialects. Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced. We also conduct a thorough analysis and share our insights from the process, which we hope will contribute to a deeper understanding of the challenges and the opportunities associated with task reformulation in NLP research.
[238] arXiv:2404.17343 [pdf, other]: Title: A Bionic Natural Language Parser Equivalent to a Pushdown Automaton

Authors: Zhenghao Wei, Kehua Lin, Jianlin Feng

Comments: to be published in IJCNN 2024

Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)

Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions through simulating neural activities, with several applications based on AC having been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser lacks the ability to handle Kleene closures, preventing it from parsing all regular languages and rendering it weaker than Finite Automata (FA). In this paper, we propose a new bionic natural language parser (BNLP) that is based on AC and integrates two new biologically rational structures, Recurrent Circuit and Stack Circuit, which are inspired by RNN and short-term memory mechanism. In contrast to the original parser, the BNLP can fully handle all regular languages and Dyck languages. Therefore, leveraging the Chomsky-Schützenberger theorem, the BNLP which can parse all Context-Free Languages can be constructed. We also formally prove that for any PDA, a Parser Automaton corresponding to BNLP can always be formed, ensuring that BNLP has a description ability equal to that of PDA and addressing the deficiencies of the original parser.
[239] arXiv:2404.17344 [pdf, other]: Title: Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives

Authors: Theresa Wagner, Franziska Nestler, Martin Stoll

Comments: Official code this https URL

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

One of the main computational bottlenecks when working with kernel based learning is dealing with the large and typically dense kernel matrix. Techniques dealing with fast approximations of the matrix vector product for these kernel matrices typically deteriorate in their performance if the feature vectors reside in higher-dimensional feature spaces. We here present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis. We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated with respect to the kernel hyperparameters; a problem often found in the training phase of methods such as Gaussian processes. We also provide an error analysis for this case. We illustrate the performance of the additive kernel scheme with fast matrix vector products on a number of data sets. Our code is available at https://github.com/wagnertheresa/NFFTAddKer
[240] arXiv:2404.17347 [pdf, other]: Title: InspectorRAGet: An Introspection Platform for RAG Evaluation

Authors: Kshitij Fadnis, Siva Sankalp Patel, Odellia Boni, Yannis Katsis, Sara Rosenthal, Benjamin Sznajder, Marina Danilevsky

Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)

Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and metrics. In spite of increased recognition of the need for rigorous evaluation of RAG systems, few tools exist that go beyond the creation of model output and automatic calculation. We present InspectorRAGet, an introspection platform for RAG evaluation. InspectorRAGet allows the user to analyze aggregate and instance-level performance of RAG systems, using both human and algorithmic metrics as well as annotator quality. InspectorRAGet is suitable for multiple use cases and is available publicly to the community. The demo video is available at https://youtu.be/MJhe8QIXcEc
[241] arXiv:2404.17350 [pdf, other]: Title: On the Road to Clarity: Exploring Explainable AI for World Models in a Driver Assistance System

Authors: Mohamed Roshdi, Julian Petzold, Mostafa Wahby, Hussein Ebrahim, Mladen Berekovic, Heiko Hamann

Comments: 8 pages, 6 figures, to be published in IEEE CAI 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

In Autonomous Driving (AD) transparency and safety are paramount, as mistakes are costly. However, neural networks used in AD systems are generally considered black boxes. As a countermeasure, we have methods of explainable AI (XAI), such as feature relevance estimation and dimensionality reduction. Coarse graining techniques can also help reduce dimensionality and find interpretable global patterns. A specific coarse graining method is Renormalization Groups from statistical physics. It has previously been applied to Restricted Boltzmann Machines (RBMs) to interpret unsupervised learning. We refine this technique by building a transparent backbone model for convolutional variational autoencoders (VAE) that allows mapping latent values to input features and has performance comparable to trained black box VAEs. Moreover, we propose a custom feature map visualization technique to analyze the internal convolutional layers in the VAE to explain internal causes of poor reconstruction that may lead to dangerous traffic scenarios in AD applications. In a second key contribution, we propose explanation and evaluation techniques for the internal dynamics and feature relevance of prediction networks. We test a long short-term memory (LSTM) network in the computer vision domain to evaluate the predictability and in future applications potentially safety of prediction models. We showcase our methods by analyzing a VAE-LSTM world model that predicts pedestrian perception in an urban traffic situation.
[242] arXiv:2404.17358 [pdf, ps, other]: Title: Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier

Authors: Natalie S. Frank

Comments: 17 pages

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Adversarial training is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context -- or in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers to the adversarial classification risk, known as \emph{adversarial Bayes classifiers}. Specifically, under reasonable distributional assumptions, a convex loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.
[243] arXiv:2404.17360 [pdf, other]: Title: UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning

Authors: Maoxun Yuan, Bo Cui, Tianyi Zhao, Xingxing Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic analysis on visible (RGB) and infrared (IR) images has gained attention for its ability to be more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on the large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on their RGB-IR semantic relevance datasets, which results in poor scalability and limited generalization. In this work, we propose a scalable and efficient framework called UniRGB-IR to unify RGB-IR downstream tasks, in which a novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained RGB-based foundation model. Specifically, our framework consists of a vision transformer (ViT) foundation model, a Multi-modal Feature Pool (MFP) module and a Supplementary Feature Injector (SFI) module. The MFP and SFI modules cooperate with each other as an adapter to effectively complement the ViT features with the contextual multi-scale features. During the training process, we freeze the entire foundation model to inherit prior knowledge and only optimize the MFP and SFI modules. Furthermore, to verify the effectiveness of our framework, we utilize the ViT-Base as the pre-trained foundation model to perform extensive experiments. Experimental results on various RGB-IR downstream tasks demonstrate that our method can achieve state-of-the-art performance. The source code and results are available at https://github.com/PoTsui99/UniRGB-IR.git.
[244] arXiv:2404.17364 [pdf, other]: Title: MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Authors: Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, Wangmeng Zuo

Comments: 15 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given clothing. However, most existing methods solely focus on the frontal try-on using the frontal clothing. When the views of the clothing and person are significantly inconsistent, particularly when the person's view is non-frontal, the results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-ON (MV-VTON), which aims to reconstruct the dressing results of a person from multiple views using the given clothes. On the one hand, given that single-view clothes provide insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and back views of the clothing, to encompass the complete view as much as possible. On the other hand, the diffusion models that have demonstrated superior abilities are adopted to perform our MV-VTON. In particular, we propose a view-adaptive selection method where hard-selection and soft-selection are applied to the global and local clothing feature extraction, respectively. This ensures that the clothing features are roughly fit to the person's view. Subsequently, we suggest a joint attention block to align and fuse clothing features with person features. Additionally, we collect an MV-VTON dataset, i.e., Multi-View Garment (MVG), in which each person has multiple photos with diverse views and poses. Experiments show that the proposed method not only achieves state-of-the-art results on the MV-VTON task using our MVG dataset, but also has superiority on the frontal-view virtual try-on task using the VITON-HD and DressCode datasets. Codes and datasets will be publicly released at https://github.com/hywang2002/MV-VTON .
[245] arXiv:2404.17367 [pdf, ps, other]: Title: An Optimised Brushless DC Motor Control Scheme for Robotics Applications

Authors: Nilabha Das, Laxman Rao S. Paragond, Balkrushna H. Waghmare

Comments: 6 Pages, 8 figures, 1 table

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work aims to develop an integrated control strategy for Brushless Direct Current Motors for a wide range of applications in robotics systems. The controller is suited for both high torque - low speed and high-speed control of the motors. Hardware validation is done by developing a custom BLDC drive system, and the circuit elements are optimised for power efficiency.
[246] arXiv:2404.17371 [pdf, other]: Title: Estimating the Robustness Radius for Randomized Smoothing with 100$\times$ Sample Efficiency

Authors: Emmanouil Seferis, Stefanos Kollias, Chih-Hong Cheng

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Randomized smoothing (RS) has successfully been used to improve the robustness of predictions for deep neural networks (DNNs) by adding random noise to create multiple variations of an input, followed by deciding the consensus. To understand if an RS-enabled DNN is effective in the sampled input domains, it is mandatory to sample data points within the operational design domain, acquire the point-wise certificate regarding robustness radius, and compare it with pre-defined acceptance criteria. Consequently, ensuring that a point-wise robustness certificate for any given data point is obtained relatively cost-effectively is crucial. This work demonstrates that reducing the number of samples by one or two orders of magnitude can still enable the computation of a slightly smaller robustness radius (commonly ~20% radius reduction) with the same confidence. We provide the mathematical foundation for explaining the phenomenon while experimentally showing promising results on the standard CIFAR-10 and ImageNet datasets.
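To make the trade-off between sample count and certified radius concrete, here is a minimal sketch of a point-wise certificate. It assumes the widely used radius formula for Gaussian smoothing, $R = \sigma\,\Phi^{-1}(\underline{p_A})$, and, to stay within the standard library, it substitutes a simple Hoeffding lower bound for the Clopper-Pearson bound that is more common in practice. The function name and parameters are hypothetical, and this is not the estimation procedure proposed in the paper.

```python
import math
from statistics import NormalDist


def certified_radius(n_top, n, sigma, alpha=0.001):
    """Sketch of a randomized-smoothing L2 radius certificate.

    n_top : number of noisy samples voting for the top class
    n     : total number of noisy samples
    sigma : standard deviation of the Gaussian smoothing noise
    alpha : allowed failure probability of the certificate

    Assumption: a Hoeffding bound replaces the usual Clopper-Pearson bound.
    """
    p_hat = n_top / n
    # One-sided Hoeffding lower confidence bound on the top-class probability.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return 0.0  # cannot certify: abstain
    return sigma * NormalDist().inv_cdf(p_lower)


# Same empirical vote share (99%), two different sample budgets.
print(certified_radius(n_top=99_000, n=100_000, sigma=0.5))
print(certified_radius(n_top=990, n=1_000, sigma=0.5))
```

With the same empirical vote share, the smaller budget yields a smaller but still positive radius, because the confidence interval on the top-class probability widens as `n` shrinks.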
[247] arXiv:2404.17372 [pdf, other]: Title: CEM-GMsFEM for Poisson equations in heterogeneous perforated domains

Authors: Wei Xie, Yin Yang, Eric Chung, Yunqing Huang

Subjects: Numerical Analysis (math.NA)

In this paper, we propose a novel multiscale model reduction strategy tailored to address the Poisson equation within heterogeneous perforated domains. The numerical simulation of this intricate problem is impeded by its multiscale characteristics, necessitating an exceptionally fine mesh to adequately capture all relevant details. To overcome the challenges inherent in the multiscale nature of the perforations, we introduce a coarse space constructed using the Constraint Energy Minimizing Generalized Multiscale Finite Element Method (CEM-GMsFEM). This involves constructing basis functions through a sequence of local energy minimization problems over eigenspaces containing localized information pertaining to the heterogeneities. Through our analysis, we demonstrate that the oversampling layers depend on the local eigenvalues, thereby implicating the local geometry as well. Additionally, we provide numerical examples to illustrate the efficacy of the proposed scheme.
[248] arXiv:2404.17379 [pdf, ps, other]: Title: Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning

Authors: Hao Liu, Yi Shen, Wenjing Zhou, Yuelin Zou, Chang Zhou, Shuyao He

Subjects: Robotics (cs.RO)

In order to solve the problem of frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demonstrates significant enhancements in maneuvering capabilities without frequent decelerations. Experiments conducted in simulated environments with varying obstacle densities confirm the effectiveness of the proposed method in achieving more stable and efficient path planning.
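The difference between the two learners named above lies in how the bootstrap target is computed. The sketch below shows the standard DQN and Double DQN targets for a single transition; the function names, the list-based Q-value inputs, and the default discount factor are illustrative assumptions, and the paper's reward function and obstacle-angle logic are not reproduced.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the next action,
    the target network evaluates it, which reduces overestimation."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions in the next state.
q_online = [1.0, 3.0, 2.0]
q_target = [0.5, 1.0, 4.0]
print(dqn_target(1.0, q_target, gamma=0.5))             # uses max of q_target
print(ddqn_target(1.0, q_online, q_target, gamma=0.5))  # uses q_target[argmax q_online]
```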
[249] arXiv:2404.17381 [pdf, other]: Title: Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalous human behaviors. To address this task, we propose a normalizing flow (NF)-based detection framework where the sample likelihood is effectively leveraged to indicate anomalies. As action anomalies often occur in some specific body parts, in addition to the full-body action feature learning, we incorporate extra encoding streams into our framework for a finer modeling of body subsets. Our framework is thus multi-level to jointly discover global and local motion anomalies. Furthermore, to show awareness of the potentially jittery data during recording, we resort to discrete cosine transformation by converting the action samples from the temporal to the frequency domain to mitigate the issue of data instability. Extensive experimental results on two human action datasets demonstrate that our method outperforms the baselines formed by adapting state-of-the-art human activity AD approaches to our task of HAAD.
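As a rough illustration of moving a motion sequence from the temporal to the frequency domain, the following sketch applies an orthonormal DCT-II to a single joint coordinate over time. The function names are hypothetical, and the `low_pass` step that discards high-frequency coefficients is an assumed way of suppressing jitter; the abstract does not say how the framework uses the coefficients.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (temporal -> frequency)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def low_pass(x, keep):
    """Assumed jitter suppression: zero all but the first `keep` coefficients."""
    coeffs = [v if k < keep else 0.0 for k, v in enumerate(dct2(x))]
    return idct2(coeffs)


# Hypothetical trajectory of one joint coordinate with frame-to-frame jitter.
trajectory = [0.0, 0.12, 0.18, 0.33, 0.38, 0.52, 0.58, 0.71]
print(low_pass(trajectory, keep=3))
```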
[250] arXiv:2404.17390 [pdf, other]: Title: How Could AI Support Design Education? A Study Across Fields Fuels Situating Analytics

Authors: Ajit Jain, Andruid Kerne, Hannah Fowler, Jinsil Seo, Galen Newman, Nic Lupfer, Aaron Perrine

Comments: 31 pages, 3 figures, Submitted to ACM

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

We use the process and findings from a case study of design educators' practices of assessment and feedback to fuel theorizing about how to make AI useful in service of human experience. We build on Suchman's theory of situated actions. We perform a qualitative study of 11 educators in 5 fields, who teach design processes situated in project-based learning contexts. Through qualitative data gathering and analysis, we derive codes: design process; assessment and feedback challenges; and computational support.
We twice invoke creative cognition's family resemblance principle. First, to explain how design instructors already use assessment rubrics and second, to explain the analogous role for design creativity analytics: no particular trait is necessary or sufficient; each only tends to indicate good design work. Human teachers remain essential. We develop a set of situated design creativity analytics--Fluency, Flexibility, Visual Consistency, Multiscale Organization, and Legible Contrast--to support instructors' efforts, by providing on-demand, learning objectives-based assessment and feedback to students.
We theorize a methodology, which we call situating analytics, firstly because making AI support living human activity depends on aligning what analytics measure with situated practices. Further, we realize that analytics can become most significant to users by situating them through interfaces that integrate them into the material contexts of their use. Here, this means situating design creativity analytics into actual design environments. Through the case study, we identify situating analytics as a methodology for explaining analytics to users, because the iterative process of alignment with practice has the potential to enable data scientists to derive analytics that make sense as part of and support situated human experiences.
[251] arXiv:2404.17391 [pdf, other]: Title: M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

Authors: Lakmal Meegahapola, Hamza Hassoune, Daniel Gatica-Perez

Comments: Accepted at the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT). Paper will be presented at ACM UbiComp 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Over the years, multimodal mobile sensing has been used extensively for inferences regarding health and well being, behavior, and context. However, a significant challenge hindering the widespread deployment of such models in real world scenarios is the issue of distribution shift. This is the phenomenon where the distribution of data in the training set differs from the distribution of data in the real world, the deployment environment. While extensively explored in computer vision and natural language processing, and while prior research in mobile sensing briefly addresses this concern, current work primarily focuses on models dealing with a single modality of data, such as audio or accelerometer readings, and consequently, there is little research on unsupervised domain adaptation when dealing with multimodal sensor data. To address this gap, we did extensive experiments with domain adversarial neural networks (DANN) showing that they can effectively handle distribution shifts in multimodal sensor data. Moreover, we proposed a novel improvement over DANN, called M3BAT, unsupervised domain adaptation for multimodal mobile sensing with multi-branch adversarial training, to account for the multimodality of sensor data during domain adaptation with multiple branches. Through extensive experiments conducted on two multimodal mobile sensing datasets, three inference tasks, and 14 source-target domain pairs, including both regression and classification, we demonstrate that our approach performs effectively on unseen domains. Compared to directly deploying a model trained in the source domain to the target domain, the model shows performance increases up to 12% AUC (area under the receiver operating characteristics curves) on classification tasks, and up to 0.13 MAE (mean absolute error) on regression tasks.
[252] arXiv:2404.17394 [pdf, other]: Title: Child Speech Recognition in Human-Robot Interaction: Problem Solved?

Authors: Ruben Janssens, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, Tony Belpaeme

Comments: Presented at 2024 International Symposium on Technological Advances in Human-Robot Interaction

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.
[253] arXiv:2404.17395 [pdf, other]: Title: Situational Graphs for Robotic First Responders: an application to dismantling drug labs

Authors: W.J. Meijer, A.C. Kemmeren, J.M. van Bruggen, T. Haije, J.E. Fransman, J.D. van Mil

Comments: IEEE ICRA Workshop on Field Robotics 2024

Subjects: Robotics (cs.RO)

In this work, we support experts in the safety domain with safer dismantling of drug labs, by deploying robots for the initial inspection. Being able to act on the discovered environment is key to enabling this (semi-)autonomous inspection, e.g. to open doors or take a closer look at suspicious items. Our approach addresses this with a novel environmental representation, the Behavior-Oriented Situational Graph, where we extend the classical situational graph by merging a perception-driven backbone with prior actionable knowledge via a situational affordance schema. Linking situations to robot behaviors facilitates both autonomous mission planning and situational understanding of the operator. Planning over the graph is easier and faster, since it directly incorporates actionable information, which is critical for online mission systems. Moreover, the representation allows the human operator to seamlessly transition between different levels of autonomy of the robot, from remote control to behavior execution to full autonomous exploration. We test the effectiveness of our approach in a real-world drug lab scenario at a Dutch police training facility using a mobile Spot robot and use the results to iterate on the system design.
[254] arXiv:2404.17399 [pdf, other]: Title: Evaluations of Machine Learning Privacy Defenses are Misleading

Authors: Michael Aerni, Jie Zhang, Florian Tramèr

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Empirical defenses for machine learning privacy forgo the provable guarantees of differential privacy in the hope of achieving higher utility while resisting realistic adversaries. We identify severe pitfalls in existing empirical privacy evaluations (based on membership inference attacks) that result in misleading conclusions. In particular, we show that prior evaluations fail to characterize the privacy leakage of the most vulnerable samples, use weak attacks, and avoid comparisons with practical differential privacy baselines. In 5 case studies of empirical privacy defenses, we find that prior evaluations underestimate privacy leakage by an order of magnitude. Under our stronger evaluation, none of the empirical defenses we study are competitive with a properly tuned, high-utility DP-SGD baseline (with vacuous provable guarantees).
[255] arXiv:2404.17400 [pdf, other]: Title: Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Authors: Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C.L. Philip Chen

Comments: 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.
[256] arXiv:2404.17401 [pdf, other]: Title: Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations

Authors: Rémy Decoupes, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire, Sarah Valentin

Subjects: Computation and Language (cs.CL)

Language models now constitute essential tools for improving efficiency for many professional tasks such as writing, coding, or learning. For this reason, it is imperative to identify inherent biases. In the field of Natural Language Processing, five sources of bias are well-identified: data, annotation, representation, models, and research design. This study focuses on biases related to geographical knowledge. We explore the connection between geography and language models by highlighting their tendency to misrepresent spatial information, thus leading to distortions in the representation of geographical distances. This study introduces four indicators to assess these distortions, by comparing geographical and semantic distances. Experiments are conducted with these four indicators on ten widely used language models. Results underscore the critical necessity of inspecting and rectifying spatial biases in language models to ensure accurate and equitable representations.
[257] arXiv:2404.17403 [pdf, other]: Title: Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries

Authors: Alexandros Tsakpinis, Alexander Pretschner

Comments: 6 pages, 3 figures, accepted at 28th edition of International Conference on Evaluation and Assessment in Software Engineering (EASE 2024)

Subjects: Software Engineering (cs.SE)

Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. However, they can also present a substantial risk if a vulnerability or attack arises and the community fails to promptly address the issue and release a fix due to inactivity. To be able to monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. Based on these repositories, integrated libraries of an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract assigned repository URLs and direct dependencies, and use the page rank algorithm to comprehensively analyze the ecosystems from a library and dependency chain perspective. For invalid repository URLs, we derive potential reasons. Both ecosystems show varying accessibility to GitHub repository URLs, depending on the page rank score of the analyzed libraries. For individual libraries, up to 73.8% of PyPI and up to 69.4% of NPM libraries have repository URLs. Within dependency chains, up to 80.1% of PyPI libraries and up to 81.1% of NPM libraries have URLs. That means most libraries, especially the ones of increasing importance, can be monitored on GitHub. Among the most common reasons for invalid repository URLs is that no URL is assigned at all, which accounts for up to 17.9% for PyPI and up to 39.6% for NPM. Package maintainers should address this issue and update the repository information to enable monitoring of their libraries.
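To illustrate how a page rank score can order libraries by importance in a dependency graph, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `pagerank`, the damping factor 0.85, the iteration count, and the toy graph are all assumptions made for illustration. Edges point from a library to the libraries it depends on, so heavily depended-upon libraries accumulate rank.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict: library -> list of dependencies.

    Hypothetical helper for illustration; edges run from dependent to dependency.
    """
    nodes = sorted(set(graph) | {d for deps in graph.values() for d in deps})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Libraries without dependencies spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            deps = graph.get(v, [])
            for d in deps:
                # Each library passes an equal share of its rank to each dependency.
                new_rank[d] += damping * rank[v] / len(deps)
        rank = new_rank
    return rank


# Toy dependency graph (made-up names, not real packages from the study).
toy_graph = {
    "app": ["web", "utils"],
    "web": ["utils", "http"],
    "http": ["utils"],
    "utils": [],
}
ranks = pagerank(toy_graph)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

In this toy graph, `utils` is a dependency of every other library and therefore receives the highest score, which is the kind of library whose repository URL matters most for monitoring.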
[258] arXiv:2404.17413 [pdf, ps, other]: Title: Voting with Partial Orders: The Plurality and Anti-Plurality Classes

Authors: Federico Fioravanti, Ulle Endriss

Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

The Plurality rule for linear orders selects the alternatives most frequently appearing in the first position of those orders, while the Anti-Plurality rule selects the alternatives least often occurring in the final position. We explore extensions of these rules to partial orders, offering axiomatic characterizations for these extensions.
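The two rules for linear orders described above can be sketched in a few lines of Python. This covers only the classical linear-order case stated in the abstract, not the paper's extensions to partial orders; the function names and the sample profile are assumptions for illustration.

```python
from collections import Counter


def plurality(profile, alternatives):
    """Alternatives that appear most often in first position."""
    firsts = Counter(order[0] for order in profile)
    best = max(firsts[a] for a in alternatives)
    return {a for a in alternatives if firsts[a] == best}


def anti_plurality(profile, alternatives):
    """Alternatives that appear least often in last position."""
    lasts = Counter(order[-1] for order in profile)
    fewest = min(lasts[a] for a in alternatives)
    return {a for a in alternatives if lasts[a] == fewest}


# Each voter submits a linear order, best alternative first (made-up profile).
alternatives = {"a", "b", "c"}
profile = [
    ("a", "b", "c"),
    ("a", "c", "b"),
    ("b", "c", "a"),
    ("c", "b", "a"),
]
print(sorted(plurality(profile, alternatives)))       # ['a']
print(sorted(anti_plurality(profile, alternatives)))  # ['b', 'c']
```

Here `a` is ranked first twice, so it wins under Plurality, but it is also ranked last twice, so Anti-Plurality selects `b` and `c` instead.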
[259] arXiv:2404.17417 [pdf, other]: Title: How do annotations affect Java code readability?

Authors: Eduardo Guerra, Everaldo Gomes, Jeferson Ferreira, Igor Wiese, Phyllipe Lima, Marco Gerosa, Paulo Meirelles

Comments: Accepted to Empirical Software Engineering (EMSE) Journal

Subjects: Software Engineering (cs.SE)

Context: Code annotations have gained widespread popularity in programming languages, offering developers the ability to attach metadata to code elements to define custom behaviors. Many modern frameworks and APIs use annotations to keep integration less verbose and located nearer to the corresponding code element. Despite these advantages, practitioners' anecdotal evidence suggests that annotations might negatively affect code readability. Objective: To better understand this effect, this paper systematically investigates the relationship between code annotations and code readability. Method: In a survey with software developers (n=332), we present 15 pairs of Java code snippets with and without code annotations. These pairs were designed considering five categories of annotation used in real-world Java frameworks and APIs. Survey participants selected the code snippet they considered more readable for each pair and answered an open question about how annotations affect the code's readability. Results: Preferences were scattered for all categories of annotation usage, revealing no consensus among participants. The answers were spread even when segregated by participants' programming or annotation-related experience. Nevertheless, some participants showed a consistent preference in favor or against annotations across all categories, which may indicate a personal preference. Our qualitative analysis of the open-ended questions revealed that participants often praise annotation impacts on design, maintainability, and productivity but expressed contrasting views on understandability and code clarity. Conclusions: Software developers and API designers can consider our results when deciding whether to use annotations, equipped with the insight that developers express contrasting views of the annotations' impact on code readability.
[260] arXiv:2404.17419 [pdf, other]: Title: Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

Authors: Seungwook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang

Comments: 5 pages including references, 2 figures, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, since images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without the necessity of fine-tuning the pre-trained ImageDream multi-view diffusion model.
[261] arXiv:2404.17421 [pdf, other]: Title: Automata-Theoretic Characterisations of Branching-Time Temporal Logics

Authors: Massimo Benerecetti, Laura Bozzelli, Fabio Mogavero, Adriano Peron

Subjects: Logic in Computer Science (cs.LO)

Characterisation theorems serve as important tools in model theory and can be used to assess and compare the expressive power of temporal languages used for the specification and verification of properties in formal methods. While complete connections have been established for the linear-time case between temporal logics, predicate logics, algebraic models, and automata, the situation in the branching-time case remains considerably more fragmented. In this work, we provide an automata-theoretic characterisation of some important branching-time temporal logics, namely CTL* and ECTL* interpreted on arbitrary-branching trees, by identifying two variants of Hesitant Tree Automata that are proved equivalent to those logics. The characterisations also apply to Monadic Path Logic and the bisimulation-invariant fragment of Monadic Chain Logic, again interpreted over trees. These results widen the characterisation landscape of the branching-time case and solve a forty-year-old open question.
[262] arXiv:2404.17422 [pdf, other]: Title: Sibson's formula for higher order Voronoi diagrams

Authors: Mercè Claverol, Andrea de las Heras-Parrilla, Clemens Huemer, Dolores Lara

Subjects: Computational Geometry (cs.CG)

Let $S$ be a set of $n$ points in general position in $\mathbb{R}^d$. The order-$k$ Voronoi diagram of $S$, $V_k(S)$, is a subdivision of $\mathbb{R}^d$ into cells whose points have the same $k$ nearest points of $S$.
Sibson, in his seminal paper from 1980 (A vector identity for the Dirichlet tessellation), gives a formula to express a point $Q$ of $S$ as a convex combination of other points of $S$ by using ratios of volumes of the intersection of cells of $V_2(S)$ and the cell of $Q$ in $V_1(S)$. The natural neighbour interpolation method is based on Sibson's formula. We generalize his result to express $Q$ as a convex combination of other points of $S$ by using ratios of volumes from Voronoi diagrams of any given order.
[263] arXiv:2404.17427 [pdf, other]: Title: Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection

Authors: Moussa Kassem Sbeyti, Michelle Karg, Christian Wirth, Nadja Klein, Sahin Albayrak

Comments: Accepted with an oral presentation at UAI 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detectors in real-world applications often fail to detect objects due to varying factors such as weather conditions and noisy input. Therefore, a process that mitigates false detections is crucial for both safety and accuracy. While uncertainty-based thresholding shows promise, previous works demonstrate an imperfect correlation between uncertainty and detection errors. This hinders ideal thresholding, prompting us to further investigate the correlation and associated cost with different types of uncertainty. We therefore propose a cost-sensitive framework for object detection tailored to user-defined budgets on the two types of errors, missing and false detections. We derive minimum thresholding requirements to prevent performance degradation and define metrics to assess the applicability of uncertainty for failure recognition. Furthermore, we automate and optimize the thresholding process to maximize the failure recognition rate w.r.t. the specified budget. Evaluation on three autonomous driving datasets demonstrates that our approach significantly enhances safety, particularly in challenging scenarios. Leveraging localization aleatoric uncertainty and softmax-based entropy only, our method boosts the failure recognition rate by 36-60% compared to conventional approaches. Code is available at https://mos-ks.github.io/publications.
[264] arXiv:2404.17428 [pdf, other]: Title: Lower Bounds for the Minimum Spanning Tree Cycle Intersection Problem

Authors: Manuel Dubinsky, Kun-Mao Chao, César Massri, Gabriel Taubin

Comments: arXiv admin note: substantial text overlap with arXiv:2301.07643

Subjects: Discrete Mathematics (cs.DM)

Minimum spanning trees are important tools in the analysis and design of networks. Many practical applications require their computation, ranging from biology and linguistics to economy and telecommunications. The set of cycles of a network has a vector space structure. Given a spanning tree, the set of non-tree edges defines cycles that determine a basis. The intersection of two such cycles is the number of edges they have in common and the intersection number -- denoted $\cap(G)$ -- is the number of non-empty pairwise intersections of the cycles of the basis. The Minimum Spanning Tree Cycle Intersection problem consists in finding a spanning tree such that the intersection number is minimum. This problem is relevant in order to integrate discrete differential forms. In this paper, we present two lower bounds of the intersection number of an arbitrary connected graph $G=(V,E)$. In the first part, we prove the following statement: $$\frac{1}{2}\left(\frac{\nu^2}{n-1} - \nu\right) \leq \cap(G),$$ where $n = |V|$ and $\nu$ is the cyclomatic number of $G$. In the second part, based on some experimental results and a new observation, we conjecture the following improved tight lower bound: $$(n-1) \binom{q}{2} + q\,r \leq \cap(G),$$ where $2 \nu = q (n-1) + r$ is the integer division of $2 \nu$ and $n-1$. This is the first result in a general context, that is for an arbitrary connected graph.
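As a worked example of the two bounds stated above, the following sketch evaluates both expressions for a given graph size. The function names are assumptions for illustration, and the cyclomatic number is computed with the standard identity $\nu = |E| - |V| + 1$ for a connected graph, which the abstract does not spell out.

```python
from fractions import Fraction
from math import comb


def cyclomatic_number(n, m):
    """Cyclomatic number of a connected graph with n vertices and m edges."""
    return m - n + 1


def proven_lower_bound(n, nu):
    """First bound from the abstract: (1/2) * (nu^2 / (n - 1) - nu)."""
    return Fraction(1, 2) * (Fraction(nu * nu, n - 1) - nu)


def conjectured_lower_bound(n, nu):
    """Conjectured bound: (n - 1) * C(q, 2) + q * r, where 2*nu = q*(n - 1) + r."""
    q, r = divmod(2 * nu, n - 1)
    return (n - 1) * comb(q, 2) + q * r


# Example: the complete graph on 5 vertices has 10 edges.
n, m = 5, 10
nu = cyclomatic_number(n, m)
print(nu)                              # 6
print(proven_lower_bound(n, nu))       # 3/2
print(conjectured_lower_bound(n, nu))  # 12
```

For this example, $2\nu = 12 = 3 \cdot 4 + 0$, so $q = 3$ and $r = 0$, and the conjectured bound evaluates to $4 \binom{3}{2} = 12$, well above the proven bound of $3/2$.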
[265] arXiv:2404.17433 [pdf, other]: Title: PromptCIR: Blind Compressed Image Restoration with Prompt Learning

Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often rely on a quality factor prediction network to help their network restore compressed images. However, the predicted numerical quality factor lacks spatial information, preventing network adaptability toward image contents. Recent studies in prompt-learning-based image restoration have showcased the potential of prompts to generalize across varied degradation types and degrees. This motivated us to design a prompt-learning-based compressed image restoration network, dubbed PromptCIR, which can effectively restore images from various compression levels. Specifically, PromptCIR exploits prompts to encode compression information implicitly, where prompts directly interact with soft weights generated from image features, thus providing dynamic content-aware and distortion-aware guidance for the restoration process. The light-weight prompts enable our method to adapt to different compression levels, while introducing minimal parameter overhead. Overall, PromptCIR leverages the powerful transformer-based backbone with the dynamic prompt module to proficiently handle blind CIR tasks, winning first place in the NTIRE 2024 challenge of blind compressed image enhancement track. Extensive experiments have validated the effectiveness of our proposed PromptCIR. The code is available at https://github.com/lbc12345/PromptCIR-NTIRE24.
[266] arXiv:2404.17434 [pdf, other]: Title: Exploring Wireless Channels in Rural Areas: A Comprehensive Measurement Study

Authors: Tianyi Zhang, Guoying Zu, Taimoor Ul Islam, Evan Gossling, Sarath Babu, Daji Qiao, Hongwei Zhang

Subjects: Networking and Internet Architecture (cs.NI)

The study of wireless channel behavior has been an active research topic for many years. However, there exists a noticeable scarcity of studies focusing on wireless channel characteristics in rural areas. With the advancement of smart agriculture practices in rural regions, there has been an increasing demand for affordable, high-capacity, and low-latency wireless networks to support various precision agriculture applications such as plant phenotyping, livestock health monitoring, and agriculture automation. To address this research gap, we conducted a channel measurement study on multiple wireless frequency bands at various crop and livestock farms near Ames, Iowa, based on Iowa State University (ISU)'s ARA Wireless Living Lab, one of the NSF PAWR platforms. We specifically investigate the impact of weather conditions, humidity, temperature, and farm buildings on wireless channel behavior. The resulting measurement dataset, which will soon be made publicly accessible, represents a valuable resource for researchers interested in wireless channel prediction and optimization.
[267] arXiv:2404.17438 [pdf, other]: Title: Real-World Deployment of a Hierarchical Uncertainty-Aware Collaborative Multiagent Planning System

Authors: Martina Stadler Kurtz, Samuel Prentice, Yasmin Veys, Long Quang, Carlos Nieto-Granda, Michael Novitzky, Ethan Stump, Nicholas Roy

Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

We would like to enable a collaborative multiagent team to navigate at long length scales and under uncertainty in real-world environments. In practice, planning complexity scales with the number of agents in the team, with the length scale of the environment, and with environmental uncertainty. Enabling tractable planning requires developing abstract models that can represent complex, high-quality plans. However, such models often abstract away information needed to generate directly-executable plans for real-world agents in real-world environments, as planning in such detail, especially in the presence of real-world uncertainty, would be computationally intractable. In this paper, we describe the deployment of a planning system that used a hierarchy of planners to execute collaborative multiagent navigation tasks in real-world, unknown environments. By developing a planning system that was robust to failures at every level of the planning hierarchy, we enabled the team to complete collaborative navigation tasks, even in the presence of imperfect planning abstractions and real-world uncertainty. We deployed our approach on a Clearpath Husky-Jackal team navigating in a structured outdoor environment, and demonstrated that the system enabled the agents to successfully execute collaborative plans.
[268] arXiv:2404.17439 [pdf, other]: Title: Enhancing QoE in HTTP/3 using EPS Framework

Authors: Abhinav Gupta, Radim Bartos

Subjects: Networking and Internet Architecture (cs.NI)

HTTP/3, the latest evolution of the Hypertext Transfer Protocol, utilizes QUIC, a new transport protocol leveraging UDP to overcome limitations such as connection time and head-of-line blocking prevalent in HTTP/2. This advancement is enhanced by the Extensible Prioritization Scheme (EPS), which introduces a flexible prioritization framework for improving website resource delivery. This paper proposes a mixed scheduling mechanism that combines incremental and non-incremental resource delivery and adheres to EPS urgency levels to improve the QoE. Additionally, we propose an EPS priority mapping to enhance the QoE further. This mapping is based on the priority indicated by the Chromium browser and the resource type. The results of the experimental evaluation indicate that the proposed mechanism and mapping improve commonly used website performance measures for sites featuring a comparatively large number and size of resources.
[269] arXiv:2404.17443 [pdf, other]: Title: "ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses

Authors: Bruno Pereira Cipriano, Pedro Alves

Comments: Author's version: this is a paper under revision

Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Large Language Models (LLMs) like GPT and Bard are capable of producing code based on textual descriptions, with remarkable efficacy. Such technology will have profound implications for computing education, raising concerns about cheating, excessive dependence, and a decline in computational thinking skills, among others. There has been extensive research on how teachers should handle this challenge, but it is also important to understand how students feel about this paradigm shift. In this research, 52 first-year CS students were surveyed in order to assess their views on technologies with code-generation capabilities, both from academic and professional perspectives. Our findings indicate that while students generally favor the academic use of GPT, they don't over-rely on it, only mildly asking for its help. Although most students benefit from GPT, some struggle to use it effectively, highlighting the need for specific GPT training. Opinions on GPT's impact on their professional lives vary, but there is a consensus on its importance in academic practice.
[270] arXiv:2404.17451 [pdf, other]: Title: Any-Quantile Probabilistic Forecasting of Short-Term Electricity Demand

Authors: Slawek Smyl, Boris N. Oreshkin, Paweł Pełka, Grzegorz Dudek

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Power systems operate under uncertainty originating from multiple factors that are impossible to account for deterministically. Distributional forecasting is used to control and mitigate risks associated with this uncertainty. Recent progress in deep learning has helped to significantly improve the accuracy of point forecasts, while accurate distributional forecasting still presents a significant challenge. In this paper, we propose a novel general approach for distributional forecasting capable of predicting arbitrary quantiles. We show that our general approach can be seamlessly applied to two distinct neural architectures leading to the state-of-the-art distributional forecasting results in the context of short-term electricity demand forecasting task. We empirically validate our method on 35 hourly electricity demand time-series for European countries. Our code is available here: https://github.com/boreshkinai/any-quantile.
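For readers unfamiliar with quantile forecasts, the sketch below shows the pinball (quantile) loss, a standard way to score a predicted quantile against an observed value. The abstract does not say which loss or metric the authors use, so this is an assumption for illustration only; the function name and the demand figures are made up.

```python
def pinball_loss(y_true, q_pred, tau):
    """Pinball loss of predicting q_pred as the tau-quantile of y_true.

    Under-prediction is penalised with weight tau, over-prediction with 1 - tau.
    """
    diff = y_true - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


# Scoring a 0.9-quantile forecast of hourly demand (illustrative numbers).
observed = 10.0
print(pinball_loss(observed, 8.0, 0.9))   # forecast too low: heavier penalty
print(pinball_loss(observed, 12.0, 0.9))  # forecast too high: lighter penalty
```

For a high quantile such as 0.9, missing below the observation costs more than missing above it by the same amount, which is what pushes a model trained on this loss toward the requested quantile.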
[271] arXiv:2404.17452 [pdf, other]: Title: A Continuous Relaxation for Discrete Bayesian Optimization

Authors: Richard Michael, Simon Bartels, Miguel González-Duque, Yevgen Zainchkovskyy, Jes Frellsen, Søren Hauberg, Wouter Boomsma

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Optimizing efficiently over discrete data with only a few available target observations is a challenge in Bayesian optimization. We propose a continuous relaxation of the objective function and show that inference and optimization can be computationally tractable. We consider in particular the optimization domain where very few observations and strict budgets exist, motivated by optimizing protein sequences for expensive-to-evaluate biochemical properties. The advantages of our approach are two-fold: the problem is treated in the continuous setting, and available prior knowledge over sequences can be incorporated directly. More specifically, we utilize available and learned distributions over the problem domain for a weighting of the Hellinger distance which yields a covariance function. We show that the resulting acquisition function can be optimized with either continuous or discrete optimization algorithms and empirically assess our method on two biochemical sequence optimization tasks.
[272] arXiv:2404.17454 [pdf, other]: Title: Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

Authors: Kaichen Xu, Yueyang Ding, Suyang Hou, Weiqiang Zhan, Nisang Chen, Jun Wang, Xiaobo Sun

Comments: 17 pages, 2 figures. Accepted by IJCAI 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Fine-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these methods fall short of distinguishing anomalous cells into pathologically distinct subtypes. In response, we propose ACSleuth, a novel, reconstruction deviation-guided generative framework that integrates the detection, domain adaptation, and fine-grained annotating of anomalous cells into a methodologically cohesive workflow. Notably, we present the first theoretical analysis of using reconstruction deviations output by generative models for anomaly detection in lieu of domain shifts. This analysis informs us to develop a novel and superior maximum mean discrepancy-based anomaly scorer in ACSleuth. Extensive benchmarks over various single-cell data and other types of tabular data demonstrate ACSleuth's superiority over the state-of-the-art methods in identifying and subtyping anomalies in multi-sample and multi-domain contexts. Our code is available at https://github.com/Catchxu/ACsleuth.
[273] arXiv:2404.17456 [pdf, other]: Title: Converting High-Performance and Low-Latency SNNs through Explicit Modelling of Residual Error in ANNs

Authors: Zhipeng Huang, Jianhao Ding, Zhiyu Pan, Haoran Li, Ying Fang, Zhaofei Yu, Jian K. Liu

Subjects: Neural and Evolutionary Computing (cs.NE)

Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is the ANN-SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability of SNNs. However, under extreme low-latency conditions, the existing conversion theory suggests that the problem of misrepresentation of residual membrane potentials in SNNs, i.e., the inability of IF neurons with a reset-by-subtraction mechanism to respond to residual membrane potentials beyond the range from resting potential to threshold, leads to a performance gap in the converted SNNs compared to the original ANNs. This severely limits the possibility of practical application of SNNs on delay-sensitive edge devices. Existing conversion methods addressing this problem usually involve modifying the state of the conversion spiking neurons. However, these methods do not consider their adaptability and compatibility with neuromorphic chips. We propose a new approach based on explicit modeling of residual errors as additive noise. The noise is incorporated into the activation function of the source ANN, which effectively reduces the residual error. Our experiments on the CIFAR10/100 datasets verify that our approach exceeds the prevailing ANN-SNN conversion methods and directly trained SNNs in terms of accuracy and the required time steps. Overall, our method provides new ideas for improving SNN performance under ultra-low-latency conditions and is expected to promote the further development of practical neuromorphic hardware applications.
[274] arXiv:2404.17460 [pdf, other]: Title: Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System

Authors: Robin Schmucker, Meng Xia, Amos Azaria, Tom Mitchell

Comments: arXiv admin note: substantial text overlap with arXiv:2310.01420

Subjects: Computation and Language (cs.CL)

Conversational tutoring systems (CTSs) offer learning experiences through interactions based on natural language. They are recognized for promoting cognitive engagement and improving learning outcomes, especially in reasoning tasks. Nonetheless, the cost associated with authoring CTS content is a major obstacle to widespread adoption and to research on effective instructional design. In this paper, we discuss and evaluate a novel type of CTS that leverages recent advances in large language models (LLMs) in two ways: First, the system enables AI-assisted content authoring by inducing an easily editable tutoring script automatically from a lesson text. Second, the system automates the script orchestration in a learning-by-teaching format via two LLM-based agents (Ruffle&Riley) acting as a student and a professor. The system allows for free-form conversations that follow the ITS-typical inner and outer loop structure. We evaluate Ruffle&Riley's ability to support biology lessons in two between-subject online user studies (N = 200) comparing the system to simpler QA chatbots and reading activity. Analyzing system usage patterns, pre/post-test scores and user experience surveys, we find that Ruffle&Riley users report high levels of engagement, understanding and perceive the offered support as helpful. Even though Ruffle&Riley users require more time to complete the activity, we did not find significant differences in short-term learning gains over the reading activity. Our system architecture and user study provide various insights for designers of future CTSs. We further open-source our system to support ongoing research on effective instructional design of LLM-based learning technologies.
[275] arXiv:2404.17461 [pdf, other]: Title: Multi-layer random features and the approximation power of neural networks

Authors: Rustem Takhanov

Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A neural architecture with randomly initialized weights, in the infinite width limit, is equivalent to a Gaussian Random Field whose covariance function is the so-called Neural Network Gaussian Process kernel (NNGP). We prove that a reproducing kernel Hilbert space (RKHS) defined by the NNGP contains only functions that can be approximated by the architecture. To achieve a certain approximation error the required number of neurons in each layer is defined by the RKHS norm of the target function. Moreover, the approximation can be constructed from a supervised dataset by a random multi-layer representation of an input vector, together with training of the last layer's weights.
For a 2-layer NN and a domain equal to an $(n-1)$-dimensional sphere in ${\mathbb R}^n$, we compare the number of neurons required by Barron's theorem and by the multi-layer features construction. We show that if eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-\frac{2}{3}}$, where $k$ is the order of an eigenvalue, then our theorem guarantees a more succinct neural network approximation than Barron's theorem. We also run computational experiments to verify our theoretical findings. Our experiments show that realistic neural networks easily learn target functions even when neither theorem gives any guarantee.
[276] arXiv:2404.17462 [pdf, other]: Title: Integrated Sensing and Communication Channel Modeling: A Survey

Authors: Zhiqing Wei, Jinzhu Jia, Yangyang Niu, Lin Wang, Huici Wu, Heng Yang, Zhiyong Feng

Subjects: Networking and Internet Architecture (cs.NI)

Integrated sensing and communication (ISAC) is expected to play a crucial role in the sixth-generation (6G) mobile communication systems, offering potential applications in the scenarios of intelligent transportation, smart factories, etc. The performance of radar sensing in ISAC systems is closely related to the characteristics of radar sensing and communication channels. Therefore, ISAC channel modeling serves as a fundamental cornerstone for evaluating and optimizing ISAC systems. This article provides a comprehensive survey on the ISAC channel modeling methods. Furthermore, the methods of target radar cross section (RCS) modeling and clutter RCS modeling are summarized. Finally, we discuss the future research trends related to ISAC channel modeling in various scenarios.
[277] arXiv:2404.17465 [pdf, ps, other]: Title: Fast Abstracts and Student Forum Proceedings -- EDCC 2024 -- 19th European Dependable Computing Conference

Authors: Simona Bernardi, Tommaso Zoppi

Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Robotics (cs.RO)

The goal of the Fast Abstracts track is to bring together researchers and practitioners working on dependable computing to discuss work in progress or opinion pieces. Contributions are welcome from academia and industry. Fast Abstracts aim to serve as a rapid and flexible mechanism to: (i) Report on current work that may or may not be complete; (ii) Introduce new ideas to the community; (iii) State positions on controversial issues or open problems; (iv) Share lessons learnt from real-world dependability engineering; and (v) Debunk or question results from other papers based on contra-indications. The Student Forum aims at creating a vibrant and friendly environment where students can present and discuss their work, and exchange ideas and experiences with other students, researchers and industry. One of the key goals of the Forum is to provide students with feedback on their preliminary results that might help with their future research directions.
[278] arXiv:2404.17471 [pdf, other]: Title: Multicontinuum homogenization in perforated domains

Authors: Wei Xie, Yalchin Efendiev, Yunqing Huang, Wing Tat Leung, Yin Yang

Subjects: Numerical Analysis (math.NA)

In this paper, we develop a general framework for multicontinuum homogenization in perforated domains. The simulations of problems in perforated domains are expensive and, in many applications, coarse-grid macroscopic models are developed. Previous approaches include homogenization, multiscale finite element methods, and so on. In our paper, we design multicontinuum homogenization based on our recently proposed framework. In this setting, we distinguish different spatial regions in perforations based on their sizes. For example, very thin perforations are considered as one continuum, while larger perforations are considered as another continuum. By differentiating perforations in this way, we are able to predict flows in each of them more accurately. We present a framework by formulating cell problems for each continuum using appropriate constraints for the solution averages and their gradients. These cell problem solutions are used in a multiscale expansion and in deriving novel macroscopic systems for multicontinuum homogenization. Our proposed approaches are designed for problems without scale separation. We present numerical results for two-continuum problems and demonstrate the accuracy of the proposed methods.
[279] arXiv:2404.17472 [pdf, other]: Title: MIMO in network simulators: Design, implementation and evaluation of single-user MIMO in ns-3 5G-LENA

Authors: Biljana Bojovic, Sandra Lagen

Comments: 43 pages, 8 figures (with subfigures)

Subjects: Networking and Internet Architecture (cs.NI)

MIMO technology has been studied in textbooks for several decades, and it has been adopted in 4G and 5G systems. Due to the recent evolution in 5G and beyond networks, designed to cover a wide range of use cases with increasingly complex applications, it is essential to have network simulation tools (such as ns-3) to evaluate MIMO performance from the network perspective, before real implementation. To date, the well-known ns-3 simulator has lacked single-user MIMO (SU-MIMO) models for 5G. In this paper, we detail the implementation models and provide an exhaustive evaluation of SU-MIMO in the 5G-LENA module of ns-3. As per 3GPP 5G, we adopt a hybrid beamforming architecture and a closed-loop MIMO mechanism and follow all 3GPP specifications for MIMO implementation, including channel state information feedback with precoding matrix indicator and rank indicator reports, and codebook-based precoding following Precoding Type-I (used for SU-MIMO). The simulation models are released as open source and currently support up to 32 antenna ports and 4 streams per user. The simulation results presented in this paper help in testing and verifying the simulated models for different multi-antenna array and antenna port configurations.
[280] arXiv:2404.17473 [pdf, other]: Title: Consistent Second Moment Methods with Scalable Linear Solvers for Radiation Transport

Authors: Samuel Olivier, Ben S. Southworth, James S. Warsa, HyeongKae Park

Subjects: Numerical Analysis (math.NA)

Second Moment Methods (SMMs) are developed that are consistent with the Discontinuous Galerkin (DG) spatial discretization of the discrete ordinates (or $S_N$) transport equations. The low-order (LO) diffusion system of equations is discretized with fully consistent $P_1$, Local Discontinuous Galerkin (LDG), and Interior Penalty (IP) methods. A discrete residual approach is used to derive SMM correction terms that make each of the LO systems consistent with the high-order (HO) discretization. We show that the consistent methods are more accurate and have better solution quality than independently discretized LO systems, that they preserve the diffusion limit, and that the LDG and IP consistent SMMs can be scalably solved in parallel on a challenging, multi-material benchmark problem.
[281] arXiv:2404.17474 [pdf, other]: Title: Establishing best practices for modeling long duration energy storage in deeply decarbonized energy systems

Authors: Gabriel Mantegna, Wilson Ricks, Aneesha Manocha, Neha Patankar, Dharik Mallapragada, Jesse Jenkins

Comments: Working paper

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Long duration energy storage (LDES) may become a critical technology for the decarbonization of the power sector, as current commercially available Li-ion battery storage technologies cannot cost-effectively shift energy to address multi-day or seasonal variability in demand and renewable energy availability. LDES is difficult to model in existing energy system planning models (such as electricity system capacity expansion models), as it is much more dependent on an accurate representation of chronology than other resources. Techniques exist for modeling LDES in these planning models; however, it is not known how spatial and temporal resolution affect the performance of these techniques, creating a research gap. In this study we examine what spatial and temporal resolution is necessary to accurately capture the full value of LDES, in the context of a continent-scale capacity expansion model. We use the results to draw conclusions and present best practices for modelers seeking to accurately model LDES in a macro-energy systems planning context. Our key findings are: 1) modeling LDES with linked representative periods is crucial to capturing its full value, 2) LDES value is highly sensitive to the cost and availability of other resources, and 3) temporal resolution is more important than spatial resolution for capturing the full value of LDES, although how much temporal resolution is needed will depend on the specific model context.
[282] arXiv:2404.17475 [pdf, other]: Title: CEval: A Benchmark for Evaluating Counterfactual Text Generation

Authors: Van Bach Nguyen, Jörg Schlötterer, Christin Seifert

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.
[283] arXiv:2404.17477 [pdf, other]: Title: A multi-agent model of hierarchical decision dynamics

Authors: Paul Kinsler

Comments: 7 pages, 6 figures

Subjects: Multiagent Systems (cs.MA)

Decision making can be difficult when there are many actors (or agents) who may be coordinating or competing to achieve their various ideas of the optimum outcome. Here I present a simple decision making model with an explicitly hierarchical binary-tree structure, and evaluate how this might cooperate to take actions that match its various evaluations of the uncertain state of the world. Key features of agent behaviour are (a) the separation of its decision making process into three distinct steps: observation, judgement, and action; and (b) the evolution of coordination by the sharing of judgements.
[284] arXiv:2404.17479 [pdf, other]: Title: Scalable Adaptive Traffic Light Control Over a Traffic Network Including Turns, Transit Delays, and Blocking

Authors: Yingqing Chen, Christos G. Cassandras

Comments: arXiv admin note: substantial text overlap with arXiv:2305.09024

Subjects: Systems and Control (eess.SY)

We develop adaptive data-driven traffic light controllers for a grid-like traffic network considering straight, left-turn, and right-turn traffic flows. The analysis incorporates transit delays and blocking effects on vehicle movements between neighboring intersections. Using a stochastic hybrid system model with parametric traffic light controllers, we use Infinitesimal Perturbation Analysis (IPA) to derive a data-driven cost gradient estimator with respect to controllable parameters. We then iteratively adjust them through an online gradient-based algorithm to improve performance metrics. By integrating a flexible modeling framework to represent diverse intersection and traffic network configurations with event-driven IPA-based adaptive controllers, we develop a general scalable, adaptive framework for real-time traffic light control in multi-intersection traffic networks.
[285] arXiv:2404.17481 [pdf, other]: Title: ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Authors: Tyler Loakman, Chenghua Lin

Comments: Accepted to HumEval at LREC-Coling 2024

Subjects: Computation and Language (cs.CL)

This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criterion of Coverage. The results of our reproduction and reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.
[286] arXiv:2404.17484 [pdf, other]: Title: Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

Comments: 19 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-processing. To obtain a high-resolution B-scan with a precise flow map, densely sampled A-scans are required in current methods, causing both computational and storage burdens. To address this issue, in this paper we propose a novel sparse reconstruction framework with four main sequential steps: 1) early magnitude-phase fusion that encourages rich interaction of the complementary information in magnitude and phase, 2) State Space Model (SSM)-based representation learning, inspired by recent successes in Mamba and VMamba, to naturally capture both the intra-A-scan sequential information and between-A-scan interactions, 3) an Inception-based Feedforward Network module (IncFFN) to further boost the SSM-module, and 4) a B-line Pixel Shuffle (BPS) layer to effectively reconstruct the final results. In the experiments on real-world animal data, our method shows clear effectiveness in reconstruction accuracy. As the first application of SSM for image reconstruction tasks, we expect our work to inspire related explorations in not only efficient ODT imaging techniques but also generic image enhancement.
[287] arXiv:2404.17485 [pdf, other]: Title: A Survey on Industrial Internet of Things (IIoT) Testbeds for Connectivity Research

Authors: Tianyu Zhang, Chuanyu Xue, Jiachen Wang, Zelin Yun, Natong Lin, Song Han

Subjects: Networking and Internet Architecture (cs.NI)

Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) by providing controlled environments for technology evaluation before real-world deployment. In this article, we conduct a comprehensive literature review on existing IIoT testbeds, aiming to identify benchmark performance and research gaps and to explore emerging trends in IIoT systems. We first review the state-of-the-art resource management solutions proposed for IIoT applications. We then categorize the reviewed testbeds according to their deployed communication protocols (including TSN, IEEE 802.15.4, IEEE 802.11 and 5G) and discuss the design and usage of each testbed. Driven by the knowledge gained during this study, we present suggestions and good practices for researchers and practitioners who are planning to design and develop IIoT testbeds for connectivity research.
[288] arXiv:2404.17486 [pdf, other]: Title: TextGaze: Gaze-Controllable Face Generation with Natural Language

Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generating face images with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting their application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images. Our work first introduces a text-of-gaze dataset containing over 90k text descriptions spanning a dense distribution of gaze and head poses. We further propose a gaze-controllable text-to-face method. Our method contains a sketch-conditioned face diffusion module and a model-based sketch diffusion module. We define a face sketch based on facial landmarks and an eye segmentation map. The face diffusion module generates face images from the face sketch, and the sketch diffusion module employs a 3D face model to generate face sketches from text descriptions. Experiments on the FFHQ dataset show the effectiveness of our method. We will release our dataset and code for future research.
[289] arXiv:2404.17487 [pdf, other]: Title: Conformal Prediction with Learned Features

Authors: Shayan Kiyani, George Pappas, Hamed Hassani

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In this paper, we focus on the problem of conformal prediction with conditional guarantees. Prior work has shown that it is impossible to construct nontrivial prediction sets with full conditional coverage guarantees. A wealth of research has considered relaxations of full conditional guarantees, relying on some predefined uncertainty structures. Departing from this line of thinking, we propose Partition Learning Conformal Prediction (PLCP), a framework to improve conditional validity of prediction sets through learning uncertainty-guided features from the calibration data. We implement PLCP efficiently with alternating gradient descent, utilizing off-the-shelf machine learning models. We further analyze PLCP theoretically and provide conditional guarantees for infinite and finite sample sizes. Finally, our experimental results over four real-world and synthetic datasets show the superior performance of PLCP compared to state-of-the-art methods in terms of coverage and length in both classification and regression scenarios.
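For orientation, the following is a minimal sketch of standard split conformal prediction for regression, the baseline setting that conditional-coverage methods build on. It is not the PLCP algorithm from the abstract. The function names, the use of absolute residuals as conformity scores, and the symmetric interval are illustrative assumptions.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    `scores` are conformity scores on held-out calibration data, for example
    absolute residuals |y - f(x)| (an assumption of this sketch). If the
    calibration set is too small for the requested level, the threshold is
    infinite and the prediction set is trivial.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction using threshold q."""
    return (point_prediction - q, point_prediction + q)
```

With exchangeable data this construction gives marginal coverage of at least 1 - alpha; it says nothing about coverage conditional on the input, which is the gap the abstract addresses.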
[290] arXiv:2404.17488 [pdf, other]: Title: Low Cost Machine Vision for Insect Classification

Authors: Danja Brandt, Martin Tschaikner, Teodor Chiaburu, Henning Schmidt, Ilona Schrimpf, Alexandra Stadel, Ingeborg E. Beckers, Frank Haußer

Journal-ref: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 824. Springer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Preserving the number and diversity of insects is one of our society's most important goals in the area of environmental sustainability. A prerequisite for this is a systematic and up-scaled monitoring in order to detect correlations and identify countermeasures. Therefore, automated monitoring using live traps is important, but so far there is no system that provides image data with sufficiently detailed information for entomological classification.
In this work, we present an imaging method as part of a multisensor system developed as a low-cost, scalable, open-source system that is adaptable to classical trap types. The image quality meets the requirements needed for classification in the taxonomic tree. Therefore, illumination and resolution have been optimized and motion artefacts have been suppressed. The system is evaluated exemplarily on a dataset consisting of 16 insect species of the same as well as different genus, family and order. We demonstrate that standard CNN architectures like ResNet50 (pretrained on iNaturalist data) or MobileNet perform very well for the prediction task after re-training. Smaller custom-made CNNs also lead to promising results. Classification accuracy of $>96\%$ has been achieved. Moreover, it was shown that image cropping of insects is necessary for classification of species with high inter-class similarity.
[291] arXiv:2404.17489 [pdf, other]: Title: Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

Authors: Wei Cui, Rasa Hosseinzadeh, Junwei Ma, Tongzi Wu, Yi Sui, Keyvan Golestan

Comments: 14 pages, 4 algorithms, 3 figures, 5 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Contrastive learning is a model pre-training technique that first creates similar views of the original data, and then encourages the data and its corresponding views to be close in the embedding space. Contrastive learning has witnessed success in image and natural language data, thanks to the domain-specific augmentation techniques that are both intuitive and effective. Nonetheless, in the tabular domain, the predominant augmentation technique for creating views is corrupting tabular entries via swapping values, which is not as sound or effective. We propose a simple yet powerful improvement to this augmentation technique: corrupting tabular data conditioned on class identity. Specifically, when corrupting a specific tabular entry from an anchor row, instead of randomly sampling a value in the same feature column from the entire table uniformly, we only sample from rows that are identified to be within the same class as the anchor row. We assume the semi-supervised learning setting, and adopt the pseudo labeling technique for obtaining class identities over all table rows. We also explore the novel idea of selecting features to be corrupted based on feature correlation structures. Extensive experiments show that the proposed approach consistently outperforms the conventional corruption method for tabular data classification tasks. Our code is available at https://github.com/willtop/Tabular-Class-Conditioned-SSL.
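The class-conditioned corruption step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' released code: the function name, the list-of-rows table layout, and the choice to let the anchor itself be a possible donor are all assumptions.

```python
import random


def class_conditioned_corrupt(table, pseudo_labels, anchor_idx, corrupt_cols, rng):
    """Create a corrupted view of one row by swapping in same-class values.

    table         : list of rows, each a list of feature values
    pseudo_labels : one (pseudo) class label per row
    anchor_idx    : index of the anchor row to corrupt
    corrupt_cols  : column indices chosen for corruption
    rng           : a random.Random instance, passed in for reproducibility
    """
    view = list(table[anchor_idx])  # copy so the original table is untouched
    anchor_class = pseudo_labels[anchor_idx]
    # Donor pool: only rows whose label matches the anchor's label.
    donors = [i for i, y in enumerate(pseudo_labels) if y == anchor_class]
    for col in corrupt_cols:
        donor = rng.choice(donors)
        view[col] = table[donor][col]
    return view
```

The conventional corruption baseline would differ in a single line: the donor pool would be every row index rather than only rows sharing the anchor's label.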
[292] arXiv:2404.17492 [pdf, other]: Title: Regular Expressions with Backreferences and Lookaheads Capture NLOG

Authors: Yuya Uezato

Comments: Author's version of a paper accepted at ICALP 2024

Subjects: Formal Languages and Automata Theory (cs.FL); Computational Complexity (cs.CC)

Backreferences and lookaheads are vital features to make classical regular expressions (REGEX) practical. Although these features have been widely used, understanding of the unrestricted combination of them has been limited. Practically, most likely no implementation fully supports them. Theoretically, while some studies have addressed these features separately, few have dared to combine them. In those few studies, it has been made clear that the amalgamation of these features renders REGEX significantly expressive. However, no acceptable expressivity bound for REWBLk (REGEX with backreferences and lookaheads) has been established.
We elucidate this by establishing that REWBLk coincides with NLOG, the class of languages accepted by log-space nondeterministic Turing machines (NTMs). In translating REWBLk to log-space NTMs, negative lookaheads are the most challenging part since it essentially requires complementing log-space NTMs in nondeterministic log-space. To address this problem, we revisit the Immerman–Szelepcsényi theorem. In addition, we employ log-space nested-oracles NTMs to naturally handle nested lookaheads of REWBLk. Utilizing such oracle machines, we also present the new result that the membership problem of REWBLk is PSPACE-complete.
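As a concrete reminder of what the two features do, here is a small sketch using Python's `re` module, which supports backreferences and lookaheads in its own backtracking dialect. The patterns and the `accepts` helper are illustrative assumptions; they show the features in isolation and are unrelated to the paper's constructions or formal semantics.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured, so this
# pattern accepts strings of the form a^n b a^n, a non-regular language.
BALANCED = re.compile(r"(a*)b\1")

# Positive lookaheads: each (?=...) tests a condition without consuming
# input. This accepts strings over {a, b} containing both "ab" and "ba".
BOTH = re.compile(r"(?=.*ab)(?=.*ba)[ab]*")

# Negative lookahead: (?!bb) succeeds only where "bb" does NOT start, so
# this accepts strings over {a, b} with no two consecutive b's.
NO_BB = re.compile(r"(?:(?!bb)[ab])*")


def accepts(pattern, text):
    """Return True if the whole string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

For example, `accepts(BALANCED, "aabaa")` holds while `accepts(BALANCED, "aaba")` does not, because the backreference forces the two runs of `a` to have equal length.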
[293] arXiv:2404.17493 [pdf, other]: Title: Causally Abstracted Multi-armed Bandits

Authors: Fabio Massimo Zennaro, Nicholas Bishop, Joel Dyer, Yorgos Felekis, Anisoara Calinescu, Michael Wooldridge, Theodoros Damoulas

Comments: 8 pages, 3 figures (main article); 20 pages, 10 figures (appendix); 40th Conference on Uncertainty in Artificial Intelligence (UAI)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. The majority of prior work typically studies and solves individual MAB and CMAB in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.
[294] arXiv:2404.17497 [pdf, ps, other]: Title: Merchants of Vulnerabilities: How Bug Bounty Programs Benefit Software Vendors

Authors: Esther Gal-Or, Muhammad Zia Hydari, Rahul Telang

Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); General Economics (econ.GN)

Software vulnerabilities enable exploitation by malicious hackers, compromising systems and data security. This paper examines bug bounty programs (BBPs) that incentivize ethical hackers to discover and responsibly disclose vulnerabilities to software vendors. Using game-theoretic models, we capture the strategic interactions between software vendors, ethical hackers, and malicious hackers. First, our analysis shows that software vendors can increase expected profits by participating in BBPs, explaining their growing adoption and the success of BBP platforms. Second, we find that vendors with BBPs will release software earlier, albeit with more potential vulnerabilities, as BBPs enable coordinated vulnerability disclosure and mitigation. Third, the optimal number of ethical hackers to invite to a BBP depends solely on the expected number of malicious hackers seeking exploitation. This optimal number of ethical hackers is lower than but increases with the expected malicious hacker count. Finally, higher bounties incentivize ethical hackers to exert more effort, thereby increasing the probability that they will discover severe vulnerabilities first while reducing the success probability of malicious hackers. These findings highlight BBPs' potential benefits for vendors beyond profitability. Earlier software releases are enabled by managing risks through coordinated disclosure. As cybersecurity threats evolve, BBP adoption will likely gain momentum, providing vendors with a valuable tool for enhancing security posture and stakeholder trust. Moreover, BBPs envelop vulnerability identification and disclosure into new market relationships and transactions, impacting software vendors' incentives regarding product security choices like release timing.
[295] arXiv:2404.17498 [pdf, other]: Title: Learning text-to-video retrieval from image captioning

Authors: Lucas Ventura, Cordelia Schmid, Gül Varol

Comments: A short version of this work appeared at CVPR 2023 Workshops. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We describe a protocol to study text-to-video retrieval training with unlabeled videos, where we assume (i) no access to labels for any videos, i.e., no access to the set of ground-truth captions, but (ii) access to labeled images in the form of text. Using image expert models is a realistic scenario given that annotating images is cheaper and therefore more scalable, in contrast to expensive video labeling schemes. Recently, zero-shot image experts such as CLIP have established a new strong baseline for video understanding tasks. In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos. We show that automatically labeling video frames with image captioning allows text-to-video retrieval training. This process adapts the features to the target domain at no manual annotation cost, consequently outperforming the strong zero-shot CLIP baseline. During training, we sample captions from multiple video frames that best match the visual content, and perform a temporal pooling over frame representations by scoring frames according to their relevance to each caption. We conduct extensive ablations to provide insights and demonstrate the effectiveness of this simple framework by outperforming the CLIP zero-shot baselines on text-to-video retrieval on three standard datasets, namely ActivityNet, MSR-VTT, and MSVD.
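One plausible way to realize the caption-conditioned temporal pooling mentioned above is a softmax-weighted average of frame embeddings, with each frame scored by its dot product with the caption embedding. The sketch below assumes exactly that; the function name, the dot-product score, and the `temperature` parameter are hypothetical and may differ from the authors' formulation.

```python
import math


def relevance_weighted_pool(frame_embs, caption_emb, temperature=1.0):
    """Pool frame embeddings with weights given by caption relevance.

    frame_embs  : list of frame embedding vectors (lists of floats)
    caption_emb : caption embedding vector of the same dimension
    Returns (pooled_vector, weights), where weights sum to 1.
    """
    # Relevance score of each frame: dot product with the caption embedding.
    scores = [
        sum(f * c for f, c in zip(frame, caption_emb)) / temperature
        for frame in frame_embs
    ]
    # Numerically stable softmax over frames.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the frame embeddings, one dimension at a time.
    dim = len(caption_emb)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_embs))
        for d in range(dim)
    ]
    return pooled, weights
```

A lower temperature concentrates the weight on the single most relevant frame, while a very high temperature approaches plain mean pooling.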
[296] arXiv:2404.17502 [pdf, other]: Title: Internal Pattern Matching in Small Space and Applications

Authors: Gabriel Bathie, Panagiotis Charalampopoulos, Tatiana Starikovskaya

Comments: To be published in CPM 2024

Subjects: Data Structures and Algorithms (cs.DS)

In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string $S$ of length $n$ that allows one to answer the following type of queries: Compute the occurrences of a fragment $P$ of $S$ inside another fragment $T$ of $S$, provided that $|T| < 2|P|$. For any $\tau \in [1 .. n/\log^2 n]$, we present a nearly-optimal $\tilde{O}(n/\tau)$-size data structure that can be built in $\tilde{O}(n)$ time using $\tilde{O}(n/\tau)$ extra space, and answers IPM queries in $O(\tau+\log n \log^3 \log n)$ time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed with regards to the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.
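To make the query semantics concrete, here is a naive reference implementation of an IPM query. It only defines what is being asked: it runs in time proportional to $|P| \cdot |T|$ per query and uses none of the paper's techniques. The function name and the half-open index convention are assumptions of this sketch.

```python
def ipm_query_naive(s, p_start, p_end, t_start, t_end):
    """Return start positions (in s) of occurrences of P inside T.

    P = s[p_start:p_end] and T = s[t_start:t_end] are fragments of s,
    given as half-open ranges. The IPM problem assumes |T| < 2|P|.
    """
    pattern = s[p_start:p_end]
    m = len(pattern)
    if t_end - t_start >= 2 * m:
        raise ValueError("IPM queries require |T| < 2|P|")
    # Try every alignment of P that fits entirely inside T.
    return [
        i for i in range(t_start, t_end - m + 1)
        if s[i:i + m] == pattern
    ]
```

For instance, with `s = "abababab"`, `P = s[0:4]` and `T = s[0:7]`, the occurrences start at positions 0 and 2. Under the length restriction the starting positions always form an arithmetic progression, a standard consequence of periodicity, which is why an answer can be reported compactly.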
[297] arXiv:2404.17503 [pdf, ps, other]: Title: Inhomogeneous illuminated image enhancement under extremely low visibility condition

Authors: Libang Chen, Yikun Liu, Jianying Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)

Imaging through fog significantly impacts fields such as object detection and recognition. In conditions of extremely low visibility, essential image information can be obscured, rendering standard extraction methods ineffective. Traditional digital processing techniques, such as histogram stretching, aim to mitigate fog effects by enhancing object light contrast diminished by atmospheric scattering. However, these methods often experience reduced effectiveness under inhomogeneous illumination. This paper introduces a novel approach that adaptively filters background illumination under extremely low visibility and preserves only the essential signal information. Additionally, we employ a visual optimization strategy based on image gradients to eliminate grayscale banding. Finally, the image is transformed to achieve high contrast and maintain fidelity to the original information through maximum histogram equalization. Our proposed method significantly enhances signal clarity in conditions of extremely low visibility and outperforms existing algorithms.
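For reference, the traditional histogram stretching mentioned above as a baseline technique can be sketched as a linear rescaling of intensities to the full output range. This is only the classical operation, not the adaptive method the paper proposes; the function name and the flat list-of-integers image representation are illustrative assumptions.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly map the observed intensity range [min, max] onto [out_min, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [out_min for _ in pixels]
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]

# A low-contrast "foggy" patch occupying only the 100..120 intensity band.
foggy = [100, 105, 120]
stretched = stretch_contrast(foggy)
```

Because the mapping is global, a bright, uneven background widens the observed range and leaves little room for the signal, which is consistent with the abstract's remark about inhomogeneous illumination.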
[298] arXiv:2404.17507 [pdf, other]: Title: HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

Comments: 28 pages, 4.5MB

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our approach leverages hyperbolic embeddings and the concept of entailment cones to evaluate and filter out samples with meaningless or underspecified semantics, focusing on enhancing the specificity of each data sample. HYPE not only demonstrates a significant improvement in filtering efficiency but also sets a new state-of-the-art in the DataComp benchmark when combined with existing filtering techniques. This breakthrough showcases the potential of HYPE to refine the data selection process, thereby contributing to the development of more accurate and efficient self-supervised learning models. Additionally, the image specificity $\epsilon_{i}$ can be independently applied to induce an image-only dataset from an image-text or image-only data pool for training image-only self-supervised models, and the resulting dataset shows superior performance compared to the dataset induced by CLIP score.
[299] arXiv:2404.17508 [pdf, other]: Title: Constrained Neural Networks for Interpretable Heuristic Creation to Optimise Computer Algebra Systems

Authors: Dorian Florescu, Matthew England

Comments: Accepted for presentation at ICMS 2024

Subjects: Symbolic Computation (cs.SC); Machine Learning (cs.LG)

We present a new methodology for utilising machine learning technology in symbolic computation research. We explain how a well known human-designed heuristic to make the choice of variable ordering in cylindrical algebraic decomposition may be represented as a constrained neural network. This allows us to then use machine learning methods to further optimise the heuristic, leading to new networks of similar size, representing new heuristics of similar complexity as the original human-designed one. We present this as a form of ante-hoc explainability for use in computer algebra development.
[300] arXiv:2404.17509 [pdf, ps, other]: Title: Understanding the Cluster LP for Correlation Clustering

Authors: Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, Lukas Vogl

Subjects: Data Structures and Algorithms (cs.DS)

In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph where edges are labeled either $+$ or $-$, and the goal is to find a partition of the vertices that minimizes the sum of the $+$edges across parts plus the sum of the $-$edges within parts. In recent years, Chawla, Makarychev, Schramm and Yaroslavtsev (STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman (FOCS 2022, 2023) finally bypassed the integrality gap of 2 for this LP giving a $1.73$-approximation for the problem.
In order to create a simple and unified framework for Correlation Clustering similar to those for typical approximate optimization tasks, we propose the cluster LP as a strong linear program that might tightly capture the approximability of Correlation Clustering. It unifies all the previous relaxations for the problem.
We demonstrate the power of the cluster LP by presenting a simple rounding algorithm, and providing two analyses, one analytically proving a 1.49-approximation and the other solving a factor-revealing SDP to show a 1.437-approximation. Both proofs introduce principled methods by which to analyze the performance of the algorithm, resulting in a significantly improved approximation guarantee.
Finally, we prove an integrality gap of $4/3$ for the cluster LP, showing our 1.437-upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation with a ratio $24/23 \approx 1.042$; no explicit hardness ratio was known before.
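The Correlation Clustering objective stated in this abstract can be written down directly. The sketch below counts disagreements for a given partition on a small unweighted instance; it illustrates the objective only, not the cluster LP or the rounding algorithm, and the function name and input encoding are assumptions.

```python
from itertools import combinations

def disagreement_cost(n, plus_edges, labels):
    """Count +edges across clusters plus -edges within clusters.

    The graph is complete on vertices 0..n-1; every pair not in plus_edges is a -edge.
    labels[v] is the cluster id of vertex v.
    """
    plus = {frozenset(e) for e in plus_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        is_plus = frozenset((u, v)) in plus
        if is_plus != same_cluster:
            cost += 1  # a cut +edge or an uncut -edge
    return cost

# Triangle {0, 1, 2} of +edges, plus one +edge from 2 to 3; all other pairs are -edges.
plus_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cost_triangle = disagreement_cost(4, plus_edges, [0, 0, 0, 1])
cost_one_cluster = disagreement_cost(4, plus_edges, [0, 0, 0, 0])
cost_singletons = disagreement_cost(4, plus_edges, [0, 1, 2, 3])
```

On this instance, clustering the triangle together and leaving vertex 3 alone pays only for the single cut $+$edge (2, 3).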
[301] arXiv:2404.17511 [pdf, other]: Title: Bridging the Fairness Divide: Achieving Group and Individual Fairness in Graph Neural Networks

Authors: Duna Zhan, Dongliang Guo, Pengsheng Ji, Sheng Li

Comments: 16 pages, 3 figures

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Graph neural networks (GNNs) have emerged as a powerful tool for analyzing and learning from complex data structured as graphs, demonstrating remarkable effectiveness in various applications, such as social network analysis, recommendation systems, and drug discovery. However, despite their impressive performance, the fairness problem has increasingly gained attention as a crucial aspect to consider. Existing research in graph learning focuses on either group fairness or individual fairness. However, since each concept provides unique insights into fairness from distinct perspectives, integrating them into a fair graph neural network system is crucial. To the best of our knowledge, no study has yet comprehensively tackled both individual and group fairness simultaneously. In this paper, we propose a new concept of individual fairness within groups and a novel framework named Fairness for Group and Individual (FairGI), which considers both group fairness and individual fairness within groups in the context of graph learning. FairGI employs the similarity matrix of individuals to achieve individual fairness within groups, while leveraging adversarial learning to address group fairness in terms of both Equal Opportunity and Statistical Parity. The experimental results demonstrate that our approach not only outperforms other state-of-the-art models in terms of group fairness and individual fairness within groups, but also exhibits excellent performance in population-level individual fairness, while maintaining comparable prediction accuracy.
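The two group-fairness notions named in this abstract have standard definitions for binary predictions and a binary sensitive attribute. The sketch below computes them as absolute gaps; how FairGI optimizes or reports them is not stated here, so the function names and the 0/1 encoding are assumptions.

```python
def rate(values):
    """Fraction of positive (1) entries; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def statistical_parity_gap(y_pred, group):
    """|P(y_hat = 1 | group = 0) - P(y_hat = 1 | group = 1)|."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(rate(g0) - rate(g1))

def equal_opportunity_gap(y_true, y_pred, group):
    """Gap in true-positive rate between the two groups (only y_true = 1 counts)."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return abs(rate(g0) - rate(g1))

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
sp_gap = statistical_parity_gap(y_pred, group)
eo_gap = equal_opportunity_gap(y_true, y_pred, group)
```

A gap of 0 means the two groups are treated identically under that criterion; larger values indicate more disparity.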
[302] arXiv:2404.17513 [pdf, other]: Title: A Comprehensive Evaluation on Event Reasoning of Large Language Models

Authors: Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Xiaoying Bai, Yue Fang, Haiyan Zhao, Jia Li, Chongyang Tao

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To fill this gap, we comprehensively evaluate the abilities of event reasoning of LLMs. We introduce a novel benchmark EV2 for EValuation of EVent reasoning. EV2 consists of two levels of evaluation of schema and instance and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that LLMs have abilities to accomplish event reasoning but their performances are far from satisfactory. We also notice the imbalance of event reasoning abilities in LLMs. Moreover, although LLMs have event schema knowledge, they are not aligned with humans in how they utilize it. Based on these findings, we introduce two methods to guide the LLMs to utilize the event schema knowledge. Both methods achieve improvements.
[303] arXiv:2404.17519 [pdf, other]: Title: Interpreting Deepcode, a learned feedback code

Authors: Yingyao Zhou, Natasha Devroye, Gyorgy Turan, Milos Zefran

Comments: Accepted to the 2024 ISIT conference

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Deep learning methods have recently been used to construct non-linear codes for the additive white Gaussian noise (AWGN) channel with feedback. However, there is limited understanding of how these black-box-like codes with many learned parameters use feedback. This study aims to uncover the fundamental principles underlying the first deep-learned feedback code, known as Deepcode, which is based on an RNN architecture. Our interpretable model based on Deepcode is built by analyzing the influence length of inputs and approximating the non-linear dynamics of the original black-box RNN encoder. Numerical experiments demonstrate that our interpretable model -- which includes both an encoder and a decoder -- achieves comparable performance to Deepcode while offering an interpretation of how it employs feedback for error correction.
[304] arXiv:2404.17520 [pdf, other]: Title: A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

Comments: Accepted by IJCAI 2024

Subjects: Robotics (cs.RO)

As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.
[305] arXiv:2404.17521 [pdf, other]: Title: Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

Comments: Project website and open-source code: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task representations. We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting these challenges through two key innovations: a novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability; and an agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object. Ag2Manip's empirical validation across simulated benchmarks like FrankaKitchen, ManiSkill, and PartManip shows a 325% increase in performance, achieved without domain-specific demonstrations. Ablation studies underline the essential contributions of the visual and action representations to this success. Extending our evaluations to the real world, Ag2Manip significantly improves imitation learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across both simulated and physical environments.
[306] arXiv:2404.17522 [pdf, other]: Title: Enhancing Legal Compliance and Regulation Analysis with Large Language Models

Authors: Shabnam Hassani

Comments: to be published in 32nd IEEE International Requirements Engineering 2024 Conference (RE'24) - Doctoral Symposium. arXiv admin note: text overlap with arXiv:2404.14356

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

This research explores the application of Large Language Models (LLMs) for automating the extraction of requirement-related legal content in the food safety domain and checking legal compliance of regulatory artifacts. With Industry 4.0 revolutionizing the food industry and with the General Data Protection Regulation (GDPR) reshaping privacy policies and data processing agreements, there is a growing gap between regulatory analysis and recent technological advancements. This study aims to bridge this gap by leveraging LLMs, namely BERT and GPT models, to accurately classify legal provisions and automate compliance checks. Our findings demonstrate promising results, indicating LLMs' significant potential to enhance legal compliance and regulatory analysis efficiency, notably by reducing manual workload and improving accuracy within reasonable time and financial constraints.
[307] arXiv:2404.17524 [pdf, other]: Title: On the Use of Large Language Models to Generate Capability Ontologies

Authors: Luis Miguel Vieira da Silva, Aljosha Köcher, Felix Gehlhoff, Alexander Fay

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engineers / ontology experts. Therefore, this paper investigates how LLMs can be used to create capability ontologies. We present a study with a series of experiments in which capabilities with varying complexities are generated using different prompting techniques and with different LLMs. Errors in the generated ontologies are recorded and compared. To analyze the quality of the generated ontologies, a semi-automated approach based on RDF syntax checking, OWL reasoning, and SHACL constraints is used. The results of this study are very promising because even for complex capabilities, the generated ontologies are almost free of errors.
[308] arXiv:2404.17525 [pdf, ps, other]: Title: Large Language Model Agent as a Mechanical Designer

Authors: Yayati Jadhav, Amir Barati Farimani

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Conventional mechanical design paradigms rely on experts systematically refining concepts through experience-guided modification and FEA to meet specific requirements. However, this approach can be time-consuming and heavily dependent on prior knowledge and experience. While numerous machine learning models have been developed to streamline this intensive and expert-driven iterative process, these methods typically demand extensive training data and considerable computational resources. Furthermore, methods based on deep learning are usually restricted to the specific domains and tasks for which they were trained, limiting their applicability across different tasks. This creates a trade-off between the efficiency of automation and the demand for resources. In this study, we present a novel approach that integrates pre-trained LLMs with a FEM module. The FEM module evaluates each design and provides essential feedback, guiding the LLMs to continuously learn, plan, generate, and optimize designs without the need for domain-specific training. We demonstrate the effectiveness of our proposed framework in managing the iterative optimization of truss structures, showcasing its capability to reason about and refine designs according to structured feedback and criteria. Our results reveal that these LLM-based agents can successfully generate truss designs that comply with natural language specifications with a success rate of up to 90%, which varies according to the applied constraints. By employing prompt-based optimization techniques we show that LLM based agents exhibit optimization behavior when provided with solution-score pairs to iteratively refine designs to meet specifications. This ability of LLM agents to produce viable designs and optimize them based on their inherent reasoning capabilities highlights their potential to develop and implement effective design strategies autonomously.
[309] arXiv:2404.17528 [pdf, other]: Title: Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

Comments: Accepted by CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these issues point by point. First, we find the variance-based cost volume exhibits failure patterns as the features of pixels corresponding to the same point can be inconsistent across different views due to occlusions or reflections. We introduce an Adaptive Cost Aggregation (ACA) approach to amplify the contribution of consistent pixel pairs and suppress inconsistent ones. Unlike previous methods that solely fuse 2D features into descriptors, our approach introduces a Spatial-View Aggregator (SVA) to incorporate 3D context into descriptors through spatial and inter-view interaction. When decoding the descriptors, we observe the two existing decoding strategies excel in different areas, which are complementary. A Consistency-Aware Fusion (CAF) strategy is proposed to leverage the advantages of both. We incorporate the above ACA, SVA, and CAF into a coarse-to-fine framework, termed Geometry-aware Reconstruction and Fusion-refined Rendering (GeFu). GeFu attains state-of-the-art performance across multiple datasets. Code is available at https://github.com/TQTQliu/GeFu .
[310] arXiv:2404.17530 [pdf, other]: Title: Lookahead Games and Efficient Determinisation of History-Deterministic Büchi Automata

Authors: Rohan Acharya, Marcin Jurdziński, Aditya Prakash

Comments: Full version of paper accepted at ICALP 2024

Subjects: Formal Languages and Automata Theory (cs.FL)

Our main technical contribution is a polynomial-time determinisation procedure for history-deterministic Büchi automata, which settles an open question of Kuperberg and Skrzypczak, 2015. A key conceptual contribution is the lookahead game, which is a variant of Bagnol and Kuperberg's token game, in which Adam is given a fixed lookahead. We prove that the lookahead game is equivalent to the 1-token game. This allows us to show that the 1-token game characterises history-determinism for semantically-deterministic Büchi automata, which paves the way to our polynomial-time determinisation procedure.
[311] arXiv:2404.17532 [pdf, ps, other]: Title: Mitigating Collisions in Sidelink NR V2X: A Study on Cooperative Resource Allocation

Authors: Mohammadsaleh Nikooroo, Juan Estrada-Jimenez, Aurel Machalek, Jerome Harri, Thomas Engel, Ion Turcanu

Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

New Radio (NR) Vehicle-to-Everything (V2X) Sidelink (SL), an integral part of the 5G NR standard, is expected to revolutionize the automotive and rail industries by enabling direct and low-latency exchange of critical information between traffic participants independently of cellular networks. However, this advancement depends primarily on efficient SL resource allocation. Mode 2(a) is a well-known method for this purpose, where each node autonomously selects resources. However, this method is prone to packet collisions due to the hidden-node problem. In this paper, we propose a cooperative scheduling method that could potentially address this issue. We describe an extension of Mode 2(a) that allows nodes to share resource allocation information at two hops. Initial simulation results show a promising improvement over Mode 2(a).
[312] arXiv:2404.17534 [pdf, other]: Title: Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

Authors: Yuhang Huang, Zihan Wu, Chongyang Gao, Jiawei Peng, Xu Yang

Comments: 11 pages, 9 figures, 6 tables. For associated code, see this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Large Vision-Language Models (LVLMs) are gaining traction for their remarkable ability to process and integrate visual and textual data. Despite their popularity, the capacity of LVLMs to generate precise, fine-grained textual descriptions has not been fully explored. This study addresses this gap by focusing on distinctiveness and fidelity, assessing how models like Open-Flamingo, IDEFICS, and MiniGPT-4 can distinguish between similar objects and accurately describe visual features. We propose the Textual Retrieval-Augmented Classification (TRAC) framework, which, by leveraging its generative capabilities, allows us to delve deeper into analyzing fine-grained visual description generation. This research provides valuable insights into the generation quality of LVLMs, enhancing the understanding of multimodal language models. Notably, MiniGPT-4 stands out for its better ability to generate fine-grained descriptions, outperforming the other two models in this aspect. The code is provided at https://anonymous.4open.science/r/Explore_FGVDs-E277 .
[313] arXiv:2404.17535 [pdf, other]: Title: Using Neural Implicit Flow To Represent Latent Dynamics Of Canonical Systems

Authors: Imran Nasim, Joaõ Lucas de Sousa Almeida

Comments: Accepted into the International conference on Scientific Computation and Machine Learning 2024 (SCML 2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The recently introduced class of architectures known as Neural Operators has emerged as highly versatile tools applicable to a wide range of tasks in the field of Scientific Machine Learning (SciML), including data representation and forecasting. In this study, we investigate the capabilities of Neural Implicit Flow (NIF), a recently developed mesh-agnostic neural operator, for representing the latent dynamics of canonical systems such as the Kuramoto-Sivashinsky (KS), forced Korteweg-de Vries (fKdV), and Sine-Gordon (SG) equations, as well as for extracting dynamically relevant information from them. Finally we assess the applicability of NIF as a dimensionality reduction algorithm and conduct a comparative analysis with another widely recognized family of neural operators, known as Deep Operator Networks (DeepONets).
[314] arXiv:2404.17544 [pdf, ps, other]: Title: Root-to-Leaf Scheduling in Write-Optimized Trees

Authors: Christopher Chung, William Jannen, Samuel McCauley, Bertrand Simon

Subjects: Data Structures and Algorithms (cs.DS)

Write-optimized dictionaries are a class of cache-efficient data structures that buffer updates and apply them in batches to optimize the amortized cache misses per update. For example, a B^epsilon tree inserts updates as messages at the root. B^epsilon trees only move ("flush") messages when they have total size close to a cache line, optimizing the amount of work done per cache line written. Thus, recently-inserted messages reside at or near the root and are only flushed down the tree after a sufficient number of new messages arrive. Although this lazy approach works well for many operations, some types of updates do not complete until the update message reaches a leaf. For example, deferred queries and secure deletes must flush through all nodes along their root-to-leaf path before taking effect. What happens when we want to service a large number of (say) secure deletes as quickly as possible? Classic techniques leave us with an unsavory choice. On the one hand, we can group the delete messages using a write-optimized approach and move them down the tree lazily. But then many individual deletes may be left incomplete for an extended period of time, as their messages wait to be grouped with a sufficiently large number of related messages. On the other hand, we can ignore cache efficiency and perform a root-to-leaf flush for each delete. This begins work on individual deletes immediately, but harms system throughput. This paper investigates a new framework for efficiently flushing collections of messages from the root to their leaves in a write-optimized data structure. Our goal is to minimize the average time that messages reach the leaves. We give an algorithm that O(1)-approximates the optimal average completion time in this model. Along the way, we give a new 4-approximation algorithm for scheduling parallel tasks for weighted completion time with tree precedence constraints.
[315] arXiv:2404.17546 [pdf, other]: Title: Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

Authors: Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference problems. In particular, we use learned twist functions to estimate the expected future value of the potential at each timestep, which enables us to focus inference-time computation on promising partial sequences. We propose a novel contrastive method for learning the twist functions, and establish connections with the rich literature of soft reinforcement learning. As a complementary application of our twisted SMC framework, we present methods for evaluating the accuracy of language model inference techniques using novel bidirectional SMC bounds on the log partition function. These bounds can be used to estimate the KL divergence between the inference and target distributions in both directions. We apply our inference evaluation techniques to show that twisted SMC is effective for sampling undesirable outputs from a pretrained model (a useful component of harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.
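The sampling problem in this abstract can be written out as follows. The notation is an assumption made here for illustration ($p_0$ for the base language model, $\phi$ for the potential, $\psi_t$ for twist functions, $s_{1:T}$ for a full token sequence) and may differ from the paper's own symbols.

```latex
% Unnormalized target defined by a potential over the full sequence
\sigma(s_{1:T}) \;=\; \frac{1}{Z}\, p_0(s_{1:T})\, \phi(s_{1:T}),
\qquad
Z \;=\; \sum_{s_{1:T}} p_0(s_{1:T})\, \phi(s_{1:T}).

% A twist function estimates the expected future value of the potential
% given the partial sequence generated so far
\psi_t(s_{1:t}) \;\approx\; \mathbb{E}_{p_0(s_{t+1:T} \mid s_{1:t})}\!\left[ \phi(s_{1:T}) \right].
```

Under this reading, partial sequences with a large $\psi_t$ are the promising ones on which SMC resampling concentrates computation, and $\log Z$ is the log partition function that the bidirectional bounds bracket.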
[316] arXiv:2404.17547 [pdf, other]: Title: Integrating UAV-Enabled Base Stations in 3D Networks: QoS-Aware Joint Fronthaul and Backhaul Design

Authors: Salim Janji, Piotr Wawrzyniak, Piotr Formanowicz, Adrian Kliks

Subjects: Networking and Internet Architecture (cs.NI)

The emerging concept of 3D networks, integrating terrestrial, aerial, and space layers, introduces a novel and complex structure characterized by stations relaying backhaul loads through point-to-point wireless links, forming a wireless 3D backhaul mesh. A key challenge is the strategic placement of aerial platforms such as drone base stations (DBSs), considering the locations and service demands of ground nodes and the connectivity to backhaul gateway nodes for core network access. This paper addresses these complexities with a two-fold approach: a novel Agglomerative Hierarchical Clustering (HC) algorithm that optimizes DBS locations to satisfy minimum backhaul adjacency and maximum fronthaul coverage radius requirements; and a Genetic Algorithm (GA) that designs backhaul connections to satisfy the cumulative load across the network and maximize the throughput margin, which translates to network resilience to increasing demands. Our results showcase the effectiveness of these algorithms against baseline schemes, offering insights into the operational dynamics of these novel 3D networks.
[317] arXiv:2404.17550 [pdf, other]: Title: CoCar NextGen: a Multi-Purpose Platform for Connected Autonomous Driving Research

Authors: Marc Heinrich, Maximilian Zipfl, Marc Uecker, Sven Ochs, Martin Gontscharow, Tobias Fleck, Jens Doll, Philip Schörner, Christian Hubschneider, Marc René Zofka, Alexander Viehl, J. Marius Zöllner

Subjects: Robotics (cs.RO)

Real-world testing is of vital importance to the success of automated driving. While many players in the business design purpose-built testing vehicles, we designed and built a modular platform that offers high flexibility for any kind of scenario. CoCar NextGen is equipped with next-generation hardware that addresses all future use cases. Its extensive, redundant sensor setup allows the development of cross-domain data-driven approaches that manage the transfer to other sensor setups. Together with the possibility of being deployed on public roads, this creates a unique research platform that supports the road to automated driving on SAE Level 5.
[318] arXiv:2404.17553 [pdf, other]: Title: Federated Transfer Component Analysis Towards Effective VNF Profiling

Authors: Xunzheng Zhang, Shadi Moazzeni, Juan Marcelo Parra-Ullauri, Reza Nejabati, Dimitra Simeonidou

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

The increasing concerns of knowledge transfer and data privacy challenge the traditional gather-and-analyse paradigm in networks. Specifically, the intelligent orchestration of Virtual Network Functions (VNFs) requires understanding and profiling the resource consumption. However, profiling all kinds of VNFs is time-consuming. It is important to consider transferring the well-profiled VNF knowledge to other lack-profiled VNF types while keeping data private. To this end, this paper proposes a Federated Transfer Component Analysis (FTCA) method between the source and target VNFs. FTCA first trains Generative Adversarial Networks (GANs) based on the source VNF profiling data, and the trained GANs model is sent to the target VNF domain. Then, FTCA realizes federated domain adaptation by using the generated source VNF data and less target VNF profiling data, while keeping the raw data locally. Experiments show that the proposed FTCA can effectively predict the required resources for the target VNF. Specifically, the RMSE index of the regression model decreases by 38.5% and the R-squared metric advances up to 68.6%.
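The two regression metrics reported in this abstract, RMSE and R-squared, have standard definitions that can be sketched in a few lines. The numbers below are toy values for illustration, not the paper's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy resource-usage targets and predictions (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
err = rmse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

Lower RMSE and higher R-squared both indicate a better fit, which is why the abstract reports a decrease in the former and an increase in the latter.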
[319] arXiv:2404.17554 [pdf, ps, other]: Title: A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

Authors: Jing Lin, Nina Mylly, Per Olof Hedekvist, Jingchun Shen

Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP); Systems and Control (eess.SY); Applications (stat.AP)

This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporating a newly defined metric, Mean Time of Exposure (MTOE), the approach quantifies user-light interaction, enabling tailored lighting strategies that respond to diverse activities and needs in library spaces. Case studies demonstrate how the CIL matrix can be practically applied, offering significant improvements over conventional methods by focusing on optimized user experiences from both visual impacts and non-visual effects.
[320] arXiv:2404.17563 [pdf, other]: Title: An exactly solvable model for emergence and scaling laws

Authors: Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Ard Louis

Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time ($T$), training data ($D$), or model size ($N$) increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
[321] arXiv:2404.17565 [pdf, other]: Title: ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection

Authors: Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal

Comments: accepted at IGARSS 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Change detection (CD) is a fundamental task in remote sensing (RS) which aims to detect the semantic changes between the same geographical regions at different time stamps. Existing approaches based on convolutional neural networks (CNNs) often struggle to capture long-range dependencies, whereas recent transformer-based methods are prone to a dominant global representation, which may limit their ability to capture subtle change regions given the complexity of the objects in the scene. To address these limitations, we propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images. The main focus of our design is to introduce a change encoder that leverages local and global feature representations to capture both subtle and large change feature information from multi-scale features to precisely estimate the change regions. Our experimental study on two challenging CD datasets reveals the merits of our approach, which obtains state-of-the-art performance.
[322] arXiv:2404.17569 [pdf, other]: Title: MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Comments: SIGGRAPH 2024. Project page: https://zhanghe3z.github.io/MaPa/

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs. Specifically, our approach decomposes a shape into a set of segments and designs a segment-controlled diffusion model to synthesize 2D images that are aligned with mesh parts. Based on generated images, we initialize parameters of material graphs and fine-tune them through the differentiable rendering module to produce materials in accordance with the textual description. Extensive experiments demonstrate the superior performance of our framework in photorealism, resolution, and editability over existing methods. Project page: https://zhanghe3z.github.io/MaPa/
[323] arXiv:2404.17571 [pdf, other]: Title: Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos

Authors: Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video try-on is a challenging task and has not been well tackled in previous works. The main obstacle lies in preserving the details of the clothing and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a diffusion-based framework named "Tunnel Try-on." The core idea is excavating a "focus tunnel" in the input video that gives close-up shots around the clothing regions. We zoom in on the region in the tunnel to better preserve the fine details of the clothing. To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos. In addition, we develop an environment encoder to extract the context information outside the tunnels as supplementary cues. Equipped with these techniques, Tunnel Try-on keeps the fine details of the clothing and synthesizes stable and smooth videos. Demonstrating significant advancements, Tunnel Try-on could be regarded as the first attempt toward the commercial-level application of virtual try-on in videos.
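The abstract mentions using a Kalman filter to obtain smooth crops. As a rough illustration of that idea only, the sketch below filters a single crop coordinate with a scalar Kalman filter under a random-walk model. The function name `kalman_filter_1d`, the noise variances, and the sample values are assumptions made for this example; the paper's actual state model is not described in the abstract.

```python
def kalman_filter_1d(measurements, process_var=1e-2, measurement_var=1.0):
    """Filter a noisy 1-D track (e.g. a crop-box centre) with a scalar Kalman filter.

    Assumes a random-walk state model; both variances are illustrative defaults.
    """
    estimate = measurements[0]  # start from the first observation
    error_var = 1.0             # initial uncertainty of the estimate
    filtered = []
    for z in measurements:
        # Predict: the state carries over unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Hypothetical jittery crop-centre x-coordinates in pixels.
noisy_centres = [100.0, 104.0, 96.0, 104.0, 96.0, 104.0]
smooth_centres = kalman_filter_1d(noisy_centres)
```

Each filtered value is a weighted average of the previous estimate and the new measurement, so the filtered track jitters less than the raw detections.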

Cross-lists for Mon, 29 Apr 24

[324] arXiv:2404.16862 (cross-list from physics.soc-ph) [pdf, other]: Title: Edge Importance in Complex Networks

Authors: Silvia Noschese, Lothar Reichel

Comments: 25 pages, 9 tables, 1 figure

Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)

Complex networks are made up of vertices and edges. The latter connect the vertices. There are several ways to measure the importance of the vertices, e.g., by counting the number of edges that start or end at each vertex, or by using the subgraph centrality of the vertices. It is more difficult to assess the importance of the edges. One approach is to consider the line graph associated with the given network and determine the importance of the vertices of the line graph, but this is fairly complicated except for small networks. This paper compares two approaches to estimate the importance of edges of medium-sized to large networks. One approach computes partial derivatives of the total communicability of the weights of the edges, where a partial derivative of large magnitude indicates that the corresponding edge may be important. Our second approach computes the Perron sensitivity of the edges. A high sensitivity signals that the edge may be important. The performance of these methods and some computational aspects are discussed. Applications of interest include determining whether a network can be replaced by a network with fewer edges with about the same communicability.
[325] arXiv:2404.16863 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: Efficient Strategies on Supply Chain Network Optimization for Industrial Carbon Emission Reduction

Authors: Jihu Lei

Journal-ref: Journal of Computational Methods in Engineering Applications (2022): 1-11

Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY)

This study investigates efficient strategies for supply chain network optimization, specifically aimed at reducing industrial carbon emissions. Amidst escalating concerns about global climate change, industry sectors are motivated to counteract the negative environmental implications of their supply chain networks. This paper introduces a novel framework for optimizing these networks via strategic approaches which lead to a definitive decrease in carbon emissions. We introduce Adaptive Carbon Emissions Indexing (ACEI), utilizing real-time carbon emissions data to drive instantaneous adjustments in supply chain operations. This adaptability is predicated on evolving environmental regulations, fluctuating market trends and emerging technological advancements. The empirical validations demonstrate our strategy's effectiveness in various industrial sectors, indicating a significant reduction in carbon emissions and an increase in operational efficiency. This method also shows resilience in the face of sudden disruptions and crises, reflecting its robustness.
[326] arXiv:2404.16866 (cross-list from q-bio.QM) [pdf, other]: Title: Functional Protein Design with Local Domain Alignment

Authors: Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong

Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore generating proteins using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates the textual annotations extracted from protein databases for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a nearly sixfold increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 8.7% in the immunoglobulin domain) in comparison to the existing model.
[327] arXiv:2404.16875 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: Langues en danger et multilinguisme numérique

Authors: Mokhtar Ben Henda (MICA)

Comments: in French language

Journal-ref: Laulan, Anne-Marie & Lenoble-Bart, Annie. Les oubliés de l'internet : Cultures et langues sur l'Internet, oubli ou déni ?, Les études hospitalières, pp.77-94, 2014, 978-2-84874-555-8

Subjects: Physics and Society (physics.soc-ph); Networking and Internet Architecture (cs.NI)

In the era of globalization and digital networks, the so-called "minored" or "endangered" languages are facing a twofold dilemma: either succeed in their digital modernity by accepting a "painful" linguistic management or slide towards a slow extinction in the face of hegemonic and "predatory" languages which dominate the digital networks. Oral languages and minored non-Romanized writing systems are the most concerned by the protective measures of cultural and linguistic diversity on the Internet. Digital broadcasting and the Unicode multi-writing encoding system are providing them with innovative, consensual, and standardized alternatives to survive. It then depends on the synergy that their communities of practice will generate to place them at the heart of the debate on the digital divide.
[328] arXiv:2404.16880 (cross-list from q-bio.QM) [pdf, other]: Title: Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Authors: Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, Yu Rong

Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn the knowledge from different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual description, which is crucial for downstream tasks. Furthermore, such information cannot be modeled with a similar global alignment strategy because existing datasets lack paired data with local-part annotations. In this paper, we propose Atomas, a multi-modal molecular representation learning framework to jointly learn representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between two modalities and align these representations of fragments in three levels. Additionally, Atomas's end-to-end training framework incorporates the tasks of molecule understanding and generation, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% of recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning task and the molecule generation task. Moreover, the visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our code can be found at https://anonymous.4open.science/r/Atomas-03C3.
[329] arXiv:2404.16900 (cross-list from eess.IV) [pdf, other]: Title: Space-Variant Total Variation boosted by learning techniques in few-view tomographic imaging

Authors: Elena Morotti, Davide Evangelista, Andrea Sebastiani, Elena Loli Piccolomini

Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

This paper focuses on the development of a space-variant regularization model for solving an under-determined linear inverse problem. The case study is a medical image reconstruction from few-view tomographic noisy data. The primary objective of the proposed optimization model is to achieve a good balance between denoising and the preservation of fine details and edges, surpassing the performance of the popular and widely used Total Variation (TV) regularization through the application of appropriate pixel-dependent weights. The proposed strategy leverages the role of gradient approximations for the computation of the space-variant TV weights. For this reason, a convolutional neural network is designed to approximate both the ground truth image and its gradient using an elastic loss function in its training. Additionally, the paper provides a theoretical analysis of the proposed model, showing the uniqueness of its solution, and illustrates a Chambolle-Pock algorithm tailored to address the specific problem at hand. This comprehensive framework integrates innovative regularization techniques with advanced neural network capabilities, demonstrating promising results in achieving high-quality reconstructions from low-sampled tomographic data.
[330] arXiv:2404.16907 (cross-list from q-bio.GN) [pdf, other]: Title: Season combinatorial intervention predictions with Salt & Peper

Authors: Thomas Gaudelet, Alice Del Vecchio, Eli M Carrami, Juliana Cudini, Chantriolnt-Andreas Kapourani, Caroline Uhler, Lindsay Edwards

Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG); Cell Behavior (q-bio.CB)

Interventions play a pivotal role in the study of complex biological systems. In drug discovery, genetic interventions (such as CRISPR base editing) have become central to both identifying potential therapeutic targets and understanding a drug's mechanism of action. With the advancement of CRISPR and the proliferation of genome-scale analyses such as transcriptomics, a new challenge is to navigate the vast combinatorial space of concurrent genetic interventions. Addressing this, our work concentrates on estimating the effects of pairwise genetic combinations on the cellular transcriptome. We introduce two novel contributions: Salt, a biologically-inspired baseline that posits the mostly additive nature of combination effects, and Peper, a deep learning model that extends Salt's additive assumption to achieve unprecedented accuracy. Our comprehensive comparison against existing state-of-the-art methods, grounded in diverse metrics, and our out-of-distribution analysis highlight the limitations of current models in realistic settings. This analysis underscores the necessity for improved modelling techniques and data acquisition strategies, paving the way for more effective exploration of genetic intervention effects.
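An additive baseline of the kind this abstract describes can be sketched as follows. This is one plausible reading of "mostly additive" and not the authors' implementation: the predicted expression under a pair of interventions is the control profile plus the two individual shifts. The function name `additive_prediction` and the toy expression values are hypothetical.

```python
def additive_prediction(control, single_a, single_b):
    """Predict the expression under a combined intervention (A + B).

    Assumption for illustration: effects add, so each gene's predicted value is
    control + (effect of A alone) + (effect of B alone).
    """
    return [
        c + (a - c) + (b - c)
        for c, a, b in zip(control, single_a, single_b)
    ]


# Hypothetical expression levels for two genes (not data from the paper).
control = [1.0, 2.0]
after_a = [1.5, 2.0]   # intervention A raises gene 0 by 0.5
after_b = [1.0, 1.0]   # intervention B lowers gene 1 by 1.0
combined = additive_prediction(control, after_a, after_b)
```

A learned model can then be judged by how much it improves on this baseline for combinations whose effects are not purely additive.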
[331] arXiv:2404.16911 (cross-list from physics.chem-ph) [pdf, other]: Title: HEroBM: a deep equivariant graph neural network for universal backmapping from coarse-grained to all-atom representations

Authors: Daniele Angioletti, Stefano Raniolo, Vittorio Limongelli

Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Molecular simulations have assumed a paramount role in the fields of chemistry, biology, and material sciences, being able to capture the intricate dynamic properties of systems. Within this realm, coarse-grained (CG) techniques have emerged as invaluable tools to sample large-scale systems and reach extended timescales by simplifying system representation. However, CG approaches come with a trade-off: they sacrifice atomistic details that might hold significant relevance in deciphering the investigated process. Therefore, a recommended approach is to identify key CG conformations and process them using backmapping methods, which retrieve atomistic coordinates. Currently, rule-based methods yield subpar geometries and rely on energy relaxation, resulting in less-than-optimal outcomes. Conversely, machine learning techniques offer higher accuracy but are either limited in transferability between systems or tied to specific CG mappings. In this work, we introduce HEroBM, a dynamic and scalable method that employs deep equivariant graph neural networks and a hierarchical approach to achieve high-resolution backmapping. HEroBM handles any type of CG mapping, offering a versatile and efficient protocol for reconstructing atomistic structures with high accuracy. Focused on local principles, HEroBM spans the entire chemical space and is transferable to systems of varying sizes. We illustrate the versatility of our framework through diverse biological systems, including a complex real-case scenario. Here, our end-to-end backmapping approach accurately generates the atomistic coordinates of a G protein-coupled receptor bound to an organic small molecule within a cholesterol/phospholipid bilayer.
[332] arXiv:2404.16991 (cross-list from quant-ph) [pdf, ps, other]: Title: Efficient Variational Quantum Linear Solver for Structured Sparse Matrices

Authors: Abeynaya Gnanasekaran, Amit Surana

Subjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA)

We develop a novel approach for efficiently applying the variational quantum linear solver (VQLS) in the context of structured sparse matrices. Such matrices frequently arise during the numerical solution of partial differential equations, which are ubiquitous in science and engineering. Conventionally, the Pauli basis is used for the linear combination of unitaries (LCU) decomposition of the underlying matrix to facilitate the evaluation of the global/local VQLS cost functions. However, the Pauli basis can in the worst case result in a number of LCU terms that scales quadratically with respect to the matrix size. We show that by using an alternate basis one can better exploit the sparsity and underlying structure of the matrix, leading to a number of tensor product terms which scales only logarithmically with respect to the matrix size. Given that this new basis is comprised of non-unitary operators, we employ the concept of unitary completion to design efficient quantum circuits for computing the global/local VQLS cost functions. We compare our approach with other related concepts in the literature, including unitary dilation and measurement in the Bell basis, and discuss its pros and cons, using VQLS applied to the heat equation as an example.
[333] arXiv:2404.17019 (cross-list from stat.ME) [pdf, other]: Title: Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules

Authors: Michael Lingzhi Li, Kosuke Imai

Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception.
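For context, the classical Neyman analysis that this abstract builds on can be sketched in a few lines: a difference-in-means estimate of the average treatment effect together with the conservative variance estimate. This covers only the textbook randomized-experiment case, not the paper's extension to ITRs or cross-fitting. The function name `neyman_estimate` and the outcome values are hypothetical.

```python
import math
from statistics import mean, variance


def neyman_estimate(treated, control):
    """Difference-in-means estimate and its conservative standard error.

    The variance estimate is s_1^2 / n_1 + s_0^2 / n_0, where s^2 denotes the
    sample variance (n - 1 denominator) within each arm.
    """
    effect = mean(treated) - mean(control)
    var_hat = variance(treated) / len(treated) + variance(control) / len(control)
    return effect, math.sqrt(var_hat)


# Hypothetical outcomes from a small randomized experiment.
treated_outcomes = [3.0, 5.0, 7.0]
control_outcomes = [1.0, 2.0, 3.0]
effect, std_error = neyman_estimate(treated_outcomes, control_outcomes)
```

The estimator makes no modelling assumptions about the outcomes, which is the property the abstract highlights when applying the framework to rules produced by arbitrary machine learning algorithms.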
[334] arXiv:2404.17057 (cross-list from physics.comp-ph) [pdf, other]: Title: Portable, Massively Parallel Implementation of a Material Point Method for Compressible Flows

Authors: Paolo Joseph Baioni, Tommaso Benacchio, Luigi Capone, Carlo de Falco

Comments: 36 pages, 11 figures

Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)

The recent evolution of software and hardware technologies is leading to a renewed computational interest in Particle-In-Cell (PIC) methods such as the Material Point Method (MPM). Indeed, provided some critical aspects are properly handled, PIC methods can be cast in formulations suitable to the requirements of data locality and fine-grained parallelism of modern hardware accelerators such as Graphics Processing Units (GPUs). Such rapid and continuous technological development also increases the importance of generic and portable implementations. While continuum mechanics simulations have already shown the capabilities of MPM on a wide range of phenomena, the use of the method in compressible fluid dynamics is less frequent, especially in the supersonic regime. In this paper we present a portable, highly parallel, GPU-based MPM solver for compressible gas dynamics. The implementation aims to reach a good compromise between portability and efficiency and to give a first assessment of the potential of this approach in reproducing high-speed gas flows, also taking into account solid obstacles. The proposed model constitutes a new step towards the realization of a monolithic MPM solver for Fluid-Structure Interaction (FSI) problems at all Mach numbers up to the supersonic regime.
[335] arXiv:2404.17064 (cross-list from eess.IV) [pdf, other]: Title: Detection of Peri-Pancreatic Edema using Deep Learning and Radiomics Techniques

Authors: Ziliang Hong, Debesh Jha, Koushik Biswas, Zheyuan Zhang, Yury Velichko, Cemal Yazici, Temel Tirkes, Amir Borhani, Baris Turkbey, Alpay Medetalibeyoglu, Gorkem Durak, Ulas Bagci

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Identifying peri-pancreatic edema is a pivotal indicator for identifying disease progression and prognosis, emphasizing the critical need for accurate detection and assessment in pancreatitis diagnosis and management. This study introduces a novel CT dataset sourced from 255 patients with pancreatic diseases, featuring annotated pancreas segmentation masks and corresponding diagnostic labels for peri-pancreatic edema condition. With the novel dataset, we first evaluate the efficacy of the LinTransUNet model, a linear Transformer based segmentation algorithm, to segment the pancreas accurately from CT imaging data. Then, we use segmented pancreas regions with two distinctive machine learning classifiers to identify the existence of peri-pancreatic edema: deep learning-based models and a radiomics-based eXtreme Gradient Boosting (XGBoost). The LinTransUNet achieved promising results, with a dice coefficient of 80.85% and mIoU of 68.73%. Among the nine benchmarked classification models for peri-pancreatic edema detection, the Swin-Tiny transformer model demonstrated the highest recall of $98.85 \pm 0.42$ and precision of $98.38 \pm 0.17$. Comparatively, the radiomics-based XGBoost model achieved an accuracy of $79.61 \pm 4.04$ and recall of $91.05 \pm 3.28$, showcasing its potential as a supplementary diagnostic tool given its rapid processing speed and reduced training time. Our code is available at https://github.com/NUBagciLab/Peri-Pancreatic-Edema-Detection.
[336] arXiv:2404.17067 (cross-list from math.CO) [pdf, ps, other]: Title: The distance function on Coxeter-like graphs and self-dual codes

Authors: Marko Orel, Draženka Višnjić

Comments: 44 pages, 1 figure

Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

Let $SGL_n(\mathbb{F}_2)$ be the set of all invertible $n\times n$ symmetric matrices over the binary field $\mathbb{F}_2$. Let $\Gamma_n$ be the graph with the vertex set $SGL_n(\mathbb{F}_2)$ where a pair of matrices $\{A,B\}$ form an edge if and only if $\textrm{rank}(A-B)=1$. In particular, $\Gamma_3$ is the well-known Coxeter graph. The distance function $d(A,B)$ in $\Gamma_n$ is described for all matrices $A,B\in SGL_n(\mathbb{F}_2)$. The diameter of $\Gamma_n$ is computed. For odd $n\geq 3$, it is shown that each matrix $A\in SGL_n(\mathbb{F}_2)$ such that $d(A,I)=\frac{n+5}{2}$ and $\textrm{rank}(A-I)=\frac{n+1}{2}$ where $I$ is the identity matrix induces a self-dual code in $\mathbb{F}_2^{n+1}$. Conversely, each self-dual code $C$ induces a family ${\cal F}_C$ of such matrices $A$. The families given by distinct self-dual codes are disjoint. The identification $C\leftrightarrow {\cal F}_C$ provides a graph theoretical description of self-dual codes. A result of Janusz (2007) is reproved and strengthened by showing that the orthogonal group ${\cal O}_n(\mathbb{F}_2)$ acts transitively on the set of all self-dual codes in $\mathbb{F}_2^{n+1}$.
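The definition of $\Gamma_n$ above is concrete enough to check by brute force for $n=3$. The following sketch enumerates the invertible symmetric $3\times 3$ matrices over $\mathbb{F}_2$, joins two of them when their difference has rank 1, and computes distances by breadth-first search. Matrices are stored as tuples of row bitmasks; all function names are hypothetical helpers written for this illustration and do not come from the paper.

```python
from collections import deque
from itertools import product


def rank_gf2(rows):
    """Rank over F_2 of a matrix given as an iterable of row bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Clear that bit from every remaining row (addition in F_2 is XOR).
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def symmetric_matrices(n):
    """Yield every symmetric n x n matrix over F_2 as a tuple of row bitmasks."""
    positions = [(i, j) for i in range(n) for j in range(i, n)]
    for bits in product((0, 1), repeat=len(positions)):
        rows = [0] * n
        for (i, j), bit in zip(positions, bits):
            if bit:
                rows[i] |= 1 << j
                rows[j] |= 1 << i
        yield tuple(rows)


def build_gamma(n):
    """Adjacency lists of Gamma_n: A ~ B if and only if rank(A - B) = 1."""
    vertices = [m for m in symmetric_matrices(n) if rank_gf2(m) == n]
    adjacency = {v: [] for v in vertices}
    for a in vertices:
        for b in vertices:
            difference = tuple(x ^ y for x, y in zip(a, b))
            if rank_gf2(difference) == 1:
                adjacency[a].append(b)
    return adjacency


def distances_from(adjacency, source):
    """Breadth-first-search distances d(source, B) for every reachable B."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


gamma3 = build_gamma(3)
identity = (0b001, 0b010, 0b100)
dist_from_identity = distances_from(gamma3, identity)
```

For $n=3$ this yields 28 vertices, each of degree 3, consistent with $\Gamma_3$ being the Coxeter graph. The brute-force construction is quadratic in the number of vertices, so it is only practical for small $n$.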
[337] arXiv:2404.17077 (cross-list from quant-ph) [pdf, other]: Title: Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach

Authors: Panagiotis Promponas, Akrit Mudvari, Luca Della Chiesa, Paul Polakos, Louis Samuel, Leandros Tassiulas

Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

The practical realization of quantum programs that require large-scale qubit systems is hindered by current technological limitations. Distributed Quantum Computing (DQC) presents a viable path to scalability by interconnecting multiple Quantum Processing Units (QPUs) through quantum links, facilitating the distributed execution of quantum circuits. In DQC, EPR pairs are generated and shared between distant QPUs, which enables quantum teleportation and facilitates the seamless execution of circuits. A primary obstacle in DQC is the efficient mapping and routing of logical qubits to physical qubits across different QPUs, necessitating sophisticated strategies to overcome hardware constraints and optimize communication. We introduce a novel compiler that, unlike existing approaches, prioritizes reducing the expected execution time by jointly managing the generation and routing of EPR pairs, scheduling remote operations, and injecting SWAP gates to facilitate the execution of local gates. We present a real-time, adaptive approach to compiler design, accounting for the stochastic nature of entanglement generation and the operational demands of quantum circuits. Our contributions are twofold: (i) we model the optimal compiler for DQC using a Markov Decision Process (MDP) formulation, establishing the existence of an optimal algorithm, and (ii) we introduce a constrained Reinforcement Learning (RL) method to approximate this optimal compiler, tailored to the complexities of DQC environments. Our simulations demonstrate that Double Deep Q-Networks (DDQNs) are effective in learning policies that minimize the depth of the compiled circuit, leading to a lower expected execution time and a higher likelihood of successful operation before qubits decohere.
[338] arXiv:2404.17079 (cross-list from quant-ph) [pdf, other]: Title: Improving device-independent weak coin flipping protocols

Authors: Atul Singh Arora, Jamie Sikora, Thomas Van Himbeeck

Comments: 25 pages, 7 figures

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Weak coin flipping is the cryptographic task where Alice and Bob remotely flip a coin but want opposite outcomes. This work studies this task in the device-independent regime where Alice and Bob neither trust each other, nor their quantum devices. The best protocol was devised over a decade ago by Silman, Chailloux, Aharon, Kerenidis, Pironio, and Massar with bias $\varepsilon \approx 0.33664$, where the bias is a commonly adopted security measure for coin flipping protocols. This work presents two techniques to lower the bias of such protocols, namely self-testing and abort-phobic compositions. We apply these techniques to the SCAKPM '11 protocol above and, assuming a continuity conjecture, lower the bias to $\varepsilon \approx 0.29104$. We believe that these techniques could be useful in the design of device-independent protocols for a variety of other tasks.
Independently of weak coin flipping, en route to our results, we show how one can test $n-1$ out of $n$ devices, and estimate the performance of the remaining device, for later use in the protocol. The proof uses linear programming and, due to its generality, may find applications elsewhere.
[339] arXiv:2404.17082 (cross-list from physics.soc-ph) [pdf, other]: Title: Evolutionary game dynamics with environmental feedback in a network with two communities

Authors: Katherine Betz, Feng Fu, Naoki Masuda

Comments: 8 figures, 2 tables

Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Dynamical Systems (math.DS); Populations and Evolution (q-bio.PE)

Recent developments of eco-evolutionary models have shown that evolving feedbacks between behavioral strategies and the environment of game interactions, leading to changes in the underlying payoff matrix, can impact the underlying population dynamics in various manners. We propose and analyze an eco-evolutionary game dynamics model on a network with two communities such that players interact with other players in the same community and those in the opposite community at different rates. In our model, we consider two-person matrix games with pairwise interactions occurring on individual edges and assume that the environmental state depends on edges rather than on nodes or being globally shared in the population. We analytically determine the equilibria and their stability under a symmetric population structure assumption, and we also numerically study the replicator dynamics of the general model. The model shows rich dynamical behavior, such as multiple transcritical bifurcations, multistability, and anti-synchronous oscillations. Our work offers insights into understanding how the presence of community structure impacts the eco-evolutionary dynamics within and between niches.
[340] arXiv:2404.17083 (cross-list from eess.IV) [pdf, other]: Title: Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation

Authors: Deepak Bhatia, Muhammad Abdullah, Anne Querfurth, Mahdi Mantash

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This paper investigates the use of deep learning approaches to estimate the femur caput-collum-diaphyseal (CCD) angle from X-ray images. The CCD angle is an important measurement in the diagnosis of hip problems, and correct prediction can help in the planning of surgical procedures. Manual measurement of this angle, on the other hand, can be time-intensive and vulnerable to inter-observer variability. In this paper, we present a deep-learning algorithm that can reliably estimate the femur CCD angle from X-ray images. To train and test the performance of our model, we employed an X-ray image dataset with associated femur CCD angle measurements. Furthermore, we built a prototype to display the resulting predictions and to allow the user to interact with the predictions. Because the prototype is intended for use in a sterile setting during surgery, we extended the interface so that it can be operated by voice commands alone.
Our results show that our deep learning model predicts the femur CCD angle on X-ray images with great accuracy, with a mean absolute error of 4.3 degrees on the left femur and 4.9 degrees on the right femur on the test dataset. Our results suggest that deep learning has the potential to give a more efficient and accurate technique for predicting the femur CCD angle, which might have substantial therapeutic implications for the diagnosis and management of hip problems.
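To make the reported quantities concrete, here is a minimal sketch of how an angle between two axes and a mean absolute error could be computed. It assumes that the femoral neck axis and the shaft axis are already available as 2D direction vectors; how the authors derive these axes from the segmentation masks is not stated in the abstract, and the function names are hypothetical.

```python
import math


def angle_between_deg(v1, v2):
    """Angle in degrees (0 to 180) between two 2D direction vectors."""
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2 on (|cross|, dot) is numerically stabler than acos of a ratio.
    return math.degrees(math.atan2(abs(cross), dot))


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference angles."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

For example, predictions of 130 and 125 degrees against reference angles of 126 and 130 degrees give a mean absolute error of 4.5 degrees.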
[341] arXiv:2404.17107 (cross-list from eess.AS) [pdf, other]: Title: Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches. However, although deep learning demands large amounts of data, heart sound datasets are limited in size, and no pre-trained model is available. In contrast, many pre-trained models for general audio tasks are available as general-purpose audio representations. This study explores the potential of general-purpose audio representations pre-trained on large-scale datasets for transfer learning in heart murmur detection. Experiments on the CirCor DigiScope heart sound dataset show that the recent self-supervised learning Masked Modeling Duo (M2D) outperforms previous methods with the results of a weighted accuracy of 0.832 and an unweighted average recall of 0.713. Experiments further confirm improved performance by ensembling M2D with other models. These results demonstrate the effectiveness of general-purpose audio representation in processing heart sounds and open the way for further applications. Our code, which runs on a 24 GB consumer GPU, is available online at https://github.com/nttcslab/m2d/tree/master/app/circor
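The unweighted average recall (UAR) quoted above is the mean of the per-class recalls, so every class counts equally regardless of how many recordings it has. The sketch below shows that computation in plain Python; the class labels in the usage note are illustrative only, and the challenge-specific weighted accuracy is not reproduced here.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)
```

With four "absent" recordings (three classified correctly) and two "present" recordings (one classified correctly), the UAR is (0.75 + 0.5) / 2 = 0.625, whereas plain accuracy would be 4/6.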
[342] arXiv:2404.17128 (cross-list from q-bio.NC) [pdf, other]: Title: Simple Network Mechanism Leads to Quasi-Real Brain Activation Patterns with Drosophila Connectome

Authors: Xiaoyu Zhang, Pengcheng Yang, Jiawei Feng, Qiang Luo, Wei Lin, Xin Lu

Subjects: Neurons and Cognition (q-bio.NC); Social and Information Networks (cs.SI)

Considering the high computational demands of most methods, network communication models offer a more economical way to simulate the brain. However, despite numerous brain network communication models, there is still insufficient evidence that they can effectively replicate the real activation patterns of the brain. Moreover, it remains unclear whether actual network structures are crucial in simulating intelligence. Addressing these issues, we propose a large-scale network communication model based on simple rules and design criteria to assess the differences between network models and real situations. We conduct research on the biggest adult Drosophila connectome data set. Experimental results show significant activation in neurons that should respond to stimulus and slight activation in irrelevant ones, which we call a quasi-real activation pattern. Moreover, when we change the network structure, the quasi-real activation patterns disappear. Interestingly, activation regions have shorter network distances to their input neurons, implying that the network structure (not spatial distance) is central to forming brain functionality. In addition, when we give the input neurons a unilateral stimulus, we observe a bilateral response, which is consistent with reality. We also find that both hemispheres have extremely similar statistical indicators. We further develop real-time 3D large spatial network visualization software to observe and document experimental phenomena, filling the software gap. This research reveals the power of network models: they can reach the quasi-real activation pattern even with simple propagation rules. It also provides evidence that network structure matters in brain activity pattern generation. Future research could fully simulate brain behavior through network models, paving the way for artificial intelligence by developing new propagation rules and optimizing link weights.
[343] arXiv:2404.17142 (cross-list from quant-ph) [pdf, other]: Title: Automated Quantum Circuit Generation for Computing Inverse Hash Functions

Authors: Elena R. Henderson, Jessie M. Henderson, William V. Oxford, Mitchell A. Thornton

Comments: 12 pages, 9 figures, 1 table

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Several cryptographic systems depend upon the computational difficulty of reversing cryptographic hash functions. Robust hash functions transform inputs to outputs in such a way that the inputs cannot be later retrieved in a reasonable amount of time even if the outputs and the function that created them are known. Consequently, hash functions can be cryptographically secure, and they are employed in encryption, authentication, and other security methods. It has been suggested that such cryptographically-secure hash functions will play a critical role in the era of post-quantum cryptography (PQC), as they do in conventional systems. In this work, we introduce a procedure that leverages the principle of reversibility to generate circuits that invert hash functions. We provide a proof-of-concept implementation and describe methods that allow for scaling the hash function inversion approach. Specifically, we implement one manifestation of the algorithm as part of a more general automated quantum circuit synthesis, compilation, and optimization toolkit. We illustrate production of reversible circuits for crypto-hash functions that inherently provide the inverse of the function, and we describe data structures that increase the scalability of the hash function inversion approach.
[344] arXiv:2404.17150 (cross-list from math.CO) [pdf, other]: Title: A concentration phenomenon for $h$-extra edge-connectivity reliability analysis of enhanced hypercubes Q_{n,2} with exponentially many faulty links

Authors: Yali Sun, Mingzu Zhang, Xing Feng, Xing Yang

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Reliability assessment of interconnection networks is critical to the design and maintenance of multiprocessor systems. The $(n, k)$-enhanced hypercube $Q_{n,k}$, a variation of the hypercube $Q_{n}$, was proposed by Tzeng and Wei in 1991. As an extension of traditional edge-connectivity, the $h$-extra edge-connectivity of a connected graph $G$, $\lambda_h(G)$, is an essential parameter for evaluating the reliability of interconnection networks. This article studies the $h$-extra edge-connectivity of the $(n,2)$-enhanced hypercube $Q_{n,2}$. Suppose that link malfunctions in the interconnection network $Q_{n,2}$ do not isolate any subnetwork with no more than $h-1$ processors. Then the minimum number of such faulty links concentrates on the constant $2^{n-1}$ for each integer $\lceil\frac{11\times2^{n-1}}{48}\rceil \leq h \leq 2^{n-1}$ and $n\geq 9$. That is, for about 77.083 percent of the values of $h\leq2^{n-1}$, the corresponding $h$-extra edge-connectivity of $Q_{n,2}$, $\lambda_h(Q_{n,2})$, presents a concentration phenomenon. Moreover, the above lower and upper bounds of $h$ are both tight.
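The quoted percentage can be checked with a short calculation. The sketch below counts the integers $h$ in the stated range and divides by $2^{n-1}$; reading "77.083 percent" as the limit $1 - 11/48 = 37/48$ is an interpretation of the abstract, not a statement taken from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def lower_bound(n):
    """ceil(11 * 2^(n-1) / 48), computed with integer arithmetic."""
    return -(-(11 * 2 ** (n - 1)) // 48)


def concentrated_fraction(n):
    """Share of h in [1, 2^(n-1)] that lies in [lower_bound(n), 2^(n-1)]."""
    half = 2 ** (n - 1)
    count = half - lower_bound(n) + 1
    return Fraction(count, half)
```

For $n = 9$ the range starts at $h = 59$ and covers 198 of the 256 values (about 77.3 percent); as $n$ grows, the share approaches $37/48 \approx 0.77083$.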
[345] arXiv:2404.17235 (cross-list from eess.IV) [pdf, other]: Title: Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment

Authors: Kazi Shahriar Sanjid, Md. Tanzim Hossain, Md. Shakib Shahariar Junayed, M. Monir Uddin

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning has revolutionized medical imaging by providing innovative solutions to complex healthcare challenges. Traditional models often struggle to dynamically adjust feature importance, resulting in suboptimal representation, particularly in tasks like semantic segmentation crucial for accurate structure delineation. Moreover, their static nature incurs high computational costs. To tackle these issues, we introduce Mamba-Ahnet, a novel integration of State Space Model (SSM) and Advanced Hierarchical Network (AHNet) within the MAMBA framework, specifically tailored for semantic segmentation in medical imaging. Mamba-Ahnet combines SSM's feature extraction and comprehension with AHNet's attention mechanisms and image reconstruction, aiming to enhance segmentation accuracy and robustness. By dissecting images into patches and refining feature comprehension through self-attention mechanisms, the approach significantly improves feature resolution. Integration of AHNet into the MAMBA framework further enhances segmentation performance by selectively amplifying informative regions and facilitating the learning of rich hierarchical representations. Evaluation on the Universal Lesion Segmentation dataset demonstrates superior performance compared to state-of-the-art techniques, with notable metrics such as a Dice similarity coefficient of approximately 98% and an Intersection over Union of about 83%. These results underscore the potential of our methodology to enhance diagnostic accuracy, treatment planning, and ultimately, patient outcomes in clinical practice. By addressing the limitations of traditional models and leveraging the power of deep learning, our approach represents a significant step forward in advancing medical imaging technology.
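For reference, the two overlap metrics named in this abstract can be computed from a pair of binary masks as sketched below. The masks are flattened lists of 0/1 values, and the convention of returning 1.0 when both masks are empty is an assumption; the abstract does not say how the authors aggregate the metrics over the dataset.

```python
def dice_and_iou(pred, target):
    """Return (Dice coefficient, Intersection over Union) for two binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou
```

For instance, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one foreground pixel, giving a Dice coefficient of 0.5 and an IoU of 1/3.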
[346] arXiv:2404.17271 (cross-list from stat.OT) [pdf, other]: Title: To democratize research with sensitive data, we should make synthetic data more accessible

Authors: Erik-Jan van Kesteren

Comments: 4 pages, 2 figures

Subjects: Other Statistics (stat.OT); Computers and Society (cs.CY)

For over 30 years, synthetic data has been heralded as a promising solution to make sensitive datasets accessible. However, despite much research effort and several high-profile use-cases, the widespread adoption of synthetic data as a tool for open, accessible, reproducible research with sensitive data is still a distant dream. In this opinion, Erik-Jan van Kesteren, head of the ODISSEI Social Data Science team, argues that in order to progress towards widespread adoption of synthetic data as a privacy enhancing technology, the data science research community should shift focus away from developing better synthesis methods: instead, it should develop accessible tools, educate peers, and publish small-scale case studies.
[347] arXiv:2404.17279 (cross-list from math.CO) [pdf, ps, other]: Title: Bipartite powers of some classes of bipartite graphs

Authors: Indrajit Paul, Ashok Kumar Das

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Graph powers are a well-studied concept in graph theory. Analogous to graph powers, Chandran et al. [3] introduced the concept of bipartite powers for bipartite graphs. In this paper, we demonstrate that some well-known classes of bipartite graphs, namely the interval bigraphs, proper interval bigraphs, and bigraphs of Ferrers dimension 2, are closed under the operation of taking bipartite powers. Finally, we define the strongly closed property for bipartite graphs under powers and show that the class of chordal bipartite graphs is strongly closed under powers.
[348] arXiv:2404.17306 (cross-list from math.CO) [pdf, other]: Title: Quickly excluding an apex-forest

Authors: Jędrzej Hodor, Hoang La, Piotr Micek, Clément Rambaud

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We give a short proof that for every apex-forest $X$ on at least two vertices, graphs excluding $X$ as a minor have layered pathwidth at most $2|V(X)|-3$. This improves upon a result by Dujmović, Eppstein, Joret, Morin, and Wood (SIDMA, 2020). Our main tool is a structural result about graphs excluding a forest as a rooted minor, which is of independent interest. We develop similar tools for treedepth and treewidth. We discuss implications for Erdős-Pósa properties of rooted models of minors in graphs.
[349] arXiv:2404.17349 (cross-list from math.CO) [pdf, other]: Title: Rectangulotopes

Authors: Jean Cardinal, Vincent Pilaud

Comments: 23 pages, 14 figures

Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

Rectangulations are decompositions of a square into finitely many axis-aligned rectangles. We describe realizations of (n-1)-dimensional polytopes associated with two combinatorial families of rectangulations composed of n rectangles. They are defined as quotientopes of natural lattice congruences on the weak Bruhat order on permutations in S_n, and their skeleta are flip graphs on rectangulations. We give simple vertex and facet descriptions of these polytopes, in particular elementary formulas for computing the coordinates of the vertex corresponding to each rectangulation, in the spirit of J.-L. Loday's realization of the associahedron.
[350] arXiv:2404.17357 (cross-list from eess.IV) [pdf, other]: Title: Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model

Authors: Yushen Xu, Xiaosong Li, Yuchan Jie, Haishu Tan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In clinical practice, tri-modal medical image fusion, compared to the existing dual-modal technique, can provide a more comprehensive view of the lesions, aiding physicians in evaluating the disease's shape, location, and biological activity. However, due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited, leading to sub-optimal fusion performance and affecting the depth of image analysis by the physician. Thus, there is an urgent need for a technology that can both enhance image resolution and integrate multi-modal information. Although current image processing methods can effectively address image fusion and super-resolution individually, solving both problems synchronously remains extremely challenging. In this paper, we propose TFS-Diff, a model that simultaneously realizes tri-modal medical image fusion and super-resolution. Specifically, TFS-Diff is based on the diffusion model generation of a random iterative denoising process. We also develop a simple objective function and a fusion super-resolution loss that effectively evaluate the uncertainty in the fusion and ensure the stability of the optimization process. In addition, a channel attention module is proposed to effectively integrate key information from different modalities for clinical diagnosis, avoiding information loss caused by multiple image processing. Extensive experiments on public Harvard datasets show that TFS-Diff significantly surpasses the existing state-of-the-art methods in both quantitative and visual evaluations. The source code will be available at GitHub.
[351] arXiv:2404.17359 (cross-list from math.AP) [pdf, ps, other]: Title: Relations between Kondratiev spaces and refined localization Triebel-Lizorkin spaces

Authors: Markus Hansen, Benjamin Scharf, Cornelia Schneider

Comments: 26 pages

Subjects: Analysis of PDEs (math.AP); Functional Analysis (math.FA); Numerical Analysis (math.NA)

We investigate the close relation between certain weighted Sobolev spaces (Kondratiev spaces) and refined localization spaces introduced by Triebel [39,40]. In particular, using a characterization for refined localization spaces from Scharf [32], we considerably improve an embedding from Hansen [17]. This embedding is of special interest in connection with convergence rates for adaptive approximation schemes.
[352] arXiv:2404.17365 (cross-list from cond-mat.soft) [pdf, other]: Title: Similarity Equivariant Graph Neural Networks for Homogenization of Metamaterials

Authors: Fleur Hendriks (1), Vlado Menkovski (1), Martin Doškář (2), Marc G. D. Geers (1), Ondřej Rokoš (1) ((1) Eindhoven University of Technology, (2) Czech Technical University in Prague)

Comments: 54 pages, 20 figures, submitted to CMAME (Computer Methods in Applied Mechanics and Engineering)

Subjects: Soft Condensed Matter (cond-mat.soft); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Soft, porous mechanical metamaterials exhibit pattern transformations that may have important applications in soft robotics, sound reduction and biomedicine. To design these innovative materials, it is important to be able to simulate them accurately and quickly, in order to tune their mechanical properties. Since conventional simulations using the finite element method entail a high computational cost, in this article we aim to develop a machine learning-based approach that scales favorably to serve as a surrogate model. To ensure that the model is also able to handle various microstructures, including those not encountered during training, we include the microstructure as part of the network input. Therefore, we introduce a graph neural network that predicts global quantities (energy, stress, stiffness) as well as the pattern transformations that occur (the kinematics). To make our model as accurate and data-efficient as possible, various symmetries are incorporated into the model. The starting point is an E(n)-equivariant graph neural network (which respects translation, rotation and reflection) that has periodic boundary conditions (i.e., it is in-/equivariant with respect to the choice of RVE), is scale in-/equivariant, can simulate large deformations, and can predict scalars, vectors as well as second and fourth order tensors (specifically energy, stress and stiffness). The incorporation of scale equivariance makes the model equivariant with respect to the similarities group, of which the Euclidean group E(n) is a subgroup. We show that this network is more accurate and data-efficient than graph neural networks with fewer symmetries. To create an efficient graph representation of the finite element discretization, we use only the internal geometrical hole boundaries from the finite element mesh to achieve a better speed-up and scaling with the mesh size.
[353] arXiv:2404.17369 (cross-list from q-fin.CP) [pdf, ps, other]: Title: Assessing the Potential of AI for Spatially Sensitive Nature-Related Financial Risks

Authors: Steven Reece, Emma O donnell, Felicia Liu, Joanna Wolstenholme, Frida Arriaga, Giacomo Ascenzi, Richard Pywell

Comments: 67 pages, 10 figures, UKRI (NERC) Integrated Finance and Biodiversity for a Nature Positive Future Programme

Subjects: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI)

There is growing recognition among financial institutions, financial regulators and policy makers of the importance of addressing nature-related risks and opportunities. Evaluating and assessing nature-related risks for financial institutions is challenging due to the large volume of heterogeneous data available on nature and the complexity of investment value chains and the various components' relationship to nature. The dual problem of scaling data analytics and analysing complex systems can be addressed using Artificial Intelligence (AI). We address issues such as plugging existing data gaps with discovered data, data estimation under uncertainty, time series analysis and (near) real-time updates. This report presents potential AI solutions for models of two distinct use cases, the Brazil Beef Supply Use Case and the Water Utility Use Case. Our two use cases cover a broad perspective within sustainable finance. The Brazilian cattle farming use case is an example of greening finance - integrating nature-related considerations into mainstream financial decision-making to transition investments away from sectors with poor historical track records and unsustainable operations. The deployment of nature-based solutions in the UK water utility use case is an example of financing green - driving investment to nature-positive outcomes. The two use cases also cover different sectors, geographies, financial assets and AI modelling techniques, providing an overview on how AI could be applied to different challenges relating to nature's integration into finance. This report is primarily aimed at financial institutions but is also of interest to ESG data providers, TNFD, systems modellers, and, of course, AI practitioners.
[354] arXiv:2404.17378 (cross-list from quant-ph) [pdf, ps, other]: Title: Quantum Adjoint Convolutional Layers for Effective Data Representation

Authors: Ren-Xin Zhao, Shi Wang, Yaonan Wang

Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)

The Quantum Convolutional Layer (QCL) is considered one of the core components of Quantum Convolutional Neural Networks (QCNNs) due to its efficient data feature extraction capability. However, the current principle of QCL is not as mathematically understandable as that of the Classical Convolutional Layer (CCL) due to its black-box structure. Moreover, classical data mapping in many QCLs is inefficient. To this end, firstly, the Quantum Adjoint Convolution Operation (QACO) consisting of a quantum amplitude encoding and its inverse is theoretically shown to be equivalent to the quantum normalization of the convolution operation based on the Frobenius inner product while achieving an efficient characterization of the data. Subsequently, QACO is extended into a Quantum Adjoint Convolutional Layer (QACL) by Quantum Phase Estimation (QPE) to compute all Frobenius inner products in parallel. Finally, comparative simulation experiments are carried out on PennyLane and TensorFlow platforms, mainly for the two cases of fixed and unfixed kernels in QACL. The results demonstrate that QACL, with the insight of special quantum properties for the same images, provides higher training accuracy in MNIST and Fashion MNIST classification experiments, but sacrifices the learning performance to some extent. Predictably, our research lays the foundation for the development of efficient and interpretable quantum convolutional networks and also advances the field of quantum machine vision.
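A purely classical analogue may help in reading the phrase "normalization of the convolution operation based on the Frobenius inner product". The sketch below slides a kernel over an image and, for each window, divides the Frobenius inner product by the product of the two Frobenius norms, which mirrors the fact that amplitude-encoded states have unit norm. This is an assumed interpretation for illustration, with hypothetical function names; it is not the authors' quantum circuit and involves no quantum phase estimation.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def normalized_frobenius(a, b):
    """Frobenius inner product divided by both Frobenius norms."""
    norm_a = math.sqrt(frobenius_inner(a, a))
    norm_b = math.sqrt(frobenius_inner(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero window cannot be normalized
    return frobenius_inner(a, b) / (norm_a * norm_b)


def normalized_convolution(image, kernel):
    """Valid-mode sliding window of normalized Frobenius inner products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            window = [r[j:j + kw] for r in image[i:i + kh]]
            row.append(normalized_frobenius(window, kernel))
        out.append(row)
    return out
```

Each output entry lies between -1 and 1, and a window that is a positive multiple of the kernel yields a value of 1 up to rounding.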
[355] arXiv:2404.17398 (cross-list from stat.ML) [pdf, other]: Title: Online Policy Learning and Inference by Matrix Completion

Authors: Congyuan Duan, Jingyang Li, Dong Xia

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Making online decisions can be challenging when features are sparse and orthogonal to historical ones, especially when the optimal policy is learned through collaborative filtering. We formulate the problem as a matrix completion bandit (MCB), where the expected reward under each arm is characterized by an unknown low-rank matrix. The $\epsilon$-greedy bandit and the online gradient descent algorithm are explored. Policy learning and regret performance are studied under a specific schedule for exploration probabilities and step sizes. A faster decaying exploration probability yields smaller regret but learns the optimal policy less accurately. We investigate an online debiasing method based on inverse propensity weighting (IPW) and a general framework for online policy inference. The IPW-based estimators are asymptotically normal under mild arm-optimality conditions. Numerical simulations corroborate our theoretical findings. Our methods are applied to the San Francisco parking pricing project data, revealing intriguing discoveries and outperforming the benchmark policy.
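The role of inverse propensity weighting under $\epsilon$-greedy exploration can be illustrated in a much simpler setting than the matrix completion bandit studied here. The sketch below assumes a plain multi-armed bandit with scalar rewards: it returns the probability with which $\epsilon$-greedy picks each arm, and forms an IPW estimate of an arm's mean reward from logged (arm, reward, propensity) records. The function names and record format are hypothetical, and none of this reproduces the authors' low-rank estimator or debiasing procedure.

```python
def epsilon_greedy_propensities(epsilon, greedy_arm, n_arms):
    """Selection probability of each arm under epsilon-greedy exploration."""
    base = epsilon / n_arms  # uniform exploration mass per arm
    return [base + (1.0 - epsilon if a == greedy_arm else 0.0) for a in range(n_arms)]


def ipw_arm_mean(records, arm):
    """IPW estimate of one arm's mean reward.

    records is a list of (chosen_arm, reward, propensity_of_chosen_arm).
    Rounds where another arm was chosen contribute zero to the sum.
    """
    total = sum(reward / prop for chosen, reward, prop in records if chosen == arm)
    return total / len(records)
```

Dividing by the propensity compensates for arms that were rarely chosen, which is also why a quickly decaying exploration probability makes the weights, and hence the estimate, more variable.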
[356] arXiv:2404.17426 (cross-list from eess.IV) [pdf, ps, other]: Title: One-Shot Image Restoration

Authors: Deborah Pereg

Comments: arXiv admin note: text overlap with arXiv:2209.14267

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image restoration, or inverse problems in image processing, has long been an extensively studied topic. In recent years, supervised learning approaches have become a popular strategy attempting to tackle this task. Unfortunately, most supervised learning-based methods are highly demanding in terms of computational resources and training data (sample complexity). In addition, trained models are sensitive to domain changes, such as varying acquisition systems, signal sampling rates, resolution and contrast. In this work, we try to answer a fundamental question: Can supervised learning models generalize well solely by learning from one image or even part of an image? If so, then what is the minimal number of patches required to achieve acceptable generalization? To this end, we focus on an efficient patch-based learning framework that requires a single image input-output pair for training. Experimental results demonstrate the applicability, robustness and computational efficiency of the proposed approach for supervised image deblurring and super-resolution. Our results showcase significant improvement of learning models' sample efficiency, generalization and time complexity, which can hopefully be leveraged for future real-time applications and applied to other signals and modalities.
[357] arXiv:2404.17429 (cross-list from stat.ML) [pdf, other]: Title: Separation capacity of linear reservoirs with random connectivity matrix

Authors: Youness Boutaib

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)

We argue that the success of reservoir computing lies within the separation capacity of the reservoirs and show that the expected separation capacity of random linear reservoirs is fully characterised by the spectral decomposition of an associated generalised matrix of moments. Of particular interest are reservoirs with Gaussian matrices that are either symmetric or whose entries are all independent. In the symmetric case, we prove that the separation capacity always deteriorates with time; while for short inputs, separation with large reservoirs is best achieved when the entries of the matrix are scaled with a factor $\rho_T/\sqrt{N}$, where $N$ is the dimension of the reservoir and $\rho_T$ depends on the maximum length of the input time series. In the i.i.d. case, we establish that optimal separation with large reservoirs is consistently achieved when the entries of the reservoir matrix are scaled with the exact factor $1/\sqrt{N}$. We further give upper bounds on the quality of separation as a function of the length of the time series. We complement this analysis with an investigation of the likelihood of this separation and the impact of the chosen architecture on separation consistency.
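The $1/\sqrt{N}$ scaling in the i.i.d. case can be tried out with a small simulation. The sketch below assumes a linear reservoir of the form $x_t = W x_{t-1} + u\,a_t$ with a scalar input series $a_t$, i.i.d. standard Gaussian entries scaled by $1/\sqrt{N}$, and separation measured as the Euclidean distance between final states. These modelling choices and all names are assumptions for illustration; the abstract does not spell out the exact reservoir equations or the paper's definition of separation capacity.

```python
import math
import random


def make_reservoir(n, rng):
    """N x N matrix with i.i.d. N(0, 1) entries scaled by 1 / sqrt(N)."""
    scale = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(n)]


def run_reservoir(w, u, series):
    """Iterate x_t = W x_{t-1} + u * a_t from x_0 = 0 and return the final state."""
    x = [0.0] * len(u)
    for a in series:
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + u_i * a
             for row, u_i in zip(w, u)]
    return x


def separation(w, u, series_a, series_b):
    """Euclidean distance between the final states reached by two input series."""
    return math.dist(run_reservoir(w, u, series_a), run_reservoir(w, u, series_b))
```

A typical experiment would draw `w = make_reservoir(50, random.Random(0))` and a random input vector `u`, then compare `separation` for pairs of series that differ at early versus late time steps.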
[358] arXiv:2404.17442 (cross-list from stat.ML) [pdf, ps, other]: Title: Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

Authors: Benjamin Dupuis, Paul Viallard, George Deligiannidis, Umut Simsekli

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We propose data-dependent uniform generalization bounds by approaching the problem from a PAC-Bayesian perspective. We first apply the PAC-Bayesian framework on `random sets' in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set after observing the training data. This approach allows us to prove data-dependent bounds, which can be applicable in numerous contexts. To highlight the power of our approach, we consider two main applications. First, we propose a PAC-Bayesian formulation of the recently developed fractal-dimension-based generalization bounds. The derived results are shown to be tighter and they unify the existing results around one simple proof technique. Second, we prove uniform bounds over the trajectories of continuous Langevin dynamics and stochastic gradient Langevin dynamics. These results provide novel information about the generalization properties of noisy algorithms.
[359] arXiv:2404.17466 (cross-list from physics.comp-ph) [pdf, other]: Title: FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

Comments: 18 pages, 10 figures

Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Plasma Physics (physics.plasm-ph)

Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model demonstrates success in reconstructing nonlinear kink mode structures by learning from a limited amount of nonlinear simulation data. The knowledge transfer process leverages a pre-trained neural encoder-decoder network, initially trained on linear simulations, to effectively capture nonlinear dynamics. The low-dimensional embeddings extract the coherent structures of interest, while preserving the inherent dynamics of the complex system. Experimental results highlight FTL's capacity to capture transitional behaviors and dynamical features in plasma dynamics -- a task often challenging for conventional methods. The model developed in this study is generalizable and can be extended broadly through transfer learning to address various magnetohydrodynamics (MHD) modes.
[360] arXiv:2404.17483 (cross-list from stat.ML) [pdf, other]: Title: Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect Estimation

Authors: Yoichi Chikahara, Kansei Ushiyama

Comments: Accepted to UAI2024. 14 pages, 4 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

There is a growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because, in this setup, some features typically induce sample selection bias while others do not but are predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, due to the numerically unstable IPW weights, they suffer from estimation bias under a finite sample setup. To develop a numerically robust estimator via weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Experimental results show that by effectively correcting the weight values, our method outperforms the existing ones, including traditional weighting schemes.
[361] arXiv:2404.17490 (cross-list from eess.AS) [pdf, other]: Title: The CARFAC v2 Cochlear Model in Matlab, NumPy, and JAX

Authors: Richard F. Lyon, Rob Schonberger, Malcolm Slaney, Mihajlo Velimirović, Honglin Yu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

The open-source CARFAC (Cascade of Asymmetric Resonators with Fast-Acting Compression) cochlear model is upgraded to version 2, with improvements to the Matlab implementation, and with new Python/NumPy and JAX implementations -- but C++ version changes are still pending. One change addresses the DC (direct current, or zero frequency) quadratic distortion anomaly previously reported; another reduces the neural synchrony at high frequencies; the others have little or no noticeable effect in the default configuration. A new feature allows modeling a reduction of cochlear amplifier function, as a step toward a differentiable parameterized model of hearing impairment. In addition, the integration into the Auditory Model Toolbox (AMT) has been extensively improved, as the prior integration had bugs that made it unsuitable for including CARFAC in multi-model comparisons.
[362] arXiv:2404.17495 (cross-list from physics.bio-ph) [pdf, other]: Title: Q-Learning to navigate turbulence without a map

Authors: Marco Rando, Martin James, Alessandro Verri, Lorenzo Rosasco, Agnese Seminara

Comments: 18 pages, 8 figures

Subjects: Biological Physics (physics.bio-ph); Machine Learning (cs.LG)

We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with neither spatial perception nor prior information about the odor location. We ask whether navigation strategies to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent plumes. An optimal memory exists which ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting minor parameter tuning may be sufficient to adapt to different environments.
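As a rough illustration of the kind of learning rule named in the title, the sketch below shows a generic tabular Q-learning update with epsilon-greedy action selection. It is not the authors' implementation: the state names, action names, and the `alpha`, `gamma`, and `epsilon` values are hypothetical placeholders chosen only to make the example concrete.

```python
import random

# Hypothetical discretization: these state and action names are assumptions
# for illustration, not the olfactory states or actions used in the paper.
STATES = ["no_odor", "weak_odor", "strong_odor"]
ACTIONS = ["upwind", "downwind", "cross_left", "cross_right"]


def make_q_table():
    """Return a Q-table with every (state, action) value initialised to 0."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

In a full training loop, an agent would repeatedly call `choose_action`, observe a reward and the next olfactory state from a plume simulation (not shown here), and then call `q_update`.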
[363] arXiv:2404.17541 (cross-list from math.OC) [pdf, ps, other]: Title: Applications of Lifted Nonlinear Cuts to Convex Relaxations of the AC Power Flow Equations

Authors: Sergio I. Bugosen, Robert B. Parker, Carleton Coffrin

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We demonstrate that valid inequalities, or lifted nonlinear cuts (LNC), can be projected to tighten the Second Order Cone (SOC), Convex DistFlow (CDF), and Network Flow (NF) relaxations of the AC Optimal Power Flow (AC-OPF) problem. We conduct experiments on 36 cases from the PGLib-OPF library for two objective functions, (1) power generation maximization and (2) generation cost minimization. Significant optimality gap improvements are shown for the maximization problem, where the LNC strengthen the SOC and CDF relaxations in 100% of the test cases, with average and maximum differences in the optimality gaps of 23.1% and 93.5% respectively. The NF relaxation is strengthened in 79.2% of test cases, with average and maximum differences in the optimality gaps of 3.45% and 21.2% respectively. We also study the trade-off between relaxation quality and solve time, demonstrating that the strengthened CDF relaxation outperforms the strengthened SOC formulation in terms of runtime and number of iterations needed, while the strengthened NF formulation is the most scalable with the lowest relaxation quality provided by these LNC.
[364] arXiv:2404.17542 (cross-list from physics.flu-dyn) [pdf, other]: Title: A mesh-constrained discrete point method for incompressible flows with moving boundaries

Authors: Takeharu Matsuda, Satoshi Ii

Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

Particle-based methods are a practical tool in computational fluid dynamics, and novel types of methods have been proposed. However, widely developed Lagrangian-type formulations suffer from the nonuniform distribution of particles, which is enhanced over time and results in problems in computational efficiency and parallel computations. To mitigate these problems, a mesh-constrained discrete point (MCD) method was developed for stationary boundary problems (Matsuda et al., 2022). Although the MCD method is a meshless method that uses moving least-squares approximation, the arrangement of particles (or discrete points (DPs)) is specialized so that their positions are constrained in background meshes to obtain a closely uniform distribution. This achieves a reasonable approximation for spatial derivatives with compact stencils without encountering any ill-posed condition and leads to good performance in terms of computational efficiency. In this study, a novel meshless method based on the MCD method for incompressible flows with moving boundaries is proposed. To ensure the mesh constraint of each DP in moving boundary problems, a novel updating algorithm for the DP arrangement is developed so that not only are the positions of the DPs rearranged, but the DPs are also reassigned the role of being on the boundary or not. The proposed method achieved reasonable results in numerical experiments for well-known moving boundary problems.
[365] arXiv:2404.17552 (cross-list from eess.AS) [pdf, other]: Title: A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Authors: Rémi Uro, David Doukhan, Albert Rilliard, Laëtitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

Comments: Keywords: semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony

Journal-ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG); Sound (cs.SD)

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, background music and overlapped speech removal, and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. Evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing is comparable to up-to-date processes, and that the output provides high-quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.
[366] arXiv:2404.17564 (cross-list from math.CO) [pdf, other]: Title: Half-space separation in monophonic convexity

Authors: Mohammed Elaroussi, Lhouari Nourine, Simon Vilmin

Comments: 22 pages, 11 figures

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

We study half-space separation in the convexity of chordless paths of a graph, i.e., monophonic convexity. In this problem, one is given a graph and two (disjoint) subsets of vertices and asks whether these two sets can be separated by complementary convex sets, called half-spaces. While it is known this problem is $\mathbf{NP}$-complete for geodesic convexity -- the convexity of shortest paths -- we show that it can be solved in polynomial time for monophonic convexity.

Replacements for Mon, 29 Apr 24

[367] arXiv:1706.08941 (replaced) [pdf, other]: Title: Hybrid Localized Spectral Decomposition for multiscale problems

Authors: Alexandre L. Madureira, Marcus Sarkis

Subjects: Numerical Analysis (math.NA)
[368] arXiv:1907.13283 (replaced) [pdf, other]: Title: A globally conservative finite element MHD code and its application to the study of compact torus formation, levitation and magnetic compression

Authors: Carl Dunlea, Ivan Khalzov

Comments: 50 pages, 78 figures, partially presented in conference posters C. Dunlea et al., Magnetic Compression at General Fusion - Experiment & Simulation with a neutral fluid, APS_DPP 2017, EPS 2018, ICPP_2018

Subjects: Numerical Analysis (math.NA); Plasma Physics (physics.plasm-ph)
[369] arXiv:2008.07007 (replaced) [pdf, other]: Title: Interpretable Representations in Explainable AI: From Theory to Practice

Authors: Kacper Sokol, Peter Flach

Comments: Published in the *Special Issue on Explainable and Interpretable Machine Learning and Data Mining* of the Springer *Data Mining and Knowledge Discovery* journal

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[370] arXiv:2009.06560 (replaced) [pdf, other]: Title: Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Authors: Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe

Comments: Published at AAAI 2021. 9 pages (paper and references), 3 page appendix. 6 figures and 1 table

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[371] arXiv:2012.02303 (replaced) [pdf, other]: Title: Decentralized State-Dependent Markov Chain Synthesis with an Application to Swarm Guidance

Authors: Samet Uzun, Nazim Kemal Ure, Behcet Acikmese

Comments: arXiv admin note: text overlap with arXiv:2012.01928

Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA); Dynamical Systems (math.DS); Probability (math.PR)
[372] arXiv:2101.00009 (replaced) [pdf, other]: Title: Adversarial Estimation of Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Machine Learning (stat.ML)
[373] arXiv:2103.03567 (replaced) [pdf, other]: Title: Thermodynamic topology optimization for hardening materials

Authors: Miriam Kick, Philipp Junker

Subjects: Computational Engineering, Finance, and Science (cs.CE)
[374] arXiv:2110.01360 (replaced) [pdf, other]: Title: Bayesian Machine Learning meets Formal Methods: An application to spatio-temporal data

Authors: Laura Vana, Ennio Visconti, Laura Nenzi, Annalisa Cadonna, Gregor Kastner

Subjects: Computation (stat.CO); Logic in Computer Science (cs.LO)
[375] arXiv:2111.06390 (replaced) [pdf, other]: Title: Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing

Authors: Margarita Boyarskaya, Panos Ipeirotis

Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC)
[376] arXiv:2111.08108 (replaced) [pdf, other]: Title: Physics-informed neural networks via stochastic Hamiltonian dynamics learning

Authors: Chandrajit Bajaj, Minh Nguyen

Comments: To be published in Springer series "Lecture Notes in Networks and Systems"

Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)
[377] arXiv:2112.11561 (replaced) [pdf, other]: Title: Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions

Authors: Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[378] arXiv:2203.05314 (replaced) [pdf, other]: Title: SoK: On the Semantic AI Security in Autonomous Driving

Authors: Junjie Shen, Ningfei Wang, Ziwen Wan, Yunpeng Luo, Takami Sato, Zhisheng Hu, Xinyang Zhang, Shengjian Guo, Zhenyu Zhong, Kang Li, Ziming Zhao, Chunming Qiao, Qi Alfred Chen

Comments: Project website: this https URL

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[379] arXiv:2207.14643 (replaced) [pdf, other]: Title: Open World Learning Graph Convolution for Latency Estimation in Routing Networks

Authors: Yifei Jin, Marios Daoutis, Sarunas Girdzijauskas, Aristides Gionis

Comments: Accepted in IJCNN 2022

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[380] arXiv:2208.08690 (replaced) [pdf, other]: Title: A Survey on Open Information Extraction from Rule-based Model to Large Language Model (meta)

Authors: Pai Liu, Wenyang Gao, Wenjie Dong, Lin Ai, Ziwei Gong, Songfang Huang, Zongsheng Li, Ehsan Hoque, Julia Hirschberg, Yue Zhang

Comments: The first five authors contributed to this work equally. Names are ordered randomly

Subjects: Computation and Language (cs.CL)
[381] arXiv:2210.00802 (replaced) [pdf, other]: Title: DDoS: A Graph Neural Network based Drug Synergy Prediction Algorithm

Authors: Kyriakos Schwarz, Alicia Pliego-Mendieta, Amina Mollaysa, Lara Planas-Paz, Chantal Pauli, Ahmed Allam, Michael Krauthammer

Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
[382] arXiv:2210.01060 (replaced) [pdf, other]: Title: Generating Hidden Markov Models from Process Models Through Nonnegative Tensor Factorization

Authors: Erik Skau, Andrew Hollis, Stephan Eidenbenz, Kim Rasmussen, Boian Alexandrov

Comments: 19 pages, 8 figures, TOMACS

Subjects: Machine Learning (cs.LG)
[383] arXiv:2210.06404 (replaced) [pdf, other]: Title: Graph Neural Network Surrogate for Seismic Reliability Analysis of Highway Bridge Systems

Authors: Tong Liu, Hadi Meidani

Comments: 13 pages, 12 figures

Subjects: Machine Learning (cs.LG)
[384] arXiv:2210.14100 (replaced) [pdf, ps, other]: Title: The capacity of a finite field matrix channel

Authors: Simon R. Blackburn, Jessica Claridge

Comments: 32 pages, 1 figure. Typos corrected, minor changes to proofs for clarity, more discussion added

Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[385] arXiv:2211.02713 (replaced) [pdf, other]: Title: A degree 4 sum-of-squares lower bound for the clique number of the Paley graph

Authors: Dmitriy Kunisky, Xifan Yu

Comments: 62 pages, 3 figures, 1 table; closest to version published in CCC 2023

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Number Theory (math.NT); Optimization and Control (math.OC)
[386] arXiv:2211.03874 (replaced) [pdf, other]: Title: Nearly optimal independence oracle algorithms for edge estimation in hypergraphs

Authors: Holger Dell, John Lapinskas, Kitty Meeks

Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
[387] arXiv:2211.12371 (replaced) [pdf, other]: Title: Gait Recognition in Large-scale Free Environment via Single LiDAR

Authors: Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[388] arXiv:2212.01485 (replaced) [pdf, other]: Title: A Theory of Semantic Communication

Authors: Yulin Shao, Qi Cao, Deniz Gunduz

Comments: Keywords: Semantic communication, joint source-channel coding, semantic decoding, semantic encoding, large language model

Subjects: Information Theory (cs.IT)
[389] arXiv:2301.03889 (replaced) [pdf, other]: Title: Earn While You Reveal: Private Set Intersection that Rewards Participants

Authors: Aydin Abadi

Comments: 54 pages

Subjects: Cryptography and Security (cs.CR)
[390] arXiv:2301.10389 (replaced) [pdf, other]: Title: Counterfactual Editing for Search Result Explanation

Authors: Zhichao Xu, Hemank Lamba, Qingyao Ai, Joel Tetreault, Alex Jaimes

Comments: work in progress

Subjects: Information Retrieval (cs.IR)
[391] arXiv:2301.13014 (replaced) [pdf, other]: Title: Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval

Authors: Ling Xiao, Toshihiko Yamasaki

Journal-ref: IEEE Access, vol. 12, pp. 48068-48080, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[392] arXiv:2302.05309 (replaced) [pdf, other]: Title: The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization

Authors: Ilayda Yaman, Guoda Tian, Martin Larsson, Patrik Persson, Michiel Sandra, Alexander Dürr, Erik Tegler, Nikhil Challa, Henrik Garde, Fredrik Tufvesson, Kalle Åström, Ove Edfors, Steffen Malkowsky, Liang Liu

Comments: 7 pages, 7 figures, Accepted to ICRA 2024

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[393] arXiv:2302.07867 (replaced) [pdf, other]: Title: Learning Performance-Improving Code Edits

Authors: Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, Amir Yazdanbakhsh

Comments: Published as a conference paper at ICLR 2024 (Spotlight). Project website: this https URL

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
[394] arXiv:2302.10468 (replaced) [pdf, other]: Title: Soft Error Reliability Analysis of Vision Transformers

Authors: Xinghua Xue, Cheng Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, Xiaowei Li

Subjects: Cryptography and Security (cs.CR)
[395] arXiv:2303.02432 (replaced) [pdf, other]: Title: Good Gottesman-Kitaev-Preskill codes from the NTRU cryptosystem

Authors: Jonathan Conrad, Jens Eisert, Jean-Pierre Seifert

Comments: 23 pages, 10 figures, comments welcome! Version 3 contains added clarifications and an additional proof of the Gaussian heuristic for a class of NTRU-like lattices

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
[396] arXiv:2303.06524 (replaced) [pdf, ps, other]: Title: A heuristic search algorithm for discovering large Condorcet domains

Authors: Bei Zhou, Søren Riis

Subjects: Discrete Mathematics (cs.DM)
[397] arXiv:2303.08736 (replaced) [pdf, other]: Title: A machine-learning approach to thunderstorm forecasting through post-processing of simulation data

Authors: Kianusch Vahid Yousefnia, Tobias Bölle, Isabella Zöbisch, Thomas Gerz

Comments: 19 pages, 11 figures, 3 tables. Submitted to Quarterly Journal of the Royal Meteorological Society; v3: More thorough explanation of our use of ensemble data, improved performance of SALAMA in reliability diagrams; v2: Consideration of additional skill scores and more competitive baseline model, and novel visualization of reliability and resolution as a function of model probability

Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
[398] arXiv:2303.17249 (replaced) [pdf, ps, other]: Title: Model-agnostic explainable artificial intelligence for object detection in image data

Authors: Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[399] arXiv:2304.03659 (replaced) [pdf, other]: Title: Probing Conceptual Understanding of Large Visual-Language Models

Authors: Madeline Schiappa, Raiyaan Abdullah, Shehreen Azad, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh Rawat

Comments: All code and the dataset are available at: this https URL. Accepted in CVPRW 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[400] arXiv:2304.05360 (replaced) [pdf, ps, other]: Title: A Third Information-Theoretic Approach to Finite de Finetti Theorems

Authors: Mario Berta, Lampros Gavalakis, Ioannis Kontoyiannis

Comments: 11 pages, no figures. In the second version the introduction is slightly extended, two new references and Section 2.4 have been added

Subjects: Information Theory (cs.IT); Probability (math.PR); Quantum Physics (quant-ph)
[401] arXiv:2304.05370 (replaced) [pdf, other]: Title: Overload: Latency Attacks on Object Detection for Edge Devices

Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-rung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[402] arXiv:2304.09400 (replaced) [pdf, other]: Title: On the Capacity Region of Reconfigurable Intelligent Surface Assisted Symbiotic Radios

Authors: Qianqian Zhang, Hu Zhou, Ying-Chang Liang, Sumei Sun, Wei Zhang, H. Vincent Poor

Subjects: Information Theory (cs.IT)
[403] arXiv:2304.11671 (replaced) [pdf, other]: Title: Battery Capacity Knee-Onset Identification and Early Prediction Using Degradation Curvature

Authors: Huang Zhang, Faisal Altaf, Torsten Wik

Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
[404] arXiv:2304.12559 (replaced) [pdf, ps, other]: Title: Attraction by pairwise coherence explains the emergence of ideological sorting

Authors: Federico Zimmerman, Lucía Pedraza, Joaquín Navajas, Pablo Balenzuela

Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
[405] arXiv:2304.13029 (replaced) [pdf, other]: Title: Bake off redux: a review and experimental evaluation of recent time series classification algorithms

Authors: Matthew Middlehurst, Patrick Schäfer, Anthony Bagnall

Subjects: Machine Learning (cs.LG)
[406] arXiv:2305.02605 (replaced) [pdf, other]: Title: Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy

Authors: Xiang Zheng, Xingjun Ma, Shengjie Wang, Xinyu Wang, Chao Shen, Cong Wang

Comments: Accepted by DSN 2024

Subjects: Machine Learning (cs.LG)
[407] arXiv:2305.08275 (replaced) [pdf, other]: Title: ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Authors: Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Comments: CVPR2024

Journal-ref: CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[408] arXiv:2305.09278 (replaced) [pdf, other]: Title: Multi-domain FEM-BEM coupling for acoustic scattering

Authors: Marcella Bonazzoli (IDEFIX), Xavier Claeys (LJLL (UMR_7598))

Subjects: Numerical Analysis (math.NA)
[409] arXiv:2305.10223 (replaced) [pdf, other]: Title: NAI$_2$: Learning Noise-Aware Illumination-Interpolator for Unsupervised Low-Light Image Enhancement

Authors: Xiaofeng Liu, Jiaxin Gao, Xin Fan, Risheng Liu

Comments: Image processing, low-light image enhancement, noise estimation, illumination learning

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[410] arXiv:2305.12517 (replaced) [pdf, other]: Title: Description-Based Text Similarity

Authors: Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, Avshalom Manevich, Yoav Goldberg

Comments: A preprint

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[411] arXiv:2305.12829 (replaced) [pdf, other]: Title: On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection

Authors: Fatma Elsafoury, Stamos Katsigiannis

Comments: 10 pages

Subjects: Computation and Language (cs.CL)
[412] arXiv:2305.14852 (replaced) [pdf, other]: Title: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning

Authors: Moonseok Choi, Hyungi Lee, Giung Nam, Juho Lee

Comments: ICLR 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[413] arXiv:2306.02090 (replaced) [pdf, other]: Title: Deep Classifier Mimicry without Data Access

Authors: Steven Braun, Martin Mundt, Kristian Kersting

Comments: 11 pages main, 4 figures, 2 tables, 4 pages appendix

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[414] arXiv:2306.04242 (replaced) [pdf, other]: Title: 4D Millimeter-Wave Radar in Autonomous Driving: A Survey

Authors: Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

Subjects: Signal Processing (eess.SP); Robotics (cs.RO)
[415] arXiv:2306.04367 (replaced) [pdf, other]: Title: Solving NP-hard Problems on GaTEx Graphs: Linear-Time Algorithms for Perfect Orderings, Cliques, Colorings, and Independent Sets

Authors: Marc Hellmuth, Guillaume E. Scholz

Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[416] arXiv:2306.05272 (replaced) [pdf, other]: Title: Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Authors: Tianzhe Chu, Shengbang Tong, Tianjiao Ding, Xili Dai, Benjamin David Haeffele, René Vidal, Yi Ma

Comments: 23 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[417] arXiv:2306.06306 (replaced) [pdf, other]: Title: DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Authors: Fuxiao Liu, Hao Tan, Chris Tensmeyer

Comments: Accepted to ICPRAI 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[418] arXiv:2306.07249 (replaced) [pdf, other]: Title: Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning

Authors: Elie Bursztein, Luca Invernizzi, Karel Král, Daniel Moghimi, Jean-Michel Picod, Marina Zhang

Subjects: Cryptography and Security (cs.CR)
[419] arXiv:2306.08313 (replaced) [pdf, other]: Title: A Proxy Attack-Free Strategy for Practically Improving the Poisoning Efficiency in Backdoor Attacks

Authors: Ziqiang Li, Hong Sun, Pengfei Xia, Beihao Xia, Xue Rui, Wei Zhang, Qinglang Guo, Bin Li

Comments: Under review

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[420] arXiv:2306.12416 (replaced) [pdf, other]: Title: Quantum soft-covering lemma with applications to rate-distortion coding, resolvability and identification via quantum channels

Authors: Touheed Anwar Atif, S. Sandeep Pradhan, Andreas Winter

Comments: 30 pages, 3 figures; v2 fixes an error in Definition 36 and various typos and minor issues throughout; v3 fixes some further minor points and provides a proof of Lemma 11 due to F. Dupuis, it is the version accepted by IJQI (special issue in honour of Alexander S. Holevo)

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
[421] arXiv:2306.12465 (replaced) [pdf, other]: Title: Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference

Authors: Boyan Li, Luziwei Leng, Shuaijie Shen, Kaixuan Zhang, Jianguo Zhang, Jianxing Liao, Ran Cheng

Comments: IEEE TNNLS

Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
[422] arXiv:2306.15945 (replaced) [pdf, ps, other]: Title: Permutation Polynomial Interleaved Zadoff-Chu Sequences

Authors: Fredrik Berggren, Branislav M. Popovic

Comments: Submitted to IEEE Transactions on Information Theory

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[423] arXiv:2307.05052 (replaced) [pdf, other]: Title: Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

Authors: Fuxiao Liu, Paiheng Xu, Zongxia Li, Yue Feng, Hyemi Song

Comments: 10 pages, 5 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[424] arXiv:2307.09055 (replaced) [pdf, ps, other]: Title: Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation

Authors: Tong Wu

Comments: AISTATS 2024

Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[425] arXiv:2307.09691 (replaced) [pdf, other]: Title: Joint Service Caching, Communication and Computing Resource Allocation in Collaborative MEC Systems: A DRL-based Two-timescale Approach

Authors: Qianqian Liu, Haixia Zhang, Xin Zhang, Dongfeng Yuan

Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[426] arXiv:2307.12765 (replaced) [pdf, other]: Title: HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation

Authors: Runzhen Xue, Dengke Han, Mingyu Yan, Mo Zou, Xiaocheng Yang, Duo Wang, Wenming Li, Zhimin Tang, John Kim, Xiaochun Ye, Dongrui Fan

Comments: 16 pages, 17 figures; To appear in IEEE TPDS 2024

Subjects: Hardware Architecture (cs.AR)
[427] arXiv:2307.14826 (replaced) [pdf, other]: Title: Graded Semantics and Graded Logics for Eilenberg-Moore Coalgebras

Authors: Jonas Forster, Lutz Schröder, Paul Wild, Harsh Beohar, Sebastian Gurke, Karla Messing

Subjects: Logic in Computer Science (cs.LO)
[428] arXiv:2307.15830 (replaced) [pdf, other]: Title: A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting

Authors: Christopher Salazar, Ashis G. Banerjee

Subjects: Machine Learning (cs.LG)
[429] arXiv:2308.00130 (replaced) [pdf, other]: Title: Kinodynamic Motion Planning via Funnel Control for Underactuated Unmanned Surface Vehicles

Authors: Dženan Lapandić, Christos K. Verginis, Dimos V. Dimarogonas, Bo Wahlberg

Comments: 12 pages, 10 figures, submitted to IEEE T-CST

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[430] arXiv:2308.09375 (replaced) [pdf, other]: Title: Image Processing and Machine Learning for Hyperspectral Unmixing: An Overview and the HySUPP Python Package

Authors: Behnood Rasti (HZDR), Alexandre Zouaoui (Thoth), Julien Mairal (Thoth), Jocelyn Chanussot (Thoth)

Comments: IEEE Transactions on Geoscience and Remote Sensing, 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[431] arXiv:2308.10684 (replaced) [pdf, other]: Title: Systematic Offensive Stereotyping (SOS) Bias in Language Models

Authors: Fatma Elsafoury

Comments: Keywords: Systematic offensive stereotyping (SOS) bias, Language models, bias removal, fairness, hate speech detection

Subjects: Computation and Language (cs.CL)
[432] arXiv:2308.11581 (replaced) [pdf, ps, other]: Title: Dynamical Low-Rank Approximation for Stochastic Differential Equations

Authors: Yoshihito Kazashi, Fabio Nobile, Fabio Zoccolan

Comments: 41 pages

Subjects: Numerical Analysis (math.NA)
[433] arXiv:2309.05855 (replaced) [pdf, other]: Title: Instabilities in Convnets for Raw Audio

Authors: Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs

Comments: 4 pages, 5 figures, 1 page appendix with mathematical proofs

Journal-ref: IEEE Signal Processing Letters 31 (2024) 1084-1088

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[434] arXiv:2309.06462 (replaced) [pdf, other]: Title: Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion

Authors: Syed Waleed Hyder, Muhammad Usama, Anas Zafar, Muhammad Naufil, Fawad Javed Fateh, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

Comments: Accepted to ICRA 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[435] arXiv:2309.07880 (replaced) [pdf, other]: Title: mEBAL2 Database and Benchmark: Image-based Multispectral Eyeblink Detection

Authors: Roberto Daza, Aythami Morales, Julian Fierrez, Ruben Tolosana, Ruben Vera-Rodriguez

Comments: Published in the journal Pattern Recognition Letters in June 2024. Accessible from this https URL

Journal-ref: Pattern Recognition Letters, vol. 182, pp. 83-89, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[436] arXiv:2309.13541 (replaced) [pdf, other]: Title: Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

Authors: Prithwish Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, Joud Khoury

Comments: HPDC '24

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
[437] arXiv:2310.00724 (replaced) [pdf, other]: Title: Subtractive Mixture Models via Squaring: Representation and Learning

Authors: Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[438] arXiv:2310.00923 (replaced) [pdf, other]: Title: Lightweight Regression Model with Prediction Interval Estimation for Computer Vision-based Winter Road Surface Condition Monitoring

Authors: Risto Ojala, Alvari Seppänen

Comments: Published in IEEE Transactions on Intelligent Vehicles (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[439] arXiv:2310.01522 (replaced) [pdf, other]: Title: Property-preserving numerical approximations of a Cahn-Hilliard-Navier-Stokes model with variable densities and degenerate mobility

Authors: Daniel Acosta-Soba, Francisco Guillén-González, J. Rafael Rodríguez-Galván, Jin Wang

Comments: 32 pages, 11 figures, 2 tables

Subjects: Numerical Analysis (math.NA)
[440] arXiv:2310.01717 (replaced) [pdf, other]: Title: Ensemble Distillation for Unsupervised Constituency Parsing

Authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie C.K. Cheung, Lili Mou

Comments: Accepted by International Conference on Learning Representations (ICLR) 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[441] arXiv:2310.02992 (replaced) [pdf, other]: Title: Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Authors: Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

Comments: Code: this https URL. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[442] arXiv:2310.08701 (replaced) [pdf, other]: Title: Analyzing User Ideologies and Shared News During the 2019 Argentinian Elections

Authors: Sofía M del Pozo, Sebastián Pinto, Matteo Serafino, Lucio Garcia, Hernán A Makse, Pablo Balenzuela

Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
[443] arXiv:2310.09413 (replaced) [pdf, other]: Title: ZeroSwap: Data-driven Optimal Market Making in DeFi

Authors: Viraj Nadkarni, Jiachen Hu, Ranvir Rana, Chi Jin, Sanjeev Kulkarni, Pramod Viswanath

Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[444] arXiv:2310.10858 (replaced) [pdf, other]: Title: Designing Shared Information Displays for Agents of Varying Strategic Sophistication

Authors: Dongping Zhang, Jason Hartline, Jessica Hullman

Comments: 34 pages, 11 figures, 7 tables. Accepted by ACM CSCW 2024

Subjects: Human-Computer Interaction (cs.HC); Computer Science and Game Theory (cs.GT)
[445] arXiv:2310.10879 (replaced) [pdf, other]: Title: BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

Authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[446] arXiv:2310.11305 (replaced) [pdf, other]: Title: MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

Authors: Ti-Rong Wu, Hung Guei, Pei-Chiun Peng, Po-Wei Huang, Ting Han Wei, Chung-Chin Shih, Yun-Jui Tsai

Comments: Accepted by IEEE Transactions on Games

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[447] arXiv:2310.13441 (replaced) [pdf, ps, other]: Title: Seamless, Correct, and Generic Programming over Serialised Data

Authors: Guillaume Allais

Comments: As submitted to JFP

Subjects: Programming Languages (cs.PL)
[448] arXiv:2310.18932 (replaced) [pdf, other]: Title: Self Attention with Temporal Prior: Can We Learn More from Arrow of Time?

Authors: Kyung Geun Kim, Byeong Tak Lee

Subjects: Artificial Intelligence (cs.AI)
[449] arXiv:2310.19046 (replaced) [pdf, other]: Title: Large Language Models as Evolutionary Optimizers

Authors: Shengcai Liu, Caishun Chen, Xinghua Qu, Ke Tang, Yew-Soon Ong

Comments: Accepted by CEC 2024

Subjects: Neural and Evolutionary Computing (cs.NE)
[450] arXiv:2310.19091 (replaced) [pdf, other]: Title: Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector

Authors: Unai Fischer-Abaigar, Christoph Kern, Noam Barda, Frauke Kreuter

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Methodology (stat.ME)
[451] arXiv:2310.20431 (replaced) [pdf, other]: Title: Raising the ClaSS of Streaming Time Series Segmentation

Authors: Arik Ermshaus, Patrick Schäfer, Ulf Leser

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
[452] arXiv:2311.01167 (replaced) [pdf, ps, other]: Title: Modulation Design and Optimization for RIS-Assisted Symbiotic Radios

Authors: Hu Zhou, Bowen Cai, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang

Comments: 16 pages, 16 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[453] arXiv:2311.01194 (replaced) [pdf, other]: Title: Predictive Modelling of Critical Variables for Improving HVOF Coating using Gamma Regression Models

Authors: Wolfgang Rannetbauer, Simon Hubmer, Carina Hambrock, Ronny Ramlau

Comments: 37 pages, 7 figures

Subjects: Applications (stat.AP); Numerical Analysis (math.NA); Applied Physics (physics.app-ph)
[454] arXiv:2311.02296 (replaced) [pdf, other]: Title: Survey of Simulators for Aerial Robots

Authors: Cora A. Dimmig, Giuseppe Silano, Kimberly McGuire, Chiara Gabellieri, Wolfgang Hönig, Joseph Moore, Marin Kobilarov

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO)
[455] arXiv:2311.02610 (replaced) [pdf, other]: Title: An adaptive standardisation methodology for Day-Ahead electricity price forecasting

Authors: Carlos Sebastián, Carlos E. González-Guillén, Jesús Juan

Subjects: Applications (stat.AP); Machine Learning (cs.LG); Methodology (stat.ME)
[456] arXiv:2311.04730 (replaced) [pdf, other]: Title: Predicting Properties of Nodes via Community-Aware Features

Authors: Bogumił Kamiński, Paweł Prałat, François Théberge, Sebastian Zając

Comments: 21 pages, 3 figures, 7 tables

Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Combinatorics (math.CO)
[457] arXiv:2311.06890 (replaced) [pdf, ps, other]: Title: Distributed Sequential Receding Horizon Control of Multi-Agent Systems under Recurring Signal Temporal Logic

Authors: Eleftherios E. Vlahakis, Lars Lindemann, Dimos V. Dimarogonas

Comments: Accepted for presentation at ECC24

Subjects: Systems and Control (eess.SY)
[458] arXiv:2311.07310 (replaced) [pdf, other]: Title: Dynamic Optimization on Quantum Hardware: Feasibility for a Process Industry Use Case

Authors: Dennis Michael Nenno, Adrian Caspari

Comments: 21 pages, 6 figures

Journal-ref: Computers and Chemical Engineering, 2024, 108704, ISSN 0098-1354

Subjects: Optimization and Control (math.OC); Emerging Technologies (cs.ET)
[459] arXiv:2311.07550 (replaced) [pdf, other]: Title: Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data

Authors: Bart Pleiter, Behrad Tajalli, Stefanos Koffas, Gorka Abad, Jing Xu, Martha Larson, Stjepan Picek

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[460] arXiv:2311.08496 (replaced) [pdf, other]: Title: A Robust, Efficient Predictive Safety Filter

Authors: Wenceslao Shaw Cortez, Jan Drgona, Draguna Vrabie, Mahantesh Halappanavar

Subjects: Systems and Control (eess.SY)
[461] arXiv:2311.08880 (replaced) [pdf, other]: Title: Motion Control of Two Mobile Robots under Allowable Collisions

Authors: Li Tan, Wei Ren, Xi-Ming Sun, Junlin Xiong

Comments: 8 pages, 5 figures

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[462] arXiv:2311.08931 (replaced) [pdf, ps, other]: Title: Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation

Authors: Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Muller, Mara Graziani, Meritxell Bach Cuadra

Comments: Preprint submitted to the journal

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[463] arXiv:2311.08941 (replaced) [pdf, other]: Title: Transformers in the Service of Description Logic-based Contexts

Authors: Angelos Poulis, Eleni Tsalapati, Manolis Koubarakis

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[464] arXiv:2311.09745 (replaced) [pdf, other]: Title: Application-Centric Benchmarking of Distributed FaaS Platforms using BeFaaS

Authors: Martin Grambow, Tobias Pfandzelter, David Bermbach

Comments: arXiv admin note: substantial text overlap with arXiv:2102.12770

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[465] arXiv:2311.10448 (replaced) [pdf, other]: Title: DeepClean: Machine Unlearning on the Cheap by Resetting Privacy Sensitive Weights using the Fisher Diagonal

Authors: Jiaeli Shi, Najah Ghalyan, Kostis Gourgoulias, John Buford, Sean Moran

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[466] arXiv:2311.10543 (replaced) [pdf, other]: Title: Joint covariance properties under geometric image transformations for spatio-temporal receptive fields according to the generalized Gaussian derivative model for visual receptive fields

Authors: Tony Lindeberg

Comments: 38 pages, 13 figures. Note: From version 4, this paper considers a different form of joint composition of the geometric image transformations than in the earlier versions

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[467] arXiv:2311.12161 (replaced) [pdf, other]: Title: ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing

Authors: Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, Blake Ocampo, Scott Denmark, Richard Zanibbi

Comments: 20 pages without references, 12 figures, 4 Tables, submitted to International Journal on Document Analysis and Recognition (IJDAR)

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[468] arXiv:2311.12327 (replaced) [pdf, other]: Title: Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models

Authors: Xiaoyu Yang, Lijian Xu, Hao Sun, Hongsheng Li, Shaoting Zhang

Comments: 22 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[469] arXiv:2311.13125 (replaced) [pdf, other]: Title: DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

Authors: Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang

Comments: SIGGRAPH 2024 conference track

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[470] arXiv:2311.13668 (replaced) [pdf, other]: Title: MAIRA-1: A specialised large multimodal model for radiology report generation

Authors: Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle

Comments: 18 pages, 9 tables, 5 figures. v2 adds test IDs and image encoder citation. v3 fixes error in NPV/specificity

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[471] arXiv:2311.16568 (replaced) [pdf, ps, other]: Title: Active Reconfigurable Intelligent Surface Enhanced Spectrum Sensing for Cognitive Radio Networks

Authors: Jungang Ge, Ying-Chang Liang, Sumei Sun, Yonghong Zeng, Zhidong Bai

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[472] arXiv:2311.17774 (replaced) [pdf, ps, other]: Title: Enumeration of Minimum Weight Codewords of Pre-Transformed Polar Codes by Tree Intersection

Authors: Andreas Zunker, Marvin Geiselhart, Stephan ten Brink

Comments: 8 pages, 4 figures, extended version of the CISS 2024 paper

Subjects: Information Theory (cs.IT)
[473] arXiv:2311.18405 (replaced) [pdf, other]: Title: CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Authors: Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[474] arXiv:2311.18588 (replaced) [pdf, other]: Title: Optimizing ZX-Diagrams with Deep Reinforcement Learning

Authors: Maximilian Nägele, Florian Marquardt

Comments: 12 pages, 7 figures - Revision on 26.04.2024: Fixed bug in training algorithm to give quantitatively better results (qualitative results unchanged)

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[475] arXiv:2312.00923 (replaced) [pdf, other]: Title: Label Delay in Online Continual Learning

Authors: Botos Csaba, Wenxuan Zhang, Matthias Müller, Ser-Nam Lim, Mohamed Elhoseiny, Philip Torr, Adel Bibi

Comments: 17 pages, 12 figures

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[476] arXiv:2312.01253 (replaced) [pdf, ps, other]: Title: On Merits of Faster-than-Nyquist Signaling in the Finite Blocklength Regime

Authors: Yong Jin Daniel Kim

Subjects: Information Theory (cs.IT)
[477] arXiv:2312.02246 (replaced) [pdf, other]: Title: Conditional Variational Diffusion Models

Authors: Gabriel della Maggiora, Luis Alberto Croquevielle, Nikita Deshpande, Harry Horsley, Thomas Heinis, Artur Yakimovich

Comments: Denoising Diffusion Probabilistic Models, Inverse Problems, Generative Models, Super Resolution, Phase Quantification, Variational Methods

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[478] arXiv:2312.04606 (replaced) [pdf, other]: Title: Urban Region Representation Learning with Attentive Fusion

Authors: Fengze Sun, Jianzhong Qi, Yanchuan Chang, Xiaoliang Fan, Shanika Karunasekera, Egemen Tanin

Subjects: Machine Learning (cs.LG); Databases (cs.DB)
[479] arXiv:2312.04876 (replaced) [pdf, other]: Title: GVE-Louvain: Fast Louvain Algorithm for Community Detection in Shared Memory Setting

Authors: Subhajit Sahu

Comments: 11 pages, 8 figures, 2 tables

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[480] arXiv:2312.05677 (replaced) [pdf, other]: Title: Batched Low-Rank Adaptation of Foundation Models

Authors: Yeming Wen, Swarat Chaudhuri

Comments: 16 pages, 3 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[481] arXiv:2312.06738 (replaced) [pdf, other]: Title: InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following

Authors: Shufan Li, Harkanwar Singh, Aditya Grover

Comments: 29 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[482] arXiv:2312.10441 (replaced) [pdf, ps, other]: Title: Disjunctive Policies for Database-Backed Programs

Authors: Amir M. Ahmadian, Matvey Soloviev, Musard Balliu

Comments: 21 pages, including references and appendix. Extended version of paper accepted to CSF 2024

Subjects: Cryptography and Security (cs.CR)
[483] arXiv:2312.11894 (replaced) [pdf, other]: Title: 3D-LFM: Lifting Foundation Model

Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey

Comments: Visit the project page at this https URL for links to additional media, code, and videos. The site also features a custom GPT tailored to address queries related to 3D-LFM. Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[484] arXiv:2312.15416 (replaced) [pdf, other]: Title: On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains

Authors: Hao Wu, Shenghua Feng, Ting Gan, Jie Wang, Bican Xia, Naijun Zhan

Comments: 18 pages, 1 figure

Subjects: Systems and Control (eess.SY)
[485] arXiv:2312.16011 (replaced) [pdf, other]: Title: Assigning Stationary Distributions to Sparse Stochastic Matrices

Authors: Nicolas Gillis, Paul Van Dooren

Comments: 29 pages, code available from this https URL. In this third version, we have added clarifications, corrections and remarks suggested to us by anonymous reviewers

Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC); Probability (math.PR); Computation (stat.CO)
[486] arXiv:2312.17023 (replaced) [pdf, other]: Title: Tensorial structure of the lifting doctrine in constructive domain theory

Authors: Jonathan Sterling

Comments: Minor errors fixed

Subjects: Category Theory (math.CT); Logic in Computer Science (cs.LO)
[487] arXiv:2312.17163 (replaced) [pdf, other]: Title: FENet: Focusing Enhanced Network for Lane Detection

Authors: Liman Wang, Hanyang Zhong

Comments: 12 pages including appendix. The Code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[488] arXiv:2312.17296 (replaced) [pdf, other]: Title: Structured Packing in LLM Training Improves Long Context Utilization

Authors: Konrad Staniszewski, Szymon Tworkowski, Yu Zhao, Sebastian Jaszczur, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś

Subjects: Computation and Language (cs.CL)
[489] arXiv:2401.00260 (replaced) [pdf, other]: Title: GazeCLIP: Towards Enhancing Gaze Estimation via Text Guidance

Authors: Jun Wang, Hao Ruan, Mingjie Wang, Chuanghui Zhang, Huachun Li, Jun Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[490] arXiv:2401.01373 (replaced) [pdf, other]: Title: Boosting Defect Detection in Manufacturing using Tensor Convolutional Neural Networks

Authors: Pablo Martin-Ramiro, Unai Sainz de la Maza, Sukhbinder Singh, Roman Orus, Samuel Mugel

Comments: 12 pages, 4 figures, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)
[491] arXiv:2401.01472 (replaced) [pdf, other]: Title: Studying and Recommending Information Highlighting in Stack Overflow Answers

Authors: Shahla Shaan Ahmed, Shaowei Wang, Yuan Tian, Tse-Hsun (Peter) Chen, Haoxiang Zhang

Comments: This work is submitted to Information and Software Technology Journal

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Software Engineering (cs.SE)
[492] arXiv:2401.02416 (replaced) [pdf, other]: Title: ODIN: A Single Model for 2D and 3D Segmentation

Authors: Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

Comments: Camera Ready (CVPR 2024, Highlight)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[493] arXiv:2401.03855 (replaced) [pdf, other]: Title: PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLM

Authors: Ankit Yadav, Mayank Singh

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[494] arXiv:2401.05217 (replaced) [pdf, other]: Title: Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[495] arXiv:2401.07282 (replaced) [pdf, ps, other]: Title: Half-Space Modeling with Reflecting Surface in Molecular Communication

Authors: Anil Kamber, H. Birkan Yilmaz, Ali Emre Pusane, Tuna Tugcu

Comments: 9 pages, 10 figures

Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)
[496] arXiv:2401.08366 (replaced) [pdf, ps, other]: Title: On the formalization of the notion of an algorithm

Authors: C. A. Middelburg

Comments: 22 pages, revision of v1, presentation improved at several places and some minor errors corrected

Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
[497] arXiv:2401.08495 (replaced) [pdf, other]: Title: Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

Authors: Messi H.J. Lee, Jacob M. Montgomery, Calvin K. Lai

Comments: Forthcoming at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

Subjects: Computation and Language (cs.CL)
[498] arXiv:2401.08664 (replaced) [pdf, other]: Title: Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges

Authors: Qingyao Li, Lingyue Fu, Weiming Zhang, Xianyu Chen, Jingwei Yu, Wei Xia, Weinan Zhang, Ruiming Tang, Yong Yu

Comments: 31 pages, 5 figures, 1 table

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[499] arXiv:2401.10711 (replaced) [pdf, other]: Title: Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

Authors: Haibo Wang, Chenghang Lai, Yixuan Sun, Weifeng Ge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[500] arXiv:2401.10805 (replaced) [pdf, other]: Title: Learning to Visually Connect Actions and their Effects

Authors: Eric Peh, Paritosh Parmar, Basura Fernando

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[501] arXiv:2401.13555 (replaced) [pdf, other]: Title: Benchmarking the Fairness of Image Upsampling Methods

Authors: Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt, Asja Fischer, Johannes Lederer

Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[502] arXiv:2401.13695 (replaced) [pdf, other]: Title: Inverse analysis of granular flows using differentiable graph neural network simulator

Authors: Yongjin Choi, Krishna Kumar

Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)
[503] arXiv:2401.15261 (replaced) [pdf, other]: Title: Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Authors: Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool

Comments: CVPR 2024 highlight

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[504] arXiv:2401.15489 (replaced) [pdf, other]: Title: Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport

Authors: Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Soufiane Belharbi, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Eric Granger

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[505] arXiv:2401.16587 (replaced) [pdf, other]: Title: A Linguistic Comparison between Human and ChatGPT-Generated Conversations

Authors: Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David

Comments: Proceedings of the 4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), Jeju, Korea, 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[506] arXiv:2402.00290 (replaced) [pdf, other]: Title: MEIA: Towards Realistic Multimodal Interaction and Manipulation for Embodied Robots

Authors: Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

Comments: Codes will be available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[507] arXiv:2402.01528 (replaced) [pdf, other]: Title: Decoding Speculative Decoding

Authors: Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[508] arXiv:2402.02301 (replaced) [pdf, ps, other]: Title: MATLAB Simulator of Level-Index Arithmetic

Authors: Mantas Mikaitis

Subjects: Mathematical Software (cs.MS); Hardware Architecture (cs.AR); Numerical Analysis (math.NA)
[509] arXiv:2402.03922 (replaced) [pdf, other]: Title: Competitive advantage of URLLC vs. eMBB for supporting timeliness-relevant services

Authors: Luis Guijarro, Jose-Ramon Vidal, Vicent Pla

Subjects: Networking and Internet Architecture (cs.NI); Theoretical Economics (econ.TH)
[510] arXiv:2402.06497 (replaced) [pdf, other]: Title: Iris-SAM: Iris Segmentation Using a Foundation Model

Authors: Parisa Farmanifard, Arun Ross

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[511] arXiv:2402.06798 (replaced) [pdf, other]: Title: Reasoning Grasping via Multimodal Large Language Model

Authors: Shiyu Jin, Jinxuan Xu, Yutian Lei, Liangjun Zhang

Subjects: Robotics (cs.RO)
[512] arXiv:2402.06815 (replaced) [pdf, other]: Title: Estimating Player Performance in Different Contexts Using Fine-tuned Large Events Models

Authors: Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira

Subjects: Machine Learning (cs.LG)
[513] arXiv:2402.06820 (replaced) [pdf, other]: Title: Forecasting Events in Soccer Matches Through Language

Authors: Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[514] arXiv:2402.08015 (replaced) [pdf, other]: Title: Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets

Authors: Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Mitiku Yohannes Fuge, Aman Kassahun Wassie, Eyasu Shiferaw Jada, Yonas Chanie, Walelign Tewabe Sewunetie, Seid Muhie Yimam

Subjects: Computation and Language (cs.CL)
[515] arXiv:2402.08522 (replaced) [pdf, other]: Title: Fairness Auditing with Multi-Agent Collaboration

Authors: Martijn de Vos, Akash Dhasade, Jade Garcia Bourrée, Anne-Marie Kermarrec, Erwan Le Merrer, Benoit Rottembourg, Gilles Tredan

Comments: 13 pages, 6 figures

Subjects: Machine Learning (cs.LG)
[516] arXiv:2402.09877 (replaced) [pdf, other]: Title: On Computing Plans with Uniform Action Costs

Authors: Alberto Pozanco, Daniel Borrajo, Manuela Veloso

Subjects: Artificial Intelligence (cs.AI)
[517] arXiv:2402.09971 (replaced) [pdf, other]: Title: Parameterized Vertex Integrity Revisited

Authors: Tesshu Hanaka, Michael Lampis, Manolis Vasilakis, Kanae Yoshiwatari

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[518] arXiv:2402.12237 (replaced) [pdf, other]: Title: Learning to Defer in Content Moderation: The Human-AI Interplay

Authors: Thodoris Lykouris, Wentao Weng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Performance (cs.PF)
[519] arXiv:2402.12819 (replaced) [pdf, other]: Title: Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

Authors: Branislav Pecher, Ivan Srba, Maria Bielikova

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[520] arXiv:2402.15264 (replaced) [pdf, other]: Title: DEEM: Dynamic Experienced Expert Modeling for Stance Detection

Authors: Xiaolong Wang, Yile Wang, Sijie Cheng, Peng Li, Yang Liu

Comments: Accepted by LREC-COLING 2024, Oral presentation

Subjects: Computation and Language (cs.CL)
[521] arXiv:2402.17428 (replaced) [pdf, other]: Title: Shortest cover after edit

Authors: Kazuki Mitani, Takuya Mieno, Kazuhisa Seto, Takashi Horiyama

Subjects: Data Structures and Algorithms (cs.DS)
[522] arXiv:2402.18573 (replaced) [pdf, other]: Title: UniMODE: Unified Monocular 3D Object Detection

Authors: Zhuoling Li, Xiaogang Xu, SerNam Lim, Hengshuang Zhao

Comments: This paper has been accepted for publication in CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[523] arXiv:2402.18892 (replaced) [pdf, other]: Title: Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

Authors: Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li

Comments: Accepted to ICRA 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[524] arXiv:2403.00642 (replaced) [pdf, other]: Title: Rethinking The Uniformity Metric in Self-Supervised Learning

Authors: Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang

Journal-ref: ICLR 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[525] arXiv:2403.00843 (replaced) [pdf, other]: Title: Large Language Models are Learnable Planners for Long-Term Recommendation

Authors: Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng

Comments: 11 pages, 5 figures

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[526] arXiv:2403.01180 (replaced) [pdf, other]: Title: Misconfiguration in O-RAN: Analysis of the impact of AI/ML

Authors: Noe Yungaicela-Naula, Vishal Sharma, Sandra Scott-Hayward

Subjects: Networking and Internet Architecture (cs.NI)
[527] arXiv:2403.02862 (replaced) [pdf, other]: Title: Numerical investigation of stabilization in the Hybridizable Discontinuous Galerkin method for linear anisotropic elastic equation

Authors: Ha Pham, Florian Faucher, Hélène Barucq

Comments: 34 pages, 9 figures

Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)
[528] arXiv:2403.03314 (replaced) [pdf, other]: Title: Collision Avoidance Verification of Multiagent Systems with Learned Policies

Authors: Zihao Dong, Shayegan Omidshafiei, Michael Everett

Comments: 6 pages, 6 figures

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
[529] arXiv:2403.03611 (replaced) [pdf, ps, other]: Title: Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[530] arXiv:2403.03627 (replaced) [pdf, other]: Title: Multimodal Large Language Models to Support Real-World Fact-Checking

Authors: Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[531] arXiv:2403.04074 (replaced) [pdf, other]: Title: Improving HTTP/3 Quality of Experience with Incremental EPS

Authors: Abhinav Gupta, Radim Bartos

Subjects: Networking and Internet Architecture (cs.NI)
[532] arXiv:2403.04654 (replaced) [pdf, other]: Title: Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention

Authors: R. Gnana Praveen, Jahangir Alam

Comments: Accepted to FG2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[533] arXiv:2403.05728 (replaced) [pdf, other]: Title: Engineering Formality and Software Risk in Debian Python Packages

Authors: Matthew Gaughan, Kaylea Champion, Sohyeon Hwang

Comments: Preprint of archival paper accepted for IEEE SANER 2024

Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY)
[534] arXiv:2403.05921 (replaced) [pdf, other]: Title: OntoChat: a Framework for Conversational Ontology Engineering using Language Models

Authors: Bohui Zhang, Valentina Anita Carriero, Katrin Schreiberhuber, Stefani Tsaneva, Lucía Sánchez González, Jongmo Kim, Jacopo de Berardinis

Comments: ESWC 2024 Special Track on Large Language Models for Knowledge Engineering

Subjects: Artificial Intelligence (cs.AI)
[535] arXiv:2403.07789 (replaced) [pdf, other]: Title: RobotCycle: Assessing Cycling Safety in Urban Environments

Authors: Efimia Panagiotaki, Tyler Reinmund, Stephan Mouton, Luke Pitt, Arundathi Shaji Shanthini, Wayne Tubby, Matthew Towlson, Samuel Sze, Brian Liu, Chris Prahacs, Daniele De Martini, Lars Kunze

Comments: IEEE Intelligent Vehicles Symposium (IV 2024)

Subjects: Robotics (cs.RO)
[536] arXiv:2403.08715 (replaced) [pdf, other]: Title: SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents

Authors: Ruiyi Wang, Haofei Yu, Wenxin Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu

Subjects: Computation and Language (cs.CL)
[537] arXiv:2403.08718 (replaced) [pdf, ps, other]: Title: Probabilistic Metaplasticity for Continual Learning with Memristors

Authors: Fatima Tuz Zohora, Vedant Karia, Nicholas Soures, Dhireesha Kudithipudi

Subjects: Systems and Control (eess.SY)
[538] arXiv:2403.10021 (replaced) [pdf, other]: Title: Time-Frequency Jointed Imperceptible Adversarial Attack to Brainprint Recognition with Deep Learning Models

Authors: Hangjie Yi, Yuhang Ming, Dongjun Liu, Wanzeng Kong

Comments: This work is accepted by ICME 2024

Subjects: Cryptography and Security (cs.CR)
[539] arXiv:2403.10962 (replaced) [pdf, other]: Title: Exploiting Topological Priors for Boosting Point Cloud Generation

Authors: Baiyuan Chen

Comments: 7 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[540] arXiv:2403.11281 (replaced) [pdf, other]: Title: Java JIT Testing with Template Extraction

Authors: Zhiqiang Zang, Fu-Yao Yu, Aditya Thimmaiah, August Shi, Milos Gligoric

Comments: 23 pages, 6 figures, 8 tables, accepted in FSE 2024 (Research Papers track)

Subjects: Software Engineering (cs.SE)
[541] arXiv:2403.14608 (replaced) [pdf, other]: Title: Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Authors: Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang

Comments: 24 pages, 12 figures

Subjects: Machine Learning (cs.LG)
[542] arXiv:2403.15490 (replaced) [pdf, ps, other]: Title: Increasing retrofit device adoption in social housing: evidence from two field experiments in Belgium

Authors: Mona Bielig, Celina Kacperski, Florian Kutzner

Journal-ref: Journal of Environmental Psychology (2024), 102284

Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
[543] arXiv:2403.18360 (replaced) [pdf, other]: Title: Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Authors: Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen, Hae-Gon Jeon, Tae Jong Choi

Comments: Project page: this https URL, Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[544] arXiv:2403.20305 (replaced) [pdf, ps, other]: Title: Local Correction of Linear Functions over the Boolean Cube

Authors: Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, Madhu Sudan

Comments: 61 pages, To Appear in the Proceedings of the 56th Annual ACM Symposium on Theory of Computing, June 24-28 2024, Vancouver, Canada. Added a remark on local testing in the revision

Subjects: Computational Complexity (cs.CC)
[545] arXiv:2404.00566 (replaced) [pdf, other]: Title: CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Authors: Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
[546] arXiv:2404.02056 (replaced) [pdf, other]: Title: Multitask-based Evaluation of Open-Source LLM on Software Vulnerability

Authors: Xin Yin, Chao Ni, Shaohua Wang

Subjects: Software Engineering (cs.SE)
[547] arXiv:2404.02897 (replaced) [pdf, other]: Title: Deep Image Composition Meets Image Forgery

Authors: Eren Tahir, Mert Bal

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[548] arXiv:2404.03418 (replaced) [pdf, ps, other]: Title: Permissible Knowledge Pooling

Authors: Huimin Dong

Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
[549] arXiv:2404.03436 (replaced) [pdf, other]: Title: Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation

Authors: Luca Comanducci, Fabio Antonacci, Augusto Sarti

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[550] arXiv:2404.03537 (replaced) [pdf, other]: Title: If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces

Authors: Andrea Atzori, Fadi Boutros, Naser Damer, Gianni Fenu, Mirko Marras

Comments: Accepted as full paper at FG 2024 main track

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[551] arXiv:2404.04608 (replaced) [pdf, other]: Title: Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation

Authors: Danpei Zhao, Bo Yuan, Ziqiang Chen, Tian Li, Zhuoran Liu, Wentao Li, Yue Gao

Journal-ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[552] arXiv:2404.07240 (replaced) [pdf, other]: Title: Interactions Between Brauer Configuration Algebras and Classical Cryptanalysis to Analyze Bach's Canons

Authors: Agustín Moreno Cañadas, Pedro Fernando Fernández Espinosa, José Gregorio Rodríguez Nieto, Odette M. Mendez, Ricardo Hugo Arteaga-Bastidas

Comments: 50 pages

Subjects: History and Overview (math.HO); Cryptography and Security (cs.CR)
[553] arXiv:2404.08364 (replaced) [pdf, other]: Title: FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[554] arXiv:2404.10616 (replaced) [pdf, ps, other]: Title: One is all you need: Second-order Unification without First-order Variables

Authors: David M. Cerna, Julian Parsert

Comments: Under review

Subjects: Logic in Computer Science (cs.LO)
[555] arXiv:2404.10759 (replaced) [pdf, other]: Title: Laplace-HDC: Understanding the geometry of binary hyperdimensional computing

Authors: Saeid Pourmand, Wyatt D. Whiting, Alireza Aghasi, Nicholas F. Marshall

Comments: 23 pages, 7 figures

Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
[556] arXiv:2404.11051 (replaced) [pdf, ps, other]: Title: WPS-Dataset: A benchmark for wood plate segmentation in bark removal processing

Authors: Rijun Wang, Guanghao Zhang, Fulong Liang, Bo Wang, Xiangwei Mou, Yesheng Chen, Peng Sun, Canjin Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[557] arXiv:2404.12604 (replaced) [pdf, ps, other]: Title: Transmitter Side Beyond-Diagonal RIS for mmWave Integrated Sensing and Communications

Authors: Kexin Chen, Yijie Mao

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[558] arXiv:2404.13108 (replaced) [pdf, other]: Title: RegWSI: Whole Slide Image Registration using Combined Deep Feature- and Intensity-Based Methods: Winner of the ACROBAT 2023 Challenge

Authors: Marek Wodzinski, Niccolò Marini, Manfredo Atzori, Henning Müller

Journal-ref: Computer Methods and Programs in Biomedicine, Vol. 250, 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[559] arXiv:2404.13249 (replaced) [pdf, ps, other]: Title: Additive Complementary Pairs of Codes

Authors: Sanjit Bhowmick, Deepak Kumar Dalai

Subjects: Information Theory (cs.IT)
[560] arXiv:2404.13295 (replaced) [pdf, other]: Title: Detecting Build Dependency Errors in Incremental Builds

Authors: Jun Lyu, Shanshan Li, He Zhang, Yang Zhang, Guoping Rong, Manuel Rigger

Subjects: Software Engineering (cs.SE)
[561] arXiv:2404.13330 (replaced) [pdf, other]: Title: SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation

Authors: Mansoor Hayat, Supavadee Aramvith, Titipat Achakulvisut

Comments: Paper accepted for presentation at the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, Florida, USA (camera-ready version)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[562] arXiv:2404.13468 (replaced) [pdf, other]: Title: A Grassroots Architecture to Supplant Global Digital Platforms by a Global Digital Democracy

Authors: Ehud Shapiro

Subjects: Networking and Internet Architecture (cs.NI); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
[563] arXiv:2404.13515 (replaced) [pdf, other]: Title: FedTrans: Efficient Federated Learning via Multi-Model Transformation

Authors: Yuxuan Zhu, Jiachen Liu, Mosharaf Chowdhury, Fan Lai

Journal-ref: MLSys (2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[564] arXiv:2404.13634 (replaced) [pdf, other]: Title: Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Authors: Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz

Journal-ref: Journal of Artificial Intelligence Research, vol. 79, Apr. 2024, 1313-41

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[565] arXiv:2404.13697 (replaced) [pdf, other]: Title: Should Teleoperation Be like Driving in a Car? Comparison of Teleoperation HMIs

Authors: Maria-Magdalena Wolf, Richard Taupitz, Frank Diermeyer

Comments: 8 pages, 7 figures, 3 tables

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
[566] arXiv:2404.13711 (replaced) [pdf, other]: Title: ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis

Authors: Zichen Tang, Hongyu Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[567] arXiv:2404.13816 (replaced) [pdf, other]: Title: Neural Radiance Field in Autonomous Driving: A Survey

Authors: Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[568] arXiv:2404.14057 (replaced) [pdf, ps, other]: Title: Bored to Death: Artificial Intelligence Research Reveals the Role of Boredom in Suicide Behavior

Authors: Shir Lissak, Yaakov Ophir, Refael Tikochinski, Anat Brunstein Klomek, Itay Sisso, Eyal Fruchter, Roi Reichart

Journal-ref: www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2024.1328122

Subjects: Computation and Language (cs.CL)
[569] arXiv:2404.14212 (replaced) [pdf, other]: Title: Toward Routing River Water in Land Surface Models with Recurrent Neural Networks

Authors: Mauricio Lima, Katherine Deck, Oliver R. A. Dunbar, Tapio Schneider

Comments: 27 pages, 10 figures; submitted to HESS (EGU) under a CC BY license

Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Geophysics (physics.geo-ph)
[570] arXiv:2404.14471 (replaced) [pdf, other]: Title: Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

Authors: Shiyi Zhang, Sule Bai, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[571] arXiv:2404.14604 (replaced) [pdf, other]: Title: Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

Authors: Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang

Subjects: Computation and Language (cs.CL)
[572] arXiv:2404.15041 (replaced) [pdf, other]: Title: LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

Authors: Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[573] arXiv:2404.15080 (replaced) [pdf, ps, other]: Title: Flexible Field Sizes in Secure Distributed Matrix Multiplication via Efficient Interference Cancellation

Authors: Okko Makkonen

Comments: 11 pages

Subjects: Information Theory (cs.IT)
[574] arXiv:2404.15261 (replaced) [pdf, other]: Title: All You Need is Resistance: On the Equivalence of Effective Resistance and Certain Optimal Transport Problems on Graphs

Authors: Sawyer Robertson, Zhengchao Wan, Alexander Cloninger

Comments: 35 pages, 7 figures

Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Probability (math.PR)
[575] arXiv:2404.15272 (replaced) [pdf, other]: Title: CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

Comments: 12 pages, 5 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[576] arXiv:2404.15325 (replaced) [pdf, ps, other]: Title: Quantifying Social Presence in Mixed Reality: A Contemporary Review of Techniques and Innovations

Authors: Sparsh Srivastava

Subjects: Human-Computer Interaction (cs.HC)
[577] arXiv:2404.15417 (replaced) [pdf, other]: Title: The Power of Resets in Online Reinforcement Learning

Authors: Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin

Comments: Fixed a small typo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[578] arXiv:2404.15515 (replaced) [pdf, other]: Title: ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models

Authors: Weizhi Tang, Vaishak Belle

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[579] arXiv:2404.15667 (replaced) [pdf, other]: Title: The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews

Authors: Aleksi Huotala, Miikka Kuutila, Paul Ralph, Mika Mäntylä

Comments: Accepted to the International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[580] arXiv:2404.15678 (replaced) [pdf, other]: Title: Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System

Authors: Lei Zheng, Ning Li, Weinan Zhang, Yong Yu

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[581] arXiv:2404.15744 (replaced) [pdf, other]: Title: A General Black-box Adversarial Attack on Graph-based Fake News Detectors

Authors: Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang

Comments: Accepted by IJCAI2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[582] arXiv:2404.15848 (replaced) [pdf, other]: Title: Detecting Conceptual Abstraction in LLMs

Authors: Michaela Regneri, Alhassan Abdelhalim, Sören Laue

Comments: Paper accepted at the LREC-COLING 2024 Conference (Paper ID: 1968) this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[583] arXiv:2404.15899 (replaced) [pdf, other]: Title: ST-MambaSync: The Confluence of Mamba Structure and Spatio-Temporal Transformers for Precipitous Traffic Prediction

Authors: Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao

Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[584] arXiv:2404.15939 (replaced) [pdf, other]: Title: Telco-RAG: Navigating the Challenges of Retrieval-Augmented Language Models for Telecommunications

Authors: Andrei-Laurentiu Bornea, Fadhel Ayed, Antonio De Domenico, Nicola Piovesan, Ali Maatouk

Comments: 6 pages, 5 figures, 4 tables, submitted to IEEE Globecom 2024 (see this https URL)

Subjects: Information Retrieval (cs.IR); Signal Processing (eess.SP)
[585] arXiv:2404.15954 (replaced) [pdf, other]: Title: Mixed Supervised Graph Contrastive Learning for Recommendation

Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[586] arXiv:2404.15956 (replaced) [pdf, other]: Title: A Survey on Visual Mamba

Authors: Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[587] arXiv:2404.16042 (replaced) [pdf, ps, other]: Title: How explainable AI affects human performance: A systematic review of the behavioural consequences of saliency maps

Authors: Romy Müller

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[588] arXiv:2404.16078 (replaced) [pdf, other]: Title: Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

Authors: Vaisakh Shaj

Comments: Doctoral Dissertation Preprint, Department of Computer Science, Karlsruhe Institute Of Technology, 2024

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[589] arXiv:2404.16116 (replaced) [pdf, other]: Title: Classifying Human-Generated and AI-Generated Election Claims in Social Media

Authors: Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese, Genya Coulter

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[590] arXiv:2404.16147 (replaced) [pdf, other]: Title: Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model

Authors: Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger

Comments: IEEE Intelligent Vehicles Symposium (IV 2024)

Subjects: Robotics (cs.RO)
[591] arXiv:2404.16156 (replaced) [pdf, other]: Title: Guardians of the Quantum GAN

Authors: Archisman Ghosh, Debarshi Kundu, Avimita Chatterjee, Swaroop Ghosh

Comments: 11 pages, 10 figures

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[592] arXiv:2404.16251 (replaced) [pdf, ps, other]: Title: Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[593] arXiv:2404.16296 (replaced) [pdf, ps, other]: Title: Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

Authors: Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[594] arXiv:2404.16305 (replaced) [pdf, other]: Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Authors: Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[595] arXiv:2404.16347 (replaced) [pdf, other]: Title: Enhancing Arterial Blood Flow Simulations through Physics-Informed Neural Networks

Authors: Shivam Bhargava, Nagaiah Chamakuri

Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
[596] arXiv:2404.16431 (replaced) [pdf, ps, other]: Title: Secure Coded Distributed Computing

Authors: Shanuja Sasi, Onur Günlü

Comments: 6 pages

Subjects: Information Theory (cs.IT)
[597] arXiv:2404.16461 (replaced) [pdf, other]: Title: Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

Authors: Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin

Subjects: Computation and Language (cs.CL)
[598] arXiv:2404.16573 (replaced) [pdf, other]: Title: Multi-Scale Representations by Varying Window Attention for Semantic Segmentation

Authors: Haotian Yan, Ming Wu, Chuang Zhang

Comments: ICLR2024 Poster

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599] arXiv:2404.16663 (replaced) [pdf, other]: Title: Formal Specification, Assessment, and Enforcement of Fairness for Generative AIs

Authors: Chih-Hong Cheng, Changshun Wu, Harald Ruess, Xingyu Zhao, Saddek Bensalem

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
[600] arXiv:2404.16743 (replaced) [pdf, other]: Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation

Authors: Chanho Park, Mingjie Chen, Thomas Hain

Comments: Accepted to LREC-COLING 2024 (long)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[601] arXiv:2404.16768 (replaced) [pdf, ps, other]: Title: Redefining Safety for Autonomous Vehicles

Authors: Philip Koopman, William Widen

Comments: 18 pages, SafeComp 2024 draft preprint

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[602] arXiv:2404.16798 (replaced) [pdf, other]: Title: A Test Problem for Flow Codes

Authors: Henry von Wahl, L. Ridgway Scott

Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
[603] arXiv:2404.16811 (replaced) [pdf, other]: Title: Make Your LLM Fully Utilize the Context

Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

Comments: 19 pages, 7 figures, 3 tables, 9 examples

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
