Computer Science
- [1] arXiv:2405.08007 [pdf, ps, other]
-
Title: People cannot distinguish GPT-4 from a human in a Turing testComments: 23 pages, 13 figuresSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
We evaluated 3 systems (ELIZA, GPT-3.5 and GPT-4) in a randomized, controlled, and preregistered Turing test. Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). The results provide the first robust empirical demonstration that any artificial system passes an interactive 2-player Turing test. The results have implications for debates around machine intelligence and, more urgently, suggest that deception by current AI systems may go undetected. Analysis of participants' strategies and reasoning suggests that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence.
- [2] arXiv:2405.08008 [pdf, ps, html, other]
-
Title: Iris: An AI-Driven Virtual Tutor For Computer Science EducationComments: To be published in Proceedings of the 2024 Innovation and Technology in Computer Science Education V. 1 (ITiCSE 2024), July 8--10, 2024, Milan, ItalySubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Integrating AI-driven tools in higher education is an emerging area with transformative potential. This paper introduces Iris, a chat-based virtual tutor integrated into the interactive learning platform Artemis that offers personalized, context-aware assistance in large-scale educational settings. Iris supports computer science students by guiding them through programming exercises and is designed to act as a tutor in a didactically meaningful way. Its calibrated assistance avoids revealing complete solutions, offering subtle hints or counter-questions to foster independent problem-solving skills. For each question, it issues multiple prompts in a Chain-of-Thought to GPT-3.5-Turbo. The prompts include a tutor role description and examples of meaningful answers through few-shot learning. Iris employs contextual awareness by accessing the problem statement, student code, and automated feedback to provide tailored advice.
An empirical evaluation shows that students perceive Iris as effective because it understands their questions, provides relevant support, and contributes to the learning process. While students consider Iris a valuable tool for programming exercises and homework, they also feel confident solving programming tasks in computer-based exams without Iris. The findings underscore students' appreciation for Iris' immediate and personalized support, though students predominantly view it as a complement to, rather than a replacement for, human tutors. Nevertheless, Iris creates a space for students to ask questions without being judged by others. - [3] arXiv:2405.08011 [pdf, ps, html, other]
-
Title: A Survey of Large Language Models for GraphsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Graphs are an essential data structure utilized to represent relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive outcomes in graph-centric tasks, such as link prediction and node classification. Despite these advancements, challenges like data sparsity and limited generalization capabilities continue to persist. Recently, Large Language Models (LLMs) have gained attention in natural language processing. They excel in language comprehension and summarization. Integrating LLMs with graph learning techniques has attracted interest as a way to enhance performance in graph learning tasks. In this survey, we conduct an in-depth review of the latest state-of-the-art LLMs applied in graph learning and introduce a novel taxonomy to categorize existing methods based on their framework design. We detail four unique designs: i) GNNs as Prefix, ii) LLMs as Prefix, iii) LLMs-Graphs Integration, and iv) LLMs-Only, highlighting key methodologies within each category. We explore the strengths and limitations of each framework, and emphasize potential avenues for future research, including overcoming current integration challenges between LLMs and graph learning techniques, and venturing into new application areas. This survey aims to serve as a valuable resource for researchers and practitioners eager to leverage large language models in graph learning, and to inspire continued progress in this dynamic field. We consistently maintain the related open-source materials at \url{this https URL}.
- [4] arXiv:2405.08013 [pdf, ps, html, other]
-
Title: CTRL: Continuous-Time Representation Learning on Temporal Heterogeneous Information NetworkSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Inductive representation learning on temporal heterogeneous graphs is crucial for scalable deep learning on heterogeneous information networks (HINs) which are time-varying, such as citation networks. However, most existing approaches are not inductive and thus cannot handle new nodes or edges. Moreover, previous temporal graph embedding methods are often trained with the temporal link prediction task to simulate the link formation process of temporal graphs, while ignoring the evolution of high-order topological structures on temporal graphs. To fill these gaps, we propose a Continuous-Time Representation Learning (CTRL) model on temporal HINs. To preserve heterogeneous node features and temporal structures, CTRL integrates three parts in a single layer, they are 1) a \emph{heterogeneous attention} unit that measures the semantic correlation between nodes, 2) a \emph{edge-based Hawkes process} to capture temporal influence between heterogeneous nodes, and 3) \emph{dynamic centrality} that indicates the dynamic importance of a node. We train the CTRL model with a future event (a subgraph) prediction task to capture the evolution of the high-order network structure. Extensive experiments have been conducted on three benchmark datasets. The results demonstrate that our model significantly boosts performance and outperforms various state-of-the-art approaches. Ablation studies are conducted to demonstrate the effectiveness of the model design.
- [5] arXiv:2405.08014 [pdf, ps, html, other]
-
Title: Robot Detection System 1: Front-FollowingComments: paper seriesSubjects: Robotics (cs.RO); Computation and Language (cs.CL)
Front-following is more technically difficult to implement than the other two human following technologies, but front-following technology is more practical and can be applied in more areas to solve more practical problems. Front-following technology has many advantages not found in back-following and side-by-side technologies. In this paper, we will discuss basic and significant principles and general design idea of this technology. Besides, various of novel and special useful methods will be presented and provided. We use enough beautiful figures to display our novel design idea. Our research result is open source in 2018, and this paper is just to expand the research result propagation granularity. Abundant magic design idea are included in this paper, more idea and analyzing can sear and see other paper naming with a start of Robot Design System with Jinwei Lin, the only author of this series papers.
- [6] arXiv:2405.08015 [pdf, ps, html, other]
-
Title: A Methodology-Oriented Study of Catastrophic Forgetting in Incremental Deep Neural NetworksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Human being and different species of animals having the skills to gather, transferring knowledge, processing, fine-tune and generating information throughout their lifetime. The ability of learning throughout their lifespan is referred as continuous learning which is using neurocognition mechanism. Consequently, in real world computational system of incremental learning autonomous agents also needs such continuous learning mechanism which provide retrieval of information and long-term memory consolidation. However, the main challenge in artificial intelligence is that the incremental learning of the autonomous agent when new data confronted. In such scenarios, the main concern is catastrophic forgetting(CF), i.e., while learning the sequentially, neural network underfits the old data when it confronted with new data. To tackle this CF problem many numerous studied have been proposed, however it is very difficult to compare their performance due to dissimilarity in their evaluation mechanism. Here we focus on the comparison of all algorithms which are having similar type of evaluation mechanism. Here we are comparing three types of incremental learning methods: (1) Exemplar based methods, (2) Memory based methods, and (3) Network based method. In this survey paper, methodology oriented study for catastrophic forgetting in incremental deep neural network is addressed. Furthermore, it contains the mathematical overview of impact-full methods which can be help researchers to deal with CF.
- [7] arXiv:2405.08016 [pdf, ps, html, other]
-
Title: Robot Detection System 2: Design of Sensor SystemComments: 6 pages, 2 columns, paper seriesSubjects: Robotics (cs.RO)
Front-following is more technically difficult to implement than the other two human following technologies, but front-following technology is more practical and can be applied in more areas to solve more practical problems. The design of sensors structure is an important part of robot detection system. In this paper, we will discuss basic and significant principles and general design idea of sensor system design of robot detction system. Besides, various of novel and special useful methods will be presented and provided. We use enough beautiful figures to display our novel design idea. Our research result is open source in 2018, and this paper is just to expand the research result propagation granularity. Abundant magic design idea are included in this paper, more idea and analyzing can sear and see other paper naming with a start of Robot Design System with Jinwei Lin, the only author of this series papers.
- [8] arXiv:2405.08017 [pdf, ps, other]
-
Title: Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive AnalyticsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In the realm of predictive analytics, the nuanced domain knowledge of investigators often remains underutilized, confined largely to subjective interpretations and ad hoc decision-making. This paper explores the potential of Large Language Models (LLMs) to bridge this gap by systematically converting investigator-derived insights into quantifiable, actionable features that enhance model performance. We present a framework that leverages LLMs' natural language understanding capabilities to encode these red flags into a structured feature set that can be readily integrated into existing predictive models. Through a series of case studies, we demonstrate how this approach not only preserves the critical human expertise within the investigative process but also scales the impact of this knowledge across various prediction tasks. The results indicate significant improvements in risk assessment and decision-making accuracy, highlighting the value of blending human experiential knowledge with advanced machine learning techniques. This study paves the way for more sophisticated, knowledge-driven analytics in fields where expert insight is paramount.
- [9] arXiv:2405.08019 [pdf, ps, html, other]
-
Title: AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss WeightingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Knowledge distillation, a widely used model compression technique, works on the basis of transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task specific and knowledge distillation losses with a weight assigned to them. Despite these weights playing a crucial role in the performance of the distillation process, current methods provide equal weight to both losses, leading to suboptimal performance. In this paper, we propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning to adaptively weigh the losses at instance level. This technique goes by the notion that sample difficulty increases with teacher loss. Our method follows a plug-and-play paradigm that can be applied on top of any task-specific and distillation objectives. Experiments show that our method performs better than conventional knowledge distillation method and existing instance-level loss functions.
- [10] arXiv:2405.08020 [pdf, ps, html, other]
-
Title: ReActXGB: A Hybrid Binary Convolutional Neural Network Architecture for Improved Performance and Computational EfficiencyComments: Accepted to ICCE-TW 2024Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Binary convolutional neural networks (BCNNs) provide a potential solution to reduce the memory requirements and computational costs associated with deep neural networks (DNNs). However, achieving a trade-off between performance and computational resources remains a significant challenge. Furthermore, the fully connected layer of BCNNs has evolved into a significant computational bottleneck. This is mainly due to the conventional practice of excluding the input layer and fully connected layer from binarization to prevent a substantial loss in accuracy. In this paper, we propose a hybrid model named ReActXGB, where we replace the fully convolutional layer of ReActNet-A with XGBoost. This modification targets to narrow the performance gap between BCNNs and real-valued networks while maintaining lower computational costs. Experimental results on the FashionMNIST benchmark demonstrate that ReActXGB outperforms ReActNet-A by 1.47% in top-1 accuracy, along with a reduction of 7.14% in floating-point operations (FLOPs) and 1.02% in model size.
- [11] arXiv:2405.08021 [pdf, ps, html, other]
-
Title: Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech ConversionComments: Accepted by EMBC 2024Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness. In this work, we propose Diff-ETS, an ETS model which uses a score-based diffusion probabilistic model to enhance the naturalness of synthesised speech. The diffusion model is applied to improve the quality of the acoustic features predicted by an EMG encoder. In our experiments, we evaluated fine-tuning the diffusion model on predictions of a pre-trained EMG encoder, and training both models in an end-to-end fashion. We compared Diff-ETS with a baseline ETS model without diffusion using objective metrics and a listening test. The results indicated the proposed Diff-ETS significantly improved speech naturalness over the baseline.
- [12] arXiv:2405.08022 [pdf, ps, html, other]
-
Title: Robot Detection System 3: LRF groups and Coordinate SystemComments: 6 pages, 2 clomuns, 7 figuresSubjects: Robotics (cs.RO)
Front-following is more technically difficult to implement than the other two human following technologies, but front-following technology is more practical and can be applied in more areas to solve more practical problems. In this paper, we will analyze the detailed design of LRF groups, the structure and combination design of coordinate system of Robot Detection System. We use enough beautiful figures to display our novel design idea. Our research result is open source in 2018, and this paper is just to expand the research result propagation granularity. Abundant magic design idea are included in this paper, more idea and analyzing can sear and see other paper naming with a start of Robot Design System with Jinwei Lin, the only author of this series papers.
- [13] arXiv:2405.08025 [pdf, ps, html, other]
-
Title: A Data-Mining Based Study of Security Vulnerability Types and Their Mitigation in Different LanguagesSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
The number of people accessing online services is increasing day by day, and with new users, comes a greater need for effective and responsive cyber-security. Our goal in this study was to find out if there are common patterns within the most widely used programming languages in terms of security issues and fixes. In this paper, we showcase some statistics based on the data we extracted for these languages. Analyzing the more popular ones, we found that the same security issues might appear differently in different languages, and as such the provided solutions may vary just as much. We also found that projects with similar sizes can produce extremely different results, and have different common weaknesses, even if they provide a solution to the same task. These statistics may not be entirely indicative of the projects' standards when it comes to security, but they provide a good reference point of what one should expect. Given a larger sample size they could be made even more precise, and as such a better understanding of the security relevant activities within the projects written in given languages could be achieved.
- [14] arXiv:2405.08026 [pdf, ps, html, other]
-
Title: ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability AnalysisSubjects: Machine Learning (cs.LG)
SMS, or short messaging service, is a widely used and cost-effective communication medium that has sadly turned into a haven for unwanted messages, commonly known as SMS spam. With the rapid adoption of smartphones and Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have taken notice of the significance of SMS for mobile phone users. Consequently, with the emergence of new cybersecurity threats, the number of SMS spam has expanded significantly in recent years. The unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully fight spam attacks in the cybersecurity domain. In this work, we employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection. We use a benchmark SMS spam dataset for this spam detection and utilize several preprocessing techniques to get clean and noise-free data and solve the class imbalance problem using the text augmentation technique. The overall experiment showed that our optimized fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant model RoBERTa obtained high accuracy with 99.84\%. We also work with Explainable Artificial Intelligence (XAI) techniques to calculate the positive and negative coefficient scores which explore and explain the fine-tuned model transparency in this text-based spam SMS detection task. In addition, traditional Machine Learning (ML) models were also examined to compare their performance with the transformer-based models. This analysis describes how LLMs can make a good impact on complex textual-based spam data in the cybersecurity field.
- [15] arXiv:2405.08027 [pdf, ps, html, other]
-
Title: Automating Data Annotation under Strategic Human Agents: Risks and Potential SolutionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Last, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.
- [16] arXiv:2405.08029 [pdf, ps, html, other]
-
Title: PHUDGE: Phi-3 as Scalable JudgeSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper cum technical report, we present PHUDGE A fine tuned Phi3 model that achieved SOTA results in 4 tasks as Feedback Test, Feedback OOD, MT Human, Preference Test surpassing each and every existing model in latency and throughput. It shows very strong correlation not only with GPT4 but with Human annotators too in unseen data as well as in both absolute and relative grading tasks. We have not only addressed the usage of small LMs for cost effective production grade systems but have also shown that Causal modelling is not only slow in nature but sometimes it can hinder models learning capabilities and should be replaced by simpler tasks whenever we can to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation and re purposing the problem itself, we can even beat 10x bigger models even with lesser training data. To the best of our knowledge, we are re the first one to experiment and showcase the usage of generalised version of Earth Movers Distance AKA Wasserstein distance by using Minkowski Distance with a penalty to control loss smoothing and can be used as a loss function instead of Cross Entropy to get stable training and better results for grading tasks.
- [17] arXiv:2405.08031 [pdf, ps, other]
-
Title: HGTDR: Advancing Drug Repurposing with Heterogeneous Graph TransformersComments: Accepted for Publication in Bioinformatics (11-Feb-2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, thus far, the proposed drug repurposing approaches still need to meet expectations. Therefore, it is crucial to offer a systematic approach for drug repurposing to achieve cost savings and enhance human lives. In recent years, using biological network-based methods for drug repurposing has generated promising results. Nevertheless, these methods have limitations. Primarily, the scope of these methods is generally limited concerning the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which needs to be addressed or converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge in certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug re-purposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demon-strated HGTDR's capability to predict other types of relations through numerical and experimental validation, such as drug-protein and disease-protein inter-relations.
- [18] arXiv:2405.08032 [pdf, ps, other]
-
Title: Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model DesignComments: 29 pages, 3 figures, 1 tableSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
ChatGPT, the AI-powered chatbot with a massive user base of hundreds of millions, has become a global phenomenon. However, the use of Conversational AI Systems (CAISs) like ChatGPT for research in the field of Social Simulation is still limited. Specifically, there is no evidence of its usage in Agent-Based Social Simulation (ABSS) model design. While scepticism towards anything new is inherent to human nature, we firmly believe it is imperative to initiate the use of this innovative technology to support ABSS model design. This paper presents a proof-of-concept that demonstrates how CAISs can facilitate the development of innovative conceptual ABSS models in a concise timeframe and with minimal required upfront case-based knowledge. By employing advanced prompt engineering techniques and adhering to the Engineering ABSS framework, we have constructed a comprehensive prompt script that enables the design of ABSS models with or by the CAIS. The effectiveness of the script is demonstrated through an illustrative case study concerning the use of adaptive architecture in museums. Despite occasional inaccuracies and divergences in conversation, the CAIS proved to be a valuable companion for ABSS modellers.
- [19] arXiv:2405.08033 [pdf, ps, other]
-
Title: Predicting Ship Responses in Different Seaways using a Generalizable Force Correcting Machine Learning MethodSubjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)
A machine learning (ML) method is generalizable if it can make predictions on inputs which differ from the training dataset. For predictions of wave-induced ship responses, generalizability is an important consideration if ML methods are to be useful in design evaluations. Furthermore, the size of the training dataset has a significant impact on the practicality of a method, especially when training data is generated using high-fidelity numerical tools which are expensive. This paper considers a hybrid machine learning method which corrects the force in a low-fidelity equation of motion. The method is applied to two different case studies: the nonlinear responses of a Duffing equation subject to irregular excitation, and high-fidelity heave and pitch response data of a Fast Displacement Ship (FDS) in head seas. The generalizability of the method is determined in both cases by making predictions of the response in irregular wave conditions that differ from those in the training dataset. The influence that low-fidelity physics-based terms in the hybrid model have on generalizability is also investigated. The predictions are compared to two benchmarks: a linear physics-based model and a data-driven LSTM model. It is found that the hybrid method offers an improvement in prediction accuracy and generalizability when trained on a small dataset.
- [20] arXiv:2405.08034 [pdf, ps, other]
-
Title: Fighter flight trajectory prediction based on spatio-temporal graphcial attention networkYao Sun (1), Tengyu Jing (2), Jiapeng Wang (2), Wei Wang (2) ((1) School of Aeronautical Engineering, Air Force Engineering University, Xi'an, China,(2) School of Information and Communication Engineering, Xidian University, Xi'an, China)Subjects: Machine Learning (cs.LG)
Quickly and accurately predicting the flight trajectory of a blue army fighter in close-range air combat helps a red army fighter gain a dominant situation, which is the winning factor in later air combat. However,due to the high speed and even hypersonic capabilities of advanced fighters, the diversity of tactical maneuvers,and the instantaneous nature of situational transitions,it is difficult to meet the requirements of practical combat applications in terms of prediction this http URL improve prediction accuracy,this paper proposes a spatio-temporal graph attention network (ST-GAT) using encoding and decoding structures to predict the flight trajectory. The encoder adopts a parallel structure of Transformer and GAT branches embedded with the multi-head self-attention mechanism in each front end. The Transformer branch network is used to extract the temporal characteristics of historical trajectories and capture the impact of the fighter's historical state on future trajectories, while the GAT branch network is used to extract spatial features in historical trajectories and capture potential spatial correlations between fighters.Then we concatenate the outputs of the two branches into a new feature vector and input it into a decoder composed of a fully connected network to predict the future position coordinates of the blue army fighter.The computer simulation results show that the proposed network significantly improves the prediction accuracy of flight trajectories compared to the enhanced CNN-LSTM network (ECNN-LSTM), with improvements of 47% and 34% in both ADE and FDE indicators,providing strong support for subsequent autonomous combat missions.
- [21] arXiv:2405.08035 [pdf, ps, html, other]
-
Title: A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender SystemsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic and trustworthy. The emergence of Large Language Models (LLMs) has marked the onset of a new epoch in computational capabilities, exhibiting human-level intelligence in various tasks. Research efforts have been made to utilize LLMs for building user simulators to evaluate the performance of CRS. Although these efforts showcase innovation, they are accompanied by certain limitations. In this work, we introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators across various stages via a plugin manager. CSHI customizes the simulation of user behavior and interactions to provide a more lifelike and convincing user interaction experience. Through experiments and case studies in two conversational recommendation scenarios, we show that our framework can adapt to a variety of conversational recommendation settings and effectively simulate users' personalized preferences. Consequently, our simulator is able to generate feedback that closely mirrors that of real users. This facilitates a reliable assessment of existing CRS studies and promotes the creation of high-quality conversational recommendation datasets.
- [22] arXiv:2405.08036 [pdf, ps, html, other]
-
Title: POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal joint actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
- [23] arXiv:2405.08037 [pdf, ps, html, other]
-
Title: Layout Generation Agents with Large Language ModelsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
In recent years, there has been an increasing demand for customizable 3D virtual spaces. Due to the significant human effort required to create these virtual spaces, there is a need for efficiency in virtual space creation. While existing studies have proposed methods for automatically generating layouts such as floor plans and furniture arrangements, these methods only generate text indicating the layout structure based on user instructions, without utilizing the information obtained during the generation process. In this study, we propose an agent-driven layout generation system using the GPT-4V multimodal large language model and validate its effectiveness. Specifically, the language model manipulates agents to sequentially place objects in the virtual space, thus generating layouts that reflect user instructions. Experimental results confirm that our proposed method can generate virtual spaces reflecting user instructions with a high success rate. Additionally, we successfully identified elements contributing to the improvement in behavior generation performance through ablation study.
- [24] arXiv:2405.08038 [pdf, ps, other]
-
Title: Feature Expansion and enhanced Compression for Class Incremental LearningQuentin Ferdinand (ENSTA Bretagne, Lab-STICC\_MATRIX), Gilles Le Chenadec (ENSTA Bretagne, Lab-STICC\_MATRIX), Benoit Clement (CROSSING, ENSTA Bretagne, Lab-STICC\_MATRIX), Panagiotis Papadakis (Lab-STICC\_RAMBO, IMT Atlantique - INFO), Quentin OliveauSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Class incremental learning consists in training discriminative models to classify an increasing number of classes over time. However, doing so using only the newly added class data leads to the known problem of catastrophic forgetting of the previous classes. Recently, dynamic deep learning architectures have been shown to exhibit a better stability-plasticity trade-off by dynamically adding new feature extractors to the model in order to learn new classes followed by a compression step to scale the model back to its original size, thus avoiding a growing number of parameters. In this context, we propose a new algorithm that enhances the compression of previous class knowledge by cutting and mixing patches of previous class samples with the new images during compression using our Rehearsal-CutMix method. We show that this new data augmentation reduces catastrophic forgetting by specifically targeting past class information and improving its compression. Extensive experiments performed on the CIFAR and ImageNet datasets under diverse incremental learning evaluation protocols demonstrate that our approach consistently outperforms the state-of-the-art . The code will be made available upon publication of our work.
- [25] arXiv:2405.08039 [pdf, ps, other]
-
Title: Vehicles Swarm Intelligence: Cooperation in both Longitudinal and Lateral DimensionsSubjects: Robotics (cs.RO)
Longitudinal-only platooning methods are facing great challenges on running mobility, since they may be impeded by slow-moving vehicles from time to time. To address this issue, this paper proposes a vehicles swarming method coupled both longitudinal and lateral cooperation. The proposed method bears the following contributions: i) enhancing driving mobility by swarming like a bee colony; ii) ensuring the success rate of overtaking; iii) cruising as a string of platoon to preserve sustainability. Evaluations indicate that the proposed method is capable of maneuvering a vehicle swarm to overtake slow-moving vehicles safely and successfully. The proposed method is confirmed to improve running mobility by 12.04%. Swarming safety is ensured by a safe following distance. The proposed method's influence on traffic is limited within five upstream vehicles.
- [26] arXiv:2405.08041 [pdf, ps, other]
-
Title: DeepFMEA -- A Scalable Framework Harmonizing Process Expertise and Data-Driven PHMComments: 11 pages, 6 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Machine Learning (ML) based prognostics and health monitoring (PHM) tools provide new opportunities for manufacturers to operate and maintain their equipment in a risk-optimized manner and utilize it more sustainably along its lifecycle. Yet, in most industrial settings, data is often limited in quantity, and its quality can be inconsistent - both critical for developing and operating reliable ML models. To bridge this gap in practice, successfully industrialized PHM tools rely on the introduction of domain expertise as a prior, to enable sufficiently accurate predictions, while enhancing their interpretability.
Thus, a key challenge while developing data-driven PHM tools involves translating the experience and process knowledge of maintenance personnel, development, and service engineers into a data structure. This structure must not only capture the diversity and variability of the expertise but also render this knowledge accessible for various data-driven algorithms. This results in data models that are heavily tailored towards a specific application and the failure modes the development team aims to detect or predict. The lack of a standardized approach limits developments' extensibility to new failure modes, their transferability to new applications, and it inhibits the utilization of standard data management and MLOps tools, increasing the burden on the development team.
DeepFMEA draws inspiration from the Failure Mode and Effects Analysis (FMEA) in its structured approach to the analysis of any technical system and the resulting standardized data model, while considering aspects that are crucial to capturing process and maintenance expertise in a way that is both intuitive to domain experts and the resulting information can be introduced as priors to ML algorithms. - [27] arXiv:2405.08042 [pdf, ps, html, other]
-
Title: LLAniMAtion: LLAMA Driven Gesture AnimationSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism, and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content that is encoded in the audio signal. In this paper we instead experiment with using LLM features for gesture generation that are extracted from text using LLAMA2. We compare against audio features, and explore combining the two modalities in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features and that including both modalities yields no significant difference to using LLAMA2 features in isolation. We demonstrate that the LLAMA2 based model can generate both beat and semantic gestures without any audio input, suggesting LLMs can provide rich encodings that are well suited for gesture generation.
- [28] arXiv:2405.08043 [pdf, ps, html, other]
-
Title: HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data SynthesizationSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key difficulties inherent in learning human mobility data under differential privacy. In response to these challenges, HRNet integrates three components: a hierarchical location encoding mechanism, multi-task learning across multiple resolutions, and private pre-training. These elements collectively enhance the model's ability under the constraints of differential privacy. Through extensive comparative experiments utilizing a real-world dataset, HRNet demonstrates a marked improvement over existing methods in balancing the utility-privacy trade-off.
- [29] arXiv:2405.08044 [pdf, ps, html, other]
-
Title: Mitigating federated learning contribution allocation instability through randomized aggregationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Federated learning (FL) is a novel collaborative machine learning framework designed to preserve privacy while enabling the creation of robust models. This paradigm addresses a growing need for data security by allowing multiple participants to contribute to a model without exposing their individual datasets. A pivotal issue within this framework, however, concerns the fair and accurate attribution of contributions from various participants to the creation of the joint global model. Incorrect contribution distribution can erode trust among participants, result in inequitable compensation, and ultimately diminish the willingness of parties to engage or actively contribute to the federation. While several methods for remunerating participants have been proposed, little attention was given to the analysis of the stability of these methods when evaluating contributions, which is critical to ensure the long-term viability and fairness of FL systems. In this paper, we analyse this stability through the calculation of contributions by gradient-based model reconstruction techniques with Shapley values. Our investigation reveals that Shapley values fail to reflect baseline contributions, especially when employing different aggregation techniques. To address this issue, we extend on established aggregation techniques by introducing FedRandom, which is designed to sample contributions in a more equitable and distributed manner. We demonstrate that this approach not only serves as a viable aggregation technique but also significantly improves the accuracy of contribution assessment compared to traditional methods. Our results suggest that FedRandom enhances the overall fairness and stability of the federated learning system, making it a superior choice for federations with limited number of participants.
- [30] arXiv:2405.08051 [pdf, ps, html, other]
-
Title: P=NPSubjects: Computational Complexity (cs.CC)
This paper investigates an extremely classic NP-complete problem: How to determine if a graph G, where each vertex has a degree of at most 4, can be 3-colorable(The research in this paper focuses on graphs G that satisfy the condition where the degree of each vertex does not exceed 4. To conserve space, it is assumed throughout the paper that graph G meets this condition by default.). The author has meticulously observed the relationship between the coloring problem and semidefinite programming, and has creatively constructed the corresponding semidefinite programming problem R(G) for a given graph G. The construction method of R(G) refers to Theorem 1.1 in the paper. I have obtained and proven the conclusion: A graph G is 3-colorable if and only if the objective function of its corresponding optimization problem R(G) is bounded, and when the objective function is bounded, its minimum value is 0.
- [31] arXiv:2405.08054 [pdf, ps, html, other]
-
Title: Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided ConditioningComments: Project webpage: this https URLSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the prominent development of generative techniques, we now can easily utilize 2D diffusion methods to synthesize images controlled by raw sketch or designated human poses, and even progressively edit/regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tasks are still unavailable due to the lack of controllability and efficiency in 3D generation. In this paper, we present a novel controllable and interactive 3D assets modeling framework, named Coin3D. Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes, and introduces an interactive generation workflow to support seamless local part editing while delivering responsive 3D object previewing within a few seconds. To this end, we develop several techniques, including the 3D adapter that applies volumetric coarse shape control to the diffusion model, proxy-bounded editing strategy for precise part editing, progressive volume cache to support responsive preview, and volume-SDS to ensure consistent mesh reconstruction. Extensive experiments of interactive generation and editing on diverse shape proxies demonstrate that our method achieves superior controllability and flexibility in the 3D assets generation task.
- [32] arXiv:2405.08055 [pdf, ps, html, other]
-
Title: DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D GenerationComments: arXiv admin note: substantial text overlap with arXiv:2309.07920Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generating diverse and high-quality 3D assets automatically poses a fundamental yet challenging task in 3D computer vision. Despite extensive efforts in 3D generation, existing optimization-based approaches struggle to produce large-scale 3D assets efficiently. Meanwhile, feed-forward methods often focus on generating only a single category or a few categories, limiting their generalizability. Therefore, we introduce a diffusion-based feed-forward framework to address these challenges with a single model. To handle the large diversity and complexity in geometry and texture across categories efficiently, we 1) adopt improved triplane to guarantee efficiency; 2) introduce the 3D-aware transformer to aggregate the generalized 3D knowledge with specialized 3D features; and 3) devise the 3D-aware encoder/decoder to enhance the generalized 3D knowledge. Building upon our 3D-aware Diffusion model with TransFormer, DiffTF, we propose a stronger version for 3D generation, i.e., DiffTF++. It boils down to two parts: multi-view reconstruction loss and triplane refinement. Specifically, we utilize multi-view reconstruction loss to fine-tune the diffusion model and triplane decoder, thereby avoiding the negative influence caused by reconstruction errors and improving texture synthesis. By eliminating the mismatch between the two stages, the generative performance is enhanced, especially in texture. Additionally, a 3D-aware refinement process is introduced to filter out artifacts and refine triplanes, resulting in the generation of more intricate and reasonable details. Extensive experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules and the state-of-the-art 3D object generation performance with large diversity, rich semantics, and high quality.
- [33] arXiv:2405.08084 [pdf, ps, other]
-
Title: PrivFED -- A Framework for Privacy-Preserving Federated Learning in Enhanced Breast Cancer DiagnosisComments: Presented in ICIITB 2024 organized by Modern College of Business and Science, OmanSubjects: Cryptography and Security (cs.CR)
In the day-to-day operations of healthcare institutions, a multitude of Personally Identifiable Information (PII) data exchanges occur, exposing the data to a spectrum of cybersecurity threats. This study introduces a federated learning framework, trained on the Wisconsin dataset, to mitigate challenges such as data scarcity and imbalance. Techniques like the Synthetic Minority Over-sampling Technique (SMOTE) are incorporated to bolster robustness, while isolation forests are employed to fortify the model against outliers. Catboost serves as the classification tool across all devices. The identification of optimal features for heightened accuracy is pursued through Principal Component Analysis (PCA),accentuating the significance of hyperparameter tuning, as underscored in a comparative analysis. The model exhibits an average accuracy of 99.95% on edge devices and 98% on the central server.
- [34] arXiv:2405.08092 [pdf, ps, html, other]
-
Title: A Flexible MATLAB/Simulink Simulator for Robotic Floating-base Systems in Contact with the Ground: Theoretical background and Implementation DetailsComments: 17 pages, 4 figures, 1 table, equal contribution by authors Nuno Guedelha and Venus Pasandi. arXiv admin note: substantial text overlap with arXiv:2211.09716Journal-ref: International Journal of Semantic Computing 0 Ahead of Print (2024) 1-17Subjects: Robotics (cs.RO)
This paper presents an open-source MATLAB/Simulink physics simulator for rigid-body articulated systems, including manipulators and floating-base robots. Thanks to MATLAB/Simulink features like MATLAB system classes and Simulink function blocks, the presented simulator combines a programmatic and block-based approach, resulting in a flexible design in the sense that different parts, including its physics engine, robot-ground interaction model, and state evolution algorithm are simply accessible and editable. Moreover, through the use of Simulink dynamic mask blocks, the proposed simulator supports robot models integrating open-chain and closed-chain kinematics with any desired number of links interacting with the ground. This simulator can also integrate second-order actuator dynamics. Furthermore, the simulator benefits from a one-line installation and an easy-to-use Simulink interface.
- [35] arXiv:2405.08097 [pdf, ps, html, other]
-
Title: Learning functions on symmetric matrices and point clouds via lightweight invariant featuresComments: 28 pages, 2 figures, 2 tablesSubjects: Machine Learning (cs.LG); Commutative Algebra (math.AC)
In this work, we present a mathematical formulation for machine learning of (1) functions on symmetric matrices that are invariant with respect to the action of permutations by conjugation, and (2) functions on point clouds that are invariant with respect to rotations, reflections, and permutations of the points. To achieve this, we construct $O(n^2)$ invariant features derived from generators for the field of rational functions on $n\times n$ symmetric matrices that are invariant under joint permutations of rows and columns. We show that these invariant features can separate all distinct orbits of symmetric matrices except for a measure zero set; such features can be used to universally approximate invariant functions on almost all weighted graphs. For point clouds in a fixed dimension, we prove that the number of invariant features can be reduced, generically without losing expressivity, to $O(n)$, where $n$ is the number of points. We combine these invariant features with DeepSets to learn functions on symmetric matrices and point clouds with varying sizes. We empirically demonstrate the feasibility of our approach on molecule property regression and point cloud distance prediction.
- [36] arXiv:2405.08099 [pdf, ps, html, other]
-
Title: KET-QA: A Dataset for Knowledge Enhanced Table Question AnsweringComments: LREC-Coling 2024Subjects: Computation and Language (cs.CL)
Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to address the issue of external knowledge in TableQA or only utilize unstructured text as supplementary information for tables. In this paper, we propose to use a knowledge base (KB) as the external knowledge source for TableQA and construct a dataset KET-QA with fine-grained gold evidence annotation. Each table in the dataset corresponds to a sub-graph of the entire KB, and every question requires the integration of information from both the table and the sub-graph to be answered. To extract pertinent information from the vast knowledge sub-graph and apply it to TableQA, we design a retriever-reasoner structured pipeline model. Experimental results demonstrate that our model consistently achieves remarkable relative performance improvements ranging from 1.9 to 6.5 times and absolute improvements of 11.66% to 44.64% on EM scores across three distinct settings (fine-tuning, zero-shot, and few-shot), in comparison with solely relying on table information in the traditional TableQA manner. However, even the best model achieves a 60.23% EM score, which still lags behind the human-level performance, highlighting the challenging nature of KET-QA for the question-answering community. We also provide a human evaluation of error cases to analyze further the aspects in which the model can be improved. Project page: this https URL.
- [37] arXiv:2405.08102 [pdf, ps, html, other]
-
Title: Evaluating Google's Protected Audience ProtocolSubjects: Cryptography and Security (cs.CR)
While third-party cookies have been a key component of the digital marketing ecosystem for years, they allow users to be tracked across web sites in ways that raise serious privacy concerns. Google has proposed the Privacy Sandbox initiative to enable ad targeting without third-party cookies. While there have been several studies focused on other aspects of this initiative, there has been little analysis to date as to how well the system achieves the intended goal of preventing request linking. This work focuses on analyzing linkage privacy risks for the reporting mechanisms proposed in the Protected Audience (PrAu) proposal (previously known as FLEDGE), which is intended to enable online remarketing without using third-party cookies. We summarize the overall workflow of PrAu and highlight potential privacy risks associated with its proposed design, focusing on scenarios in which adversaries attempt to link requests to different sites to the same user. We show how a realistic adversary would be still able to use the privacy-protected reporting mechanisms to link user requests and conduct mass surveillance, even with correct implementations of all the currently proposed privacy mechanisms.
- [38] arXiv:2405.08104 [pdf, ps, other]
-
Title: Separation and Encodability in Mixed Choice Multiparty Sessions (Technical Report)Comments: Technical report of the paper Separation and Encodability in Mixed Choice Multiparty Sessions by Kirstin Peters and Nobuko Yoshida at LICS'24Subjects: Logic in Computer Science (cs.LO)
Multiparty session types (MP) are a type discipline for enforcing the structured, deadlock-free communication of concurrent and message-passing programs. Traditional MP have a limited form of choice in which alternative communication possibilities are offered by a single participant and selected by another. Mixed choice multiparty session types (MCMP) extend the choice construct to include both selections and offers in the same choice. This paper first proposes a general typing system for a mixed choice synchronous multiparty session calculus, and prove type soundness, communication safety, and deadlock-freedom.
Next we compare expressiveness of nine subcalcli of MCMP-calculus by examining their encodability (there exists a good encoding from one to another) and separation (there exists no good encoding from one calculus to another). We prove 8 new encodablity results and 20 new separation results. In summary, MCMP is strictly more expressive than classical multiparty sessions (MP) and mixed choice in mixed sessions. This contrasts earlier results where mixed sessions do not add any expressiveness to non-mixed fundamental sessions, shedding a light on expressiveness of multiparty mixed choice. - [39] arXiv:2405.08111 [pdf, ps, other]
-
Title: Conformalized Physics-Informed Neural NetworksSubjects: Machine Learning (cs.LG)
Physics-informed neural networks (PINNs) are an influential method of solving differential equations and estimating their parameters given data. However, since they make use of neural networks, they provide only a point estimate of differential equation parameters, as well as the solution at any given point, without any measure of uncertainty. Ensemble and Bayesian methods have been previously applied to quantify the uncertainty of PINNs, but these methods may require making strong assumptions on the data-generating process, and can be computationally expensive. Here, we introduce Conformalized PINNs (C-PINNs) that, without making any additional assumptions, utilize the framework of conformal prediction to quantify the uncertainty of PINNs by providing intervals that have finite-sample, distribution-free statistical validity.
- [40] arXiv:2405.08114 [pdf, ps, html, other]
-
Title: RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine TransformationsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GAN to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT and a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RAT to mitigate the characteristic of information forgetting in recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model, Clip, which has been extensively employed for establishing associations between text and images through the learning of multimodal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superiority of the proposed model over current state-of-the-art models. The code is this https URL.
- [41] arXiv:2405.08117 [pdf, ps, html, other]
-
Title: Secret Sharing with Certified DeletionComments: To appear at CRYPTO 2024Subjects: Cryptography and Security (cs.CR)
Secret sharing allows a user to split a secret into many shares so that the secret can be recovered if, and only if, an authorized set of shares is collected. Although secret sharing typically does not require any computational hardness assumptions, its security does require that an adversary cannot collect an authorized set of shares. Over long periods of time where an adversary can benefit from multiple data breaches, this may become an unrealistic assumption.
We initiate the systematic study of secret sharing with certified deletion in order to achieve security even against an adversary that eventually collects an authorized set of shares. In secret sharing with certified deletion, a (classical) secret is split into quantum shares which can be verifiably destroyed. We define two natural notions of security: no-signaling security and adaptive security.
Next, we show how to construct (i) a secret sharing scheme with no-signaling certified deletion for any monotone access structure, and (ii) a threshold secret sharing scheme with adaptive certified deletion. Our first construction uses Bartusek and Khurana's (CRYPTO 2023) 2-out-of-2 secret sharing scheme with certified deletion as a building block, while our second construction is built from scratch and requires several new technical ideas. For example, we significantly generalize the ``XOR extractor'' of Agarwal, Bartusek, Khurana, and Kumar (EUROCRYPT 2023) in order to obtain high rate seedless extraction from certain quantum sources of entropy. - [42] arXiv:2405.08119 [pdf, ps, html, other]
-
Title: GPS-IMU Sensor Fusion for Reliable Autonomous Vehicle Position EstimationComments: 6 pages, 4 figures, and conferenceSubjects: Systems and Control (eess.SY); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Global Positioning System (GPS) navigation provides accurate positioning with global coverage, making it a reliable option in open areas with unobstructed sky views. However, signal degradation may occur in indoor spaces and urban canyons. In contrast, Inertial Measurement Units (IMUs) consist of gyroscopes and accelerometers that offer relative motion information such as acceleration and rotational changes. Unlike GPS, IMUs do not rely on external signals, making them useful in GPS-denied environments. Nonetheless, IMUs suffer from drift over time due to the accumulation of errors while integrating acceleration to determine velocity and position. Therefore, fusing the GPS and IMU is crucial for enhancing the reliability and precision of navigation systems in autonomous vehicles, especially in environments where GPS signals are compromised. To ensure smooth navigation and overcome the limitations of each sensor, the proposed method fuses GPS and IMU data. This sensor fusion uses the Unscented Kalman Filter (UKF) Bayesian filtering technique. The proposed navigation system is designed to be robust, delivering continuous and accurate positioning critical for the safe operation of autonomous vehicles, particularly in GPS-denied environments. This project uses KITTI GNSS and IMU datasets for experimental validation, showing that the GNSS-IMU fusion technique reduces GNSS-only data's RMSE. The RMSE decreased from 13.214, 13.284, and 13.363 to 4.271, 5.275, and 0.224 for the x-axis, y-axis, and z-axis, respectively. The experimental result using UKF shows promising direction in improving autonomous vehicle navigation using GPS and IMU sensor fusion using the best of two sensors in GPS-denied environments.
- [43] arXiv:2405.08120 [pdf, ps, html, other]
-
Title: From Questions to Insightful Answers: Building an Informed Chatbot for University ResourcesSubash Neupane, Elias Hossain, Jason Keith, Himanshu Tripathi, Farbod Ghiasi, Noorbakhsh Amiri Golilarz, Amin Amirlatifi, Sudip Mittal, Shahram RahimiSubjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)
This paper presents BARKPLUG V.2, a Large Language Model (LLM)-based chatbot system built using Retrieval Augmented Generation (RAG) pipelines to enhance the user experience and access to information within academic settings.The objective of BARKPLUG V.2 is to provide information to users about various campus resources, including academic departments, programs, campus facilities, and student resources at a university setting in an interactive fashion. Our system leverages university data as an external data corpus and ingests it into our RAG pipelines for domain-specific question-answering tasks. We evaluate the effectiveness of our system in generating accurate and pertinent responses for Mississippi State University, as a case study, using quantitative measures, employing frameworks such as Retrieval Augmented Generation Assessment(RAGAS). Furthermore, we evaluate the usability of this system via subjective satisfaction surveys using the System Usability Scale (SUS). Our system demonstrates impressive quantitative performance, with a mean RAGAS score of 0.96, and experience, as validated by usability assessments.
- [44] arXiv:2405.08122 [pdf, ps, html, other]
-
Title: Equivariant Deep Learning of Mixed-Integer Optimal Control Solutions for Vehicle Decision Making and Motion PlanningSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Mixed-integer quadratic programs (MIQPs) are a versatile way of formulating vehicle decision making and motion planning problems, where the prediction model is a hybrid dynamical system that involves both discrete and continuous decision variables. However, even the most advanced MIQP solvers can hardly account for the challenging requirements of automotive embedded platforms. Thus, we use machine learning to simplify and hence speed up optimization. Our work builds on recent ideas for solving MIQPs in real-time by training a neural network to predict the optimal values of integer variables and solving the remaining problem by online quadratic programming. Specifically, we propose a recurrent permutation equivariant deep set that is particularly suited for imitating MIQPs that involve many obstacles, which is often the major source of computational burden in motion planning problems. Our framework comprises also a feasibility projector that corrects infeasible predictions of integer variables and considerably increases the likelihood of computing a collision-free trajectory. We evaluate the performance, safety and real-time feasibility of decision-making for autonomous driving using the proposed approach on realistic multi-lane traffic scenarios with interactive agents in SUMO simulations.
- [45] arXiv:2405.08125 [pdf, ps, html, other]
-
Title: AI-Cybersecurity Education Through Designing AI-based Cyberharassment Detection LabEbuka Okpala, Nishant Vishwamitra, Keyan Guo, Song Liao, Long Cheng, Hongxin Hu, Yongkai Wu, Xiaohong Yuan, Jeannette Wade, Sajad KhorsandrooComments: 10 pagesSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyber-harassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning educational materials that engage students in this emerging social cybersecurity in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. Because of this, we were motivated to develop a hands-on lab platform that provided experiential learning experiences to non-computing students with little or no background knowledge in AI and discussed the lessons learned in developing this lab. In this lab used by social science students at North Carolina A&T State University across two semesters (spring and fall) in 2022, students are given a detailed lab manual and are to complete a set of well-detailed tasks. Through this process, students learn AI concepts and the application of AI for cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge or skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.
- [46] arXiv:2405.08131 [pdf, ps, html, other]
-
Title: When factorization meets argumentation: towards argumentative explanationsComments: arXiv admin note: substantial text overlap with arXiv:2310.16157Subjects: Artificial Intelligence (cs.AI)
Factorization-based models have gained popularity since the Netflix challenge {(2007)}. Since that, various factorization-based models have been developed and these models have been proven to be efficient in predicting users' ratings towards items. A major concern is that explaining the recommendations generated by such methods is non-trivial because the explicit meaning of the latent factors they learn are not always clear. In response, we propose a novel model that combines factorization-based methods with argumentation frameworks (AFs). The integration of AFs provides clear meaning at each stage of the model, enabling it to produce easily understandable explanations for its recommendations. In this model, for every user-item interaction, an AF is defined in which the features of items are considered as arguments, and the users' ratings towards these features determine the strength and polarity of these arguments. This perspective allows our model to treat feature attribution as a structured argumentation procedure, where each calculation is marked with explicit meaning, enhancing its inherent interpretability. Additionally, our framework seamlessly incorporates side information, such as user contexts, leading to more accurate predictions. We anticipate at least three practical applications for our model: creating explanation templates, providing interactive explanations, and generating contrastive explanations. Through testing on real-world datasets, we have found that our model, along with its variants, not only surpasses existing argumentation-based methods but also competes effectively with current context-free and context-aware methods.
- [47] arXiv:2405.08134 [pdf, ps, html, other]
-
Title: Many-Shot Regurgitation (MSR) PromptingSubjects: Computation and Language (cs.CL)
We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitation. We apply MSR prompting to diverse text sources, including Wikipedia articles and open educational resources (OER) textbooks, which provide high-quality, factual content and are continuously updated over time. For each source, we curate two dataset types: one that LLMs were likely exposed to during training ($D_{\rm pre}$) and another consisting of documents published after the models' training cutoff dates ($D_{\rm post}$). To quantify the occurrence of verbatim matches, we employ the Longest Common Substring algorithm and count the frequency of matches at different length thresholds. We then use statistical measures such as Cliff's delta, Kolmogorov-Smirnov (KS) distance, and Kruskal-Wallis H test to determine whether the distribution of verbatim matches differs significantly between $D_{\rm pre}$ and $D_{\rm post}$. Our findings reveal a striking difference in the distribution of verbatim matches between $D_{\rm pre}$ and $D_{\rm post}$, with the frequency of verbatim reproduction being significantly higher when LLMs (e.g. GPT models and LLaMAs) are prompted with text from datasets they were likely trained on. For instance, when using GPT-3.5 on Wikipedia articles, we observe a substantial effect size (Cliff's delta $= -0.984$) and a large KS distance ($0.875$) between the distributions of $D_{\rm pre}$ and $D_{\rm post}$. Our results provide compelling evidence that LLMs are more prone to reproducing verbatim content when the input text is likely sourced from their training data.
- [48] arXiv:2405.08135 [pdf, ps, html, other]
-
Title: An Optimal Multilevel Quorum System for Probabilistic ConsensusSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Probability (math.PR)
We present the notion of a multilevel, slashable quorum system, where an application can obtain gradual levels of assurance that a certain value is bound to be decided (or "finalized") in a global consensus procedure, unless a large number of Byzantine processes are exposed to slashing (that is, penalty on staked assets). Our construction is a highly parameterized generalization of quorum systems based on finite projective spaces, with asymptotic high availability and optimal slashing properties. In particular, we show that any quorum system whose ground elements are disjoint subsets of nodes (e.g. "commmittees" in committee-based consensus protocols) has asymptotic high availability under very reasonable conditions, a general proof with significance of its own. Under similarly relaxed conditions, we show that our construction has asymptotically optimal slashing properties with respect to message complexity and process load; this illustrates a fundamental trade off between message complexity, load, and slashing. Our multilevel construction allows nodes to decide how many "levels" of finalization assurance they wish to obtain, noting that this functionality, if applied to a proof-of-stake blockchain, can be seen either as (i) a form of an early, slashing-based, probabilistic block finalization; or (ii) a service for reorg tolerance.
- [49] arXiv:2405.08142 [pdf, ps, other]
-
Title: Discursive objection strategies in online comments: Developing a classification schema and validating its trainingAshley L. Shea, Aspen K.B. Omapang, Ji Yong Cho, Miryam Y. Ginsparg, Natalie Bazarova, Winice Hui, René F. Kizilcec, Chau Tong, Drew MargolinComments: This paper was accepted and presented at the 73rd Annual International Communication Association International Conference, May 2023Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Most Americans agree that misinformation, hate speech and harassment are harmful and inadequately curbed on social media through current moderation practices. In this paper, we aim to understand the discursive strategies employed by people in response to harmful speech in news comments. We conducted a content analysis of more than 6500 comment replies to trending news videos on YouTube and Twitter and identified seven distinct discursive objection strategies (Study 1). We examined the frequency of each strategy's occurrence from the 6500 comment replies, as well as from a second sample of 2004 replies (Study 2). Together, these studies show that people deploy a diversity of discursive strategies when objecting to speech, and reputational attacks are the most common. The resulting classification scheme accounts for different theoretical approaches for expressing objections and offers a comprehensive perspective on grassroots efforts aimed at stopping offensive or problematic speech on campus.
- [50] arXiv:2405.08150 [pdf, ps, html, other]
-
Title: cVIL: Class-Centric Visual Interactive LabelingSubjects: Human-Computer Interaction (cs.HC)
We present cVIL, a class-centric approach to visual interactive labeling, which facilitates human annotation of large and complex image data sets. cVIL uses different property measures to support instance labeling for labeling difficult instances and batch labeling to quickly label easy instances. Simulated experiments reveal that cVIL with batch labeling can outperform traditional labeling approaches based on active learning. In a user study, cVIL led to better accuracy and higher user preference compared to a traditional instance-based visual interactive labeling approach based on 2D scatterplots.
- [51] arXiv:2405.08151 [pdf, ps, html, other]
-
Title: Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-AwarenessSubjects: Computation and Language (cs.CL)
Large language models (LLM) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging the demonstration within the input context to adapt to new tasks. However, LLM is sensitive to the selection of demonstrations. To address the hallucination issue inherent in LLM, retrieval-augmented LLM (RAL) offers a solution by retrieving pertinent information from an established database. Nonetheless, existing research work lacks rigorous evaluation of the impact of retrieval-augmented large language models on different biomedical NLP tasks. This deficiency makes it challenging to ascertain the capabilities of RAL within the biomedical domain. Moreover, the outputs from RAL are affected by retrieving the unlabeled, counterfactual, or diverse knowledge that is not well studied in the biomedical domain. However, such knowledge is common in the real world. Finally, exploring the self-awareness ability is also crucial for the RAL system. So, in this paper, we systematically investigate the impact of RALs on 5 different biomedical tasks (triple extraction, link prediction, classification, question answering, and natural language inference). We analyze the performance of RALs in four fundamental abilities, including unlabeled robustness, counterfactual robustness, diverse robustness, and negative awareness. To this end, we proposed an evaluation framework to assess the RALs' performance on different biomedical NLP tasks and establish four different testbeds based on the aforementioned fundamental abilities. Then, we evaluate 3 representative LLMs with 3 different retrievers on 5 tasks over 9 datasets.
- [52] arXiv:2405.08154 [pdf, ps, html, other]
-
Title: LLM Theory of Mind and Alignment: Opportunities and RisksJournal-ref: Proceedings of Workshop on Theory of Mind in Human-AI Interaction at CHI 2024 (ToMinHAI at CHI 2024)Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Large language models (LLMs) are transforming human-computer interaction and conceptions of artificial intelligence (AI) with their impressive capacities for conversing and reasoning in natural language. There is growing interest in whether LLMs have theory of mind (ToM); the ability to reason about the mental and emotional states of others that is core to human social intelligence. As LLMs are integrated into the fabric of our personal, professional and social lives and given greater agency to make decisions with real-world consequences, there is a critical need to understand how they can be aligned with human values. ToM seems to be a promising direction of inquiry in this regard. Following the literature on the role and impacts of human ToM, this paper identifies key areas in which LLM ToM will show up in human:LLM interactions at individual and group levels, and what opportunities and risks for alignment are raised in each. On the individual level, the paper considers how LLM ToM might manifest in goal specification, conversational adaptation, empathy and anthropomorphism. On the group level, it considers how LLM ToM might facilitate collective alignment, cooperation or competition, and moral judgement-making. The paper lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.
- [53] arXiv:2405.08171 [pdf, ps, html, other]
-
Title: Finite-valued Streaming String TransducersSubjects: Formal Languages and Automata Theory (cs.FL)
A transducer is finite-valued if for some bound k, it maps any given input to at most k outputs. For classical, one-way transducers, it is known since the 80s that finite valuedness entails decidability of the equivalence problem. This decidability result is in contrast to the general case, which makes finite-valued transducers very attractive. For classical transducers, it is also known that finite valuedness is decidable and that any k-valued finite transducer can be decomposed as a union of k single-valued finite transducers.
In this paper, we extend the above results to copyless streaming string transducers (SSTs), answering questions raised by Alur and Deshmukh in 2011. SSTs strictly extend the expressiveness of one-way transducers via additional variables that store partial outputs. We prove that any k-valued SST can be effectively decomposed as a union of k (single-valued) deterministic SSTs. As a corollary, we obtain equivalence of SSTs and two-way transducers in the finite-valued case (those two models are incomparable in general). Another corollary is an elementary upper bound for checking equivalence of finite-valued SSTs. The latter problem was already known to be decidable, but the proof complexity was unknown (it relied on Ehrenfeucht's conjecture). Finally, our main result is that finite valuedness of SSTs is decidable. The complexity is PSpace, and even PTime when the number of variables is fixed. - [54] arXiv:2405.08172 [pdf, ps, html, other]
-
Title: CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine TranslationComments: on-going work, 30 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus has been created by combining different available corpora online with preprocessing and cleaning. In addition, a monolingual Cantonese dataset has been created through web scraping to aid the synthetic parallel corpus generation. Following the data collection process, several approaches, including fine-tuning models, back-translation, and model switch, have been used. The translation quality of models has been evaluated with multiple quality metrics, including lexicon-based metrics (SacreBLEU and hLEPOR) and embedding-space metrics (COMET and BERTscore). Based on the automatic metrics, the best model is selected and compared against the 2 best commercial translators using the human evaluation framework HOPES. The best model proposed in this investigation (NLLB-mBART) with model switch mechanisms has reached comparable and even better automatic evaluation scores against State-of-the-art commercial models (Bing and Baidu Translators), with a SacreBLEU score of 16.8 on our test set. Furthermore, an open-source web application has been developed to allow users to translate between Cantonese and English, with the different trained models available for effective comparisons between models from this investigation and users. CANTONMT is available at this https URL
- [55] arXiv:2405.08174 [pdf, ps, html, other]
-
Title: Estimating Direct and Indirect Causal Effects of Spatiotemporal Interventions in Presence of Spatial InterferenceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Spatial interference (SI) occurs when the treatment at one location affects the outcomes at other locations. Accounting for spatial interference in spatiotemporal settings poses further challenges as interference violates the stable unit treatment value assumption, making it infeasible for standard causal inference methods to quantify the effects of time-varying treatment at spatially varying outcomes. In this paper, we first formalize the concept of spatial interference in case of time-varying treatment assignments by extending the potential outcome framework under the assumption of no unmeasured confounding. We then propose our deep learning based potential outcome model for spatiotemporal causal inference. We utilize latent factor modeling to reduce the bias due to time-varying confounding while leveraging the power of U-Net architecture to capture global and local spatial interference in data over time. Our causal estimators are an extension of average treatment effect (ATE) for estimating direct (DATE) and indirect effects (IATE) of spatial interference on treated and untreated data. Being the first of its kind deep learning based spatiotemporal causal inference technique, our approach shows advantages over several baseline methods based on the experiment results on two synthetic datasets, with and without spatial interference. Our results on real-world climate dataset also align with domain knowledge, further demonstrating the effectiveness of our proposed method.
- [56] arXiv:2405.08175 [pdf, ps, other]
-
Title: Comparative Analysis of AWS Model Deployment ServicesJournal-ref: IJCTT, 72(5), 102-110, 2024Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Amazon Web Services (AWS) offers three important Model Deployment Services for model developers: SageMaker, Lambda, and Elastic Container Service (ECS). These services have critical advantages and disadvantages, influencing model developer's adoption decisions. This comparative analysis reviews the merits and drawbacks of these services. This analysis found that Lambda AWS service leads in efficiency, autoscaling aspects, and integration during model development. However, ECS was found to be outstanding in terms of flexibility, scalability, and infrastructure control; conversely, ECS is better suited when it comes to managing complex container environments during model development, as well as addressing budget concerns -- it is, therefore, the preferred option for model developers whose objective is to achieve complete freedom and framework flexibility with horizontal scaling. ECS is better suited to ensuring performance requirements align with project goals and constraints. The AWS service selection process considered factors that include but are not limited to load balance and cost-effectiveness. ECS is a better choice when model development begins from the abstract. It offers unique benefits, such as the ability to scale horizontally and vertically, making it the best preferable tool for model deployment.
- [57] arXiv:2405.08183 [pdf, ps, html, other]
-
Title: Towards Energy-Aware Federated Learning via MARL: A Dual-Selection Approach for Model and ClientSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Although Federated Learning (FL) is promising in knowledge sharing for heterogeneous Artificial Intelligence of Thing (AIoT) devices, their training performance and energy efficacy are severely restricted in practical battery-driven scenarios due to the ``wooden barrel effect'' caused by the mismatch between homogeneous model paradigms and heterogeneous device capability. As a result, due to various kinds of differences among devices, it is hard for existing FL methods to conduct training effectively in energy-constrained scenarios, such as the battery constraints of devices. To tackle the above issues, we propose an energy-aware FL framework named DR-FL, which considers the energy constraints in both clients and heterogeneous deep learning models to enable energy-efficient FL. Unlike Vanilla FL, DR-FL adopts our proposed Muti-Agents Reinforcement Learning (MARL)-based dual-selection method, which allows participated devices to make contributions to the global model effectively and adaptively based on their computing capabilities and energy capacities in a MARL-based manner. Experiments on various well-known datasets show that DR-FL can not only maximise knowledge sharing among heterogeneous models under the energy constraint of large-scale AIoT systems but also improve the model performance of each involved heterogeneous device.
- [58] arXiv:2405.08187 [pdf, ps, html, other]
-
Title: Optimizing Task Scheduling in Heterogeneous Computing Environments: A Comparative Analysis of CPU, GPU, and ASIC Platforms Using E2C SimulatorSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)
Efficient task scheduling in heterogeneous computing environments is imperative for optimizing resource utilization and minimizing task completion times. In this study, we conducted a comprehensive benchmarking analysis to evaluate the performance of four scheduling algorithms First Come, First-Served (FCFS), FCFS with No Queuing (FCFS-NQ), Minimum Expected Completion Time (MECT), and Minimum Expected Execution Time (MEET) across varying workload scenarios. We defined three workload scenarios: low, medium, and high, each representing different levels of computational demands. Through rigorous experimentation and analysis, we assessed the effectiveness of each algorithm in terms of total completion percentage, energy consumption, wasted energy, and energy per completion. Our findings highlight the strengths and limitations of each algorithm, with MECT and MEET emerging as robust contenders, dynamically prioritizing tasks based on comprehensive estimates of completion and execution times. Furthermore, MECT and MEET exhibit superior energy efficiency compared to FCFS and FCFS-NQ, underscoring their suitability for resource-constrained environments. This study provides valuable insights into the efficacy of task scheduling algorithms in heterogeneous computing environments, enabling informed decision-making to enhance resource allocation, minimize task completion times, and improve energy efficiency
- [59] arXiv:2405.08194 [pdf, ps, html, other]
-
Title: Distributionally Robust Degree Optimization for BATS CodesComments: 8 pages, accepted by 2024 IEEE International Symposium on Information TheorySubjects: Information Theory (cs.IT)
Batched sparse (BATS) code is a network coding solution for multi-hop wireless networks with packet loss. Achieving a close-to-optimal rate relies on an optimal degree distribution. Technical challenges arise from the sensitivity of this distribution to the often empirically obtained rank distribution at the destination node. Specifically, if the empirical distribution overestimates the channel, BATS codes experience a significant rate degradation, leading to unstable rates across different runs and hence unpredictable transmission costs. Confronting this unresolved obstacle, we introduce a formulation for distributionally robust optimization in degree optimization. Deploying the resulting degree distribution resolves the instability of empirical rank distributions, ensuring a close-to-optimal rate, and unleashing the potential of applying BATS codes in real-world scenarios.
- [60] arXiv:2405.08197 [pdf, ps, html, other]
-
Title: IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling FusionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Immunohistochemistry (IHC) plays a crucial role in pathology as it detects the over-expression of protein in tissue samples. However, there are still fewer machine learning model studies on IHC's impact on accurate cancer grading. We discovered that IHC and H\&E possess distinct advantages and disadvantages while possessing certain complementary qualities. Building on this observation, we developed a two-stage multi-modal bilinear model with a feature pooling module. This model aims to maximize the potential of both IHC and HE's feature representation, resulting in improved performance compared to their individual use. Our experiments demonstrate that incorporating IHC data into machine learning models, alongside H\&E stained images, leads to superior predictive results for cancer grading. The proposed framework achieves an impressive ACC higher of 0.953 on the public dataset BCI.
- [61] arXiv:2405.08199 [pdf, ps, other]
-
Title: Modeling of Time-varying Wireless Communication Channel with Fading and ShadowingComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
The real-time quantification of the effect of a wireless channel on the transmitting signal is crucial for the analysis and the intelligent design of wireless communication systems for various services. Recent mechanisms to model channel characteristics independent of coding, modulation, signal processing, etc., using deep learning neural networks are promising solutions. However, the current approaches are neither statistically accurate nor able to adapt to the changing environment. In this paper, we propose a new approach that combines a deep learning neural network with a mixture density network model to derive the conditional probability density function (PDF) of receiving power given a communication distance in general wireless communication systems. Furthermore, a deep transfer learning scheme is designed and implemented to allow the channel model to dynamically adapt to changes in communication environments. Extensive experiments on Nakagami fading channel model and Log-normal shadowing channel model with path loss and noise show that the new approach is more statistically accurate, faster, and more robust than the previous deep learning-based channel models.
- [62] arXiv:2405.08200 [pdf, ps, html, other]
-
Title: Interactive Lab Notebooks for Robotics ResearchersSubjects: Computational Engineering, Finance, and Science (cs.CE)
Interactive notebooks, such as Jupyter, have revolutionized the field of data science by providing an integrated environment for data, code, and documentation. However, their adoption by robotics researchers and model developers has been limited. This study investigates the logging and record-keeping practices of robotics researchers, drawing parallels to the pre-interactive notebook era of data science. Through interviews with robotics researchers, we identified the reliance on diverse and often incompatible tools for managing experimental data, leading to challenges in reproducibility and data traceability. Our findings reveal that robotics researchers can benefit from a specialized version of interactive notebooks that supports comprehensive data entry, continuous context capture, and agile data staging. We propose extending interactive notebooks to better serve the needs of robotics researchers by integrating features akin to traditional lab notebooks. This adaptation aims to enhance the organization, analysis, and reproducibility of experimental data in robotics, fostering a more streamlined and efficient research workflow.
- [63] arXiv:2405.08204 [pdf, ps, html, other]
-
Title: A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action DetectionComments: IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel spatiotemporal transformer network that introduces several original components to detect actions in untrimmed videos. First, the multi-feature selective semantic attention model calculates the correlations between spatial and motion features to model spatiotemporal interactions between different action semantics properly. Second, the motion-aware network encodes the locations of action semantics in video frames utilizing the motion-aware 2D positional encoding algorithm. Such a motion-aware mechanism memorizes the dynamic spatiotemporal variations in action frames that current methods cannot exploit. Third, the sequence-based temporal attention model captures the heterogeneous temporal dependencies in action frames. In contrast to standard temporal attention used in natural language processing, primarily aimed at finding similarities between linguistic words, the proposed sequence-based temporal attention is designed to determine both the differences and similarities between video frames that jointly define the meaning of actions. The proposed approach outperforms the state-of-the-art solutions on four spatiotemporal action datasets: AVA 2.2, AVA 2.1, UCF101-24, and EPIC-Kitchens.
- [64] arXiv:2405.08205 [pdf, ps, html, other]
-
Title: Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule SubstratesSubjects: Machine Learning (cs.LG)
Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and their three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes binding to specific substrates with high affinities.
- [65] arXiv:2405.08206 [pdf, ps, html, other]
-
Title: Beyond Theorems: A Counterexample to Potential Markov Game CriteriaSubjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
There are only limited classes of multi-player stochastic games in which independent learning is guaranteed to converge to a Nash equilibrium. Markov potential games are a key example of such classes. Prior work has outlined sets of sufficient conditions for a stochastic game to qualify as a Markov potential game. However, these conditions often impose strict limitations on the game's structure and tend to be challenging to verify. To address these limitations, Mguni et al. [12] introduce a relaxed notion of Markov potential games and offer an alternative set of necessary conditions for categorizing stochastic games as potential games. Under these conditions, the authors claim that a deterministic Nash equilibrium can be computed efficiently by solving a dual Markov decision process. In this paper, we offer evidence refuting this claim by presenting a counterexample.
- [66] arXiv:2405.08209 [pdf, ps, html, other]
-
Title: Who's in and who's out? A case study of multimodal CLIP-filtering in DataCompComments: Content warning: This paper discusses societal stereotypes and sexually-explicit material that may be disturbing, distressing, and/or offensive to the readerSubjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
As training datasets become increasingly drawn from unstructured, uncontrolled environments such as the web, researchers and industry practitioners have increasingly relied upon data filtering techniques to "filter out the noise" of web-scraped data. While datasets have been widely shown to reflect the biases and values of their creators, in this paper we contribute to an emerging body of research that assesses the filters used to create these datasets. We show that image-text data filtering also has biases and is value-laden, encoding specific notions of what is counted as "high-quality" data. In our work, we audit a standard approach of image-text CLIP-filtering on the academic benchmark DataComp's CommonPool by analyzing discrepancies of filtering through various annotation techniques across multiple modalities of image, text, and website source. We find that data relating to several imputed demographic groups -- such as LGBTQ+ people, older women, and younger men -- are associated with higher rates of exclusion. Moreover, we demonstrate cases of exclusion amplification: not only are certain marginalized groups already underrepresented in the unfiltered data, but CLIP-filtering excludes data from these groups at higher rates. The data-filtering step in the machine learning pipeline can therefore exacerbate representation disparities already present in the data-gathering step, especially when existing filters are designed to optimize a specifically-chosen downstream performance metric like zero-shot image classification accuracy. Finally, we show that the NSFW filter fails to remove sexually-explicit content from CommonPool, and that CLIP-filtering includes several categories of copyrighted content at high rates. Our conclusions point to a need for fundamental changes in dataset creation and filtering practices.
- [67] arXiv:2405.08210 [pdf, ps, html, other]
-
Title: Infinite Texture: Text-guided High Resolution Diffusion Texture SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV)
We present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. Our approach fine-tunes a diffusion model on a single texture, and learns to embed that statistical distribution in the output domain of the model. We seed this fine-tuning process with a sample texture patch, which can be optionally generated from a text-to-image model like DALL-E 2. At generation time, our fine-tuned diffusion model is used through a score aggregation strategy to generate output texture images of arbitrary resolution on a single GPU. We compare synthesized textures from our method to existing work in patch-based and deep learning texture synthesis methods. We also showcase two applications of our generated textures in 3D rendering and texture transfer.
- [68] arXiv:2405.08213 [pdf, ps, html, other]
-
Title: Interpreting Latent Student Knowledge Representations in Programming AssignmentsComments: EDM 2024: 17th International Conference on Educational Data MiningSubjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Recent advances in artificial intelligence for education leverage generative large language models, including using them to predict open-ended student responses rather than their correctness only. However, the black-box nature of these models limits the interpretability of the learned student knowledge representations. In this paper, we conduct a first exploration into interpreting latent student knowledge representations by presenting InfoOIRT, an Information regularized Open-ended Item Response Theory model, which encourages the latent student knowledge states to be interpretable while being able to generate student-written code for open-ended programming questions. InfoOIRT maximizes the mutual information between a fixed subset of latent knowledge states enforced with simple prior distributions and generated student code, which encourages the model to learn disentangled representations of salient syntactic and semantic code features including syntactic styles, mastery of programming skills, and code structures. Through experiments on a real-world programming education dataset, we show that InfoOIRT can both accurately generate student code and lead to interpretable student knowledge representations.
- [69] arXiv:2405.08216 [pdf, ps, html, other]
-
Title: Toward Automated Programming for Robotic Assembly Using ChatGPTSubjects: Robotics (cs.RO)
Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.
- [70] arXiv:2405.08217 [pdf, ps, html, other]
-
Title: Data Valuation with Gradient SimilaritySubjects: Machine Learning (cs.LG); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
High-quality data is crucial for accurate machine learning and actionable analytics, however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring expert knowledge and considerable manual intervention. Data Valuation algorithms are a class of methods that seek to quantify the value of each sample in a dataset based on its contribution or importance to a given predictive task. These data values have shown an impressive ability to identify mislabeled observations, and filtering low-value data can boost machine learning performance. In this work, we present a simple alternative to existing methods, termed Data Valuation with Gradient Similarity (DVGS). This approach can be easily applied to any gradient descent learning algorithm, scales well to large datasets, and performs comparably or better than baseline valuation methods for tasks such as corrupted label discovery and noise quantification. We evaluate the DVGS method on tabular, image and RNA expression datasets to show the effectiveness of the method across domains. Our approach has the ability to rapidly and accurately identify low-quality data, which can reduce the need for expert knowledge and manual intervention in data cleaning tasks.
- [71] arXiv:2405.08223 [pdf, ps, html, other]
-
Title: An information-theoretic model of shallow and deep language comprehensionComments: 6 pages; accepted to COGSCI 2024Subjects: Computation and Language (cs.CL); Information Theory (cs.IT)
A large body of work in psycholinguistics has focused on the idea that online language comprehension can be shallow or `good enough': given constraints on time or available computation, comprehenders may form interpretations of their input that are plausible but inaccurate. However, this idea has not yet been linked with formal theories of computation under resource constraints. Here we use information theory to formulate a model of language comprehension as an optimal trade-off between accuracy and processing depth, formalized as bits of information extracted from the input, which increases with processing time. The model provides a measure of processing effort as the change in processing depth, which we link to EEG signals and reading times. We validate our theory against a large-scale dataset of garden path sentence reading times, and EEG experiments featuring N400, P600 and biphasic ERP effects. By quantifying the timecourse of language processing as it proceeds from shallow to deep, our model provides a unified framework to explain behavioral and neural signatures of language comprehension.
- [72] arXiv:2405.08226 [pdf, ps, html, other]
-
Title: SeNMo: A Self-Normalizing Deep Learning Model for Enhanced Multi-Omics Data Analysis in OncologyAsim Waqas, Aakash Tripathi, Sabeen Ahmed, Ashwin Mukund, Hamza Farooq, Matthew B. Schabath, Paul Stewart, Mia Naeini, Ghulam RasoolSubjects: Machine Learning (cs.LG)
Multi-omics research has enhanced our understanding of cancer heterogeneity and progression. Investigating molecular data through multi-omics approaches is crucial for unraveling the complex biological mechanisms underlying cancer, thereby enabling effective diagnosis, treatment, and prevention strategies. However, predicting patient outcomes through integration of all available multi-omics data is an under-study research direction. Here, we present SeNMo (Self-normalizing Network for Multi-omics), a deep neural network trained on multi-omics data across 33 cancer types. SeNMo is efficient in handling multi-omics data characterized by high-width (many features) and low-length (fewer samples) attributes. We trained SeNMo for the task of overall survival using pan-cancer data involving 33 cancer sites from Genomics Data Commons (GDC). The training data includes gene expression, DNA methylation, miRNA expression, DNA mutations, protein expression modalities, and clinical data. We evaluated the model's performance in predicting overall survival using concordance index (C-Index). SeNMo performed consistently well in training regime, with the validation C-Index of 0.76 on GDC's public data. In the testing regime, SeNMo performed with a C-Index of 0.758 on a held-out test set. The model showed an average accuracy of 99.8% on the task of classifying the primary cancer type on the pan-cancer test cohort. SeNMo proved to be a mini-foundation model for multi-omics oncology data because it demonstrated robust performance, and adaptability not only across molecular data types but also on the classification task of predicting the primary cancer type of patients. SeNMo can be further scaled to any cancer site and molecular data type. We believe SeNMo and similar models are poised to transform the oncology landscape, offering hope for more effective, efficient, and patient-centric cancer care.
- [73] arXiv:2405.08228 [pdf, ps, html, other]
-
Title: Slow Inter-area Electro-mechanical Oscillations Revisited: Structural Property of Complex Multi-area Electric Power SystemsSubjects: Systems and Control (eess.SY)
This paper introduces a physically-intuitive notion of inter-area dynamics in systems comprising multiple interconnected energy conversion modules. The ideas build on an earlier general approach to setting their structural properties by modeling first stand-alone modular dynamics starting from the fundamental relations between energy stored in modules (components, areas), and constraining explicitly their Tellegen's quantities, power and rate of change of power, in particular. In this paper we derive, by following the same principles, a transformed state-space model for a general nonlinear system. Using this model we show the existence of an area-level interaction variable, intVar, whose rate of change depends solely on the area internal power imbalance. Given these structural properties of stand-alone modules, we define in this paper for the first time an inter-area variable as the difference of power wave incident to tie-line from Area I and the power reflected into tie-lie from Area II. Notably, these power waves represent the rate of change of intVars associated with the two interconnected areas. We illustrate these notions using a linearized case of two lossless inter-connected areas, and show the existence of a new inter-area mode when the areas get connected. We suggest that lessons learned in this paper open possibilities for computationally-efficient modeling and control of inter-area oscillations, and offer further the basis for modeling and control of dynamics in changing systems comprising faster energy conversion processes.
- [74] arXiv:2405.08232 [pdf, ps, html, other]
-
Title: Distributionally Robust Aggregation of Electric Vehicle FlexibilityComments: 6 pages, conferenceSubjects: Computational Engineering, Finance, and Science (cs.CE)
We address the problem of characterising the aggregate flexibility in populations of electric vehicles (EVs) with uncertain charging requirements. Building on previous results that provide exact characterisations of the aggregate flexibility in populations with known charging requirements, in this paper we extend the aggregation methods so that charging requirements are uncertain, but sampled from a given distribution. In particular, we construct \textit{distributionally robust aggregate flexibility sets}, sets of aggregate charging profiles over which we can provide probabilistic guarantees that actual realised populations will be able to track. By leveraging measure concentration results that establish powerful finite sample guarantees, we are able to give tight bounds on these robust flexibility sets, even in low sample regimes that are well suited for aggregating small populations of EVs. We detail explicit methods of calculating these sets and provide numerical results that validate the theory developed here.
- [75] arXiv:2405.08233 [pdf, ps, other]
-
Title: Factors Shaping Financial Success: A Deep Dive into Influencing VariablesComments: 21 pages, 4 figures, 10 tablesSubjects: Machine Learning (cs.LG)
This paper explores various socioeconomic factors that contribute to individual financial success using machine learning algorithms and approaches.
Financial success, a critical aspect of all individual's well-being, is a complex concept influenced by a plethora of different factors. This study aims to understand the true determinants of financial success. It examines the survey data from the National Longitudinal Survey of Youth 1997 by the Bureau of Labor Statistics [1], consisting of a sample of 8,984 individuals's longitudinal data over years. The dataset comprises income variables and a large set of socioeconomic variables of individuals.
An in-depth analysis demonstrates the effectiveness of machine learning algorithms in financial success research, highlights the potential of leveraging longitudinal data to enhance prediction accuracy, and provides valuable insights into how various socioeconomic factors influence financial success.
The findings underscore the significant influence of highest education degree, occupation and gender as the top three determinants of individual income among socioeconomic factors examined. Yearly working hours, age and work tenure emerge as three secondary influencing factors, and all other factors including parental household income, industry, parents' highest grade and others are identified as tertiary factors.
These insights allow researchers to better understand the complex nature of financial success and enable policymakers to grasp the underlying dynamics shaping aspirations, decision-making, and the broader socio-economic fabric of society. This comprehension is crucial for fostering financial success among individuals and advancing broader societal well-being. - [76] arXiv:2405.08237 [pdf, ps, html, other]
-
Title: A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speechComments: Accepted to CogSci 2024Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech with the learning objective of predicting upcoming acoustics. Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge. Another property shared between brains and the model is that the encoding patterns of phonemes support some degree of cross-context generalization. However, we found evidence that the effectiveness of these generalizations depends on the specific contexts, which suggests that this analysis alone is insufficient to support the presence of context-invariant encoding.
- [77] arXiv:2405.08238 [pdf, ps, html, other]
-
Title: Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPTComments: Honorable Mention award (top 5%) at CHI '24Journal-ref: CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (2024), Article No. 511, 1-14Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
ChatGPT is a conversational agent built on a large language model. Trained on a significant portion of human output, ChatGPT can mimic people to a degree. As such, we need to consider what social identities ChatGPT simulates (or can be designed to simulate). In this study, we explored the case of identity simulation through Japanese first-person pronouns, which are tightly connected to social identities in intersectional ways, i.e., intersectional pronouns. We conducted a controlled online experiment where people from two regions in Japan (Kanto and Kinki) witnessed interactions with ChatGPT using ten sets of first-person pronouns. We discovered that pronouns alone can evoke perceptions of social identities in ChatGPT at the intersections of gender, age, region, and formality, with caveats. This work highlights the importance of pronoun use for social identity simulation, provides a language-based methodology for culturally-sensitive persona development, and advances the potential of intersectional identities in intelligent agents.
- [78] arXiv:2405.08240 [pdf, ps, other]
-
Title: Play Across Boundaries: Exploring Cross-Cultural Maldaimonic Game ExperiencesJournal-ref: CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems (2024), Article No. 564, 1-15Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Maldaimonic game experiences occur when people engage in personally fulfilling play through egocentric, destructive, and/or exploitative acts. Initial qualitative work verified this orientation and experiential construct for English-speaking Westerners. In this comparative mixed methods study, we explored whether and how maldaimonic game experiences and orientations play out in Japan, an Eastern gaming capital that may have cultural values incongruous with the Western philosophical basis underlying maldaimonia. We present findings anchored to the initial frameworks on maldaimonia in game experiences that show little divergence between the Japanese and US cohorts. We also extend the qualitative findings with quantitative measures on affect, player experience, and the related constructs of hedonia and eudaimonia. We confirm this novel construct for Japan and set the stage for scale development.
- [79] arXiv:2405.08242 [pdf, ps, other]
-
Title: No Joke: An Embodied Conversational Agent Greeting Older Adults with Humour or a Smile Unrelated to Initial AcceptanceJournal-ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 252, 1-7Subjects: Human-Computer Interaction (cs.HC)
Embodied conversation agents (ECAs) are increasingly being developed for older adults as assistants or companions. Older adults may not be familiar with ECAs, influencing uptake and acceptability. First impressions can correlate strongly with subsequent judgments, even of computer agents, and could influence acceptance. Using the circumplex model of affect, we developed three versions of an ECA -- laughing, smiling, and neutral in expression -- to evaluate how positive first impressions affect acceptance. Results from 249 older adults indicated no statistically significant effects except for general attitudes towards technology and intelligent agents. This questions the potential of laughter, jokes, puns, and smiles as a method of initial engagement for older adults.
- [80] arXiv:2405.08244 [pdf, ps, html, other]
-
Title: Kawaii Computing: Scoping Out the Japanese Notion of Cute in User Experiences with Interactive SystemsJournal-ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 210, 1-9Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Kawaii computing is a new term for a steadily growing body of work on the Japanese notion of "cute" in human-computer interaction (HCI) research and practice. Kawaii is distinguished from general notions of cute by its experiential and culturally-sensitive nature. While it can be designed into the appearance and behaviour of interactive agents, interfaces, and systems, kawaii also refers to certain affective and cultural dimensions experienced by culturally Japanese users, i.e., kawaii user experiences (UX) and mental models of kawaii elicited by the socio-cultural context of Japan. In this scoping review, we map out the ways in which kawaii has been explored within HCI research and related fields as a factor of design and experience. We illuminate theoretical and methodological gaps and opportunities for future work on kawaii computing.
- [81] arXiv:2405.08245 [pdf, ps, other]
-
Title: Progressive enhancement and restoration for mural images under low-light and defected conditions based on multi-receptive field strategySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Ancient murals are valuable cultural heritage with great archaeological value. They provide insights into ancient religions, ceremonies, folklore, among other things through their content. However, due to long-term oxidation and inadequate protection, ancient murals have suffered continuous damage, including peeling and mold etc. Additionally, since ancient murals were typically painted indoors, the light intensity in images captured by digital devices is often low. The poor visibility hampers the further restoration of damaged areas. To address the escalating damage to ancient frescoes and facilitate batch restoration at archaeological sites, we propose a two-stage restoration model which called MER(Mural Enhancement and Restoration net) for ancient murals that are damaged and have been captured in low light. Our two-stage model not only enhances the visual quality of restored images but also achieves commendable results in relevant metric evaluations compared with other competitors. Furthermore, we have launched a website dedicated to the restoration of ancient mural paintings, utilizing the proposed model. Code is available at this https URL.
- [82] arXiv:2405.08246 [pdf, ps, other]
-
Title: Compositional Text-to-Image Generation with Dense Blob RepresentationsComments: ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Existing text-to-image models struggle to follow complex text prompts, raising the need for extra grounding inputs for better controllability. In this work, we propose to decompose a scene into visual primitives - denoted as dense blob representations - that contain fine-grained details of the scene while being modular, human-interpretable, and easy-to-construct. Based on blob representations, we develop a blob-grounded text-to-image diffusion model, termed BlobGEN, for compositional generation. Particularly, we introduce a new masked cross-attention module to disentangle the fusion between blob representations and visual features. To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. Our extensive experiments show that BlobGEN achieves superior zero-shot generation quality and better layout-guided controllability on MS-COCO. When augmented by LLMs, our method exhibits superior numerical and spatial correctness on compositional image generation benchmarks. Project page: this https URL.
- [83] arXiv:2405.08251 [pdf, ps, other]
-
Title: Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale EventsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In large-scale disaster events, the planning of optimal rescue routes depends on the object detection ability at the disaster scene, with one of the main challenges being the presence of dense and occluded objects. Existing methods, which are typically based on the RGB modality, struggle to distinguish targets with similar colors and textures in crowded environments and are unable to identify obscured objects. To this end, we first construct two multimodal dense and occlusion vehicle detection datasets for large-scale events, utilizing RGB and height map modalities. Based on these datasets, we propose a multimodal collaboration network for dense and occluded vehicle detection, MuDet for short. MuDet hierarchically enhances the completeness of discriminable information within and across modalities and differentiates between simple and complex samples. MuDet includes three main modules: Unimodal Feature Hierarchical Enhancement (Uni-Enh), Multimodal Cross Learning (Mul-Lea), and Hard-easy Discriminative (He-Dis) Pattern. Uni-Enh and Mul-Lea enhance the features within each modality and facilitate the cross-integration of features from two heterogeneous modalities. He-Dis effectively separates densely occluded vehicle targets with significant intra-class differences and minimal inter-class differences by defining and thresholding confidence values, thereby suppressing the complex background. Experimental results on two re-labeled multimodal benchmark datasets, the 4K-SAI-LCS dataset, and the ISPRS Potsdam dataset, demonstrate the robustness and generalization of the MuDet. The codes of this work are available openly at \url{this https URL}.
- [84] arXiv:2405.08252 [pdf, ps, other]
-
Title: Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-LearningComments: FLAIRS-37 (2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning. Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble. This not only results in performance improvements over the original REDQ (Chen et al. 2021) and its variant DroQ (Hi-raoka et al. 2022), thereby enhancing Q predictions, but also effectively reduces both the average normalized bias and standard deviation of normalized bias within Q-function ensembles. Importantly, our method also performs well even in scenarios with a low update-to-data (UTD) ratio. Notably, the implementation of our proposed method is straightforward, requiring minimal modifications to the base model.
- [85] arXiv:2405.08254 [pdf, ps, other]
-
Title: Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading ArgumentationSubjects: Computation and Language (cs.CL)
Misinformation about climate change is a complex societal issue requiring holistic, interdisciplinary solutions at the intersection between technology and psychology. One proposed solution is a "technocognitive" approach, involving the synthesis of psychological and computer science research. Psychological research has identified that interventions in response to misinformation require both fact-based (e.g., factual explanations) and technique-based (e.g., explanations of misleading techniques) content. However, little progress has been made on documenting and detecting fallacies in climate misinformation. In this study, we apply a previously developed critical thinking methodology for deconstructing climate misinformation, in order to develop a dataset mapping different types of climate misinformation to reasoning fallacies. This dataset is used to train a model to detect fallacies in climate misinformation. Our study shows F1 scores that are 2.5 to 3.5 better than previous works. The fallacies that are easiest to detect include fake experts and anecdotal arguments, while fallacies that require background knowledge, such as oversimplification, misrepresentation, and slothful induction, are relatively more difficult to detect. This research lays the groundwork for development of solutions where automatically detected climate misinformation can be countered with generative technique-based corrections.
- [86] arXiv:2405.08255 [pdf, ps, html, other]
-
Title: Total Variation Distance for Product Distributions is $\#\mathsf{P}$-CompleteArnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. VinodchandranComments: 5 pages. An extended version of this paper appeared in the proceedings of IJCAI 2023, under the title "On approximating total variation distance" (see this https URL and arXiv:2206.07209)Subjects: Computational Complexity (cs.CC)
We show that computing the total variation distance between two product distributions is $\#\mathsf{P}$-complete. This is in stark contrast with other distance measures such as Kullback-Leibler, Chi-square, and Hellinger, which tensorize over the marginals leading to efficient algorithms.
- [87] arXiv:2405.08260 [pdf, ps, other]
-
Title: Multi-Agent Combinatorial ContractsSubjects: Computer Science and Game Theory (cs.GT)
Combinatorial contracts are emerging as a key paradigm in algorithmic contract design, paralleling the role of combinatorial auctions in algorithmic mechanism design. In this paper we study natural combinatorial contract settings involving teams of agents, each capable of performing multiple actions. This scenario extends two fundamental special cases previously examined in the literature, namely the single-agent combinatorial action model of [Duetting et al., 2021] and the multi-agent binary-action model of [Babaioff et al., 2012, Duetting et al., 2023].
We study the algorithmic and computational aspects of these settings, highlighting the unique challenges posed by the absence of certain monotonicity properties essential for analyzing the previous special cases. To navigate these complexities, we introduce a broad set of novel tools that deepen our understanding of combinatorial contracts environments and yield good approximation guarantees.
Our main result is a constant-factor approximation for submodular multi-agent multi-action problems with value and demand oracles access. This result is tight: we show that this problem admits no PTAS (even under binary actions). As a side product of our main result, we devise an FPTAS, with value and demand oracles, for single-agent combinatorial action scenarios with general reward functions, which is of independent interest. We also provide bounds on the gap between the optimal welfare and the principal's utility. We show that, for subadditive rewards, perhaps surprisingly, this gap scales only logarithmically (rather than linearly) in the size of the action space. - [88] arXiv:2405.08263 [pdf, ps, other]
-
Title: Palette-based Color Transfer between ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
As an important subtopic of image enhancement, color transfer aims to enhance the color scheme of a source image according to a reference one while preserving the semantic context. To implement color transfer, the palette-based color mapping framework was proposed. \textcolor{black}{It is a classical solution that does not depend on complex semantic analysis to generate a new color scheme. However, the framework usually requires manual settings, blackucing its practicality.} The quality of traditional palette generation depends on the degree of color separation. In this paper, we propose a new palette-based color transfer method that can automatically generate a new color scheme. With a redesigned palette-based clustering method, pixels can be classified into different segments according to color distribution with better applicability. {By combining deep learning-based image segmentation and a new color mapping strategy, color transfer can be implemented on foreground and background parts independently while maintaining semantic consistency.} The experimental results indicate that our method exhibits significant advantages over peer methods in terms of natural realism, color consistency, generality, and robustness.
- [89] arXiv:2405.08268 [pdf, ps, other]
-
Title: T-Watch: Towards Timed Execution of Private Transaction in BlockchainsSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
In blockchains such as Bitcoin and Ethereum, transactions represent the primary mechanism that the external world can use to trigger a change of blockchain state. Transactions serve as key sources of evidence and play a vital role in forensic analysis. Timed transaction refers to a specific class of service that enables a user to schedule a transaction to change the blockchain state during a chosen future time-frame. This paper proposes T-Watch, a decentralized and cost-efficient approach for users to schedule timed execution of any type of transaction in Ethereum with privacy guarantees. T-Watch employs a novel combination of threshold secret sharing and decentralized smart contracts. To protect the private elements of a scheduled transaction from getting disclosed before the future time-frame, T-Watch maintains shares of the decryption key of the scheduled transaction using a group of executors recruited in a blockchain network before the specified future time-frame and restores the scheduled transaction at a proxy smart contract to trigger the change of blockchain state at the required time-frame. To reduce the cost of smart contract execution in T-Watch, we carefully design the proposed protocol to run in an optimistic mode by default and then switch to a pessimistic mode once misbehaviors occur. Furthermore, the protocol supports users to form service request pooling to further reduce the gas cost. We rigorously analyze the security of T-Watch and implement the protocol over the Ethereum official test network. The results demonstrate that T-Watch is more scalable compared to the state of the art and could reduce the cost by over 90% through pooling.
- [90] arXiv:2405.08269 [pdf, ps, other]
-
Title: On saturation of the discrepancy principle for nonlinear Tikhonov regularization in Hilbert spacesSubjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
In this paper we revisit the discrepancy principle for Tikhonov regularization of nonlinear ill-posed problems in Hilbert spaces and provide some new and improved saturation results under less restrictive conditions, comparing with the existing results in the literature.
- [91] arXiv:2405.08270 [pdf, ps, other]
-
Title: Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning-based medical image segmentation models often face performance degradation when deployed across various medical centers, largely due to the discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalization (BN) layers or employing prompt and adversarial learning, which may not effectively rectify the inconsistencies arising from divergent data distributions. In this paper, we propose a novel Human-in-the-loop TTA (HiTTA) framework that stands out in two significant ways. First, it capitalizes on the largely overlooked potential of clinician-corrected predictions, integrating these corrections into the TTA process to steer the model towards predictions that coincide more closely with clinical annotation preferences. Second, our framework conceives a divergence loss, designed specifically to diminish the prediction divergence instigated by domain disparities, through the careful calibration of BN parameters. Our HiTTA is distinguished by its dual-faceted capability to acclimatize to the distribution of test data whilst ensuring the model's predictions align with clinical expectations, thereby enhancing its relevance in a medical context. Extensive experiments on a public dataset underscore the superiority of our HiTTA over existing TTA methods, emphasizing the advantages of integrating human feedback and our divergence loss in enhancing the model's performance and adaptability across diverse medical centers.
- [92] arXiv:2405.08272 [pdf, ps, other]
-
Title: VS-Assistant: Versatile Surgery Assistant on the Demand of SurgeonsZhen Chen, Xingjian Luo, Jinlin Wu, Danny T.M. Chan, Zhen Lei, Jinqiao Wang, Sebastien Ourselin, Hongbin LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
The surgical intervention is crucial to patient healthcare, and many studies have developed advanced algorithms to provide understanding and decision-making assistance for surgeons. Despite great progress, these algorithms are developed for a single specific task and scenario, and in practice require the manual combination of different functions, thus limiting the applicability. Thus, an intelligent and versatile surgical assistant is expected to accurately understand the surgeon's intentions and accordingly conduct the specific tasks to support the surgical process. In this work, by leveraging advanced multimodal large language models (MLLMs), we propose a Versatile Surgery Assistant (VS-Assistant) that can accurately understand the surgeon's intention and complete a series of surgical understanding tasks, e.g., surgical scene analysis, surgical instrument detection, and segmentation on demand. Specifically, to achieve superior surgical multimodal understanding, we devise a mixture of projectors (MOP) module to align the surgical MLLM in VS-Assistant to balance the natural and surgical knowledge. Moreover, we devise a surgical Function-Calling Tuning strategy to enable the VS-Assistant to understand surgical intentions, and thus make a series of surgical function calls on demand to meet the needs of the surgeons. Extensive experiments on neurosurgery data confirm that our VS-Assistant can understand the surgeon's intention more accurately than the existing MLLM, resulting in overwhelming performance in textual analysis and visual tasks. Source code and models will be made public.
- [93] arXiv:2405.08277 [pdf, ps, other]
-
Title: AI-driven, Model-Free Current Control: A Deep Symbolic Approach for Optimal Induction Machine PerformanceComments: This work has been accepted for potential publication at the IEEE ECCE Asia 2024 International Power Electronics and Motion Control Conference. Please note that copyright may be transferred without prior noticeSubjects: Systems and Control (eess.SY)
This paper proposed a straightforward and efficient current control solution for induction machines employing deep symbolic regression (DSR). The proposed DSR-based control design offers a simple yet highly effective approach by creating an optimal control model through training and fitting, resulting in an analytical dynamic numerical expression that characterizes the data. Notably, this approach not only produces an understandable model but also demonstrates the capacity to extrapolate and estimate data points outside its training dataset, showcasing its adaptability and resilience. In contrast to conventional state-of-the-art proportional-integral (PI) current controllers, which heavily rely on specific system models, the proposed DSR-based approach stands out for its model independence. Simulation and experimental tests validate its effectiveness, highlighting its superior extrapolation capabilities compared to conventional methods. These findings pave the way for the integration of deep learning methods in power conversion applications, promising improved performance and adaptability in the control of induction machines. The simulation and experimental test results are provided with a 3.7 kw induction machine to verify the efficacy of the proposed control solution.
- [94] arXiv:2405.08278 [pdf, ps, other]
-
Title: Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account DetectionComments: Under reviewSubjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Ethereum has become one of the primary global platforms for cryptocurrency, playing an important role in promoting the diversification of the financial ecosystem. However, the relative lag in regulation has led to a proliferation of malicious activities in Ethereum, posing a serious threat to fund security. Existing regulatory methods usually detect malicious accounts through feature engineering or large-scale transaction graph mining. However, due to the immense scale of transaction data and malicious attacks, these methods suffer from inefficiency and low robustness during data processing and anomaly detection. In this regard, we propose an Ethereum Transaction Graph Compression method named TGC4Eth, which assists malicious account detection by lightweighting both features and topology of the transaction graph. At the feature level, we select transaction features based on their low importance to improve the robustness of the subsequent detection models against feature evasion attacks; at the topology level, we employ focusing and coarsening processes to compress the structure of the transaction graph, thereby improving both data processing and inference efficiency of detection models. Extensive experiments demonstrate that TGC4Eth significantly improves the computational efficiency of existing detection models while preserving the connectivity of the transaction graph. Furthermore, TGC4Eth enables existing detection models to maintain stable performance and exhibit high robustness against feature evasion attacks.
- [95] arXiv:2405.08280 [pdf, ps, other]
-
Title: Parallel-in-Time Iterative Methods for Pricing American OptionsComments: 20 pages, 7 figures, 3 tablesSubjects: Numerical Analysis (math.NA)
For pricing American options, %after suitable discretization in space and time, a sequence of discrete linear complementarity problems (LCPs) or equivalently Hamilton-Jacobi-Bellman (HJB) equations need to be solved in a sequential time-stepping manner. In each time step, the policy iteration or its penalty variant is often applied due to their fast convergence rates. In this paper, we aim to solve for all time steps simultaneously, by applying the policy iteration to an ``all-at-once form" of the HJB equations, where two different parallel-in-time preconditioners are proposed to accelerate the solution of the linear systems within the policy iteration. Our proposed methods are generally applicable for such all-at-once forms of the HJB equation, arising from option pricing problems with optimal stopping and nontrivial underlying asset models. Numerical examples are presented to show the feasibility and robust convergence behavior of the proposed methodology.
- [96] arXiv:2405.08283 [pdf, ps, other]
-
Title: Vector Field-Guided Learning Predictive Control for Motion Planning of Mobile Robots with Unknown DynamicsSubjects: Robotics (cs.RO)
Safe maneuvering capability is critical for mobile robots in complex environments. However, robotic system dynamics are often time-varying, uncertain, or even unknown during the motion planning and control process. Therefore, many existing model-based reinforcement learning (RL) methods could not achieve satisfactory reliability in guaranteeing safety. To address this challenge, we propose a two-level Vector Field-guided Learning Predictive Control (VF-LPC) approach that guarantees safe maneuverability. The first level, the guiding level, generates safe desired trajectories using the designed kinodynamic guiding vector field, enabling safe motion in obstacle-dense environments. The second level, the Integrated Motion Planning and Control (IMPC) level, first uses the deep Koopman operator to learn a nominal dynamics model offline and then updates the model uncertainties online using sparse Gaussian processes (GPs). The learned dynamics and game-based safe barrier function are then incorporated into the learning predictive control framework to generate near-optimal control sequences. We conducted tests to compare the performance of VF-LPC with existing advanced planning methods in an obstacle-dense environment. The simulation results show that it can generate feasible trajectories quickly. Then, VF-LPC is evaluated against motion planning methods that employ model predictive control (MPC) and RL in high-fidelity CarSim software. The results show that VF-LPC outperforms them under metrics of completion time, route length, and average solution time. We also carried out path-tracking control tests on a racing road to validate the model uncertainties learning capability. Finally, we conducted real-world experiments on a Hongqi E-HS3 vehicle, further validating the VF-LPC approach's effectiveness.
- [97] arXiv:2405.08285 [pdf, ps, other]
-
Title: Future Trends in the Design of Memetic Algorithms: the Case of the Linear Ordering ProblemSubjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
The way heuristic optimizers are designed has evolved over the decades, as computing power has increased. Initially, trajectory metaheuristics used to shape the state of the art in many problems, whereas today, population-based mechanisms tend to be more effective.Such has been the case for the Linear Ordering Problem (LOP), a field in which strategies such as Iterated Local Search and Variable Neighborhood Search led the way during the 1990s, but which have now been surpassed by evolutionary and memetic schemes. This paper focuses on understanding how the design of LOP optimizers will change in the future, as computing power continues to increase, yielding two main contributions. On the one hand, a metaheuristic was designed that is capable of effectively exploiting a large amount of computational resources, specifically, computing power equivalent to what a recent core can output during runs lasting over four months. Our analysis of this aspect relied on parallelization, and allowed us to conclude that as the power of the computational resources increases, it will be necessary to boost the capacities of the intensification methods applied in the memetic algorithms to keep the population from stagnating. And on the other, the best-known results for today's most challenging set of instances (xLOLIB2) were significantly outperformed. Instances with sizes ranging from 300 to 1000 were analyzed, and new bounds were established that provide a frame of reference for future research.
- [98] arXiv:2405.08289 [pdf, ps, other]
-
Title: Exploring Equilibrium Strategies in Network Games with Generative AISubjects: Computer Science and Game Theory (cs.GT)
Game theory offers a powerful framework for analyzing strategic interactions among decision-makers, providing tools to model, analyze, and predict their behavior. However, implementing game theory can be challenging due to difficulties in deriving solutions, understanding interactions, and ensuring optimal performance. Traditional non-AI and discriminative AI approaches have made valuable contributions but struggle with limitations in handling large-scale games and dynamic scenarios. In this context, generative AI emerges as a promising solution because of its superior data analysis and generation capabilities. This paper comprehensively summarizes the challenges, solutions, and outlooks of combining generative AI with game theory. We start with reviewing the limitations of traditional non-AI and discriminative AI approaches in employing game theory, and then highlight the necessity and advantages of integrating generative AI. Next, we explore the applications of generative AI in various stages of the game theory lifecycle, including model formulation, solution derivation, and strategy improvement. Additionally, from game theory viewpoint, we propose a generative AI-enabled framework for optimizing machine learning model performance against false data injection attacks, supported by a case study to demonstrate its effectiveness. Finally, we outline future research directions for generative AI-enabled game theory, paving the way for its further advancements and development.
- [99] arXiv:2405.08293 [pdf, ps, other]
-
Title: Airport Delay Prediction with Temporal Fusion TransformersSubjects: Machine Learning (cs.LG)
Since flight delay hurts passengers, airlines, and airports, its prediction becomes crucial for the decision-making of all stakeholders in the aviation industry and thus has been attempted by various previous research. However, previous delay predictions are often categorical and at a highly aggregated level. To improve that, this study proposes to apply the novel Temporal Fusion Transformer model and predict numerical airport arrival delays at quarter hour level for U.S. top 30 airports. Inputs to our model include airport demand and capacity forecasts, historic airport operation efficiency information, airport wind and visibility conditions, as well as enroute weather and traffic conditions. The results show that our model achieves satisfactory performance measured by small prediction errors on the test set. In addition, the interpretability analysis of the model outputs identifies the important input factors for delay prediction.
- [100] arXiv:2405.08295 [pdf, ps, other]
-
Title: SpeechVerse: A Large-scale Generalizable Audio Language ModelNilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin KirchhoffComments: Single Column, 13 pageSubjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
- [101] arXiv:2405.08297 [pdf, ps, other]
-
Title: Distance-Restricted Explanations: Theoretical Underpinnings & Efficient ImplementationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
The uses of machine learning (ML) have snowballed in recent years. In many cases, ML models are highly complex, and their operation is beyond the understanding of human decision-makers. Nevertheless, some uses of ML models involve high-stakes and safety-critical applications. Explainable artificial intelligence (XAI) aims to help human decision-makers in understanding the operation of such complex ML models, thus eliciting trust in their operation. Unfortunately, the majority of past XAI work is based on informal approaches, that offer no guarantees of rigor. Unsurprisingly, there exists comprehensive experimental and theoretical evidence confirming that informal methods of XAI can provide human-decision makers with erroneous information. Logic-based XAI represents a rigorous approach to explainability; it is model-based and offers the strongest guarantees of rigor of computed explanations. However, a well-known drawback of logic-based XAI is the complexity of logic reasoning, especially for highly complex ML models. Recent work proposed distance-restricted explanations, i.e. explanations that are rigorous provided the distance to a given input is small enough. Distance-restricted explainability is tightly related with adversarial robustness, and it has been shown to scale for moderately complex ML models, but the number of inputs still represents a key limiting factor. This paper investigates novel algorithms for scaling up the performance of logic-based explainers when computing and enumerating ML model explanations with a large number of inputs.
- [102] arXiv:2405.08298 [pdf, ps, other]
-
Title: Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay AssignmentsSubjects: Machine Learning (cs.LG)
This paper explores the optimization of Ground Delay Programs (GDP), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. Employing Reinforcement Learning (RL) to manage the inherent uncertainties in the national airspace system-such as weather variability, fluctuating flight demands, and airport arrival rates-we developed two RL models: Behavioral Cloning (BC) and Conservative Q-Learning (CQL). These models are designed to enhance GDP efficiency by utilizing a sophisticated reward function that integrates ground and airborne delays and terminal area congestion. We constructed a simulated single-airport environment, SAGDP_ENV, which incorporates real operational data along with predicted uncertainties to facilitate realistic decision-making scenarios. Utilizing the whole year 2019 data from Newark Liberty International Airport (EWR), our models aimed to preemptively set airport program rates. Despite thorough modeling and simulation, initial outcomes indicated that the models struggled to learn effectively, attributed potentially to oversimplified environmental assumptions. This paper discusses the challenges encountered, evaluates the models' performance against actual operational data, and outlines future directions to refine RL applications in ATM.
- [103] arXiv:2405.08299 [pdf, ps, other]
-
Title: Differentially Private Federated Learning: A Systematic ReviewJie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang CaoComments: 37pagesSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident deficiency in systematic reviews that categorize and synthesize these studies.
Our work presents a systematic overview of the differentially private federated learning. Existing taxonomies have not adequately considered objects and level of privacy protection provided by differential privacy in federated learning. To rectify this gap, we propose a new taxonomy of differentially private federated learning based on definition and guarantee of differential privacy and federated scenarios. Our classification allows for a clear delineation of the protected objects across various differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our findings provide valuable insights into privacy-preserving federated learning and suggest practical directions for future research. - [104] arXiv:2405.08300 [pdf, ps, other]
-
Title: Vector-Symbolic Architecture for Event-Based Optical FlowSubjects: Computer Vision and Pattern Recognition (cs.CV); Symbolic Computation (cs.SC)
From a perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduces an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables within VSA contributes to the enhanced representation similarity of feature descriptors for flow-matching points, while its structured symbolic representation capacity facilitates feature fusion from both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature matching framework for event-based optical flow, encompassing both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. In VSA-Flow, accurate optical flow estimation validates the effectiveness of HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor is proposed to learn optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves superior accuracy in comparison to both model-based and self-supervised learning methods on the DSEC benchmark, while remains competitive among both methods on the MVSEC benchmark. This contribution marks a significant advancement in event-based optical flow within the feature matching methodology.
- [105] arXiv:2405.08301 [pdf, ps, other]
-
Title: Coded Downlink Massive Random Access and a Finite de Finetti TheoremComments: 14 Pages, submitted to IEEE Transactions on Information TheorySubjects: Information Theory (cs.IT)
This paper considers a massive connectivity setting in which a base-station (BS) aims to communicate sources $(X_1,\cdots,X_k)$ to a randomly activated subset of $k$ users, among a large pool of $n$ users, via a common downlink message. Although the identities of the $k$ active users are assumed to be known at the BS, each active user only knows whether itself is active and does not know the identities of the other active users. A naive coding strategy is to transmit the sources alongside the identities of the users for which the source information is intended, which would require $H(X_1,\cdots,X_k) + k\log(n)$ bits, because the cost of specifying the identity of a user is $\log(n)$ bits. For large $n$, this overhead can be significant. This paper shows that it is possible to develop coding techniques that eliminate the dependency of the overhead on $n$, if the source distribution follows certain symmetry. Specifically, if the source distribution is independent and identically distributed (i.i.d.) then the overhead can be reduced to at most $O(\log(k))$ bits, and in case of uniform i.i.d. sources, the overhead can be further reduced to $O(1)$ bits. For sources that follow a more general exchangeable distribution, the overhead is at most $O(k)$ bits, and in case of finite-alphabet exchangeable sources, the overhead can be further reduced to $O(\log(k))$ bits. The downlink massive random access problem is closely connected to the study of finite exchangeable sequences. The proposed coding strategy allows bounds on the relative entropy distance between finite exchangeable distributions and i.i.d. mixture distributions to be developed, and gives a new relative entropy version of the finite de Finetti theorem which is scaling optimal.
- [106] arXiv:2405.08302 [pdf, ps, other]
-
Title: Designing Adaptive User Interfaces for mHealth applications targeting chronic disease: A User-Centric ApproachSubjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
mHealth interventions show significant potential to help in the self-management of chronic diseases, but their under use remains a problem. Considering the substantial diversity among individuals dealing with chronic diseases, tailored strategies are essential. \emph{Adaptive User Interfaces} (AUIs) may help address the diverse and evolving needs of this demographic. To investigate this approach, we developed an AUI prototype informed by existing literature findings. We then used this prototype as the basis for focus group discussions and interview studies with 22 participants managing various chronic diseases, and follow-up surveys of all participants. Through these investigations, we pinpointed key challenges related to the use of AUIs, strategies to improve adaptation design, and potential trade-offs between these challenges and strategies. Concurrently, a quantitative survey was conducted to extract preferences for AUIs in chronic disease-related applications with 90 further participants. This uncovered participants' preferences for various adaptations, data types, collection methods, and involvement levels. Finally, we synthesised these insights and categories, aligning them with existing guidelines and design considerations for mHealth app adaptation design. This resulted in nine guidelines that we refined by a final feedback survey conducted with 20 participants.
- [107] arXiv:2405.08304 [pdf, ps, other]
-
Title: Computational Thought Experiments for a More Rigorous Philosophy and Science of the MindComments: 6 pages, 4 figures, to appear at CogSci 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
We offer philosophical motivations for a method we call Virtual World Cognitive Science (VW CogSci), in which researchers use virtual embodied agents that are embedded in virtual worlds to explore questions in the field of Cognitive Science. We focus on questions about mental and linguistic representation and the ways that such computational modeling can add rigor to philosophical thought experiments, as well as the terminology used in the scientific study of such representations. We find that this method forces researchers to take a god's-eye view when describing dynamical relationships between entities in minds and entities in an environment in a way that eliminates the need for problematic talk of belief and concept types, such as the belief that cats are silly, and the concept CAT, while preserving belief and concept tokens in individual cognizers' minds. We conclude with some further key advantages of VW CogSci for the scientific study of mental and linguistic representation and for Cognitive Science more broadly.
- [108] arXiv:2405.08305 [pdf, ps, other]
-
Title: Collateral Portfolio Optimization in Crypto-Backed StablecoinsComments: Accepted for presentation at MARBLE 2024Subjects: Cryptography and Security (cs.CR)
Stablecoins - crypto tokens whose value is pegged to a real-world asset such as the US Dollar - are an important component of the DeFi ecosystem as they mitigate the impact of token price volatility. In crypto-backed stablecoins, the peg is founded on the guarantee that in case of system shutdown, each stablecoin can be exchanged for a basket of other crypto tokens worth approximately its nominal value. However, price fluctuations that affect the collateral tokens may cause this guarantee to be invalidated. In this work, we investigate the impact of the collateral portfolio's composition on the resilience to this type of catastrophic event. For stablecoins whose developers maintain a significant portion of the collateral (e.g., MakerDAO's Dai), we propose two portfolio optimization methods, based on convex optimization and (semi)variance minimization, that account for the correlation between the various token prices. We compare the optimal portfolios to the historical evolution of Dai's collateral portfolio, and to aid reproducibility, we have made our data and code publicly available.
- [109] arXiv:2405.08310 [pdf, ps, other]
-
Title: Cross-Category Functional Grasp TansferSubjects: Robotics (cs.RO)
The grasp generation of dexterous hand often requires a large number of grasping annotations. Especially for functional grasp-requiring the grasp pose to be convenient for the subsequent use of the object. However, annotating high DoF dexterous hand pose is rather challenging. This prompt us to explore how people achieve manipulations on new objects based on past grasp experiences. We find that people are adept at discovering and leveraging various similarities between objects when grasping new items, including shape, layout, and grasp type. In light of this, we analyze and collect grasp-related similarity relationships among 51 common tool-like object categories and annotate semantic grasp representation for 1768 objects. These data are organized into the form of a knowledge graph, which helps infer our proposed cross-category functional grasp synthesis. Through extensive experiments, we demonstrate that the grasp-related knowledge indeed contributed to achieving functional grasp transfer across unknown or entirely new categories of objects. We will publicly release the dataset and code to facilitate future research.
- [110] arXiv:2405.08311 [pdf, ps, other]
-
Title: A Decoupling and Aggregating Framework for Joint Extraction of Entities and RelationsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both two subtasks, ignoring their semantic differences. Second, information interaction mainly focuses on the two subtasks, leaving the fine-grained informtion interaction among the subtask-specific features of encoding subjects, relations, and objects unexplored. Motivated by the aforementioned limitations, we propose a novel model to jointly extract entities and relations. The main novelties are as follows: (1) We propose to decouple the feature encoding process into three parts, namely encoding subjects, encoding objects, and encoding relations. Thanks to this, we are able to use fine-grained subtask-specific features. (2) We propose novel inter-aggregation and intra-aggregation strategies to enhance the information interaction and construct individual fine-grained subtask-specific features, respectively. The experimental results demonstrate that our model outperforms several previous state-of-the-art models. Extensive additional experiments further confirm the effectiveness of our model.
- [111] arXiv:2405.08313 [pdf, ps, other]
-
Title: Establishing Heuristics for Improving the Usability of GUI Machine Learning Tools for Novice UsersSubjects: Human-Computer Interaction (cs.HC)
Machine learning (ML) tools with graphical user interfaces (GUI) are facing demand from novice users who do not have the background of their underlying concepts. These tools are frequently complex and pose unique challenges in terms of interaction and comprehension by novice users. There is yet to be an established set of usability heuristics to guide and assess GUI ML tool design. To address this gap, in this paper, we extend Nielsen's heuristics for evaluating GUI ML Tools through a set of empirical evaluations. To validate the proposed heuristics, user testing was conducted by novice users on a prototype that reflects those heuristics. Based on the results of the evaluations, our new heuristics set improves upon existing heuristics in the context of ML tools. It can serve as a resource for practitioners designing and evaluating these tools.
- [112] arXiv:2405.08315 [pdf, ps, other]
-
Title: Independent Range Sampling on Interval Data (Longer Version)Comments: Ful version of our ICDE2024 paperSubjects: Databases (cs.DB)
Many applications require efficient management of large sets of intervals because many objects are associated with intervals (e.g., time and price intervals). In such interval management systems, range search is a primitive operator for retrieving and analysis tasks. As dataset sizes are growing nowadays, range search results are also becoming larger, which may overwhelm users and incur long computation time. Because applications are usually satisfied with a subset of the result set, it is desirable to efficiently obtain only small samples from the result set.We therefore address the problem of independent range sampling on interval data, which outputs $s$ random samples that overlap a given query interval and are independent of the samples of all previous queries. To efficiently solve this problem theoretically and practically, we propose a variant of an interval tree, namely the augmented interval tree (or AIT), and we show that there exists an exact algorithm that needs $O(n \log n)$ space and $O(\log^{2} n + s)$ time, where $n$ is the dataset size. The simple structure of an AIT provides flexible extensions: (i) its time and space complexities respectively become $O(\log^{2} n + s)$ expected and $O(n)$ by bucketing intervals and (ii) it can deal with weighted intervals and outputs $s$ weighted random samples in $O(\log^{2} n+s\log n)$ time. We conduct extensive experiments on real datasets, and the results demonstrate that our algorithms significantly outperform competitors.
- [113] arXiv:2405.08317 [pdf, ps, other]
-
Title: SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language ModelsRaghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin KirchhoffComments: 9+6 pages, Submitted to ACL 2024Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce the attack success significantly.
- [114] arXiv:2405.08318 [pdf, ps, other]
-
Title: No-Regret Learning of Nash Equilibrium for Black-Box Games via Gaussian ProcessesComments: 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)Subjects: Machine Learning (cs.LG)
This paper investigates the challenge of learning in black-box games, where the underlying utility function is unknown to any of the agents. While there is an extensive body of literature on the theoretical analysis of algorithms for computing the Nash equilibrium with complete information about the game, studies on Nash equilibrium in black-box games are less common. In this paper, we focus on learning the Nash equilibrium when the only available information about an agent's payoff comes in the form of empirical queries. We provide a no-regret learning algorithm that utilizes Gaussian processes to identify the equilibrium in such games. Our approach not only ensures a theoretical convergence rate but also demonstrates effectiveness across a variety collection of games through experimental validation.
- [115] arXiv:2405.08322 [pdf, ps, other]
-
Title: StraightPCF: Straight Point Cloud FilteringComments: This paper has been accepted to the IEEE/CVF CVPR Conference, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Point cloud filtering is a fundamental 3D vision task, which aims to remove noise while recovering the underlying clean surfaces. State-of-the-art methods remove noise by moving noisy points along stochastic trajectories to the clean surfaces. These methods often require regularization within the training objective and/or during post-processing, to ensure fidelity. In this paper, we introduce StraightPCF, a new deep learning based method for point cloud filtering. It works by moving noisy points along straight paths, thus reducing discretization errors while ensuring faster convergence to the clean surfaces. We model noisy patches as intermediate states between high noise patch variants and their clean counterparts, and design the VelocityModule to infer a constant flow velocity from the former to the latter. This constant flow leads to straight filtering trajectories. In addition, we introduce a DistanceModule that scales the straight trajectory using an estimated distance scalar to attain convergence near the clean surface. Our network is lightweight and only has $\sim530K$ parameters, being 17% of IterativePFN (a most recent point cloud filtering network). Extensive experiments on both synthetic and real-world data show our method achieves state-of-the-art results. Our method also demonstrates nice distributions of filtered points without the need for regularization. The implementation code can be found at: this https URL.
- [116] arXiv:2405.08328 [pdf, ps, other]
-
Title: Towards Multi-Task Generative-AI Edge Services with an Attention-based Diffusion DRL ApproachSubjects: Networking and Internet Architecture (cs.NI)
As an emerging paradigm of content creation, AI-Generated Content (AIGC) has been widely adopted by a large number of edge end users. However, the requests for generated content from AIGC users have obvious diversity, and there remains a notable lack of research addressing the variance in user demands for AIGC services. This gap underscores a critical need for suitable AIGC service selection mechanisms satisfying various AIGC user requirements under resource-constrained edge environments. To address this challenge, this paper proposes a novel Attention-based Diffusion Soft Actor-Critic (ADSAC) algorithm to select the appropriate AIGC model in response to heterogeneous AIGC user requests. Specifically, the ADSAC algorithm integrates a diffusion model as the policy network in the off-policy reinforcement learning (RL) framework, to capture the intricate relationships between the characteristics of AIGC tasks and the integrated edge network states. Furthermore, an attention mechanism is utilized to harness the contextual long-range dependencies present in state feature vectors, enhancing the decision-making process. Extensive experiments validate the effectiveness of our algorithm in enhancing the overall user utility and reducing the crash rate of servers. Compared to the existing methods, the proposed ADSAC algorithm outperforms existing methods, reducing the overall user utility loss and the server crash rate by at least 58.3% and 58.4%, respectively. These results demonstrate our ADSAC algorithm is a robust solution to the challenges of diverse and dynamic user requirements in edge-based AIGC application environments.
- [117] arXiv:2405.08329 [pdf, ps, other]
-
Title: Cross-Dataset Generalization For Retinal Lesions SegmentationComments: 6 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Identifying lesions in fundus images is an important milestone toward an automated and interpretable diagnosis of retinal diseases. To support research in this direction, multiple datasets have been released, proposing groundtruth maps for different lesions. However, important discrepancies exist between the annotations and raise the question of generalization across datasets. This study characterizes several known datasets and compares different techniques that have been proposed to enhance the generalisation performance of a model, such as stochastic weight averaging, model soups and ensembles. Our results provide insights into how to combine coarsely labelled data with a finely-grained dataset in order to improve the lesions segmentation.
- [118] arXiv:2405.08331 [pdf, ps, other]
-
Title: Are Generics and Negativity about Social Groups Common on Social Media? A Comparative Analysis of Twitter (X) DataSubjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Generics (unquantified generalizations) are thought to be pervasive in communication and when they are about social groups, this may offend and polarize people because generics gloss over variations between individuals. Generics about social groups might be particularly common on Twitter (X). This remains unexplored, however. Using machine learning (ML) techniques, we therefore developed an automatic classifier for social generics, applied it to more than a million tweets about people, and analyzed the tweets. We found that most tweets (78%) about people contained no generics. However, tweets with social generics received more 'likes' and retweets. Furthermore, while recent psychological research may lead to the prediction that tweets with generics about political groups are more common than tweets with generics about ethnic groups, we found the opposite. However, consistent with recent claims that political animosity is less constrained by social norms than animosity against gender and ethnic groups, negative tweets with generics about political groups were significantly more prevalent and retweeted than negative tweets about ethnic groups. Our study provides the first ML-based insights into the use and impact of social generics on Twitter.
- [119] arXiv:2405.08334 [pdf, ps, other]
-
Title: Could Chemical LLMs benefit from Message PassingComments: Under review at BIOKDD'24Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Pretrained language models (LMs) showcase significant capabilities in processing molecular text, while concurrently, message passing neural networks (MPNNs) demonstrate resilience and versatility in the domain of molecular science. Despite these advancements, we find there are limited studies investigating the bidirectional interactions between molecular structures and their corresponding textual representations. Therefore, in this paper, we propose two strategies to evaluate whether an information integration can enhance the performance: contrast learning, which involves utilizing an MPNN to supervise the training of the LM, and fusion, which exploits information from both models. Our empirical analysis reveals that the integration approaches exhibit superior performance compared to baselines when applied to smaller molecular graphs, while these integration approaches do not yield performance enhancements on large scale graphs.
- [120] arXiv:2405.08337 [pdf, ps, other]
-
Title: Perivascular space Identification Nnunet for Generalised Usage (PINGU)Benjamin Sinclair, Lucy Vivash, Jasmine Moses, Miranda Lynch, William Pham, Karina Dorfmann, Cassandra Marotta, Shaun Koh, Jacob Bunyamin, Ella Rowsthorn, Alex Jarema, Himashi Peiris, Zhaolin Chen, Sandy R Shultz, David K Wright, Dexiao Kong, Sharon L. Naismith, Terence J. OBrien, Meng LawSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Perivascular spaces(PVSs) form a central component of the brainś waste clearance system, the glymphatic system. These structures are visible on MRI images, and their morphology is associated with aging and neurological disease. Manual quantification of PVS is time consuming and subjective. Numerous deep learning methods for PVS segmentation have been developed, however the majority have been developed and evaluated on homogenous datasets and high resolution scans, perhaps limiting their applicability for the wide range of image qualities acquired in clinic and research. In this work we train a nnUNet, a top-performing biomedical image segmentation algorithm, on a heterogenous training sample of manually segmented MRI images of a range of different qualities and resolutions from 6 different datasets. These are compared to publicly available deep learning methods for 3D segmentation of PVS. The resulting model, PINGU (Perivascular space Identification Nnunet for Generalised Usage), achieved voxel and cluster level dice scores of 0.50(SD=0.15), 0.63(0.17) in the white matter(WM), and 0.54(0.11), 0.66(0.17) in the basal ganglia(BG). Performance on data from unseen sites was substantially lower for both PINGU(0.20-0.38(WM, voxel), 0.29-0.58(WM, cluster), 0.22-0.36(BG, voxel), 0.46-0.60(BG, cluster)) and the publicly available algorithms(0.18-0.30(WM, voxel), 0.29-0.38(WM cluster), 0.10-0.20(BG, voxel), 0.15-0.37(BG, cluster)), but PINGU strongly outperformed the publicly available algorithms, particularly in the BG. Finally, training PINGU on manual segmentations from a single site with homogenous scan properties gave marginally lower performances on internal cross-validation, but in some cases gave higher performance on external validation. PINGU stands out as broad-use PVS segmentation tool, with particular strength in the BG, an area of PVS related to vascular disease and pathology.
- [121] arXiv:2405.08340 [pdf, ps, other]
-
Title: Achieving Resolution-Agnostic DNN-based Image Watermarking:A Novel Perspective of Implicit Neural RepresentationSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
DNN-based watermarking methods are rapidly developing and delivering impressive performances. Recent advances achieve resolution-agnostic image watermarking by reducing the variant resolution watermarking problem to a fixed resolution watermarking problem. However, such a reduction process can potentially introduce artifacts and low robustness. To address this issue, we propose the first, to the best of our knowledge, Resolution-Agnostic Image WaterMarking (RAIMark) framework by watermarking the implicit neural representation (INR) of image. Unlike previous methods, our method does not rely on the previous reduction process by directly watermarking the continuous signal instead of image pixels, thus achieving resolution-agnostic watermarking. Precisely, given an arbitrary-resolution image, we fit an INR for the target image. As a continuous signal, such an INR can be sampled to obtain images with variant resolutions. Then, we quickly fine-tune the fitted INR to get a watermarked INR conditioned on a binary secret message. A pre-trained watermark decoder extracts the hidden message from any sampled images with arbitrary resolutions. By directly watermarking INR, we achieve resolution-agnostic watermarking with increased robustness. Extensive experiments show that our method outperforms previous methods with significant improvements: averagely improved bit accuracy by 7%$\sim$29%. Notably, we observe that previous methods are vulnerable to at least one watermarking attack (e.g. JPEG, crop, resize), while ours are robust against all watermarking attacks.
- [122] arXiv:2405.08342 [pdf, ps, other]
-
Title: Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision TransformerComments: Published in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)Journal-ref: 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (2023) 1-4Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Respiratory disease, the third leading cause of deaths globally, is considered a high-priority ailment requiring significant research on identification and treatment. Stethoscope-recorded lung sounds and artificial intelligence-powered devices have been used to identify lung disorders and aid specialists in making accurate diagnoses. In this study, audio-spectrogram vision transformer (AS-ViT), a new approach for identifying abnormal respiration sounds, was developed. The sounds of the lungs are converted into visual representations called spectrograms using a technique called short-time Fourier transform (STFT). These images are then analyzed using a model called vision transformer to identify different types of respiratory sounds. The classification was carried out using the ICBHI 2017 database, which includes various types of lung sounds with different frequencies, noise levels, and backgrounds. The proposed AS-ViT method was evaluated using three metrics and achieved 79.1% and 59.8% for 60:40 split ratio and 86.4% and 69.3% for 80:20 split ratio in terms of unweighted average recall and overall scores respectively for respiratory sound detection, surpassing previous state-of-the-art results.
- [123] arXiv:2405.08343 [pdf, ps, other]
-
Title: Accuracy Evaluation of a Lightweight Analytic Vehicle Dynamics Model for Maneuver PlanningComments: 9 pages, 13 figuresJournal-ref: 2020 5th International Conference on Robotics and Automation Engineering (ICRAE)Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Numerical Analysis (math.NA)
Models for vehicle dynamics play an important role in maneuver planning for automated driving. They are used to derive trajectories from given control inputs, or to evaluate a given trajectory in terms of constraint violation or optimality criteria such as safety, comfort or ecology. Depending on the computation process, models with different assumptions and levels of detail are used; since maneuver planning usually has strong requirements for computation speed at a potentially high number of trajectory evaluations per planning cycle, most of the applied models aim to reduce complexity by implicitly or explicitly introducing simplifying assumptions. While evaluations show that these assumptions may be sufficiently valid under typical conditions, their effect has yet to be studied conclusively.
We propose a model for vehicle dynamics that is convenient for maneuver planning by supporting both an analytic approach of extracting parameters from a given trajectory, and a generative approach of establishing a trajectory from given control inputs. Both applications of the model are evaluated in real-world test drives under dynamic conditions, both on a closed-off test track and on public roads, and effects arising from the simplifying assumptions are analyzed. - [124] arXiv:2405.08344 [pdf, ps, other]
-
Title: No Time to Waste: Squeeze Time into Channel for Mobile Video UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Current architectures for video understanding mainly build upon 3D convolutional blocks or 2D convolutions with additional operations for temporal modeling. However, these methods all regard the temporal axis as a separate dimension of the video sequence, which requires large computation and memory budgets and thus limits their usage on mobile devices. In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, term as \textit{SqueezeTime}, for mobile video understanding. To enhance the temporal modeling capability of the proposed network, we design a Channel-Time Learning (CTL) Block to capture temporal dynamics of the sequence. This module has two complementary branches, in which one branch is for temporal importance learning and another branch with temporal position restoring capability is to enhance inter-temporal object modeling ability. The proposed SqueezeTime is much lightweight and fast with high accuracies for mobile video understanding. Extensive experiments on various video recognition and action detection benchmarks, i.e., Kinetics400, Kinetics600, HMDB51, AVA2.1 and THUMOS14, demonstrate the superiority of our model. For example, our SqueezeTime achieves $+1.2\%$ accuracy and $+80\%$ GPU throughput gain on Kinetics400 than prior methods. Codes are publicly available at this https URL and this https URL.
- [125] arXiv:2405.08345 [pdf, ps, other]
-
Title: Multi-Robot Rendezvous in Unknown Environment with Limited CommunicationComments: Submit to RAL. 8 pages, 6 figuresSubjects: Robotics (cs.RO)
Rendezvous aims at gathering all robots at a specific location, which is an important collaborative behavior for multirobot systems. However, in an unknown environment, it is challenging to achieve rendezvous. Previous researches mainly focus on special scenarios where communication is not allowed and each robot executes a random searching strategy, which is highly time-consuming, especially in large-scale environments. In this work, we focus on rendezvous in unknown environments where communication is available. We divide this task into two steps: rendezvous based environment exploration with relative pose (RP) estimation and rendezvous point election. A new strategy called partitioned and incomplete exploration for rendezvous (PIER) is proposed to efficiently explore the unknown environment, where lightweight topological maps are constructed and shared among robots for RP estimation with very few communications. Then, a rendezvous point selection algorithm based on the merged topological map is proposed for efficient rendezvous for multi-robot systems. The effectiveness of the proposed methods is validated in both simulations and real-world experiments.
- [126] arXiv:2405.08348 [pdf, ps, other]
-
Title: Foundational Verification of Smart Contracts through Verified CompilationVilhelm Sjöberg, Kinnari Dave, Daniel Britten, Maria A Schett, Xinyuan Sun, Qinshi Wang, Sean Noble Anderson, Steve Reeves, Zhong ShaoComments: 27 pages, 6 figuresSubjects: Programming Languages (cs.PL)
Programs executed on a blockchain - smart contracts - have high financial stakes; their correctness is crucial. We argue, that this correctness needs to be foundational: correctness needs to be based on the operational semantics of their execution environment. In this work we present a foundational system - the DeepSEA system - targeting the Ethereum blockchain as the largest smart contract platform. The DeepSEA system has a small but sufficiently rich programming language amenable for verification, the DeepSEA language, and a verified DeepSEA compiler. Together they enable true end-to-end verification for smart contracts. We demonstrate usability through two case studies: a realistic contract for Decentralized Finance and contract for crowdfunding.
- [127] arXiv:2405.08349 [pdf, ps, other]
-
Title: Model-Free Unsupervised Anomaly detection framework in multivariate time-series of industrial dynamical systemsComments: 21 pages, 2 tables, 10 figuresSubjects: Systems and Control (eess.SY)
In this paper, a new model-free anomaly detection framework is proposed for time-series induced by industrial dynamical systems. The framework lies in the category of conventional approaches which enable appealing features such as, a fast learning with reduced amount of learning data, a reduced memory, a high potential for explainability as well as easiness of incremental learning mechanism to incorporate operator feedback after an alarm is raised an analyzed. All these are crucial features towards acceptance of data-driven solution by industry but they are rarely considered in the comparisons between competing methods which generally exclusively focus on performance metrics. Moreover, the features engineering step involved in the proposed framework is inspired by the time-series being implicitly governed by physical laws as it is generally the case in industrial time-series. Two examples are given to assess the efficiency of the proposed approach.
- [128] arXiv:2405.08351 [pdf, ps, other]
-
Title: KG-EmpiRE: A Community-Maintainable Knowledge Graph for a Sustainable Literature Review on the State and Evolution of Empirical Research in Requirements EngineeringComments: Accepted for publication at the 32nd IEEE International Requirements Engineering conference (RE) 2024Subjects: Software Engineering (cs.SE)
In the last two decades, several researchers provided snapshots of the "current" state and evolution of empirical research in requirements engineering (RE) through literature reviews. However, these literature reviews were not sustainable, as none built on or updated previous works due to the unavailability of the extracted and analyzed data. KG-EmpiRE is a Knowledge Graph (KG) of empirical research in RE based on scientific data extracted from currently 680 papers published in the IEEE International Requirements Engineering Conference (1994-2022). KG-EmpiRE is maintained in the Open Research Knowledge Graph (ORKG), making all data openly and long-term available according to the FAIR data principles. Our long-term goal is to constantly maintain KG-EmpiRE with the research community to synthesize a comprehensive, up-to-date, and long-term available overview of the state and evolution of empirical research in RE. Besides KG-EmpiRE, we provide its analysis with all supplementary materials in a repository. This repository contains all files with instructions for replicating and (re-)using the analysis locally or via executable environments and for repeating the research approach. Since its first release based on 199 papers (2014-2022), KG-EmpiRE and its analysis have been updated twice, currently covering over 650 papers. KG-EmpiRE and its analysis demonstrate how innovative infrastructures, such as the ORKG, can be leveraged to make data from literature reviews FAIR, openly available, and maintainable for the research community in the long term. In this way, we can enable replicable, (re-)usable, and thus sustainable literature reviews to ensure the quality, reliability, and timeliness of their research results.
- [129] arXiv:2405.08352 [pdf, ps, other]
-
Title: Sibson's $\alpha$-Mutual Information and its Variational RepresentationsSubjects: Information Theory (cs.IT); Probability (math.PR)
Information measures can be constructed from Rényi divergences much like mutual information from Kullback-Leibler divergence. One such information measure is known as Sibson's $\alpha$-mutual information and has received renewed attention recently in several contexts: concentration of measure under dependence, statistical learning, hypothesis testing, and estimation theory. In this paper, we survey and extend the state of the art. In particular, we introduce variational representations for Sibson's $\alpha$-mutual information and employ them in each of the contexts just described to derive novel results. Namely, we produce generalized Transportation-Cost inequalities and Fano-type inequalities. We also present an overview of known applications, spanning from learning theory and Bayesian risk to universal prediction.
- [130] arXiv:2405.08353 [pdf, ps, other]
-
Title: Data-driven memory-dependent abstractions of dynamical systems via a Cantor-Kantorovich metricComments: Submitted to IEEE Transactions on Automatic ControlSubjects: Systems and Control (eess.SY)
Abstractions of dynamical systems enable their verification and the design of feedback controllers using simpler, usually discrete, models. In this paper, we propose a data-driven abstraction mechanism based on a novel metric between Markov models. Our approach is based purely on observing output labels of the underlying dynamics, thus opening the road for a fully data-driven approach to construct abstractions. Another feature of the proposed approach is the use of memory to better represent the dynamics in a given region of the state space. We show through numerical examples the usefulness of the proposed methodology.
- [131] arXiv:2405.08355 [pdf, ps, other]
-
Title: Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed BenchmarkComments: 14 pages, 10 figuresSubjects: Computation and Language (cs.CL)
This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances which demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, our Seal-Tools contains hard instances that call multiple tools to complete the job, among which some are nested tool callings. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Therefore, Seal-Tools can serve as a new benchmark to evaluate the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at this https URL .
- [132] arXiv:2405.08356 [pdf, ps, other]
-
Title: A Model-oriented Reasoning Framework for Privacy Analysis of Complex SystemsComments: 24 pages, 7 figuresSubjects: Cryptography and Security (cs.CR)
This paper proposes a reasoning framework for privacy properties of systems and their environments that can capture any knowledge leaks on different logical levels of the system to answer the question: which entity can learn what? With the term knowledge we refer to any kind of data, meta-data or interpretation of those that might be relevant. To achieve this, we present a modeling framework that forces the developers to explicitly describe which knowledge is available at which entity, which knowledge flows between entities and which knowledge can be inferred from other knowledge. In addition, privacy requirements are specified as rules describing forbidden knowledge for entities. Our modeling approach is incremental, starting from an abstract view of the system and adding details through well-defined transformations. This work is intended to complement existing approaches and introduces steps towards more formal foundations for privacy oriented analyses while keeping them as accessible as possible. It is designed to be extensible through schemata and vocabulary to enable compatibility with external requirements and standards.
- [133] arXiv:2405.08359 [pdf, ps, other]
-
Title: GPS-IDS: An Anomaly-based GPS Spoofing Attack Detection Framework for Autonomous VehiclesComments: Article under review at IEEE Transactions on Dependable and Secure Computing. For associated AV-GPS-Dataset, see this https URLSubjects: Cryptography and Security (cs.CR); Robotics (cs.RO)
Autonomous Vehicles (AVs) heavily rely on sensors and communication networks like Global Positioning System (GPS) to navigate autonomously. Prior research has indicated that networks like GPS are vulnerable to cyber-attacks such as spoofing and jamming, thus posing serious risks like navigation errors and system failures. These threats are expected to intensify with the widespread deployment of AVs, making it crucial to detect and mitigate such attacks. This paper proposes GPS Intrusion Detection System, or GPS-IDS, an Anomaly Behavior Analysis (ABA)-based intrusion detection framework to detect GPS spoofing attacks on AVs. The framework uses a novel physics-based vehicle behavior model where a GPS navigation model is integrated into the conventional dynamic bicycle model for accurate AV behavior representation. Temporal features derived from this behavior model are analyzed using machine learning to detect normal and abnormal navigation behavior. The performance of the GPS-IDS framework is evaluated on the AV-GPS-Dataset - a real-world dataset collected by the team using an AV testbed. The dataset has been publicly released for the global research community. To the best of our knowledge, this dataset is the first of its kind and will serve as a useful resource to address such security challenges.
- [134] arXiv:2405.08360 [pdf, ps, other]
-
Title: A Local discontinuous Galerkin method for the Benajamin-Ono equationComments: arXiv admin note: text overlap with arXiv:2404.18069Subjects: Numerical Analysis (math.NA)
The main purpose of this paper is to design a local discontinuous Galerkin (LDG) method for the Benjamin-Ono equation. We analyze the stability and error estimates for the semi-discrete LDG scheme. We prove that the scheme is $L^2$-stable and it converges at a rate $\mathcal{O}(h^{k+1/2})$ for general nonlinear flux. Furthermore, we develop a fully discrete LDG scheme using the four-stage fourth order Runge-Kutta method and ensure the devised scheme is strongly stable in case of linear flux using two-step and three-step stability approach under an appropriate time step constraint. Numerical examples are provided to validate the efficiency and accuracy of the method.
- [135] arXiv:2405.08363 [pdf, ps, other]
-
Title: UnMarker: A Universal Attack on Defensive WatermarkingSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Reports regarding the misuse of $\textit{Generative AI}$ ($\textit{GenAI}$) to create harmful deepfakes are emerging daily. Recently, defensive watermarking, which enables $\textit{GenAI}$ providers to hide fingerprints in their images to later use for deepfake detection, has been on the rise. Yet, its potential has not been fully explored. We present $\textit{UnMarker}$ -- the first practical $\textit{universal}$ attack on defensive watermarking. Unlike existing attacks, $\textit{UnMarker}$ requires no detector feedback, no unrealistic knowledge of the scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, being the product of an in-depth analysis of the watermarking paradigm revealing that robust schemes must construct their watermarks in the spectral amplitudes, $\textit{UnMarker}$ employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against the $\textit{SOTA}$ prove its effectiveness, not only defeating traditional schemes while retaining superior quality compared to existing attacks but also breaking $\textit{semantic}$ watermarks that alter the image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, $\textit{UnMarker}$ is the first practical attack on $\textit{semantic}$ watermarks, which have been deemed the future of robust watermarking. $\textit{UnMarker}$ casts doubts on the very penitential of this countermeasure and exposes its paradoxical nature as designing schemes for robustness inevitably compromises other robustness aspects.
- [136] arXiv:2405.08366 [pdf, ps, html, other]
-
Title: Towards Principled Evaluations of Sparse Autoencoders for Interpretability and ControlSubjects: Machine Learning (cs.LG)
Disentangling model activations into meaningful features is a central problem in interpretability. However, the lack of ground-truth for these features in realistic scenarios makes the validation of recent approaches, such as sparse dictionary learning, elusive. To overcome this, we propose a framework to evaluate feature dictionaries in the context of specific tasks, by comparing them against \emph{supervised} feature dictionaries. First, we demonstrate that supervised dictionaries achieve excellent approximation, control and interpretability of model computations on the task. Second, we use the supervised dictionaries to develop and contextualize evaluations of unsupervised dictionaries along the same three axes.
We apply this framework to the indirect object identification task (IOI) using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets. We find that these SAEs capture interpretable features for the IOI task, but they are not as successful as supervised features in controlling the model. Finally, we observe two qualitative phenomena in SAE training: feature occlusion (where a causally relevant concept is robustly overshadowed by even slightly higher-magnitude ones in the learned features), and feature over-splitting (where binary features split into many smaller features without clear interpretation). We hope that our framework will be a useful step towards more objective and grounded evaluations of sparse dictionary learning methods. - [137] arXiv:2405.08372 [pdf, ps, other]
-
Title: Reasoning about Interior Mutability in Rust using Library-Defined CapabilitiesSubjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Existing automated verification techniques for safe Rust code rely on the strong type-system properties to reason about programs, especially to deduce which memory locations do not change (i.e., are framed) across function calls. However, these type guarantees do not hold in the presence of interior mutability (e.g., when interacting with any concurrent data structure). As a consequence, existing verification techniques for safe code such as Prusti and Creusot are either unsound or fundamentally incomplete if applied to this setting. In this work, we present the first technique capable of automatically verifying safe clients of existing interiorly mutable types. At the core of our approach, we identify a novel notion of implicit capabilities: library-defined properties that cannot be expressed using Rust's types. We propose new annotations to specify these capabilities and a first-order logic encoding suitable for program verification. We have implemented our technique in a verifier called Mendel and used it to prove absence of panics in Rust programs that make use of popular standard-library types with interior mutability, including Rc, Arc, Cell, RefCell, AtomicI32, Mutex and RwLock. Our evaluation shows that these library annotations are useful for verifying usages of real-world libraries, and powerful enough to require zero client-side annotations in many of the verified programs.
- [138] arXiv:2405.08373 [pdf, ps, other]
-
Title: PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM EnsemblesComments: Paper accepted for oral presentation at Clinical NLP workshop, NAACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper describes our approach to the MEDIQA-CORR shared task, which involves error detection and correction in clinical notes curated by medical professionals. This task involves handling three subtasks: detecting the presence of errors, identifying the specific sentence containing the error, and correcting it. Through our work, we aim to assess the capabilities of Large Language Models (LLMs) trained on a vast corpora of internet data that contain both factual and unreliable information. We propose to comprehensively address all subtasks together, and suggest employing a unique prompt-based in-context learning strategy. We will evaluate its efficacy in this specialized task demanding a combination of general reasoning and medical knowledge. In medical systems where prediction errors can have grave consequences, we propose leveraging self-consistency and ensemble methods to enhance error correction and error detection performance.
- [139] arXiv:2405.08377 [pdf, ps, other]
-
Title: ASP-Completeness of Hamiltonicity in Grid Graphs, with Applications to Loop PuzzlesComments: 34 pages, 41 figures. To appear at Fun with Algorithms 2024Subjects: Computational Complexity (cs.CC)
We prove that Hamiltonicity in maximum-degree-3 grid graphs (directed or undirected) is ASP-complete, i.e., it has a parsimonious reduction from every NP search problem (including a polynomial-time bijection between solutions). As a consequence, given k Hamiltonian cycles, it is NP-complete to find another; and counting Hamiltonian cycles is #P-complete. If we require the grid graph's vertices to form a full $m \times n$ rectangle, then we show that Hamiltonicity remains ASP-complete if the edges are directed or if we allow removing some edges (whereas including all undirected edges is known to be easy). These results enable us to develop a stronger "T-metacell" framework for proving ASP-completeness of rectangular puzzles, which requires building just a single gadget representing a degree-3 grid-graph vertex. We apply this general theory to prove ASP-completeness of 38 pencil-and-paper puzzles where the goal is to draw a loop subject to given constraints: Slalom, Onsen-meguri, Mejilink, Detour, Tapa-Like Loop, Kouchoku, Icelom; Masyu, Yajilin, Nagareru, Castle Wall, Moon or Sun, Country Road, Geradeweg, Maxi Loop, Mid-loop, Balance Loop, Simple Loop, Haisu, Reflect Link, Linesweeper; Vertex/Touch Slitherlink, Dotchi-Loop, Ovotovata, Building Walk, Rail Pool, Disorderly Loop, Ant Mill, Koburin, Mukkonn Enn, Rassi Silai, (Crossing) Ichimaga, Tapa, Canal View, Aqre, and Paintarea. The last 14 of these puzzles were not even known to be NP-hard. Along the way, we prove ASP-completeness of some simple forms of Tree-Residue Vertex-Breaking (TRVB), including planar multigraphs with degree-6 breakable vertices, or with degree-4 breakable and degree-1 unbreakable vertices.
- [140] arXiv:2405.08380 [pdf, ps, other]
-
Title: CIER: A Novel Experience Replay Approach with Causal Inference in Deep Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In the training process of Deep Reinforcement Learning (DRL), agents require repetitive interactions with the environment. With an increase in training volume and model complexity, it is still a challenging problem to enhance data utilization and explainability of DRL training. This paper addresses these challenges by focusing on the temporal correlations within the time dimension of time series. We propose a novel approach to segment multivariate time series into meaningful subsequences and represent the time series based on these subsequences. Furthermore, the subsequences are employed for causal inference to identify fundamental causal factors that significantly impact training outcomes. We design a module to provide feedback on the causality during DRL training. Several experiments demonstrate the feasibility of our approach in common environments, confirming its ability to enhance the effectiveness of DRL training and impart a certain level of explainability to the training process. Additionally, we extended our approach with priority experience replay algorithm, and experimental results demonstrate the continued effectiveness of our approach.
- [141] arXiv:2405.08392 [pdf, ps, other]
-
Title: Neuromorphic Robust Estimation of Nonlinear Dynamical Systems Applied to Satellite RendezvousComments: 11 figures, 7 tables, 37 pages. arXiv admin note: text overlap with arXiv:2307.07963Subjects: Systems and Control (eess.SY); Earth and Planetary Astrophysics (astro-ph.EP); Neural and Evolutionary Computing (cs.NE)
State estimation of nonlinear dynamical systems has long aimed to balance accuracy, computational efficiency, robustness, and reliability. The rapid evolution of various industries has amplified the demand for estimation frameworks that satisfy all these factors. This study introduces a neuromorphic approach for robust filtering of nonlinear dynamical systems: SNN-EMSIF (spiking neural network-extended modified sliding innovation filter). SNN-EMSIF combines the computational efficiency and scalability of SNNs with the robustness of EMSIF, an estimation framework designed for nonlinear systems with zero-mean Gaussian noise. Notably, the weight matrices are designed according to the system model, eliminating the need for a learning process. The framework's efficacy is evaluated through comprehensive Monte Carlo simulations, comparing SNN-EMSIF with EKF and EMSIF. Additionally, it is compared with SNN-EKF in the presence of modeling uncertainties and neuron loss, using RMSEs as a metric. The results demonstrate the superior accuracy and robustness of SNN-EMSIF. Further analysis of runtimes and spiking patterns reveals an impressive reduction of 85% in emitted spikes compared to possible spikes, highlighting the computational efficiency of SNN-EMSIF. This framework offers a promising solution for robust estimation in nonlinear dynamical systems, opening new avenues for efficient and reliable estimation in various industries that can benefit from neuromorphic computing.
- [142] arXiv:2405.08395 [pdf, ps, html, other]
-
Title: Cross-Blockchain Communication Using Oracles With an Off-Chain Aggregation Mechanism Based on zk-SNARKsSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
The closed architecture of prevailing blockchain systems renders the usage of this technology mostly infeasible for a wide range of real-world problems. Most blockchains trap users and applications in their isolated space without the possibility of cooperating or switching to other blockchains. Therefore, blockchains need additional mechanisms for seamless communication and arbitrary data exchange between each other and external systems. Unfortunately, current approaches for cross-blockchain communication are resource-intensive or require additional blockchains or tailored solutions depending on the applied consensus mechanisms of the connected blockchains. Therefore, we propose an oracle with an off-chain aggregation mechanism based on ZeroKnowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs) to facilitate cross-blockchain communication. The oracle queries data from another blockchain and applies a rollup-like mechanism to move state and computation off-chain. The zkOracle contract only expects the transferred data, an updated state root, and proof of the correct execution of the aggregation mechanism. The proposed solution only requires constant 378 kgas to submit data on the Ethereum blockchain and is primarily independent of the underlying technology of the queried blockchains.
- [143] arXiv:2405.08396 [pdf, ps, other]
-
Title: Coupled-Band ESSFM for Low-Complexity DBPComments: The paper has been submitted for publication to ECOC 2024Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We propose a novel digital backpropagation (DBP) technique that combines perturbation theory, subband processing, and splitting ratio optimization. We obtain 0.23 dB, 0.47 dB, or 0.91 dB gains w.r.t. dispersion compensation with only 74, 161, or 681 real multiplications/2D-symbol, improving significantly on existing DBP techniques.
- [144] arXiv:2405.08400 [pdf, ps, other]
-
Title: Stylometric Watermarks for Large Language ModelsComments: 19 pages, 4 figures, 9 tablesSubjects: Computation and Language (cs.CL)
The rapid advancement of large language models (LLMs) has made it increasingly difficult to distinguish between text written by humans and machines. Addressing this, we propose a novel method for generating watermarks that strategically alters token probabilities during generation. Unlike previous works, this method uniquely employs linguistic features such as stylometry. Concretely, we introduce acrostica and sensorimotor norms to LLMs. Further, these features are parameterized by a key, which is updated every sentence. To compute this key, we use semantic zero shot classification, which enhances resilience. In our evaluation, we find that for three or more sentences, our method achieves a false positive and false negative rate of 0.02. For the case of a cyclic translation attack, we observe similar results for seven or more sentences. This research is of particular of interest for proprietary LLMs to facilitate accountability and prevent societal harm.
- [145] arXiv:2405.08401 [pdf, ps, other]
-
Title: Realtime Global Optimization of a Fail-Safe Emergency Stop Maneuver for Arbitrary Electrical / Electronical Failures in Automated DrivingComments: 8 pages, 7 figuresJournal-ref: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Numerical Analysis (math.NA)
In the event of a critical system failures in auto-mated vehicles, fail-operational or fail-safe measures provide minimum guarantees for the vehicle's performance, depending on which of its subsystems remain operational. Various such methods have been proposed which, upon failure, use different remaining sets of operational subsystems to execute maneuvers that bring the vehicle into a safe state under different environmental conditions. One particular such method proposes a fail-safe emergency stop system that requires no particular electric or electronic subsystem to be available after failure, and still provides a basic situation-dependent emergency stop maneuver. This is achieved by preemptively setting parameters to a hydraulic / mechanical system prior to failure, which after failure executes the preset maneuver "blindly". The focus of this paper is the particular challenge of implementing a lightweight planning algorithm that can cope with the complex uncertainties of the given task while still providing a globally optimal solution at regular intervals, based on the perceived and predicted environment of the automated vehicle.
- [146] arXiv:2405.08402 [pdf, ps, other]
-
Title: Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's PretrainingSubjects: Computation and Language (cs.CL)
Self-supervised learning has shown great success in Speech Recognition. However, it has been observed that finetuning all layers of the learned model leads to lower performance compared to resetting top layers. This phenomenon is attributed to the ''autoencoder'' behavior: top layers contain information closer to the input and are less suitable for tasks that require linguistic information, such as Speech this http URL better our understanding of this behavior, we propose to study the evolution of high-level information within the model during pretraining. We focus on the HuBERT model, which exhibits a less pronounced ''autoencoder'' behavior. By experimentally exploring various factors that may have an impact, we aim to improve the training procedure and enhance the top layers of HuBERT for high-level tasks.Furthermore, our experiments demonstrate that these improvements in the training procedure result in faster convergence and competitive performance on downstream tasks.
- [147] arXiv:2405.08403 [pdf, ps, html, other]
-
Title: TFWT: Tabular Feature Weighting with TransformerSubjects: Machine Learning (cs.LG)
In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one dataset. This simplified processing methods overlook the unique contributions of each feature, and thus may miss important feature information. As a result, it leads to suboptimal performance in complex datasets with rich features. To address this problem, we introduce Tabular Feature Weighting with Transformer, a novel feature weighting approach for tabular data. Our method adopts Transformer to capture complex feature dependencies and contextually assign appropriate weights to discrete and continuous features. Besides, we employ a reinforcement learning strategy to further fine-tune the weighting process. Our extensive experimental results across various real-world datasets and diverse downstream tasks show the effectiveness of TFWT and highlight the potential for enhancing feature weighting in tabular data analysis.
- [148] arXiv:2405.08406 [pdf, ps, html, other]
-
Title: Towards a Hybrid Digital Twin: Physics-Informed Neural Networks as Surrogate Model of a Reinforced Concrete BeamSubjects: Computational Engineering, Finance, and Science (cs.CE)
In this study, we investigate the potential of fast-to-evaluate surrogate modeling techniques for developing a hybrid digital twin of a steel-reinforced concrete beam, serving as a representative example of a civil engineering structure. As surrogates, two distinct models are developed utilizing physics-informed neural networks, which integrate experimental data with given governing laws of physics. The experimental data (sensor data) is obtained from a previously conducted four-point bending test. The first surrogate model predicts strains at fixed locations along the center line of the beam for various time instances. This time-dependent surrogate model is inspired by the motion of a harmonic oscillator. For this study, we further compare the physics-based approach with a purely data-driven method, revealing the significance of physical laws for the extrapolation capabilities of models in scenarios with limited access to experimental data. Furthermore, we identify the natural frequency of the system by utilizing the physics-based model as an inverse solver. For the second surrogate model, we then focus on a fixed instance in time and combine the sensor data with the equations of linear elasticity to predict the strain distribution within the beam. This example reveals the importance of balancing different loss components through the selection of suitable loss weights.
- [149] arXiv:2405.08411 [pdf, ps, other]
-
Title: Large-Scale Metric Computation in Online Controlled Experiment PlatformSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Online controlled experiment (also called A/B test or experiment) is the most important tool for decision-making at a wide range of data-driven companies like Microsoft, Google, Meta, etc. Metric computation is the core procedure for reaching a conclusion during an experiment. With the growth of experiments and metrics in an experiment platform, computing metrics efficiently at scale becomes a non-trivial challenge. This work shows how metric computation in WeChat experiment platform can be done efficiently using bit-sliced index (BSI) arithmetic. This approach has been implemented in a real world system and the performance results are presented, showing that the BSI arithmetic approach is very suitable for large-scale metric computation scenarios.
- [150] arXiv:2405.08418 [pdf, ps, other]
-
Title: The Stiffness of 3-PRS PM Across Parasitic and Orientational WorkspaceComments: arXiv admin note: text overlap with arXiv:2404.18575Subjects: Robotics (cs.RO)
This study investigates the stiffness characteristics of the Sprint Z3 head, also known as 3-PRS Parallel Kinematics Machines, which are among the most extensively researched and viably successful manipulators for precision machining applications. Despite the wealth of research on these robotic manipulators, no previous work has demonstrated their stiffness performance within the parasitic motion space. Such an undesired motion influences their stiffness properties, as stiffness is configuration-dependent. Addressing this gap, this paper develops a stiffness model that accounts for both the velocity-level parasitic motion space and the regular workspace. Numerical simulations are provided to illustrate the stiffness characteristics of the manipulator across all considered spaces. The results indicate that the stiffness profile within the parasitic motion space is both shallower and the values are smaller when compared to the stiffness distribution across the orientation workspace. This implies that evaluating a manipulator's performance adequately requires assessing its ability to resist external loads during parasitic motion. Therefore, comprehending this aspect is crucial for redesigning components to enhance overall stiffness.
- [151] arXiv:2405.08419 [pdf, ps, html, other]
-
Title: WaterMamba: Visual State Space Model for Underwater Image EnhancementComments: arXiv admin note: substantial text overlap with arXiv:2403.06098Subjects: Computer Vision and Pattern Recognition (cs.CV)
Underwater imaging often suffers from low quality due to factors affecting light propagation and absorption in water. To improve image quality, some underwater image enhancement (UIE) methods based on convolutional neural networks (CNN) and Transformer have been proposed. However, CNN-based UIE methods are limited in modeling long-range dependencies, and Transformer-based methods involve a large number of parameters and complex self-attention mechanisms, posing efficiency challenges. Considering computational complexity and severe underwater image degradation, a state space model (SSM) with linear computational complexity for UIE, named WaterMamba, is proposed. We propose spatial-channel omnidirectional selective scan (SCOSS) blocks comprising spatial-channel coordinate omnidirectional selective scan (SCCOSS) modules and a multi-scale feedforward network (MSFFN). The SCOSS block models pixel and channel information flow, addressing dependencies. The MSFFN facilitates information flow adjustment and promotes synchronized operations within SCCOSS modules. Extensive experiments showcase WaterMamba's cutting-edge performance with reduced parameters and computational resources, outperforming state-of-the-art methods on various datasets, validating its effectiveness and generalizability. The code will be released on GitHub after acceptance.
- [152] arXiv:2405.08424 [pdf, ps, html, other]
-
Title: Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and MoreComments: ICML 2024Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Combinatorial optimization (CO) is naturally discrete, making machine learning based on differentiable optimization inapplicable. Karalias & Loukas (2020) adapted the probabilistic method to incorporate CO into differentiable optimization. Their work ignited the research on unsupervised learning for CO, composed of two main components: probabilistic objectives and derandomization. However, each component confronts unique challenges. First, deriving objectives under various conditions (e.g., cardinality constraints and minimum) is nontrivial. Second, the derandomization process is underexplored, and the existing derandomization methods are either random sampling or naive rounding. In this work, we aim to tackle prevalent (i.e., commonly involved) conditions in unsupervised CO. First, we concretize the targets for objective construction and derandomization with theoretical justification. Then, for various conditions commonly involved in different CO problems, we derive nontrivial objectives and derandomization to meet the targets. Finally, we apply the derivations to various CO problems. Via extensive experiments on synthetic and real-world graphs, we validate the correctness of our derivations and show our empirical superiority w.r.t. both optimization quality and speed.
- [153] arXiv:2405.08427 [pdf, ps, other]
-
Title: Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and BaselineComments: 10 pages, 6 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Stickers are increasingly used in social media to express sentiment and intent. When finding typing troublesome, people often use a sticker instead. Despite the significant impact of stickers on sentiment analysis and intent recognition, little research has been conducted. To address this gap, we propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS). Additionally, we introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms. Our dataset includes paired data with the same text but different stickers, and various stickers consisting of the same images with different texts, allowing us to better understand the impact of stickers on chat sentiment and intent. We also propose an effective multimodal joint model, MMSAIR, for our task, which is validated on our datasets and indicates that visual information of stickers counts. Our dataset and code will be publicly available.
- [154] arXiv:2405.08429 [pdf, ps, html, other]
-
Title: TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road DetectionMartín Bayón-Gutiérrez, María Teresa García-Ordás, Héctor Alaiz Moretón, Jose Aveleira-Mata, Sergio Rubio Martín, José Alberto Benítez-AndradesComments: Source code: this https URLJournal-ref: M Bay\'on-Guti\'errez, MT Garc\'ia-Ord\'as, H Alaiz Moret\'on, J Aveleira-Mata, S Rubio-Mart\'in, JA Ben\'itez-Andrades. TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection. Logic Journal of the IGPL. 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Robust road surface estimation is required for autonomous ground vehicles to navigate safely. Despite it becoming one of the main targets for autonomous mobility researchers in recent years, it is still an open problem in which cameras and LiDAR sensors have demonstrated to be adequate to predict the position, size and shape of the road a vehicle is driving on in different environments. In this work, a novel Convolutional Neural Network model is proposed for the accurate estimation of the roadway surface. Furthermore, an ablation study has been conducted to investigate how different encoding strategies affect model performance, testing 6 slightly different neural network architectures. Our model is based on the use of a Twin Encoder-Decoder Neural Network (TEDNet) for independent camera and LiDAR feature extraction, and has been trained and evaluated on the Kitti-Road dataset. Bird's Eye View projections of the camera and LiDAR data are used in this model to perform semantic segmentation on whether each pixel belongs to the road surface. The proposed method performs among other state-of-the-art methods and operates at the same frame-rate as the LiDAR and cameras, so it is adequate for its use in real-time applications.
- [155] arXiv:2405.08434 [pdf, ps, other]
-
Title: TP3M: Transformer-based Pseudo 3D Image Matching with ReferenceComments: Accepted by ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image matching is still challenging in such scenes with large viewpoints or illumination changes or with low textures. In this paper, we propose a Transformer-based pseudo 3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image and matches to the 2D features extracted from the destination image by the coarse-to-fine 3D matching. Our key discovery is that by introducing the reference image, the source image's fine points are screened and furtherly their feature descriptors are enriched from 2D to 3D, which improves the match performance with the destination image. Experimental results on multiple datasets show that the proposed method achieves the state-of-the-art on the tasks of homography estimation, pose estimation and visual localization especially in challenging scenes.
- [156] arXiv:2405.08440 [pdf, ps, other]
-
Title: DGCformer: Deep Graph Clustering Transformer for Multivariate Time Series ForecastingSubjects: Machine Learning (cs.LG)
Multivariate time series forecasting tasks are usually conducted in a channel-dependent (CD) way since it can incorporate more variable-relevant information. However, it may also involve a lot of irrelevant variables, and this even leads to worse performance than the channel-independent (CI) strategy. This paper combines the strengths of both strategies and proposes the Deep Graph Clustering Transformer (DGCformer) for multivariate time series forecasting. Specifically, it first groups these relevant variables by a graph convolutional network integrated with an autoencoder, and a former-latter masked self-attention mechanism is then considered with the CD strategy being applied to each group of variables while the CI one for different groups. Extensive experimental results on eight datasets demonstrate the superiority of our method against state-of-the-art models, and our code will be publicly available upon acceptance.
- [157] arXiv:2405.08443 [pdf, ps, other]
-
Title: Safety Constrained Multi-Agent Reinforcement Learning for Active Voltage ControlComments: Accepted by IJCAI2024Subjects: Machine Learning (cs.LG)
Active voltage control presents a promising avenue for relieving power congestion and enhancing voltage quality, taking advantage of the distributed controllable generators in the power network, such as roof-top photovoltaics. While Multi-Agent Reinforcement Learning (MARL) has emerged as a compelling approach to address this challenge, existing MARL approaches tend to overlook the constrained optimization nature of this problem, failing in guaranteeing safety constraints. In this paper, we formalize the active voltage control problem as a constrained Markov game and propose a safety-constrained MARL algorithm. We expand the primal-dual optimization RL method to multi-agent settings, and augment it with a novel approach of double safety estimation to learn the policy and to update the Lagrange-multiplier. In addition, we proposed different cost functions and investigated their influences on the behavior of our constrained MARL method. We evaluate our approach in the power distribution network simulation environment with real-world scale scenarios. Experimental results demonstrate the effectiveness of the proposed method compared with the state-of-the-art MARL methods.
- [158] arXiv:2405.08447 [pdf, ps, other]
-
Title: AI-Resilient InterfacesSubjects: Human-Computer Interaction (cs.HC)
AI is powerful, but it can make choices that result in objective errors, contextually inappropriate outputs, and disliked options. We need AI-resilient interfaces that help people be resilient to the AI choices that are not right, or not right for them. To support this goal, interfaces need to help users notice and have the context to appropriately judge those AI choices. Existing human-AI interaction guidelines recommend efficient user dismissal, modification, or otherwise efficient recovery from AI choices that a user does not like. However, in order to recover from AI choices, the user must notice them first. This can be difficult. For example, when generating summaries of long documents, a system's exclusion of a detail that is critically important to the user is hard for the user to notice. That detail can be hiding in a wall of text in the original document, and the existence of a summary may tempt the user not to read the original document as carefully. Once noticed, judging AI choices well can also be challenging. The interface may provide very little information that contextualizes the choices, and the user may fall back on assumptions when deciding whether to dismiss, modify, or otherwise recover from an AI choice. Building on prior work, this paper defines key aspects of AI-resilient interfaces, illustrated with examples. Designing interfaces for increased AI-resilience of users will improve AI safety, usability, and utility. This is especially critical where AI-powered systems are used for context- and preference-dominated open-ended AI-assisted tasks, like ideating, summarizing, searching, sensemaking, and the reading and writing of text or code.
- [159] arXiv:2405.08448 [pdf, ps, html, other]
-
Title: Understanding the performance gap between online and offline alignment algorithmsYunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will DabneySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline alignment algorithms challenge the need for on-policy sampling in RLHF. Within the context of reward over-optimization, we start with an opening set of experiments that demonstrate the clear advantage of online methods over offline methods. This prompts us to investigate the causes to the performance discrepancy through a series of carefully designed experimental ablations. We show empirically that hypotheses such as offline data coverage and data quality by itself cannot convincingly explain the performance difference. We also find that while offline algorithms train policy to become good at pairwise classification, it is worse at generations; in the meantime the policies trained by online algorithms are good at generations while worse at pairwise classification. This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process. Lastly, we observe that the performance discrepancy persists for both contrastive and non-contrastive loss functions, and appears not to be addressed by simply scaling up policy networks. Taken together, our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
- [160] arXiv:2405.08454 [pdf, ps, other]
-
Title: How Alignment Helps Make the Most of Multimodal DataComments: Working PaperSubjects: Computation and Language (cs.CL)
When studying political communication, combining the information from text, audio, and video signals promises to reflect the richness of human communication more comprehensively than confining it to individual modalities alone. However, when modeling such multimodal data, its heterogeneity, connectedness, and interaction are challenging to address. We argue that aligning the respective modalities can be an essential step in entirely using the potential of multimodal data because it informs the model with human understanding. Exploring aligned modalities unlocks promising analytical leverage. First, it allows us to make the most of information in the data, which inter alia opens the door to better quality predictions. Second, it is possible to answer research questions that span multiple modalities with cross-modal queries. Finally, alignment addresses concerns about model interpretability. We illustrate the utility of this approach by analyzing how German MPs address members of the far-right AfD in their speeches, and predicting the tone of video advertising in the context of the 2020 US presidential race. Our paper offers important insights to all keen to analyze multimodal data effectively.
- [161] arXiv:2405.08458 [pdf, ps, other]
-
Title: Rethinking Prior Information Generation with CLIP for Few-Shot SegmentationComments: Accepted by CVPR 2024; The camera-ready versionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Few-shot segmentation remains challenging due to the limitations of its labeling information for unseen classes. Most previous approaches rely on extracting high-level feature maps from the frozen visual encoder to compute the pixel-wise similarity as a key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes since these high-level feature maps have obvious category bias. In this work, we propose to replace the visual prior representation with the visual-text alignment capacity to capture more reliable guidance and enhance the model generalization. Specifically, we design two kinds of training-free prior information generation strategy that attempts to utilize the semantic alignment capability of the Contrastive Language-Image Pre-training model (CLIP) to locate the target class. Besides, to acquire more accurate prior guidance, we build a high-order relationship of attention maps and utilize it to refine the initial prior information. Experiments on both the PASCAL-5{i} and COCO-20{i} datasets show that our method obtains a clearly substantial improvement and reaches the new state-of-the-art performance.
- [162] arXiv:2405.08460 [pdf, ps, other]
-
Title: Evaluating LLMs at Evaluating Temporal GeneralizationComments: PreprintSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of LLMs in ever-changing real-world scenarios. Furthermore, these benchmarks do not adequately measure the models' capabilities over a broader temporal range or their adaptability over time. We examine current LLMs in terms of temporal generalization and bias, revealing that various temporal biases emerge in both language likelihood and prognostic prediction. This serves as a caution for LLM practitioners to pay closer attention to mitigating temporal biases. Also, we propose an evaluation framework Freshbench for dynamically generating benchmarks from the most recent real-world prognostication prediction. Our code is available at this https URL. The dataset will be released soon.
- [163] arXiv:2405.08463 [pdf, ps, other]
-
Title: A Timely Survey on Vision Transformer for Deepfake DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection, serving as a valuable reference for both academic and practical pursuits in this domain.
- [164] arXiv:2405.08465 [pdf, ps, html, other]
-
Title: How to Surprisingly Consider Recommendations? A Knowledge-Graph-based Approach Relying on Complex Network MetricsSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)
Traditional recommendation proposals, including content-based and collaborative filtering, usually focus on similarity between items or users. Existing approaches lack ways of introducing unexpectedness into recommendations, prioritizing globally popular items over exposing users to unforeseen items. This investigation aims to design and evaluate a novel layer on top of recommender systems suited to incorporate relational information and suggest items with a user-defined degree of surprise. We propose a Knowledge Graph (KG) based recommender system by encoding user interactions on item catalogs. Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations. We hypothesize that surprisingness correlates with certain network metrics, treating user profiles as subgraphs within a larger catalog KG. The achieved solution reranks recommendations based on their impact on structural graph metrics. Our research contributes to optimizing recommendations to reflect the metrics. We experimentally evaluate our approach on two datasets of LastFM listening histories and synthetic Netflix viewing profiles. We find that reranking items based on complex network metrics leads to a more unexpected and surprising composition of recommendation lists.
- [165] arXiv:2405.08466 [pdf, ps, other]
-
Title: Work-in-Progress: Crash Course: Can (Under Attack) Autonomous Driving Beat Human Drivers?Comments: Accepted at ACSW 2024Subjects: Cryptography and Security (cs.CR)
Autonomous driving is a research direction that has gained enormous traction in the last few years thanks to advancements in Artificial Intelligence (AI). Depending on the level of independence from the human driver, several studies show that Autonomous Vehicles (AVs) can reduce the number of on-road crashes and decrease overall fuel emissions by improving efficiency. However, security research on this topic is mixed and presents some gaps. On one hand, these studies often neglect the intrinsic vulnerabilities of AI algorithms, which are known to compromise the security of these systems. On the other, the most prevalent attacks towards AI rely on unrealistic assumptions, such as access to the model parameters or the training dataset. As such, it is unclear if autonomous driving can still claim several advantages over human driving in real-world applications. This paper evaluates the inherent risks in autonomous driving by examining the current landscape of AVs and establishing a pragmatic threat model. Through our analysis, we develop specific claims highlighting the delicate balance between the advantages of AVs and potential security challenges in real-world scenarios. Our evaluation serves as a foundation for providing essential takeaway messages, guiding both researchers and practitioners at various stages of the automation pipeline. In doing so, we contribute valuable insights to advance the discourse on the security and viability of autonomous driving in real-world applications.
- [166] arXiv:2405.08468 [pdf, ps, other]
-
Title: Challenges and Opportunities in Text Generation ExplainabilityComments: 17 pages, 5 figures, xAI-2024 Conference, Main trackSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The necessity for interpretability in natural language processing (NLP) has risen alongside the growing prominence of large language models. Among the myriad tasks within NLP, text generation stands out as a primary objective of autoregressive models. The NLP community has begun to take a keen interest in gaining a deeper understanding of text generation, leading to the development of model-agnostic explainable artificial intelligence (xAI) methods tailored to this task. The design and evaluation of explainability methods are non-trivial since they depend on many factors involved in the text generation process, e.g., the autoregressive model and its stochastic nature. This paper outlines 17 challenges categorized into three groups that arise during the development and assessment of attribution-based explainability methods. These challenges encompass issues concerning tokenization, defining explanation similarity, determining token importance and prediction change metrics, the level of human intervention required, and the creation of suitable test datasets. The paper illustrates how these challenges can be intertwined, showcasing new opportunities for the community. These include developing probabilistic word-level explainability methods and engaging humans in the explainability pipeline, from the data design to the final evaluation, to draw robust conclusions on xAI methods.
- [167] arXiv:2405.08469 [pdf, ps, other]
-
Title: GPT-3.5 for Grammatical Error CorrectionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper investigates the application of GPT-3.5 for Grammatical Error Correction (GEC) in multiple languages in several settings: zero-shot GEC, fine-tuning for GEC, and using GPT-3.5 to re-rank correction hypotheses generated by other GEC models. In the zero-shot setting, we conduct automatic evaluations of the corrections proposed by GPT-3.5 using several methods: estimating grammaticality with language models (LMs), the Scribendi test, and comparing the semantic embeddings of sentences. GPT-3.5 has a known tendency to over-correct erroneous sentences and propose alternative corrections. For several languages, such as Czech, German, Russian, Spanish, and Ukrainian, GPT-3.5 substantially alters the source sentences, including their semantics, which presents significant challenges for evaluation with reference-based metrics. For English, GPT-3.5 demonstrates high recall, generates fluent corrections, and generally preserves sentence semantics. However, human evaluation for both English and Russian reveals that, despite its strong error-detection capabilities, GPT-3.5 struggles with several error types, including punctuation mistakes, tense errors, syntactic dependencies between words, and lexical compatibility at the sentence level.
- [168] arXiv:2405.08470 [pdf, ps, other]
-
Title: Sparse MTTKRP Acceleration for Tensor Decomposition on GPUComments: In 21st ACM International Conference on Computing Frontiers (CF '24), May 7-9, 2024, Ischia, ItalySubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation, including (1) eliminating global atomic operations across GPU thread blocks, (2) avoiding the intermediate values being communicated between GPU thread blocks and GPU global memory, and (3) ensuring a balanced distribution of workloads across GPU thread blocks. Our approach also supports dynamic tensor remapping, enabling the above optimizations in all the modes of the input tensor. Our approach achieves a geometric mean speedup of 1.5x, 2.0x, and 21.7x in total execution time across widely used datasets compared with the state-of-the-art GPU implementations. Our work is the only GPU implementation that can support tensors with modes greater than 4 since the state-of-the-art works have implementation constraints for tensors with a large number of modes.
- [169] arXiv:2405.08473 [pdf, ps, other]
-
Title: Improving the Real-Data Driven Network Evaluation Model for Digital Twin NetworksComments: accepted at IEEE ICC 2024 Workshop - DDINSSubjects: Machine Learning (cs.LG)
With the emergence and proliferation of new forms of large-scale services such as smart homes, virtual reality/augmented reality, the increasingly complex networks are raising concerns about significant operational costs. As a result, the need for network management automation is emphasized, and Digital Twin Networks (DTN) technology is expected to become the foundation technology for autonomous networks. DTN has the advantage of being able to operate and system networks based on real-time collected data in a closed-loop system, and currently it is mainly designed for optimization scenarios. To improve network performance in optimization scenarios, it is necessary to select appropriate configurations and perform accurate performance evaluation based on real data. However, most network evaluation models currently use simulation data. Meanwhile, according to DTN standards documents, artificial intelligence (AI) models can ensure scalability, real-time performance, and accuracy in large-scale networks. Various AI research and standardization work is ongoing to optimize the use of DTN. When designing AI models, it is crucial to consider the characteristics of the data. This paper presents an autoencoder-based skip connected message passing neural network (AE-SMPN) as a network evaluation model using real network data. The model is created by utilizing graph neural network (GNN) with recurrent neural network (RNN) models to capture the spatiotemporal features of network data. Additionally, an AutoEncoder (AE) is employed to extract initial features. The neural network was trained using the real DTN dataset provided by the Barcelona Neural Networking Center (BNN-UPC), and the paper presents the analysis of the model structure along with experimental results.
- [170] arXiv:2405.08475 [pdf, ps, html, other]
-
Title: Representing Information on DNA using Patterns Induced by Enzymatic LabelingComments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024Subjects: Information Theory (cs.IT)
Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template for labeling, employing patterns induced by a set of designed labels to represent information. One hypothetical implementation can use CRISPR-Cas9 and gRNA reagents for labeling. Various aspects of the general labeling channel, including fixed-length labels, are explored, and upper bounds on the maximal size of the corresponding codes are given. The study includes the development of an efficient encoder-decoder pair that is proven optimal in terms of maximum code size under specific conditions.
- [171] arXiv:2405.08477 [pdf, ps, other]
-
Title: Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language ModelsComments: Accepted at EAMT 2024Subjects: Computation and Language (cs.CL)
Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings as an approach towards fairer MT. In this direction, we explore prompting techniques with large language models (LLMs) to translate from English into Italian using neomorphemes. So far, this area has been under-explored due to its novelty and the lack of publicly available evaluation resources. We fill this gap by releasing Neo-GATE, a resource designed to evaluate gender-inclusive en-it translation with neomorphemes. With Neo-GATE, we assess four LLMs of different families and sizes and different prompt formats, identifying strengths and weaknesses of each on this novel task for MT.
- [172] arXiv:2405.08479 [pdf, ps, other]
-
Title: A Survey on Complexity Measures of Pseudo-Random SequencesSubjects: Cryptography and Security (cs.CR)
Since the introduction of the Kolmogorov complexity of binary sequences in the 1960s, there have been significant advancements in the topic of complexity measures for randomness assessment, which are of fundamental importance in theoretical computer science and of practical interest in cryptography. This survey reviews notable research from the past four decades on the linear, quadratic and maximum-order complexities of pseudo-random sequences and their relations with Lempel-Ziv complexity, expansion complexity, 2-adic complexity, and correlation measures.
- [173] arXiv:2405.08483 [pdf, ps, other]
-
Title: RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D ImagesComments: Accepted by CVPR Workshop DLGC, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In this work, we introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image. Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence, i.e., we regress the object coordinates for each visible pixel. Our method leverages existing object detection methods. We incorporate a re-projection mechanism to adjust the camera's intrinsic matrix to accommodate cropping in RGB-D images. Moreover, we transform the 3D object coordinates into a residual representation, which can effectively reduce the output space and yield superior performance. We conducted extensive experiments to validate the efficacy of our approach for 6D pose estimation. Our approach outperforms most previous methods, especially in occlusion scenarios, and demonstrates notable improvements over the state-of-the-art methods. Our code is available on this https URL.
- [174] arXiv:2405.08486 [pdf, ps, other]
-
Title: Gradient Boosting Mapping for Dimensionality Reduction and Feature ExtractionComments: 32 pages, 8 figures, 5 tablesSubjects: Machine Learning (cs.LG)
A fundamental problem in supervised learning is to find a good set of features or distance measures. If the new set of features is of lower dimensionality and can be obtained by a simple transformation of the original data, they can make the model understandable, reduce overfitting, and even help to detect distribution drift. We propose a supervised dimensionality reduction method Gradient Boosting Mapping (GBMAP), where the outputs of weak learners -- defined as one-layer perceptrons -- define the embedding. We show that the embedding coordinates provide better features for the supervised learning task, making simple linear models competitive with the state-of-the-art regressors and classifiers. We also use the embedding to find a principled distance measure between points. The features and distance measures automatically ignore directions irrelevant to the supervised learning task. We also show that we can reliably detect out-of-distribution data points with potentially large regression or classification errors. GBMAP is fast and works in seconds for dataset of million data points or hundreds of features. As a bonus, GBMAP provides a regression and classification performance comparable to the state-of-the-art supervised learning methods.
- [175] arXiv:2405.08487 [pdf, ps, other]
-
Title: Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection MethodSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
In recent years, deep learning has greatly streamlined the process of generating realistic fake face images. Aware of the dangers, researchers have developed various tools to spot these counterfeits. Yet none asked the fundamental question: What digital manipulations make a real photographic face image fake, while others do not? In this paper, we put face forgery in a semantic context and define that computational methods that alter semantic face attributes to exceed human discrimination thresholds are sources of face forgery. Guided by our new definition, we construct a large face forgery image dataset, where each image is associated with a set of labels organized in a hierarchical graph. Our dataset enables two new testing protocols to probe the generalization of face forgery detectors. Moreover, we propose a semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task (\ie, real or fake face detection). We show that the proposed dataset successfully exposes the weaknesses of current detectors as the test set and consistently improves their generalizability as the training set. Additionally, we demonstrate the superiority of our semantics-oriented method over traditional binary and multi-class classification-based detectors.
- [176] arXiv:2405.08493 [pdf, ps, other]
-
Title: Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental StudySubjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning methods, especially Convolutional Neural Networks (CNN) and Vision Transformer (ViT), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, and ISPRS Vaihingen datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.
- [177] arXiv:2405.08497 [pdf, ps, other]
-
Title: Is Less More? Quality, Quantity and Context in Idiom Processing with Natural Language ModelsAgne Knietaite, Adam Allsebrook, Anton Minkov, Adam Tomaszewski, Norbert Slinko, Richard Johnson, Thomas Pickard, Dylan Phelps, Aline VillavicencioComments: 14 pages, 10 figures. Presented at the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD 2024) this https URLSubjects: Computation and Language (cs.CL)
Compositionality in language models presents a problem when processing idiomatic expressions, as their meaning often cannot be directly derived from their individual parts. Although fine-tuning and other optimization strategies can be used to improve representations of idiomatic expressions, this depends on the availability of relevant data. We present the Noun Compound Synonym Substitution in Books - NCSSB - datasets, which are created by substitution of synonyms of potentially idiomatic English noun compounds in public domain book texts. We explore the trade-off between data quantity and quality when training models for idiomaticity detection, in conjunction with contextual information obtained locally (from the surrounding sentences) or externally (through language resources). Performance on an idiomaticity detection task indicates that dataset quality is a stronger factor for context-enriched models, but that quantity also plays a role in models without context inclusion strategies.
- [178] arXiv:2405.08498 [pdf, ps, html, other]
-
Title: Learning Decision Policies with Instrumental Variables through Double Machine LearningComments: Accepted at ICML 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging the estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
- [179] arXiv:2405.08502 [pdf, ps, html, other]
-
Title: Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil ProcedureComments: To be published in SemEval-2024Subjects: Computation and Language (cs.CL)
The SemEval task on Argument Reasoning in Civil Procedure is challenging in that it requires understanding legal concepts and inferring complex arguments. Currently, most Large Language Models (LLM) excelling in the legal realm are principally purposed for classification tasks, hence their reasoning rationale is subject to contention. The approach we advocate involves using a powerful teacher-LLM (ChatGPT) to extend the training dataset with explanations and generate synthetic data. The resulting data are then leveraged to fine-tune a small student-LLM. Contrary to previous work, our explanations are not directly derived from the teacher's internal knowledge. Instead they are grounded in authentic human analyses, therefore delivering a superior reasoning signal. Additionally, a new `mutation' method generates artificial data instances inspired from existing ones. We are publicly releasing the explanations as an extension to the original dataset, along with the synthetic dataset and the prompts that were used to generate both. Our system ranked 15th in the SemEval competition. It outperforms its own teacher and can produce explanations aligned with the original human analyses, as verified by legal experts.
- [180] arXiv:2405.08503 [pdf, ps, other]
-
Title: IPC: Incremental Probabilistic Consensus-based Consistent Set Maximization for SLAM BackendsComments: This paper has been accepted for publication at the 2024 IEEE International Conference on Robotics and Automation (ICRA)Journal-ref: 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024Subjects: Robotics (cs.RO)
In SLAM (Simultaneous localization and mapping) problems, Pose Graph Optimization (PGO) is a technique to refine an initial estimate of a set of poses (positions and orientations) from a set of pairwise relative measurements. The optimization procedure can be negatively affected even by a single outlier measurement, with possible catastrophic and meaningless results. Although recent works on robust optimization aim to mitigate the presence of outlier measurements, robust solutions capable of handling large numbers of outliers are yet to come. This paper presents IPC, acronym for Incremental Probabilistic Consensus, a method that approximates the solution to the combinatorial problem of finding the maximally consistent set of measurements in an incremental fashion. It evaluates the consistency of each loop closure measurement through a consensus-based procedure, possibly applied to a subset of the global problem, where all previously integrated inlier measurements have veto power. We evaluated IPC on standard benchmarks against several state-of-the-art methods. Although it is simple and relatively easy to implement, IPC competes with or outperforms the other tested methods in handling outliers while providing online performances. We release with this paper an open-source implementation of the proposed method.
- [181] arXiv:2405.08510 [pdf, ps, other]
-
Title: Growing Artificial Neural Networks for Control: the Role of Neuronal DiversitySubjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
In biological evolution complex neural structures grow from a handful of cellular ingredients. As genomes in nature are bounded in size, this complexity is achieved by a growth process where cells communicate locally to decide whether to differentiate, proliferate and connect with other cells. This self-organisation is hypothesized to play an important part in the generalisation, and robustness of biological neural networks. Artificial neural networks (ANNs), on the other hand, are traditionally optimized in the space of weights. Thus, the benefits and challenges of growing artificial neural networks remain understudied. Building on the previously introduced Neural Developmental Programs (NDP), in this work we present an algorithm for growing ANNs that solve reinforcement learning tasks. We identify a key challenge: ensuring phenotypic complexity requires maintaining neuronal diversity, but this diversity comes at the cost of optimization stability. To address this, we introduce two mechanisms: (a) equipping neurons with an intrinsic state inherited upon neurogenesis; (b) lateral inhibition, a mechanism inspired by biological growth, which controlls the pace of growth, helping diversity persist. We show that both mechanisms contribute to neuronal diversity and that, equipped with them, NDPs achieve comparable results to existing direct and developmental encodings in complex locomotion tasks
- [182] arXiv:2405.08513 [pdf, ps, other]
-
Title: Subspace method based on neural networks for solving the partial differential equation in weak formComments: arXiv admin note: substantial text overlap with arXiv:2404.08223Subjects: Numerical Analysis (math.NA)
We present a subspace method based on neural networks for solving the partial differential equation in weak form with high accuracy. The basic idea of our method is to use some functions based on neural networks as base functions to span a subspace, then find an approximate solution in this subspace. Training base functions and finding an approximate solution can be separated, that is different methods can be used to train these base functions, and different methods can also be used to find an approximate solution. In this paper, we find an approximate solution of the partial differential equation in the weak form. Our method can achieve high accuracy with low cost of training. Numerical examples show that the cost of training these base functions is low, and only one hundred to two thousand epochs are needed for most tests. The error of our method can fall below the level of $10^{-7}$ for some tests. The proposed method has the better performance in terms of the accuracy and computational cost.
- [183] arXiv:2405.08514 [pdf, ps, other]
-
Title: Falcon 7b for Software Mention Detection in Scholarly DocumentsComments: Accepted for publication by the first Workshop on Natural Scientific Language Processing and Research Knowledge Graphs - NSLP (@ ESCAI)Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Digital Libraries (cs.DL)
This paper aims to tackle the challenge posed by the increasing integration of software tools in research across various disciplines by investigating the application of Falcon-7b for the detection and classification of software mentions within scholarly texts. Specifically, the study focuses on solving Subtask I of the Software Mention Detection in Scholarly Publications (SOMD), which entails identifying and categorizing software mentions from academic literature. Through comprehensive experimentation, the paper explores different training strategies, including a dual-classifier approach, adaptive sampling, and weighted loss scaling, to enhance detection accuracy while overcoming the complexities of class imbalance and the nuanced syntax of scholarly writing. The findings highlight the benefits of selective labelling and adaptive sampling in improving the model's performance. However, they also indicate that integrating multiple strategies does not necessarily result in cumulative improvements. This research offers insights into the effective application of large language models for specific tasks such as SOMD, underlining the importance of tailored approaches to address the unique challenges presented by academic text analysis.
- [184] arXiv:2405.08515 [pdf, ps, other]
-
Title: Precarious Experiences: Citizens' Frustrations, Anxieties and Burdens of an Online Welfare Benefit SystemComments: 22 pagesSubjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
There is a significant overlap between people who are supported by income-related social welfare benefits, often in precarious situations, and those who experience greater digital exclusion. We report on a study of claimants using the UK's Universal Credit online welfare benefit system designed as, and still, "digital by default". Through data collection involving remote interviews (n=11) and online surveys (n=66), we expose claimants' own lived experiences interacting with this system. The claimants explain how digital channels can contribute to an imbalance of power and agency, at a time when their own circumstances mean they have reduced abilities, resources and capacities, and where design choices can adversely affect people's utility to leverage help from their own wider socio-technical ecosystems. We contribute eight recommendations from these accounts to inform the future design and development of digital welfare benefit systems for this population, to reduce digital barriers and harms.
- [185] arXiv:2405.08520 [pdf, ps, other]
-
Title: Empowering Programmable Wireless Environments with Optical Anchor-based PositioningDimitrios Tyrovolas, Dimitrios Bozanis, Sotiris A. Tegos, Vasilis K. Papanikolaou, Panagiotis D. Diamantoulakis, Christos K. Liaskos, Robert Schober, George K. KaragiannidisSubjects: Information Theory (cs.IT)
The evolution toward sixth-generation (6G) wireless networks has introduced programmable wireless environments (PWEs) and reconfigurable intelligent surfaces (RISs) as transformative elements for achieving near-deterministic wireless communications. However, the enhanced capabilities of RISs within PWEs, especially as we move toward more complex electromagnetic functions by increasing the number of reflecting elements, underscore the need for high-precision user localization, since inaccurate localization could lead to erroneous configuration of RISs, which would then compromise the effectiveness of PWEs. In this direction, this paper investigates the integration of RISs and optical anchors within PWEs, emphasizing the crucial role of ultra-precise localization in unlocking advanced electromagnetic functionalities. Specifically, we present an in-depth analysis of various localization techniques, both RISbased and RIS-independent, while introducing the concept of empowering PWEs with optical anchors for enhanced localization precision. Our findings highlight that accurate localization is essential to fully exploit the capabilities of RISs, paving the way for future applications. Through this exploration, we contribute to the advancement of PWEs in line with the ambitious goals of the 6G standards and improve the quality of service in next generation wireless networks.
- [186] arXiv:2405.08526 [pdf, ps, other]
-
Title: Why Larp?! A Synthesis Paper on Live Action Roleplay in Relation to HCI Research and PracticeKarin Johansson, Raquel Breejon Robinson, Jon Back, Sarah Lynne Bowman, James Fey, Elena Márquez Segura, Annika Waern, Katherine IsbisterSubjects: Human-Computer Interaction (cs.HC)
Live action roleplay (larp) has a wide range of applications, and can be relevant in relation to HCI. While there has been research about larp in relation to topics such as embodied interaction, playfulness and futuring published in HCI venues since the early 2000s, there is not yet a compilation of this knowledge. In this paper, we synthesise knowledge about larp and larp-adjacent work within the domain of HCI. We present a practitioner overview from an expert group of larp researchers, the results of a literature review, and highlight particular larp research exemplars which all work together to showcase the diverse set of ways that larp can be utilised in relation to HCI topics and research. This paper identifies the need for further discussions toward establishing best practices for utilising larp in relation to HCI research, as well as advocating for increased engagement with larps outside academia.
- [187] arXiv:2405.08527 [pdf, ps, other]
-
Title: EEG-Features for Generalized Deepfake DetectionArian Beckmann, Tilman Stephani, Felix Klotzsche, Yonghao Chen, Simon M. Hofmann, Arno Villringer, Michael Gaebler, Vadim Nikulin, Sebastian Bosse, Peter Eisert, Anna HilsmannSubjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Since the advent of Deepfakes in digital media, the development of robust and reliable detection mechanism is urgently called for. In this study, we explore a novel approach to Deepfake detection by utilizing electroencephalography (EEG) measured from the neural processing of a human participant who viewed and categorized Deepfake stimuli from the FaceForensics++ datset. These measurements serve as input features to a binary support vector classifier, trained to discriminate between real and manipulated facial images. We examine whether EEG data can inform Deepfake detection and also if it can provide a generalized representation capable of identifying Deepfakes beyond the training domain. Our preliminary results indicate that human neural processing signals can be successfully integrated into Deepfake detection frameworks and hint at the potential for a generalized neural representation of artifacts in computer generated faces. Moreover, our study provides next steps towards the understanding of how digital realism is embedded in the human cognitive system, possibly enabling the development of more realistic digital avatars in the future.
- [188] arXiv:2405.08528 [pdf, ps, other]
-
Title: From Internet of Things Data to Business Processes: Challenges and a FrameworkJuergen Mangler, Ronny Seiger, Janik-Vasily Benzin, Joscha Grüger, Yusuf Kirikkayis, Florian Gallik, Lukas Malburg, Matthias Ehrendorfer, Yannis Bertrand, Marco Franceschetti, Barbara Weber, Stefanie Rinderle-Ma, Ralph Bergmann, Estefanía Serral Asensio, Manfred ReichertSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
The IoT and Business Process Management (BPM) communities co-exist in many shared application domains, such as manufacturing and healthcare. The IoT community has a strong focus on hardware, connectivity and data; the BPM community focuses mainly on finding, controlling, and enhancing the structured interactions among the IoT devices in processes. While the field of Process Mining deals with the extraction of process models and process analytics from process event logs, the data produced by IoT sensors often is at a lower granularity than these process-level events. The fundamental questions about extracting and abstracting process-related data from streams of IoT sensor values are: (1) Which sensor values can be clustered together as part of process events?, (2) Which sensor values signify the start and end of such events?, (3) Which sensor values are related but not essential? This work proposes a framework to semi-automatically perform a set of structured steps to convert low-level IoT sensor data into higher-level process events that are suitable for process mining. The framework is meant to provide a generic sequence of abstract steps to guide the event extraction, abstraction, and correlation, with variation points for plugging in specific analysis techniques and algorithms for each step. To assess the completeness of the framework, we present a set of challenges, how they can be tackled through the framework, and an example on how to instantiate the framework in a real-world demonstration from the field of smart manufacturing. Based on this framework, future research can be conducted in a structured manner through refining and improving individual steps.
- [189] arXiv:2405.08533 [pdf, ps, html, other]
-
Title: Dynamic Feature Learning and Matching for Class-Incremental LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Class-incremental learning (CIL) has emerged as a means to learn new classes incrementally without catastrophic forgetting of previous classes. Recently, CIL has undergone a paradigm shift towards dynamic architectures due to their superior performance. However, these models are still limited by the following aspects: (i) Data augmentation (DA), which are tightly coupled with CIL, remains under-explored in dynamic architecture scenarios. (ii) Feature representation. The discriminativeness of dynamic feature are sub-optimal and possess potential for refinement. (iii) Classifier. The misalignment between dynamic feature and classifier constrains the capabilities of the model. To tackle the aforementioned drawbacks, we propose the Dynamic Feature Learning and Matching (DFLM) model in this paper from above three perspectives. Specifically, we firstly introduce class weight information and non-stationary functions to extend the mix DA method for dynamically adjusting the focus on memory during training. Then, von Mises-Fisher (vMF) classifier is employed to effectively model the dynamic feature distribution and implicitly learn their discriminative properties. Finally, the matching loss is proposed to facilitate the alignment between the learned dynamic features and the classifier by minimizing the distribution distance. Extensive experiments on CIL benchmarks validate that our proposed model achieves significant performance improvements over existing methods.
- [190] arXiv:2405.08538 [pdf, ps, other]
-
Title: Self-Distillation Improves DNA Sequence InferenceSubjects: Machine Learning (cs.LG)
Self-supervised pretraining (SSP) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSP approaches in genomics focus on masked language modeling of individual sequences, neglecting the crucial aspect of encoding statistics across multiple sequences. To overcome this challenge, we introduce an innovative deep neural network model, which incorporates collaborative learning between a `student' and a `teacher' subnetwork. In this model, the student subnetwork employs masked learning on nucleotides and progressively adapts its parameters to the teacher subnetwork through an exponential moving average approach. Concurrently, both subnetworks engage in contrastive learning, deriving insights from two augmented representations of the input sequences. This self-distillation process enables our model to effectively assimilate both contextual information from individual sequences and distributional data across the sequence population. We validated our approach with preliminary pretraining using the human reference genome, followed by applying it to 20 downstream inference tasks. The empirical results from these experiments demonstrate that our novel method significantly boosts inference performance across the majority of these tasks. Our code is available at this https URL.
- [191] arXiv:2405.08539 [pdf, ps, other]
-
Title: SecScore: Enhancing the CVSS Threat Metric Group with Empirical EvidencesSubjects: Cryptography and Security (cs.CR)
Background: Timely prioritising and remediating vulnerabilities are paramount in the dynamic cybersecurity field, and one of the most widely used vulnerability scoring systems (CVSS) does not address the increasing likelihood of emerging an exploit code. Aims: We present SecScore, an innovative vulnerability severity score that enhances CVSS Threat metric group with statistical models from empirical evidences of real-world exploit codes. Method: SecScore adjusts the traditional CVSS score using an explainable and empirical method that more accurately and promptly captures the dynamics of exploit code development. Results: Our approach can integrate seamlessly into the assessment/prioritisation stage of several vulnerability management processes, improving the effectiveness of prioritisation and ensuring timely remediation. We provide real-world statistical analysis and models for a wide range of vulnerability types and platforms, demonstrating that SecScore is flexible according to the vulnerability's profile. Comprehensive experiments validate the value and timeliness of SecScore in vulnerability prioritisation. Conclusions: SecScore advances the vulnerability metrics theory and enhances organisational cybersecurity with practical insights.
- [192] arXiv:2405.08540 [pdf, ps, other]
-
Title: Generalizing Knowledge Graph Embedding with Universal Orthogonal ParameterizationComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent advances in knowledge graph embedding (KGE) rely on Euclidean/hyperbolic orthogonal relation transformations to model intrinsic logical patterns and topological structures. However, existing approaches are confined to rigid relational orthogonalization with restricted dimension and homogeneous geometry, leading to deficient modeling capability. In this work, we move beyond these approaches in terms of both dimension and geometry by introducing a powerful framework named GoldE, which features a universal orthogonal parameterization based on a generalized form of Householder reflection. Such parameterization can naturally achieve dimensional extension and geometric unification with theoretical guarantees, enabling our framework to simultaneously capture crucial logical patterns and inherent topological heterogeneity of knowledge graphs. Empirically, GoldE achieves state-of-the-art performance on three standard benchmarks. Codes are available at this https URL.
- [193] arXiv:2405.08542 [pdf, ps, html, other]
-
Title: Industrial Metaverse: Enabling Technologies, Open Problems, and Future TrendsComments: 26 pages, 8 figuresSubjects: Computational Engineering, Finance, and Science (cs.CE)
As an emerging technology that enables seamless integration between the physical and virtual worlds, the Metaverse has great potential to be deployed in the industrial production field with the development of extended reality (XR) and next-generation communication networks. This deployment, called the Industrial Metaverse, is used for product design, production operations, industrial quality inspection, and product testing. However, there lacks of in-depth understanding of the enabling technologies associated with the Industrial Metaverse. This encompasses both the precise industrial scenarios targeted by each technology and the potential migration of technologies developed in other domains to the industrial sector. Driven by this issue, in this article, we conduct a comprehensive survey of the state-of-the-art literature on the Industrial Metaverse. Specifically, we first analyze the advantages of the Metaverse for industrial production. Then, we review a collection of key enabling technologies of the Industrial Metaverse, including blockchain (BC), digital twin (DT), 6G, XR, and artificial intelligence (AI), and analyze how these technologies can support different aspects of industrial production. Subsequently, we present numerous formidable challenges encountered within the Industrial Metaverse, including confidentiality and security concerns, resource limitations, and interoperability constraints. Furthermore, we investigate the extant solutions devised to address them. Finally, we briefly outline several open issues and future research directions of the Industrial Metaverse.
- [194] arXiv:2405.08546 [pdf, ps, html, other]
-
Title: Analysing Cross-Speaker Convergence in Face-to-Face Dialogue through the Lens of Automatically Detected Shared Linguistic ConstructionsComments: Accepted for publication at the 46th Proceedings of the Annual Meeting of the Cognitive Science SocietySubjects: Computation and Language (cs.CL)
Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions -- expressions with a common lexical core used by both speakers within a dialogue -- and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.
- [195] arXiv:2405.08547 [pdf, ps, html, other]
-
Title: Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational GraphSubjects: Computer Vision and Pattern Recognition (cs.CV)
In visual tasks, large teacher models capture essential features and deep information, enhancing performance. However, distilling this information into smaller student models often leads to performance loss due to structural differences and capacity limitations. To tackle this, we propose a distillation framework based on graph knowledge, including a multi-level feature alignment strategy and an attention-guided mechanism to provide a targeted learning trajectory for the student model. We emphasize spectral embedding (SE) as a key technique in our distillation process, which merges the student's feature space with the relational knowledge and structural complexities similar to the teacher network. This method captures the teacher's understanding in a graph-based representation, enabling the student model to more accurately mimic the complex structural dependencies present in the teacher model. Compared to methods that focus only on specific distillation areas, our strategy not only considers key features within the teacher model but also endeavors to capture the relationships and interactions among feature sets, encoding these complex pieces of information into a graph structure to understand and utilize the dynamic relationships among these pieces of information from a global perspective. Experiments show that our method outperforms previous feature distillation methods on the CIFAR-100, MS-COCO, and Pascal VOC datasets, proving its efficiency and applicability.
- [196] arXiv:2405.08548 [pdf, ps, other]
-
Title: Strict Self-Assembly of Discrete Self-Similar Fractal ShapesFlorent Becker (LIFO)Subjects: Discrete Mathematics (cs.DM)
This paper gives a (polynomial time) algorithm to decide whether a given Discrete Self-Similar Fractal Shape can be assembled in the aTAM this http URL the positive case, the construction relies on a Self-Assembling System in the aTAM which strictly assembles a particular self-similar fractal shape, namely a variant $K^\infty$ of the Sierpinski Carpet. We prove that the aTAM we propose is correct through a novel device, \emph{self-describing circuits} which are generally useful for rigorous yet readable proofs of the behaviour of aTAMs.We then discuss which self-similar fractals can or cannot be strictly self-assembled in the aTAM. It turns out that the ability of iterates of the generator to pass information is crucial: either this \emph{bandwidth} is eventually sufficient in both cardinal directions and $K^\infty$ appears within the fractal pattern after some finite number of iterations, or that bandwidth remains ever insufficient in one direction and any aTAM trying to self-assemble the shape will end up either bounded with an ultimately periodic pattern covering arbitrarily large squares. This is established thanks to a new characterization of the productions of systems whose productions have a uniformly bounded treewidth.
- [197] arXiv:2405.08549 [pdf, ps, html, other]
-
Title: A Well-Balanced Method for an Unstaggered Central Scheme, the one-space Dimensional CaseSubjects: Numerical Analysis (math.NA)
In this paper, we propose a new MUSCL scheme by combining the ideas of the Kurganov and Tadmor scheme and the so-called Deviation method which results in a well-balanced finite volume method for the hyperbolic balance laws, by evolving the difference between the exact solution and a given stationary solution. After that, we derive a semi-discrete scheme from this new scheme and it can be shown to be essentially TVD when applied to a scalar conservation law. In the end, we apply and validate the developed methods by numerical experiments and solve classical problems featuring Euler equations with gravitational source term.
- [198] arXiv:2405.08550 [pdf, ps, other]
-
Title: Learning Multi-Agent Communication from Graph Modeling PerspectiveComments: Published at ICLR 2024Subjects: Machine Learning (cs.LG)
In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, information sharing among all agents proves to be resource-intensive, while the adoption of a manually pre-defined communication architecture imposes limitations on inter-agent communication, thereby constraining the potential for collaborative efforts. In this study, we introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph. We formulate this problem as the task of determining the communication graph while enabling the architecture parameters to update normally, thus necessitating a bi-level optimization process. Utilizing continuous relaxation of the graph representation and incorporating attention units, our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner. Extensive experiments on a variety of cooperative tasks substantiate the robustness of our model across diverse cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies regardless of changes in the number of agents.
- [199] arXiv:2405.08553 [pdf, ps, other]
-
Title: Improving Transformers with Dynamically Composable Multi-Head AttentionComments: Accepted to the 41th International Conference on Machine Learning (ICML'24)Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Multi-Head Attention (MHA) is a key component of Transformer. In MHA, attention heads work independently, causing problems such as low-rank bottleneck of attention score matrices and head redundancy. We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter and computation efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. At the core of DCMHA is a $\it{Compose}$ function that transforms the attention score and weight matrices in an input-dependent way. DCMHA can be used as a drop-in replacement of MHA in any transformer architecture to obtain the corresponding DCFormer. DCFormer significantly outperforms Transformer on different architectures and model scales in language modeling, matching the performance of models with ~1.7x-2.0x compute. For example, DCPythia-6.9B outperforms open source Pythia-12B on both pretraining perplexity and downstream task evaluation. The code and models are available at this https URL.
- [200] arXiv:2405.08555 [pdf, ps, other]
-
Title: Dual-Branch Network for Portrait Image Quality AssessmentWei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao ZhaiSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from the degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capturing devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA), which can effectively address how the salient person and the background of a portrait image influence its visual quality. Specifically, we utilize two backbone networks (\textit{i.e.,} Swin Transformer-B) to extract the quality-aware features from the entire portrait image and the facial image cropped from it. To enhance the quality-aware feature representation of the backbones, we pre-train them on the large-scale video quality assessment dataset LSVQ and the large-scale facial image quality assessment dataset GFIQA. Additionally, we leverage LIQE, an image scene classification and quality assessment model, to capture the quality-aware and scene-specific features as the auxiliary features. Finally, we concatenate these features and regress them into quality scores via a multi-perception layer (MLP). We employ the fidelity loss to train the model via a learning-to-rank manner to mitigate inconsistencies in quality scores in the portrait image quality assessment dataset PIQ. Experimental results demonstrate that the proposed model achieves superior performance in the PIQ dataset, validating its effectiveness. The code is available at \url{this https URL}.
- [201] arXiv:2405.08558 [pdf, ps, other]
-
Title: PTPI-DL-ROMs: pre-trained physics-informed deep learning-based reduced order models for nonlinear parametrized PDEsComments: 38 pagesSubjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
The coupling of Proper Orthogonal Decomposition (POD) and deep learning-based ROMs (DL-ROMs) has proved to be a successful strategy to construct non-intrusive, highly accurate, surrogates for the real time solution of parametric nonlinear time-dependent PDEs. Inexpensive to evaluate, POD-DL-ROMs are also relatively fast to train, thanks to their limited complexity. However, POD-DL-ROMs account for the physical laws governing the problem at hand only through the training data, that are usually obtained through a full order model (FOM) relying on a high-fidelity discretization of the underlying equations. Moreover, the accuracy of POD-DL-ROMs strongly depends on the amount of available data. In this paper, we consider a major extension of POD-DL-ROMs by enforcing the fulfillment of the governing physical laws in the training process -- that is, by making them physics-informed -- to compensate for possible scarce and/or unavailable data and improve the overall reliability. To do that, we first complement POD-DL-ROMs with a trunk net architecture, endowing them with the ability to compute the problem's solution at every point in the spatial domain, and ultimately enabling a seamless computation of the physics-based loss by means of the strong continuous formulation. Then, we introduce an efficient training strategy that limits the notorious computational burden entailed by a physics-informed training phase. In particular, we take advantage of the few available data to develop a low-cost pre-training procedure; then, we fine-tune the architecture in order to further improve the prediction reliability. Accuracy and efficiency of the resulting pre-trained physics-informed DL-ROMs (PTPI-DL-ROMs) are then assessed on a set of test cases ranging from non-affinely parametrized advection-diffusion-reaction equations, to nonlinear problems like the Navier-Stokes equations for fluid flows.
- [202] arXiv:2405.08562 [pdf, ps, other]
-
Title: The Unseen Targets of Hate -- A Systematic Review of Hateful Communication DatasetsZehui Yu, Indira Sen, Dennis Assenmacher, Mattia Samory, Leon Fröhling, Christina Dahn, Debora Nozza, Claudia WagnerComments: 20 pages, 14 figuresSubjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet, ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.
- [203] arXiv:2405.08564 [pdf, ps, other]
-
Title: Anytime Sorting Algorithms (Extended Version)Comments: 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Aug 2024, Jeju City, Jeju Island, South KoreaSubjects: Data Structures and Algorithms (cs.DS)
This paper addresses the anytime sorting problem, aiming to develop algorithms providing tentative estimates of the sorted list at each execution step. Comparisons are treated as steps, and the Spearman's footrule metric evaluates estimation accuracy. We propose a general approach for making any sorting algorithm anytime and introduce two new algorithms: multizip sort and Corsort. Simulations showcase the superior performance of both algorithms compared to existing methods. Multizip sort keeps a low global complexity, while Corsort produces intermediate estimates surpassing previous algorithms.
- [204] arXiv:2405.08565 [pdf, ps, other]
-
Title: Space-time stochastic Galerkin boundary elements for acoustic scattering problemsComments: 30 pages, 14 figures, to appear in International Journal for Numerical Methods in EngineeringSubjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
Acoustic emission or scattering problems naturally involve uncertainties about the sound sources or boundary conditions. This article initiates the study of time domain boundary elements for such stochastic boundary problems for the acoustic wave equation. We present a space-time stochastic Galerkin boundary element method which is applied to sound-hard, sound-soft and absorbing scatterers. Uncertainties in both the sources and the boundary conditions are considered using a polynomial chaos expansion. The numerical experiments illustrate the performance and convergence of the proposed method in model problems and present an application to a problem from traffic noise.
- [205] arXiv:2405.08566 [pdf, ps, other]
-
Title: Space-time boundary elements for frictional contact in elastodynamicsComments: 34 pages, 14 figures, to appear in Computer Methods in Applied Mechanics and EngineeringSubjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)
This article studies a boundary element method for dynamic frictional contact between linearly elastic bodies. We formulate these problems as a variational inequality on the boundary, involving the elastodynamic Poincaré-Steklov operator. The variational inequality is solved in a mixed formulation using boundary elements in space and time. In the model problem of unilateral Tresca friction contact with a rigid obstacle we obtain an a priori estimate for the resulting Galerkin approximations. Numerical experiments in two space dimensions demonstrate the stability, energy conservation and convergence of the proposed method for contact problems involving concrete and steel in the linearly elastic regime. They address both unilateral and two-sided dynamic contact with Tresca or Coulomb friction.
- [206] arXiv:2405.08567 [pdf, ps, other]
-
Title: Python-Based Reinforcement Learning on Simulink ModelsComments: Accepted at SMPS2024Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
This paper proposes a framework for training Reinforcement Learning agents using Python in conjunction with Simulink models. Leveraging Python's superior customization options and popular libraries like Stable Baselines3, we aim to bridge the gap between the established Simulink environment and the flexibility of Python for training bleeding edge agents. Our approach is demonstrated on the Quanser Aero 2, a versatile dual-rotor helicopter. We show that policies trained on Simulink models can be seamlessly transferred to the real system, enabling efficient development and deployment of Reinforcement Learning agents for control tasks. Through systematic integration steps, including C-code generation from Simulink, DLL compilation, and Python interface development, we establish a robust framework for training agents on Simulink models. Experimental results demonstrate the effectiveness of our approach, surpassing previous efforts and highlighting the potential of combining Simulink with Python for Reinforcement Learning research and applications.
- [207] arXiv:2405.08569 [pdf, ps, other]
-
Title: Can 3GPP New Radio Non-Terrestrial Networks Meet the IMT-2020 Requirements for Satellite Radio Interface Technology?Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
The International Telecommunication Union defined the requirements for 5G in the International Mobile Telecommunications 2020 (IMT-2020) standard in 2017. Since then, advances in technology and standardization have made the ubiquitous deployment of 5G via satellite a practical possibility, for example, in locations where terrestrial networks (TNs) are not available. However, it may be difficult for satellite networks to achieve the same performance as TNs. To address this, the IMT-2020 requirements for satellite radio interface technology have recently been established. In this paper, these requirements are evaluated through system simulations for the 3rd Generation Partnership Project New Radio non-terrestrial networks with a low Earth orbit satellite. The focus is on the throughput, area traffic capacity, and spectral efficiency requirements. It is observed that the downlink (DL) requirements can be met for user equipment with 2 receive antenna elements. The results also reveal that frequency reuse factor 1 (FRF1) may outperform FRF3 in DL with a dual-antenna setup, which is a surprising finding since FRF3 is typically considered to outperform FRF1 due to better interference reduction. For uplink (UL), 1 transmit antenna is sufficient to meet the requirements by a relatively large margin - a promising result given that UL is generally more demanding.
- [208] arXiv:2405.08570 [pdf, ps, other]
-
Title: Rethinking the adaptive relationship between Encoder Layers and Decoder LayersSubjects: Computation and Language (cs.CL)
This article explores the adaptive relationship between Encoder Layers and Decoder Layers using the SOTA model Helsinki-NLP/opus-mt-de-en, which translates German to English. The specific method involves introducing a bias-free fully connected layer between the Encoder and Decoder, with different initializations of the layer's weights, and observing the outcomes of fine-tuning versus retraining. Four experiments were conducted in total. The results suggest that directly modifying the pre-trained model structure for fine-tuning yields suboptimal performance. However, upon observing the outcomes of the experiments with retraining, this structural adjustment shows significant potential.
- [209] arXiv:2405.08572 [pdf, ps, other]
-
Title: COAST: Constraints and Streams for Task and Motion PlanningSubjects: Robotics (cs.RO)
Task and Motion Planning (TAMP) algorithms solve long-horizon robotics tasks by integrating task planning with motion planning; the task planner proposes a sequence of actions towards a goal state and the motion planner verifies whether this action sequence is geometrically feasible for the robot. However, state-of-the-art TAMP algorithms do not scale well with the difficulty of the task and require an impractical amount of time to solve relatively small problems. We propose Constraints and Streams for Task and Motion Planning (COAST), a probabilistically-complete, sampling-based TAMP algorithm that combines stream-based motion planning with an efficient, constrained task planning strategy. We validate COAST on three challenging TAMP domains and demonstrate that our method outperforms baselines in terms of cumulative task planning time by an order of magnitude. You can find more supplementary materials on our project \href{this https URL}{website}.
- [210] arXiv:2405.08573 [pdf, ps, html, other]
-
Title: ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic RadiographSubjects: Human-Computer Interaction (cs.HC)
Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a visualization framework for tooth segmentation on dental panoramic radiograph. First, we employ Mask R-CNN to conduct preliminary tooth segmentation, and a set of domain metrics are proposed to estimate the accuracy of the segmented teeth, including tooth shape, tooth position and tooth angle. Then, we represent the teeth with high-dimensional vectors and visualize their distribution in a low-dimensional space, in which experts can easily observe those teeth with specific metrics. Further, we expand the sample set with the expert-specified teeth and train the tooth segmentation model iteratively. Finally, we conduct case study and expert study to demonstrate the effectiveness and usability of our ViSTooth, in aiding experts to implement accurate tooth segmentation guided by expert knowledge.
- [211] arXiv:2405.08576 [pdf, ps, other]
-
Title: Hearing Touch: Audio-Visual Pretraining for Contact-Rich ManipulationComments: Accepted to ICRA 2024Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. Such pretraining becomes increasingly crucial in the low-data regimes common in robotics applications. In this paper, we address this gap by using contact microphones as an alternative tactile sensor. Our key insight is that contact microphones capture inherently audio-based information, allowing us to leverage large-scale audio-visual pretraining to obtain representations that boost the performance of robotic manipulation. To the best of our knowledge, our method is the first approach leveraging large-scale multisensory pre-training for robotic manipulation. For supplementary information including videos of real robot experiments, please see this https URL.
- [212] arXiv:2405.08577 [pdf, ps, other]
-
Title: Intelligent Control in 6G Open RAN: Security Risk or Opportunity?Comments: 36 pages, 14 figures, IEEE COMST (in review)Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Performance (cs.PF); Systems and Control (eess.SY)
The Open Radio Access Network (Open RAN) framework, emerging as the cornerstone for Artificial Intelligence (AI)-enabled Sixth-Generation (6G) mobile networks, heralds a transformative shift in radio access network architecture. As the adoption of Open RAN accelerates, ensuring its security becomes critical. The RAN Intelligent Controller (RIC) plays a central role in Open RAN by improving network efficiency and flexibility. Nevertheless, it also brings about potential security risks that need careful scrutiny. Therefore, it is imperative to evaluate the current state of RIC security comprehensively. This assessment is essential to gain a profound understanding of the security considerations associated with RIC. This survey combines a comprehensive analysis of RAN security, tracing its evolution from 2G to 5G, with an in-depth exploration of RIC security, marking the first comprehensive examination of its kind in the literature. Real-world security incidents involving RIC are vividly illustrated, providing practical insights. The study evaluates the security implications of the RIC within the 6G Open RAN context, addressing security vulnerabilities, mitigation strategies, and potential enhancements. It aims to guide stakeholders in the telecom industry toward a secure and dependable telecommunications infrastructure. The article serves as a valuable reference, shedding light on the RIC's crucial role within the broader network infrastructure and emphasizing security's paramount importance. This survey also explores the promising security opportunities that the RIC presents for enhancing network security and resilience in the context of 6G mobile networks. It outlines open issues, lessons learned, and future research directions in the domain of intelligent control in 6G open RAN, facilitating a comprehensive understanding of this dynamic landscape.
- [213] arXiv:2405.08578 [pdf, ps, other]
-
Title: Local-peak scale-invariant feature transform for fast and random image stitchingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Image stitching aims to construct a wide field of view with high spatial resolution, which cannot be achieved in a single exposure. Typically, conventional image stitching techniques, other than deep learning, require complex computation and thus computational pricy, especially for stitching large raw images. In this study, inspired by the multiscale feature of fluid turbulence, we developed a fast feature point detection algorithm named local-peak scale-invariant feature transform (LP-SIFT), based on the multiscale local peaks and scale-invariant feature transform method. By combining LP-SIFT and RANSAC in image stitching, the stitching speed can be improved by orders, compared with the original SIFT method. Nine large images (over 2600*1600 pixels), arranged randomly without prior knowledge, can be stitched within 158.94 s. The algorithm is highly practical for applications requiring a wide field of view in diverse application scenes, e.g., terrain mapping, biological analysis, and even criminal investigation.
- [214] arXiv:2405.08582 [pdf, ps, other]
-
Title: Treatment Effect Estimation for User Interest Exploration on Recommender SystemsComments: Accepted to SIGIR 2024Subjects: Information Retrieval (cs.IR)
Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to consider the potential rewards of recommending different categories of items and lack the global scheduling of allocating top-N recommendations to categories, leading to suboptimal exploration. In this work, we propose an Uplift model-based Recommender (UpliftRec) framework, which regards top-N recommendation as a treatment optimization problem. UpliftRec estimates the treatment effects, i.e., the click-through rate (CTR) under different category exposure ratios, by using observational user feedback. UpliftRec calculates group-level treatment effects to discover users' hidden interests with high CTR rewards and leverages inverse propensity weighting to alleviate confounder bias. Thereafter, UpliftRec adopts a dynamic programming method to calculate the optimal treatment for overall CTR maximization. We implement UpliftRec on different backend models and conduct extensive experiments on three datasets. The empirical results validate the effectiveness of UpliftRec in discovering users' hidden interests while achieving superior recommendation accuracy.
- [215] arXiv:2405.08584 [pdf, ps, other]
-
Title: When Do Low-Rate Concatenated Codes Approach The Gilbert-Varshamov Bound?Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC)
The Gilbert--Varshamov (GV) bound is a classical existential result in coding theory. It implies that a random linear binary code of rate $\epsilon^2$ has relative distance at least $\frac{1}{2} - O(\epsilon)$ with high probability. However, it is a major challenge to construct explicit codes with similar parameters.
One hope to derandomize the Gilbert--Varshamov construction is with code concatenation: We begin with a (hopefully explicit) outer code ${C}_\mathrm{out}$ over a large alphabet, and concatenate that with a small binary random linear code ${C}_\mathrm{in}$. It is known that when we use \emph{independent} small codes for each coordinate, then the result lies on the GV bound with high probability, but this still uses a lot of randomness. In this paper, we consider the question of whether code concatenation with a single random linear inner code ${C}_\mathrm{in}$ can lie on the GV bound; and if so what conditions on ${C}_\mathrm{out}$ are sufficient for this.
We show that first, there do exist linear outer codes ${C}_\mathrm{out}$ that are "good" for concatenation in this sense (in fact, most linear codes codes are good). We also provide two sufficient conditions for ${C}_\mathrm{out}$, so that if ${C}_\mathrm{out}$ satisfies these, ${C}_\mathrm{out}\circ {C}_\mathrm{in}$ will likely lie on the GV bound. We hope that these conditions may inspire future work towards constructing explicit codes ${C}_\mathrm{out}$. - [216] arXiv:2405.08586 [pdf, ps, other]
-
Title: Cross-Domain Feature Augmentation for Domain GeneralizationComments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024); Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity whereas in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.
- [217] arXiv:2405.08587 [pdf, ps, other]
-
Title: EchoTracker: Advancing Myocardial Point Tracking in EchocardiographyMd Abulkalam Azad, Artem Chernyshov, John Nyberg, Ingrid Tveten, Lasse Lovstakken, Håvard Dalen, Bjørnar Grenne, Andreas ØstvikComments: Submitted version that got provisionally (early) accepted (top 11%) to MICCAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Tissue tracking in echocardiography is challenging due to the complex cardiac motion and the inherent nature of ultrasound acquisitions. Although optical flow methods are considered state-of-the-art (SOTA), they struggle with long-range tracking, noise occlusions, and drift throughout the cardiac cycle. Recently, novel learning-based point tracking techniques have been introduced to tackle some of these issues. In this paper, we build upon these techniques and introduce EchoTracker, a two-fold coarse-to-fine model that facilitates the tracking of queried points on a tissue surface across ultrasound image sequences. The architecture contains a preliminary coarse initialization of the trajectories, followed by reinforcement iterations based on fine-grained appearance changes. It is efficient, light, and can run on mid-range GPUs. Experiments demonstrate that the model outperforms SOTA methods, with an average position accuracy of 67% and a median trajectory error of 2.86 pixels. Furthermore, we show a relative improvement of 25% when using our model to calculate the global longitudinal strain (GLS) in a clinical test-retest dataset compared to other methods. This implies that learning-based point tracking can potentially improve performance and yield a higher diagnostic and prognostic value for clinical measurements than current techniques. Our source code is available at: this https URL.
- [218] arXiv:2405.08589 [pdf, ps, other]
-
Title: Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point SetsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations. This research presents a method designed to meet such requirements through minimization of the objective function of the robust point matching (RPM) algorithm. First, we show that the RPM objective is a cubic polynomial. Then, through variable substitution, we transform the RPM objective to a quadratic function. Leveraging the convex envelope of bilinear monomials, we proceed to relax the resulting objective function, thus obtaining a lower bound problem that can be conveniently decomposed into distinct linear assignment and low-dimensional convex quadratic program components, both amenable to efficient optimization. Furthermore, a branch-and-bound (BnB) algorithm is devised, which solely branches over the transformation parameters, thereby boosting convergence rate. Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, particularly in scenarios where outliers remain distinct from inliers, when compared with prevailing state-of-the-art approaches.
- [219] arXiv:2405.08593 [pdf, ps, other]
-
Title: Open-Vocabulary Object Detection via Neighboring Region Attention AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV)
The nature of diversity in real-world environments necessitates neural network models to expand from closed category settings to accommodate novel emerging categories. In this paper, we study the open-vocabulary object detection (OVD), which facilitates the detection of novel object classes under the supervision of only base annotations and open-vocabulary knowledge. However, we find that the inadequacy of neighboring relationships between regions during the alignment process inevitably constrains the performance on recent distillation-based OVD strategies. To this end, we propose Neighboring Region Attention Alignment (NRAA), which performs alignment within the attention mechanism of a set of neighboring regions to boost the open-vocabulary inference. Specifically, for a given proposal region, we randomly explore the neighboring boxes and conduct our proposed neighboring region attention (NRA) mechanism to extract relationship information. Then, this interaction information is seamlessly provided into the distillation procedure to assist the alignment between the detector and the pre-trained vision-language models (VLMs). Extensive experiments validate that our proposed model exhibits superior performance on open-vocabulary benchmarks.
- [220] arXiv:2405.08595 [pdf, ps, other]
-
Title: Online busy time scheduling with flexible jobsSubjects: Data Structures and Algorithms (cs.DS)
We present several competitive ratios for the online busy time scheduling problem with flexible jobs. The busy time scheduling problem is a fundamental scheduling problem motivated by energy efficiency with the goal of minimizing the total time that machines with multiple processors are enabled. In the busy time scheduling problem, an unbounded number of machines is given, where each machine has $g$ processors. No more than $g$ jobs can be scheduled simultaneously on each machine. A machine consumes energy whenever at least one job is scheduled at any time on the machine. Scheduling a single job at some time $t$ consumes the same amount of energy as scheduling $g$ jobs at time $t$. In the online setting, jobs are revealed when they are released.
We consider the cases where $g$ is unbounded and bounded. In this paper, we revisit the bounds of the unbounded general setting from the literature and tighten it significantly. We also consider agreeable jobs. For the bounded setting, we show a tightened upper bound. Furthermore, we show the first constant competitive ratio in the bounded setting that does not require lookahead. - [221] arXiv:2405.08596 [pdf, ps, other]
-
Title: EVDA: Evolving Deepfake Audio Detection Continual Learning BenchmarkSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts as an effective tool for detecting newly emerged deepfake audio while maintaining performance on older types, lacks a well-constructed and user-friendly evaluation framework. To address this gap, we introduce EVDA, a benchmark for evaluating continual learning methods in deepfake audio detection. EVDA includes classic datasets from the Anti-Spoofing Voice series, Chinese fake audio detection series, and newly generated deepfake audio from models like GPT-4 and GPT-4o. It supports various continual learning techniques, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and recent methods like Regularized Adaptive Weight Modification (RAWM) and Radian Weight Modification (RWM). Additionally, EVDA facilitates the development of robust algorithms by providing an open interface for integrating new continual learning methods
- [222] arXiv:2405.08597 [pdf, ps, other]
-
Title: Risks and Opportunities of Open-Source Generative AIFrancisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob FoersterComments: Extension of arXiv:2404.17047Subjects: Machine Learning (cs.LG)
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.
- [223] arXiv:2405.08598 [pdf, ps, other]
-
Title: Spectral approximation of convolution operators of Fredholm typeSubjects: Numerical Analysis (math.NA)
We have developed a method for constructing spectral approximations for convolution operators of Fredholm type. The algorithm we propose is numerically stable and takes advantage of the recurrence relations satisfied by the entries of such a matrix approximation. When used for computing the Fredholm convolution of two given functions, such approximations produce the convolution more rapidly than the state-of-the-art methods. The proposed approximation also leads to a spectral method for solving the Fredholm convolution integral equations and enables the computation of eigenvalues and pseudospectra of Fredholm convolution operators, which is otherwise intractable with existing techniques.
- [224] arXiv:2405.08599 [pdf, ps, other]
-
Title: The distributed biased min-consensus protocol revisited: pre-specified finite time control strategies and small-gain based analysisSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Unlike the classical distributed consensus protocols enabling the group of agents as a whole to reach an agreement regarding a certain quantity of interest in a distributed fashion, the distributed biased min-consensus protocol (DBMC) has been proven to generate advanced complexity pertaining to solving the shortest path problem. As such a protocol is commonly incorporated as the first step of a hierarchical architecture in real applications, e.g., robots path planning, management of dispersed computing services, an impedance limiting the application potential of DBMC lies in, the lack of results regarding to its convergence within a user-assigned time. In this paper, we first propose two control strategies ensuring the state error of DBMC decrease exactly to zero or a desired level manipulated by the user, respectively. To compensate the high feedback gains incurred by these two control strategies, this paper further investigates the nominal DBMC itself. By leveraging small gain based stability tools, this paper also proves the global exponential input-to-state stability of DBMC, outperforming its current stability results. Simulations have been provided to validate the efficacy of our theoretical result.
- [225] arXiv:2405.08603 [pdf, ps, other]
-
Title: A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in MedicineSubjects: Computation and Language (cs.CL)
Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal large language models (MLLMs) have garnered significant attention due to their powerful and general capabilities in understanding, reasoning, and generation, thereby offering new paradigms for the integration of artificial intelligence with medicine. This survey comprehensively overviews the development background and principles of LLMs and MLLMs, as well as explores their application scenarios, challenges, and future directions in medicine. Specifically, this survey begins by focusing on the paradigm shift, tracing the evolution from traditional models to LLMs and MLLMs, summarizing the model structures to provide detailed foundational knowledge. Subsequently, the survey details the entire process from constructing and evaluating to using LLMs and MLLMs with a clear logic. Following this, to emphasize the significant value of LLMs and MLLMs in healthcare, we survey and summarize 6 promising applications in healthcare. Finally, the survey discusses the challenges faced by medical LLMs and MLLMs and proposes a feasible approach and direction for the subsequent integration of artificial intelligence with medicine. Thus, this survey aims to provide researchers with a valuable and comprehensive reference guide from the perspectives of the background, principles, and clinical applications of LLMs and MLLMs.
- [226] arXiv:2405.08604 [pdf, ps, other]
-
Title: Towards Geometry-Aware Pareto Set Learning for Neural Multi-Objective Combinatorial OptimizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Multi-objective combinatorial optimization (MOCO) problems are prevalent in various real-world applications. Most existing neural methods for MOCO problems rely solely on decomposition and utilize precise hypervolume to enhance diversity. However, these methods often approximate only limited regions of the Pareto front and spend excessive time on diversity enhancement because of ambiguous decomposition and time-consuming hypervolume calculation. To address these limitations, we design a Geometry-Aware Pareto set Learning algorithm named GAPL, which provides a novel geometric perspective for neural MOCO via a Pareto attention model based on hypervolume expectation maximization. In addition, we propose a hypervolume residual update strategy to enable the Pareto attention model to capture both local and non-local information of the Pareto set/front. We also design a novel inference approach to further improve quality of the solution set and speed up hypervolume calculation and local subset selection. Experimental results on three classic MOCO problems demonstrate that our GAPL outperforms state-of-the-art neural baselines via superior decomposition and efficient diversity enhancement.
- [227] arXiv:2405.08609 [pdf, ps, other]
-
Title: Dynamic NeRF: A ReviewComments: 25 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Neural Radiance Field(NeRF) is an novel implicit method to achieve the 3D reconstruction and representation with a high resolution. After the first research of NeRF is proposed, NeRF has gained a robust developing power and is booming in the 3D modeling, representation and reconstruction areas. However the first and most of the followed research projects based on NeRF is static, which are weak in the practical applications. Therefore, more researcher are interested and focused on the study of dynamic NeRF that is more feasible and useful in practical applications or situations. Compared with the static NeRF, implementing the Dynamic NeRF is more difficult and complex. But Dynamic is more potential in the future even is the basic of Editable NeRF. In this review, we made a detailed and abundant statement for the development and important implementation principles of Dynamci NeRF. The analysis of main principle and development of Dynamic NeRF is from 2021 to 2023, including the most of the Dynamic NeRF projects. What is more, with colorful and novel special designed figures and table, We also made a detailed comparison and analysis of different features of various of Dynamic. Besides, we analyzed and discussed the key methods to implement a Dynamic NeRF. The volume of the reference papers is large. The statements and comparisons are multidimensional. With a reading of this review, the whole development history and most of the main design method or principles of Dynamic NeRF can be easy understood and gained.
- [228] arXiv:2405.08619 [pdf, ps, other]
-
Title: ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive OptimisationSubjects: Computation and Language (cs.CL); Multimedia (cs.MM)
The field of chemistry and Artificial Intelligence (AI) intersection is an area of active research that aims to accelerate scientific discovery. The integration of large language models (LLMs) with scientific modalities has shown significant promise in this endeavour. However, challenges persist in effectively addressing training efficacy and the out-of-distribution problem, particularly as existing approaches rely on larger models and datasets. In this context, we focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation, which avoids generating translations that are merely adequate but not perfect. To ensure generalisability and mitigate memorisation effects, we conduct experiments using only 10\% of the data. Our results demonstrate that our models achieve up to a 32\% improvement compared to counterpart models. We also introduce a scalable fine-grained evaluation methodology that accommodates responsibility.
- [229] arXiv:2405.08625 [pdf, ps, html, other]
-
Title: Optimal Almost-Balanced SequencesComments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024Subjects: Information Theory (cs.IT)
This paper presents a novel approach to address the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied in the past, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost balanced sequences has remained unsolved. A sequence is $\varepsilon(n)$-almost balanced if its Hamming weight is between $0.5n\pm \varepsilon(n)$. It is known that for any algorithm with a constant number of bits, $\varepsilon(n)$ has to be in the order of $\Theta(\sqrt{n})$, with $O(n)$ average time complexity. However, prior solutions with a single redundancy bit required $\varepsilon(n)$ to be a linear shift from $n/2$. Employing an iterative method and arithmetic coding, our emphasis lies in constructing almost balanced codes with a single redundancy bit. Notably, our method surpasses previous approaches by achieving the optimal balanced order of $\Theta(\sqrt{n})$. Additionally, we extend our method to the non-binary case considering $q$-ary almost polarity-balanced sequences for even $q$, and almost symbol-balanced for $q=4$. Our work marks the first asymptotically optimal solutions for almost-balanced sequences, for both, binary and non-binary alphabet.
- [230] arXiv:2405.08626 [pdf, ps, html, other]
-
Title: Literature Review on Maneuver-Based Scenario Description for Automated Driving SimulationsComments: 8 pages, 1 figure, 1 tableJournal-ref: 2023 IEEE Intelligent Vehicles SymposiumSubjects: Robotics (cs.RO)
The increasing complexity of automated driving functions and their growing operational design domains imply more demanding requirements on their validation. Classical methods such as field tests or formal analyses are not sufficient anymore and need to be complemented by simulations. For simulations, the standard approach is scenario-based testing, as opposed to distance-based testing primarily performed in field tests. Currently, the time evolution of specific scenarios is mainly described using trajectories, which limit or at least hamper generalizations towards variations. As an alternative, maneuver-based approaches have been proposed. We shed light on the state of the art and available foundations for this new method through a literature review of early and recent works related to maneuver-based scenario description. It includes related modeling approaches originally developed for other applications. Current limitations and research gaps are identified.
- [231] arXiv:2405.08637 [pdf, ps, other]
-
Title: Drift Detection: Introducing Gaussian Split DetectorSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Recent research yielded a wide array of drift detectors. However, in order to achieve remarkable performance, the true class labels must be available during the drift detection phase. This paper targets at detecting drift when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD) a novel drift detector that works in batch mode. GSD is designed to work when the data follow a normal distribution and makes use of Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimension data streams and to work without the ground truth labels during the inference phase making it pertinent for real world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift which is key to avoid false alarms.
- [232] arXiv:2405.08638 [pdf, ps, html, other]
-
Title: vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy ImprovementYiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie FanComments: Accepted by IJCAI 2024, with appendixSubjects: Machine Learning (cs.LG)
Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.
- [233] arXiv:2405.08644 [pdf, ps, other]
-
Title: Thinking Tokens for Language ModelingComments: AITP 2023 (May 10, 2023)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to perform complex reasoning. Since language models rely on large training sets and great memorization capability, naturally they are not equipped to run complex calculations. However, one can argue that humans also cannot perform this calculation immediately and require a considerable amount of time to construct the solution. In order to enhance the generalization capability of language models, and as a parallel to human behavior, we propose to use special 'thinking tokens' which allow the model to perform much more calculations whenever a complex problem is encountered.
- [234] arXiv:2405.08645 [pdf, ps, other]
-
Title: Certifying Robustness of Graph Convolutional Networks for Node Perturbation with Polyhedra Abstract InterpretationSubjects: Machine Learning (cs.LG); Formal Languages and Automata Theory (cs.FL)
Graph convolutional neural networks (GCNs) are powerful tools for learning graph-based knowledge representations from training data. However, they are vulnerable to small perturbations in the input graph, which makes them susceptible to input faults or adversarial attacks. This poses a significant problem for GCNs intended to be used in critical applications, which need to provide certifiably robust services even in the presence of adversarial perturbations. We propose an improved GCN robustness certification technique for node classification in the presence of node feature perturbations. We introduce a novel polyhedra-based abstract interpretation approach to tackle specific challenges of graph data and provide tight upper and lower bounds for the robustness of the GCN. Experiments show that our approach simultaneously improves the tightness of robustness bounds as well as the runtime performance of certification. Moreover, our method can be used during training to further improve the robustness of GCNs.
- [235] arXiv:2405.08647 [pdf, ps, other]
-
Title: Output-decomposed Learning of Mealy MachinesComments: LearnAut 2024Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)
We present an active automata learning algorithm which learns a decomposition of a finite state machine, based on projecting onto individual outputs. This is dual to a recent compositional learning algorithm by Labbaf et al. (2023). When projecting the outputs to a smaller set, the model itself is reduced in size. By having several such projections, we do not lose any information and the full system can be reconstructed. Depending on the structure of the system this reduces the number of queries drastically, as shown by a preliminary evaluation of the algorithm.
- [236] arXiv:2405.08649 [pdf, ps, other]
-
Title: The computational power of discrete chemical reaction networks with bounded executionsSubjects: Computational Complexity (cs.CC)
Chemical reaction networks (CRNs) model systems where molecules interact according to a finite set of reactions such as (A + B \to C), representing that if a molecule of (A) and (B) collide, they disappear and a molecule of (C) is produced. CRNs can compute Boolean-valued predicates (\phi:\mathbb{N}^d \to \{0,1\}) and integer-valued functions (f:\mathbb{N}^d \to \mathbb{N}); for instance (X_1 + X_2 \to Y) computes the function (\min(x_1,x_2)).
We study the computational power of execution bounded CRNs, in which only a finite number of reactions can occur from the initial configuration (e.g., ruling out reversible reactions such as (A \rightleftharpoons B)). The power and composability of such CRNs depend crucially on some other modeling choices that do not affect the computational power of CRNs with unbounded executions, namely whether an initial leader is present, and whether (for predicates) all species are required to "vote" for the Boolean output. If the CRN starts with an initial leader, and can allow only the leader to vote, then all semilinear predicates and functions can be stably computed in (O(n \log n)) parallel time by execution bounded CRNs.
However, if no initial leader is allowed, all species vote, and the CRN is "noncollapsing" (does not shrink from initially large to final (O(1)) size configurations), then execution bounded CRNs are severely limited, able to compute only eventually constant predicates. A key tool is to characterize execution bounded CRNs as precisely those with a nonnegative linear potential function that is strictly decreased by every reaction, a result that may be of independent interest. - [237] arXiv:2405.08651 [pdf, ps, other]
-
Title: BeACONS: A Blockchain-enabled Authentication and Communications Network for Scalable IoVSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled Mutual Authentication. The proposed network is structured into a primary layer, consisting of Road Side Units and edge servers as servers of Blockchain-enabled Domain Name Services for managing inter-vehicle communications identities, and a sub-layer within each vehicle for intra-vehicle communications via the Blockchain-enabled Mutual Authentication Protocol. This design facilitates secure connections across vehicles by coordinating between the layers, significantly improving communications security and efficiency. This study also evaluates Road Side Unit availability against the random distribution of Road Side Units along the route of different vehicles. The proposed model presents a novel pathway towards a decentralised, secure, and efficient Internet of Vehicles ecosystem, contributing to the advancement of autonomous and trustworthy vehicular networks.
- [238] arXiv:2405.08654 [pdf, ps, other]
-
Title: Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network MonitoringComments: 13 pages, 5 figures, 6 tables. To appear in the proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
With the increasing use of neural networks in critical systems, runtime monitoring becomes essential to reject unsafe predictions during inference. Various techniques have emerged to establish rejection scores that maximize the separability between the distributions of safe and unsafe predictions. The efficacy of these approaches is mostly evaluated using threshold-agnostic metrics, such as the area under the receiver operating characteristic curve. However, in real-world applications, an effective monitor also requires identifying a good threshold to transform these scores into meaningful binary decisions. Despite the pivotal importance of threshold optimization, this problem has received little attention. A few studies touch upon this question, but they typically assume that the runtime data distribution mirrors the training distribution, which is a strong assumption as monitors are supposed to safeguard a system against potentially unforeseen threats. In this work, we present rigorous experiments on various image datasets to investigate: 1. The effectiveness of monitors in handling unforeseen threats, which are not available during threshold adjustments. 2. Whether integrating generic threats into the threshold optimization scheme can enhance the robustness of monitors.
- [239] arXiv:2405.08655 [pdf, ps, html, other]
-
Title: A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement LearningComments: 15 pages, 2 figures, submitted to Agents in Traffic and Transportation (ATT2024)Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.
- [240] arXiv:2405.08661 [pdf, ps, other]
-
Title: Gradient Estimation and Variance Reduction in Stochastic and Deterministic ModelsComments: cornell university dissertationSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
It seems that in the current age, computers, computation, and data have an increasingly important role to play in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence, which have become great areas of interest not just for computer science but also for many other fields of study. More generally, there have been trends moving towards the use of bigger, more complex and higher capacity models. It also seems that stochastic models, and stochastic variants of existing deterministic models, have become important research directions in various fields. For all of these types of models, gradient-based optimization remains as the dominant paradigm for model fitting, control, and more. This dissertation considers unconstrained, nonlinear optimization problems, with a focus on the gradient itself, that key quantity which enables the solution of such problems.
In chapter 1, we introduce the notion of reverse differentiation, a term which describes the body of techniques which enables the efficient computation of gradients. We cover relevant techniques both in the deterministic and stochastic cases. We present a new framework for calculating the gradient of problems which involve both deterministic and stochastic elements. In chapter 2, we analyze the properties of the gradient estimator, with a focus on those properties which are typically assumed in convergence proofs of optimization algorithms. Chapter 3 gives various examples of applying our new gradient estimator. We further explore the idea of working with piecewise continuous models, that is, models with distinct branches and if statements which define what specific branch to use. - [241] arXiv:2405.08663 [pdf, ps, other]
-
Title: D-CAST: Distributed Consensus Switch in Wireless Trustworthy Autonomous SystemSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The protocols of distributed consensus normally aim to tolerate different types of faults including crash faults and byzantine faults that occur in the distributed systems. However, the dynamic network topology and stochastic wireless channels may cause the same trustworthy system to suffer both crash fault and byzantine fault. This article proposes the concept of a distributed consensus autonomous switch mechanism in trustworthy autonomous systems (D-CAST) to reach the different fault tolerance requirements of the dynamic nodes and discusses the challenges of D-CAST while it is implemented in the wireless trustworthy system.
- [242] arXiv:2405.08668 [pdf, ps, other]
-
Title: Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM ResearchSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, the construction of powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources, primarily accessible to industry, yet hindering VLM research in academia. To address this challenge and foster sustainable and equitable VLM research, we present the Generalized Domain Prompt Learning (GDPL) framework. GDPL facilitates the transfer of VLMs' robust recognition capabilities from natural vision to specialized domains, without the need for extensive data or resources. By leveraging small-scale domain-specific foundation models and minimal prompt samples, GDPL empowers the language branch with domain knowledge through quaternion networks, uncovering cross-modal relationships between domain-specific vision features and natural vision-based contextual embeddings. Simultaneously, GDPL guides the vision branch into specific domains through hierarchical propagation of generated vision prompt features, grounded in well-matched vision-language relations. Furthermore, to fully harness the domain adaptation potential of VLMs, we introduce a novel low-rank adaptation approach. Extensive experiments across diverse domains like remote sensing, medical imaging, geology, Synthetic Aperture Radar, and fluid dynamics, validate the efficacy of GDPL, demonstrating its ability to achieve state-of-the-art domain recognition performance in a prompt learning paradigm. Our framework paves the way for sustainable and inclusive VLM research, transcending the barriers between academia and industry.
- [243] arXiv:2405.08674 [pdf, ps, other]
-
Title: Expensive Multi-Objective Bayesian Optimization Based on Diffusion ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs). However, effectively modeling complex distributions of the Pareto optimal solutions is difficult with limited function evaluations. Existing Pareto set learning algorithms may exhibit considerable instability in such expensive scenarios, leading to significant deviations between the obtained solution set and the Pareto set (PS). In this paper, we propose a novel Composite Diffusion Model based Pareto Set Learning algorithm, namely CDM-PSL, for expensive MOBO. CDM-PSL includes both unconditional and conditional diffusion model for generating high-quality samples. Besides, we introduce an information entropy based weighting method to balance different objectives of EMOPs. This method is integrated with the guiding strategy, ensuring that all the objectives are appropriately balanced and given due consideration during the optimization process; Extensive experimental results on both synthetic benchmarks and real-world problems demonstrates that our proposed algorithm attains superior performance compared with various state-of-the-art MOBO algorithms.
- [244] arXiv:2405.08679 [pdf, ps, other]
-
Title: Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation LearningComments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, which consists of splitting an input mel-spectrogram into two parts (context and target), computing neural representations for each, and training the neural network to predict the target representations from the context representations. We investigate several design choices within this framework and study their influence through extensive experiments by evaluating our models on various audio classification benchmarks, including environmental sounds, speech and music downstream tasks. We focus notably on which part of the input data is used as context or target and show experimentally that it significantly impacts the model's quality. In particular, we notice that some effective design choices in the image domain lead to poor performance on audio, thus highlighting major differences between these two modalities.
- [245] arXiv:2405.08681 [pdf, ps, other]
-
Title: Achieving Fairness Through Channel Pruning for Dermatological Disease DiagnosisComments: 13 pages, 3 figures, early accepted by International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Numerous studies have revealed that deep learning-based medical image classification models may exhibit bias towards specific demographic attributes, such as race, gender, and age. Existing bias mitigation methods often achieve high level of fairness at the cost of significant accuracy degradation. In response to this challenge, we propose an innovative and adaptable Soft Nearest Neighbor Loss-based channel pruning framework, which achieves fairness through channel pruning. Traditionally, channel pruning is utilized to accelerate neural network inference. However, our work demonstrates that pruning can also be a potent tool for achieving fairness. Our key insight is that different channels in a layer contribute differently to the accuracy of different groups. By selectively pruning critical channels that lead to the accuracy difference between the privileged and unprivileged groups, we can effectively improve fairness without sacrificing accuracy significantly. Experiments conducted on two skin lesion diagnosis datasets across multiple sensitive attributes validate the effectiveness of our method in achieving state-of-the-art trade-off between accuracy and fairness. Our code is available at this https URL.
- [246] arXiv:2405.08690 [pdf, ps, other]
-
Title: Double-activation neural network for solving parabolic equations with time delayComments: 21 pages,11 figuresSubjects: Numerical Analysis (math.NA)
This paper presents the double-activation neural network (DANN), a novel network architecture designed for solving parabolic equations with time delay. In DANN, each neuron is equipped with two activation functions to augment the network's nonlinear expressive capacity. Additionally, a new parameter is introduced for the construction of the quadratic terms in one of two activation functions, which further enhances the network's ability to capture complex nonlinear relationships. To address the issue of low fitting accuracy caused by the discontinuity of solution's derivative, a piecewise fitting approach is proposed by dividing the global solving domain into several subdomains. The convergence of the loss function is proven. Numerical results are presented to demonstrate the superior accuracy and faster convergence of DANN compared to the traditional physics-informed neural network (PINN).
- [247] arXiv:2405.08691 [pdf, ps, other]
-
Title: Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search PlanningComments: Submitted to IROS 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This study investigates the computational speed and accuracy of two numerical integration methods, cubature and sampling-based, for integrating an integrand over a 2D polygon. Using a group of rovers searching the Martian surface with a limited sensor footprint as a test bed, the relative error and computational time are compared as the area was subdivided to improve accuracy in the sampling-based approach. The results show that the sampling-based approach exhibits a $14.75\%$ deviation in relative error compared to cubature when it matches the computational performance at $100\%$. Furthermore, achieving a relative error below $1\%$ necessitates a $10000\%$ increase in relative time to calculate due to the $\mathcal{O}(N^2)$ complexity of the sampling-based method. It is concluded that for enhancing reinforcement learning capabilities and other high iteration algorithms, the cubature method is preferred over the sampling-based method.
- [248] arXiv:2405.08695 [pdf, ps, other]
-
Title: The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasksSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Addressing multi-label action recognition in videos represents a significant challenge for robotic applications in dynamic environments, especially when the robot is required to cooperate with humans in tasks that involve objects. Existing methods still struggle to recognize unseen actions or require extensive training data. To overcome these problems, we propose Dual-VCLIP, a unified approach for zero-shot multi-label action recognition. Dual-VCLIP enhances VCLIP, a zero-shot action recognition method, with the DualCoOp method for multi-label image classification. The strength of our method is that at training time it only learns two prompts, and it is therefore much simpler than other methods. We validate our method on the Charades dataset that includes a majority of object-based actions, demonstrating that -- despite its simplicity -- our method performs favorably with respect to existing methods on the complete dataset, and promising performance when tested on unseen actions. Our contribution emphasizes the impact of verb-object class-splits during robots' training for new cooperative tasks, highlighting the influence on the performance and giving insights into mitigating biases.
- [249] arXiv:2405.08698 [pdf, ps, other]
-
Title: Byzantine-Resilient Secure Aggregation for Federated Learning Without Privacy CompromisesSubjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Federated learning (FL) shows great promise in large scale machine learning, but brings new risks in terms of privacy and security. We propose ByITFL, a novel scheme for FL that provides resilience against Byzantine users while keeping the users' data private from the federator and private from other users. The scheme builds on the preexisting non-private FLTrust scheme, which tolerates malicious users through trust scores (TS) that attenuate or amplify the users' gradients. The trust scores are based on the ReLU function, which we approximate by a polynomial. The distributed and privacy-preserving computation in ByITFL is designed using a combination of Lagrange coded computing, verifiable secret sharing and re-randomization steps. ByITFL is the first Byzantine resilient scheme for FL with full information-theoretic privacy.
- [250] arXiv:2405.08704 [pdf, ps, other]
-
Title: Full Line Code Completion: Bringing AI to DesktopAnton Semenkin, Vitaliy Bibaev, Yaroslav Sokolov, Kirill Krylov, Alexey Kalina, Anna Khannanova, Danila Savenkov, Darya Rovdo, Igor Davidenko, Kirill Karnaukhov, Maxim Vakhrushev, Mikhail Kostyukov, Mikhail Podvitskii, Petr Surkov, Yaroslav Golubev, Nikita Povarov, Timofey BryksinComments: 12 pages, 4 figuresSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
In recent years, several industrial solutions for the problem of multi-token code completion have appeared, each making a great advance in the area but mostly focusing on cloud-based runtime and avoiding working on the end user's device.
In this work, we describe our approach for building a multi-token code completion feature for the JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happens on the end user's machine. We share important time and memory-consumption restrictions, as well as design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches user experience while being not only fast and compact but also secure. We share a number of useful techniques to meet the stated development constraints and also describe offline and online evaluation pipelines that allowed us to make better decisions.
Our online evaluation shows that the usage of the tool leads to 1.5 times more code in the IDE being produced by code completion. The described solution was initially started with the help of researchers and was bundled into two JetBrains' IDEs - PyCharm Pro and DataSpell - at the end of 2023, so we believe that this work is useful for bridging academia and industry, providing researchers with the knowledge of what happens when complex research-based solutions are integrated into real products. - [251] arXiv:2405.08706 [pdf, ps, other]
-
Title: Design and Analysis of Resilient Vehicular Platoon Systems over Wireless NetworksComments: 6 pages, 4 figures, in submission of Globecom 2024Subjects: Systems and Control (eess.SY)
Connected vehicular platoons provide a promising solution to improve traffic efficiency and ensure road safety. Vehicles in a platoon utilize on-board sensors and wireless vehicle-to-vehicle (V2V) links to share traffic information for cooperative adaptive cruise control. To process real-time control and alert information, there is a need to ensure clock synchronization among the platoon's vehicles. However, adversaries can jeopardize the operation of the platoon by attacking the local clocks of vehicles, leading to clock offsets with the platoon's reference clock. In this paper, a novel framework is proposed for analyzing the resilience of vehicular platoons that are connected using V2V links. In particular, a resilient design based on a diffusion protocol is proposed to re-synchronize the attacked vehicle through wireless V2V links thereby mitigating the impact of variance of the transmission delay during recovery. Then, a novel metric named temporal conditional mean exceedance is defined and analyzed in order to characterize the resilience of the platoon. Subsequently, the conditions pertaining to the V2V links and recovery time needed for a resilient design are derived. Numerical results show that the proposed resilient design is feasible in face of a nine-fold increase in the variance of transmission delay compared to a baseline designed for reliability. Moreover, the proposed approach improves the reliability, defined as the probability of meeting a desired clock offset error requirement, by 45% compared to the baseline.
- [252] arXiv:2405.08707 [pdf, ps, html, other]
-
Title: Beyond Scaling Laws: Understanding Transformer Performance with Associative MemorySubjects: Machine Learning (cs.LG)
Increasing the size of a Transformer model does not always lead to enhanced performance. This phenomenon cannot be explained by the empirical scaling laws. Furthermore, improved generalization ability occurs as the model memorizes the training samples. We present a theoretical framework that sheds light on the memorization process and performance dynamics of transformer-based language models. We model the behavior of Transformers with associative memories using Hopfield networks, such that each transformer block effectively conducts an approximate nearest-neighbor search. Based on this, we design an energy function analogous to that in the modern continuous Hopfield network which provides an insightful explanation for the attention mechanism. Using the majorization-minimization technique, we construct a global energy function that captures the layered architecture of the Transformer. Under specific conditions, we show that the minimum achievable cross-entropy loss is bounded from below by a constant approximately equal to 1. We substantiate our theoretical results by conducting experiments with GPT-2 on various data sizes, as well as training vanilla Transformers on a dataset of 2M tokens.
- [253] arXiv:2405.08709 [pdf, ps, other]
-
Title: Multi-Task Private Semantic CommunicationSubjects: Information Theory (cs.IT)
We study a multi-task private semantic communication problem, in which an encoder has access to an information source arbitrarily correlated with some latent private data. A user has $L$ tasks with priorities. The encoder designs a message to be revealed which is called the semantic of the information source. Due to the privacy constraints the semantic can not be disclosed directly and the encoder adds noise to produce disclosed data. The goal is to design the disclosed data that maximizes the weighted sum of the utilities achieved by the user while satisfying a privacy constraint on the private data. In this work, we first consider a single-task scenario and design the added noise utilizing various methods including the extended versions of the Functional Representation Lemma, Strong Functional Representation Lemma, and separation technique. We then study the multi-task scenario and derive a simple design of the source semantics. We show that in the multi-task scenario the main problem can be divided into multiple parallel single-task problems.
- [254] arXiv:2405.08710 [pdf, ps, html, other]
-
Title: An Analytic Solution to the 3D CSC Dubins Path ProblemComments: 7 pages, 8 figures, presented at IEEE ICRA this https URL 2024 IEEE International Conference on Robotics and Automation in PACIFICO Yokohama May 13th to 17th, 2024Subjects: Robotics (cs.RO)
We present an analytic solution to the 3D Dubins path problem for paths composed of an initial circular arc, a straight component, and a final circular arc. These are commonly called CSC paths. By modeling the start and goal configurations of the path as the base frame and final frame of an RRPRR manipulator, we treat this as an inverse kinematics problem. The kinematic features of the 3D Dubins path are built into the constraints of our manipulator model. Furthermore, we show that the number of solutions is not constant, with up to seven valid CSC path solutions even in non-singular regions. An implementation of solution is available at this https URL.
- [255] arXiv:2405.08711 [pdf, ps, html, other]
-
Title: Data-driven Force Observer for Human-Robot Interaction with Series Elastic Actuators using Gaussian ProcessesSubjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Ensuring safety and adapting to the user's behavior are of paramount importance in physical human-robot interaction. Thus, incorporating elastic actuators in the robot's mechanical design has become popular, since it offers intrinsic compliance and additionally provide a coarse estimate for the interaction force by measuring the deformation of the elastic components. While observer-based methods have been shown to improve these estimates, they rely on accurate models of the system, which are challenging to obtain in complex operating environments. In this work, we overcome this issue by learning the unknown dynamics components using Gaussian process (GP) regression. By employing the learned model in a Bayesian filtering framework, we improve the estimation accuracy and additionally obtain an observer that explicitly considers local model uncertainty in the confidence measure of the state estimate. Furthermore, we derive guaranteed estimation error bounds, thus, facilitating the use in safety-critical applications. We demonstrate the effectiveness of the proposed approach experimentally in a human-exoskeleton interaction scenario.
- [256] arXiv:2405.08715 [pdf, ps, html, other]
-
Title: DeVOS: Flow-Guided Deformable Transformer for Video Object SegmentationVolodymyr Fedynyak, Yaroslav Romanus, Bohdan Hlovatskyi, Bohdan Sydor, Oles Dobosevych, Igor Babin, Roman RiazantsevSubjects: Computer Vision and Pattern Recognition (cs.CV)
The recent works on Video Object Segmentation achieved remarkable results by matching dense semantic and instance-level features between the current and previous frames for long-time propagation. Nevertheless, global feature matching ignores scene motion context, failing to satisfy temporal consistency. Even though some methods introduce local matching branch to achieve smooth propagation, they fail to model complex appearance changes due to the constraints of the local window. In this paper, we present DeVOS (Deformable VOS), an architecture for Video Object Segmentation that combines memory-based matching with motion-guided propagation resulting in stable long-term modeling and strong temporal consistency. For short-term local propagation, we propose a novel attention mechanism ADVA (Adaptive Deformable Video Attention), allowing the adaption of similarity search region to query-specific semantic features, which ensures robust tracking of complex shape and scale changes. DeVOS employs an optical flow to obtain scene motion features which are further injected to deformable attention as strong priors to learnable offsets. Our method achieves top-rank performance on DAVIS 2017 val and test-dev (88.1%, 83.0%), YouTube-VOS 2019 val (86.6%) while featuring consistent run-time speed and stable memory consumption
- [257] arXiv:2405.08717 [pdf, ps, other]
-
Title: How Much You Ate? Food Portion Estimation on SpoonsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Monitoring dietary intake is a crucial aspect of promoting healthy living. In recent years, advances in computer vision technology have facilitated dietary intake monitoring through the use of images and depth cameras. However, the current state-of-the-art image-based food portion estimation algorithms assume that users take images of their meals one or two times, which can be inconvenient and fail to capture food items that are not visible from a top-down perspective, such as ingredients submerged in a stew. To address these limitations, we introduce an innovative solution that utilizes stationary user-facing cameras to track food items on utensils, not requiring any change of camera perspective after installation. The shallow depth of utensils provides a more favorable angle for capturing food items, and tracking them on the utensil's surface offers a significantly more accurate estimation of dietary intake without the need for post-meal image capture. The system is reliable for estimation of nutritional content of liquid-solid heterogeneous mixtures such as soups and stews. Through a series of experiments, we demonstrate the exceptional potential of our method as a non-invasive, user-friendly, and highly accurate dietary intake monitoring tool.
- [258] arXiv:2405.08720 [pdf, ps, html, other]
-
Title: The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling PerspectiveComments: To appear at CVPR 2024 Workshop on AI for Content Creation (AI4CC)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Text-to-video generation task has witnessed a notable progress, with the generated outcomes reflecting the text prompts with high fidelity and impressive visual qualities. However, current text-to-video generation models are invariably focused on conveying the visual elements of a single scene, and have so far been indifferent to another important potential of the medium, namely a storytelling. In this paper, we examine text-to-video generation from a storytelling perspective, which has been hardly investigated, and make empirical remarks that spotlight the limitations of current text-to-video generation scheme. We also propose an evaluation framework for storytelling aspects of videos, and discuss the potential future directions.
- [259] arXiv:2405.08721 [pdf, ps, other]
-
Title: A regularized eigenmatrix method for unstructured sparse recoveryComments: 11 pages, 5 figuresSubjects: Numerical Analysis (math.NA)
The recently developed data-driven eigenmatrix method shows very promising reconstruction accuracy in sparse recovery for a wide range of kernel functions and random sample locations. However, its current implementation can lead to numerical instability if the threshold tolerance is not appropriately chosen. To incorporate regularization techniques, we propose to regularize the eigenmatrix method by replacing the computation of an ill-conditioned pseudo-inverse by the solution of an ill-conditioned least square system, which can be efficiently treated by Tikhonov regularization. Extensive numerical examples confirmed the improved effectiveness of our proposed method, especially when the noise levels are relatively high.
- [260] arXiv:2405.08726 [pdf, ps, other]
-
Title: I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement LearningSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
This paper addresses the critical need for refining robot motions that, despite achieving a high visual similarity through human-to-humanoid retargeting methods, fall short of practical execution in the physical realm. Existing techniques in the graphics community often prioritize visual fidelity over physics-based feasibility, posing a significant challenge for deploying bipedal systems in practical applications. Our research introduces a constrained reinforcement learning algorithm to produce physics-based high-quality motion imitation onto legged humanoid robots that enhance motion resemblance while successfully following the reference human trajectory. We name our framework: I-CTRL. By reformulating the motion imitation problem as a constrained refinement over non-physics-based retargeted motions, our framework excels in motion imitation with simple and unique rewards that generalize across four robots. Moreover, our framework can follow large-scale motion datasets with a unique RL agent. The proposed approach signifies a crucial step forward in advancing the control of bipedal robots, emphasizing the importance of aligning visual and physical realism for successful motion imitation.
- [261] arXiv:2405.08729 [pdf, ps, other]
-
Title: Targeted Augmentation for Low-Resource Event ExtractionComments: 15 pages, NAACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Addressing the challenge of low-resource information extraction remains an ongoing issue due to the inherent information scarcity within limited training examples. Existing data augmentation methods, considered potential solutions, struggle to strike a balance between weak augmentation (e.g., synonym augmentation) and drastic augmentation (e.g., conditional generation without proper guidance). This paper introduces a novel paradigm that employs targeted augmentation and back validation to produce augmented examples with enhanced diversity, polarity, accuracy, and coherence. Extensive experimental results demonstrate the effectiveness of the proposed paradigm. Furthermore, identified limitations are discussed, shedding light on areas for future improvement.
- [262] arXiv:2405.08731 [pdf, ps, other]
-
Title: Dynamic On-Palm Manipulation via Controlled SlidingComments: Project website: this https URLSubjects: Robotics (cs.RO)
Non-prehensile manipulation enables fast interactions with objects by circumventing the need to grasp and ungrasp as well as handling objects that cannot be grasped through force closure. Current approaches to non-prehensile manipulation focus on static contacts, avoiding the underactuation that comes with sliding. However, the ability to control sliding contact, essentially removing the no-slip constraint, opens up new possibilities in dynamic manipulation. In this paper, we explore a challenging dynamic non-prehensile manipulation task that requires the consideration of the full spectrum of hybrid contact modes. We leverage recent methods in contact-implicit MPC to handle the multi-modal planning aspect of the task. We demonstrate, with careful consideration of integration between the simple model used for MPC and the low-level tracking controller, how contact-implicit MPC can be adapted to dynamic tasks. Surprisingly, despite the known inaccuracies of frictional rigid contact models, our method is able to react to these inaccuracies while still quickly performing the task. Moreover, we do not use common aids such as reference trajectories or motion primitives, highlighting the generality of our approach. To the best of our knowledge, this is the first application of contact-implicit MPC to a dynamic manipulation task in three dimensions.
- [263] arXiv:2405.08732 [pdf, ps, other]
-
Title: Multi-Server Multi-Function Distributed ComputationComments: Submitted to EntropySubjects: Information Theory (cs.IT)
The work here studies the communication cost for a multi-server multi-task distributed computation framework, and does so for a broad class of functions and data statistics. Considering the framework where a user seeks the computation of multiple complex (conceivably non-linear) tasks from a set of distributed servers, we establish communication cost upper bounds for a variety of data statistics, function classes and data placements across the servers. To do so, we proceed to apply, for the first time here, Körner's characteristic graph approach -- which is known to capture the structural properties of data and functions -- to the promising framework of multi-server multi-task distributed computing. Going beyond the general expressions, and in order to offer clearer insight, we also consider the well-known scenario of cyclic dataset placement and linearly separable functions over the binary field, in which case our approach exhibits considerable gains over the state of art. Similar gains are identified for the case of multi-linear functions.
- [264] arXiv:2405.08733 [pdf, ps, other]
-
Title: A Simple Approach to Differentiable Rendering of SDFsSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
We present a simple algorithm for differentiable rendering of surfaces represented by Signed Distance Fields (SDF), which makes it easy to integrate rendering into gradient-based optimization pipelines. To tackle visibility-related derivatives that make rendering non-differentiable, existing physically based differentiable rendering methods often rely on elaborate guiding data structures or reparameterization with a global impact on variance. In this article, we investigate an alternative that embraces nonzero bias in exchange for low variance and architectural simplicity. Our method expands the lower-dimensional boundary integral into a thin band that is easy to sample when the underlying surface is represented by an SDF. We demonstrate the performance and robustness of our formulation in end-to-end inverse rendering tasks, where it obtains results that are competitive with or superior to existing work.
- [265] arXiv:2405.08740 [pdf, ps, other]
-
Title: Reinformer: Max-Return Sequence Modeling for offline RLComments: ICML 2024Subjects: Machine Learning (cs.LG)
As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as sequence modeling that conditions on the hindsight information including returns, goal or future trajectory. Although promising, this supervised paradigm overlooks the core objective of RL that maximizes the return. This overlook directly leads to the lack of trajectory stitching capability that affects the sequence model learning from sub-optimal data. In this work, we introduce the concept of max-return sequence modeling which integrates the goal of maximizing returns into existing sequence models. We propose Reinforced Transformer (Reinformer), indicating the sequence model is reinforced by the RL objective. Reinformer additionally incorporates the objective of maximizing returns in the training phase, aiming to predict the maximum future return within the distribution. During inference, this in-distribution maximum return will guide the selection of optimal actions. Empirically, Reinformer is competitive with classical RL methods on the D4RL benchmark and outperforms state-of-the-art sequence model particularly in trajectory stitching ability. Code is public at \url{this https URL}.
- [266] arXiv:2405.08741 [pdf, ps, other]
-
Title: On Maximal Families of Binary Polynomials with Pairwise Linear Common FactorsComments: 5 pages. Extended abstract submitted to BFA 2024Subjects: Discrete Mathematics (cs.DM); Cryptography and Security (cs.CR); Combinatorics (math.CO)
We consider the construction of maximal families of polynomials over the finite field $\mathbb{F}_q$, all having the same degree $n$ and a nonzero constant term, where the degree of the GCD of any two polynomials is $d$ with $1 \le d\le n$. The motivation for this problem lies in a recent construction for subspace codes based on cellular automata. More precisely, the minimum distance of such subspace codes relates to the maximum degree $d$ of the pairwise GCD in this family of polynomials. Hence, characterizing the maximal families of such polynomials is equivalent to determining the maximum cardinality of the corresponding subspace codes for a given minimum distance. We first show a lower bound on the cardinality of such families, and then focus on the specific case where $d=1$. There, we characterize the maximal families of polynomials over the binary field $\mathbb{F}_2$. Our findings prompt several more open questions, which we plan to address in an extended version of this work.
- [267] arXiv:2405.08748 [pdf, ps, other]
-
Title: Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese UnderstandingZhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, Jianchen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Mingtao Chen, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin LuComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at this http URL
- [268] arXiv:2405.08751 [pdf, ps, other]
-
Title: From Text to Context: An Entailment Approach for News Stakeholder ClassificationComments: Accepted in SIGIR 2024Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Navigating the complex landscape of news articles involves understanding the various actors or entities involved, referred to as news stakeholders. These stakeholders, ranging from policymakers to opposition figures, citizens, and more, play pivotal roles in shaping news narratives. Recognizing their stakeholder types, reflecting their roles, political alignments, social standing, and more, is paramount for a nuanced comprehension of news content. Despite existing works focusing on salient entity extraction, coverage variations, and political affiliations through social media data, the automated detection of stakeholder roles within news content remains an underexplored domain. In this paper, we bridge this gap by introducing an effective approach to classify stakeholder types in news articles. Our method involves transforming the stakeholder classification problem into a natural language inference task, utilizing contextual information from news articles and external knowledge to enhance the accuracy of stakeholder type detection. Moreover, our proposed model showcases efficacy in zero-shot settings, further extending its applicability to diverse news contexts.
- [269] arXiv:2405.08754 [pdf, ps, other]
-
Title: Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning ApproachComments: Published in: 2023 IEEE International Conference on Cluster Computing (CLUSTER)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in the same generation do. However, as the available resources in GPUs have increased exponentially over the past decades, it has become increasingly difficult for a single program to fully utilize them. As a consequence, the industry has started supporting several resource partitioning features in order to improve the resource utilization by co-scheduling multiple programs on the same GPU die at the same time. Driven by the technological trend, this paper focuses on hierarchical resource partitioning on modern GPUs, and as an example, we utilize a combination of two different features available on recent NVIDIA GPUs in a hierarchical manner: MPS (Multi-Process Service), a finer-grained logical partitioning; and MIG (Multi-Instance GPU), a coarse-grained physical partitioning. We propose a method for comprehensively co-optimizing the setup of hierarchical partitioning and the selection of co-scheduling groups from a given set of jobs, based on reinforcement learning using their profiles. Our thorough experimental results demonstrate that our approach can successfully set up job concurrency, partitioning, and co-scheduling group selections simultaneously. This results in a maximum throughput improvement by a factor of 1.87 compared to the time-sharing scheduling.
- [270] arXiv:2405.08755 [pdf, ps, other]
-
Title: Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven ApproachSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
With the proliferation of edge devices, there is a significant increase in attack surface on these devices. The decentralized deployment of threat intelligence on edge devices, coupled with adaptive machine learning techniques such as the in-context learning feature of large language models (LLMs), represents a promising paradigm for enhancing cybersecurity on low-powered edge devices. This approach involves the deployment of lightweight machine learning models directly onto edge devices to analyze local data streams, such as network traffic and system logs, in real-time. Additionally, distributing computational tasks to an edge server reduces latency and improves responsiveness while also enhancing privacy by processing sensitive data locally. LLM servers can enable these edge servers to autonomously adapt to evolving threats and attack patterns, continuously updating their models to improve detection accuracy and reduce false positives. Furthermore, collaborative learning mechanisms facilitate peer-to-peer secure and trustworthy knowledge sharing among edge devices, enhancing the collective intelligence of the network and enabling dynamic threat mitigation measures such as device quarantine in response to detected anomalies. The scalability and flexibility of this approach make it well-suited for diverse and evolving network environments, as edge devices only send suspicious information such as network traffic and system log changes, offering a resilient and efficient solution to combat emerging cyber threats at the network edge. Thus, our proposed framework can improve edge computing security by providing better security in cyber threat detection and mitigation by isolating the edge devices from the network.
- [271] arXiv:2405.08756 [pdf, ps, other]
-
Title: Stable Inverse Reinforcement Learning: Policies from Control Lyapunov LandscapesSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, it is also computationally demanding and generally lacks convergence guarantees. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world data.
- [272] arXiv:2405.08760 [pdf, ps, other]
-
Title: Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal utterances. Ideally, an LLM should respond in line with the true intention of a non-literal utterance, not its literal interpretation. Our findings show that LLMs struggle to generate pragmatically relevant responses to non-literal language, achieving only 50-55% accuracy on average. While explicitly providing oracle intentions significantly improves performance (e.g., 75% for Mistral-Instruct), this still indicates challenges in leveraging given intentions to produce appropriate responses. Using chain-of-thought to make models spell out intentions yields much smaller gains (60% for Mistral-Instruct). These findings suggest that LLMs are not yet effective pragmatic interlocutors, highlighting the need for better approaches for modeling intentions and utilizing them for pragmatic generation.
- [273] arXiv:2405.08762 [pdf, ps, other]
-
Title: S3C2 Summit 2024-03: Industry Secure Supply Chain SummitGreg Tystahl, Yasemin Acar, Michel Cukier, William Enck, Christian Kastner, Alexandros Kapravelos, Dominik Wermke, Laurie WilliamsComments: This is our WIP paper on the Summit. More versions will be released soonSubjects: Cryptography and Security (cs.CR)
Supply chain security has become a very important vector to consider when defending against adversary attacks. Due to this, more and more developers are keen on improving their supply chains to make them more robust against future threats. On March 7th, 2024 researchers from the Secure Software Supply Chain Center (S3C2) gathered 14 industry leaders, developers and consumers of the open source ecosystem to discuss the state of supply chain security. The goal of the summit is to share insights between companies and developers alike to foster new collaborations and ideas moving forward. Through this meeting, participants were questions on best practices and thoughts how to improve things for the future. In this paper we summarize the responses and discussions of the summit. The panel questions can be found in the appendix.
- [274] arXiv:2405.08764 [pdf, ps, other]
-
Title: A Generalized Curvilinear Coordinate system-based Patch Dynamics Scheme in Equation-free Multiscale ModellingComments: 21 pages, 8 figures, 4 tablesSubjects: Numerical Analysis (math.NA)
The patch dynamics scheme in equation-free multiscale modelling can efficiently predict the macroscopic behaviours by simulating the microscale problem in a fraction of the space-time domain. The patch dynamics schemes developed so far, are mainly on rectangular domains with uniform grids and uniform rectangular patches. In real-life problems where the geometry of the domain is not regular or simple, rectangular and uniform grids or patches may not be useful. To address this kind of complexity, the concept of a generalized curvilinear coordinate system is used. An explicit representation of a patch dynamics scheme on a generalized curvilinear coordinate system in a two-dimensional domain is proposed for evolution equations. It has been applied to unsteady convection-diffusion-reaction (CDR) problems. The robustness of the scheme on the generalized curvilinear coordinate system is assessed through numerical test cases. Firstly, a convection-dominated CDR equation is considered, featuring high gradient regions in some part of the domain, for which stretched grids with non-uniform patch sizes are employed. Secondly, a non-axisymmetric diffusion equation is examined in an annulus region, where the patches have non-rectangular shapes. The results obtained demonstrate excellent agreement with the analytical solution or existing numerical solutions.
- [275] arXiv:2405.08765 [pdf, ps, other]
-
Title: Image to Pseudo-Episode: Boosting Few-Shot Segmentation by Unlabeled DataSubjects: Computer Vision and Pattern Recognition (cs.CV)
Few-shot segmentation (FSS) aims to train a model which can segment the object from novel classes with a few labeled samples. The insufficient generalization ability of models leads to unsatisfactory performance when the models lack enough labeled data from the novel classes. Considering that there are abundant unlabeled data available, it is promising to improve the generalization ability by exploiting these various data. For leveraging unlabeled data, we propose a novel method, named Image to Pseudo-Episode (IPE), to generate pseudo-episodes from unlabeled data. Specifically, our method contains two modules, i.e., the pseudo-label generation module and the episode generation module. The former module generates pseudo-labels from unlabeled images by the spectral clustering algorithm, and the latter module generates pseudo-episodes from pseudo-labeled images by data augmentation methods. Extensive experiments on PASCAL-$5^i$ and COCO-$20^i$ demonstrate that our method achieves the state-of-the-art performance for FSS.
- [276] arXiv:2405.08766 [pdf, ps, other]
-
Title: Energy-based Hopfield Boosting for Out-of-Distribution DetectionSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy (MHE) to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to concentrate on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 metric from 2.28 to 0.92 on CIFAR-10 and from 11.76 to 7.94 on CIFAR-100.
- [277] arXiv:2405.08768 [pdf, ps, other]
-
Title: EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone TrainingComments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Journal version of arXiv:2211.09703 (ICCV 2023). Code is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The superior performance of modern visual backbones usually comes with a costly training procedure. We contribute to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns within each example during training, instead of performing easier-to-harder sample selection. Our work is inspired by an intriguing observation on the learning dynamics of visual backbones: during the earlier stages of training, the model predominantly learns to recognize some 'easier-to-learn' discriminative patterns in the data. These patterns, when observed through frequency and spatial domains, incorporate lower-frequency components, and the natural image contents without distortion or data augmentation. Motivated by these findings, we propose a curriculum where the model always leverages all the training data at every learning stage, yet the exposure to the 'easier-to-learn' patterns of each example is initiated first, with harder patterns gradually introduced as training progresses. To implement this idea in a computationally efficient way, we introduce a cropping operation in the Fourier spectrum of the inputs, enabling the model to learn from only the lower-frequency components. Then we show that exposing the contents of natural images can be readily achieved by modulating the intensity of data augmentation. Finally, we integrate these aspects and design curriculum schedules with tailored search algorithms. The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective. It reduces the training time of a wide variety of popular models by 1.5-3.0x on ImageNet-1K/22K without sacrificing accuracy. It also demonstrates efficacy in self-supervised learning (e.g., MAE).
- [278] arXiv:2405.08770 [pdf, ps, other]
-
Title: An optimization-based construction procedure for function space based summation-by-parts operators on arbitrary gridsComments: 18 pages, 8 FiguresSubjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
We introduce a novel construction procedure for one-dimensional summation-by-parts (SBP) operators. Existing construction procedures for FSBP operators of the form $D = P^{-1} Q$ proceed as follows: Given a boundary operator $B$, the norm matrix $P$ is first determined and then in a second step the complementary matrix $Q$ is calculated to finally get the FSBP operator $D$. In contrast, the approach proposed here determines the norm and complementary matrices, $P$ and $Q$, simultaneously by solving an optimization problem. The proposed construction procedure applies to classical SBP operators based on polynomial approximation and the broader class of function space SBP (FSBP) operators. According to our experiments, the presented approach yields a numerically stable construction procedure and FSBP operators with higher accuracy for diagonal norm difference operators at the boundaries than the traditional approach. Through numerical simulations, we highlight the advantages of our proposed technique.
- [279] arXiv:2405.08776 [pdf, ps, other]
-
Title: FolkTalent: Enhancing Classification and Tagging of Indian Folk PaintingsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Indian folk paintings have a rich mosaic of symbols, colors, textures, and stories making them an invaluable repository of cultural legacy. The paper presents a novel approach to classifying these paintings into distinct art forms and tagging them with their unique salient features. A custom dataset named FolkTalent, comprising 2279 digital images of paintings across 12 different forms, has been prepared using websites that are direct outlets of Indian folk paintings. Tags covering a wide range of attributes like color, theme, artistic style, and patterns are generated using GPT4, and verified by an expert for each painting. Classification is performed employing the RandomForest ensemble technique on fine-tuned Convolutional Neural Network (CNN) models to classify Indian folk paintings, achieving an accuracy of 91.83%. Tagging is accomplished via the prominent fine-tuned CNN-based backbones with a custom classifier attached to its top to perform multi-label image classification. The generated tags offer a deeper insight into the painting, enabling an enhanced search experience based on theme and visual attributes. The proposed hybrid model sets a new benchmark in folk painting classification and tagging, significantly contributing to cataloging India's folk-art heritage.
- [280] arXiv:2405.08779 [pdf, ps, html, other]
-
Title: Jacobian Regularizer-based Neural Granger CausalityComments: 20 pages, 7 figures, ICML 2024Subjects: Machine Learning (cs.LG)
With the advancement of neural networks, diverse methods for neural Granger causality have emerged, which demonstrate proficiency in handling complex data, and nonlinear relationships. However, the existing framework of neural Granger causality has several limitations. It requires the construction of separate predictive models for each target variable, and the relationship depends on the sparsity on the weights of the first layer, resulting in challenges in effectively modeling complex relationships between variables as well as unsatisfied estimation accuracy of Granger causality. Moreover, most of them cannot grasp full-time Granger causality. To address these drawbacks, we propose a Jacobian Regularizer-based Neural Granger Causality (JRNGC) approach, a straightforward yet highly effective method for learning multivariate summary Granger causality and full-time Granger causality by constructing a single model for all target variables. Specifically, our method eliminates the sparsity constraints of weights by leveraging an input-output Jacobian matrix regularizer, which can be subsequently represented as the weighted causal matrix in the post-hoc analysis. Extensive experiments show that our proposed approach achieves competitive performance with the state-of-the-art methods for learning summary Granger causality and full-time Granger causality while maintaining lower model complexity and high scalability.
- [281] arXiv:2405.08780 [pdf, ps, other]
-
Title: Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modelingGregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan PengSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression and forecasting the future risk of developing disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.
- [282] arXiv:2405.08784 [pdf, ps, other]
-
Title: Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on InstagramSubjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
We used a dictionary built from biomedical terminology extracted from various sources such as DrugBank, MedDRA, MedlinePlus, TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different rank of important terms, as measured by their eigenvector-centrality of the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
- [283] arXiv:2405.08786 [pdf, ps, other]
-
Title: Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS ScoringSubjects: Computer Vision and Pattern Recognition (cs.CV)
The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI imaging. Current deep learning-based PI-RADS scoring methods often lack the incorporation of essential PI-RADS clinical guidelines~(PICG) utilized by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into PI-RADS scoring without additional annotations and network parameters. We present a two-stage fine-tuning process aimed at adapting MLLMs originally trained on natural images to the MRI data domain while effectively integrating the PICG. In the first stage, we develop a domain adapter layer specifically tailored for processing 3D MRI image inputs and design the MLLM instructions to differentiate MRI modalities effectively. In the second stage, we translate PICG into guiding instructions for the model to generate PICG-guided image features. Through feature distillation, we align scoring network features with the PICG-guided image feature, enabling the scoring network to effectively incorporate the PICG information. We develop our model on a public dataset and evaluate it in a real-world challenging in-house dataset. Experimental results demonstrate that our approach improves the performance of current scoring networks.
- [284] arXiv:2405.08787 [pdf, ps, other]
-
Title: Explicit Orthogonal Arrays and Universal Hashing with Arbitrary ParametersSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO); Statistics Theory (math.ST)
Orthogonal arrays are a type of combinatorial design that were developed in the 1940s in the design of statistical experiments. In 1947, Rao proved a lower bound on the size of any orthogonal array, and raised the problem of constructing arrays of minimum size. Kuperberg, Lovett and Peled (2017) gave a non-constructive existence proof of orthogonal arrays whose size is near-optimal (i.e., within a polynomial of Rao's lower bound), leaving open the question of an algorithmic construction. We give the first explicit, deterministic, algorithmic construction of orthogonal arrays achieving near-optimal size for all parameters. Our construction uses algebraic geometry codes.
In pseudorandomness, the notions of $t$-independent generators or $t$-independent hash functions are equivalent to orthogonal arrays. Classical constructions of $t$-independent hash functions are known when the size of the codomain is a prime power, but very few constructions are known for an arbitrary codomain. Our construction yields algorithmically efficient $t$-independent hash functions for arbitrary domain and codomain. - [285] arXiv:2405.08788 [pdf, ps, html, other]
-
Title: Using application conditions to rank graph transformations for graph repairSubjects: Software Engineering (cs.SE)
When using graphs and graph transformations to model systems, consistency is an important concern. While consistency has primarily been viewed as a binary property, i.e., a graph is consistent or inconsistent with respect to a set of constraints, recent work has presented an approach to consistency as a graduated property. This allows living with inconsistencies for a while and repairing them when necessary. When repairing inconsistencies in a graph, we use graph transformation rules with so-called impairment- and repair-indicating application conditions to understand how much repair gain certain rule applications would bring. Both types of conditions can be derived from given graph constraints. Our main theorem shows that the difference between the number of actual constraint violations before and after a graph transformation step can be characterized by the difference between the numbers of violated impairment-indicating and repair-indicating application conditions. This theory forms the basis for algorithms with look-ahead that rank graph transformations according to their potential for graph repair. An initial evaluation shows that graph repair can be well supported by rules with these new types of application conditions.
- [286] arXiv:2405.08792 [pdf, ps, other]
-
Title: Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMsEdison Jair Bejarano Sepulveda, Nicolai Potes Hector, Santiago Pineda Montoya, Felipe Ivan Rodriguez, Jaime Enrique Orduy, Alec Rosales Cabezas, Danny Traslaviña Navarrete, Sergio Madrid FarfanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper explores the potential of large language models (LLMs) to make the Aeronautical Regulations of Colombia (RAC) more accessible. Given the complexity and extensive technicality of the RAC, this study introduces a novel approach to simplifying these regulations for broader understanding. By developing the first-ever RAC database, which contains 24,478 expertly labeled question-and-answer pairs, and fine-tuning LLMs specifically for RAC applications, the paper outlines the methodology for dataset assembly, expert-led annotation, and model training. Utilizing the Gemma1.1 2b model along with advanced techniques like Unsloth for efficient VRAM usage and flash attention mechanisms, the research aims to expedite training processes. This initiative establishes a foundation to enhance the comprehensibility and accessibility of RAC, potentially benefiting novices and reducing dependence on expert consultations for navigating the aviation industry's regulatory landscape.
You can visit the dataset (this https URL) and the model (this https URL) here. - [287] arXiv:2405.08793 [pdf, ps, html, other]
-
Title: A Brief Introduction to Causal Inference in Machine LearningSubjects: Machine Learning (cs.LG)
This is a lecture note produced for DS-GA 3001.003 "Special Topics in DS - Causal Inference in Machine Learning" at the Center for Data Science, New York University in Spring, 2024. This course was created to target master's and PhD level students with basic background in machine learning but who were not exposed to causal inference or causal reasoning in general previously. In particular, this course focuses on introducing such students to expand their view and knowledge of machine learning to incorporate causal reasoning, as this aspect is at the core of so-called out-of-distribution generalization (or lack thereof.)
- [288] arXiv:2405.08794 [pdf, ps, other]
-
Title: Ambiguous Annotations: When is a Pedestrian not a Pedestrian?Comments: Paper accepted at the CVPR 2024 Vision and Language for Autonomous Driving and Robotics WorkshopSubjects: Computer Vision and Pattern Recognition (cs.CV)
Datasets labelled by human annotators are widely used in the training and testing of machine learning models. In recent years, researchers are increasingly paying attention to label quality. However, it is not always possible to objectively determine whether an assigned label is correct or not. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from the training improves model performance of a state-of-the-art pedestrian detector in terms of LAMR, precision and F1 score, thereby saving training time and annotation costs. Furthermore, we demonstrate that, in order to safely remove ambiguous instances and ensure the retained representativeness of the training data, an understanding of the properties of the dataset and class under investigation is crucial.
- [289] arXiv:2405.08800 [pdf, ps, other]
-
Title: Estimation of Participation Factors for Power System Oscillation from MeasurementsSubjects: Systems and Control (eess.SY)
In a power system, when the participation factors of generators are computed to rank their participations into an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.
- [290] arXiv:2405.08807 [pdf, ps, html, other]
-
Title: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure InterpretationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Large multimodal models (LMMs) have proven flexible and generalisable across many tasks and fields. Although they have strong potential to aid scientific research, their capabilities in this domain are not well characterised. A key aspect of scientific research is the ability to understand and interpret figures, which serve as a rich, compressed source of complex information. In this work, we present SciFIBench, a scientific figure interpretation benchmark. Our main benchmark consists of a 1000-question gold set of multiple-choice questions split between two tasks across 12 categories. The questions are curated from CS arXiv paper figures and captions, using adversarial filtering to find hard negatives and human verification for quality control. We evaluate 26 LMMs on SciFIBench, finding it to be a challenging benchmark. Finally, we investigate the alignment and reasoning faithfulness of the LMMs on augmented question sets from our benchmark. We release SciFIBench to encourage progress in this domain.
- [291] arXiv:2405.08813 [pdf, ps, other]
-
Title: CinePile: A Long Video Question Answering Dataset and BenchmarkComments: Project page with all the artifacts - this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Current datasets for long-form video understanding often fall short of providing genuine long-form comprehension challenges, as many tasks derived from these datasets can be successfully tackled by analyzing just one or a few random frames from a video. To address this issue, we present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding. This paper details our innovative approach for creating a question-answer dataset, utilizing advanced LLMs with human-in-the-loop and building upon human-generated raw data. Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects, including temporal comprehension, understanding human-object interactions, and reasoning about events or actions within a scene. Additionally, we evaluate recent video-centric LLMs, both open-source and proprietary, on the test split of our dataset. The findings reveal that even state-of-the-art video-centric LLMs significantly lag behind human performance in these tasks, highlighting the complexity and challenge inherent in video understanding. The dataset is available at this https URL
- [292] arXiv:2405.08815 [pdf, ps, other]
-
Title: Efficient Vision-Language Pre-training by Cluster MaskingSubjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters of visually similar image patches, as measured by their raw pixel intensities. This provides an extra learning signal, beyond the contrastive training itself, since it forces a model to predict words for masked visual structures solely from context. It also speeds up training by reducing the amount of data used in each image. We evaluate the effectiveness of our model by pre-training on a number of benchmarks, finding that it outperforms other masking strategies, such as FLIP, on the quality of the learned representation.
- [293] arXiv:2405.08816 [pdf, ps, other]
-
Title: The RoboDrive Challenge: Drive Anytime Anywhere in Any ConditionLingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, Zeliang Ma, Dengyi Ji, Haiwen Li, Xingliang Huang, Yu Tian, Genghua Kou, Fan Jia, Yingfei Liu, Tiancai Wang, Ying Li, Xiaoshuai Hao, Yifan Yang, Hui Zhang, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang, Jinke Li, Xiao He, Xiaoqiang Cheng, Bingyang Zhang, Lirong Zhao, Dianlei Ding, Fangsheng Liu, Yixiang Yan, Hongming Wang, Nanfei Ye, Lun Luo, Yubo Tian, Yiwei Zuo, Zhe Cao, Yi Ren, Yunfan Li, Wenjie Liu, Xun Wu, Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Cunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu, Ziyan Wang, Chiwei Li, Shilong Li, Chendong Yuan, Songyue Yang, Wentao Liu, Peng Chen, Bin Zhou, Yubo Wang, Chi Zhang, Jianhang Sun, Hai Chen, Xiao Yang, Lizhong Wang, Dongyi Fu, Yongchun Lin, Huitong Yang, Haoang Li, Yadan Luo, Xianjing Cheng, Yong XuComments: ICRA 2024; 31 pages, 24 figures, 5 tables; Code at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.
New submissions for Wednesday, 15 May 2024 (showing 293 of 293 entries )
- [294] arXiv:2207.14006 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Dynamics of qudit gates and effects of spectator modes on optimal control pulsesComments: 6 pages, 1 figure. Published in Physical Review AJournal-ref: Phys. Rev. A 109, 052404 (2024)Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
Qudit gates for high-dimensional quantum computing can be synthesized with high precision using numerical quantum optimal control techniques. Large circuits are broken down into modules and the tailored pulses for each module can be used as primitives for a qudit compiler. Application of the pulses of each module in the presence of extra modes may decrease their effectiveness due to crosstalk. In this paper, we address this problem by simulating qudit dynamics for circuit quantum electrodynamics (cQED) systems. As a test case, we take pulses for single-qudit SWAP gates optimized in isolation and then apply them in the presence of spectator modes each of which are in Fock states. We provide an experimentally relevant scaling formula that can be used as a bound on the fidelity decay. Our results show that frequency shift from spectator mode populations has to be $\lesssim 0.1\%$ of the qudit's nonlinearity in order for high-fidelity single-qudit gates to be useful in the presence of occupied spectator modes.
- [295] arXiv:2405.07994 (cross-list from eess.IV) [pdf, ps, other]
-
Title: BubbleID: A Deep Learning Framework for Bubble Interface Dynamics AnalysisComments: 16 pages, 4 figuresSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetime, and capturing dynamic events such as bubble departure. BubbleID is trained and tested on boiling images across diverse heater surfaces and operational settings. This paper also offers a comparative analysis of bubble interface dynamics prior to and post-critical heat flux (CHF) conditions.
- [296] arXiv:2405.08001 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Preconditioned Nonlinear Conjugate Gradient Method for Real-time Interior-point HyperelasticitySubjects: Optimization and Control (math.OC); Graphics (cs.GR)
The linear conjugate gradient method is widely used in physical simulation, particularly for solving large-scale linear systems derived from Newton's method. The nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization, which is extensively utilized in solving practical large-scale unconstrained optimization problems. However, it is rarely discussed in physical simulation due to the requirement of multiple vector-vector dot products. Fortunately, with the advancement of GPU-parallel acceleration techniques, it is no longer a bottleneck. In this paper, we propose a Jacobi preconditioned nonlinear conjugate gradient method for elastic deformation using interior-point methods. Our method is straightforward, GPU-parallelizable, and exhibits fast convergence and robustness against large time steps. The employment of the barrier function in interior-point methods necessitates continuous collision detection per iteration to obtain a penetration-free step size, which is computationally expensive and challenging to parallelize on GPUs. To address this issue, we introduce a line search strategy that deduces an appropriate step size in a single pass, eliminating the need for additional collision detection. Furthermore, we simplify and accelerate the computations of Jacobi preconditioning and Hessian-vector product for hyperelasticity and barrier function. Our method can accurately simulate objects comprising over 100,000 tetrahedra in complex self-collision scenarios at real-time speeds.
- [297] arXiv:2405.08003 (cross-list from math.FA) [pdf, ps, html, other]
-
Title: Continuous Krishna-Parthasarathy Entropic Uncertainty PrincipleComments: 7 pages, 0 FiguresJournal-ref: Special issue of Infinite Dimensional Analysis, Quantum Probability and Related Topics in honour of Prof. K. R. Parthasarathy, 18 March 2024Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Operator Algebras (math.OA); Quantum Algebra (math.QA)
In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived discrete quantum version of Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups.
- [298] arXiv:2405.08005 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Graphon Mean Field Games with A Representative Player: Analysis and Learning AlgorithmSubjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)
We propose a discrete-time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium with mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite player game on networks, which is challenging to analyze and solve due to curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.
- [299] arXiv:2405.08045 (cross-list from q-fin.MF) [pdf, ps, other]
-
Title: Comparative analysis of neural network architectures for short-term FOREX forecastingSubjects: Mathematical Finance (q-fin.MF); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The present document delineates the analysis, design, implementation, and benchmarking of various neural network architectures within a short-term frequency prediction system for the foreign exchange market (FOREX). Our aim is to simulate the judgment of the human expert (technical analyst) using a system that responds promptly to changes in market conditions, thus enabling the optimization of short-term trading strategies. We designed and implemented a series of LSTM neural network architectures which are taken as input the exchange rate values and generate the short-term market trend forecasting signal and an ANN custom architecture based on technical analysis indicator simulators We performed a comparative analysis of the results and came to useful conclusions regarding the suitability of each architecture and the cost in terms of time and computational power to implement them. The ANN custom architecture produces better prediction quality with higher sensitivity using fewer resources and spending less time than LSTM architectures. The ANN custom architecture appears to be ideal for use in low-power computing systems and for use cases that need fast decisions with the least possible computational cost.
- [300] arXiv:2405.08047 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Autonomous Sparse Mean-CVaR Portfolio OptimizationComments: ICML 2024Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Portfolio Management (q-fin.PM)
The $\ell_0$-constrained mean-CVaR model poses a significant challenge due to its NP-hard nature, typically tackled through combinatorial methods characterized by high computational demands. From a markedly different perspective, we propose an innovative autonomous sparse mean-CVaR portfolio model, capable of approximating the original $\ell_0$-constrained mean-CVaR model with arbitrary accuracy. The core idea is to convert the $\ell_0$ constraint into an indicator function and subsequently handle it through a tailed approximation. We then propose a proximal alternating linearized minimization algorithm, coupled with a nested fixed-point proximity algorithm (both convergent), to iteratively solve the model. Autonomy in sparsity refers to retaining a significant portion of assets within the selected asset pool during adjustments in pool size. Consequently, our framework offers a theoretically guaranteed approximation of the $\ell_0$-constrained mean-CVaR model, improving computational efficiency while providing a robust asset selection scheme.
- [301] arXiv:2405.08049 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Optimizing Synthetic Correlated Diffusion Imaging for Breast Cancer Tumour DelineationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Breast cancer is a significant cause of death from cancer in women globally, highlighting the need for improved diagnostic imaging to enhance patient outcomes. Accurate tumour identification is essential for diagnosis, treatment, and monitoring, emphasizing the importance of advanced imaging technologies that provide detailed views of tumour characteristics and disease. Synthetic correlated diffusion imaging (CDI$^s$) is a recent method that has shown promise for prostate cancer delineation compared to current MRI images. In this paper, we explore tuning the coefficients in the computation of CDI$^s$ for breast cancer tumour delineation by maximizing the area under the receiver operating characteristic curve (AUC) using a Nelder-Mead simplex optimization strategy. We show that the best AUC is achieved by the CDI$^s$ - Optimized modality, outperforming the best gold-standard modality by 0.0044. Notably, the optimized CDI$^s$ modality also achieves AUC values over 0.02 higher than the Unoptimized CDI$^s$ value, demonstrating the importance of optimizing the CDI$^s$ exponents for the specific cancer application.
- [302] arXiv:2405.08053 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Radio Resource Management and Path Planning in Intelligent Transportation Systems via Reinforcement Learning for Environmental SustainabilityComments: 6 pages, 5 figures, accepted and presented in International Conference on Innovation and Technological Advances for SustainabilitySubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Efficient and dynamic path planning has become an important topic for urban areas with larger density of connected vehicles (CV) which results in reduction of travel time and directly contributes to environmental sustainability through reducing energy consumption. CVs exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the vehicle-to-infrastructure (V2I) messages to the Base-station (BS) to improve situation awareness on urban roads. In this paper, we investigate radio resource management (RRM) in such a framework to minimize the age of information (AoI) so as to enhance path planning results. We use the fact that V2I messages with lower AoI value result in less error in estimating the road capacity and more accurate path planning. Through simulations, we compare road travel times and volume over capacity (V/C) against different levels of AoI and demonstrate the promising performance of the proposed framework.
- [303] arXiv:2405.08068 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: An Abstract Model and Efficient Routing for Logical Entangling Gates on Zoned Neutral Atom ArchitecturesComments: 12 pages, 19 figuresSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
Recent experimental achievements have demonstrated the potential of neutral atom architectures for fault-tolerant quantum computing. These architectures feature the dynamic rearrangement of atoms during computation, enabling nearly arbitrary two-dimensional rearrangements. Additionally, they employ a zoned layout with dedicated regions for entangling, storage, and readout. This architecture requires design automation software that efficiently compiles quantum circuits to this hardware and takes care that atoms are in the right place at the right time. In this paper, we initiate this line of work by providing, (1) an abstract model of the novel architecture and, (2) an efficient solution to the routing problem of entangling gates. By this, we aim to maximize the parallelism of entangling gates and minimize the overhead caused by the routing of atoms between zones. In addition to that, we keep the realm of fault-tolerant quantum computing in mind and consider logical qubit arrays, each of which encodes one logical qubit. We implemented the proposed idea as a tool called NALAC and demonstrated its effectiveness and efficiency by showing that it can significantly reduce the routing overhead of logical entangling gates compared to the naive approach. As part of the Munich Quantum Toolkit (MQT), NALAC is publicly available as open-source at this https URL.
- [304] arXiv:2405.08076 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Show Me the Way: Real-Time Tracking of Wireless Mobile Users with UWB-Enabled RISComments: 6 pages, 12 figures, submitted to 19th International Symposium on Wireless Communication Systems (ISWCS 2024)Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
The integration of Reconfigurable Intelligent Surfaces (RIS) in 6G wireless networks offers unprecedented control over communication environments. However, identifying optimal configurations within practical constraints remains a significant challenge. This becomes especially pronounced, when the user is mobile and the configurations need to be deployed in real time. Leveraging Ultra-Wideband (UWB) as localization technique, we capture and analyze real-time movements of a user within the RIS-enabled indoor environment. Given this information about the system's geometry, a model-based optimization is utilized, which enables real-time beam steering of the RIS towards the user. However, practical limitations of UWB modules lead to fluctuating UWB estimates, causing the RIS beam to occasionally miss the tracked user. The methodologies proposed in this work aim to increase the compatibility between these two systems. To this end, we provide two key solutions: beam splitting for obtaining more robust RIS configurations and UWB estimation correction for reducing the variations in the UWB data. Through comprehensive theoretical and experimental evaluations in both stationary and mobile scenarios, the effectiveness of the proposed techniques is demonstrated. When combined, the proposed methods improve worst-case tracking performance by a significant 17.5dB compared to the conventional approach.
- [305] arXiv:2405.08089 (cross-list from q-fin.ST) [pdf, ps, html, other]
-
Title: Comparative Study of Bitcoin Price PredictionSubjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG)
Prediction of stock prices has been a crucial and challenging task, especially in the case of highly volatile digital currencies such as Bitcoin. This research examineS the potential of using neural network models, namely LSTMs and GRUs, to forecast Bitcoin's price movements. We employ five-fold cross-validation to enhance generalization and utilize L2 regularization to reduce overfitting and noise. Our study demonstrates that the GRUs models offer better accuracy than LSTMs model for predicting Bitcoin's price. Specifically, the GRU model has an MSE of 4.67, while the LSTM model has an MSE of 6.25 when compared to the actual prices in the test set data. This finding indicates that GRU models are better equipped to process sequential data with long-term dependencies, a characteristic of financial time series data such as Bitcoin prices. In summary, our results provide valuable insights into the potential of neural network models for accurate Bitcoin price prediction and emphasize the importance of employing appropriate regularization techniques to enhance model performance.
- [306] arXiv:2405.08096 (cross-list from eess.AS) [pdf, ps, html, other]
-
Title: Semantic MIMO Systems for Speech-to-Text TransmissionSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate the transmission with high semantic fidelity to identify the critical semantic information and guarantee it is recovered accurately. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information.
- [307] arXiv:2405.08100 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Graph Neural Networks for Parameterized Quantum Circuits Expressibility EstimationSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Parameterized quantum circuits (PQCs) are fundamental to quantum machine learning (QML), quantum optimization, and variational quantum algorithms (VQAs). The expressibility of PQCs is a measure that determines their capability to harness the full potential of the quantum state space. It is thus a crucial guidepost to know when selecting a particular PQC ansatz. However, the existing technique for expressibility computation through statistical estimation requires a large number of samples, which poses significant challenges due to time and computational resource constraints. This paper introduces a novel approach for expressibility estimation of PQCs using Graph Neural Networks (GNNs). We demonstrate the predictive power of our GNN model with a dataset consisting of 25,000 samples from the noiseless IBM QASM Simulator and 12,000 samples from three distinct noisy quantum backends. The model accurately estimates expressibility, with root mean square errors (RMSE) of 0.05 and 0.06 for the noiseless and noisy backends, respectively. We compare our model's predictions with reference circuits [Sim and others, QuTe'2019] and IBM Qiskit's hardware-efficient ansatz sets to further evaluate our model's performance. Our experimental evaluation in noiseless and noisy scenarios reveals a close alignment with ground truth expressibility values, highlighting the model's efficacy. Moreover, our model exhibits promising extrapolation capabilities, predicting expressibility values with low RMSE for out-of-range qubit circuits trained solely on only up to 5-qubit circuit sets. This work thus provides a reliable means of efficiently evaluating the expressibility of diverse PQCs on noiseless simulators and hardware.
- [308] arXiv:2405.08101 (cross-list from q-fin.CP) [pdf, ps, other]
-
Title: Can machine learning unlock new insights into high-frequency trading?Comments: 56 pages, 4 figures, 8 tablesSubjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG)
We design and train machine learning models to capture the nonlinear interactions between financial market dynamics and high-frequency trading (HFT) activity. In doing so, we introduce new metrics to identify liquidity-demanding and -supplying HFT strategies. Both types of HFT strategies increase activity in response to information events and decrease it when trading speed is restricted, with liquidity-supplying strategies demonstrating greater responsiveness. Liquidity-demanding HFT is positively linked with latency arbitrage opportunities, whereas liquidity-supplying HFT is negatively related, aligning with theoretical expectations. Our metrics have implications for understanding the information production process in financial markets.
- [309] arXiv:2405.08137 (cross-list from physics.comp-ph) [pdf, ps, html, other]
-
Title: LATTE: an atomic environment descriptor based on Cartesian tensor contractionsComments: 7 pages, 1 figureSubjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
We propose a new descriptor for local atomic environments, to be used in combination with machine learning models for the construction of interatomic potentials. The Local Atomic Tensors Trainable Expansion (LATTE) allows for the efficient construction of a variable number of many-body terms with learnable parameters, resulting in a descriptor that is efficient, expressive, and can be scaled to suit different accuracy and computational cost requirements. We compare this new descriptor to existing ones on several systems, showing it to be competitive with very fast potentials at one end of the spectrum, and extensible to an accuracy close to the state of the art.
- [310] arXiv:2405.08169 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Rethinking Histology Slide Digitization Workflows for Low-Resource SettingsComments: MICCAI 2024 Early Accept. First four authors contributed equallySubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Histology slide digitization is becoming essential for telepathology (remote consultation), knowledge sharing (education), and using the state-of-the-art artificial intelligence algorithms (augmented/automated end-to-end clinical workflows). However, the cumulative costs of digital multi-slide high-speed brightfield scanners, cloud/on-premises storage, and personnel (IT and technicians) make the current slide digitization workflows out-of-reach for limited-resource settings, further widening the health equity gap; even single-slide manual scanning commercial solutions are costly due to hardware requirements (high-resolution cameras, high-spec PC/workstation, and support for only high-end microscopes). In this work, we present a new cloud slide digitization workflow for creating scanner-quality whole-slide images (WSIs) from uploaded low-quality videos, acquired from cheap and inexpensive microscopes with built-in cameras. Specifically, we present a pipeline to create stitched WSIs while automatically deblurring out-of-focus regions, upsampling input 10X images to 40X resolution, and reducing brightness/contrast and light-source illumination variations. We demonstrate the WSI creation efficacy from our workflow on World Health Organization-declared neglected tropical disease, Cutaneous Leishmaniasis (prevalent only in the poorest regions of the world and only diagnosed by sub-specialist dermatopathologists, rare in poor countries), as well as other common pathologies on core biopsies of breast, liver, duodenum, stomach and lymph node. The code and pretrained models will be accessible via our GitHub (this https URL), and the cloud platform will be available at this https URL for uploading microscope videos and downloading/viewing WSIs with shareable links (no sign-in required) for telepathology and knowledge sharing.
- [311] arXiv:2405.08179 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Do Bayesian imaging methods report trustworthy probabilities?Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)
Bayesian statistics is a cornerstone of imaging sciences, underpinning many and varied approaches from Markov random fields to score-based denoising diffusion models. In addition to powerful image estimation methods, the Bayesian paradigm also provides a framework for uncertainty quantification and for using image data as quantitative evidence. These probabilistic capabilities are important for the rigorous interpretation of experimental results and for robust interfacing of quantitative imaging pipelines with scientific and decision-making processes. However, are the probabilities delivered by existing Bayesian imaging methods meaningful under replication of an experiment, or are they only meaningful as subjective measures of belief? This paper presents a Monte Carlo method to explore this question. We then leverage the proposed Monte Carlo method and run a large experiment requiring 1,000 GPU-hours to probe the accuracy of five canonical Bayesian imaging methods that are representative of some of the main Bayesian imaging strategies from the past decades (a score-based denoising diffusion technique, a plug-and-play Langevin algorithm utilising a Lipschitz-regularised DnCNN denoiser, a Bayesian method with a dictionary-based prior trained subject to a log-concavity constraint, an empirical Bayesian method with a total-variation prior, and a hierarchical Bayesian Gibbs sampler based on a Gaussian Markov random field model). We find that, a few cases, the probabilities reported by modern Bayesian imaging techniques are in broad agreement with long-term averages as observed over a large number of replication of an experiment, but existing Bayesian imaging methods are generally not able to deliver reliable uncertainty quantification results.
- [312] arXiv:2405.08185 (cross-list from physics.flu-dyn) [pdf, ps, html, other]
-
Title: Probabilistic Flux LimitersComments: The paper spans 10 pages and includes 5 figuresSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
The stable numerical integration of shocks in compressible flow simulations relies on the reduction or elimination of Gibbs phenomena (unstable, spurious oscillations). A popular method to virtually eliminate Gibbs oscillations caused by numerical discretization in under-resolved simulations is to use a flux limiter. A wide range of flux limiters has been studied in the literature, with recent interest in their optimization via machine learning methods trained on high-resolution datasets. The common use of flux limiters in numerical codes as plug-and-play blackbox components makes them key targets for design improvement. Moreover, while aleatoric (inherent randomness) and epistemic (lack of knowledge) uncertainty is commonplace in fluid dynamical systems, these effects are generally ignored in the design of flux limiters. Even for deterministic dynamical models, numerical uncertainty is introduced via coarse-graining required by insufficient computational power to solve all scales of motion. Here, we introduce a conceptually distinct type of flux limiter that is designed to handle the effects of randomness in the model and uncertainty in model parameters. This new, {\it probabilistic flux limiter}, learned with high-resolution data, consists of a set of flux limiting functions with associated probabilities, which define the frequencies of selection for their use. Using the example of Burgers' equation, we show that a machine learned, probabilistic flux limiter may be used in a shock capturing code to more accurately capture shock profiles. In particular, we show that our probabilistic flux limiter outperforms standard limiters, and can be successively improved upon (up to a point) by expanding the set of probabilistically chosen flux limiting functions.
- [313] arXiv:2405.08190 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Barren plateaus induced by the dimension of quditsSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Variational Quantum Algorithms (VQAs) have emerged as pivotal strategies for attaining quantum advantages in diverse scientific and technological domains, notably within Quantum Neural Networks. However, despite their potential, VQAs encounter significant obstacles, chief among them being the gradient vanishing problem, commonly referred to as barren plateaus. In this study, we unveil a direct correlation between the dimension of qudits and the occurrence of barren plateaus, a connection previously overlooked. Through meticulous analysis, we demonstrate that existing literature implicitly suggests the intrinsic influence of qudit dimensionality on barren plateaus. To instantiate these findings, we present numerical results that exemplify the impact of qudit dimensionality on barren plateaus. Additionally, despite the proposition of various error mitigation techniques, our results call for further scrutiny about their efficacy in the context of VQAs with qudits.
- [314] arXiv:2405.08201 (cross-list from math.PR) [pdf, ps, other]
-
Title: Numerical approximation of the stochastic heat equation with a distributional reaction termSubjects: Probability (math.PR); Numerical Analysis (math.NA)
We study the numerical approximation of the stochastic heat equation with a distributional reaction term. Under a condition on the Besov regularity of the reaction term, it was proven recently that a strong solution exists and is unique in the pathwise sense, in a class of Hölder continuous processes. For a suitable choice of sequence $(b^k)_{k\in \mathbb{N}}$ approximating $b$, we prove that the error between the solution $u$ of the SPDE with reaction term $b$ and its tamed Euler finite-difference scheme with mollified drift $b^k$, converges to $0$ in $L^m(\Omega)$ with a rate that depends on the Besov regularity of $b$. In particular, one can consider two interesting cases: first, even when $b$ is only a (finite) measure, a rate of convergence is obtained. On the other hand, when $b$ is a bounded measurable function, the (almost) optimal rate of convergence $(\frac{1}{2}-\varepsilon)$-in space and $(\frac{1}{4}-\varepsilon)$-in time is achieved. Stochastic sewing techniques are used in the proofs, in particular to deduce new regularising properties of the discrete Ornstein-Uhlenbeck process.
- [315] arXiv:2405.08203 (cross-list from physics.soc-ph) [pdf, ps, html, other]
-
Title: Community detection in bipartite signed networks is highly dependent on parameter choiceSubjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI); Methodology (stat.ME)
Decision-making processes often involve voting. Human interactions with exogenous entities such as legislations or products can be effectively modeled as two-mode (bipartite) signed networks-where people can either vote positively, negatively, or abstain from voting on the entities. Detecting communities in such networks could help us understand underlying properties: for example ideological camps or consumer preferences. While community detection is an established practice separately for bipartite and signed networks, it remains largely unexplored in the case of bipartite signed networks. In this paper, we systematically evaluate the efficacy of community detection methods on bipartite signed networks using a synthetic benchmark and real-world datasets. Our findings reveal that when no communities are present in the data, these methods often recover spurious communities. When communities are present, the algorithms exhibit promising performance, although their performance is highly susceptible to parameter choice. This indicates that researchers using community detection methods in the context of bipartite signed networks should not take the communities found at face value: it is essential to assess the robustness of parameter choices or perform domain-specific external validation.
- [316] arXiv:2405.08225 (cross-list from math.ST) [pdf, ps, other]
-
Title: Linear Operator Approximate Message Passing (OpAMP)Comments: 31 pages, 5 figuresSubjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR)
This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well-approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.
- [317] arXiv:2405.08235 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Additive-Effect Assisted LearningSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
It is quite popular nowadays for researchers and data analysts holding different datasets to seek assistance from each other to enhance their modeling performance. We consider a scenario where different learners hold datasets with potentially distinct variables, and their observations can be aligned by a nonprivate identifier. Their collaboration faces the following difficulties: First, learners may need to keep data values or even variable names undisclosed due to, e.g., commercial interest or privacy regulations; second, there are restrictions on the number of transmission rounds between them due to e.g., communication costs. To address these challenges, we develop a two-stage assisted learning architecture for an agent, Alice, to seek assistance from another agent, Bob. In the first stage, we propose a privacy-aware hypothesis testing-based screening method for Alice to decide on the usefulness of the data from Bob, in a way that only requires Bob to transmit sketchy data. Once Alice recognizes Bob's usefulness, Alice and Bob move to the second stage, where they jointly apply a synergistic iterative model training procedure. With limited transmissions of summary statistics, we show that Alice can achieve the oracle performance as if the training were from centralized data, both theoretically and numerically.
- [318] arXiv:2405.08247 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Automated classification of multi-parametric body MRI seriesSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)
Multi-parametric MRI (mpMRI) studies are widely available in clinical practice for the diagnosis of various diseases. As the volume of mpMRI exams increases yearly, there are concomitant inaccuracies that exist within the DICOM header fields of these exams. This precludes the use of the header information for the arrangement of the different series as part of the radiologist's hanging protocol, and clinician oversight is needed for correction. In this pilot work, we propose an automated framework to classify the type of 8 different series in mpMRI studies. We used 1,363 studies acquired by three Siemens scanners to train a DenseNet-121 model with 5-fold cross-validation. Then, we evaluated the performance of the DenseNet-121 ensemble on a held-out test set of 313 mpMRI studies. Our method achieved an average precision of 96.6%, sensitivity of 96.6%, specificity of 99.6%, and F1 score of 96.6% for the MRI series classification task. To the best of our knowledge, we are the first to develop a method to classify the series type in mpMRI studies acquired at the level of the chest, abdomen, and pelvis. Our method has the capability for robust automation of hanging protocols in modern radiology practice.
- [319] arXiv:2405.08253 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Thompson Sampling for Infinite-Horizon Discounted Decision ProcessesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
We model a Markov decision process, parametrized by an unknown parameter, and study the asymptotic behavior of a sampling-based algorithm, called Thompson sampling. The standard definition of regret is not always suitable to evaluate a policy, especially when the underlying chain structure is general. We show that the standard (expected) regret can grow (super-)linearly and fails to capture the notion of learning in realistic settings with non-trivial state evolution. By decomposing the standard (expected) regret, we develop a new metric, called the expected residual regret, which forgets the immutable consequences of past actions. Instead, it measures regret against the optimal reward moving forward from the current period. We show that the expected residual regret of the Thompson sampling algorithm is upper bounded by a term which converges exponentially fast to 0. We present conditions under which the posterior sampling error of Thompson sampling converges to 0 almost surely. We then introduce the probabilistic version of the expected residual regret and present conditions under which it converges to 0 almost surely. Thus, we provide a viable concept of learning for sampling algorithms which will serve useful in broader settings than had been considered previously.
- [320] arXiv:2405.08275 (cross-list from math.OC) [pdf, ps, other]
-
Title: Power of $\ell_1$-Norm Regularized Kaczmarz Algorithms for High-Order Tensor RecoveryComments: arXiv admin note: text overlap with arXiv:2311.00783Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Tensors serve as a crucial tool in the representation and analysis of complex, multi-dimensional data. As data volumes continue to expand, there is an increasing demand for developing optimization algorithms that can directly operate on tensors to deliver fast and effective computations. Many problems in real-world applications can be formulated as the task of recovering high-order tensors characterized by sparse and/or low-rank structures. In this work, we propose novel Kaczmarz algorithms with a power of the $\ell_1$-norm regularization for reconstructing high-order tensors by exploiting sparsity and/or low-rankness of tensor data. In addition, we develop both a block and an accelerated variant, along with a thorough convergence analysis of these algorithms. A variety of numerical experiments on both synthetic and real-world datasets demonstrate the effectiveness and significant potential of the proposed methods in image and video processing tasks, such as image sequence destriping and video deconvolution.
- [321] arXiv:2405.08276 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Scalable Subsampling Inference for Deep Neural NetworksSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
Deep neural networks (DNN) has received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest results on the approximation ability of DNN. More importantly, however, a non-random subsampling technique--scalable subsampling--is applied to construct a `subagged' DNN estimator. Under regularity conditions, it is shown that the subagged DNN estimator is computationally efficient without sacrificing accuracy for either estimation or prediction tasks. Beyond point estimation/prediction, we propose different approaches to build confidence and prediction intervals based on the subagged DNN estimator. In addition to being asymptotically valid, the proposed confidence/prediction intervals appear to work well in finite samples. All in all, the scalable subsampling DNN estimator offers the complete package in terms of statistical inference, i.e., (a) computational efficiency; (b) point estimation/prediction accuracy; and (c) allowing for the construction of practically useful confidence and prediction intervals.
- [322] arXiv:2405.08282 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Automatic Segmentation of the Kidneys and Cystic Renal Lesions on Non-Contrast CT Using a Convolutional Neural NetworkLucas Aronson (1), Ruben Ngnitewe Massaa (1), Syed Jamal Safdar Gardezi (1), Andrew L. Wentland (1,2,3) ((1) Department of Radiology, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (2) Department of Medical Physics, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (3) Department of Biomedical Engineering, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Objective: Automated segmentation tools are useful for calculating kidney volumes rapidly and accurately. Furthermore, these tools have the power to facilitate large-scale image-based artificial intelligence projects by generating input labels, such as for image registration algorithms. Prior automated segmentation models have largely ignored non-contrast computed tomography (CT) imaging. This work aims to implement and train a deep learning (DL) model to segment the kidneys and cystic renal lesions (CRLs) from non-contrast CT scans.
Methods: Manual segmentation of the kidneys and CRLs was performed on 150 non-contrast abdominal CT scans. The data were divided into an 80/20 train/test split and a deep learning (DL) model was trained to segment the kidneys and CRLs. Various scoring metrics were used to assess model performance, including the Dice Similarity Coefficient (DSC), Jaccard Index (JI), and absolute and percent error kidney volume and lesion volume. Bland-Altman (B-A) analysis was performed to compare manual versus DL-based kidney volumes.
Results: The DL model achieved a median kidney DSC of 0.934, median CRL DSC of 0.711, and total median study DSC of 0.823. Average volume errors were 0.9% for renal parenchyma, 37.0% for CRLs, and 2.2% overall. B-A analysis demonstrated that DL-based volumes tended to be greater than manual volumes, with a mean bias of +3.0 ml (+/- 2 SD of +/- 50.2 ml).
Conclusion: A deep learning model trained to segment kidneys and cystic renal lesions on non-contrast CT examinations was able to provide highly accurate segmentations, with a median kidney Dice Similarity Coefficient of 0.934.
Keywords: deep learning; kidney segmentation; artificial intelligence; convolutional neural networks. - [323] arXiv:2405.08284 (cross-list from econ.EM) [pdf, ps, other]
-
Title: Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH ModelsComments: 7 pages, 4 figures, 2 tables, conference paperSubjects: Econometrics (econ.EM); Machine Learning (cs.LG); Applications (stat.AP)
Forecasting stock prices remains a considerable challenge in financial markets, bearing significant implications for investors, traders, and financial institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key player driving innovation across various sectors. Given its prominence, we chose NVIDIA as the subject of our study.
- [324] arXiv:2405.08306 (cross-list from math.OC) [pdf, ps, other]
-
Title: Flight Path Optimization with Optimal Control MethodSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control(CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we will identify a precise objective in terms of optimization and implement an optimization program to solve it under the circumstances of simulated real flight situation. Finally, the optimization result is validated and discussed by different scenarios.
- [325] arXiv:2405.08307 (cross-list from stat.ME) [pdf, ps, other]
-
Title: Sequential Maximal Updated Density Parameter Estimation for Dynamical Systems with Parameter DriftComments: 29 pages, 9 Figures, Code available at this https URLSubjects: Methodology (stat.ME); Numerical Analysis (math.NA); Other Statistics (stat.OT)
We present a novel method for generating sequential parameter estimates and quantifying epistemic uncertainty in dynamical systems within a data-consistent (DC) framework. The DC framework differs from traditional Bayesian approaches due to the incorporation of the push-forward of an initial density, which performs selective regularization in parameter directions not informed by the data in the resulting updated density. This extends a previous study that included the linear Gaussian theory within the DC framework and introduced the maximal updated density (MUD) estimate as an alternative to both least squares and maximum a posterior (MAP) estimates. In this work, we introduce algorithms for operational settings of MUD estimation in real or near-real time where spatio-temporal datasets arrive in packets to provide updated estimates of parameters and identify potential parameter drift. Computational diagnostics within the DC framework prove critical for evaluating (1) the quality of the DC update and MUD estimate and (2) the detection of parameter value drift. The algorithms are applied to estimate (1) wind drag parameters in a high-fidelity storm surge model, (2) thermal diffusivity field for a heat conductivity problem, and (3) changing infection and incubation rates of an epidemiological model.
- [326] arXiv:2405.08417 (cross-list from eess.AS) [pdf, ps, other]
-
Title: Simple and Efficient Quantization Techniques for Neural Speech CodingSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder has to be learned that allows for efficient transmission of the input audio signal. This discrete representation is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage thereby simplifying the training of neural audio codecs. Furthermore, we propose a new causal network architecture for neural speech coding that shows good performance at very low computational complexity.
- [327] arXiv:2405.08421 (cross-list from cond-mat.dis-nn) [pdf, ps, other]
-
Title: Faster algorithms for the alignment of sparse correlated Erd\"os-R\'enyi random graphsComments: 31 pagesSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)
The correlated Erdös-Rényi random graph ensemble is a probability law on pairs of graphs with $n$ vertices, parametrized by their average degree $\lambda$ and their correlation coefficient $s$. It can be used as a benchmark for the graph alignment problem, in which the labels of the vertices of one of the graphs are reshuffled by an unknown permutation; the goal is to infer this permutation and thus properly match the pairs of vertices in both graphs. A series of recent works has unveiled the role of Otter's constant $\alpha$ (that controls the exponential rate of growth of the number of unlabeled rooted trees as a function of their sizes) in this problem: for $s>\sqrt{\alpha}$ and $\lambda$ large enough it is possible to recover in a time polynomial in $n$ a positive fraction of the hidden permutation. The exponent of this polynomial growth is however quite large and depends on the other parameters, which limits the range of applications of the algorithm. In this work we present a family of faster algorithms for this task, show through numerical simulations that their accuracy is only slightly reduced with respect to the original one, and conjecture that they undergo, in the large $\lambda$ limit, phase transitions at modified Otter's thresholds $\sqrt{\widehat{\alpha}}>\sqrt{\alpha}$, with $\widehat{\alpha}$ related to the enumeration of a restricted family of trees.
- [328] arXiv:2405.08423 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-ResolutionYihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming LiuSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and lightweighting the constituent modules. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces number of parameters by replacing 1x1 pointwise convolution in SCAM with weight-shared 3x3 depthwise convolution. Besides, we propose to incorporate trainable edge detection operator into NAFRSSR to further improve the model performance. Four variants of NAFRSSR with different sizes, namely, NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S) and NAFRSSR-Base (NAFRSSR-B) are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at this https URL.
- [329] arXiv:2405.08431 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Similarity Metrics for MR Image-To-Image TranslationMelanie Dohmen (1), Mark Klemens (1), Ivo Baltruschat (1), Tuan Truong (1), Matthias Lenga (1) ((1) Bayer AG, Berlin, Germany)Comments: 29 pages, 6 figures, appendix with 5 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Image-to-image translation can create large impact in medical imaging, i.e. if images of a patient can be translated to another modality, type or sequence for better diagnosis. However, these methods must be validated by human reader studies, which are costly and restricted to small samples. Automatic evaluation of large samples to pre-evaluate and continuously improve methods before human validation is needed. In this study, we give an overview of reference and non-reference metrics for image synthesis assessment and investigate the ability of nine metrics, that need a reference (SSIM, MS-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, NMI and PCC) and three non-reference metrics (BLUR, MSN, MNG) to detect 11 kinds of distortions in MR images from the BraSyn dataset. In addition we test a downstream segmentation metric and the effect of three normalization methods (Minmax, cMinMax and Zscore). Although PSNR and SSIM are frequently used to evaluate generative models for image-to-image-translation tasks in the medical domain, they show very specific shortcomings. SSIM ignores blurring but is very sensitive to intensity shifts in unnormalized MR images. PSNR is even more sensitive to different normalization methods and hardly measures the degree of distortions. Further metrics, such as LPIPS, NMI and DICE can be very useful to evaluate other similarity aspects. If the images to be compared are misaligned, most metrics are flawed. By carefully selecting and reasonably combining image similarity metrics, the training and selection of generative models for MR image synthesis can be improved. Many aspects of their output can be validated before final and costly evaluation by trained radiologists is conducted.
- [330] arXiv:2405.08467 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Equilibrium Propagation: the Quantum and the Thermal CasesSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
Equilibrium propagation is a recently introduced method to use and train artificial neural networks in which the network is at the minimum (more generally extremum) of an energy functional. Equilibrium propagation has shown good performance on a number of benchmark tasks. Here we extend equilibrium propagation in two directions. First we show that there is a natural quantum generalization of equilibrium propagation in which a quantum neural network is taken to be in the ground state (more generally any eigenstate) of the network Hamiltonian, with a similar training mechanism that exploits the fact that the mean energy is extremal on eigenstates. Second we extend the analysis of equilibrium propagation at finite temperature, showing that thermal fluctuations allow one to naturally train the network without having to clamp the output layer during training. We also study the low temperature limit of equilibrium propagation.
- [331] arXiv:2405.08484 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Universal replication of chaotic characteristics by classical and quantum machine learningComments: 8 pages, 4 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)
Replicating chaotic characteristics of non-linear dynamics by machine learning (ML) has recently drawn wide attentions. In this work, we propose that a ML model, trained to predict the state one-step-ahead from several latest historic states, can accurately replicate the bifurcation diagram and the Lyapunov exponents of discrete dynamic systems. The characteristics for different values of the hyper-parameters are captured universally by a single ML model, while the previous works considered training the ML model independently by fixing the hyper-parameters to be specific values. Our benchmarks on the one- and two-dimensional Logistic maps show that variational quantum circuit can reproduce the long-term characteristics with higher accuracy than the long short-term memory (a well-recognized classical ML model). Our work reveals an essential difference between the ML for the chaotic characteristics and that for standard tasks, from the perspective of the relation between performance and model complexity. Our results suggest that quantum circuit model exhibits potential advantages on mitigating over-fitting, achieving higher accuracy and stability.
- [332] arXiv:2405.08556 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Shape-aware synthesis of pathological lung CT scans using CycleGAN for enhanced semi-supervised lung segmentationComments: 14 pages, 7 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper addresses the problem of pathological lung segmentation, a significant challenge in medical image analysis, particularly pronounced in cases of peripheral opacities (severe fibrosis and consolidation) because of the textural similarity between lung tissue and surrounding areas. To overcome these challenges, this paper emphasizes the use of CycleGAN for unpaired image-to-image translation, in order to provide an augmentation method able to generate fake pathological images matching an existing ground truth. Although previous studies have employed CycleGAN, they often neglect the challenge of shape deformation, which is crucial for accurate medical image segmentation. Our work introduces an innovative strategy that incorporates additional loss functions. Specifically, it proposes an L1 loss based on the lung surrounding which shape is constrained to remain unchanged at the transition from the healthy to pathological domains. The lung surrounding is derived based on ground truth lung masks available in the healthy domain. Furthermore, preprocessing steps, such as cropping based on ribs/vertebra locations, are applied to refine the input for the CycleGAN, ensuring that the network focus on the lung region. This is essential to avoid extraneous biases, such as the zoom effect bias, which can divert attention from the main task. The method is applied to enhance in semi-supervised manner the lung segmentation process by employing a U-Net model trained with on-the-fly data augmentation incorporating synthetic pathological tissues generated by the CycleGAN model. Preliminary results from this research demonstrate significant qualitative and quantitative improvements, setting a new benchmark in the field of pathological lung segmentation. Our code is available at this https URL
- [333] arXiv:2405.08590 (cross-list from math.OC) [pdf, ps, other]
-
Title: Accelerated Alternating Direction Method of Multipliers Gradient Tracking for Distributed OptimizationComments: This paper has been accepted for publication at IEEE Control Systems LettersSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper presents a novel accelerated distributed algorithm for unconstrained consensus optimization over static undirected networks. The proposed algorithm combines the benefits of acceleration from momentum, the robustness of the alternating direction method of multipliers, and the computational efficiency of gradient tracking to surpass existing state-of-the-art methods in convergence speed, while preserving their computational and communication cost. First, we prove that, by applying momentum on the average dynamic consensus protocol over the estimates and gradient, we can study the algorithm as an interconnection of two singularly perturbed systems: the outer system connects the consensus variables and the optimization variables, and the inner system connects the estimates of the optimum and the auxiliary optimization variables. Next, we prove that, by adding momentum to the auxiliary dynamics, our algorithm always achieves faster convergence than the achievable linear convergence rate for the non-accelerated alternating direction method of multipliers gradient tracking algorithm case. Through simulations, we numerically show that our accelerated algorithm surpasses the existing accelerated and non-accelerated distributed consensus first-order optimization protocols in convergence speed.
- [334] arXiv:2405.08602 (cross-list from q-fin.RM) [pdf, ps, other]
-
Title: Optimizing Deep Reinforcement Learning for American Put Option HedgingSubjects: Risk Management (q-fin.RM); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
This paper contributes to the existing literature on hedging American options with Deep Reinforcement Learning (DRL). The study first investigates hyperparameter impact on hedging performance, considering learning rates, training episodes, neural network architectures, training steps, and transaction cost penalty functions. Results highlight the importance of avoiding certain combinations, such as high learning rates with a high number of training episodes or low learning rates with few training episodes and emphasize the significance of utilizing moderate values for optimal outcomes. Additionally, the paper warns against excessive training steps to prevent instability and demonstrates the superiority of a quadratic transaction cost penalty function over a linear version. This study then expands upon the work of Pickard et al. (2024), who utilize a Chebyshev interpolation option pricing method to train DRL agents with market calibrated stochastic volatility models. While the results of Pickard et al. (2024) showed that these DRL agents achieve satisfactory performance on empirical asset paths, this study introduces a novel approach where new agents at weekly intervals to newly calibrated stochastic volatility models. Results show DRL agents re-trained using weekly market data surpass the performance of those trained solely on the sale date. Furthermore, the paper demonstrates that both single-train and weekly-train DRL agents outperform the Black-Scholes Delta method at transaction costs of 1% and 3%. This practical relevance suggests that practitioners can leverage readily available market data to train DRL agents for effective hedging of options in their portfolios.
- [335] arXiv:2405.08608 (cross-list from math.CO) [pdf, ps, other]
-
Title: On the Paley RIP and Paley graph extractorComments: 10 pages, comments are welcomeSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Number Theory (math.NT)
Constructing explicit RIP matrices is an open problem in compressed sensing theory. In particular, it is quite challenging to construct explicit RIP matrices that break the square-root bottleneck. On the other hand, providing explicit $2$-source extractors is a fundamental problem in theoretical computer science, cryptography and combinatorics. Nowadays, there are only a few known constructions for explicit $2$-source extractors (with negligible errors) that break the half barrier for min-entropy.
In this paper, we establish a new connection between RIP matrices breaking the square-root bottleneck and $2$-source extractors breaking the half barrier for min-entropy. Here we focus on an RIP matrix (called the Paley ETF) and a $2$-source extractor (called the Paley graph extractor), where both are defined from quadratic residues over the finite field of odd prime order $p\equiv 1 \pmod{4}$. As a main result, we prove that if the Paley ETF breaks the square-root bottleneck, then the Paley graph extractor breaks the half barrier for min-entropy as well. Since it is widely believed that the Paley ETF breaks the square-root bottleneck, our result accordingly provides a new affirmative intuition on the conjecture for the Paley graph extractor by Benny Chor and Oded Goldreich. - [336] arXiv:2405.08613 (cross-list from math.DS) [pdf, ps, other]
-
Title: GN-SINDy: Greedy Sampling Neural Network in Sparse Identification of Nonlinear Partial Differential EquationsSubjects: Dynamical Systems (math.DS); Machine Learning (cs.LG)
The sparse identification of nonlinear dynamical systems (SINDy) is a data-driven technique employed for uncovering and representing the fundamental dynamics of intricate systems based on observational data. However, a primary obstacle in the discovery of models for nonlinear partial differential equations (PDEs) lies in addressing the challenges posed by the curse of dimensionality and large datasets. Consequently, the strategic selection of the most informative samples within a given dataset plays a crucial role in reducing computational costs and enhancing the effectiveness of SINDy-based algorithms. To this aim, we employ a greedy sampling approach to the snapshot matrix of a PDE to obtain its valuable samples, which are suitable to train a deep neural network (DNN) in a SINDy framework. SINDy based algorithms often consist of a data collection unit, constructing a dictionary of basis functions, computing the time derivative, and solving a sparse identification problem which ends to regularised least squares minimization. In this paper, we extend the results of a SINDy based deep learning model discovery (DeePyMoD) approach by integrating greedy sampling technique in its data collection unit and new sparsity promoting algorithms in the least squares minimization unit. In this regard we introduce the greedy sampling neural network in sparse identification of nonlinear partial differential equations (GN-SINDy) which blends a greedy sampling method, the DNN, and the SINDy algorithm. In the implementation phase, to show the effectiveness of GN-SINDy, we compare its results with DeePyMoD by using a Python package that is prepared for this purpose on numerous PDE discovery
- [337] arXiv:2405.08621 (cross-list from eess.IV) [pdf, ps, other]
-
Title: RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video ContentComments: 8pages, 2figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.
- [338] arXiv:2405.08631 (cross-list from stat.CO) [pdf, ps, other]
-
Title: A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate DescentSubjects: Computation (stat.CO); Machine Learning (cs.LG); Mathematical Software (cs.MS); Software Engineering (cs.SE)
We develop fast and scalable algorithms based on block-coordinate descent to solve the group lasso and the group elastic net for generalized linear models along a regularization path. Special attention is given when the loss is the usual least squares loss (Gaussian loss). We show that each block-coordinate update can be solved efficiently using Newton's method and further improved using an adaptive bisection method, solving these updates with a quadratic convergence rate. Our benchmarks show that our package adelie performs 3 to 10 times faster than the next fastest package on a wide array of both simulated and real datasets. Moreover, we demonstrate that our package is a competitive lasso solver as well, matching the performance of the popular lasso package glmnet.
- [339] arXiv:2405.08635 (cross-list from math.OC) [pdf, ps, other]
-
Title: Approaches to iterative algorithms for solving nonlinear equations with an application in tomographic absorption spectroscopyComments: Accepted for publication in the journal: Communications in Optimization TheorySubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA); Medical Physics (physics.med-ph)
In this paper we propose an approach for solving systems of nonlinear equations without computing function derivatives. Motivated by the application area of tomographic absorption spectroscopy, which is a highly-nonlinear problem with variables coupling, we consider a situation where straightforward translation to a fixed point problem is not possible because the operators that represent the relevant systems of nonlinear equations are not self-mappings, i.e., they operate between spaces of different dimensions. To overcome this difficulty we suggest an "alternating common fixed points algorithm" that acts alternatingly on the different vector variables. This approach translates the original problem to a common fixed point problem for which iterative algorithms are abound and exhibits a viable alternative to translation to an optimization problem, which usually requires derivatives information. However, to apply any of these iterative algorithms requires to ascertain the conditions that appear in their convergence theorems. To circumvent the need to verify conditions for convergence, we propose and motivate a derivative-free algorithm that better suits the tomographic absorption spectroscopy problem at hand and is even further improved by applying to it the superiorization approach. This is presented along with experimental results that demonstrate our approach.
- [340] arXiv:2405.08636 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]
-
Title: Optimal design of experiments in the context of machine-learning inter-atomic potentials: improving the efficiency and transferability of kernel based methodsSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Data-driven, machine learning (ML) models of atomistic interactions are often based on flexible and non-physical functions that can relate nuanced aspects of atomic arrangements into predictions of energies and forces. As a result, these potentials are as good as the training data (usually results of so-called ab initio simulations) and we need to make sure that we have enough information for a model to become sufficiently accurate, reliable and transferable. The main challenge stems from the fact that descriptors of chemical environments are often sparse high-dimensional objects without a well-defined continuous metric. Therefore, it is rather unlikely that any ad hoc method of choosing training examples will be indiscriminate, and it will be easy to fall into the trap of confirmation bias, where the same narrow and biased sampling is used to generate train- and test- sets. We will demonstrate that classical concepts of statistical planning of experiments and optimal design can help to mitigate such problems at a relatively low computational cost. The key feature of the method we will investigate is that they allow us to assess the informativeness of data (how much we can improve the model by adding/swapping a training example) and verify if the training is feasible with the current set before obtaining any reference energies and forces -- a so-called off-line approach. In other words, we are focusing on an approach that is easy to implement and doesn't require sophisticated frameworks that involve automated access to high-performance computational (HPC).
- [341] arXiv:2405.08657 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Self-supervised learning improves robustness of deep learning lung tumor segmentation to CT imaging differencesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Self-supervised learning (SSL) is an approach to extract useful feature representations from unlabeled data, and enable fine-tuning on downstream tasks with limited labeled examples. Self-pretraining is a SSL approach that uses the curated task dataset for both pretraining the networks and fine-tuning them. Availability of large, diverse, and uncurated public medical image sets provides the opportunity to apply SSL in the "wild" and potentially extract features robust to imaging variations. However, the benefit of wild- vs self-pretraining has not been studied for medical image analysis. In this paper, we compare robustness of wild versus self-pretrained transformer (vision transformer [ViT] and hierarchical shifted window [Swin]) models to computed tomography (CT) imaging differences for non-small cell lung cancer (NSCLC) segmentation. Wild-pretrained Swin models outperformed self-pretrained Swin for the various imaging acquisitions. ViT resulted in similar accuracy for both wild- and self-pretrained models. Masked image prediction pretext task that forces networks to learn the local structure resulted in higher accuracy compared to contrastive task that models global image information. Wild-pretrained models resulted in higher feature reuse at the lower level layers and feature differentiation close to output layer after fine-tuning. Hence, we conclude: Wild-pretrained networks were more robust to analyzed CT imaging differences for lung tumor segmentation than self-pretrained methods. Swin architecture benefited from such pretraining more than ViT.
- [342] arXiv:2405.08658 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Beyond the Black Box: Do More Complex Models Provide Superior XAI Explanations?Comments: 15 pages, 9 figures, 5 tablesSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The increasing complexity of Artificial Intelligence models poses challenges to interpretability, particularly in the healthcare sector. This study investigates the impact of deep learning model complexity and Explainable AI (XAI) efficacy, utilizing four ResNet architectures (ResNet-18, 34, 50, 101). Through methodical experimentation on 4,369 lung X-ray images of COVID-19-infected and healthy patients, the research evaluates models' classification performance and the relevance of corresponding XAI explanations with respect to the ground-truth disease masks. Results indicate that the increase in model complexity is associated with a decrease in classification accuracy and AUC-ROC scores (ResNet-18: 98.4%, 0.997; ResNet-101: 95.9%, 0.988). Notably, in eleven out of twelve statistical tests performed, no statistically significant differences occurred between XAI quantitative metrics - Relevance Rank Accuracy and the proposed Positive Attribution Ratio - across trained models. These results suggest that increased model complexity does not consistently lead to higher performance or relevance of explanations for models' decision-making processes.
- [343] arXiv:2405.08672 (cross-list from eess.IV) [pdf, ps, other]
-
Title: EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic CameraComments: early accepted by MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adaptation methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC) which is an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundational model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and unaware of the ground truth camera intrinsics. Code is available at this https URL.
- [344] arXiv:2405.08699 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Weakly-supervised causal discovery based on fuzzy knowledge and complex data complementaritySubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Causal discovery based on observational data is important for deciphering the causal mechanism behind complex systems. However, the effectiveness of existing causal discovery methods is limited due to inferior prior knowledge, domain inconsistencies, and the challenges of high-dimensional datasets with small sample sizes. To address this gap, we propose a novel weakly-supervised fuzzy knowledge and data co-driven causal discovery method named KEEL. KEEL adopts a fuzzy causal knowledge schema to encapsulate diverse types of fuzzy knowledge, and forms corresponding weakened constraints. This schema not only lessens the dependency on expertise but also allows various types of limited and error-prone fuzzy knowledge to guide causal discovery. It can enhance the generalization and robustness of causal discovery, especially in high-dimensional and small-sample scenarios. In addition, we integrate the extended linear causal model (ELCM) into KEEL for dealing with the multi-distribution and incomplete data. Extensive experiments with different datasets demonstrate the superiority of KEEL over several state-of-the-art methods in accuracy, robustness and computational efficiency. For causal discovery in real protein signal transduction processes, KEEL outperforms the benchmark method with limited data. In summary, KEEL is effective to tackle the causal discovery tasks with higher accuracy while alleviating the requirement for extensive domain expertise.
- [345] arXiv:2405.08703 (cross-list from astro-ph.SR) [pdf, ps, other]
-
Title: Using autoencoders and deep transfer learning to determine the stellar parameters of 286 CARMENES M dwarfsP. Mas-Buitrago, A. González-Marcos, E. Solano, V. M. Passegger, M. Cortés-Contreras, J. Ordieres-Meré, A. Bello-García, J. A. Caballero, A. Schweitzer, H. M. Tabernero, D. Montes, C. CifuentesComments: Accepted in A&ASubjects: Solar and Stellar Astrophysics (astro-ph.SR); Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Deep learning (DL) techniques are a promising approach among the set of methods used in the ever-challenging determination of stellar parameters in M dwarfs. In this context, transfer learning could play an important role in mitigating uncertainties in the results due to the synthetic gap (i.e. difference in feature distributions between observed and synthetic data). We propose a feature-based deep transfer learning (DTL) approach based on autoencoders to determine stellar parameters from high-resolution spectra. Using this methodology, we provide new estimations for the effective temperature, surface gravity, metallicity, and projected rotational velocity for 286 M dwarfs observed by the CARMENES survey. Using autoencoder architectures, we projected synthetic PHOENIX-ACES spectra and observed CARMENES spectra onto a new feature space of lower dimensionality in which the differences between the two domains are reduced. We used this low-dimensional new feature space as input for a convolutional neural network to obtain the stellar parameter determinations. We performed an extensive analysis of our estimated stellar parameters, ranging from 3050 to 4300 K, 4.7 to 5.1 dex, and -0.53 to 0.25 dex for Teff, logg, and [Fe/H], respectively. Our results are broadly consistent with those of recent studies using CARMENES data, with a systematic deviation in our Teff scale towards hotter values for estimations above 3750 K. Furthermore, our methodology mitigates the deviations in metallicity found in previous DL techniques due to the synthetic gap. We consolidated a DTL-based methodology to determine stellar parameters in M dwarfs from synthetic spectra, with no need for high-quality measurements involved in the knowledge transfer. These results suggest the great potential of DTL to mitigate the differences in feature distributions between the observations and the PHOENIX-ACES spectra.
- [346] arXiv:2405.08719 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Addressing Misspecification in Simulation-based Inference through Data-driven CalibrationAntoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Marco Cuturi, Jörn-Henrik JacobsenSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Driven by steady progress in generative modeling, simulation-based inference (SBI) has enabled inference over stochastic simulators. However, recent work has demonstrated that model misspecification can harm SBI's reliability. This work introduces robust posterior estimation (ROPE), a framework that overcomes model misspecification with a small real-world calibration set of ground truth parameter measurements. We formalize the misspecification gap as the solution of an optimal transport problem between learned representations of real-world and simulated observations. Assuming the prior distribution over the parameters of interest is known and well-specified, our method offers a controllable balance between calibrated uncertainty and informative inference under all possible misspecifications of the simulator. Our empirical results on four synthetic tasks and two real-world problems demonstrate that ROPE outperforms baselines and consistently returns informative and calibrated credible intervals.
- [347] arXiv:2405.08742 (cross-list from eess.AS) [pdf, ps, other]
-
Title: A tunable binaural audio telepresence system capable of balancing immersive and enhanced modesComments: 5 pages, 4 figuresSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) and Enhanced BAT (E-BAT). With I-BAT, our goal is to preserve the full ambience as if we were at the far end, while with E-BAT, our goal is to enhance the far-end conversation with significantly improved speech quality and intelligibility. To this end, this paper presents a tunable BAT system to vary between these two AT modes with a desired application-specific balance. Microphone signals are converted into binaural signals with prescribed ambience factor. A novel Spatial COherence REpresentation (SCORE) is proposed as an input feature for model training so that the network remains robust to different array setups. Experimental results demonstrated the superior performance of the proposed BAT, even when the array configurations were not included in the training phase.
- [348] arXiv:2405.08745 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Enhancing Blind Video Quality Assessment with Rich Quality-aware FeaturesWei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao ZhaiSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model to handle complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. Through concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets. Moreover, the proposed model won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The code is available at \url{this https URL}.
- [349] arXiv:2405.08790 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Kolmogorov-Arnold Networks (KANs) for Time Series AnalysisSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper introduces a novel application of Kolmogorov-Arnold Networks (KANs) to time series forecasting, leveraging their adaptive activation functions for enhanced predictive modeling. Inspired by the Kolmogorov-Arnold representation theorem, KANs replace traditional linear weights with spline-parametrized univariate functions, allowing them to learn activation patterns dynamically. We demonstrate that KANs outperforms conventional Multi-Layer Perceptrons (MLPs) in a real-world satellite traffic forecasting task, providing more accurate results with considerably fewer number of learnable parameters. We also provide an ablation study of KAN-specific parameters impact on performance. The proposed approach opens new avenues for adaptive forecasting models, emphasizing the potential of KANs as a powerful tool in predictive analytics.
- [350] arXiv:2405.08797 (cross-list from math.CO) [pdf, ps, other]
-
Title: Two questions on Kneser coloringsSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
In this paper, we investigate two questions on Kneser graphs $KG_{n,k}$. First, we prove that the union of $s$ non-trivial intersecting families in ${[n]\choose k}$ has size at most ${n\choose k}-{n-s\choose k}$ for all sufficiently large $n$ that satisfy $n>(2+\epsilon)k^2$ with $\epsilon>0$. We provide an example that shows that this result is essentially tight for the number of colors close to $\chi(KG_{n,k})=n-2k+2$. We also improve the result of Bulankina and Kupavskii on the choice chromatic number, showing that it is at least $\frac 1{16} n\log n$ for all $k<\sqrt n$ and $n$ sufficiently large.
- [351] arXiv:2405.08801 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Prospects of Privacy Advantage in Quantum Machine LearningJamie Heredge, Niraj Kumar, Dylan Herman, Shouvanik Chakrabarti, Romina Yalovetzky, Shree Hari Sureshbabu, Marco PistoiaComments: 8 figures, 1 tableSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Ensuring data privacy in machine learning models is critical, particularly in distributed settings where model gradients are typically shared among multiple parties to allow collaborative learning. Motivated by the increasing success of recovering input data from the gradients of classical models, this study addresses a central question: How hard is it to recover the input data from the gradients of quantum machine learning models? Focusing on variational quantum circuits (VQC) as learning models, we uncover the crucial role played by the dynamical Lie algebra (DLA) of the VQC ansatz in determining privacy vulnerabilities. While the DLA has previously been linked to the classical simulatability and trainability of VQC models, this work, for the first time, establishes its connection to the privacy of VQC models. In particular, we show that properties conducive to the trainability of VQCs, such as a polynomial-sized DLA, also facilitate the extraction of detailed snapshots of the input. We term this a weak privacy breach, as the snapshots enable training VQC models for distinct learning tasks without direct access to the original input. Further, we investigate the conditions for a strong privacy breach where the original input data can be recovered from these snapshots by classical or quantum-assisted polynomial time methods. We establish conditions on the encoding map such as classical simulatability, overlap with DLA basis, and its Fourier frequency characteristics that enable such a privacy breach of VQC models. Our findings thus play a crucial role in detailing the prospects of quantum privacy advantage by guiding the requirements for designing quantum machine learning models that balance trainability with robust privacy protection.
- [352] arXiv:2405.08810 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Quantum computing with QiskitAli Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, Jay M. GambettaSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
We describe Qiskit, a software development kit for quantum information science. We discuss the key design decisions that have shaped its development, and examine the software architecture and its core components. We demonstrate an end-to-end workflow for solving a problem in condensed matter physics on a quantum computer that serves to highlight some of Qiskit's capabilities, for example the representation and optimization of circuits at various abstraction levels, its scalability and retargetability to new gates, and the use of quantum-classical computations via dynamic circuits. Lastly, we discuss some of the ecosystem of tools and plugins that extend Qiskit for various tasks, and the future ahead.
- [353] arXiv:nlin/0408039 (cross-list from nlin.AO) [pdf, ps, other]
-
Title: Stability and Diversity in Collective AdaptationComments: 22 pages, 23 figures; updated references, corrected typos, changed contentSubjects: Adaptation and Self-Organizing Systems (nlin.AO); Machine Learning (cs.LG); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)
We derive a class of macroscopic differential equations that describe collective adaptation, starting from a discrete-time stochastic microscopic model. The behavior of each agent is a dynamic balance between adaptation that locally achieves the best action and memory loss that leads to randomized behavior. We show that, although individual agents interact with their environment and other agents in a purely self-interested way, macroscopic behavior can be interpreted as game dynamics. Application to several familiar, explicit game interactions shows that the adaptation dynamics exhibits a diversity of collective behaviors. The simplicity of the assumptions underlying the macroscopic equations suggests that these behaviors should be expected broadly in collective adaptation. We also analyze the adaptation dynamics from an information-theoretic viewpoint and discuss self-organization induced by information flux between agents, giving a novel view of collective adaptation.
Cross submissions for Wednesday, 15 May 2024 (showing 60 of 60 entries )
- [354] arXiv:1901.08930 (replaced) [pdf, ps, other]
-
Title: Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active LearningComments: Accepted for Publication in Journal of Artificial Intelligence Research. 46 pages; code is available at this https URL. arXiv admin note: substantial text overlap with arXiv:1809.06477Journal-ref: Journal of Artificial Intelligence Research 80 (2024) 127-172Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In many real-world AD applications including computer security and fraud prevention, the anomaly detector must be configurable by the human analyst to minimize the effort on false positives. One important way to configure the detector is by providing true labels (nominal or anomaly) for a few instances. Recent work on active anomaly discovery has shown that greedily querying the top-scoring instance and tuning the weights of ensemble detectors based on label feedback allows us to quickly discover true anomalies.
This paper makes four main contributions to improve the state-of-the-art in anomaly discovery using tree-based ensembles. First, we provide an important insight that explains the practical successes of unsupervised tree-based ensembles and active learning based on greedy query selection strategy. We also present empirical results on real-world data to support our insights and theoretical analysis to support active learning. Second, we develop a novel batch active learning algorithm to improve the diversity of discovered anomalies based on a formalism called compact description to describe the discovered anomalies. Third, we develop a novel active learning algorithm to handle streaming data setting. We present a data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the anomaly detector in a principled manner. Fourth, we present extensive experiments to evaluate our insights and our tree-based active anomaly discovery algorithms in both batch and streaming data settings. Our results show that active learning allows us to discover significantly more anomalies than state-of-the-art unsupervised baselines, our batch active learning algorithm discovers diverse anomalies, and our algorithms under the streaming-data setup are competitive with the batch setup. - [355] arXiv:1907.12489 (replaced) [pdf, ps, other]
-
Title: FDive: Learning Relevance Models using Pattern-based Similarity MeasuresComments: 12 pages, 7 figures, 2 tables, LaTeX; added DOI; corrected typos and formattingJournal-ref: 2019 IEEE Conference on Visual Analytics Science and Technology (VAST)Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.
- [356] arXiv:1910.02302 (replaced) [pdf, ps, other]
-
Title: Decidability of membership problems for flat rational subsets of $\mathrm{GL}(2,\mathbb{Q})$ and singular matricesComments: 45 pagesSubjects: Formal Languages and Automata Theory (cs.FL)
We consider membership problems for rational subsets of the semigroup of $2\times 2$ matrices over $\mathbb{Q}$. For a semigroup $M$, the rational subsets $\mathrm{Rat}(M)$ are defined as the sets accepted by NFAs whose transitions are labeled by elements of $M$. In general, it is undecidable on inputs $m\in M$ and $R\in \mathrm{Rat}(M)$ whether $m$ belongs to $R$. Therefore, we restrict our attention to the family $\mathrm{FRat}(M,S)$ of flat rational subsets of $M$ over $S$, where $S$ is a subsemigroup of $M$. It consists of finite unions of the form $g_0L_1g_1 \cdots L_tg_t$, where $L_i\in \mathrm{Rat}(S)$ and $g_i\in M$. Assuming that the membership for $\mathrm{Rat}(S)$ is decidable, we prove various results when the membership for $\mathrm{FRat}(M,S)$ is decidable.
If $H$ is a subgroup of a group $G$, then we provide a rather general condition when $\mathrm{FRat}(G,H)$ is an (effective) relative Boolean algebra. This leads to one of our main results that the emptiness problem for Boolean combinations of sets in $\mathrm{FRat}(\mathrm{GL}(2,\mathbb{Q}),\mathrm{GL}(2,\mathbb{Z}))$ is decidable. It is possible that this result cannot be pushed any further as indicated by the following dichotomy: if $G$ is a finitely generated group such that $\mathrm{GL}(2,\mathbb{Z}) < G < \mathrm{GL}(2,\mathbb{Q})$, then either $G\cong \mathrm{GL}(2,\mathbb{Z})\times \mathbb{Z}^k$ or $G$ contains an extension of the Baumslag-Solitar group $\mathrm{BS}(1,q)$ of infinite index. It is open whether the membership for rational subsets is decidable in the latter case. For singular matrices, we will show that the membership problem for $\mathrm{FRat}(\mathbb{Q}^{2\times 2},S)$ is decidable in doubly exponential time, where $S$ is the monoid generated by $\mathrm{GL}(2,\mathbb{Z})\cup \{r\in \mathbb{Q}\,\mid\,r>1\} \cup \{0,\left(\begin{smallmatrix}1 & 0\\ 0 & 0\end{smallmatrix}\right)\}$. - [357] arXiv:2006.01830 (replaced) [pdf, ps, other]
-
Title: Tangles: a structural approach to artificial intelligence in the empirical sciences (Part I)Comments: The print edition of the book will appear later this year with CUP. An enhanced eBook edition and open-source software, with tutorials, are available already from this http URLSubjects: Artificial Intelligence (cs.AI); Combinatorics (math.CO)
Traditional clustering identifies groups of objects that share certain qualities. Tangles do the converse: they identify groups of qualities that often occur together. They can thereby discover, relate, and structure types: of behaviour, political views, texts, or viruses.
If desired, tangles can also be used as a new method for traditional clustering. They offer a precise, quantitative paradigm suited particularly to fuzzy clusters, since they do not require any assignment of objects to the clusters which these collectively form.
This is the first of four parts of a book with the above title. The book explores applications outside mathematics of the notion and theory of tangles generalised from the graph tangles know from graph minor theory. - [358] arXiv:2104.13753 (replaced) [pdf, ps, other]
-
Title: Sum-of-norms clustering does not separate nearby ballsComments: 40 pages, 17 figures, published versionJournal-ref: Journal of Machine Learning Research, volume 25 (2024), no. 143, pp. 1--40Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this happens even when the distance between the centers of the two balls is taken to be as large as $2\sqrt{2}$. In order to show this, we introduce and analyze a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure. In particular, we state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
- [359] arXiv:2105.14307 (replaced) [pdf, ps, other]
-
Title: Minimally Factorizing the Provenance of Self-join Free Conjunctive QueriesComments: 57 pages, 38 figuresSubjects: Databases (cs.DB)
We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find a formula that minimizes the number of variable repetitions. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join free queries. While general Boolean formula minimization is $\Sigma^p_2$-complete, we show that the problem is NP-C in our case. Additionally, we identify a large category of queries that can be solved in PTIME, expanding beyond the previously known tractable cases of read-once formulas and hierarchical queries.
We describe connections between factorizations, Variable Elimination Orders (VEOs), and minimal query plans. We leverage these insights to create an Integer Linear Program (ILP) that can solve the minimal factorization problem exactly. We also propose a Max-Flow Min-Cut (MFMC) based algorithm that gives an efficient approximate solution. Importantly, we show that both the Linear Programming (LP) relaxation of our ILP, and our MFMC-based algorithm are always correct for all currently known PTIME cases. Thus, we present two unified algorithms (ILP and MFMC) that can both recover all known PTIME cases in PTIME, yet also solve NP-complete cases either exactly (ILP) or approximately (MFMC), as desired. - [360] arXiv:2110.02830 (replaced) [pdf, ps, other]
-
Title: Parameterized Algorithms for the Steiner Arborescence Problem on a HypercubeSubjects: Data Structures and Algorithms (cs.DS)
Motivated by a phylogeny reconstruction problem in evolutionary biology, we study the minimum Steiner arborescence problem on directed hypercubes (MSA-DH). Given $m$, representing the directed hypercube $\vec{Q}_m$, and a set of terminals $R$, the problem asks to find a Steiner arborescence that spans $R$ with minimum cost. As $m$ implicitly represents $\vec{Q}_m$ comprising $2^{m}$ vertices, the running time analyses of traditional Steiner tree algorithms on general graphs does not give a clear understanding of the actual complexity of this problem. We present algorithms that exploit the structure of the hypercube and run in time polynomial in $|R|$ and $m$.
We explore the MSA-DH problem on three natural parameters - $R$, and two above-guarantee parameters, number of Steiner nodes $p$ and penalty $q$. For above-guarantee parameters, the parameterized MSA-DH problem takes $p \geq 0$ or $q\geq 0$ as input, and outputs a Steiner arborescence with at most $|R| + p - 1$ or $m + q$ edges respectively. We present the following results ($\tilde{\mathcal{O}}$ hides the polynomial factors):
1. An exact algorithm that runs in $\tilde{\mathcal{O}}(3^{|R|})$ time.
2. A randomized algorithm that runs in $\tilde{\mathcal{O}}(9^q)$ time with success probability $\geq 4^{-q}$.
3. An exact algorithm that runs in $\tilde{\mathcal{O}}(36^q)$ time.
4. A $(1+q)$-approximation algorithm that runs in $\tilde{\mathcal{O}}(1.25284^q)$ time.
5. An $\mathcal{O}\left(p\ell_{\mathrm{max}} \right)$-additive approximation algorithm that runs in $\tilde{\mathcal{O}}(\ell_{\mathrm{max}}^{p+2})$ time, where $\ell_{\mathrm{max}}$ is the maximum distance of any terminal from the root. - [361] arXiv:2202.01103 (replaced) [pdf, ps, other]
-
Title: A New Temporal Interpretation of Cluster EditingComments: 27 pages, 2 figures. Extended abstract appeared at IWOCA 2022Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
The NP-complete graph problem Cluster Editing seeks to transform a static graph into a disjoint union of cliques by making the fewest possible edits to the edges. We introduce a natural interpretation of this problem in temporal graphs, whose edge sets change over time. This problem is NP-complete even when restricted to temporal graphs whose underlying graph is a path, but we obtain two polynomial-time algorithms for restricted cases. In the static setting, it is well-known that a graph is a disjoint union of cliques if and only if it contains no induced copy of $P_3$; we demonstrate that no general characterisation involving sets of at most four vertices can exist in the temporal setting, but obtain a complete characterisation involving forbidden configurations on at most five vertices. This characterisation gives rise to an FPT algorithm parameterised simultaneously by the permitted number of modifications and the lifetime of the temporal graph.
- [362] arXiv:2206.08202 (replaced) [pdf, ps, other]
-
Title: Token Spammers, Rug Pulls, and SniperBots: An Analysis of the Ecosystem of Tokens in Ethereum and the Binance Smart Chain (BNB)Comments: This paper is included in the Proceedings of the 32nd USENIX Security SymposiumJournal-ref: https://www.usenix.org/conference/usenixsecurity23/presentation/cernera ISBN: 978-1-939133-37-3 Year:2023Subjects: Computers and Society (cs.CY)
In this work, we perform a longitudinal analysis of the BNB Smart Chain and Ethereum blockchain from their inception to March 2022. We study the ecosystem of the tokens and liquidity pools, highlighting analogies and differences between the two blockchains. We estimate the lifetime of the tokens, discovering that about 60% of them are active for less than one day. Moreover, we find that 1% of addresses create an anomalous number of tokens (between 20% and 25%). We present an exit scam fraud and quantify its prevalence on both blockchains. We find that token spammers use short lifetime tokens as disposable tokens to perpetrate these frauds serially. Finally, we present a new kind of trader bot involved in these activities, and we detect their presence and quantify their activity in the exit scam operations.
- [363] arXiv:2206.12934 (replaced) [pdf, ps, other]
-
Title: Checking Trustworthiness of Probabilistic Computations in a Typed Natural Deduction SystemSubjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
In this paper we present the probabilistic typed natural deduction calculus TPTND, designed to reason about and derive trustworthiness properties of probabilistic computational processes, like those underlying current AI applications. Derivability in TPTND is interpreted as the process of extracting $n$ samples of possibly complex outputs with a certain frequency from a given categorical distribution. We formalize trust for such outputs as a form of hypothesis testing on the distance between such frequency and the intended probability. The main advantage of the calculus is to render such notion of trustworthiness checkable. We present a computational semantics for the terms over which we reason and then the semantics of TPTND, where logical operators as well as a Trust operator are defined through introduction and elimination rules. We illustrate structural and metatheoretical properties, with particular focus on the ability to establish under which term evolutions and logical rules applications the notion of trustworhtiness can be preserved.
- [364] arXiv:2208.03123 (replaced) [pdf, ps, html, other]
-
Title: Watson-Crick conjugates of words and languagesSubjects: Formal Languages and Automata Theory (cs.FL); Combinatorics (math.CO)
This paper explores the concept of Watson-Crick conjugates, also known as $\theta$-conjugates, of words and languages. This concept extends the classical idea of conjugates by incorporating the Watson-Crick complementarity of DNA sequences, from the perspective of DNA computing. Our investigation initially focuses on the properties of $\theta$-conjugates of words. We then define $\theta$-conjugates of a language and study closure properties of certain families of languages under the $\theta$-conjugate operation. Furthermore, we analyze the iterated $\theta$-conjugate of both words and languages. Finally, we delve into the idea of $\theta$-conjugate-free languages and examine the decidability problems surrounding $\theta$-conjugate-freeness for different classes of languages
- [365] arXiv:2208.03411 (replaced) [pdf, ps, other]
-
Title: Expanded-clique graphs and the domination problemComments: 17 pages, 5 figuresSubjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
Given a graph $G$ such that each vertex $v_i$ has a value $f(v_i)$, the expanded-clique graph $H$ is the graph where each vertex $v_i$ of $G$ becomes a clique $V_i$ of size $f(v_i)$ and for each edge $v_iv_j \in E(G)$, there is a vertex of $V_i$ adjacent to an exclusive vertex of $V_j$. In this work, among the results, we present two characterizations of the expanded-clique graphs, one of them leads to a linear-time recognition algorithm. Regarding the domination number, we show that this problem is \NP-complete for planar bipartite $3$-expanded-clique graphs and for cubic line graphs of bipartite graphs.
- [366] arXiv:2208.08093 (replaced) [pdf, ps, other]
-
Title: Near Threshold Computation of Partitioned Ring Learning With Error (RLWE) Post Quantum Cryptography on Reconfigurable ArchitectureComments: Full Version of the manuscript is availble in IEEE Access DOI: https://doi.org/10.1109/access.2024.3401235. In IEEE Access, 2024Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Ring Learning With Error (RLWE) algorithm is used in Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE) algorithm. The existing classical crypto algorithms may be broken in quantum computers. The adversaries can store all encrypted data. While the quantum computer will be available, these encrypted data can be exposed by the quantum computer. Therefore, the PQC algorithms are an essential solution in recent applications. On the other hand, the HE allows operations on encrypted data which is appropriate for getting services from third parties without revealing confidential plain-texts. The FPGA based PQC and HE hardware accelerators like RLWE is much cost-effective than processor based platform and Application Specific Integrated Circuit (ASIC). FPGA based hardware accelerators still consume more power compare to ASIC based design. Near Threshold Computation (NTC) may be a convenient solution for FPGA based RLWE implementation. In this paper, we have implemented RLWE hardware accelerator which has 14 subcomponents. This paper creates clusters based on the critical path of all 14 subcomponents. Each cluster is implemented in an FPGA partition which has the same biasing voltage $V_{ccint}$. The clusters that have higher critical paths use higher Vccint to avoid timing failure. The clusters have lower critical paths use lower biasing voltage Vccint. This voltage scaled, partitioned RLWE can save ~6% and ~11% power in Vivado and VTR platform respectively. The resource usage and throughput of the implemented RLWE hardware accelerator is comparatively better than existing literature.
- [367] arXiv:2209.00271 (replaced) [pdf, ps, other]
-
Title: Maximal Closed SubstringsSubjects: Data Structures and Algorithms (cs.DS); Formal Languages and Automata Theory (cs.FL)
A string is closed if it has length 1 or has a nonempty border without internal occurrences. In this paper we introduce the definition of a \emph{maximal closed substring} (MCS), which is an occurrence of a closed substring that cannot be extended to the left nor to the right into a longer closed substring. MCSs with exponent at least $2$ are commonly called \emph{runs}; those with exponent smaller than $2$, instead, are particular cases of \emph{maximal gapped repeats}. We provide an algorithm that, given a string of length $n$ locates all MCSs the string contains in $\mathcal O(n\log n)$ time.
- [368] arXiv:2209.07618 (replaced) [pdf, ps, other]
-
Title: Differentiable Bilevel Programming for Stackelberg Congestion GamesSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game. Often formulated as bilevel programs, large-scale SCGs are well known for their intractability and complexity. Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning. The core idea centers on replacing the lower-level equilibrium problem with a smooth evolution trajectory defined by the imitative logit dynamic (ILD), which we prove converges to the equilibrium of the congestion game under mild conditions. Building upon this theoretical foundation, we propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming. Thanks to the smoothness of ILD, the algorithm promises both efficiency and scalability. The second algorithm adds a heuristic twist by cutting short the followers' evolution trajectory. Behaviorally, this means that, instead of anticipating the followers' best response at equilibrium, the leader seeks to approximate that response by only looking ahead a limited number of steps. Our numerical experiments are carried out over various instances of classic SCG applications, ranging from toy benchmarks to large-scale real-world examples. The results show the proposed algorithms are reliable and scalable local solvers that deliver high-quality solutions with greater regularity and significantly less computational effort compared to the many incumbents included in our study.
- [369] arXiv:2210.07502 (replaced) [pdf, ps, other]
-
Title: Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted AuctionsComments: Accepted in EC'23 and Mathematics of Operations ResearchSubjects: Computer Science and Game Theory (cs.GT)
We study the liquid welfare in sequential first-price auctions with budget-limited buyers. We focus on first-price auctions, which are increasingly commonly used in many settings, and consider liquid welfare, a natural and well-studied generalization of social welfare for buyers with budgets. We use a behavioral model for the buyers, assuming a learning style guarantee: the resulting utility of each buyer is within a $\gamma$ factor (where $\gamma\ge 1$) of the utility achievable by shading her value with the same factor at each round. Under this assumption, we show a $\gamma+1/2+O(1/\gamma)$ price of anarchy for liquid welfare assuming buyers have additive valuations. This positive result is in contrast to sequential second-price auctions, where even with $\gamma=1$, the resulting liquid welfare can be arbitrarily smaller than the maximum liquid welfare. We prove a lower bound of $\gamma$ on the liquid welfare loss under the above assumption in first-price auctions, making our bound asymptotically tight. For the case when $\gamma = 1$ our theorem implies a price of anarchy upper bound that is about $2.41$; we show a lower bound of $2$ for that case.
We also give a learning algorithm that the players can use to achieve the guarantee needed for our liquid welfare result. Our algorithm achieves utility within a $\gamma=O(1)$ factor of the optimal utility even when a buyer's values and the bids of the other buyers are chosen adversarially, assuming the buyer's budget grows linearly with time. The competitiveness guarantee of the learning algorithm deteriorates somewhat as the budget grows slower than linearly with time.
Finally, we extend our liquid welfare results for the case where buyers have submodular valuations over the set of items they win across iterations with a slightly worse price of anarchy bound of $\gamma+1+O(1/\gamma)$ compared to the guarantee for the additive case. - [370] arXiv:2210.14972 (replaced) [pdf, ps, other]
-
Title: Environment Design for Inverse Reinforcement LearningComments: to appear at ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Learning a reward function from demonstrations suffers from low sample-efficiency. Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics. We tackle these challenges through adaptive environment design. In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function as quickly as possible from the expert's demonstrations in said environments. This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
- [371] arXiv:2211.02003 (replaced) [pdf, ps, other]
-
Title: Distributed DP-Helmet: Scalable Differentially Private Non-interactive Averaging of Single LayersSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
In this work, we propose two differentially private, non-interactive, distributed learning algorithms in a framework called Distributed DP-Helmet. Our framework is based on what we coin blind averaging: each user locally learns and noises a model and all users then jointly compute the mean of their models via a secure summation protocol. We provide experimental evidence that blind averaging for SVMs and single Softmax-layer (Softmax-SLP) can have a strong utility-privacy tradeoff: we reach an accuracy of 86% on CIFAR-10 for $\varepsilon$ = 0.4 and 1,000 users, of 44% on CIFAR-100 for $\varepsilon$ = 1.2 and 100 users, and of 39% on federated EMNIST for $\varepsilon$ = 0.4 and 3,400 users, all after a SimCLR-based pretraining. As an ablation, we study the resilience of our approach to a strongly non-IID setting. On the theoretical side, we show that blind averaging preserves differential privacy if the objective function is smooth, Lipschitz, and strongly convex like SVMs. We show that these properties also hold for Softmax-SLP which is often used for last-layer fine-tuning such that for a fixed model size the privacy bound $\varepsilon$ of Softmax-SLP no longer depends on the number of classes. This marks a significant advantage in utility and privacy of Softmax-SLP over SVMs. Furthermore, in the limit blind averaging of hinge-loss SVMs convergences to a centralized learned SVM. The latter result is based on the representer theorem and can be seen as a blueprint for finding convergence for other empirical risk minimizers (ERM) like Softmax-SLP.
- [372] arXiv:2211.14599 (replaced) [pdf, ps, other]
-
Title: Condensed Gradient BoostingSubjects: Machine Learning (cs.LG)
This paper presents a computationally efficient variant of gradient boosting for multi-class classification and multi-output regression tasks. Standard gradient boosting uses a 1-vs-all strategy for classifications tasks with more than two classes. This strategy translates in that one tree per class and iteration has to be trained. In this work, we propose the use of multi-output regressors as base models to handle the multi-class problem as a single task. In addition, the proposed modification allows the model to learn multi-output regression problems. An extensive comparison with other multi-ouptut based gradient boosting methods is carried out in terms of generalization and computational efficiency. The proposed method showed the best trade-off between generalization ability and training and predictions speeds.
- [373] arXiv:2212.02387 (replaced) [pdf, ps, other]
-
Title: An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax OptimizationSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper studies the stochastic nonconvex-strongly-concave minimax optimization over a multi-agent network. We propose an efficient algorithm, called Decentralized Recursive gradient descEnt Ascent Method (DREAM), which achieves the best-known theoretical guarantee for finding the $\epsilon$-stationary points. Concretely, it requires $\mathcal{O}(\min (\kappa^3\epsilon^{-3},\kappa^2 \sqrt{N} \epsilon^{-2} ))$ stochastic first-order oracle (SFO) calls and $\tilde{\mathcal{O}}(\kappa^2 \epsilon^{-2})$ communication rounds, where $\kappa$ is the condition number and $N$ is the total number of individual functions. Our numerical experiments also validate the superiority of DREAM over previous methods.
- [374] arXiv:2212.08744 (replaced) [pdf, ps, other]
-
Title: Machine Learning Strategies to Improve Generalization in EEG-based Emotion Assessment: a Systematic ReviewAndrea Apicella, Pasquale Arpaia, Giovanni D'Errico, Davide Marocco, Giovanna Mastrati, Nicola Moccaldi, Roberto PreveteComments: under reviewSubjects: Machine Learning (cs.LG)
A systematic review on machine-learning strategies for improving generalizability (cross-subjects and cross-sessions) electroencephalography (EEG) based in emotion classification was realized. In this context, the non-stationarity of EEG signals is a critical issue and can lead to the Dataset Shift problem. Several architectures and methods have been proposed to address this issue, mainly based on transfer learning methods. 418 papers were retrieved from the Scopus, IEEE Xplore and PubMed databases through a search query focusing on modern machine learning techniques for generalization in EEG-based emotion assessment. Among these papers, 75 were found eligible based on their relevance to the problem. Studies lacking a specific cross-subject and cross-session validation strategy and making use of other biosignals as support were excluded. On the basis of the selected papers' analysis, a taxonomy of the studies employing Machine Learning (ML) methods was proposed, together with a brief discussion on the different ML approaches involved. The studies with the best results in terms of average classification accuracy were identified, supporting that transfer learning methods seem to perform better than other approaches. A discussion is proposed on the impact of (i) the emotion theoretical models and (ii) psychological screening of the experimental sample on the classifier performances.
- [375] arXiv:2301.08403 (replaced) [pdf, ps, other]
-
Title: One-shot Generative Data Augmentation with Bounded Divergence for UAV Identification in Limited RF EnvironmentsComments: 12 pages, 7 figures, 4 tablesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
This work addresses the pressing need for cybersecurity in Unmanned Aerial Vehicles (UAVs), particularly focusing on the challenges of identifying UAVs using radiofrequency (RF) fingerprinting in constrained environments. The complexity and variability of RF signals, influenced by environmental interference and hardware imperfections, often render traditional RF-based identification methods ineffective. To address these complications, the study introduces the rigorous use of one-shot generative methods for augmenting transformed RF signals, offering a significant improvement in UAV identification. This approach shows promise in low-data regimes, outperforming deep generative methods like conditional generative adversarial networks (GANs) and variational autoencoders (VAEs). The paper provides a theoretical guarantee for the effectiveness of one-shot generative models in augmenting limited data, setting a precedent for their application in limited RF environments. This research not only contributes to the cybersecurity of UAVs but also rigorously broadens the scope of machine learning techniques in data-constrained scenarios, which may include atypical complex sequences beyond images and videos.
- [376] arXiv:2302.07243 (replaced) [pdf, ps, other]
-
Title: A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder IdentificationSin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C.-W. Phan, Adeel Razi, David L. DoweSubjects: Machine Learning (cs.LG)
Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and causal dynamics of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states.
- [377] arXiv:2302.08390 (replaced) [pdf, ps, other]
-
Title: Photonic Neural Networks: A Compact ReviewSubjects: Neural and Evolutionary Computing (cs.NE)
It has long been known that photonic science and especially photonic communications can raise the speed of technologies and producing manufacturing. More recently, photonic science has also been interested in its capabilities to implement low-precision linear operations, such as matrix multiplications, fast and effciently. For a long time most scientists taught that Electronics is the end of science but after many years and about 35 years ago had been understood that electronics do not answer alone and should have a new science. Today we face modern ways and instruments for doing tasks as soon as possible in proportion to many decays before. The velocity of progress in science is very fast. All our progress in science area is dependent on modern knowledge about new methods. In this research, we want to review the concept of a photonic neural network. For this research was selected 18 main articles were among the main 30 articles on this subject from 2015 to the 2022 year. These articles noticed three principles: 1- Experimental concepts, 2- Theoretical concepts, and, finally 3- Mathematic concepts. We should be careful with this research because mathematics has a very important and constructive role in our topics! One of the topics that are very valid and also new, is simulation. We used to work with simulation in some parts of this research. First, briefly, we start by introducing photonics and neural networks. In the second we explain the advantages and disadvantages of a combination of both in the science world and industries and technologies about them. Also, we are talking about the achievements of a thin modern science. Third, we try to introduce some important and valid parameters in neural networks. In this manner, we use many mathematic tools in some portions of this article.
- [378] arXiv:2302.12170 (replaced) [pdf, ps, html, other]
-
Title: Language Model Crossover: Variation through Few-Shot PromptingElliot Meyerson, Mark J. Nelson, Herbie Bradley, Adam Gaier, Arash Moradi, Amy K. Hoover, Joel LehmanSubjects: Neural and Evolutionary Computing (cs.NE)
This paper pursues the insight that language models naturally enable an intelligent variation operator similar in spirit to evolutionary crossover. In particular, language models of sufficient scale demonstrate in-context learning, i.e. they can learn from associations between a small number of input patterns to generate outputs incorporating such associations (also called few-shot prompting). This ability can be leveraged to form a simple but powerful variation operator, i.e. to prompt a language model with a few text-based genotypes (such as code, plain-text sentences, or equations), and to parse its corresponding output as those genotypes' offspring. The promise of such language model crossover (which is simple to implement and can leverage many different open-source language models) is that it enables a simple mechanism to evolve semantically-rich text representations (with few domain-specific tweaks), and naturally benefits from current progress in language models. Experiments in this paper highlight the versatility of language-model crossover, through evolving binary bit-strings, sentences, equations, text-to-image prompts, and Python code. The conclusion is that language model crossover is a promising method for evolving genomes representable as text.
- [379] arXiv:2303.05240 (replaced) [pdf, ps, other]
-
Title: Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field. However, when applying GANs in the remote sensing (RS) image generation task, an extraordinary phenomenon is observed: the GAN model is more sensitive to the size of training data for RS image generation than for natural image generation. In other words, the generation quality of RS images will change significantly with the number of training categories or samples per category. In this paper, we first analyze this phenomenon from two kinds of toy experiments and conclude that the amount of feature information contained in the GAN model decreases with reduced training data. Then we establish a structural causal model (SCM) of the data generation process and interpret the generated data as the counterfactuals. Based on this SCM, we theoretically prove that the quality of generated images is positively correlated with the amount of feature information. This provides insights for enriching the feature information learned by the GAN model during training. Consequently, we propose two innovative adjustment schemes, namely Uniformity Regularization (UR) and Entropy Regularization (ER), to increase the information learned by the GAN model at the distributional and sample levels, respectively. We theoretically and empirically demonstrate the effectiveness and versatility of our methods. Extensive experiments on three RS datasets and two natural datasets show that our methods outperform the well-established models on RS image generation tasks. The source code is available at this https URL.
- [380] arXiv:2303.05503 (replaced) [pdf, ps, other]
-
Title: Open-world Instance Segmentation: Top-down Learning with Bottom-up SupervisionComments: L3D-IVU Workshop, CVPR 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Many top-down architectures for instance segmentation achieve significant success when trained and tested on pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit notable bias towards seen classes and suffer from significant performance drop. In this work, we propose a novel approach for open world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS) that combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part-masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency from the top-down architectures and the generalization ability to unseen categories from bottom-up supervision. We validate the strengths of UDOS on multiple cross-category as well as cross-dataset transfer tasks from 5 challenging datasets including MS-COCO, LVIS, ADE20k, UVO and OpenImages, achieving significant improvements over state-of-the-art across the board. Our code and models are available on our project page.
- [381] arXiv:2303.07247 (replaced) [pdf, ps, other]
-
Title: Are Models Trained on Indian Legal Data Fair?Sahil Girhepuje, Anmol Goel, Gokul S Krishnan, Shreya Goyal, Satyendra Pandey, Ponnurangam Kumaraguru, Balaraman RavindranComments: Presented at the Symposium on AI and Law (SAIL) 2023Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains like law, medical and mental health. AI-based Language Models, like Judgement Prediction, have recently been proposed for the legal sector. However, these models are strife with encoded social biases picked up from the training data. While bias and fairness have been studied across NLP, most studies primarily locate themselves within a Western context. In this work, we present an initial investigation of fairness from the Indian perspective in the legal domain. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. We evaluate the fairness gap using demographic parity and show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims. Additionally, we highlight the need for further research and studies in the avenues of fairness/bias in applying AI in the legal sector with a specific focus on the Indian context.
- [382] arXiv:2303.12797 (replaced) [pdf, ps, other]
-
Title: An algorithmic framework for the optimization of deep neural networks architectures and hyperparametersJulie Keisler (EDF R\&D OSIRIS, EDF R\&D, CRIStAL, CRIStAL), El-Ghazali Talbi (CRIStAL, CRIStAL), Sandra Claudel (EDF R\&D OSIRIS, EDF R\&D), Gilles Cabriel (EDF R\&D OSIRIS, EDF R\&D)Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we propose an algorithmic framework to automatically generate efficient deep neural networks and optimize their associated hyperparameters. The framework is based on evolving directed acyclic graphs (DAGs), defining a more flexible search space than the existing ones in the literature. It allows mixtures of different classical operations: convolutions, recurrences and dense layers, but also more newfangled operations such as self-attention. Based on this search space we propose neighbourhood and evolution search operators to optimize both the architecture and hyper-parameters of our networks. These search operators can be used with any metaheuristic capable of handling mixed search spaces. We tested our algorithmic framework with an evolutionary algorithm on a time series prediction benchmark. The results demonstrate that our framework was able to find models outperforming the established baseline on numerous datasets.
- [383] arXiv:2304.00999 (replaced) [pdf, ps, other]
-
Title: Bandits for Sponsored Search Auctions under Unknown Valuation Model: Case Study in E-Commerce AdvertisingSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
This paper presents a bidding system for sponsored search auctions under an unknown valuation model. This formulation assumes that the bidder's value is unknown, evolving arbitrarily, and observed only upon winning an auction. Unlike previous studies, we do not impose any assumptions on the nature of feedback and consider the problem of bidding in sponsored search auctions in its full generality. Our system is based on a bandit framework that is resilient to the black-box auction structure and delayed and batched feedback. To validate our proposed solution, we conducted a case study at Zalando, a leading fashion e-commerce company. We outline the development process and describe the promising outcomes of our bandits-based approach to increase profitability in sponsored search auctions. We discuss in detail the technical challenges that were overcome during the implementation, shedding light on the mechanisms that led to increased profitability.
- [384] arXiv:2304.05020 (replaced) [pdf, ps, html, other]
-
Title: Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed AccelerationsSubjects: Neural and Evolutionary Computing (cs.NE)
Given the ubiquity of non-separable optimization problems in real worlds, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC), a divide-and-conquer black-box optimization framework, on non-separable functions. First, we reveal empirical reasons of when decomposition-based methods are preferred or not in practice on some non-separable large-scale problems, which have not been clearly pointed out in many previous CC papers. Then, we formalize CC to a continuous-game model via simplification, but without losing its essential property. Different from previous evolutionary game theory for CC, our new model provides a much simpler but useful viewpoint to analyze its convergence, since only the pure Nash equilibrium concept is needed and more general fitness landscapes can be explicitly considered. Based on convergence analyses, we propose a hierarchical decomposition strategy for better generalization, as for any decomposition, there is a risk of getting trapped into a suboptimal Nash equilibrium. Finally, we use powerful distributed computing to accelerate it under the recent multi-level learning framework, which combines the fine-tuning ability from decomposition with the invariance property of CMA-ES. Experiments on a set of high-dimensional test functions validate both its search performance and scalability (w.r.t. CPU cores) on a clustering computing platform with 400 CPU cores.
- [385] arXiv:2304.05215 (replaced) [pdf, ps, other]
-
Title: A Billion-scale Foundation Model for Remote Sensing ImagesComments: This manuscript is the accepted version for IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improved with an increase in parameters. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance in downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, the performance of the foundation models and data efficiency improved as the number of parameters increased. Moreover, our models achieve the state-of-the-art performance on several datasets including DIOR-R, Postdam, and LoveDA.
- [386] arXiv:2304.07162 (replaced) [pdf, ps, other]
-
Title: Operations on Fixpoint Equation SystemsComments: 32 pagesSubjects: Logic in Computer Science (cs.LO)
We study operations on fixpoint equation systems (FES) over arbitrary complete lattices. We investigate under which conditions these operations, such as substituting variables by their definition, and swapping the ordering of equations, preserve the solution of a FES. We provide rigorous, computer-checked proofs. Along the way, we list a number of known and new identities and inequalities on extremal fixpoints in complete lattices.
- [387] arXiv:2304.10535 (replaced) [pdf, ps, html, other]
-
Title: Farm3D: Learning Articulated 3D Animals by Distilling 2D DiffusionComments: In 3DV 2024, Project page: this http URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object category. However, these approaches heavily rely on manually curated clean training data, which are expensive to obtain. We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch. Additionally, we incorporate the diffusion model as a score to enhance the learning process. The idea involves randomizing certain aspects of the reconstruction, such as viewpoint and illumination, generating virtual views of the reconstructed 3D object, and allowing the 2D network to assess the quality of the resulting image, thus providing feedback to the reconstructor. Unlike work based on distillation, which produces a single 3D asset for each textual prompt, our approach yields a monocular reconstruction network capable of outputting a controllable 3D asset from any given image, whether real or generated, in a single forward pass in a matter of seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
- [388] arXiv:2305.01290 (replaced) [pdf, ps, other]
-
Title: A Construction of Arbitrarily Large Type-II $Z$ Complementary Code SetSubjects: Information Theory (cs.IT)
For a type-I $(K,M,Z,N)$-ZCCS, it follows $K \leq M \left\lfloor \frac{N}{Z}\right\rfloor$. In this paper, we propose a construction of type-II $(p^{k+n},p^k,p^{n+r}-p^r+1,p^{n+r})$-$Z$ complementary code set (ZCCS) using an extended Boolean function, its properties of Hamiltonian paths and the concept of isolated vertices, where $p\ge 2$. However, the proposed type-II ZCCS provides $K = M(N-Z+1)$ codes, where as for type-I $(K,M,N,Z)$-ZCCS, it is $K \leq M \left\lfloor \frac{N}{Z}\right\rfloor$. Therefore, the proposed type-II ZCCS provides a larger number of codes compared to type-I ZCCS. Further, as a special case of the proposed construction, $(p^k,p^k,p^n)$-CCC can be generated, for any integral value of $p\ge2$ and $k\le n$.
- [389] arXiv:2305.13525 (replaced) [pdf, ps, html, other]
-
Title: A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Heavy communication, in particular, collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training. This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN framework. In addition, we employ two key strategies to further minimize communication overheads. First, we aggressively overlap expensive collective operations (reduce-scatter, all-gather, and all-reduce) with computation. Second, we develop an analytical model to identify high-performing configurations within the large search space defined by our 4D algorithm. This model empowers practitioners by simplifying the tuning process for their specific training workloads. When training an 80-billion parameter GPT on 1024 GPUs of Perlmutter, AxoNN surpasses Megatron-LM, a state-of-the-art framework, by a significant 26%. Additionally, it achieves a significantly high 57% of the theoretical peak FLOP/s or 182 PFLOP/s in total.
- [390] arXiv:2305.13935 (replaced) [pdf, ps, other]
-
Title: Distribution-aware Fairness Test GenerationComments: Paper accepted at JSS; 18 pages, 4 figures; LaTex; Data section addedSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Software Engineering (cs.SE)
Ensuring that all classes of objects are detected with equal accuracy is essential in AI systems. For instance, being unable to identify any one class of objects could have fatal consequences in autonomous driving systems. Hence, ensuring the reliability of image recognition systems is crucial. This work addresses how to validate group fairness in image recognition software. We propose a distribution-aware fairness testing approach (called DistroFair) that systematically exposes class-level fairness violations in image classifiers via a synergistic combination of out-of-distribution (OOD) testing and semantic-preserving image mutation. DistroFair automatically learns the distribution (e.g., number/orientation) of objects in a set of images. Then it systematically mutates objects in the images to become OOD using three semantic-preserving image mutations - object deletion, object insertion and object rotation. We evaluate DistroFair using two well-known datasets (CityScapes and MS-COCO) and three major, commercial image recognition software (namely, Amazon Rekognition, Google Cloud Vision and Azure Computer Vision). Results show that about 21% of images generated by DistroFair reveal class-level fairness violations using either ground truth or metamorphic oracles. DistroFair is up to 2.3x more effective than two main baselines, i.e., (a) an approach which focuses on generating images only within the distribution (ID) and (b) fairness analysis using only the original image dataset. We further observed that DistroFair is efficient, it generates 460 images per hour, on average. Finally, we evaluate the semantic validity of our approach via a user study with 81 participants, using 30 real images and 30 corresponding mutated images generated by DistroFair. We found that images generated by DistroFair are 80% as realistic as real-world images.
- [391] arXiv:2305.17693 (replaced) [pdf, ps, other]
-
Title: Deflation for the off-diagonal block in symmetric saddle point systemsComments: 28 pages, 13 figuresSubjects: Numerical Analysis (math.NA)
Deflation techniques are typically used to shift isolated clusters of small eigenvalues in order to obtain a tighter distribution and a smaller condition number. Such changes induce a positive effect in the convergence behavior of Krylov subspace methods, which are among the most popular iterative solvers for large sparse linear systems. We develop a deflation strategy for symmetric saddle point matrices by taking advantage of their underlying block structure. The vectors used for deflation come from an elliptic singular value decomposition relying on the generalized Golub-Kahan bidiagonalization process. The block targeted by deflation is the off-diagonal one since it features a problematic singular value distribution for certain applications. One example is the Stokes flow in elongated channels, where the off-diagonal block has several small, isolated singular values, depending on the length of the channel. Applying deflation to specific parts of the saddle point system is important when using solvers such as CRAIG, which operates on individual blocks rather than the whole system. The theory is developed by extending the existing framework for deflating square matrices before applying a Krylov subspace method like MINRES. Numerical experiments confirm the merits of our strategy and lead to interesting questions about using approximate vectors for deflation.
- [392] arXiv:2306.04981 (replaced) [pdf, ps, other]
-
Title: Computation of a Unified Graph-Based Rate Optimization ProblemSubjects: Information Theory (cs.IT)
We define a graph-based rate optimization problem and consider its computation, which provides a unified approach to the computation of various theoretical limits, such as the (conditional) graph entropy, rate-distortion functions and capacity-cost functions with two-sided information. Our contributions are twofold.
On the theoretical side, we simplify the graph-based problem by constructing explicit graph contractions in some special cases. These efforts reduce the number of decision variables in the optimization problem. Graph characterizations for rate-distortion and capacity-cost functions with two-sided information are simplified by specializing the results.
On the computational side, we design an alternating minimization algorithm for the graph-based problem, which deals with the inequality constraint by a flexible multiplier update strategy. Moreover, deflation techniques are introduced, so that the computing time can be largely reduced. Theoretical analysis shows that the algorithm converges to an optimal solution. The accuracy and efficiency of the algorithm are illustrated by numerical experiments. - [393] arXiv:2306.10432 (replaced) [pdf, ps, other]
-
Title: Universal quantification makes automatic structures hard to decideSubjects: Logic in Computer Science (cs.LO)
Automatic structures are structures whose universe and relations can be represented as regular languages. It follows from the standard closure properties of regular languages that the first-order theory of an automatic structure is decidable. While existential quantifiers can be eliminated in linear time by application of a homomorphism, universal quantifiers are commonly eliminated via the identity $\forall{x}. \Phi \equiv \neg (\exists{x}. \neg \Phi)$. If $\Phi$ is represented in the standard way as an NFA, a priori this approach results in a doubly exponential blow-up. However, the recent literature has shown that there are classes of automatic structures for which universal quantifiers can be eliminated by different means without this blow-up by treating them as first-class citizens and not resorting to double complementation. While existing lower bounds for some classes of automatic structures show that a singly exponential blow-up is unavoidable when eliminating a universal quantifier, it is not known whether there may be better approaches that avoid the naïve doubly exponential blow-up, perhaps at least in restricted settings.
In this paper, we answer this question negatively and show that there is a family of NFA representing automatic relations for which the minimal NFA recognising the language after eliminating a single universal quantifier is doubly exponential, and deciding whether this language is empty is \expspace-complete.
The techniques underlying our \expspace lower bound further enable us to establish new lower bounds for some fragments of Büchi arithmetic with a fixed number of quantifier alternations. - [394] arXiv:2306.11177 (replaced) [pdf, ps, other]
-
Title: Pipit: Scripting the analysis of parallel execution tracesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Often times, trace collection tools provide a graphical tool to analyze the trace output. However, these GUI-based tools only support specific file formats, are challenging to scale to large trace sizes, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that can read traces in different file formats (OTF2, HPCToolkit, Projections, Nsight Systems, etc.) and provides a uniform data structure in the form of a pandas DataFrame. Pipit provides operations to aggregate, filter, and transform the events in a trace to present the data in different ways. We also provide several functions to quickly and easily identify performance issues in parallel executions. More importantly, the API is easily extensible to support custom analyses by different end users.
- [395] arXiv:2306.17281 (replaced) [pdf, ps, other]
-
Title: HPC-Coder: Modeling Parallel Programs using Large Language ModelsJournal-ref: ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-12Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software even more burdensome for developers. One way to alleviate some of these burdens is with automated development and analysis tools. Such tools can perform complex and/or remedial tasks for developers that increase their productivity and decrease the chance for error. Until recently, such tools for code development and performance analysis have been limited in the complexity of tasks they can perform, especially for parallel programs. However, with recent advancements in language modeling, and the availability of large amounts of open-source code related data, these tools have started to utilize predictive language models to automate more complex tasks. In this paper, we show how large language models (LLMs) can be applied to tasks specific to high performance and scientific codes. We introduce a new dataset of HPC and scientific codes and use it to fine-tune several pre-trained models. We compare several pre-trained LLMs on HPC-related tasks and introduce a new model, HPC-Coder, fine-tuned on parallel codes. In our experiments, we show that this model can auto-complete HPC functions where generic models cannot, decorate for loops with OpenMP pragmas, and model performance changes in scientific application repositories as well as programming competition solutions.
- [396] arXiv:2307.00831 (replaced) [pdf, ps, other]
-
Title: On the Satisfiability of Local First-Order Logics with DataComments: arXiv admin note: substantial text overlap with arXiv:2209.10309Subjects: Logic in Computer Science (cs.LO)
We study first-order logic over unordered structures whose elements carry a finite number of data values from an infinite domain. Data values can be compared wrt.\ equality. As the satisfiability problem for this logic is undecidable in general, we introduce a family of local fragments. They restrict quantification to the neighbourhood of a given reference point that is bounded by some radius. Our first main result establishes decidability of the satisfiability problem for the local radius-1 fragment in presence of one ``diagonal relation''. On the other hand, extending the radius leads to undecidability. In a second part, we provide the precise decidability and complexity landscape of the satisfiability problem for the existential fragments of local logic, which are parameterized by the number of data values carried by each element and the radius of the considered neighbourhoods. Altogether, we draw a landscape of formalisms that are suitable for the specification of systems with data and open up new avenues for future research.
- [397] arXiv:2307.05147 (replaced) [pdf, ps, html, other]
-
Title: Tests4Py: A Benchmark for System TestingComments: 5 pages, 4 figuresSubjects: Software Engineering (cs.SE)
Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy benchmark, addresses these limitations. It includes 73 bugs from seven real-world Python applications and six bugs from example programs. Each subject in Tests4Py is equipped with an oracle for verifying functional correctness and supports both system and unit test generation. This allows for comprehensive qualitative studies and extensive evaluations, making Tests4Py a cutting-edge benchmark for research in test generation, debugging, and automatic program repair.
- [398] arXiv:2307.06306 (replaced) [pdf, ps, other]
-
Title: Locally Adaptive Federated LearningComments: 29 pages, 9 figuresSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model, without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms, that leverage the local geometric information for each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings. We validate our theoretical claims by performing illustrative experiments for both i.i.d. non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS for non-convex experiments, and come with superior generalization performance.
- [399] arXiv:2307.11465 (replaced) [pdf, ps, html, other]
-
Title: A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing ValuesComments: 23 pages, 3 figuresSubjects: Machine Learning (cs.LG); Applications (stat.AP)
In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
- [400] arXiv:2307.11973 (replaced) [pdf, ps, other]
-
Title: Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
As a fundamental aspect of human life, two-person interactions contain meaningful information about people's activities, relationships, and social settings. Human action recognition serves as the foundation for many smart applications, with a strong focus on personal privacy. However, recognizing two-person interactions poses more challenges due to increased body occlusion and overlap compared to single-person actions. In this paper, we propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition. Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information. To achieve this, we introduce a designed frame selection method named Interval Frame Sampling (IFS), which efficiently samples frames from videos, capturing more discriminative information in a relatively short processing time. Subsequently, a frame features learning module and a two-stream multi-level feature aggregation module extract global and partial features from the sampled frames, effectively representing the local-region spatial information, appearance information, and motion information related to the interactions. Finally, we apply a transformer to perform self-attention on the learned features for the final classification. Extensive experiments are conducted on two large-scale datasets, the interaction subsets of NTU RGB+D 60 and NTU RGB+D 120. The results show that our network outperforms state-of-the-art approaches in most standard evaluation settings.
- [401] arXiv:2308.09381 (replaced) [pdf, ps, other]
-
Title: On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-boxSubjects: Machine Learning (cs.LG)
Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents \methodAbr~(gradient-estimation-based explanation), an approach that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
- [402] arXiv:2309.01408 (replaced) [pdf, ps, other]
-
Title: Leveraging Self-Supervised Vision Transformers for Segmentation-based Transfer Function DesignComments: accepted at TVCG 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
In volume rendering, transfer functions are used to classify structures of interest, and to assign optical properties such as color and opacity. They are commonly defined as 1D or 2D functions that map simple features to these optical properties. As the process of designing a transfer function is typically tedious and unintuitive, several approaches have been proposed for their interactive specification. In this paper, we present a novel method to define transfer functions for volume rendering by leveraging the feature extraction capabilities of self-supervised pre-trained vision transformers. To design a transfer function, users simply select the structures of interest in a slice viewer, and our method automatically selects similar structures based on the high-level features extracted by the neural network. Contrary to previous learning-based transfer function approaches, our method does not require training of models and allows for quick inference, enabling an interactive exploration of the volume data. Our approach reduces the amount of necessary annotations by interactively informing the user about the current classification, so they can focus on annotating the structures of interest that still require annotation. In practice, this allows users to design transfer functions within seconds, instead of minutes. We compare our method to existing learning-based approaches in terms of annotation and compute time, as well as with respect to segmentation accuracy. Our accompanying video showcases the interactivity and effectiveness of our method.
- [403] arXiv:2309.02340 (replaced) [pdf, ps, other]
-
Title: Local Padding in Patch-Based GANs for Seamless Infinite-Sized Texture SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Texture models based on Generative Adversarial Networks (GANs) use zero-padding to implicitly encode positional information of the image features. However, when extending the spatial input to generate images at large sizes, zero-padding can often lead to degradation of quality due to the incorrect positional information at the center of the image and limit the diversity within the generated images. In this paper, we propose a novel approach for generating stochastic texture images at large arbitrary sizes using GANs model that is based on patch-by-patch generation. Instead of zero-padding, the model uses \textit{local padding} in the generator that shares border features between the generated patches; providing positional context and ensuring consistency at the boundaries. The proposed models are trainable on a single texture image and have a constant GPU scalability with respect to the output image size, and hence can generate images of infinite sizes. We show in the experiments that our method has a significant advancement beyond existing texture models in terms of the quality and diversity of the generated textures. Furthermore, the implementation of local padding in the state-of-the-art super-resolution models effectively eliminates tiling artifacts enabling large-scale super-resolution. Our code is available at \url{this https URL
- [404] arXiv:2309.05642 (replaced) [pdf, ps, other]
-
Title: On the Potential and Limitations of Proxy Voting: Delegation with Incomplete VotesGeorgios Amanatidis, Aris Filos-Ratsikas, Philip Lazos, Evangelos Markakis, Georgios PapasotiropoulosSubjects: Computer Science and Game Theory (cs.GT)
We study elections where voters are faced with the challenge of expressing preferences over an extreme number of issues under consideration. This is largely motivated by emerging blockchain governance systems, which include voters with different weights and a massive number of community generated proposals. In such scenarios, it is natural to expect that voters will have incomplete preferences, as they may only be able to evaluate or be confident about a very small proportion of the alternatives. As a result, the election outcome may be significantly affected, leading to suboptimal decisions. Our central inquiry revolves around whether delegation of ballots to proxies possessing greater expertise or a more comprehensive understanding of the voters' preferences can lead to outcomes with higher legitimacy and enhanced voters' satisfaction in elections where voters submit incomplete preferences. To explore its aspects, we introduce the following model: potential proxies advertise their ballots over multiple issues, and each voter either delegates to a seemingly attractive proxy or casts a ballot directly. We identify necessary and sufficient conditions that could lead to a socially better outcome by leveraging the participation of proxies. We accompany our theoretical findings with experiments on instances derived from real datasets. Overall, our results enhance the understanding of the power of delegation towards improving election outcomes.
- [405] arXiv:2309.05950 (replaced) [pdf, ps, other]
-
Title: Language Models as Black-Box Optimizers for Vision-Language ModelsComments: Published at CVPR 2024. Project site: this https URLSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data. However, many VLMs rely on proprietary data and are not open-source, which restricts the use of white-box approaches for fine-tuning. As such, we aim to develop a black-box approach to optimize VLMs through natural language prompts, thereby avoiding the need to access model parameters, feature embeddings, or even output logits. We propose employing chat-based LLMs to search for the best text prompt for VLMs. Specifically, we adopt an automatic hill-climbing procedure that converges to an effective prompt by evaluating the performance of current prompts and asking LLMs to refine them based on textual feedback, all within a conversational process without human-in-the-loop. In a challenging 1-shot image classification setup, our simple approach surpasses the white-box continuous prompting method (CoOp) by an average of 1.5% across 11 datasets including ImageNet. Our approach also outperforms both human-engineered and LLM-generated prompts. We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search. In addition, we find that the text prompts generated through our strategy are not only more interpretable but also transfer well across different VLM architectures in a black-box manner. Lastly, we apply our framework to optimize the state-of-the-art black-box VLM (DALL-E 3) for text-to-image generation, prompt inversion, and personalization.
- [406] arXiv:2309.16668 (replaced) [pdf, ps, other]
-
Title: RealFill: Reference-Driven Generation for Authentic Image CompletionLuming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael RubinsteinComments: SIGGRAPH 2024 (Journal Track). Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions. However, the content these models hallucinate is necessarily inauthentic, since they are unaware of the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. Project page: this https URL
- [407] arXiv:2310.00184 (replaced) [pdf, ps, html, other]
-
Title: NASU -- Novel Actuating Screw Unit: Origami-inspired Screw-based Propulsion on Mobile Ground RobotsCalvin Joyce, Jason Lim, Roger Nguyen, Michael Owens, Sara Wickenhiser, Elizabeth Peiros, Florian Richter, Michael C. YipComments: 7 pages, 8 Figures, submitted to IROS 2024Subjects: Robotics (cs.RO)
Screw-based locomotion is a robust method of locomotion across a wide range of media including water, sand, and gravel. A challenge with screws is their significant number of impactful design parameters that affect locomotion performance. One crucial parameter is the angle of attack (also called the lead angle), which has been shown to significantly impact the performance of screw propellers in terms of traveling velocity, force produced, degree of slip, and sinkage. As a result, the optimal design choice may vary significantly depending on application and mission objectives. In this work, we present the Novel Actuating Screw Unit (NASU). It is the first screw-based propulsion design that enables dynamic reconfiguration of the angle of attack for optimized locomotion across multiple media and use cases. The design is inspired by the kresling unit, a mechanism from origami robotics, and the angle of attack is adjusted with a linear actuator. In contrast, the entire unit is spun on its axis to generate propulsion. NASU is integrated into a mobile test bed and experiments are conducted in various media including gravel, grass, and sand. Our experiment results indicate a trade-off between locomotive efficiency and velocity exists regarding angle of attack, and the proposed design is a promising direction for reconfigurable screws by allowing control to optimize for efficiency or velocity.
- [408] arXiv:2310.00434 (replaced) [pdf, ps, html, other]
-
Title: DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion ModelsComments: SIGGRAPH 2024 (Journal Track). Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
The generation of stylistic 3D facial animations driven by speech presents a significant challenge as it requires learning a many-to-many mapping between speech, style, and the corresponding natural facial motion. However, existing methods either employ a deterministic model for speech-to-motion mapping or encode the style using a one-hot encoding scheme. Notably, the one-hot encoding approach fails to capture the complexity of the style and thus limits generalization ability. In this paper, we propose DiffPoseTalk, a generative framework based on the diffusion model combined with a style encoder that extracts style embeddings from short reference videos. During inference, we employ classifier-free guidance to guide the generation process based on the speech and style. In particular, our style includes the generation of head poses, thereby enhancing user perception. Additionally, we address the shortage of scanned 3D talking face data by training our model on reconstructed 3DMM parameters from a high-quality, in-the-wild audio-visual dataset. Extensive experiments and user study demonstrate that our approach outperforms state-of-the-art methods. The code and dataset are at this https URL .
- [409] arXiv:2310.02700 (replaced) [pdf, ps, html, other]
-
Title: Insights of using Control Theory for minimizing Induced Seismicity in Underground ReservoirsSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Deep Geothermal Energy, Carbon Capture, and Storage and Hydrogen Storage have significant potential to meet the large-scale needs of the energy sector and reduce the CO$_2$ emissions. However, the injection of fluids into the earth's crust, upon which these activities rely, can lead to the formation of new seismogenic faults or the reactivation of existing ones, thereby causing earthquakes. In this study, we propose a novel approach based on control theory to address this issue. First, we obtain a simplified model of induced seismicity due to fluid injections in an underground reservoir using a diffusion equation in three dimensions. Then, we design a robust tracking control approach to force the seismicity rate to follow desired references. In this way, the induced seismicity is minimized while ensuring fluid circulation for the needs of renewable energy production and storage. The designed control guarantees the achievement of the control objectives even in the presence of system uncertainties and unknown dynamics. Finally, we present simulations of a simplified geothermal reservoir under different scenarios of energy demand to show the reliability and performance of the control approach, opening new perspectives for field experiments based on real-time regulators.
- [410] arXiv:2310.05393 (replaced) [pdf, ps, other]
-
Title: Hierarchical Side-Tuning for Vision TransformersComments: 10 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks. Yet, the demand for individualized and comprehensive fine-tuning processes for each task entails substantial computational and memory costs, posing a considerable challenge. Recent advancements in Parameter-Efficient Transfer Learning (PETL) have shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning. However, their effectiveness is primarily observed in simple tasks like image classification, while they encounter challenges with more complex vision tasks like dense prediction. To address this gap, this study aims to identify an effective tuning method that caters to a wider range of visual tasks. In this paper, we introduce Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks. Diverging from existing methods that focus solely on fine-tuning parameters within specific input spaces or modules, HST employs a lightweight Hierarchical Side Network (HSN). This network leverages intermediate activations from the ViT backbone to model multi-scale features, enhancing prediction capabilities. To evaluate HST, we conducted comprehensive experiments across a range of visual tasks, including classification, object detection, instance segmentation, and semantic segmentation. Remarkably, HST achieved state-of-the-art performance in 13 out of the 19 tasks on the VTAB-1K benchmark, with the highest average Top-1 accuracy of 76.1%, while fine-tuning a mere 0.78M parameters. When applied to object detection and semantic segmentation tasks on the COCO and ADE20K testdev benchmarks, HST outperformed existing PETL methods and even surpassed full fine-tuning.
- [411] arXiv:2310.09202 (replaced) [pdf, ps, html, other]
-
Title: Graph Distillation with Eigenbasis MatchingComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG)
The increasing amount of graph data places requirements on the efficient training of graph neural networks (GNNs). The emerging graph distillation (GD) tackles this challenge by distilling a small synthetic graph to replace the real large graph, ensuring GNNs trained on real and synthetic graphs exhibit comparable performance. However, existing methods rely on GNN-related information as supervision, including gradients, representations, and trajectories, which have two limitations. First, GNNs can affect the spectrum (i.e., eigenvalues) of the real graph, causing spectrum bias in the synthetic graph. Second, the variety of GNN architectures leads to the creation of different synthetic graphs, requiring traversal to obtain optimal performance. To tackle these issues, we propose Graph Distillation with Eigenbasis Matching (GDEM), which aligns the eigenbasis and node features of real and synthetic graphs. Meanwhile, it directly replicates the spectrum of the real graph and thus prevents the influence of GNNs. Moreover, we design a discrimination constraint to balance the effectiveness and generalization of GDEM. Theoretically, the synthetic graphs distilled by GDEM are restricted spectral approximations of the real graphs. Extensive experiments demonstrate that GDEM outperforms state-of-the-art GD methods with powerful cross-architecture generalization ability and significant distillation efficiency. Our code is available at this https URL.
- [412] arXiv:2310.11843 (replaced) [pdf, ps, other]
-
Title: Do We Run Large-scale Multi-Robot Systems on the Edge? More Evidence for Two-Phase Performance in System Size ScalingComments: Submitted to the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
With increasing numbers of mobile robots arriving in real-world applications, more robots coexist in the same space, interact, and possibly collaborate. Methods to provide such systems with system size scalability are known, for example, from swarm robotics. Example strategies are self-organizing behavior, a strict decentralized approach, and limiting the robot-robot communication. Despite applying such strategies, any multi-robot system breaks above a certain critical system size (i.e., number of robots) as too many robots share a resource (e.g., space, communication channel). We provide additional evidence based on simulations, that at these critical system sizes, the system performance separates into two phases: nearly optimal and minimal performance. We speculate that in real-world applications that are configured for optimal system size, the supposedly high-performing system may actually live on borrowed time as it is on a transient to breakdown. We provide two modeling options (based on queueing theory and a population model) that may help to support this reasoning.
- [413] arXiv:2310.12350 (replaced) [pdf, ps, html, other]
-
Title: Equipping Federated Graph Neural Networks with Structure-aware Group FairnessSubjects: Machine Learning (cs.LG)
Graph Neural Networks (GNNs) have been widely used for various types of graph data processing and analytical tasks in different domains. Training GNNs over centralized graph data can be infeasible due to privacy concerns and regulatory restrictions. Thus, federated learning (FL) becomes a trending solution to address this challenge in a distributed learning paradigm. However, as GNNs may inherit historical bias from training data and lead to discriminatory predictions, the bias of local models can be easily propagated to the global model in distributed settings. This poses a new challenge in mitigating bias in federated GNNs. To address this challenge, we propose $\text{F}^2$GNN, a Fair Federated Graph Neural Network, that enhances group fairness of federated GNNs. As bias can be sourced from both data and learning algorithms, $\text{F}^2$GNN aims to mitigate both types of bias under federated settings. First, we provide theoretical insights on the connection between data bias in a training graph and statistical fairness metrics of the trained GNN models. Based on the theoretical analysis, we design $\text{F}^2$GNN which contains two key components: a fairness-aware local model update scheme that enhances group fairness of the local models on the client side, and a fairness-weighted global model update scheme that takes both data bias and fairness metrics of local models into consideration in the aggregation process. We evaluate $\text{F}^2$GNN empirically versus a number of baseline methods, and demonstrate that $\text{F}^2$GNN outperforms these baselines in terms of both fairness and model accuracy.
- [414] arXiv:2310.13206 (replaced) [pdf, ps, html, other]
-
Title: Primacy Effect of ChatGPTComments: EMNLP 2023 short paperSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Instruction-tuned large language models (LLMs), such as ChatGPT, have led to promising zero-shot performance in discriminative natural language understanding (NLU) tasks. This involves querying the LLM using a prompt containing the question, and the candidate labels to choose from. The question-answering capabilities of ChatGPT arise from its pre-training on large amounts of human-written text, as well as its subsequent fine-tuning on human preferences, which motivates us to ask: Does ChatGPT also inherits humans' cognitive biases? In this paper, we study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer. We have two main findings: i) ChatGPT's decision is sensitive to the order of labels in the prompt; ii) ChatGPT has a clearly higher chance to select the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions. We release the source code at this https URL.
- [415] arXiv:2310.14558 (replaced) [pdf, ps, html, other]
-
Title: AlpaCare:Instruction-tuned Large Language Models for Medical ApplicationSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Instruction-finetuning (IFT) has become crucial in aligning Large Language Models (LLMs) with diverse human needs and has shown great potential in medical applications. However, previous studies mainly fine-tune LLMs on biomedical datasets with limited diversity, which often rely on benchmarks or narrow task scopes, and hence significantly limit the effectiveness on their medical instruction-following ability and generalizability. To bridge this gap, we propose creating a diverse, machine-generated medical IFT dataset, MedInstruct-52k, using GPT-4 and ChatGPT with a high-quality expert-curated seed set. We then fine-tune LLaMA-series models on the dataset to develop AlpaCare. Despite using a smaller domain-specific dataset than previous medical LLMs, AlpaCare not only demonstrates superior performance on medical applications, with up to 38.1% absolute gain over best baselines in medical free-form instruction evaluations, but also achieves 6.7% absolute gains averaged over multiple general domain benchmarks. Human evaluation further shows that AlpaCare consistently outperforms best baselines in terms of both correctness and helpfulness. We offer public access to our data, model, and codebase in this https URL.
- [416] arXiv:2310.14692 (replaced) [pdf, ps, html, other]
-
Title: On Partial Shape Correspondence and Functional MapsSubjects: Computer Vision and Pattern Recognition (cs.CV)
While dealing with matching shapes to their parts, we often apply a tool known as functional maps. The idea is to translate the shape matching problem into ``convenient'' spaces by which matching is performed algebraically by solving a least squares problem. Here, we argue that such formulations, though popular in this field, introduce errors in the estimated match when partiality is invoked. Such errors are unavoidable even for advanced feature extraction networks, and they can be shown to escalate with increasing degrees of shape partiality, adversely affecting the learning capability of such systems. To circumvent these limitations, we propose a novel approach for partial shape matching. Our study of functional maps led us to a novel method that establishes direct correspondence between partial and full shapes through feature matching bypassing the need for functional map intermediate spaces. The Gromov distance between metric spaces leads to the construction of the first part of our loss functions. For regularization we use two options: a term based on the area preserving property of the mapping, and a relaxed version that avoids the need to resort to functional maps. The proposed approach shows superior performance on the SHREC'16 dataset, outperforming existing unsupervised methods for partial shape matching. Notably, it achieves state-of-the-art results on the SHREC'16 HOLES benchmark, superior also compared to supervised methods. We demonstrate the benefits of the proposed unsupervised method when applied to a new dataset PFAUST for part-to-full shape correspondence
- [417] arXiv:2310.15023 (replaced) [pdf, ps, html, other]
-
Title: SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging SonarsSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.
- [418] arXiv:2310.16050 (replaced) [pdf, ps, other]
-
Title: EquivAct: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object ManipulationComments: ICRA 2024; The first two authors contributed equallySubjects: Robotics (cs.RO)
If a robot masters folding a kitchen towel, we would expect it to master folding a large beach towel. However, existing policy learning methods that rely on data augmentation still don't guarantee such generalization. Our insight is to add equivariance to both the visual object representation and policy architecture. We propose EquivAct which utilizes SIM(3)-equivariant network structures that guarantee generalization across all possible object translations, 3D rotations, and scales by construction. EquivAct is trained in two phases. We first pre-train a SIM(3)-equivariant visual representation on simulated scene point clouds. Then, we learn a SIM(3)-equivariant visuomotor policy using a small amount of source task demonstrations. We show that the learned policy directly transfers to objects that substantially differ from demonstrations in scale, position, and orientation. We evaluate our method in three manipulation tasks involving deformable and articulated objects, going beyond typical rigid object manipulation tasks considered in prior work. We conduct experiments both in simulation and in reality. For real robot experiments, our method uses 20 human demonstrations of a tabletop task and transfers zero-shot to a mobile manipulation task in a much larger setup. Experiments confirm that our contrastive pre-training procedure and equivariant architecture offer significant improvements over prior work. Project website: this https URL
- [419] arXiv:2310.17813 (replaced) [pdf, ps, html, other]
-
Title: A Spectral Condition for Feature LearningSubjects: Machine Learning (cs.LG)
The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks.
- [420] arXiv:2310.19561 (replaced) [pdf, ps, other]
-
Title: Non-parametric regression for robot learning on manifoldsComments: 17 pages, 15 figures; added quantitative comparisons with baselines in the experiments Section; modified introduction; fixed typos; added Appendixes B and C; reordered sections for better understanding; changed the Section on adaptation of trajectoriesSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms.
- [421] arXiv:2311.02495 (replaced) [pdf, ps, other]
-
Title: Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural NetworksComments: 24 pages, 4 figuresJournal-ref: Scientific Reports, 14(1):10543, 2024Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
With the increased use of data-driven approaches and machine learning-based methods in material science, the importance of reliable uncertainty quantification (UQ) of the predicted variables for informed decision-making cannot be overstated. UQ in material property prediction poses unique challenges, including the multi-scale and multi-physics nature of advanced materials, intricate interactions between numerous factors, limited availability of large curated datasets for model training, etc. Recently, Bayesian Neural Networks (BNNs) have emerged as a promising approach for UQ, offering a probabilistic framework for capturing uncertainties within neural networks. In this work, we introduce an approach for UQ within physics-informed BNNs, which integrates knowledge from governing laws in material modeling to guide the models toward physically consistent predictions. To evaluate the effectiveness of this approach, we present case studies for predicting the creep rupture life of steel alloys. Experimental validation with three datasets of collected measurements from creep tests demonstrates the ability of BNNs to produce accurate point and uncertainty estimates that are competitive or exceed the performance of the conventional method of Gaussian Process Regression. Similarly, we evaluated the suitability of BNNs for UQ in an active learning application and reported competitive performance. The most promising framework for creep life prediction is BNNs based on Markov Chain Monte Carlo approximation of the posterior distribution of network parameters, as it provided more reliable results in comparison to BNNs based on variational inference approximation or related NNs with probabilistic outputs. The codes are available at: this https URL.
- [422] arXiv:2311.02546 (replaced) [pdf, ps, html, other]
-
Title: On the Second-Order Convergence of Biased Policy Gradient AlgorithmsSubjects: Machine Learning (cs.LG)
Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points. Existing results only consider vanilla policy gradient algorithms with unbiased gradient estimators, but practical implementations under the infinite-horizon discounted reward setting are biased due to finite-horizon sampling. Moreover, actor-critic methods, whose second-order convergence has not yet been established, are also biased due to the critic approximation of the value function. We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution.
- [423] arXiv:2311.03976 (replaced) [pdf, ps, other]
-
Title: Towards Generalised Pre-Training of Graph ModelsComments: 23 pages, 5 figures, 11 tables. For in-development code see this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The principal benefit of unsupervised representation learning is that a pre-trained model can be fine-tuned where data or labels are scarce. Existing approaches for graph representation learning are domain specific, maintaining consistent node and edge features across the pre-training and target datasets. This has precluded transfer to multiple domains.
In this work we present Topology Only Pre-Training, a graph pre-training method based on node and edge feature exclusion. Separating graph learning into two stages, topology and features, we use contrastive learning to pre-train models over multiple domains. These models show positive transfer on evaluation datasets from multiple domains, including domains not present in pre-training data. On 75% of experiments, ToP models perform significantly ($P \leq 0.01$) better than a supervised baseline. These results include when node and edge features are used in evaluation, where performance is significantly better on 85.7% of tasks compared to single-domain or non-pre-trained models. We further show that out-of-domain topologies can produce more useful pre-training than in-domain. We show better transfer from non-molecule pre-training, compared to molecule pre-training, on 79% of molecular benchmarks. - [424] arXiv:2311.08707 (replaced) [pdf, ps, other]
-
Title: K-BMPC: Derivative-based Koopman Bilinear Model Predictive Control for Tractor-Trailer Trajectory Tracking with Unknown ParametersSubjects: Systems and Control (eess.SY)
Nonlinear dynamics bring difficulties to controller design for control-affine systems such as tractor-trailer vehicles, especially when the parameters in the dynamics are unknown. To address this constraint, we propose a derivative-based lifting function construction method, show that the corresponding infinite dimensional Koopman bilinear model over the lifting function is equivalent to the original control-affine system. Further, we analyze the propagation and bounds of state prediction errors caused by the truncation in derivative order. The identified finite dimensional Koopman bilinear model would serve as predictive model in the next step. Koopman Bilinear Model Predictive control (K-BMPC) is proposed to solve the trajectory tracking problem. We linearize the bilinear model around the estimation of the lifted state and control input. Then the bilinear Model Predictive Control problem is approximated by a quadratic programming problem. Further, the estimation is updated at each iteration until the convergence is reached. Moreover, we implement our algorithm on a tractor-trailer system, taking into account the longitudinal and side slip effects. The open-loop simulation shows the proposed Koopman bilinear model captures the dynamics with unknown parameters and has good prediction performance. Closed-loop tracking results show the proposed K-BMPC exhibits elevated tracking precision with the commendable computational efficiency. The experimental results demonstrate the feasibility of K-BMPC.
- [425] arXiv:2311.12981 (replaced) [pdf, ps, other]
-
Title: SD-NAE: Generating Natural Adversarial Examples with Stable DiffusionComments: Accepted by ICLR 2024 TinyPapersSubjects: Computer Vision and Pattern Recognition (cs.CV)
Natural Adversarial Examples (NAEs), images arising naturally from the environment and capable of deceiving classifiers, are instrumental in robustly evaluating and identifying vulnerabilities in trained models. In this work, unlike prior works that passively collect NAEs from real images, we propose to actively synthesize NAEs using the state-of-the-art Stable Diffusion. Specifically, our method formulates a controlled optimization process, where we perturb the token embedding that corresponds to a specified class to generate NAEs. This generation process is guided by the gradient of loss from the target classifier, ensuring that the created image closely mimics the ground-truth class yet fools the classifier. Named SD-NAE (Stable Diffusion for Natural Adversarial Examples), our innovative method is effective in producing valid and useful NAEs, which is demonstrated through a meticulously designed experiment. Code is available at this https URL.
- [426] arXiv:2311.13080 (replaced) [pdf, ps, other]
-
Title: High-Speed Voltage Control in Active Distribution Systems with Smart Inverter Coordination and Deep Reinforcement LearningSubjects: Systems and Control (eess.SY)
The increasing penetration of renewable energy resources in distribution systems necessitates high-speed monitoring and control of voltage for ensuring reliable system operation. However, existing voltage control algorithms often make simplifying assumptions in their formulation, such as real-time availability of smart meter measurements (for monitoring), or real-time knowledge of every power injection information(for control).This paper leverages the recent advances made in highspeed state estimation for real-time unobservable distribution systems to formulate a deep reinforcement learning-based control algorithm that utilizes the state estimates alone to control the voltage of the entire system. The results obtained for a modified (renewable-rich) IEEE34-nodedistributionfeeder indicate that the proposed approach excels in monitoring and controlling voltage of active distribution systems.
- [427] arXiv:2311.14201 (replaced) [pdf, ps, other]
-
Title: On the convergence of adaptive approximations for stochastic differential equationsComments: 33 pages, 4 figuresSubjects: Numerical Analysis (math.NA); Probability (math.PR)
In this paper, we study numerical approximations for stochastic differential equations (SDEs) that use adaptive step sizes. In particular, we consider a general setting where decisions to reduce step sizes are allowed to depend on the future trajectory of the underlying Brownian motion. Since these adaptive step sizes may not be previsible, the standard mean squared error analysis cannot be directly applied to show that the numerical method converges to the solution of the SDE. Building upon the pioneering work of Gaines and Lyons, we shall instead use rough path theory to establish convergence for a wide class of adaptive numerical methods on general Stratonovich SDEs (with sufficiently smooth vector fields). To our knowledge, this is the first convergence guarantee that applies to standard solvers, such as the Milstein and Heun methods, with non-previsible step sizes. In our analysis, we require the sequence of adaptive step sizes to be nested and the SDE solver to have unbiased "Lévy area" terms in its Taylor expansion. We conjecture that for adaptive SDE solvers more generally, convergence is still possible provided the method does not introduce "Lévy area bias". We present a simple example where the step size control can skip over previously considered times, resulting in the numerical method converging to an incorrect limit (i.e. not the Stratonovich SDE). Finally, we conclude with an experiment demonstrating the accuracy of Heun's method and a newly introduced Splitting Path-based Runge-Kutta scheme (SPaRK) when used with adaptive step sizes.
- [428] arXiv:2311.14646 (replaced) [pdf, ps, html, other]
-
Title: More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is ObligatoryComments: Appeared in ICLR 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained.
Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by powerlaw eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models. - [429] arXiv:2311.14650 (replaced) [pdf, ps, html, other]
-
Title: GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formatsComments: 10 pages, 8 figures, 1 tableSubjects: Performance (cs.PF)
Efficient IO techniques are crucial in high-performance graph processing frameworks like Gunrock and Hornet, as fast graph loading can help minimize processing time and reduce system/cloud usage charges. This research study presents approaches for efficiently reading an Edgelist from a text file and converting it to a Compressed Sparse Row (CSR) representation. On a server with dual 16-core Intel Xeon Gold 6226R processors and MegaRAID SAS-3 storage, our approach, which we term as GVEL, outperforms Hornet, Gunrock, and PIGO by significant margins in CSR reading, exhibiting an average speedup of 78x, 112x, and 1.8x, respectively. For Edgelist reading, GVEL is 2.6x faster than PIGO on average, and achieves a Edgelist read rate of 1.9 billion edges/s. For every doubling of threads, GVEL improves performance at an average rate of 1.9x and 1.7x for reading Edgelist and reading CSR respectively.
- [430] arXiv:2311.17330 (replaced) [pdf, ps, other]
-
Title: Biomedical knowledge graph-optimized prompt generation for large language modelsKarthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson, Sharat Israni, Charlotte A Nelson, Sui Huang, Sergio E BaranziniComments: 29 pages, 5 figures, 1 table, 1 supplementary fileSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, requiring further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion.
- [431] arXiv:2311.17331 (replaced) [pdf, ps, other]
-
Title: Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question AnsweringComments: 16 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, several methods have been proposed to augment large Vision Language Models (VLMs) for Visual Question Answering (VQA) simplicity by incorporating external knowledge from knowledge bases or visual clues derived from question decomposition. Although having achieved promising results, these methods still suffer from the challenge that VLMs cannot inherently understand the incorporated knowledge and might fail to generate the optimal answers. Contrarily, human cognition engages visual questions through a top-down reasoning process, systematically exploring relevant issues to derive a comprehensive answer. This not only facilitates an accurate answer but also provides a transparent rationale for the decision-making pathway. Motivated by this cognitive mechanism, we introduce a novel, explainable multi-agent collaboration framework designed to imitate human-like top-down reasoning by leveraging the expansive knowledge of Large Language Models (LLMs). Our framework comprises three agents, i.e., Responder, Seeker, and Integrator, each contributing uniquely to the top-down reasoning process. The VLM-based Responder generates the answer candidates for the question and gives responses to other issues. The Seeker, primarily based on LLM, identifies relevant issues related to the question to inform the Responder and constructs a Multi-View Knowledge Base (MVKB) for the given visual scene by leveraging the understanding capabilities of LLM. The Integrator agent combines information from the Seeker and the Responder to produce the final VQA answer. Through this collaboration mechanism, our framework explicitly constructs an MVKB for a specific visual scene and reasons answers in a top-down reasoning process. Extensive and comprehensive evaluations on diverse VQA datasets and VLMs demonstrate the superior applicability and interpretability of our framework over the existing compared methods.
- [432] arXiv:2312.00172 (replaced) [pdf, ps, html, other]
-
Title: Projected exponential methods for stiff dynamical low-rank approximation problemsComments: 39 pages, 9 figuresSubjects: Numerical Analysis (math.NA)
The numerical integration of stiff equations is a challenging problem that needs to be approached by specialized numerical methods. Exponential integrators form a popular class of such methods since they are provably robust to stiffness and have been successfully applied to a variety of problems. The dynamical low- \rank approximation is a recent technique for solving high-dimensional differential equations by means of low-rank approximations. However, the domain is lacking numerical methods for stiff equations since existing methods are either not robust-to-stiffness or have unreasonably large hidden constants. In this paper, we focus on solving large-scale stiff matrix differential equations with a Sylvester-like structure, that admit good low-rank approximations. We propose two new methods that have good convergence properties, small memory footprint and that are fast to compute. The theoretical analysis shows that the new methods have order one and two, respectively. We also propose a practical implementation based on Krylov techniques. The approximation error is analyzed, leading to a priori error bounds and, therefore, a mean for choosing the size of the Krylov space. Numerical experiments are performed on several examples, confirming the theory and showing good speedup in comparison to existing techniques.
- [433] arXiv:2312.01991 (replaced) [pdf, ps, html, other]
-
Title: Information Modified K-Nearest NeighborMohammad Ali Vahedifar, Azim Akhtarshenas, Maryam Sabbaghian, Mohammad Mohammadi Rafatpanah, Ramin ToosiSubjects: Machine Learning (cs.LG); Information Theory (cs.IT)
The fundamental concept underlying K-Nearest Neighbors (KNN) is the classification of samples based on the majority through their nearest neighbors. Although distance and neighbors' labels are critical in KNN, traditional KNN treats all samples equally. However, some KNN variants weigh neighbors differently based on a specific rule, considering each neighbor's distance and label. Many KNN methodologies introduce complex algorithms that do not significantly outperform the traditional KNN, often leading to less satisfactory outcomes. The gap in reliably extracting information for accurately predicting true weights remains an open research challenge. In our proposed method, information-modified KNN (IMKNN), we bridge the gap by presenting a straightforward algorithm that achieves effective results. To this end, we introduce a classification method to improve the performance of the KNN algorithm. By exploiting mutual information (MI) and incorporating ideas from Shapley's values, we improve the traditional KNN performance in accuracy, precision, and recall, offering a more refined and effective solution.
To evaluate the effectiveness of our method, it is compared with eight variants of KNN. We conduct experiments on 12 widely-used datasets, achieving 11.05\%, 12.42\%, and 12.07\% in accuracy, precision, and recall performance, respectively, compared to traditional KNN. Additionally, we compared IMKNN with traditional KNN across four large-scale datasets to highlight the distinct advantages of IMKNN in the impact of monotonicity, noise, density, subclusters, and skewed distributions. Our research indicates that IMKNN consistently surpasses other methods in diverse datasets. - [434] arXiv:2312.05601 (replaced) [pdf, ps, other]
-
Title: A Meshless Solver for Blood Flow Simulations in Elastic Vessels Using Physics-Informed Neural NetworkSubjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
Investigating blood flow in the cardiovascular system is crucial for assessing cardiovascular health. Computational approaches offer some non-invasive alternatives to measure blood flow dynamics. Numerical simulations based on traditional methods such as finite-element and other numerical discretizations have been extensively studied and have yielded excellent results. However, adapting these methods to real-life simulations remains a complex task. In this paper, we propose a method that offers flexibility and can efficiently handle real-life simulations. We suggest utilizing the physics-informed neural network (PINN) to solve the Navier-Stokes equation in a deformable domain, specifically addressing the simulation of blood flow in elastic vessels. Our approach models blood flow using an incompressible, viscous Navier-Stokes equation in an Arbitrary Lagrangian-Eulerian form. The mechanical model for the vessel wall structure is formulated by an equation of Newton's second law of momentum and linear elasticity to the force exerted by the fluid flow. Our method is a mesh-free approach that eliminates the need for discretization and meshing of the computational domain. This makes it highly efficient in solving simulations involving complex geometries. Additionally, with the availability of well-developed open-source machine learning framework packages and parallel modules, our method can easily be accelerated through GPU computing and parallel computing. To evaluate our approach, we conducted experiments on regular cylinder vessels as well as vessels with plaque on their walls. We compared our results to a solution calculated by Finite Element Methods using a dense grid and small time steps, which we considered as the ground truth solution. We report the relative error and the time consumed to solve the problem, highlighting the advantages of our method.
- [435] arXiv:2312.06399 (replaced) [pdf, ps, other]
-
Title: Who Are Tweeting About Academic Publications? A Cochrane Systematic Review and Meta-Analysis of Altmetric StudiesSubjects: Digital Libraries (cs.DL)
Previous studies have developed different categorizations of Twitter users who interact with scientific publications online, reflecting the difficulty in creating a unified approach. Using Cochrane Review meta-analysis to analyse earlier research (including 79,014 Twitter users, over twenty million tweets, and over five million tweeted publications from 23 studies), we created a consolidated robust categorization consisting of 11 user categories, at different dimensions, covering most of any future needs for user categorizations on Twitter and possibly also other social media platforms. Our findings showed, with moderate certainty, covering all the earlier different approaches employed, that the predominant group of Twitter was individual users (66%), being responsible for the majority of tweets (55%) and tweeted publications (50%), while organizations (22%, 27%, and 28%, respectively) and science communicators (16%, 13%, and 30%) clearly contributed to a lesser degree. These individual users consisted of both academic individuals (33%) and other individuals (28%). While academic individuals shared more academic publications than other individuals (42% vs. 31%), they posted fewer tweets overall (22% vs. 30%), but these differences do not reach statistical significance. Despite significant heterogeneity arising from variations in earlier categorizations, the findings consistently indicate the importance of academics in disseminating academic publications on Twitter.
- [436] arXiv:2312.06446 (replaced) [pdf, ps, other]
-
Title: Measurement-driven neural-network training for integrated magnetic tunnel junction arraysWilliam A. Borders, Advait Madhavan, Matthew W. Daniels, Vasileia Georgiou, Martin Lueker-Boden, Tiffany S. Santos, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland, Brian D. HoskinsComments: 17 pages, 9 figuresSubjects: Emerging Technologies (cs.ET); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Applied Physics (physics.app-ph)
The increasing scale of neural networks needed to support more complex applications has led to an increasing requirement for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. An inevitability of transferring neural networks onto hardware is that non-idealities such as device-to-device variations or poor device yield impact performance. Methods such as hardware-aware training, where substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of 20,000 magnetic tunnel junction arrays integrated on a complementary metal-oxide-semiconductor chips that closely resembles market-ready spin transfer-torque magnetoresistive random access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects and show that, at the cost of generality, hardware-aware training accounting for specific defects on each die can recover to comparable performance with ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions can achieve a mean misclassification error on the MNIST dataset that differs from the software-baseline by only 2 %. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.
- [437] arXiv:2312.06472 (replaced) [pdf, ps, other]
-
Title: Dissipativity-Based Decentralized Co-Design of Distributed Controllers and Communication Topologies for Vehicular PlatoonsComments: 16 pages, 14 figures, one manuscript has been submitted to AutomaticaSubjects: Systems and Control (eess.SY)
Vehicular platoons provide an appealing option for future transportation systems. Most of the existing work on platoons separated the design of the controller and its communication topologies. However, it is beneficial to design both the platooning controller and the communication topology simultaneously, i.e., controller and topology co-design, especially in the cases of platoon splitting and merging. We are, therefore, motivated to propose a co-design framework for vehicular platoons that maintains both the compositionality of the controller and the string stability of the platoon, which enables the merging and splitting of the vehicles in a platoon. To this end, we first formulate the co-design problem as a centralized linear matrix inequality (LMI) problem and then decompose it using Sylvester's criterion to obtain a set of smaller decentralized LMI problems that can be solved sequentially at individual vehicles in the platoon. Moreover, in the formulated decentralized LMI problems, we encode a specifically derived local LMI to enforce the $L_2$ stability of the closed-loop platooning system, further implying the $L_2$ weak string stability of the vehicular platoon. Finally, to validate the proposed co-design method and its features in terms of merging/splitting, we provide an extensive collection of simulation results generated from a specifically developed simulation framework. Available in GitHub: this http URL that we have made publicly available.
- [438] arXiv:2312.08718 (replaced) [pdf, ps, other]
-
Title: Trajectory Planning and Tracking of Hybrid Flying-Crawling QuadrotorsSubjects: Robotics (cs.RO)
Hybrid Flying-Crawling Quadrotors (HyFCQs) are transformable robots with the ability of terrestrial and aerial hybrid motion. This article presents a trajectory planning and tracking framework designed for HyFCQs. In this framework, a terrestrial-aerial path-searching method with the crawling limitation of HyFCQs is proposed to guarantee the dynamical feasibility of trajectories. Additionally, a trajectory tracking method is proposed to address the challenges associated with the deformation time required by HyFCQs, which makes tracking hybrid trajectories at the junction between terrestrial and aerial segments difficult. Simulations and real-world experiments in diverse scenarios validate the exceptional performance of the proposed approach.
- [439] arXiv:2312.10225 (replaced) [pdf, ps, html, other]
-
Title: Customizing Large Language Models for Business Context: Framework and ExperimentsSubjects: Computers and Society (cs.CY)
The advent of Large Language Models (LLMs) has ushered in a new era for design science in Information Systems, demanding a paradigm shift in tailoring LLMs design for business contexts. We propose and test a novel framework to customize LLMs for general business contexts that aims to achieve three fundamental objectives simultaneously: (1) aligning conversational patterns, (2) integrating in-depth domain knowledge, and (3) embodying theory-driven soft skills and core principles. We design methodologies that combine domain-specific theory with Supervised Fine Tuning (SFT) to achieve these objectives simultaneously. We instantiate our proposed framework in the context of medical consultation. Specifically, we carefully construct a large volume of real doctors' consultation records and medical knowledge from multiple professional databases. Additionally, drawing on medical theory, we identify three soft skills and core principles of human doctors: professionalism, explainability, and emotional support, and design approaches to integrate these traits into LLMs. We demonstrate the feasibility of our framework using online experiments with thousands of real patients as well as evaluation by domain experts and consumers. Experimental results show that the customized LLM model substantially outperforms untuned base model in medical expertise as well as consumer satisfaction and trustworthiness, and it substantially reduces the gap between untuned LLMs and human doctors, elevating LLMs to the level of human experts. Additionally, we delve into the characteristics of textual consultation records and adopt interpretable machine learning techniques to identify what drives the performance gain. Finally, we showcase the practical value of our model through a decision support system designed to assist human doctors in a lab experiment.
- [440] arXiv:2312.10359 (replaced) [pdf, ps, html, other]
-
Title: Conformer-Based Speech Recognition On Extreme Edge-Computing DevicesMingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Zhihong Lei, Yaqiao Deng, Zhen Huang, Mahesh KrishnamoorthySubjects: Machine Learning (cs.LG); Performance (cs.PF)
With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other smart home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve over 5.26 times faster than realtime (0.19 RTF) speech recognition on smart wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
- [441] arXiv:2312.11529 (replaced) [pdf, ps, other]
-
Title: Efficient and Scalable Graph Generation through Iterative Local ExpansionComments: Published as a conference paper at ICLR 2024Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
In the realm of generative models for graphs, extensive research has been conducted. However, most existing methods struggle with large graphs due to the complexity of representing the entire joint distribution across all node pairs and capturing both global and local graph structures simultaneously. To overcome these issues, we introduce a method that generates a graph by progressively expanding a single node to a target graph. In each step, nodes and edges are added in a localized manner through denoising diffusion, building first the global structure, and then refining the local details. The local generation avoids modeling the entire joint distribution over all node pairs, achieving substantial computational savings with subquadratic runtime relative to node count while maintaining high expressivity through multiscale generation. Our experiments show that our model achieves state-of-the-art performance on well-established benchmark datasets while successfully scaling to graphs with at least 5000 nodes. Our method is also the first to successfully extrapolate to graphs outside of the training distribution, showcasing a much better generalization capability over existing methods.
- [442] arXiv:2312.16693 (replaced) [pdf, ps, other]
-
Title: I2V-Adapter: A General Image-to-Video Adapter for Diffusion ModelsXun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, Zhengjun Zha, Haibin Huang, Chongyang MaSubjects: Computer Vision and Pattern Recognition (cs.CV)
Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models by either concatenating the image with noised video frames channel-wise before being fed into the model or injecting the image embedding produced by pretrained image encoders in cross-attention modules. However, the former approach often necessitates altering the fundamental weights of pretrained T2V models, thus restricting the model's compatibility within the open-source communities and disrupting the model's prior knowledge. Meanwhile, the latter typically fails to preserve the identity of the input image. We present I2V-Adapter to overcome such limitations. I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model. Notably, I2V-Adapter only introduces a few trainable parameters, significantly alleviating the training cost and also ensures compatibility with existing community-driven personalized models and control tools. Moreover, we propose a novel Frame Similarity Prior to balance the motion amplitude and the stability of generated videos through two adjustable control coefficients. Our experimental results demonstrate that I2V-Adapter is capable of producing high-quality videos. This performance, coupled with its agility and adaptability, represents a substantial advancement in the field of I2V, particularly for personalized and controllable applications.
- [443] arXiv:2401.02902 (replaced) [pdf, ps, other]
-
Title: State Derivative Normalization for Continuous-Time Deep Neural NetworksJonas Weigand, Gerben I. Beintema, Jonas Ulmen, Daniel Görges, Roland Tóth, Maarten Schoukens, Martin RuskowskiComments: This work has been accepted for presentation at the 20th IFAC Symposium on System Identification 2024Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The importance of proper data normalization for deep neural networks is well known. However, in continuous-time state-space model estimation, it has been observed that improper normalization of either the hidden state or hidden state derivative of the model estimate, or even of the time interval can lead to numerical and optimization challenges with deep learning based methods. This results in a reduced model quality. In this contribution, we show that these three normalization tasks are inherently coupled. Due to the existence of this coupling, we propose a solution to all three normalization challenges by introducing a normalization constant at the state derivative level. We show that the appropriate choice of the normalization constant is related to the dynamics of the to-be-identified system and we derive multiple methods of obtaining an effective normalization constant. We compare and discuss all the normalization strategies on a benchmark problem based on experimental data from a cascaded tanks system and compare our results with other methods of the identification literature.
- [444] arXiv:2401.03807 (replaced) [pdf, ps, other]
-
Title: Quantum Oblivious LWE Sampling and Insecurity of Standard Model Lattice-Based SNARKsSubjects: Cryptography and Security (cs.CR)
The Learning With Errors ($\mathsf{LWE}$) problem asks to find $\mathbf{s}$ from an input of the form $(\mathbf{A}, \mathbf{b} = \mathbf{A}\mathbf{s}+\mathbf{e}) \in (\mathbb{Z}/q\mathbb{Z})^{m \times n} \times (\mathbb{Z}/q\mathbb{Z})^{m}$, for a vector $\mathbf{e}$ that has small-magnitude entries. In this work, we do not focus on solving $\mathsf{LWE}$ but on the task of sampling instances. As these are extremely sparse in their range, it may seem plausible that the only way to proceed is to first create $\mathbf{s}$ and $\mathbf{e}$ and then set $\mathbf{b} = \mathbf{A}\mathbf{s}+\mathbf{e}$. In particular, such an instance sampler knows the solution. This raises the question whether it is possible to obliviously sample $(\mathbf{A}, \mathbf{A}\mathbf{s}+\mathbf{e})$, namely, without knowing the underlying $\mathbf{s}$. A variant of the assumption that oblivious $\mathsf{LWE}$ sampling is hard has been used in a series of works to analyze the security of candidate constructions of Succinct Non interactive Arguments of Knowledge (SNARKs). As the assumption is related to $\mathsf{LWE}$, these SNARKs have been conjectured to be secure in the presence of quantum adversaries.
Our main result is a quantum polynomial-time algorithm that samples well-distributed $\mathsf{LWE}$ instances while provably not knowing the solution, under the assumption that $\mathsf{LWE}$ is hard. Moreover, the approach works for a vast range of $\mathsf{LWE}$ parametrizations, including those used in the above-mentioned SNARKs. This invalidates the assumptions used in their security analyses, although it does not yield attacks against the constructions themselves. - [445] arXiv:2401.08808 (replaced) [pdf, ps, html, other]
-
Title: lpNTK: Better Generalisation with Less Data via Sample Interaction During LearningComments: ICLR-2024Subjects: Machine Learning (cs.LG)
Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation. In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model's prediction on other samples. Through analysing the terms involved in weight updates in supervised learning, we find that labels influence the interaction between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK asymptotically converges to the empirical neural tangent kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.
- [446] arXiv:2401.12554 (replaced) [pdf, ps, other]
-
Title: Can Large Language Models Write Parallel Code?Journal-ref: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '24), June 3-7, 2024, Pisa, Italy. ACM, New York, NY, USA, 14 pagesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle to generate code for complex programs. In this paper, we study the capabilities of state-of-the-art language models to generate parallel code. In order to evaluate language models, we create a benchmark, ParEval, consisting of prompts that represent 420 different coding tasks related to scientific and parallel computing. We use ParEval to evaluate the effectiveness of several state-of-the-art open- and closed-source language models on these tasks. We introduce novel metrics for evaluating the performance of generated code, and use them to explore how well each large language model performs for 12 different computational problem types and six different parallel programming models.
- [447] arXiv:2401.13327 (replaced) [pdf, ps, other]
-
Title: Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress DetectionComments: Published in the MDPI Sensors JournalJournal-ref: Sensors 2024, 24(10), 3052Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90-15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.
- [448] arXiv:2401.14893 (replaced) [pdf, ps, other]
-
Title: A structured regression approach for evaluating model performance across intersectional subgroupsSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP); Machine Learning (stat.ML)
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately-sized evaluation datasets, sample sizes quickly get small once considering intersectional subgroups, which greatly limits the extent to which intersectional groups are included in analysis. In this work, we introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups. We provide corresponding inference strategies for constructing confidence intervals and explore how goodness-of-fit testing can yield insight into the structure of fairness-related harms experienced by intersectional groups. We evaluate our approach on two publicly available datasets, and several variants of semi-synthetic data. The results show that our method is considerably more accurate than the standard approach, especially for small subgroups, and demonstrate how goodness-of-fit testing helps identify the key factors that drive differences in performance.
- [449] arXiv:2401.15289 (replaced) [pdf, ps, html, other]
-
Title: SoK: Where's the "up"?! A Comprehensive (bottom-up) Study on the Security of Arm Cortex-M SystemsXi Tan, Zheyuan Ma, Sandro Pinto, Le Guan, Ning Zhang, Jun Xu, Zhiqiang Lin, Hongxin Hu, Ziming ZhaoComments: To Appear in the 18th USENIX WOOT Conference on Offensive Technologies, August 12-13, 2024Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arm Cortex-M processors are the most widely used 32-bit microcontrollers among embedded and Internet-of-Things devices. Despite the widespread usage, there has been little effort in summarizing their hardware security features, characterizing the limitations and vulnerabilities of their hardware and software stack, and systematizing the research on securing these systems. The goals and contributions of this paper are multi-fold. First, we analyze the hardware security limitations and issues of Cortex-M systems. Second, we conducted a deep study of the software stack designed for Cortex-M and revealed its limitations, which is accompanied by an empirical analysis of 1,797 real-world firmware. Third, we categorize the reported bugs in Cortex-M software systems. Finally, we systematize the efforts that aim at securing Cortex-M systems and evaluate them in terms of the protections they offer, runtime performance, required hardware features, etc. Based on the insights, we develop a set of recommendations for the research community and MCU software developers.
- [450] arXiv:2401.15595 (replaced) [pdf, ps, other]
-
Title: Comuniqa : Exploring Large Language Models for improving speaking skillsManas Mhasakar, Shikhar Sharma, Apurv Mehra, Utkarsh Venaik, Ujjwal Singhal, Dhruv Kumar, Kashish MittalComments: Accepted at 7th ACM SIGCAS/SIGCHI Conference of Computing and Sustainable Societies : ACM COMPASS 2024Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
In this paper, we investigate the potential of Large Language Models (LLMs) to improve English speaking skills. This is particularly relevant in countries like India, where English is crucial for academic, professional, and personal communication but remains a non-native language for many. Traditional methods for enhancing speaking skills often rely on human experts, which can be limited in terms of scalability, accessibility, and affordability. Recent advancements in Artificial Intelligence (AI) offer promising solutions to overcome these limitations.
We propose Comuniqa, a novel LLM-based system designed to enhance English speaking skills. We adopt a human-centric evaluation approach, comparing Comuniqa with the feedback and instructions provided by human experts. In our evaluation, we divide the participants in three groups: those who use LLM-based system for improving speaking skills, those guided by human experts for the same task and those who utilize both the LLM-based system as well as the human experts. Using surveys, interviews, and actual study sessions, we provide a detailed perspective on the effectiveness of different learning modalities. Our preliminary findings suggest that while LLM-based systems have commendable accuracy, they lack human-level cognitive capabilities, both in terms of accuracy and empathy. Nevertheless, Comuniqa represents a significant step towards achieving Sustainable Development Goal 4: Quality Education by providing a valuable learning tool for individuals who may not have access to human experts for improving their speaking skills. - [451] arXiv:2401.17734 (replaced) [pdf, ps, other]
-
Title: Efficient Shape Formation by 3D Hybrid Programmable Matter: An Algorithm for Low Diameter Intermediate StructuresComments: 27 pages, 8 figures; fixed typos in Observation 1Subjects: Data Structures and Algorithms (cs.DS); Emerging Technologies (cs.ET)
This paper considers the shape formation problem within the 3D hybrid model, where a single agent with a strictly limited viewing range and the computational capacity of a deterministic finite automaton manipulates passive tiles through pick-up, movement, and placement actions. The goal is to reconfigure a set of tiles into a specific shape termed an icicle. The icicle, identified as a dense, hole-free structure, is strategically chosen to function as an intermediate shape for more intricate shape formation tasks. It is designed for easy exploration by a finite state agent, enabling the identification of tiles that can be lifted without breaking connectivity. Compared to the line shape, the icicle presents distinct advantages, including a reduced diameter and the presence of multiple removable tiles. We propose an algorithm that transforms an arbitrary initially connected tile structure into an icicle in $\mathcal{O}(n^3)$ steps, matching the runtime of the line formation algorithm from prior work. Our theoretical contribution is accompanied by an extensive experimental analysis, indicating that our algorithm decreases the diameter of tile structures on average.
- [452] arXiv:2401.17804 (replaced) [pdf, ps, other]
-
Title: An Efficient PGD Solver for Structural Dynamics ApplicationsSubjects: Computational Engineering, Finance, and Science (cs.CE)
We propose in this paper a Proper Generalized Decomposition (PGD) solver for reduced-order modeling of linear elastodynamic problems. It primarily focuses on enhancing the computational efficiency of a previously introduced PGD solver based on the Hamiltonian formalism. The novelty of this work lies in the implementation of a solver that is halfway between Modal Decomposition and the conventional PGD framework, so as to accelerate the fixed-point iteration algorithm. Additional procedures such that Aitken's delta-squared process and mode-orthogonalization are incorporated to ensure convergence and stability of the algorithm. Numerical results regarding the ROM accuracy, time complexity, and scalability are provided to demonstrate the performance of the new solver when applied to dynamic simulation of a three-dimensional structure.
- [453] arXiv:2402.00281 (replaced) [pdf, ps, html, other]
-
Title: Guided Interpretable Facial Expression Recognition via Spatial Action Unit CuesComments: 15 pages, 11 figures, 3 tables, International Conference on Automatic Face and Gesture Recognition (FG 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (\aus) from a codebook to facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate \au cues into classifier training, allowing to train deep interpretable models. During training, this \au codebook is used, along with the input image expression label, and facial landmarks, to construct a \au heatmap that indicates the most discriminative image regions of interest w.r.t the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with \au heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with \au maps, simulating the expert decision process. Our strategy only relies on image class expression for supervision, without additional manual annotations. Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks \rafdb, and \affectnet datasets shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifiers that rely on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.
- [454] arXiv:2402.00564 (replaced) [pdf, ps, other]
-
Title: A Single Graph Convolution Is All You Need: Efficient Grayscale Image ClassificationComments: Limited noveltySubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Image classifiers often rely on convolutional neural networks (CNN) for their tasks, which are inherently more heavyweight than multilayer perceptrons (MLPs), which can be problematic in real-time applications. Additionally, many image classification models work on both RGB and grayscale datasets. Classifiers that operate solely on grayscale images are much less common. Grayscale image classification has diverse applications, including but not limited to medical image classification and synthetic aperture radar (SAR) automatic target recognition (ATR). Thus, we present a novel grayscale (single channel) image classification approach using a vectorized view of images. We exploit the lightweightness of MLPs by viewing images as a vector and reducing our problem setting to the grayscale image classification setting. We find that using a single graph convolutional layer batch-wise increases accuracy and reduces variance in the performance of our model. Moreover, we develop a customized accelerator on FPGA for the proposed model with several optimizations to improve its performance. Our experimental results on benchmark grayscale image datasets demonstrate the effectiveness of the proposed model, achieving vastly lower latency (up to 16$\times$ less) and competitive or leading performance compared to other state-of-the-art image classification models on various domain-specific grayscale image classification datasets.
- [455] arXiv:2402.02807 (replaced) [pdf, ps, other]
-
Title: Are Sounds Sound for Phylogenetic Reconstruction?Comments: Paper accepted for SIGTYP (2024): Häuser, Luise; Jäger, Gerhard; List, Johann-Mattis; Rama, Taraka; and Stamatakis, Alexandros (2024): Are sounds sound for phylogenetic reconstruction? In: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2024)Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
- [456] arXiv:2402.03885 (replaced) [pdf, ps, other]
-
Title: MOMENT: A Family of Open Time-series Foundation ModelsComments: Accepted at ICML 2024. This version contains new experimental results and a section on contemporary workSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time series, called the Time series Pile, and systematically tackle time series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. Finally, we present several interesting empirical observations about large pre-trained time series models. Pre-trained models (AutonLab/MOMENT-1-large) and Time Series Pile (AutonLab/Timeseries-PILE) are available on Huggingface.
- [457] arXiv:2402.04486 (replaced) [pdf, ps, html, other]
-
Title: Outer Code Designs for Augmented and Local-Global Polar Code ArchitecturesComments: 8 pages, 8 figures. Accepted by ISIT 2024Subjects: Information Theory (cs.IT)
In this paper, we introduce two novel methods to design outer polar codes for two previously proposed concatenated polar code architectures: augmented polar codes and local-global polar codes. These methods include a stopping set (SS) construction and a nonstationary density evolution (NDE) construction. Simulation results demonstrate the advantage of these methods over previously proposed constructions based on density evolution (DE) and LLR evolution.
- [458] arXiv:2402.07720 (replaced) [pdf, ps, other]
-
Title: VistaScenario: Interaction Scenario Engineering for Vehicles with Intelligent Systems for Transport AutomationComments: Accepted by IEEE Transactions on Intelligent VehiclesSubjects: Software Engineering (cs.SE)
Intelligent vehicles and autonomous driving systems rely on scenario engineering for intelligence and index (I&I), calibration and certification (C&C), and verification and validation (V&V). To extract and index scenarios, various vehicle interactions are worthy of much attention, and deserve refined descriptions and labels. However, existing methods cannot cope well with the problem of scenario classification and labeling with vehicle interactions as the core. In this paper, we propose VistaScenario framework to conduct interaction scenario engineering for vehicles with intelligent systems for transport automation. Based on the summarized basic types of vehicle interactions, we slice scenario data stream into a series of segments via spatiotemporal scenario evolution tree. We also propose the scenario metric Graph-DTW based on Graph Computation Tree and Dynamic Time Warping to conduct refined scenario comparison and labeling. The extreme interaction scenarios and corner cases can be efficiently filtered and extracted. Moreover, with naturalistic scenario datasets, testing examples on trajectory prediction model demonstrate the effectiveness and advantages of our framework. VistaScenario can provide solid support for the usage and indexing of scenario data, further promote the development of intelligent vehicles and transport automation.
- [459] arXiv:2402.08621 (replaced) [pdf, ps, html, other]
-
Title: A Generalized Approach to Online Convex OptimizationSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this paper, we analyze the problem of online convex optimization in different settings. We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound. We further show that algorithms that are designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms to convert first order algorithms to zeroth order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, such full-information feedback, bandit feedback, stochastic regret, adversarial regret and various forms of non-stationary regret.
- [460] arXiv:2402.09651 (replaced) [pdf, ps, other]
-
Title: Practitioners' Challenges and Perceptions of CI Build Failure Predictions at AtlassianYang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit, Patanamon Thongtanunam, Arik Friedman, Xing Zhao, Anton KrasikovSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that the CI build prediction can not only provide proactive insight into CI build failures but also facilitate the team's decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.
- [461] arXiv:2402.10097 (replaced) [pdf, ps, other]
-
Title: Adaptive Federated Learning in Heterogeneous Wireless Networks with Independent SamplingComments: 6 pages, 5 figures, accepted for publication in IEEE International Conference on Communications (ICC)Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Federated Learning (FL) algorithms commonly sample a random subset of clients to address the straggler issue and improve communication efficiency. While recent works have proposed various client sampling methods, they have limitations in joint system and data heterogeneity design, which may not align with practical heterogeneous wireless networks. In this work, we advocate a new independent client sampling strategy to minimize the wall-clock training time of FL, while considering data heterogeneity and system heterogeneity in both communication and computation. We first derive a new convergence bound for non-convex loss functions with independent client sampling and then propose an adaptive bandwidth allocation scheme. Furthermore, we propose an efficient independent client sampling algorithm based on the upper bounds on the convergence rounds and the expected per-round training time, to minimize the wall-clock time of FL, while considering both the data and system heterogeneity. Experimental results under practical wireless network settings with real-world prototype demonstrate that the proposed independent sampling scheme substantially outperforms the current best sampling schemes under various training models and datasets.
- [462] arXiv:2402.11184 (replaced) [pdf, ps, other]
-
Title: A modified version of the PRESB preconditioner for a class of non-Hermitian complex systems of linear equationsComments: 12 pages, 1 Figure, SubmittedSubjects: Numerical Analysis (math.NA)
We present a modified version of the PRESB preconditioner for two-by-two block system of linear equations with the coefficient matrix $$\textbf{A}=\left(\begin{array}{cc}
F & -G^*
G & F \end{array}\right),$$ where $F\in\mathbb{C}^{n\times n}$ is Hermitian positive definite and $G\in\mathbb{C}^{n\times n}$ is positive semidefinite. Spectral analysis of the preconditioned matrix is analyzed. In each iteration of a Krylov subspace method, like GMRES, for solving the preconditioned system in conjunction with proposed preconditioner two subsystems with Hermitian positive definite coefficient matrix should be solved which can be accomplished exactly using the Cholesky factorization or inexactly utilizing the conjugate gradient method. Application of the proposed preconditioner to the systems arising from finite element discretization of PDE-constrained optimization problems is presented. Numerical results are given to demonstrate the efficiency of the preconditioner. - [463] arXiv:2402.15151 (replaced) [pdf, ps, other]
-
Title: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech ProcessingComments: An Erratum was added on the last page of this paperSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize the context modeling ability by bringing the overwhelming power of LLMs. Specifically, VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation, where the given instructions control the type of task. The input video is mapped to the input latent space of an LLM by employing a self-supervised visual speech model. Focused on the fact that there is redundant information in input frames, we propose a novel deduplication method that reduces the embedded visual features by employing visual speech units. Through the proposed deduplication and Low Rank Adaptation (LoRA), VSP-LLM can be trained in a computationally efficient manner. In the translation dataset, the MuAViC benchmark, we demonstrate that VSP-LLM trained on just 30 hours of labeled data can more effectively translate lip movements compared to the recent model trained with 433 hours of data.
- [464] arXiv:2402.17050 (replaced) [pdf, ps, other]
-
Title: Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational testKathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. BayenSubjects: Systems and Control (eess.SY); Robotics (cs.RO)
In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with developing RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their application in the context of self-driving cars, discussing the developmental process from simulation to deployment in detail, from designing simulators to reward function shaping. We present the results in both simulation and deployment, discussing the flow-smoothing benefits of the RL controller. From understanding the basics of Markov decision processes to exploring advanced techniques such as deep RL, our article offers a comprehensive overview and deep dive of the theoretical foundations and practical implementations driving this rapidly evolving field. We also showcase real-world case studies and alternative research projects that highlight the impact of RL controllers in revolutionizing autonomous driving. From tackling complex urban environments to dealing with unpredictable traffic scenarios, these intelligent controllers are pushing the boundaries of what automated vehicles can achieve. Furthermore, we examine the safety considerations and hardware-focused technical details surrounding deployment of RL controllers into automated vehicles. As these algorithms learn and evolve through interactions with the environment, ensuring their behavior aligns with safety standards becomes crucial. We explore the methodologies and frameworks being developed to address these challenges, emphasizing the importance of building reliable control systems for automated vehicles.
- [465] arXiv:2402.18285 (replaced) [pdf, ps, other]
-
Title: PiShield: A PyTorch Package for Learning with RequirementsComments: Demo paper, accepted at IJCAI 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Deep learning models have shown their strengths in various application domains, however, they often struggle to meet safety requirements for their outputs. In this paper, we introduce PiShield, the first package ever allowing for the integration of the requirements into the neural networks' topology. PiShield guarantees compliance with these requirements, regardless of input. Additionally, it allows for integrating requirements both at inference and/or training time, depending on the practitioners' needs. Given the widespread application of deep learning, there is a growing need for frameworks allowing for the integration of the requirements across various domains. Here, we explore three application scenarios: functional genomics, autonomous driving, and tabular data generation.
- [466] arXiv:2403.00858 (replaced) [pdf, ps, html, other]
-
Title: Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMsComments: 8 pages, 3 figures, Published at the ICLR 2024 Workshop on Understanding of Foundation Models (ME-FoMo)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Text generation with Large Language Models (LLMs) is known to be memory bound due to the combination of their auto-regressive nature, huge parameter counts, and limited memory bandwidths, often resulting in low token rates. Speculative decoding has been proposed as a solution for LLM inference acceleration. However, since draft models are often unavailable in the modern open-source LLM families, e.g., for Llama 2 7B, training a high-quality draft model is required to enable inference acceleration via speculative decoding. In this paper, we propose a simple draft model training framework for direct alignment to chat-capable target models. With the proposed framework, we train Llama 2 Chat Drafter 115M, a draft model for Llama 2 Chat 7B or larger, with only 1.64\% of the original size. Our training framework only consists of pretraining, distillation dataset generation, and finetuning with knowledge distillation, with no additional alignment procedure. For the finetuning step, we use instruction-response pairs generated by target model for distillation in plausible data distribution, and propose a new Total Variation Distance++ (TVD++) loss that incorporates variance reduction techniques inspired from the policy gradient method in reinforcement learning. Our empirical results show that Llama 2 Chat Drafter 115M with speculative decoding achieves up to 2.3 block efficiency and 2.4$\times$ speed-up relative to autoregressive decoding on various tasks with no further task-specific fine-tuning.
- [467] arXiv:2403.01139 (replaced) [pdf, ps, other]
-
Title: ParallelPARC: A Scalable Pipeline for Generating Natural-Language AnalogiesComments: NAACL 2024 (Main Conference)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Analogy-making is central to human cognition, allowing us to adapt to novel situations -- an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets including complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy. In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator) leveraging state-of-the-art Large Language Models (LLMs) to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate our pipeline and create ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold-set, validated by humans, and a silver-set, generated automatically. We test LLMs' and humans' analogy recognition in binary and multiple-choice settings, and found that humans outperform the best models (~13% gap) after a light supervision. We demonstrate that our silver-set is useful for training models. Lastly, we show challenging distractors confuse LLMs, but not humans. We hope our pipeline will encourage research in this emerging field.
- [468] arXiv:2403.03744 (replaced) [pdf, ps, other]
-
Title: Towards Safe Large Language Models for MedicineSubjects: Artificial Intelligence (cs.AI)
As large language models (LLMs) develop ever-improving capabilities and are applied in real-world settings, it is important to understand their safety. While initial steps have been taken to evaluate the safety of general-knowledge LLMs, exposing some weaknesses, the safety of medical LLMs has not been sufficiently evaluated despite their high risks to personal health and safety, public health and safety, patient rights, and human rights. To address this gap, we conduct, to our knowledge, the first study of its kind to evaluate and improve the safety of medical LLMs. We find that 1) current medical LLMs do not meet standards of general or medical safety, as they readily comply with harmful requests and that 2) fine-tuning medical LLMs on safety demonstrations significantly improves their safety, reducing their tendency to comply with harmful requests. In addition, we present a definition of medical safety for LLMs and develop a benchmark dataset to evaluate and train for medical safety in LLMs. Poised at the intersection of research on machine learning safety and medical machine learning, this work casts light on the status quo of the safety of medical LLMs and motivates future work in this area, mitigating the risks of harm of LLMs in medicine.
- [469] arXiv:2403.05535 (replaced) [pdf, ps, other]
-
Title: Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and VideosComments: ICML 2024 Version. Project Page and Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We introduce LaGTran, a novel framework that utilizes readily available or easily acquired text descriptions to guide robust transfer of discriminative knowledge from labeled source to unlabeled target data with domain shifts. While unsupervised adaptation methods have been established to address this problem, they show limitations in handling challenging domain shifts due to their exclusive operation within the pixel-space. Motivated by our observation that semantically richer text modality has more favorable transfer properties, we devise a transfer mechanism to use a source-trained text-classifier to generate predictions on the target text descriptions, and utilize these predictions as supervision for the corresponding images. Our approach driven by language guidance is surprisingly easy and simple, yet significantly outperforms all prior approaches on challenging datasets like GeoNet and DomainNet, validating its extreme effectiveness. To further extend the scope of our study beyond images, we introduce a new benchmark to study ego-exo transfer in videos and find that our language-aided LaGTran yields significant gains in this highly challenging and non-trivial transfer setting. Code, models, and proposed datasets are publicly available at this https URL.
- [470] arXiv:2403.06098 (replaced) [pdf, ps, html, other]
-
Title: VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion ModelsComments: The project (including the collected dataset VidProM and related code) is publicly available at this https URL under the CC-BY-NC 4.0 LicenseSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advancements in video generation and potential applications. However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 Million unique text-to-Video Prompts from real users. Additionally, this dataset includes 6.69 million videos generated by four state-of-the-art diffusion models, alongside some related data. We initially discuss the curation of this large-scale dataset, a process that is both time-consuming and costly. Subsequently, we underscore the need for a new prompt dataset specifically designed for text-to-video generation by illustrating how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. Our extensive and diverse dataset also opens up many exciting new research areas. For instance, we suggest exploring text-to-video prompt engineering, efficient video generation, and video copy detection for diffusion models to develop better, more efficient, and safer models. The project (including the collected dataset VidProM and related code) is publicly available at this https URL under the CC-BY-NC 4.0 License.
- [471] arXiv:2403.06898 (replaced) [pdf, ps, html, other]
-
Title: SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation InstructionsComments: DaMoN 2024Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors, achieving significant performance improvement while maintaining simplicity and avoiding overengineering. SFVInt, with its generic design, effectively processes both 32-bit and 64-bit unsigned integers using a unified code template, marking a significant leap forward in varint decoding efficiency. We thoroughly evaluate SFVInt's performance across various datasets and scenarios, demonstrating that it achieves up to a 2x increase in decoding speed when compared to varint decoding methods used in established frameworks like Facebook Folly and Google Protobuf.
- [472] arXiv:2403.08901 (replaced) [pdf, ps, other]
-
Title: A Framework for Strategic Discovery of Credible Neural Network Surrogate Models under UncertaintySubjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
The widespread integration of deep neural networks in developing data-driven surrogate models for high-fidelity simulations of complex physical systems highlights the critical necessity for robust uncertainty quantification techniques and credibility assessment methodologies, ensuring the reliable deployment of surrogate models in consequential decision-making. This study presents the Occam Plausibility Algorithm for surrogate models (OPAL-surrogate), providing a systematic framework to uncover predictive neural network-based surrogate models within the large space of potential models, including various neural network classes and choices of architecture and hyperparameters. The framework is grounded in hierarchical Bayesian inferences and employs model validation tests to evaluate the credibility and prediction reliability of the surrogate models under uncertainty. Leveraging these principles, OPAL-surrogate introduces a systematic and efficient strategy for balancing the trade-off between model complexity, accuracy, and prediction uncertainty. The effectiveness of OPAL-surrogate is demonstrated through two modeling problems, including the deformation of porous materials for building insulation and turbulent combustion flow for the ablation of solid fuels within hybrid rocket motors.
- [473] arXiv:2403.10175 (replaced) [pdf, ps, other]
-
Title: A Short Survey on Importance Weighting for Machine LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of the instance in some sense. The simplicity and usefulness of the idea has led to many applications of importance weighting. For example, it is known that supervised learning under an assumption about the difference between the training and test distributions, called distribution shift, can guarantee statistically desirable properties through importance weighting by their density ratio. This survey summarizes the broad applications of importance weighting in machine learning and related research.
- [474] arXiv:2403.10799 (replaced) [pdf, ps, html, other]
-
Title: Efficient Pruning of Large Language Model with Adaptive Estimation FusionJun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi WangSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of the multiple decoder layers, general methods often employ common estimation approaches for pruning. These approaches lead to a decline in accuracy for specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure. Meanwhile, it can adaptively fuse coarse-grained and finegrained estimations based on the results from complex and multilayer structures. All aspects of our design seamlessly integrate into the endto-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMa-7B,Vicuna-7B, Baichuan-7B, and Bloom-7b1, respectively.
- [475] arXiv:2403.10830 (replaced) [pdf, ps, other]
-
Title: View-Centric Multi-Object Tracking with Homographic Matching in Moving UAVSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, lead to significantly greater complexity compared to fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IOU association methods ineffective but also introduce significant view shifts in the objects, which complicates tracking. To overcome these issues, we propose a novel universal HomView-MOT framework, which for the first time, harnesses the view Homography inherent in changing scenes to solve MOT challenges in moving environments, incorporating Homographic Matching and View-Centric concepts. We introduce a Fast Homography Estimation (FHE) algorithm for rapid computation of Homography matrices between video frames, enabling object View-Centric ID Learning (VCIL) and leveraging multi-view Homography to learn cross-view ID features. Concurrently, our Homographic Matching Filter (HMF) maps object bounding boxes from different frames onto a common view plane for a more realistic physical IOU association. Extensive experiments have proven that these innovations allow HomView-MOT to achieve state-of-the-art performance on prominent UAV MOT datasets VisDrone and UAVDT.
- [476] arXiv:2403.11463 (replaced) [pdf, ps, other]
-
Title: Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph GroundingComments: Accepted to CVPR 2024. v2: fix a typo in figure 1Subjects: Computer Vision and Pattern Recognition (cs.CV)
Video Paragraph Grounding (VPG) is an emerging task in video-language understanding, which aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video. However, existing VPG approaches are heavily reliant on a considerable number of temporal labels that are laborious and time-consuming to acquire. In this work, we introduce and explore Weakly-Supervised Video Paragraph Grounding (WSVPG) to eliminate the need of temporal annotations. Different from previous weakly-supervised grounding frameworks based on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel siamese learning framework that jointly learns the cross-modal feature alignment and temporal coordinate regression without timestamp labels to achieve concise one-stage localization for WSVPG. Specifically, we devise a Siamese Grounding TRansformer (SiamGTR) consisting of two weight-sharing branches for learning complementary supervision. An Augmentation Branch is utilized for directly regressing the temporal boundaries of a complete paragraph within a pseudo video, and an Inference Branch is designed to capture the order-guided feature correspondence for localizing multiple sentences in a normal video. We demonstrate by extensive experiments that our paradigm has superior practicability and flexibility to achieve efficient weakly-supervised or semi-supervised learning, outperforming state-of-the-art methods trained with the same or stronger supervision.
- [477] arXiv:2403.12024 (replaced) [pdf, ps, other]
-
Title: Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing SystemsComments: Accepted by LREC-COLING 2024 as a long oral paperSubjects: Computation and Language (cs.CL)
Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA 2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Taiwanese Hokkien Han and Traditional Mandarin Chinese. Our comprehensive experiments involve translation tasks across various writing systems of Taiwanese Hokkien as well as between Taiwanese Hokkien and other HRLs. We find that the use of a limited monolingual corpus still further improves the model's Taiwanese Hokkien capabilities. We then utilize our translation model to standardize all Taiwanese Hokkien writing systems into Hokkien Han, resulting in further performance improvements. Additionally, we introduce an evaluation method incorporating back-translation and GPT-4 to ensure reliable translation quality assessment even for LRLs. The study contributes to narrowing the resource gap for Taiwanese Hokkien and empirically investigates the advantages and limitations of pre-training and fine-tuning based on LLaMA 2.
- [478] arXiv:2403.12075 (replaced) [pdf, ps, other]
-
Title: Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image GenerationJessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora AroyoComments: 10 pages, 6 figuresSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With the rise of text-to-image (T2I) generative AI models reaching wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on ``implicitly adversarial'' prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover. To this end, we built the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing a diverse set of implicitly adversarial prompts. We have assembled a suite of state-of-the-art T2I models, employed a simple user interface to identify and annotate harms, and engaged diverse populations to capture long-tail safety issues that may be overlooked in standard testing. The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models.
In this paper, we present an in-depth account of our methodology, a systematic study of novel attack strategies and discussion of safety failures revealed by challenge participants. We also release a companion visualization tool for easy exploration and derivation of insights from the dataset. The first challenge round resulted in over 10k prompt-image pairs with machine annotations for safety. A subset of 1.5k samples contains rich human annotations of harm types and attack styles. We find that 14% of images that humans consider harmful are mislabeled as ``safe'' by machines. We have identified new attack strategies that highlight the complexity of ensuring T2I model robustness. Our findings emphasize the necessity of continual auditing and adaptation as new vulnerabilities emerge. We are confident that this work will enable proactive, iterative safety assessments and promote responsible development of T2I models. - [479] arXiv:2403.12365 (replaced) [pdf, ps, html, other]
-
Title: GaussianFlow: Splatting Gaussian Dynamics for 4D Content CreationQuankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich NeumannSubjects: Computer Vision and Pattern Recognition (cs.CV)
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Guassian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Project page: this https URL
- [480] arXiv:2403.14593 (replaced) [pdf, ps, other]
-
Title: Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium ProofSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (requires multi-iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed exhibits a significant improvement in policy imitation, it introduces drawbacks to transferable reward recovery. We prove that the SAC algorithm itself is not feasible to disentangle the reward function comprehensively during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.
- [481] arXiv:2403.15436 (replaced) [pdf, ps, html, other]
-
Title: Using Contextual Information for Sentence-level Morpheme SegmentationComments: 5 pages, 3 tablesSubjects: Computation and Language (cs.CL)
Recent advancements in morpheme segmentation primarily emphasize word-level segmentation, often neglecting the contextual relevance within the sentence. In this study, we redefine the morpheme segmentation task as a sequence-to-sequence problem, treating the entire sentence as input rather than isolating individual words. Our findings reveal that the multilingual model consistently exhibits superior performance compared to monolingual counterparts. While our model did not surpass the performance of the current state-of-the-art, it demonstrated comparable efficacy with high-resource languages while revealing limitations in low-resource language scenarios.
- [482] arXiv:2403.16967 (replaced) [pdf, ps, other]
-
Title: Visual Whole-Body Control for Legged Loco-ManipulationComments: Add more details. The first two authors contribute equally. Project page: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control(VBC), is composed of a low-level policy using all degrees of freedom to track the body velocities along with the end-effector position, and a high-level policy proposing the velocities and end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.
- [483] arXiv:2403.17400 (replaced) [pdf, ps, other]
-
Title: A Survey on Resource Management in Joint Communication and Computing-Embedded SAGINComments: 43 pages, 17 figuresSubjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
The advent of the 6G era aims for ubiquitous connectivity, with the integration of non-terrestrial networks (NTN) offering extensive coverage and enhanced capacity. As manufacturing advances and user demands evolve, space-air-ground integrated networks (SAGIN) with computational capabilities emerge as a viable solution for services requiring low latency and high computational power. Resource management within joint communication and computing-embedded SAGIN (JCC-SAGIN) presents greater complexity than traditional terrestrial networks. This complexity arises from the spatiotemporal dynamics of network topology and service demand, the interdependency of large-scale resource variables, and intricate tradeoffs among various performance metrics. Thus, a thorough examination of resource management strategies in JCC-SAGIN is crucial, emphasizing the role of non-terrestrial platforms with processing capabilities in 6G. This paper begins by reviewing the architecture, enabling technologies, and applications in JCC-SAGIN. Then, we offer a detailed overview of resource management modeling and optimization methods, encompassing both traditional optimization approaches and learning-based intelligent decision-making frameworks. Finally, we outline the prospective research directions in JCC-SAGIN.
- [484] arXiv:2403.18227 (replaced) [pdf, ps, other]
-
Title: One Backpropagation in Two Tower Recommendation ModelsComments: 10 pages, 8 figuresSubjects: Information Retrieval (cs.IR)
Recent years have witnessed extensive researches on developing two tower recommendation models for relieving information overload. Four building modules can be identified in such models, namely, user-item encoding, negative sampling, loss computing and back-propagation updating. To the best of our knowledge, existing algorithms have researched only on the first three modules, yet neglecting the backpropagation module. They all adopt a kind of two backpropagation strategy, which are based on an implicit assumption of equally treating users and items in the training phase. In this paper, we challenge such an equal training assumption and propose a novel one backpropagation updating strategy, which keeps the normal gradient backpropagation for the item encoding tower, but cuts off the backpropagation for the user encoding tower. Instead, we propose a moving-aggregation updating strategy to update a user encoding in each training epoch. Except the proposed backpropagation updating module, we implement the other three modules with the most straightforward choices. Experiments on four public datasets validate the effectiveness and efficiency of our model in terms of improved recommendation performance and reduced computation overload over the state-of-the-art competitors.
- [485] arXiv:2403.18296 (replaced) [pdf, ps, other]
-
Title: GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication ParadigmSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
- [486] arXiv:2403.18546 (replaced) [pdf, ps, html, other]
-
Title: Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered ScenesComments: Extensive results on GraspNet-1B datasetSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Fast and robust object grasping in clutter is a crucial component of robotics. Most current works resort to the whole observed point cloud for 6-Dof grasp generation, ignoring the guidance information excavated from global semantics, thus limiting high-quality grasp generation and real-time performance. In this work, we show that the widely used heatmaps are underestimated in the efficiency of 6-Dof grasp generation. Therefore, we propose an effective local grasp generator combined with grasp heatmaps as guidance, which infers in a global-to-local semantic-to-point way. Specifically, Gaussian encoding and the grid-based strategy are applied to predict grasp heatmaps as guidance to aggregate local points into graspable regions and provide global semantic information. Further, a novel non-uniform anchor sampling mechanism is designed to improve grasp accuracy and diversity. Benefiting from the high-efficiency encoding in the image space and focusing on points in local graspable regions, our framework can perform high-quality grasp detection in real-time and achieve state-of-the-art results. In addition, real robot experiments demonstrate the effectiveness of our method with a success rate of 94% and a clutter completion rate of 100%. Our code is available at this https URL.
- [487] arXiv:2404.00793 (replaced) [pdf, ps, other]
-
Title: Learning the mechanisms of network growthComments: Main text: 13 pages, 4 figures. Supplementary: 12 pages; Rewording throughout and elaboration in Section 3Subjects: Social and Information Networks (cs.SI); Probability (math.PR); Machine Learning (stat.ML)
We propose a novel model-selection method for dynamic networks. Our approach involves training a classifier on a large body of synthetic network data. The data is generated by simulating nine state-of-the-art random graph models for dynamic networks, with parameter range chosen to ensure exponential growth of the network size in time. We design a conceptually novel type of dynamic features that count new links received by a group of vertices in a particular time interval. The proposed features are easy to compute, analytically tractable, and interpretable. Our approach achieves a near-perfect classification of synthetic networks, exceeding the state-of-the-art by a large margin. Applying our classification method to real-world citation networks gives credibility to the claims in the literature that models with preferential attachment, fitness and aging fit real-world citation networks best, although sometimes, the predicted model does not involve vertex fitness.
- [488] arXiv:2404.03843 (replaced) [pdf, ps, html, other]
-
Title: Scaling Motion Forecasting Models with Ensemble DistillationComments: 11 pages, 14 figuresSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting systems subject to limited compute budgets by combining model ensemble and distillation techniques. The use of ensembles of deep neural networks has been shown to improve generalization accuracy in many application domains. We first demonstrate significant performance gains by creating a large ensemble of optimized single models. We then develop a generalized framework to distill motion forecasting model ensembles into small student models which retain high performance with a fraction of the computing cost. For this study we focus on the task of motion forecasting using real world data from autonomous driving systems. We develop ensemble models that are very competitive on the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards. From these ensembles, we train distilled student models which have high performance at a fraction of the compute costs. These experiments demonstrate distillation from ensembles as an effective method for improving accuracy of predictive models for robotic systems with limited compute budgets.
- [489] arXiv:2404.04394 (replaced) [pdf, ps, other]
-
Title: Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral FeaturesComments: Accepted for the CVPR 2024 workshop (CVPM)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Engagement measurement finds application in healthcare, education, services. The use of physiological and behavioral features is viable, but the impracticality of traditional physiological measurement arises due to the need for contact sensors. We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative for contact sensors in deriving heart rate variability (HRV) features, then fusing these with behavioral features to measure engagement in online group meetings. Firstly, a unique Engagement Dataset of online interactions among social workers is collected with granular engagement labels, offering insight into virtual meeting dynamics. Secondly, a pre-trained rPPG model is customized to reconstruct rPPG signals from video meetings in an unsupervised manner, enabling the calculation of HRV features. Thirdly, the feasibility of estimating engagement from HRV features using short observation windows, with a notable enhancement when using longer observation windows of two to four minutes, is demonstrated. Fourthly, the effectiveness of behavioral cues is evaluated when fused with physiological data, which further enhances engagement estimation performance. An accuracy of 94% is achieved when only HRV features are used, eliminating the need for contact sensors or ground truth signals; use of behavioral cues raises the accuracy to 96%. Facial analysis offers precise engagement measurement, beneficial for future applications.
- [490] arXiv:2404.04713 (replaced) [pdf, ps, other]
-
Title: Faster Algorithms for Fair Max-Min Diversification in $\mathbb{R}^d$Journal-ref: SIGMOD 2024Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)
The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in various real-world applications that have far-reaching consequences. In this work, we delve into the realm of fairness-aware data subset selection, specifically focusing on the problem of selecting a diverse set of size $k$ from a large collection of $n$ data points (FairDiv).
The FairDiv problem is well-studied in the data management and theory community. In this work, we develop the first constant approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant approximation algorithms run in super-linear time (with respect to $n$ or $k$) and use super-linear space. Our approach achieves this efficiency by employing a novel combination of the Multiplicative Weight Update method and advanced geometric data structures to implicitly and approximately solve a linear program. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using our coreset, we also propose the first efficient streaming algorithm for the FairDiv problem whose efficiency does not depend on the distribution of data points. Empirical evaluation on million-sized datasets demonstrates that our algorithm achieves the best diversity within a minute. All prior techniques are either highly inefficient or do not generate a good solution. - [491] arXiv:2404.05241 (replaced) [pdf, ps, other]
-
Title: Lightweight Inference for Forward-Forward AlgorithmSubjects: Machine Learning (cs.LG)
The human brain performs tasks with an outstanding energy-efficiency, i.e., with approximately 20 Watts. The state-of-the-art Artificial/Deep Neural Networks (ANN/DNN), on the other hand, have recently been shown to consume massive amounts of energy. The training of these ANNs/DNNs is done almost exclusively based on the back-propagation algorithm, which is known to be biologically implausible. This has led to a new generation of forward-only techniques, including the Forward-Forward algorithm. In this paper, we propose a lightweight inference scheme specifically designed for DNNs trained using the Forward-Forward algorithm. We have evaluated our proposed lightweight inference scheme in the case of the MNIST and CIFAR datasets, as well as two real-world applications, namely, epileptic seizure detection and cardiac arrhythmia classification using wearable technologies, where complexity overheads/energy consumption is a major constraint, and demonstrate its relevance.
- [492] arXiv:2404.05388 (replaced) [pdf, ps, html, other]
-
Title: An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle MappingComments: 1st ACM International Conference on AI-powered Software (AIware)Subjects: Software Engineering (cs.SE)
The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems-of which models are only a part-and environmental affordances (e.g., access to tools), obstruct effective communication and comprehensive evaluation. This paper proposes a framework for AI system evaluation comprising three components: 1) harmonised terminology to facilitate communication across communities involved in AI safety evaluation; 2) a taxonomy identifying essential elements for AI system evaluation; 3) a mapping between AI lifecycle, stakeholders, and requisite evaluations for accountable AI supply chain. This framework catalyses a deeper discourse on AI system evaluation beyond model-centric approaches.
- [493] arXiv:2404.06553 (replaced) [pdf, ps, other]
-
Title: Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator DesignSubjects: Hardware Architecture (cs.AR)
Analog Compute-in-Memory (CiM) accelerators use analog-digital converters (ADCs) to read the analog values that they compute. ADCs can consume significant energy and area, so architecture-level ADC decisions such as ADC resolution or number of ADCs can significantly impact overall CiM accelerator energy and area. Therefore, modeling how architecture-level decisions affect ADC energy and area is critical for performing architecture-level design space exploration of CiM accelerators.
This work presents an open-source architecture-level model to estimate ADC energy and area. To enable fast design space exploration, the model uses only architecture-level attributes while abstracting circuit-level details. Our model enables researchers to quickly and easily model key architecture-level tradeoffs in accelerators that use ADCs. - [494] arXiv:2404.07773 (replaced) [pdf, ps, other]
-
Title: ConsistencyDet: A Robust Object Detector with a Denoising Paradigm of Consistency ModelSubjects: Computer Vision and Pattern Recognition (cs.CV)
Object detection, a quintessential task in the realm of perceptual computing, can be tackled using a generative methodology. In the present study, we introduce a novel framework designed to articulate object detection as a denoising diffusion process, which operates on the perturbed bounding boxes of annotated entities. This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model. The hallmark of this model is its self-consistency feature, which empowers the model to map distorted information from any temporal stage back to its pristine state, thereby realizing a "one-step denoising" mechanism. Such an attribute markedly elevates the operational efficiency of the model, setting it apart from the conventional Diffusion Model. Throughout the training phase, ConsistencyDet initiates the diffusion sequence with noise-infused boxes derived from the ground-truth annotations and conditions the model to perform the denoising task. Subsequently, in the inference stage, the model employs a denoising sampling strategy that commences with bounding boxes randomly sampled from a normal distribution. Through iterative refinement, the model transforms an assortment of arbitrarily generated boxes into definitive detections. Comprehensive evaluations employing standard benchmarks, such as MS-COCO and LVIS, corroborate that ConsistencyDet surpasses other leading-edge detectors in performance metrics. Our code is available at this https URL.
- [495] arXiv:2404.08760 (replaced) [pdf, ps, html, other]
-
Title: The Generation Gap:Exploring Age Bias in the Underlying Value Systems of Large Language ModelsComments: 4 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we explore the alignment of values in Large Language Models (LLMs) with specific age groups, leveraging data from the World Value Survey across thirteen categories. Through a diverse set of prompts tailored to ensure response robustness, we find a general inclination of LLM values towards younger demographics. Additionally, we explore the impact of incorporating age identity information in prompts and observe challenges in mitigating value discrepancies with different age cohorts. Our findings highlight the age bias in LLMs and provide insights for future work.
- [496] arXiv:2404.09474 (replaced) [pdf, ps, other]
-
Title: TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature SignalsComments: Accepted for the CVPR 2024 workshop (ABAW)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Engagement analysis finds various applications in healthcare, education, advertisement, services. Deep Neural Networks, used for analysis, possess complex architecture and need large amounts of input data, computational power, inference time. These constraints challenge embedding systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the proposed method outperforms existing baselines, utilizing only two behavioral features (head pose rotations) compared to the 98 used in baseline models. Furthermore, comparative analysis shows TCCT-Net's architecture offers an order-of-magnitude improvement in inference speed compared to state-of-the-art image-based Recurrent Neural Network (RNN) methods. The code will be released at this https URL.
- [497] arXiv:2404.09494 (replaced) [pdf, ps, other]
-
Title: On the Necessity of Collaboration in Online Model Selection with Decentralized DataSubjects: Machine Learning (cs.LG)
We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients. Previous work omitted the problem and proposed various federated algorithms, while we provide a comprehensive answer from the perspective of computational constraints. We propose a federated algorithm and analyze the upper and lower bounds on the regret that show (i) collaboration is unnecessary in the absence of additional constraints on the problem; (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces. We clarify the unnecessary nature of collaboration in previous federated algorithms, and improve the regret bounds of algorithms for distributed online multi-kernel learning at a smaller computational and communication cost. Our algorithm relies on three new techniques including an improved Bernstein's inequality for martingale, a federated online mirror descent framework, and decoupling model selection and predictions, which might be of independent interest.
- [498] arXiv:2404.10281 (replaced) [pdf, ps, other]
-
Title: AI-Assisted Writing in Education: Ecosystem Risks and MitigationsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
While the excitement around the capabilities of technological advancements is giving rise to new AI-based writing assistants, the overarching ecosystem plays a crucial role in how they are adopted in educational practice. In this paper, we point to key ecological aspects for consideration. We draw insights from extensive research integrated with practice on a writing feedback tool over 9 years at a university, and we highlight potential risks when these are overlooked. It informs the design of educational writing support tools to be better aligned within broader contexts to balance innovation with practical impact.
- [499] arXiv:2404.11677 (replaced) [pdf, ps, other]
-
Title: Cross-Problem Learning for Solving Vehicle Routing ProblemsComments: Accepted by IJCAI'24Subjects: Artificial Intelligence (cs.AI)
Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes the cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) the additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pre-train the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously. On the other hand, we only fine-tune small adapter networks along with the modules, keeping the backbone Transformer still. Extensive experiments on typical VRPs substantiate that 1) the full fine-tuning achieves significantly better performance than the one trained from scratch, and 2) the adapter-based fine-tuning also delivers comparable performance while being notably parameter-efficient. Furthermore, we empirically demonstrate the favorable effect of our method in terms of cross-distribution application and versatility.
- [500] arXiv:2404.12064 (replaced) [pdf, ps, other]
-
Title: PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific ForestsComments: 14 pages | 5 figures | Dataset is available at this http URL | Deep learning code repository is on Gihtub at this https URL | Data engineering code repository is on Github at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Knowledge of tree species distribution is fundamental to managing forests. New deep learning approaches promise significant accuracy gains for forest mapping, and are becoming a critical tool for mapping multiple tree species at scale. To advance the field, deep learning researchers need large benchmark datasets with high-quality annotations. To this end, we present the PureForest dataset: a large-scale, open, multimodal dataset designed for tree species classification from both Aerial Lidar Scanning (ALS) point clouds and Very High Resolution (VHR) aerial images. Most current public Lidar datasets for tree species classification have low diversity as they only span a small area of a few dozen annotated hectares at most. In contrast, PureForest has 18 tree species grouped into 13 semantic classes, and spans 339 km$^2$ across 449 distinct monospecific forests, and is to date the largest and most comprehensive Lidar dataset for the identification of tree species. By making PureForest publicly available, we hope to provide a challenging benchmark dataset to support the development of deep learning approaches for tree species identification from Lidar and/or aerial imagery. In this data paper, we describe the annotation workflow, the dataset, the recommended evaluation methodology, and establish a baseline performance from both 3D and 2D modalities.
- [501] arXiv:2404.12241 (replaced) [pdf, ps, html, other]
-
Title: Introducing v0.5 of the AI Safety Benchmark from MLCommonsBertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin VanschorenSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
- [502] arXiv:2404.13172 (replaced) [pdf, ps, other]
-
Title: Insights from an experiment crowdsourcing data from thousands of US Amazon users: The importance of transparency, money, and data useComments: In review at CSCW '24, accepted with minor changes. 24 pages + additional pages for references and appendicesSubjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Data generated by users on digital platforms are a crucial resource for advocates and researchers interested in uncovering digital inequities, auditing algorithms, and understanding human behavior. Yet data access is often restricted. How can researchers both effectively and ethically collect user data? This paper shares an innovative approach to crowdsourcing user data to collect otherwise inaccessible Amazon purchase histories, spanning 5 years, from more than 5000 US users. We developed a data collection tool that prioritizes participant consent and includes an experimental study design. The design allows us to study multiple aspects of privacy perception and data sharing behavior. Experiment results (N=6325) reveal both monetary incentives and transparency can significantly increase data sharing. Age, race, education, and gender also played a role, where female and less-educated participants were more likely to share. Our study design enables a unique empirical evaluation of the "privacy paradox", where users claim to value their privacy more than they do in practice. We set up both real and hypothetical data sharing scenarios and find measurable similarities and differences in share rates across these contexts. For example, increasing monetary incentives had a 6 times higher impact on share rates in real scenarios. In addition, we study participants' opinions on how data should be used by various third parties, again finding demographics have a significant impact. Notably, the majority of participants disapproved of government agencies using purchase data yet the majority approved of use by researchers. Overall, our findings highlight the critical role that transparency, incentive design, and user demographics play in ethical data collection practices, and provide guidance for future researchers seeking to crowdsource user generated data.
- [503] arXiv:2404.13624 (replaced) [pdf, ps, other]
-
Title: Necessary and Sufficient Conditions for Capacity-Achieving Private Information Retrieval with Non-Colluding and Colluding ServersComments: 16 pagesSubjects: Information Theory (cs.IT)
Private Information Retrieval (PIR) is a mechanism for efficiently downloading messages while keeping the index secret. Here, PIRs in which servers do not communicate with each other are called standard PIRs, and PIRs in which some servers communicate with each other are called colluding PIRs. The information-theoretic upper bound on efficiency has been given in previous studies. However, the conditions for PIRs to keep privacy, to decode the desired message, and to achieve that upper bound have not been clarified in matrix form. In this paper, we prove the necessary and sufficient conditions for the properties of standard PIR and colluding PIR. Further, we represent the properties in matrix form.
- [504] arXiv:2404.13891 (replaced) [pdf, ps, html, other]
-
Title: Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror DescentComments: Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets, utilizing local regret minimization algorithms, such as Regret Matching (RM) or RM+, to minimize them. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant PRM+ and its extension PCFR+. However, PCFR+ assigns uniform weights for each iteration when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at this https URL.
- [505] arXiv:2404.13906 (replaced) [pdf, ps, other]
-
Title: Generating Attractive and Authentic Copywriting from Customer ReviewsComments: NAACL 2024 main conference paperSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The goal of product copywriting is to capture the interest of potential buyers by emphasizing the features of products through text descriptions. As e-commerce platforms offer a wide range of services, it's becoming essential to dynamically adjust the styles of these auto-generated descriptions. Typical approaches to copywriting generation often rely solely on specified product attributes, which may result in dull and repetitive content. To tackle this issue, we propose to generate copywriting based on customer reviews, as they provide firsthand practical experiences with products, offering a richer source of information than just product attributes. We have developed a sequence-to-sequence framework, enhanced with reinforcement learning, to produce copywriting that is attractive, authentic, and rich in information. Our framework outperforms all existing baseline and zero-shot large language models, including LLaMA-2-chat-7B and GPT-3.5, in terms of both attractiveness and faithfulness. Furthermore, this work features the use of LLMs for aspect-based summaries collection and argument allure assessment. Experiments demonstrate the effectiveness of using LLMs for marketing domain corpus construction. The code and the dataset is publicly available at: this https URL.
- [506] arXiv:2404.15275 (replaced) [pdf, ps, other]
-
Title: ID-Animator: Zero-Shot Identity-Preserving Human Video GenerationComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generating high fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case finetuning or usually missing the identity details in video generation process. In this study, we present ID-Animator, a zero-shot human-video generation approach that can perform personalized video generation given single reference facial image without further training. ID-Animator inherits existing diffusion-based video generation backbones with a face adapter to encode the ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline, which incorporates decoupled human attribute and action captioning technique from a constructed facial image pool. Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator to generate personalized human videos over previous models. Moreover, our method is highly compatible with popular pre-trained T2V models like animatediff and various community backbone models, showing high extendability in real-world applications for video generation where identity preservation is highly desired. Our codes and checkpoints will be released at this https URL.
- [507] arXiv:2404.15565 (replaced) [pdf, ps, html, other]
-
Title: CASPR: Automated Evaluation Metric for Contrastive SummarizationSubjects: Computation and Language (cs.CL)
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed token-overlap based metrics, Distinctiveness Score, to measure contrast which does not take into account the sensitivity to meaning-preserving lexical variations. In this work, we propose an automated evaluation metric CASPR to better measure contrast between a pair of summaries. Our metric is based on a simple and light-weight method that leverages natural language inference (NLI) task to measure contrast by segmenting reviews into single-claim sentences and carefully aggregating NLI scores between them to come up with a summary-level score. We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on a prior dataset CoCoTRIP demonstrate that CASPR can more reliably capture the contrastiveness of the summary pairs compared to the baselines.
- [508] arXiv:2404.15735 (replaced) [pdf, ps, other]
-
Title: On Replacing Cryptopuzzles with Useful Computation in Blockchain Proof-of-Work ProtocolsComments: Submitted to ACM Computing SurveysSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Proof-of-Work (PoW) blockchains have emerged as a robust and effective consensus mechanism in open environments, leading to widespread deployment with numerous cryptocurrency platforms and substantial investments. However, the commonly deployed PoW implementations are all based on solving cryptographic puzzles. Researchers have been pursuing the compelling idea of replacing cryptopuzzles with useful computing tasks for over a decade, in face of the substantial computational capacity of blockchain networks and the global pursuit of a more sustainable IT infrastructure. In this study, we conduct a comprehensive analysis of the prerequisites for alternative classes of tasks. We provide insight into the effect of introducing "usefulness" and of transitioning to task classes other than cryptopuzzles. Having distilled the prerequisites, we use them to examine proposed designs from existing literature. Finally, we discuss pertinent techniques and present research gaps in the current state-of-the-art.
- [509] arXiv:2404.16044 (replaced) [pdf, ps, html, other]
-
Title: Toward the Categorical Data MapFrederik L. Dennig, Lucas Joos, Patrick Paetzold, Daniela Blumberg, Oliver Deussen, Daniel A. Keim, Maximilian T. FischerComments: 12 pages, 10 figures, LaTeX; formatting; corrected typoSubjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, encoding attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot's visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based approach, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and show its benefits through an expert study with five data scientists analyzing the Titanic and Mushroom datasets with up to 23 attributes and 8124 category combinations. Our results indicate that the Categorical Data Map offers an effective analysis method, especially for large datasets with a high number of category combinations.
- [510] arXiv:2404.16856 (replaced) [pdf, ps, other]
-
Title: HookChain: A new perspective for Bypassing EDR SolutionsComments: 46 pages, 22 figures, HookChain, Bypass EDR, Evading EDR, IAT Hook, Halo's GateSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)
In the current digital security ecosystem, where threats evolve rapidly and with complexity, companies developing Endpoint Detection and Response (EDR) solutions are in constant search for innovations that not only keep up but also anticipate emerging attack vectors. In this context, this article introduces the HookChain, a look from another perspective at widely known techniques, which when combined, provide an additional layer of sophisticated evasion against traditional EDR systems. Through a precise combination of IAT Hooking techniques, dynamic SSN resolution, and indirect system calls, HookChain redirects the execution flow of Windows subsystems in a way that remains invisible to the vigilant eyes of EDRs that only act on Ntdll.dll, without requiring changes to the source code of the applications and malwares involved. This work not only challenges current conventions in cybersecurity but also sheds light on a promising path for future protection strategies, leveraging the understanding that continuous evolution is key to the effectiveness of digital security. By developing and exploring the HookChain technique, this study significantly contributes to the body of knowledge in endpoint security, stimulating the development of more robust and adaptive solutions that can effectively address the ever-changing dynamics of digital threats. This work aspires to inspire deep reflection and advancement in the research and development of security technologies that are always several steps ahead of adversaries.
UNDER CONSTRUCTION RESEARCH: This paper is not the final version, as it is currently undergoing final tests against several EDRs. We expect to release the final version by August 2024. - [511] arXiv:2404.17481 (replaced) [pdf, ps, other]
-
Title: ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking ExplanationsComments: Accepted to HumEval at LREC-Coling 2024. Table 1 updatedSubjects: Computation and Language (cs.CL)
This paper presents a partial reproduction of Generating Fact Checking Explanations by Anatanasova et al (2020) as part of the ReproHum element of the ReproNLP shared task to reproduce the findings of NLP research regarding human evaluation. This shared task aims to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criteria of Coverage. The results of our reproduction and reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.
- [512] arXiv:2404.18841 (replaced) [pdf, ps, other]
-
Title: Deep orthogonal decomposition: a continuously adaptive data-driven approach to model order reductionSubjects: Numerical Analysis (math.NA)
We develop a novel deep learning technique, termed Deep Orthogonal Decomposition (DOD), for dimensionality reduction and reduced order modeling of parameter dependent partial differential equations. The approach consists in the construction of a deep neural network model that approximates the solution manifold through a continuously adaptive local basis. In contrast to global methods, such as Principal Orthogonal Decomposition (POD), the adaptivity allows the DOD to overcome the Kolmogorov barrier, making the approach applicable to a wide spectrum of parametric problems. Furthermore, due to its hybrid linear-nonlinear nature, the DOD can accommodate both intrusive and nonintrusive techniques, providing highly interpretable latent representations and tighter control on error propagation. For this reason, the proposed approach stands out as a valuable alternative to other nonlinear techniques, such as deep autoencoders. The methodology is discussed both theoretically and practically, evaluating its performances on problems featuring nonlinear PDEs, singularities, and parametrized geometries.
- [513] arXiv:2404.18975 (replaced) [pdf, ps, other]
-
Title: M3H: Multimodal Multitask Machine Learning for HealthcareSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Artificial intelligence holds promise to fundamentally enhance healthcare. Developing an integrated many-to-many framework leveraging multimodal data for multiple tasks is essential to unifying modern medicine. We introduce M3H, an explainable Multimodal Multitask Machine Learning for Healthcare framework that consolidates learning from tabular, time-series, language, and vision data for supervised binary/multiclass classification, regression, and unsupervised clustering. M3H encompasses an unprecedented range of medical tasks and problem domains and consistently outperforms traditional single-task models by on average 11.6% across 40 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task. It features a novel attention mechanism balancing self-exploitation (learning source-task), and cross-exploration (learning cross-tasks), and offers explainability through a proposed TIM score, shedding light on the dynamics of task learning interdependencies. Its adaptable architecture supports easy customization and integration of new data modalities and tasks, establishing it as a robust, scalable solution for advancing AI-driven healthcare systems.
- [514] arXiv:2404.19652 (replaced) [pdf, ps, html, other]
-
Title: VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain GeneralizationYuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang BaiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% in six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaption, our method even surpasses the previous end-to-end video spotting method in ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in generating cross-domain scene text spotting, in contrast to our VimTS model which requires significantly fewer parameters and data. The code and datasets will be made available at the this https URL.
- [515] arXiv:2405.00295 (replaced) [pdf, ps, html, other]
-
Title: Proof of Sampling: A Nash Equilibrium-Secured Verification Protocol for Decentralized SystemsSubjects: Computer Science and Game Theory (cs.GT)
This paper presents a secure and versatile sampling-based verification protocol, Proof of Sampling (PoSP) protocol, suitable for a wide range of decentralized applications. Our protocol has a unique Nash Equilibrium in pure strategies, which compels rational participants to act honestly, thus fortifying the network's integrity. This can be achieved with manageable computational overhead. When applied to decentralized inference for AI applications, we design spML based on PoSP protocol, which ingeniously amalgamates the strengths of optimistic fraud proof and zero knowledge proof based approaches, the foremost approaches in the domain at present. Moreover, the PoSP protocol can be effectively utilized for designing verification mechanisms within Actively Validated Services (AVS) in EigenLayer and Layer 2 solutions, further broadening its applicability. This innovative approach not only enhances the security and efficiency of decentralized systems but also paves the way for a new generation of scalable and reliable decentralized applications.
- [516] arXiv:2405.00515 (replaced) [pdf, ps, other]
-
Title: GAD-Generative Learning for HD Map-Free Autonomous DrivingSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This results in a performance bottleneck for autonomous driving systems in that corner cases simply cannot be solved by enumerating hand-crafted rules. We present a deep-learning-based approach that brings prediction, decision, and planning modules together with the attempt to overcome the rule-based methods' deficiency in real-world applications of autonomous driving, especially for urban scenes. The DNN model we proposed is solely trained with 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. This method is deployed onto a Jiyue test car with no modification to its factory-ready sensor set and compute platform. the feasibility, usability, and commercial potential are demonstrated in this article.
- [517] arXiv:2405.00630 (replaced) [pdf, ps, html, other]
-
Title: Depth Priors in Removal Neural Radiance FieldsComments: 16 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Neural Radiance Fields have achieved impressive results in 3D reconstruction and novel view generation. A significant challenge within NeRF involves editing reconstructed 3D scenes, such as object removal, which demands consistency across multiple views and the synthesis of high-quality perspectives. Previous studies have integrated depth priors, typically sourced from LiDAR or sparse depth estimates from COLMAP, to enhance NeRF's performance in object removal. However, these methods are either expensive or time-consuming. This paper proposes a new pipeline that leverages SpinNeRF and monocular depth estimation models like ZoeDepth to enhance NeRF's performance in complex object removal with improved efficiency. A thorough evaluation of COLMAP's dense depth reconstruction on the KITTI dataset is conducted to demonstrate that COLMAP can be viewed as a cost-effective and scalable alternative for acquiring depth ground truth compared to traditional methods like LiDAR. This serves as the basis for evaluating the performance of monocular depth estimation models to determine the best one for generating depth priors for SpinNeRF. The new pipeline is tested in various scenarios involving 3D reconstruction and object removal, and the results indicate that our pipeline significantly reduces the time required for depth prior acquisition for object removal and enhances the fidelity of the synthesized views, suggesting substantial potential for building high-fidelity digital twin systems with increased efficiency in the future.
- [518] arXiv:2405.00740 (replaced) [pdf, ps, other]
-
Title: Modeling Caption Diversity in Contrastive Vision-Language PretrainingSamuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas BallasComments: 14 pages, 8 figures, 7 tables, to be published at ICML2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.
- [519] arXiv:2405.00808 (replaced) [pdf, ps, html, other]
-
Title: ReeSPOT: Reeb Graph Models Semantic Patterns of Normalcy in Human TrajectoriesSubjects: Data Structures and Algorithms (cs.DS); Computational Engineering, Finance, and Science (cs.CE); Social and Information Networks (cs.SI)
This paper introduces ReeSPOT, a novel Reeb graph-based method to model patterns of life in human trajectories (akin to a fingerprint). Human behavior typically follows a pattern of normalcy in day-to-day activities. This is marked by recurring activities within specific time periods. In this paper, we model this behavior using Reeb graphs where any deviation from usual day-to-day activities is encoded as nodes in the Reeb graph. The complexity of the proposed algorithm is linear with respect to the number of time points in a given trajectory. We demonstrate the usage of ReeSPOT and how it captures the critically significant spatial and temporal deviations using the nodes of the Reeb graph. Our case study presented in this paper includes realistic human movement scenarios: visiting uncommon locations, taking odd routes at infrequent times, uncommon time visits, and uncommon stay durations. We analyze the Reeb graph to interpret the topological structure of the GPS trajectories. Potential applications of ReeSPOT include urban planning, security surveillance, and behavioral research.
- [520] arXiv:2405.01066 (replaced) [pdf, ps, other]
-
Title: HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB imagesComments: 12 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Reconstructing the hand mesh from one single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to explore more additional information and adopt attention mechanisms for improving 3D reconstruction performance, while it would increase computational complexity simultaneously. To achieve a performance-reserving architecture with high computational efficiency, in this work, we propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C), which is the first time to incorporate state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features in the spatial dimension, and enhances regional features of hands in the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets facing heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandS3C achieves state-of-the-art performance while maintaining a minimal parameters.
- [521] arXiv:2405.01092 (replaced) [pdf, ps, other]
-
Title: About enveloping algebras of direct sumsGérard Henry Edmond Duchamp (LIPN), Christophe Tollu (LIPN), Jean-Gabriel Luque (GR2IF), Vu Nguyen Dinh (USTH)Subjects: Logic in Computer Science (cs.LO)
We solve the PBW-like problem of normal ordering for enveloping algebras of direct sums.
- [522] arXiv:2405.02501 (replaced) [pdf, ps, other]
-
Title: PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context LearningComments: ICML 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at this https URL.
- [523] arXiv:2405.02647 (replaced) [pdf, ps, other]
-
Title: SubwayMeshDTN: Exploring Opportunistic Delay Tolerant Routing Protocols when Disseminating Emergency Alerts on a Smart City Subway NetworkSubjects: Networking and Internet Architecture (cs.NI)
This paper seeks to understand the effectiveness of using multi-dimensional opportunistic delay-tolerant network (DTN) routing protocols, specifically Epidemic and MaxProp, in the context of New York City (NYC) metropolitan subway network. We examine how efficiently emergency messages spread through mobile, self-configuring, edge-based movement patterns on the train network to understand and propose solutions for improving communication in subterranean environments. Since DTNs are able to store, carry and forward messages through intermediate edges, this paper benchmarks both Wi-Fi and Bluetooth topologies to compare and critically evaluate movement patterns, latency, overheads and delivery rates on pseudo-realistic underground traces. We also show that the accordion effect is predominant in these networks, and therefore, the most effective protocol configurations vary.
- [524] arXiv:2405.03333 (replaced) [pdf, ps, html, other]
-
Title: Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language GuidanceXunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao ZhaiSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR) included. Equally important to the VEC is the Video Quality Assessment (VQA). Unfortunately, almost all existing VQA models are built generally, measuring the quality of a video from a comprehensive perspective. As a result, Light-VQA, trained on LLVE-QA, is proposed for assessing LLVE. We extend the work of Light-VQA by expanding the LLVE-QA dataset into Video Exposure Correction Quality Assessment (VEC-QA) dataset with over-exposed videos and their corresponding corrected versions. In addition, we propose Light-VQA+, a VQA model specialized in assessing VEC. Light-VQA+ differs from Light-VQA mainly from the usage of the CLIP model and the vision-language guidance during the feature extraction, followed by a new module referring to the Human Visual System (HVS) for more accurate assessment. Extensive experimental results show that our model achieves the best performance against the current State-Of-The-Art (SOTA) VQA models on the VEC-QA dataset and other public datasets.
- [525] arXiv:2405.03432 (replaced) [pdf, ps, other]
-
Title: Improved Forward-Forward Contrastive LearningSubjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The backpropagation algorithm, or backprop, is a widely utilized optimization technique in deep learning. While there's growing evidence suggesting that models trained with backprop can accurately explain neuronal data, no backprop-like method has yet been discovered in the biological brain for learning. Moreover, employing a naive implementation of backprop in the brain has several drawbacks. In 2022, Geoffrey Hinton proposed a biologically plausible learning method known as the Forward-Forward (FF) algorithm. Shortly after this paper, a modified version called FFCL was introduced. However, FFCL had limitations, notably being a three-stage learning system where the final stage still relied on regular backpropagation. In our approach, we address these drawbacks by eliminating the last two stages of FFCL and completely removing regular backpropagation. Instead, we rely solely on local updates, offering a more biologically plausible alternative.
- [526] arXiv:2405.03548 (replaced) [pdf, ps, other]
-
Title: MAmmoTH2: Scaling Instructions from the WebComments: Work in ProgressSubjects: Computation and Language (cs.CL)
Instruction tuning improves the reasoning abilities of large language models (LLMs), with data quality and scalability being the crucial factors. Most instruction tuning data come from human crowd-sourcing or GPT-4 distillation. We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. Our approach involves (1) recalling relevant documents, (2) extracting instruction-response pairs, and (3) refining the extracted pairs using open-source LLMs. Fine-tuning base LLMs on this dataset, we build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks. Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 34% on MATH and from 36% to 67% on GSM8K without training on any in-domain data. Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance on several reasoning and chatbot benchmarks. Our work demonstrates how to harvest large-scale, high-quality instruction data without costly human annotation or GPT-4 distillation, providing a new paradigm for building better instruction tuning data.
- [527] arXiv:2405.03709 (replaced) [pdf, ps, other]
-
Title: Generating Probabilistic Scenario Programs from Natural LanguageKarim Elmaaroufi, Devan Shanker, Ana Cismaru, Marcell Vazquez-Chanlatte, Alberto Sangiovanni-Vincentelli, Matei Zaharia, Sanjit A. SeshiaComments: 17 pages, 2 figuresSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
For cyber-physical systems (CPS), including robotics and autonomous vehicles, mass deployment has been hindered by fatal errors that occur when operating in rare events. To replicate rare events such as vehicle crashes, many companies have created logging systems and employed crash reconstruction experts to meticulously recreate these valuable events in simulation. However, in these methods, "what if" questions are not easily formulated and answered. We present ScenarioNL, an AI System for creating scenario programs from natural language. Specifically, we generate these programs from police crash reports. Reports normally contain uncertainty about the exact details of the incidents which we represent through a Probabilistic Programming Language (PPL), Scenic. By using Scenic, we can clearly and concisely represent uncertainty and variation over CPS behaviors, properties, and interactions. We demonstrate how commonplace prompting techniques with the best Large Language Models (LLM) are incapable of reasoning about probabilistic scenario programs and generating code for low-resource languages such as Scenic. Our system is comprised of several LLMs chained together with several kinds of prompting strategies, a compiler, and a simulator. We evaluate our system on publicly available autonomous vehicle crash reports in California from the last five years and share insights into how we generate code that is both semantically meaningful and syntactically correct.
- [528] arXiv:2405.03963 (replaced) [pdf, ps, other]
-
Title: ERATTA: Extreme RAG for Table To Answers with Large Language ModelsSohini Roychowdhury, Marko Krema, Anvar Mahammad, Brian Moore, Arijit Mukherjee, Punit PrakashchandraComments: 5 pages, 3 tables, Asilomar SSC Conference, 2024Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) with retrieval augmented-generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. However, the choice of use-cases that incorporate RAG with LLMs have been either generic or extremely domain specific, thereby questioning the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system where multiple LLMs can be invoked to enable data authentication, user query routing, data retrieval and custom prompting for question answering capabilities from data tables that are highly varying and large in size. Our system is tuned to extract information from Enterprise-level data products and furnish real time responses under 10 seconds. One prompt manages user-to-data authentication followed by three prompts to route, fetch data and generate a customizable prompt natural language responses. Additionally, we propose a five metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.
- [529] arXiv:2405.04051 (replaced) [pdf, ps, html, other]
-
Title: On the quantization goodness of polar latticesComments: 12 pages, 5 figures, submitted to IEEE for possible publicationSubjects: Information Theory (cs.IT)
In this work, we prove that polar lattices, when tailored for lossy compression, are quantization-good in the sense that their normalized second moments approach $\frac{1}{2\pi e}$ as the dimension of lattices increases. It has been predicted by Zamir et al. \cite{ZamirQZ96} that the Entropy Coded Dithered Quantization (ECDQ) system using quantization-good lattices can achieve the rate-distortion bound of i.i.d. Gaussian sources. In our previous work \cite{LingQZ}, we established that polar lattices are indeed capable of attaining the same objective. It is reasonable to conjecture that polar lattices also demonstrate quantization goodness in the context of lossy compression. This study confirms this hypothesis.
- [530] arXiv:2405.04363 (replaced) [pdf, ps, other]
-
Title: Some Notes on the Sample Complexity of Approximate Channel SimulationComments: Accepted as a spotlight paper at the first 'Learn to Compress' Workshop@ ISIT 2024. V2: corrected some typos and simplified Appendix CSubjects: Information Theory (cs.IT)
Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq \epsilon$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ \epsilon\big)$.
- [531] arXiv:2405.04378 (replaced) [pdf, ps, html, other]
-
Title: Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian SplattingOla Shorinwa, Johnathan Tucker, Aliyah Smith, Aiden Swann, Timothy Chen, Roya Firoozi, Monroe Kennedy III, Mac SchwagerSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills latent codes for language semantics and grasp affordance into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real-world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks, as well as in four multi-stage manipulation tasks using the edited scene to reflect scene changes due to prior manipulation stages, which is not possible with the existing baselines. Code for this project and a link to the project page will be made available soon.
- [532] arXiv:2405.04515 (replaced) [pdf, ps, other]
-
Title: A Transformer with Stack AttentionComments: NAACL 2024 FindingsSubjects: Computation and Language (cs.CL)
Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
- [533] arXiv:2405.05144 (replaced) [pdf, ps, html, other]
-
Title: Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rankComments: BEA workshop NAACL 2024Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank, training a ranking model to predict how likely distractors are to be selected by real students. Experimental results on a real-world dataset and human evaluation with math teachers show that our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
- [534] arXiv:2405.05236 (replaced) [pdf, ps, html, other]
-
Title: Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural NetworksSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated RELUs using known properties for the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by a scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability / performance condition and study the effect of the lifting horizon.
- [535] arXiv:2405.05329 (replaced) [pdf, ps, html, other]
-
Title: KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache GenerationComments: preprint for ICML 2024Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to the generate subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of key-value cache (KV-cache). Hence, KV-Runahead parallelizes the prompt phase by orchestrating multiple processes to populate the KV-cache and minimizes the time-to-first-token (TTFT). Dual-purposing the KV-cache scheme has two main benefits. First, since KV-cache is designed to leverage the causal attention map, we minimize computation and computation automatically. Second, since it already exists for the extension phase, KV-Runahead is easy to implement. We further propose context-level load-balancing to handle uneven KV-cache generation (due to the causal attention) and to optimize TTFT. Compared with an existing parallelization scheme such as tensor or sequential parallelization where keys and values are locally generated and exchanged via all-gather collectives, our experimental results demonstrate that KV-Runahead can offer over 1.4x and 1.6x speedups for Llama 7B and Falcon 7B respectively.
- [536] arXiv:2405.05429 (replaced) [pdf, ps, html, other]
-
Title: How Inverse Conditional Flows Can Serve as a Substitute for Distributional RegressionLucas Kook, Chris Kolb, Philipp Schiele, Daniel Dold, Marcel Arpogaus, Cornelius Fritz, Philipp F. Baumann, Philipp Kopper, Tobias Pielok, Emilio Dorigatti, David RügamerComments: Accepted at UAI 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Machine Learning (stat.ML)
Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.
- [537] arXiv:2405.05433 (replaced) [pdf, ps, other]
-
Title: Robust Reward Placement under UncertaintyComments: Accepted for publication in IJCAI 2024Subjects: Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
We consider a problem of placing generators of rewards to be collected by randomly moving agents in a network. In many settings, the precise mobility pattern may be one of several possible, based on parameters outside our control, such as weather conditions. The placement should be robust to this uncertainty, to gain a competent total reward across possible networks. To study such scenarios, we introduce the Robust Reward Placement problem (RRP). Agents move randomly by a Markovian Mobility Model with a predetermined set of locations whose connectivity is chosen adversarially from a known set $\Pi$ of candidates. We aim to select a set of reward states within a budget that maximizes the minimum ratio, among all candidates in $\Pi$, of the collected total reward over the optimal collectable reward under the same candidate. We prove that RRP is NP-hard and inapproximable, and develop $\Psi$-Saturate, a pseudo-polynomial time algorithm that achieves an $\epsilon$-additive approximation by exceeding the budget constraint by a factor that scales as $O(\ln |\Pi|/\epsilon)$. In addition, we present several heuristics, most prominently one inspired by a dynamic programming algorithm for the max-min 0-1 KNAPSACK problem. We corroborate our analysis with an experimental evaluation of the methods in both synthetic and real-world datasets.
- [538] arXiv:2405.05499 (replaced) [pdf, ps, other]
-
Title: Multi-Scale Dilated Convolution Network for Long-Term Time Series ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Accurate forecasting of long-term time series has important applications for decision making and planning. However, it remains challenging to capture the long-term dependencies in time series data. To better extract long-term dependencies, We propose Multi Scale Dilated Convolution Network (MSDCN), a method that utilizes a shallow dilated convolution architecture to capture the period and trend characteristics of long time series. We design different convolution blocks with exponentially growing dilations and varying kernel sizes to sample time series data at different scales. Furthermore, we utilize traditional autoregressive model to capture the linear relationships within the data. To validate the effectiveness of the proposed approach, we conduct experiments on eight challenging long-term time series forecasting benchmark datasets. The experimental results show that our approach outperforms the prior state-of-the-art approaches and shows significant inference speed improvements compared to several strong baseline methods.
- [539] arXiv:2405.05989 (replaced) [pdf, ps, other]
-
Title: Clustering-based Multitasking Deep Neural Network for Solar Photovoltaics Power Generation PredictionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The increasing installation of Photovoltaics (PV) cells leads to more generation of renewable energy sources (RES), but results in increased uncertainties of energy scheduling. Predicting PV power generation is important for energy management and dispatch optimization in smart grid. However, the PV power generation data is often collected across different types of customers (e.g., residential, agricultural, industrial, and commercial) while the customer information is always de-identified. This often results in a forecasting model trained with all PV power generation data, allowing the predictor to learn various patterns through intra-model self-learning, instead of constructing a separate predictor for each customer type. In this paper, we propose a clustering-based multitasking deep neural network (CM-DNN) framework for PV power generation prediction. K-means is applied to cluster the data into different customer types. For each type, a deep neural network (DNN) is employed and trained until the accuracy cannot be improved. Subsequently, for a specified customer type (i.e., the target task), inter-model knowledge transfer is conducted to enhance its training accuracy. During this process, source task selection is designed to choose the optimal subset of tasks (excluding the target customer), and each selected source task uses a coefficient to determine the amount of DNN model knowledge (weights and biases) transferred to the aimed prediction task. The proposed CM-DNN is tested on a real-world PV power generation dataset and its superiority is demonstrated by comparing the prediction performance on training the dataset with a single model without clustering.
- [540] arXiv:2405.06067 (replaced) [pdf, ps, html, other]
-
Title: HMT: Hierarchical Memory Transformer for Long Context Language ProcessingSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Transformer-based large language models (LLM) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitations in selecting and filtering information. Since humans are good at learning and self-adjustment, we speculate that imitating brain memory hierarchy is beneficial for model memorization. We propose the Hierarchical Memory Transformer (HMT), a novel framework that enables and improves models' long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input token segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluating general language modeling (Wikitext-103, PG-19) and question-answering tasks (PubMedQA), we show that HMT steadily improves the long-context processing ability of context-constrained and long-context models. With an additional 0.5% - 2% of parameters, HMT can easily plug in and augment future LLMs to handle long context effectively. Our code is open-sourced on Github: this https URL.
- [541] arXiv:2405.06394 (replaced) [pdf, ps, html, other]
-
Title: Memory MosaicsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples and we also show that memory mosaics perform as well or better than transformers on medium-scale language modeling tasks.
- [542] arXiv:2405.06461 (replaced) [pdf, ps, other]
-
Title: SketchDream: Sketch-based Text-to-3D Generation and EditingSubjects: Graphics (cs.GR)
Existing text-based 3D generation methods generate attractive results but lack detailed geometry control. Sketches, known for their conciseness and expressiveness, have contributed to intuitive 3D modeling but are confined to producing texture-less mesh models within predefined categories. Integrating sketch and text simultaneously for 3D generation promises enhanced control over geometry and appearance but faces challenges from 2D-to-3D translation ambiguity and multi-modal condition integration. Moreover, further editing of 3D models in arbitrary views will give users more freedom to customize their models. However, it is difficult to achieve high generation quality, preserve unedited regions, and manage proper interactions between shape components. To solve the above issues, we propose a text-driven 3D content generation and editing method, SketchDream, which supports NeRF generation from given hand-drawn sketches and achieves free-view sketch-based local editing. To tackle the 2D-to-3D ambiguity challenge, we introduce a sketch-based multi-view image generation diffusion model, which leverages depth guidance to establish spatial correspondence. A 3D ControlNet with a 3D attention module is utilized to control multi-view images and ensure their 3D consistency. To support local editing, we further propose a coarse-to-fine editing approach: the coarse phase analyzes component interactions and provides 3D masks to label edited regions, while the fine stage generates realistic results with refined details by local enhancement. Extensive experiments validate that our method generates higher-quality results compared with a combination of 2D ControlNet and image-to-3D generation techniques and achieves detailed control compared with existing diffusion-based 3D editing approaches.
- [543] arXiv:2405.06505 (replaced) [pdf, ps, other]
-
Title: Hal: A Language-General Framework for Analysis of User-Specified Monotone Frameworks [DRAFT]Comments: Undergraduate Senior Capstone ProjectSubjects: Programming Languages (cs.PL)
Writing dataflow analyzers requires both language and domain-specificity. That is to say, each programming language and each program property requires its own analyzer. To enable a streamlined, user-driven approach to dataflow analyzers, we introduce the theoretical framework for a user-specified dataflow analysis. This framework is constructed in such a way that the user has to specify as little as possible, while the analyzer infers and computes everything else, including interprocedural embellishments. This theoretical framework was also implemented in Java, where users can specify a program property alongside minimal extra information to induce a dataflow analysis. This framework (both theoretical and in implementation) is language-general, meaning that it is independent of syntax and semantics (as all necessary syntactic and semantic information is provided by the user, and this information is provided only once for a given language). In this paper, we introduce basic notions of intraprocedural and interprocedural dataflow analyses, the proposed "Implicit Monotone Framework," and a rigorous framework for partial functions as a property space.
- [544] arXiv:2405.06714 (replaced) [pdf, ps, other]
-
Title: Towards a Path Dependent Account of Category FluencyComments: To appear at CogSci 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Category fluency is a widely studied cognitive phenomenon, yet two conflicting accounts have been proposed as the underlying retrieval mechanism -- an optimal foraging process deliberately searching through memory (Hills et al., 2012) and a random walk sampling from a semantic network (Abbott et al., 2015). Evidence for both accounts has centered around predicting human patch switches, where both existing models of category fluency produce paradoxically identical results. We begin by peeling back the assumptions made by existing models, namely that each named example only depends on the previous example, by (i) adding an additional bias to model the category transition probability directly and (ii) relying on a large language model to predict based on the entire existing sequence. Then, we present evidence towards resolving the disagreement between each account of foraging by reformulating models as sequence generators. To evaluate, we compare generated category fluency runs to a bank of human-written sequences by proposing a metric based on n-gram overlap. We find category switch predictors do not necessarily produce human-like sequences, in fact the additional biases used by the Hills et al. (2012) model are required to improve generation quality, which are later improved by our category modification. Even generating exclusively with an LLM requires an additional global cue to trigger the patch switching behavior during production. Further tests on only the search process on top of the semantic network highlight the importance of deterministic search to replicate human behavior.
- [545] arXiv:2405.06823 (replaced) [pdf, ps, html, other]
-
Title: PLeak: Prompt Leaking Attacks against Large Language Model ApplicationsComments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to protect its intellectual property. As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property. Existing prompt leaking attacks primarily rely on manually crafted queries, and thus achieve limited effectiveness.
In this paper, we design a novel, closed-box prompt leaking attack framework, called PLeak, to optimize an adversarial query such that when the attacker sends it to a target LLM application, its response reveals its own system prompt. We formulate finding such an adversarial query as an optimization problem and solve it with a gradient-based method approximately. Our key idea is to break down the optimization goal by optimizing adversary queries for system prompts incrementally, i.e., starting from the first few tokens of each system prompt step by step until the entire length of the system prompt.
We evaluate PLeak in both offline settings and for real-world LLM applications, e.g., those on Poe, a popular platform hosting such applications. Our results show that PLeak can effectively leak system prompts and significantly outperforms not only baselines that manually curate queries but also baselines with optimized queries that are modified and adapted from existing jailbreaking attacks. We responsibly reported the issues to Poe and are still waiting for their response. Our implementation is available at this repository: this https URL. - [546] arXiv:2405.07076 (replaced) [pdf, ps, other]
-
Title: Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language ModelsComments: 29 pages, 10 tables, 6 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
- [547] arXiv:2405.07200 (replaced) [pdf, ps, other]
-
Title: Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function ApproximationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures often struggle to capture intricate patterns and irregularities present in high-dimensional functions. This paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a novel approach that combines the theoretical foundations of the Kolmogorov-Arnold Theorem with the powerful approximation capabilities of Chebyshev polynomials. 1
- [548] arXiv:2405.07266 (replaced) [pdf, ps, other]
-
Title: Architecture-Level Modeling of Photonic Deep Neural Network AcceleratorsComments: Published at ISPASS 2024Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR)
Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations.
To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of converting data between domains. Designers must also consider system-level energy costs such as data fetch from DRAM. Converting data and accessing DRAM can consume significant energy, so to evaluate and explore the photonic system space, there is a need for a tool that can model these full-system considerations.
In this work, we show that similarities between Compute-in-Memory (CiM) and photonics let us use CiM system modeling tools to accurately model photonics systems. Bringing modeling tools to photonics enables evaluation of photonic research in a full-system context, rapid design space exploration, co-design, and comparison between systems.
Using our open-source model, we show that cross-domain conversion and DRAM can consume a significant portion of photonic system energy. We then demonstrate optimizations that reduce conversions and DRAM accesses to improve photonic system energy efficiency by up to 3x. - [549] arXiv:2405.07348 (replaced) [pdf, ps, other]
-
Title: MedConceptsQA: Open Source Medical Concepts QA BenchmarkSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We present MedConceptsQA, a dedicated open source benchmark for medical concepts question answering. The benchmark comprises of questions of various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We conducted evaluations of the benchmark using various Large Language Models. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at this https URL
- [550] arXiv:2405.07414 (replaced) [pdf, ps, other]
-
Title: Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular DomainsComments: ICML 2024, 18 pages (including supplementary materials)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. This pretext task provides the encoder with an inductive bias to capture the irregular dependencies, mapping from continuous inputs to discretized bins, and mitigates the feature heterogeneity by setting all features to have category-type targets. Our empirical investigations ascertain several advantages of binning: capturing the irregular function, compatibility with encoder architecture and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The codes are available in this https URL.
- [551] arXiv:2405.07443 (replaced) [pdf, ps, other]
-
Title: Minimum-Variance Recursive State Estimation for 2-D Systems: When Asynchronous Multi-Channel Delays meet Energy Harvesting ConstraintsSubjects: Systems and Control (eess.SY)
This paper is concerned with the state estimation problem for two-dimensional systems with asynchronous multichannel delays and energy harvesting constraints. In the system, each smart sensor has a certain probability of harvesting energy from the external environment, the authorized transmission between the sensor and the remote filter is contingent upon the current energy level of the sensor, which results in intermittent transmission of observation information. Addressing the issue of incomplete observation information due to asynchronous multi-channel delays, a novel approach for observation partition reconstruction is proposed to convert the delayed activated observation sequences into equivalent delay-free activated observation sequences. Through generating spatial equivalency validation, it is found that the reconstructed delay-free activated observation sequences contain the same information as the original delayed activated observation sequences. Based on the reconstructed activated observation sequence and activated probability, a novel unbiased h+1-step recursive estimator is constructed. Then, the evolution of the probability distribution of the energy level is discussed. The estimation gains are obtained by minimizing the filtering error covariance. Subsequently, through parameter assumptions, a uniform lower bound and a recursive upper bound for the filtering error covariance are presented. And the monotonicity analysis of activated probability on estimation performance is given. Finally, the effectiveness of the proposed estimation scheme is verified through a numerical simulation example.
- [552] arXiv:2405.07524 (replaced) [pdf, ps, other]
-
Title: HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image RetrievalComments: Accepted by ICMR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep image hashing aims to map input images into simple binary hash codes via deep neural networks and thus enable effective large-scale image retrieval. Recently, hybrid networks that combine convolution and Transformer have achieved superior performance on various computer tasks and have attracted extensive attention from researchers. Nevertheless, the potential benefits of such hybrid networks in image retrieval still need to be verified. To this end, we propose a hybrid convolutional and self-attention deep hashing method known as HybridHash. Specifically, we propose a backbone network with stage-wise architecture in which the block aggregation function is introduced to achieve the effect of local self-attention and reduce the computational complexity. The interaction module has been elaborately designed to promote the communication of information between image blocks and to enhance the visual representations. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that the method proposed in this paper has superior performance with respect to state-of-the-art deep hashing methods. Source code is available this https URL.
- [553] arXiv:2405.07528 (replaced) [pdf, ps, other]
-
Title: Comparing Perceptions of Static and Adaptive Proactive Speech AgentsComments: Accepted to CUI 2024 - 6th Conference on Conversational User InterfacesSubjects: Human-Computer Interaction (cs.HC)
A growing literature on speech interruptions describes how people interrupt one another with speech, but these behaviours have not yet been implemented in the design of artificial agents which interrupt. Perceptions of a prototype proactive speech agent which adapts its speech to both urgency and to the difficulty of the ongoing task it interrupts are compared against perceptions of a static proactive agent which does not. The study hypothesises that adaptive proactive speech modelled on human speech interruptions will lead to partner models which consider the proactive agent as a stronger conversational partner than a static agent, and that interruptions initiated by an adaptive agent will be judged as better timed and more appropriately asked. These hypotheses are all rejected however, as quantitative analysis reveals that participants view the adaptive agent as a poorer dialogue partner than the static agent and as less appropriate in the style it interrupts. Qualitative analysis sheds light on the source of this surprising finding, as participants see the adaptive agent as less socially appropriate and as less consistent in its interactions than the static agent.
- [554] arXiv:2405.07533 (replaced) [pdf, ps, other]
-
Title: DID Link: Authentication in TLS with Decentralized Identifiers and Verifiable CredentialsSandro Rodriguez Garzon, Dennis Natusch, Artur Philipp, Axel Küpper, Hans Joachim Einsiedler, Daniela SchneiderComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Authentication in TLS is predominately carried out with X.509 digital certificates issued by certificate authorities (CA). The centralized nature of current public key infrastructures, however, comes along with severe risks, such as single points of failure and susceptibility to cyber-attacks, potentially undermining the security and trustworthiness of the entire system. With Decentralized Identifiers (DID) alongside distributed ledger technology, it becomes technically feasible to prove ownership of a unique identifier without requiring an attestation of the proof's public key by a centralized and therefore vulnerable CA. This article presents DID Link, a novel authentication scheme for TLS 1.3 that empowers entities to authenticate in a TLS-compliant way with self-issued X.509 certificates that are equipped with ledger-anchored DIDs instead of CA-issued identifiers. It facilitates the exchange of tamper-proof and 3rd-party attested claims in the form of DID-bound Verifiable Credentials after the TLS handshake to complete the authentication with a full identification of the communication partner. A prototypical implementation shows comparable TLS handshake durations of DID Link if verification material is cached and reasonable prolongations if it is obtained from a ledger. The significant speed improvement of the resulting TLS channel over a widely used, DID-based alternative transport protocol on the application layer demonstrates the potential of DID Link to become a viable solution for the establishment of secure and trustful end-to-end communication links with decentrally managed digital identities.
- [555] arXiv:2405.07541 (replaced) [pdf, ps, other]
-
Title: Random walk model that universally generates inverse square L\'evy walk by eliminating search cost minimization constraintShuji Shinohara, Daiki Morita, Hayato Hirai, Ryosuke Kuribayashi, Nobuhito Manome, Toru Moriyama, Hiroshi Okamoto, Yoshihiro Nakajima, Pegio-Yukio Gunji, Ung-il ChungSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
The Lévy walk, a type of random walk characterized by linear step lengths that follow a power-law distribution, is observed in the migratory behaviors of various organisms, ranging from bacteria to humans. Notably, Lévy walks with power exponents close to two are frequently observed, though their underlying causes remain elusive. This study introduces a simplified, abstract random walk model designed to produce inverse square Lévy walks, also known as Cauchy walks and explores the conditions that facilitate these phenomena. In our model, agents move toward a randomly selected destination in multi-dimensional space, and their movement strategy is parameterized by the extent to which they pursue the shortest path. When the search cost is proportional to the distance traveled, this parameter effectively reflects the emphasis on minimizing search costs. Our findings reveal that strict adherence to this cost minimization constraint results in a Brownian walk pattern. However, removing this constraint transitions the movement to an inverse square Lévy walk. Therefore, by modulating the prioritization of search costs, our model can seamlessly alternate between Brownian and Cauchy walk dynamics. This model has the potential to be utilized for exploring the parameter space of an optimization problem.
- [556] arXiv:2405.07584 (replaced) [pdf, ps, other]
-
Title: La ROUTOURNE va tournerQuentin Bramas (ICube, UNISTRA), Jean-Romain Luttringer (ICube, UNISTRA), Pascal Mérindol (ICube, UNISTRA)Comments: in French language. AlgoTel 2024 -- 26{è}mes Rencontres Francophones sur les Aspects Algorithmiques des T{é}l{é}communications, May 2024, Saint-Briac-sur-Mer, FranceSubjects: Networking and Internet Architecture (cs.NI)
Segment routing (SR) offers precise control over the paths taken: it specifies a list of detours, called segments, in IP packets. However, the number of detours that can be specified is limited by the hardware. When calculating segment lists, it is therefore necessary to limit their size. Although solutions have been proposed for calculating these lists, they lack generality and are not always optimal or efficient. We present ROUTOURNE, a method for diverting routing algorithms so that they calculate, not simply an optimal physical path to be translated into a list of segments a posteriori (with no guarantee of its size), but directly the optimal lists of segments deployable by the underlying hardware. ROUTOURNE thus facilitates the deployment of advanced traffic engineering strategies and policies, notably for load balancing from sources. Despite a route fraught with surprising challenges - in particular, the loss of isotonicity induced by SR - ROUTOURNE proves efficient, inducing at worst a linear overhead. Its accuracy and optimality have been proven, and its effectiveness evaluated by generalizing it to several more or less complex path calculation algorithms.
- [557] arXiv:2405.07621 (replaced) [pdf, ps, other]
-
Title: Towards Adaptive IMFs -- Generalization of utility functions in Multi-Agent FrameworksComments: Accepted in Netsoft-2024 conferenceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Intent Management Function (IMF) is an integral part of future-generation networks. In recent years, there has been some work on AI-based IMFs that can handle conflicting intents and prioritize the global objective based on apriori definition of the utility function and accorded priorities for competing intents. Some of the earlier works use Multi-Agent Reinforcement Learning (MARL) techniques with AdHoc Teaming (AHT) approaches for efficient conflict handling in IMF. However, the success of such frameworks in real-life scenarios requires them to be flexible to business situations. The intent priorities can change and the utility function, which measures the extent of intent fulfilment, may also vary in definition. This paper proposes a novel mechanism whereby the IMF can generalize to different forms of utility functions and change of intent priorities at run-time without additional training. Such generalization ability, without additional training requirements, would help to deploy IMF in live networks where customer intents and priorities change frequently. Results on the network emulator demonstrate the efficacy of the approach, scalability for new intents, outperforming existing techniques that require additional training to achieve the same degree of flexibility thereby saving cost, and increasing efficiency and adaptability.
- [558] arXiv:2405.07637 (replaced) [pdf, ps, other]
-
Title: Near-Optimal Regret in Linear MDPs with Aggregate Bandit FeedbackSubjects: Machine Learning (cs.LG)
In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually. Prior work studied RL-ABF only in tabular settings, where the number of states is assumed to be small. In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees: a value-based optimistic algorithm built on a new randomization technique with a Q-functions ensemble, and a policy optimization algorithm that uses a novel hedging scheme over the ensemble.
- [559] arXiv:2405.07703 (replaced) [pdf, ps, other]
-
Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs trained starting from Llama 2Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian RebedeaSubjects: Computation and Language (cs.CL)
In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.
- [560] arXiv:2405.07708 (replaced) [pdf, ps, other]
-
Title: Secure Aggregation Meets Sparsification in Decentralized LearningSubjects: Machine Learning (cs.LG)
Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To enhance communication efficiency in DL, sparsification techniques are used, selectively sharing only the most crucial parameters or gradients in a model, thereby maintaining efficiency without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging due to the transmission of disjoint parameter sets by distinct nodes, which can prevent masks from canceling out effectively. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We provide a foundational understanding of the interaction between the sparsification carried out by the nodes and the proportion of the parameters shared under CESAR in both colluding and non-colluding environments, offering analytical insight into the working and applicability of the protocol. Experiments on a network with 48 nodes in a 3-regular topology show that with random subsampling, CESAR is always within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD), while adding only 11% of data overhead. Moreover, it surpasses the accuracy on TopK by up to 0.3% on independent and identically distributed (IID) data.
- [561] arXiv:2405.07719 (replaced) [pdf, ps, other]
-
Title: A Unified Sequence Parallelism Approach for Long Context Generative AIComments: 12 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, i.e. DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach, which is more robust to transformer model architectures and network hardware topology. This paper compares the communication and memory cost of SP and existing parallelism, including data/tensor/zero/expert/pipeline parallelism, and discusses the best practices for designing hybrid 4D parallelism involving SP. We achieved 86\% MFU on two 8xA800 nodes using SP for sequence length 208K for the LLAMA3-8B model. Our code is publicly available on \url{this https URL}.
- [562] arXiv:2405.07932 (replaced) [pdf, ps, other]
-
Title: PARDEN, Can You Repeat That? Defending against Jailbreaks via RepetitionComments: Accepted at ICML 20224Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) have shown success in many natural language processing tasks. Despite rigorous safety alignment processes, supposedly safety-aligned LLMs like Llama 2 and Claude 2 are still susceptible to jailbreaks, leading to security risks and abuse of the models. One option to mitigate such risks is to augment the LLM with a dedicated "safeguard", which checks the LLM's inputs or outputs for undesired behaviour. A promising approach is to use the LLM itself as the safeguard. Nonetheless, baseline methods, such as prompting the LLM to self-classify toxic content, demonstrate limited efficacy. We hypothesise that this is due to domain shift: the alignment training imparts a self-censoring behaviour to the model ("Sorry I can't do that"), while the self-classify approach shifts it to a classification format ("Is this prompt malicious"). In this work, we propose PARDEN, which avoids this domain shift by simply asking the model to repeat its own outputs. PARDEN neither requires finetuning nor white box access to the model. We empirically verify the effectiveness of our method and show that PARDEN significantly outperforms existing jailbreak detection baselines for Llama-2 and Claude-2. Code and data are available at this https URL.
We find that PARDEN is particularly powerful in the relevant regime of high True Positive Rate (TPR) and low False Positive Rate (FPR). For instance, for Llama2-7B, at TPR equal to 90%, PARDEN accomplishes a roughly 11x reduction in the FPR from 24.8% to 2.0% on the harmful behaviours dataset. - [563] arXiv:2405.07992 (replaced) [pdf, ps, other]
-
Title: MambaOut: Do We Really Need Mamba for Vision?Comments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Mamba, an architecture with RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task; Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task. As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at this https URL
- [564] arXiv:2003.12408 (replaced) [pdf, ps, other]
-
Title: On the role of surrogates in the efficient estimation of treatment effects with limited outcome dataSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
In many experiments and observational studies, the outcome of interest is often difficult or expensive to observe, reducing effective sample sizes for estimating average treatment effects (ATEs) even when identifiable. We study how incorporating data on units for which only surrogate outcomes not of primary interest are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which permit surrogates as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome (which by themselves identify the ATE) with abundant observations of surrogate outcomes, without any assumptions beyond random assignment and missingness and corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when an overwhelming or comparable number of units have missing outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying the long-term-earning effects of job training.
- [565] arXiv:2009.05079 (replaced) [pdf, ps, html, other]
-
Title: Finding Groups of Cross-Correlated Features in Bi-View DataComments: 30 pages, 5 figures. R package: this https URLJournal-ref: Journal of Machine Learning Research Vol. 24, 2023Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Datasets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A,B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A,B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice-versa. This paper proposes an iterative-testing based bimodule search procedure (BSP) to identify stable bimodules.
Compared to existing methods for detecting cross-correlated features, BSP was the best at recovering true bimodules with sufficient signal, while limiting the false discoveries. In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. BSP identified several thousand SNP-gene bimodules. While many of the individual SNP-gene pairs appearing in the discovered bimodules were identified by standard eQTL methods, the discovered bimodules revealed genomic subnetworks that appeared to be biologically meaningful and worthy of further scientific investigation. - [566] arXiv:2107.10955 (replaced) [pdf, ps, other]
-
Title: Learning Linear Polytree Structural Equation ModelsComments: 35 pages, 5 figures, 4 tablesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. We also consider an extension of group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
- [567] arXiv:2108.04376 (replaced) [pdf, ps, other]
-
Title: Sample Observed Effects: Enumeration, Randomization and GeneralizationSubjects: Methodology (stat.ME); Artificial Intelligence (cs.AI)
The widely used 'Counterfactual' definition of Causal Effects was derived for unbiasedness and accuracy - and not generalizability. We propose a Combinatorial definition for the External Validity (EV) of intervention effects. We first define the concept of an effect observation 'background'. We then formulate conditions for effect generalization based on their sets of (observed and unobserved) backgrounds. This reveals two limits for effect generalization: (1) when effects are observed under all their enumerable backgrounds, or, (2) when backgrounds have become sufficiently randomized. We use the resulting combinatorial framework to re-examine several issues in the original counterfactual formulation: out-of-sample validity, concurrent estimation of multiple effects, bias-variance tradeoffs, statistical power, and connections to current predictive and explaining techniques.
Methodologically, the definitions also allow us to replace the parametric estimation problems that followed the counterfactual definition by combinatorial enumeration and randomization problems in non-experimental samples. We use this non-parametric framework to demonstrate (External Validity, Unconfoundness and Precision) tradeoffs in the performance of popular supervised, explaining, and causal-effect estimators. We also illustrate how the approach allows for the use of supervised and explaining methods in non-i.i.d. samples. The COVID19 pandemic highlighted the need for learning solutions to provide predictions in severally incomplete samples. We demonstrate applications in this pressing problem. - [568] arXiv:2211.11810 (replaced) [pdf, ps, html, other]
-
Title: Sample-optimal classical shadows for pure statesComments: 34 pages; v2 - journal versionSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Machine Learning (cs.LG)
We consider the classical shadows task for pure states in the setting of both joint and independent measurements. The task is to measure few copies of an unknown pure state $\rho$ in order to learn a classical description which suffices to later estimate expectation values of observables. Specifically, the goal is to approximate $\mathrm{Tr}(O \rho)$ for any Hermitian observable $O$ to within additive error $\epsilon$ provided $\mathrm{Tr}(O^2)\leq B$ and $\lVert O \rVert = 1$. Our main result applies to the joint measurement setting, where we show $\tilde{\Theta}(\sqrt{B}\epsilon^{-1} + \epsilon^{-2})$ samples of $\rho$ are necessary and sufficient to succeed with high probability. The upper bound is a quadratic improvement on the previous best sample complexity known for this problem. For the lower bound, we see that the bottleneck is not how fast we can learn the state but rather how much any classical description of $\rho$ can be compressed for observable estimation. In the independent measurement setting, we show that $\mathcal O(\sqrt{Bd} \epsilon^{-1} + \epsilon^{-2})$ samples suffice. Notably, this implies that the random Clifford measurements algorithm of Huang, Kueng, and Preskill, which is sample-optimal for mixed states, is not optimal for pure states. Interestingly, our result also uses the same random Clifford measurements but employs a different estimator.
- [569] arXiv:2212.14813 (replaced) [pdf, ps, other]
-
Title: Unsupervised learning for structure detection in plastically deformed crystalsJournal-ref: Computational Materials Science, 2023, 230, pp.112459Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Detecting structures at the particle scale within plastically deformed crystalline materials allows a better understanding of the occurring phenomena. While previous approaches mostly relied on applying hand-chosen criteria on different local parameters, these approaches could only detect already known structures.We introduce an unsupervised learning algorithm to automatically detect structures within a crystal under plastic deformation. This approach is based on a study developed for structural detection on colloidal materials. This algorithm has the advantage of being computationally fast and easy to implement. We show that by using local parameters based on bond-angle distributions, we are able to detect more structures and with a higher degree of precision than traditional hand-made criteria.
- [570] arXiv:2301.00712 (replaced) [pdf, ps, html, other]
-
Title: On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved AnalysisComments: Add new upper bounds of nonconvex-PL bilevel problems compared to arXiv version 1 in 2023.1Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show a simple first-order algorithm can achieve better complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic setting respectively.
- [571] arXiv:2301.06428 (replaced) [pdf, ps, other]
-
Title: Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic OptimizationComments: ICML 2023Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle complexity to find a $(\delta,\epsilon)$-Goldstein stationary point of objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
- [572] arXiv:2301.13414 (replaced) [pdf, ps, other]
-
Title: Incentive Compatibility in the Auto-bidding WorldSubjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
Auto-bidding has recently become a popular feature in ad auctions. This feature enables advertisers to simply provide high-level constraints and goals to an automated agent, which optimizes their auction bids on their behalf. In this paper, we examine the effect of different auctions on the incentives of advertisers to report their constraints to the auto-bidder intermediaries. More precisely, we study whether canonical auctions such as first price auction (FPA) and second price auction (SPA) are auto-bidding incentive compatible (AIC): whether an advertiser can gain by misreporting their constraints to the autobidder.
We consider value-maximizing advertisers in two important settings: when they have a budget constraint and when they have a target cost-per-acquisition constraint. The main result of our work is that for both settings, FPA and SPA are not AIC. This contrasts with FPA being AIC when auto-bidders are constrained to bid using a (sub-optimal) uniform bidding policy. We further extend our main result and show that any (possibly randomized) auction that is truthful (in the classic profit-maximizing sense), scalar invariant and symmetric is not AIC. Finally, to complement our findings, we provide sufficient market conditions for FPA and SPA to become AIC for two advertisers. These conditions require advertisers' valuations to be well-aligned. This suggests that when the competition is intense for all queries, advertisers have less incentive to misreport their constraints.
From a methodological standpoint, we develop a novel continuous model of queries. This model provides tractability to study equilibrium with auto-bidders, which contrasts with the standard discrete query model, which is known to be hard. Through the analysis of this model, we uncover a surprising result: in auto-bidding with two advertisers, FPA and SPA are auction equivalent. - [573] arXiv:2304.12690 (replaced) [pdf, ps, other]
-
Title: The Generations of Classical Correlations via Quantum SchemesComments: 18 pages, no figures. To appear in IEEE Transactions on Information Theory. Comments are welcomeSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Suppose two separated parties, Alice and Bob, share a bipartite quantum state or a classical correlation called a \emph{seed}, and they try to generate a target classical correlation by performing local quantum or classical operations on the seed, i.e., any communications are not allowed. We consider the following fundamental problem about this setting: whether Alice and Bob can use a given seed to generate a target classical correlation. We show that this problem has rich mathematical structures. Firstly, we prove that even if the seed is a pure bipartite state, the above decision problem is already NP-hard and a similar conclusion can also be drawn when the seed is also a classical correlation, implying that this problem is hard to solve generally. Furthermore, we prove that when the seed is a pure quantum state, solving the problem is equivalent to finding out whether the target classical correlation has some diagonal form of positive semi-definite factorizations that matches the seed pure state, revealing an interesting connection between the current problem and optimization theory. Based on this observation and other insights, we give several necessary conditions where the seed pure state has to satisfy to generate the target classical correlation, and it turns out that these conditions can also be generalized to the case that the seed is a mixed quantum state. Lastly, since diagonal forms of positive semi-definite factorizations play a crucial role in solving the problem, we develop an algorithm that can compute them for an arbitrary classical correlation, which has decent performance on the cases we test.
- [574] arXiv:2306.11286 (replaced) [pdf, ps, html, other]
-
Title: Globally optimal solutions to a class of fractional optimization problems based on proximity gradient algorithmComments: 29 pages, 2 figuresSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
In this paper, we investigate a category of constrained fractional optimization problems that emerge in various practical applications. The objective function for this category is characterized by the ratio of a numerator and denominator, both being convex, semi-algebraic, Lipschitz continuous, and differentiable with Lipschitz continuous gradients over the constraint sets. The constrained sets associated with these problems are closed, convex, and semi-algebraic. We propose an efficient algorithm that is inspired by the proximal gradient method, and we provide a thorough convergence analysis. Our algorithm offers several benefits compared to existing methods. It requires only a single proximal gradient operation per iteration, thus avoiding the complicated inner-loop concave maximization usually required. Additionally, our method converges to a critical point without the typical need for a nonnegative numerator, and this critical point becomes a globally optimal solution with an appropriate condition. Our approach is adaptable to unbounded constraint sets as well. Therefore, our approach is viable for many more practical models. Numerical experiments show that our method not only reliably reaches ground-truth solutions in some model problems but also outperforms several existing methods in maximizing the Sharpe ratio with real-world financial data.
- [575] arXiv:2307.03293 (replaced) [pdf, ps, other]
-
Title: CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray imagesNicolás Gaggion, Candelaria Mosquera, Lucas Mansilla, Julia Mariel Saidman, Martina Aineseder, Diego H. Milone, Enzo FerranteComments: The CheXmask dataset is publicly available at this https URLSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
The development of successful artificial intelligence models for chest X-ray analysis relies on large, diverse datasets with high-quality annotations. While several databases of chest X-ray images have been released, most include disease diagnosis labels but lack detailed pixel-level anatomical segmentation labels. To address this gap, we introduce an extensive chest X-ray multi-center segmentation dataset with uniform and fine-grain anatomical annotations for images coming from five well-known publicly available databases: ChestX-ray8, Chexpert, MIMIC-CXR-JPG, Padchest, and VinDr-CXR, resulting in 657,566 segmentation masks. Our methodology utilizes the HybridGNet model to ensure consistent and high-quality segmentations across all datasets. Rigorous validation, including expert physician evaluation and automatic quality control, was conducted to validate the resulting masks. Additionally, we provide individualized quality indices per mask and an overall quality estimation per dataset. This dataset serves as a valuable resource for the broader scientific community, streamlining the development and assessment of innovative methodologies in chest X-ray analysis. The CheXmask dataset is publicly available at: this https URL
- [576] arXiv:2307.10053 (replaced) [pdf, ps, other]
-
Title: SGD-type Methods with Guaranteed Global Stability in Nonsmooth Nonconvex OptimizationComments: 36 pagesSubjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we focus on providing convergence guarantees for variants of the stochastic subgradient descent (SGD) method in minimizing nonsmooth nonconvex functions. We first develop a general framework to establish global stability for general stochastic subgradient methods, where the corresponding differential inclusion admits a coercive Lyapunov function. We prove that, with sufficiently small stepsizes and controlled noises, the iterates asymptotically stabilize around the stable set of its corresponding differential inclusion. Then we introduce a scheme for developing SGD-type methods with regularized update directions for the primal variables. Based on our developed framework, we prove the global stability of our proposed scheme under mild conditions. We further illustrate that our scheme yields variants of SGD-type methods, which enjoy guaranteed convergence in training nonsmooth neural networks. In particular, by employing the sign map to regularize the update directions, we propose a novel subgradient method named the Sign-map Regularized SGD method (SRSGD). Preliminary numerical experiments exhibit the high efficiency of SRSGD in training deep neural networks.
- [577] arXiv:2307.11552 (replaced) [pdf, ps, other]
-
Title: Contributions of El Ni\~no Southern Oscillation (ENSO) Diversity to Low-Frequency Changes in ENSO VarianceSubjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
El Niño Southern Oscillation (ENSO) diversity is characterized based on the longitudinal location of maximum sea surface temperature anomalies (SSTA) and amplitude in the tropical Pacific, as Central Pacific (CP) events are typically weaker than Eastern Pacific (EP) events. SSTA pattern and intensity undergo low-frequency modulations, affecting ENSO prediction skill and remote impacts. Yet, how different ENSO types contribute to these decadal variations and long-term variance trends remain uncertain. Here, we decompose the low-frequency changes of ENSO variance into contributions from ENSO diversity categories. We propose a fuzzy clustering of monthly SSTA to allow for non-binary event category memberships. Our approach identifies two La Niña and three El Niño categories and shows that the shift of ENSO variance in the mid-1970s is associated with an increasing likelihood of strong La Niña and extreme El Niño events.
- [578] arXiv:2307.14804 (replaced) [pdf, ps, html, other]
-
Title: Collective behavior from surprise minimizationComments: 29 pages (main text), 29 pages (supplemental appendices), 4 figures, 1 supplemental figure, 5 moviesJournal-ref: Proceedings of the National Academy of Sciences, 121(17), e2320239121 (2024)Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Multiagent Systems (cs.MA); Neurons and Cognition (q-bio.NC)
Collective motion is ubiquitous in nature; groups of animals, such as fish, birds, and ungulates appear to move as a whole, exhibiting a rich behavioral repertoire that ranges from directed movement to milling to disordered swarming. Typically, such macroscopic patterns arise from decentralized, local interactions among constituent components (e.g., individual fish in a school). Preeminent models of this process describe individuals as self-propelled particles, subject to self-generated motion and 'social forces' such as short-range repulsion and long-range attraction or alignment. However, organisms are not particles; they are probabilistic decision-makers. Here, we introduce an approach to modelling collective behavior based on active inference. This cognitive framework casts behavior as the consequence of a single imperative: to minimize surprise. We demonstrate that many empirically-observed collective phenomena, including cohesion, milling and directed motion, emerge naturally when considering behavior as driven by active Bayesian inference -- without explicitly building behavioral rules or goals into individual agents. Furthermore, we show that active inference can recover and generalize the classical notion of social forces as agents attempt to suppress prediction errors that conflict with their expectations. By exploring the parameter space of the belief-based model, we reveal non-trivial relationships between the individual beliefs and group properties like polarization and the tendency to visit different collective states. We also explore how individual beliefs about uncertainty determine collective decision-making accuracy. Finally, we show how agents can update their generative model over time, resulting in groups that are collectively more sensitive to external fluctuations and encode information more robustly.
- [579] arXiv:2308.16422 (replaced) [pdf, ps, html, other]
-
Title: Dilated convolutional neural network for detecting extreme-mass-ratio inspiralsComments: 11 pages, 5 figures, and 2 tablesJournal-ref: Phys. Rev. D 109, 084054 (2024)Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); General Relativity and Quantum Cosmology (gr-qc)
The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
- [580] arXiv:2310.03722 (replaced) [pdf, ps, html, other]
-
Title: Anytime-valid t-tests and confidence sequences for Gaussian means with unknown varianceComments: Substantive revision in v3 (Apr 23 2024)Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious polynomial dependence on the error probability $\alpha$ that we prove to be not only unavoidable, but (for universal inference) even better than the classical fixed-sample t-test. Numerical experiments are provided along the way to compare and contrast the various approaches, including some recent suboptimal ones.
- [581] arXiv:2310.17646 (replaced) [pdf, ps, other]
-
Title: Learning the dynamics of a one-dimensional plasma model with graph neural networksComments: title changed, accepted for publication in Machine Learning: Science and Technology, code available at this https URLSubjects: Plasma Physics (physics.plasm-ph); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
We explore the possibility of fully replacing a plasma physics kinetic simulator with a graph neural network-based simulator. We focus on this class of surrogate models given the similarity between their message-passing update mechanism and the traditional physics solver update, and the possibility of enforcing known physical priors into the graph construction and update. We show that our model learns the kinetic plasma dynamics of the one-dimensional plasma model, a predecessor of contemporary kinetic plasma simulation codes, and recovers a wide range of well-known kinetic plasma processes, including plasma thermalization, electrostatic fluctuations about thermal equilibrium, and the drag on a fast sheet and Landau damping. We compare the performance against the original plasma model in terms of run-time, conservation laws, and temporal evolution of key physical quantities. The limitations of the model are presented and possible directions for higher-dimensional surrogate models for kinetic plasmas are discussed.
- [582] arXiv:2312.05388 (replaced) [pdf, ps, other]
-
Title: Higher-Order Equivariant Neural Networks for Charge Density Prediction in MaterialsSubjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
The calculation of electron density distribution using density functional theory (DFT) in materials and molecules is central to the study of their quantum and macro-scale properties, yet accurate and efficient calculation remains a long-standing challenge. We introduce ChargE3Net, an E(3)-equivariant graph neural network for predicting electron density in atomic systems. ChargE3Net enables the learning of higher-order equivariant feature to achieve high predictive accuracy and model expressivity. We show that ChargE3Net exceeds the performance of prior work on diverse sets of molecules and materials. When trained on the massive dataset of over 100K materials in the Materials Project database, our model is able to capture the complexity and variability in the data, leading to a significant 26.7% reduction in self-consistent iterations when used to initialize DFT calculations on unseen materials. Furthermore, we show that non-self-consistent DFT calculations using our predicted charge densities yield near-DFT performance on electronic and thermodynamic property prediction at a fraction of the computational cost. Further analysis attributes the greater predictive accuracy to improved modeling of systems with high angular variations. These results illuminate a pathway towards a machine learning-accelerated ab initio calculations for materials discovery.
- [583] arXiv:2312.16657 (replaced) [pdf, ps, other]
-
Title: On a finite sum of cosecants appearing in various problemsSubjects: Number Theory (math.NT); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA)
In this paper we investigate the finite sum of cosecants $\sum\csc\big(\varphi+a\pi l/n\big),$ where the index $l$ runs through 1 to $n-1$ and $\varphi$ and $a$ are arbitrary parameters, as well as several closely related sums, such as similar sums of a series of secants, of tangents and of cotangents. These trigonometric sums appear in various problems in mathematics, physics, and a variety of related disciplines. Their particular cases were fragmentarily considered in previous works, and it was noted that even a simple particular case $\sum\csc\big(\pi l/n\big)$ does not have a closed-form, i.e. a compact summation formula. In the paper, we derive several alternative representations for the above-mentioned sums, study their properties, relate them to many other finite and infinite sums, obtain their complete asymptotic expansions for large $n$ and provide accurate upper and lower bounds (e.g. the typical relative error for the upper bound is lesser than $2\times10^{-9}$ for $n\geqslant10$ and lesser than $7\times10^{-14}$ for $n\geqslant50$, which is much better than the bounds we could find in previous works). Our researches reveal that these sums are deeply related to several special numbers and functions, especially to the digamma function (furthermore, as a by-product, we obtain several interesting summations formulae for the digamma function). Asymptotical studies show that these sums may have qualitatively different behaviour depending on the choice of $\varphi$ and $a$; in particular, as $n$ increases some of them may become sporadically large. Finally, we also provide several historical remarks related to various sums considered in the paper. We show that some results in the field either were rediscovered several times or can easily be deduced from various known formulae, including some formulae dating back to the XVIIIth century.
- [584] arXiv:2401.05819 (replaced) [pdf, ps, other]
-
Title: TAnet: A New Temporal Attention Network for EEG-based Auditory Spatial Attention Decoding with a Short Decision WindowSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Auditory spatial attention detection (ASAD) is used to determine the direction of a listener's attention to a speaker by analyzing her/his electroencephalographic (EEG) signals. This study aimed to further improve the performance of ASAD with a short decision window (i.e., <1 s) rather than with long decision windows ranging from 1 to 5 seconds in previous studies. An end-to-end temporal attention network (i.e., TAnet) was introduced in this work. TAnet employs a multi-head attention (MHA) mechanism, which can more effectively capture the interactions among time steps in collected EEG signals and efficiently assign corresponding weights to those EEG time steps. Experiments demonstrated that, compared with the CNN-based method and recent ASAD methods, TAnet provided improved decoding performance in the KUL dataset, with decoding accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1% (0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s) with short decision windows (i.e., <1 s). As a new ASAD model with a short decision window, TAnet can potentially facilitate the design of EEG-controlled intelligent hearing aids and sound recognition systems.
- [585] arXiv:2402.06260 (replaced) [pdf, ps, other]
-
Title: Vertex-minor universal graphs for generating entangled quantum subsystemsSubjects: Quantum Physics (quant-ph); Discrete Mathematics (cs.DM)
We study the notion of $k$-stabilizer universal quantum state, that is, an $n$-qubit quantum state, such that it is possible to induce any stabilizer state on any $k$ qubits, by using only local operations and classical communications. These states generalize the notion of $k$-pairable states introduced by Bravyi et al., and can be studied from a combinatorial perspective using graph states and $k$-vertex-minor universal graphs. First, we demonstrate the existence of $k$-stabilizer universal graph states that are optimal in size with $n=\Theta(k^2)$ qubits. We also provide parameters for which a random graph state on $\Theta(k^2)$ qubits is $k$-stabilizer universal with high probability. Our second contribution consists of two explicit constructions of $k$-stabilizer universal graph states on $n = O(k^4)$ qubits. Both rely upon the incidence graph of the projective plane over a finite field $\mathbb{F}_q$. This provides a major improvement over the previously known explicit construction of $k$-pairable graph states with $n = O(2^{3k})$, bringing forth a new and potentially powerful family of multipartite quantum resources.
- [586] arXiv:2402.09802 (replaced) [pdf, ps, other]
-
Title: Criterion Collapse and Loss Distribution ControlComments: Revised version accepted to ICML 2024Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this work, we consider the notion of "criterion collapse," in which optimization of one metric implies optimality in another, with a particular focus on conditions for collapse into error probability minimizers under a wide variety of learning criteria, ranging from DRO and OCE risks (CVaR, tilted ERM) to non-monotonic criteria underlying recent ascent-descent algorithms explored in the literature (Flooding, SoftAD). We show how collapse in the context of losses with a Bernoulli distribution goes far beyond existing results for CVaR and DRO, then expand our scope to include surrogate losses, showing conditions where monotonic criteria such as tilted ERM cannot avoid collapse, whereas non-monotonic alternatives can.
- [587] arXiv:2403.03104 (replaced) [pdf, ps, other]
-
Title: Low-rank approximated Kalman-Bucy filters using Oja's principal component flow for linear time-invariant systemsComments: 6 pages, fixed typographical errors and clarified some unclear statementsSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The Kalman-Bucy filter is extensively utilized across various applications. However, its computational complexity increases significantly in large-scale systems. To mitigate this challenge, a low-rank approximated Kalman--Bucy filter was proposed, comprising Oja's principal component flow and a low-dimensional Riccati differential equation. Previously, the estimation error was confirmed solely for linear time-invariant systems with a symmetric system matrix. This study extends the application by eliminating the constraint on the symmetricity of the system matrix and describes the equilibrium points of the Oja flow along with their stability for general matrices. In addition, the domain of attraction for a set of stable equilibrium points is estimated. Based on these findings, we demonstrate that the low-rank approximated Kalman--Bucy filter with a suitable rank maintains a bounded estimation error covariance matrix if the system is controllable and observable.
- [588] arXiv:2403.15528 (replaced) [pdf, ps, other]
-
Title: Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest RadiographsYiliang Zhou, Hanley Ong, Patrick Kennedy, Carol Wu, Jacob Kazam, Keith Hentel, Adam Flanders, George Shih, Yifan PengSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The study examines the application of GPT-4V, a multi-modal large language model equipped with visual recognition, in detecting radiological findings from a set of 100 chest radiographs and suggests that GPT-4V is currently not ready for real-world diagnostic usage in interpreting chest radiographs.
- [589] arXiv:2404.05553 (replaced) [pdf, ps, other]
-
Title: Alljoined1 -- A dataset for EEG-to-Image decodingJonathan Xu, Bruno Aristimunha, Max Emanuel Feucht, Emma Qian, Charles Liu, Tazik Shahjahan, Martyna Spyra, Steven Zifan Zhang, Nicholas Short, Jioh Kim, Paula Perdomo, Ricky Renfeng Mao, Yashvir Sabharwal, Michael Ahedor Moaz Shoura, Adrian NestorComments: 8 Pages, 6 FiguresSubjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
We present Alljoined1, a dataset built specifically for EEG-to-Image decoding. Recognizing that an extensive and unbiased sampling of neural responses to visual stimuli is crucial for image reconstruction efforts, we collected data from 8 participants looking at 10,000 natural images each. We have currently gathered 46,080 epochs of brain responses recorded with a 64-channel EEG headset. The dataset combines response-based stimulus timing, repetition between blocks and sessions, and diverse image classes with the goal of improving signal quality. For transparency, we also provide data quality scores. We publicly release the dataset and all code at this https URL.
- [590] arXiv:2404.17483 (replaced) [pdf, ps, other]
-
Title: Differentiable Pareto-Smoothed Weighting for High-Dimensional Heterogeneous Treatment Effect EstimationComments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI2024). 14 pages, 4 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
There is a growing interest in estimating heterogeneous treatment effects across individuals using their high-dimensional feature attributes. Achieving high performance in such high-dimensional heterogeneous treatment effect estimation is challenging because in this setup, it is usual that some features induce sample selection bias while others do not but are predictive of potential outcomes. To avoid losing such predictive feature information, existing methods learn separate feature representations using inverse probability weighting (IPW). However, due to their numerically unstable IPW weights, these methods suffer from estimation bias under a finite sample setup. To develop a numerically robust estimator by weighted representation learning, we propose a differentiable Pareto-smoothed weighting framework that replaces extreme weight values in an end-to-end fashion. Our experimental results show that by effectively correcting the weight values, our proposed method outperforms the existing ones, including traditional weighting schemes.
- [591] arXiv:2404.17742 (replaced) [pdf, ps, other]
-
Title: Segmentation Quality and Volumetric Accuracy in Medical ImagingComments: Data used in the paper contains some privacy issue in medical image. Some proper citations are also missingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Current medical image segmentation relies on the region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de-facto standard. While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement. Clinicians often lack clear benchmarks to gauge the "goodness" of segmentation results based on these metrics. Recognizing the clinical relevance of volumetry, we utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks. Our work integrates theoretical analysis and empirical validation across diverse datasets. We delve into the often-ambiguous relationship between segmentation quality (measured by Dice) and volumetric accuracy in clinical practice. Our findings highlight the critical role of incorporating volumetric prediction accuracy into segmentation evaluation. This approach empowers clinicians with a more nuanced understanding of segmentation performance, ultimately improving the interpretation and utility of these metrics in real-world healthcare settings.
- [592] arXiv:2405.00065 (replaced) [pdf, ps, other]
-
Title: From Linear to Linearizable Optimization: A Novel Framework with Applications to Stationary and Non-stationary DR-submodular OptimizationComments: arXiv admin note: text overlap with arXiv:2402.08621Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper introduces the notion of upper linearizable/quadratizable functions, a class that extends concavity and DR-submodularity in various settings, including monotone and non-monotone cases over different convex sets. A general meta-algorithm is devised to convert algorithms for linear/quadratic maximization into ones that optimize upper quadratizable functions, offering a unified approach to tackling concave and DR-submodular optimization problems. The paper extends these results to multiple feedback settings, facilitating conversions between semi-bandit/first-order feedback and bandit/zeroth-order feedback, as well as between first/zeroth-order feedback and semi-bandit/bandit feedback. Leveraging this framework, new algorithms are derived using existing results as base algorithms for convex optimization, improving upon state-of-the-art results in various cases. Dynamic and adaptive regret guarantees are obtained for DR-submodular maximization, marking the first algorithms to achieve such guarantees in these settings. Notably, the paper achieves these advancements with fewer assumptions compared to existing state-of-the-art results, underscoring its broad applicability and theoretical contributions to non-convex optimization.
- [593] arXiv:2405.00610 (replaced) [pdf, ps, other]
-
Title: Growth in products of matrices: fastest, average, and genericComments: 10 pages. Comments are welcomeSubjects: Group Theory (math.GR); Cryptography and Security (cs.CR); Combinatorics (math.CO); Dynamical Systems (math.DS); Probability (math.PR)
The problems that we consider in this paper are as follows. Let A and B be 2x2 matrices (over reals). Let w(A, B) be a word of length n. After evaluating w(A, B) as a product of matrices, we get a 2x2 matrix, call it W. What is the largest (by the absolute value) possible entry of W, over all w(A, B) of length n, as a function of n? What is the expected absolute value of the largest (by the absolute value) entry in a random product of n matrices, where each matrix is A or B with probability 0.5? What is the Lyapunov exponent for a random matrix product like that? We give partial answer to the first of these questions and an essentially complete answer to the second question. For the third question (the most difficult of the three), we offer a very simple method to produce an upper bound on the Lyapunov exponent in the case where all entries of the matrices A and B are nonnegative.
- [594] arXiv:2405.01463 (replaced) [pdf, ps, other]
-
Title: Dynamic Local Average Treatment EffectsSubjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)
We consider Dynamic Treatment Regimes (DTRs) with One Sided Noncompliance that arise in applications such as digital recommendations and adaptive medical trials. These are settings where decision makers encourage individuals to take treatments over time, but adapt encouragements based on previous encouragements, treatments, states, and outcomes. Importantly, individuals may not comply with encouragements based on unobserved confounders. For settings with binary treatments and encouragements, we provide nonparametric identification, estimation, and inference for Dynamic Local Average Treatment Effects (LATEs), which are expected values of multiple time period treatment contrasts for the respective complier subpopulations. Under standard assumptions in the Instrumental Variable and DTR literature, we show that one can identify Dynamic LATEs that correspond to treating at single time steps. Under an additional cross-period effect-compliance independence assumption, which is satisfied in Staggered Adoption settings and a generalization of them, which we define as Staggered Compliance settings, we identify Dynamic LATEs for treating in multiple time periods.
- [595] arXiv:2405.04610 (replaced) [pdf, ps, other]
-
Title: Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer ClassificationComments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.
- [596] arXiv:2405.06725 (replaced) [pdf, ps, other]
-
Title: On the Shape of Brainscores for Large Language Models (LLMs)Comments: The Figure 10 from arXiv:1710.04019, Figure 6.28 from arXiv:2403.13825, and captions are both from this https URL, where the case in my paper is Figure 3, and has already cited its original source. I believe both arXiv:1710.04019 and arXiv:2403.13825 should cite the original source, rather than force me to cite themSubjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. Our efforts were dedicated to mining the meaning of the novel score by constructing topological features derived from both human fMRI data involving 190 subjects, and 39 LLMs plus their untrained counterparts. Subsequently, we trained 36 Linear Regression Models and conducted thorough statistical analyses to discern reliable and valid features from our constructed ones. Our findings reveal distinctive feature combinations conducive to interpreting existing brainscores across various brain regions of interest (ROIs) and hemispheres, thereby significantly contributing to advancing interpretable machine learning (iML) studies. The study is enriched by our further discussions and analyses concerning existing brainscores. To our knowledge, this study represents the first attempt to comprehend the novel metric brainscore within this interdisciplinary domain.
- [597] arXiv:2405.07137 (replaced) [pdf, ps, other]
-
Title: Oracle Separation between Noisy Quantum Polynomial Time and the Polynomial HierarchySubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)
This work investigates the oracle separation between the physically motivated complexity class of noisy quantum circuits, inspired by definitions such as those presented by Chen, Cotler, Huang, and Li (2022). We establish that with a constant error rate, separation can be achieved in terms of NP. When the error rate is $\Omega(\log n/n)$, we can extend this result to the separation of PH. Notably, our oracles, in all separations, do not necessitate error correction schemes or fault tolerance, as all quantum circuits are of constant depth. This indicates that even quantum computers with minor errors, without error correction, may surpass classical complexity classes under various scenarios and assumptions. We also explore various common noise settings and present new classical hardness results, generalizing those found in studies by Raz and Tal (2022) and Bassirian, Bouland, Fefferman, Gunn, and Tal (2021), which are of independent interest.
- [598] arXiv:2405.07217 (replaced) [pdf, ps, other]
-
Title: Improved bounds for polylogarithmic graph distances in scale-free percolation and related modelsComments: 21 pagesSubjects: Probability (math.PR); Social and Information Networks (cs.SI); Combinatorics (math.CO)
In this paper, we study graph distances in the geometric random graph models scale-free percolation SFP, geometric inhomogeneous random graphs GIRG, and hyperbolic random graphs HRG. Despite the wide success of the models, the parameter regime in which graph distances are polylogarithmic is poorly understood. We provide new and improved lower bounds. In a certain portion of the parameter regime, those match the known upper bounds.
Compared to the best previous lower bounds by Hao and Heydenreich, our result has several advantages: it gives matching bounds for a larger range of parameters, thus settling the question for a larger portion of the parameter space. It strictly improves the lower bounds by Hao and Heydenreich for all parameters settings in which those bounds were not tight. It gives tail bounds on the probability of having short paths, which imply shape theorems for the $k$-neighbourhood of a vertex whenever our lower bounds are tight, and tight bounds for the size of this $k$-neighbourhood. And last but not least, our proof is much simpler and not much longer than two pages, and we demonstrate that it generalizes well by showing that the same technique also works for first passage percolation. - [599] arXiv:2405.07890 (replaced) [pdf, ps, other]
-
Title: Subspace-Informed Matrix CompletionComments: arXiv admin note: text overlap with arXiv:2111.00235Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportional to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios, additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations. Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.