Computer Science
- [1] arXiv:2405.08819 [pdf, ps, other]
Title: eScope: A Fine-Grained Power Prediction Mechanism for Mobile Applications
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Managing the limited energy on mobile platforms executing long-running, resource intensive streaming applications requires adapting an application's operators in response to their power consumption. For example, the frame refresh rate may be reduced if the rendering operation is consuming too much power. Currently, predicting an application's power consumption requires (1) building a device-specific power model for each hardware component, and (2) analyzing the application's code. This approach can be complicated and error-prone given the complexity of an application's logic and the hardware platforms with heterogeneous components that it may execute on. We propose eScope, an alternative method to directly estimate power consumption by each operator in an application. Specifically, eScope correlates an application's execution traces with its device-level energy draw. We implement eScope as a tool for Android platforms and evaluate it using workloads on several synthetic applications as well as two video stream analytics applications. Our evaluation suggests that eScope predicts an application's power use with 97% or better accuracy while incurring a compute time overhead of less than 3%.
- [2] arXiv:2405.08828 [pdf, ps, html, other]
Title: Using ChatGPT for Thematic Analysis
Subjects: Human-Computer Interaction (cs.HC)
The utilisation of AI-driven tools, notably ChatGPT, within academic research is increasingly debated from several perspectives including ease of implementation, and potential enhancements in research efficiency, as against ethical concerns and risks such as biases and unexplained AI operations. This paper explores the use of the GPT model for initial coding in qualitative thematic analysis using a sample of UN policy documents. The primary aim of this study is to contribute to the methodological discussion regarding the integration of AI tools, offering a practical guide to validation for using GPT as a collaborative research assistant. The paper outlines the advantages and limitations of this methodology and suggests strategies to mitigate risks. Emphasising the importance of transparency and reliability in employing GPT within research methodologies, this paper argues for a balanced use of AI in supported thematic analysis, highlighting its potential to elevate research efficacy and outcomes.
- [3] arXiv:2405.08830 [pdf, ps, html, other]
Title: Evaluating Supply Chain Resilience During Pandemic Using Agent-based Simulation
Subjects: Multiagent Systems (cs.MA); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Recent pandemics have highlighted vulnerabilities in our global economic systems, especially supply chains. The possibility of future pandemics raises a dilemma for business owners between short-term profitability and long-term supply chain resilience planning. In this study, we propose a novel agent-based simulation model integrating an extended Susceptible-Infected-Recovered (SIR) epidemiological model and a supply-and-demand economic model to evaluate supply chain resilience strategies during pandemics. Using this model, we explore a range of supply chain resilience strategies under pandemic scenarios using in silico experiments. We find that a balanced approach to supply chain resilience performs better in both pandemic and non-pandemic times compared to extreme strategies, highlighting the importance of preparedness in the form of better supply chain resilience. However, our analysis shows that the exact supply chain resilience strategy is hard to obtain for each firm and is relatively sensitive to the exact profile of the pandemic and the economic state at the beginning of the pandemic. As such, we used a machine learning model that uses the agent-based simulation to estimate a near-optimal supply chain resilience strategy for a firm. The proposed model offers insights for policymakers and businesses to enhance supply chain resilience in the face of future pandemics, contributing to understanding the trade-offs between short-term gains and long-term sustainability in supply chain management before and during pandemics.
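For readers unfamiliar with the epidemiological component, the following is a minimal sketch of the classic (non-extended) SIR model, integrated with a simple forward-Euler step. It is not the authors' extended model and does not include the economic layer; the function name `simulate_sir` and all parameter values are illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps, dt=1.0):
    """Forward-Euler integration of the classic SIR model on population fractions.

    beta  : transmission rate (assumed value, for illustration only)
    gamma : recovery rate (assumed value, for illustration only)
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # S -> I flow
        new_recoveries = gamma * i * dt      # I -> R flow
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=160)
    peak_infected = max(i for _, i, _ in trajectory)
    print(f"peak infected fraction: {peak_infected:.3f}")
```

In an agent-based setting such as the one the abstract describes, compartment transitions would typically be applied per agent rather than to aggregate fractions; the aggregate form above only illustrates the underlying dynamics.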
- [4] arXiv:2405.08831 [pdf, ps, html, other]
Title: Deceptive, Disruptive, No Big Deal: Japanese People React to Simulated Dark Commercial Patterns
Authors: Katie Seaborn, Tatsuya Itagaki, Mizuki Watanabe, Yijia Wang, Ping Geng, Takao Fujii, Yuto Mandai, Miu Kojima, Suzuka Yoshida
Journal-ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 95, 1-8
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Dark patterns and deceptive designs (DPs) are user interface elements that trick people into taking actions that benefit the purveyor. Such designs are widely deployed, with special varieties found in certain nations like Japan that can be traced to global power hierarchies and the local socio-linguistic context of use. In this breaking work, we report on the first user study involving Japanese people (n=30) experiencing a mock shopping website injected with simulated DPs. We found that Alphabet Soup and Misleading Reference Pricing were the most deceptive and least noticeable. Social Proofs, Sneaking in Items, and Untranslation were the least deceptive, but Untranslation prevented most participants from cancelling their account. Mood significantly worsened after experiencing the website. We contribute the first empirical findings on a Japanese consumer base alongside a scalable approach to evaluating user attitudes, perceptions, and behaviours towards DPs in an interactive context. We urge more human participant research and, ideally, collaborations with industry to assess real designs in the wild.
- [5] arXiv:2405.08832 [pdf, ps, other]
Title: Theorizing Deception: A Scoping Review of Theory in Research on Dark Patterns and Deceptive Design
Journal-ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 321, 1-7
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
The issue of dark patterns and deceptive designs (DPs) in everyday interfaces and interactions continues to grow. DPs are manipulative and malicious elements within user interfaces that deceive users into making unintended choices. In parallel, research on DPs has significantly increased over the past two decades. As the field has matured, epistemological gaps have also become a salient and pressing concern. In this scoping review, we assessed the academic work so far -- 51 papers from 2014 to 2023 -- to identify the state of theory in DP research. We identified the key theories employed, examined how these theories have been referenced, and call for enhancing the incorporation of theory into DP research. We also propose broad theoretical foundations to establish a comprehensive and solid base for contextualizing and informing future DP research from a variety of theoretical scopes and lenses.
- [6] arXiv:2405.08834 [pdf, ps, other]
Title: Adversarial Machine Learning Threats to Spacecraft
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Spacecraft are among the earliest autonomous systems. Their ability to function without a human in the loop has afforded some of humanity's grandest achievements. As reliance on autonomy grows, space vehicles will become increasingly vulnerable to attacks designed to disrupt autonomous processes, especially probabilistic ones based on machine learning. This paper aims to elucidate and demonstrate the threats that adversarial machine learning (AML) capabilities pose to spacecraft. First, an AML threat taxonomy for spacecraft is introduced. Next, we demonstrate the execution of AML attacks against spacecraft through experimental simulations using NASA's Core Flight System (cFS) and NASA's On-board Artificial Intelligence Research (OnAIR) Platform. Our findings highlight the imperative for incorporating AML-focused security measures in spacecraft that engage autonomy.
- [7] arXiv:2405.08838 [pdf, ps, html, other]
Title: PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
Comments: 13 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, although datasets are a key factor in training and validating deepfake detectors, most existing deepfake datasets focus primarily on the visual modality. The few that are multimodal employ outdated techniques, and their audio content is limited to a single language, so they fail to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on the PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.
- [8] arXiv:2405.08839 [pdf, ps, html, other]
Title: PromptMind Team at EHRSQL-2024: Improving Reliability of SQL Generation using Ensemble LLMs
Comments: Accepted as a poster for Clinical NLP workshop at NAACL 2024
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
This paper presents our approach to the EHRSQL-2024 shared task, which aims to develop a reliable Text-to-SQL system for electronic health records. We propose two approaches that leverage large language models (LLMs) for prompting and fine-tuning to generate EHRSQL queries. In both techniques, we concentrate on bridging the gap between the real-world knowledge on which LLMs are trained and the domain specific knowledge required for the task. The paper provides the results of each approach individually, demonstrating that they achieve high execution accuracy. Additionally, we show that an ensemble approach further enhances generation reliability by reducing errors. This approach secured us 2nd place in the shared task competition. The methodologies outlined in this paper are designed to be transferable to domain-specific Text-to-SQL problems that emphasize both accuracy and reliability.
- [9] arXiv:2405.08842 [pdf, ps, other]
Title: Automated Deep Learning for Load Forecasting
Authors: Julie Keisler (CRIStAL, EDF R&D OSIRIS, EDF R&D), Sandra Claudel, Gilles Cabriel, Margaux Brégère
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Accurate forecasting of electricity consumption is essential to ensure the performance and stability of the grid, especially as the use of renewable energy increases. Forecasting electricity is challenging because it depends on many external factors, such as weather and calendar variables. While regression-based models are currently effective, the emergence of new explanatory variables and the need to refine the temporality of the signals to be forecasted are encouraging the exploration of novel methodologies, in particular deep learning models. However, Deep Neural Networks (DNNs) struggle with this task due to the lack of data points and the different types of explanatory variables (e.g. integer, float, or categorical). In this paper, we explain why and how we used Automated Deep Learning (AutoDL) to find well-performing DNNs for load forecasting. We ended up creating an AutoDL framework called EnergyDragon by extending the DRAGON package and applying it to load forecasting. EnergyDragon automatically selects the features embedded in the DNN training in an innovative way and optimizes the architecture and the hyperparameters of the networks. We demonstrate on the French load signal that EnergyDragon can find original DNNs that outperform state-of-the-art load forecasting methods as well as other AutoDL approaches.
- [10] arXiv:2405.08843 [pdf, ps, other]
Title: FLEXIBLE: Forecasting Cellular Traffic by Leveraging Explicit Inductive Graph-Based Learning
Authors: Duc Thinh Ngo (STACK), Kandaraj Piamrat (LS2N, STACK), Ons Aouedi, Thomas Hassan, Philippe Raipin-Parvédy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
From a telecommunication standpoint, the surge in users and services challenges next-generation networks with escalating traffic demands and limited resources. Accurate traffic prediction can offer network operators valuable insights into network conditions and suggest optimal allocation policies. Recently, spatio-temporal forecasting, employing Graph Neural Networks (GNNs), has emerged as a promising method for cellular traffic prediction. However, existing studies, inspired by road traffic forecasting formulations, overlook the dynamic deployment and removal of base stations, requiring the GNN-based forecaster to handle an evolving graph. This work introduces a novel inductive learning scheme and a generalizable GNN-based forecasting model that can process diverse graphs of cellular traffic with one-time training. We also demonstrate that this model can be easily leveraged by transfer learning with minimal effort, making it applicable to different areas. Experimental results show up to 9.8% performance improvement compared to the state-of-the-art, especially in rare-data settings with training data reduced to below 20%.
- [11] arXiv:2405.08848 [pdf, ps, html, other]
Title: Automated Repair of AI Code with Large Language Models and Formal Verification
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
The next generation of AI systems requires strong safety guarantees. This report looks at the software implementation of neural networks and related memory safety properties, including NULL pointer dereference, out-of-bound access, double-free, and memory leaks. Our goal is to detect these vulnerabilities and automatically repair them with the help of large language models. To this end, we first expand the size of NeuroCodeBench, an existing dataset of neural network code, to about 81k programs via an automated process of program mutation. Then, we verify the memory safety of the mutated neural network implementations with ESBMC, a state-of-the-art software verifier. Whenever ESBMC spots a vulnerability, we invoke a large language model to repair the source code. For the latter task, we compare the performance of various state-of-the-art prompt engineering techniques and an iterative approach that repeatedly calls the large language model.
- [12] arXiv:2405.08852 [pdf, ps, html, other]
Title: A Click-Through Rate Prediction Method Based on Cross-Importance of Multi-Order Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Most current click-through rate (CTR) prediction models create explicit or implicit high-order feature crosses through the Hadamard product or inner product, with little attention to the importance of feature crossing. The few models that do consider it are either limited to second-order explicit feature crossing, learn high-order feature crossing only implicitly, or can learn the importance of high-order explicit feature crossing but fail to provide good interpretability for the model. This paper proposes a new model, FiiNet (Multiple Order Feature Interaction Importance Neural Networks). The model first uses the selective kernel network (SKNet) to explicitly construct multi-order feature crosses. It dynamically learns the importance of feature interaction combinations in a fine-grained manner, increasing the attention weight of important feature cross combinations and reducing the weight of featureless crosses. To verify that the FiiNet model can dynamically learn the importance of feature interaction combinations in a fine-grained manner and improve the model's recommendation performance and interpretability, this paper compares it with many click-through rate prediction models on two real datasets, proving that the FiiNet model incorporating the selective kernel network can effectively improve the recommendation effect and provide better interpretability. FiiNet model implementations are available in PyTorch.
- [13] arXiv:2405.08882 [pdf, ps, html, other]
Title: Lollipop: SVM Rollups on Solana
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present a formal specification for the implementation of Solana virtual machine (SVM) rollups deployed on top of the Solana Layer 1 (L1) blockchain. We further discuss our motivation, implementation, design decisions, limitations, and preliminary results. Overall, this paper is intended to serve as an initial introduction to building such system(s) on top of the Solana L1 blockchain, but does not represent an absolute. Lastly, we discuss extensions of this specification to support SVM rollups on other well-established L1 blockchain systems such as Ethereum.
- [14] arXiv:2405.08886 [pdf, ps, html, other]
Title: The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks
Comments: ICML2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).
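As background for the prediction set size (PSS) metric, here is a minimal sketch of standard split conformal prediction for classification. It is a generic textbook construction, not the paper's AT-UR method; the nonconformity score (one minus the softmax probability of the true class) and all function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrated score threshold for split conformal prediction.

    The nonconformity score is 1 - p(true label). The threshold is the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every class must be included.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


def mean_set_size(test_probs, threshold):
    """Average prediction set size (PSS) over a batch of test predictions."""
    return sum(len(prediction_set(p, threshold)) for p in test_probs) / len(test_probs)
```

Smaller sets at the same coverage level are more informative. When a model's probabilities become unreliable, for example under an attack, the calibrated threshold grows and the sets can expand toward the full label space.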
- [15] arXiv:2405.08888 [pdf, ps, other]
Title: Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language
Comments: 22 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Accelerator Physics (physics.acc-ph)
Autonomous tuning of particle accelerators is an active and challenging field of research with the goal of enabling novel accelerator technologies and cutting-edge high-impact applications, such as physics discovery, cancer research and material sciences. A key challenge with autonomous accelerator tuning remains that the most capable algorithms require an expert in optimisation, machine learning or a similar field to implement the algorithm for every new tuning task. In this work, we propose the use of large language models (LLMs) to tune particle accelerators. We demonstrate on a proof-of-principle example the ability of LLMs to successfully and autonomously tune a particle accelerator subsystem based on nothing more than a natural language prompt from the operator, and compare the performance of our LLM-based solution to state-of-the-art optimisation algorithms, such as Bayesian optimisation (BO) and reinforcement learning-trained optimisation (RLO). In doing so, we also show how LLMs can perform numerical optimisation of a highly non-linear real-world objective function. Ultimately, this work represents yet another complex task that LLMs are capable of solving and promises to help accelerate the deployment of autonomous tuning algorithms to the day-to-day operations of particle accelerators.
- [16] arXiv:2405.08890 [pdf, ps, other]
Title: Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current video summarization methods primarily depend on supervised computer vision techniques, which demand time-consuming manual annotations. Further, the annotations are always subjective, which makes this task more challenging. To address these issues, we analyzed the feasibility of transforming video summarization into a text summarization task and leveraging Large Language Models (LLMs) to boost video summarization. This paper proposes a novel self-supervised framework for video summarization guided by LLMs. Our method begins by generating captions for video frames, which are then synthesized into text summaries by LLMs. Subsequently, we measure the semantic distance between the frame captions and the text summary. Notably, we propose a novel loss function to optimize our model according to the diversity of the video. Finally, the summarized video can be generated by selecting the frames whose captions are similar to the text summary. Our model achieves competitive results against other state-of-the-art methods and paves a novel pathway in video summarization.
- [17] arXiv:2405.08892 [pdf, ps, html, other]
Title: RS-Reg: Probabilistic and Robust Certified Regression Through Randomized Smoothing
Subjects: Machine Learning (cs.LG)
Randomized smoothing has shown promising certified robustness against adversaries in classification tasks. Despite such success with only zeroth-order access to base models, randomized smoothing has not been extended to a general form of regression. By defining robustness in regression tasks flexibly through probabilities, we demonstrate how to establish upper bounds on input data point perturbation (using the $\ell_2$ norm) for a user-specified probability of observing valid outputs. Furthermore, we showcase the asymptotic property of a basic averaging function in scenarios where the regression model operates without any constraint. We then derive a certified upper bound of the input perturbations when dealing with a family of regression models where the outputs are bounded. Our simulations verify the validity of the theoretical results and reveal the advantages and limitations of simple smoothing functions, i.e., averaging, in regression tasks. The code is publicly available at this https URL.
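The "basic averaging function" mentioned in the abstract can be illustrated with a Monte Carlo estimate of the expected base-model output under Gaussian input noise. This is only a sketch of the smoothing step: it does not compute any certified bound, and the names `smoothed_regressor` and `toy_model`, the noise level, and the sample count are hypothetical.

```python
import random


def smoothed_regressor(base_model, x, sigma, num_samples, rng):
    """Monte Carlo estimate of E[f(x + eps)] with eps ~ N(0, sigma^2 I).

    Only zeroth-order (query) access to base_model is needed.
    """
    total = 0.0
    for _ in range(num_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += base_model(noisy)
    return total / num_samples


def toy_model(x):
    """Hypothetical bounded base regressor with a discontinuity at 0."""
    return 1.0 if x[0] > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    for x0 in (-1.0, 0.0, 1.0):
        value = smoothed_regressor(toy_model, [x0], sigma=1.0, num_samples=4000, rng=rng)
        print(f"x = {x0:+.1f}  smoothed output = {value:.3f}")
```

Averaging turns the step function into a smooth curve, so a small input perturbation can only shift the smoothed output gradually. That is the intuition behind bounding the admissible perturbation when the outputs are bounded.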
- [18] arXiv:2405.08904 [pdf, ps, html, other]
Title: Multi-resolution Isogeometric Analysis -- Efficient adaptivity utilizing the multi-patch structure
Comments: This work was supported by the Austrian Science Fund (FWF): P33956
Subjects: Numerical Analysis (math.NA)
Isogeometric Analysis (IgA) is a spline based approach to the numerical solution of partial differential equations. There are two major issues that IgA was designed to address. The first issue is the exact representation of domains stemming from Computer Aided Design (CAD) software. In practice, this can be realized only with multi-patch IgA, often in combination with trimming or similar techniques. The second issue is the realization of high-order discretizations (by increasing the spline degree) with numbers of degrees of freedom comparable to low-order methods. High-order methods can deliver their full potential only if the solution to be approximated is sufficiently smooth; otherwise, adaptive methods are required. In the last decades, a zoo of local refinement strategies for splines has been developed. The authors think that many of these approaches are a burden to implement efficiently and impede the utilization of recent advances that rely on tensor-product splines, e.g., concerning matrix assembly and preconditioning. The implementation seems to be particularly cumbersome in the context of multi-patch IgA. Our approach is to moderately increase the number of patches and to utilize different grid sizes on different patches. This allows reusing the existing code bases, recovers the convergence rates of other adaptive approaches and increases the number of degrees of freedom only marginally.
- [19] arXiv:2405.08906 [pdf, ps, html, other]
Title: fNIRS Analysis of Interaction Techniques in Touchscreen-Based Educational Gaming
Authors: Shayla Sharmin, Elham Bakhshipour, Behdokht Kiafar, Md Fahim Abrar, Pinar Kullu, Nancy Getchell, Roghayeh Leila Barmaki
Subjects: Human-Computer Interaction (cs.HC)
Touchscreens are becoming increasingly widespread in educational games, enhancing the quality of learner experience. Traditional metrics are often used to evaluate various input modalities, including hand and stylus. However, there exists a gap in understanding the cognitive impacts of these modalities during educational gameplay, which can be addressed through brain signal analysis to gain deeper insights into the underlying cognitive function and necessary brain resources for each condition. This facilitates a more precise comparison between conditions. In this study, we compared the brain signal and user experience of using hands and stylus on touchscreens while playing an educational game by analyzing hemodynamic response and self-reported measures. Participants engaged in a Unity-based educational quiz game using both hand and stylus on a touchscreen in a counterbalanced within-subject design. Oxygenated and deoxygenated hemoglobin data were collected using fNIRS, alongside quiz performance scores and standardized and customized user experience questionnaire ratings. Our findings show almost the same performance level with both input modalities; however, the hand requires less oxygen flow, which suggests a lower cognitive effort than using a stylus while playing the educational game. Although the result shows that the stylus condition required more neural involvement than the hand condition, there is no significant difference between the two input modalities. However, there is a statistically significant difference in self-reported measures that supports the findings mentioned above, favoring the hand. These results enhance understanding of modality effects in interactive educational environments.
- [20] arXiv:2405.08908 [pdf, ps, html, other]
Title: The Impact of 2D and 3D Gamified VR on Learning American Sign Language
Subjects: Human-Computer Interaction (cs.HC)
Sign language has been extensively studied as a means of facilitating effective communication between hearing individuals and the deaf community. With the continuous advancements in virtual reality (VR) and gamification technologies, an increasing number of studies have begun to explore the application of these emerging technologies in sign language learning. This paper describes a user study that compares the impact of 2D and 3D games on the user experience in ASL learning. Empirical evidence gathered through questionnaires supports the positive impact of 3D game environments on user engagement and overall experience, particularly in relation to attractiveness, usability, and efficiency. Moreover, initial findings demonstrate a similar behaviour of 2D and 3D games in terms of enhancing user experience. Finally, the study identifies areas where improvements can be made to enhance the dependability and clarity of 3D game environments. These findings contribute to the understanding of how game-based approaches, and specifically the utilisation of 3D environments, can positively influence the learning experience of ASL.
- [21] arXiv:2405.08909 [pdf, ps, html, other]
Title: ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Comments: 14 pages, 3 figures, accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, utilizing track queries for identity-consistent detection and object queries for identity-agnostic track spawning. Tracking-by-attention, however, entangles detection and tracking queries in one embedding for both the detection and tracking task, which is sub-optimal. Other approaches resemble the tracking-by-detection paradigm, detecting objects using decoupled track and detection queries followed by a subsequent association. These methods, however, do not leverage synergies between the detection and association task. Combining the strengths of both paradigms, we introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras. We introduce a learnable data association module based on edge-augmented cross-attention, leveraging appearance and geometric features. Furthermore, we integrate this association module into the decoder layer of a DETR-based 3D detector, enabling simultaneous DETR-like query-to-image cross-attention for detection and query-to-query cross-attention for data association. By stacking these decoder layers, queries are refined for the detection and association task alternately, effectively harnessing the task dependencies. We evaluate our method on the nuScenes dataset and demonstrate the advantage of our approach compared to the two previous paradigms. Code is available at this https URL.
- [22] arXiv:2405.08911 [pdf, ps, html, other]
Title: CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
CLIP models perform remarkably well on zero-shot classification and retrieval tasks. But recent studies have shown that learnt representations in CLIP are not well suited for dense prediction tasks like object detection, semantic segmentation or depth estimation. More recently, multi-stage training methods for CLIP models were introduced to mitigate the weak performance of CLIP on downstream tasks. In this work, we find that simply improving the quality of captions in image-text datasets improves the quality of CLIP's visual representations, resulting in significant improvement on downstream dense prediction vision tasks. In fact, we find that CLIP pretraining with good quality captions can surpass recent supervised, self-supervised and weakly supervised pretraining methods. We show that when a CLIP model with ViT-B/16 as image encoder is trained on well-aligned image-text pairs, it obtains 12.1% higher mIoU and 11.5% lower RMSE on semantic segmentation and depth estimation tasks over recent state-of-the-art Masked Image Modeling (MIM) pretraining methods like Masked Autoencoder (MAE). We find that mobile architectures also benefit significantly from CLIP pretraining. A recent mobile vision architecture, MCi2, with CLIP pretraining obtains performance similar to Swin-L pretrained on ImageNet-22k on the semantic segmentation task while being 6.1$\times$ smaller. Moreover, we show that improving caption quality results in $10\times$ data efficiency when finetuning for dense prediction tasks.
- [23] arXiv:2405.08917 [pdf, ps, html, other]
Title: Feature Importance and Explainability in Quantum Machine Learning
Comments: Amended final year project. 23 pages
Subjects: Machine Learning (cs.LG); Quantum Algebra (math.QA); Machine Learning (stat.ML)
Many Machine Learning (ML) models are referred to as black box models, providing no real insights into why a prediction is made. Feature importance and explainability are important for increasing transparency and trust in ML models, particularly in settings such as healthcare and finance. Quantum computing's unique capabilities, such as leveraging quantum mechanical phenomena like superposition, can be combined with ML techniques to create the field of Quantum Machine Learning (QML), and feature importance and explainability techniques may also be applied to QML models. This article explores feature importance and explainability insights in QML compared to Classical ML models. Utilizing the widely recognized Iris dataset, classical ML algorithms such as SVM and Random Forests are compared against hybrid quantum counterparts, implemented via IBM's Qiskit platform: the Variational Quantum Classifier (VQC) and Quantum Support Vector Classifier (QSVC). This article aims to provide a comparison of the insights generated in ML by employing permutation and leave-one-out feature importance methods, alongside ALE (Accumulated Local Effects) and SHAP (SHapley Additive exPlanations) explainers.
- [24] arXiv:2405.08920 [pdf, ps, html, other]
Title: Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning
Comments: To appear in ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC).
Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.
- [25] arXiv:2405.08921 [pdf, ps, html, other]
Title: Neural Active Learning Meets the Partial Monitoring Framework
Subjects: Machine Learning (cs.LG)
We focus on the online-based active learning (OAL) setting where an agent operates over a stream of observations and trades off between the costly acquisition of information (labelled observations) and the cost of prediction errors. We propose a novel foundation for OAL tasks based on partial monitoring (PM), a theoretical framework specialized in online learning from partially informative actions. We show that previously studied binary and multi-class OAL tasks are instances of partial monitoring. We expand the real-world potential of OAL by introducing a new class of cost-sensitive OAL tasks. We propose NeuralCBP, the first PM strategy that accounts for predictive uncertainty with deep neural networks. Our extensive empirical evaluation on open source datasets shows that NeuralCBP has favorable performance against state-of-the-art baselines on multiple binary, multi-class and cost-sensitive OAL tasks.
- [26] arXiv:2405.08927 [pdf, ps, html, other]
Title: Expanderizing Higher Order Random Walks
Subjects: Data Structures and Algorithms (cs.DS)
We study a variant of the down-up and up-down walks over an $n$-partite simplicial complex, which we call expanderized higher order random walks -- where the sequence of updated coordinates corresponds to the sequence of vertices visited by a random walk over an auxiliary expander graph $H$. When $H$ is the clique, this random walk reduces to the usual down-up walk and when $H$ is the directed cycle, this random walk reduces to the well-known systematic scan Glauber dynamics. We show that whenever the usual higher order random walks satisfy a log-Sobolev inequality or a Poincaré inequality, the expanderized walks satisfy the same inequalities with a loss of quality related to the two-sided expansion of the auxiliary graph $H$. Our construction can be thought of as a higher order random walk generalization of the derandomized squaring algorithm of Rozenman and Vadhan. We show that when initiated with an expander graph our expanderized random walks have mixing time $O(n \log n)$ for sampling a uniformly random list coloring of a graph $G$ of maximum degree $\Delta = O(1)$ where each vertex has at least $(11/6 - \epsilon) \Delta$ and at most $O(\Delta)$ colors and $O\left( \frac{n \log n}{(1 - \| J\|)^2}\right)$ for sampling the Ising model with a PSD interaction matrix $J \in \mathbb{R}^{n \times n}$ satisfying $\| J \| \le 1$ and the external field $h \in \mathbb{R}^n$ -- here the $O(\bullet)$ notation hides a constant that depends linearly on the largest entry of $h$. As expander graphs can be very sparse, this decreases the amount of randomness required to simulate the down-up walks by a logarithmic factor. We also prove some simple results which enable us to argue about log-Sobolev constants of higher order random walks and provide a simple and self-contained analysis of local-to-global $\Phi$-entropy contraction in simplicial complexes -- giving simpler proofs for many pre-existing results.
- [27] arXiv:2405.08931 [pdf, ps, html, other]
Title: A QPTAS for Facility Location on Unit Disk graphs
Subjects: Data Structures and Algorithms (cs.DS)
We study the classic \textsc{(Uncapacitated) Facility Location} problem on Unit Disk Graphs (UDGs). For a given point set $P$ in the plane, the unit disk graph UDG(P) on $P$ has vertex set $P$ and an edge between two distinct points $p, q \in P$ if and only if their Euclidean distance $|pq|$ is at most 1. The weight of the edge $pq$ is equal to their distance $|pq|$. An instance of \textsc{Facility Location} on UDG(P) consists of a set $C\subseteq P$ of clients and a set $F\subseteq P$ of facilities, each having an opening cost $f_i$. The goal is to pick a subset $F'\subseteq F$ to open while minimizing $\sum_{i\in F'} f_i + \sum_{v\in C} d(v,F')$, where $d(v,F')$ is the distance of $v$ to the nearest facility in $F'$ through UDG(P).
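The objective defined above can be evaluated directly for a candidate set of open facilities. The following is a minimal sketch that only computes the cost of a given solution; it is not the approximation scheme from the paper. The function names, the index-based representation of points, and the example instance are assumptions made for illustration.

```python
import heapq
import math


def udg_distances(points, source):
    """Shortest-path distances in the unit disk graph on `points` from index `source`.

    Two points are adjacent iff their Euclidean distance is at most 1,
    and the edge weight is that distance (Dijkstra's algorithm).
    """
    dist = [math.inf] * len(points)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(len(points)):
            if v == u:
                continue
            w = math.dist(points[u], points[v])
            if w <= 1.0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def facility_location_cost(points, clients, opening_cost, opened):
    """Sum of opening costs of `opened` plus each client's distance to its nearest open facility."""
    dists = {i: udg_distances(points, i) for i in opened}
    connection = sum(min(dists[i][v] for i in opened) for v in clients)
    return sum(opening_cost[i] for i in opened) + connection


# Hypothetical instance: four collinear points spaced 0.8 apart.
points = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (2.4, 0.0)]
clients = [0, 1, 2, 3]
opening_cost = {0: 1.0, 3: 1.0}
print(facility_location_cost(points, clients, opening_cost, [0]))
print(facility_location_cost(points, clients, opening_cost, [0, 3]))
```

In this toy instance, opening both facilities is cheaper than opening only the first one, because the saved connection cost exceeds the extra opening cost.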
In this paper, we present the first Quasi-Polynomial Time Approximation Scheme (QPTAS) for the problem. While approximation schemes are well-established for facility location problems on sparse geometric graphs (such as planar graphs), there is a lack of such results for dense graphs. Specifically, prior to this study, to the best of our knowledge, there was no approximation scheme for any facility location problem on UDGs in the general setting.
- [28] arXiv:2405.08932 [pdf, ps, html, other]
Title: Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports to address downstream tasks of interest on bone radiography. A practical processing pipeline is introduced to anonymize and process French medical reports. Pretraining then consists in the self-supervised alignment of visual and textual embedding spaces derived from deep model encoders. The resulting image encoder is then used to handle various downstream tasks, including quantification of osteoarthritis, estimation of bone age on pediatric wrists, bone fracture and anomaly detection. Our approach demonstrates competitive performance on downstream tasks, compared to alternatives requiring a significantly larger amount of human expert annotations. Our work stands as the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations, capitalizing on the large quantity of paired image and report data available in a hospital. By relying on generic vision-language deep models in a language-specific scenario, it contributes to the deployment of vision models for wider healthcare applications.
- [29] arXiv:2405.08935 [pdf, ps, html, other]
Title: Function based sim-to-real learning for shape control of deformable free-form surfaces
Subjects: Robotics (cs.RO)
For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem for determining the actuation parameters that can realize a target shape. However, the free-form surfaces obtained from simulators are always different from the physically deformed shapes due to the errors introduced by hardware and the simplification adopted in physical simulation. To fill the gap, we propose a novel deformation function based sim-to-real learning method that can map the geometric shape of a simulated model into its corresponding shape of the physical model. Unlike the existing sim-to-real learning methods that rely on completely acquired dense markers, our method accommodates sparsely distributed markers and can resiliently use all captured frames -- even those with missing markers. To demonstrate its effectiveness, our sim-to-real method has been integrated into a neural network-based computational pipeline designed to tackle the inverse kinematic problem on a pneumatically actuated deformable mannequin.
- [30] arXiv:2405.08938 [pdf, ps, html, other]
Title: Pointwise Lipschitz Continuous Graph Algorithms via Proximal Gradient Analysis
Subjects: Data Structures and Algorithms (cs.DS)
In many real-world applications, it is prohibitively expensive to drastically change the solution to a problem after a small perturbation in the environment. Therefore, the stability of an algorithm is a very desirable property. In this paper, we study the class of pointwise Lipschitz continuous algorithms as introduced in the recent work of Kumabe and Yoshida [KY23b, FOCS'23]. The Lipschitz constant of an algorithm, intuitively, bounds the ratio of the changes in its output (measured in $\ell_1$ distance) over the perturbations of its input. Prior to our work, most of the attention was focused on the weighted setting, whereas only the maximum bipartite matching and the minimum spanning tree problems were studied in the unweighted setting, which is our focus.
In this paper, we give a general and simple framework for bounding the Lipschitz constant of algorithms measured through the unweighted $\ell_1$ distance of their outputs. Our approach consists of three main steps. First, we consider a natural continuous relaxation of the underlying graph problem by adding a smooth and strongly convex regularizer to the objective function. Then, we give upper bounds on the $\ell_1$ distance of the optimal solutions of the convex programs, under small perturbations of the weights, via a stability analysis of the trajectory of the proximal gradient method. Finally, we present new problem-specific rounding techniques to obtain integral solutions to several graph problems that approximately maintain the stability guarantees of the fractional solutions. We apply our framework to a number of problems including minimum $s$-$t$ cut, multiway cut, densest subgraph, maximum ($b$-)matching, and packing integer programs. To complement our algorithms, we show the tightness of our results for certain problems by establishing matching lower bounds.
- [31] arXiv:2405.08944 [pdf, ps, html, other]
Title: Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Transformer-based long context generative models power emerging AI applications like hour-long video understanding and project-level coding agents. Deploying long context transformers (e.g., 100K to 10M tokens) is prohibitively expensive compared to short context (e.g., 4K tokens) model variants. Reducing the cost of long-context transformers is becoming a pressing research and engineering challenge starting in 2024. This work describes a concurrent programming framework for quantitatively analyzing the efficiency challenges in serving multiple long-context requests under a limited GPU high-bandwidth memory (HBM) budget. We give a detailed analysis of how all additional computational costs, compared to 4K context, trace back to \textit{one single source: the large size of the KV cache}. We use a 34B GPT-3.5 level model of 50K context on A100 NVLink as a running example, and describe how its large KV cache causes four types of deployment challenges: (1) prefilling long inputs takes much more compute time and GPU memory than short inputs; (2) after prefilling, the large KV cache residing on the GPU HBM substantially restricts the number of concurrent users being served; (3) during decoding, repeatedly reading the KV cache from HBM to SM substantially increases latency; (4) when KV cache memory overflows, swapping it from HBM to DDR causes significant context switching latency. We use this framework to analyze existing works and identify possibilities of combining them to build end-to-end systems. Overall, this work offers a foundational framework for analyzing long context transformer deployment and identifies directions towards reducing the inference cost of 1M context to be as cheap as 4K.
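To make the KV cache argument concrete, the sketch below applies the standard size formula for a decoder-only transformer: two tensors (keys and values) per layer, each of shape KV heads × head dimension × sequence length. The layer count, KV head count, head dimension, and 16-bit storage are hypothetical values chosen for illustration; the abstract does not specify the configuration of its 34B running example.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor 2 counts the key tensor and the value tensor.
    bytes_per_value=2 assumes 16-bit storage (an assumption, not from the source).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, for illustration only.
config = dict(num_layers=60, num_kv_heads=8, head_dim=128)

short = kv_cache_bytes(seq_len=4_096, **config)
long = kv_cache_bytes(seq_len=50_000, **config)
print(f"4K context:  {short / 1e9:.2f} GB per request")
print(f"50K context: {long / 1e9:.2f} GB per request")
print(f"ratio: {long / short:.1f}x")
```

Because the cache grows linearly with sequence length, the number of requests that fit in a fixed HBM budget shrinks by the same factor, which is the concurrency restriction described in challenge (2).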
- [32] arXiv:2405.08948 [pdf, ps, html, other]
Title: Analyzing Nursing Assistant Attitudes Towards Empathic Geriatric Caregiving Using Quantitative Ethnography
Subjects: Human-Computer Interaction (cs.HC)
An emergent challenge in geriatric care is improving the quality of care, which requires insight from stakeholders. Qualitative methods offer detailed insights, but they can be biased and have limited generalizability, while quantitative methods may miss nuances. Network-based approaches, such as quantitative ethnography (QE), can bridge this methodological gap. By leveraging the strengths of both methods, QE provides profound insights into need-finding interviews. In this paper, to better understand geriatric care attitudes, we interviewed ten nursing assistants, used QE to analyze the data, and compared their daily activities in real life with training experiences. A two-sample t-test with a large effect size (Cohen's d=1.63) indicated a significant difference between real-life and training activities. The findings suggested incorporating more empathetic training scenarios into the future design of our geriatric care simulation. The results have implications for human-computer interaction and human factors. This is illustrated by presenting an example of using QE to analyze expert interviews with nursing assistants as caregivers to inform subsequent design processes.
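For readers unfamiliar with the effect size reported above, the following sketch computes Cohen's d for two independent samples using the pooled standard deviation, which is the common textbook definition. Whether the study used exactly this variant is an assumption, and the sample values are made up for illustration; they are not data from the study.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Hypothetical scores, for illustration only.
group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))
```

By the usual rule of thumb, values of d around 0.8 or above are described as large, so the reported d=1.63 indicates well-separated group means relative to their spread.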
- [33] arXiv:2405.08954 [pdf, ps, html, other]
Title: Zero-Shot Transfer of Neural ODEs
Subjects: Robotics (cs.RO)
Autonomous systems often encounter environments and scenarios beyond the scope of their training data, which underscores a critical challenge: the need to generalize and adapt to unseen scenarios in real time. This challenge necessitates new mathematical and algorithmic tools that enable adaptation and zero-shot transfer. To this end, we leverage the theory of function encoders, which enables zero-shot transfer by combining the flexibility of neural networks with the mathematical principles of Hilbert spaces. Using this theory, we first present a method for learning a space of dynamics spanned by a set of neural ODE basis functions. After training, the proposed approach can rapidly identify dynamics in the learned space using an efficient inner product calculation. Critically, this calculation requires no gradient calculations or retraining during the online phase. This method enables zero-shot transfer for autonomous systems at runtime and opens the door for a new class of adaptable control algorithms. We demonstrate state-of-the-art system modeling accuracy for two MuJoCo robot environments and show that the learned models can be used for more efficient MPC control of a quadrotor.
- [34] arXiv:2405.08956 [pdf, ps, html, other]
Title: Toward Completing the Picture of Control in Schulze and Ranked Pairs Elections
Comments: Includes supplemental material
Subjects: Computer Science and Game Theory (cs.GT)
Both Schulze and ranked pairs are voting rules that satisfy many natural, desirable axioms. Many standard types of electoral control (with a chair seeking to change the outcome of an election by interfering with the election structure) have already been studied. However, for control by replacing candidates or voters and for (exact) multimode control that combines multiple standard attacks, many questions remain open. We solve a number of these open cases for Schulze and ranked pairs. In addition, we fix a flaw in the reduction of Menton and Singh [IJCAI 2013] showing that Schulze is resistant to constructive control by deleting candidates and re-establish a vulnerability result for destructive control by deleting candidates. In some of our proofs, we study variants of s-t vertex cuts in graphs that are related to our control problems.
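As background for the control problems above, the sketch below shows standard Schulze winner determination: count pairwise preferences, compute strongest-path strengths with a Floyd-Warshall style pass, and return the candidates that no other candidate beats on path strength. It covers only winner determination, not the control attacks studied in the paper. The function name and the assumption that every ballot is a complete strict ranking are illustrative choices.

```python
def schulze_winners(candidates, ballots):
    """Return the Schulze winners.

    `ballots` is a list of complete strict rankings, most preferred first.
    """
    # d[a][b]: number of voters who rank a above b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    for ranking in ballots:
        position = {c: i for i, c in enumerate(ranking)}
        for a in candidates:
            for b in candidates:
                if a != b and position[a] < position[b]:
                    d[a][b] += 1

    # p[a][b]: strength of the strongest path from a to b.
    p = {
        a: {b: (d[a][b] if d[a][b] > d[b][a] else 0) for b in candidates}
        for a in candidates
    }
    for k in candidates:  # intermediate candidate
        for i in candidates:
            if i == k:
                continue
            for j in candidates:
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    return [
        a for a in candidates
        if all(p[a][b] >= p[b][a] for b in candidates if b != a)
    ]


# Small hypothetical election: A beats both B and C head to head.
ballots = [["A", "B", "C"], ["A", "B", "C"], ["B", "C", "A"]]
print(schulze_winners(["A", "B", "C"], ballots))
```

A control attack in the sense of the abstract would modify `candidates` or `ballots` (for example by deleting or replacing entries) and ask whether a designated candidate enters or leaves the returned winner set.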
- [35] arXiv:2405.08961 [pdf, ps, html, other]
Title: Bird's-Eye View to Street-View: A Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In recent years, street view imagery has grown to become one of the most important sources of geospatial data collection and urban analytics, which facilitates generating meaningful insights and assisting in decision-making. Synthesizing a street-view image from its corresponding satellite image is a challenging task due to the significant differences in appearance and viewpoint between the two domains. In this study, we screened 20 recent research papers to provide a thorough review of the state-of-the-art of how street-view images are synthesized from their corresponding satellite counterparts. The main findings are: (i) novel deep learning techniques are required for synthesizing more realistic and accurate street-view images; (ii) more datasets need to be collected for public usage; and (iii) more specific evaluation metrics need to be investigated for evaluating the generated images appropriately. We conclude that, due to applying outdated deep learning techniques, the recent literature failed to generate detailed and diverse street-view images.
- [36] arXiv:2405.08965 [pdf, ps, html, other]
Title: LLMs are Meaning-Typed Code Constructs
Authors: Jason Mars, Yiping Kang, Jayanaka Dantanarayana, Chandra Irugalbandara, Kugesan Sivasothynathan, Lingjia Tang
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
Programming with Generative AI (GenAI) models is a type of Neurosymbolic programming and has seen tremendous adoption across many domains. However, leveraging GenAI models in code today can be complex and counter-intuitive, and often requires specialized frameworks, leading to increased complexity. This is because it is currently unclear what the right abstractions are through which we should marry GenAI models with the nature of traditional programming code constructs. In this paper, we introduce a set of novel abstractions to help bridge the gap between Neuro- and symbolic programming. We introduce Meaning, a new specialized type that represents the underlying semantic value of traditional types (e.g., string). We make the case that GenAI models, LLMs in particular, should be reasoned about as a meaning-type wrapped code construct at the language level. We formulate the problem of translation between meaning and traditional types and propose Automatic Meaning-Type Transformation (A-MTT), a runtime feature that abstracts this translation away from the developers by automatically converting between Meaning and types at the interface of LLM invocation. Leveraging this new set of code constructs and A-MTT, we demonstrate example implementations of neurosymbolic programs that seamlessly utilize LLMs to solve problems in place of potentially complex traditional programming logic.
- [37] arXiv:2405.08967 [pdf, ps, html, other]
Title: Perturbation-based Learning for Recurrent Neural Networks
Subjects: Machine Learning (cs.LG)
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle with propagating gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. Subsequently, we conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability when compared to BPTT, strongly outperforming standard node perturbation and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs.
- [38] arXiv:2405.08969 [pdf, ps, html, other]
Title: Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation
Comments: Accepted at AI for Social Good track of IJCAI 2024 (the 33rd International Joint Conference on Artificial Intelligence), 14 pages, 11 figures
Subjects: Machine Learning (cs.LG)
Hand gestures can provide a natural means of human-computer interaction and enable people who cannot speak to communicate efficiently. Existing hand gesture recognition methods heavily depend on pre-defined gestures; however, motor-impaired individuals require new gestures tailored to each individual's gesture motion and style. Gesture samples collected from different persons have distribution shifts due to their health conditions, the severity of the disability, motion patterns of the arms, etc. In this paper, we introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning (FSCL) framework that significantly improves the performance of fine-tuning a model for out-of-distribution data. Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as \textit{gesture prior knowledge}, along with \textit{intra-gesture divergence} derived from two additional embeddings. Thus, the model can capture latent statistical structure in highly variable gestures with limited samples. We conduct an experimental evaluation using the SmartWatch Gesture and the Motion Gesture datasets. The proposed method results in an average test accuracy of 57.0\%, 64.6\%, and 69.3\% by using one, three, and five samples, respectively, for six different gestures. Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied in human-computer interaction and social communication.
- [39] arXiv:2405.08971 [pdf, ps, html, other]
Title: Computation-Aware Kalman Filtering and Smoothing
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Kalman filtering and smoothing are the foundational mechanisms for efficient inference in Gauss-Markov models. However, their time and memory complexities scale prohibitively with the size of the state space. This is particularly problematic in spatiotemporal regression problems, where the state dimension scales with the number of spatial observations. Existing approximate frameworks leverage low-rank approximations of the covariance matrix. Since they do not model the error introduced by the computational approximation, their predictive uncertainty estimates can be overly optimistic. In this work, we propose a probabilistic numerical method for inference in high-dimensional Gauss-Markov models which mitigates these scaling issues. Our matrix-free iterative algorithm leverages GPU acceleration and crucially enables a tunable trade-off between computational cost and predictive uncertainty. Finally, we demonstrate the scalability of our method on a large-scale climate dataset.
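For context, the sketch below shows the classical Kalman filter recursion in the simplest possible case: a scalar random-walk state observed with Gaussian noise. It is the standard textbook filter, not the computation-aware method proposed in the paper, and the function name and parameter names (`q` for process noise variance, `r` for observation noise variance) are assumptions for illustration. In the multivariate case the scalar variance `p` becomes a covariance matrix whose size grows with the square of the state dimension, which is the scaling problem the abstract refers to.

```python
def kalman_filter_1d(observations, x0, p0, q, r):
    """Scalar Kalman filter for the model x_t = x_{t-1} + N(0, q), z_t = x_t + N(0, r).

    Returns a list of (posterior mean, posterior variance) pairs, one per observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the random-walk dynamics keep the mean and add process noise.
        p = p + q
        # Update: blend the prediction and the observation using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates


# Hypothetical noisy readings of a roughly constant quantity.
for mean, var in kalman_filter_1d([1.1, 0.9, 1.05, 0.98], x0=0.0, p0=1.0, q=0.01, r=0.1):
    print(f"mean={mean:.3f} var={var:.4f}")
```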
- [40] arXiv:2405.08973 [pdf, ps, html, other]
Title: An adaptive approach to Bayesian Optimization with switching costs
Authors: Stefan Pricopie, Richard Allmendinger, Manuel Lopez-Ibanez, Clyde Fare, Matt Benatan, Joshua Knowles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We investigate modifications to Bayesian Optimization for a resource-constrained setting of sequential experimental design where changes to certain design variables of the search space incur a switching cost. This models the scenario where there is a trade-off between evaluating more while maintaining the same setup, or switching and restricting the number of possible evaluations due to the incurred cost. We adapt two process-constrained batch algorithms to this sequential problem formulation, and propose two new methods: one cost-aware and one cost-ignorant. We validate and compare the algorithms using a set of 7 scalable test functions in different dimensionalities and switching-cost settings for 30 total configurations. Our proposed cost-aware hyperparameter-free algorithm yields comparable results to tuned process-constrained algorithms in all settings we considered, suggesting some degree of robustness to varying landscape features and cost trade-offs. This method starts to outperform the other algorithms with increasing switching cost. Our work broadens out from other recent Bayesian Optimization studies in resource-constrained settings that consider a batch setting only. While the contributions of this work are relevant to the general class of resource-constrained problems, they are particularly relevant to problems where adaptability to varying resource availability is of high importance.
- [41] arXiv:2405.08976 [pdf, ps, html, other]
Title: Slice-aware Resource Allocation and Admission Control for Smart Factory Wireless Networks
Comments: 7 pages, submitted to VTCfall for review
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
The 5th generation (5G) and beyond network offers substantial promise as the ideal wireless technology to replace the existing inflexible wired connections in traditional factories of today. 5G network slicing allows for tailored allocation of resources to different network services, each with unique Quality of Service (QoS) requirements. This paper presents a novel solution for slice-aware radio resource allocation based on a convex optimisation control framework for applications in smart factory wireless networks. The proposed framework dynamically allocates minimum power and sub-channels to downlink mixed service type industrial users categorised into three slices: Capacity Limited (CL), Ultra Reliable Low Latency Communication (URLLC), and Time Sensitive (TS) slices. Given that the base station (BS) has limited transmission power, we enforce admission control by effectively relaxing the target rate constraints for current connections in the CL slice. This rate readjustment occurs whenever power consumption exceeds manageable levels. Simulation results show that our approach minimises power, allocates sub-channels to users, maintains slice isolation, and delivers QoS-specific communications to users in all the slices despite time-varying number of users and changing network conditions.
- [42] arXiv:2405.08979 [pdf, ps, html, other]
Title: drGAT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network
Subjects: Machine Learning (cs.LG); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)
Drug development is a lengthy process with a high failure rate. Increasingly, machine learning is utilized to facilitate the drug development processes. These models aim to enhance our understanding of drug characteristics, including their activity in biological contexts. However, a major challenge in drug response (DR) prediction is model interpretability, which aids in the validation of findings. This is important in biomedicine, where models need to be understandable in comparison with established knowledge of drug interactions with proteins. drGAT, a graph deep learning model, leverages a heterogeneous graph composed of relationships between proteins, cell lines, and drugs. drGAT is designed with two objectives: DR prediction as a binary sensitivity prediction and elucidation of drug mechanism from attention coefficients. drGAT has demonstrated superior performance over existing models, achieving 78\% accuracy (and precision), and 76\% F1 score for 269 DNA-damaging compounds of the NCI60 drug response dataset. To assess the model's interpretability, we conducted a review of drug-gene co-occurrences in PubMed abstracts in comparison to the top 5 genes with the highest attention coefficients for each drug. We also examined whether known relationships were retained in the model by inspecting the neighborhoods of topoisomerase-related drugs. For example, our model retained TOP1 as a highly weighted predictive feature for irinotecan and topotecan, in addition to other genes that could potentially be regulators of the drugs. Our method can be used to accurately predict sensitivity to drugs and may be useful in the identification of biomarkers relating to the treatment of cancer patients.
- [43] arXiv:2405.08981 [pdf, ps, html, other]
Title: Impact of Design Decisions in Scanpath Modeling
Comments: 16 pages
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Modeling visual saliency in graphical user interfaces (GUIs) makes it possible to understand how people perceive GUI designs and what elements attract their attention. One aspect that is often overlooked is the fact that computational models depend on a series of design parameters that are not straightforward to decide. We systematically analyze how different design parameters affect scanpath evaluation metrics using a state-of-the-art computational model (DeepGaze++). We particularly focus on three design parameters: input image size, inhibition-of-return decay, and masking radius. We show that even small variations of these design parameters have a noticeable impact on standard evaluation metrics such as DTW or Eyenalysis. These effects also occur in other scanpath models, such as UMSS and ScanGAN, and in other datasets such as MASSVIS. Taken together, our results highlight the impact of design decisions on predicting users' viewing behavior on GUIs.
- [44] arXiv:2405.08988 [pdf, ps, html, other]
-
Title: Containment Problem for Deterministic Multicounter Machine Models
Comments: 24 pages, 2 figures
Subjects: Formal Languages and Automata Theory (cs.FL)
There are many types of automata and grammar models that have been studied in the literature, and for these models, it is common to determine whether certain problems are decidable. One problem that has been difficult to answer throughout the history of automata and formal language theory is to decide whether a given system $M$ accepts a bounded language (whether there exist words $w_1, \ldots,w_k$ such that $L(M) \subseteq w_1^* \cdots w_k^*$?). Boundedness was only known to be decidable for regular and context-free languages until recently when it was shown to also be decidable for finite automata and pushdown automata augmented with reversal-bounded counters, and for vector addition systems with states. However, decidability of this problem has still gone unanswered for the majority of automata/grammar models with a decidable emptiness problem that have been studied in the literature.
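As a small illustration of the definition above, the sketch below checks whether a single word lies in $w_1^* \cdots w_k^*$ for given words $w_1, \ldots, w_k$. It only tests membership of one word; it does not decide boundedness of a language, which is the hard problem the abstract discusses. The function name `in_bounded_form` is a hypothetical helper.

```python
import re


def in_bounded_form(word, factors):
    """Return True if word belongs to w_1^* w_2^* ... w_k^* for factors [w_1, ..., w_k]."""
    # Build the regular expression (w_1)*(w_2)*...(w_k)* with each factor escaped.
    pattern = "".join(f"(?:{re.escape(w)})*" for w in factors)
    return re.fullmatch(pattern, word) is not None
```

For instance, `in_bounded_form("aabbb", ["a", "b"])` holds, while `"aba"` is not in $a^* b^*$. A language $L(M)$ is bounded when one fixed list of factors works for every word it contains.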
In this paper, we develop new techniques to show that the boundedness problem is decidable for larger classes of one-way nondeterministic automata and grammar models by reducing the problem to the decidability of boundedness for simpler classes of automata. One technique involves characterizing the models in terms of multi-tape automata. We give new characterizations of finite-turn Turing machines, finite-turn Turing machines augmented with various storage structures (like a pushdown, multiple reversal-bounded counters, partially-blind counters, etc.), and simple matrix grammars. The characterizations are then used to show that the boundedness problem for these models is decidable. Another technique uses the concept of the store language of an automaton. This is used to show that the boundedness problem is decidable for pushdown automata that can "flip" their pushdown a bounded number of times. Boundedness remains decidable even if we augment this device with additional stores.
- [45] arXiv:2405.08989 [pdf, ps, html, other]
-
Title: What is it for a Machine Learning Model to Have a Capability?
Comments: forthcoming in the British Journal for the Philosophy of Science (BJPS)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
What can contemporary machine learning (ML) models do? Given the proliferation of ML models in society, answering this question matters to a variety of stakeholders, both public and private. The evaluation of models' capabilities is rapidly emerging as a key subfield of modern ML, buoyed by regulatory attention and government grants. Despite this, the notion of an ML model possessing a capability has not been interrogated: what are we saying when we say that a model is able to do something? And what sorts of evidence bear upon this question? In this paper, we aim to answer these questions, using the capabilities of large language models (LLMs) as a running example. Drawing on the large philosophical literature on abilities, we develop an account of ML models' capabilities which can be usefully applied to the nascent science of model evaluation. Our core proposal is a conditional analysis of model abilities (CAMA): crudely, a machine learning model has a capability to X just when it would reliably succeed at doing X if it 'tried'. The main contribution of the paper is making this proposal precise in the context of ML, resulting in an operationalisation of CAMA applicable to LLMs. We then put CAMA to work, showing that it can help make sense of various features of ML model evaluation practice, as well as suggest procedures for performing fair inter-model comparisons.
- [46] arXiv:2405.08991 [pdf, ps, html, other]
-
Title: Theoretical Analysis for Expectation-Maximization-Based Multi-Model 3D Registration
Comments: arXiv admin note: substantial text overlap with arXiv:2402.10865
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We perform a detailed theoretical analysis of a recently proposed expectation-maximization-based algorithm for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite showing superior empirical results, the original work did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by establishing such conditions. In particular, the analysis revolves around the usage of probabilistic tail bounds that are developed and applied in various instances throughout the course. The problem studied in this project stands as another example, different from those seen in the course, in which tail bounds help advance our algorithmic understanding in a probabilistic way. We provide self-contained background materials on 3D Registration.
- [47] arXiv:2405.08992 [pdf, ps, html, other]
-
Title: Contextual Emotion Recognition using Large Vision Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
"How does the person in the bounding box feel?" Achieving human-level recognition of the apparent emotion of a person in real world situations remains an unsolved task in computer vision. Facial expressions are not enough: body pose, contextual knowledge, and commonsense reasoning all contribute to how humans perform this emotional theory of mind task. In this paper, we examine two major approaches enabled by recent large vision language models: 1) image captioning followed by a language-only LLM, and 2) vision language models, under zero-shot and fine-tuned setups. We evaluate the methods on the Emotions in Context (EMOTIC) dataset and demonstrate that a vision language model, fine-tuned even on a small dataset, can significantly outperform traditional baselines. The results of this work aim to help robots and agents perform emotionally sensitive decision-making and interaction in the future.
- [48] arXiv:2405.08996 [pdf, ps, html, other]
-
Title: Learning Correspondence for Deformable Objects
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing both classical and learning-based methods. We choose cloth and rope because they are traditionally some of the most difficult deformable objects to model analytically given their large configuration space, and they are meaningful in the context of robotic tasks like cloth folding, rope knot-tying, T-shirt folding, curtain closing, etc. The correspondence problem is heavily motivated in robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for computing correspondence via feature matching, including SIFT, SURF, and ORB, and two recently published learning-based methods, TimeCycle and Dense Object Nets. We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains, (2) a new learning-based correspondence method extending Dense Object Nets, and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms baseline classical methods for correspondence, and our proposed extension of Dense Object Nets performs similarly.
- [49] arXiv:2405.08997 [pdf, ps, html, other]
-
Title: LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages
Subjects: Computation and Language (cs.CL)
We propose a new paradigm for machine translation that is particularly useful for no-resource languages (those without any publicly available bilingual or monolingual corpora): LLM-Assisted Rule Based Machine Translation. Using this paradigm, we design the first language education/revitalization-oriented machine translator for Owens Valley Paiute (OVP), a critically endangered Indigenous American language for which there is virtually no publicly available data. We present a detailed evaluation of the translator's components: a rule-based sentence builder, an OVP to English translator, and an English to OVP translator. We also discuss the potential of the paradigm, its limitations, and the many avenues for future research that it opens up.
- [50] arXiv:2405.09001 [pdf, ps, html, other]
-
Title: BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO)
We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality local bird's eye view (BEV) images of the local terrain. Subsequently, these images are aligned with a geo-referenced aerial map via template-matching to achieve accurate cross-view registration. Our approach overcomes the inherent limitations of visual inertial odometry systems and the substantial storage requirements of image-retrieval localization strategies, which are susceptible to drift and scalability issues, respectively. Extensive experimentation validates BEVRender's advancement over existing GNSS-denied visual localization methods, demonstrating notable enhancements in both localization accuracy and update frequency. The code for BEVRender will be made available soon.
- [51] arXiv:2405.09002 [pdf, ps, html, other]
-
Title: Cross-Cultural Validation of Partner Models for Voice User Interfaces
Comments: Accepted at ACM CUI '24
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Recent research has begun to assess people's perceptions of voice user interfaces (VUIs) as dialogue partners, termed partner models. Current self-report measures are only available in English, limiting research to English-speaking users. To improve the diversity of user samples and contexts that inform partner modelling research, we translated, localized, and evaluated the Partner Modelling Questionnaire (PMQ) for non-English speaking Western (German, n=185) and East Asian (Japanese, n=198) cohorts where VUI use is popular. Through confirmatory factor analysis (CFA), we find that the scale produces equivalent levels of goodness-of-fit for both our German and Japanese translations, confirming its cross-cultural validity. Still, the structure of the communicative flexibility factor did not replicate directly across Western and East Asian cohorts. We discuss how our translations can open up critical research on cultural similarities and differences in partner model use and design, whilst highlighting the challenges of ensuring accurate translation across cultural contexts.
- [52] arXiv:2405.09004 [pdf, ps, html, other]
-
Title: Improving Sequential Market Clearing via Value-oriented Renewable Energy Forecasting
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Large penetration of renewable energy sources (RESs) brings huge uncertainty into electricity markets. While existing deterministic market clearing fails to accommodate the uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RES generation that enters the day-ahead market. With such a forecast, the existing deterministic market clearing framework can be maintained, and the overall day-ahead and real-time operation cost is reduced. In the training phase, the forecast model parameters are estimated to minimize the expected overall day-ahead and real-time operation cost, instead of minimizing forecast errors in a statistical sense. Theoretically, we derive the exact form of the loss function for training the forecast model that aligns with this goal. For market clearing modeled by linear programs, this loss function is piecewise linear. Additionally, we derive the analytical gradient of the loss function with respect to the forecast, which inspires an efficient training strategy. A numerical study shows that our forecasts significantly reduce the overall cost of deterministic market clearing compared to quality-oriented forecasting approaches.
- [53] arXiv:2405.09005 [pdf, ps, html, other]
-
Title: Cons-training tensor networks
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Quantum Physics (quant-ph)
In this study, we introduce a novel family of tensor networks, termed constrained matrix product states (MPS), designed to incorporate arbitrary linear constraints exactly into sparse block structures. These tensor networks effectively bridge the gap between U(1) symmetric MPS and traditional, unconstrained MPS. Central to our approach is the concept of a quantum region, an extension of quantum numbers traditionally used in symmetric tensor networks, adapted to capture any linear constraint, including the unconstrained scenario. We further develop canonical forms for these new MPS, which allow for the merging and factorization of tensor blocks according to quantum region fusion rules. Utilizing this canonical form, we apply an unsupervised training strategy to optimize arbitrary cost functions subject to linear constraints. We use this to solve the quadratic knapsack problem and show superior performance against a leading nonlinear integer programming solver, highlighting the potential of our method in tackling complex constrained combinatorial optimization problems.
- [54] arXiv:2405.09006 [pdf, ps, html, other]
-
Title: Spatial Semantic Recurrent Mining for Referring Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Referring Image Segmentation (RIS) requires language and appearance semantics to be understood jointly, a need that becomes acute in hard cases. To achieve this, existing works tend to resort to various trans-representing mechanisms that directly feed language semantics forward along the main RGB branch; as a result, the referent distribution is only weakly mined in space and non-referent semantics contaminate the channels. In this paper, we propose Spatial Semantic Recurrent Mining (S²RM) to achieve high-quality cross-modality fusion. It follows a three-step strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing. During fusion, S²RM first generates a constraint-weak yet distribution-aware language feature, then bundles the features of each row and column from rotated features of one modality context to recurrently correlate the relevant semantics contained in the feature from the other modality context, and finally resorts to self-distilled weights to weigh the contributions of the different parsed semantics. Via coparsing, S²RM transports information from the near and remote slice layers of the generator context to the current slice layer of the parsed context, which allows it to better model global relationships in a bidirectional and structured way. In addition, we propose a Cross-scale Abstract Semantic Guided Decoder (CASG) to emphasize the foreground of the referent, integrating features of different granularity at a comparatively low cost. Extensive experimental results on four challenging datasets show that our proposed method performs favorably against other state-of-the-art algorithms.
- [55] arXiv:2405.09009 [pdf, ps, html, other]
-
Title: Ahead of the Count: An Algorithm for Probabilistic Prediction of Instant Runoff (IRV) Elections
Subjects: Computers and Society (cs.CY); Combinatorics (math.CO); Probability (math.PR)
How can we probabilistically predict the winner in a ranked-choice election without all ballots being counted? In this study, we introduce a novel algorithm designed to predict outcomes in Instant Runoff Voting (IRV) elections. The algorithm takes as input a set of discrete probability distributions describing vote totals for each candidate ranking and calculates the probability that each candidate will win the election. In fact, we calculate all possible sequences of eliminations that might occur in the IRV rounds and assign a probability to each.
The discrete probability distributions can be arbitrary and, in applications, could be measured empirically from pre-election polling data or from partial vote tallies of an in-progress election.
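The input/output shape described above can be illustrated with a rough sketch. Note the substitution: instead of the exact enumeration of elimination sequences that the abstract describes, this example uses plain Monte Carlo sampling as a stand-in. The function names, the `(count, probability)` representation of each distribution, and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def irv_winner(ballot_counts):
    """Run instant-runoff rounds on {ranking tuple: number of ballots}."""
    remaining = {c for ranking in ballot_counts for c in ranking}
    while True:
        tallies = {c: 0 for c in remaining}
        for ranking, count in ballot_counts.items():
            # Each ballot counts for its highest-ranked candidate still in the race.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tallies[top] += count
        total = sum(tallies.values())
        leader = max(tallies, key=lambda c: (tallies[c], c))
        if len(remaining) == 1 or 2 * tallies[leader] > total:
            return leader
        # Eliminate the candidate with the fewest votes (ties broken by name: an assumption).
        loser = min(tallies, key=lambda c: (tallies[c], c))
        remaining.remove(loser)


def estimate_win_probabilities(distributions, trials=10_000, seed=0):
    """Monte Carlo estimate; distributions maps ranking -> [(count, probability), ...]."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(trials):
        sampled = {
            ranking: rng.choices([v for v, _ in dist], weights=[p for _, p in dist])[0]
            for ranking, dist in distributions.items()
        }
        wins[irv_winner(sampled)] += 1
    return {candidate: n / trials for candidate, n in wins.items()}
```

With three rankings A>B>C (40 ballots), B>A>C (35), and C>B>A (25), no candidate has a first-round majority, C is eliminated, and C's ballots transfer to B, who wins. Sampling is simpler to write than exact enumeration but only approximates the probabilities.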
The algorithm is effective for elections with a small number of candidates (five or fewer), with fast execution on typical consumer computers. The run-time is short enough for our method to be used for real-time election night modeling where new predictions are made continuously as more and more vote information becomes available. We demonstrate the algorithm in abstract examples, and we also use real data from the 2022 Alaska state elections to simulate election-night predictions as well as predictions of election recounts.
- [56] arXiv:2405.09010 [pdf, ps, html, other]
-
Title: On Low Field Size Constructions of Access-Optimal Convertible Codes
Comments: This is an extended version of an IEEE ISIT 2024 paper with the same title
Subjects: Information Theory (cs.IT)
Most large-scale storage systems employ erasure coding to provide resilience against disk failures. Recent work has shown that tuning this redundancy to changes in disk failure rates leads to substantial storage savings. This process requires code conversion, wherein data encoded using an $[n^{I},k^{I}]$ initial code has to be transformed into data encoded using an $[n^{F},k^{F}]$ final code, a resource-intensive operation. Convertible codes are a class of codes that enable efficient code conversion while maintaining other desirable properties. In this paper, we focus on the access cost of conversion (total number of code symbols accessed in the conversion process) and on an important subclass of conversions known as the merge regime (combining multiple initial codewords into a single final codeword).
In this setting, explicit constructions are known for systematic access-optimal Maximum Distance Separable (MDS) convertible codes for all parameters in the merge regime. However, the existing construction for a key subset of these parameters, which makes use of Vandermonde parity matrices, requires a large field size, making it unsuitable for practical applications. In this paper, we provide (1) sharper bounds on the minimum field size requirement for such codes, and (2) explicit constructions for low field sizes for several parameter ranges. In doing so, we provide a proof of super-regularity of specially designed classes of Vandermonde matrices that could be of independent interest.
- [57] arXiv:2405.09011 [pdf, ps, html, other]
-
Title: Symmetric-Difference (Degeneracy) and Signed Tree Models
Comments: 21 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
We introduce a dense counterpart of graph degeneracy, which extends the recently-proposed invariant symmetric difference. We say that a graph has sd-degeneracy (for symmetric-difference degeneracy) at most $d$ if it admits an elimination order of its vertices where a vertex $u$ can be removed whenever it has a $d$-twin, i.e., another vertex $v$ such that at most $d$ vertices outside $\{u,v\}$ are neighbors of exactly one of $u, v$. The family of graph classes of bounded sd-degeneracy is a superset of that of graph classes of bounded degeneracy or of bounded flip-width, and more generally, of bounded symmetric difference. Unlike most graph parameters, sd-degeneracy is not hereditary: it may be strictly smaller on a graph than on some of its induced subgraphs. In particular, every $n$-vertex graph is an induced subgraph of some $O(n^2)$-vertex graph of sd-degeneracy 1. In spite of this and the breadth of classes of bounded sd-degeneracy, we devise $\tilde{O}(\sqrt n)$-bit adjacency labeling schemes for them, which are optimal up to the hidden polylogarithmic factor. This is attained on some even more general classes, consisting of graphs $G$ whose vertices bijectively map to the leaves of a tree $T$, where transversal edges and anti-edges added to $T$ define the edge set of $G$. We call such graph representations signed tree models as they extend the so-called tree models (or twin-decompositions) developed in the context of twin-width, by adding transversal anti-edges. While computing the degeneracy of an input graph can be done in linear time, we show that deciding whether its symmetric difference is at most 8 is co-NP-complete, and whether its sd-degeneracy is at most 1 is NP-complete.
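The definition of sd-degeneracy above can be made concrete with a small checker. This is a minimal sketch that verifies whether a given vertex order is a valid elimination order for a given $d$; it does not search for such an order, which the abstract states is NP-complete to decide already for $d = 1$. The graph representation (a dict mapping each vertex to its neighbor set), the function names, and the convention that the final remaining vertex needs no twin are assumptions.

```python
def symmetric_difference_size(adj, u, v, alive):
    """Count vertices outside {u, v} that are adjacent to exactly one of u, v."""
    return sum(
        1
        for w in alive
        if w not in (u, v) and ((w in adj[u]) != (w in adj[v]))
    )


def has_d_twin(adj, u, alive, d):
    """True if some other remaining vertex v is a d-twin of u."""
    return any(
        v != u and symmetric_difference_size(adj, u, v, alive) <= d
        for v in alive
    )


def is_valid_elimination_order(adj, order, d):
    """Check that each vertex has a d-twin among the remaining vertices when removed."""
    if len(order) != len(adj) or set(order) != set(adj):
        return False  # order must list every vertex exactly once
    alive = set(adj)
    for u in order[:-1]:  # assumed convention: the last vertex needs no twin
        if not has_d_twin(adj, u, alive, d):
            return False
        alive.remove(u)
    return True
```

On the path a-b-c-d, the order a, b, c, d is valid for $d = 1$ (for example, c is a 1-twin of a, differing only on d), but not for $d = 0$, since no two vertices of the path have identical neighborhoods outside the pair.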
- [58] arXiv:2405.09014 [pdf, ps, html, other]
-
Title: Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy
Comments: Accepted by IEEE Transactions on Machine Learning in Communications and Networking. arXiv admin note: text overlap with arXiv:2209.05395
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
In this paper, we propose feature-based federated transfer learning as a novel approach to improve communication efficiency by reducing the uplink payload by multiple orders of magnitude compared to that of existing approaches in federated learning and federated transfer learning. Specifically, in the proposed feature-based federated learning, we design the extracted features and outputs to be uploaded instead of parameter updates. For this distributed learning model, we determine the required payload and provide comparisons with the existing schemes. Subsequently, we analyze the robustness of feature-based federated transfer learning against packet loss, data insufficiency, and quantization. Finally, we address privacy considerations by defining and analyzing label privacy leakage and feature privacy leakage, and investigating mitigating approaches. For all aforementioned analyses, we evaluate the performance of the proposed learning scheme via experiments on an image classification task and a natural language processing task to demonstrate its effectiveness.
- [59] arXiv:2405.09016 [pdf, ps, other]
-
Title: IoT-enabled Stability Chamber for the Pharmaceutical Industry
Subjects: Systems and Control (eess.SY)
A stability chamber is a critical piece of equipment for any pharmaceutical facility: it holds manufactured products under different sets of environmental conditions so that their stability and quality can be tested over a certain period of time. In this paper, we propose an IoT-enabled stability chamber for the pharmaceutical industry. We developed four stability chambers by using the existing utilities of a manufacturing facility. The automatic PID control system of a Siemens S7-1200 PLC was used to control each chamber. Seven precise temperature and humidity sensors were used to monitor the environment of each chamber. The PC-based Siemens WinCC Runtime Advanced visualization platform, which is FDA 21 CFR Part 11 compliant, was used to visualize the chamber data. Sensor data from the chamber are stored periodically in a database, and the system also provides report generation features. The chamber also has an alarm management system, and critical alarms are automatically emailed to the user so that action can be taken. Additionally, an Internet of Things-based (IoT-based) application was developed to monitor the sensor data remotely using any client application.
- [60] arXiv:2405.09017 [pdf, ps, other]
-
Title: A Japanese-Chinese Parallel Corpus Using Crowdsourcing for Web Mining
Comments: Work in progress
Subjects: Computation and Language (cs.CL)
Using crowdsourcing, we collected more than 10,000 URL pairs (parallel top page pairs) of bilingual websites that contain parallel documents and created a Japanese-Chinese parallel corpus of 4.6M sentence pairs from these websites. We used a Japanese-Chinese bilingual dictionary of 160K word pairs for document and sentence alignment. We then used high-quality 1.2M Japanese-Chinese sentence pairs to train a parallel corpus filter based on statistical language models and word translation probabilities. We compared the translation accuracy of the model trained on these 4.6M sentence pairs with that of the model trained on Japanese-Chinese sentence pairs from CCMatrix (12.4M), a parallel corpus from global web mining. Although our corpus is only one-third the size of CCMatrix, we found that the accuracy of the two models was comparable and confirmed that it is feasible to use crowdsourcing for web mining of parallel data.
- [61] arXiv:2405.09021 [pdf, ps, other]
-
Title: Deep Learning in Earthquake Engineering: A Comprehensive Review
Subjects: Machine Learning (cs.LG)
This article surveys the growing interest in utilizing Deep Learning (DL) as a powerful tool to address challenging problems in earthquake engineering. Despite decades of advancement in domain knowledge, issues such as uncertainty in earthquake occurrence, unpredictable seismic loads, nonlinear structural responses, and community engagement remain difficult to tackle using domain-specific methods. DL offers promising solutions by leveraging its data-driven capacity for nonlinear mapping, sequential data modeling, automatic feature extraction, dimensionality reduction, optimal decision-making, etc. However, the literature lacks a comprehensive review that systematically covers a consistent scope intersecting DL and earthquake engineering. To bridge the gap, the article first discusses methodological advances to elucidate various applicable DL techniques, such as multi-layer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), generative adversarial network (GAN), autoencoder (AE), transfer learning (TL), reinforcement learning (RL), and graph neural network (GNN). A thorough research landscape is then disclosed by exploring various DL applications across different research topics, including vision-based seismic damage assessment and structural characterization, seismic demand and damage state prediction, seismic response history prediction, regional seismic risk assessment and community resilience, ground motion (GM) for engineering use, seismic response control, and the inverse problem of system/damage identification. Suitable DL techniques for each research topic are identified, emphasizing the preeminence of CNN for vision-based tasks, RNN for sequential data, RL for community resilience, and unsupervised learning for GM analysis. The article also discusses opportunities and challenges for leveraging DL in earthquake engineering research and practice.
- [62] arXiv:2405.09024 [pdf, ps, html, other]
-
Title: Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to noisy annotations in the category labels of detection datasets. However, the effects and treatment of label noise are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method based on a dynamic loss decay (DLD) mechanism, inspired by the two-phase "early-learning" and "memorization" learning dynamics of deep neural networks on clean and noisy samples. Specifically, we first observe the end point of the early learning phase, termed EL, after which the models begin to memorize the false labels that significantly degrade detection accuracy. Second, under the guidance of the training indicator, the losses of the samples are ranked in descending order, and we adaptively decay the losses of the top K largest ones (bad samples) in the following epochs, because these large losses are highly likely to have been calculated with wrong labels. Experimental results show that the method achieves excellent noise resistance on multiple public datasets such as HRSC2016 and DOTA-v1.0/v2.0 with synthetic category label noise. Our solution also won 2nd place in the "fine-grained object detection based on sub-meter remote sensing imagery" track with noisy labels of the 2023 National Big Data and Computing Intelligence Challenge.
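The ranking-and-decay step described above can be sketched as follows. This is only an illustration under simplifying assumptions: per-sample losses are plain floats, the decay is a single fixed multiplier rather than the adaptive schedule the abstract mentions, and the names `decay_top_k_losses`, `losses_for_epoch`, and `early_learning_end` are hypothetical.

```python
def decay_top_k_losses(losses, k, decay):
    """Scale down the k largest per-sample losses, treated as likely label noise."""
    # Rank sample indices by loss, largest first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    suspects = set(ranked[:k])
    return [loss * decay if i in suspects else loss for i, loss in enumerate(losses)]


def losses_for_epoch(losses, epoch, early_learning_end, k, decay):
    """Apply the decay only after the (assumed known) end of the early-learning phase."""
    if epoch <= early_learning_end:
        return list(losses)
    return decay_top_k_losses(losses, k, decay)
```

For per-sample losses `[0.2, 3.0, 0.5, 2.0]` with `k=2` and `decay=0.5`, the two largest losses are halved and the others are left unchanged, so suspected noisy samples contribute less to the gradient once memorization is expected to begin.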
- [63] arXiv:2405.09032 [pdf, ps, html, other]
-
Title: ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition
Comments: Accepted by ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Significant progress has been made in the field of handwritten mathematical expression recognition, but existing encoder-decoder methods usually struggle to model global information in LaTeX. Therefore, this paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information and enhance handwritten mathematical expression recognition. Specifically, we propose the Implicit Character Construction Module (ICCM) to predict implicit character sequences and use a Fusion Module to merge the outputs of the ICCM and the decoder, thereby producing corrected predictions. By modeling and utilizing implicit character information, ICAL achieves a more accurate and context-aware interpretation of handwritten mathematical expressions. Experimental results demonstrate that ICAL notably surpasses state-of-the-art (SOTA) models, improving the expression recognition rate (ExpRate) by 2.21%/1.75%/1.28% on the CROHME 2014/2016/2019 datasets respectively, and achieves a remarkable 69.25% on the challenging HME100k test set. We make our code available on GitHub: this https URL
- [64] arXiv:2405.09037 [pdf, ps, html, other]
-
Title: Unmasking Efficiency: Learning Salient Sparse Models in Non-IID Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are communicated each round between the clients and the server. We validate SSFL's effectiveness using standard non-IID benchmarks, noting marked improvements in the sparsity-accuracy trade-offs. Finally, we deploy our method in a real-world federated learning framework and report improvement in communication time.
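The mask construction described above can be sketched roughly as follows. Several details are assumptions, since the abstract does not specify them: saliency is taken to be |weight x gradient| per parameter, client scores are aggregated by a simple mean, and the mask keeps a fixed fraction of parameters. All function names are hypothetical.

```python
def saliency(weights, grads):
    """Per-parameter saliency on one client (assumed here to be |w * g|)."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def global_mask(client_saliency, keep_fraction):
    """Aggregate client saliency scores and keep the top fraction of parameters."""
    n_params = len(client_saliency[0])
    n_clients = len(client_saliency)
    # Mean aggregation across clients (an assumption; other rules are possible).
    aggregated = [
        sum(scores[i] for scores in client_saliency) / n_clients
        for i in range(n_params)
    ]
    keep = max(1, round(keep_fraction * n_params))
    top = sorted(range(n_params), key=lambda i: aggregated[i], reverse=True)[:keep]
    mask = [0] * n_params
    for i in top:
        mask[i] = 1
    return mask


def sparse_payload(weights, mask):
    """Only weights selected by the mask would be sent each round."""
    return [w for w, m in zip(weights, mask) if m]
```

Since the mask is fixed before training, each round only needs to exchange the selected weights, which is where the communication saving comes from.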
- [65] arXiv:2405.09039 [pdf, ps, html, other]
-
Title: SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction
Subjects: Machine Learning (cs.LG)
Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often focus on unnecessary details and may introduce additional noise when making clinical predictions. To tackle this problem, we propose SMART, a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction, which encodes missing information via elaborated attentions and learns to impute missing values through a novel self-supervised pre-training approach that reconstructs missing data representations in the latent space. By adopting missing-aware attentions and focusing on learning higher-order representations, SMART promotes better generalization and robustness to missing data. We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods.
- [66] arXiv:2405.09041 [pdf, ps, html, other]
-
Title: Learning from Partial Label Proportions for Whole Slide Image Segmentation
Authors: Shinnosuke Matsuo, Daiki Suehiro, Seiichi Uchida, Hiroaki Ito, Kazuhiro Terada, Akihiko Yoshizawa, Ryoma Bise
Comments: Accepted at MICCAI2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we address the segmentation of tumor subtypes in whole slide images (WSI) by utilizing incomplete label proportions. Specifically, we utilize "partial" label proportions, which give the proportions among tumor subtypes but do not give the proportion between tumor and non-tumor. Partial label proportions are recorded as the standard diagnostic information by pathologists, and we therefore want to use them to realize a segmentation model that can classify each WSI patch into one of the tumor subtypes or non-tumor. We call this problem "learning from partial label proportions (LPLP)" and formulate it as a weakly supervised learning problem. Then, we propose an efficient algorithm for this challenging problem by decomposing it into two weakly supervised learning subproblems: multiple instance learning (MIL) and learning from label proportions (LLP). These subproblems are optimized efficiently in an end-to-end manner. The effectiveness of our algorithm is demonstrated through experiments conducted on two WSI datasets.
- [67] arXiv:2405.09042 [pdf, ps, html, other]
-
Title: Exploring the Individuality and Collectivity of Intents behind Interactions for Graph Collaborative Filtering
Comments: 10 pages, 7 figures, accepted by SIGIR 2024
Subjects: Information Retrieval (cs.IR)
Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method is to abstract the intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challenges: firstly, these methods only capture the coarse-grained aspects of intent, ignoring the fact that user-item interactions will be affected by collective and individual factors (e.g., a user may choose a movie because of its high box office or because of his own unique preferences); secondly, modeling believable intent is severely hampered by implicit feedback, which is incredibly sparse and devoid of true semantics. To address these challenges, we propose a novel recommendation framework designated as Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual intent (which signifies private preferences) and collective intent (which denotes overall awareness). To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy, and we implement the recommendation process through bilateral intent-guided graph reconstruction re-sampling. Finally, we propose graph contrastive regularization for both interaction and intent spaces to uniformize users, items, intents, and interactions in a self-supervised and non-augmented paradigm. Experimental results on three real-world datasets demonstrate the effectiveness of BIGCF compared with existing solutions.
- [68] arXiv:2405.09044 [pdf, ps, html, other]
-
Title: Modeling and Design Optimization of Looped Water Distribution Networks using MS Excel: Developing the Open-Source X-WHAT Model
Authors: Marcus Nóbrega Gomes Jr., Igor Matheus Benites, Salma M. Elsherif, Ahmad F. Taha, Marcio H. Giacomoni
Subjects: Systems and Control (eess.SY)
Cost-effective water distribution network (WDN) design with acceptable pressure performance is crucial for the management of drinking water in cities. This paper presents a Microsoft Excel tool to model, simulate, and optimize WDNs with looped pipelines under steady-state incompressible flow simulations. Typically, the Hardy-Cross method is applied using spreadsheet calculations to estimate discharges. This method requires mass-conservative initial estimates and successive iterations to converge. In this paper, however, we develop an alternative method that uses the built-in solver capabilities of Excel, does not require initial mass-conservative estimation, and is free of flow corrections. The main objective of this paper is to develop an open-source, accessible tool for simulating hydraulic networks that is also adapted for teaching and learning purposes. The governing equations and the mathematical basis for the hydraulic modeling of the system are described, considering the topology of the network, mass and energy conservation, cost of tank material, foundation, and cost of pumping energy to fill the tank. The use of this tool is encouraged at the undergraduate and graduate engineering levels, as it offers the opportunity to address complex concepts in a comprehensive way using a spreadsheet that does not require coding expertise. Hence, users can debug all cells and understand all equations used in the hydraulic model, as well as modify them. To demonstrate the model capabilities, three practical examples are presented, with the first one solved step by step, and the results are compared with EPANET and with results reported in the literature. Using the optimization method presented in this paper, it was possible to achieve a cost reduction of 151,790 USD (9.8% of the total cost) in a network that supplies a population of 44,416.
- [69] arXiv:2405.09045 [pdf, ps, html, other]
-
Title: AMSNet: Netlist Dataset for AMS Circuits
Authors: Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehensive dataset delineating the schematic-netlist relationship. We therefore design an automatic technique for converting schematics into netlists, and create the AMSNet dataset, encompassing transistor-level schematics and corresponding SPICE format netlists. With a growing size, AMSNet can significantly facilitate exploration of MLLM applications in AMS circuit design. We have made an initial set of netlists public, and will make both our netlist generation tool and the full dataset available upon publishing of this paper.
- [70] arXiv:2405.09049 [pdf, ps, html, other]
-
Title: Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the "cold start problem," while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
- [71] arXiv:2405.09050 [pdf, ps, html, other]
-
Title: 3D Shape Augmentation with Content-Aware Shape Resizing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advancements in deep learning for 3D models have propelled breakthroughs in generation, detection, and scene understanding. However, the effectiveness of these algorithms hinges on large training datasets. We address the challenge by introducing Efficient 3D Seam Carving (E3SC), a novel 3D model augmentation method based on seam carving, which progressively deforms only part of the input model while ensuring the overall semantics are unchanged. Experiments show that our approach is capable of producing diverse and high-quality augmented 3D shapes across various types and styles of input models, achieving considerable improvements over previous methods. Quantitative evaluations demonstrate that our method effectively enhances the novelty and quality of shapes generated by other subsequent 3D generation algorithms.
- [72] arXiv:2405.09054 [pdf, ps, html, other]
-
Title: Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features and the feasibility based on temporal features of realizing effective detection. According to this analysis, we use a multi-frame as a detection unit and propose a detection method based on temporal energy selective scaling (TESS). Specifically, we investigated the composition of intensity temporal profiles (ITPs) formed by pixels on a multi-frame detection unit. For the target-present pixel, the target passing through the pixel will bring a weak transient disturbance on the ITP and introduce a change in the statistical properties of ITP. We use a well-designed function to amplify the transient disturbance, suppress the background and noise components, and output the trajectory of the target on the multi-frame detection unit. Subsequently, to solve the contradiction between the detection rate and the false alarm rate brought by the traditional threshold segmentation, we associate the temporal and spatial features of the output trajectory and propose a trajectory extraction method based on the 3D Hough transform. Finally, we model the trajectory of the target and propose a trajectory-based multi-target tracking method. Compared with the various state-of-the-art detection and tracking methods, experiments in multiple scenarios prove the superiority of our proposed methods.
- [73] arXiv:2405.09055 [pdf, ps, html, other]
-
Title: A safety realignment framework via subspace-oriented model fusion for large language models
Subjects: Computation and Language (cs.CL)
The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there is a risk of catastrophic forgetting during safety fine-tuning, where LLMs may regain safety measures but lose the task-specific knowledge acquired during downstream fine-tuning. In this paper, we introduce a safety realignment framework through subspace-oriented model fusion (SOMF), aiming to combine the safeguard capabilities of the initially aligned model and the current fine-tuned model into a realigned model. Our approach begins by disentangling all task vectors from the weights of each fine-tuned model. We then identify safety-related regions within these vectors by subspace masking techniques. Finally, we explore the fusion of the initial safely aligned LLM with all task vectors based on the identified safety subspace. We validate that our safety realignment framework satisfies the safety requirements of a single fine-tuned model as well as multiple models during their fusion. Our findings confirm that SOMF preserves safety without notably compromising performance on downstream tasks, including instruction following in Chinese, English, and Hindi, as well as problem-solving capabilities in Code and Math.
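The three steps named in the abstract (extract task vectors, mask a safety region, fuse) can be illustrated on toy weight vectors. This is only a sketch under stated assumptions: the abstract does not say how the safety mask is found or exactly how fusion is performed, so the hand-written `safety_mask` and the rule "keep the aligned weights wherever the mask is set, add the task vectors elsewhere" are hypothetical, as are the function names.

```python
from typing import List, Sequence


def task_vector(finetuned: Sequence[float], aligned: Sequence[float]) -> List[float]:
    """Task vector: element-wise difference between fine-tuned and aligned weights."""
    return [f - a for f, a in zip(finetuned, aligned)]


def fuse(aligned: Sequence[float],
         task_vectors: List[List[float]],
         safety_mask: Sequence[int]) -> List[float]:
    """Add every task vector to the aligned weights, except in masked positions.

    Positions where safety_mask is 1 keep the aligned model's values
    (an assumed fusion rule, chosen only for illustration).
    """
    fused = list(aligned)
    for tau in task_vectors:
        for i, delta in enumerate(tau):
            if not safety_mask[i]:
                fused[i] += delta
    return fused


aligned = [1.0, 2.0, 3.0]
tau_1 = task_vector([1.5, 2.0, 2.0], aligned)   # first downstream task
tau_2 = task_vector([1.0, 3.0, 3.5], aligned)   # second downstream task
safety_mask = [0, 0, 1]                          # hypothetical safety region
realigned = fuse(aligned, [tau_1, tau_2], safety_mask)
```

In this toy case the third weight stays at its aligned value while the first two absorb the downstream updates.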
- [74] arXiv:2405.09056 [pdf, ps, html, other]
-
Title: CTS: A Consistency-Based Medical Image Segmentation Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In medical image segmentation tasks, diffusion models have shown significant potential. However, mainstream diffusion models suffer from drawbacks such as multiple sampling times and slow prediction results. Recently, consistency models, as a standalone generative network, have resolved this issue. Compared to diffusion models, consistency models can reduce the sampling times to once, not only achieving similar generative effects but also significantly speeding up training and prediction. However, they are not suitable for image segmentation tasks, and their application in the medical imaging field has not yet been explored. Therefore, this paper applies the consistency model to medical image segmentation tasks, designing multi-scale feature signal supervision modes and loss function guidance to achieve model convergence. Experiments have verified that the CTS model can obtain better medical image segmentation results with a single sampling during the test phase.
- [75] arXiv:2405.09057 [pdf, ps, html, other]
-
Title: Response Matching for generating materials and molecules
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Computational Physics (physics.comp-ph)
Machine learning has recently emerged as a powerful tool for generating new molecular and material structures. The success of state-of-the-art models stems from their ability to incorporate physical symmetries, such as translation, rotation, and periodicity. Here, we present a novel generative method called Response Matching (RM), which leverages the fact that each stable material or molecule exists at the minimum of its potential energy surface. Consequently, any perturbation induces a response in energy and stress, driving the structure back to equilibrium. Matching to such response is closely related to score matching in diffusion models. By employing the combination of a machine learning interatomic potential and random structure search as the denoising model, RM exploits the locality of atomic interactions, and inherently respects permutation, translation, rotation, and periodic invariances. RM is the first model to handle both molecules and bulk materials under the same framework. We demonstrate the efficiency and generalization of RM across three systems: a small organic molecular dataset, stable crystals from the Materials Project, and one-shot learning on a single diamond configuration.
- [76] arXiv:2405.09059 [pdf, ps, html, other]
-
Title: Task-adaptive Q-Face
Comments: Ever submitted to ECCV2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging. Most face analysis tasks are studied as separate problems and do not benefit from the synergy among related tasks. In this work, we propose a novel task-adaptive multi-task face analysis method named Q-Face, which simultaneously performs multiple face analysis tasks with a unified model. We fuse the features from multiple layers of a large-scale pre-trained model so that the whole model can use both local and global facial information to support multiple tasks. Furthermore, we design a task-adaptive module that performs cross-attention between a set of query vectors and the fused multi-stage features and adaptively extracts the desired features for each face analysis task. Extensive experiments show that our method can perform multiple tasks simultaneously and achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation. Compared to conventional methods, our method opens up new possibilities for multi-task face analysis and shows the potential for both accuracy and efficiency.
- [77] arXiv:2405.09061 [pdf, ps, html, other]
-
Title: Improving Transformers using Faithful Positional Encoding
Comments: arXiv admin note: text overlap with arXiv:2305.17149
Subjects: Machine Learning (cs.LG)
We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.
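For context, the standard sinusoidal positional encoding that this abstract uses as its baseline can be sketched in a few lines. The code below shows only that baseline, using the well-known sine/cosine formulation with base 10000; it is not the encoding proposed in the paper, and the function name is an assumption.

```python
import math
from typing import List


def sinusoidal_encoding(num_positions: int, d_model: int) -> List[List[float]]:
    """Baseline sinusoidal positional encoding.

    Entry (pos, 2i) is sin(pos / 10000^(2i / d_model)) and entry
    (pos, 2i + 1) is the cosine of the same angle.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


pe = sinusoidal_encoding(num_positions=4, d_model=6)
```

Each row is added to the token embedding at that position, so the model sees position only through these fixed frequencies.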
- [78] arXiv:2405.09062 [pdf, ps, html, other]
-
Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving high-quality general music reconstruction using non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation proposing neural embedding-based metrics. We additionally perform song classification based on the generated tracks. Our work contributes to the ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
- [79] arXiv:2405.09074 [pdf, ps, html, other]
-
Title: See to Believe: Using Visualization To Motivate Updating Third-party Dependencies
Authors: Chaiyong Ragkhitwetsagul, Vipawan Jarukitpipat, Raula Gaikovina Kula, Morakot Choetkiertikul, Klinton Chhun, Wachirayana Wanprasert, Thanwadee Sunetnanta
Subjects: Software Engineering (cs.SE)
Security vulnerabilities introduced by applications using third-party dependencies are on the increase, caused by the emergence of large ecosystems of libraries such as the NPM packages for JavaScript. Nowadays, libraries depend on each other. Relying on these large ecosystems thus means that vulnerable dependencies are not only direct but also indirect (transitive) dependencies. Automated tools exist to manage these complex dependencies, but recent work still shows that developers are wary of library updates, even to fix vulnerabilities, citing a lack of awareness or a migration effort that outweighs the benefit of updating.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update, especially when convincing developers. To test this hypothesis, we performed a user study involving 20 participants divided equally into experimental and control groups, comparing the state-of-the-art tools with the tasks of reviewing vulnerabilities with complexities and vulnerabilities with indirect dependencies.
We find that 70% of the participants who saw the visualization did re-prioritize their updates in both tasks. This is higher than the 30% and 60% of the participants who used the npm audit tool in both tasks, respectively.
- [80] arXiv:2405.09075 [pdf, ps, html, other]
-
Title: Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks
Authors: Chaiyong Ragkhitwetsagul, Veerakit Prasertpol, Natanon Ritta, Paphon Sae-Wong, Thanapon Noraset, Morakot Choetkiertikul
Subjects: Software Engineering (cs.SE)
At present, code recommendation tools have gained greater importance to many software developers in various areas of expertise. Having code recommendation tools has enabled better productivity and performance in developing the code in software and made it easier for developers to find code examples and learn from them.
This paper proposes Typhon, an approach to automatically recommend relevant code cells in Jupyter notebooks. Typhon tokenizes developers' markdown description cells and looks for the most similar code cells from the database using text similarities such as the BM25 ranking function or CodeBERT, a machine-learning approach. Then, the algorithm computes the similarity distance between the tokenized query and markdown cells to return the most relevant code cells to the developers.
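The retrieval idea in this paragraph can be sketched with a small pure-Python BM25 ranker: the query is scored against each markdown description, and the code cell paired with the best match is returned. This is an illustration only, not Typhon's implementation; the whitespace tokenizer, the `k1` and `b` defaults, the function names, and the sample cells are assumptions.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def bm25_rank(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[int]:
    """Return document indices sorted by descending BM25 score for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical (markdown description, code cell) pairs from a notebook corpus.
cells = [
    ("load the csv dataset with pandas", "df = pd.read_csv('train.csv')"),
    ("plot a histogram of ages", "df['age'].hist()"),
    ("train a random forest classifier", "clf = RandomForestClassifier().fit(X, y)"),
]
ranking = bm25_rank("plot histogram", [markdown for markdown, _ in cells])
best_code = cells[ranking[0]][1]
```

Swapping the scoring function for an embedding-based similarity would leave the surrounding lookup unchanged.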
We evaluated the Typhon tool on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy. The approach and results in this paper can lead to further improvements in code cell recommendations in Jupyter notebooks.
- [81] arXiv:2405.09076 [pdf, ps, html, other]
-
Title: Enhancing Airline Customer Satisfaction: A Machine Learning and Causal Analysis Approach
Authors: Tejas Mirthipati (Georgia Institute Of Technology)
Comments: 7 pages, 19 figures
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
This study explores the enhancement of customer satisfaction in the airline industry, a critical factor for retaining customers and building brand reputation, which are vital for revenue growth. Utilizing a combination of machine learning and causal inference methods, we examine the specific impact of service improvements on customer satisfaction, with a focus on the online boarding pass experience. Through detailed data analysis involving several predictive and causal models, we demonstrate that improvements in the digital aspects of customer service significantly elevate overall customer satisfaction. This paper highlights how airlines can strategically leverage these insights to make data-driven decisions that enhance customer experiences and, consequently, their market competitiveness.
- [82] arXiv:2405.09081 [pdf, ps, other]
-
Title: Explainable AI for Ship Collision Avoidance: Decoding Decision-Making Processes and Behavioral Intentions
Comments: 24 pages and 15 figures. If the program is needed, please contact us
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
This study developed an explainable AI for ship collision avoidance. Initially, a critic network composed of sub-task critic networks was proposed to individually evaluate each sub-task in collision avoidance to clarify the AI decision-making processes involved. Additionally, an attempt was made to discern behavioral intentions through a Q-value analysis and an Attention mechanism. The former focused on interpreting intentions by examining the increment of the Q-value resulting from AI actions, while the latter incorporated the significance of other ships in the decision-making process for collision avoidance into the learning objective. AI's behavioral intentions in collision avoidance were visualized by combining the perceived collision danger with the degree of attention to other ships. The proposed method was evaluated through a numerical experiment. The developed AI was confirmed to be able to safely avoid collisions under various congestion levels, and AI's decision-making process was rendered comprehensible to humans. The proposed method not only facilitates the understanding of DRL-based controllers/systems in the ship collision avoidance task but also extends to any task comprising sub-tasks.
- [83] arXiv:2405.09083 [pdf, ps, html, other]
-
Title: RSHazeDiff: A Unified Fourier-aware Diffusion Model for Remote Sensing Image Dehazing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Haze severely degrades the visual quality of remote sensing images and hampers the performance of automotive navigation, intelligent monitoring, and urban management. The emerging denoising diffusion probabilistic model (DDPM) exhibits significant potential for dense haze removal with its strong generation ability. Since remote sensing images contain extensive small-scale texture structures, it is important to effectively restore image details from hazy images. However, current wisdom of DDPM fails to preserve image details and color fidelity well, limiting its dehazing capacity for remote sensing images. In this paper, we propose a novel unified Fourier-aware diffusion model for remote sensing image dehazing, termed RSHazeDiff. From a new perspective, RSHazeDiff explores the conditional DDPM to improve image quality in dense hazy scenarios, and it makes three key contributions. First, RSHazeDiff refines the training phase of the diffusion process by performing noise estimation and reconstruction constraints in a coarse-to-fine fashion. Thus, it remedies the unpleasing results caused by the simple noise estimation constraint in DDPM. Second, by taking the frequency information as important prior knowledge during iterative sampling steps, RSHazeDiff can preserve more texture details and color fidelity in dehazed images. Third, we design a global compensated learning module to utilize the Fourier transform to capture the global dependency features of input images, which can effectively mitigate the effects of boundary artifacts when processing fixed-size patches. Experiments on both synthetic and real-world benchmarks validate the favorable performance of RSHazeDiff over multiple state-of-the-art methods. Source code will be released at this https URL.
- [84] arXiv:2405.09084 [pdf, ps, html, other]
-
Title: Temporarily Restricting Solidity Smart Contract Interactions
Comments: submitted to DAPPS 2024. 11 pages, 5 Figures
Subjects: Cryptography and Security (cs.CR)
In this work we explore ways to restrict the ability to call Solidity smart contract functions for a specified duration. We describe methods to restrict functions from being called twice in the same transaction, block, or time period. This is related to the notion of non-reentrant functions, which are functions that cannot be called again while a previous execution is still in progress. These methods can be used to restrict interactions with entire sets of functions of smart contracts. We are motivated to revisit this topic for two reasons. First, we note sixteen real-world smart contract exploits in 2023, resulting in over $136M USD lost or stolen, that could have been prevented by restricting function calls. As part of this survey, we dissect a new class of exploit that involves so-called read-only reentrancy: exploits that re-enter read-only functions to make smart contract state inconsistent in order to enable their exploitation. Second, while some of these approaches are simple, they may not always behave the same across different blockchains that support Solidity.
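The two kinds of restriction mentioned here, a reentrancy lock and a once-per-block limit, can be modeled in a few lines. The sketch below is a plain Python toy, not Solidity and not the paper's code: `ToyContract`, `Restricted`, and the `callback` parameter (standing in for an external call made mid-execution) are hypothetical, and the model does not reproduce the state rollback that a real revert performs.

```python
from typing import Callable, Optional


class Restricted(Exception):
    """Raised when a call violates one of the restrictions."""


class ToyContract:
    def __init__(self) -> None:
        self.balance = 10
        self._entered = False       # reentrancy lock
        self._last_block = None     # block number of the last accepted call

    def withdraw(self, block_number: int,
                 callback: Optional[Callable[[], None]] = None) -> None:
        if self._entered:
            raise Restricted("reentrant call")
        if self._last_block == block_number:
            raise Restricted("already called in this block")
        self._entered = True
        try:
            self._last_block = block_number
            if callback is not None:
                callback()          # stands in for an external call
            self.balance -= 1
        finally:
            self._entered = False   # always release the lock


contract = ToyContract()
contract.withdraw(block_number=100)   # accepted
contract.withdraw(block_number=101)   # accepted: a new block
```

A second `withdraw` with `block_number=101`, or a `callback` that calls `withdraw` again, raises `Restricted`.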
- [85] arXiv:2405.09086 [pdf, ps, html, other]
-
Title: Chaos-based reinforcement learning with TD3
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drives exploration. This approach offers a model for considering how the biological brain can create variability in its behavior and learn in an exploratory manner. At the same time, it is a learning model that has the ability to automatically switch between exploration and exploitation modes and the potential to realize higher explorations that reflect what it has learned so far. However, the learning algorithms in CBRL have not been well-established in previous studies and have yet to incorporate recent advances in reinforcement learning. This study introduced Twin Delayed Deep Deterministic Policy Gradients (TD3), which is one of the state-of-the-art deep reinforcement learning algorithms that can treat deterministic and continuous action spaces, to CBRL. The validation results provide several insights. First, TD3 works as a learning algorithm for CBRL in a simple goal-reaching task. Second, CBRL agents with TD3 can autonomously suppress their exploratory behavior as learning progresses and resume exploration when the environment changes. Finally, examining the effect of the agent's chaoticity on learning shows that extremely strong chaos negatively impacts the flexible switching between exploration and exploitation.
- [86] arXiv:2405.09090 [pdf, ps, html, other]
-
Title: Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography
Subjects: Cryptography and Security (cs.CR)
Linguistic steganography provides a convenient way to hide messages, particularly with the emergence of AI generation technology. The potential abuse of this technology raises security concerns within societies, calling for powerful linguistic steganalysis to detect carriers containing steganographic messages. Existing methods are limited to finding distribution differences between steganographic texts and normal texts from the aspect of symbolic statistics. However, the distribution differences between the two kinds of texts are hard to build precisely, which heavily hurts the detection ability of the existing methods in realistic scenarios. To seek a feasible way to construct practical steganalysis in the real world, this paper proposes to employ the human-like text processing abilities of large language models (LLMs) to capture the difference from the aspect of human perception, in addition to the traditional statistical aspect. Specifically, we systematically investigate the performance of LLMs in this task by modeling it as a generative paradigm, instead of the traditional classification paradigm. Extensive experimental results reveal that generative LLMs exhibit significant advantages in linguistic steganalysis and demonstrate performance trends distinct from traditional approaches. Results also reveal that LLMs outperform existing baselines by a wide margin, and the domain-agnostic ability of LLMs makes it possible to train a generic steganalysis model (both code and trained models are openly available at this https URL).
- [87] arXiv:2405.09096 [pdf, ps, html, other]
-
Title: Optimizing Sensor Network Design for Multiple Coverage
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Optimization and Control (math.OC)
Sensor placement optimization methods have been studied extensively. They can be applied to a wide range of applications, including surveillance of known environments, optimal locations for 5G towers, and placement of missile defense systems. However, few works explore the robustness and efficiency of the resulting sensor network concerning sensor failure or adversarial attacks. This paper addresses this issue by optimizing for the least number of sensors to achieve multiple coverage of non-simply connected domains by a prescribed number of sensors. We introduce a new objective function for the greedy (next-best-view) algorithm to design efficient and robust sensor networks and derive theoretical bounds on the network's optimality. We further introduce a Deep Learning model to accelerate the algorithm for near real-time computations. The Deep Learning model requires the generation of training examples. Correspondingly, we show that understanding the geometric properties of the training data set provides important insights into the performance and training process of deep learning techniques. Finally, we demonstrate that a simple parallel version of the greedy approach using a simpler objective can be highly competitive.
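To make "multiple coverage by a prescribed number of sensors" concrete, here is the classical greedy heuristic for set multi-cover on a discrete toy problem. It is a sketch under assumptions, not the paper's algorithm: the abstract introduces its own objective function for the greedy step, whereas this version simply counts how many targets still need coverage, and the names `greedy_multicover`, `candidates`, and `k` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_multicover(candidates: Dict[str, Set[Hashable]],
                      targets: Iterable[Hashable],
                      k: int) -> List[str]:
    """Pick sensors greedily until every target is seen by at least k sensors.

    candidates maps a sensor location to the set of targets it covers.
    Each step chooses the location that helps the most targets whose
    coverage requirement is still unmet.
    """
    need = {t: k for t in targets}
    remaining = dict(candidates)
    chosen = []
    while any(need.values()):
        best, best_gain = None, 0
        for name, covered in remaining.items():
            gain = sum(1 for t in covered if need.get(t, 0) > 0)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            raise ValueError("targets cannot be k-covered by the remaining candidates")
        for t in remaining.pop(best):
            if need.get(t, 0) > 0:
                need[t] -= 1
        chosen.append(best)
    return chosen


candidates = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3}, "d": {2, 3}}
placement = greedy_multicover(candidates, targets=[1, 2, 3], k=2)
```

With `k=2`, every target remains observed even if any single chosen sensor fails, which is the robustness motivation stated in the abstract.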
- [88] arXiv:2405.09101 [pdf, ps, html, other]
-
Title: Adaptive Koopman Embedding for Robust Control of Complex Dynamical Systems
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to changes in the nominal system dynamics induced by intrinsic or environmental factors. To overcome these limitations, this study presents an adaptive Koopman architecture capable of responding to the changes in system dynamics online. The proposed framework initially employs an autoencoder-based neural network that utilizes input-output information from the nominal system to learn the corresponding Koopman embedding offline. Subsequently, we augment this nominal Koopman architecture with a feed-forward neural network that learns to modify the nominal dynamics in response to any deviation between the predicted and observed lifted states, leading to improved generalization and robustness to a wide range of uncertainties and disturbances compared to contemporary methods. Extensive tracking control simulations, which are undertaken by integrating the proposed scheme within a Model Predictive Control framework, are used to highlight its robustness against measurement noise, disturbances, and parametric variations in system dynamics.
- [89] arXiv:2405.09109 [pdf, ps, html, other]
-
Title: Motion Prediction with Gaussian Processes for Safe Human-Robot Interaction in Virtual Environments
Comments: 17 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Humans use collaborative robots as tools for accomplishing various tasks. The interaction between humans and robots happens in tight shared workspaces. However, these machines must be safe to operate alongside humans to minimize the risk of accidental collisions. Ensuring safety imposes many constraints, such as reduced torque and velocity limits during operation, thus increasing the time to accomplish many tasks. However, for applications such as using collaborative robots as haptic interfaces with intermittent contacts for virtual reality applications, speed limitations result in poor user experiences. This research aims to improve the efficiency of a collaborative robot while improving the safety of the human user. We used Gaussian process models to predict human hand motion and developed strategies for human intention detection based on hand motion and gaze to improve the time for the robot and human security in a virtual environment. We then studied the effect of prediction. Results from comparisons show that the prediction models improved the robot time by 3% and safety by 17%. When used alongside gaze, prediction with Gaussian process models resulted in an improvement of the robot time by 2% and the safety by 13%.
- [90] arXiv:2405.09111 [pdf, ps, html, other]
-
Title: CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving
Comments: Dechen Gao, Shuangyu Cai, Hanchu Zhou, Hang Wang contributed equally
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms. It comprises three key components: 1) World model backbone: CarDreamer has integrated some state-of-the-art WMs, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: This suite streamlines the creation of driving tasks, enabling easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on this https URL.
- [91] arXiv:2405.09112 [pdf, ps, html, other]
-
Title: Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning
Comments: 24 pages, 10 figures, ACM ESEC/FSE 2024
Journal-ref: Proc. ACM Softw. Eng. 1, FSE, Article 75 (July 2024), 24 pages
Subjects: Software Engineering (cs.SE)
Reverse engineers would acquire valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches encounter difficulties in capturing function semantics in diverse optimized binaries and fail to preserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored for binaries compiled at different optimization levels. Epitome learns comprehensive function semantics using a pre-trained assembly language model and a graph neural network, incorporating a function semantics similarity prediction task to maximize the similarity of function semantics across different compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate the performance of Epitome using 2,597,346 functions extracted from binaries compiled with 5 optimizations (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 44.34%, 64.16%, and 54.44% in precision, recall, and F1 score, while also exhibiting superior generalizability.
- [92] arXiv:2405.09113 [pdf, ps, html, other]
-
Title: Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Authors: Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson
Subjects: Machine Learning (cs.LG)
Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors. Consequently, our method effectively bridges the gap between discrete and continuous space optimization. Experimental results demonstrate that our method is more effective and efficient than existing token-level methods. On Harmbench, our method achieves state of the art attack success rate on seven out of eight LLMs. Code will be made available. Trigger Warning: This paper contains model behavior that can be offensive in nature.
- [93] arXiv:2405.09114 [pdf, ps, html, other]
-
Title: SOEDiff: Efficient Distillation for Small Object Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we delve into a new task known as small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area. Despite the remarkable success achieved by current image inpainting approaches, their application to the SOE task generally results in failure cases such as Object Missing, Text-Image Mismatch, and Distortion. These failures stem from the limited use of small-sized objects in training datasets and the downsampling operations employed by U-Net models, which hinder accurate generation. To overcome these challenges, we introduce a novel training-based approach, SOEDiff, aimed at enhancing the capability of baseline models like StableDiffusion in editing small-sized objects while minimizing training costs. Specifically, our method involves two key components: SO-LoRA, which efficiently fine-tunes low-rank matrices, and Cross-Scale Score Distillation loss, which leverages high-resolution predictions from the pre-trained teacher diffusion model. Our method presents significant improvements on the test dataset collected from MSCOCO and OpenImage, validating the effectiveness of our proposed method in small object editing. In particular, when comparing SOEDiff with the SD-I model on the OpenImage-f dataset, we observe a 0.99 improvement in CLIP-Score and a reduction of 2.87 in FID. Our project page can be found at this https URL.
- [94] arXiv:2405.09118 [pdf, ps, html, other]
-
Title: BonnBot-I Plus: A Bio-diversity Aware Precise Weed Management Robotic Platform
Journal-ref: IEEE Robotics and Automation Letters 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
In this article, we focus on the critical tasks of plant protection in arable farms, addressing a modern challenge in agriculture: integrating ecological considerations into the operational strategy of precision weeding robots like BonnBot-I. This article presents the recent advancements in weed management algorithms and the real-world performance of BonnBot-I at the University of Bonn's Klein-Altendorf campus. We present a novel Rolling-view observation model for BonnBot-I's weed monitoring section, which leads to an average absolute weeding performance enhancement of 3.4%. Furthermore, for the first time, we show how precision weeding robots could consider bio-diversity-aware concerns in challenging weeding scenarios. We carried out comprehensive weeding experiments in sugar-beet fields, covering both weed-only and mixed crop-weed situations, and introduced a new dataset compatible with precision weeding. Our real-field experiments revealed that our weeding approach is capable of handling diverse weed distributions, with a minimal loss of only 11.66% attributable to intervention planning and 14.7% to vision system limitations, highlighting required improvements to the vision system.
- [95] arXiv:2405.09125 [pdf, ps, html, other]
-
Title: HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition
Comments: 12 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Internal Language Model (LM)-based methods use permutation language modeling (PLM) to solve the error correction caused by conditional independence in external LM-based methods. However, random permutations of human interference cause fit oscillations in the model training, and Iterative Refinement (IR) operation to improve multimodal information decoupling also introduces additional overhead. To address these issues, this paper proposes the Hierarchical Attention autoregressive Model with Adaptive Permutation (HAAP) to enhance the location-context-image interaction capability, improving autoregressive generalization with internal LM. First, we propose Implicit Permutation Neurons (IPN) to generate adaptive attention masks to dynamically exploit token dependencies. The adaptive masks increase the diversity of training data and prevent model dependency on a specific order. It reduces the training overhead of PLM while avoiding training fit oscillations. Second, we develop Cross-modal Hierarchical Attention mechanism (CHA) to couple context and image features. This processing establishes rich positional semantic dependencies between context and image while avoiding IR. Extensive experimental results show the proposed HAAP achieves state-of-the-art (SOTA) performance in terms of accuracy, complexity, and latency on several datasets.
- [96] arXiv:2405.09130 [pdf, ps, html, other]
-
Title: Contextual Integrity Games
Subjects: Computers and Society (cs.CY)
The contextual integrity model is a widely accepted way of analyzing the plurality of norms that are colloquially called "privacy norms". Contextual integrity systematically describes such norms by distinguishing the type of data concerned, the three social agents involved (subject, sender, and recipient) and the transmission principle governing the transfer of information. It allows analyzing privacy norms in terms of their impact on the interaction of those agents with one another.
This paper places contextual integrity in a strict game theoretic framework. When such description is possible it has three key advantages: Firstly, it allows indisputable utilitarian justification of some privacy norms. Secondly, it better relates privacy to topics which are well understood by stakeholders whose education is predominantly quantitative, such as engineers and economists. Thirdly, it is an absolute necessity when describing ethical constraints to machines such as AI agents.
In addition to describing games which capture paradigmatic informational norms, the paper also analyzes cases in which the game, per se, does not encourage normative behavior. The paper discusses two main forms of mechanisms which can be applied to the game in such cases, and shows that they reflect accepted privacy regulation and technologies.
- [97] arXiv:2405.09131 [pdf, ps, html, other]
-
Title: RobustMVS: Single Domain Generalized Deep Multi-view Stereo
Comments: Accepted to TCSVT. Code will be released at: this https URL. Benchmark will be released at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the impressive performance of Multi-view Stereo (MVS) approaches given plenty of training samples, the performance degradation when generalizing to unseen domains has not been clearly explored yet. In this work, we focus on the domain generalization problem in MVS. To evaluate the generalization results, we build a novel MVS domain generalization benchmark including synthetic and real-world datasets. In contrast to conventional domain generalization benchmarks, we consider a more realistic but challenging scenario, where only one source domain is available for training. The MVS problem can be analogized back to the feature matching task, and maintaining robust feature consistency among views is an important factor for improving generalization performance. To address the domain generalization problem in MVS, we propose a novel MVS framework, namely RobustMVS. A DepthClustering-guided Whitening (DCW) loss is further introduced to preserve the feature consistency among different views, which decorrelates multi-view features from viewpoint-specific style information based on geometric priors from depth maps. The experimental results further show that our method achieves superior performance on the domain generalization benchmark.
- [98] arXiv:2405.09132 [pdf, ps, html, other]
-
Title: EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary Lifting
Subjects: Software Engineering (cs.SE)
Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in static binary translation, which eventually results in program crashes. Notably, existing tools struggle to recover the prototypes of mangled EXFs originating from binaries compiled from C++. Moreover, they require time-consuming manual processing to support new libraries.
This paper presents EFACT, an External Function Auto-Completion Tool for static binary lifting. Our EXF recovery algorithm better recovers the prototypes of mangled EXFs, particularly addressing the template specialization mechanism in C++. EFACT is designed as a lightweight plugin to strengthen other static binary rewriting frameworks in EXFC. Our evaluation shows that EFACT outperforms RetDec and McSema in mangled EXF recovery by 96.4% and 97.3% on SPEC CPU 2017.
Furthermore, we delve deeper into static binary translation and address several cross-ISA EXFC problems. When integrated with McSema, EFACT correctly translates 36.7% more benchmarks from x86-64 to x86-64 and 93.6% more from x86-64 to AArch64 than McSema alone on EEMBC.
- [99] arXiv:2405.09133 [pdf, ps, html, other]
-
Title: Overcoming Domain Drift in Online Continual Learning
Subjects: Machine Learning (cs.LG)
Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential learning tasks may entail the gradual displacement of the decision boundaries in the learned feature space, rendering the learned knowledge susceptible to forgetting. To address the above problem, in this paper, we propose a novel rehearsal strategy, termed Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce the negative transfer effects. First, we propose to select more representative samples for memory, guided by centroids constructed in a data stream. Then, to keep the model from domain chaos in drifting, a two-level angular cross-task Contrastive Margin Loss (CML) is proposed to encourage intra-class and intra-task compactness and to increase inter-class and inter-task discrepancy. Finally, to further suppress the continual domain drift, we present an optional Centroid Distillation Loss (CDL) on the rehearsal memory to anchor the knowledge in feature space for each previous old task. Extensive experimental results on four benchmark datasets validate that the proposed DRR can effectively mitigate the continual domain drift and achieve state-of-the-art (SOTA) performance in OCL.
- [100] arXiv:2405.09138 [pdf, ps, html, other]
-
Title: OpenGait: A Comprehensive Benchmark Study for Gait Recognition towards Better Practicality
Authors: Chao Fan, Saihui Hou, Junhao Liang, Chuanfu Shen, Jingzhe Ma, Dongyang Jin, Yongzhen Huang, Shiqi Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore, the primary goal of this work is to present a comprehensive benchmark study aimed at improving practicality rather than solely focusing on enhancing performance. To this end, we first develop OpenGait, a flexible and efficient gait recognition platform. Using OpenGait as a foundation, we conduct in-depth ablation experiments to revisit recent developments in gait recognition. Surprisingly, we detect some imperfect parts of certain prior methods thereby resulting in several critical yet undiscovered insights. Inspired by these findings, we develop three structurally simple yet empirically powerful and practically robust baseline models, i.e., DeepGaitV2, SkeletonGait, and SkeletonGait++, respectively representing the appearance-based, model-based, and multi-modal methodology for gait pattern description. Beyond achieving SoTA performances, more importantly, our careful exploration sheds new light on the modeling experience of deep gait models, the representational capacity of typical gait modalities, and so on. We hope this work can inspire further research and application of gait recognition towards better practicality. The code is available at this https URL.
- [101] arXiv:2405.09141 [pdf, ps, html, other]
-
Title: Tree-Packing Revisited: Faster Fully Dynamic Min-Cut and Arboricity
Subjects: Data Structures and Algorithms (cs.DS)
A tree-packing is a collection of spanning trees of a graph. It has been a useful tool for computing the minimum cut in static, dynamic, and distributed settings. In particular, [Thorup, Comb. 2007] used them to obtain his dynamic min-cut algorithm with $\tilde O(\lambda^{14.5}\sqrt{n})$ worst-case update time. We reexamine this relationship, showing that we need to maintain fewer spanning trees for such a result; we show that we only need to pack $\Theta(\lambda^3 \log m)$ greedy trees to guarantee a 1-respecting cut or a trivial cut in some contracted graph.
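To make the notion of "greedy trees" concrete, here is a minimal static sketch, assuming the usual definition in which each new tree is a minimum spanning tree with respect to edge loads (the number of earlier trees that already use an edge). The function name `greedy_tree_packing` and the edge-list representation are hypothetical, and the sketch recomputes every tree from scratch with Kruskal's algorithm; it does not attempt the dynamic maintenance that the paper is about.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def greedy_tree_packing(n: int, edges: List[Edge], num_trees: int) -> List[List[int]]:
    """Return num_trees spanning trees, each as a list of indices into edges.

    Vertices are 0..n-1; parallel edges are allowed (multigraph).
    """
    load = [0] * len(edges)  # load[i] = number of packed trees using edge i
    packing: List[List[int]] = []
    for _ in range(num_trees):
        parent = list(range(n))

        def find(x: int) -> int:
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree: List[int] = []
        # Kruskal: scan edges by increasing load, ties broken by index.
        for i in sorted(range(len(edges)), key=lambda j: (load[j], j)):
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru != rv:
                parent[ru] = rv
                tree.append(i)
        if len(tree) != n - 1:
            raise ValueError("graph is not connected")
        for i in tree:
            load[i] += 1
        packing.append(tree)
    return packing
```

On a 4-cycle, for example, packing four trees this way spreads the load evenly: every edge ends up in exactly three of the four trees.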
Based on this structural result, we then provide a deterministic algorithm for fully dynamic exact min-cut, that has $\tilde O(\lambda^{5.5}\sqrt{n})$ worst-case update time, for min-cut value bounded by $\lambda$. In particular, this also leads to an algorithm for general fully dynamic exact min-cut with $\tilde O(m^{1-1/12})$ amortized update time, improving upon $\tilde O(m^{1-1/31})$ [Goranci et al., SODA 2023].
We also give the first fully dynamic algorithm that maintains a $(1+\varepsilon)$-approximation of the fractional arboricity -- which is strictly harder than the integral arboricity. Our algorithm is deterministic and has $O(\alpha \log^6m/\varepsilon^4)$ amortized update time, for arboricity at most $\alpha$. We extend these results to a Monte Carlo algorithm with $O(\text{poly}(\log m,\varepsilon^{-1}))$ amortized update time against an adaptive adversary. Our algorithms work on multi-graphs as well.
Both results are obtained by exploring the connection between the min-cut/arboricity and (greedy) tree-packing. We investigate tree-packing in a broader sense, including a lower bound for greedy tree-packing, which - to the best of our knowledge - is the first progress on this topic since [Thorup, Comb. 2007].
- [102] arXiv:2405.09144 [pdf, ps, html, other]
-
Title: Evaluation scheme for children-centered language interaction competence of AI-driven robots
Comments: 7 pages, CHI 2024 The Second Workshop on Child-Centred AI
Subjects: Human-Computer Interaction (cs.HC)
This article explores the evaluation method for the language communication proficiency of AI-driven robots engaging in interactive communication with children. The utilization of AI-driven robots in children's everyday communication is swiftly advancing, underscoring the importance of evaluating these robots' language communication skills. Based on 11 Chinese families' interviews and thematic analysis of the comment text from shopping websites, a framework is introduced in the article to assess five key dimensions of child-robot language communication: interactivity, specificity, development, sociality, and safety. We draw on the concept of "children's agency", viewing children as active participants in shaping society and cultural life alongside adults. Therefore, this article places particular emphasis on collecting data related to children. Whether through survey interviews or direct interactive experiments, we treat children as an independent object for data collection. The study involved empirical research following the mentioned framework, which involved capturing interaction videos in natural conversation settings among children from 6 families. Analysis was performed on quantitative data obtained from video recordings, alongside questionnaires and interviews carried out by parents acting as participants or observers. We found that the presence or absence of parents during children's interactions with robots can impact the evaluation of robots' language communication abilities. Ultimately, this article proposes an enhanced comprehensive evaluation framework incorporating insights from parents and children, supported by empirical evidence and inter-rater consistency assessments, showcasing the scheme's efficacy.
- [103] arXiv:2405.09148 [pdf, ps, html, other]
-
Title: A Hierarchically Feature Reconstructed Autoencoder for Unsupervised Anomaly Detection
Comments: 12 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Anomaly detection and localization without any manual annotations or prior knowledge is a challenging task in the unsupervised learning setting. Existing works achieve excellent performance in anomaly detection, but with complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a well pre-trained encoder to extract hierarchical feature representations and a decoder to reconstruct these intermediate features from the encoder. In particular, it does not require any data augmentations or anomalous images for training. Anomalies can be detected when the decoder fails to reconstruct features well, and the errors of hierarchical feature reconstruction are then aggregated into an anomaly map to achieve anomaly localization. Comparing the differences between the encoder and decoder features leads to more accurate and robust localization results than the single-feature or pixel-by-pixel comparisons used in conventional works. Experimental results show that the proposed method outperforms the state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets on both anomaly detection and localization.
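The aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions: the squared-error measure, the nearest-neighbour upsampling, the plain summation across levels, and the name `anomaly_map` are choices made here for brevity, since the abstract does not specify how the hierarchical errors are computed or combined.

```python
from typing import List, Sequence, Tuple

# One feature level, indexed as [channel][row][col].
Feature = Sequence[Sequence[Sequence[float]]]


def anomaly_map(
    enc_feats: Sequence[Feature],
    dec_feats: Sequence[Feature],
    out_size: Tuple[int, int],
) -> List[List[float]]:
    """Sum per-level reconstruction errors into one (out_h, out_w) map."""
    out_h, out_w = out_size
    total = [[0.0] * out_w for _ in range(out_h)]
    for f_enc, f_dec in zip(enc_feats, dec_feats):
        h, w = len(f_enc[0]), len(f_enc[0][0])
        # Squared L2 error between encoder and decoder features per location.
        err = [
            [sum((ce[r][c] - cd[r][c]) ** 2 for ce, cd in zip(f_enc, f_dec))
             for c in range(w)]
            for r in range(h)
        ]
        # Nearest-neighbour upsampling to the output resolution, then accumulate.
        for r in range(out_h):
            for c in range(out_w):
                total[r][c] += err[r * h // out_h][c * w // out_w]
    return total
```

Locations where the decoder reproduces the encoder features well contribute close to zero at every level, so large values in the returned map point at regions the decoder could not reconstruct.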
- [104] arXiv:2405.09150 [pdf, ps, html, other]
-
Title: Curriculum Dataset Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. The source code will be released to the community.
- [105] arXiv:2405.09152 [pdf, ps, html, other]
-
Title: Scalable Image Coding for Humans and Machines Using Feature Fusion Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.
- [106] arXiv:2405.09153 [pdf, ps, html, other]
-
Title: Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser
Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations.
- [107] arXiv:2405.09155 [pdf, ps, html, other]
-
Title: TunnelSense: Low-power, Non-Contact Sensing using Tunnel Diodes
Authors: Lim Chang Quan Thaddeus, C. Rajashekar Reddy, Yuvraj Singh Bhadauria, Dhairya Shah, Manoj Gulati, Ambuj Varshney
Comments: This work is accepted at IEEE RFID 2024
Subjects: Emerging Technologies (cs.ET)
Sensing the motion of physical objects in an environment enables numerous applications, from tracking occupancy in buildings and monitoring vital signs to diagnosing faults in machines. Typically, these application scenarios involve attaching a sensor, such as an accelerometer, to the object of interest, like a wearable device that tracks our steps. However, many of these scenarios require tracking motion in a noncontact manner where the sensor is not in touch with the object. A sensor in such a scenario observes variations in radio, light, acoustic, and infrared fields disturbed by the object's motion. Current noncontact sensing mechanisms often require substantial energy and involve complex processing on sophisticated hardware. We present TunnelSense, a novel mechanism that rethinks noncontact sensing using tunnel diode oscillators. They are highly sensitive to changes in their electromagnetic environments. The motion of an object near a tunnel diode oscillator induces corresponding changes in its resonant frequency and thus in the generated radio waves. Additionally, the low-power characteristics of the tunnel diode allow tags designed using them to operate on less than 100 microwatts of power and with a biasing voltage starting at 70 millivolts. This enables prolonged tag operation on a small battery or energy harvested from the environment. Among numerous applications enabled by the TunnelSense system, this work demonstrates its ability to detect breathing at distances of up to 30 centimeters between the subject and the TunnelSense tag.
- [108] arXiv:2405.09163 [pdf, ps, html, other]
-
Title: DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation
Subjects: Systems and Control (eess.SY)
Variable speed limit (VSL) control is an established yet challenging problem to improve freeway traffic mobility and alleviate bottlenecks by customizing speed limits at proper locations based on traffic conditions. Recent advances in deep reinforcement learning (DRL) have shown promising results in solving VSL control problems by interacting with sophisticated environments. However, the modeling of these methods ignores the inherent graph structure of the traffic state, which can be a key factor for more efficient VSL control. Graph structure can capture not only the static spatial features but also the dynamic temporal features of traffic. Therefore, we propose DVS-RG: a DRL-based differential variable speed limit controller with graph state representation. DVS-RG dynamically provides distinct speed limits per lane in different locations. The road network topology and traffic information (e.g., occupancy, speed) are integrated as the state space of DVS-RG so that the spatial features can be learned. A normalized reward that combines efficiency and safety is used to train the VSL controller to avoid excessive inefficiencies or low safety. The results obtained from the simulation study on SUMO show that DVS-RG achieves higher traffic efficiency (the average waiting time reduced to 68.44%) and improves the safety measures (the number of potential collisions reduced by 15.93%) compared to state-of-the-art DRL methods.
- [109] arXiv:2405.09165 [pdf, ps, html, other]
-
Title: An Empirical Study of Token-based Micro Commits
Subjects: Software Engineering (cs.SE)
In software development, developers frequently apply maintenance activities to the source code that change a few lines by a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), as it is likely that small changes are addressing deficiencies in other changes; thus, understanding the reasons for creating small changes can help understand the types of errors introduced. Eventually, these reasons and the types of errors can be used to enhance quality assurance approaches for improving code quality. While prior studies used code churns to characterize and investigate the small changes, such a definition has a critical limitation. Specifically, it loses the information of changed tokens in a line. For example, this definition fails to distinguish the following two one-line changes: (1) changing a string literal to fix a displayed message and (2) changing a function call and adding a new parameter. These are definitely maintenance activities, but we deduce that researchers and practitioners are interested in supporting the latter change. To address this limitation, in this paper, we define micro commits, a type of small change based on changed tokens. Our goal is to quantify small changes using changed tokens. Changed tokens allow us to identify small changes more precisely. In fact, this token-level definition can distinguish the above example. We investigate defined micro commits in four OSS projects and understand their characteristics as the first empirical study on token-based micro commits. We find that micro commits mainly replace a single name or literal token, and micro commits are more likely used to fix bugs. Additionally, we propose the use of token-based information to support software engineering approaches in which very small changes significantly affect their effectiveness.
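The distinction drawn in the abstract above, between counting changed lines and counting changed tokens, can be illustrated with a minimal sketch. The tokenizer regex and the `changed_tokens` helper are hypothetical and chosen only for illustration; the abstract does not specify the paper's tokenizer or the threshold that defines a micro commit.

```python
import difflib
import re

# Hypothetical, deliberately simple tokenizer: double-quoted string literals,
# identifiers, integers, and any other single non-space character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')


def tokenize(line):
    """Split one source line into a list of tokens."""
    return TOKEN_RE.findall(line)


def changed_tokens(old_line, new_line):
    """Return (removed, added) token lists for a one-line change."""
    old, new = tokenize(old_line), tokenize(new_line)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added


# (1) A string literal is fixed: one token replaced by one token.
print(changed_tokens('print("Helo world")', 'print("Hello world")'))
# (2) A call is renamed and gains a parameter: several tokens change.
print(changed_tokens('send(msg)', 'deliver(msg, retries)'))
```

Both edits count as one changed line under a line-based measure, while the token-level view separates the single-literal fix from the change to the call.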
- [110] arXiv:2405.09171 [pdf, ps, html, other]
-
Title: Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
Comments: This is accepted to IEEE ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various levels of granularity, encompassing phonemes, words, and utterances. During TTS training, the hierarchical ED is extracted from the ground-truth audio and guides the predictor to establish a connection between emotional and linguistic prosody. At run-time inference, the TTS model generates emotional speech and, at the same time, provides quantitative control of emotion over the speech constituents. Both objective and subjective evaluations validate the effectiveness of the proposed framework in terms of emotion prediction and control.
- [111] arXiv:2405.09173 [pdf, ps, html, other]
-
Title: The Economic Limits of Permissionless Consensus
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The purpose of a consensus protocol is to keep a distributed network of nodes "in sync," even in the presence of an unpredictable communication network and adversarial behavior by some of the participating nodes. In the permissionless setting, these nodes may be operated by unknown players, with each player free to use multiple identifiers and to start or stop running the protocol at any time. Establishing that a permissionless consensus protocol is "secure" thus requires both a distributed computing argument (that the protocol guarantees consistency and liveness unless the fraction of adversarial participation is sufficiently large) and an economic argument (that carrying out an attack would be prohibitively expensive for an attacker). There is a mature toolbox for assembling arguments of the former type; the goal of this paper is to lay the foundations for arguments of the latter type.
An ideal permissionless consensus protocol would, in addition to satisfying standard consistency and liveness guarantees, render consistency violations prohibitively expensive for the attacker without collateral damage to honest participants. We make this idea precise with our notion of the EAAC (expensive to attack in the absence of collapse) property, and prove the following results:
1. In the synchronous and dynamically available setting, with an adversary that controls at least one-half of the overall resources, no protocol can be EAAC.
2. In the partially synchronous and quasi-permissionless setting, with an adversary that controls at least one-third of the overall resources, no protocol can be EAAC.
3. In the synchronous and quasi-permissionless setting, there is a proof-of-stake protocol that, provided the adversary controls less than two-thirds of the overall stake, satisfies the EAAC property.
All three results are optimal with respect to the size of the adversary.
- [112] arXiv:2405.09176 [pdf, ps, html, other]
-
Title: Cross-Input Certified Training for Universal Perturbations
Comments: 21 pages, 5 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Existing work in trustworthy machine learning primarily focuses on single-input adversarial perturbations. In many real-world attack scenarios, input-agnostic adversarial attacks, e.g. universal adversarial perturbations (UAPs), are much more feasible. Current certified training methods train models robust to single-input perturbations but achieve suboptimal clean and UAP accuracy, thereby limiting their applicability in practical applications. We propose a novel method, CITRUS, for certified training of networks robust against UAP attackers. We show in an extensive evaluation across different datasets, architectures, and perturbation magnitudes that our method outperforms traditional certified training methods on standard accuracy (up to 10.3%) and achieves SOTA performance on the more practical certified UAP accuracy metric.
- [113] arXiv:2405.09177 [pdf, ps, other]
-
Title: Shacl4Bib: custom validation of library data
Subjects: Digital Libraries (cs.DL)
The Shapes Constraint Language (SHACL) is a formal language for validating RDF graphs against a set of conditions. Following this idea and implementing a subset of the language, the Metadata Quality Assessment Framework provides Shacl4Bib: a mechanism to define SHACL-like rules for data sources in non-RDF based formats, such as XML, CSV and JSON. QA catalogue extends this concept further to MARC21, UNIMARC and PICA data. The criteria can be defined either with YAML or JSON configuration files or with Java code. Libraries can validate their data against criteria expressed in a unified language, which improves the clarity and the reusability of custom validation processes.
- [114] arXiv:2405.09178 [pdf, ps, html, other]
-
Title: Testing and Debugging Quantum Programs: The Road to 2030
Comments: Submitted to FSE 2024 (SE2030. Software Engineering in 2030 Workshop)
Subjects: Software Engineering (cs.SE); Quantum Physics (quant-ph)
Quantum Computing has existed in the theoretical realm for several decades. Recently, given the latest developments in hardware, quantum computing has re-emerged as a promising technology with the potential to solve problems that a classical computer could take hundreds of years to solve. With the rising interest in the field, there are challenges and opportunities for academics and practitioners in terms of software engineering practices, particularly in testing and debugging quantum programs. This paper presents a roadmap for addressing these challenges, pointing out the existing gaps in the literature and suggesting research directions. We present the current state-of-the-art testing and debugging strategies, including classical techniques applied to quantum programs, the development and implementation of quantum-specific assertions, and the identification and classification of bug patterns unique to quantum computing. Additionally, we introduce a conceptual model to illustrate the main concepts regarding the testing and debugging of quantum programs as well as the relationship between them. Those concepts are then used to identify and discuss the main research challenges to cope with quantum programs through 2030, focusing on the interfaces between classical and quantum computing and on creating testing and debugging techniques that take advantage of the unique quantum computing characteristics.
- [115] arXiv:2405.09181 [pdf, ps, html, other]
-
Title: StateGuard: Detecting State Derailment Defects in Decentralized Exchange Smart Contract
Comments: 5 pages, 2 figures, prepared for Conference WWW 2024
Journal-ref: WWW '24, May 2024, Pages 810-813
Subjects: Software Engineering (cs.SE)
Decentralized Exchanges (DEXs), leveraging blockchain technology and smart contracts, have emerged in decentralized finance. However, the DEX project with multi-contract interaction is accompanied by complex state logic, which makes it challenging to solve state defects. In this paper, we conduct the first systematic study on state derailment defects of DEXs. These defects could lead to incorrect, incomplete, or unauthorized changes to the system state during contract execution, potentially causing security threats. We propose StateGuard, a deep learning-based framework to detect state derailment defects in DEX smart contracts. StateGuard constructs an Abstract Syntax Tree (AST) of the smart contract, extracting key features to generate a graph representation. Then, it leverages a Graph Convolutional Network (GCN) to discover defects. Evaluating StateGuard on 46 DEX projects with 5,671 smart contracts reveals its effectiveness, with a precision of 92.24%. To further verify its practicality, we used StateGuard to audit real-world smart contracts and successfully authenticated multiple novel CVEs.
- [116] arXiv:2405.09183 [pdf, ps, other]
-
Title: A Formal Approach for Tuning Stochastic Oscillators
Journal-ref: Computational Methods in Systems Biology, Sep 2023, Luxembourg, Luxembourg. pp.1-17
Subjects: Formal Languages and Automata Theory (cs.FL)
Periodic recurrence is a prominent behavioural feature of many biological phenomena, including cell cycle and circadian rhythms. Although deterministic models are commonly used to represent the dynamics of periodic phenomena, it is known that they are poorly suited to systems in which stochastic noise induced by small population numbers is actually responsible for periodicity. In the stochastic modelling setting, automata-based model checking approaches have proven an effective means for the analysis of oscillatory dynamics, the main idea being that of coupling a period detector automaton with a continuous-time Markov chain model of an alleged oscillator. In this paper we address a complementary aspect, i.e. that of assessing the dependency of oscillation-related measures (period and amplitude) on the parameters of a stochastic oscillator. To this aim we introduce a framework which, by combining an Approximate Bayesian Computation scheme with a hybrid automaton capable of quantifying how distant an instance of a stochastic oscillator is from matching a desired (average) period, leads us to identify regions of the parameter space in which oscillations with a given period are highly likely. The method is demonstrated through a couple of case studies, including a model of the popular Repressilator circuit.
- [117] arXiv:2405.09185 [pdf, ps, html, other]
-
Title: Influence Maximization in Hypergraphs Using A Genetic Algorithm with New Initialization and Evaluation Methods
Subjects: Social and Information Networks (cs.SI); Neural and Evolutionary Computing (cs.NE)
Influence maximization (IM) is a crucial optimization task related to analyzing complex networks in the real world, such as social networks, disease propagation networks, and marketing networks. Publications to date about the IM problem focus mainly on graphs, which fail to capture high-order interaction relationships from the real world. Therefore, the use of hypergraphs for addressing the IM problem has been receiving increasing attention. However, identifying the most influential nodes in hypergraphs remains challenging, mainly because nodes and hyperedges are often strongly coupled and correlated. In this paper, to effectively identify the most influential nodes, we first propose a novel hypergraph-independent cascade model that integrates the influences of both node and hyperedge failures. Afterward, we introduce genetic algorithms (GA) to identify the most influential nodes that leverage hypergraph collective influences. In the GA-based method, the hypergraph collective influence is effectively used to initialize the population, thereby enhancing the quality of initial candidate solutions. The designed fitness function considers the joint influences of both nodes and hyperedges. This ensures that the optimal set of nodes, with the best influence on both nodes and hyperedges, is evaluated accurately. Moreover, a new mutation operator is designed by introducing factors, i.e., the collective influence and overlapping effects of nodes in hypergraphs, to breed high-quality offspring. In the experiments, several simulations on both synthetic and real hypergraphs have been conducted, and the results demonstrate that the proposed method outperforms the compared methods.
- [118] arXiv:2405.09186 [pdf, ps, html, other]
-
Title: HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants
Comments: Accepted to NAACL 2024 main conference
Subjects: Computation and Language (cs.CL)
Language models (LMs) as conversational assistants recently became popular tools that help people accomplish a variety of tasks. These typically result from adapting LMs pretrained on general domain text sequences through further instruction-tuning and possibly preference optimisation methods. The evaluation of such LMs would ideally be performed using human judgement; however, this is not scalable. On the other hand, automatic evaluation featuring auxiliary LMs as judges and/or knowledge-based tasks is scalable but struggles with assessing conversational ability and adherence to instructions. To help accelerate the development of LMs as conversational assistants, we propose a novel automatic evaluation task: HumanRankEval (HRE). It consists of a large-scale, diverse and high-quality set of questions, each with several answers authored and scored by humans. To perform evaluation, HRE ranks these answers based on their log-likelihood under the LM's distribution, and subsequently calculates their correlation with the corresponding human rankings. We support HRE's efficacy by investigating how efficiently it separates pretrained and instruction-tuned LMs of various sizes. We show that HRE correlates well with human judgements and is particularly responsive to model changes following instruction-tuning.
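The evaluation step described in the abstract above (rank each question's answers by LM log-likelihood, then correlate with the human ranking) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the correlation coefficient, so Spearman rank correlation is used here as a stand-in, and the data layout, the `hre_style_score` name, and the log-likelihood values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def hre_style_score(questions):
    """Average the per-question rank correlation (assumed aggregation)."""
    return mean(
        spearman(q["log_likelihoods"], q["human_scores"]) for q in questions
    )


# Hypothetical data: one question with three answers.
questions = [
    {"log_likelihoods": [-12.0, -8.5, -20.1], "human_scores": [3, 5, 1]},
]
print(hre_style_score(questions))
```

In this toy case the LM assigns the highest log-likelihood to the answer humans scored highest, and so on down the list, so the two rankings agree completely.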
- [119] arXiv:2405.09190 [pdf, ps, html, other]
-
Title: Advancing Explainable AI with Causal Analysis in Large-Scale Fuzzy Cognitive Maps
Comments: 6 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)
In the quest for accurate and interpretable AI models, eXplainable AI (XAI) has become crucial. Fuzzy Cognitive Maps (FCMs) stand out as an advanced XAI method because of their ability to synergistically combine and exploit both expert knowledge and data-driven insights, providing transparency and intrinsic interpretability. This letter introduces and investigates the "Total Causal Effect Calculation for FCMs" (TCEC-FCM) algorithm, an innovative approach that, for the first time, enables the efficient calculation of total causal effects among concepts in large-scale FCMs by leveraging binary search and graph traversal techniques, thereby overcoming the challenge of exhaustive causal path exploration that hinders existing methods. We evaluate the proposed method across various synthetic FCMs that demonstrate TCEC-FCM's superior performance over exhaustive methods, marking a significant advancement in causal effect analysis within FCMs, thus broadening their usability for modern complex XAI applications.
- [120] arXiv:2405.09191 [pdf, ps, html, other]
-
Title: QMedShield: A Novel Quantum Chaos-based Image Encryption Scheme for Secure Medical Image Storage in the Cloud
Comments: 20 pages, 17 Figures, 9 Tables
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
In the age of digital technology, medical images play a crucial role in the healthcare industry, aiding surgeons in making precise decisions and reducing the diagnosis time. However, the storage of large amounts of these images in third-party cloud services raises privacy and security concerns. There are many classical security mechanisms to protect them. However, the advent of quantum computing entails the development of quantum-based encryption models for healthcare. Hence, we introduce a novel quantum chaos-based encryption scheme for medical images in this article. The model comprises bit-plane scrambling, quantum logistic map, quantum operations in the diffusion phase and hybrid chaotic map, DNA encoding, and computations in the confusion phase to transform the plain medical image into a cipher medical image. The proposed scheme has been evaluated using multiple statistical measures and validated against further attacks, such as differential attacks, with three different medical datasets. Hence the introduced encryption model has proved to be more attack-resistant and robust than other existing image encryption schemes, ensuring the secure storage of medical images in cloud environments.
- [121] arXiv:2405.09193 [pdf, ps, html, other]
-
Title: Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems
Subjects: Systems and Control (eess.SY)
As multiple and heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex missions in the real world, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or human control, offers promising avenues for enhancing the efficiency and adaptability of intelligence of multiple-heterogeneous unmanned vehicle systems. Despite the growing interest in this domain, as far as the authors are aware, there exists a notable lack of comprehensive literature on defining an explicit concept and classifying levels of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. In this respect, this article aims to define the explicit concept of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. Furthermore, we provide a novel criterion to assess the technical maturity of the developed unmanned vehicle systems by classifying the autonomous cooperative levels of multiple-heterogeneous unmanned vehicle systems.
- [122] arXiv:2405.09194 [pdf, ps, other]
-
Title: Flexible image analysis for law enforcement agencies with deep neural networks to determine: where, who and what
Authors: Henri Bouma, Bart Joosten, Maarten C Kruithof, Maaike H T de Boer, Alexandru Ginsca (LIST (CEA)), Benjamin Labbe (LIST (CEA)), Quoc T Vuong (LIST (CEA))
Journal-ref: SPIE - Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies II, 2018, pp.27
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Due to the increasing need for effective security measures and the integration of cameras in commercial products, a huge amount of visual data is created today. Law enforcement agencies (LEAs) are inspecting images and videos to find radicalization, propaganda for terrorist organizations and illegal products on darknet markets. This is time consuming. Instead of an undirected search, LEAs would like to adapt to new crimes and threats, and focus only on data from specific locations, persons or objects, which requires flexible interpretation of image content. Visual concept detection with deep convolutional neural networks (CNNs) is a crucial component to understand the image content. This paper has five contributions. The first contribution allows image-based geo-localization to estimate the origin of an image. CNNs and geotagged images are used to create a model that determines the location of an image by its pixel values. The second contribution enables analysis of fine-grained concepts to distinguish sub-categories in a generic concept. The proposed method encompasses data acquisition and cleaning and concept hierarchies. The third contribution is the recognition of person attributes (e.g., glasses or moustache) to enable query by textual description for a person. The person-attribute problem is treated as a specific sub-task of concept classification. The fourth contribution is an intuitive image annotation tool based on active learning. Active learning allows users to define novel concepts flexibly and train CNNs with minimal annotation effort. The fifth contribution increases the flexibility for LEAs in the query definition by using query expansion. Query expansion maps user queries to known and detectable concepts. Therefore, no prior knowledge of the detectable concepts is required for the users. The methods are validated on data with varying locations (popular and non-touristic locations), varying person attributes (CelebA dataset), and varying number of annotations.
- [123] arXiv:2405.09197 [pdf, ps, other]
-
Title: Parallel and Proximal Linear-Quadratic Methods for Real-Time Constrained Model-Predictive Control
Authors: Wilson Jallet (LAAS-GEPETTO, WILLOW), Ewen Dantec (WILLOW), Etienne Arlaud (WILLOW), Justin Carpentier (WILLOW, DI-ENS), Nicolas Mansard (LAAS-GEPETTO)
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Recent strides in model predictive control (MPC) underscore a dependence on numerical advancements to efficiently and accurately solve large-scale problems. Given the substantial number of variables characterizing typical whole-body optimal control (OC) problems, often numbering in the thousands, exploiting the sparse structure of the numerical problem becomes crucial to meet computational demands, typically in the range of a few milliseconds. A fundamental building block for computing Newton or Sequential Quadratic Programming (SQP) steps in direct optimal control methods involves addressing the linear-quadratic regulator (LQR) problem. This paper concentrates on equality-constrained problems featuring implicit system dynamics and dual regularization, a characteristic found in advanced interior-point or augmented Lagrangian solvers. Here, we introduce a parallel algorithm designed for solving an LQR problem with dual regularization. Leveraging a rewriting of the LQR recursion through block elimination, we first enhanced the efficiency of the serial algorithm, then subsequently generalized it to handle parametric problems. This extension enables us to split decision variables and solve multiple subproblems concurrently. Our algorithm is implemented in our nonlinear numerical optimal control library ALIGATOR. It showcases improved performance over previous serial formulations and we validate its efficacy by deploying it in the model predictive control of a real quadruped robot. This paper follows up from our prior work on augmented Lagrangian methods for numerical optimal control with implicit dynamics and constraints.
- [124] arXiv:2405.09200 [pdf, ps, html, other]
-
Title: Performance Analysis of RIS-aided MISO Systems with EMI and Channel Aging
Subjects: Information Theory (cs.IT)
In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided multiple-input single-output (MISO) system in the presence of electromagnetic interference (EMI) and channel aging with a Rician fading channel model between the base station (BS) and user equipment (UE). Specifically, we derive the closed-form expression for downlink spectral efficiency (SE) with maximum ratio transmission (MRT) precoding. The Monte-Carlo simulation supports the theoretical results, demonstrating that amplifying the weight of the line-of-sight (LoS) component in Rician fading channels can boost SE, while EMI has a detrimental impact. Furthermore, continuously increasing the number of RIS elements is not an optimal choice when EMI exists. Nonetheless, RIS can be deployed to compensate for SE degradation caused by channel aging effects. Finally, enlarging the RIS element size can significantly improve system performance.
- [125] arXiv:2405.09204 [pdf, ps, html, other]
-
Title: Lens functions for exploring UMAP Projections with Domain Knowledge
Comments: 11 pages, 5 figures, submitted to IEEE Transactions on Visualization and Computer Graphics
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Human-Computer Interaction (cs.HC)
Dimensionality reduction algorithms are often used to visualise high-dimensional data. Previously, studies have used prior information to enhance or suppress expected patterns in projections. In this paper, we adapt such techniques for domain knowledge guided interactive exploration. Inspired by Mapper and STAD, we present three types of lens functions for UMAP, a state-of-the-art dimensionality reduction algorithm. Lens functions enable analysts to adapt projections to their questions, revealing otherwise hidden patterns. They filter the modelled connectivity to explore the interaction between manually selected features and the data's structure, creating configurable perspectives each potentially revealing new insights. The effectiveness of the lens functions is demonstrated in two use cases and their computational cost is analysed in a synthetic benchmark. Our implementation is available in an open-source Python package: this https URL.
- [126] arXiv:2405.09205 [pdf, ps, html, other]
-
Title: A first look into Utiq: Next-generation cookies at the ISP level
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Online privacy has become increasingly important in recent years. While third-party cookies have been widely used for years, they have also been criticized for their potential impact on user privacy. They can be used by advertisers to track users across multiple sites, allowing them to build detailed profiles of their behavior and interests. However, nowadays, many browsers allow users to block third-party cookies, which limits their usefulness for advertisers. In this paper, we take a first look at Utiq, a new way of user tracking performed directly by the ISP, to substitute the third-party cookies used until now. We study the main properties of this new identification methodology and its adoption on the 10K most popular websites. Our results show that, although still marginal due to the restrictions imposed by the system, between 0.7% and 1.2% of websites already include Utiq as one of their user identification methods.
- [127] arXiv:2405.09207 [pdf, ps, html, other]
-
Title: An Exact Theory of Causal Emergence for Linear Stochastic Iteration Systems
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
After coarse-graining a complex system, the dynamics of its macro-state may exhibit more pronounced causal effects than those of its micro-state. This phenomenon, known as causal emergence, is quantified by the indicator of effective information. However, two challenges confront this theory: the absence of well-developed frameworks in continuous stochastic dynamical systems and the reliance on coarse-graining methodologies. In this study, we introduce an exact theoretic framework for causal emergence within linear stochastic iteration systems featuring continuous state spaces and Gaussian noise. Building upon this foundation, we derive an analytical expression for effective information across general dynamics and identify optimal linear coarse-graining strategies that maximize the degree of causal emergence when the dimension averaged uncertainty eliminated by coarse-graining has an upper bound. Our investigation reveals that the maximal causal emergence and the optimal coarse-graining methods are primarily determined by the principal eigenvalues and eigenvectors of the dynamic system's parameter matrix, with the latter not being unique. To validate our propositions, we apply our analytical models to three simplified physical systems, comparing the outcomes with numerical simulations, and consistently achieve congruent results.
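For orientation, the setting named in the abstract above (a linear stochastic iteration system with a continuous state space and Gaussian noise, observed through a linear coarse-graining) can be written in a generic form. The symbols $A$, $\Sigma$, $W$, $n$ and $k$ are assumed notation introduced here for illustration and are not taken from the abstract; under this reading, the "parameter matrix" whose principal eigenvalues and eigenvectors are mentioned would play the role of $A$.

```latex
\begin{align}
  x_{t+1} &= A\,x_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \Sigma), & x_t &\in \mathbb{R}^{n}, \\
  y_t &= W\,x_t, & W &\in \mathbb{R}^{k \times n}, & k &< n .
\end{align}
```

Here $x_t$ is the micro-state and $y_t$ the macro-state obtained by the coarse-graining map $W$; the abstract's expression for effective information is not reproduced here.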
- [128] arXiv:2405.09208 [pdf, ps, html, other]
-
Title: Extended time Petri nets
Subjects: Formal Languages and Automata Theory (cs.FL)
In many complex systems that can be modeled using Petri nets, time can be a very important factor which should be taken into account during creation and analysis of the model. Time data can describe starting moments of some actions or their duration before their immediate effects start to influence some other areas of the modeled system. Places in a Petri net often describe static components of the system, but they can also describe states. Such a state can have time restrictions, for example, telling how long it can influence other elements in the model. Time values describing some system may be inconsistent or incomplete, which can cause problems during the creation of the model. In this paper, a new extension of time Petri nets is proposed, which allows the creation of models with different types of time data, which previously could be properly used only in separate types of well-known time Petri nets. The proposed new time Petri net solves this problem by integrating different aspects of already existing time Petri nets into one unified net.
- [129] arXiv:2405.09212 [pdf, ps, html, other]
-
Title: SOMTP: Self-Supervised Learning-Based Optimizer for MPC-Based Safe Trajectory Planning Problems in Robotics
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Model Predictive Control (MPC)-based trajectory planning has been widely used in robotics, and incorporating Control Barrier Function (CBF) constraints into MPC can greatly improve its obstacle avoidance efficiency. Unfortunately, traditional optimizers are resource-consuming and slow to solve such non-convex constrained optimization problems (COPs), while learning-based methods struggle to satisfy the non-convex constraints. In this paper, we propose the SOMTP algorithm, a self-supervised learning-based optimizer for CBF-MPC trajectory planning. Specifically, first, SOMTP employs problem transcription to satisfy most of the constraints. Then the differentiable SLPG correction is proposed to move the solution closer to the safe set and is then converted into the guide policy in the following training process. After that, inspired by the Augmented Lagrangian Method (ALM), our training algorithm integrated with guide policy constraints is proposed to enable the optimizer network to converge to a feasible solution. Finally, experiments show that the proposed algorithm has better feasibility than other learning-based methods and can provide solutions much faster than traditional optimizers with similar optimality.
- [130] arXiv:2405.09213 [pdf, ps, html, other]
-
Title: Potential of WebAssembly for Embedded Systems
Subjects: Operating Systems (cs.OS); Programming Languages (cs.PL)
Application virtual machines provide strong isolation properties and are established in the context of software portability. Those opportunities make them interesting for scalable and secure IoT deployments. WebAssembly is an application virtual machine with origins in web browsers, that is getting rapidly adopted in other domains. The strong and steadily growing ecosystem makes WebAssembly an interesting candidate for Embedded Systems. This position paper discusses the usage of WebAssembly in Embedded Systems. After introducing the basic concepts of WebAssembly and existing runtime environments, we give an overview of the challenges for the efficient usage of WebAssembly in Embedded Systems. The paper concludes with a real world case study that demonstrates the viability, before giving an outlook on open issues and upcoming work.
- [131] arXiv:2405.09215 [pdf, ps, html, other]
-
Title: Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, employing the LLaVA paradigm for modal alignment. The result, which we call Xmodel-VLM, is a lightweight yet powerful multimodal vision language model. Extensive testing across numerous classic multimodal benchmarks has revealed that despite its smaller size and faster execution, Xmodel-VLM delivers performance comparable to that of larger models. Our model checkpoints and code are publicly available on GitHub at this https URL.
- [132] arXiv:2405.09220 [pdf, ps, html, other]
-
Title: ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In this paper, we present the findings of our Project ALPINE which stands for "Autoregressive Learning for Planning In NEtworks." Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamic of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that the Transformer indeed learns the adjacency matrix and an incomplete reachability matrix, which aligns with the predictions made in our theoretical analysis. Additionally, when applying our methodology to a real-world planning benchmark, called Blocksworld, our observations remain consistent. Our theoretical and empirical analyses further unveil a potential limitation of Transformer in path-finding: it cannot identify reachability relationships through transitivity, and thus would fail when path concatenation is needed to generate a path. In summary, our findings shed new light on how the internal mechanisms of autoregressive learning enable planning in networks. This study may contribute to our understanding of the general planning capabilities in other related domains.
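The path-finding abstraction described above can be illustrated with a small sketch. This is not the paper's Transformer construction; it only shows how an adjacency matrix and a reachability matrix together determine a valid next node. The function names, the example graph, and the assumption that the graph is acyclic are illustrative choices, not taken from the abstract.

```python
def transitive_closure(adj):
    """Warshall's algorithm: reach[i][j] is True if j is reachable from i in one or more steps."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def find_path(adj, reach, source, target):
    """Greedy rule: step to a neighbour that is the target or can still reach it.

    Assumes an acyclic graph, so the walk always terminates.
    """
    path = [source]
    current = source
    while current != target:
        candidates = [k for k in range(len(adj))
                      if adj[current][k] and (k == target or reach[k][target])]
        if not candidates:
            return None  # target is not reachable from the current node
        current = candidates[0]
        path.append(current)
    return path


# Hypothetical 5-node directed acyclic graph.
edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
adj = [[False] * 5 for _ in range(5)]
for u, v in edges:
    adj[u][v] = True
reach = transitive_closure(adj)

path_0_to_4 = find_path(adj, reach, 0, 4)  # skips node 1, which cannot reach node 4
path_1_to_4 = find_path(adj, reach, 1, 4)  # no such path exists
```

Note that the full reachability matrix here is built by transitive closure, which is exactly the kind of relationship the abstract says the Transformer fails to learn.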
- [133] arXiv:2405.09221 [pdf, ps, other]
-
Title: Bridging the gap in online hate speech detection: a comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter
Comments: 6 pages, Homophobia detection model available at: this https URL. The dataset used for this study is available at: this https URL - This paper has been accepted by the 6th International Conference on Computing and Data Science (CONF-CDS 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.
- [134] arXiv:2405.09223 [pdf, ps, html, other]
-
Title: Word Alignment as Preference for Machine Translation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The problem of hallucination and omission, a long-standing problem in machine translation (MT), is more pronounced when a large language model (LLM) is used in MT because an LLM itself is susceptible to these phenomena. In this work, we mitigate the problem in an LLM-based MT model by guiding it to better word alignment. We first study the correlation between word alignment and the phenomena of hallucination and omission in MT. Then we propose to utilize word alignment as preference to optimize the LLM-based MT model. The preference data are constructed by selecting chosen and rejected translations from multiple MT tools. Subsequently, direct preference optimization is used to optimize the LLM-based model towards the preference signal. Given the absence of evaluators specifically designed for hallucination and omission in MT, we further propose selecting hard instances and utilizing GPT-4 to directly evaluate the performance of the models in mitigating these issues. We verify the rationality of these designed evaluation methods by experiments, followed by extensive results demonstrating the effectiveness of word alignment-based preference optimization to mitigate hallucination and omission.
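For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the authors' training code. The log-probability values and the `beta` setting are hypothetical, and in the setting described above the "chosen" translation would be the one preferred by the word-alignment signal.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
loss_neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy already favours the chosen translation: the loss drops below log(2).
loss_preferring_chosen = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```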
- [135] arXiv:2405.09224 [pdf, ps, html, other]
-
Title: Perception-Inspired Graph Convolution for Music Understanding Tasks
Comments: Accepted at the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
We propose a new graph convolutional block, called MusGConv, specifically designed for the efficient processing of musical score data and motivated by general perceptual principles. It focuses on two fundamental dimensions of music, pitch and rhythm, and considers both relative and absolute representations of these components. We evaluate our approach on four different musical understanding problems: monophonic voice separation, harmonic analysis, cadence detection, and composer identification which, in abstract terms, translate to different graph learning problems, namely, node classification, link prediction, and graph classification. Our experiments demonstrate that MusGConv improves the performance on three of the aforementioned tasks while being conceptually very simple and efficient. We interpret this as evidence that it is beneficial to include perception-informed processing of fundamental musical concepts when developing graph network applications on musical score data.
- [136] arXiv:2405.09230 [pdf, ps, html, other]
-
Title: Reduce to the MACs -- Privacy Friendly Generic Probe Requests
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Since the introduction of active discovery in Wi-Fi networks, users can be tracked via their probe requests. Although manufacturers typically try to conceal Media Access Control (MAC) addresses using MAC address randomisation, probe requests still contain Information Elements (IEs) that facilitate device identification. This paper introduces generic probe requests: By removing all unnecessary information from IEs, the requests become indistinguishable from one another, letting single devices disappear in the largest possible anonymity set. Conducting a comprehensive evaluation, we demonstrate that a large IE set contained within undirected probe requests does not necessarily imply fast connection establishment. Furthermore, we show that minimising IEs to nothing but Supported Rates would enable 82.55% of the devices to share the same anonymity set. Our contributions provide a significant advancement in the pursuit of robust privacy solutions for wireless networks, paving the way for more user anonymity and less surveillance in wireless communication ecosystems.
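The anonymity-set idea can be made concrete with a small sketch: devices whose probe requests carry identical Information Elements fall into the same set, and stripping IEs merges sets. The fingerprint tuples below are invented placeholders, not real IE contents or data from the paper, and the helper name is hypothetical.

```python
from collections import Counter


def largest_anonymity_set_share(fingerprints):
    """Fraction of devices that fall into the most common fingerprint."""
    counts = Counter(fingerprints)
    return max(counts.values()) / len(fingerprints)


# Hypothetical per-device IE fingerprints: (supported rates, capabilities, vendor IE).
full_ies = [
    ("rates_a", "ht", "vendor_x"),
    ("rates_a", "ht", "vendor_y"),
    ("rates_a", "vht", "vendor_x"),
    ("rates_b", "ht", "vendor_x"),
]

# Keep only the first element, mimicking a reduction to Supported Rates alone.
minimal_ies = [(ie[0],) for ie in full_ies]

share_full = largest_anonymity_set_share(full_ies)
share_minimal = largest_anonymity_set_share(minimal_ies)
```

With these placeholder values every full fingerprint is unique, while the reduced fingerprints place three of the four devices in one set.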
- [137] arXiv:2405.09232 [pdf, ps, html, other]
-
Title: Algebraic Tools for Computing Polynomial Loop Invariants
Comments: 10 pages, 1 figure
Subjects: Symbolic Computation (cs.SC); Programming Languages (cs.PL); Algebraic Geometry (math.AG)
Loop invariants are properties of a program loop that hold before and after each iteration of the loop. They are often employed to verify programs and ensure that algorithms consistently produce correct results during execution. Consequently, the generation of invariants becomes a crucial task for loops. We specifically focus on polynomial loops, where both the loop conditions and assignments within the loop are expressed as polynomials. Although computing polynomial invariants for general loops is undecidable, efficient algorithms have been developed for certain classes of loops. For instance, when all assignments within a while loop involve linear polynomials, the loop becomes solvable. In this work, we study the more general case where the polynomials exhibit arbitrary degrees.
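As a toy illustration of a polynomial loop invariant (a textbook-style example, not one taken from the paper), the loop below updates `x` and `y` with polynomial assignments, and the polynomial `y - x**2` stays zero before and after every iteration. The loop, its initial values, and the function names are assumptions chosen for illustration. Checking a finite run like this only tests a candidate invariant; it does not prove it.

```python
def run_loop(n_iterations):
    """Run the loop from (x, y) = (0, 0) and record every visited state."""
    x, y = 0, 0
    states = [(x, y)]
    for _ in range(n_iterations):
        # Simultaneous assignment: the right-hand side uses the old value of x.
        x, y = x + 1, y + 2 * x + 1
        states.append((x, y))
    return states


def candidate_invariant(x, y):
    """Polynomial that should evaluate to zero at every reachable state."""
    return y - x * x


states = run_loop(10)
invariant_holds = all(candidate_invariant(x, y) == 0 for x, y in states)
```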
Applying tools from algebraic geometry, we present two algorithms designed to generate all polynomial invariants for a while loop, up to a specified degree. These algorithms differ based on whether the initial values of the loop variables are given or treated as parameters. Furthermore, we introduce various methods to address cases where the algebraic problem exceeds the computational capabilities of our methods. In such instances, we identify alternative approaches to generate specific polynomial invariants.
- [138] arXiv:2405.09233 [pdf, ps, html, other]
-
Title: Tensor Krylov subspace methods via the T-product for large Sylvester tensor equations
Subjects: Numerical Analysis (math.NA)
In the present paper, we introduce new tensor Krylov subspace methods for solving large Sylvester tensor equations. The proposed methods use the well-known T-product for tensors and tensor subspaces. We introduce some new tensor products and their related algebraic properties. These new products enable us to develop third-order tensor FOM (tFOM), GMRES (tGMRES), tubal Block Arnoldi, and tensor tubal Block Arnoldi methods to solve large Sylvester tensor equations. We give some properties related to these methods and present some numerical experiments.
- [139] arXiv:2405.09236 [pdf, ps, html, other]
-
Title: Roots in the semiring of finite deterministic dynamical systems
Subjects: Discrete Mathematics (cs.DM); Dynamical Systems (math.DS)
Finite discrete-time dynamical systems (FDDS) model phenomena that evolve deterministically in discrete time. It is possible to define sum and product operations on these systems (disjoint union and direct product, respectively) giving a commutative semiring. This algebraic structure led to several works employing polynomial equations to model hypotheses on phenomena modelled using FDDS. To solve these equations, algorithms for performing the division and computing $k$-th roots are needed. In this paper, we propose two polynomial algorithms for these tasks, under the condition that the result is a connected FDDS. This ultimately leads to an efficient solution to equations of the type $AX^k=B$ for connected $X$. These results are some of the important final steps for solving more general polynomial equations on FDDS.
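The sum and product operations mentioned above can be sketched directly if an FDDS is represented as a dictionary mapping each state to its successor. The representation and helper names are assumptions made for illustration. The sketch covers only the two semiring operations, not the paper's division or root algorithms.

```python
def fdds_sum(a, b):
    """Disjoint union: tag each state with the system it came from."""
    out = {(0, s): (0, t) for s, t in a.items()}
    out.update({(1, s): (1, t) for s, t in b.items()})
    return out


def fdds_product(a, b):
    """Direct product: both components advance in lockstep."""
    return {(s, u): (a[s], b[u]) for s in a for u in b}


def cycle(n):
    """A single cycle of length n."""
    return {i: (i + 1) % n for i in range(n)}


def cycle_lengths(system):
    """Sorted lengths of all cycles (periodic orbits) of the system."""
    lengths = []
    seen_on_cycle = set()
    for start in system:
        state = start
        for _ in range(len(system)):  # enough steps to land on a cycle
            state = system[state]
        if state in seen_on_cycle:
            continue
        length = 0
        current = state
        while True:
            seen_on_cycle.add(current)
            current = system[current]
            length += 1
            if current == state:
                break
        lengths.append(length)
    return sorted(lengths)


sum_cycles = cycle_lengths(fdds_sum(cycle(2), cycle(3)))
coprime_product_cycles = cycle_lengths(fdds_product(cycle(2), cycle(3)))
equal_product_cycles = cycle_lengths(fdds_product(cycle(2), cycle(2)))
```

The product of cycles of coprime lengths 2 and 3 is a single cycle of length 6, while the product of two cycles of length 2 splits into two cycles of length 2.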
- [140] arXiv:2405.09241 [pdf, ps, html, other]
-
Title: SMUG-Explain: A Framework for Symbolic Music Graph Explanations
Comments: In Proceedings of the Sound and Music Computing Conference 2024 (SMC2024), Porto, Portugal
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this work, we present Score MUsic Graph (SMUG)-Explain, a framework for generating and visualizing explanations of graph neural networks applied to arbitrary prediction tasks on musical scores. Our system allows the user to visualize the contribution of input notes (and note features) to the network output, directly in the context of the musical score. We provide an interactive interface based on the music notation engraving library Verovio. We showcase the usage of SMUG-Explain on the task of cadence detection in classical music. All code is available on this https URL.
- [141] arXiv:2405.09247 [pdf, ps, html, other]
-
Title: Graph Neural Network based Handwritten Trajectories Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Graph neural networks have proven to be an efficient machine learning technique in real-life applications. Handwriting recognition is one useful real-life application area in which both offline and online recognition are required. Chain codes as a feature extraction technique have shown significant results in the literature, and we have been able to use chain codes with graph neural networks. To the best of our knowledge, this work presents, for the first time, a combination of handwritten trajectory features as chain codes with graph neural networks. The handwritten trajectories for offline handwritten text have been evaluated using recovery of drawing order, whereas online handwritten trajectories are used directly with chain codes. Our results show that the present combination surpasses previous results and minimizes the error rate in only a few epochs.
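A chain code encodes a trajectory as a sequence of direction symbols between consecutive points. The sketch below uses the common 8-direction Freeman convention (0 = east, counting counter-clockwise, with the y axis pointing up). The abstract does not say which variant the authors use, so the convention, the function name, and the sample trajectory are assumptions.

```python
# Map each unit step (dx, dy) to an 8-direction Freeman code.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode a trajectory of grid points whose consecutive points are 8-neighbours."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


# Hypothetical pen trajectory: right, up-right, up, left.
trajectory = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
codes = chain_code(trajectory)
```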
- [142] arXiv:2405.09250 [pdf, ps, html, other]
-
Title: New Textual Corpora for Serbian Language Modeling
Subjects: Computation and Language (cs.CL)
This paper will present textual corpora for Serbian (and Serbo-Croatian), usable for the training of large language models and publicly available at one of the several notable online repositories. Each corpus will be classified using multiple methods and its characteristics will be detailed. Additionally, the paper will introduce three new corpora: a new umbrella web corpus of Serbo-Croatian, a new high-quality corpus based on the doctoral dissertations stored within the National Repository of Doctoral Dissertations from all Universities in Serbia, and a parallel corpus of abstract translation from the same source. The uniqueness of both old and new corpora will be assessed via frequency-based stylometric methods, and the results will be briefly discussed.
- [143] arXiv:2405.09251 [pdf, ps, html, other]
-
Title: Does Machine Bring in Extra Bias in Learning? Approximating Fairness in Models Promptly
Comments: These two authors contributed equally and are listed in alphabetical order
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
With various machine learning (ML) applications being deployed in the real world, concerns about discrimination hidden in ML models are growing, particularly in high-stakes domains. Existing techniques for assessing the discrimination level of ML models include commonly used group and individual fairness measures. However, these two types of fairness measures are usually hard to reconcile with each other, and even two different group fairness measures might be incompatible. To address this issue, we evaluate the discrimination level of classifiers from a manifold perspective and propose a "harmonic fairness measure via manifolds (HFM)" based on distances between sets. Yet the direct calculation of distances might be too expensive to afford, reducing its practical applicability. Therefore, we devise an approximation algorithm named "Approximation of distance between sets (ApproxDist)" to facilitate accurate estimation of distances, and we further demonstrate its algorithmic effectiveness under certain reasonable assumptions. Empirical results indicate that the proposed fairness measure HFM is valid and that the proposed ApproxDist is effective and efficient.
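As background on the "commonly used group fairness measures" mentioned above, the sketch below computes one of them, the demographic parity difference: the gap in positive-prediction rates between groups. It is not the paper's HFM or ApproxDist. The predictions and group labels are made-up illustrative data.

```python
def positive_rate(preds, groups, group):
    """Share of positive predictions within one group."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(preds, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = [positive_rate(preds, groups, g) for g in sorted(set(groups))]
    return max(rates) - min(rates)


# Hypothetical binary predictions and sensitive-attribute values.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_difference(preds, groups)
```

Here group "a" receives positive predictions at a rate of 0.75 and group "b" at 0.25, so the gap is 0.5. A value of 0 would indicate equal rates.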
- [144] arXiv:2405.09255 [pdf, ps, html, other]
-
Title: Reinforcement Learning-Based Framework for the Intelligent Adaptation of User Interfaces
Comments: To be published in Companion of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS Companion '24). 9 pages, 2 figures, 28 references
Subjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Adapting the user interface (UI) of software systems to meet the needs and preferences of users is a complex task. The main challenge is to provide the appropriate adaptations at the appropriate time to offer value to end-users. Recent advances in Machine Learning (ML) techniques may provide effective means to support the adaptation process. In this paper, we instantiate a reference framework for Intelligent User Interface Adaptation by using Reinforcement Learning (RL) as the ML component to adapt user interfaces and ultimately improve the overall User Experience (UX). By using RL, the system is able to learn from past adaptations to improve its decision-making capabilities. Moreover, assessing the success of such adaptations remains a challenge. To overcome this issue, we propose to use predictive Human-Computer Interaction (HCI) models to evaluate the outcome of each action (i.e., adaptation) performed by the RL agent. In addition, we present an implementation of the instantiated framework, which is an extension of OpenAI Gym, that serves as a toolkit for developing and comparing RL algorithms. This Gym environment is highly configurable and extensible to other UI adaptation contexts. The evaluation results show that our RL-based framework can successfully train RL agents able to learn how to adapt UIs in a specific context to maximize user engagement by using an HCI model as a reward predictor.
- [145] arXiv:2405.09264 [pdf, ps, other]
-
Title: A Quantum of QUIC: Dissecting Cryptography with Post-Quantum Insights
Comments: Presented at the 2024 IFIP Networking Conference (IFIP Networking)
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
QUIC is a new network protocol standardized in 2021. It was designed to replace the TCP/TLS stack and is based on UDP. The most current web standard HTTP/3 is specifically designed to use QUIC as transport protocol. QUIC claims to provide secure and fast transport with low-latency connection establishment, flow and congestion control, reliable delivery, and stream multiplexing. To achieve the security goals, QUIC enforces the usage of TLS 1.3. It uses authenticated encryption with additional data (AEAD) algorithms to not only protect the payload but also parts of the header. The handshake relies on asymmetric cryptography, which will be broken with the introduction of powerful quantum computers, making the use of post-quantum cryptography inevitable. This paper presents a detailed evaluation of the impact of cryptography on QUIC performance. The high-performance QUIC implementations LSQUIC, quiche, and MsQuic are evaluated under different aspects. We break symmetric cryptography down to the different security features. To be able to isolate the impact of cryptography, we implemented a NOOP AEAD algorithm which leaves plaintext unaltered. We show that QUIC performance increases by 10 to 20% when removing packet protection. The header protection has negligible impact on performance, especially for AES ciphers. We integrate post-quantum cryptographic algorithms into QUIC, demonstrating its feasibility without major changes to the QUIC libraries by using a TLS library that implements post-quantum algorithms. Kyber, Dilithium, and FALCON are promising candidates for post-quantum secure QUIC, as they have a low impact on the handshake duration. Algorithms like SPHINCS+ with larger key sizes or more complex calculations significantly impact the handshake duration and cause additional issues in our measurements.
- [146] arXiv:2405.09265 [pdf, ps, html, other]
-
Title: Quantum Computing Education for Computer Science Students: Bridging the Gap with Layered Learning and Intuitive Analogies
Subjects: Emerging Technologies (cs.ET); Quantum Algebra (math.QA)
Quantum computing presents a transformative potential for the world of computing. However, integrating this technology into the curriculum for computer science students who lack prior exposure to quantum mechanics and advanced mathematics remains a challenging task. This paper proposes a scaffolded learning approach aimed at equipping computer science students with essential quantum principles. By introducing foundational quantum concepts through relatable analogies and a layered learning approach based on classical computation, this approach seeks to bridge the gap between classical and quantum computing. This differs from previous approaches, which build quantum computing fundamentals from the prerequisite of linear algebra and mathematics. The paper offers a considered set of intuitive analogies for foundational quantum concepts including entanglement, superposition, quantum data structures and quantum algorithms. These analogies, coupled with a computing-based layered learning approach, lay the groundwork for a comprehensive teaching methodology tailored for undergraduate third level computer science students.
- [147] arXiv:2405.09266 [pdf, ps, html, other]
-
Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Comments: 11 pages, 6 figures, demo page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizing conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in 2D-MM Align score and established metrics. Video results can be found on our project page: this https URL.
- [148] arXiv:2405.09269 [pdf, ps, other]
-
Title: Preconceptual Modeling in Software Engineering: Metaphysics of Diagrammatic Representations
Comments: 14 pages, 27 figures
Subjects: Software Engineering (cs.SE)
According to many researchers, conceptual model (CM) development is a hard task, and system requirements are difficult to collect, causing many miscommunication problems. CMs require more than modeling ability alone - they first require an understanding of the targeted domain that the model attempts to represent. Accordingly, a preconceptual modeling (pre-CM) stage is intended to address ontological issues before typical CM development is initiated. It involves defining a portion of reality when entities and processes are differentiated and integrated as unified wholes. This pre-CM phase forms the focus of research in this paper. The purpose is not to show how to model; rather, it is to demonstrate how to establish a metaphysical basis of the involved portion of reality. To demonstrate such a venture, we employ the so-called thinging machine (TM) modeling that has been proposed as a high-level CM. A TM model integrates staticity and dynamism grounded in a fundamental construct called a thimac (things/machine). It involves two modes of reality, existence (events) and subsistence (regions - roughly, specifications of things and processes). Currently, the dominant approach in CM has evolved to limit its scope of application to develop ontological categorization (types of things). In the TM approach, pre-CM metaphysics is viewed as part and parcel of CM itself. The general research problem is how to map TM constructs to what is out there in the targeted domain. Discussions involve the nature of thimacs (things and processes) and subsistence and existence as they are superimposed over each other in reality. Specifically, we make two claims: (a) the perceptibility of regions as a phenomenon and (b) the distinctiveness of existence as a construct for events. The results contribute to furthering the understanding of TM modeling in addition to introducing some metaphysical insights.
- [149] arXiv:2405.09273 [pdf, ps, html, other]
-
Title: Fair Generalized Linear Mixed Models
Comments: 25 pages, 12 figures. arXiv admin note: text overlap with arXiv:2405.06433
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
When using machine learning for automated prediction, it is important to account for fairness in the prediction. Fairness in machine learning aims to ensure that biases in the data and model inaccuracies do not lead to discriminatory decisions. For example, predictions from fair machine learning models should not discriminate on the basis of sensitive variables such as sexual orientation and ethnicity. The training data is often obtained from social surveys. In social surveys, the data collection process is often stratified sampling, e.g., due to cost restrictions. In stratified samples, the assumption of independence between the observations is not fulfilled. Hence, if the machine learning models do not account for the strata correlations, the results may be biased. The bias is especially high in cases where the strata assignment is correlated with the variable of interest. We present in this paper an algorithm that can handle both problems simultaneously, and we demonstrate the impact of stratified sampling on the quality of fair machine learning predictions in a reproducible simulation study.
- [150] arXiv:2405.09274 [pdf, ps, html, other]
-
Title: Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
Subjects: Machine Learning (cs.LG)
In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Despite the potential of dynamic activation methods to reduce computation and increase speed in models using the ReLU activation function, our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes. Through extensive experiments across various dynamic activation strategies, we demonstrate that LLaMA models usually underperform when compared to their ReLU counterparts, particularly in scenarios demanding high sparsity ratio. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting activation heads and neurons; 2) the inadequate sparsity resulting from activation functions; 3) the insufficient preservation of information resulting from KV cache skipping. Our analysis not only sheds light on the limitations of dynamic activation in the context of large-scale LLaMA models but also proposes roadmaps for enhancing the design of future sparsity schemes.
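The point about inadequate sparsity from activation functions can be illustrated with a toy comparison: ReLU maps every negative pre-activation to exactly zero, whereas SiLU (the activation used in LLaMA's gated MLP) produces small but non-zero outputs. The input values below are arbitrary, and the `sparsity` helper is a hypothetical name. This is only a sketch of the underlying effect, not the paper's experimental setup.

```python
import math


def relu(x):
    return max(0.0, x)


def silu(x):
    """SiLU / swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def sparsity(values, eps=0.0):
    """Fraction of activations whose magnitude is at most eps."""
    return sum(1 for v in values if abs(v) <= eps) / len(values)


# Arbitrary pre-activation values, half of them negative.
pre_activations = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

relu_sparsity = sparsity([relu(x) for x in pre_activations])
silu_sparsity = sparsity([silu(x) for x in pre_activations])
```

With ReLU, half of the outputs are exact zeros that a dynamic activation scheme could skip. With SiLU, none are, so any skipping must rely on a threshold and discards some information.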
- [151] arXiv:2405.09276 [pdf, ps, html, other]
-
Title: Dual-Segment Clustering Strategy for Federated Learning in Heterogeneous Environments
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Federated learning (FL) is a distributed machine learning paradigm with high efficiency and low communication load, transmitting only the parameters or gradients of the network. However, non-independent and identically distributed (Non-IID) data has a negative impact on this paradigm. Furthermore, the heterogeneity of communication quality significantly affects the accuracy of parameter transmission, degrading the performance of the FL system or even preventing its convergence. This letter proposes a dual-segment clustering (DSC) strategy, which first clusters the clients according to the heterogeneous communication conditions and then performs a second clustering by sample size and label distribution, so as to address both data and communication heterogeneity. Experimental results show that the DSC strategy proposed in this letter can improve the convergence rate of FL and achieves higher accuracy in a heterogeneous environment than the classical clustering algorithm.
- [152] arXiv:2405.09279 [pdf, ps, html, other]
-
Title: Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection
Comments: Presented at the MWE-UD Workshop at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Despite the recent ubiquity of large language models and their high zero-shot prompted performance across a wide range of tasks, it is still not known how well they perform on tasks which require processing of potentially idiomatic language. In particular, how well do such models perform in comparison to encoder-only models fine-tuned specifically for idiomaticity tasks? In this work, we attempt to answer this question by looking at the performance of a range of LLMs (both local and software-as-a-service models) on three idiomaticity datasets: SemEval 2022 Task 2a, FLUTE, and MAGPIE. Overall, we find that whilst these models do give competitive performance, they do not match the results of fine-tuned task-specific models, even at the largest scales (e.g. for GPT-4). Nevertheless, we do see consistent performance improvements across model scale. Additionally, we investigate prompting approaches to improve performance, and discuss the practicalities of using LLMs for these tasks.
- [153] arXiv:2405.09281 [pdf, ps, html, other]
-
Title: Localized Attractor Computations for Infinite-State Games (Full Version)
Comments: This is a full version of paper accepted at CAV 2024
Subjects: Logic in Computer Science (cs.LO)
Infinite-state games are a commonly used model for the synthesis of reactive systems with unbounded data domains. Symbolic methods for solving such games need to be able to construct intricate arguments to establish the existence of winning strategies. Often, large problem instances require prohibitively complex arguments. Therefore, techniques that identify smaller and simpler sub-problems and exploit the respective results for the given game-solving task are highly desirable. In this paper, we propose the first such technique for infinite-state games. The main idea is to enhance symbolic game-solving with the results of localized attractor computations performed in sub-games. The crux of our approach lies in identifying useful sub-games by computing permissive winning strategy templates in finite abstractions of the infinite-state game. The experimental evaluation of our method demonstrates that it outperforms existing techniques and is applicable to infinite-state games beyond the state of the art.
- [154] arXiv:2405.09282 [pdf, ps, html, other]
-
Title: Three-Dimensional Path Planning: Navigating through Rough Mereology
Comments: number of pages: 16, number of figures: 10
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
In this paper, we present an innovative technique for the path planning of flying robots in a 3D environment in Rough Mereology terms. The main goal was to construct an algorithm that generates mereological potential fields in 3-dimensional space. To avoid falling into a local minimum, we use a weighted Euclidean distance. Moreover, a path search from the start point to the target that avoids obstacles was applied. The environment was created by connecting two cameras working in real time. The Python library OpenCV [1], which recognizes shapes and colors, was responsible for determining the gate and the elements of the world inside the map. The main purpose of this paper is to apply the given results to drones.
- [155] arXiv:2405.09285 [pdf, ps, other]
-
Title: Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism, a powerful tool originally designed for natural language processing, have recently been adapted for operator learning. However, they confront challenges, including high computational demands and limited interpretability. This raises a critical question: Is there a more efficient attention mechanism for Transformer-based operator learning? This paper proposes the Position-induced Transformer (PiT), built on an innovative position-attention mechanism, which demonstrates significant advantages over the classical self-attention in operator learning. Position-attention draws inspiration from numerical methods for PDEs. Different from self-attention, position-attention is induced by only the spatial interrelations of sampling positions for input functions of the operators, and does not rely on the input function values themselves, thereby greatly boosting efficiency. PiT exhibits superior performance over current state-of-the-art neural operators in a variety of complex operator learning tasks across diverse PDE benchmarks. Additionally, PiT possesses an enhanced discretization convergence feature, compared to the widely-used Fourier neural operator.
- [156] arXiv:2405.09286 [pdf, ps, html, other]
-
Title: MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
Recent years have witnessed the rapid development of short videos, which usually contain both visual and audio modalities. Background music is important to short videos and can significantly influence the emotions of viewers. However, at present, the background music of short videos is generally chosen by the video producer, and there is a lack of automatic music recommendation methods for short videos. This paper introduces MVBind, an innovative Music-Video embedding space Binding model for cross-modal retrieval. MVBind operates as a self-supervised approach, acquiring inherent knowledge of intermodal relationships directly from data, without the need for manual annotations. Additionally, to compensate for the lack of a corresponding musical-visual pair dataset for short videos, we construct a dataset, SVM-10K (Short Video with Music-10K), which mainly consists of meticulously selected short videos. On this dataset, MVBind manifests significantly improved performance compared to other baseline methods. The constructed dataset and code will be released to facilitate future research.
- [157] arXiv:2405.09288 [pdf, ps, html, other]
-
Title: DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations
Comments: Accepted to Medical Imaging with Deep Learning (MIDL) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning classifiers are prone to latching onto dominant confounders present in a dataset rather than on the causal markers associated with the target class, leading to poor generalization and biased predictions. Although explainability via counterfactual image generation has been successful at exposing the problem, bias mitigation strategies that permit accurate explainability in the presence of dominant and diverse artifacts remain unsolved. In this work, we propose the DeCoDEx framework and show how an external, pre-trained binary artifact detector can be leveraged during inference to guide a diffusion-based counterfactual image generator towards accurate explainability. Experiments on the CheXpert dataset, using both synthetic artifacts and real visual artifacts (support devices), show that the proposed method successfully synthesizes the counterfactual images that change the causal pathology markers associated with Pleural Effusion while preserving or ignoring the visual artifacts. Augmentation of ERM and Group-DRO classifiers with the DeCoDEx generated images substantially improves the results across underrepresented groups that are out of distribution for each class. The code is made publicly available at this https URL.
- [158] arXiv:2405.09291 [pdf, ps, html, other]
-
Title: Sensitivity Decouple Learning for Image Compression Artifacts Reduction
Comments: Accepted by Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction, i.e., the compression-insensitive features to regularize the high-level semantic representations during training and the compression-sensitive features to be aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features for retaining high-level semantics, and we then develop the compression quality-aware feature encoder for compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) to utilize these awareness features as transformation guidance during the decoding phase. In our proposed DAGN, we develop a cross-feature fusion module to maintain the consistency of compression-insensitive features by fusing compression-insensitive features into the artifacts reduction baseline. Our method achieves an average PSNR gain of 2.06 dB on BSD500, outperforming state-of-the-art methods, and only requires 29.7 ms to process one image on BSD500. Besides, the experimental results on LIVE1 and LIU4K also demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
- [159] arXiv:2405.09292 [pdf, ps, html, other]
-
Title: Attribute reduction algorithm of rough sets based on spatial optimization
Comments: 7 pages, 2 figures, 1 table
Subjects: Artificial Intelligence (cs.AI)
Rough sets are one of the important methods for rule acquisition and attribute reduction. Current rough set attribute reduction focuses mainly on minimizing the number of reduced attributes but ignores the spatial similarity between the reduced and decision attributes, which may lead to problems such as an increased number of rules and limited generality. In this paper, a rough set attribute reduction algorithm based on spatial optimization is proposed. By introducing the concept of spatial similarity, the algorithm finds the reduction with the highest spatial similarity, so that the spatial similarity between the reduction and the decision attributes is higher and more concise and widely applicable rules are obtained. In addition, a comparative experiment against traditional rough set attribute reduction algorithms is designed to demonstrate the effectiveness of the proposed algorithm, which yields significant improvements on many datasets.
- [160] arXiv:2405.09293 [pdf, ps, html, other]
-
Title: Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology
Comments: Proceedings of the Society for Computation in Linguistics (SCiL) 2024, Association for Computational Linguistics (ACL) Anthology
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models are capable of associating discourse meanings with different object markings in Korean. Results suggest that discourse meanings of a grammatical marker can be more challenging to encode than that of a discourse marker.
- [161] arXiv:2405.09296 [pdf, ps, html, other]
-
Title: Tight Bounds for Online Convex Optimization with Adversarial Constraints
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.
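For orientation, the two quantities bounded in the abstract above are commonly written as follows. This is a sketch in standard OCO notation, not taken from the abstract: the symbols $f_t$ (cost), $g_t$ (constraint), $x_t$ (action), the feasible comparator set $\mathcal{X}^\star$, and the positive-part form of the violation are all assumptions.

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x^\star \in \mathcal{X}^\star} \sum_{t=1}^{T} f_t(x^\star),
\qquad
\mathrm{CCV}_T \;=\; \sum_{t=1}^{T} \max\{0,\; g_t(x_t)\}
```

Under this reading, the stated result is that both quantities can be kept at order $\sqrt{T}$ (up to logarithmic factors for CCV) at the same time.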
- [162] arXiv:2405.09300 [pdf, ps, html, other]
-
Title: Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Background: Rapid advancements in natural language processing have led to the development of large language models with the potential to revolutionize mental health care. These models have shown promise in assisting clinicians and providing support to individuals experiencing various psychological challenges.
Objective: This study aims to compare the performance of two large language models, GPT-4 and Chat-GPT, in responding to a set of 18 psychological prompts, to assess their potential applicability in mental health care settings.
Methods: A blind methodology was employed, with a clinical psychologist evaluating the models' responses without knowledge of their origins. The prompts encompassed a diverse range of mental health topics, including depression, anxiety, and trauma, to ensure a comprehensive assessment.
Results: The results demonstrated a significant difference in performance between the two models (p > 0.05). GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52. The clinical psychologist's evaluation suggested that GPT-4 was more effective at generating clinically relevant and empathetic responses, thereby providing better support and guidance to potential users.
Conclusions: This study contributes to the growing body of literature on the applicability of large language models in mental health care settings. The findings underscore the importance of continued research and development in the field to optimize these models for clinical use. Further investigation is necessary to understand the specific factors underlying the performance differences between the two models and to explore their generalizability across various populations and mental health conditions.
- [163] arXiv:2405.09304 [pdf, ps, html, other]
-
Title: Kolmogorov complexity as a combinatorial tool
Comments: Prepared as a special session invited talk at CiE 2024
Subjects: Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)
Kolmogorov complexity is often used as a convenient language for counting and/or probabilistic existence proofs. However, there are some applications where Kolmogorov complexity is used in a more subtle way. We provide one (somewhat) surprising example where the existence of a winning strategy in a natural combinatorial game is proven (and no direct proof is known).
- [164] arXiv:2405.09305 [pdf, ps, html, other]
-
Title: Gradient Boosted Filters For Signal Processing
Comments: 9 pages, 12 figures. Submitted to ICML 2024 and subsequently rejected for insufficient evaluation
Subjects: Machine Learning (cs.LG)
Gradient boosted decision trees have achieved remarkable success in several domains, particularly those that work with static tabular data. However, the application of gradient boosted models to signal processing is underexplored. In this work, we introduce gradient boosted filters for dynamic data, by employing Hammerstein systems in place of decision trees. We discuss the relationship of our approach to the Volterra series, providing the theoretical underpinning for its application. We demonstrate the effective generalizability of our approach with examples.
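As a rough illustration of the building block named above: a Hammerstein system is a static nonlinearity followed by a linear dynamic filter. The sketch below assumes a causal FIR filter for the linear part and sums several stages additively; the function names, the `learning_rate` parameter, and the absence of any fitting step are illustrative assumptions, not the paper's method.

```python
def hammerstein(x, nonlinearity, taps):
    """Apply a static nonlinearity sample by sample, then a causal FIR filter."""
    u = [nonlinearity(v) for v in x]
    y = []
    for n in range(len(u)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start of the signal count as zero
                acc += h * u[n - k]
        y.append(acc)
    return y


def boosted_output(x, stages, learning_rate=1.0):
    """Sum the outputs of several Hammerstein stages (hypothetical additive ensemble)."""
    total = [0.0] * len(x)
    for nonlinearity, taps in stages:
        stage_out = hammerstein(x, nonlinearity, taps)
        total = [t + learning_rate * s for t, s in zip(total, stage_out)]
    return total


x = [1.0, 2.0, 3.0]
# One stage: square each sample, then average it with the previous squared sample.
y = hammerstein(x, lambda v: v * v, [0.5, 0.5])
# Two stages: an identity pass-through plus the stage above.
z = boosted_output(x, [(lambda v: v, [1.0]), (lambda v: v * v, [0.5, 0.5])])
```

In a boosting setting each new stage would be fitted to the residual of the current ensemble; that fitting step is omitted here.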
- [165] arXiv:2405.09306 [pdf, ps, html, other]
-
Title: Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy
Comments: Preprint submitted to Information Science journal
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)
Ensuring the effectiveness of search queries while protecting user privacy remains an open issue. When an Information Retrieval System (IRS) does not protect the privacy of its users, sensitive information may be disclosed through the queries sent to the system. Recent improvements, especially in NLP, have shown the potential of using Differential Privacy to obfuscate texts while maintaining satisfactory effectiveness. However, such approaches may protect the user's privacy only from a theoretical perspective while, in practice, the real user's information need can still be inferred if perturbed terms are too semantically similar to the original ones. We overcome such limitations by proposing Word Blending Boxes, a novel differentially private mechanism for query obfuscation, which protects the words in the user queries by employing safe boxes. To measure the overall effectiveness of the proposed WBB mechanism, we measure the privacy obtained by the obfuscation process, i.e., the lexical and semantic similarity between original and obfuscated queries. Moreover, we assess the effectiveness of the privatized queries in retrieving relevant documents from the IRS. Our findings indicate that WBB can be integrated effectively into existing IRSs, offering a key to the challenge of protecting user privacy from both a theoretical and a practical point of view.
- [166] arXiv:2405.09308 [pdf, ps, html, other]
-
Title: TimeX++: Learning Time-Series Explanations with Information Bottleneck
Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo
Comments: Accepted by International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To address these issues, we introduce a simple yet practical objective function for time series explainable learning. The design of the objective function builds upon the principle of information bottleneck (IB), and modifies the IB objective function to avoid trivial solutions and distributional shift issues. We further present TimeX++, a novel explanation framework that leverages a parametric network to produce explanation-embedded instances that are both in-distributed and label-preserving. We evaluate TimeX++ on both synthetic and real-world datasets comparing its performance against leading baselines, and validate its practical efficacy through case studies in a real-world environmental application. Quantitative and qualitative evaluations show that TimeX++ outperforms baselines across all datasets, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at this https URL.
- [167] arXiv:2405.09309 [pdf, ps, html, other]
-
Title: Identification via Binary Uniform Permutation Channel
Comments: 9 pages. Extended version of submission to ITW 2024
Subjects: Information Theory (cs.IT)
We study message identification over the binary uniform permutation channels. For DMCs, the number of identifiable messages grows doubly exponentially. Identification capacity, the maximum second-order exponent, is known to be the same as the Shannon capacity of a DMC. We consider a binary uniform permutation channel where the transmitted vector is permuted by a permutation chosen uniformly at random. Permutation channels support reliable communication of only polynomially many messages. While this implies a zero second-order identification rate, we prove a soft converse result showing that even non-zero first-order identification rates are not achievable with a power-law decay of error probability for identification over binary uniform permutation channels. To prove the converse, we use a sequence of steps to construct a new identification code with a simpler structure and then use a lower bound on the normalized maximum pairwise intersection of a set system on {0, . . . , n}. We provide generalizations for arbitrary alphabet size.
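The remark above that permutation channels carry only polynomially many messages can be seen in a small simulation. The sketch below assumes a noiseless channel apart from the permutation; the function names are illustrative. Because the positions are shuffled uniformly at random, only the Hamming weight of the input survives, and a length-n binary input has just n + 1 possible weights.

```python
import random


def uniform_permutation_channel(x, rng):
    """Return the input bits rearranged by a uniformly random permutation."""
    y = list(x)
    rng.shuffle(y)  # shuffles in place, uniformly over all orderings
    return y


def distinguishable_classes(n):
    """Count the Hamming weights 0..n, the only information the receiver keeps."""
    return n + 1


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [1, 0, 1, 1, 0, 0, 0, 1]
y = uniform_permutation_channel(x, rng)
```

Here `sum(y)` always equals `sum(x)`, while the order of `y` carries no information about the order of `x`.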
- [168] arXiv:2405.09310 [pdf, ps, html, other]
-
Title: GrainGrasp: Dexterous Grasp Generation with Fine-grained Contact Guidance
Comments: This paper is accepted by the ICRA2024
Subjects: Robotics (cs.RO)
One goal of dexterous robotic grasping is to allow robots to handle objects with the same level of flexibility and adaptability as humans. However, it remains a challenging task to generate an optimal grasping strategy for dexterous hands, especially when it comes to delicate manipulation and accurate adjustment of the desired grasping poses for objects of varying shapes and sizes. In this paper, we propose a novel dexterous grasp generation scheme called GrainGrasp that provides fine-grained contact guidance for each fingertip. In particular, we employ a generative model to predict separate contact maps for each fingertip on the object point cloud, effectively capturing the specifics of finger-object interactions. In addition, we develop a new dexterous grasping optimization algorithm that solely relies on the point cloud as input, eliminating the necessity for complete mesh information of the object. By leveraging the contact maps of different fingertips, the proposed optimization algorithm can generate precise and determinable strategies for human-like object grasping. Experimental results confirm the efficiency of the proposed scheme. Our code is available at this https URL.
- [169] arXiv:2405.09312 [pdf, ps, html, other]
-
Title: Agnostic Active Learning of Single Index Models with Linear Sample Complexity
Subjects: Machine Learning (cs.LG)
We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting.
We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $O(d^{2})$ bound of Gajjar et al. (2023). Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is unknown. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
- [170] arXiv:2405.09314 [pdf, ps, html, other]
-
Title: Themis: Automatic and Efficient Deep Learning System Testing with Strong Fault Detection Capability
Subjects: Software Engineering (cs.SE)
Deep Learning Systems (DLSs) have been widely applied in safety-critical tasks such as autopilot. However, when a perturbed input is fed into a DLS for inference, the DLS often has incorrect outputs (i.e., faults). DLS testing techniques (e.g., DeepXplore) detect such faults by generating perturbed inputs to explore data flows that induce faults. Since a DLS often has infinitely many data flows, existing techniques require developers to manually specify a set of activation values in a DLS's neurons for exploring fault-inducing data flows. Unfortunately, recent studies show that such manual effort is tedious and can detect only a tiny proportion of fault-inducing data flows.
In this paper, we present Themis, the first automatic DLS testing system, which attains strong fault detection capability by ensuring a full coverage of fault-inducing data flows at a high probability. Themis carries a new workflow for automatically and systematically revealing data flows whose internal neurons' outputs vary substantially when the inputs are slightly perturbed, as these data flows are likely fault-inducing. We evaluated Themis on ten different DLSs and found that on average the number of faults detected by Themis was 3.78X more than four notable DLS testing techniques. By retraining all evaluated DLSs with the detected faults, Themis also increased (regained) these DLSs' accuracies on average 14.7X higher than all baselines.
- [171] arXiv:2405.09317 [pdf, ps, html, other]
-
Title: Controllability Test for Nonlinear Datatic Systems
Subjects: Systems and Control (eess.SY)
Controllability is a fundamental property of control systems, serving as the prerequisite for controller design. While controllability test is well established in modelic (i.e., model-driven) control systems, extending it to datatic (i.e., data-driven) control systems is still a challenging task due to the absence of system models. In this study, we propose a general controllability test method for nonlinear systems with datatic description, where the system behaviors are merely described by data. In this situation, the state transition information of a dynamic system is available only at a limited number of data points, leaving the behaviors beyond these points unknown. Different from traditional exact controllability, we introduce a new concept called $\epsilon$-controllability, which extends the definition from point-to-point form to point-to-region form. Accordingly, our focus shifts to checking whether the system state can be steered to a closed state ball centered on the target state, rather than exactly at that target state. On its basis, we propose a tree search algorithm called maximum expansion of controllable subset (MECS) to identify controllable states in the dataset. Starting with a specific target state, our algorithm can iteratively propagate controllability from a known state ball to a new one. This iterative process gradually enlarges the $\epsilon$-controllable subset by incorporating new controllable balls until all $\epsilon$-controllable states are searched. Besides, a simplified version of MECS is proposed by solving a special shortest path problem, called Floyd expansion with radius fixed (FERF). FERF maintains a fixed radius of all controllable balls based on a mutual controllability assumption of neighboring states. The effectiveness of our method is validated in three datatic control systems whose dynamic behaviors are described by sampled data.
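The iterative propagation described above can be pictured with a small fixed-radius sketch. This is a deliberately simplified illustration, not the authors' MECS or FERF algorithm: it assumes the dataset is a list of recorded `(state, next_state)` pairs, and it treats landing within `eps` of an already controllable data state as sufficient, which corresponds loosely to the mutual controllability assumption for neighboring states. All names are hypothetical.

```python
import math


def eps_controllable_states(transitions, target, eps):
    """Return indices of transitions whose start state can reach the eps-ball around target.

    transitions: list of (state, next_state) pairs, each state a tuple of floats.
    """
    centers = [target]   # centers of balls already known to be controllable
    controllable = set()
    changed = True
    while changed:       # repeat until a full pass adds nothing new
        changed = False
        for i, (state, next_state) in enumerate(transitions):
            if i in controllable:
                continue
            # The recorded step lands inside a known controllable ball.
            if any(math.dist(next_state, c) <= eps for c in centers):
                controllable.add(i)
                centers.append(state)
                changed = True
    return controllable


# Toy 1-D dataset: three states chain toward the target, one is isolated.
data = [
    ((3.0,), (2.0,)),
    ((2.0,), (1.05,)),
    ((1.0,), (0.0,)),
    ((10.0,), (9.0,)),
]
found = eps_controllable_states(data, target=(0.0,), eps=0.1)
```

In the toy dataset the first three states are reached in successive passes, while the state at 10.0 is never connected to the target ball.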
- [172] arXiv:2405.09318 [pdf, ps, html, other]
-
Title: Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls
Comments: Submitted to IEEE MILCOM 2024
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle with understanding the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and malicious system calls, the models are refined to detect signs of malware activity. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-Score of approximately 0.86. The results highlight the importance of context size in improving detection rates and underscore the trade-offs between computational complexity and performance. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats.
- [173] arXiv:2405.09321 [pdf, ps, html, other]
-
Title: ReconBoost: Boosting Can Achieve Modality Reconcilement
Comments: This paper has been accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost to update a fixed modality each time. Herein, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others and help enhance the overall performance. The major difference with the classic GB is that we only preserve the newest model for each modality to avoid overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments over six multi-modal benchmarks speak to the efficacy of the method. We release the code at this https URL.
- [174] arXiv:2405.09324 [pdf, ps, html, other]
-
Title: Learning Coarse-Grained Dynamics on Graph
Comments: 33 pages, 12 figures
Subjects: Numerical Analysis (math.NA); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
We consider a Graph Neural Network (GNN) non-Markovian modeling framework to identify coarse-grained dynamical systems on graphs. Our main idea is to systematically determine the GNN architecture by inspecting how the leading term of the Mori-Zwanzig memory term depends on the coarse-grained interaction coefficients that encode the graph topology. Based on this analysis, we found that the appropriate GNN architecture that will account for $K$-hop dynamical interactions has to employ a Message Passing (MP) mechanism with at least $2K$ steps. We also deduce that the memory length required for an accurate closure model decreases as a function of the interaction strength under the assumption that the interaction strength exhibits a power law that decays as a function of the hop distance. Supporting numerical demonstrations on two examples, a heterogeneous Kuramoto oscillator model and a power system, suggest that the proposed GNN architecture can predict the coarse-grained dynamics under fixed and time-varying graph topologies.
- [175] arXiv:2405.09328 [pdf, ps, html, other]
-
Title: WENO scheme on characteristics for the equilibrium dispersive model of chromatography with generalized Langmuir isotherms
Journal-ref: Applied Numerical Mathematics 201 (2024) 247-264
Subjects: Numerical Analysis (math.NA)
Column chromatography is a laboratory and industrial technique used to separate different substances mixed in a solution. Mathematically, it can be modelled using non-linear partial differential equations whose main ingredients are the adsorption isotherms, which are non-linear functions modelling the affinity between the different substances in the solution and the solid stationary phase filling the column. The goal of this work is twofold. Firstly, we aim to extend the techniques of Donat, Guerrero and Mulet (Appl. Numer. Math. 123 (2018) 22-42) to other adsorption isotherms. In particular, we propose a family of generalized Langmuir-type isotherms and prove that the correspondence between the concentrations of solutes in the liquid phase (the primitive variables) and the conserved variables is well defined and admits a global smooth inverse that can be computed numerically. Secondly, to establish the well-posedness of the mathematical model, we study the eigenstructure of the Jacobian of the mentioned correspondence and use this characteristic information to get oscillation-free sharp interfaces on the numerical approximate solutions. To do so, we determine the structure of the Jacobian matrix of the system and use it to deduce its eigenstructure. We combine the use of characteristic-based numerical fluxes with a second-order implicit-explicit scheme proposed in the cited reference and perform some numerical experiments with Tóth's adsorption isotherms to demonstrate that the characteristic-based schemes produce accurate numerical solutions with no oscillations, even when steep gradients appear in the solutions.
- [176] arXiv:2405.09330 [pdf, ps, other]
-
Title: BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection
Comments: This paper has been accepted to FSE'24
Subjects: Software Engineering (cs.SE)
Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimation with anomaly detection techniques. Inaccurate anomaly detection results can significantly affect the root cause localization results. To address this challenge, we propose BARO, an end-to-end approach that integrates anomaly detection and root cause analysis for effectively troubleshooting failures in microservice systems. BARO leverages the Multivariate Bayesian Online Change Point Detection technique to model the dependency within multivariate time-series metrics data, enabling it to detect anomalies more accurately. BARO also incorporates a novel nonparametric statistical hypothesis testing technique for robustly identifying root causes, which is less sensitive to the accuracy of anomaly detection compared to existing works. Our comprehensive experiments conducted on three popular benchmark microservice systems demonstrate that BARO consistently outperforms state-of-the-art approaches in both anomaly detection and root cause analysis.
- [177] arXiv:2405.09333 [pdf, ps, html, other]
-
Title: Application of Gated Recurrent Units for CT Trajectory Optimization
Comments: 4 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in computed tomography (CT) imaging, especially with dual-robot systems, have introduced new challenges for scan trajectory optimization. This paper presents a novel approach using Gated Recurrent Units (GRUs) to optimize CT scan trajectories. Our approach exploits the flexibility of robotic CT systems to select projections that enhance image quality by improving resolution and contrast while reducing scan time. We focus on cone-beam CT and employ several projection-based metrics, including absorption, pixel intensities, contrast-to-noise ratio, and data completeness. The GRU network aims to minimize data redundancy and maximize completeness with a limited number of projections. We validate our method using simulated data of a test specimen, focusing on a specific voxel of interest. The results show that the GRU-optimized scan trajectories can outperform traditional circular CT trajectories in terms of image quality metrics. For the used specimen, SSIM improves from 0.38 to 0.49 and CNR increases from 6.97 to 9.08. This finding suggests that the application of GRU in CT scan trajectory optimization can lead to more efficient, cost-effective, and high-quality imaging solutions.
- [178] arXiv:2405.09334 [pdf, ps, html, other]
-
Title: Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study
Comments: 23 pages, 9 Figures, 13 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
While content-based image retrieval (CBIR) has been extensively studied in natural image retrieval, its application to medical images presents ongoing challenges, primarily due to the 3D nature of medical images. Recent studies have shown the potential use of pre-trained vision embeddings for CBIR in the context of radiology image retrieval. However, a benchmark for the retrieval of 3D volumetric medical images is still lacking, hindering the ability to objectively evaluate and compare the efficiency of proposed CBIR approaches in medical imaging. In this study, we extend previous work and establish a benchmark for region-based and multi-organ retrieval using the TotalSegmentator dataset (TS) with detailed multi-organ annotations. We benchmark embeddings derived from pre-trained supervised models on medical images against embeddings derived from pre-trained unsupervised models on non-medical images for 29 coarse and 104 detailed anatomical structures in volume and region levels. We adopt a late interaction re-ranking method inspired by text matching for image retrieval, comparing it against the original method proposed for volume and region retrieval achieving retrieval recall of 1.0 for diverse anatomical regions with a wide size range. The findings and methodologies presented in this paper provide essential insights and benchmarks for the development and evaluation of CBIR approaches in the context of medical imaging.
- [179] arXiv:2405.09335 [pdf, ps, html, other]
-
Title: Prompting-based Synthetic Data Generation for Few-Shot Question Answering
Comments: LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.
- [180] arXiv:2405.09336 [pdf, ps, html, other]
-
Title: Analytical Characterization of the Operational Diversity Order in Fading Channels
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We introduce and characterize the operational diversity order (ODO) in fading channels, as a proxy to the classical notion of diversity order at any arbitrary operational signal-to-noise ratio (SNR). Thanks to this definition, relevant insights are brought up in a number of cases: (i) We quantify that in line-of-sight scenarios an increased diversity order is attainable compared to that achieved asymptotically; (ii) this effect is attenuated, but still visible, in the presence of an additional dominant specular component; (iii) we confirm that the decay slope in Rayleigh product channels increases very slowly and never fully achieves unitary slope for finite values of SNR.
- [181] arXiv:2405.09338 [pdf, ps, html, other]
-
Title: Interval Selection in Sliding Windows
Comments: 22 pages, 6 figures
Subjects: Data Structures and Algorithms (cs.DS)
We initiate the study of the Interval Selection problem in the (streaming) sliding window model of computation.
In this problem, an algorithm receives a potentially infinite stream of intervals on the line, and the objective is to maintain at every moment an approximation to a largest possible subset of disjoint intervals among the $L$ most recent intervals, for some integer $L$.
We give the following results:
- In the unit-length intervals case, we give a $2$-approximation sliding window algorithm with space $\tilde{\mathrm{O}}(|OPT|)$, and we show that any sliding window algorithm that computes a $(2-\varepsilon)$-approximation requires space $\Omega(L)$, for any $\varepsilon > 0$.
- In the arbitrary-length case, we give a $(\frac{11}{3}+\varepsilon)$-approximation sliding window algorithm with space $\tilde{\mathrm{O}}(|OPT|)$, for any constant $\varepsilon > 0$, which constitutes our main result.
We also show that space $\Omega(L)$ is needed for algorithms that compute a $(2.5-\varepsilon)$-approximation, for any $\varepsilon > 0$.
Our main technical contribution is an improvement over the smooth histogram technique, which consists of running independent copies of a traditional streaming algorithm with different start times. By employing the one-pass $2$-approximation streaming algorithm by Cabello and Pérez-Lantero [Theor. Comput. Sci. '17] for Interval Selection on arbitrary-length intervals as the underlying algorithm, the smooth histogram technique immediately yields a $(4+\varepsilon)$-approximation in this setting. Our improvement is obtained by forwarding the structure of the intervals identified in a run to the subsequent run, which constrains the shape of an optimal solution and allows us to target optimal intervals differently.
- [182] arXiv:2405.09341 [pdf, ps, html, other]
-
Title: Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Existing debiasing methods inevitably make unreasonable or undesired predictions as they are designated and evaluated to achieve parity across different social groups but leave aside individual facts, resulting in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark BiasKE leveraging existing and additional constructed datasets, which systematically assesses debiasing performance by complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration on individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering overall model capability for knowledge preservation, highlighting the prospect of fine-grained debiasing strategies for editable fairness in LLMs.
- [183] arXiv:2405.09342 [pdf, ps, html, other]
-
Title: Progressive Depth Decoupling and Modulating for Flexible Depth Completion
Comments: The article is accepted by IEEE Transactions on Instrumentation & Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image-guided depth completion aims at generating a dense depth map from sparse LiDAR data and an RGB image. Recent methods have shown promising performance by reformulating it as a classification problem with two sub-tasks: depth discretization and probability prediction. They divide the depth range into several discrete depth values as depth categories, serving as priors for scene depth distributions. However, previous depth discretization methods are easily affected by depth distribution variations across different scenes, resulting in suboptimal scene depth distribution priors. To address this problem, we propose a progressive depth decoupling and modulating network, which incrementally decouples the depth range into bins and adaptively generates multi-scale dense depth maps in multiple stages. Specifically, we first design a Bins Initializing Module (BIM) to construct the seed bins by exploring the depth distribution information within a sparse depth map, adapting to variations in depth distribution. Then, we devise an incremental depth decoupling branch to progressively refine the depth distribution information from global to local. Meanwhile, an adaptive depth modulating branch is developed to progressively improve the probability representation from coarse-grained to fine-grained. Bi-directional information interactions are proposed to strengthen the exchange between these two branches (sub-tasks), promoting information complementation in each branch. Further, we introduce a multi-scale supervision mechanism to learn the depth distribution information in latent features and enhance the adaptation capability across different scenes. Experimental results on public datasets demonstrate that our method outperforms the state-of-the-art methods. The code will be open-sourced at this https URL.
- [184] arXiv:2405.09344 [pdf, ps, html, other]
-
Title: Measurements of Building Attenuation in 450 MHz LTE Networks
Comments: Author's version of a paper accepted for publication in Proceedings of the 28th ITG-Symposium Mobile Communication - Technologies and Applications
Subjects: Networking and Internet Architecture (cs.NI)
This work reports on a measurement study to estimate the attenuation of 450 MHz LTE networks. The LTE band 72 is currently deployed in Germany, in particular for smart grid applications. Due to this use case, we assume that a significant number of future devices will be deployed stationary and indoors, which motivated our campaign. We designed a custom measurement device which uses commercial off-the-shelf hardware to assess the downlink RSRP of a public mobile network. In addition, software has been developed to enable non-experts to conduct these measurements in the future. This software makes it possible to determine the indoor position based on ground plans. We conducted measurements at three different buildings. Our results reveal that the building attenuation of 450 MHz LTE networks is highly heterogeneous and mainly depends on the type of the building, the indoor position and in particular the height of the floor where the device is located.
- [185] arXiv:2405.09355 [pdf, ps, html, other]
-
Title: Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction
Authors: Gary Sarwin, Alessandro Carretta, Victor Staartjes, Matteo Zoli, Diego Mazzatenta, Luca Regli, Carlo Serra, Ender Konukoglu
Comments: Early Accept at MICCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks, as well as difficulties due to the endoscopic device such as a limited field of view and challenging lighting conditions. Expert knowledge shaped by years of experience is required for localization within the human body during endoscopic procedures. In this work, we present a deep learning method based on anatomy recognition, that constructs a surgical path in an unsupervised manner from surgical videos, modelling relative location and variations due to different viewing angles. At inference time, the model can map an unseen video's frames on the path and estimate the viewing angle, aiming to provide guidance, for instance, to reach a particular destination. We test the method on a dataset consisting of surgical videos of transsphenoidal adenomectomies, as well as on a synthetic dataset. An online tool that lets researchers upload their surgical videos to obtain anatomy detections and the weights of the trained YOLOv7 model are available at: this https URL.
- [186] arXiv:2405.09356 [pdf, ps, html, other]
-
Title: Branch-and-price with novel cuts, and a new Stackelberg Security Game
Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
Anticipating the strategies of potential attackers is crucial for protecting critical infrastructure. We can represent the challenge of the defenders of such infrastructure as a Stackelberg security game. The defender must decide how to allocate limited resources to protect specific targets, aiming to maximize their expected utility (such as minimizing the extent of damage) and considering that attackers will respond in a way that is most advantageous to them.
We present novel valid inequalities to find a Strong Stackelberg Equilibrium in both Stackelberg games and Stackelberg security games. We also consider a Stackelberg security game that aims to protect targets with a defined budget. We use branch-and-price in this game to show that our approach outperforms the standard formulation in the literature, and we conduct an extensive computational study to analyze the impact of various branch-and-price parameters on the performance of our method in different game settings.
- [187] arXiv:2405.09357 [pdf, ps, html, other]
-
Title: A universal optimization framework based on cycle ranking for influence maximization in complex networks
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Influence maximization aims to identify a set of influential individuals, referred to as influencers, as information sources to maximize the spread of information within networks, constituting a vital combinatorial optimization problem with extensive practical applications and sustained interdisciplinary interest. Diverse approaches have been devised to efficiently address this issue, one of which involves selecting the influencers from a given centrality ranking. In this paper, we propose a novel optimization framework based on ranking basic cycles in networks, capable of selecting the influencers from diverse centrality measures. The experimental results demonstrate that, compared to directly selecting the top-k nodes from centrality sequences and other state-of-the-art optimization approaches, the new framework can expand the dissemination range by 1.5 to 3 times. Counterintuitively, it exhibits minimal hub property, with the average distance between influencers being only one-third of alternative approaches, regardless of the centrality metrics or network types. Our study not only paves the way for novel strategies in influence maximization but also underscores the unique potential of underappreciated cycle structures.
- [188] arXiv:2405.09359 [pdf, ps, html, other]
-
Title: Visual Attention Based Cognitive Human-Robot Collaboration for Pedicle Screw Placement in Robot-Assisted Orthopedic Surgery
Comments: 7 pages, 8 figures, submitted to IROS 2024
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Current orthopedic robotic systems largely focus on navigation, aiding surgeons in positioning a guiding tube but still requiring manual drilling and screw placement. The automation of this task not only demands high precision and safety due to the intricate physical interactions between the surgical tool and bone but also poses significant risks when executed without adequate human oversight. As it involves continuous physical interaction, the robot should collaborate with the surgeon, understand the human intent, and always include the surgeon in the loop. To achieve this, this paper proposes a new cognitive human-robot collaboration framework, including the intuitive AR-haptic human-robot interface, the visual-attention-based surgeon model, and the shared interaction control scheme for the robot. User studies on a robotic platform for orthopedic surgery are presented to illustrate the performance of the proposed method. The results demonstrate that the proposed human-robot collaboration framework outperforms full robot and full human control in terms of safety and ergonomics.
- [189] arXiv:2405.09360 [pdf, ps, html, other]
-
Title: The Unfairness of $\varepsilon$-Fairness
Subjects: Machine Learning (cs.LG); Theoretical Economics (econ.TH); Mathematical Finance (q-fin.MF); Machine Learning (stat.ML)
Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of decision-making processes. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the necessary actions to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. In summary, this paper highlights the importance of considering the real-world context when evaluating fairness.
- [190] arXiv:2405.09365 [pdf, ps, html, other]
-
Title: SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Synthetic aperture radar (SAR) is essential in actively acquiring information for Earth observation. SAR Automatic Target Recognition (ATR) focuses on detecting and classifying various target categories under different image conditions. The current deep learning-based SAR ATR methods are typically designed for specific datasets and applications. Various target characteristics, scene background information, and sensor parameters across ATR datasets challenge the generalization of those methods. This paper aims to achieve general SAR ATR based on a foundation model with Self-Supervised Learning (SSL). Our motivation is to break through the specific dataset and condition limitations and obtain universal perceptual capabilities across the target, scene, and sensor. A foundation model named SARATR-X is proposed with the following four aspects: pre-training dataset, model backbone, SSL, and evaluation task. First, we integrated 14 datasets with various target categories and imaging conditions as a pre-training dataset. Second, different model backbones were discussed to find the most suitable approaches for remote-sensing images. Third, we applied two-stage training and SAR gradient features to ensure the diversity and scalability of SARATR-X. Finally, SARATR-X has achieved competitive and superior performance on 5 datasets with 8 task settings, which shows that the foundation model can achieve universal SAR ATR. We believe it is time to embrace fundamental models for SAR image interpretation in the era of increasing big data.
- [191] arXiv:2405.09367 [pdf, ps, html, other]
-
Title: Efficient WENO schemes for nonuniform grids
Subjects: Numerical Analysis (math.NA)
A set of arbitrarily high-order WENO schemes for reconstructions on nonuniform grids is presented. These non-linear interpolation methods use simple smoothness indicators with a linear cost with respect to the order, making them easy to implement and computationally efficient. The theoretical analysis to verify the accuracy and the essentially non-oscillatory properties are presented together with some numerical experiments involving algebraic problems in order to validate them. Also, these general schemes are applied for the solution of conservation laws and hyperbolic systems in the context of finite volume methods.
- [192] arXiv:2405.09369 [pdf, ps, html, other]
-
Title: Diffusion-based Contrastive Learning for Sequential Recommendation
Subjects: Information Retrieval (cs.IR)
Contrastive learning has been effectively applied to alleviate the data sparsity issue and enhance recommendation performance. The majority of existing methods employ random augmentation to generate augmented views of original sequences. The learning objective then aims to minimize the distance between representations of different views for the same user. However, these random augmentation strategies (e.g., mask or substitution) neglect the semantic consistency of different augmented views for the same user, leading to semantically inconsistent sequences with similar representations. Furthermore, most augmentation methods fail to utilize context information, which is critical for understanding sequence semantics. To address these limitations, we introduce a diffusion-based contrastive learning approach for sequential recommendation. Specifically, given a user sequence, we first select some positions and then leverage context information to guide the generation of alternative items via a guided diffusion model. By repeating this approach, we can get semantically consistent augmented views for the same user, which are used to improve the effectiveness of contrastive learning. To maintain cohesion between the representation spaces of both the diffusion model and the recommendation model, we train the entire framework in an end-to-end fashion with shared item embeddings. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method.
- [193] arXiv:2405.09372 [pdf, ps, html, other]
-
Title: Investigating the Effect of Operation Mode and Manifestation on Physicalizations of Dynamic Processes
Subjects: Human-Computer Interaction (cs.HC)
We conducted a study to systematically investigate the communication of complex dynamic processes along a two-dimensional design space, where the axes represent a representation's manifestation (physical or virtual) and operation (manual or automatic). We exemplify the design space on a model embodying cardiovascular pathologies, represented by a mechanism where a liquid is pumped into a draining vessel, with complications illustrated through modifications to the model. The results of a mixed-methods lab study with 28 participants show that both physical manifestation and manual operation have a strong positive impact on the audience's engagement. The study does not show a measurable knowledge increase with respect to cardiovascular pathologies using manually operated physical representations. However, subjectively, participants report a better understanding of the process, mainly through non-visual cues like haptics, but also auditory cues. The study also indicates an increased task load when interacting with the process, which, however, seems to play a minor role for the participants. Overall, the study shows a clear potential of physicalization for the communication of complex dynamic processes, which only fully unfolds if observers have the chance to interact with the process.
- [194] arXiv:2405.09373 [pdf, ps, html, other]
-
Title: PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
Subjects: Computation and Language (cs.CL)
Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research.
- [195] arXiv:2405.09375 [pdf, ps, html, other]
-
Title: VascularPilot3D: Toward a 3D fully autonomous navigation for endovascular robotics
Authors: Song Jingwei, Yang Keke, Chen Han, Liu Jiayi, Gu Yinan, Hui Qianxin, Huang Yanqi, Li Meng, Zhang Zheng, Cao Tuoyu, Ghaffari Maani
Comments: Submitted to MICCAI2024
Subjects: Robotics (cs.RO)
This research reports VascularPilot3D, the first 3D fully autonomous endovascular robot navigation system. As an exploration toward autonomous guidewire navigation, VascularPilot3D is developed as a complete navigation system based on intra-operative imaging systems (fluoroscopic X-ray in this study) and typical endovascular robots. VascularPilot3D adopts previously researched fast 3D-2D vessel registration algorithms and guidewire segmentation methods as its perception modules. We additionally propose three modules: a topology-constrained 2D-3D instrument end-point lifting method, a tree-based fast path planning algorithm, and a prior-free endovascular navigation strategy. VascularPilot3D is compatible with most mainstream endovascular robots. Ex-vivo experiments validate that VascularPilot3D achieves a 100% success rate across 25 trials. It reduces the human surgeon's overall control loops by 18.38%. VascularPilot3D is promising for general clinical autonomous endovascular navigation.
- [196] arXiv:2405.09391 [pdf, ps, html, other]
-
Title: Compositional imprecise probability
Comments: Draft. Feedback welcome
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO); Category Theory (math.CT); Probability (math.PR)
Imprecise probability is concerned with uncertainty about which probability distributions to use. It has applications in robust statistics and elsewhere. Imprecise probability can be modelled in various ways, including by convex sets of probability distributions.
We look at programming language models for imprecise probability. Our desiderata are that we would like our model to support all kinds of composition, categorical and monoidal, in other words, guided by dataflow diagrams. Another equivalent perspective is that we would like a model of synthetic probability in the sense of Markov categories.
There is already a fairly popular monad-based approach to imprecise probability, but it is not fully compositional because the monad involved is not commutative, which means that we do not have a proper monoidal structure. In this work, we provide a new fully compositional account. The key idea is to name the non-deterministic choices. To manage the renamings and disjointness of names, we use graded monads. We show that the resulting compositional model is maximal. We relate with the earlier monad approach, showing that we obtain tighter bounds on the uncertainty.
- [197] arXiv:2405.09393 [pdf, ps, html, other]
Title: Counting overlapping pairs of strings
Comments: - HAL: lirmm-04576588v1 - this https URL - 16 pages, 1 figure, 2 tables, appendix A, B, C - Mots clés: corrélation, chevauchement, bordure, combinatoire, limites, espérance, population, algorithmique, treillis - Keywords: correlation, overlap, border, counting, bounds, expectation, Asymptotics, limits, population, stringology, lattice
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
A correlation is a binary vector that encodes all possible positions of overlaps of two words, where an overlap for an ordered pair of words (u,v) occurs if a suffix of word u matches a prefix of word v. As multiple pairs can have the same correlation, it is relevant to count how many pairs of words share the same correlation depending on the alphabet size and word length n. We exhibit recurrences to compute the number of such pairs -- which is termed population size -- for any correlation; for this, we exploit a relationship between overlaps of two words and self-overlap of one word. This theorem allows us to compute the number of pairs with a longest overlap of a given length and to show that the expected length of the longest border of two words asymptotically diverges, which solves two open questions raised by Gabric in 2022. Finally, we also provide bounds for the asymptotic of the population ratio of any correlation. Given the importance of word overlaps in areas like word combinatorics, bioinformatics, and digital communication, our results may ease analyses of algorithms for string processing, code design, or genome assembly.
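The definition of a correlation above can be illustrated with a small sketch. It assumes both words have the same length n and that bit i of the vector is set when the suffix of u starting at index i equals the prefix of v of the same length; this indexing convention, and the function names `correlation`, `longest_overlap`, and `population_sizes`, are assumptions made for illustration and are not taken from the paper. The population sizes are counted by brute force here, not by the recurrences the abstract refers to.

```python
from collections import Counter
from itertools import product


def correlation(u: str, v: str) -> list[int]:
    """Return the overlap vector of the ordered pair (u, v).

    Assumed convention: bit i is 1 when the suffix of u starting at
    index i equals the prefix of v of the same length.
    """
    if len(u) != len(v):
        raise ValueError("this sketch assumes words of equal length")
    n = len(u)
    return [1 if u[i:] == v[: n - i] else 0 for i in range(n)]


def longest_overlap(u: str, v: str) -> int:
    """Length of the longest suffix of u that is also a prefix of v."""
    n = len(u)
    for i, bit in enumerate(correlation(u, v)):
        if bit:
            return n - i
    return 0


def population_sizes(alphabet: str, n: int) -> Counter:
    """Brute-force count of ordered pairs of length-n words per correlation."""
    words = ["".join(p) for p in product(alphabet, repeat=n)]
    return Counter(tuple(correlation(u, v)) for u in words for v in words)
```

Brute force enumerates all pairs of words, so it is only practical for very small alphabets and word lengths; it is useful mainly as a check against a faster method.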
- [198] arXiv:2405.09394 [pdf, ps, html, other]
Title: SA-FedLora: Adaptive Parameter Allocation for Efficient Federated Learning with LoRA Tuning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Fine-tuning large-scale pre-trained models via transfer learning is an emerging important paradigm for a wide range of downstream tasks, with performance heavily reliant on extensive data. Federated learning (FL), as a distributed framework, provides a secure solution to train models on local datasets while safeguarding raw sensitive data. However, FL networks encounter high communication costs due to the massive parameters of large-scale pre-trained models, necessitating parameter-efficient methods. Notably, parameter-efficient fine-tuning, such as Low-Rank Adaptation (LoRA), has shown remarkable success in fine-tuning pre-trained models. However, prior research indicates that a fixed parameter budget may be prone to overfitting or slower convergence. To address this challenge, we propose a Simulated Annealing-based Federated Learning with LoRA tuning (SA-FedLoRA) approach that reduces trainable parameters. Specifically, SA-FedLoRA comprises two stages: initiating and annealing. (1) In the initiating stage, we implement a parameter regularization approach during the early rounds of aggregation, aiming to mitigate client drift and accelerate convergence for the subsequent tuning. (2) In the annealing stage, we allocate a higher parameter budget during the early 'heating' phase and then gradually shrink the budget until the 'cooling' phase. This strategy not only facilitates convergence to the global optimum but also reduces communication costs. Experimental results demonstrate that SA-FedLoRA is an efficient FL approach, achieving superior performance to FedAvg and significantly reducing communication parameters by up to 93.62%.
- [199] arXiv:2405.09396 [pdf, ps, html, other]
Title: $O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof
Comments: dlt 2024
Subjects: Formal Languages and Automata Theory (cs.FL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Logic (math.LO)
Classifying formal languages according to the expressiveness of grammars able to generate them is a fundamental problem in computational linguistics and, therefore, in the theory of computation. Furthermore, this kind of analysis can give insight into the classification of abstract algebraic structures such as groups, for example through the correspondence given by the word problem. While many such classification problems remain open, others have been settled. Recently, it was proved that $n$-balanced languages (i.e., whose strings contain the same number of occurrences of letters $a_i$ and $A_i$ with $1\leq i \leq n$) can be generated by multiple context-free grammars (MCFGs), which are one of the several slight extensions of context-free grammars added to the classical Chomsky hierarchy to make the mentioned classification more precise. This paper analyses the existing proofs from the computational and the proof-theoretical points of view, systematically studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm parsing balanced languages via MCFGs. We conclude that none of the existing proofs is realistically suitable for this practical goal, and proceed to provide a radically new, elementary, extremely short proof for the crucial case $n \leq 2$. A comparative analysis with respect to the existing proofs is finally performed to justify why the proposed proof is a substantial step towards concretely obtaining a verified parsing algorithm for $O_2$.
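The membership condition for $n$-balanced languages stated above can be sketched directly. This is only a counting check, not an MCFG parser and not the proof technique of the paper. The encoding of $a_1, A_1, a_2, A_2$ as the characters `a`, `A`, `b`, `B` and the function name `is_balanced` are assumptions made for illustration.

```python
from collections import Counter


def is_balanced(word: str, pairs=(("a", "A"), ("b", "B"))) -> bool:
    """Check that each letter occurs as often as its paired letter.

    The default pairs model the case n = 2, with a_1/A_1 written as
    a/A and a_2/A_2 written as b/B (an illustrative encoding).
    """
    allowed = {ch for pair in pairs for ch in pair}
    if any(ch not in allowed for ch in word):
        return False  # symbols outside the alphabet are rejected
    counts = Counter(word)
    return all(counts[lower] == counts[upper] for lower, upper in pairs)
```

Counting decides membership in linear time; the harder question addressed by the abstract is exhibiting a grammar that generates exactly these strings.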
- [200] arXiv:2405.09398 [pdf, ps, html, other]
Title: Encrypted Container File: Design and Implementation of a Hybrid-Encrypted Multi-Recipient File Structure
Comments: 7 pages, for associated implementation etc., see this https URL
Journal-ref: Proc of the 14th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2023), Nice, France, June 2023, pp. 1-7, ISSN 2308-4294
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Modern software engineering trends towards Cloud-native software development by international teams of developers. Cloud-based version management services, such as GitHub, are used for the source code and other artifacts created during the development process. However, using such a service usually means that every developer has access to all data stored on the platform. Particularly, if the developers belong to different companies or organizations, it would be desirable for sensitive files to be encrypted in such a way that these can only be decrypted again by a group of previously defined people. In this paper, we examine currently available tools that address this problem, but which have certain shortcomings. We then present our own solution, Encrypted Container Files (ECF), for this problem, eliminating the deficiencies found in the other tools.
- [201] arXiv:2405.09400 [pdf, ps, html, other]
Title: Flow updates for domain decomposition of entropic optimal transport
Subjects: Numerical Analysis (math.NA)
Domain decomposition has been shown to be a computationally efficient distributed method for solving large scale entropic optimal transport problems. However, a naive implementation of the algorithm can freeze in the limit of very fine partition cells (i.e., it asymptotically becomes stationary and does not find the global minimizer), since information can only travel slowly between cells. In practice this can be avoided by a coarse-to-fine multiscale scheme. In this article we introduce flow updates as an alternative approach. Flow updates can be interpreted as a variant of the celebrated algorithm by Angenent, Haker, and Tannenbaum, and can be combined canonically with domain decomposition. We prove convergence to the global minimizer and provide a formal discussion of its continuity limit. We give a numerical comparison with naive and multiscale domain decomposition, and show that the hybrid method does not suffer from freezing in the regime of very many cells. While the multiscale scheme is observed to be faster than the hybrid approach in general, the latter could be a viable alternative in cases where a good initial coupling is available. Our numerical experiments are based on a novel GPU implementation of domain decomposition that we describe in the appendix.
- [202] arXiv:2405.09403 [pdf, ps, html, other]
Title: Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A fundamental tenet of pattern recognition is that overlap between training and testing sets causes an optimistic accuracy estimate. Deep CNNs for face recognition are trained for N-way classification of the identities in the training set. Accuracy is commonly estimated as average 10-fold classification accuracy on image pairs from test sets such as LFW, CALFW, CPLFW, CFP-FP and AgeDB-30. Because train and test sets have been independently assembled, images and identities in any given test set may also be present in any given training set. In particular, our experiments reveal a surprising degree of identity and image overlap between the LFW family of test sets and the MS1MV2 training set. Our experiments also reveal identity label noise in MS1MV2. We compare accuracy achieved with same-size MS1MV2 subsets that are identity-disjoint and not identity-disjoint with LFW, to reveal the size of the optimistic bias. Using more challenging test sets from the LFW family, we find that the size of the optimistic bias is larger for more challenging test sets. Our results highlight the lack of and the need for identity-disjoint train and test methodology in face recognition research.
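The identity-disjoint comparison described above amounts to set operations on identity labels. The sketch below assumes identities are given as comparable label strings and uses made-up names (`identity_overlap`, `make_identity_disjoint`); matching identities across real datasets such as MS1MV2 and LFW is harder, since labels differ between sources and, as the abstract notes, can be noisy.

```python
def identity_overlap(train_ids, test_ids):
    """Return the shared identities and the fraction of test identities affected."""
    train, test = set(train_ids), set(test_ids)
    shared = train & test
    fraction = len(shared) / len(test) if test else 0.0
    return shared, fraction


def make_identity_disjoint(train_samples, test_ids):
    """Drop every (image, identity) training sample whose identity is in the test set."""
    blocked = set(test_ids)
    return [(image, ident) for image, ident in train_samples if ident not in blocked]
```

Comparing a model trained on the filtered list with one trained on an unfiltered subset of the same size is the kind of experiment the abstract uses to expose the optimistic bias.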
- [203] arXiv:2405.09404 [pdf, ps, html, other]
Title: Time-Equivariant Contrastive Learning for Degenerative Disease Progression in Retinal OCT
Authors: Taha Emre, Arunava Chakravarty, Dmitrii Lachinov, Antoine Rivail, Ursula Schmidt-Erfurth, Hrvoje Bogunović
Comments: Accepted at MICCAI 2024 (early accept, top 11%)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Contrastive pretraining provides robust representations by ensuring their invariance to different image transformations while simultaneously preventing representational collapse. Equivariant contrastive learning, on the other hand, provides representations sensitive to specific image transformations while remaining invariant to others. By introducing equivariance to time-induced transformations, such as disease-related anatomical changes in longitudinal imaging, the model can effectively capture such changes in the representation space. In this work, we propose a Time-equivariant Contrastive Learning (TC) method. First, an encoder embeds two unlabeled scans from different time points of the same patient into the representation space. Next, a temporal equivariance module is trained to predict the representation of a later visit based on the representation from one of the previous visits and the corresponding time interval with a novel regularization loss term while preserving the invariance property to irrelevant image transformations. On a large longitudinal dataset, our model clearly outperforms existing equivariant contrastive methods in predicting progression from intermediate age-related macular degeneration (AMD) to advanced wet-AMD within a specified time-window.
- [204] arXiv:2405.09405 [pdf, ps, html, other]
Title: On identifying the non-linear dynamics of a hovercraft using an end-to-end deep learning approach
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We present the identification of the non-linear dynamics of a novel hovercraft design, employing end-to-end deep learning techniques. Our experimental setup consists of a hovercraft propelled by racing drone propellers mounted on a lightweight foam base, allowing it to float and be controlled freely on an air hockey table. We learn parametrized physics-inspired non-linear models directly from data trajectories, leveraging gradient-based optimization techniques prevalent in machine learning research. The chosen model structure allows us to control the position of the hovercraft precisely on the air hockey table. We then analyze the prediction performance and demonstrate the closed-loop control performance on the real system.
- [205] arXiv:2405.09406 [pdf, ps, html, other]
Title: Bounded-Memory Strategies in Partial-Information Games
Subjects: Computer Science and Game Theory (cs.GT)
We study the computational complexity of solving stochastic games with mean-payoff objectives. Instead of identifying special classes in which simple strategies are sufficient to play $\epsilon$-optimally, or form $\epsilon$-Nash equilibria, we consider general partial-information multiplayer games and ask what can be achieved with (and against) finite-memory strategies up to a given bound on the memory. We show $NP$-hardness for approximating zero-sum values, already with respect to memoryless strategies and for 1-player reachability games. On the other hand, we provide upper bounds for solving games of any fixed number of players $k$. We show that one can decide in polynomial space if, for a given $k$-player game, $\epsilon\ge 0$ and bound $b$, there exists an $\epsilon$-Nash equilibrium in which all strategies use at most $b$ memory modes. For given $\epsilon>0$, finding an $\epsilon$-Nash equilibrium with respect to $b$-bounded strategies can be done in $FN[NP]$. Similarly for 2-player zero-sum games, finding a $b$-bounded strategy that, against all $b$-bounded opponent strategies, guarantees an outcome within $\epsilon$ of a given value, can be done in $FNP[NP]$. Our constructions apply to parity objectives with minimal simplifications. Our results improve the status quo in several well-known special cases of games. In particular, for $2$-player zero-sum concurrent mean-payoff games, one can approximate ordinary zero-sum values (without restricting admissible strategies) in $FNP[NP]$.
- [206] arXiv:2405.09408 [pdf, ps, html, other]
Title: A velocity-based moving mesh Discontinuous Galerkin method for the advection-diffusion equation
Comments: 20 pages, 2 figures, Submitted to SINUM on 16/05/2024, not yet reviewed (15/05/2024)
Subjects: Numerical Analysis (math.NA)
In convection-dominated flows, robustness of the spatial discretisation is a key property. While Interior Penalty Galerkin (IPG) methods have already proved efficient in the situation of large mesh Peclet numbers, Arbitrary Lagrangian-Eulerian (ALE) methods are able to reduce the convection-dominance by moving the mesh. In this paper, we introduce and analyse a velocity-based moving mesh discontinuous Galerkin method for the solution of the linear advection-diffusion equation. By introducing a smooth parameterized velocity $\tilde{V}$ that separates the flow into a mean flow, also called moving mesh velocity, and a remaining advection field $V-\tilde{V}$, we carry out a convergence analysis based on the smoothness of the mesh velocity. Furthermore, the reduction of the advection speed improves the stability of an explicit time-stepping and the use of the nonconservative ALE formulation changes the coercivity condition. Finally, by adapting the existing robust error criteria to this moving mesh situation, we derive robust a posteriori error criteria that describe the potentially small deviation to the mean flow and include the information of a transition towards $V=\tilde{V}$.
- [207] arXiv:2405.09409 [pdf, ps, other]
Title: Real-World Federated Learning in Radiology: Hurdles to overcome and Benefits to gain
Authors: Markus R. Bujotzek, Ünal Akünal, Stefan Denner, Peter Neher, Maximilian Zenk, Eric Frodl, Astha Jaiswal, Moon Kim, Nicolai R. Krekiehn, Manuel Nickel, Richard Ruppel, Marcus Both, Felix Döllinger, Marcel Opitz, Thorsten Persigehl, Jens Kleesiek, Tobias Penzkofer, Klaus Maier-Hein, Rickmer Braren, Andreas Bucher
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Objective: Federated Learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles, leaving behind a significant knowledge gap. Minding efforts to implement real-world FL, there is a notable lack of comprehensive assessment comparing FL to less complex alternatives. Materials & Methods: We extensively reviewed FL literature, categorizing insights along with our findings according to their nature and phase while establishing a FL initiative, summarized to a comprehensive guide. We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. We extensively evaluated FL against less complex alternatives in three distinct evaluation scenarios. Results: The proposed guide outlines essential steps, identified hurdles, and proposed solutions for establishing successful FL initiatives conducting real-world experiments. Our experimental results show that FL outperforms less complex alternatives in all evaluation scenarios, justifying the effort required to translate FL into real-world applications. Discussion & Conclusion: Our proposed guide aims to aid future FL researchers in circumventing pitfalls and accelerating translation of FL into radiological applications. Our results underscore the value of efforts needed to translate FL into real-world applications by demonstrating advantageous performance over alternatives, and emphasize the importance of strategic organization, robust management of distributed data and infrastructure in real-world settings.
- [208] arXiv:2405.09412 [pdf, ps, html, other]
Title: Distinguishing Tor From Other Encrypted Network Traffic Through Character Analysis
Comments: 5 pages
Journal-ref: Proc of the 15th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2024), Venice, Italy, May 2024, pp. 8-12, ISSN 2308-4294
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
For journalists reporting from a totalitarian regime, whistleblowers and resistance fighters, the anonymous use of cloud services on the Internet can be vital for survival. The Tor network provides a free and widely used anonymization service for everyone. However, there are different approaches to distinguishing Tor from non-Tor encrypted network traffic, most recently based only on the (relative) frequencies of hex digits in a single encrypted payload packet. Conventional data traffic is usually encrypted once, whereas Tor traffic is encrypted at least three times due to the structure and principle of the Tor network. We have therefore examined to what extent the number of encryptions contributes to being able to distinguish Tor from non-Tor encrypted data traffic.
- [209] arXiv:2405.09414 [pdf, ps, html, other]
Title: Improving the convergence analysis of linear subdivision schemes
Subjects: Numerical Analysis (math.NA)
This work presents several new results concerning the analysis of the convergence of binary, univariate, and linear subdivision schemes, all related to the contractivity factor of a convergent scheme. First, we prove that a convergent scheme cannot have a contractivity factor lower than one half. Since the lower this factor is, the faster the scheme's convergence, schemes with contractivity factor $\frac{1}{2}$, such as those generating spline functions, have optimal convergence rates.
Additionally, we provide further insights and conditions for the convergence of linear schemes and demonstrate their applicability in an improved algorithm for determining the convergence of such subdivision schemes.
- [210] arXiv:2405.09415 [pdf, ps, html, other]
Title: On the Correspondence of Non-flat Assumption-based Argumentation and Logic Programming with Negation as Failure in the Head
Subjects: Artificial Intelligence (cs.AI)
The relation between (a fragment of) assumption-based argumentation (ABA) and logic programs (LPs) under stable model semantics is well-studied. However, for obtaining this relation, the ABA framework needs to be restricted to being flat, i.e., a fragment where the (defeasible) assumptions can never be entailed, only assumed to be true or false. Here, we remove this restriction and show a correspondence between non-flat ABA and LPs with negation as failure in their head. We then extend this result to so-called set-stable ABA semantics, originally defined for the fragment of non-flat ABA called bipolar ABA. We showcase how to define set-stable semantics for LPs with negation as failure in their head and show the correspondence to set-stable ABA semantics.
- [211] arXiv:2405.09423 [pdf, ps, other]
Title: MicroPython Testbed for Federated Learning Algorithms
Comments: 20 pages, 6 figures, 12 tables, the extended paper preprint
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Recently, Python Testbed for Federated Learning Algorithms emerged as a low-code framework, amenable to generative large language models, for developing decentralized and distributed applications, primarily targeting edge systems, by nonprofessional programmers with the help of emerging artificial intelligence tools. This light framework is written in pure Python to be easy to install and to fit into a small IoT memory. It supports formally verified generic centralized and decentralized federated learning algorithms, as well as the peer-to-peer data exchange used in time division multiplexing communication, and its current main limitation is that all the application instances can run only on a single PC. This paper presents the MicroPython Testbed for Federated Learning Algorithms, the new framework that overcomes its predecessor's limitation such that individual application instances may run on different network nodes like PCs and IoTs, primarily in edge systems. The new framework carries on the pure Python ideal, is based on asynchronous I/O abstractions, and runs on MicroPython, and therefore is a great match for IoTs and devices in edge systems. The new framework was experimentally validated on a wireless network comprising PCs and Raspberry Pi Pico W boards, by using application examples originally developed for the predecessor framework.
- [212] arXiv:2405.09424 [pdf, ps, html, other]
Title: On backward problem for a time-fractional fourth order parabolic equation
Comments: Comments are welcome!
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
This paper is concerned with the inverse problem of retrieving the initial value of a time-fractional fourth order parabolic equation from source and final time observation. The considered problem is an ill-posed problem. We obtain regularized approximations for the sought initial value by employing the quasi-boundary value method, its modified version, and the Fourier truncation method (FTM). We provide both a priori and a posteriori parameter choice strategies and derive error estimates for all these methods under some source conditions involving some Sobolev smoothness. As an important implication of the obtained rates, we observe that for both the a priori and a posteriori cases, the rates obtained by all three methods are the same for some source sets. Moreover, we observe that in both the a priori and a posteriori cases, the FTM is free from the so-called saturation effect, whereas both the quasi-boundary value method and its generalizations possess the saturation effect in both cases. Further, we observe that the rates obtained by the FTM are always order optimal for all the considered source sets.
- [213] arXiv:2405.09425 [pdf, ps, html, other]
Title: Robust Covariance-Based Activity Detection for Massive Access
Comments: 5 pages, 11 figures. Asilomar SSC 2023 Conference
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The wireless channel is undergoing continuous changes, and the block-fading assumption, despite its popularity in theoretical contexts, never holds true in practical scenarios. This discrepancy is particularly critical for user activity detection in grant-free random access, where joint processing across multiple resource blocks is usually undesirable. In this paper, we propose employing a low-dimensional approximation of the channel to capture variations over time and frequency and robustify activity detection algorithms. This approximation entails projecting channel fading vectors onto their principal directions to minimize the approximation order. Through numerical examples, we demonstrate a substantial performance improvement achieved by the resulting activity detection algorithm.
- [214] arXiv:2405.09426 [pdf, ps, html, other]
Title: Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images
Authors: Memoona Aziz (Western University, Canada), Umair Rehman (Western University, Canada), Muhammad Umair Danish (Western University, Canada), Katarina Grolinger (Western University, Canada)
Comments: 10 pages, 3 figures. Submitted to IEEE Transactions on Human-Machine Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces the Global-Local Image Perceptual Score (GLIPS), an image metric designed to assess the photorealistic image quality of AI-generated images with a high degree of alignment to human visual perception. Traditional metrics such as FID and KID scores do not align closely with human evaluations. The proposed metric incorporates advanced transformer-based attention mechanisms to assess local similarity and Maximum Mean Discrepancy (MMD) to evaluate global distributional similarity. To evaluate the performance of GLIPS, we conducted a human study on photorealistic image quality. Comprehensive tests across various generative models demonstrate that GLIPS consistently outperforms existing metrics like FID, SSIM, and MS-SSIM in terms of correlation with human scores. Additionally, we introduce the Interpolative Binning Scale (IBS), a refined scaling method that enhances the interpretability of metric scores by aligning them more closely with human evaluative standards. The proposed metric and scaling approach not only provide more reliable assessments of AI-generated images but also suggest pathways for future enhancements in image generation technologies.
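For readers unfamiliar with Maximum Mean Discrepancy, the following is a minimal sketch of the standard biased estimate of squared MMD between two sets of feature vectors. The Gaussian (RBF) kernel, the bandwidth `sigma`, and the function names are assumptions chosen for illustration; the abstract does not state which kernel or features GLIPS uses.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between the samples xs and ys."""

    def mean_kernel(first, second):
        total = sum(rbf_kernel(p, q, sigma) for p in first for q in second)
        return total / (len(first) * len(second))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when the two samples coincide and grows as the distributions move apart, which is what makes it usable as a global distributional similarity term.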
- [215] arXiv:2405.09428 [pdf, ps, html, other]
Title: Physics-Informed Neural Network for Multirotor Slung Load Systems Modeling
Authors: Gil Serrano, Marcelo Jacinto, Jose Ribeiro-Gomes, Joao Pinto, Bruno J. Guerreiro, Alexandre Bernardino, Rita Cunha
Subjects: Robotics (cs.RO)
Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not densely represented in training data. In this work, we explore the use of physics informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of the future system states. An LSTM encoder decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee the cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles together with slack variables that allow for a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first principles physical model and a comparable neural network model trained without the physics regularization proposed.
- [216] arXiv:2405.09430 [pdf, ps, html, other]
Title: Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits
Subjects: Systems and Control (eess.SY)
In recent years, the integration of communication and control systems has gained significant traction in various domains, ranging from autonomous vehicles to industrial automation and beyond. Multi-armed bandit (MAB) algorithms have proven their effectiveness as a robust framework for solving control problems. In this work, we investigate the use of MAB algorithms to control remote devices, which faces considerable challenges primarily represented by latency and reliability. We analyze the effectiveness of MABs operating in environments where the action feedback from controlled devices is transmitted over an unreliable communication channel and stored in a Geo/Geo/1 queue. We investigate the impact of queue sampling strategies on the MAB performance, and introduce a new stochastic approach. Its performance in terms of regret is evaluated against established algorithms in the literature for both upper confidence bound (UCB) and Thompson Sampling (TS) algorithms. Additionally, we study the trade-off between maximizing rewards and minimizing energy consumption.
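As background for the UCB family mentioned above, here is a minimal sketch of the classical UCB1 arm-selection rule. It is the textbook rule, not the queue-sampling strategy proposed in the paper; the function name `ucb1_select` and its arguments are illustrative assumptions. In the remote-control setting of the abstract, `counts` and `means` would only reflect feedback that has actually been received from the queue, so they may lag behind the actions taken.

```python
import math


def ucb1_select(counts, means, t):
    """Pick an arm by the UCB1 rule.

    counts[i] is the number of rewards observed for arm i, means[i] is
    their empirical mean, and t is the current round (t >= 1).
    """
    # Any arm without feedback is tried first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        mean + math.sqrt(2.0 * math.log(t) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The exploration bonus shrinks as an arm accumulates observations, so delayed or lost feedback keeps the bonus inflated for longer, which is one intuition for why the sampling of queued feedback matters.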
- [217] arXiv:2405.09431 [pdf, ps, html, other]
Title: A Survey On Text-to-3D Contents Generation In The Wild
Comments: 11 pages, 10 figures, 4 tables. arXiv admin note: text overlap with arXiv:2401.17807 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
3D content creation plays a vital role in various applications, such as gaming, robotics simulation, and virtual reality. However, the process is labor-intensive and time-consuming, requiring skilled designers to invest considerable effort in creating a single 3D asset. To address this challenge, text-to-3D generation technologies have emerged as a promising solution for automating 3D creation. Leveraging the success of large vision language models, these techniques aim to generate 3D content based on textual descriptions. Despite recent advancements in this area, existing solutions still face significant limitations in terms of generation quality and efficiency. In this survey, we conduct an in-depth investigation of the latest text-to-3D creation methods. We provide a comprehensive background on text-to-3D creation, including discussions on datasets employed in training and evaluation metrics used to assess the quality of generated 3D models. Then, we delve into the various 3D representations that serve as the foundation for the 3D generation process. Furthermore, we present a thorough comparison of the rapidly growing literature on generative pipelines, categorizing them into feedforward generators, optimization-based generation, and view reconstruction approaches. By examining the strengths and weaknesses of these methods, we aim to shed light on their respective capabilities and limitations. Lastly, we point out several promising avenues for future research. With this survey, we hope to inspire researchers further to explore the potential of open-vocabulary text-conditioned 3D content creation.
- [218] arXiv:2405.09438 [pdf, ps, html, other]
Title: Perturbed Integrators Chain Control via Barrier Function Adaptation and Lyapunov Redesign
Comments: 12 pages, 9 figures
Subjects: Systems and Control (eess.SY)
Lyapunov redesign is a classical technique that uses a nominal control and its corresponding nominal Lyapunov function to design a discontinuous control, such that it compensates the uncertainties and disturbances. In this paper, the idea of Lyapunov redesign is used to propose an adaptive time-varying gain controller to stabilize a class of perturbed chain of integrators with an unknown control coefficient. It is assumed that the upper bound of the perturbation exists but is unknown. A proportional navigation feedback type gain is used to drive the system's trajectories into a prescribed vicinity of the origin in a predefined time, measured using a quadratic Lyapunov function. Once this neighborhood is reached, a barrier function-based gain is used, ensuring that the system's trajectories never leave this neighborhood despite uncertainties and perturbations. Experimental validation of the proposed controller in Furuta's pendulum is presented.
- [219] arXiv:2405.09439 [pdf, ps, html, other]
-
Title: Facilitating Opinion Diversity through Hybrid NLP ApproachesComments: Accepted at NAACL 2024, Student Research WorkshopSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Modern democracies face a critical issue of declining citizen participation in decision-making. Online discussion forums are an important avenue for enhancing citizen participation. This thesis proposal 1) identifies the challenges involved in facilitating large-scale online discussions with Natural Language Processing (NLP), 2) suggests solutions to these challenges by incorporating hybrid human-AI technologies, and 3) investigates what these technologies can reveal about individual perspectives in online discussions. We propose a three-layered hierarchy for representing perspectives that can be obtained by a mixture of human intelligence and large language models. We illustrate how these representations can draw insights into the diversity of perspectives and allow us to investigate interactions in online discussions.
- [220] arXiv:2405.09442 [pdf, ps, html, other]
-
Title: Network Function Capacity Reconnaissance by Remote AdversariesSubjects: Networking and Internet Architecture (cs.NI)
There is anecdotal evidence that attackers use reconnaissance to learn the capacity of their victims before DDoS attacks to maximize their impact. The first step to mitigate capacity reconnaissance attacks is to understand their feasibility. However, the feasibility of capacity reconnaissance in network functions (NFs) (e.g., firewalls, NATs) is unknown. To this end, we formulate the problem of network function capacity reconnaissance (NFCR) and explore the feasibility of inferring the processing capacity of an NF while avoiding detection. We identify key factors that make NFCR challenging and analyze how these factors affect accuracy (measured as a divergence from ground truth) and stealthiness (measured in packets sent). We propose a flexible tool, NFTY, that performs NFCR and we evaluate two practical NFTY configurations to showcase the stealthiness vs. accuracy tradeoffs. We evaluate these strategies in controlled, Internet and/or cloud settings with commercial NFs. NFTY can accurately estimate the capacity of different NF deployments within 10% error in the controlled experiments and the Internet, and within 7% error for a commercial NF deployed in the cloud (AWS). Moreover, NFTY outperforms link-bandwidth estimation baselines by up to 30x.
- [221] arXiv:2405.09443 [pdf, ps, html, other]
-
Title: Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM WaveformComments: 16 pages, 12 figures, submitted to IEEE journalSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with orthogonal frequency division multiplexing (OFDM) waveform, in which a base station receives the echoes of its transmitted cellular OFDM signals to sense multiple targets. The Cramer-Rao bounds are first derived for JARVE. A low-complexity algorithm is further designed for super-resolution JARVE, which utilizes the proposed iterative subspace update scheme and Levenberg-Marquardt optimization method to replace the exhaustive search of spatial spectrum in multiple-signal-classification (MUSIC) algorithm. Finally, with the practical parameters of 5G New Radio, simulation results verify that the proposed algorithm can reduce the computational complexity by three orders of magnitude and two orders of magnitude compared to the existing three-dimensional MUSIC algorithm and estimation-of-signal-parameters-using-rotational-invariance-techniques (ESPRIT) algorithm, respectively, and also improve the estimation performance.
- [222] arXiv:2405.09444 [pdf, ps, html, other]
-
Title: Desk-AId: Humanitarian Aid Desk Assessment with Geospatial AI for Predicting Landmine AreasSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
The process of clearing areas, namely demining, starts by assessing and prioritizing potentially hazardous areas (i.e., desk assessment) to undergo thorough investigation by experts, who confirm the risk and proceed with the mine clearance operations. This paper presents Desk-AId that supports the desk assessment phase by estimating landmine risks using geospatial data and socioeconomic information. Desk-AId uses a Geospatial AI approach specialized to landmines. The approach includes mixed data sampling strategies and context-enrichment by historical conflicts and key multi-domain facilities (e.g., buildings, roads, health sites). The proposed system addresses the issue of having only ground-truth for confirmed hazardous areas by implementing a new hard-negative data sampling strategy, where negative points are sampled in the vicinity of hazardous areas. Experiments validate Desk-AId in two domains for landmine risk assessment: 1) country-wide, and 2) uncharted study areas. The proposed approach increases the estimation accuracies up to 92%, for different classification models such as RandomForest (RF), Feedforward Neural Networks (FNN), and Graph Neural Networks (GNN).
- [223] arXiv:2405.09447 [pdf, ps, other]
-
Title: Rotated reference frames in radiative transport theorySubjects: Numerical Analysis (math.NA)
Rotated reference frames offer fast algorithms for the radiative transport equation (RTE). We review the singular-eigenfunction approach and related numerical methods for the multi-dimensional RTE with rotated reference frames.
- [224] arXiv:2405.09453 [pdf, ps, html, other]
-
Title: Kuramoto Oscillators and Swarms on Manifolds for Geometry Informed Machine LearningSubjects: Machine Learning (cs.LG); Mathematical Physics (math-ph); Adaptation and Self-Organizing Systems (nlin.AO)
We propose the idea of using Kuramoto models (including their higher-dimensional generalizations) for machine learning over non-Euclidean data sets. These models are systems of matrix ODEs describing collective motions (swarming dynamics) of abstract particles (generalized oscillators) on spheres, homogeneous spaces and Lie groups. Such models have been extensively studied since the beginning of the 21st century both in statistical physics and control theory. They provide a suitable framework for encoding maps between various manifolds and are capable of learning over spherical and hyperbolic geometries. In addition, they can learn coupled actions of transformation groups (such as special orthogonal, unitary and Lorentz groups). Furthermore, we overview families of probability distributions that provide appropriate statistical models for probabilistic modeling and inference in Geometric Deep Learning. We argue in favor of using statistical models which arise in different Kuramoto models in the continuum limit of particles. The most convenient families of probability distributions are those which are invariant with respect to actions of certain symmetry groups.
- [225] arXiv:2405.09454 [pdf, ps, html, other]
-
Title: Tell Me Why: Explainable Public Health Fact-Checking with Large Language ModelsSubjects: Computation and Language (cs.CL)
This paper presents a comprehensive analysis of explainable fact-checking through a series of experiments, focusing on the ability of large language models to verify public health claims and provide explanations or justifications for their veracity assessments. We examine the effectiveness of zero/few-shot prompting and parameter-efficient fine-tuning across various open and closed-source models, examining their performance in both isolated and joint tasks of veracity prediction and explanation generation. Importantly, we employ a dual evaluation approach comprising previously established automatic metrics and a novel set of criteria through human evaluation. Our automatic evaluation indicates that, within the zero-shot scenario, GPT-4 emerges as the standout performer, but in few-shot and parameter-efficient fine-tuning contexts, open-source models demonstrate their capacity to not only bridge the performance gap but, in some instances, surpass GPT-4. Human evaluation reveals yet more nuance as well as indicating potential problems with the gold explanations.
- [226] arXiv:2405.09459 [pdf, ps, html, other]
-
Title: Fourier Boundary Features Network with Wider Catchers for Glass SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Glass largely blurs the boundary between the real world and the reflection. The special transmittance and reflectance quality have confused the semantic tasks related to machine vision. Therefore, how to clear the boundary built by glass, and avoid over-capturing features as false positive information in deep structure, matters for constraining the segmentation of reflection surface and penetrating glass. We proposed the Fourier Boundary Features Network with Wider Catchers (FBWC), which might be the first attempt to utilize sufficiently wide horizontal shallow branches without vertical deepening for guiding the fine granularity segmentation boundary through primary glass semantic information. Specifically, we designed the Wider Coarse-Catchers (WCC) for anchoring large area segmentation and reducing excessive extraction from a structural perspective. We embed fine-grained features by Cross Transpose Attention (CTA), which is introduced to avoid the incomplete area within the boundary caused by reflection noise. For excavating glass features and balancing high-low layers context, a learnable Fourier Convolution Controller (FCC) is proposed to regulate information integration robustly. The proposed method has been validated on three different public glass segmentation datasets. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art (SOTA) methods in glass image segmentation.
- [227] arXiv:2405.09463 [pdf, ps, html, other]
-
Title: Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis ScreeningComments: MICCAI-2024 early accept. Our code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positives of neural networks. Leveraging this finding, we introduce Gaze-DETR, a pioneering method that integrates gaze data to enhance neural network precision by diminishing false positives. Gaze-DETR incorporates a universal gaze-guided warm-up protocol applicable across various detection methods and a gaze-guided rectification strategy specifically designed for DETR-based models. Our comprehensive tests confirm that Gaze-DETR surpasses existing leading methods, showcasing remarkable improvements in detection accuracy and generalizability.
- [228] arXiv:2405.09465 [pdf, ps, html, other]
-
Title: Flashback: Enhancing Proposer-Builder Design with Future-Block Auctions in Proof-of-Stake EthereumSubjects: Cryptography and Security (cs.CR)
Maximal extractable value (MEV), in which block proposers unethically gain profits by manipulating the order in which transactions are included within a block, is a key challenge facing blockchains such as Ethereum today. Left unchecked, MEV can lead to a centralization of stake distribution, thereby ultimately compromising the security of blockchain consensus. To preserve proposer decentralization (and hence security) of the blockchain, Ethereum has advocated for a proposer-builder separation (PBS) in which the functionality of transaction ordering is separated from proposers and assigned to separate entities called builders. Builders accept transaction bundles from searchers, who compete to find the most profitable bundles. Builders then bid completed blocks to proposers, who accept the most profitable blocks for publication. The auction mechanisms used between searchers, builders and proposers are crucial to the overall health of the blockchain. In this paper, we consider PBS design in Ethereum as a game between searchers, builders and proposers. A key novelty in our design is the inclusion of future block proposers within the game model, as all proposers of an epoch are decided ahead of time in proof-of-stake (PoS) Ethereum. Our analysis shows the existence of alternative auction mechanisms that result in a better (more profitable) equilibrium to players compared to state-of-the-art. Experimental evaluations based on synthetic and real-world data traces corroborate the analysis. Our results highlight that a rethinking of auction mechanism designs is necessary in PoS Ethereum to prevent disruption.
- [229] arXiv:2405.09470 [pdf, ps, html, other]
-
Title: Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style TransferComments: Accepted to SecTL (AsiaCCS Workshop) 2024Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness according to our user study.
- [230] arXiv:2405.09477 [pdf, ps, html, other]
-
Title: Harmonizing Human Insights and AI Precision: Hand in Hand for Advancing Knowledge Graph TaskSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Knowledge graph embedding (KGE) has caught significant interest for its effectiveness in knowledge graph completion (KGC), specifically link prediction (LP), with recent KGE models cracking the LP benchmarks. Despite the rapidly growing literature, insufficient attention has been paid to the cooperation between humans and AI on KG. However, humans' capability to analyze graphs conceptually may further improve the efficacy of KGE models with semantic information. To this effect, we carefully designed a human-AI team (HAIT) system dubbed KG-HAIT, which harnesses the human insights on KG by leveraging fully human-designed ad-hoc dynamic programming (DP) on KG to produce human insightful feature (HIF) vectors that capture the subgraph structural feature and semantic similarities. By integrating HIF vectors into the training of KGE models, notable improvements are observed across various benchmarks and metrics, accompanied by accelerated model convergence. Our results underscore the effectiveness of human-designed DP in the task of LP, emphasizing the pivotal role of collaboration between humans and AI on KG. We open avenues for further exploration and innovation through KG-HAIT, paving the way towards more effective and insightful KG analysis techniques.
- [231] arXiv:2405.09482 [pdf, ps, html, other]
-
Title: Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational TextsSubjects: Computation and Language (cs.CL)
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, however, requires teachers to adapt the difficulty of content and explanations to the education level of their students. Even the best LLMs today struggle to do this well. If we want to improve LLMs on this adaptation task, we need to be able to measure adaptation success reliably. However, current Static metrics for text difficulty, like the Flesch-Kincaid Reading Ease score, are known to be crude and brittle. We, therefore, introduce and evaluate a new set of Prompt-based metrics for text difficulty. Based on a user study, we create Prompt-based metrics as inputs for LLMs. They leverage LLM's general language understanding capabilities to capture more abstract and complex features than Static metrics. Regression experiments show that adding our Prompt-based metrics significantly improves text difficulty classification over Static metrics alone. Our results demonstrate the promise of using LLMs to evaluate text adaptation to different education levels.
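For readers unfamiliar with the kind of static metric the abstract contrasts against, the following is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 - 1.015 (words/sentences) - 84.6 (syllables/words). It is not taken from the paper: the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here only for illustration, which is one reason such scores are considered brittle.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups, at least 1 (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    easy = "The cat sat."
    hard = "Epistemological considerations necessitate multidimensional evaluation."
    print(flesch_reading_ease(easy))
    print(flesch_reading_ease(hard))
```

The score depends only on surface counts (sentence length and syllables per word), so it cannot see vocabulary familiarity or conceptual difficulty.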
- [232] arXiv:2405.09483 [pdf, ps, html, other]
-
Title: DemOpts: Fairness corrections in COVID-19 case prediction modelsSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
COVID-19 forecasting models have been used to inform decision making around resource allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State of the art deep learning models often use multimodal data such as mobility or socio-demographic data to enhance COVID-19 case prediction models. Nevertheless, related work has revealed under-reporting bias in COVID-19 cases as well as sampling bias in mobility data for certain minority racial and ethnic groups, which could in turn affect the fairness of the COVID-19 predictions along race labels. In this paper, we show that state of the art deep learning models output mean prediction errors that are significantly different across racial and ethnic groups, and which could, in turn, support unfair policy decisions. We also propose a novel de-biasing method, DemOpts, to increase the fairness of deep learning based forecasting models trained on potentially biased datasets. Our results show that DemOpts can achieve better error parity than other state of the art de-biasing approaches, thus effectively reducing the differences in the mean error distributions across more racial and ethnic groups.
- [233] arXiv:2405.09487 [pdf, ps, html, other]
-
Title: Color Space Learning for Cross-Color Person Re-IdentificationComments: Accepted by ICME 2024 (Oral)Subjects: Computer Vision and Pattern Recognition (cs.CV)
The primary color profile of the same identity is assumed to remain consistent in typical Person Re-identification (Person ReID) tasks. However, this assumption may be invalid in real-world situations and images hold variant color profiles, because of cross-modality cameras or identity with different clothing. To address this issue, we propose Color Space Learning (CSL) for those Cross-Color Person ReID problems. Specifically, CSL guides the model to be less color-sensitive with two modules: Image-level Color-Augmentation and Pixel-level Color-Transformation. The first module increases the color diversity of the inputs and guides the model to focus more on the non-color information. The second module projects every pixel of input images onto a new color space. In addition, we introduce a new Person ReID benchmark across RGB and Infrared modalities, NTU-Corridor, which is the first with privacy agreements from all participants. To evaluate the effectiveness and robustness of our proposed CSL, we evaluate it on several Cross-Color Person ReID benchmarks. Our method surpasses the state-of-the-art methods consistently. The code and benchmark are available at: this https URL
- [234] arXiv:2405.09490 [pdf, ps, html, other]
-
Title: Distributed Nonlinear Conic Optimisation with partially separable StructureComments: arXiv admin note: text overlap with arXiv:2309.12897Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper we consider the problem of distributed nonlinear optimisation of a separable convex cost function over a graph subject to cone constraints. We show how to generalise, using convex analysis, monotone operator theory and fixed-point theory, the primal-dual method of multipliers (PDMM), originally designed for equality constraint optimisation and recently extended to include linear inequality constraints, to accommodate for cone constraints. The resulting algorithm can be used to implement a variety of optimisation problems, including the important class of semidefinite programs with partially separable structure, in a fully distributed fashion. We derive update equations by applying the Peaceman-Rachford splitting algorithm to the monotonic inclusion related to the lifted dual problem. The cone constraints are implemented by a reflection method in the lifted dual domain where auxiliary variables are reflected with respect to the intersection of the polar cone and a subspace relating the dual and lifted dual domain. Convergence results for both synchronous and stochastic update schemes are provided and an application of the proposed algorithm is demonstrated to implement an approximate algorithm for maximum cut problems based on semidefinite programming in a fully distributed fashion.
- [235] arXiv:2405.09492 [pdf, ps, html, other]
-
Title: MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual LearningComments: 8 pages, 5 figuresSubjects: Machine Learning (cs.LG)
Deep neural networks suffer from the catastrophic forgetting problem in the field of continual learning (CL). To address this challenge, we propose MGSER-SAM, a novel memory replay-based algorithm specifically engineered to enhance the generalization capabilities of CL models. We first integrate the SAM optimizer, a component designed for optimizing flatness, which seamlessly fits into well-known Experience Replay frameworks such as ER and DER++. Then, MGSER-SAM distinctively addresses the complex challenge of reconciling conflicts in weight perturbation directions between ongoing tasks and previously stored memories, which is underexplored in the SAM optimizer. This is effectively accomplished by the strategic integration of soft logits and the alignment of memory gradient directions, where the regularization terms facilitate the concurrent minimization of various training loss terms integral to the CL process. Through rigorous experimental analysis conducted across multiple benchmarks, MGSER-SAM has demonstrated a consistent ability to outperform existing baselines in all three CL scenarios. Compared to the representative memory replay-based baselines ER and DER++, MGSER-SAM not only improves the testing accuracy by $24.4\%$ and $17.6\%$ respectively, but also achieves the lowest forgetting on each benchmark.
- [236] arXiv:2405.09496 [pdf, ps, html, other]
-
Title: ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using WikidataComments: Accepted to LREC-COLING 2024. arXiv admin note: text overlap with arXiv:2202.14035Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We introduce ParaNames, a massively multilingual parallel name resource consisting of 140 million names spanning over 400 languages. Names are provided for 16.8 million entities, and each entity is mapped from a complex type hierarchy to a standard type (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate the usefulness of ParaNames on two tasks. First, we perform canonical name translation between English and 17 other languages. Second, we use it as a gazetteer for multilingual named entity recognition, obtaining performance improvements on all 10 languages evaluated.
- [237] arXiv:2405.09497 [pdf, ps, html, other]
-
Title: Towards the limits: Sensing Capability Measurement for ISAC Through Channel EncoderSubjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Integrated Sensing and Communication (ISAC) is gradually becoming a reality due to the significant increase in frequency and bandwidth of next-generation wireless communication technologies. Therefore, it becomes crucial to evaluate the communication and sensing performance using appropriate channel models to address resource competition between the two. Existing work only models the sensing capability based on the mutual information between the channel response and the received signal; its theoretical resolution struggles to support the high-precision requirements of ISAC for sensing tasks, and may even affect its communication optimality.
In this paper, we propose a sensing channel encoder model to measure the sensing capacity with higher resolution by discrete task mutual information. For the first time, we derive upper and lower bounds on the sensing accuracy for a given channel. This model not only provides the possibility of optimizing the ISAC systems at a finer granularity and balancing communication and sensing resources, but also provides theoretical explanations for classical intuitive feelings (like more modalities, more accuracy) in wireless sensing. Furthermore, we validate the effectiveness of the proposed channel model through real-case studies, including person identification, displacement detection, direction estimation, and device recognition. The evaluation results indicate a Pearson correlation coefficient exceeding 0.9 between our task mutual information and conventional experimental metrics (e.g., accuracy).
- [238] arXiv:2405.09499 [pdf, ps, html, other]
-
Title: A Comprehensive Survey on SmartNICs: Architectures, Development Models, Applications, and Research DirectionsComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Networking and Internet Architecture (cs.NI)
The end of Moore's Law and Dennard Scaling has slowed processor improvements in the past decade. While multi-core processors have improved performance, they are limited by the application's level of parallelism, as prescribed by Amdahl's Law. This has led to the emergence of domain-specific processors that specialize in a narrow range of functions. Smart Network Interface Cards (SmartNICs) can be seen as an evolutionary technology that combines heterogeneous domain-specific processors and general-purpose cores to offload infrastructure tasks. Despite the impressive advantages of SmartNICs and their importance in modern networks, the literature has been missing a comprehensive survey. To this end, this paper provides a background encompassing an overview of the evolution of NICs from basic to SmartNICs, describing their architectures, development environments, and advantages over legacy NICs. The paper then presents a comprehensive taxonomy of applications offloaded to SmartNICs, covering network, security, storage, and machine learning functions. Challenges associated with SmartNIC development and deployment are discussed, along with current initiatives and open research issues.
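The multi-core limit mentioned in the abstract can be made concrete with Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n the number of cores. The snippet below is a minimal illustrative sketch; the function name and the example values are assumptions and do not come from the survey.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup from Amdahl's Law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # With 90% parallel work, the speedup saturates below 1 / 0.1 = 10
    # no matter how many cores are added.
    for cores in (1, 2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Because the serial fraction bounds the achievable speedup, adding general-purpose cores yields diminishing returns, which is the motivation the abstract gives for domain-specific processors.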
- [239] arXiv:2405.09504 [pdf, ps, other]
-
Title: Initial Algebras Unchained -- A Novel Initial Algebra Construction Formalized in AgdaSubjects: Logic in Computer Science (cs.LO)
The initial algebra for an endofunctor F provides a recursion and induction scheme for data structures whose constructors are described by F. The initial-algebra construction by Adámek (1974) starts with the initial object (e.g. the empty set) and successively applies the functor until a fixed point is reached, an idea inspired by Kleene's fixed point theorem. Depending on the functor of interest, this may require transfinitely many steps indexed by ordinal numbers until termination.
We provide a new initial algebra construction which is not based on an ordinal-indexed chain. Instead, our construction is loosely inspired by Pataraia's fixed point theorem and forms the colimit of all finite recursive coalgebras. This is reminiscent of the construction of the rational fixed point of an endofunctor that forms the colimit of all finite coalgebras. For our main correctness theorem, we assume the given endofunctor is accessible on a (weak form of) locally presentable category. Our proofs are constructive and fully formalized in Agda.
- [240] arXiv:2405.09507 [pdf, ps, html, other]
-
Title: QueryNER: Segmentation of E-commerce QueriesComments: Accepted to LREC-COLING 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
- [241] arXiv:2405.09508 [pdf, ps, html, other]
-
Title: Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural PrimingComments: 9 pages, 6 figuresSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer models in replicating cross-language structural priming, a key indicator of abstract grammatical representations in human language processing. Focusing on Chinese-English priming, which involves two typologically distinct languages, we examine how these models handle the robust phenomenon of structural priming, where exposure to a particular sentence structure increases the likelihood of selecting a similar structure subsequently. Additionally, we utilize large language models (LLMs) to measure the cross-lingual structural priming effect. Our findings indicate that Transformers outperform RNNs in generating primed sentence structures, challenging the conventional belief that human sentence processing primarily involves recurrent and immediate processing and suggesting a role for cue-based retrieval mechanisms. Overall, this work contributes to our understanding of how computational models may reflect human cognitive processes in multilingual contexts.
- [242] arXiv:2405.09521 [pdf, ps, html, other]
-
Title: Towards a fully declarative neuro-symbolic languageSubjects: Artificial Intelligence (cs.AI)
Neuro-symbolic systems (NeSy), which claim to combine the best of both learning and reasoning capabilities of artificial intelligence, are missing a core property of reasoning systems: declarativeness. The lack of declarativeness is caused by the functional nature of neural predicates inherited from neural networks. We propose and implement a general framework for fully declarative neural predicates, which hence extends to fully declarative NeSy frameworks. We show that the declarative extension preserves the learning and reasoning capabilities and can answer arbitrary queries despite being trained on only a single query type.
- [243] arXiv:2405.09522 [pdf, ps, html, other]
-
Title: ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations
Comments: Accepted for publication by SIGGRAPH 2024, conference track
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, ContourCraft robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of ContourCraft is a novel intersection contour loss that penalizes interpenetrations and encourages rapid resolution thereof. We integrate our intersection loss with a collision-avoiding repulsion objective into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method's ability across a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that ContourCraft significantly improves collision handling for learned simulation and produces visually compelling results.
- [244] arXiv:2405.09529 [pdf, ps, other]
-
Title: Artificial Intelligence for the Internal Democracy of Political Parties
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
The article argues that AI can enhance the measurement and implementation of democratic processes within political parties, known as Intra-Party Democracy (IPD). It identifies the limitations of traditional methods for measuring IPD, which often rely on formal parameters, self-reported data, and tools like surveys. Such limitations lead to the collection of partial data, rare updates, and significant demands on resources. To address these issues, the article suggests that specific data management and Machine Learning (ML) techniques, such as natural language processing and sentiment analysis, can improve the measurement (ML about) and practice (ML for) of IPD. The article concludes by considering some of the principal risks of ML for IPD, including concerns over data privacy, the potential for manipulation, and the dangers of overreliance on technology.
- [245] arXiv:2405.09530 [pdf, ps, html, other]
-
Title: A community palm model
Authors: Nicholas Clinton, Andreas Vollrath, Remi D'annunzio, Desheng Liu, Henry B. Glick, Adrià Descals, Alicia Sullivan, Oliver Guinan, Jacob Abramowitz, Fred Stolle, Chris Goodman, Tanya Birch, David Quinn, Olga Danylo, Tijs Lips, Daniel Coelho, Enikoe Bihari, Bryce Cronkite-Ratcliff, Ate Poortinga, Atena Haghighattalab, Evan Notman, Michael DeWitt, Aaron Yonas, Gennadii Donchyts, Devaja Shah, David Saah, Karis Tenneson, Nguyen Hanh Quyen, Megha Verma, Andrew Wilcox
Comments: v0
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Palm oil production has been identified as one of the major drivers of deforestation in tropical countries. To meet supply chain objectives, commodity producers and other stakeholders need timely information on land cover dynamics in their supply shed. However, such data are difficult to obtain from suppliers, who may lack digital geographic representations of their supply sheds and production locations. Here we present a "community model," a machine learning model trained on pooled data sourced from many different stakeholders, to develop a specific land cover probability map, in this case a semi-global oil palm map. Advantages of this method are the inclusion of varied inputs, the ability to easily update the model as new training data become available, and the ability to run the model for any year in which input imagery is available. Inclusion of diverse data sources in one probability map can help establish a shared understanding across stakeholders of the presence and absence of a land cover or commodity (in this case oil palm). The model predictors are annual composites built from publicly available satellite imagery provided by Sentinel-1, Sentinel-2, and ALOS DSM. We provide map outputs as the probability of palm in a given pixel, to reflect the uncertainty of the underlying state (palm or not palm). The initial version of this model provides global accuracy estimated to be approximately 90% (at a 0.5 probability threshold) from spatially partitioned test data. This model and the resulting oil palm probability map products are useful for accurately identifying the geographic footprint of palm cultivation. Used in conjunction with timely deforestation information, this palm model is useful for understanding the risk of continued oil palm plantation expansion in sensitive forest areas.
- [246] arXiv:2405.09531 [pdf, ps, html, other]
-
Title: Ticket-based multi-strand method for increased efficiency in proof-of-work based blockchains
Comments: 5 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper outlines a method aiming to increase the efficiency of proof-of-work based blockchains using a ticket-based approach. To avoid the limitation of serially adding one block at a time to a blockchain, multiple semi-independent chains are used so that several valid blocks can be added in parallel, provided they are added to separate chains. The chain to which a block is added is determined by a "ticket" that the miner must produce before mining a new block. This allows the transaction rate to increase by several orders of magnitude while the system remains fully decentralized and permissionless and maintains security, in the sense that a successful attack would require the attacker to control a significant portion of the whole network.
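The idea of a ticket selecting the chain can be illustrated with a small sketch. The abstract does not say how a ticket is built or how it maps to a chain index, so everything below is an assumption made purely for illustration: the ticket is taken to be a SHA-256 digest over a hypothetical miner identifier and nonce, and the chain index is the digest reduced modulo the number of chains.

```python
import hashlib


def make_ticket(miner_id: str, nonce: int) -> bytes:
    """Hypothetical ticket: a SHA-256 digest over a miner id and a nonce."""
    return hashlib.sha256(f"{miner_id}:{nonce}".encode()).digest()


def chain_index(ticket: bytes, num_chains: int) -> int:
    """Assumed mapping rule: interpret the digest as an integer, reduce mod num_chains."""
    return int.from_bytes(ticket, "big") % num_chains


# The miner learns which strand it may extend only after producing the ticket.
ticket = make_ticket("miner-A", 7)
idx = chain_index(ticket, num_chains=16)
```

Because the index depends on a digest the miner cannot choose freely, miners in this toy model are spread across chains rather than all competing for the same one.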
- [247] arXiv:2405.09534 [pdf, ps, html, other]
-
Title: Learning-Based Compress-and-Forward Schemes for the Relay Channel
Comments: journal submission under review. arXiv admin note: substantial text overlap with arXiv:2404.14594
Subjects: Information Theory (cs.IT)
The relay channel, consisting of a source-destination pair along with a relay, is a fundamental component of cooperative communications. While the capacity of a general relay channel remains unknown, various relaying strategies, including compress-and-forward (CF), have been proposed. In CF, the relay forwards a quantized version of its received signal to the destination. Given the correlated signals at the relay and destination, distributed compression techniques, such as Wyner-Ziv coding, can be harnessed to utilize the relay-to-destination link more efficiently. Leveraging recent advances in neural network-based distributed compression, we revisit the relay channel problem and integrate a learned task-aware Wyner-Ziv compressor into a primitive relay channel with a finite-capacity out-of-band relay-to-destination link. The resulting neural CF scheme demonstrates that our compressor recovers binning of the quantized indices at the relay, mimicking the optimal asymptotic CF strategy, although no structure exploiting the knowledge of source statistics was imposed on the design. The proposed neural CF, employing finite-order modulation, operates close to the rate achievable in a primitive relay channel with a Gaussian codebook. We showcase the advantages of exploiting the correlated destination signal for relay compression through various neural CF architectures that involve end-to-end training of the compressor and the demodulator components. Our learned task-oriented compressors provide the first proof-of-concept work toward interpretable and practical neural CF relaying schemes.
- [248] arXiv:2405.09542 [pdf, ps, html, other]
-
Title: Hybrid Magnonic Reservoir Computing
Subjects: Emerging Technologies (cs.ET); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Applied Physics (physics.app-ph)
Magnonic systems have been a major area of research interest due to their potential benefits in speed and lower power consumption compared to traditional computing. One area in which they may be advantageous is as physical reservoir computers in machine learning models. In this work, we build on an established design for using an Auto-Oscillation Ring as a reservoir computer by introducing a simple neural network midstream, and we introduce an additional design that uses a spin wave guide with a scattering regime for processing data with different types of inputs. We simulate these designs on the new micromagnetic simulation software, this http URL, and show that the designs are capable of performing comparably to or better than traditional dense neural networks on various real-world data sets.
- [249] arXiv:2405.09543 [pdf, ps, html, other]
-
Title: Algorithmic Fairness: A Tolerance Perspective
Comments: 33 pages, 4 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Recent advancements in machine learning and deep learning have brought algorithmic fairness into sharp focus, illuminating concerns over discriminatory decision making that negatively impacts certain individuals or groups. These concerns have manifested in legal, ethical, and societal challenges, including the erosion of trust in intelligent systems. In response, this survey delves into the existing literature on algorithmic fairness, specifically highlighting its multifaceted social consequences. We introduce a novel taxonomy based on 'tolerance', a term we define as the degree to which variations in fairness outcomes are acceptable, providing a structured approach to understanding the subtleties of fairness within algorithmic decisions. Our systematic review covers diverse industries, revealing critical insights into the balance between algorithmic decision making and social equity. By synthesizing these insights, we outline a series of emerging challenges and propose strategic directions for future research and policy making, with the goal of advancing the field towards more equitable algorithmic systems.
- [250] arXiv:2405.09544 [pdf, ps, html, other]
-
Title: Classifying geospatial objects from multiview aerial imagery using semantic meshes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Aerial imagery is increasingly used in Earth science and natural resource management as a complement to labor-intensive ground-based surveys. Aerial systems can collect overlapping images that provide multiple views of each location from different perspectives. However, most prediction approaches (e.g. for tree species classification) use a single, synthesized top-down "orthomosaic" image as input that contains little to no information about the vertical aspects of objects and may include processing artifacts. We propose an alternative approach that generates predictions directly on the raw images and accurately maps these predictions into geospatial coordinates using semantic meshes. This method, released as a user-friendly open-source toolkit, enables analysts to use the highest quality data for predictions, capture information about the sides of objects, and leverage multiple viewpoints of each location for added robustness. We demonstrate the value of this approach on a new benchmark dataset of four forest sites in the western U.S. that consists of drone images, photogrammetry results, predicted tree locations, and species classification data derived from manual surveys. We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree species classification task.
- [251] arXiv:2405.09545 [pdf, ps, html, other]
-
Title: Intrinsic Voltage Offsets in Memcapacitive Bio-Membranes Enable High-Performance Physical Reservoir Computing
Comments: Supplementary Information is included under the main text
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reservoir computing is a brain-inspired machine learning framework for processing temporal data by mapping inputs into high-dimensional spaces. Physical reservoir computers (PRCs) leverage native fading memory and nonlinearity in physical substrates, including atomic switches, photonics, volatile memristors, and, recently, memcapacitors, to achieve efficient high-dimensional mapping. Traditional PRCs often consist of homogeneous device arrays, which rely on input encoding methods and large stochastic device-to-device variations for increased nonlinearity and high-dimensional mapping. These approaches incur high pre-processing costs and restrict real-time deployment. Here, we introduce a novel heterogeneous memcapacitor-based PRC that exploits internal voltage offsets to enable both monotonic and non-monotonic input-state correlations crucial for efficient high-dimensional transformations. We demonstrate our approach's efficacy by predicting a second-order nonlinear dynamical system with an extremely low prediction error (0.00018). Additionally, we predict a chaotic Hénon map, achieving a low normalized root mean square error (0.080). Unlike previous PRCs, such errors are achieved without input encoding methods, underscoring the power of distinct input-state correlations. Most importantly, we generalize our approach to other neuromorphic devices that lack inherent voltage offsets using externally applied offsets to realize various input-state correlations. Our approach and unprecedented performance are a major milestone towards high-performance full in-materia PRCs.
- [252] arXiv:2405.09546 [pdf, ps, html, other]
-
Title: BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu
Comments: CVPR 2024 (Highlight). Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: this https URL
New submissions for Thursday, 16 May 2024 (showing 252 of 252 entries)
- [253] arXiv:2405.08825 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Thermodynamic limit in learning period three
Comments: 26 pages, 19 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO); Chaotic Dynamics (nlin.CD)
A continuous one-dimensional map with period three includes all periods. This raises the following question: Can we obtain any type of periodic orbit solely by learning three data points? We consider learning period three with random neural networks and report the universal property associated with it. We first show that the trained networks have a thermodynamic limit that depends on the choice of target data and network settings. Our analysis reveals that almost all learned periods are unstable and each network has its characteristic attractors (which can even be untrained ones). Here, we propose the concept of characteristic bifurcation expressing embeddable attractors intrinsic to the network, in which the target data points and the scale of the network weights function as bifurcation parameters. In conclusion, learning period three generates various attractors through characteristic bifurcation due to stability changes in the numerous latent unstable periods of the system.
- [254] arXiv:2405.08863 (cross-list from hep-ph) [pdf, ps, html, other]
-
Title: HepLean: Digitalising high energy physics
Comments: 16 pages. Comments are welcome
Subjects: High Energy Physics - Phenomenology (hep-ph); Logic in Computer Science (cs.LO); High Energy Physics - Theory (hep-th)
We introduce HepLean, an open-source project to digitalise definitions, theorems, proofs, and calculations in high energy physics using the interactive theorem prover Lean 4. HepLean has the potential to benefit the high energy physics community in four ways: making it easier to find existing results, allowing the creation of new results using artificial intelligence and automated methods, allowing easy review of papers for mathematical correctness, and providing new ways to teach high energy physics. We will discuss these in detail. We will also demonstrate the digitalisation of three areas of high energy physics in HepLean: Cabibbo-Kobayashi-Maskawa matrices in flavour physics, local anomaly cancellation, and Higgs physics.
- [255] arXiv:2405.08871 (cross-list from hep-th) [pdf, ps, html, other]
-
Title: The DNA of Calabi-Yau Hypersurfaces
Comments: 32 pages, 9 figures
Subjects: High Energy Physics - Theory (hep-th); Neural and Evolutionary Computing (cs.NE); High Energy Physics - Phenomenology (hep-ph)
We implement Genetic Algorithms for triangulations of four-dimensional reflexive polytopes which induce Calabi-Yau threefold hypersurfaces via Batyrev's construction. We demonstrate that such algorithms efficiently optimize physical observables such as axion decay constants or axion-photon couplings in string theory compactifications. For our implementation, we choose a parameterization of triangulations that yields homotopy-inequivalent Calabi-Yau threefolds by extending fine, regular triangulations of two-faces, thereby eliminating exponentially large redundancy factors in the map from polytope triangulations to Calabi-Yau hypersurfaces. In particular, we discuss how this encoding renders the entire Kreuzer-Skarke list amenable to a variety of optimization strategies, including but not limited to Genetic Algorithms. To achieve optimal performance, we tune the hyperparameters of our Genetic Algorithm using Bayesian optimization. We find that our implementation vastly outperforms other sampling and optimization strategies like Markov Chain Monte Carlo or Simulated Annealing. Finally, we showcase that our Genetic Algorithm efficiently performs optimization even for the maximal polytope with Hodge numbers $h^{1,1} = 491$, where we use it to maximize axion-photon couplings.
- [256] arXiv:2405.08919 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Joint Instantaneous Amplitude-Frequency Analysis of Vibration Signals for Vibration-Based Condition Monitoring of Rolling Bearings
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Vibrations of damaged bearings are manifested as modulations in the amplitude of the generated vibration signal, making envelope analysis an effective approach for discriminating between healthy and abnormal vibration patterns. Motivated by this, we introduce a low-complexity method for vibration-based condition monitoring (VBCM) of rolling bearings based on envelope analysis. In the proposed method, the instantaneous amplitude (envelope) and instantaneous frequency of the vibration signal are jointly utilized to facilitate three novel envelope representations: instantaneous amplitude-frequency mapping (IAFM), instantaneous amplitude-frequency correlation (IAFC), and instantaneous energy-frequency distribution (IEFD). Maintaining temporal information, these representations effectively capture energy-frequency variations that are unique to the condition of the bearing, thereby enabling the extraction of discriminative features with high sensitivity to variations in operational conditions. Accordingly, six new highly discriminative features are engineered from these representations, capturing and characterizing their shapes. The experimental results show outstanding performance in detecting and diagnosing various fault types, demonstrating the effectiveness of the proposed method in capturing unique variations in energy and frequency between healthy and faulty bearings. Moreover, the proposed method has moderate computational complexity, meeting the requirements of real-time applications. Further, the Python code of the proposed method is made public to support collaborative research efforts and ensure the reproducibility of the presented work.
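The two quantities this abstract builds on, instantaneous amplitude and instantaneous frequency, are commonly obtained from the analytic signal. The sketch below is not the authors' released code; it is a minimal standard-library illustration that assumes a short, uniformly sampled signal and uses a naive O(N^2) DFT. The function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(N^2) DFT (short demos only)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0   # keep DC and, for even n, the Nyquist bin
        elif k < (n + 1) // 2:
            gain = 2.0   # double the positive frequencies
        else:
            gain = 0.0   # remove the negative frequencies
        spectrum[k] *= gain
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_amplitude_frequency(x, fs):
    """Return (envelope, instantaneous frequency in Hz) for samples x taken at rate fs."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase increment between neighbours, read off z[t+1] * conj(z[t]) to avoid unwrapping.
    frequency = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
                 for t in range(len(z) - 1)]
    return amplitude, frequency


# An 8 Hz tone of amplitude 2 sampled at 64 Hz: the envelope is flat, the frequency constant.
fs = 64.0
signal = [2.0 * math.cos(2 * math.pi * 8.0 * t / fs) for t in range(64)]
amp, freq = instantaneous_amplitude_frequency(signal, fs)
```

For a faulty bearing the envelope would instead carry periodic modulations, which is what envelope-based features are designed to pick up.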
- [257] arXiv:2405.08958 (cross-list from astro-ph.IM) [pdf, ps, html, other]
-
Title: Learned radio interferometric imaging for varying visibility coverage
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
With the next generation of interferometric telescopes, such as the Square Kilometre Array (SKA), the need for highly computationally efficient reconstruction techniques is particularly acute. The challenge in designing learned, data-driven reconstruction techniques for radio interferometry is that they need to be agnostic to the varying visibility coverages of the telescope, since these are different for each observation. Because of this, learned post-processing or learned unrolled iterative reconstruction methods must typically be retrained for each specific observation, amounting to a large computational overhead. In this work we develop learned post-processing and unrolled iterative methods for varying visibility coverages, proposing training strategies to make these methods agnostic to variations in visibility coverage with minimal to no fine-tuning. Learned post-processing techniques are heavily dependent on the prior information encoded in training data and generalise poorly to other visibility coverages. In contrast, unrolled iterative methods, which include the telescope measurement operator inside the network, achieve state-of-the-art reconstruction quality and computation time, generalising well to other coverages and requiring little to no fine-tuning. Furthermore, they generalise well to realistic radio observations and are able to reconstruct the high dynamic range of these images.
- [258] arXiv:2405.08962 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Understanding Side-Channel Vulnerabilities in Superconducting Qubit Readout Architectures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
Frequency-multiplexing is an effective method to achieve resource-efficient superconducting qubit readout. By allowing multiple resonators to share a common feedline, it drastically reduces the number of cables and passive components involved in the readout of a qubit. However, this improvement in scalability comes at the price of a crucial non-ideality: increased readout crosstalk. Prior works have targeted building better devices and discriminators to reduce its effects, as readout-crosstalk-induced qubit measurement errors are detrimental to the reliability of a quantum computer. In this work, we show that beyond affecting the reliability of a system, readout crosstalk can introduce vulnerabilities in a system being shared among multiple users. These vulnerabilities are directly related to correlated errors due to readout crosstalk. These correlated errors can be exploited by nefarious attackers to predict the state of the victim qubits, resulting in information leakage.
- [259] arXiv:2405.08964 (cross-list from math.AC) [pdf, ps, html, other]
-
Title: Wronskians form the inverse system of the arcs of a double point
Subjects: Commutative Algebra (math.AC); Symbolic Computation (cs.SC); Algebraic Geometry (math.AG); Combinatorics (math.CO)
The ideal of the arc scheme of a double point or, equivalently, the differential ideal generated by the ideal of a double point is a primary ideal in an infinite-dimensional polynomial ring supported at the origin. This ideal has a rich combinatorial structure connecting it to singularity theory, partition identities, representation theory, and differential algebra. The Macaulay inverse system is a powerful tool for studying the structure of primary ideals; it describes an ideal in terms of certain linear differential operators. In the present paper, we show that the inverse system of the ideal of the arc scheme of a double point is precisely a vector space spanned by all the Wronskians of the variables and their formal derivatives. We then apply this characterization to extend our recent result on Poincaré-type series for such ideals.
- [260] arXiv:2405.08975 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: A distribution-free valid p-value for finite samples of bounded random variables
Comments: -
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We build a valid p-value based on a concentration inequality for bounded random variables introduced by Pelekis, Ramon and Wang. The motivation behind this work is the calibration of predictive algorithms in a distribution-free setting. The super-uniform p-value is tighter than Hoeffding and Bentkus alternatives in certain regions. Even though we are motivated by a calibration setting in a machine learning context, the ideas presented in this work are also relevant in classical statistical inference. Furthermore, we compare the power of a collection of valid p-values for bounded losses that have been presented in previous literature.
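For context, the Hoeffding baseline mentioned above can be written in a few lines. This sketch shows only that baseline, not the Pelekis-Ramon-Wang-based p-value the paper proposes. It assumes i.i.d. observations bounded in [0, 1] and the one-sided null hypothesis that the true mean is at least a threshold `alpha`; the function name is hypothetical.

```python
import math


def hoeffding_p_value(sample_mean: float, n: int, alpha: float) -> float:
    """Hoeffding p-value for H0: E[X] >= alpha, with X_1..X_n i.i.d. in [0, 1].

    Hoeffding's inequality gives P(mean_hat <= mu - t) <= exp(-2 n t^2), so
    exp(-2 n (alpha - mean_hat)_+^2) is super-uniform under H0.
    """
    gap = max(alpha - sample_mean, 0.0)
    return math.exp(-2.0 * n * gap ** 2)


# 100 bounded losses averaging 0.1, tested against a risk threshold of 0.2.
p = hoeffding_p_value(sample_mean=0.1, n=100, alpha=0.2)
```

A small p-value here supports the claim that the true mean lies below `alpha`; when the sample mean is at or above `alpha`, the p-value is exactly 1.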
- [261] arXiv:2405.08982 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Enabling Leakage Reduction via Fast and High-Fidelity Qutrit Readout
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR)
Quantum Error Correction (QEC) is key to operating quantum processors effectively at practical scales. QEC schemes are designed for systems whose fundamental building blocks are two-level systems, such as qubits. Unfortunately, qubits can leak to third and higher energy levels, and these leaks are challenging to detect and mitigate. If not addressed promptly, these leakage errors can proliferate and undermine QEC, leading to significant computational inaccuracies. Here, we present a high-fidelity three-level qubit readout protocol that is simple to implement on dedicated hardware such as FPGAs. Our design enables faster and higher-fidelity leakage detection than approaches using conventional qubit-state discriminators.
- [262] arXiv:2405.08999 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Robust Approximate Sampling via Stochastic Gradient Barker Dynamics
Journal-ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, AISTATS'24, volume 238, 2024, page 2107-2115
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Stochastic Gradient (SG) Markov Chain Monte Carlo (MCMC) algorithms are popular algorithms for Bayesian sampling in the presence of large datasets. However, they come with few theoretical guarantees, and assessing their empirical performance is non-trivial. In this context, it is crucial to develop algorithms that are robust to the choice of hyperparameters and to gradient heterogeneity since, in practice, both the choice of step size and the behaviour of target gradients induce hard-to-control biases in the invariant distribution. In this work we introduce the stochastic gradient Barker dynamics (SGBD) algorithm, extending the recently developed Barker MCMC scheme, a robust alternative to Langevin-based sampling algorithms, to the stochastic gradient framework. We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that, under suitable assumptions, eliminates the error due to the gradient noise in the proposal. We illustrate the performance on a number of high-dimensional examples, showing that SGBD is more robust to hyperparameter tuning and to irregular behaviour of the target gradients compared to the popular stochastic gradient Langevin dynamics algorithm.
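The Barker transition mechanism can be sketched in one dimension as follows. This is a simplified illustration, not the paper's algorithm: it draws a Gaussian increment and keeps or flips its sign with a logistic probability driven by a (possibly noisy) gradient of the log-target. The paper's bias correction is not included, no accept/reject step is shown, and the function names and the toy target are assumptions.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def barker_move(x: float, grad: float, z: float, u: float) -> float:
    """Keep the increment z with probability sigmoid(z * grad); otherwise flip its sign."""
    return x + z if u < sigmoid(z * grad) else x - z


def sgbd_step(x: float, noisy_grad, sigma: float, rng: random.Random) -> float:
    """One unadjusted step driven by a noisy gradient estimate (no bias correction)."""
    z = rng.gauss(0.0, sigma)
    return barker_move(x, noisy_grad(x), z, rng.random())


# Toy target: standard normal, whose log-density gradient is -x; noise mimics minibatching.
rng = random.Random(0)
x = 3.0
for _ in range(1000):
    x = sgbd_step(x, lambda v: -v + rng.gauss(0.0, 0.1), sigma=0.5, rng=rng)
```

The move size is always |z| regardless of how large the gradient is, which is one intuition for why a Barker-type move is less sensitive to large or irregular gradients than a Langevin step, where the gradient scales the drift directly.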
- [263] arXiv:2405.09033 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Accelerating Decision Diagram-based Multi-node Quantum Simulation with Ring Communication and Automatic SWAP Insertion
Comments: Accepted at IEEE QSW 2024
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
An N-qubit quantum state requires a vector of length $2^N$, so the memory required by conventional statevector-based quantum simulators grows exponentially with N. A proposed solution to this issue is the decision diagram-based quantum simulator, which can significantly decrease the necessary memory and is expected to operate faster for specific quantum circuits. However, decision diagram-based quantum simulators are not easily parallelizable because data must be manipulated dynamically, and most implementations run on one thread. This paper introduces ring communication-based optimal parallelization and automatic swap insertion techniques for multi-node implementation of decision diagram-based quantum simulators. The ring communication approach is designed so that each node communicates with its neighboring nodes, which can facilitate faster and more parallel communication than broadcasting, where one node needs to communicate with all nodes simultaneously. The automatic swap insertion method, an approach to minimize inter-node communication, has been employed in existing multi-node statevector-based simulators, but this paper proposes two methods specifically designed for decision diagram-based quantum simulators. These techniques were implemented and evaluated using the Shor algorithm and random circuits with up to 38 qubits using a maximum of 256 nodes. The experimental results revealed that multi-node implementation can reduce run-time by up to 26 times. For example, Shor circuits that need 38 qubits can finish simulation in 147 seconds. Additionally, ring communication was shown to have a higher speed-up effect than broadcast communication, and the importance of selecting the appropriate automatic swap insertion method was revealed.
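Two points in this abstract lend themselves to a small worked sketch: the exponential memory cost of a dense statevector, and the neighbour-only pattern of ring communication. The abstract does not describe the paper's actual communication scheme, so the second function is a generic single-process simulation of a ring all-gather, assumed here for illustration; 16 bytes per amplitude assumes double-precision complex numbers.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a dense statevector: 2**N amplitudes (16 bytes each for complex128)."""
    return (2 ** num_qubits) * bytes_per_amplitude


def ring_allgather(chunks):
    """Simulate a ring all-gather: each round, every node forwards one chunk to its right neighbour."""
    p = len(chunks)
    known = [{i: chunks[i]} for i in range(p)]  # node i starts with only its own chunk
    for step in range(p - 1):
        # In round `step`, node i forwards the chunk that originated at node (i - step) mod p.
        sends = [((i - step) % p, known[i][(i - step) % p]) for i in range(p)]
        for i, (origin, payload) in enumerate(sends):
            known[(i + 1) % p][origin] = payload
    return [[node[j] for j in range(p)] for node in known]


dense_38 = statevector_bytes(38)                 # 2**42 bytes, i.e. 4 TiB
gathered = ring_allgather(["a", "b", "c", "d"])  # every node ends with all four chunks
```

With P nodes the ring needs P - 1 rounds, but in each round every link carries exactly one chunk, whereas in a naive broadcast one node has to serve all the others.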
- [264] arXiv:2405.09034 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Entanglement Distribution Delay Optimization in Quantum Networks with Distillation
Comments: 13 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)
Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications. However, in such QNs, quantum switches (QSs) have limited resources that are highly sensitive to noise and losses and must be carefully allocated to minimize entanglement distribution delay. In this paper, a QS resource allocation framework is proposed, which jointly optimizes the average entanglement distribution delay and entanglement distillation operations, to enhance the end-to-end (e2e) fidelity and satisfy minimum rate and fidelity requirements. The proposed framework considers realistic QN noise and includes the derivation of the analytical expressions for the average quantum memory decoherence noise parameter, and the resulting e2e fidelity after distillation. Finally, practical QN deployment aspects are considered, where QSs can control 1) nitrogen-vacancy (NV) center SPS types based on their isotopic decomposition, and 2) nuclear spin regions based on their distance and coupling strength with the electron spin of NV centers. A simulated annealing metaheuristic algorithm is proposed to solve the QS resource allocation optimization problem. Simulation results show that the proposed framework manages to satisfy all users' rate and fidelity requirements, unlike existing distillation-agnostic (DA), minimal distillation (MD), and physics-agnostic (PA) frameworks, which do not perform distillation, perform minimal distillation, and do not control the physics-based NV center characteristics, respectively. Furthermore, the proposed framework results in around 30% and 50% reductions in the average e2e entanglement distribution delay compared to existing PA and MD frameworks, respectively. Moreover, the proposed framework results in around 5%, 7%, and 11% reductions in the average e2e fidelity compared to existing DA, PA, and MD frameworks, respectively.
- [265] arXiv:2405.09052 (cross-list from cond-mat.mtrl-sci) [pdf, ps, html, other]
-
Title: Dielectric Tensor Prediction for Inorganic Materials Using Latent Information from Preferred Potential
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Dielectrics are materials with widespread applications in flash memory, central processing units, photovoltaics, capacitors, etc. However, the availability of public dielectric data remains limited, hindering research and development efforts. Previously, machine learning models focused on predicting dielectric constants as scalars, overlooking the importance of dielectric tensors in understanding material properties under directional electric fields for material design and simulation. This study demonstrates the value of common equivariant structural embedding features derived from a universal neural network potential in enhancing the prediction of dielectric properties. To integrate channel information from various-rank latent features while preserving the desired SE(3) equivariance to the second-rank dielectric tensors, we design an equivariant readout decoder to predict the total, electronic, and ionic dielectric tensors individually, and compare our model with the state-of-the-art models. Finally, we evaluate our model by conducting virtual screening on thermodynamically stable structure candidates in the Materials Project. The material Ba$_2$SmTaO$_6$, with a large band gap ($E_g=3.36 \mathrm{eV}$) and a large dielectric constant ($\epsilon=93.81$), is successfully identified out of the 14k candidate set. The results show that our methods give good accuracy in predicting dielectric tensors of inorganic materials, emphasizing their potential in contributing to the discovery of novel dielectrics.
- [266] arXiv:2405.09077 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Compressive Feature Selection for Remote Visual Multi-Task Inference
Comments: 6 pages, 8 figures, IEEE ICME Workshop on Coding for Machines
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Deep models produce a number of features in each internal layer. A key problem in applications such as feature compression for remote inference is determining how important each feature is for the task(s) performed by the model. The problem is especially challenging in the case of multi-task inference, where the same feature may carry different importance for different tasks. In this paper, we examine how effective mutual information (MI) between a feature and a model's task output is as a measure of the feature's importance for that task. Experiments involving hard selection and soft selection (unequal compression) based on MI are carried out to compare the MI-based method with alternative approaches. Multi-objective analysis is provided to offer further insight.
- [267] arXiv:2405.09079 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems
Comments: 13 pages, 7 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation of self-interference (SI) caused by simultaneous transmission and reception at the FD base station (BS). Additionally, MU interference needs to be handled by the precoder/combiner design. The resulting optimization problem involves non-convex constraints since hybrid analog/digital architectures utilize networks of phase shifters. To solve the proposed problem, we separate the optimization of each precoder/combiner, and design each one of them while fixing the others. The precoders at the FD BS are designed by reformulating the communication and sensing constraints as signal-to-leakage-plus-noise ratio (SLNR) maximization problems that consider SI and MU interference as leakage. Furthermore, we design the frequency-flat analog combiner such that the residual SI at the FD BS is minimized under communication and sensing gain constraints. Finally, we design an interference-aware digital combining stage that separates MU signals and target reflections. The communication performance and sensing results show that the proposed framework efficiently supports both functionalities simultaneously.
- [268] arXiv:2405.09106 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Minimisation of Polyak-Łojasewicz Functions Using Random Zeroth-Order Oracles
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
The application of a zeroth-order scheme for minimising Polyak-Łojasewicz (PL) functions is considered. The framework is based on exploiting a random oracle to estimate the function gradient. The convergence of the algorithm to a global minimum in the unconstrained case and to a neighbourhood of the global minimum in the constrained case along with their corresponding complexity bounds are presented. The theoretical results are demonstrated via numerical examples.
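As a rough illustration of estimating a gradient from a random oracle, the sketch below uses a two-point Gaussian-direction estimator on a simple quadratic, which satisfies the PL condition. The estimator form, the step size, the smoothing radius `mu`, and all function names are assumptions chosen for illustration; the paper's actual oracle and algorithm may differ.

```python
import random


def zo_gradient(f, x, mu, rng):
    """Two-point random estimate of the gradient of f at x.

    Assumed estimator: ((f(x + mu*u) - f(x)) / mu) * u with u ~ N(0, I).
    Only function values are queried, never derivatives.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u]


def zo_descent(f, x0, step=0.05, mu=1e-4, iters=2000, seed=0):
    """Unconstrained descent driven by the zeroth-order estimate (hypothetical settings)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    # A strongly convex quadratic; strong convexity implies the PL inequality.
    return x[0] ** 2 + 2.0 * x[1] ** 2


x_final = zo_descent(quadratic, [3.0, -2.0])
```

Here `quadratic(x_final)` can be compared with `quadratic([3.0, -2.0])` to check that the function value has dropped toward the global minimum of zero.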
- [269] arXiv:2405.09108 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Linear Test for Global Nonlinear Controllability
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
It is known that if a nonlinear control affine system without drift is bracket generating, then its associated sub-Laplacian is invertible under some conditions on the domain. In this note, we investigate the converse. We show how invertibility of the sub-Laplacian operator implies a weaker form of controllability, where the reachable sets of a neighborhood of a point have full measure. From a computational point of view, one can then use the spectral gap of the (infinite-dimensional) self-adjoint operator to define a notion of degree of controllability.
An essential tool to establish the converse result is the relation between invertibility of the sub-Laplacian and the controllability of the corresponding continuity equation using possibly non-smooth controls. Then, using Ambrosio-Gigli-Savaré's superposition principle from optimal transport theory, we relate it to controllability properties of the control system. While the proof can be considered of the Perron-Frobenius type, we also provide a second dual Koopman point of view.
- [270] arXiv:2405.09115 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Hybrid Meta-Solving for Practical Quantum Computing
Domenik Eichhorn, Maximilian Schweikart, Nick Poser, Frederik Fiand, Benedikt Poggel, Jeanette Miriam Lorenz
Comments: Submitted to the 2024 IEEE International Conference on Quantum Computing and Engineering (QCE)
Subjects: Quantum Physics (quant-ph); Software Engineering (cs.SE)
The advent of quantum algorithms has initiated a discourse on the potential for quantum speedups for optimization problems. However, several factors still hinder a practical realization of the potential benefits. These include the lack of advanced, error-free quantum hardware, the absence of accessible software stacks for seamless integration and interaction, and the lack of methods that allow us to leverage the theoretical advantages to real-world use cases. This paper works towards the creation of an accessible hybrid software stack for solving optimization problems, aiming to create a fundamental platform that can utilize quantum technologies to enhance the solving process. We introduce a novel approach that we call Hybrid Meta-Solving, which combines classical and quantum optimization techniques to create customizable and extensible hybrid solvers. We decompose mathematical problems into multiple sub-problems that can be solved by classical or quantum solvers, and propose techniques to semi-automatically build the best solver for a given problem. Implemented in our ProvideQ toolbox prototype, Meta-Solving provides interactive workflows for accessing quantum computing capabilities. Our evaluation demonstrates the applicability of Meta-Solving in industrial use cases. It shows that we can reuse state-of-the-art classical algorithms and extend them with quantum computing techniques. Our approach is designed to be at least as efficient as state-of-the-art classical techniques, while having the potential to outperform them if future advances in the quantum domain are made.
- [271] arXiv:2405.09137 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: On Convergence of the Iteratively Preconditioned Gradient-Descent (IPG) Observer
Comments: 7 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper considers the observer design problem for discrete-time nonlinear dynamical systems with sampled measurement data. The recently proposed Iteratively Preconditioned Gradient-Descent (IPG) observer, a Newton-type observer, has been empirically shown to be more robust against measurement noise than the prominent nonlinear observers, a property that other Newton-type observers lack. However, no theoretical guarantees on the convergence of the IPG observer were provided. This paper presents a rigorous convergence analysis of the IPG observer for a class of nonlinear systems in deterministic settings, proving its local linear convergence to the actual trajectory. Our assumptions are standard in the existing literature of Newton-type observers, and the analysis further confirms the relation of the IPG observer with the Newton observer, which was only hypothesized earlier.
- [272] arXiv:2405.09142 (cross-list from eess.AS) [pdf, ps, html, other]
-
Title: Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Current speaker diarization systems rely on an external voice activity detection (VAD) model prior to speaker embedding extraction on the detected speech segments. In this paper, we establish that the attention system of a speaker embedding extractor acts as a weakly supervised internal VAD model and performs as well as or better than comparable supervised VAD systems. Consequently, speaker diarization can be performed efficiently by extracting the VAD logits and corresponding speaker embedding simultaneously, removing the need for, and the computational overhead of, an external VAD model. We provide an extensive analysis of the behavior of the frame-level attention system in current speaker verification models and propose a novel speaker diarization pipeline using ECAPA2 speaker embeddings for both VAD and embedding extraction. The proposed strategy achieves state-of-the-art performance on the AMI, VoxConverse and DIHARD III diarization benchmarks.
- [273] arXiv:2405.09146 (cross-list from math.CO) [pdf, ps, html, other]
-
Title: First order distinguishability of sparse random graphs
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO)
We study the problem of distinguishing between two independent samples $\mathbf{G}_n^1,\mathbf{G}_n^2$ of a binomial random graph $G(n,p)$ by first order (FO) sentences. Shelah and Spencer proved that, for a constant $\alpha\in(0,1)$, $G(n,n^{-\alpha})$ obeys FO zero-one law if and only if $\alpha$ is irrational. Therefore, for irrational $\alpha\in(0,1)$, any fixed FO sentence does not distinguish between $\mathbf{G}_n^1,\mathbf{G}_n^2$ with asymptotical probability 1 (w.h.p.) as $n\to\infty$. We show that the minimum quantifier depth $\mathbf{k}_{\alpha}$ of a FO sentence $\varphi=\varphi(\mathbf{G}_n^1,\mathbf{G}_n^2)$ distinguishing between $\mathbf{G}_n^1,\mathbf{G}_n^2$ depends on how closely $\alpha$ can be approximated by rationals: (1) for all non-Liouville $\alpha\in(0,1)$, $\mathbf{k}_{\alpha}=\Omega(\ln\ln\ln n)$ w.h.p.; (2) there are irrational $\alpha\in(0,1)$ with $\mathbf{k}_{\alpha}$ that grow arbitrarily slowly w.h.p.; (3) $\mathbf{k}_{\alpha}=O_p(\frac{\ln n}{\ln\ln n})$ for all $\alpha\in(0,1)$. The main ingredients in our proofs are a novel randomized algorithm that generates asymmetric strictly balanced graphs as well as a new method to study symmetry groups of randomly perturbed graphs.
- [274] arXiv:2405.09157 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Primal-Dual Framework for Symmetric Cone Programming
Subjects: Optimization and Control (math.OC); Computational Geometry (cs.CG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
In this paper, we introduce a primal-dual algorithmic framework for solving Symmetric Cone Programs (SCPs), a versatile optimization model that unifies and extends Linear, Second-Order Cone (SOCP), and Semidefinite Programming (SDP). Our work generalizes the primal-dual framework for SDPs introduced by Arora and Kale, leveraging a recent extension of the Multiplicative Weights Update method (MWU) to symmetric cones. Going beyond existing works, our framework can handle SOCPs and mixed SCPs, exhibits nearly linear time complexity, and can be effectively parallelized. To illustrate the efficacy of our framework, we employ it to develop approximation algorithms for two geometric optimization problems: the Smallest Enclosing Sphere problem and the Support Vector Machine problem. Our theoretical analyses demonstrate that the two algorithms compute approximate solutions in nearly linear running time and with parallel depth scaling polylogarithmically with the input size. We compare our algorithms against CGAL as well as interior point solvers applied to these problems. Experiments show that our algorithms are highly efficient when implemented on a CPU and achieve substantial speedups when parallelized on a GPU, allowing us to solve large-scale instances of these problems.
- [275] arXiv:2405.09162 (cross-list from math.LO) [pdf, ps, html, other]
-
Title: Completeness and Termination of Tableau Calculus for Undirected Graphs
Comments: 12 pages, 3 figures, the conference 'AWPL 2024' proceeding
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)
Hybrid logic is a modal logic with additional operators specifying nominals and is highly expressive. For example, there is no formula corresponding to the irreflexivity of Kripke frames in basic modal logic, but there is in hybrid logic. Irreflexivity is significant in that irreflexive and symmetric Kripke frames can be regarded as undirected graphs when viewed from a graph theoretic point of view. Thus, the study of the hybrid logic with axioms corresponding to irreflexivity and symmetry can help to elucidate the logical properties of undirected graphs. In this paper, we formulate the tableau method of the hybrid logic for undirected graphs. Our main result is to show the completeness theorem and the termination property of the tableau method, which leads us to prove the decidability.
- [276] arXiv:2405.09203 (cross-list from math.CV) [pdf, ps, other]
-
Title: Monte Carlo methods on compact complex manifolds using Bergman kernels
Thibaut Lemoine (CRIStAL), Rémi Bardenet (TAO, CRIStAL)
Subjects: Complex Variables (math.CV); Numerical Analysis (math.NA); Probability (math.PR)
In this paper, we propose a new randomized method for numerical integration on a compact complex manifold with respect to a continuous volume form. Taking for quadrature nodes a suitable determinantal point process, we build an unbiased Monte Carlo estimator of the integral of any Lipschitz function, and show that the estimator satisfies a central limit theorem, with a faster rate than under independent sampling. In particular, seeing a complex manifold of dimension $d$ as a real manifold of dimension $d_{\mathbb{R}}=2d$, the mean squared error for $N$ quadrature nodes decays as $N^{-1-2/d_{\mathbb{R}}}$; this is faster than previous DPP-based quadratures and reaches the optimal worst-case rate investigated by [Bakhvalov 1965] in Euclidean spaces. The determinantal point process we use is characterized by its kernel, which is the Bergman kernel of a holomorphic Hermitian line bundle, and we strongly build upon the work of Berman that led to the central limit theorem in [Berman, 2018]. We provide numerical illustrations for the Riemann sphere.
- [277] arXiv:2405.09244 (cross-list from astro-ph.EP) [pdf, ps, html, other]
-
Title: NeuralCMS: A deep learning approach to study Jupiter's interior
Comments: 8 pages, 6 figures, 4 tables, accepted for publication in A&A
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
NASA's Juno mission provided exquisite measurements of Jupiter's gravity field that, together with the Galileo entry probe atmospheric measurements, constrain the interior structure of the giant planet. Inferring its interior structure range remains a challenging inverse problem requiring a computationally intensive search of combinations of various planetary properties, such as the cloud-level temperature, composition, and core features, which requires the computation of ~10^9 interior models. We propose an efficient deep neural network (DNN) model to generate high-precision wide-ranged interior models based on the very accurate but computationally demanding concentric MacLaurin spheroid (CMS) method. We trained a sharing-based DNN with a large set of CMS results for a four-layer interior model of Jupiter, including a dilute core, to accurately predict the gravity moments and mass, given a combination of interior features. We evaluated the performance of the trained DNN (NeuralCMS) to inspect its predictive limitations. NeuralCMS shows very good performance in predicting the gravity moments, with errors comparable with the uncertainty due to differential rotation, and a very accurate mass prediction. This allowed us to perform a broad parameter space search by computing only ~10^4 actual CMS interior models, resulting in a large sample of plausible interior structures, and reducing the computation time by a factor of 10^5. Moreover, we used a DNN explainability algorithm to analyze the impact of the parameters setting the interior model on the predicted observables, providing information on their nonlinear relation.
- [278] arXiv:2405.09254 (cross-list from math.CO) [pdf, ps, html, other]
-
Title: Eigenvalue bounds and alternating rank-metric codes
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)
In this note we apply a spectral method to the graph of alternating bilinear forms. In this way, we obtain upper bounds on the size of an alternating rank-metric code for given values of the minimum rank distance. We computationally compare our results with Delsarte's linear programming bound, observing that they give the same value. For small values of the minimum rank distance, we are able to establish the equivalence of the two methods. The problem remains open for larger values.
- [279] arXiv:2405.09272 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Using an Evolutionary Algorithm to Create (MAX)-3SAT QUBOs
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
A common way of solving satisfiability instances with quantum methods is to transform these instances into instances of QUBO, which in itself is a potentially difficult and expensive task. State-of-the-art transformations from MAX-3SAT to QUBO currently work by mapping clauses of a 3SAT formula associated with the MAX-3SAT instance to an instance of QUBO and combining the resulting QUBOs into a single QUBO instance representing the whole MAX-3SAT instance. As creating these transformations is currently done manually or via exhaustive search methods and, therefore, algorithmically inefficient, we see potential for including search-based optimization. In this paper, we propose two methods of using evolutionary algorithms to automatically create QUBO representations of MAX-3SAT problems. We evaluate our created QUBOs on 500 and 1000-clause 3SAT formulae and find competitive performance to state-of-the-art baselines when using both classical and quantum annealing solvers.
- [280] arXiv:2405.09283 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Bounds and Approximations for the Distribution of a Sum of Lognormal Random Variables
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
A sum of lognormal random variables (RVs) appears in many problems of science and engineering. For example, it is involved in computing the distribution of received signal and interference powers for radio channels subject to lognormal shadow fading. Its distribution has no closed-form expression and it is typically characterized by approximations, asymptotes or bounds. We give a novel upper bound on the cumulative distribution function (CDF) of a sum of $N$ lognormal RVs. The bound is derived from the tangential mean-arithmetic mean inequality. By using the tangential mean, our method replaces the sum of $N$ lognormal RVs with a product of $N$ shifted lognormal RVs. It is shown that the bound can be made arbitrarily close to the desired CDF, and thus it becomes more accurate than any other bound or approximation, as the shift approaches infinity. The bound is computed by numerical integration, for which we introduce the Mellin transform, which is applicable to products of RVs. At the left tail of the CDF, the bound can be expressed by a single Q-function. Moreover, we derive simple new approximations to the CDF, expressed as a product of $N$ Q-functions, which are more accurate than the previous method of Farley.
- [281] arXiv:2405.09298 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
AI-based analysis of histopathology whole slide images (WSIs) is central in computational pathology. However, image quality can impact model performance. Here, we investigate to what extent unsharp areas of WSIs impact deep convolutional neural network classification performance. We propose a multi-model approach, i.e. DeepBlurMM, to alleviate the impact of unsharp image areas and improve the model performance. DeepBlurMM uses the sigma cut-offs to determine the most suitable model for predicting tiles with various levels of blurring within a single WSI, where sigma is the standard deviation of the Gaussian distribution. Specifically, the cut-offs categorise the tiles into sharp or slight blur, moderate blur, and high blur. Each blur level has a corresponding model to be selected for tile-level predictions. Throughout the simulation study, we demonstrated the application of DeepBlurMM in a binary classification task for breast cancer Nottingham Histological Grade 1 vs 3. Performance, evaluated over 5-fold cross-validation, showed that DeepBlurMM outperformed the base model under moderate blur and mixed blur conditions. Unsharp image tiles (local blurriness) at prediction time reduced model performance. The proposed multi-model approach improved performance under some conditions, with the potential to improve quality in both research and clinical applications.
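The tile-level model selection described above can be sketched as follows. This is only an illustration under stated assumptions: the cut-off values, the level names, the way each tile's sigma is obtained, and the mean aggregation over tiles are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical sigma cut-offs separating the three blur levels.
CUTOFFS = (1.0, 2.0)
LEVELS = ("sharp_or_slight", "moderate", "high")


def blur_level(sigma):
    """Map an (assumed, pre-computed) Gaussian blur sigma to a blur level."""
    return LEVELS[bisect_right(CUTOFFS, sigma)]


def predict_slide(tiles, models):
    """Predict one slide from its tiles.

    tiles:  list of (tile_features, sigma) pairs
    models: dict mapping a blur level to a callable returning a probability
    The slide-level score is the mean of the tile-level predictions, which
    is an assumed aggregation rule.
    """
    probs = [models[blur_level(sigma)](features) for features, sigma in tiles]
    return sum(probs) / len(probs)


# Stand-in models that return fixed probabilities, for demonstration only.
stub_models = {
    "sharp_or_slight": lambda features: 0.9,
    "moderate": lambda features: 0.6,
    "high": lambda features: 0.3,
}
slide_score = predict_slide([(None, 0.2), (None, 1.5), (None, 3.0)], stub_models)
```

Each tile is routed to the model matching its blur level, so a single slide with mixed blur can draw on all three models.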
- [282] arXiv:2405.09346 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Full-wave EM simulation analysis of human body blockage by dense 2D antenna arrays
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Recently, proposals of human-sensing-based services for cellular and local area networks have brought indoor localization to the attention of several research groups. In response to these stimuli, various Device-Free Localization (DFL) techniques, also known as Passive Localization methods, have emerged by exploiting ambient signals to locate and track individuals who do not carry any electronic device. This study delves into human passive indoor localization through full-wave electromagnetic simulations. To this end, we use simulations from the commercial FEKO software, which employs the Method of Moments (MoM). In particular, we collect and analyze the electric field values in a scenario consisting of a dense 2D/3D deployment of receivers in the presence of an anthropomorphic mobile target. The paper describes in detail the collected dataset and provides a first analysis based on a statistical approach. Possible use cases are also investigated through examples in the context of passive localization, sensing, and imaging.
- [283] arXiv:2405.09351 (cross-list from math.DS) [pdf, ps, html, other]
-
Title: Analysis of the Geometric Structure of Neural Networks and Neural ODEs via Morse Functions
Subjects: Dynamical Systems (math.DS); Neural and Evolutionary Computing (cs.NE)
Besides classical feed-forward neural networks, neural ordinary differential equations (neural ODEs) have gained particular interest in recent years. Neural ODEs can be interpreted as an infinite depth limit of feed-forward or residual neural networks. We study the input-output dynamics of finite and infinite depth neural networks with scalar output. In the finite depth case, the input is a state associated to a finite number of nodes, which maps under multiple non-linear transformations to the state of one output node. In analogy, a neural ODE maps a linear transformation of the input to a linear transformation of its time-$T$ map. We show that depending on the specific structure of the network, the input-output map has different properties regarding the existence and regularity of critical points. These properties can be characterized via Morse functions, which are scalar functions where every critical point is non-degenerate. We prove that critical points cannot exist if the dimension of the hidden layer is monotonically decreasing or the dimension of the phase space is smaller than or equal to the input dimension. In the case that critical points exist, we classify their regularity depending on the specific architecture of the network. We show that each critical point is non-degenerate if, for finite depth neural networks, the underlying graph has no bottleneck, and if, for neural ODEs, the linear transformations used have full rank. For each type of architecture, the proven properties are comparable in the finite and in the infinite depth case. The established theorems allow us to formulate results on universal embedding, i.e., on the exact representation of maps by neural networks and neural ODEs. Our dynamical systems viewpoint on the geometric structure of the input-output map provides a fundamental understanding of why certain architectures perform better than others.
- [284] arXiv:2405.09352 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: On the impact of the antenna radiation patterns in passive radio sensing
Journal-ref: IEEE Antennas and Wireless Propagation Letters (Volume: 23, Issue: 2, February 2024)
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Electromagnetic (EM) body models based on the scalar diffraction theory make it possible to predict the impact of subject motions on the radio propagation channel without requiring a time-consuming full-wave approach. On the other hand, they are less effective in complex environments characterized by significant multipath effects. Recently, emerging radio sensing applications have proposed the adoption of smart antennas with non-isotropic radiation characteristics to improve coverage. This letter investigates the impact of antenna radiation patterns in passive radio sensing applications. Adaptations of diffraction-based EM models are proposed to account for antenna non-uniform angular filtering. Next, we quantify experimentally the impact of diffraction and multipath disturbance components on radio sensing accuracy in environments with smart antennas.
- [285] arXiv:2405.09353 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Large coordinate kernel attention network for lightweight image super-resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing an efficient building block with a multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in computational and memory footprints as the convolutional kernel size increases. To address the first issue, we propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field, which can focus on learning the multi-scale information that is a vital component of discriminative representation. As for the second issue, we revisit the key properties of LKA and find that the adjacent direct interaction of local information and long-distance dependencies is crucial to provide remarkable performance. Thus, taking this into account and in order to mitigate the complexity of LKA, we propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels. LCKA enables the adjacent direct interaction of local information and long-distance dependencies not only in the horizontal direction but also in the vertical direction. Besides, LCKA allows for the direct use of extremely large kernels in the depth-wise convolutional layers to capture more contextual information, which helps to significantly improve the reconstruction performance, while incurring lower computational complexity and memory footprints. Integrating MBSConv and LCKA, we propose a large coordinate kernel attention network (LCAN).
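To make the quadratic-versus-linear scaling concrete, the following sketch counts the weights of a depth-wise convolution with a k x k kernel and of a pair of 1 x k and k x 1 depth-wise kernels. The channel count and kernel size are hypothetical, biases are ignored, and the functions only count parameters; they are not the LCKA module itself.

```python
def depthwise_weights_2d(channels, k):
    """Weights of one depth-wise layer with a k x k kernel per channel."""
    return channels * k * k


def depthwise_weights_1d_pair(channels, k):
    """Weights of a horizontal (1 x k) plus a vertical (k x 1) depth-wise layer."""
    return channels * 2 * k


# Hypothetical setting: 64 channels and a large kernel of size 35.
channels, k = 64, 35
full_2d = depthwise_weights_2d(channels, k)
pair_1d = depthwise_weights_1d_pair(channels, k)
saving = full_2d / pair_1d  # equals k / 2
```

The ratio between the two counts grows as k / 2, which is why a decomposition into 1-D kernels makes very large kernels affordable.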
- [286] arXiv:2405.09362 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: On the Saturation Effect of Kernel Ridge RegressionComments: ICLR 2023; Minor errors are corrected in this versionSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.
- [287] arXiv:2405.09389 (cross-list from q-bio.PE) [pdf, ps, other]
-
Title: Phylotrack: C++ and Python libraries for in silico phylogenetic trackingSubjects: Populations and Evolution (q-bio.PE); Neural and Evolutionary Computing (cs.NE)
In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm -- used across biological modeling, artificial life, and evolutionary computation -- complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics.
The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. The underlying design and C++ implementation prioritize efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction) are provided for reducing the memory footprint of phylogenetic information.
- [288] arXiv:2405.09395 (cross-list from q-bio.NC) [pdf, ps, html, other]
-
Title: Matching domain experts by training from scratch on domain knowledgeSubjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Recently, large language models (LLMs) have outperformed human experts in predicting the results of neuroscience experiments (Luo et al., 2024). What is the basis for this performance? One possibility is that statistical patterns in that specific scientific literature, as opposed to emergent reasoning abilities arising from broader training, underlie LLMs' performance. To evaluate this possibility, we trained a relatively small 124M-parameter GPT-2 model with next-word prediction on 1.3 billion tokens of domain-specific knowledge. Despite being orders of magnitude smaller than larger LLMs trained on trillions of tokens, small models achieved expert-level performance in predicting neuroscience results. Small models trained on the neuroscience literature succeeded when they were trained from scratch using a tokenizer specifically trained on neuroscience text or when the neuroscience literature was used to finetune a pretrained GPT-2. Our results indicate that expert-level performance may be attained by even small LLMs through domain-specific, auto-regressive training approaches.
- [289] arXiv:2405.09455 (cross-list from stat.CO) [pdf, ps, html, other]
-
Title: Efficient pooling designs and screening performance in group testing for two type defectivesSubjects: Computation (stat.CO); Information Theory (cs.IT)
Group testing is used when we want to find a few defectives among a large number of items. Testing n items one by one requires n tests, but if the ratio of defectives is small, group testing is an efficient way to reduce the number of tests. Much research has been done on group testing for a single type of defective. In this paper, we consider the case where two types of defectives, A and B, exist. For two types of defectives, we develop a belief propagation algorithm to compute the marginal posterior probability of defectives. Furthermore, we construct several kinds of collections of pools in order to test for A and B. By utilizing our belief propagation algorithm, we evaluate the performance of group testing by conducting simulations.
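To make the claim about reducing the number of tests concrete, here is a minimal sketch of the expected test count under classical two-stage (Dorfman) pooling for a single defective type. This is a swapped-in textbook scheme, not the belief propagation method or the pooling designs of this entry; the function name is hypothetical, items are assumed independently defective, and the number of items is assumed to be a multiple of the group size.

```python
def dorfman_expected_tests(n_items: int, prevalence: float, group_size: int) -> float:
    """Expected number of tests for two-stage pooling.

    Stage 1 tests each pool once. Stage 2 retests every item of a pool
    that came back positive. Items are assumed independently defective
    with probability `prevalence`.
    """
    n_groups = n_items / group_size
    # A pool is positive if at least one of its items is defective.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** group_size
    return n_groups * (1.0 + group_size * p_pool_positive)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 items, 1% defectives, pools of 10.
    print(dorfman_expected_tests(1000, 0.01, 10))
```

With a small defective ratio the expected count falls well below one test per item; as the ratio grows, the second stage dominates and pooling stops paying off.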
- [290] arXiv:2405.09457 (cross-list from cond-mat.stat-mech) [pdf, ps, other]
-
Title: Recurrence solution of monomer-polymer models on two-dimensional rectangular latticesSubjects: Statistical Mechanics (cond-mat.stat-mech); Computational Complexity (cs.CC); Combinatorics (math.CO)
The problem of counting polymer coverings on rectangular lattices is investigated. In this model, a linear rigid polymer covers $k$ adjacent lattice sites such that no two polymers occupy a common site. The unoccupied lattice sites are considered as monomers. We prove that for a given number of polymers ($k$-mers), the number of arrangements for the polymers on two-dimensional rectangular lattices satisfies simple recurrence relations. These recurrence relations are quite general and apply for arbitrary polymer length ($k$) and lattice width ($n$). The well-studied monomer-dimer problem is a special case of the monomer-polymer model when $k=2$. It is known that the enumeration of monomer-dimer configurations in planar lattices is #P-complete. The recurrence relations shown here may offer hints toward the solution of long-standing problems in this class of computational complexity.
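As a small illustration of how such counts can obey a recurrence, the sketch below handles only the one-dimensional strip (lattice width 1) and counts all coverings regardless of the number of polymers. It is not the two-dimensional recurrence of this entry; the function name is hypothetical and $k \ge 2$ is assumed.

```python
def count_strip_coverings(length: int, k: int) -> int:
    """Count monomer/k-mer coverings of a 1 x length strip (k >= 2).

    The last site is either a monomer or the end of a k-mer, which gives
    a(i) = a(i - 1) + a(i - k), with a(0) = 1 for the empty strip.
    """
    counts = [1] * (length + 1)
    for i in range(1, length + 1):
        counts[i] = counts[i - 1]        # last site is a monomer
        if i >= k:
            counts[i] += counts[i - k]   # last k sites hold one k-mer
    return counts[length]


if __name__ == "__main__":
    # For k = 2 (monomer-dimer) the counts follow the Fibonacci numbers.
    print([count_strip_coverings(n, 2) for n in range(8)])
```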
- [291] arXiv:2405.09464 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Scalable Scheduling Policies for Quantum Satellite NetworksSubjects: Quantum Physics (quant-ph); Performance (cs.PF)
As Low Earth Orbit (LEO) satellite mega constellations continue to be deployed for satellite internet and recent successful experiments in satellite-based quantum entanglement distribution emerge, a natural question arises: How should we coordinate transmissions and design scalable scheduling policies for a quantum satellite internet? In this work, we consider the problem of transmission scheduling in quantum satellite networks subject to resource constraints at the satellites and ground stations. We show that the most general problem of assigning satellites to ground station pairs for entanglement distribution is NP-hard. We then propose four heuristic algorithms and evaluate their performance for the Starlink mega constellation under various amounts of resources and placements of the ground stations. We find that the maximum number of receivers necessary per ground station grows very slowly with the total number of deployed ground stations. Our proposed algorithms, leveraging optimal weighted b-matching and the global greedy heuristic, outperform others in entanglement distribution rate, entanglement fidelity, and handover cost metrics. Alongside these scheduling algorithms, we have also designed a software system to simulate, visualize, and evaluate satellite mega-constellations for entanglement distribution.
- [292] arXiv:2405.09472 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality AssessmentComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-resolution (LR) images and the scale factors, is a promising way to enhance assessment performance for SR-IQA without HR for reference. In this letter, we attempt to evaluate the perceptual quality and reconstruction fidelity of SR images considering LR images and scale factors. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, i.e., Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by leveraging the merits of global modeling of Vision Transformer (ViT) and local relation of ResNet, and incorporating the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches substantially aligns with the human visual system, enabling a comprehensive SR image evaluation. Experimental results indicate that our PFIQA outperforms current state-of-the-art models across three widely-used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images.
- [293] arXiv:2405.09493 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Constrained Learning for Causal Inference and Semiparametric StatisticsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Causal estimation (e.g. of the average treatment effect) requires estimating complex nuisance parameters (e.g. outcome models). To adjust for errors in nuisance parameter estimation, we present a novel correction method that solves for the best plug-in estimator under the constraint that the first-order error of the estimator with respect to the nuisance parameter estimate is zero. Our constrained learning framework provides a unifying perspective to prominent first-order correction approaches including debiasing (a.k.a. augmented inverse probability weighting) and targeting (a.k.a. targeted maximum likelihood estimation). Our semiparametric inference approach, which we call the "C-Learner", can be implemented with modern machine learning methods such as neural networks and tree ensembles, and enjoys standard guarantees like semiparametric efficiency and double robustness. Empirically, we demonstrate our approach on several datasets, including those with text features that require fine-tuning language models. We observe the C-Learner matches or outperforms other asymptotically optimal estimators, with better performance in settings with less estimated overlap.
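For reference, the debiasing approach named in this abstract (augmented inverse probability weighting) has a standard textbook form for the average treatment effect, shown below. The notation is an assumption, not taken from the entry: $A_i$ is the binary treatment, $Y_i$ the outcome, $X_i$ the covariates, $\hat{\mu}_a$ the fitted outcome model under treatment $a$, and $\hat{e}$ the fitted propensity score.

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{A_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - A_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```

The first two terms are the plug-in estimate; the two weighted residual terms are the first-order correction. As the abstract describes it, the C-Learner instead seeks a plug-in estimator constrained so that this first-order error term is zero.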
- [294] arXiv:2405.09514 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Tackling Distribution Shifts in Task-Oriented Communication with Information BottleneckComments: 13 pages, 8 figures, submitted to IEEE for potential publicationSubjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semantic-shifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domain-shift generalization, which aims to find the causal relationship between the input data and task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with the label-dependent feature encoding approach for semantic-shift detection, which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.
- [295] arXiv:2405.09516 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity AnalysisSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Many algorithms have been recently proposed for causal machine learning. Yet, there is little to no theory on their quality, especially considering finite samples. In this work, we propose a theory based on generalization bounds that provides such guarantees. By introducing a novel change-of-measure inequality, we are able to tightly bound the model loss in terms of the deviation of the treatment propensities over the population, which we show can be empirically limited. Our theory is fully rigorous and holds even in the face of hidden confounding and violations of positivity. We demonstrate our bounds on semi-synthetic and real data, showcasing their remarkable tightness and practical utility.
- [296] arXiv:2405.09523 (cross-list from math.ST) [pdf, ps, html, other]
-
Title: On Semi-supervised Estimation of Discrete Distributions under f-divergencesComments: Full version. Presented in ISIT-24. arXiv admin note: text overlap with arXiv:2305.07955Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $l^p_p$ loss functions. Recent work established that univariate minimax estimator combinations achieve minimax risk with the optimal first-order constant for $p \ge 2$ in the regime $m = o(n)$; questions remained for $p \le 2$ and various $f$-divergences. In our study, we affirm that these composite estimators are indeed minimax optimal for $l^p_p$ loss functions, specifically for the range $1 \le p \le 2$, including the critical $l_1$ loss. Additionally, we ascertain their optimality for a suite of $f$-divergences, such as KL, $\chi^2$, Squared Hellinger, and Le Cam divergences.
- [297] arXiv:2405.09525 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Improved classical shadows from local symmetries in the Schur basisSubjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG)
We study the sample complexity of the classical shadows task: what is the fewest number of copies of an unknown state you need to measure to predict expected values with respect to some class of observables? Large joint measurements are likely required in order to minimize sample complexity, but previous joint measurement protocols only work when the unknown state is pure. We present the first joint measurement protocol for classical shadows whose sample complexity scales with the rank of the unknown state. In particular we prove $\mathcal O(\sqrt{rB}/\epsilon^2)$ samples suffice, where $r$ is the rank of the state, $B$ is a bound on the squared Frobenius norm of the observables, and $\epsilon$ is the target accuracy. In the low-rank regime, this is a nearly quadratic advantage over traditional approaches that use single-copy measurements.
We present several intermediate results that may be of independent interest: a solution to a new formulation of classical shadows that captures functions of non-identical input states; a generalization of a "nice" Schur basis used for optimal qubit purification and quantum majority vote; and a measurement strategy that allows us to use local symmetries in the Schur basis to avoid intractable Weingarten calculations in the analysis.
- [298] arXiv:2405.09528 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Energy-Efficient Sleep Mode Optimization of 5G mmWave Networks Using Deep Contextual MABSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
Millimeter-wave (mmWave) networks, integral to 5G communication, offer a vast spectrum that addresses the issue of spectrum scarcity and enhances peak rate and capacity. However, their dense deployment, necessary to counteract propagation losses, leads to high power consumption. An effective strategy to reduce this energy consumption in mobile networks is the sleep mode optimization (SMO) of base stations (BSs). In this paper, we propose a novel SMO approach for mmWave BSs in a 3D urban environment. This approach, which incorporates a neural network (NN) based contextual multi-armed bandit (C-MAB) with an epsilon decay algorithm, accommodates the dynamic and diverse traffic of user equipment (UE) by clustering the UEs in their respective tracking areas (TAs). Our strategy includes beamforming, which helps reduce energy consumption from the UE side, while SMO minimizes energy use from the BS perspective. We extended our investigation to include Random, Epsilon Greedy, Upper Confidence Bound (UCB), and Load Based sleep mode (SM) strategies. We compared the performance of our proposed C-MAB based SM algorithm with those of All On and other alternative approaches. Simulation results show that our proposed method outperforms all other SM strategies in terms of the $10^{th}$ percentile of user rate and average throughput while demonstrating comparable average throughput to the All On approach. Importantly, it outperforms all approaches in terms of energy efficiency (EE).
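The epsilon decay idea mentioned in this abstract can be sketched with a plain, non-contextual bandit. This is a simplified illustration only: it omits the neural network, the context, and the mmWave simulation, and every name and default value below (`pull`, `eps_start`, `eps_min`, `decay`) is a hypothetical choice rather than something taken from the entry.

```python
import random
from typing import Callable, List, Tuple


def epsilon_decay_bandit(
    pull: Callable[[int], float],
    n_arms: int,
    steps: int,
    eps_start: float = 1.0,
    eps_min: float = 0.01,
    decay: float = 0.99,
    seed: int = 0,
) -> Tuple[List[float], List[int], float]:
    """Epsilon-greedy bandit whose exploration rate shrinks geometrically.

    Returns the per-arm reward estimates, pull counts, and final epsilon.
    """
    rng = random.Random(seed)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    eps = eps_start
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        eps = max(eps_min, eps * decay)                           # decay exploration
    return estimates, counts, eps


if __name__ == "__main__":
    # Hypothetical two-arm problem with Bernoulli rewards.
    reward_rng = random.Random(1)
    success = [0.2, 0.8]
    result = epsilon_decay_bandit(
        lambda a: float(reward_rng.random() < success[a]), n_arms=2, steps=1000
    )
    print(result)
```

Early steps are almost entirely exploratory; as epsilon shrinks toward `eps_min`, the policy increasingly commits to the arm with the best running estimate.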
- [299] arXiv:2405.09535 (cross-list from cond-mat.dis-nn) [pdf, ps, html, other]
-
Title: Restoring balance: principled under/oversampling of data for optimal classificationComments: 9 pages + appendix, 3 figuresSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
Class imbalance in real-world data poses a common bottleneck for machine learning tasks, since achieving good generalization on under-represented examples is often challenging. Mitigation strategies, such as under or oversampling the data depending on their abundances, are routinely proposed and tested empirically, but how they should adapt to the data statistics remains poorly understood. In this work, we determine exact analytical expressions of the generalization curves in the high-dimensional regime for linear classifiers (Support Vector Machines). We also provide a sharp prediction of the effects of under/oversampling strategies depending on class imbalance, first and second moments of the data, and the metrics of performance considered. We show that mixed strategies involving under and oversampling of data lead to performance improvement. Through numerical experiments, we show the relevance of our theoretical predictions on real datasets, on deeper architectures and with sampling strategies based on unsupervised probabilistic models.
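A mixed under/oversampling strategy of the kind discussed in this abstract can be sketched as plain random resampling to a common per-class size. This is a generic illustration, not the analytically derived strategy of the entry; the function name and the `per_class` parameter are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def resample_to_size(
    samples: Sequence, labels: Sequence[Hashable], per_class: int, seed: int = 0
) -> Tuple[List, List]:
    """Randomly resample so that every class has exactly `per_class` examples.

    Classes with more examples are undersampled without replacement;
    classes with fewer keep all originals and add random duplicates.
    """
    rng = random.Random(seed)
    by_class: Dict[Hashable, List] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x: List = []
    out_y: List = []
    for y, items in by_class.items():
        if len(items) >= per_class:
            chosen = rng.sample(items, per_class)                          # undersample
        else:
            chosen = items + rng.choices(items, k=per_class - len(items))  # oversample
        out_x.extend(chosen)
        out_y.extend([y] * per_class)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical 90/10 imbalance, resampled to 30 examples per class.
    xs = list(range(100))
    ys = [0] * 90 + [1] * 10
    new_x, new_y = resample_to_size(xs, ys, per_class=30)
    print(Counter(new_y))
```

Choosing `per_class` between the minority and majority counts undersamples one class and oversamples the other in the same pass, which is the mixed regime the abstract refers to.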
- [300] arXiv:2405.09536 (cross-list from stat.ME) [pdf, ps, html, other]
-
Title: Wasserstein Gradient Boosting: A General Framework with Applications to Posterior RegressionSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Gradient boosting is a sequential ensemble method that fits a new base learner to the gradient of the remaining loss at each step. We propose a novel family of gradient boosting, Wasserstein gradient boosting, which fits a new base learner to an exactly or approximately available Wasserstein gradient of a loss functional on the space of probability distributions. Wasserstein gradient boosting returns a set of particles that approximates a target probability distribution assigned at each input. In probabilistic prediction, a parametric probability distribution is often specified on the space of output variables, and a point estimate of the output-distribution parameter is produced for each input by a model. Our main application of Wasserstein gradient boosting is a novel distributional estimate of the output-distribution parameter, which approximates the posterior distribution over the output-distribution parameter determined pointwise at each data point. We empirically demonstrate the superior performance of the probabilistic prediction by Wasserstein gradient boosting in comparison with various existing methods.
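The first sentence of this abstract describes ordinary gradient boosting, which can be sketched for squared loss on one-dimensional inputs as follows. This covers only the classical method, not the Wasserstein variant proposed in the entry; the regression-stump base learner, the function names, and the default hyperparameters are assumptions made for illustration.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]        # (threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Least-squares regression stump: one threshold, one constant per side."""
    best: Stump = (xs[0], 0.0, 0.0)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right) if right else 0.0
        err = sum((t - left_val) ** 2 for t in left)
        err += sum((t - right_val) ** 2 for t in right)
        if err < best_err:
            best_err, best = err, (threshold, left_val, right_val)
    return best


def stump_value(stump: Stump, x: float) -> float:
    threshold, left_val, right_val = stump
    return left_val if x <= threshold else right_val


def gradient_boost(xs: List[float], ys: List[float],
                   n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Fit each new stump to the negative gradient of the squared loss."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model: Model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)
```

Calling `gradient_boost(xs, ys)` on paired lists of inputs and targets returns a model that `predict` evaluates at a new point; each round shrinks the remaining residual by fitting it with one more stump.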
- [301] arXiv:2405.09539 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal CancerComments: Early accepted to MICCAI 2024 (6/6/5)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-modal based methods may introduce information redundancy, leading to underperformance. In addition, efficient and effective interactions between multi-modal representations need to be further explored, as existing work lacks insightful exploration of prognostic correlation in multi-modality features. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy, aiming to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Various experimental results validate the effectiveness of our proposed method. The code is available at this https URL.
- [302] arXiv:2405.09541 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Spectral complexity of deep neural networksSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.
Cross submissions for Thursday, 16 May 2024 (showing 50 of 50 entries)
- [303] arXiv:2012.03906 (replaced) [pdf, ps, other]
-
Title: Breaking the Barrier of 2 for the Competitiveness of Longest Queue DropComments: A preliminary version appeared at ICALP 2021. This version contains an improved analysis which yields a slightly better upper bound. 30 pagesSubjects: Data Structures and Algorithms (cs.DS)
We consider the problem of managing the buffer of a shared-memory switch that transmits packets of unit value. A shared-memory switch consists of an input port, a number of output ports, and a buffer with a specific capacity. In each time step, an arbitrary number of packets arrive at the input port, each packet designated for one output port. Each packet is added to the queue of the respective output port. If the total number of packets exceeds the capacity of the buffer, some packets have to be irrevocably evicted. At the end of each time step, each output port transmits a packet in its queue and the goal is to maximize the number of transmitted packets.
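One time step of this switch model can be sketched as follows, using the drop rule that the entry's title refers to: accept every arrival, and on overflow evict one packet from a currently longest queue. This is a minimal illustration, not the paper's analysis. Because packets have unit value, only queue lengths are tracked; the function name is hypothetical, and ties are broken by lowest port index, which is one arbitrary choice.

```python
from typing import List


def lqd_step(queues: List[int], arrivals: List[int], capacity: int) -> int:
    """Process one time step in place and return the number of packets sent.

    `queues[p]` is the current length of output port p's queue and
    `arrivals` lists the destination port of each arriving packet.
    """
    for port in arrivals:
        queues[port] += 1                  # every arriving packet is accepted
        if sum(queues) > capacity:         # buffer overflow
            longest = max(range(len(queues)), key=lambda p: queues[p])
            queues[longest] -= 1           # evict from a longest queue
    transmitted = 0
    for port in range(len(queues)):        # each non-empty port sends one packet
        if queues[port] > 0:
            queues[port] -= 1
            transmitted += 1
    return transmitted


if __name__ == "__main__":
    # Hypothetical burst: four packets for port 0, then one for port 1.
    state = [0, 0, 0]
    sent = lqd_step(state, arrivals=[0, 0, 0, 0, 1], capacity=4)
    print(sent, state)
```

In the demo the fifth arrival overflows the buffer, so a packet is evicted from port 0's queue rather than from the newly arrived packet's queue, which lets both ports transmit in that step.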
The Longest Queue Drop (LQD) online algorithm accepts any arriving packet to the buffer. However, if this results in the buffer exceeding its memory capacity, then LQD drops a packet from whichever queue is currently the longest, breaking ties arbitrarily. The LQD algorithm was first introduced in 1991, and has been known to be $2$-competitive since 2001. Although LQD remains the best known online algorithm for the problem and is of practical interest, determining its true competitiveness is a long-standing open problem. We show that LQD is 1.6918-competitive, establishing the first $(2-\varepsilon)$ upper bound for the competitive ratio of LQD, for a constant $\varepsilon>0$.
- [304] arXiv:2104.02933 (replaced) [pdf, ps, html, other]
-
Title: Does the First Response Matter for Future Contributions? A Study of First ContributionsNoppadol Assavakamhaenghan, Supatsara Wattanakriengkrai, Naomichi Shimada, Raula Gaikovina Kula, Takashi Ishio, Kenichi MatsumotoSubjects: Software Engineering (cs.SE)
Open Source Software (OSS) projects rely on a continuous stream of new contributors for their livelihood. Recent studies reported that new contributors experience many barriers in their first contribution, with the social barrier being critical. Although a number of studies investigated the social barriers to new contributors, we hypothesize that negative first responses may cause an unpleasant feeling, and subsequently lead to the discontinuity of any future contribution. We execute protocols of a registered report to analyze 2,765,917 first contributions as Pull Requests (PRs) with 642,841 first responses. We characterize most first responses as being positive, but less responsive, and exhibiting sentiments of fear, joy and love. Results also indicate that negative first responses have the literal intention to arouse emotions of being either constructive (50.71%) or criticizing (37.68%) in nature. Running different machine learning models, we find that performance in predicting future interactions is low (F1 score of 0.6171), but relatively better than baselines. Furthermore, an analysis of these models shows that interactions are positively correlated with a future contribution, with other dimensions (i.e., project, contributor, contribution) having a large effect.
- [305] arXiv:2109.03459 (replaced) [pdf, ps, html, other]
-
Title: Dual Correction Strategy for Ranking Distillation in Top-N Recommender SystemComments: CIKM 2021Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves the performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes the training not fully efficient, and 2) it only distills the user-side ranking information, which provides an insufficient view under the sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model and the student model predictions to decide which knowledge to distill. By doing so, DCD essentially provides learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied for transferring the ranking information from the user-side as well as the item-side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
- [306] arXiv:2202.04294 (replaced) [pdf, ps, html, other]
-
Title: Optimal Clustering with Bandit FeedbackComments: 54 pages, 4 figuresSubjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant $\delta$. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
- [307] arXiv:2205.02332 (replaced) [pdf, ps, html, other]
-
Title: Learning Individual Interactions from Population Dynamics with Discrete-Event Simulation ModelComments: for further modificationSubjects: Machine Learning (cs.LG)
The abundance of data allows researchers to pursue more powerful computational tools to learn the dynamics of complex systems, such as neural networks, engineered systems and social networks. Traditional machine learning approaches capture complex system dynamics either with dynamic Bayesian networks and state space models, which are hard to scale because it is non-trivial to prescribe the dynamics with a sparse graph or a system of differential equations, or with deep neural networks, where the distributed representation of the learned dynamics is hard to interpret. In this paper, we explore the possibility of learning a discrete-event simulation representation of complex system dynamics assuming a multivariate normal distribution of the state variables, based on the observation that many complex system dynamics can be decomposed into a sequence of local interactions, which individually change the system state only minimally but in sequence generate complex and diverse dynamics. Our results show that the algorithm can data-efficiently capture complex network dynamics in several fields with meaningful events.
- [308] arXiv:2207.06358 (replaced) [pdf, ps, html, other]
-
Title: Smooth Anonymity for Sparse Graphs
Comments: WWW 2024 Short Paper
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
When working with user data, providing well-defined privacy guarantees is paramount. In this work, we aim to manipulate and share an entire sparse dataset with a third party privately. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, e.g. sparse networks, we prove as one of our main results that \emph{any} differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. In such situations, we need to look into other privacy notions such as $k$-anonymity. In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity. We further perform an empirical evaluation to back our theoretical guarantees and show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.
- [309] arXiv:2210.01905 (replaced) [pdf, ps, html, other]
-
Title: Polar Encoding: A Simple Baseline Approach for Classification with Missing Values
Comments: Accepted version
Journal-ref: IEEE Transactions on Fuzzy Systems, vol 32, no 5, pp 3084--3093, May 2024
Subjects: Machine Learning (cs.LG)
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well or better than mean/mode imputation with missing-indicators.
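The equidistance property mentioned in this abstract can be illustrated with a minimal sketch. The concrete mapping used here is an assumption for illustration, not taken from the abstract: a value $x \in [0,1]$ is encoded as the pair $(x, 1-x)$ and a missing value as $(0, 0)$. The function names `polar_encode` and `manhattan` are hypothetical.

```python
import math
from typing import Optional, Tuple


def polar_encode(x: Optional[float]) -> Tuple[float, float]:
    """Encode a [0,1]-valued attribute as a pair (assumed mapping).

    A present value x becomes (x, 1 - x); a missing value (None or NaN)
    becomes (0, 0). No imputation is involved.
    """
    if x is None or math.isnan(x):
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("expected a value in [0, 1]")
    return (x, 1.0 - x)


def manhattan(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Manhattan (L1) distance between two encoded values."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Under this assumed mapping, the missing value sits at the same
# Manhattan distance from every non-missing value.
missing = polar_encode(None)
distances = [manhattan(missing, polar_encode(v)) for v in (0.0, 0.25, 0.5, 1.0)]
```

Under this assumed mapping, every entry of `distances` equals 1 because the two coordinates of a present value always sum to 1. The same construction applied to a one-hot vector, with the all-zero vector for a missing category, gives the categorical case.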
- [310] arXiv:2210.13020 (replaced) [pdf, ps, other]
-
Title: On Tools for Completeness of Kleene Algebra with Hypotheses
Subjects: Logic in Computer Science (cs.LO)
In the literature on Kleene algebra, a number of variants have been proposed which impose additional structure specified by a theory, such as Kleene algebra with tests (KAT) and the recent Kleene algebra with observations (KAO), or make specific assumptions about certain constants, as for instance in NetKAT. Many of these variants fit within the unifying perspective offered by Kleene algebra with hypotheses, which comes with a canonical language model constructed from a given set of hypotheses. For the case of KAT, this model corresponds to the familiar interpretation of expressions as languages of guarded strings. A relevant question therefore is whether Kleene algebra together with a given set of hypotheses is complete with respect to its canonical language model. In this paper, we revisit, combine and extend existing results on this question to obtain tools for proving completeness in a modular way. We showcase these tools by giving new and modular proofs of completeness for KAT, KAO and NetKAT, and we prove completeness for new variants of KAT: KAT extended with a constant for the full relation, KAT extended with a converse operation, and a version of KAT where the collection of tests only forms a distributive lattice.
- [311] arXiv:2211.11860 (replaced) [pdf, ps, html, other]
-
Title: Upper and Lower Bounds on the Smoothed Complexity of the Simplex Method
Comments: 43 pages, 5 figures. STOC 2023
Subjects: Data Structures and Algorithms (cs.DS)
The simplex method for linear programming is known to be highly efficient in practice, and understanding its performance from a theoretical perspective is an active research topic. The framework of smoothed analysis, first introduced by Spielman and Teng (JACM '04) for this purpose, defines the smoothed complexity of solving a linear program with $d$ variables and $n$ constraints as the expected running time when Gaussian noise of variance $\sigma^2$ is added to the LP data. We prove that the smoothed complexity of the simplex method is $O(\sigma^{-3/2} d^{13/4}\log^{7/4} n)$, improving the dependence on $1/\sigma$ compared to the previous bound of $O(\sigma^{-2} d^2\sqrt{\log n})$. We accomplish this through a new analysis of the \emph{shadow bound}, key to earlier analyses as well. Illustrating the power of our new method, we use our method to prove a nearly tight upper bound on the smoothed complexity of two-dimensional polygons.
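To see what the improved dependence on $1/\sigma$ amounts to, one can divide the new bound by the previous one. This is a back-of-the-envelope comparison of the two stated asymptotic expressions only, ignoring the constants hidden by the $O(\cdot)$ notation:

```latex
\[
\frac{\sigma^{-3/2}\, d^{13/4} \log^{7/4} n}{\sigma^{-2}\, d^{2} \log^{1/2} n}
  = \sigma^{1/2}\, d^{5/4} \log^{5/4} n .
\]
```

The new expression is therefore the smaller of the two when $\sigma < d^{-5/2} \log^{-5/2} n$, that is, in the small-noise regime; for larger $\sigma$ the extra factors in $d$ and $\log n$ dominate.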
We also establish the first non-trivial lower bound on the smoothed complexity of the simplex method, proving that the \emph{shadow vertex simplex method} requires at least $\Omega \Big(\min \big(\sigma^{-1/2} d^{-1/2}\log^{-1/4} d,2^d \big) \Big)$ pivot steps with high probability. A key part of our analysis is a new variation on the extended formulation for the regular $2^k$-gon. We end with a numerical experiment that suggests this analysis could be further improved.
- [312] arXiv:2212.11629 (replaced) [pdf, ps, html, other]
-
Title: Graph-Based Specification and Automated Construction of ILP Problems
Authors: Sebastian Ehmes (Technical University of Darmstadt, Real-Time Systems Lab, Germany), Maximilian Kratz (Technical University of Darmstadt, Real-Time Systems Lab, Germany), Andy Schürr (Technical University of Darmstadt, Real-Time Systems Lab, Germany)
Comments: In Proceedings GCM 2022, arXiv:2212.10975; ILP section updated, acknowledgement updated
Journal-ref: EPTCS 374, 2022, pp. 3-22
Subjects: Software Engineering (cs.SE)
In the Model-Driven Software Engineering (MDSE) community, the combination of techniques operating on graph-based models (e.g., Pattern Matching (PM) and Graph Transformation (GT)) and Integer Linear Programming (ILP) is a common occurrence, since ILP solvers offer a powerful approach to solve linear optimization problems and help to enforce global constraints while delivering optimal solutions. However, designing and specifying complex optimization problems from more abstract problem descriptions can be a challenging task. A designer must be an expert in the specific problem domain as well as the ILP optimization domain to translate the given problem into a valid ILP problem. Typically, domain-specific ILP problem generators are hand-crafted by experts, to avoid specifying a new ILP problem by hand for each new instance of a problem domain. Unfortunately, the task of writing ILP problem generators is an exercise, which has to be repeated for each new scenario, tool, and approach. For this purpose, we introduce the GIPS (Graph-Based ILP Problem Specification Tool) framework that simplifies the development of ILP problem generators for graph-based optimization problems and a new Domain-Specific Language (DSL) called GIPSL (Graph-Based ILP Problem Specification Language) that integrates GT and ILP problems on an abstract level. Our approach uses GIPSL specifications as a starting point to derive ILP problem generators for a specific application domain automatically. First experiments show that the derived ILP problem generators can compete with hand-crafted programs developed by ILP experts.
- [313] arXiv:2302.07520 (replaced) [pdf, ps, other]
-
Title: ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array
Comments: 14 pages, 22 figures, journal
Subjects: Hardware Architecture (cs.AR)
The systolic accelerator is one of the premier architectural choices for DNN acceleration. However, the conventional systolic architecture suffers from low PE utilization due to the mismatch between the fixed array and diverse DNN workloads. Recent studies have proposed flexible systolic array architectures to adapt to DNN models. However, these designs support only coarse-grained reshaping or significantly increase hardware overhead. In this study, we propose ReDas, a flexible and lightweight systolic array that supports dynamic fine-grained reshaping and multiple dataflows. First, ReDas integrates lightweight and reconfigurable roundabout data paths, which achieve fine-grained reshaping using only short connections between adjacent PEs. Second, we redesign the PE microarchitecture and integrate a set of multi-mode data buffers around the array. The PE structure enables additional data bypassing and flexible data switching. Simultaneously, the multi-mode buffers facilitate fine-grained reallocation of on-chip memory resources, adapting to various dataflow requirements. ReDas can dynamically reconfigure to up to 129 different logical shapes and 3 dataflows for a 128x128 array. Finally, we propose an efficient mapper to generate appropriate configurations for each layer of DNN workloads. Compared to the conventional systolic array, ReDas can achieve about 4.6x speedup and 8.3x energy-delay product (EDP) reduction.
- [314] arXiv:2302.10886 (replaced) [pdf, ps, other]
-
Title: Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Lipschitz continuity is a crucial functional property of any predictive model that naturally governs its robustness, generalisation, and adversarial vulnerability. Contrary to other works that focus on obtaining tighter bounds and developing different practical strategies to enforce certain Lipschitz properties, we aim to thoroughly examine and characterise the Lipschitz behaviour of Neural Networks. Thus, we carry out an empirical investigation in a range of different settings (namely, architectures, datasets, label noise, and more) by exhausting the limits of the simplest and the most general lower and upper bounds. As a highlight of this investigation, we showcase a remarkable fidelity of the lower Lipschitz bound, identify a striking Double Descent trend in both upper and lower bounds to the Lipschitz constant, and explain the intriguing effects of label noise on function smoothness and generalisation.
- [315] arXiv:2304.01171 (replaced) [pdf, ps, html, other]
-
Title: Revisiting Context Aggregation for Image Matting
Authors: Qinglin Liu, Xiaoqian Lv, Quanling Meng, Zonglin Li, Xiangyuan Lan, Shuo Yang, Shengping Zhang, Liqiang Nie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Traditional studies emphasize the significance of context information in improving matting performance. Consequently, deep learning-based matting methods delve into designing pooling or affinity-based context aggregation modules to achieve superior results. However, these modules cannot adequately handle the context scale shift caused by the difference in image size during training and inference, resulting in matting performance degradation. In this paper, we revisit the context aggregation mechanisms of matting networks and find that a basic encoder-decoder network without any context aggregation modules can actually learn more universal context aggregation, thereby achieving higher matting performance compared to existing methods. Building on this insight, we present AEMatter, a matting network that is straightforward yet very effective. AEMatter adopts a Hybrid-Transformer backbone with appearance-enhanced axis-wise learning (AEAL) blocks to build a basic network with strong context aggregation learning capability. Furthermore, AEMatter leverages a large image training strategy to assist the network in learning context aggregation from data. Extensive experiments on five popular matting datasets demonstrate that the proposed AEMatter outperforms state-of-the-art matting methods by a large margin.
- [316] arXiv:2304.03427 (replaced) [pdf, ps, html, other]
-
Title: Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Scholars in the humanities rely heavily on ancient manuscripts to study history, religion, and socio-political structures in the past. Many efforts have been devoted to digitizing these precious manuscripts using Optical Character Recognition (OCR) technology, but most manuscripts were blemished over the centuries, so an OCR program cannot be expected to capture faded graphs and stains on pages. This work presents a neural spelling correction model built on Google OCR-ed Tibetan Manuscripts to auto-correct OCR-ed noisy output. This paper is divided into four sections: dataset, model architecture, training and analysis. First, we feature-engineered our raw Tibetan etext corpus into two sets of structured data frames -- a set of paired toy data and a set of paired real data. Then, we implemented a Confidence Score mechanism into the Transformer architecture to perform spelling correction tasks. According to the Loss and Character Error Rate, our Transformer + Confidence score mechanism architecture proves to be superior to Transformer, LSTM-2-LSTM and GRU-2-GRU architectures. Finally, to examine the robustness of our model, we analyzed erroneous tokens and visualized Attention and Self-Attention heatmaps in our model.
- [317] arXiv:2305.03568 (replaced) [pdf, ps, html, other]
-
Title: A vector quantized masked autoencoder for audiovisual speech emotion recognition
Comments: 15 pages, 5 figures, this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
The limited availability of labeled data is a major challenge in audiovisual speech emotion recognition (SER). Self-supervised learning approaches have recently been proposed to mitigate the need for labeled data in various applications. This paper proposes the VQ-MAE-AV model, a vector quantized masked autoencoder (MAE) designed for audiovisual speech self-supervised representation learning and applied to SER. Unlike previous approaches, the proposed method employs a self-supervised paradigm based on discrete audio and visual speech representations learned by vector quantized variational autoencoders. A multimodal MAE with self- or cross-attention mechanisms is proposed to fuse the audio and visual speech modalities and to learn local and global representations of the audiovisual speech sequence, which are then used for an SER downstream task. Experimental results show that the proposed approach, which is pre-trained on the VoxCeleb2 database and fine-tuned on standard emotional audiovisual speech datasets, outperforms the state-of-the-art audiovisual SER methods. Extensive ablation experiments are also provided to assess the contribution of the different model components.
- [318] arXiv:2305.08157 (replaced) [pdf, ps, html, other]
-
Title: Algorithmic Pluralism: A Structural Approach To Equal Opportunity
Comments: To appear in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024)
Subjects: Computers and Society (cs.CY)
We present a structural approach toward achieving equal opportunity in systems of algorithmic decision-making called algorithmic pluralism. Algorithmic pluralism describes a state of affairs in which no set of algorithms severely limits access to opportunity, allowing individuals the freedom to pursue a diverse range of life paths. To argue for algorithmic pluralism, we adopt Joseph Fishkin's theory of bottlenecks, which focuses on the structure of decision-points that determine how opportunities are allocated. The theory contends that each decision-point or bottleneck limits access to opportunities with some degree of severity and legitimacy. We extend Fishkin's structural viewpoint and use it to reframe existing systemic concerns about equal opportunity in algorithmic decision-making, such as patterned inequality and algorithmic monoculture. In proposing algorithmic pluralism, we argue for the urgent priority of alleviating severe bottlenecks in algorithmic decision-making. We contend that there must be a pluralism of opportunity available to many different individuals in order to promote equal opportunity in a systemic way. We further show how this framework has several implications for system design and regulation through current debates about equal opportunity in algorithmic hiring.
- [319] arXiv:2305.09651 (replaced) [pdf, ps, html, other]
-
Title: Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Comments: Accepted at ACL 2023, main conference. Code available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.
- [320] arXiv:2305.12705 (replaced) [pdf, ps, html, other]
-
Title: ForestTrav: Accurate, Efficient and Deployable Forest Traversability Estimation for Autonomous Ground Vehicles
Comments: Videolink: this https URL
Subjects: Robotics (cs.RO)
Autonomous navigation in unstructured vegetated environments remains an open challenge. To successfully operate in these settings, ground vehicles must assess the traversability of the environment and determine which vegetation is pliable enough to push through. In this work, we propose a novel method that combines a high-fidelity, feature-rich 3D voxel representation with the structural context and sparseness of SCNNs to perform Traversability Estimation (TE) in densely vegetated environments. The proposed method is thoroughly evaluated on an accurately-labeled real-world data set that we provide to the community. It is shown to outperform state-of-the-art methods by a significant margin (0.59 vs. 0.39 MCC score at 0.1m voxel resolution) in challenging scenes and to generalize to unseen environments. In addition, the method is economical in the amount of training data and training time required: a model is trained in minutes on a desktop computer. We show that by exploiting the context of the environment, our method can use different feature combinations with only limited performance variations. For example, our approach can be used with lidar-only features, whilst still assessing complex vegetated environments accurately, which was not demonstrated previously in the literature in such environments. In addition, we propose an approach to assess a traversability estimator's sensitivity to information quality and show our method's sensitivity is low.
- [321] arXiv:2305.15699 (replaced) [pdf, ps, html, other]
-
Title: Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Understanding action recognition in egocentric videos has emerged as a vital research topic with numerous practical applications. With the limitation in the scale of egocentric data collection, learning robust deep learning-based action recognition models remains difficult. Transferring knowledge learned from the large-scale exocentric data to the egocentric data is challenging due to the difference in videos across views. Our work introduces a novel cross-view learning approach to action recognition (CVAR) that effectively transfers knowledge from the exocentric to the egocentric view. First, we introduce a novel geometric-based constraint into the self-attention mechanism in Transformer based on analyzing the camera positions between two views. Then, we propose a new cross-view self-attention loss learned on unpaired cross-view data to enforce the self-attention mechanism learning to transfer knowledge across views. Finally, to further improve the performance of our cross-view learning approach, we present the metrics to measure the correlations in videos and attention maps effectively. Experimental results on standard egocentric action recognition benchmarks, i.e., Charades-Ego, EPIC-Kitchens-55, and EPIC-Kitchens-100, have shown our approach's effectiveness and state-of-the-art performance.
- [322] arXiv:2305.16329 (replaced) [pdf, ps, html, other]
-
Title: A simple protocol to automate the executing, scaling, and reconfiguration of Cloud-Native Apps
Comments: version 1.1 of SSMMP
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a simple protocol for Service Mesh management. The protocol specification consists of the formats of messages, and the actions taken by senders and recipients. The idea is that the microservices of a Cloud-Native Application should also be involved in the configuration of their communication sessions. The protocol does not interfere with the business logic of the microservices and requires only minor and generic modifications of the microservices' codebase, limited only to network connections. Thus, sidecars are no longer needed, which is in line with the current trends, e.g. Cilium Service Mesh. This article presents the full formal specification of the proposed protocol SSMMP/v1.1.
- [323] arXiv:2306.01879 (replaced) [pdf, ps, other]
-
Title: Revisiting the Role of Language Priors in Vision-Language Models
Comments: Published at ICML 2024. Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Vision-language models (VLMs) are impactful in part because they can be applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning. We study $\textit{generative VLMs}$ that are trained for next-word generation given an image. We explore their zero-shot performance on the illustrative task of image-text retrieval across 8 popular vision-language benchmarks. Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image. We call this probabilistic score the $\textit{Visual Generative Pre-Training Score}$ (VisualGPTScore). While the VisualGPTScore produces near-perfect accuracy on some retrieval benchmarks, it yields poor accuracy on others. We analyze this behavior through a probabilistic lens, pointing out that some benchmarks inadvertently capture unnatural language distributions by creating adversarial but unlikely text captions. In fact, we demonstrate that even a "blind" language model that ignores any image evidence can sometimes outperform all prior art, reminiscent of similar challenges faced by the visual-question answering (VQA) community many years ago. We derive a probabilistic post-processing scheme that controls for the amount of linguistic bias in generative VLMs at test time without having to retrain or fine-tune the model. We show that the VisualGPTScore, when appropriately debiased, is a strong zero-shot baseline for vision-language understanding, oftentimes producing state-of-the-art accuracy.
- [324] arXiv:2306.02928 (replaced) [pdf, ps, html, other]
-
Title: LRVS-Fashion: Extending Visual Search with Referring Instructions
Comments: 29 pages, 14 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a new challenge for image similarity search in the context of fashion, addressing the inherent ambiguity in this domain stemming from complex images. We present Referred Visual Search (RVS), a task allowing users to define more precisely the desired similarity, following recent interest in the industry. We release a new large public dataset, LRVS-Fashion, consisting of 272k fashion products with 842k images extracted from fashion catalogs, designed explicitly for this task. However, unlike traditional visual search methods in the industry, we demonstrate that superior performance can be achieved by bypassing explicit object detection and adopting weakly-supervised conditional contrastive learning on image tuples. Our method is lightweight and demonstrates robustness, reaching Recall at one superior to strong detection-based baselines against 2M distractors. The dataset is available at this https URL .
- [325] arXiv:2306.13030 (replaced) [pdf, ps, html, other]
-
Title: Online Self-Supervised Deep Learning for Intrusion Detection Systems
Journal-ref: Nakıp, M., & Gelenbe, E. (2024). Online Self-Supervised Deep Learning for Intrusion Detection Systems. IEEE Transactions on Information Forensics and Security
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
This paper proposes a novel Self-Supervised Intrusion Detection (SSID) framework, which enables a fully online Deep Learning (DL) based Intrusion Detection System (IDS) that requires no human intervention or prior off-line learning. The proposed framework analyzes and labels incoming traffic packets based only on the decisions of the IDS itself using an Auto-Associative Deep Random Neural Network, and on an online estimate of its statistically measured trustworthiness. The SSID framework enables IDS to adapt rapidly to time-varying characteristics of the network traffic, and eliminates the need for offline data collection. This approach avoids human errors in data labeling, and human labor and computational costs of model training and data collection. The approach is experimentally evaluated on public datasets and compared with well-known machine learning and deep learning models, showing that this SSID framework is very useful and advantageous as an accurate and online learning DL-based IDS for IoT systems.
- [326] arXiv:2306.13351 (replaced) [pdf, ps, html, other]
-
Title: Equations with infinite delay: pseudospectral discretization for numerical stability and bifurcation in an abstract framework
Comments: 23 pages, 4 figures
Subjects: Numerical Analysis (math.NA)
We consider nonlinear delay differential and renewal equations with infinite delay. We extend the work of Gyllenberg et al., Appl. Math. Comput. (2018) by introducing a unifying abstract framework, and derive a finite-dimensional approximating system via pseudospectral discretization. For renewal equations, we consider a reformulation in the space of absolutely continuous functions via integration. We prove the one-to-one correspondence of equilibria between the original equation and its approximation, and that linearization and discretization commute. Our most important result is the proof of convergence of the characteristic roots of the pseudospectral approximation of the linear(ized) equations when the collocation nodes are chosen as the family of scaled zeros or extrema of Laguerre polynomials. This ensures that the finite-dimensional system correctly reproduces the stability properties of the original linear equation if the dimension of the approximation is large enough. The result is illustrated with several numerical tests, which also demonstrate the effectiveness of the approach for the bifurcation analysis of equilibria of nonlinear equations. The new approach used to prove convergence also provides the exact location of the spectrum of the differentiation matrices for the Laguerre zeros and extrema, adding new insights into properties that are important in the numerical solution of differential equations by pseudospectral methods.
- [327] arXiv:2306.16851 (replaced) [pdf, ps, other]
-
Title: SWAT: A System-Wide Approach to Tunable Leakage Mitigation in Encrypted Data Stores
Subjects: Cryptography and Security (cs.CR)
Numerous studies have underscored the significant privacy risks associated with various leakage patterns in encrypted data stores. While many solutions have been proposed to mitigate these leakages, they either (1) incur substantial overheads, (2) focus on specific subsets of leakage patterns, or (3) apply the same security notion across various workloads, thereby impeding the attainment of fine-tuned privacy-efficiency trade-offs. In light of various detrimental leakage patterns, this paper starts with an investigation into which specific leakage patterns require our focus in the contexts of key-value, range-query, and dynamic workloads, respectively. Subsequently, we introduce new security notions tailored to the specific privacy requirements of these workloads. Accordingly, we propose and instantiate SWAT, an efficient construction that progressively enables these workloads, while provably mitigating system-wide leakage via a suite of algorithms with tunable privacy-efficiency trade-offs. We conducted extensive experiments and compiled a detailed result analysis, showing the efficiency of our solution. SWAT is about an order of magnitude slower than an encryption-only data store that reveals various leakage patterns and is two orders of magnitude faster than a trivial zero-leakage solution. Meanwhile, the performance of SWAT remains highly competitive compared to other designs that mitigate specific types of leakage.
- [328] arXiv:2307.00229 (replaced) [pdf, ps, html, other]
-
Title: Constrained Local Approximate Ideal Restriction for Advection-Diffusion Problems
Comments: Revised form published in the SIAM Journal on Scientific computing (SPECIAL SECTION Copper Mountain 2023), 22 pages, 7 Figures
Journal-ref: SIAM J. SCI. COMPUT. 2024 Vol. 0, No. 0, pp. S96-S122
Subjects: Numerical Analysis (math.NA)
This paper focuses on developing a reduction-based algebraic multigrid method that is suitable for solving general (non)symmetric linear systems and is naturally robust from pure advection to pure diffusion. Initial motivation comes from a new reduction-based algebraic multigrid (AMG) approach, $\ell$AIR (local approximate ideal restriction), that was developed for solving advection-dominated problems. Though this new solver is very effective in the advection dominated regime, its performance degrades in cases where diffusion becomes dominant. This is consistent with the fact that in general, reduction-based AMG methods tend to suffer from growth in complexity and/or convergence rates as the problem size is increased, especially for diffusion dominated problems in two or three dimensions. Motivated by the success of $\ell$AIR in the advective regime, our aim in this paper is to generalize the AIR framework with the goal of improving the performance of the solver in diffusion dominated regimes. To do so, we propose a novel way to combine mode constraints as used commonly in energy minimization AMG methods with the local approximation of ideal operators used in $\ell$AIR. The resulting constrained $\ell$AIR (C$\ell$AIR) algorithm is able to achieve fast scalable convergence on advective and diffusive problems. In addition, it is able to achieve standard low complexity hierarchies in the diffusive regime through aggressive coarsening, something that has been previously difficult for reduction-based methods.
- [329] arXiv:2307.04573 (replaced) [pdf, ps, other]
-
Title: A Semi-Automated Solution Approach Recommender for a Given Use Case: a Case Study for AI/ML in Oncology via Scopus and OpenAI
Comments: It was published online on 15 May 2024 in Human-Centric Intelligent Systems, Springer
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Nowadays, literature review is a necessary task when trying to solve a given problem. However, an exhaustive literature review is very time-consuming in today's vast literature landscape. It can take weeks, even if looking only for abstracts or surveys. Moreover, choosing a method among others, and targeting searches within relevant problem and solution domains, are not easy tasks. These are especially true for young researchers or engineers starting to work in their field. Even if surveys that provide methods used to solve a specific problem already exist, an automatic way to do it for any use case is missing, especially for those who don't know the existing literature. Our proposed tool, SARBOLD-LLM, allows discovering and choosing among methods related to a given problem, providing additional information about their uses in the literature to derive decision-making insights, in only a few hours. The SARBOLD-LLM comprises three modules: (1: Scopus search) paper selection using a keyword selection scheme to query Scopus API; (2: Scoring and method extraction) relevancy and popularity scores calculation and solution method extraction in papers utilizing OpenAI API (GPT 3.5); (3: Analyzes) sensitivity analysis and post-analyzes which reveals trends, relevant papers and methods. Comparing the SARBOLD-LLM to manual ground truth using precision, recall, and F1-score metrics, the performance results of AI in the oncology case study are 0.68, 0.9, and 0.77, respectively. SARBOLD-LLM demonstrates successful outcomes across various domains, showcasing its robustness and effectiveness. The SARBOLD-LLM addresses engineers more than researchers, as it proposes methods and trends without adding pros and cons. It is a useful tool to select which methods to investigate first and comes as a complement to surveys. This can limit the global search and accumulation of knowledge for the end user. However...
- [330] arXiv:2307.06979 (replaced) [pdf, ps, html, other]
-
Title: Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models
Arman Sakif Chowdhury, G. M. Shahariar, Ahammed Tarik Aziz, Syed Mohibul Alam, Md. Azad Sheikh, Tanveer Ahmed Belal
Comments: Under Review
Subjects: Computation and Language (cs.CL)
With the rise of social media and online news sources, fake news has become a significant issue globally. However, the detection of fake news in low resource languages like Bengali has received limited attention in research. In this paper, we propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali using summarization and augmentation techniques with five pre-trained language models. Our approach includes translating English news articles and using augmentation techniques to curb the deficit of fake news articles. Our research also focused on summarizing the news to tackle the token length limitation of BERT based models. Through extensive experimentation and rigorous evaluation, we show the effectiveness of summarization and augmentation in the case of Bengali fake news detection. We evaluated our models using three separate test datasets. The BanglaBERT Base model, when combined with augmentation techniques, achieved an impressive accuracy of 96% on the first test dataset. On the second test dataset, the BanglaBERT model, trained with summarized augmented news articles achieved 97% accuracy. Lastly, the mBERT Base model achieved an accuracy of 86% on the third test dataset which was reserved for generalization performance evaluation. The datasets and implementations are available at this https URL
- [331] arXiv:2307.08122 (replaced) [pdf, ps, html, other]
-
Title: Tangent Transformers for Composition, Privacy and Removal
Comments: Published at the International Conference on Learning Representations (ICLR) 2024. Code available at: this https URL
Subjects: Machine Learning (cs.LG)
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy. Our code is available at: this https URL
- [332] arXiv:2307.08241 (replaced) [pdf, ps, other]
-
Title: Weak approximation for stochastic reaction-diffusion equation near sharp interface limit
Comments: 58 pages
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
It is known that when the diffuse interface thickness $\epsilon$ vanishes, the sharp interface limit of the stochastic reaction-diffusion equation is formally a stochastic geometric flow. To capture and simulate such geometric flow, it is crucial to develop numerical approximations whose error bounds depend on $\frac 1\epsilon$ polynomially. However, due to the loss of the spectral estimate of the linearized stochastic reaction-diffusion equation, how to obtain such an error bound for a numerical approximation has been an open problem.
In this paper, we solve this weak error bound problem for stochastic reaction-diffusion equations near the sharp interface limit. We first introduce a regularized problem which enjoys exponential ergodicity. Then we present the regularity analysis of the regularized Kolmogorov and Poisson equations, which depends on $\frac 1{\epsilon}$ only polynomially. Furthermore, we establish such a weak error bound. This phenomenon could be viewed as a kind of regularization effect of noise on the numerical approximation of stochastic partial differential equations (SPDEs). As a by-product, a central limit theorem of the weak approximation is shown near the sharp interface limit. Our method of proof could be extended to a number of other spatial and temporal numerical approximations for semilinear SPDEs.
- [333] arXiv:2307.10001 (replaced) [pdf, ps, html, other]
-
Title: As large as it gets: Learning infinitely large Filters via Neural Implicit Functions in the Fourier Domain
Comments: accepted at TMLR 05/24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent work in neural networks for image classification has seen a strong tendency towards increasing the spatial context. Whether achieved through large convolution kernels or self-attention, models scale poorly with the increased spatial context, such that the improved model accuracy often comes at significant costs. In this paper, we propose a module for studying the effective filter size of convolutional neural networks. To facilitate such a study, several challenges need to be addressed: 1) we need an effective means to train models with large filters (potentially as large as the input data) without increasing the number of learnable parameters 2) the employed convolution operation should be a plug-and-play module that can replace conventional convolutions in a CNN and allow for an efficient implementation in current frameworks 3) the study of filter sizes has to be decoupled from other aspects such as the network width or the number of learnable parameters 4) the cost of the convolution operation itself has to remain manageable i.e. we cannot naively increase the size of the convolution kernel. To address these challenges, we propose to learn the frequency representations of filter weights as neural implicit functions, such that the better scalability of the convolution in the frequency domain can be leveraged. Additionally, due to the implementation of the proposed neural implicit function, even large and expressive spatial filters can be parameterized by only a few learnable weights. Our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters are well localized and relatively small in practice when transformed from the frequency to the spatial domain. We anticipate that our analysis of individually optimized filter sizes will allow for more efficient, yet effective, models in the future. this https URL.
- [334] arXiv:2307.10484 (replaced) [pdf, ps, html, other]
-
Title: Inductive diagrams for causal reasoning
Comments: This revision is as published in PACMPL through OOPSLA, but with [authorversion] set. Compared to the previous version, the introduction has been almost entirely rewritten
Journal-ref: Proc. ACM Program. Lang. 8, OOPSLA1, Article 113 (April 2024), 26 pages
Subjects: Programming Languages (cs.PL)
The Lamport diagram is a pervasive and intuitive tool for informal reasoning about "happens-before" relationships in a concurrent system. However, traditional axiomatic formalizations of Lamport diagrams can be painful to work with in a mechanized setting like Agda. We propose an alternative, inductive formalization -- the causal separation diagram (CSD) -- that takes inspiration from string diagrams and concurrent separation logic, but enjoys a graphical syntax similar to Lamport diagrams. Critically, CSDs are based on the idea that causal relationships between events are witnessed by the paths that information follows between them. To that end, we model happens-before as a dependent type of paths between events.
The inductive formulation of CSDs enables their interpretation into a variety of semantic domains. We demonstrate the interpretability of CSDs with a case study on properties of logical clocks, widely-used mechanisms for reifying causal relationships as data. We carry out this study by implementing a series of interpreters for CSDs, culminating in a generic proof of Lamport's clock condition that is parametric in a choice of clock. We instantiate this proof on Lamport's scalar clock, on Mattern's vector clock, and on the matrix clocks of Raynal et al. and of Wuu and Bernstein, yielding verified implementations of each. The CSD formalism and our case study are mechanized in the Agda proof assistant.
- [335] arXiv:2307.16178 (replaced) [pdf, ps, html, other]
-
Title: On Updating Static Output Feedback Controllers Under State-Space Perturbation
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
In this paper, we propose a novel update of a nominal stabilizing static output feedback (SOF) controller for a perturbed linear system. In almost every classical feedback controller design problem, a stabilizing feedback controller is designed given a stabilizable unstable system. In realistic scenarios, the system model is usually imperfect and subject to perturbations. A typical approach to attenuate the impacts of such perturbations on the system stability is repeating the whole controller design procedure to find an updated stabilizing SOF controller. Such an approach can be inefficient and occasionally infeasible. Using the notion of minimum destabilizing real perturbation (MDRP), we construct a simple norm minimization problem (a least-squares problem) to propose an efficient update of a nominal stabilizing SOF controller that can be applied to various control engineering applications in the case of perturbed scenarios like abrupt changes or inaccurate system models. In particular, considering norm-bounded known or unknown perturbations, this paper presents updated stabilizing SOF controllers and derives sufficient stability conditions. Geometric metrics to quantitatively measure the approach's robustness are defined. Moreover, we characterize the corresponding guaranteed stability regions, and specifically, for the case of norm-bounded unknown perturbations, we propose non-fragility-based robust updated stabilizing SOF controllers. Through extensive numerical simulations, we assess the effectiveness of the theoretical results.
- [336] arXiv:2307.16580 (replaced) [pdf, ps, other]
-
Title: A multiscale and multicriteria Generative Adversarial Network to synthesize 1-dimensional turbulent fields
Carlos Granero-Belinchon (ODYSSEY, IMT Atlantique - MEE, Lab-STICC_OSE), Manuel Cabeza Gallucci (IMT Atlantique - MEE, UBA, IMT Atlantique)
Journal-ref: Machine Learning: Science and Technology, 2024, 5 (2), pp.025032.
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Fluid Dynamics (physics.flu-dyn)
This article introduces a new Neural Network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics. Both the model architecture and training procedure are grounded in the Kolmogorov and Obukhov statistical theories of fully developed turbulence, thus guaranteeing descriptions of 1) energy distribution, 2) energy cascade and 3) intermittency across scales in agreement with experimental observations. The model is a Generative Adversarial Network with multiple multiscale optimization criteria. First, we use three physics-based criteria: the variance, skewness and flatness of the increments of the generated field, which retrieve respectively the turbulent energy distribution, energy cascade and intermittency across scales. Second, the Generative Adversarial Network criterion, based on reproducing statistical distributions, is used on segments of different lengths of the generated field. Furthermore, to mimic multiscale decompositions frequently used in turbulence studies, the model architecture is fully convolutional with kernel sizes varying along the multiple layers of the model. To train our model, we use turbulent velocity signals from grid turbulence at the Modane wind tunnel.
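The three physics-based criteria named in this abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions (raw moments of the increments $u(x+l) - u(x)$, normalized by powers of the second moment); the function names are hypothetical and this is not the authors' loss implementation.

```python
def increments(u, lag):
    """Differences u[i + lag] - u[i] for every valid index i."""
    return [u[i + lag] - u[i] for i in range(len(u) - lag)]


def moment(values, order):
    """Raw (non-centred) moment of the given order."""
    return sum(v ** order for v in values) / len(values)


def increment_statistics(u, lag):
    """Second moment, skewness and flatness of the increments at one scale.

    Assumption: increments are treated as zero-mean, so the raw second
    moment stands in for the variance. Requires 0 < lag < len(u) and a
    non-constant signal (otherwise the second moment is zero).
    """
    d = increments(u, lag)
    s2 = moment(d, 2)
    return {
        "variance": s2,
        "skewness": moment(d, 3) / s2 ** 1.5,
        "flatness": moment(d, 4) / s2 ** 2,
    }
```

Evaluating `increment_statistics` over a range of `lag` values gives one curve per statistic across scales, which is the kind of multiscale target the abstract describes.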
- [337] arXiv:2308.02324 (replaced) [pdf, ps, html, other]
-
Title: Robust mmWave/sub-THz multi-connectivity using minimal coordination and coarse synchronization
Comments: Major revision: added ray-tracing simulation to validate the theoretical analysis, and refactored the presentation to avoid misleading connections with the canonical cell-free massive MIMO literature
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This study investigates simpler alternatives to coherent joint transmission for supporting robust connectivity against signal blockage in mmWave/sub-THz access networks. By taking an information-theoretic viewpoint, we demonstrate analytically that with a careful design, full macrodiversity gains and significant SNR gains can be achieved through canonical receivers and minimal coordination and synchronization requirements at the infrastructure side. Our proposed scheme extends non-coherent joint transmission by employing a special form of diversity to counteract artificially induced deep fades that would otherwise make this technique often compare unfavorably against standard transmitter selection schemes. Additionally, the inclusion of an Alamouti-like space-time coding layer is shown to recover a significant fraction of the optimal performance. Our conclusions are based on an insightful multi-point intermittent block fading channel model that enables rigorous ergodic and outage rate analysis, while also considering timing offsets due to imperfect delay compensation. Although simplified, our approach captures the essential features of modern mmWave/sub-THz communications, thereby providing practical design guidelines for realistic systems.
- [338] arXiv:2308.03825 (replaced) [pdf, ps, html, other]
-
Title: "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as the jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. We also observe that jailbreak prompts increasingly shift from online Web communities to prompt-aggregation websites and that 28 user accounts have consistently optimized jailbreak prompts over 100 days. To assess the potential harm caused by jailbreak prompts, we create a question set comprising 107,250 samples across 13 forbidden scenarios. Leveraging this dataset, our experiments on six popular LLMs show that their safeguards cannot adequately defend against jailbreak prompts in all scenarios. In particular, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4, and the earliest one has persisted online for over 240 days. We hope that our study can help the research community and LLM vendors promote safer and regulated LLMs.
- [339] arXiv:2308.04670 (replaced) [pdf, ps, html, other]
-
Title: TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths
Comments: International Conference on Robotics and Automation (ICRA Yokohama 2024)
Subjects: Robotics (cs.RO)
Precise reconstruction and manipulation of crumpled cloths is challenging due to the high dimensionality of cloth models, as well as the limited observation at self-occluded regions. We leverage the recent progress in the field of single-view human reconstruction to perform template-based reconstruction of crumpled cloths from their top-view depth observations only, with our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstruction mesh explicitly describes the positions and visibilities of the entire cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulations. Experiments demonstrate that our TRTM system can be applied to daily cloths that have topologies similar to our template mesh, but with different shapes, sizes, patterns, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: this https URL.
- [340] arXiv:2308.10413 (replaced) [pdf, ps, html, other]
-
Title: Mechanisms that play a game, not toss a coin
Comments: To appear in Proceedings of IJCAI 2024
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
Randomized mechanisms can have good normative properties compared to their deterministic counterparts. However, randomized mechanisms are problematic in several ways such as in their verifiability. We propose here to derandomize such mechanisms by having agents play a game instead of tossing a coin. The game is designed so an agent's best action is to play randomly, and this play then injects "randomness" into the mechanism. This derandomization retains many of the good normative properties of the original randomized mechanism but gives a mechanism that is deterministic and easy, for instance, to audit. We consider three related methods to derandomize randomized mechanisms in six different domains: voting, facility location, task allocation, school choice, peer selection, and resource allocation. We propose a number of novel derandomized mechanisms for these six domains with good normative properties. Each mechanism has a mixed Nash equilibrium in which agents play a modular arithmetic game with a uniform mixed strategy. In all but one mixed Nash equilibrium, agents report their preferences over the original problem sincerely. The derandomized methods are thus "quasi-strategy proof". In one domain, we additionally show that a new and desirable normative property emerges as a result of derandomization.
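The abstract does not spell out the modular arithmetic game, so the following is only a hedged sketch of one plausible form: each agent submits an integer in 0..m-1 and the mechanism uses the sum modulo m in place of a coin toss. The function names and this exact rule are assumptions for illustration, not the paper's definition.

```python
from collections import Counter


def modular_outcome(choices, m):
    """Return the sum of the agents' submitted integers modulo m."""
    if any(not 0 <= c < m for c in choices):
        raise ValueError("each choice must lie in 0..m-1")
    return sum(choices) % m


def outcome_distribution(fixed_choices, m):
    """Outcome distribution when one extra agent plays uniformly over 0..m-1.

    The other agents' choices are held fixed; the uniform player's m
    possible moves are enumerated exhaustively.
    """
    counts = Counter(
        modular_outcome(list(fixed_choices) + [r], m) for r in range(m)
    )
    return {k: counts[k] / m for k in range(m)}
```

Under this assumed rule, a single agent playing uniformly makes every outcome equally likely whatever the others submit, which gives some intuition for why uniform play can substitute for the mechanism's own randomness.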
- [341] arXiv:2308.11082 (replaced) [pdf, ps, html, other]
-
Title: PrAIoritize: Automated Early Prediction and Prioritization of Vulnerabilities in Smart Contracts
Subjects: Software Engineering (cs.SE)
Context: Smart contracts are prone to numerous security threats due to undisclosed vulnerabilities and code weaknesses. In Ethereum smart contracts, the challenges of timely addressing these code weaknesses highlight the critical need for automated early prediction and prioritization during the code review process. Efficient prioritization is crucial for smart contract security. Objective: Toward this end, our research aims to provide an automated approach, PrAIoritize, for prioritizing and predicting critical code weaknesses in Ethereum smart contracts during the code review process. Method: To do so, we collected smart contract code reviews sourced from Open Source Software (OSS) on GitHub and the Common Vulnerabilities and Exposures (CVE) database. Subsequently, we developed PrAIoritize, an innovative automated prioritization approach. PrAIoritize integrates advanced Large Language Models (LLMs) with sophisticated natural language processing (NLP) techniques. PrAIoritize automates code review labeling by employing a domain-specific lexicon of smart contract weaknesses and their impacts. Following this, feature engineering is conducted for code reviews, and a pre-trained DistilBERT model is utilized for priority classification. Finally, the model is trained and evaluated using code reviews of smart contracts. Results: Our evaluation demonstrates significant improvement over state-of-the-art baselines and commonly used pre-trained models (e.g. T5) for similar classification tasks, with 4.82%-27.94% increase in F-measure, precision, and recall. Conclusion: By leveraging PrAIoritize, practitioners can efficiently prioritize smart contract code weaknesses, addressing critical code weaknesses promptly and reducing the time and effort required for manual triage.
- [342] arXiv:2308.11267 (replaced) [pdf, ps, html, other]
-
Title: Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy through gradient descent rather than indirectly and abruptly through constrained optimisation on a sorted value list. A theoretical analysis first derives the Lagrangian policy gradient for the policy optimisation of both proposed algorithms and then the adversarial policy gradient to learn the adversary for Adversarial RCPG. Empirical experiments injecting perturbations in inventory management and safe navigation tasks demonstrate the competitive performance of both algorithms compared to traditional RCPG variants as well as non-robust and non-constrained ablations. In particular, Adversarial RCPG ranks among the top two performing algorithms on all tests.
- [343] arXiv:2308.12874 (replaced) [pdf, ps, html, other]
-
Title: Easy attention: A simple attention mechanism for temporal predictions with transformers
Marcial Sanchis-Agudo, Yuning Wang, Roger Arnau, Luca Guastoni, Jasmin Lim, Karthik Duraisamy, Ricardo Vinuesa
Comments: 15 pages and 6 figures
Subjects: Machine Learning (cs.LG)
To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention which we demonstrate in time-series reconstruction and prediction. While the standard self attention only makes use of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through the singular-value decomposition (SVD) on the softmax attention score, we further observe that self attention compresses the contributions from both queries and keys in the space spanned by the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems exhibiting more robustness and less complexity than self attention or the widely-used long short-term memory (LSTM) network. We show the improved performance of the easy-attention method in the Lorenz system, a turbulence shear flow and a model of a nuclear reactor.
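As a rough illustration of treating attention scores as learnable parameters, the sketch below replaces the softmax of query-key products with a free T x T matrix applied to the projected values. It uses plain Python lists, omits training, and every name in it (`easy_attention`, `scores`, `w_v`) is hypothetical; it is not the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def easy_attention(x, scores, w_v):
    """Mix value vectors with a directly parameterized score matrix.

    x      : T x d input sequence
    scores : T x T matrix, assumed to be learned directly (no queries,
             keys or softmax are computed)
    w_v    : d x d value projection
    """
    values = matmul(x, w_v)
    return matmul(scores, values)
```

Standard self-attention would compute `scores` from `x` at every call; here the matrix is fixed per layer, which is where the reduction in complexity mentioned in the abstract would come from under this reading.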
- [344] arXiv:2308.14555 (replaced) [pdf, ps, other]
-
Title: Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Comments: Major revision for lemma 7.1
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
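The remark that $\mathcal{O}(N)$ updates of magnitude $\mathcal{O}(\frac{1}{N})$ behave like an Euler approximation of an ODE can be checked on a toy example. The sketch below assumes a hypothetical scalar drift $F(\theta) = -\theta$, chosen only because its ODE solution at time 1 is known in closed form ($e^{-1}$ for $\theta_0 = 1$); it is unrelated to the specific limit equations of the paper.

```python
import math


def euler_updates(n_hidden, theta0=1.0):
    """Apply n_hidden updates of size 1/n_hidden with toy drift F(theta) = -theta.

    This is the explicit Euler scheme for d(theta)/dt = -theta on [0, 1],
    so the result equals theta0 * (1 - 1/n_hidden) ** n_hidden.
    """
    theta = theta0
    for _ in range(n_hidden):
        theta += (1.0 / n_hidden) * (-theta)
    return theta


def gap_to_ode_limit(n_hidden):
    """Distance between the discrete iterate and the ODE solution exp(-1)."""
    return abs(euler_updates(n_hidden) - math.exp(-1.0))
```

The gap shrinks as `n_hidden` grows, which is the convergence the abstract refers to for feedforward networks; with $\mathcal{O}(1)$ updates, as in the RNN hidden layer, no such step size exists and this argument does not apply.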
- [345] arXiv:2309.07806 (replaced) [pdf, ps, html, other]
-
Title: Feasability of Learning Weighted Automata on a Semiring
Subjects: Formal Languages and Automata Theory (cs.FL)
Since the seminal work by Angluin, active learning of automata, by membership and equivalence queries, has been extensively studied and several generalisations have been developed to learn various extensions of automata. For weighted automata, restricted cases have been tackled in the literature and in this paper we chart the boundaries of the Angluin approach (using a class of hypothesis automata constructed from membership and equivalence queries) applied to learning weighted automata over a general semiring. We show precisely the theoretical limitations of this approach and classify functions with respect to how guessable they are (corresponding to the existence and abundance of solutions of certain systems of equations). We provide a syntactic description of the boundary condition for a correct hypothesis of the prescribed form to exist. Of course, from an algorithmic standpoint, knowing that (many) solutions exist need not translate into an effective algorithm to find one; we conclude with a discussion of some known conditions (and variants thereof) that suffice to ensure this, illustrating the ideas over several familiar semirings (including the natural numbers) and pose some open questions for future research.
- [346] arXiv:2309.07906 (replaced) [pdf, ps, html, other]
-
Title: Generative Image Dynamics
Comments: Project website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain: given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to realistically interact with objects in real pictures by interpreting the spectral volumes as image-space modal bases, which approximate object dynamics.
- [347] arXiv:2309.08097 (replaced) [pdf, ps, html, other]
-
Title: Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions
Comments: Accepted by TETCI
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The challenge in fine-grained visual categorization lies in how to explore the subtle differences between different subclasses and achieve accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained deep models to achieve the objective. However, when only a limited amount of samples is available, similar methods may become less effective. Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation. However, the high level of detail required for fine-grained images makes it challenging for existing methods to be directly employed. To address this issue, we propose a novel approach termed the detail reinforcement diffusion model (DRDM), which leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components including discriminative semantic recombination (DSR) and spatial knowledge reference (SKR). Specifically, DSR is designed to extract implicit similarity relationships from the labels and reconstruct the semantic mapping between labels and instances, which enables better discrimination of subtle differences between different subclasses. Furthermore, we introduce the SKR module, which incorporates the distributions of different datasets as references in the feature space. This allows the SKR to aggregate the high-dimensional distribution of subclass features in few-shot FGVC tasks, thus expanding the decision boundary. Through these two critical components, we effectively utilize the knowledge from large models to address the issue of data scarcity, resulting in improved performance for fine-grained visual recognition tasks. Extensive experiments demonstrate the consistent performance gain offered by our DRDM.
- [348] arXiv:2309.14660 (replaced) [pdf, ps, html, other]
-
Title: CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration
Authors: Shuhao Kang, Youqi Liao, Jianping Li, Fuxun Liang, Yuhao Li, Xianghong Zou, Fangning Li, Xieyuanli Chen, Zhen Dong, Bisheng Yang
Comments: Submitted to IEEE RA-L (under review); project page is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Image-to-point cloud (I2P) registration is a fundamental task for robots and autonomous vehicles to achieve cross-modality data fusion and localization. Existing I2P registration methods estimate correspondences at the point/pixel level, often overlooking global alignment. However, I2P matching can easily converge to a local optimum when performed without high-level guidance from global constraints. To address this issue, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner to achieve the globally optimal solution. First, the image and point cloud data are processed through a Siamese encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, in the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on matching pairs, the transform matrix is estimated with the EPnP-RANSAC algorithm. Extensive experiments conducted on the KITTI dataset demonstrate that CoFiI2P achieves impressive results, with a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters. These results represent a significant improvement of 84% in RRE and 89% in RTE compared to the current state-of-the-art (SOTA) method. The project page is available at this https URL.
- [349] arXiv:2309.16967 (replaced) [pdf, ps, html, other]
-
Title: nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Automatic segmentation of medical images is crucial in modern clinical workflows. The Segment Anything Model (SAM) has emerged as a versatile tool for image segmentation without specific domain training, but it requires human prompts and may have limitations in specific domains. Traditional models like nnUNet perform automatic segmentation during inference and are effective in specific domains but need extensive domain-specific training. To combine the strengths of foundational and domain-specific models, we propose nnSAM, integrating SAM's robust feature extraction with nnUNet's automatic configuration to enhance segmentation accuracy on small datasets. Our nnSAM model optimizes two main approaches: leveraging SAM's feature extraction and nnUNet's domain-specific adaptation, and incorporating a boundary shape supervision loss function based on level set functions and curvature calculations to learn anatomical shape priors from limited data. We evaluated nnSAM on four segmentation tasks: brain white matter, liver, lung, and heart segmentation. Our method outperformed others, achieving the highest DICE score of 82.77% and the lowest ASD of 1.14 mm in brain white matter segmentation with 20 training samples, compared to nnUNet's DICE score of 79.25% and ASD of 1.36 mm. A sample size study highlighted nnSAM's advantage with fewer training samples. Our results demonstrate significant improvements in segmentation performance with nnSAM, showcasing its potential for small-sample learning in medical image segmentation.
- [350] arXiv:2310.00022 (replaced) [pdf, ps, html, other]
-
Title: CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning representations through self-supervision on unlabeled data has proven highly effective for understanding diverse images. However, remote sensing images often have complex and densely populated scenes with multiple land objects and no clear foreground objects. This intrinsic property generates high object density, resulting in false positive pairs or missing contextual information in self-supervised learning. To address these problems, we propose a context-enhanced masked image modeling method (CtxMIM), a simple yet efficient MIM-based self-supervised learning method for remote sensing image understanding. CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches. A context-enhanced generative branch is introduced to provide contextual information through context consistency constraints in the reconstruction. With the simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset without specific temporal or geographical constraints. Finally, extensive experiments show that features learned by CtxMIM outperform fully supervised and state-of-the-art self-supervised learning methods on various downstream tasks, including land cover classification, semantic segmentation, object detection, and instance segmentation. These results demonstrate that CtxMIM learns impressive remote sensing representations with high generalization and transferability. Code and data will be made publicly available.
- [351] arXiv:2310.01042 (replaced) [pdf, ps, other]
-
Title: Constrained Flows in Networks
Comments: 28 pages, 8 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
The support of a flow $x$ in a network is the subdigraph induced by the arcs $uv$ for which $x(uv)>0$. We discuss a number of results on flows in networks where we put certain restrictions on the structure of the support of the flow. Many of these problems are NP-hard because they generalize linkage problems for digraphs. For example, deciding whether a network ${\cal N}=(D,s,t,c)$ has a maximum flow $x$ such that the maximum out-degree of the support $D_x$ of $x$ is at most 2 is NP-complete as it contains the 2-linkage problem as a very special case. Another problem which is NP-complete for the same reason is that of computing the maximum flow we can send from $s$ to $t$ along $p$ paths (called a maximum $p$-path-flow) in ${\cal N}$. Baier et al. (2005) gave a polynomial time algorithm which finds a $p$-path-flow $x$ whose value is at least $\frac{2}{3}$ of the value of an optimum $p$-path-flow when $p\in \{2,3\}$, and at least $\frac{1}{2}$ when $p\geq 4$. When $p=2$, they show that this is best possible unless P=NP. We show for each $p\geq 2$ that the value of a maximum $p$-path-flow cannot be approximated by any ratio larger than $\frac{9}{11}$, unless P=NP. We also consider a variant of the problem where the $p$ paths must be disjoint. For this problem, we give an algorithm which gets within a factor $\frac{1}{H(p)}$ of the optimum solution, where $H(p)$ is the $p$'th harmonic number ($H(p) \sim \ln(p)$). We show that in the case where the network is acyclic, we can find such a maximum $p$-path-flow in polynomial time for every $p$. We determine the complexity of a number of related problems concerning the structure of flows. For the special case of acyclic digraphs, some of the results we obtain are in some sense best possible.
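As a small illustration of the definitions in this abstract, the following sketch computes the support of a flow, the maximum out-degree of that support, and the harmonic number $H(p)$ that appears in the $\frac{1}{H(p)}$ guarantee. The network, the flow values, and all function names are hypothetical and chosen only for illustration; this is not the authors' algorithm.

```python
from collections import defaultdict
from fractions import Fraction


def support(flow):
    """Return the set of arcs (u, v) that carry strictly positive flow."""
    return {arc for arc, value in flow.items() if value > 0}


def max_out_degree(arcs):
    """Maximum out-degree over the vertices of the digraph with these arcs."""
    out = defaultdict(int)
    for u, _ in arcs:
        out[u] += 1
    return max(out.values(), default=0)


def harmonic(p):
    """Exact p-th harmonic number H(p) = 1 + 1/2 + ... + 1/p."""
    return sum(Fraction(1, k) for k in range(1, p + 1))


# Hypothetical flow on a toy network with source "s" and sink "t".
flow = {
    ("s", "a"): 2, ("s", "b"): 1, ("s", "c"): 0,
    ("a", "t"): 2, ("b", "t"): 1, ("c", "t"): 0,
}

arcs = support(flow)
print(sorted(arcs))          # arcs with x(uv) > 0
print(max_out_degree(arcs))  # out-degree restriction studied in the abstract
print(1 / harmonic(4))       # guarantee factor 1/H(p) for p = 4
```

The arcs through the hypothetical vertex `c` carry no flow, so they are excluded from the support even though they exist in the network.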
- [352] arXiv:2310.01745 (replaced) [pdf, ps, html, other]
-
Title: A Volumetric Approach to Monge's Optimal Transport on Surfaces
Comments: 40 pages, 15 figures
Subjects: Numerical Analysis (math.NA)
We propose a volumetric formulation for computing the Optimal Transport problem defined on surfaces in $\mathbb{R}^3$, found in disciplines like optics, computer graphics, and computational methodologies. Instead of directly tackling the original problem on the surface, we define a new Optimal Transport problem on a thin tubular region, $T_{\epsilon}$, adjacent to the surface. This extension offers enhanced flexibility and simplicity for numerical discretization on Cartesian grids. The Optimal Transport mapping and potential function computed on $T_{\epsilon}$ are consistent with the original problem on surfaces. We demonstrate that, with the proposed volumetric approach, it is possible to use simple and straightforward numerical methods to solve Optimal Transport for $\Gamma = \mathbb{S}^2$ and the $2$-torus.
- [353] arXiv:2310.01967 (replaced) [pdf, ps, html, other]
-
Title: Efficient Frontier Management for Collaborative Active SLAM
Authors: Muhammad Farhan Ahmed, Matteo Maragliano, Vincent Fremont, Carmine Tommaso Recchiuto, Antonio Sgorbissa
Comments: 7 pages, 11 figures, 3 tables
Subjects: Robotics (cs.RO)
In autonomous robotics, a critical challenge lies in developing robust solutions for Active Collaborative SLAM, wherein multiple robots collaboratively explore and map an unknown environment while intelligently coordinating their movements and sensor data acquisitions. In this article, we present an efficient centralized frontier sharing approach that maximizes exploration by taking into account information gain in the merged map, distance, and reward computation among frontier candidates and encourages the spread of agents into the environment. Eventually, our method efficiently spreads the robots for maximum exploration while keeping SLAM uncertainty low. We also present two coordination approaches, synchronous and asynchronous, to prioritize robot goal assignments by the central server. The proposed method is implemented in ROS and evaluated against similar methods through simulation and experiments on publicly available datasets, rendering promising results.
- [354] arXiv:2310.04257 (replaced) [pdf, ps, html, other]
-
Title: On Solving Close Enough Orienteering Problems with Overlapped Neighborhoods
Comments: 30 pages, 11 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Close Enough Traveling Salesman Problem (CETSP) is a well-known variant of TSP whereby the agent may complete its mission at any point within a target neighborhood. Heuristics based on overlapped neighborhoods, known as Steiner Zones (SZ), have gained attention in addressing CETSP. While SZs offer effective approximations to the original graph, their inherent overlap imposes constraints on search space, potentially conflicting with global optimization objectives. Here we show how such limitations can be converted into advantages in a Close Enough Orienteering Problem (CEOP) by aggregating prizes across overlapped neighborhoods. We further extend classic CEOP with Non-uniform Neighborhoods (CEOP-N) by introducing non-uniform costs for prize collection. To tackle CEOP and CEOP-N, we develop a new approach featuring a Randomized Steiner Zone Discretization (RSZD) scheme coupled with a hybrid algorithm based on Particle Swarm Optimization (PSO) and Ant Colony System (ACS), CRaSZe-AntS. The RSZD scheme identifies sub-regions for PSO exploration, and ACS determines the discrete visiting sequence. We evaluate the RSZD's discretization performance on CEOP instances derived from established CETSP instances and compare CRaSZe-AntS against the most relevant state-of-the-art heuristic focused on single-neighborhood optimization for CEOP instances. We also compare the performance of the interior search within SZs and the boundary search on individual neighborhoods in the context of CEOP-N. Our experimental results show that CRaSZe-AntS can yield comparable solution quality with significantly reduced computation time compared to the single neighborhood strategy, where we observe an average 140.44% increase in prize collection and a 55.18% reduction in algorithm execution time. CRaSZe-AntS is thus highly effective in solving emerging CEOP-N, examples of which include truck-and-drone delivery scenarios.
- [355] arXiv:2310.04874 (replaced) [pdf, ps, html, other]
-
Title: AirIMU: Learning Uncertainty Propagation for Inertial Odometry
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Inertial odometry (IO) using strap-down inertial measurement units (IMUs) is critical in many robotic applications where precise orientation and position tracking are essential. Prior kinematic motion model-based IO methods often use a simplified linearized IMU noise model and thus usually encounter difficulties in modeling non-deterministic errors arising from environmental disturbances and mechanical defects. In contrast, data-driven IO methods struggle to accurately model the sensor motions, often leading to generalizability and interoperability issues. To address these challenges, we present AirIMU, a hybrid approach to estimate the uncertainty, especially the non-deterministic errors, by data-driven methods and increase the generalization abilities using model-based methods. We demonstrate the adaptability of AirIMU using a full spectrum of IMUs, from low-cost automotive grades to high-end navigation grades. We also validate its effectiveness on various platforms, including hand-held devices, vehicles, and a helicopter that covers a trajectory of 262 kilometers. In the ablation study, we validate the effectiveness of our learned uncertainty in an IMU-GPS pose graph optimization experiment, achieving a 31.6% improvement in accuracy. Experiments demonstrate that jointly training the IMU noise correction and uncertainty estimation synergistically benefits both tasks.
- [356] arXiv:2310.05393 (replaced) [pdf, ps, html, other]
-
Title: Hierarchical Side-Tuning for Vision Transformers
Comments: 10 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks. Yet, the demand for individualized and comprehensive fine-tuning processes for each task entails substantial computational and memory costs, posing a considerable challenge. Recent advancements in Parameter-Efficient Transfer Learning (PETL) have shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning. However, their effectiveness is primarily observed in simple tasks like image classification, while they encounter challenges with more complex vision tasks like dense prediction. To address this gap, this study aims to identify an effective tuning method that caters to a wider range of visual tasks. In this paper, we introduce Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks. Diverging from existing methods that focus solely on fine-tuning parameters within specific input spaces or modules, HST employs a lightweight Hierarchical Side Network (HSN). This network leverages intermediate activations from the ViT backbone to model multi-scale features, enhancing prediction capabilities. To evaluate HST, we conducted comprehensive experiments across a range of visual tasks, including classification, object detection, instance segmentation, and semantic segmentation. Remarkably, HST achieved state-of-the-art performance in 13 out of the 19 tasks on the VTAB-1K benchmark, with the highest average Top-1 accuracy of 76.1%, while fine-tuning a mere 0.78M parameters. When applied to object detection and semantic segmentation tasks on the COCO and ADE20K testdev benchmarks, HST outperformed existing PETL methods and even surpassed full fine-tuning.
- [357] arXiv:2310.09031 (replaced) [pdf, ps, html, other]
-
Title: MINDE: Mutual Information Neural Diffusion Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this work we present a new method for the estimation of Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback-Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with such building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses a conditional diffusion process, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature, especially for challenging distributions. Furthermore, our methods pass MI self-consistency tests, including data processing and additivity under independence, which instead are a pain point of existing methods.
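The link between MI and the Kullback-Leibler divergence, and the "additivity under independence" self-consistency test, can be made concrete with a minimal sketch on small discrete distributions. This is an exact tabular computation, not the diffusion-based estimator described in the abstract; the distributions and function names are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q are dicts mapping outcomes to probabilities."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def marginals(joint):
    """Marginal distributions of X and Y from a joint dict keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), v in joint.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return px, py


def mutual_information(joint):
    """I(X; Y) = KL(p_XY || p_X p_Y)."""
    px, py = marginals(joint)
    product = {(x, y): px[x] * py[y] for (x, y) in joint}
    return kl_divergence(joint, product)


def independent_pair(j1, j2):
    """Joint law of ((X1, X2), (Y1, Y2)) when (X1, Y1) and (X2, Y2) are independent."""
    return {
        ((x1, x2), (y1, y2)): v1 * v2
        for (x1, y1), v1 in j1.items()
        for (x2, y2), v2 in j2.items()
    }


# Two hypothetical joint distributions over pairs of bits.
j1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
j2 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

lhs = mutual_information(independent_pair(j1, j2))
rhs = mutual_information(j1) + mutual_information(j2)
print(lhs, rhs)  # additivity under independence: the two values should agree
```

An estimator passes the additivity test when its output on the combined pair matches the sum of its outputs on the two independent pairs, as the exact computation does here up to floating-point rounding.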
- [358] arXiv:2310.15047 (replaced) [pdf, ps, html, other]
-
Title: Implicit meta-learning may lead language models to trust more reliable sources
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about capabilities, risks, and controllability of future AI systems. Our code can be found at this https URL.
- [359] arXiv:2311.05112 (replaced) [pdf, ps, html, other]
-
Title: A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Authors: Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton
Comments: Preprint. Version 5. 6 figures; 14 tables; 41 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While there has been a burgeoning trend in research focusing on the employment of LLMs in supporting different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), a review of these efforts, particularly their development, practical applications, and outcomes in medicine, remains scarce. Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face. In terms of development, we provide a detailed introduction to the principles of existing medical LLMs, including their basic model structures, number of parameters, and sources and scales of data used for model development. It serves as a guide for practitioners in developing medical LLMs tailored to their specific needs. In terms of deployment, we offer a comparison of the performance of different LLMs across various medical tasks, and further compare them with state-of-the-art lightweight models, aiming to provide an understanding of the advantages and limitations of LLMs in medicine. Overall, in this review, we address the following questions: 1) What are the practices for developing medical LLMs? 2) How to measure the medical task performance of LLMs in a medical setting? 3) How have medical LLMs been employed in real-world practice? 4) What challenges arise from the use of medical LLMs? and 5) How to more effectively develop and deploy medical LLMs? By answering these questions, this review aims to provide insights into the opportunities for LLMs in medicine and serve as a practical resource. We also maintain a regularly updated list of practical guides on medical LLMs at: this https URL.
- [360] arXiv:2311.12373 (replaced) [pdf, ps, html, other]
-
Title: Beyond Turing: A Comparative Analysis of Approaches for Detecting Machine-Generated Text
Subjects: Computation and Language (cs.CL)
Significant progress has been made on text generation by pre-trained language models (PLMs), yet distinguishing between human and machine-generated text poses an escalating challenge. This paper offers an in-depth evaluation of three distinct methods used to address this task: traditional shallow learning, Language Model (LM) fine-tuning, and Multilingual Model fine-tuning. These approaches are rigorously tested on a wide range of machine-generated texts, providing a benchmark of their competence in distinguishing between human-authored and machine-authored linguistic constructs. The results reveal considerable differences in performance across methods, thus emphasizing the continued need for advancement in this crucial area of NLP. This study offers valuable insights and paves the way for future research aimed at creating robust and highly discriminative models.
- [361] arXiv:2311.12608 (replaced) [pdf, ps, html, other]
-
Title: Density-Guided Dense Pseudo Label Selection For Semi-supervised Oriented Object Detection
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, dense pseudo-label, which directly selects pseudo labels from the original output of the teacher model without any complicated post-processing steps, has received considerable attention in semi-supervised object detection (SSOD). However, for the multi-oriented and dense objects that are common in aerial scenes, existing dense pseudo-label selection methods are inefficient because they ignore the significant density difference. Therefore, we propose Density-Guided Dense Pseudo Label Selection (DDPLS) for semi-supervised oriented object detection. In DDPLS, we design a simple but effective adaptive mechanism to guide the selection of dense pseudo labels. Specifically, we propose the Pseudo Density Score (PDS) to estimate the density of potential objects and use this score to select reliable dense pseudo labels. On the DOTA-v1.5 benchmark, the proposed method outperforms previous methods, especially when labeled data are scarce. For example, it achieves 49.78 mAP given only 5% of annotated data, which surpasses the previous state-of-the-art method given 10% of annotated data by 1.15 mAP. Our code is available at this https URL.
- [362] arXiv:2311.16379 (replaced) [pdf, ps, html, other]
-
Title: Enhanced Fractional Fourier Transform (FRFT) scheme using the closed Newton-Cotes rules
Comments: 14 pages, 15 figures
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
The paper enhances the accuracy of the one-dimensional fractional Fourier transform (FRFT) using the closed Newton-Cotes quadrature rules. Given the weights generated by the composite Newton-Cotes rules of order QN, it is shown that an FRFT of a QN-long weighted sequence can be written as two composites of FRFTs. The first composite of FRFTs is made up of an FRFT of a Q-long weighted sequence and an FRFT of an N-long sequence. The second composite is made up of an FRFT of an N-long weighted sequence and an FRFT of a Q-long sequence. Empirical evidence suggests that the composite FRFTs have the commutative property and work both algebraically and numerically. The composite of FRFTs is applied to the problem of inverting Fourier and Laplace transforms. The results show that the composite FRFTs outperform both the simple non-weighted FRFT and the Newton-Cotes integration method, but the difference is less significant for the integration method.
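To make the notion of composite closed Newton-Cotes weights concrete, here is a minimal sketch for the simplest non-trivial case, composite Simpson's rule (three nodes per panel). It only illustrates how such a weighted sequence is built and used for quadrature; it does not implement the composite FRFT scheme of the paper, and the function names are hypothetical.

```python
import math


def composite_simpson_weights(n_intervals, h):
    """Weights h/3 * [1, 4, 2, 4, ..., 2, 4, 1] on n_intervals + 1 equispaced nodes."""
    if n_intervals < 2 or n_intervals % 2:
        raise ValueError("composite Simpson needs an even number of intervals")
    weights = []
    for j in range(n_intervals + 1):
        if j == 0 or j == n_intervals:
            coeff = 1          # end nodes
        elif j % 2 == 1:
            coeff = 4          # panel midpoints
        else:
            coeff = 2          # nodes shared by two adjacent panels
        weights.append(coeff * h / 3)
    return weights


def integrate(f, a, b, n_intervals):
    """Approximate the integral of f over [a, b] with composite Simpson weights."""
    h = (b - a) / n_intervals
    weights = composite_simpson_weights(n_intervals, h)
    return sum(w * f(a + j * h) for j, w in enumerate(weights))


print(integrate(lambda x: x ** 3, 0.0, 1.0, 4))  # Simpson is exact for cubics
print(integrate(math.sin, 0.0, math.pi, 100))    # close to 2
```

In a weighted-transform setting, the same weight vector would multiply the sampled sequence before the transform is applied, which is where the "weighted sequence" in the abstract comes from.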
- [363] arXiv:2311.18803 (replaced) [pdf, ps, html, other]
-
Title: BioCLIP: A Vision Foundation Model for the Tree of Life
Authors: Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su
Comments: CVPR 2024 (oral) camera-ready version; data released
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BioCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BioCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability. Models, data, and code are available at this https URL.
- [364] arXiv:2312.02089 (replaced) [pdf, ps, html, other]
-
Title: Sequential Sweeps and High Dimensional Expansion
Subjects: Discrete Mathematics (cs.DM)
It is well known that the spectral gap of the down-up walk over an $n$-partite simplicial complex (also known as Glauber dynamics) cannot be better than $O(1/n)$ due to natural obstructions such as coboundaries. We study an alternative random walk over partite simplicial complexes known as the sequential sweep or the systematic scan Glauber dynamics: Whereas the down-up walk at each step selects a random coordinate and updates it based on the remaining coordinates, the sequential sweep goes through each of the coordinates one by one in a deterministic order and applies the same update operation. It is natural, thus, to compare $n$ steps of the down-up walk with a single step of the sequential sweep. Interestingly, while the spectral gap of the $n$-th power of the down-up walk is still bounded from above by a constant, under a strong enough local spectral assumption (in the sense of Gur, Lifschitz, Liu, STOC 2022) we can show that the spectral gap of this walk can be arbitrarily close to 1. We also study other isoperimetric inequalities for these walks, and show that under the assumptions of local entropy contraction (related to the considerations of Gur, Lifschitz, Liu), these walks satisfy an entropy contraction inequality.
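The difference between the two walks can be sketched on a toy example: a distribution over pairs of bits, where a coordinate update resamples one bit from its conditional law given the other. The random-scan walk averages the coordinate updates, while the sequential sweep composes them in a fixed order. The target distribution and all names below are hypothetical; exact rational arithmetic is used so that stationarity can be checked exactly. This sketch does not compute spectral gaps.

```python
from fractions import Fraction
from itertools import product


def coordinate_update(pi, states, i):
    """Kernel that resamples coordinate i from its conditional law under pi."""
    kernel = {s: {t: Fraction(0) for t in states} for s in states}
    for s in states:
        candidates = [s[:i] + (v,) + s[i + 1:] for v in (0, 1)]
        total = sum(pi[c] for c in candidates)
        for c in candidates:
            kernel[s][c] = pi[c] / total
    return kernel


def compose(first, second, states):
    """Kernel of applying `first` and then `second`."""
    return {s: {t: sum(first[s][m] * second[m][t] for m in states)
                for t in states} for s in states}


def average(kernels, states):
    """Kernel of picking one of `kernels` uniformly at random."""
    return {s: {t: sum(k[s][t] for k in kernels) / len(kernels)
                for t in states} for s in states}


def is_stationary(pi, kernel, states):
    """Check pi K = pi exactly."""
    return all(sum(pi[s] * kernel[s][t] for s in states) == pi[t] for t in states)


n = 2
states = list(product((0, 1), repeat=n))
weights = {(0, 0): 3, (0, 1): 1, (1, 0): 1, (1, 1): 3}  # hypothetical target
pi = {s: Fraction(w, sum(weights.values())) for s, w in weights.items()}

updates = [coordinate_update(pi, states, i) for i in range(n)]
random_scan = average(updates, states)            # one step of the down-up walk
sweep = compose(updates[0], updates[1], states)   # one sequential sweep

print(is_stationary(pi, random_scan, states), is_stationary(pi, sweep, states))
```

Each coordinate update preserves the target distribution, so both the average and the composition do as well; the abstract concerns how fast each walk mixes, which this sketch leaves open.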
- [365] arXiv:2312.04140 (replaced) [pdf, ps, html, other]
-
Title: Polarimetric Light Transport Analysis for Specular Inter-reflection
Comments: Accepted to IEEE Transactions on Computational Imaging (TCI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Polarization is well known for its ability to decompose diffuse and specular reflections. However, the existing decomposition methods only focus on direct reflection and overlook multiple reflections, especially specular inter-reflection. In this paper, we propose a novel decomposition method for handling specular inter-reflection of metal objects by using a unique polarimetric feature: the rotation direction of linear polarization. This rotation direction serves as a discriminative factor between direct and inter-reflection on specular surfaces. To decompose the reflectance components, we actively rotate the linear polarization of incident light and analyze the rotation direction of the reflected light. We evaluate our method using both synthetic and real data, demonstrating its effectiveness in decomposing specular inter-reflections of metal objects. Furthermore, we demonstrate that our method can be combined with other decomposition methods for a detailed analysis of light transport. As a practical application, we show its effectiveness in improving the accuracy of 3D measurement against strong specular inter-reflection.
- [366] arXiv:2312.05134 (replaced) [pdf, ps, html, other]
-
Title: Optimal Multi-Distribution Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, multi-group collaboration, etc. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher classes. The proposed algorithms are oracle-efficient, which access the hypothesis class solely through an empirical risk minimization oracle. Additionally, we establish the necessity of randomization, revealing a large sample size barrier when only deterministic hypotheses are permitted. These findings resolve three open problems presented in COLT 2023 (i.e., Problems 1, 3 and 4 of Awasthi et al., 2023).
- [367] arXiv:2312.05490 (replaced) [pdf, ps, html, other]
-
Title: Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification
Comments: submitted to IEEE TRANSACTIONS ON MEDICAL IMAGING
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In computational pathology, whole slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations. Multiple instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains complex. While most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of the slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention, while retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Our extensive experiments on CAMELYON-16, BRACS, and TCGA-LUNG datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.
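For readers unfamiliar with Shapley values, the following minimal sketch computes them exactly for a tiny cooperative game by averaging each player's marginal contribution over all orderings. Here the "players" stand in for instances in a bag and the value function is a made-up score; both are assumptions for illustration. Exact enumeration is exponential in the number of players, which is presumably why the abstract mentions an attention-based acceleration; that acceleration is not shown here.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_bag_score(coalition):
    """Hypothetical bag-level score: 'a' is informative alone, 'b' and 'c' only jointly."""
    score = 0
    if "a" in coalition:
        score += 1
    if {"b", "c"} <= coalition:
        score += 1
    return score


phi = shapley_values(["a", "b", "c"], toy_bag_score)
print(phi)
```

The values sum to the score of the full bag (the efficiency property), so they can be read as a split of the bag-level prediction among its instances.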
- [368] arXiv:2312.06344 (replaced) [pdf, ps, html, other]
Title: Learning Robust Policies for Uncertain Parametric Markov Decision Processes
Comments: 10 pages, accepted for oral presentation at L4DC
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)
Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
- [369] arXiv:2312.06353 (replaced) [pdf, ps, html, other]
Title: Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Comments: Accepted to ICML 2024. 25 pages, 14 figures, 7 tables. Codes are available at this https URL
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.
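The communication saving described in this abstract can be illustrated with a toy sketch: if every perturbation is regenerated from a random seed, a zeroth-order update is fully described by a (seed, scalar gradient) pair. The code below is not FedKSeed itself; the quadratic loss, the two-point gradient estimate, the hyperparameters, and the names `client_steps` and `replay` are assumptions chosen to keep the example small.

```python
import numpy as np


def perturbation(seed, dim):
    """Regenerate the same Gaussian direction from a seed on any machine."""
    return np.random.default_rng(seed).standard_normal(dim)


def loss(w, target):
    """Toy quadratic loss standing in for a model's training loss."""
    return 0.5 * float(np.sum((w - target) ** 2))


def client_steps(w, target, seeds, steps, eps=1e-3, lr=0.02, pick_seed=0):
    """Run zeroth-order updates using directions drawn from a finite seed pool.

    Returns the updated weights and the (seed, scalar gradient) pairs,
    which are the only values that would need to be transmitted.
    """
    picker = np.random.default_rng(pick_seed)
    w = w.copy()
    history = []
    for _ in range(steps):
        seed = int(picker.choice(seeds))
        z = perturbation(seed, w.size)
        # Two-point estimate of the directional derivative along z.
        g = (loss(w + eps * z, target) - loss(w - eps * z, target)) / (2 * eps)
        w -= lr * g * z
        history.append((seed, g))
    return w, history


def replay(w0, history, lr=0.02):
    """Rebuild the client's weights from the transmitted pairs alone."""
    w = w0.copy()
    for seed, g in history:
        w -= lr * g * perturbation(seed, w.size)
    return w


w0 = np.zeros(4)
target = np.array([1.0, -2.0, 0.5, 3.0])
w_client, history = client_steps(w0, target, seeds=list(range(8)), steps=200)
w_server = replay(w0, history)
```

Here `w_server` matches `w_client` even though only seeds and scalars were exchanged, which is the property that keeps the per-step payload independent of the model size.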
- [370] arXiv:2312.06701 (replaced) [pdf, ps, html, other]
Title: Dynamic Adversarial Attacks on Autonomous Driving Systems
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper introduces an attacking mechanism to challenge the resilience of autonomous driving systems. Specifically, we manipulate the decision-making processes of an autonomous vehicle by dynamically displaying adversarial patches on a screen mounted on another moving vehicle. These patches are optimized to deceive the object detection models into misclassifying targeted objects, e.g., traffic signs. Such manipulation has significant implications for critical multi-vehicle interactions such as intersection crossing and lane changing, which are vital for safe and efficient autonomous driving systems. Particularly, we make four major contributions. First, we introduce a novel adversarial attack approach where the patch is not co-located with its target, enabling more versatile and stealthy attacks. Moreover, our method utilizes dynamic patches displayed on a screen, allowing for adaptive changes and movement, enhancing the flexibility and performance of the attack. To do so, we design a Screen Image Transformation Network (SIT-Net), which simulates environmental effects on the displayed images, narrowing the gap between simulated and real-world scenarios. Further, we integrate a positional loss term into the adversarial training process to increase the success rate of the dynamic attack. Finally, we shift the focus from merely attacking perceptual systems to influencing the decision-making algorithms of self-driving systems. Our experiments demonstrate the first successful implementation of such dynamic adversarial attacks in real-world autonomous driving scenarios, paving the way for advancements in the field of robust and secure autonomous driving.
- [371] arXiv:2312.06736 (replaced) [pdf, ps, html, other]
Title: SqueezeSAM: User friendly mobile interactive segmentation
Authors: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware due to its high computational demands and large model size. Our research aims to adapt SAM for use in mobile photography applications. To this end, we have developed a fully convolutional SqueezeSAM model architecture, which is 62.5 times faster and 31.6 times smaller than the original SAM, making it a viable solution for mobile applications. Furthermore, our tiny model achieves an mIOU within 1% of the original ViT-H architecture.
Automated segmentation holds significant value in the creation flow for photography applications, as evidenced by its adoption by leading industry players like Apple and CapCut. To facilitate this automation, we employ salient object detection and simulate potential user clicks for foreground object selection, generating an initial segmentation mask that users can subsequently edit interactively. A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object. For example, a click on a person's t-shirt in a photo should ideally segment the entire person, not just the t-shirt. However, SAM typically only segments the clicked area. We address this limitation through a novel data augmentation scheme. Consequently, if a user clicks on a person holding a basketball, both the person and the basketball are segmented together, aligning with user expectations and enhancing the overall user experience.
- [372] arXiv:2312.08945 (replaced) [pdf, ps, html, other]
Title: A Comparative Gas Cost Analysis of Proxy and Diamond Patterns in EVM Blockchains for Trusted Smart Contract Engineering
Subjects: Software Engineering (cs.SE)
Blockchain applications are witnessing rapid evolution, necessitating the integration of upgradeable smart contracts. Software patterns have been proposed to summarize upgradeable smart contract best practices. However, research is missing on the comparison of these upgradeable smart contract patterns, especially regarding gas costs related to deployment and execution. This study aims to provide an in-depth analysis of gas costs associated with two prevalent upgradeable smart contract patterns: the Proxy and diamond patterns. The Proxy pattern utilizes a Proxy pointing to a logic contract, while the diamond pattern enables a Proxy to point to multiple logic contracts. We conduct a comparative analysis of gas costs for both patterns in contrast to a traditional non-upgradeable smart contract. We derive from this analysis a theoretical contribution in the form of two consolidated blockchain patterns and a corresponding decision model. By so doing we hope to contribute to the broader understanding of upgradeable smart contract patterns.
- [373] arXiv:2312.12677 (replaced) [pdf, ps, html, other]
Title: Synchronous Consensus in Partial Synchrony
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We demonstrate a deterministic Byzantine consensus algorithm with synchronous operation in partial synchrony. It is naturally leaderless, tolerates any number $f<n/2$ of Byzantine processes with 2 rounds of exchange of originator-only signed messages, and terminates within a bounded interval of time. The algorithm is resilient to transient faults and asynchrony in a fraction of links with known size per number of faulty processes. It circumvents asynchronous and faulty links with 3-hop epidemic dissemination. Key finding: the resilience to link asynchrony, and the leaderless consensus in partial synchrony that it enables, ensure algorithm operation with simultaneous validity, safety, and bounded liveness.
- [374] arXiv:2312.14367 (replaced) [pdf, ps, other]
Title: A force-based beam element model based on the modified higher-order shear deformation theory for accurate analysis of FG beams
Subjects: Computational Engineering, Finance, and Science (cs.CE)
In this paper, a force-based beam finite element model based on a modified higher-order shear deformation theory is proposed for the accurate analysis of functionally graded beams. In the modified higher-order shear deformation theory, the distribution of transverse shear stress across the beam's thickness is obtained from the differential equilibrium equation, and a modified shear stiffness is derived to take the effect of transverse shear stress distribution into consideration. In the proposed beam element model, unlike traditional beam finite elements that regard generalized displacements as unknown fields, the internal forces are considered as the unknown fields, and they are predefined by using the closed-form solutions of the differential equilibrium equations of higher-order shear beam. Then, the generalized displacements are expressed by the internal forces with the introduction of geometric relations and constitutive equations, and the equation system of the beam element is constructed based on the equilibrium conditions at the boundaries and the compatibility condition within the element. Numerical examples underscore the accuracy and efficacy of the proposed higher-order beam element model in the static analysis of functionally graded sandwich beams, particularly in terms of true transverse shear stress distribution.
- [375] arXiv:2312.14428 (replaced) [pdf, ps, html, other]
Title: A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The recent emergence of large language models (LLMs) shows the potential for artificial general intelligence, revealing new opportunities in industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes an Industrial Large Knowledge Model (ILKM) framework emphasizing their potential to revolutionize the industry in smart manufacturing. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, the "6S Principle" is proposed as the guideline for ILKM development, and several potential opportunities are highlighted for ILKM deployment in smart manufacturing.
- [376] arXiv:2312.14973 (replaced) [pdf, ps, html, other]
Title: Interactive Visualization of Time-Varying Flow Fields Using Particle Tracing Neural Networks
Comments: Accepted by Pacific Vis 2024
Subjects: Graphics (cs.GR)
In this paper, we present a comprehensive evaluation to establish a robust and efficient framework for Lagrangian-based particle tracing using deep neural networks (DNNs). Han et al. (2021) first proposed a DNN-based approach to learn Lagrangian representations and demonstrated accurate particle tracing for an analytic 2D flow field. In this paper, we extend and build upon this prior work in significant ways. First, we evaluate the performance of DNN models to accurately trace particles in various settings, including 2D and 3D time-varying flow fields, flow fields from multiple applications, flow fields with varying complexity, as well as structured and unstructured input data. Second, we conduct an empirical study to inform best practices with respect to particle tracing model architectures, activation functions, and training data structures. Third, we conduct a comparative evaluation of prior techniques that employ flow maps as input for exploratory flow visualization. Specifically, we compare our extended model against its predecessor by Han et al. (2021), as well as the conventional approach that uses triangulation and Barycentric coordinate interpolation. Finally, we consider the integration and adaptation of our particle tracing model with different viewers. We provide an interactive web-based visualization interface by leveraging the efficiencies of our framework, and perform high-fidelity interactive visualization by integrating it with an OSPRay-based viewer. Overall, our experiments demonstrate that using a trained DNN model to predict new particle trajectories requires a low memory footprint and results in rapid inference. Following best practices for large 3D datasets, our deep learning approach using GPUs for inference is shown to require approximately 46 times less memory while being more than 400 times faster than the conventional methods.
- [377] arXiv:2401.00889 (replaced) [pdf, ps, other]
Title: 3D Human Pose Perception from Egocentric Stereo Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal context of egocentric stereo videos. Specifically, we utilize 1) depth features from our 3D scene reconstruction module with uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs. Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting. Furthermore, we introduce two new benchmark datasets, i.e., UnrealEgo2 and UnrealEgo-RW (RealWorld). The proposed datasets offer a much larger number of egocentric stereo views with a wider variety of human motions than the existing datasets, allowing comprehensive evaluation of existing and upcoming methods. Our extensive experiments show that the proposed approach significantly outperforms previous methods. We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
- [378] arXiv:2401.01974 (replaced) [pdf, ps, html, other]
Title: Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and temporal reasoning, and counting. Visual reasoning with large language models (LLMs) as controllers can, in principle, address these limitations by decomposing the task and solving subtasks by orchestrating a set of (visual) tools. Recently, these models achieved great performance on tasks such as compositional visual question answering, visual grounding, and video temporal reasoning. Nevertheless, in their current form, these models heavily rely on human engineering of in-context examples in the prompt, which are often dataset- and task-specific and require significant labor by highly skilled programmers. In this work, we present a framework that mitigates these issues by introducing spatially and temporally abstract routines and by leveraging a small number of labeled examples to automatically generate in-context examples, thereby avoiding human-created in-context examples. On a number of visual reasoning tasks, we show that our framework leads to consistent gains in performance, makes the LLMs-as-controllers setup more robust, and removes the need for human engineering of in-context examples.
- [379] arXiv:2401.06178 (replaced) [pdf, ps, html, other]
Title: AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks
Comments: Post-review. 18 pages. Accepted for publication in FAccT'24
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Since the launch of applications such as DALL-E, Midjourney, and Stable Diffusion, generative artificial intelligence has been controversial as a tool for creating artwork. While some have presented longtermist worries about these technologies as harbingers of fully automated futures to come, more pressing is the impact of generative AI on creative labour in the present. Already, business leaders have begun replacing human artistic labour with AI-generated images. In response, the artistic community has launched a protest movement, which argues that AI image generation is a kind of theft. This paper analyzes, substantiates, and critiques these arguments, concluding that AI image generators involve an unethical kind of labour theft. If this conclusion is correct, many other AI applications also rely upon theft.
- [380] arXiv:2401.07553 (replaced) [pdf, ps, html, other]
Title: Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The usage of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.
- [381] arXiv:2401.10225 (replaced) [pdf, ps, html, other]
Title: ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Comments: We add Llama3-ChatQA-1.5-8B, Llama3-ChatQA-1.5-70B, and GPT-4-Turbo-2024-04-09 results
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a two-stage instruction tuning method that significantly boosts the performance of RAG. For effective retrieval, we introduce a dense retriever optimized for conversational QA, which yields results comparable to the alternative state-of-the-art query rewriting models, while substantially reducing deployment costs. We also present the ChatRAG Bench, which encompasses ten datasets covering comprehensive evaluations on RAG, table-related QA, arithmetic calculations, and scenarios involving unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a weaker foundation model than GPT-4, can slightly outperform GPT-4-0613 (score: 53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on the ChatRAG Bench, without relying on any synthetic data from OpenAI GPT models. Notably, Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09 by a margin. To advance research in this field, we open-sourced the model weights, instruction tuning data, ChatRAG Bench, and retriever for the community: this https URL.
- [382] arXiv:2401.12533 (replaced) [pdf, ps, html, other]
Title: Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work, we build on widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems including $k$-center are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to suffer from more severe approximation and computation barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, linear programming (LP) integral polyhedron, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate the great advantages of our algorithm in terms of clustering cost, clustering quality, and running time.
- [383] arXiv:2401.13142 (replaced) [pdf, ps, html, other]
Title: Unsocial Intelligence: an Investigation of the Assumptions of AGI Discourse
Subjects: Computers and Society (cs.CY)
Dreams of machines rivaling human intelligence have shaped the field of AI since its inception. Yet, the very meaning of human-level AI or artificial general intelligence (AGI) remains elusive and contested. Definitions of AGI embrace a diverse range of incompatible values and assumptions. Contending with the fractured worldviews of AGI discourse is vital for critiques that pursue different values and futures. To that end, we provide a taxonomy of AGI definitions, laying the ground for examining the key social, political, and ethical assumptions they make. We highlight instances in which these definitions frame AGI or human-level AI as a technical topic and expose the value-laden choices being implicitly made. Drawing on feminist, STS, and social science scholarship on the political and social character of intelligence in both humans and machines, we propose contextual, democratic, and participatory paths to imagining future forms of machine intelligence. The development of future forms of AI must involve explicit attention to the values it encodes, the people it includes or excludes, and a commitment to epistemic justice.
- [384] arXiv:2401.14434 (replaced) [pdf, ps, other]
Title: Transforming gradient-based techniques into interpretable methods
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The explication of Convolutional Neural Networks (CNN) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, the conversion of these explanations into images frequently yields considerable noise. Here, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently, reduce image noise. Empirical investigations involving occluded images have demonstrated that the regions identified through this methodology indeed play a pivotal role in facilitating class differentiation.
- [385] arXiv:2401.15497 (replaced) [pdf, ps, html, other]
Title: Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art
Subjects: Computers and Society (cs.CY)
Generative AI tools are used to create art-like outputs and sometimes aid in the creative process. These tools have potential benefits for artists, but they also have the potential to harm the art workforce and infringe upon artistic and intellectual property rights. Without explicit consent from artists, Generative AI creators scrape artists' digital work to train Generative AI models and produce art-like outputs at scale. These outputs are now being used to compete with human artists in the marketplace as well as being used by some artists in their generative processes to create art. We surveyed 459 artists to investigate the tension between artists' opinions on Generative AI art's potential utility and harm. This study surveys artists' opinions on the utility and threat of Generative AI art models, fair practices in the disclosure of artistic works in AI art training models, ownership and rights of AI art derivatives, and fair compensation. Results show that a majority of artists believe creators should disclose what art is being used in AI training, that AI outputs should not belong to model creators, and express concerns about AI's impact on the art workforce and who profits from their art. We hope the results of this work will further meaningful collaboration and alignment between the art community and Generative AI researchers and developers.
- [386] arXiv:2401.16092 (replaced) [pdf, ps, html, other]
Title: Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You
Authors: Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Manuel Brack, Jindrich Libovicky, Kristian Kersting, Alexander Fraser
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this technology. However, our results show that multilingual models suffer from significant gender biases just as monolingual models do. Furthermore, the natural expectation that multilingual models will provide similar results across languages does not hold up. Instead, there are important differences between languages. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. We use MAGBIG to investigate the effect of multilingualism on gender bias in T2I models. To this end, we construct multilingual prompts requesting portraits of people with a certain occupation or trait. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages. Furthermore, we investigate prompt engineering strategies, such as indirect, neutral formulations, to mitigate these biases. Unfortunately, these approaches have limited success and result in worse text-to-image alignment. Consequently, we call for more research into diverse representations across languages in image generators, as well as into steerability to address biased model behavior.
- [387] arXiv:2401.16288 (replaced) [pdf, ps, html, other]
Title: Upper bounds on the rate of linear $q$-ary $k$-hash codes
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
This paper presents new upper bounds on the rate of linear $k$-hash codes in $\mathbb{F}_q^n$, $q\geq k$, that is, codes with the property that any $k$ distinct codewords are all simultaneously distinct in at least one coordinate.
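The $k$-hash property stated in this abstract can be checked by brute force on small codes. The sketch below is only an illustration of the definition, not anything from the paper: the helper names `linear_code` and `is_k_hash` are hypothetical, and the arithmetic assumes $q$ is prime so that $\mathbb{F}_q$ is the integers modulo $q$.

```python
from itertools import combinations, product


def linear_code(generator_rows, q):
    """Enumerate all F_q-linear combinations of the generator rows.

    Assumes q is prime, so field arithmetic is arithmetic modulo q.
    """
    n = len(generator_rows[0])
    words = set()
    for coeffs in product(range(q), repeat=len(generator_rows)):
        word = tuple(
            sum(c * row[i] for c, row in zip(coeffs, generator_rows)) % q
            for i in range(n)
        )
        words.add(word)
    return sorted(words)


def is_k_hash(code, k):
    """Return True if every set of k distinct codewords takes k different
    symbols in at least one common coordinate."""
    codewords = sorted(set(map(tuple, code)))
    n = len(codewords[0])
    for subset in combinations(codewords, k):
        separated = any(len({w[i] for w in subset}) == k for i in range(n))
        if not separated:
            return False
    return True


# The length-2 repetition code over F_3 has codewords 00, 11, 22.
repetition = linear_code([(1, 1)], q=3)
# The full space F_3^2 contains 00, 01, 10, which no coordinate separates.
full_space = linear_code([(1, 0), (0, 1)], q=3)
```

With these two toy codes, `is_k_hash(repetition, 3)` holds while `is_k_hash(full_space, 3)` fails, which shows why adding codewords (raising the rate) eventually breaks the property.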
- [388] arXiv:2402.01454 (replaced) [pdf, ps, html, other]
Title: Integrating Large Language Models in Causal Discovery: A Statistical Causal Approach
Authors: Masayuki Takayama, Tadahisa Okuda, Thong Pham, Tatsuyoshi Ikenoue, Shingo Fukuma, Shohei Shimizu, Akiyoshi Sannai
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
In practical statistical causal discovery (SCD), embedding domain expert knowledge as constraints into the algorithm is widely accepted as significant for creating consistent meaningful causal models, despite the recognized challenges in systematic acquisition of the background knowledge. To overcome these challenges, this paper proposes a novel methodology for causal inference, in which SCD methods and knowledge-based causal inference (KBCI) with a large language model (LLM) are synthesized through "statistical causal prompting (SCP)" for LLMs and prior knowledge augmentation for SCD. Experiments have revealed that GPT-4 can cause the output of the LLM-KBCI and the SCD result with prior knowledge from LLM-KBCI to approach the ground truth, and that the SCD result can be further improved if GPT-4 undergoes SCP. Furthermore, by using an unpublished real-world dataset, we have demonstrated that the background knowledge provided by the LLM can improve SCD on this dataset, even if this dataset has never been included in the training data of the LLM. The proposed approach can thus address challenges such as dataset biases and limitations, illustrating the potential of LLMs to improve data-driven causal inference across diverse scientific domains.
- [389] arXiv:2402.01708 (replaced) [pdf, ps, other]
Title: Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Comments: 17 pages, 4 tables, 4 figures. Accepted at the 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT '24)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States, where anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens' homes. Incidents like this demonstrate that multimodal generative AI risks and harms do not exist in isolation, but arise from the interactions of multiple stakeholders and technical AI systems. In this paper we analyse speech generation incidents to study how patterns of specific harms arise. We find that specific harms can be categorised according to the exposure of affected individuals, that is to say whether they are a subject of, interact with, suffer due to, or are excluded from speech generation systems. Similarly, specific harms are also a consequence of the motives of the creators and deployers of the systems. Based on these insights we propose a conceptual framework for modelling pathways to ethical and safety harms of AI, which we use to develop a taxonomy of harms of speech generators. Our relational approach captures the complexity of risks and harms in sociotechnical AI systems, and yields a taxonomy that can support appropriate policy interventions and decision making for the responsible development and release of speech generation models.
- [390] arXiv:2402.01766 (replaced) [pdf, ps, html, other]
-
Title: LLM Voting: Human Choices and AI Collective Decision Making
Comments: Submitted to AIES2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); General Economics (econ.GN)
This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns. Our methodology involved using a dataset from a human voting experiment to establish a baseline for human preferences and a corresponding experiment with LLM agents. We observed that the methods used for voting input and the presentation of choices influence LLM voting behavior. We discovered that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the importance of cautious integration of LLMs into democratic processes.
- [391] arXiv:2402.02551 (replaced) [pdf, ps, html, other]
-
Title: Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks
Comments: This paper has been accepted at the International Conference on Mechatronics and Automation (ICMA 2024), sponsored by the IEEE
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability, which may pose challenges in ensuring stability and safety. To address these issues, we propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy, all while actively engaging in the learning phase through interactions with the environment. This approach circumvents the control performance and complexities associated with computations while addressing nonrepetitive reaching tasks in the presence of obstacles. First, a model-free DRL agent is employed to plan velocity-bounded motion for a manipulator with 'n' degrees of freedom (DoF), ensuring collision avoidance for the end-effector through joint-level reasoning. The generated reference motion is then input into a robust subsystem-based adaptive controller, which produces the necessary torques, while the cuckoo search optimization (CSO) algorithm enhances control gains to minimize the stabilization and tracking error in the steady state. This approach guarantees robustness and uniform exponential convergence in an unfamiliar environment, despite the presence of uncertainties and disturbances. Theoretical assertions are validated through the presentation of simulation outcomes.
- [392] arXiv:2402.02746 (replaced) [pdf, ps, html, other]
-
Title: Standard Gaussian Process Can Be Excellent for High-Dimensional Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
There has been a long-standing and widespread belief that Bayesian Optimization (BO) with standard Gaussian process (GP), referred to as standard BO, is ineffective in high-dimensional optimization problems. While this belief sounds reasonable, strong empirical evidence is lacking. In this paper, we systematically investigated BO with standard GP regression across a variety of synthetic and real-world benchmark problems for high-dimensional optimization. We found that, surprisingly, when using Matérn kernels and Upper Confidence Bound (UCB), standard BO consistently achieves top-tier performance, often outperforming other BO methods specifically designed for high-dimensional optimization. Contrary to the stereotype, we found that standard GP equipped with Matérn kernels can serve as a capable surrogate for learning high-dimensional functions. Without strong structural assumptions, BO with standard GP not only excels in high-dimensional optimization but also is robust in accommodating various structures within target functions. Furthermore, with standard GP, achieving promising optimization performance is possible via maximum a posteriori (MAP) estimation with diffuse priors or merely maximum likelihood estimation, eliminating the need for expensive Markov-Chain Monte Carlo (MCMC) sampling that might be required by more complex surrogate models. In parallel, we also investigated and analyzed alternative popular settings in running standard BO, which, however, often fail in high-dimensional optimization. This might be linked to the few failure cases reported in the literature. We thus advocate for a re-evaluation and in-depth study of the potential of standard BO in addressing high-dimensional problems.
- [393] arXiv:2402.03897 (replaced) [pdf, ps, html, other]
-
Title: Robust Data-EnablEd Predictive Leading Cruise Control via Reachability Analysis
Comments: 8 pages, 4 figures
Subjects: Systems and Control (eess.SY)
Data-driven predictive control promises model-free wave-dampening strategies for Connected and Autonomous Vehicles (CAVs) in mixed traffic flow. However, its performance relies on data quality, which suffers from unknown noise and disturbances. This paper introduces a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) method based on reachability analysis, aiming to achieve safe and optimal CAV control under bounded process noise and external disturbances. Specifically, the matrix zonotope set technique and Willems' Fundamental Lemma are employed to derive the over-approximated system dynamics directly from data, and a data-driven feedback control technique is utilized to obtain an additional feedback input for stability. We decouple the mixed platoon into an error system and a nominal system, where the error system provides data-driven reachability sets for the enhanced safety constraints in the nominal system. Finally, a data-driven predictive control framework is formulated in a tube-based control manner for robustness guarantees. Nonlinear simulations with noise-corrupted data demonstrate that the proposed method outperforms baseline methods in mitigating traffic waves.
- [394] arXiv:2402.04157 (replaced) [pdf, ps, html, other]
-
Title: Controller synthesis for input-state data with measurement errors
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We consider the problem of designing a state-feedback controller for a linear system, based only on noisy input-state data. We focus on input-state data corrupted by measurement errors, which, albeit less investigated, are as relevant as process disturbances in applications. For energy and instantaneous bounds on these measurement errors, we derive linear matrix inequalities for controller design where the one for the energy bound is equivalent to robust stabilization of all systems consistent with the noisy data points via a common Lyapunov function.
- [395] arXiv:2402.04291 (replaced) [pdf, ps, html, other]
-
Title: BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Authors: Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi
Comments: 19 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, outperforming SOTA quantization methods for LLMs by significant margins. Moreover, BiLLM enables the binarization of an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency. Our code is available at this https URL.
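As a rough illustration of what a binary residual approximation can look like, the sketch below binarizes a weight vector as a scale times a sign pattern and then binarizes the leftover residual once more. This is generic sign-and-scale binarization, not BiLLM's actual procedure: the salient-weight selection and the splitting search described in the abstract are not modeled, and the function names are hypothetical.

```python
def binarize(weights):
    """Approximate a non-empty weight list by alpha * sign(w).

    alpha = mean(|w|) is the scale that minimizes the squared error
    for the fixed sign pattern sign(w).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs


def residual_binarize(weights):
    """Two-pass binarization: binarize the weights, then the residual."""
    alpha1, signs1 = binarize(weights)
    residual = [w - alpha1 * s for w, s in zip(weights, signs1)]
    alpha2, signs2 = binarize(residual)
    # The reconstruction is the sum of the two binary terms.
    return [alpha1 * s1 + alpha2 * s2 for s1, s2 in zip(signs1, signs2)]


def squared_error(a, b):
    """Sum of squared differences between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


weights = [0.9, -0.1, 0.5, -0.7]
alpha, signs = binarize(weights)
single_pass = [alpha * s for s in signs]
two_pass = residual_binarize(weights)
print(squared_error(weights, single_pass), squared_error(weights, two_pass))
```

The second pass can only lower (or keep) the reconstruction error, at the cost of storing a second sign pattern and scale for the weights it is applied to.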
- [396] arXiv:2402.05164 (replaced) [pdf, ps, html, other]
-
Title: A Resource Model For Neural Scaling Law
Comments: 10 pages, 8 figures, Published as a workshop paper at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and hence can be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to subtasks). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask uniformly grow as models get larger, keeping the ratios of acquired resources constant. We hypothesize these findings to be generally true and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
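The two empirical findings above can be combined into a toy calculation. The sketch below assumes, purely for illustration, that the composite loss is the sum of the subtask losses; the allocation ratios and per-subtask constants are made-up values, and the paper's full model for general composite tasks is not reproduced here.

```python
def composite_loss(total_neurons, ratios, constants):
    """Toy composite loss under the two findings in the abstract.

    Finding (2): subtask i receives ratios[i] * total_neurons neurons,
    with the ratios fixed as the model grows.
    Finding (1): the loss of subtask i is constants[i] / (its neurons).
    Assumption (not from the source): subtask losses simply add up.
    """
    return sum(c / (r * total_neurons) for r, c in zip(ratios, constants))


ratios = [0.5, 0.3, 0.2]       # hypothetical resource shares, sum to 1
constants = [1.0, 2.0, 3.0]    # hypothetical per-subtask difficulty
for n in (100, 200, 400):
    print(n, composite_loss(n, ratios, constants))
```

Under these assumptions the composite loss is itself inversely proportional to the total neuron count, so doubling the model halves the loss.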
- [397] arXiv:2402.06178 (replaced) [pdf, ps, html, other]
-
Title: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Authors: Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
Comments: Accepted to IJCAI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to latent space manipulation while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
- [398] arXiv:2402.06988 (replaced) [pdf, ps, html, other]
-
Title: Three Subtyping Algorithms for Binary Session Types and their Complexity Analyses (full version)
Comments: 14 pages, 5 figures. Full version of a paper submitted to PLACES 2024
Subjects: Programming Languages (cs.PL)
Session types are a type discipline for describing and specifying communication behaviours of concurrent processes. Session subtyping, first introduced by Gay and Hole, is widely used for enlarging typability of session programs. This paper gives the complexity analysis of three algorithms for subtyping of synchronous binary session types. First, we analyse the complexity of the algorithm from the original paper, which is based on an inductive tree search. We then introduce its optimised version, which improves the complexity, but is still exponential in the size of the two types. Finally, we propose a new quadratic algorithm based on a graph search using the concept of $\mathcal{XYZW}$-simulation, recently introduced by Silva et al.
- [399] arXiv:2402.07808 (replaced) [pdf, ps, html, other]
-
Title: Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation
Subjects: Machine Learning (cs.LG)
Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations - an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid sources, we propose an approach which targets the maximum entropy distribution, i.e., prioritizes retaining as much uncertainty as possible. Our method is purely sample-based - leveraging the Sliced-Wasserstein distance to measure the discrepancy between the dataset and simulations - and thus suitable for simulators with intractable likelihoods. We benchmark our method on several tasks, and show that it can recover source distributions with substantially higher entropy than recent source estimation methods, without sacrificing the fidelity of the simulations. Finally, to demonstrate the utility of our approach, we infer source distributions for parameters of the Hodgkin-Huxley model from experimental datasets with thousands of single-neuron measurements. In summary, we propose a principled method for inferring source distributions of scientific simulator parameters while retaining as much uncertainty as possible.
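To make the sample-based discrepancy concrete, the sketch below estimates a sliced Wasserstein distance between two point sets by averaging one-dimensional Wasserstein-1 distances over random projection directions. It is a minimal illustration under stated assumptions, not the paper's implementation: it uses the order-1 distance, requires both sets to have the same number of points, and the number of projections and the seed are arbitrary choices.

```python
import math
import random


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D samples.

    For equal-size empirical distributions this is the mean absolute
    difference between the sorted samples.
    """
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(points_x, points_y, n_projections=50, seed=0):
    """Average 1D Wasserstein distance over random unit directions."""
    rng = random.Random(seed)
    dim = len(points_x[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalize it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project both point sets onto the direction.
        proj_x = [sum(a * d for a, d in zip(p, direction)) for p in points_x]
        proj_y = [sum(a * d for a, d in zip(p, direction)) for p in points_y]
        total += wasserstein_1d(proj_x, proj_y)
    return total / n_projections


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 5.0, y + 5.0) for x, y in data]
print(sliced_wasserstein(data, data), sliced_wasserstein(data, shifted))
```

Because only samples are compared, a discrepancy of this kind never needs the simulator's likelihood, which is the property the abstract relies on.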
- [400] arXiv:2402.11533 (replaced) [pdf, ps, html, other]
-
Title: Randomness-Efficient Constructions of Capacity-Achieving List-Decodable Codes
Subjects: Information Theory (cs.IT)
We wish to generate list-decodable codes over small alphabets using as little randomness as possible. Specifically, we hope to generate codes achieving what we term the Elias bound, which means that they are $(\rho,L)$-list-decodable with rate $R \geq 1-h(\rho)-O(1/L)$. A long line of work shows that uniformly random linear codes (RLCs) achieve the Elias bound: hence, we know $O(n^2)$ random bits suffice. Prior works demonstrate that just $O(Ln)$ random bits suffice, via puncturing of low-bias codes. These recent constructions are combinatorial.
We provide two new constructions, which are algebraic. Compared to prior works, our constructions are simpler and more direct. Furthermore, our codes are designed in such a way that their duals are also quite easy to analyze. Our first construction -- which can be seen as a generalization of the Wozencraft ensemble -- achieves the Elias bound and consumes $Ln$ random bits. Additionally, its dual code achieves the GV-bound with high probability, and both the primal and dual admit quasilinear-time encoding algorithms. The second construction consumes $2nL$ random bits and yields a code where both it and its dual achieve the Elias bound. As we discuss, properties of a dual code are often crucial for applications in cryptography.
In all of the above cases -- including the prior works achieving randomness complexity $O(Ln)$ -- the codes are designed to "approximate" RLCs. Namely, for a given locality parameter $L$ we construct codes achieving the same $L$-local properties as RLCs. This allows one to appeal to known list-decodability results for RLCs and thereby conclude that the code approximating an RLC also achieves the Elias bound. As a final contribution, we indicate that such a proof strategy is inherently unable to generate list-decodable codes of rate $R$ over $\mathbb F_q$ with less than $L(1-R)n\log_2(q)$ bits of randomness.
- [401] arXiv:2402.13823 (replaced) [pdf, ps, html, other]
-
Title: Using Large Language Models for Natural Language Processing Tasks in Requirements Engineering: A Systematic Guideline
Subjects: Software Engineering (cs.SE)
Large Language Models (LLMs) are the cornerstone in automating Requirements Engineering (RE) tasks, underpinning recent advancements in the field. Their pre-trained comprehension of natural language is pivotal for effectively tailoring them to specific RE tasks. However, selecting an appropriate LLM from a myriad of existing architectures and fine-tuning it to address the intricacies of a given task poses a significant challenge for researchers and practitioners in the RE domain. Utilizing LLMs effectively for NLP problems in RE necessitates a dual understanding: firstly, of the inner workings of LLMs, and secondly, of a systematic approach to selecting and adapting LLMs for NLP4RE tasks. This chapter aims to furnish readers with essential knowledge about LLMs in its initial segment. Subsequently, it provides a comprehensive guideline tailored for students, researchers, and practitioners on harnessing LLMs to address their specific objectives. By offering insights into the workings of LLMs and furnishing a practical guide, this chapter contributes towards improving future research and applications leveraging LLMs for solving RE challenges.
- [402] arXiv:2402.14035 (replaced) [pdf, ps, html, other]
-
Title: Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
Authors: Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient for serving. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models have substantial gaps in capacity, employing distinct architectures, using different input features from different modalities, and being optimized on different distributions. These differences in model characteristics lead to significant challenges for distillation methods. In this work, we propose creating a teaching committee comprising both foundation model teachers and complementary teachers. Complementary teachers possess model characteristics akin to the student's, aiming to bridge the gap between the foundation model and specialized application models for a smoother knowledge transfer. Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge. Our evaluations demonstrate that adding complementary teachers enhances student performance. Finally, DiverseDistill consistently outperforms baseline distillation methods, regardless of the teacher choices, resulting in significantly improved student performance.
- [403] arXiv:2402.14832 (replaced) [pdf, ps, other]
-
Title: Integrating Simulation Budget Management into Drum-Buffer-Rope: A Study on Parametrization and Reducing Computational Effort
Subjects: Systems and Control (eess.SY)
In manufacturing, a bottleneck workstation frequently emerges, complicating production planning and escalating costs. To address this, Drum-Buffer-Rope (DBR) is a widely recognized production planning and control method that focuses on centralizing the bottleneck workstation, thereby improving production system performance. Although DBR is primarily focused on creating a bottleneck schedule, the selection of planning parameters is crucial, as they significantly influence the scheduling process. Conducting a comprehensive full factorial enumeration to identify the ideal planning parameters requires substantial computational effort. Simulation Budget Management (SBM) offers an effective concept to reduce this effort by skipping less promising parameter combinations. This publication introduces a method for integrating SBM into a multi-stage, multi-item DBR planned and controlled production system with limited capacity, aimed at determining the optimal planning parameters. Furthermore, we conduct a simulation study to analyze the effects of different production system environments, i.e., varying levels of shop load and process uncertainty, on both the performance and parameterization of DBR and the efficacy of SBM. Our results show a significant reduction in simulation budget for identifying optimal planning parameters compared to traditional full factorial enumeration.
- [404] arXiv:2402.15953 (replaced) [pdf, ps, html, other]
-
Title: Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries
Comments: Accepted at the International Conference on Management of Data 2024
Subjects: Databases (cs.DB)
With the increasing rate of data generated by critical systems, estimating functions on streaming data has become essential. This demand has driven numerous advancements in algorithms designed to efficiently query and analyze one or more data streams while operating under memory constraints. The primary challenge arises from the rapid influx of new items, requiring algorithms that enable efficient incremental processing of streams in order to keep up. A prominent algorithm in this domain is the AMS sketch. Originally developed to estimate the second frequency moment of a data stream, it can also estimate the cardinality of the equi-join between two relations. Since then, two important advancements are the Count sketch, a method which significantly improves upon the sketch update time, and secondly, an extension of the AMS sketch to accommodate multi-join queries. However, combining the strengths of these methods to maintain sketches for multi-join queries while ensuring fast update times is a non-trivial task, and has remained an open problem for decades as highlighted in the existing literature. In this work, we successfully address this problem by introducing a novel sketching method which has fast updates, even for sketches capable of accurately estimating the cardinality of complex multi-join queries. We prove that our estimator is unbiased and has the same error guarantees as the AMS-based method. Our experimental results confirm the significant improvement in update time complexity, resulting in orders of magnitude faster estimates, with equal or better estimation accuracy.
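For readers unfamiliar with the building blocks named above, the following sketch shows a single-row Count sketch with constant-time updates and the standard inner-product estimate of a two-way equi-join size. It does not reproduce the paper's convolution and cross-correlation construction for multi-join queries. The class and function names are hypothetical, and SHA-256 is used only as a convenient stand-in for the limited-independence hash families that the theory assumes.

```python
import hashlib


def _hash(key, salt):
    """Deterministic 64-bit hash of (salt, key); a stand-in hash family."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class CountSketch:
    """Single-row Count sketch: one bucket hash and one sign hash."""

    def __init__(self, width, seed=0):
        self.width = width
        self.seed = seed
        self.counters = [0] * width

    def update(self, key, count=1):
        # Each stream item touches exactly one counter, so updates are O(1).
        bucket = _hash(key, f"bucket{self.seed}") % self.width
        sign = 1 if _hash(key, f"sign{self.seed}") & 1 else -1
        self.counters[bucket] += sign * count


def estimate_join_size(sketch_a, sketch_b):
    """Estimate |A join B| on the sketched attribute via an inner product.

    Both sketches must share width and seed so that equal join keys land
    in the same bucket with the same sign.
    """
    if (sketch_a.width, sketch_a.seed) != (sketch_b.width, sketch_b.seed):
        raise ValueError("sketches must share width and seed")
    return sum(x * y for x, y in zip(sketch_a.counters, sketch_b.counters))


orders, customers = CountSketch(width=64), CountSketch(width=64)
for key in ["c1", "c1", "c2", "c3"]:
    orders.update(key)
for key in ["c1", "c2", "c2"]:
    customers.update(key)
print(estimate_join_size(orders, customers))  # true join size is 4
```

A single row gives a noisy estimate whenever distinct keys collide in a bucket; practical deployments typically combine several independent rows, for example by taking a median or mean of the per-row estimates.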
- [405] arXiv:2402.17472 (replaced) [pdf, ps, html, other]
-
Title: RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each independently effective. As a result, previous methods cannot fully capture the comprehensive characteristics of the fraud graph. To address this dilemma, we present a novel framework called Relation-Aware GNN with transFormer (RAGFormer) which simultaneously embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module. The semantic encoder utilizes Transformer to learn semantic features and node interactions across different relations. We introduce Relation-Aware GNN as the topology encoder to learn topological features and node interactions within each relation. These two complementary features are interleaved through an attention fusion module to support prediction by both orthogonal features. Extensive experiments on two popular public datasets demonstrate that RAGFormer achieves state-of-the-art performance. The significant improvement of RAGFormer in an industrial credit card fraud detection dataset further validates the applicability of our method in real-world business scenarios.
- [406] arXiv:2403.01218 (replaced) [pdf, ps, html, other]
-
Title: Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
The high cost of model training makes it increasingly desirable to develop techniques for unlearning. These techniques seek to remove the influence of a training example without having to retrain the model from scratch. Intuitively, once a model has unlearned, an adversary that interacts with the model should no longer be able to tell whether the unlearned example was included in the model's training set or not. In the privacy literature, this is known as membership inference. In this work, we discuss adaptations of Membership Inference Attacks (MIAs) to the setting of unlearning (leading to their ``U-MIA'' counterparts). We propose a categorization of existing U-MIAs into ``population U-MIAs'', where the same attacker is instantiated for all examples, and ``per-example U-MIAs'', where a dedicated attacker is instantiated for each example. We show that the latter category, wherein the attacker tailors its membership prediction to each example under attack, is significantly stronger. Indeed, our results show that the commonly used U-MIAs in the unlearning literature overestimate the privacy protection afforded by existing unlearning techniques on both vision and language models. Our investigation reveals a large variance in the vulnerability of different examples to per-example U-MIAs. In fact, several unlearning algorithms lead to a reduced vulnerability for some, but not all, examples that we wish to unlearn, at the expense of increasing it for other examples. Notably, we find that the privacy protection for the remaining training examples may worsen as a consequence of unlearning. We also discuss the fundamental difficulty of equally protecting all examples using existing unlearning schemes, due to the different rates at which examples are unlearned. We demonstrate that naive attempts at tailoring unlearning stopping criteria to different examples fail to alleviate these issues.
- [407] arXiv:2403.03954 (replaced) [pdf, ps, html, other]
-
Title: 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Comments: Published at Robotics: Science and Systems (RSS) 2024. Videos, code, and data: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available on this https URL.
- [408] arXiv:2403.06573 (replaced) [pdf, ps, other]
-
Title: Electrical Consumption Flexibility in the Cement Industry
Authors: Sebastián Rojas-Innocenti, Enrique Baeyens, Alejandro Martín-Crespo, Sergio Saludes-Rodil, Fernando Frechoso-Escudero
Subjects: Systems and Control (eess.SY)
A method for identifying and quantifying the flexibility of electricity demand in a production plant is reported. The plant is equipped with electric machines, product storage silos, distributed generation, and electrical storage systems. The method aims to minimize production costs. To achieve this, the plant is mathematically modeled, and an economic optimization problem is formulated by managing this plant equipment. From this optimal schedule (base schedule), the feasibility of modifying it to sell or buy energy in the electricity balancing regulation markets is evaluated, thus obtaining the so-called flexibility schedule. Finally, this method was successfully applied to a real case using data from a Spanish cement production plant.
- [409] arXiv:2403.08493 (replaced) [pdf, ps, other]
-
Title: Rumor Forwarding Prediction Model Based on Uncertain Time Series
Comments: 11 pages, 3 figures
Subjects: Social and Information Networks (cs.SI); Applications (stat.AP)
The rapid spread of rumors in social media is mainly caused by individual retweets. This paper applies uncertain time series analysis (UTSA) to analyze rumor retweeting behavior on Weibo. First, the rumor forwarding is modeled using uncertain time series, including order selection, parameter estimation, residual analysis, uncertainty hypothesis testing and forecast, and the validity of using uncertain time series analysis is further supported by analyzing the characteristics of the residual plot. The experimental results show that the uncertain time series can better predict the next stage of rumor forwarding. The results of the study have important practical significance for rumor management and the management of social media information dissemination.
- [410] arXiv:2403.09499 (replaced) [pdf, ps, html, other]
-
Title: A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Dairy farming consumes a significant amount of energy, making it an energy-intensive sector within agriculture. Integrating renewable energy generation into dairy farming could help address this challenge. Effective battery management is important for integrating renewable energy generation. Managing battery charging and discharging poses significant challenges because of fluctuations in electrical consumption, the intermittent nature of renewable energy generation, and fluctuations in energy prices. Artificial Intelligence (AI) has the potential to significantly improve the use of renewable energy in dairy farming; however, there is limited research conducted in this particular domain. This research considers Ireland as a case study as it works towards attaining its 2030 energy strategy centered on the utilization of renewable sources. This study proposes a Q-learning-based algorithm for scheduling battery charging and discharging in a dairy farm setting. This research also explores the effect of the proposed algorithm by adding wind generation data and considering additional case studies. The proposed algorithm reduces the cost of imported electricity from the grid by 13.41%, peak demand by 2%, and 24.49% when utilizing wind generation. These results underline how reinforcement learning is highly effective in managing batteries in the dairy farming sector.
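The core of any tabular Q-learning scheduler is the one-step value update, sketched below. The state encoding, action names, reward, and hyperparameters are hypothetical placeholders chosen for illustration; the abstract does not specify how the paper defines them.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps state -> {action: value}. The target is the immediate reward
    plus the discounted value of the best action in the next state.
    """
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Pick the action with the highest current Q-value in this state."""
    return max(q[state], key=q[state].get)


# Hypothetical setup: states are hours of the day, actions are battery modes,
# and the reward is the negative cost of electricity imported from the grid.
actions = ("charge", "discharge", "idle")
q_table = {hour: {a: 0.0 for a in actions} for hour in range(24)}
q_update(q_table, state=2, action="charge", reward=-0.5, next_state=3)
print(q_table[2], greedy_action(q_table, 2))
```

A full scheduler would wrap this update in an episode loop with an exploration policy (for example epsilon-greedy) and a richer state, such as the battery's state of charge together with the current price.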
- [411] arXiv:2403.10656 (replaced) [pdf, ps, html, other]
-
Title: Properties of the Strong Data Processing Constant for Rényi Divergence
Comments: 6 pages, 1 figure
Subjects: Information Theory (cs.IT)
Strong data processing inequalities (SDPI) are an important object of study in Information Theory and have been well studied for $f$-divergences. Universal upper and lower bounds have been provided along with several applications, connecting them to impossibility (converse) results, concentration of measure, hypercontractivity, and so on. In this paper, we study Rényi divergence and the corresponding SDPI constant whose behavior seems to deviate from that of ordinary $\Phi$-divergences. In particular, one can find examples showing that the universal upper bound relating its SDPI constant to the one of Total Variation does not hold in general. In this work, we prove, however, that the universal lower bound involving the SDPI constant of the Chi-square divergence does indeed hold. Furthermore, we also provide a characterization of the distribution that achieves the supremum when $\alpha$ is equal to $2$ and consequently compute the SDPI constant for Rényi divergence of the general binary channel.
- [412] arXiv:2403.10799 (replaced) [pdf, ps, html, other]
-
Title: Efficient Pruning of Large Language Model with Adaptive Estimation Fusion
Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of the multiple decoder layers, general methods often employ common estimation approaches for pruning. These approaches lead to a decline in accuracy for specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure. Meanwhile, it can adaptively fuse coarse-grained and fine-grained estimations based on the results from complex and multilayer structures. All aspects of our design seamlessly integrate into the end-to-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMa-7B, Vicuna-7B, Baichuan-7B, and Bloom-7b1, respectively.
- [413] arXiv:2403.15383 (replaced) [pdf, ps, html, other]
-
Title: ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Comments: Accepted to SIGGRAPH 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on a few given exemplars with two goals: 1) unity for generating 3D assets that thematically align with the given exemplars and 2) diversity for generating 3D assets with a high degree of variation. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.
- [414] arXiv:2403.18621 (replaced) [pdf, ps, html, other]
-
Title: Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects
Comments: Submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle, this paper constructs a comprehensive framework considering building blockage and employs a distance-correlated blockage model to analyze interference from line of sight (LoS), non-line of sight (NLoS), and target reflection cascading (TRC) links. Using stochastic geometric theory, expressions for signal-to-interference-plus-noise ratio (SINR) and coverage probability for communication and sensing in the presence of blockage are derived, allowing for a comprehensive comparison under the same parameters. The research findings indicate that blockage can positively impact coverage, especially in enhancing communication performance. The analysis also suggests that there exists an optimal base station (BS) density when blockage is of the same order of magnitude as the BS density, maximizing communication or sensing coverage probability.
- [415] arXiv:2404.02543 (replaced) [pdf, ps, html, other]
-
Title: Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The Baidu-ULTR dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques.
In this work, we revisit and extend the available experiments on the Baidu-ULTR dataset. We find that standard unbiased learning-to-rank techniques robustly improve click predictions but struggle to consistently improve ranking performance, especially considering the stark differences obtained by choice of ranking loss and query-document features. Our experiments reveal that gains in click prediction do not necessarily translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.
- [416] arXiv:2404.03043 (replaced) [pdf, ps, html, other]
-
Title: Linear Anchored Gaussian Mixture Model for Location and Width Computations of Objects in Thick Line Shape
Comments: 23 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate detection of the centerline of a thick linear structure and good estimation of its thickness are challenging topics in many real-world applications such as X-ray imaging, remote sensing and lane marking detection in road traffic. Model-based approaches using Hough and Radon transforms are often used but are not recommended for thick line detection, whereas methods based on image derivatives need further step-by-step processing, making their efficiency dependent on each step's outcome. In this paper, a novel paradigm to better detect thick linear objects is presented, where the 3D image gray level representation is considered as a finite mixture model of a statistical distribution, called the linear anchored Gaussian distribution, parametrized by a scale factor to describe the structure thickness and by radius and angle parameters to localize the structure centerline. An Expectation-Maximization algorithm (Algo1) using the original image as input data is used to estimate the model parameters. To rid the data of irrelevant information brought by nonuniform and noisy background, a modified EM algorithm (Algo2) is detailed. In experiments, the proposed algorithms show promising results on real-world images and synthetic images corrupted by blur and noise, where Algo2, using Hessian-based angle initialization, outperforms Algo1 and Algo2 with random angle initialization, in terms of running time and structure location and thickness computation accuracy.
- [417] arXiv:2404.03921 (replaced) [pdf, ps, html, other]
-
Title: Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models
Comments: Accepted by ICIC 2024 (Oral)
Subjects: Computation and Language (cs.CL)
Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advancements mainly pertain to fine-tuning scenarios, leaving explorations into computationally efficient direct inference methods for sentence representation in a nascent stage. This paper endeavors to bridge this research gap. Through comprehensive experimentation, we challenge the widely held belief in the necessity of an Explicit One-word Limitation for deriving sentence embeddings from Pre-trained Language Models (PLMs). We demonstrate that this approach, while beneficial for generative models under direct inference scenarios, is not imperative for discriminative models or the fine-tuning of generative PLMs. This discovery sheds new light on the design of manual templates in future studies. Building upon this insight, we propose two innovative prompt engineering techniques capable of further enhancing the expressive power of PLMs' raw embeddings: Pretended Chain of Thought and Knowledge Enhancement. We confirm their effectiveness across various PLM types and provide a detailed exploration of the underlying factors contributing to their success.
- [418] arXiv:2404.05128 (replaced) [pdf, ps, html, other]
-
Title: Importance of realism in procedurally-generated synthetic images for deep learning: case studies in maize and canola
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Artificial neural networks are often used to identify features of crop plants. However, training their models requires many annotated images, which can be expensive and time-consuming to acquire. Procedural models of plants, such as those developed with Lindenmayer-systems (L-systems) can be created to produce visually realistic simulations, and hence images of plant simulations, where annotations are implicitly known. These synthetic images can either augment or completely replace real images in training neural networks for phenotyping tasks. In this paper, we systematically vary amounts of real and synthetic images used for training in both maize and canola to better understand situations where synthetic images generated from L-systems can help prediction on real images. This work also explores the degree to which realism in the synthetic images improves prediction. We have five different variants of a procedural canola model (these variants were created by tuning the realism while using calibration), and the deep learning results showed how drastically these results improve as the canola synthetic images are made to be more realistic. Furthermore, we see how neural network predictions can be used to help calibrate L-systems themselves, creating a feedback loop.
- [419] arXiv:2404.05388 (replaced) [pdf, ps, html, other]
-
Title: An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
Comments: 1st ACM International Conference on AI-powered Software (AIware)
Subjects: Software Engineering (cs.SE)
The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems (of which models are only a part) and environmental affordances (e.g., access to tools), obstruct effective communication and comprehensive evaluation. This paper proposes a framework for AI system evaluation comprising three components: 1) harmonised terminology to facilitate communication across communities involved in AI safety evaluation; 2) a taxonomy identifying essential elements for AI system evaluation; 3) a mapping between AI lifecycle, stakeholders, and requisite evaluations for an accountable AI supply chain. This framework catalyses a deeper discourse on AI system evaluation beyond model-centric approaches.
- [420] arXiv:2404.07705 (replaced) [pdf, ps, html, other]
-
Title: ViM-UNet: Vision Mamba for Biomedical Segmentation
Comments: Published in MIDL 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
CNNs, most notably the UNet, are the default architecture for biomedical segmentation. Transformer-based approaches, such as UNETR, have been proposed to replace them, benefiting from a global field of view, but suffering from larger runtimes and higher parameter counts. The recent Vision Mamba architecture offers a compelling alternative to transformers, also providing a global field of view, but at higher efficiency. Here, we introduce ViM-UNet, a novel segmentation architecture based on it and compare it to UNet and UNETR for two challenging microscopy instance segmentation tasks. We find that it performs similarly or better than UNet, depending on the task, and outperforms UNETR while being more efficient. Our code is open source and documented at this https URL.
- [421] arXiv:2404.09395 (replaced) [pdf, ps, other]
-
Title: Data Analysis Methods Preliminaries for a Photon-based Hardware Random Number Generator
Comments: Presented at College of STEM Symposium, Clayton State University
Subjects: Cryptography and Security (cs.CR); Instrumentation and Detectors (physics.ins-det)
High-quality random numbers are necessary in the modern world. From encryption keys in cybersecurity to models and simulations for scientific use, it is important that these random numbers are of high quality and quickly attainable. One common solution to the generation of random numbers is that of pseudo-random number generators, or PRNGs. PRNGs generate random numbers by first quantifying some unpredictable phenomenon into a number or string and feeding it into an algorithm which yields numbers randomly based on that seed. Easy places to find seeds include the user's mouse movements or the machine's uptime. These are only pseudorandom, however: given the same seed twice, the PRNG would generate the same 'random' output. This is great for games like Minecraft, but not so great for cybersecurity encryption key generation. By using a hardware random number generator (HRNG), random numbers that are not susceptible to the flaws found in PRNGs can be attained at a high rate.
- [422] arXiv:2404.10102 (replaced) [pdf, ps, other]
-
Title: Chinchilla Scaling: A replication attempt
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. We attempt to replicate their third estimation procedure, which involves fitting a parametric loss function to a reconstruction of data from their plots. We find that the reported estimates are inconsistent with their first two estimation methods, fail at fitting the extracted data, and report implausibly narrow confidence intervals--intervals this narrow would require over 600,000 experiments, while they likely only ran fewer than 500. In contrast, our rederivation of the scaling law using the third approach yields results that are compatible with the findings from the first two estimation procedures described by Hoffmann et al.
- [423] arXiv:2404.13446 (replaced) [pdf, ps, other]
-
Title: New Structures and Algorithms for Length-Constrained Expander Decompositions
Comments: Added funding info
Subjects: Data Structures and Algorithms (cs.DS)
Expander decompositions form the basis of one of the most flexible paradigms for close-to-linear-time graph algorithms. Length-constrained expander decompositions generalize this paradigm to better work for problems with lengths, distances and costs. Roughly, an $(h,s)$-length $\phi$-expander decomposition is a small collection of length increases to a graph so that nodes within distance $h$ can route flow over paths of length $hs$ with congestion at most $1/\phi$.
In this work, we give a close-to-linear time algorithm for computing length-constrained expander decompositions in graphs with general lengths and capacities. Notably, and unlike previous works, our algorithm allows one to trade off between the size of the decomposition and the length of routing paths: for any $\epsilon > 0$ not too small, our algorithm computes in close-to-linear time an $(h,s)$-length $\phi$-expander decomposition of size $m \cdot \phi \cdot n^\epsilon$ where $s = \exp(\text{poly}(1/\epsilon))$. The key foundations of our algorithm are: (1) a simple yet powerful structural theorem which states that the union of a sequence of sparse length-constrained cuts is itself sparse and (2) new algorithms for efficiently computing sparse length-constrained flows.
- [424] arXiv:2404.14027 (replaced) [pdf, ps, html, other]
-
Title: OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks
Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet
Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach.
- [425] arXiv:2404.14634 (replaced) [pdf, ps, html, other]
-
Title: UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
Comments: 18 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image by leveraging temporal and cross-view information. Our novel cross-view fusion strategy is scalable to any number of cameras, while our synthetic data generation strategy ensures generalization across diverse actors, scenes, and viewpoints. Finally, UPose3D leverages the prediction uncertainty of both the 2D keypoint estimator and the pose compiler module. This provides robustness to outliers and noisy data, resulting in state-of-the-art performance in out-of-distribution settings. In addition, for in-distribution settings, UPose3D yields a performance rivaling methods that rely on 3D annotated data, while being the state-of-the-art among methods relying only on 2D supervision.
- [426] arXiv:2404.16061 (replaced) [pdf, ps, html, other]
-
Title: Dynamic Many Valued Logic Systems in Theoretical Economics
Subjects: Logic in Computer Science (cs.LO); Theoretical Economics (econ.TH)
This paper is an original attempt to understand the foundations of economic reasoning. It endeavors to rigorously define the relationship between subjective interpretations and objective valuations of such interpretations in the context of theoretical economics. This analysis is substantially expanded through a dynamic approach, where the truth of a valuation results in an updated interpretation or changes in the agent's subjective belief regarding the effectiveness of the selected action as well as the objective reality of the effectiveness of all other possible actions (i.e. consequence realization). Complications arise when the economic agent is presented with a set of actions that render ambiguous preference, or when the effectiveness of an action cannot be perceived upon its selection, thereby necessitating a different theory of choice and consequence realization.
- [427] arXiv:2404.17358 (replaced) [pdf, ps, html, other]
-
Title: Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier
Comments: 18 pages, v2: fixed typos
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Adversarial training is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context; in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers of the adversarial classification risk, known as adversarial Bayes classifiers. Specifically, under reasonable distributional assumptions, a convex loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.
- [428] arXiv:2404.18532 (replaced) [pdf, ps, html, other]
-
Title: MileBench: Benchmarking MLLMs in Long Context
Comments: 31 pages, 13 figures, 14 tables; We add results of GPT-4o in this version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on a specific task (e.g., time-series captioning), potentially obscuring the performance challenges of MLLMs. To address these limitations, we introduce MileBench, a pioneering benchmark designed to test the MultImodal Long-contExt capabilities of MLLMs. This benchmark comprises not only multimodal long contexts, but also multiple tasks requiring both comprehension and generation. We establish two distinct evaluation sets, diagnostic and realistic, to systematically assess MLLMs' long-context adaptation capacity and their ability to complete tasks in long-context scenarios. Our experimental results, obtained from testing 22 models, revealed that while the closed-source GPT-4o outperforms others, most open-source MLLMs struggle in long-context situations. Interestingly, the performance gap tends to widen with an increase in the number of images. We strongly encourage an intensification of research efforts towards enhancing MLLMs' long-context capabilities, especially in scenarios involving multiple images.
- [429] arXiv:2405.00623 (replaced) [pdf, ps, other]
-
Title: "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
Comments: Accepted to FAccT 2024. This version includes the appendix
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.
- [430] arXiv:2405.01228 (replaced) [pdf, ps, html, other]
-
Title: RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings due to limitations in data collection and computational complexity. To tackle domain shifts in data-scarce medical scenarios, we propose a Random frequency filtering enabled Single-source Domain Generalization algorithm (RaffeSDG), which promises robust out-of-domain inference with segmentation models trained on a single-source domain. A filter-based data augmentation strategy is first proposed to promote domain variability within a single-source domain by introducing variations in frequency space and blending homologous samples. Then Gaussian filter-based structural saliency is also leveraged to learn robust representations across augmented samples, further facilitating the training of generalizable segmentation models. To validate the effectiveness of RaffeSDG, we conducted extensive experiments involving out-of-domain inference on segmentation tasks for three human tissues imaged by four diverse modalities. Through thorough investigations and comparisons, compelling evidence was observed in these experiments, demonstrating the potential and generalizability of RaffeSDG. The code is available at this https URL.
- [431] arXiv:2405.01389 (replaced) [pdf, ps, html, other]
-
Title: Invariant Risk Minimization Is A Total Variation Model
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
Invariant risk minimization (IRM) is an emerging approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on the $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.
- [432] arXiv:2405.01909 (replaced) [pdf, ps, other]
-
Title: Towards Sustainable Low Carbon Emission Mini Data Centres
Ismael Samaye (LIRMM | ADAC), Paul Leloup (LIRMM), Gilles Sassatelli (LIRMM | ADAC), Abdoulaye Gamatié (LIRMM | ADAC)
Journal-ref: ComPAS 2023 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Jul 2023, Annecy, France
Subjects: Hardware Architecture (cs.AR)
Mini data centres have become increasingly prevalent in diverse organizations in recent years. They can be easily deployed at large scale, with high resilience. They are also cost-effective and provide high-security protection. On the other hand, IT technologies have resulted in the development of ever more energy-efficient servers, leading to the periodic replacement of older-generation servers in mini data centres. However, the disposal of older servers has resulted in electronic waste that further aggravates the already critical e-waste problem. Furthermore, despite the shift towards more energy-efficient servers, many mini data centres still rely heavily on high-carbon energy sources. This contributes to data centres' overall carbon footprint. All these issues are concerns for sustainability. To address them, this paper proposes an approach to extend the lifespan of older-generation servers in mini data centres. This is made possible thanks to a novel solar-powered computing technology, named Genesis, that compensates for the energy overhead generated by older servers. As a result, electronic waste can be reduced while improving system sustainability by reusing functional server hardware. Moreover, Genesis does not require server cooling, which reduces energy and water requirements. Analytical reasoning is applied to compare the efficiency of typical conventional mini data centre designs against alternative Genesis-based designs, in terms of energy, carbon emissions and exploitation costs.
- [433] arXiv:2405.02175 (replaced) [pdf, ps, html, other]
-
Title: Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset
Comments: Short paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Hoaxes are a recognised form of disinformation created deliberately, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes detecting Wikipedia hoaxes hard is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, and introduce Hoaxpedia, a collection of 311 hoax articles (from existing literature as well as official Wikipedia lists) alongside semantically similar real articles. We report results of binary classification experiments in the task of predicting whether a Wikipedia article is real or hoax, and analyze several settings as well as a range of language models. Our results suggest that detecting deceitful content in Wikipedia based on content alone, despite not having been explored much in the past, is a promising direction.
- [434] arXiv:2405.02213 (replaced) [pdf, ps, html, other]
-
Title: Automatic Programming: Large Language Models and Beyond
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot, which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code from LLMs, along with evidence of assurance.
- [435] arXiv:2405.02661 (replaced) [pdf, ps, html, other]
-
Title: DDE-Find: Learning Delay Differential Equations from Noisy, Limited Data
Comments: 42 pages, 19 tables, 8 figures
Subjects: Machine Learning (cs.LG)
Delay Differential Equations (DDEs) are a class of differential equations that can model diverse scientific phenomena. However, identifying the parameters, especially the time delay, that make a DDE's predictions match experimental results can be challenging. We introduce DDE-Find, a data-driven framework for learning a DDE's parameters, time delay, and initial condition function. DDE-Find uses an adjoint-based approach to efficiently compute the gradient of a loss function with respect to the model parameters. We motivate and rigorously prove an expression for the gradients of the loss using the adjoint. DDE-Find builds upon recent developments in learning DDEs from data and delivers the first complete framework for learning DDEs from data. Through a series of numerical experiments, we demonstrate that DDE-Find can learn DDEs from noisy, limited data.
- [436] arXiv:2405.03548 (replaced) [pdf, ps, html, other]
-
Title: MAmmoTH2: Scaling Instructions from the Web
Comments: Work in Progress
Subjects: Computation and Language (cs.CL)
Instruction tuning improves the reasoning abilities of large language models (LLMs), with data quality and scalability being the crucial factors. Most instruction tuning data come from human crowd-sourcing or GPT-4 distillation. We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. Our approach involves (1) recalling relevant documents, (2) extracting instruction-response pairs, and (3) refining the extracted pairs using open-source LLMs. Fine-tuning base LLMs on this dataset, we build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks. Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 34% on MATH and from 36% to 67% on GSM8K without training on any in-domain data. Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance on several reasoning and chatbot benchmarks. Our work demonstrates how to harvest large-scale, high-quality instruction data without costly human annotation or GPT-4 distillation, providing a new paradigm for building better instruction tuning data.
- [437] arXiv:2405.03669 (replaced) [pdf, ps, other]
-
Title: IMELL Cut Elimination with Linear Overhead
Comments: Version with proofs of the FSCD 2024 paper with the same title
Subjects: Logic in Computer Science (cs.LO)
Recently, Accattoli introduced the Exponential Substitution Calculus (ESC) given by untyped proof terms for Intuitionistic Multiplicative Exponential Linear Logic (IMELL), endowed with rewriting rules at-a-distance for cut elimination. He also introduced a new cut elimination strategy, dubbed the good strategy, and showed that its number of steps is a time cost model with polynomial overhead for the ESC/IMELL, and the first such one.
Here, we refine Accattoli's result by introducing an abstract machine for ESC and proving that it implements the good strategy and computes cut-free terms/proofs within a linear overhead.
- [438] arXiv:2405.04065 (replaced) [pdf, ps, html, other]
-
Title: FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference
Comments: 14 pages
Subjects: Computation and Language (cs.CL)
Retrieval-Augmented Language Modeling (RALM), which integrates large language models (LLMs) with relevant documents from an external corpus, is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work that uses retrieved content by simply prepending it to the input incurs a high runtime cost, which degrades the inference efficiency of the LLMs because they fail to use the Key-Value (KV) cache efficiently. In this paper, we propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM with an appending context pattern while maintaining decent performance after specific fine-tuning, without heavily damaging the knowledge integrity of the LLM. FlashBack appends retrieved documents at the end of the context, instead of prepending them, so that the KV cache is used efficiently. Our experiments show that the inference speed of FlashBack is up to $4\times$ faster than the prepending method on a 7B LLM (Llama 2). By bypassing unnecessary re-computation, it achieves significantly faster inference, and this efficiency will substantially reduce inference cost. Our code will be publicly available.
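The cache argument in this abstract can be illustrated with a toy count. The sketch below is not the FlashBack implementation. It assumes a causal model whose cached key/value entries stay valid only for the longest token prefix shared between the old and new inputs, and all token names and helper functions are hypothetical.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Length of the shared prefix, i.e. how many cached KV entries stay valid."""
    n = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        n += 1
    return n


def tokens_to_recompute(cached_tokens, new_tokens):
    """Tokens whose keys/values must be recomputed after the input changes."""
    return len(new_tokens) - reusable_prefix_len(cached_tokens, new_tokens)


# Hypothetical running context and two successive sets of retrieved documents.
context = ["the", "cat", "sat", "on", "the", "mat"]
old_docs = ["doc_a1", "doc_a2"]
new_docs = ["doc_b1", "doc_b2"]

# Prepending: swapping the documents changes the very first tokens,
# so no cached prefix survives and the whole input is recomputed.
prepend_cost = tokens_to_recompute(old_docs + context, new_docs + context)

# Appending: the context prefix is unchanged, so only the new documents
# need fresh keys and values.
append_cost = tokens_to_recompute(context + old_docs, context + new_docs)

print(prepend_cost, append_cost)
```

In this toy setting the prepending layout recomputes every token whenever the retrieved documents change, while the appending layout recomputes only the document tokens.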
- [439] arXiv:2405.04144 (replaced) [pdf, ps, html, other]
-
Title: Lossy Compression with Data, Perception, and Classification Constraints
Comments: 23 pages, in part submitted to ITW
Subjects: Information Theory (cs.IT)
By extracting task-relevant information while maximally compressing the input, the information bottleneck (IB) principle has provided a guideline for learning effective and robust representations of the target inference. However, extending the idea to the multi-task learning scenario with joint consideration of generative tasks and traditional reconstruction tasks remains unexplored. This paper addresses this gap by reconsidering the lossy compression problem with diverse constraints on data reconstruction, perceptual quality, and classification accuracy. Firstly, we study two ternary relationships, namely, the rate-distortion-classification (RDC) and rate-perception-classification (RPC). For both RDC and RPC functions, we derive the closed-form expressions of the optimal rate for binary and Gaussian sources. These new results complement the IB principle and provide insights into effectively extracting task-oriented information to fulfill diverse objectives. Secondly, unlike prior research demonstrating a tradeoff between classification and perception in signal restoration problems, we prove that such a tradeoff does not exist in the RPC function and reveal that the source noise plays a decisive role in the classification-perception tradeoff. Finally, we implement a deep-learning-based image compression framework, incorporating multiple tasks related to distortion, perception, and classification. The experimental results coincide with the theoretical analysis and verify the effectiveness of our generalized IB in balancing various task objectives.
- [440] arXiv:2405.04195 (replaced) [pdf, ps, html, other]
-
Title: Rational methods for abstract linear, non-homogeneous problems without order reduction
Comments: 14 pages, 4 tables
Subjects: Numerical Analysis (math.NA)
Starting from an A-stable rational approximation to $\rm{e}^z$ of order $p$, $$r(z)= 1+ z+ \cdots + z^p/ p! + O(z^{p+1}),$$ families of stable methods are proposed to time discretize abstract IVPs of the type $u'(t) = A u(t) + f(t)$. These numerical procedures turn out to be of order $p$, thus overcoming the order reduction phenomenon, and only one evaluation of $f$ per step is required.
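As a concrete instance of such an approximation, the sketch below checks numerically that the (1,1) Padé approximant $r(z) = (1 + z/2)/(1 - z/2)$, the stability function of the trapezoidal rule, has order $p = 2$ and is bounded by 1 in modulus on a few sample points with non-positive real part. This particular $r(z)$ is chosen here only for illustration; the abstract does not say which approximations the paper uses.

```python
import cmath


def r(z):
    """(1,1) Pade approximant to e^z (stability function of the trapezoidal rule)."""
    return (1 + z / 2) / (1 - z / 2)


def local_error(z):
    """Distance between the rational approximation and the exponential."""
    return abs(r(z) - cmath.exp(z))


# For an order-2 approximation the error behaves like C * z**3,
# so halving z should shrink it by roughly a factor of 8.
ratio = local_error(0.1) / local_error(0.05)

# A-stability means |r(z)| <= 1 whenever Re(z) <= 0; we only spot-check samples.
samples = [complex(-0.1, 5.0), complex(-3.0, -2.0), complex(-50.0, 0.0), complex(0.0, 7.0)]
a_stable_on_samples = all(abs(r(z)) <= 1 + 1e-12 for z in samples)

print(ratio, a_stable_on_samples)
```

The spot check is not a proof of A-stability; it only confirms the expected behaviour at the sampled points.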
- [441] arXiv:2405.04471 (replaced) [pdf, ps, html, other]
-
Title: Universal Spatial Audio Transcoder
Comments: 12 pages, 8 figures. Accepted for presentation at the AES 156th Convention, Madrid, Spain (June 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper addresses the challenges associated with both the conversion between different spatial audio formats and the decoding of a spatial audio format to a specific loudspeaker layout. Existing approaches often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To overcome these challenges, we present the Universal Spatial Audio Transcoder (USAT) method and its corresponding open source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format, adapting it to any output format or 2D/3D loudspeaker configuration. Drawing upon optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information. We present examples of the decoding and transcoding of several audio formats, and show that the USAT approach is advantageous compared to the most common methods in the field.
- [442] arXiv:2405.04600 (replaced) [pdf, ps, html, other]
-
Title: Contextual API Completion for Unseen Repositories Using LLMs
Subjects: Software Engineering (cs.SE)
Large language models have made substantial progress in addressing diverse code-related tasks. However, their adoption is hindered by inconsistencies in generating output due to the lack of real-world, domain-specific information, such as for intra-repository API calls for unseen software projects. We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks. Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions. We examine relevant import statements during API completion to derive insights into local APIs, drawing from their method signatures. For API token completion, we analyze the inline variables and correlate them with the appropriate imported modules, thereby allowing our approach to rank the most contextually relevant suggestions from the available local APIs. Further, for conversational API completion, we gather APIs that are most relevant to the developer query with a retrieval-based search across the project. We employ our tool, LANCE, within the framework of our proposed benchmark, APIEval, encompassing two different programming languages. Our evaluation yields an average accuracy of 82.6% for API token completion and 76.9% for conversational API completion tasks. On average, LANCE surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively. The implications of our findings are substantial for developers, suggesting that our lightweight context analysis can be applied to multilingual environments without language-specific training or fine-tuning, allowing for efficient implementation with minimal examples and effort.
- [443] arXiv:2405.04880 (replaced) [pdf, ps, html, other]
-
Title: The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset that includes 2 languages, over 1M audio samples, and various test conditions, and that focuses on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn domain-balanced and generalized minima. In our experiments, we first demonstrate that ADD models trained with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.
- [444] arXiv:2405.05529 (replaced) [pdf, ps, html, other]
-
Title: Tomur: Traffic-Aware Performance Prediction of On-NIC Network Functions with Multi-Resource Contention
Comments: Correct the typo in introduction. Correct the typo in reference
Subjects: Networking and Internet Architecture (cs.NI)
Network function (NF) offloading on SmartNICs has been widely used in modern data centers, offering benefits in host resource saving and programmability. Co-running NFs on the same SmartNICs can cause performance interference due to onboard resource contention. Therefore, to meet performance SLAs while ensuring efficient resource management, operators need mechanisms to predict NF performance under such contention. However, existing solutions lack SmartNIC-specific knowledge and exhibit limited traffic awareness, leading to poor accuracy for on-NIC NFs. This paper proposes Tomur, a novel performance prediction system for on-NIC NFs. Tomur builds upon the key observation that co-located NFs contend for multiple resources, including onboard accelerators and the memory subsystem. It also provides traffic awareness according to the behaviors of individual resources to maintain accuracy as the external traffic attributes vary. Evaluation using a BlueField-2 SmartNIC shows that Tomur improves the prediction accuracy by 78.8% and reduces SLA violations by 92.2% compared to state-of-the-art approaches, and enables new practical use cases.
- [445] arXiv:2405.05945 (replaced) [pdf, ps, html, other]
-
Title: Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li
Comments: Technical Report; Code at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT. Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.
- [446] arXiv:2405.06270 (replaced) [pdf, ps, html, other]
-
Title: XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts.
Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models.
This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.
- [447] arXiv:2405.06321 (replaced) [pdf, ps, html, other]
-
Title: Correlation Dimension of Natural Language in a Statistical Manifold
Comments: Published at Physical Review Research
Journal-ref: Physical Review Research, 6(2), L022028 (2024)
Subjects: Computation and Language (cs.CL); Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI)
The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits multifractal behavior, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barabási-Albert process. Long memory is the key to producing self-similarity. Our method is applicable to any probabilistic model of real-world discrete sequences, and we show an application to music data.
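The general recipe named in this abstract can be sketched as follows: measure pairwise Fisher-Rao distances between probability vectors, compute the correlation integral (the fraction of pairs closer than a radius), and read the dimension off the slope of its log-log plot. This is a minimal illustration under stated assumptions, not the authors' code. The toy data (a one-parameter family of two-outcome distributions), the chosen radii, and all function names are hypothetical, and the closed form used for the distance is the standard one for categorical distributions.

```python
import math


def fisher_rao(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def correlation_integral(points, eps, dist=fisher_rao):
    """Fraction of distinct pairs whose distance is below eps."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if dist(points[i], points[j]) < eps
    )
    return close / (n * (n - 1) / 2)


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def correlation_dimension(points, eps_values):
    """Estimate the dimension as the slope of log C(eps) against log eps."""
    log_eps = [math.log(e) for e in eps_values]
    log_c = [math.log(correlation_integral(points, e)) for e in eps_values]
    return least_squares_slope(log_eps, log_c)


# Toy data: distributions on a one-dimensional curve, so the estimate should be near 1.
points = [
    [math.cos(t) ** 2, math.sin(t) ** 2]
    for t in (0.1 + 0.005 * i for i in range(200))
]
dimension = correlation_dimension(points, [0.055, 0.105, 0.205, 0.405])
print(dimension)
```

For real language-model outputs the points would be next-token distributions over a large vocabulary, and the range of radii used for the fit would need to be chosen with care.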
- [448] arXiv:2405.06445 (replaced) [pdf, ps, html, other]
-
Title: Systematic interval observer design for linear systems
Comments: 5 pages, 2 figures
Subjects: Systems and Control (eess.SY)
We first propose systematic and comprehensive interval observer designs for linear time-invariant systems, under standard assumptions involving observability and interval bounds on the initial condition and disturbances. Historically, such designs rely on transformations with certain limitations into a form that is Metzler (for continuous time) or non-negative (for discrete time). We show that they can be effectively replaced with a linear time-invariant transformation that can be easily computed offline. Then, we propose the extension to the time-varying setting, where conventional transformations lack guaranteed outcomes. Academic examples are presented to illustrate our methods.
- [449] arXiv:2405.06670 (replaced) [pdf, ps, html, other]
-
Title: TLINet: Differentiable Neural Network Temporal Logic Inference
Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)
There has been growing interest in extracting formal descriptions of system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the use of off-the-shelf gradient-based tools during the learning process. In contrast to existing approaches, we introduce approximation methods for the max operator designed specifically for temporal logic-based gradient techniques, ensuring the correctness of STL satisfaction evaluation. Our framework learns not only the structure but also the parameters of STL formulas, allowing flexible combinations of operators and various logical structures. We validate TLINet against state-of-the-art baselines, demonstrating that our approach outperforms these baselines in terms of interpretability, compactness, rich expressibility, and computational efficiency.
- [450] arXiv:2405.06671 (replaced) [pdf, ps, html, other]
-
Title: Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling
Authors: Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal
Comments: This work has been accepted to appear at North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in financial documents with their corresponding XBRL tags. Unlike prior work, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter-efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performance on both datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability on zero-shot as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground truth in the majority of cases.
- [451] arXiv:2405.06904 (replaced) [pdf, ps, html, other]
-
Title: Generation of Granular-Balls for Clustering Based on the Principle of Justifiable Granularity
Subjects: Machine Learning (cs.LG)
Efficient and robust data clustering remains a challenging task in the field of data analysis. Recent efforts have explored the integration of granular-ball (GB) computing with clustering algorithms to address this challenge, yielding promising results. However, existing methods for generating GBs often rely on single indicators to measure GB quality and employ threshold-based or greedy strategies, potentially leading to GBs that do not accurately capture the underlying data distribution. To address these limitations, this article introduces a novel GB generation method. The originality of this method lies in leveraging the principle of justifiable granularity to measure the quality of a GB for clustering tasks. To be precise, we define the coverage and specificity of a GB and introduce a comprehensive measure for assessing GB quality. Utilizing this quality measure, the method incorporates a binary tree pruning-based strategy and an anomaly detection method to determine the best combination of sub-GBs for each GB and identify abnormal GBs, respectively. Compared to previous GB generation methods, the new method maximizes the overall quality of generated GBs while ensuring alignment with the data distribution, thereby enhancing the rationality of the generated GBs. Experimental results obtained from both synthetic and publicly available datasets underscore the effectiveness of the proposed GB generation method, showcasing improvements in clustering accuracy and normalized mutual information.
- [452] arXiv:2405.07035 (replaced) [pdf, ps, html, other]
-
Title: A Turkish Educational Crossword Puzzle Generator
Comments: This paper has been accepted for presentation at AIED2024 LBR
Subjects: Computation and Language (cs.CL)
This paper introduces the first Turkish crossword puzzle generator designed to leverage the capabilities of large language models (LLMs) for educational purposes. In this work, we introduce two specially created datasets: one with over 180,000 unique answer-clue pairs for generating relevant clues from the given answer, and another with over 35,000 samples containing text, answer, category, and clue data, aimed at producing clues for specific texts and keywords within certain categories. Beyond entertainment, this generator serves as an interactive educational tool that enhances memory, vocabulary, and problem-solving skills. It is a notable step in AI-enhanced education, merging game-like engagement with learning for Turkish and setting new standards for interactive, intelligent learning tools in Turkish.
- [453] arXiv:2405.07162 (replaced) [pdf, ps, html, other]
-
Title: Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
Comments: ICML 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, and needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.
- [454] arXiv:2405.07396 (replaced) [pdf, ps, html, other]
-
Title: An Unstructured Body-of-Revolution Electromagnetic Particle-in-Cell Algorithm with Radial Perfectly Matched Layers and Dual Polarizations
Comments: This manuscript has been accepted for publication in Computer Physics Communications, COMPHY-D-23-00476R4 (May 11, 2024)
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
A novel electromagnetic particle-in-cell algorithm has been developed for fully kinetic plasma simulations on unstructured (irregular) meshes in complex body-of-revolution geometries. The algorithm, implemented in the BORPIC++ code, utilizes a set of field scalings and a coordinate mapping, reducing the Maxwell field problem in a cylindrical system to a Cartesian finite element Maxwell solver in the meridian plane. The latter obviates the cylindrical coordinate singularity in the symmetry axis. The choice of an unstructured finite element discretization enhances the geometrical flexibility of the BORPIC++ solver compared to the more traditional finite difference solvers. Symmetries in Maxwell's equations are explored to decompose the problem into two dual polarization states with isomorphic representations that enable code reuse. The particle-in-cell scatter and gather steps preserve charge-conservation at the discrete level. Our previous algorithm (BORPIC+) discretized the E and B field components of TE-phi and TM-phi polarizations on the finite element (primal) mesh. A cylindrical perfectly matched layer is implemented as a boundary condition in the radial direction to simulate open space problems, with periodic boundary conditions in the axial direction. We investigate effects of charged particles moving next to the cylindrical perfectly matched layer. We model azimuthal currents arising from rotational motion of charged rings, which produce TM-phi polarized fields. Several numerical examples are provided to illustrate the first application of the algorithm.
- [455] arXiv:2405.07536 (replaced) [pdf, ps, html, other]
-
Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs or similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When kinematic constraints or obstacles exist that may cause trajectory planning to fail, task re-assignment is performed by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. The computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for multi-AUV systems.
- [456] arXiv:2405.07550 (replaced) [pdf, ps, html, other]
-
Title: Wild Berry image dataset collected in Finnish forests and peatlands using drones
Authors: Luigi Riz, Sergio Povoli, Andrea Caraffa, Davide Boscaini, Mohamed Lamine Mekhalfi, Paul Chippendale, Marjut Turtiainen, Birgitta Partanen, Laura Smith Ballester, Francisco Blanes Noguera, Alessio Franchi, Elisa Castelli, Giacomo Piccinini, Luca Marchesotti, Micael Santos Couceiro, Fabio Poiesi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Berry picking has long-standing traditions in Finland, yet it is challenging and can potentially be dangerous. The integration of drones equipped with advanced imaging techniques represents a transformative leap forward, optimising harvests and promising sustainable practices. We propose WildBe, the first image dataset of wild berries captured in peatlands and under the canopy of Finnish forests using drones. Unlike previous and related datasets, WildBe includes new varieties of berries, such as bilberries, cloudberries, lingonberries, and crowberries, captured under severe light variations and in cluttered environments. WildBe features 3,516 images, including a total of 18,468 annotated bounding boxes. We carry out a comprehensive analysis of WildBe using six popular object detectors, assessing their effectiveness in berry detection across different forest regions and camera types. We will release WildBe publicly.
- [457] arXiv:2405.07640 (replaced) [pdf, ps, html, other]
-
Title: Hyperparameter Importance Analysis for Multi-Objective AutoML
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Hyperparameter optimization plays a pivotal role in enhancing the predictive performance and generalization capabilities of ML models. However, in many applications, we do not only care about predictive performance but also about objectives such as inference time, memory, or energy consumption. In such MOO scenarios, determining the importance of hyperparameters poses a significant challenge due to the complex interplay between the conflicting objectives. In this paper, we propose the first method for assessing the importance of hyperparameters in the context of multi-objective hyperparameter optimization. Our approach leverages surrogate-based hyperparameter importance (HPI) measures, i.e. fANOVA and ablation paths, to provide insights into the impact of hyperparameters on the optimization objectives. Specifically, we compute the a-priori scalarization of the objectives and determine the importance of the hyperparameters for different objective tradeoffs. Through extensive empirical evaluations on diverse benchmark datasets with three different objectives paired with accuracy, namely time, demographic parity, and energy consumption, we demonstrate the effectiveness and robustness of our proposed method. Our findings not only offer valuable guidance for hyperparameter tuning in MOO tasks but also contribute to advancing the understanding of HPI in complex optimization scenarios.
- [458] arXiv:2405.07703 (replaced) [pdf, ps, html, other]
-
Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs trained starting from Llama 2
Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea
Subjects: Computation and Language (cs.CL)
In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.
- [459] arXiv:2405.07719 (replaced) [pdf, ps, html, other]
-
Title: A Unified Sequence Parallelism Approach for Long Context Generative AI
Comments: 12 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, i.e. DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach, which is more robust to transformer model architectures and network hardware topology. This paper compares the communication and memory cost of SP and existing parallelism, including data/tensor/zero/expert/pipeline parallelism, and discusses the best practices for designing hybrid 4D parallelism involving SP. We achieved 86% MFU on two 8xA800 nodes using SP for sequence length 208K for the LLAMA3-8B model. Our code is publicly available at this https URL.
- [460] arXiv:2405.07896 (replaced) [pdf, ps, html, other]
-
Title: Almanac Copilot: Towards Autonomous Electronic Health Record Navigation
Cyril Zakka, Joseph Cho, Gracia Fahed, Rohan Shad, Michael Moor, Robyn Fong, Dhamanpreet Kaur, Vishnu Ravi, Oliver Aalami, Roxana Daneshjou, Akshay Chaudhari, William Hiesinger
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Clinicians spend large amounts of time on clinical documentation, and inefficiencies impact quality of care and increase clinician burnout. Despite the promise of electronic medical records (EMR), the transition from paper-based records has been negatively associated with clinician wellness, in part due to poor user experience, increased burden of documentation, and alert fatigue. In this study, we present Almanac Copilot, an autonomous agent capable of assisting clinicians with EMR-specific tasks such as information retrieval and order placement. On EHR-QA, a synthetic evaluation dataset of 300 common EHR queries based on real patient data, Almanac Copilot obtains a successful task completion rate of 74% (n = 221 tasks) with a mean score of 2.45 over 3 (95% CI:2.34-2.56). By automating routine tasks and streamlining the documentation process, our findings highlight the significant potential of autonomous agents to mitigate the cognitive load imposed on clinicians by current EMR systems.
- [461] arXiv:2405.08029 (replaced) [pdf, ps, html, other]
-
Title: PHUDGE: Phi-3 as Scalable Judge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper cum technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieves SOTA results on 4 tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput. It shows very strong correlation not only with GPT-4 but also with human annotators on unseen data, in both absolute and relative grading tasks. We not only address the use of small LMs for cost-effective production-grade systems but also show that causal modelling is not only slow by nature but can sometimes hinder a model's learning capabilities, and should be replaced by simpler tasks whenever possible to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation, and repurposing the problem itself, we can beat 10x bigger models even with less training data. To the best of our knowledge, we are the first to experiment with and showcase a generalised version of Earth Mover's Distance (also known as Wasserstein distance) that uses Minkowski distance with a penalty to control loss smoothing, and that can be used as a loss function instead of cross-entropy to obtain stable training and better results for grading tasks.
- [462] arXiv:2405.08036 (replaced) [pdf, ps, html, other]
-
Title: POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Comments: change reference format
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal joint actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
- [463] arXiv:2405.08097 (replaced) [pdf, ps, html, other]
-
Title: Learning functions on symmetric matrices and point clouds via lightweight invariant features
Comments: 28 pages, 2 figures, 2 tables
Subjects: Machine Learning (cs.LG); Commutative Algebra (math.AC)
In this work, we present a mathematical formulation for machine learning of (1) functions on symmetric matrices that are invariant with respect to the action of permutations by conjugation, and (2) functions on point clouds that are invariant with respect to rotations, reflections, and permutations of the points. To achieve this, we construct $O(n^2)$ invariant features derived from generators for the field of rational functions on $n\times n$ symmetric matrices that are invariant under joint permutations of rows and columns. We show that these invariant features can separate all distinct orbits of symmetric matrices except for a measure zero set; such features can be used to universally approximate invariant functions on almost all weighted graphs. For point clouds in a fixed dimension, we prove that the number of invariant features can be reduced, generically without losing expressivity, to $O(n)$, where $n$ is the number of points. We combine these invariant features with DeepSets to learn functions on symmetric matrices and point clouds with varying sizes. We empirically demonstrate the feasibility of our approach on molecule property regression and point cloud distance prediction.
- [464] arXiv:2405.08300 (replaced) [pdf, ps, html, other]
-
Title: Vector-Symbolic Architecture for Event-Based Optical Flow
Subjects: Computer Vision and Pattern Recognition (cs.CV); Symbolic Computation (cs.SC)
From the perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduce an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables within VSA contributes to the enhanced representation similarity of feature descriptors for flow-matching points, while its structured symbolic representation capacity facilitates feature fusion from both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature matching framework for event-based optical flow, encompassing both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. In VSA-Flow, accurate optical flow estimation validates the effectiveness of HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor is proposed to learn optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves superior accuracy in comparison to both model-based and self-supervised learning methods on the DSEC benchmark, while remaining competitive among both methods on the MVSEC benchmark. This contribution marks a significant advancement in event-based optical flow within the feature matching methodology.
- [465] arXiv:2405.08304 (replaced) [pdf, ps, html, other]
-
Title: Computational Thought Experiments for a More Rigorous Philosophy and Science of the Mind
Comments: 6 pages, 4 figures, to appear at CogSci 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
We offer philosophical motivations for a method we call Virtual World Cognitive Science (VW CogSci), in which researchers use virtual embodied agents that are embedded in virtual worlds to explore questions in the field of Cognitive Science. We focus on questions about mental and linguistic representation and the ways that such computational modeling can add rigor to philosophical thought experiments, as well as the terminology used in the scientific study of such representations. We find that this method forces researchers to take a god's-eye view when describing dynamical relationships between entities in minds and entities in an environment in a way that eliminates the need for problematic talk of belief and concept types, such as the belief that cats are silly, and the concept CAT, while preserving belief and concept tokens in individual cognizers' minds. We conclude with some further key advantages of VW CogSci for the scientific study of mental and linguistic representation and for Cognitive Science more broadly.
- [466] arXiv:2405.08498 (replaced) [pdf, ps, html, other]
-
Title: Learning Decision Policies with Instrumental Variables through Double Machine Learning
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging the estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
- [467] arXiv:2405.08596 (replaced) [pdf, ps, other]
-
Title: EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark
Comments: This paper need more modification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts as an effective tool for detecting newly emerged deepfake audio while maintaining performance on older types, lacks a well-constructed and user-friendly evaluation framework. To address this gap, we introduce EVDA, a benchmark for evaluating continual learning methods in deepfake audio detection. EVDA includes classic datasets from the Anti-Spoofing Voice series, Chinese fake audio detection series, and newly generated deepfake audio from models like GPT-4 and GPT-4o. It supports various continual learning techniques, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and recent methods like Regularized Adaptive Weight Modification (RAWM) and Radian Weight Modification (RWM). Additionally, EVDA facilitates the development of robust algorithms by providing an open interface for integrating new continual learning methods.
- [468] arXiv:2405.08619 (replaced) [pdf, ps, html, other]
-
Title: ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
The intersection of chemistry and Artificial Intelligence (AI) is an area of active research that aims to accelerate scientific discovery. The integration of large language models (LLMs) with scientific modalities has shown significant promise in this endeavour. However, challenges persist in effectively addressing training efficacy and the out-of-distribution problem, particularly as existing approaches rely on larger models and datasets. In this context, we focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation, which avoids generating translations that are merely adequate but not perfect. To ensure generalisability and mitigate memorisation effects, we conduct experiments using only 10% of the data. Our results demonstrate that our models achieve up to a 32% improvement compared to counterpart models. We also introduce a scalable fine-grained evaluation methodology that accommodates responsibility.
- [469] arXiv:2405.08649 (replaced) [pdf, ps, html, other]
-
Title: The computational power of discrete chemical reaction networks with bounded executions
Subjects: Computational Complexity (cs.CC)
Chemical reaction networks (CRNs) model systems where molecules interact according to a finite set of reactions such as $A + B \to C$, representing that if a molecule of $A$ and $B$ collide, they disappear and a molecule of $C$ is produced. CRNs can compute Boolean-valued predicates $\phi:\mathbb{N}^d \to \{0,1\}$ and integer-valued functions $f:\mathbb{N}^d \to \mathbb{N}$; for instance $X_1 + X_2 \to Y$ computes the function $\min(x_1,x_2)$.
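As a minimal illustration of the $\min(x_1,x_2)$ example above, the sketch below simulates the single reaction $X_1 + X_2 \to Y$ on molecule counts until no reaction can fire. The function name `run_min_crn` and the dictionary representation of a configuration are illustrative assumptions, not taken from the paper.

```python
def run_min_crn(x1: int, x2: int) -> int:
    """Simulate X1 + X2 -> Y from the initial counts and return the final count of Y.

    Hypothetical helper for illustration: a configuration is a dict of species counts.
    """
    counts = {"X1": x1, "X2": x2, "Y": 0}
    # The reaction can fire only while at least one X1 and one X2 are present.
    while counts["X1"] > 0 and counts["X2"] > 0:
        counts["X1"] -= 1  # one molecule of X1 is consumed
        counts["X2"] -= 1  # one molecule of X2 is consumed
        counts["Y"] += 1   # one molecule of Y is produced
    return counts["Y"]


if __name__ == "__main__":
    print(run_min_crn(3, 5))
```

Each firing consumes one molecule of each input, so the loop stops once the scarcer input is exhausted and the count of $Y$ equals $\min(x_1,x_2)$.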
We study the computational power of execution bounded CRNs, in which only a finite number of reactions can occur from the initial configuration (e.g., ruling out reversible reactions such as $A \rightleftharpoons B$). The power and composability of such CRNs depend crucially on some other modeling choices that do not affect the computational power of CRNs with unbounded executions, namely whether an initial leader is present, and whether (for predicates) all species are required to "vote" for the Boolean output. If the CRN starts with an initial leader, and can allow only the leader to vote, then all semilinear predicates and functions can be stably computed in $O(n \log n)$ parallel time by execution bounded CRNs.
However, if no initial leader is allowed, all species vote, and the CRN is "noncollapsing" (does not shrink from initially large to final $O(1)$ size configurations), then execution bounded CRNs are severely limited, able to compute only eventually constant predicates. A key tool is to characterize execution bounded CRNs as precisely those with a nonnegative linear potential function that is strictly decreased by every reaction, a result that may be of independent interest.
- [470] arXiv:2405.08691 (replaced) [pdf, ps, html, other]
-
Title: Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning
Comments: Submitted to IROS 2024
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This study investigates the computational speed and accuracy of two numerical integration methods, cubature and sampling-based, for integrating an integrand over a 2D polygon. Using a group of rovers searching the Martian surface with a limited sensor footprint as a test bed, the relative error and computational time are compared as the area was subdivided to improve accuracy in the sampling-based approach. The results show that the sampling-based approach exhibits a $14.75\%$ deviation in relative error compared to cubature when it matches the computational performance at $100\%$. Furthermore, achieving a relative error below $1\%$ necessitates a $10000\%$ increase in relative time to calculate due to the $\mathcal{O}(N^2)$ complexity of the sampling-based method. It is concluded that for enhancing reinforcement learning capabilities and other high iteration algorithms, the cubature method is preferred over the sampling-based method.
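The trade-off described above can be sketched on a toy problem. The example below is not the paper's setup: it assumes a unit square instead of a general 2D polygon, a hypothetical smooth integrand `f`, a 3-point tensor-product Gauss-Legendre rule as the cubature method, and midpoint grid sampling as a stand-in for the sampling-based method.

```python
import math


def f(x, y):
    # Hypothetical smooth integrand chosen for illustration only.
    return math.exp(-(x * x + y * y))


# 3-point Gauss-Legendre nodes and weights on [-1, 1].
_GL_NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
_GL_WEIGHTS = (5 / 9, 8 / 9, 5 / 9)


def cubature_unit_square(func):
    """Tensor-product Gauss-Legendre rule on [0, 1]^2 (9 evaluations)."""
    total = 0.0
    for ti, wi in zip(_GL_NODES, _GL_WEIGHTS):
        for tj, wj in zip(_GL_NODES, _GL_WEIGHTS):
            # Map nodes from [-1, 1] to [0, 1]; each 1D weight is halved.
            x = 0.5 * (ti + 1.0)
            y = 0.5 * (tj + 1.0)
            total += 0.25 * wi * wj * func(x, y)
    return total


def grid_sampling_unit_square(func, n):
    """Midpoint sampling on an n-by-n grid (n * n evaluations)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += func((i + 0.5) * h, (j + 0.5) * h)
    return total * h * h


# Closed-form reference value for this particular integrand.
exact = (0.5 * math.sqrt(math.pi) * math.erf(1.0)) ** 2
err_cubature = abs(cubature_unit_square(f) - exact)
err_sampling = abs(grid_sampling_unit_square(f, 10) - exact)

if __name__ == "__main__":
    print(err_cubature, err_sampling)
```

The grid rule needs $n^2$ evaluations for an $n \times n$ subdivision, which is the quadratic growth the abstract attributes to the sampling-based method; the cubature rule here uses a fixed 9 evaluations.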
- [471] arXiv:1807.02227 (replaced) [pdf, ps, html, other]
-
Title: Polynomial time algorithm for optimal stopping with fixed accuracy
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF)
The problem of high-dimensional path-dependent optimal stopping (OS) is important to multiple academic communities and applications. Modern OS tasks often have a large number of decision epochs, and complicated non-Markovian dynamics, making them especially challenging. Standard approaches, often relying on ADP, duality, deep learning and other heuristics, have shown strong empirical performance, yet have limited rigorous guarantees (which may scale exponentially in the problem parameters and/or require previous knowledge of basis functions or additional continuity assumptions). Although past work has placed these problems in the framework of computational complexity and polynomial-time approximability, those analyses were limited to simple one-dimensional problems. For long-horizon complex OS problems, is a polynomial time solution even theoretically possible? We prove that given access to an efficient simulator of the underlying information process, and fixed accuracy epsilon, there exists an algorithm that returns an epsilon-optimal solution (both stopping policies and approximate optimal values) with computational complexity scaling polynomially in the time horizon and underlying dimension. Like the first polynomial-time (approximation) algorithms for several other well-studied problems, our theoretical guarantees are polynomial yet impractical. Our approach is based on a novel expansion for the optimal value which may be of independent interest.
- [472] arXiv:2009.11689 (replaced) [pdf, ps, html, other]
-
Title: A characterization of absorbing sets in coalition formation games
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
Given a standard myopic dynamic process among coalition structures, an absorbing set is a minimal collection of such structures that is never left once entered through that process. Absorbing sets are an important solution concept in coalition formation games, but they have drawbacks: they can be large and hard to obtain. In this paper, we characterize an absorbing set in terms of a collection consisting of a small number of sets of coalitions that we refer to as a "reduced form" of a game. We apply our characterization to study convergence to stability in several economic environments.
- [473] arXiv:2106.11884 (replaced) [pdf, ps, html, other]
-
Title: Parallel computation of interval bases for persistence module decomposition
Comments: 49 pages, 10 Algorithm pseudocodes, 8 figures. Changes are all over the sections. The heart of the work, Section 3, is kept essentially the same as in v2. In particular, additional material about comparisons to the Smith Normal Form reduction is added in the new Section 5 and Appendix B. Old Appendices A-B removed
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG)
A persistence module $M$, with coefficients in a field $\mathbb{F}$, is a finite-dimensional linear representation of an equioriented quiver of type $A_n$ or, equivalently, a graded module over the ring of polynomials $\mathbb{F}[x]$. It is well-known that $M$ can be written as the direct sum of indecomposable representations or as the direct sum of cyclic submodules generated by homogeneous elements. An interval basis for $M$ is a set of homogeneous elements of $M$ such that the sum of the cyclic submodules of $M$ generated by them is direct and equal to $M$. We introduce a novel algorithm to compute an interval basis for $M$. Based on a flag of kernels of the structure maps, our algorithm is suitable for parallel or distributed computation and does not rely on a presentation of $M$. This algorithm outperforms the approach via the presentation matrix and Smith Normal Form. We specialize our parallel approach to persistent homology modules, and we close by applying the proposed algorithm to tracking harmonics via Hodge decomposition.
- [474] arXiv:2201.07127 (replaced) [pdf, ps, html, other]
-
Title: Concatenations of Terms of an Arithmetic Progression
Comments: Substantial revision of the previous version. 15 pages, 27 references
Subjects: Combinatorics (math.CO); Symbolic Computation (cs.SC)
Let $\left(u(n)\right)_{n\in\mathbb{N}}$ be an arithmetic progression of natural integers in base $b\in\mathbb{N}\setminus \{0,1\}$. We consider the following sequences: $s(n)=\overline{u(0)u(1)\cdots u(n) }^b$ formed by concatenating the first $n+1$ terms of $\left(u(n)\right)_{n\in\mathbb{N}}$ in base $b$ from the right; $s_r(n) = \overline{u(n)u(n-1)\cdots u(0)}^b$; and $\left(s_*(n)\right)_{n\in\mathbb{N}}$, given by $s_*(0)=u(0)$, $s_*(n)=\overline{s_r(n-1)s(n)}^b, n\geq 1$. We construct explicit formulas for these sequences and use basic concepts of linear difference operators to prove they are not P-recursive (holonomic). We also present an alternative proof that follows directly from their definitions. We implemented $\left(s(n)\right)_{n\in\mathbb{N}}$ and $\left(s_r(n)\right)_{n\in\mathbb{N}}$ in the decimal base when $(u(n))_{n\in\mathbb{N}}=\mathbb{N}\setminus \{0\}$.
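The three sequences defined above can be computed directly from their definitions. The sketch below is not the authors' implementation: it assumes the progression is written as $u(n) = a + nd$ with hypothetical parameters `a` and `d`, and it assumes $a \geq 1$ so that no concatenation starts with a leading zero.

```python
def num_digits(x: int, b: int) -> int:
    """Number of base-b digits of a non-negative integer (0 has one digit)."""
    if x == 0:
        return 1
    count = 0
    while x > 0:
        x //= b
        count += 1
    return count


def concat(x: int, y: int, b: int) -> int:
    """Value of the base-b digit string of x followed by that of y."""
    return x * b ** num_digits(y, b) + y


def u(n: int, a: int, d: int) -> int:
    """n-th term of the arithmetic progression (a and d are assumed names)."""
    return a + n * d


def s(n: int, a: int, d: int, b: int = 10) -> int:
    """Concatenation u(0) u(1) ... u(n) in base b."""
    value = u(0, a, d)
    for k in range(1, n + 1):
        value = concat(value, u(k, a, d), b)
    return value


def s_r(n: int, a: int, d: int, b: int = 10) -> int:
    """Concatenation u(n) u(n-1) ... u(0) in base b."""
    value = u(n, a, d)
    for k in range(n - 1, -1, -1):
        value = concat(value, u(k, a, d), b)
    return value


def s_star(n: int, a: int, d: int, b: int = 10) -> int:
    """s_*(0) = u(0); s_*(n) = s_r(n-1) followed by s(n) for n >= 1.

    Assumes a >= 1, so that s(n) has no leading zero digit.
    """
    if n == 0:
        return u(0, a, d)
    return concat(s_r(n - 1, a, d, b), s(n, a, d, b), b)


if __name__ == "__main__":
    # Decimal case with u(n) = n + 1, i.e. the positive integers.
    print(s(3, 1, 1), s_r(3, 1, 1), s_star(2, 1, 1))
```

With `a = 1`, `d = 1`, and `b = 10` this corresponds to the decimal case over the positive integers mentioned at the end of the abstract.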
- [475] arXiv:2208.05243 (replaced) [pdf, ps, html, other]
-
Title: Combinatorial Persistent Homology Transform
Comments: This is the final version of the paper
Journal-ref: Foundations of Data Science, 2024
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Category Theory (math.CT)
The combinatorial interpretation of the persistence diagram as a Möbius inversion was recently shown to be functorial. We employ this discovery to recast the Persistent Homology Transform of a geometric complex as a representation of a cellulation on $\mathbb{S}^n$ to the category of combinatorial persistence diagrams. Detailed examples are provided. We hope this recasting of the PH transform will allow for the adoption of existing methods from algebraic and topological combinatorics to the study of shapes.
- [476] arXiv:2212.01792 (replaced) [pdf, ps, html, other]
-
Title: Classification by sparse generalized additive models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME)
We consider (nonparametric) sparse (generalized) additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with sparse group Lasso/Slope-type penalties on the coefficients of univariate additive components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness. We show that under a certain sparse group restricted eigenvalue condition it is nearly-minimax (up to log-factors) simultaneously across the entire range of analytic, Sobolev and Besov classes. The performance of the proposed classifier is illustrated on simulated and real-data examples.
- [477] arXiv:2301.09502 (replaced) [pdf, ps, html, other]
-
Title: The Identity Problem in the special affine group of $\mathbb{Z}^2$
Comments: Extended abstract in LICS'23
Subjects: Group Theory (math.GR); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM)
We consider semigroup algorithmic problems in the Special Affine group $\mathsf{SA}(2, \mathbb{Z}) = \mathbb{Z}^2 \rtimes \mathsf{SL}(2, \mathbb{Z})$, which is the group of affine transformations of the lattice $\mathbb{Z}^2$ that preserve orientation. Our paper focuses on two decision problems introduced by Choffrut and Karhumäki (2005): the Identity Problem (does a semigroup contain a neutral element?) and the Group Problem (is a semigroup a group?) for finitely generated sub-semigroups of $\mathsf{SA}(2, \mathbb{Z})$. We show that both problems are decidable and NP-complete. Since $\mathsf{SL}(2, \mathbb{Z}) \leq \mathsf{SA}(2, \mathbb{Z}) \leq \mathsf{SL}(3, \mathbb{Z})$, our result extends that of Bell, Hirvensalo and Potapov (2017) on the NP-completeness of both problems in $\mathsf{SL}(2, \mathbb{Z})$, and contributes a first step towards the open problems in $\mathsf{SL}(3, \mathbb{Z})$.
- [478] arXiv:2302.01947 (replaced) [pdf, ps, html, other]
-
Title: Extracting the gamma-ray source-count distribution below the Fermi-LAT detection limit with deep learning
Comments: 26 pages + Appendix, 28 figures
Journal-ref: JCAP09(2023)029
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); High Energy Astrophysical Phenomena (astro-ph.HE); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
We reconstruct the extra-galactic gamma-ray source-count distribution, or $dN/dS$, of resolved and unresolved sources by adopting machine learning techniques. Specifically, we train a convolutional neural network on synthetic 2-dimensional sky-maps, which are built by varying parameters of underlying source-counts models and incorporate the Fermi-LAT instrumental response functions. The trained neural network is then applied to the Fermi-LAT data, from which we estimate the source count distribution down to flux levels a factor of 50 below the Fermi-LAT threshold. We perform our analysis using 14 years of data collected in the $(1,10)$ GeV energy range. The results we obtain show a source count distribution which, in the resolved regime, is in excellent agreement with the one derived from catalogued sources, and then extends as $dN/dS \sim S^{-2}$ in the unresolved regime, down to fluxes of $5 \cdot 10^{-12}$ cm$^{-2}$ s$^{-1}$. The neural network architecture and the devised methodology have the flexibility to enable future analyses to study the energy dependence of the source-count distribution.
- [479] arXiv:2306.05597 (replaced) [pdf, ps, html, other]
-
Title: On the implementation of zero-determinant strategies in repeated games
Comments: 19 pages
Subjects: Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
Zero-determinant strategies are a class of strategies in repeated games which unilaterally control payoffs. Zero-determinant strategies have attracted much attention in studies of social dilemmas, particularly in the context of the evolution of cooperation. So far, not only have general properties of zero-determinant strategies been investigated, but zero-determinant strategies have also been applied to control in the fields of information and communications technology and to the analysis of imitation. Here, we further deepen our understanding of the general mathematical properties of zero-determinant strategies. We first prove that zero-determinant strategies, if they exist, can be implemented by some one-dimensional transition probability. Next, we prove that, if a two-player game has a non-trivial potential function, a zero-determinant strategy exists in its repeated version. These results help us implement zero-determinant strategies in broader situations.
- [480] arXiv:2306.08321 (replaced) [pdf, ps, html, other]
-
Title: Nonparametric regression using over-parameterized shallow ReLU neural networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
- [481] arXiv:2306.11286 (replaced) [pdf, ps, html, other]
-
Title: Globally Optimal Solutions to a Class of Fractional Optimization Problems Based on Proximal Gradient Algorithm
Comments: 29 pages, 2 figures
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
In this paper, we investigate a category of constrained fractional optimization problems that emerge in various practical applications. The objective function for this category is characterized by the ratio of a numerator and denominator, both being convex, semi-algebraic, Lipschitz continuous, and differentiable with Lipschitz continuous gradients over the constraint sets. The constraint sets associated with these problems are closed, convex, and semi-algebraic. We propose an efficient algorithm that is inspired by the proximal gradient method, and we provide a thorough convergence analysis. Our algorithm offers several benefits compared to existing methods. It requires only a single proximal gradient operation per iteration, thus avoiding the complicated inner-loop concave maximization usually required. Additionally, our method converges to a critical point without the typical need for a nonnegative numerator, and this critical point becomes a globally optimal solution under an appropriate condition. Our approach is adaptable to unbounded constraint sets as well. Therefore, our approach is viable for many more practical models. Numerical experiments show that our method not only reliably reaches ground-truth solutions in some model problems but also outperforms several existing methods in maximizing the Sharpe ratio with real-world financial data.
- [482] arXiv:2307.08929 (replaced) [pdf, ps, html, other]
-
Title: Active learning of effective Hamiltonian for super-large-scale atomic structures
Xingyue Ma, Hongying Chen, Ri He, Zhanbo Yu, Sergei Prokhorenko, Zheng Wen, Zhicheng Zhong, Jorge Iñiguez, L. Bellaiche, Di Wu, Yurong Yang
Comments: 11 pages, 4 figures
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
The first-principles-based effective Hamiltonian scheme provides one of the most accurate modeling techniques for large-scale structures, especially for ferroelectrics. However, the parameterization of the effective Hamiltonian is complicated and can be difficult for some complex systems such as high-entropy perovskites. Here, we propose a general form of effective Hamiltonian and develop an active machine learning approach to parameterize the effective Hamiltonian based on Bayesian linear regression. The parameterization is employed in molecular dynamics simulations with the prediction of energy, forces, stress and their uncertainties at each step, which decides whether first-principles calculations are executed to retrain the parameters. Structures of the BaTiO$_3$, Pb(Zr$_{0.75}$Ti$_{0.25}$)O$_3$ and (Pb,Sr)TiO$_3$ systems are taken as examples to show the accuracy of this approach, as compared with the conventional parametrization method and experiments. This machine learning approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any complex system under consideration with super-large-scale (more than $10^7$ atoms) atomic structures.
- [483] arXiv:2307.11127 (replaced) [pdf, ps, html, other]
-
Title: Asymptotically Unbiased Synthetic Control Methods by Distribution Matching
Comments: This study was presented at the Workshop on Counterfactuals in Minds and Machines at the International Conference on Machine Learning in July 2023 and at the International Conference on Econometrics and Statistics in August 2023
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)
Synthetic Control Methods (SCMs) have become an essential tool for comparative case studies. The fundamental idea of SCMs is to estimate the counterfactual outcomes of a treated unit using a weighted sum of the observed outcomes of untreated units. The accuracy of the synthetic control (SC) is critical for evaluating the treatment effect of a policy intervention; therefore, the estimation of SC weights has been the focus of extensive research. In this study, we first point out that existing SCMs suffer from an endogeneity problem, the correlation between the outcomes of untreated units and the error term of the synthetic control, which yields a bias in the treatment effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the joint density of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching the moments of treated outcomes with the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods: first, our estimator is asymptotically unbiased under the assumption of the mixture model; second, due to the asymptotic unbiasedness, we can reduce the mean squared error in counterfactual predictions; third, our method generates full densities of the treatment effect, not merely expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
- [484] arXiv:2309.07927 (replaced) [pdf, ps, html, other]
-
Title: Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn't readily extend to ASR for children due to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech. They were able to demonstrate some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium and show that this improvement can be generalized to unseen datasets. We also highlight important challenges towards improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.
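For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made here and are not taken from the paper, which may apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because the denominator is the reference length, a hypothesis with many inserted words can yield a WER above 1.0 under this definition.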
- [485] arXiv:2310.02877 (replaced) [pdf, ps, html, other]
-
Title: Stationarity without mean reversion in improper Gaussian processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are preferred in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper we show that it is possible to use improper GP priors with infinite variance to define processes that are stationary but not mean reverting. To this end, we use non-positive kernels that can only be defined in this limit regime. The resulting posterior distributions can be computed analytically and involve a simple correction of the usual formulas. The main contribution of the paper is the introduction of a large family of smooth non-reverting covariance functions that closely resemble the kernels commonly used in the GP literature (e.g. squared exponential and Matérn class). By analyzing both synthetic and real data, we demonstrate that these non-positive kernels solve some known pathologies of mean reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels.
- [486] arXiv:2311.18431 (replaced) [pdf, ps, other]
-
Title: On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes adaPG$^{q,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.
- [487] arXiv:2312.08174 (replaced) [pdf, ps, html, other]
-
Title: Double Machine Learning for Static Panel Models with Fixed Effects
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent advances in causal inference have seen the development of methods which make use of the predictive power of machine learning algorithms. In this paper, we use double machine learning (DML) (Chernozhukov et al., 2018) to approximate high-dimensional and non-linear nuisance functions of the confounders to make inferences about the effects of policy interventions from panel data. We propose new estimators by adapting correlated random effects, within-group and first-difference estimation for linear models to an extension of Robinson (1988)'s partially linear regression model to static panel data models with individual fixed effects and unspecified non-linear confounder effects. Using Monte Carlo simulations, we compare the relative performance of different machine learning algorithms and find that conventional least squares estimators perform well when the data generating process is mildly non-linear and smooth, but there are substantial performance gains with DML in terms of bias reduction when the true effect of the regressors is non-linear and discontinuous. However, inference based on individual learners can lead to badly biased inference. Finally, we provide an illustrative example of DML for observational panel data showing the impact of the introduction of the minimum wage on voting behavior in the UK.
- [488] arXiv:2312.08625 (replaced) [pdf, ps, html, other]
-
Title: Graph Network Surrogate Model for Subsurface Flow Optimization
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)
The optimization of well locations and controls is an important step in the design of subsurface flow operations such as oil production or geological CO2 storage. These optimization problems can be computationally expensive, however, as many potential candidate solutions must be evaluated. In this study, we propose a graph network surrogate model (GNSM) for optimizing well placement and controls. The GNSM transforms the flow model into a computational graph that involves an encoding-processing-decoding architecture. Separate networks are constructed to provide global predictions for the pressure and saturation state variables. Model performance is enhanced through the inclusion of the single-phase steady-state pressure solution as a feature. A multistage multistep strategy is used for training. The trained GNSM is applied to predict flow responses in a 2D unstructured model of a channelized reservoir. Results are presented for a large set of test cases, in which five injection wells and five production wells are placed randomly throughout the model, with a random control variable (bottom-hole pressure) assigned to each well. Median relative error in pressure and saturation for 300 such test cases is 1-2%. The ability of the trained GNSM to provide accurate predictions for a new (geologically similar) permeability realization is demonstrated. Finally, the trained GNSM is used to optimize well locations and controls with a differential evolution algorithm. GNSM-based optimization results are comparable to those from simulation-based optimization, with a runtime speedup of a factor of 36. Much larger speedups are expected if the method is used for robust optimization, in which each candidate solution is evaluated on multiple geological models.
- [489] arXiv:2312.12625 (replaced) [pdf, ps, html, other]
-
Title: Calibrating Wireless Ray Tracing for Digital Twinning using Local Phase Error Estimates
Comments: Revised 10 May 2024: additional FDTD experiments
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Embodying the principle of simulation intelligence, digital twin (DT) systems construct and maintain a high-fidelity virtual model of a physical system. This paper focuses on ray tracing (RT), which is widely seen as an enabling technology for DTs of the radio access network (RAN) segment of next-generation disaggregated wireless systems. RT makes it possible to simulate channel conditions, enabling data augmentation and prediction-based transmission. However, the effectiveness of RT hinges on the adaptation of the electromagnetic properties assumed by the RT to actual channel conditions, a process known as calibration. The main challenge of RT calibration is the fact that small discrepancies in the geometric model fed to the RT software hinder the accuracy of the predicted phases of the simulated propagation paths. Existing solutions to this problem either rely on the channel power profile, hence disregarding phase information, or they operate on the channel responses by assuming the simulated phases to be sufficiently accurate for calibration. This paper proposes a novel channel response-based scheme that, unlike the state of the art, estimates and compensates for the phase errors in the RT-generated channel responses. The proposed approach builds on the variational expectation maximization algorithm with a flexible choice of the prior phase-error distribution that bridges between a deterministic model with no phase errors and a stochastic model with uniform phase errors. The algorithm is computationally efficient, and is demonstrated, by leveraging the open-source differentiable RT software available within the Sionna library, to outperform existing methods in terms of the accuracy of RT predictions.
- [490] arXiv:2402.01493 (replaced) [pdf, ps, html, other]
-
Title: Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates
Comments: Accepted to ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are polynomials on the sphere that form an orthonormal basis of the set of square-integrable functions on the sphere. Putting these two facts together, a new Monte Carlo method, hereby referred to as Spherical Harmonics Control Variates (SHCV), is proposed for approximating the SW distance using spherical harmonics as control variates. The resulting approach is shown to have good theoretical properties, e.g., a no-error property for Gaussian measures under a certain form of linear dependency between the variables. Moreover, an improved rate of convergence, compared to Monte Carlo, is established for general measures. The convergence analysis relies on the Lipschitz property associated to the SW integrand. Several numerical experiments demonstrate the superior performance of SHCV against state-of-the-art methods for SW distance computation.
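The plain Monte Carlo baseline mentioned in the abstract can be sketched as follows: draw directions uniformly on the sphere, project both point clouds onto each direction, and average the one-dimensional Wasserstein distances. This is only an illustration of the baseline estimator, not of the SHCV method itself; the function name `sliced_wasserstein_sq`, the restriction to equally sized empirical measures, and the choice of the squared order-2 distance are assumptions made here for brevity.

```python
import math
import random


def sliced_wasserstein_sq(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared SW_2 distance between two
    equally sized point clouds, each given as a list of d-dimensional tuples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes clouds of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # A normalized standard Gaussian vector is uniform on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        theta = [v / norm for v in g]
        # One-dimensional projections of both clouds, sorted.
        px = sorted(sum(t * c for t, c in zip(theta, p)) for p in xs)
        py = sorted(sum(t * c for t, c in zip(theta, p)) for p in ys)
        # For equally weighted 1-D samples, squared W_2 pairs sorted points.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

The variance of this estimator comes entirely from the random directions, which is the quantity a control-variate scheme such as the one proposed in the paper aims to reduce.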
- [491] arXiv:2402.09623 (replaced) [pdf, ps, html, other]
-
Title: Conformalized Adaptive Forecasting of Heterogeneous Trajectories
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This paper presents a new conformal method for generating simultaneous forecasting bands guaranteed to cover the entire path of a new random trajectory with sufficiently high probability. Prompted by the need for dependable uncertainty estimates in motion planning applications where the behavior of diverse objects may be more or less unpredictable, we blend different techniques from online conformal prediction of single and multiple time series, as well as ideas for addressing heteroscedasticity in regression. This solution is both principled, providing precise finite-sample guarantees, and effective, often leading to more informative predictions than prior methods.
- [492] arXiv:2404.00252 (replaced) [pdf, ps, html, other]
-
Title: Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Comments: Accepted to CVPR 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to various and uncertain user viewing behaviors, which poses significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.
- [493] arXiv:2404.06394 (replaced) [pdf, ps, html, other]
-
Title: On the minimal memory set of cellular automata
Comments: 10 pages
Subjects: Cellular Automata and Lattice Gases (nlin.CG); Formal Languages and Automata Theory (cs.FL); Dynamical Systems (math.DS)
For a group $G$ and a finite set $A$, a cellular automaton (CA) is a transformation $\tau : A^G \to A^G$ defined via a finite memory set $S \subseteq G$ and a local map $\mu : A^S \to A$. Although memory sets are not unique, every CA admits a unique minimal memory set, which consists of all the essential elements of $S$ that affect the behavior of the local map. In this paper, we study the links between the minimal memory set and the generating patterns $\mathcal{P}$ of $\mu$; these are the patterns in $A^S$ that are not fixed when the cellular automaton is applied. In particular, we show that when $\vert S \vert \geq 2$ and $\vert \mathcal{P} \vert$ is not a multiple of $\vert A \vert$, then the minimal memory set must be $S$ itself. Moreover, when $\vert \mathcal{P} \vert = \vert A \vert$, $\vert S \vert \geq 3$, and the restriction of $\mu$ to these patterns is well-behaved, then the minimal memory set must be $S$ or $S \setminus \{s\}$, for some $s \in S \setminus \{e\}$. These are some of the first general theoretical results on the minimal memory set of a cellular automaton.
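As a rough illustration of what "essential elements" means here, the brute-force sketch below treats an element $s \in S$ as essential when changing only the symbol at $s$ can change the output of $\mu$. The helper names `is_essential` and `minimal_memory_set`, the representation of a pattern as a dictionary, and the example over $G = \mathbb{Z}$ are all assumptions made for this sketch; the paper's formal definitions may differ in detail.

```python
from itertools import product


def is_essential(s, memory_set, alphabet, local_map):
    """True if changing only the symbol at s can change local_map's output."""
    others = [t for t in memory_set if t != s]
    for values in product(alphabet, repeat=len(others)):
        pattern = dict(zip(others, values))
        # Outputs obtained by varying the symbol at s with the rest fixed.
        outputs = {local_map({**pattern, s: a}) for a in alphabet}
        if len(outputs) > 1:
            return True
    return False


def minimal_memory_set(memory_set, alphabet, local_map):
    """Collect the elements of memory_set on which local_map really depends."""
    return {s for s in memory_set
            if is_essential(s, memory_set, alphabet, local_map)}


# Hypothetical example over G = Z with A = {0, 1} and S = {-1, 0, 1}:
# the local map XORs the two neighbours and ignores the centre cell.
S = (-1, 0, 1)
A = (0, 1)
mu = lambda p: p[-1] ^ p[1]
print(minimal_memory_set(S, A, mu))
```

In this example the returned set contains -1 and 1 but not 0, since the centre cell never influences the output. The search enumerates all of $A^S$, so it is only practical for small memory sets.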
- [494] arXiv:2404.12141 (replaced) [pdf, ps, html, other]
-
Title: MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
Comments: 20 pages, 11 figures
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise-reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structures, demonstrating our ability to accurately model interatomic interactions. To the best of our knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at this https URL.
- [495] arXiv:2404.15419 (replaced) [pdf, ps, other]
-
Title: Using Deep Learning to Identify Initial Error Sensitivity for Interpretable ENSO Forecasts
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
We introduce an interpretable-by-design method, optimized model-analog, that integrates deep learning with model-analog forecasting, a straightforward yet effective approach that generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify initial analog states that lead to shadowing target trajectories. The advantage of our method lies in its inherent interpretability, offering insights into initial-error-sensitive regions through estimated weights and the ability to trace the physically-based evolution of the system through analog forecasting. We evaluate our approach using the Community Earth System Model Version 2 Large Ensemble to forecast the El Niño-Southern Oscillation (ENSO) on a seasonal-to-annual time scale. Results show a 10% improvement in forecasting equatorial Pacific sea surface temperature anomalies at leads of 9-12 months compared to the original (unweighted) model-analog technique. Furthermore, our model demonstrates improvements in boreal winter and spring initialization when evaluated against a reanalysis dataset. Our approach reveals state-dependent regional sensitivity linked to various seasonally varying physical processes, including the Pacific Meridional Modes, equatorial recharge oscillator, and stochastic wind forcing. Additionally, disparities emerge in the sensitivity associated with El Niño versus La Niña events. El Niño forecasts are more sensitive to initial uncertainty in tropical Pacific sea surface temperatures, while La Niña forecasts are more sensitive to initial uncertainty in tropical Pacific zonal wind stress. This approach has broad implications for forecasting diverse climate phenomena, including regional temperature and precipitation, which are challenging for the original model-analog approach.
- [496] arXiv:2405.00725 (replaced) [pdf, ps, html, other]
-
Title: Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data
Vikhyat Agrawal, Sunil Vasu Kalmady, Venkataseetharam Manoj Malipeddi, Manisimha Varma Manthena, Weijie Sun, Saiful Islam, Abram Hindle, Padma Kaul, Russell Greiner
Comments: Accepted for ICMHI 2024
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate diagnosis while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained over the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at this https URL.
- [497] arXiv:2405.01726 (replaced) [pdf, ps, html, other]
-
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more underlying long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSIs, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms compared methods. The source code is available at this https URL.
- [498] arXiv:2405.02131 (replaced) [pdf, ps, html, other]
-
Title: Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have been recently proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and to incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
- [499] arXiv:2405.06725 (replaced) [pdf, ps, html, other]
-
Title: On the Shape of Brainscores for Large Language Models (LLMs)
Comments: Published as a workshop paper at ICLR AGI Workshop 2024
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. Our efforts were dedicated to mining the meaning of the novel score by constructing topological features derived from both human fMRI data involving 190 subjects, and 39 LLMs plus their untrained counterparts. Subsequently, we trained 36 Linear Regression Models and conducted thorough statistical analyses to discern reliable and valid features from our constructed ones. Our findings reveal distinctive feature combinations conducive to interpreting existing brainscores across various brain regions of interest (ROIs) and hemispheres, thereby significantly contributing to advancing interpretable machine learning (iML) studies. The study is enriched by our further discussions and analyses concerning existing brainscores. To our knowledge, this study represents the first attempt to comprehend the novel metric brainscore within this interdisciplinary domain.
- [500] arXiv:2405.08431 (replaced) [pdf, ps, html, other]
-
Title: Similarity Metrics for MR Image-To-Image Translation
Comments: 29 pages, 6 figures, appendix with 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Image-to-image translation can have a large impact in medical imaging, e.g., if images of a patient can be translated to another modality, type or sequence for better diagnosis. However, these methods must be validated by human reader studies, which are costly and restricted to small samples. Automatic evaluation of large samples to pre-evaluate and continuously improve methods before human validation is needed. In this study, we give an overview of reference and non-reference metrics for image synthesis assessment and investigate the ability of nine reference metrics (SSIM, MS-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, NMI and PCC) and three non-reference metrics (BLUR, MSN, MNG) to detect 11 kinds of distortions in MR images from the BraSyn dataset. In addition, we test a downstream segmentation metric and the effect of three normalization methods (Minmax, cMinMax and Zscore). Although PSNR and SSIM are frequently used to evaluate generative models for image-to-image translation tasks in the medical domain, they show very specific shortcomings. SSIM ignores blurring but is very sensitive to intensity shifts in unnormalized MR images. PSNR is even more sensitive to different normalization methods and hardly measures the degree of distortions. Further metrics, such as LPIPS, NMI and DICE, can be very useful to evaluate other similarity aspects. If the images to be compared are misaligned, most metrics are flawed. By carefully selecting and reasonably combining image similarity metrics, the training and selection of generative models for MR image synthesis can be improved. Many aspects of their output can be validated before final and costly evaluation by trained radiologists is conducted.
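To make the remark about normalization concrete, the sketch below implements the textbook definitions of MSE and PSNR on flat lists of intensities. It is only an illustration of the standard formulas, not the evaluation code of the study; the function names `mse` and `psnr` and the explicit `data_range` argument are assumptions chosen here. The `data_range` term is one place where the intensity scaling of the images enters the score.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally long intensity sequences."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    err = mse(reference, test)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / err)


# Hypothetical two-pixel "images": the same pair on a [0, 1] and a [0, 255] scale.
ref_unit, test_unit = [0.0, 1.0], [0.0, 0.0]
ref_byte, test_byte = [0.0, 255.0], [0.0, 0.0]

consistent = psnr(ref_byte, test_byte, data_range=255.0)
mismatched = psnr(ref_byte, test_byte, data_range=1.0)
```

With a `data_range` that matches the intensity scale, the rescaled pair scores the same as the original pair, whereas a mismatched `data_range` shifts the value substantially even though the image content is unchanged.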
- [501] arXiv:2405.08608 (replaced) [pdf, ps, html, other]
-
Title: On the Paley RIP and Paley graph extractor
Comments: 10 pages, references are updated, comments are welcome
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Number Theory (math.NT)
Constructing explicit RIP matrices is an open problem in compressed sensing theory. In particular, it is quite challenging to construct explicit RIP matrices that break the square-root bottleneck. On the other hand, providing explicit $2$-source extractors is a fundamental problem in theoretical computer science, cryptography and combinatorics. Nowadays, there are only a few known constructions for explicit $2$-source extractors (with negligible errors) that break the half barrier for min-entropy.
In this paper, we establish a new connection between RIP matrices breaking the square-root bottleneck and $2$-source extractors breaking the half barrier for min-entropy. Here we focus on an RIP matrix (called the Paley ETF) and a $2$-source extractor (called the Paley graph extractor), where both are defined from quadratic residues over the finite field of odd prime order $p\equiv 1 \pmod{4}$. As a main result, we prove that if the Paley ETF breaks the square-root bottleneck, then the Paley graph extractor breaks the half barrier for min-entropy as well. Since it is widely believed that the Paley ETF breaks the square-root bottleneck, our result accordingly provides a new affirmative intuition on the conjecture for the Paley graph extractor by Benny Chor and Oded Goldreich.
- [502] arXiv:2405.08621 (replaced) [pdf, ps, html, other]
-
Title: RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content
Comments: 8 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content; the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through five-fold cross-validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.
- [503] arXiv:2405.08801 (replaced) [pdf, ps, html, other]
-
Title: Prospects of Privacy Advantage in Quantum Machine Learning
Authors: Jamie Heredge, Niraj Kumar, Dylan Herman, Shouvanik Chakrabarti, Romina Yalovetzky, Shree Hari Sureshbabu, Changhao Li, Marco Pistoia
Comments: 28 pages, 8 figures, 1 table
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Ensuring data privacy in machine learning models is critical, particularly in distributed settings where model gradients are typically shared among multiple parties to allow collaborative learning. Motivated by the increasing success of recovering input data from the gradients of classical models, this study addresses a central question: How hard is it to recover the input data from the gradients of quantum machine learning models? Focusing on variational quantum circuits (VQC) as learning models, we uncover the crucial role played by the dynamical Lie algebra (DLA) of the VQC ansatz in determining privacy vulnerabilities. While the DLA has previously been linked to the classical simulatability and trainability of VQC models, this work, for the first time, establishes its connection to the privacy of VQC models. In particular, we show that properties conducive to the trainability of VQCs, such as a polynomial-sized DLA, also facilitate the extraction of detailed snapshots of the input. We term this a weak privacy breach, as the snapshots enable training VQC models for distinct learning tasks without direct access to the original input. Further, we investigate the conditions for a strong privacy breach where the original input data can be recovered from these snapshots by classical or quantum-assisted polynomial time methods. We establish conditions on the encoding map such as classical simulatability, overlap with DLA basis, and its Fourier frequency characteristics that enable such a privacy breach of VQC models. Our findings thus play a crucial role in detailing the prospects of quantum privacy advantage by guiding the requirements for designing quantum machine learning models that balance trainability with robust privacy protection.
- [504] arXiv:2405.08810 (replaced) [pdf, ps, other]
-
Title: Quantum computing with Qiskit
Authors: Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, Jay M. Gambetta
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
We describe Qiskit, a software development kit for quantum information science. We discuss the key design decisions that have shaped its development, and examine the software architecture and its core components. We demonstrate an end-to-end workflow for solving a problem in condensed matter physics on a quantum computer that serves to highlight some of Qiskit's capabilities, for example the representation and optimization of circuits at various abstraction levels, its scalability and retargetability to new gates, and the use of quantum-classical computations via dynamic circuits. Lastly, we discuss some of the ecosystem of tools and plugins that extend Qiskit for various tasks, and the future ahead.