Machine Learning

New submissions

New submissions for Fri, 3 May 24

[1]  arXiv:2405.00739 [pdf, other]
Title: Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Does Knowledge Distillation (KD) really work? Conventional wisdom views it as a knowledge transfer procedure where a perfect mimicry of the teacher by the student is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, raising questions about the possible causes. Confronted with this gap, we hypothesize that diverse attentions in teachers contribute to better student generalization at the expense of reduced fidelity in ensemble KD setups. By increasing data augmentation strengths, our key findings reveal a decrease in the Intersection over Union (IoU) of attentions between teacher models, leading to reduced student overfitting and decreased fidelity. We propose this low-fidelity phenomenon as an underlying characteristic rather than a pathology when training KD. This suggests that stronger data augmentation fosters a broader perspective provided by the divergent teacher ensemble and lower student-teacher mutual information, benefiting generalization performance. These insights clarify the mechanism behind the low-fidelity phenomenon in KD. Thus, we offer new perspectives on optimizing student model performance, by emphasizing increased diversity in teacher attentions and reduced mimicry behavior between teachers and student.
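
The IoU of attentions mentioned above can be illustrated with a minimal sketch. It assumes each teacher's attention is a 2-D map of scores that is binarized at a fixed threshold before comparison; the function names, the threshold, and the toy maps are hypothetical and are not taken from the paper.

```python
from typing import List

Mask = List[List[bool]]


def binarize(attention: List[List[float]], threshold: float) -> Mask:
    """Mark positions whose attention score is at or above the threshold."""
    return [[score >= threshold for score in row] for row in attention]


def attention_iou(mask_a: Mask, mask_b: Mask) -> float:
    """Intersection over Union of two equally shaped binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(a and b)
            union += int(a or b)
    # Two empty masks are treated as identical (an arbitrary convention).
    return intersection / union if union else 1.0


# Toy attention maps for two hypothetical teachers.
teacher_1 = [[0.9, 0.1], [0.8, 0.2]]
teacher_2 = [[0.7, 0.6], [0.1, 0.3]]
iou = attention_iou(binarize(teacher_1, 0.5), binarize(teacher_2, 0.5))
```

A lower value means the two teachers attend to more different regions, which is the kind of diversity the abstract associates with stronger augmentation.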

[2]  arXiv:2405.00743 [pdf, other]
Title: On the weight dynamics of learning networks
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)

Neural networks have become a widely adopted tool for tackling a variety of problems in machine learning and artificial intelligence. In this contribution we use the mathematical framework of local stability analysis to gain a deeper understanding of the learning dynamics of feed forward neural networks. To this end, we derive equations for the tangent operator of the learning dynamics of three-layer networks learning regression tasks. The results are valid for an arbitrary number of nodes and arbitrary choices of activation functions. Applying the results to a network learning a regression task, we investigate numerically how stability indicators relate to the final training loss. Although the specific results vary with different choices of initial conditions and activation functions, we demonstrate that it is possible to predict the final training loss by monitoring finite-time Lyapunov exponents or covariant Lyapunov vectors during the training process.

[3]  arXiv:2405.00746 [pdf, other]
Title: Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

To create useful reinforcement learning (RL) agents, step zero is to design a suitable reward function that captures the nuances of the task. However, reward engineering can be a difficult and time-consuming process. Instead, human-in-the-loop (HitL) RL allows agents to learn reward functions from human feedback. Despite recent successes, many of the HitL RL methods still require numerous human interactions to learn successful reward functions. To improve the feedback efficiency of HitL RL methods (i.e., require less feedback), this paper introduces Sub-optimal Data Pre-training, SDP, an approach that leverages reward-free, sub-optimal data to improve scalar- and preference-based HitL RL algorithms. In SDP, we start by pseudo-labeling all low-quality data with rewards of zero. Through this process, we obtain free reward labels to pre-train our reward model. This pre-training phase provides the reward model a head start in learning, whereby it can identify that low-quality transitions should have a low reward, all without any actual feedback. Through extensive experiments with a simulated teacher, we demonstrate that SDP can significantly improve or achieve competitive performance with state-of-the-art (SOTA) HitL RL algorithms across nine robotic manipulation and locomotion tasks.
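
The pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, assuming each transition is already a flat feature vector and that the reward model is linear and trained with plain squared-error gradient steps; the function names, learning rate, and toy data are hypothetical, and the paper's actual reward model is not specified in the abstract.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def pseudo_label_zero(transitions: List[Features]) -> List[Tuple[Features, float]]:
    """Attach a reward label of 0.0 to every sub-optimal transition."""
    return [(features, 0.0) for features in transitions]


def predict_reward(weights: Sequence[float], features: Features) -> float:
    """Linear reward model (an assumption made for this sketch)."""
    return sum(w * x for w, x in zip(weights, features))


def pretrain(weights: Sequence[float],
             labeled: List[Tuple[Features, float]],
             lr: float = 0.1,
             epochs: int = 200) -> List[float]:
    """Fit the reward model to the pseudo-labels with per-sample gradient steps."""
    current = list(weights)
    for _ in range(epochs):
        for features, reward in labeled:
            error = predict_reward(current, features) - reward
            # Gradient of (prediction - reward)^2 with respect to the weights.
            current = [w - lr * 2.0 * error * x for w, x in zip(current, features)]
    return current


# Toy sub-optimal transitions: (state_0, state_1, action) flattened.
low_quality = [(1.0, 0.0, 0.5), (0.0, 1.0, -0.5)]
labeled = pseudo_label_zero(low_quality)
pretrained = pretrain([1.0, 1.0, 1.0], labeled)
```

After this phase the model already assigns near-zero reward to the low-quality transitions, before any human feedback is collected.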

[4]  arXiv:2405.00747 [pdf, other]
Title: Soft Preference Optimization: Aligning Language Models to Expert Distributions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than limiting it to the preference dataset. Although SPO does not require the assumption of an existing underlying reward model, we demonstrate that, under the Bradley-Terry (BT) model assumption, it converges to a softmax of scaled rewards, with the distribution's "softness" adjustable via the softmax exponent, an algorithm parameter. We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.
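
To make the "softmax of scaled rewards" concrete, here is a small sketch. It only illustrates how a scaling exponent controls the softness of a softmax over a handful of candidate outputs; the function name, the parameter name `scale`, and the toy rewards are assumptions and do not reproduce the SPO loss itself.

```python
import math
from typing import List


def softmax_of_scaled_rewards(rewards: List[float], scale: float) -> List[float]:
    """Return probabilities proportional to exp(scale * reward)."""
    scaled = [scale * r for r in rewards]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


rewards = [1.0, 2.0, 3.0]
soft = softmax_of_scaled_rewards(rewards, scale=0.5)
sharp = softmax_of_scaled_rewards(rewards, scale=5.0)
uniform = softmax_of_scaled_rewards(rewards, scale=0.0)
```

A scale of zero yields the uniform distribution, and larger values concentrate probability on the highest-reward output.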

[5]  arXiv:2405.00792 [pdf, other]
Title: Error Exponent in Agnostic PAC Learning
Comments: paper with appendix to accepted ISIT2024 paper with the same name
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the common approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and has been studied thoroughly. Uniform worst case bounds on the convergence rate have been well established using, e.g., VC theory or Rademacher complexity. However, in a typical scenario the performance could be much better. In this paper, we consider PAC learning using a somewhat different tradeoff, the error exponent - a well established analysis method in Information Theory - which describes the exponential behavior of the probability that the risk will exceed a certain threshold as a function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis.
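
For orientation, the error exponent described above is usually written along the following lines. The notation is an assumption for illustration ($\hat{h}_n$ for the hypothesis learned from $n$ samples, $R$ for the risk, $\tau$ for the threshold) and may differ from the paper's own definitions.

```latex
\[
  E(\tau) \;=\; \lim_{n \to \infty} \, -\frac{1}{n} \,
  \log \Pr\!\left[ R(\hat{h}_n) > \tau \right],
  \qquad\text{so that}\qquad
  \Pr\!\left[ R(\hat{h}_n) > \tau \right] \approx e^{-n E(\tau)} .
\]
```

A larger $E(\tau)$ means the probability of exceeding the threshold decays faster as the sample size grows.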

[6]  arXiv:2405.00819 [pdf, other]
Title: ICU Bloodstream Infection Prediction: A Transformer-Based Approach for EHR Analysis
Subjects: Machine Learning (cs.LG)

We introduce RatchetEHR, a novel transformer-based framework designed for the predictive analysis of electronic health records (EHR) data in intensive care unit (ICU) settings, with a specific focus on bloodstream infection (BSI) prediction. Leveraging the MIMIC-IV dataset, RatchetEHR demonstrates superior predictive performance compared to other methods, including RNN, LSTM, and XGBoost, particularly due to its advanced handling of sequential and temporal EHR data. A key innovation in RatchetEHR is the integration of the Graph Convolutional Transformer (GCT) component, which significantly enhances the ability to identify hidden structural relationships within EHR data, resulting in more accurate clinical predictions. Through SHAP value analysis, we provide insights into influential features for BSI prediction. RatchetEHR integrates multiple advancements in deep learning which together provide accurate predictions even with a relatively small sample size and highly imbalanced dataset. This study contributes to medical informatics by showcasing the application of advanced AI techniques in healthcare and sets a foundation for further research to optimize these capabilities in EHR data analysis.

[7]  arXiv:2405.00837 [pdf, other]
Title: Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations
Comments: 26 pages, 8 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)

Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$ when used as a regularization term. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of regularization parameter such that the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in comparable time to other methods that identify the containing Delaunay simplex.
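
One plausible form of such a locality-regularized least squares problem is sketched below. The specific penalty (squared distances from $\mathbf{y}$ to the columns $\mathbf{x}_i$, weighted by $w_i$), the simplex constraint, and the symbol $\rho$ for the regularization parameter are assumptions for illustration; the abstract does not state the exact objective.

```latex
\[
  \min_{\mathbf{w} \in \mathbb{R}^n} \;
  \frac{1}{2} \left\| \mathbf{X}\mathbf{w} - \mathbf{y} \right\|_2^2
  \;+\; \rho \sum_{i=1}^{n} w_i \left\| \mathbf{x}_i - \mathbf{y} \right\|_2^2
  \quad \text{subject to} \quad
  w_i \ge 0, \;\; \sum_{i=1}^{n} w_i = 1 .
\]
```

Under this reading, the first term measures reconstruction error and the second makes it costly to place weight on columns far from $\mathbf{y}$, which is what drives the solution toward nearby points.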

[8]  arXiv:2405.00839 [pdf, other]
Title: Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
Comments: This paper has been accepted for presentation at ICDCS (44th IEEE International Conference on Distributed Computing Systems). Keywords: decentralized multi-agent learning, federated learning, edge computing, heterogeneous agents, workload balancing, and communication-efficient training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Performance (cs.PF)

Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottleneck, lengthening the overall training time due to straggler effects and potentially wasting spare resources of faster agents. To minimize training time in heterogeneous environments, we present a Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning (ComDML), which balances the workload among agents through a decentralized approach. Leveraging local-loss split training, ComDML enables parallel updates, where slower agents offload part of their workload to faster agents. To minimize the overall training time, ComDML optimizes the workload balancing by jointly considering the communication and computation capacities of agents, which hinges upon integer programming. A dynamic decentralized pairing scheduler is developed to efficiently pair agents and determine optimal offloading amounts. We prove that in ComDML, both slower and faster agents' models converge, for convex and non-convex functions. Furthermore, extensive experimental results on popular datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants, with large models such as ResNet-56 and ResNet-110, demonstrate that ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods. ComDML demonstrates robustness in heterogeneous environments, and privacy measures can be seamlessly integrated for enhanced data protection.

[9]  arXiv:2405.00853 [pdf, ps, other]
Title: Efficient Algorithms for Learning Monophonic Halfspaces in Graphs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces, have recently attracted interest, and several connections have been drawn between their properties (e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to $2$-satisfiability. We prove similar results for the online and active settings. We also show that the concept class can be enumerated with delay $\operatorname{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)}\operatorname{poly}(n)$ where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (Gonz\'alez et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).

[10]  arXiv:2405.00877 [pdf, other]
Title: Markov flow policy -- deep MC
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impede their efficacy in addressing simple, short-term tasks and impose undesired temporal discounts (\(\gamma\)). Interestingly, these algorithms are often tested without applying a discount, a phenomenon we refer to as the \textit{train-test bias}. In response to these challenges, we propose the Markov Flow Policy (MFP), which utilizes a non-negative neural network flow to enable comprehensive forward-view predictions. Through integration into the TD7 codebase and evaluation using the MuJoCo benchmark, we observe significant performance improvements, positioning MFP as a straightforward, practical, and easily implementable solution within the domain of average rewards algorithms.

[11]  arXiv:2405.00879 [pdf, other]
Title: Machine Learning Techniques for Data Reduction of Climate Applications
Comments: 7 pages. arXiv admin note: text overlap with arXiv:2404.18063
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)

Scientists conduct large-scale simulations to compute derived quantities-of-interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving downstream goals of simulation or data collections. Experimental results are presented for climate data generated from the E3SM Simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.

[12]  arXiv:2405.00885 [pdf, other]
Title: WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)

As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training based on its full computing and communications capacity. Although such fixed size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements of local training contributions, both of which may cause very long FL training delay. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to the fixed size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.

[13]  arXiv:2405.00902 [pdf, ps, other]
Title: MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure
Comments: Accepted to AAMAS 2024. 15 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.

[14]  arXiv:2405.00909 [pdf, other]
Title: Quantum Federated Learning Experiments in the Cloud with Data Encoding
Comments: SIGCOMM 2024, Quantum Computing, Federated Learning, Qiskit
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)

Quantum Federated Learning (QFL) is an emerging concept that aims to unfold federated learning (FL) over quantum networks, enabling collaborative quantum model training along with local data privacy. We explore the challenges of deploying QFL on cloud platforms, emphasizing quantum intricacies and platform limitations. The proposed data-encoding-driven QFL, with a proof of concept (GitHub Open Source) using genomic data sets on quantum simulators, shows promising results.

[15]  arXiv:2405.00910 [pdf, other]
Title: De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data
Authors: Nicholas Tenev
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Econometrics (econ.EM)

Prediction models can improve efficiency by automating decisions such as the approval of loan applications. However, they may inherit bias against protected groups from the data they are trained on. This paper adds counterfactual (simulated) ethnic bias to real data on mortgage application decisions, and shows that this bias is replicated by a machine learning model (XGBoost) even when ethnicity is not used as a predictive variable. Next, several other de-biasing methods are compared: averaging over prohibited variables, taking the most favorable prediction over prohibited variables (a novel method), and jointly minimizing errors as well as the association between predictions and prohibited variables. De-biasing can recover some of the original decisions, but the results are sensitive to whether the bias is effected through a proxy.
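
Two of the de-biasing methods named above, averaging over prohibited variables and taking the most favorable prediction over them, can be sketched as follows. The sketch assumes a model that takes the prohibited variable as an explicit input; `toy_model`, the group labels, and the numbers are hypothetical and only stand in for a trained predictor such as XGBoost.

```python
from typing import Callable, Dict, Sequence

# A predictor maps (features, value of the prohibited variable) to an approval score.
Predictor = Callable[[Dict[str, float], str], float]


def average_over_groups(predict: Predictor,
                        features: Dict[str, float],
                        groups: Sequence[str]) -> float:
    """Average the prediction over every value of the prohibited variable."""
    return sum(predict(features, g) for g in groups) / len(groups)


def most_favorable_over_groups(predict: Predictor,
                               features: Dict[str, float],
                               groups: Sequence[str]) -> float:
    """Use the highest prediction across values of the prohibited variable."""
    return max(predict(features, g) for g in groups)


def toy_model(features: Dict[str, float], group: str) -> float:
    """Hypothetical biased model: group 'b' is penalized by 0.2."""
    score = 0.5 + 0.4 * features["income"]
    return score - 0.2 if group == "b" else score


applicant = {"income": 0.5}
averaged = average_over_groups(toy_model, applicant, ["a", "b"])
favorable = most_favorable_over_groups(toy_model, applicant, ["a", "b"])
```

Both functions return the same score for an applicant regardless of the applicant's actual group, since the group is never read from the applicant's record.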

[16]  arXiv:2405.00922 [pdf, other]
Title: MTDT: A Multi-Task Deep Learning Digital Twin
Comments: 8 pages, 2 figures, 4 tables
Subjects: Machine Learning (cs.LG)

Traffic congestion has significant impacts on both the economy and the environment. Measures of Effectiveness (MOEs) have long been the standard for evaluating the level of service and operational efficiency of traffic intersections. However, the scarcity of traditional high-resolution loop detector data (ATSPM) presents challenges in accurately measuring MOEs or capturing the intricate temporospatial characteristics inherent in urban intersection traffic. In response to this challenge, we have introduced the Multi-Task Deep Learning Digital Twin (MTDT) as a solution for multifaceted and precise intersection traffic flow simulation. MTDT enables accurate, fine-grained estimation of loop detector waveform time series for each lane of movement, alongside successful estimation of several MOEs for each lane group associated with a traffic phase concurrently and for all approaches of an arbitrary urban intersection. Unlike existing deep learning methodologies, MTDT distinguishes itself through its adaptability to local temporal and spatial features, such as signal timing plans, intersection topology, driving behaviors, and turning movement counts. While maintaining a straightforward design, our model emphasizes the advantages of multi-task learning in traffic modeling. By consolidating the learning process across multiple tasks, MTDT demonstrates reduced overfitting, increased efficiency, and enhanced effectiveness by sharing representations learned by different tasks. Furthermore, our approach facilitates sequential computation and lends itself to complete parallelization through GPU implementation. This not only streamlines the computational process but also enhances scalability and performance.

[17]  arXiv:2405.00937 [pdf, ps, other]
Title: New bounds on the cohesion of complete-link and other linkage methods for agglomeration clustering
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces.
One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters.
We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.
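
The contrast between complete-link and single-link in terms of maximum diameter can be seen on a small example. The following is a naive sketch of agglomerative clustering with a pluggable linkage function; the toy points are chosen for illustration, and nothing here reproduces the paper's bounds.

```python
from itertools import combinations
from math import dist
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]
Linkage = Callable[[Cluster, Cluster], float]


def complete_link(a: Cluster, b: Cluster) -> float:
    """Distance between the farthest pair of points across two clusters."""
    return max(dist(p, q) for p in a for q in b)


def single_link(a: Cluster, b: Cluster) -> float:
    """Distance between the closest pair of points across two clusters."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points: List[Point], k: int, linkage: Linkage) -> List[Cluster]:
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def max_diameter(clusters: List[Cluster]) -> float:
    """Largest pairwise distance inside any single cluster."""
    return max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )


# Points on a line with slowly growing gaps (10, 11, 12, 13, 14).
points = [(0.0,), (10.0,), (21.0,), (33.0,), (46.0,), (60.0,)]
complete_diameter = max_diameter(agglomerate(points, 2, complete_link))
single_diameter = max_diameter(agglomerate(points, 2, single_link))
```

On these points single-link chains five points into one long cluster, while complete-link stops at a more compact split, so `complete_diameter` is the smaller of the two.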

[18]  arXiv:2405.00946 [pdf, other]
Title: SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters
Subjects: Machine Learning (cs.LG)

This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than 1k parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is available at: https://github.com/lss-1138/SparseTSF.
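
One plausible reading of the downsampling idea above is sketched here: split the history into one subsequence per phase of the period, apply a single shared linear map to each subsequence, and interleave the results. The function names, the pure-Python weight layout, and the example sizes (input length 720, horizon 720, period 24) are assumptions for illustration; the released code at the URL above is the authoritative implementation.

```python
from typing import List


def cross_period_forecast(history: List[float],
                          period: int,
                          weights: List[List[float]]) -> List[float]:
    """Forecast by applying one shared linear map to each phase of the period.

    weights has (horizon // period) rows and (len(history) // period) columns.
    """
    if len(history) % period != 0:
        raise ValueError("history length must be a multiple of the period")
    n_in = len(history) // period
    if any(len(row) != n_in for row in weights):
        raise ValueError("each weight row must have len(history) // period entries")
    forecast = [0.0] * (len(weights) * period)
    for phase in range(period):
        subsequence = history[phase::period]  # same phase in every past period
        for step, row in enumerate(weights):
            forecast[step * period + phase] = sum(
                w * x for w, x in zip(row, subsequence)
            )
    return forecast


def parameter_count(input_len: int, horizon: int, period: int) -> int:
    """Number of weights in the shared linear map."""
    return (input_len // period) * (horizon // period)


# A perfectly periodic toy series with period 3; averaging weights repeat the pattern.
prediction = cross_period_forecast([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], 3, [[0.5, 0.5]])
example_size = parameter_count(720, 720, 24)
```

Because the map is shared across phases, its size depends on the number of periods rather than the number of time steps, which is how a parameter count below 1k becomes possible under these assumed sizes.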

[19]  arXiv:2405.00949 [pdf, other]
Title: The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Chemical Physics (physics.chem-ph); Biomolecules (q-bio.BM)

This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks. Employing a uniform training methodology, we assessed three well-known models (RoBERTa, BART, and LLaMA) on their ability to predict molecular properties using the Simplified Molecular Input Line Entry System (SMILES) as a universal molecular representation format. Our comparative analysis involved pre-training 18 configurations of these models, with varying parameter sizes and dataset scales, followed by fine-tuning them on six benchmarking tasks from DeepChem. We maintained consistent training environments across models to ensure reliable comparisons. This approach allowed us to assess the influence of model type, size, and training dataset size on model performance. Specifically, we found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales. However, we observed that absolute validation loss is not a definitive indicator of model performance, which contradicts previous research, at least for fine-tuning tasks; instead, model size plays a crucial role. Through rigorous replication and validation, involving multiple training and fine-tuning cycles, our study not only delineates the strengths and limitations of each model type but also provides a robust methodology for selecting the most suitable LLM for specific cheminformatics applications. This research underscores the importance of considering model architecture and dataset characteristics in deploying AI for molecular property prediction, paving the way for more informed and effective utilization of AI in drug discovery and related fields.

[20]  arXiv:2405.00950 [pdf, other]
Title: Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback
Authors: Guojun Xiong, Jian Li
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems under an instantaneous activation constraint that at most B arms can be activated at any decision epoch. Each restless arm is endowed with a state that evolves independently according to a Markov decision process regardless of being activated or not. In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards, which can change arbitrarily across episodes. Further, we consider a challenging but natural bandit feedback setting in which only the adversarial rewards of activated arms are revealed to the decision maker (DM). The goal of the DM is to maximize its total adversarial rewards during the learning process while the instantaneous activation constraint must be satisfied in each decision epoch. We develop a novel reinforcement learning algorithm with two key components: a novel biased adversarial reward estimator to deal with bandit feedback and unknown transitions, and a low-complexity index policy to satisfy the instantaneous activation constraint. We show an $\tilde{\mathcal{O}}(H\sqrt{T})$ regret bound for our algorithm, where $T$ is the number of episodes and $H$ is the episode length. To the best of our knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T})$ regret for adversarial RMAB in our considered challenging settings.
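
An index policy of the kind mentioned above can satisfy the activation constraint by ranking arms and activating at most B of them. The sketch below shows only that selection step; how the paper computes each arm's index is not given in the abstract, so the index values, arm names, and function name here are hypothetical.

```python
from typing import Dict, List


def activate_top_b(indices: Dict[str, float], budget: int) -> List[str]:
    """Return at most `budget` arms with the highest index values."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(indices, key=lambda arm: indices[arm], reverse=True)
    return ranked[:budget]


# Hypothetical per-arm indices for the current decision epoch.
current_indices = {"arm_a": 0.2, "arm_b": 0.9, "arm_c": 0.5}
active = activate_top_b(current_indices, budget=2)
```

Selecting by rank keeps the per-epoch cost at a sort over the arms, which is what makes this style of policy low-complexity compared with planning over joint arm states.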

[21]  arXiv:2405.00955 [pdf, other]
Title: Recovering Labels from Local Updates in Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Gradient inversion (GI) attacks present a threat to the privacy of clients in federated learning (FL) by aiming to enable reconstruction of the clients' data from communicated model updates. A number of such techniques attempt to accelerate data recovery by first reconstructing labels of the samples used in local training. However, existing label extraction methods make strong assumptions that typically do not hold in realistic FL settings. In this paper we present a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which provides near-perfect accuracy when attacking untrained (most vulnerable) models. More significantly, RLU achieves high performance even in realistic settings where the clients in an FL system run multiple local epochs, train on heterogeneous data, and deploy various optimizers to minimize different objective functions. Specifically, RLU estimates labels by solving a least-squares problem that emerges from the analysis of the correlation between labels of the data points used in a training round and the resulting update of the output layer. The experimental results on several datasets, architectures, and data heterogeneity scenarios demonstrate that the proposed method consistently outperforms existing baselines, and helps improve quality of the reconstructed images in GI attacks in terms of both PSNR and LPIPS.

[22]  arXiv:2405.00957 [pdf, other]
Title: IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with their core idea being to aggregate neighborhood information and learn from labels. However, most graph datasets face two prevailing challenges, insufficient high-quality labels and a lack of neighborhoods, resulting in weak GNNs. Existing data augmentation methods designed to address these two issues often tackle only one. They may either require extensive training of generators, rely on overly simplistic strategies, or demand substantial prior knowledge, leading to suboptimal generalization abilities. To address both challenges simultaneously, we propose an elegant method called IntraMix. IntraMix innovatively employs Mixup among low-quality labeled data of the same class, generating high-quality labeled data at minimal cost. Additionally, it establishes neighborhoods for the generated data by connecting them with data from the same class with high confidence, thereby enriching the neighborhoods of graphs. IntraMix efficiently tackles both challenges faced by graphs and challenges the prior notion of the limited effectiveness of Mixup in node classification. IntraMix serves as a universal framework that can be readily applied to all GNNs. Extensive experiments demonstrate the effectiveness of IntraMix across various GNNs and datasets.

[23]  arXiv:2405.00958 [pdf, other]
Title: Generative manufacturing systems using diffusion models and ChatGPT
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from model-optimum to training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS's substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.

[24]  arXiv:2405.00965 [pdf, other]
Title: Robust Decentralized Learning with Local Updates and Gradient Tracking
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

As distributed learning applications such as Federated Learning, the Internet of Things (IoT), and Edge Computing grow, it is critical to address the shortcomings of such technologies from a theoretical perspective. As an abstraction, we consider decentralized learning over a network of communicating clients or nodes and tackle two major challenges: data heterogeneity and adversarial robustness. We propose a decentralized minimax optimization method that employs two important modules: local updates and gradient tracking. Minimax optimization is the key tool to enable adversarial training for ensuring robustness. Having local updates is essential in Federated Learning (FL) applications to mitigate the communication bottleneck, and utilizing gradient tracking is essential to proving convergence in the case of data heterogeneity. We analyze the performance of the proposed algorithm, Dec-FedTrack, in the case of nonconvex-strongly concave minimax optimization, and prove that it converges to a stationary point. We also conduct numerical experiments to support our theoretical findings.

[25]  arXiv:2405.00984 [pdf, other]
Title: FREE: Faster and Better Data-Free Meta-Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-trained models. In response to these challenges, we introduce the Faster and Better Data-Free Meta-Learning (FREE) framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks. Specifically, within the module Faster Inversion via Meta-Generator, each pre-trained model is perceived as a distinct task. The meta-generator can rapidly adapt to a specific task in just five steps, significantly accelerating the data recovery. Furthermore, we propose Better Generalization via Meta-Learner and introduce an implicit gradient alignment algorithm to optimize the meta-learner. This is achieved as aligned gradient directions alleviate potential conflicts among tasks from heterogeneous pre-trained models. Empirical experiments on multiple benchmarks affirm the superiority of our approach, marking a notable speed-up (20$\times$) and performance enhancement (1.42% $\sim$ 4.78%) in comparison to the state-of-the-art.

[26]  arXiv:2405.00985 [pdf, other]
Title: Progressive Feedforward Collapse of ResNet Training
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)

Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming the degree of collapse increases during the forward propagation of DNNs. We derive a transparent model for the well-trained ResNet based on the observation that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new surrogate model, multilayer unconstrained feature model (MUFM), connecting intermediate layers by an optimal transport regularizer. The optimal solution of MUFM is inconsistent with NC but is more concentrated relative to the input data. Overall, this study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.

[27]  arXiv:2405.00987 [pdf, other]
Title: S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic
Comments: Accepted for publication at ICLR 2024
Subjects: Machine Learning (cs.LG)

Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open problem. To address this, previous MaxEnt RL methods either implicitly estimate the entropy, resulting in high computational complexity and variance (SQL), or follow a variational inference procedure that fits simplified actor distributions (e.g., Gaussian) for tractability (SAC). We propose Stein Soft Actor-Critic (S$^2$AC), a MaxEnt RL algorithm that learns expressive policies without compromising efficiency. Specifically, S$^2$AC uses parameterized Stein Variational Gradient Descent (SVGD) as the underlying policy. We derive a closed-form expression of the entropy of such policies. Our formula is computationally efficient and only depends on first-order derivatives and vector products. Empirical results show that S$^2$AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic

[28]  arXiv:2405.01009 [pdf, other]
Title: Tackling Graph Oversquashing by Global and Local Non-Dissipativity
Subjects: Machine Learning (cs.LG)

A common problem in Message-Passing Neural Networks is oversquashing -- the limited ability to facilitate effective information flow between distant nodes. Oversquashing is attributed to the exponential decay in information transmission as node distances increase. This paper introduces a novel perspective to address oversquashing, leveraging properties of global and local non-dissipativity, which enable the maintenance of a constant information flow rate. Namely, we present SWAN, a uniquely parameterized GNN model with antisymmetry both in space and weight domains, as a means to obtain non-dissipativity. Our theoretical analysis asserts that by achieving these properties, SWAN offers an enhanced ability to transmit information over extended distances. Empirical evaluations on synthetic and real-world benchmarks that emphasize long-range interactions validate the theoretical understanding of SWAN, and its ability to mitigate oversquashing.

[29]  arXiv:2405.01010 [pdf, other]
Title: Efficient and Adaptive Posterior Sampling Algorithms for Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $\Delta$ denotes the single round performance loss when pulling a sub-optimal arm.
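
For orientation, the baseline that this abstract builds on, Thompson Sampling with Gaussian priors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the TS-MA-$\alpha$ or TS-TD-$\alpha$ algorithms proposed in the paper: the function name, the Bernoulli reward model, and the seed handling are all hypothetical choices made for the example. Each arm's index is drawn from a Gaussian centred at its shrunk empirical mean with variance 1/(pulls + 1).

```python
import random


def gaussian_thompson_sampling(true_means, horizon, seed=0):
    """Run Gaussian-prior Thompson Sampling and return the pull count per arm.

    Assumption for this sketch: rewards are Bernoulli with the given means,
    so they are bounded in [0, 1].
    """
    rng = random.Random(seed)
    num_arms = len(true_means)
    pulls = [0] * num_arms
    reward_sums = [0.0] * num_arms

    for _ in range(horizon):
        # Sample an index for each arm from N(mean estimate, 1 / (pulls + 1)).
        samples = [
            rng.gauss(reward_sums[i] / (pulls[i] + 1), (pulls[i] + 1) ** -0.5)
            for i in range(num_arms)
        ]
        # Pull the arm whose sampled index is largest.
        arm = max(range(num_arms), key=samples.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward

    return pulls


pulls = gaussian_thompson_sampling([0.1, 0.9], horizon=2000)
```

With a large gap between the two arms, almost all of the pulls should go to the better arm, which is the behaviour that the regret bound quantifies.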

[30]  arXiv:2405.01013 [pdf, other]
Title: Non-clairvoyant Scheduling with Partial Predictions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

The non-clairvoyant scheduling problem has gained new interest within learning-augmented algorithms, where the decision-maker is equipped with predictions without any quality guarantees. In practical settings, access to predictions may be reduced to specific instances, due to cost or data limitations. Our investigation focuses on scenarios where predictions for only $B$ job sizes out of $n$ are available to the algorithm. We first establish near-optimal lower bounds and algorithms in the case of perfect predictions. Subsequently, we present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria, and revealing a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.

[31]  arXiv:2405.01031 [pdf, other]
Title: The Privacy Power of Correlated Noise in Decentralized Learning
Comments: Accepted as conference paper at ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)

Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., information hidden from all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
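
The idea of pairwise-canceling noise generated from shared seeds can be illustrated with a small sketch. It is only a toy under stated assumptions and not the Decor algorithm itself: the noise is scalar, the string seeds stand in for secrets that would be exchanged securely, and the function name and ring topology are hypothetical. Each connected pair derives the same Gaussian draw from its shared seed; one endpoint adds it and the other subtracts it, so the terms cancel in the network-wide sum.

```python
import random


def pairwise_canceling_noise(num_users, edges, sigma=1.0, round_id=0):
    """Return one scalar noise term per user; terms cancel across each edge."""
    noise = [0.0] * num_users
    for i, j in edges:
        lo, hi = min(i, j), max(i, j)
        # Hypothetical shared secret: both endpoints can rebuild this generator.
        shared_rng = random.Random(f"seed-{lo}-{hi}-round-{round_id}")
        z = shared_rng.gauss(0.0, sigma)
        noise[lo] += z  # lower-indexed user adds the shared draw
        noise[hi] -= z  # higher-indexed user subtracts the same draw
    return noise


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
noise = pairwise_canceling_noise(4, ring)
total = sum(noise)  # zero up to floating-point rounding
```

Each user's own noise term is nonzero, which masks the local model, while the sum over all users vanishes. On a sparse graph the cancellation holds only for the global sum and not within each neighborhood, which relates to the accumulation issue the abstract mentions.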

[32]  arXiv:2405.01033 [pdf, other]
Title: CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes
Comments: 13 pages
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

Error correcting codes (ECCs) are indispensable for reliable transmission in communication systems. The recent advancements in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer (CrossMPT). CrossMPT iteratively updates two types of input vectors (i.e., magnitude and syndrome vectors) using two masked cross-attention blocks. The mask matrices in these cross-attention blocks are determined by the code's parity-check matrix that delineates the relationship between magnitude and syndrome vectors. Our experimental results show that CrossMPT significantly outperforms existing neural network-based decoders, particularly in decoding low-density parity-check codes. Notably, CrossMPT also achieves a significant reduction in computational complexity, achieving over a 50% decrease in its attention layers compared to the original transformer-based decoder, while retaining the computational complexity of the remaining layers.
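
As a rough illustration of how a parity-check matrix can define both the syndrome and a pair of attention masks, consider the sketch below. It rests on assumptions not stated in the abstract: that syndrome position i and magnitude position j may attend to each other exactly when H[i][j] = 1, and that the two masks are transposes of each other. The function names and the (7,4) Hamming code are chosen only for the example.

```python
def syndrome(H, hard_bits):
    """Compute s = H y mod 2 for parity-check matrix H and hard-decision bits y."""
    return [sum(h * y for h, y in zip(row, hard_bits)) % 2 for row in H]


def cross_attention_masks(H):
    """Build boolean masks where True means the query may attend to the key.

    Assumption for this sketch: the connectivity is read directly off H.
    """
    # Syndrome queries attending to magnitude keys: shape (n - k) x n.
    syn_to_mag = [[bool(v) for v in row] for row in H]
    # Magnitude queries attending to syndrome keys: the transpose, n x (n - k).
    mag_to_syn = [list(col) for col in zip(*syn_to_mag)]
    return mag_to_syn, syn_to_mag


# Parity-check matrix of a (7,4) Hamming code in systematic form.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

mag_to_syn, syn_to_mag = cross_attention_masks(H)
clean = syndrome(H, [0, 0, 0, 0, 0, 0, 0])    # all-zero codeword
flipped = syndrome(H, [1, 0, 0, 0, 0, 0, 0])  # single error in bit 0
```

A single bit error yields a syndrome equal to the corresponding column of H, which is the relationship between the two input types that the masks encode.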

[33]  arXiv:2405.01041 [pdf, other]
Title: Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation
Subjects: Machine Learning (cs.LG)

Neural networks have been extensively applied to a variety of tasks, achieving astounding results. Applying neural networks in the scientific field is an important research direction that is gaining increasing attention. In scientific applications, the scale of neural networks is generally moderate-size, mainly to ensure the speed of inference during application. Additionally, comparing neural networks to traditional algorithms in scientific applications is inevitable. These applications often require rapid computations, making the reduction of neural network sizes increasingly important. Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity. Theoretical work has discovered that under strong non-linearity, neurons in the same layer tend to behave similarly, a phenomenon known as condensation. Condensation offers an opportunity to reduce the scale of neural networks to a smaller subnetwork with similar performance. In this article, we propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems. Our reduction method can currently be applied to both fully connected networks and convolutional networks, achieving positive results. In complex combustion acceleration tasks, we reduced the size of the neural network to 41.7% of its original scale while maintaining prediction accuracy. In the CIFAR10 image classification task, we reduced the network size to 11.5% of the original scale, still maintaining a satisfactory validation accuracy. Our method can be applied to most trained neural networks, reducing computational pressure and improving inference speed.

[34]  arXiv:2405.01052 [pdf, ps, other]
Title: Polynomial Chaos Expanded Gaussian Process
Comments: Manuscript: 20 pages, 4 figures, 7 tables
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

In complex and unknown processes, global models are initially generated over the entire experimental space, but they often fail to provide accurate predictions in local areas. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This approach provides a mathematically interpretable method that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP in these benchmark applications, highlighting model performance that is often competitive with or superior to previous methods. A key advantage of the presented model is the transparency and traceability in the calculation of hyperparameters and model predictions.

[35]  arXiv:2405.01053 [pdf, other]
Title: Explicitly Modeling Generality into Self-Supervised Learning
Comments: 28 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The goal of generality in machine learning is to achieve excellent performance on various unseen tasks and domains. Recently, self-supervised learning (SSL) has been regarded as an effective method to achieve this goal. It can learn high-quality representations from unlabeled data and achieve promising empirical performance on multiple downstream tasks. Existing SSL methods mainly constrain generality from two aspects: (i) large-scale training data, and (ii) learning task-level shared knowledge. However, these methods lack explicit modeling of the SSL generality in the learning objective, and the theoretical understanding of SSL's generality remains limited. This may cause SSL models to overfit in data-scarce situations and generalize poorly in the real world, making it difficult to achieve true generality. To address these issues, we provide a theoretical definition of generality in SSL and define a $\sigma$-measurement to help quantify it. Based on this insight, we explicitly model generality into self-supervised learning and further propose a novel SSL framework, called GeSSL. It introduces a self-motivated target based on $\sigma$-measurement, which enables the model to find the optimal update direction towards generality. Extensive theoretical and empirical evaluations demonstrate the superior performance of the proposed GeSSL.

[36]  arXiv:2405.01055 [pdf, ps, other]
Title: Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction
Comments: 7 pages, 5 figures, under review by the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy due to the lack of spatial-temporal correlation features related to parking volume, and neglect flow patterns and correlations between similar parking lots within certain areas. To address these challenges, this study proposes a parking availability prediction framework integrating spatial-temporal deep learning with multi-source data fusion, encompassing traffic demand data from multiple sources (e.g., metro, bus, taxi services), and parking lot data. The framework is based on the Transformer as the spatial-temporal deep learning model and leverages K-means clustering to establish parking cluster zones, extracting and integrating traffic demand characteristics from various transportation modes (i.e., metro, bus, online ride-hailing, and taxi) connected to parking lots. Real-world empirical data was used to verify the effectiveness of the proposed method compared with different machine learning, deep learning, and traditional statistical models for predicting parking availability. Experimental results reveal that, with the proposed pipeline, the developed Transformer model outperforms other models in terms of various metrics, e.g., Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). By fusing multi-source demand data with spatial-temporal deep learning techniques, this approach offers the potential to develop parking availability prediction systems that furnish more accurate and timely information to both drivers and urban planners, thereby fostering more efficient and sustainable urban mobility.

[37]  arXiv:2405.01060 [pdf, other]
Title: A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands
Comments: The paper has been submitted to Remote sensing of Environment and revised
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult owing to the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation models used for remote sensing research, such as PROSAIL. The testing results of the SOGM on new datasets that were not included in model training showed that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible on: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.

[38]  arXiv:2405.01067 [pdf, other]
Title: AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Communication bottlenecks hinder the scalability of distributed neural network training, particularly on distributed-memory computing clusters. To significantly reduce this communication overhead, we introduce AB-training, a novel data-parallel training method that decomposes weight matrices into low-rank representations and utilizes independent group-based training. This approach consistently reduces network traffic by 50% across multiple scaling scenarios, increasing the training potential on communication-constrained systems. Our method exhibits regularization effects at smaller scales, leading to improved generalization for models like VGG16, while achieving a remarkable 44.14 : 1 compression ratio during training on CIFAR-10 and maintaining competitive accuracy. Albeit promising, our experiments reveal that large batch effects remain a challenge even in low-rank training regimes.

[39]  arXiv:2405.01073 [pdf, other]
Title: Poisoning Attacks on Federated Learning for Autonomous Driving
Comments: Accepted to SCAI2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Federated Learning (FL) is a decentralized learning paradigm, enabling parties to collaboratively train models while keeping their data confidential. Within autonomous driving, it brings the potential of reducing data storage costs, reducing bandwidth requirements, and accelerating learning. FL is, however, susceptible to poisoning attacks. In this paper, we introduce two novel poisoning attacks on FL tailored to regression tasks within autonomous driving: FLStealth and Off-Track Attack (OTA). FLStealth, an untargeted attack, aims at providing model updates that deteriorate the global model performance while appearing benign. OTA, on the other hand, is a targeted attack with the objective to change the global model's behavior when exposed to a certain trigger. We demonstrate the effectiveness of our attacks by conducting comprehensive experiments pertaining to the task of vehicle trajectory prediction. In particular, we show that, among five different untargeted attacks, FLStealth is the most successful at bypassing the considered defenses employed by the server. For OTA, we demonstrate the inability of common defense strategies to mitigate the attack, highlighting the critical need for new defensive mechanisms against targeted attacks within FL for autonomous driving.

[40]  arXiv:2405.01102 [pdf, other]
Title: Less is More: on the Over-Globalizing Problem in Graph Transformers
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.

[41]  arXiv:2405.01114 [pdf, other]
Title: Continual Imitation Learning for Prosthetic Limbs
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. Motorized bionic limbs offer promise, but their utility depends on mimicking the evolving synergy of human movement in various settings. In this context, we present a novel model for bionic prostheses' application that leverages camera-based motion capture and wearable sensor data, to learn the synergistic coupling of the lower limbs during human locomotion, empowering it to infer the kinematic behavior of a missing lower limb across varied tasks, such as climbing inclines and stairs. We propose a model that can multitask, adapt continually, anticipate movements, and refine. The core of our method lies in an approach which we call -- multitask prospective rehearsal -- that anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. We design an evolving architecture that merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We empirically validate our model against various baselines using real-world human gait datasets, including experiments with transtibial amputees, which encompass a broad spectrum of locomotion tasks. The results show that our approach consistently outperforms baseline models, particularly under scenarios affected by distributional shifts, adversarial perturbations, and noise.

[42]  arXiv:2405.01125 [pdf, other]
Title: Lipschitz constant estimation for general neural network architectures using control tools
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)

This paper is devoted to the estimation of the Lipschitz constant of neural networks using semidefinite programming. For this purpose, we interpret neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$. A key novelty with respect to prior work is that we use this interpretation to exploit the series interconnection structure of neural networks with a dynamic programming recursion. Nonlinearities, such as activation functions and nonlinear pooling layers, are handled with integral quadratic constraints. If the neural network contains signal processing layers (convolutional or state space model layers), we realize them as 1-D/2-D/N-D systems and exploit this structure as well. We distinguish ourselves from related work on Lipschitz constant estimation by more extensive structure exploitation (scalability) and a generalization to a large class of common neural network architectures. To show the versatility and computational advantages of our method, we apply it to different neural network architectures trained on MNIST and CIFAR-10.

[43]  arXiv:2405.01142 [pdf, other]
Title: Sharp Bounds for Sequential Federated Learning on Heterogeneous Data
Comments: arXiv admin note: text overlap with arXiv:2311.03154
Subjects: Machine Learning (cs.LG)

There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients; and sequential FL (SFL), where models are trained in a sequential manner across clients. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To resolve the theoretical dilemma of SFL, we establish sharp convergence guarantees for SFL on heterogeneous data with both upper and lower bounds. Specifically, we derive the upper bounds for strongly convex, general convex and non-convex objective functions, and construct the matching lower bounds for the strongly convex and general convex objective functions. Then, we compare the upper bounds of SFL with those of PFL, showing that SFL outperforms PFL (at least, when the level of heterogeneity is relatively high). Experimental results on quadratic functions and real data sets validate the counterintuitive comparison result.

[44]  arXiv:2405.01147 [pdf, other]
Title: Why Tabular Foundation Models Should Be a Research Priority
Comments: Accepted at International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG)

Recent text and image foundation models are incredibly impressive, and these models are attracting an ever-increasing portion of research resources. In this position piece we aim to shift the ML research community's priorities ever so slightly to a different modality: tabular data. Tabular data is the dominant modality in many fields, yet it is given hardly any research attention and significantly lags behind in terms of scale and power. We believe the time is now to start developing tabular foundation models, or what we coin a Large Tabular Model (LTM). LTMs could revolutionise the way science and ML use tabular data: not as single datasets that are analyzed in a vacuum, but contextualized with respect to related datasets. The potential impact is far-reaching: from few-shot tabular models to automating data science; from out-of-distribution synthetic data to empowering multidisciplinary scientific discovery. We intend to excite reflections on the modalities we study, and convince some researchers to study large tabular models.

[45]  arXiv:2405.01155 [pdf, other]
Title: SynFlowNet: Towards Molecule Design with Guaranteed Synthesis Pathways
Comments: Presented at ICLR 2024 GEM Workshop
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Recent breakthroughs in generative modelling have led to a number of works proposing molecular generation models for drug discovery. While these models perform well at capturing drug-like motifs, they are known to often produce synthetically inaccessible molecules. This is because they are trained to compose atoms or fragments in a way that approximates the training distribution, but they are not explicitly aware of the synthesis constraints that come with making molecules in the lab. To address this issue, we introduce SynFlowNet, a GFlowNet model whose action space uses chemically validated reactions and reactants to sequentially build new molecules. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool. SynFlowNet consistently samples synthetically feasible molecules, while still being able to find diverse and high-utility candidates. Furthermore, we compare molecules designed with SynFlowNet to experimentally validated actives, and find that they show comparable properties of interest, such as molecular weight, SA score and predicted protein binding affinity.

[46]  arXiv:2405.01157 [pdf, other]
Title: Tabular and Deep Reinforcement Learning for Gittins Index
Subjects: Machine Learning (cs.LG); Performance (cs.PF); Machine Learning (stat.ML)

In the realm of multi-arm bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios, however, the Markovian state transition probabilities are unknown and therefore the Gittins indices cannot be computed. One can then resort to reinforcement learning (RL) algorithms that explore the state space to learn these indices while exploiting to maximize the reward collected. In this work, we propose tabular (QGI) and Deep RL (DGN) algorithms for learning the Gittins index that are based on the retirement formulation for the multi-arm bandit problem. When compared with existing RL algorithms that learn the Gittins index, our algorithms have a lower run time, require less storage space (small Q-table size in QGI and smaller replay buffer in DGN), and exhibit better empirical convergence to the Gittins index. This makes our algorithms well suited for problems with large state spaces and a viable alternative to existing methods. As a key application, we demonstrate the use of our algorithms in minimizing the mean flowtime in a job scheduling problem when jobs are available in batches and have an unknown service time distribution.

[47]  arXiv:2405.01158 [pdf, other]
Title: Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI
Comments: 6 pages, submitted to IEEE RTSI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Anomaly detection (AD) is a crucial process often required in industrial settings. Anomalies can signal underlying issues within a system, prompting further investigation. Industrial processes aim to streamline operations as much as possible, encompassing the production of the final product, making AD an essential means to reach this goal. Conventional anomaly detection methodologies typically classify observations as either normal or anomalous without providing insight into the reasons behind these classifications. Consequently, in light of the emergence of Industry 5.0, a more desirable approach involves providing interpretable outcomes, enabling users to understand the rationale behind the results. This paper presents the first industrial application of ExIFFI, a recently developed approach focused on the production of fast and efficient explanations for the Extended Isolation Forest (EIF) anomaly detection method. ExIFFI is tested on two publicly available industrial datasets, demonstrating superior effectiveness in explanations and computational efficiency with respect to other state-of-the-art explainable AD models.

[48]  arXiv:2405.01186 [pdf, other]
Title: Potential Energy based Mixture Model for Noisy Label Learning
Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Training deep neural networks (DNNs) from noisy labels is an important and challenging task. However, most existing approaches focus on the corrupted labels and ignore the importance of inherent data structure. To bridge the gap between noisy labels and data, inspired by the concept of potential energy in physics, we propose a novel Potential Energy based Mixture Model (PEMM) for noisy-label learning. We introduce a distance-based classifier with potential energy regularization on its class centers. Embedding our proposed classifier in existing deep learning backbones yields robust networks with better feature representations. They can preserve intrinsic structures from the data, resulting in superior noise tolerance. We conducted extensive experiments to analyze the efficiency of our proposed model on several real-world datasets. Quantitative results show that it can achieve state-of-the-art performance.

[49]  arXiv:2405.01189 [pdf, other]
Title: Gradient-Congruity Guided Federated Sparse Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs on resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into the federated learning framework to address these issues. Our method leverages the idea that neurons whose associated gradients conflict in direction with the global model contain irrelevant or less generalized information for other clients and could be pruned during the sparse training process. Conversely, neurons whose associated gradients have consistent directions could be grown with higher priority. In this way, FedSGC can greatly reduce the local computation and communication overheads while, at the same time, enhancing the generalization abilities of FL. We evaluate our method on challenging non-i.i.d. settings and show that it achieves competitive accuracy with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.

[50]  arXiv:2405.01196 [pdf, other]
Title: Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising on model accuracy is of extreme importance and interest in safety-critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.

[51]  arXiv:2405.01198 [pdf, other]
Title: Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.

[52]  arXiv:2405.01205 [pdf, other]
Title: Error-Driven Uncertainty Aware Training
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Neural networks are often overconfident about their predictions, which undermines their reliability and trustworthiness. In this work, we present a novel technique, named Error-Driven Uncertainty Aware Training (EUAT), which aims to enhance the ability of neural models to estimate their uncertainty correctly, namely to be highly uncertain when they output inaccurate predictions and to have low uncertainty when their output is accurate. The EUAT approach operates during the model's training phase by selectively employing two loss functions depending on whether the training examples are correctly or incorrectly predicted by the model. This allows for pursuing the twofold goal of i) minimizing model uncertainty for correctly predicted inputs and ii) maximizing uncertainty for mispredicted inputs, while preserving the model's misprediction rate. We evaluate EUAT using diverse neural models and datasets in the image recognition domain, considering both non-adversarial and adversarial settings. The results show that EUAT outperforms existing approaches for uncertainty estimation (including other uncertainty-aware training techniques, calibration, ensembles, and DEUP) by providing uncertainty estimates that have higher quality not only when evaluated via statistical metrics (e.g., correlation with residuals) but also when employed to build binary classifiers that decide whether the model's output can be trusted or not and under distributional data shifts.

[53]  arXiv:2405.01207 [pdf, ps, other]
Title: Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features
Comments: Trustworthy Speech Processing, Satellite Workshop at ICASSP 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Membership Inference (MI) poses a substantial privacy threat to the training data of Automatic Speech Recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data. This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. To the best of our knowledge, this approach has not yet been investigated. We compare our proposed features with commonly used error-based features and find that the proposed features greatly enhance performance for sample-level MI. For speaker-level MI, these features improve results, though by a smaller margin, as error-based features already obtained a high performance for this task. Our findings emphasise the importance of considering different feature sets and levels of access to target models for effective MI in ASR systems, providing valuable insights for auditing such models.

[54]  arXiv:2405.01229 [pdf, ps, other]
Title: Boosting Jailbreak Attack with Momentum
Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Optimization and Control (math.OC)

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented jailbreak attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous iterations. Specifically, we introduce the Momentum Accelerated GCG (MAC) attack, which incorporates a momentum term into the gradient heuristic. Experimental results showcase the notable enhancement achieved by MAC in gradient-based attacks on aligned language models. Our code is available at https://github.com/weizeming/momentum-attack-llm.

[55]  arXiv:2405.01247 [pdf, other]
Title: Lying Graph Convolution: Learning to Lie for Node Classification Tasks
Comments: Accepted to IJCNN2024
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

In the context of machine learning for graphs, many researchers have empirically observed that Deep Graph Networks (DGNs) perform favourably on node classification tasks when the graph structure is homophilic (i.e., adjacent nodes are similar). In this paper, we introduce Lying-GCN, a new DGN inspired by opinion dynamics that can adaptively work in both the heterophilic and the homophilic setting. At each layer, each agent (node) shares its own opinions (node embeddings) with its neighbours. Instead of sharing its opinion directly as in GCN, we introduce a mechanism which allows agents to lie. Such a mechanism is adaptive, thus the agents learn how and when to lie according to the task that should be solved. We provide a characterisation of our proposal in terms of dynamical systems, by studying the spectral property of the coefficient matrix of the system. While the steady state of the system collapses to zero, we believe the lying mechanism is still usable to solve node classification tasks. We empirically support our belief on both synthetic and real-world datasets, by showing that the lying mechanism improves performance in the heterophilic setting without harming the results in the homophilic one.

[56]  arXiv:2405.01251 [pdf, other]
Title: Revisiting semi-supervised training objectives for differentiable particle filters
Comments: 5 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Differentiable particle filters combine the flexibility of neural networks with the probabilistic nature of sequential Monte Carlo methods. However, traditional approaches rely on the availability of labelled data, i.e., the ground truth latent state information, which is often difficult to obtain in real-world applications. This paper compares the effectiveness of two semi-supervised training objectives for differentiable particle filters. We present results in two simulated environments where labelled data are scarce.

[57]  arXiv:2405.01260 [pdf, other]
Title: Causal Influence in Federated Edge Inference
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Systems and Control (eess.SY)

In this paper, we consider a setting where heterogeneous agents with connectivity are performing inference using unlabeled streaming data. Observed data are only partially informative about the target variable of interest. In order to overcome the uncertainty, agents cooperate with each other by exchanging their local inferences with and through a fusion center. To evaluate how each agent influences the overall decision, we adopt a causal framework in order to distinguish the actual influence of agents from mere correlations within the decision-making process. Various scenarios reflecting different agent participation patterns and fusion center policies are investigated. We derive expressions to quantify the causal impact of each agent on the joint decision, which could be beneficial for anticipating and addressing atypical scenarios, such as adversarial attacks or system malfunctions. We validate our theoretical results with numerical simulations and a real-world application of multi-camera crowd counting.

[58]  arXiv:2405.01261 [pdf, other]
Title: Continuously evolving rewards in an open-ended environment
Comments: 30 pages, 8 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Unambiguous identification of the rewards driving behaviours of entities operating in complex open-ended real-world environments is difficult, partly because goals and associated behaviours emerge endogenously and are dynamically updated as environments change. Reproducing such dynamics in models would be useful in many domains, particularly where fixed reward functions limit the adaptive capabilities of agents. The simulation experiments described here assess a candidate algorithm for the dynamic updating of rewards, RULE: Reward Updating through Learning and Expectation. The approach is tested in a simplified ecosystem-like setting where experiments challenge entities' survival, calling for significant behavioural change. The population of entities successfully demonstrates the abandonment of an initially rewarded but ultimately detrimental behaviour, amplification of beneficial behaviour, and appropriate responses to novel items added to their environment. These adjustments happen through endogenous modification of the entities' underlying reward function, during continuous learning, without external intervention.

[59]  arXiv:2405.01263 [pdf, other]
Title: An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)

The commonly used caching policies, such as LRU or LFU, exhibit optimal performance only for specific traffic patterns. Even advanced Machine Learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that makes no assumptions about the request arrival process. These algorithms solve an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which is the gap between the gain of the online policy and the gain of the optimal static cache allocation in hindsight. Nevertheless, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a groundbreaking gradient-based online caching policy, the first to achieve logarithmic computational complexity relative to catalog size along with regret guarantees. This means our algorithm can efficiently handle large-scale data while minimizing the performance gap between real-time decisions and optimal hindsight choices. As requests arrive, our policy dynamically adjusts the probabilities of including items in the cache, which drive cache update decisions. Our algorithm's streamlined complexity is a key advantage, enabling its application to real-world traces featuring millions of requests and items. This is a significant achievement, as traces of this scale have been out of reach for existing policies with regret guarantees. To the best of our knowledge, our experimental results show for the first time that the regret guarantees of gradient-based caching policies bring significant benefits in scenarios of practical interest.
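The regret metric described in the abstract above can be illustrated with a small sketch. This is not the paper's gradient-based policy: plain LRU stands in as the online policy, gain is assumed to be the number of cache hits, and the function names (`lru_hits`, `best_static_hits`, `regret`) are hypothetical.

```python
from collections import Counter, OrderedDict


def lru_hits(requests, capacity):
    """Count cache hits of a plain LRU cache (stand-in online policy)."""
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


def best_static_hits(requests, capacity):
    """Hits of the best fixed cache in hindsight: the most requested items."""
    counts = Counter(requests)
    return sum(count for _, count in counts.most_common(capacity))


def regret(requests, capacity, online_hits=lru_hits):
    """Gap between the best static allocation in hindsight and the online policy."""
    return best_static_hits(requests, capacity) - online_hits(requests, capacity)


if __name__ == "__main__":
    # A cyclic trace over three items with room for two is a bad case for LRU.
    trace = [1, 2, 3, 1, 2, 3]
    print(regret(trace, capacity=2))
```

On the cyclic trace, LRU evicts each item just before it is requested again, so every request misses, while a fixed cache holding any two of the three items serves four requests.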

[60]  arXiv:2405.01270 [pdf, other]
Title: The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks
Comments: International Symposium on Biomedical Imaging (ISBI)
Subjects: Machine Learning (cs.LG)

This study highlights the importance of conducting comprehensive model inspection as part of comparative performance analyses. Here, we investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task. Specifically, we analyse the effect of using parameter-efficient, shared graph convolutional submodels compared to structure-specific, non-shared submodels. Further, we assess the effect of mesh registration as part of the data harmonisation pipeline. We find substantial differences in the feature embeddings at different layers of the models. Our results highlight that test accuracy alone is insufficient to identify important model characteristics such as encoded biases related to data source or potentially non-discriminative features learned in submodels. Our model inspection framework offers a valuable tool for practitioners to better understand performance characteristics of deep learning models in medical imaging.

[61]  arXiv:2405.01306 [pdf, other]
Title: Graph is all you need? Lightweight data-agnostic neural architecture search without training
Subjects: Machine Learning (cs.LG)

Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and lightweight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench-201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NAS-Bench-101, NAS-Bench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.

[62]  arXiv:2405.01319 [pdf, other]
Title: Data Scoping: Effectively Learning the Evolution of Generic Transport PDEs
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Transport phenomena (e.g., fluid flows) are governed by time-dependent partial differential equations (PDEs) describing mass, momentum, and energy conservation, and are ubiquitous in many engineering applications. However, deep learning architectures are fundamentally incompatible with the simulation of these PDEs. This paper clearly articulates and then solves this incompatibility. The local dependency of generic transport PDEs implies that predicting the physical properties at a location in the next time step involves only local information. However, the deep learning architecture will inevitably increase the scope of information used to make such predictions as the number of layers increases, which can cause sluggish convergence and compromise generalizability. This paper aims to solve this problem by proposing a distributed data scoping method with linear time complexity to strictly limit the scope of information used to predict the local properties. The numerical experiments over multiple physics show that our data scoping method significantly accelerates training convergence and improves the generalizability of benchmark models on large-scale engineering simulations. Specifically, over the geometries not included in the training data for heat transfer simulation, it can increase the accuracy of Convolutional Neural Networks (CNNs) by 21.7% and that of Fourier Neural Operators (FNOs) by 38.5% on average.

[63]  arXiv:2405.01327 [pdf, other]
Title: Constrained Reinforcement Learning Under Model Mismatch
Subjects: Machine Learning (cs.LG)

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, such a policy may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfies the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints.

[64]  arXiv:2405.01349 [pdf, other]
Title: Position Paper: Beyond Robustness Against Single Attack Types
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type such as $\ell_2$ or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger and currently cannot be modeled by a single attack type. The discrepancy between the focus of current defenses and the space of attacks of interest calls into question the practicality of existing defenses and the reliability of their evaluation. In this position paper, we argue that the research community should look beyond single attack robustness, and we draw attention to three potential directions involving robustness against multiple attacks: simultaneous multiattack robustness, unforeseen attack robustness, and a newly defined problem setting which we call continual adaptive robustness. We provide a unified framework which rigorously defines these problem settings, synthesize existing research in these fields, and outline open directions. We hope that our position paper inspires more research in simultaneous multiattack, unforeseen attack, and continual adaptive robustness.

[65]  arXiv:2405.01350 [pdf, other]
Title: Community-Invariant Graph Contrastive Learning
Comments: This paper is accepted by ICML-2024
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and identified its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on GitHub (https://github.com/ShiyinTan/CI-GCL.git).

[66]  arXiv:2405.01365 [pdf, other]
Title: Dynamic Online Ensembles of Basis Expansions
Comments: 34 pages, 14 figures. Accepted to Transactions on Machine Learning Research (TMLR)
Journal-ref: Transactions on Machine Learning Research (TMLR), 2024
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

Practical Bayesian learning often requires (1) online inference, (2) dynamic models, and (3) ensembling over multiple different models. Recent advances have shown how to use random feature approximations to achieve scalable, online ensembling of Gaussian processes with desirable theoretical properties and fruitful applications. One key to these methods' success is the inclusion of a random walk on the model parameters, which makes models dynamic. We show that these methods can be generalized easily to any basis expansion model and that using alternative basis expansions, such as Hilbert space Gaussian processes, often results in better performance. To simplify the process of choosing a specific basis expansion, our method's generality also allows the ensembling of several entirely different models, for example, a Gaussian process and polynomial regression. Finally, we propose a novel method to ensemble static and dynamic models together.

[67]  arXiv:2405.01389 [pdf, other]
Title: Invariant Risk Minimization Is A Total Variation Model
Subjects: Machine Learning (cs.LG)

Invariant risk minimization (IRM) is an emerging approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.

[68]  arXiv:2405.01451 [pdf, other]
Title: Test-time Assessment of a Model's Performance on Unseen Domains via Optimal Transport
Subjects: Machine Learning (cs.LG)

Gauging the performance of ML models on data from unseen domains at test time is an essential yet challenging problem due to the lack of labels in this setting. Moreover, the performance of these models on in-distribution data is a poor indicator of their performance on data from unseen domains. Thus, it is essential to develop metrics that can provide insights into the model's performance at test time and can be computed only with the information available at test time (such as their model parameters, the training data or its statistics, and the unlabeled test data). To this end, we propose a metric based on Optimal Transport that is highly correlated with the model's performance on unseen domains and is efficiently computable only using information available at test time. Concretely, our metric characterizes the model's performance on unseen domains using only a small amount of unlabeled data from these domains and data or statistics from the training (source) domain(s). Through extensive empirical evaluation using standard benchmark datasets, and their corruptions, we demonstrate the utility of our metric in estimating the model's performance in various practical applications. These include the problems of selecting the source data and architecture that lead to the best performance on data from an unseen domain and the problem of predicting a deployed model's performance at test time on unseen domains. Our empirical results show that our metric, which uses information from both the source and the unseen domain, is highly correlated with the model's performance, achieving a significantly better correlation than that obtained via the popular prediction entropy-based metric, which is computed solely using the data from the unseen domain.

[69]  arXiv:2405.01462 [pdf, other]
Title: Uncertainty for Active Learning on Graphs
Subjects: Machine Learning (cs.LG)

Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models by iteratively acquiring labels of data points with the highest uncertainty. While it has proven effective for independent data, its applicability to graphs remains under-explored. We propose the first extensive study of Uncertainty Sampling for node classification: (1) We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies. (2) We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries. We confirm our results on synthetic data and design an approximate approach that consistently outperforms other uncertainty estimators on real datasets. (3) Based on this analysis, we relate pitfalls in modeling uncertainty to existing methods. Our analysis enables and informs the development of principled uncertainty estimation on graphs.
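
The acquisition step described above can be illustrated with a minimal sketch. It assumes the simplest form of Uncertainty Sampling, where uncertainty is the entropy of a node's predicted class distribution; it is not the Bayesian estimator studied in the paper. The node ids, the probabilities, and the helper names `entropy` and `select_query` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_query(predictions, labelled):
    """Return the unlabelled node id with the highest predictive entropy.

    predictions: dict mapping node id -> list of class probabilities
    labelled:    set of node ids whose labels were already acquired
    """
    candidates = {n: p for n, p in predictions.items() if n not in labelled}
    return max(candidates, key=lambda n: entropy(candidates[n]))


# Hypothetical softmax outputs of a node classifier for four nodes.
predictions = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.34, 0.33, 0.33],
    "d": [0.60, 0.30, 0.10],
}

# Node "c" is already labelled, so it is excluded from the candidates.
next_node = select_query(predictions, labelled={"c"})
```

In an active learning loop, the selected node would be labelled, added to the training set, and the classifier retrained before the next query.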

[70]  arXiv:2405.01468 [pdf, other]
Title: Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Authors: Yifei Ming, Yixuan Li
Comments: The paper is accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Pre-trained contrastive vision-language models have demonstrated remarkable performance across a wide range of tasks. However, they often struggle on fine-grained datasets with categories not adequately represented during pre-training, which makes adaptation necessary. Recent works have shown promising results by utilizing samples from web-scale databases for retrieval-augmented adaptation, especially in low-data regimes. Despite the empirical success, understanding how retrieval impacts the adaptation of vision-language models remains an open research question. In this work, we adopt a reflective perspective by presenting a systematic study to understand the roles of key components in retrieval-augmented adaptation. We unveil new insights on uni-modal and cross-modal retrieval and highlight the critical role of logit ensemble for effective adaptation. We further present theoretical underpinnings that directly support our empirical observations.

[71]  arXiv:2405.01480 [pdf, other]
Title: Common pitfalls to avoid while using multiobjective optimization in machine learning
Comments: 21 pages, 12 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.
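
The weighted sum (WS) method mentioned above can be sketched on a toy problem. This is a minimal illustration, not the PINN setup from the paper: the two objectives f1(x) = x^2 and f2(x) = (x - 2)^2, the step size, and the function names are assumptions chosen so that the result can be checked by hand (the WS minimiser is x = 2(1 - w)).

```python
def f1(x):
    """First toy objective (assumed for illustration)."""
    return x ** 2


def f2(x):
    """Second toy objective (assumed for illustration)."""
    return (x - 2.0) ** 2


def weighted_sum_minimizer(w, steps=2000, lr=0.05):
    """Minimise w*f1(x) + (1-w)*f2(x) with plain gradient descent."""
    x = 0.5  # arbitrary starting point
    for _ in range(steps):
        grad = w * 2.0 * x + (1.0 - w) * 2.0 * (x - 2.0)
        x -= lr * grad
    return x


def pareto_points(weights):
    """Objective-space points (f1, f2) obtained for each weight."""
    points = []
    for w in weights:
        x = weighted_sum_minimizer(w)
        points.append((f1(x), f2(x)))
    return points


# Sweep the weight from 0 to 1 to trace the trade-off between objectives.
front = pareto_points([i / 10 for i in range(11)])
```

Both toy objectives are convex, so sweeping the weight recovers the whole Pareto front here. For non-convex fronts the WS method is known to miss points, and stopping the inner optimisation before convergence returns points that are not Pareto optimal, which is the kind of pitfall the abstract warns about.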

[72]  arXiv:2405.01488 [pdf, other]
Title: Digital Twin Generators for Disease Modeling
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world.

[73]  arXiv:2405.01507 [pdf, other]
Title: Accelerating Convergence in Bayesian Few-Shot Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at https://github.com/keanson/MD-BSFC.

[74]  arXiv:2405.01524 [pdf, other]
Title: A separability-based approach to quantifying generalization: which layer is best?
Comments: 6 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations in that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers for both unsupervised and supervised settings. Working throughout all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.

[75]  arXiv:2405.01531 [pdf, other]
Title: Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users are given the ability to modify potentially misaligned concept choices to influence the decision behavior of the model in an interpretable fashion. However, existing approaches often require numerous human interventions per image to achieve strong performance, posing practical challenges in scenarios where obtaining human feedback is expensive. In this paper, we find that this is noticeably driven by an independent treatment of concepts during intervention, wherein a change of one concept does not influence the use of other ones in the model's final decision. To address this issue, we introduce a trainable concept intervention realignment module, which leverages concept relations to realign concept assignments post-intervention. Across standard, real-world benchmarks, we find that concept realignment can significantly improve intervention efficacy, substantially reducing the number of interventions needed to reach a target classification performance or concept prediction accuracy. In addition, it easily integrates into existing concept-based architectures without requiring changes to the models themselves. This reduced cost of human-model collaboration is crucial to enhancing the feasibility of CBMs in resource-constrained environments.

[76]  arXiv:2405.01534 [pdf, other]
Title: Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks
Comments: Published at ICLR 2024. Website at this https URL. 9 pages, 3 figures, 3 tables; 14 pages appendix (7 additional figures)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/

Cross-lists for Fri, 3 May 24

[77]  arXiv:2405.00678 (cross-list from eess.SP) [pdf, ps, other]
Title: Low-cost modular devices for on-road vehicle detection and characterisation
Comments: 17 pages
Journal-ref: Poza Lujan, JL., Uribe Chavert, P., Posadas-Yagüe, JL. Low-cost modular devices for on-road vehicle detection and characterisation. Des Autom Embed Syst 27, 85–102 (2023)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Detecting and characterising vehicles is one of the purposes of embedded systems used in intelligent environments. An analysis of a vehicle's characteristics can reveal inappropriate or dangerous behaviour. This detection makes it possible to sanction or notify emergency services to take early and practical actions. Vehicle detection and characterisation systems employ complex sensors such as video cameras, especially in urban environments. These sensors offer high accuracy, but their price and computational requirements are directly proportional to their performance. This article introduces a system based on modular devices that is economical and has a low computational cost. These devices use ultrasonic sensors to detect the speed and length of vehicles. The measurement accuracy is improved through the collaboration of the device modules. The experiments were performed using multiple modules oriented to different angles. This module is coupled with another specifically designed to detect distance using the previous modules' speed and length data. The collaboration between different modules reduces the relative speed error, with values ranging from 1 to 5 depending on the angle configuration used in the modules.

[78]  arXiv:2405.00688 (cross-list from cs.RO) [pdf, ps, other]
Title: Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis
Authors: Yuchen Du, Tho V. Le
Comments: 34 pages, 7 figures, 2 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

This article presents a comprehensive sentiment analysis (SA) of comments on YouTube videos related to Sidewalk Delivery Robots (SDRs). We manually annotated the collected YouTube comments with three sentiment labels: negative (0), positive (1), and neutral (2). We then constructed models for text sentiment classification and tested the models' performance on both binary and ternary classification tasks in terms of accuracy, precision, recall, and F1 score. Our results indicate that, in binary classification tasks, the Support Vector Machine (SVM) model using Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram features achieves the highest accuracy. In ternary classification tasks, the model using Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory Networks (LSTM), and Gated Recurrent Unit (GRU) significantly outperforms other machine learning models, achieving an accuracy, precision, recall, and F1 score of 0.78. Additionally, we employ the Latent Dirichlet Allocation model to generate 10 topics from the comments to explore the public's underlying views on SDRs. Drawing from these findings, we propose targeted recommendations for shaping future policies concerning SDRs. This work provides valuable insights for stakeholders in the SDR sector regarding social perception, interaction, and safety.

[79]  arXiv:2405.00695 (cross-list from cs.RO) [pdf, other]
Title: Joint torques prediction of a robotic arm using neural networks
Comments: 6 pages, 5 figures, submitted to CASE 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Accurate dynamic models are crucial for many robotic applications. Traditional approaches to deriving these models are based on the application of Lagrangian or Newtonian mechanics. Although these methods provide a good insight into the physical behaviour of the system, they rely on the exact knowledge of parameters such as inertia, friction and joint flexibility. In addition, the system is often affected by uncertain and nonlinear effects, such as saturation and dead zones, which can be difficult to model. A popular alternative is the application of Machine Learning (ML) techniques - e.g., Neural Networks (NNs) - in the context of a "black-box" methodology. This paper reports on our experience with this approach for a real-life 6 degrees of freedom (DoF) manipulator. Specifically, we considered several NN architectures: single NN, multiple NNs, and cascade NN. We compared the performance of the system by using different policies for selecting the NN hyperparameters. Our experiments reveal that the best accuracy and performance are obtained by a cascade NN, in which we encode our prior physical knowledge about the dependencies between joints, complemented by an appropriate optimisation of the hyperparameters.

[80]  arXiv:2405.00697 (cross-list from q-fin.CP) [pdf, other]
Title: Pricing Catastrophe Bonds -- A Probabilistic Machine Learning Approach
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG); Pricing of Securities (q-fin.PR); Applications (stat.AP)

This paper proposes a probabilistic machine learning method to price catastrophe (CAT) bonds in the primary market. The proposed method combines machine-learning-based predictive models with Conformal Prediction, an innovative algorithm that generates distribution-free probabilistic forecasts for CAT bond prices. Using primary market CAT bond transaction records between January 1999 and March 2021, the proposed method is found to be more robust and yields more accurate predictions of the bond spreads than traditional regression-based methods. Furthermore, the proposed method generates more informative prediction intervals than linear regression and identifies important nonlinear relationships between various risk factors and bond spreads, suggesting that linear regressions could misestimate the bond spreads. Overall, this paper demonstrates the potential of machine learning methods in improving the pricing of CAT bonds.
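
The distribution-free intervals mentioned above can be illustrated with a minimal sketch of split conformal prediction. It assumes the standard absolute-residual score on a held-out calibration set and is not necessarily the variant used in the paper. The calibration spreads, the point forecast, and the function names are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the residual to use
    if k > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return sorted(residuals)[k - 1]


def prediction_interval(point_forecast, calib_true, calib_pred, alpha=0.1):
    """Symmetric interval around a point forecast with target coverage 1 - alpha."""
    residuals = [abs(y - y_hat) for y, y_hat in zip(calib_true, calib_pred)]
    q = conformal_quantile(residuals, alpha)
    return point_forecast - q, point_forecast + q


# Hypothetical calibration set: observed spreads and model predictions.
calib_true = [4.1, 6.3, 2.9, 8.0, 5.5, 7.2, 3.8, 6.9, 4.7]
calib_pred = [4.5, 5.8, 3.4, 7.1, 5.6, 6.5, 4.4, 7.0, 5.5]

# Interval for a new bond whose predicted spread is 6.0 (hypothetical value).
lower, upper = prediction_interval(6.0, calib_true, calib_pred, alpha=0.25)
```

The interval width depends only on the calibration residuals, so any underlying regression model can be plugged in without distributional assumptions, provided the calibration and test points are exchangeable.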

[81]  arXiv:2405.00699 (cross-list from cs.NE) [pdf, other]
Title: Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Spiking Neural Networks (SNNs) are acknowledged as the next generation of Artificial Neural Networks (ANNs) and hold great promise in effectively processing spatial-temporal information. However, the choice of timestep becomes crucial as it significantly impacts the accuracy of the neural network training. Specifically, a smaller timestep indicates better performance in efficient computing, resulting in reduced latency and operations. Yet using a small timestep may lead to low accuracy due to insufficient information presentation with few spikes. This observation motivates us to develop an SNN that is more reliable for adaptive timestep by introducing a novel regularisation technique, namely Spatial-Temporal Regulariser (STR). Our approach regulates the ratio between the strength of spikes and membrane potential at each timestep. This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN. Through extensive experiments on frame-based and event-based datasets, our method, in combination with cutoff based on softmax output, achieves state-of-the-art performance in terms of both latency and accuracy. Notably, with STR and cutoff, the SNN achieves 2.14 to 2.89 times faster inference compared to the pre-configured timestep, with a near-zero accuracy drop of 0.50% to 0.64% on the event-based datasets. Code available: https://github.com/Dengyu-Wu/AOI-SNN-Regularisation

[82]  arXiv:2405.00705 (cross-list from cs.CL) [pdf, other]
Title: SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality data from vast datasets to curate small yet effective datasets has emerged as a critical challenge. In this paper, we introduce SHED, an automated dataset refinement framework based on Shapley value for instruction fine-tuning. SHED eliminates the need for human intervention or the use of commercial LLMs. Moreover, the datasets curated through SHED exhibit transferability, indicating they can be reused across different LLMs with consistently high performance. We conduct extensive experiments to evaluate the datasets curated by SHED. The results demonstrate SHED's superiority over state-of-the-art methods across various tasks and LLMs; notably, datasets comprising only 10% of the original data selected by SHED achieve performance comparable to or surpassing that of the full datasets.

[83]  arXiv:2405.00708 (cross-list from cs.CL) [pdf, other]
Title: Interactive Analysis of LLMs using Meaningful Counterfactuals
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Second, to make the solution scalable to long-form text, users should be equipped with tools to create batches of counterfactuals from perturbations at various granularity levels and interactively analyze the results. In this paper, we tackle the above challenges and contribute 1) a novel algorithm for generating batches of complete and meaningful textual counterfactuals by removing and replacing text segments in different granularities, and 2) LLM Analyzer, an interactive visualization tool to help users understand an LLM's behaviors by interactively inspecting and aggregating meaningful counterfactuals. We evaluate the proposed algorithm by the grammatical correctness of its generated counterfactuals using 1,000 samples from medical, legal, finance, education, and news datasets. In our experiments, 97.2% of the counterfactuals are grammatically correct. Through a use case, user studies, and feedback from experts, we demonstrate the usefulness and usability of the proposed interactive visualization tool.

[84]  arXiv:2405.00709 (cross-list from cs.CL) [pdf, other]
Title: Evaluating Tool-Augmented Agents in Remote Sensing Platforms
Comments: ICLR 2024 Machine Learning for Remote Sensing (ML4RS) Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Tool-augmented Large Language Models (LLMs) have shown impressive capabilities in remote sensing (RS) applications. However, existing benchmarks assume question-answering input templates over predefined image-text data pairs. These standalone instructions neglect the intricacies of realistic user-grounded tasks. Consider a geospatial analyst: they zoom in a map area, they draw a region over which to collect satellite imagery, and they succinctly ask "Detect all objects here". Where is `here`, if it is not explicitly hardcoded in the image-text template, but instead is implied by the system state, e.g., the live map positioning? To bridge this gap, we present GeoLLM-QA, a benchmark designed to capture long sequences of verbal, visual, and click-based actions on a real UI platform. Through in-depth evaluation of state-of-the-art LLMs over a diverse set of 1,000 tasks, we offer insights towards stronger agents for RS applications.

[85]  arXiv:2405.00710 (cross-list from cs.CL) [pdf, ps, other]
Title: Homonym Sense Disambiguation in the Georgian Language
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy for predicting lexical meanings of homonyms using a hand-classified dataset of over 7500 sentences.

[86]  arXiv:2405.00712 (cross-list from eess.SP) [pdf, other]
Title: SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, and from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera sensors due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.

[87]  arXiv:2405.00715 (cross-list from cs.CL) [pdf, other]
Title: Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) have shown promising capabilities in handling clinical text summarization tasks. In this study, we demonstrate that a small open-source LLM can be effectively trained to generate high-quality clinical notes from outpatient patient-doctor dialogues. We achieve this through a comprehensive domain- and task-specific adaptation process for the LLaMA-2 13 billion parameter model. This process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced an enhanced approach, termed DistillDirect, for performing on-policy reinforcement learning with Gemini Pro serving as the teacher model. Our resulting model, LLaMA-Clinic, is capable of generating clinical notes that are comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (90.4%) of individual evaluations rated the notes generated by LLaMA-Clinic as "acceptable" or higher across all three criteria: real-world readiness, completeness, and accuracy. Notably, in the more challenging "Assessment and Plan" section, LLaMA-Clinic scored higher (4.2/5) in real-world readiness compared to physician-authored notes (4.1/5). Additionally, we identified caveats in public clinical note datasets, such as ACI-BENCH. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format. Overall, our research demonstrates the potential and feasibility of training smaller, open-source LLMs to assist with clinical documentation, capitalizing on healthcare institutions' access to patient records and domain expertise. We have made our newly created synthetic clinic dialogue-note dataset and the physician feedback dataset publicly available to foster future research in this field.

[88]  arXiv:2405.00719 (cross-list from eess.SP) [pdf, other]
Title: EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces
Comments: 10 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer.

[89]  arXiv:2405.00720 (cross-list from eess.SP) [pdf, other]
Title: A Novel Machine Learning-based Equalizer for a Downstream 100G PAM-4 PON
Comments: 3 pages, 6 figures, accepted by Optical Fiber Communications Conference and Exhibition 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

A frequency-calibrated SCINet (FC-SCINet) equalizer is proposed for down-stream 100G PON with 28.7 dB path loss. At 5 km, FC-SCINet improves the BER by 88.87% compared to FFE and a 3-layer DNN with 10.57% lower complexity.

[90]  arXiv:2405.00721 (cross-list from eess.SP) [pdf, other]
Title: Optimizing Brain-Computer Interface Performance: Advancing EEG Signals Channel Selection through Regularized CSP and SPEA II Multi-Objective Optimization
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Brain-computer interface (BCI) systems and the recording of brain activity have garnered significant attention across a diverse spectrum of applications. EEG signals have emerged as a modality for recording neural electrical activity. Among the methodologies designed for feature extraction from EEG data, the Regularized Common Spatial Pattern (RCSP) method has proven to be an effective approach, particularly in the context of motor imagery (MI) tasks. RCSP exhibits efficacy in the discrimination and classification of EEG signals. In optimizing the performance of this method, our research extends to a comparative analysis with conventional CSP techniques, as well as optimized methodologies designed for similar applications. Notably, we employ the meta-heuristic multi-objective Strength Pareto Evolutionary Algorithm II (SPEA-II) as a pivotal component of our research paradigm. This is a state-of-the-art approach to selecting a subset of channels from a multichannel EEG signal in MI tasks. Our main objective is to formulate an optimum channel selection strategy aimed at identifying the most pertinent subset of channels from the multi-dimensional electroencephalogram (EEG) signals. One of the primary objectives of channel selection in EEG signal analysis is the reduction of the channel count, which enhances user comfort when utilizing gel-based EEG electrodes. Additionally, within this research, we made use of ensemble learning models as a component of our decision-making. This technique serves to mitigate the challenges associated with overfitting, especially when confronted with an extensive array of potentially redundant EEG channels and data noise. Our findings not only affirm the performance of RCSP in MI-based BCI systems, but also underscore the significance of channel selection strategies and ensemble learning techniques in optimizing the performance of EEG signal classification.

[91]  arXiv:2405.00723 (cross-list from eess.SP) [pdf, other]
Title: EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adjacency matrix method, has notably enhanced EEG MI signal classification, evidenced by an average accuracy of 83.95% across 20 subjects on the PhysioNet dataset. This significantly exceeds the 76.10% accuracy rate achieved using the Pearson Correlation Coefficient (PCC) method within the same framework.
In this research, we advance the field by applying a Reinforcement Learning (RL) approach to the classification of EEG MI signals. Our innovative method empowers the RL agent, enabling not only the classification of EEG MI data points with higher accuracy, but also the effective identification of EEG MI data points that are less distinct. We present the EEG_RL-Net, an enhancement of the EEG_GLT-Net framework, which incorporates the trained EEG GCN Block from EEG_GLT-Net at an adjacency matrix density of 13.39% alongside the RL-centric Dueling Deep Q Network (Dueling DQN) block. The EEG_RL-Net model showcases exceptional classification performance, achieving an unprecedented average accuracy of 96.40% across 20 subjects within 25 milliseconds. This model illustrates the transformative effect of RL in EEG MI time point classification.
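As a rough illustration of the Pearson Correlation Coefficient (PCC) baseline mentioned in the abstract above, the sketch below builds an electrode adjacency matrix from per-channel signals. The function names, the toy four-channel data, and the choice to use the absolute correlation as the edge weight are assumptions made for illustration; the abstract does not specify how the PCC adjacency is constructed, and this is not the EEG_GLT method.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def pcc_adjacency(channels):
    """Hypothetical helper: edge weight = |PCC| between two channels (an assumption)."""
    n = len(channels)
    return [[abs(pearson(channels[i], channels[j])) for j in range(n)]
            for i in range(n)]

# Toy data standing in for four EEG channels (not real recordings).
channels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with channel 0
    [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with channel 0
    [1.0, -1.0, 1.0, -1.0],  # only weakly related to channel 0
]
adj = pcc_adjacency(channels)
```

A graph neural network would then treat `adj` as the weighted graph over electrodes; a learned adjacency replaces this fixed matrix with one optimised during training.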

[92]  arXiv:2405.00724 (cross-list from eess.SP) [pdf, other]
Title: Baseline Drift Tolerant Signal Encoding for ECG Classification with Deep Learning
Comments: 4 pages, 3 figures. Submitted to 46th Annual International Conference of the IEEE Engineering in Medicine and Biology 2024
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Common artefacts such as baseline drift, rescaling, and noise critically limit the performance of machine learning-based automated ECG analysis and interpretation. This study proposes Derived Peak (DP) encoding, a non-parametric method that generates signed spikes corresponding to zero crossings of the signal's first and second-order time derivatives. Notably, DP encoding is invariant to shift and scaling artefacts, and its implementation is further simplified by the absence of user-defined parameters. DP encoding was used to encode the 12-lead ECG data from the PTB-XL dataset (n=18,869 participants) and was fed to 1D-ResNet-18 models trained to identify myocardial infarction, conductive deficits and ST-segment abnormalities. Robustness to artefacts was assessed by corrupting ECG data with sinusoidal baseline drift, shift, rescaling and noise, before encoding. The addition of these artefacts resulted in a significant drop in accuracy for seven other methods from prior art, while DP encoding maintained a baseline AUC of 0.88 under drift, shift and rescaling. DP achieved superior performance to unencoded inputs in the presence of shift (AUC under 1mV shift: 0.91 vs 0.62), and rescaling artefacts (AUC 0.91 vs 0.79). Thus, DP encoding is a simple method by which robustness to common ECG artefacts may be improved for automated ECG analysis and interpretation.
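The idea of emitting signed spikes at zero crossings of the first and second derivatives can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: the finite-difference derivatives, the function names, the spike sign convention, and the toy integer signal are all assumptions. It shows why such an encoding is unchanged by a constant shift or a positive rescaling, since both leave the signs of the differences intact.

```python
def diff(values):
    """First-order finite difference."""
    return [b - a for a, b in zip(values, values[1:])]

def sign(v):
    return (v > 0) - (v < 0)

def zero_crossing_spikes(values):
    """+1 for a negative-to-positive crossing, -1 for positive-to-negative, else 0."""
    spikes = []
    for a, b in zip(values, values[1:]):
        spikes.append(sign(b) if sign(a) * sign(b) < 0 else 0)
    return spikes

def dp_encode_sketch(signal):
    """Hypothetical sketch: spike trains from the first and second differences."""
    d1 = diff(signal)
    d2 = diff(d1)
    return zero_crossing_spikes(d1), zero_crossing_spikes(d2)

signal = [0, 1, 3, 2, 0, 1, 4, 4, 2]   # toy samples, not ECG data
shifted = [v + 5 for v in signal]      # constant baseline shift
scaled = [3 * v for v in signal]       # positive rescaling

encoded = dp_encode_sketch(signal)
```

Encoding `shifted` or `scaled` gives the same spike trains as encoding `signal`, which is the invariance the abstract describes.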

[93]  arXiv:2405.00725 (cross-list from eess.SP) [pdf, other]
Title: Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data
Comments: Accepted for ICMHI 2024
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate diagnosis while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained on data aggregated from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.

[94]  arXiv:2405.00727 (cross-list from eess.SP) [pdf, other]
Title: Generalised envelope spectrum-based signal-to-noise objectives: Formulation, optimisation and application for gear fault detection under time-varying speed conditions
Comments: 27 pages, 15 figures, tables 1, submitted MSSP review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Methodology (stat.ME)

In vibration-based condition monitoring, optimal filter design improves fault detection by enhancing weak fault signatures within vibration signals. This process involves optimising a derived objective function from a defined objective. The objectives are often based on proxy health indicators to determine the filter's parameters. However, these indicators can be compromised by irrelevant extraneous signal components and fluctuating operational conditions, affecting the filter's efficacy. Fault detection primarily uses the fault component's prominence in the squared envelope spectrum, quantified by a squared envelope spectrum-based signal-to-noise ratio. New optimal filter objective functions are derived from the proposed generalised envelope spectrum-based signal-to-noise objective for machines operating under variable speed conditions. Instead of optimising proxy health indicators, the optimal filter coefficients of the formulation directly maximise the squared envelope spectrum-based signal-to-noise ratio over targeted frequency bands using standard gradient-based optimisers. Four derived objective functions from the proposed objective effectively outperform five prominent methods in tests on three experimental datasets.

[95]  arXiv:2405.00732 (cross-list from cs.CL) [pdf, other]
Title: LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs). LoRA reduces the number of trainable parameters and memory usage while achieving comparable performance to full fine-tuning. We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications. First, we measure the quality of LLMs fine-tuned with quantized low rank adapters across 10 base models and 31 tasks for a total of 310 models. We find that 4-bit LoRA fine-tuned models outperform base models by 34 points and GPT-4 by 10 points on average. Second, we investigate the most effective base models for fine-tuning and assess the correlative and predictive capacities of task complexity heuristics in forecasting the outcomes of fine-tuning. Finally, we evaluate the latency and concurrency capabilities of LoRAX, an open-source Multi-LoRA inference server that facilitates the deployment of multiple LoRA fine-tuned models on a single GPU using shared base model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory. LoRA Land highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM.
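To make the parameter saving concrete, the sketch below counts trainable parameters for one weight matrix under full fine-tuning versus a rank-r LoRA update, in which the frozen weight W (d_out x d_in) is adapted by a product B A with B of shape d_out x r and A of shape r x d_in. The layer size and rank are illustrative assumptions, not values taken from the report.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for the low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)

# Illustrative sizes only (assumed, not from the report).
d_out = d_in = 4096
rank = 8

full = full_finetune_params(d_out, d_in)
low_rank = lora_params(d_out, d_in, rank)
reduction = full // low_rank
print(f"full: {full}, LoRA: {low_rank}, reduction: {reduction}x")
```

Because each adapter is this small, many adapters can share one copy of the base weights, which is the property a multi-adapter server relies on.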

[96]  arXiv:2405.00734 (cross-list from eess.SP) [pdf, other]
Title: EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms. This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding.

[97]  arXiv:2405.00736 (cross-list from eess.SP) [pdf, other]
Title: Joint Signal Detection and Automatic Modulation Classification via Deep Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData).

[98]  arXiv:2405.00738 (cross-list from cs.AR) [pdf, other]
Title: HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
Comments: 7 pages, 2 figures
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmental concerns, demands high operational costs, and causes GPUs to be unsuitable for edge computing. We develop an accelerator for transformers, namely, Llama 2, an open-source state-of-the-art LLM, using high level synthesis (HLS) on Field Programmable Gate Arrays (FPGAs). HLS allows us to rapidly prototype FPGA designs without writing code at the register-transfer level (RTL). We name our method HLSTransform, and the FPGA designs we synthesize with HLS achieve up to a 12.75x reduction and 8.25x reduction in energy used per token on the Xilinx Virtex UltraScale+ VU9P FPGA compared to an Intel Xeon Broadwell E5-2686 v4 CPU and NVIDIA RTX 3090 GPU respectively, while increasing inference speeds by up to 2.46x compared to CPU and maintaining 0.53x the speed of an RTX 3090 GPU despite the GPU's 4 times higher base clock rate. With the lack of existing open-source FPGA accelerators for transformers, we open-source our code and document our steps for synthesis. We hope this work will serve as a step in democratizing the use of FPGAs in transformer inference and inspire research into energy-efficient inference methods as a whole. The code can be found on https://github.com/HLSTransform/submission.

[99]  arXiv:2405.00740 (cross-list from cs.CV) [pdf, other]
Title: Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Comments: 14 pages, 8 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% on zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

[100]  arXiv:2405.00742 (cross-list from cs.CR) [pdf, other]
Title: Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks
Comments: 11 pages, 4 figures
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV charging, the stability of the power grid, and cost-effective infrastructure expansion. However, existing methods either suffer from data privacy issues and susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message-passing scheme that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, to address cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach.

[101]  arXiv:2405.00749 (cross-list from cs.CV) [pdf, other]
Title: More is Better: Deep Domain Adaptation with Multiple Sources
Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) aims to address this problem by aligning the distributions between the source and target domains. Multi-source domain adaptation (MDA) is a powerful and practical extension in which the labeled data may be collected from multiple sources with different distributions. In this survey, we first define various MDA strategies. Then we systematically summarize and compare modern MDA methods in the deep learning era from different perspectives, followed by commonly used datasets and a brief benchmark. Finally, we discuss future research directions for MDA that are worth investigating.

[102]  arXiv:2405.00751 (cross-list from q-bio.QM) [pdf, other]
Title: F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching
Comments: Accepted by ICLR 2024 GEM workshop
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a \underline{Frame-to-Frame} generative model with guided \underline{Flow}-matching (F$^3$low) for enhanced sampling, which (a) extends the domain of CG modeling to the SE(3) Riemannian manifold; (b) treats CGMD simulations as autoregressive sampling guided by the former frame via flow-matching models; (c) targets the protein backbone, offering improved insights into secondary structure formation and intricate folding pathways. Compared to previous methods, F$^3$low allows for broader exploration of conformational space. The ability to rapidly generate diverse conformations via a force-free generative paradigm on SE(3) paves the way toward efficient enhanced sampling methods.

[103]  arXiv:2405.00754 (cross-list from cs.CV) [pdf, other]
Title: CLIPArTT: Light-weight Adaptation of CLIP to New Domains at Test Time
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Pre-trained vision-language models (VLMs), exemplified by CLIP, demonstrate remarkable adaptability across zero-shot classification tasks without additional training. However, their performance diminishes in the presence of domain shifts. In this study, we introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for CLIP, which involves automatic text prompts construction during inference for their use as text supervision. Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as pseudo label to re-classify inputs in a transductive manner. Additionally, we pioneer the standardization of TTA benchmarks (e.g., TENT) in the realm of VLMs. Our findings demonstrate that, without requiring additional transformations nor new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets such as CIFAR-10, corrupted datasets like CIFAR-10-C and CIFAR-10.1, alongside synthetic datasets such as VisDA-C. This research underscores the potential for improving VLMs' adaptability through novel test-time strategies, offering insights for robust performance across varied datasets and environments. The code can be found at: https://github.com/dosowiechi/CLIPArTT.git

[104]  arXiv:2405.00755 (cross-list from cs.ET) [pdf, other]
Title: Quantum AI for Alzheimer's disease early screening
Comments: 18 pages, 6 figures
Subjects: Emerging Technologies (cs.ET); Machine Learning (cs.LG); Quantum Physics (quant-ph)

Quantum machine learning is a new research field combining quantum information science and machine learning. Quantum computing technologies seem to be particularly well suited to solving problems in the health sector in an efficient way, because they may deal with large datasets more efficiently than classical AI.
Alzheimer's disease is a neurodegenerative brain disorder that mostly affects elderly people, causing important cognitive impairments. It is the most common cause of dementia and it has an effect on memory, thought, learning abilities and movement control. This type of disease has no cure; consequently, early diagnosis is fundamental for reducing its impact. The analysis of handwriting can be effective for diagnosis, as many studies have conjectured. The DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset contains handwriting samples from people affected by Alzheimer's disease and a group of healthy people. Here we apply quantum AI to this use-case. In particular, we use this dataset to test kernel methods for the classification task and compare their performance with that obtained via quantum machine learning methods. We find that quantum and classical algorithms achieve similar performance and in some cases quantum methods perform even better.
Our results pave the way for future new quantum machine learning applications in early-screening diagnostics in the healthcare domain.

[105]  arXiv:2405.00770 (cross-list from quant-ph) [pdf, other]
Title: Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises
Comments: 14 pages, 3 figures
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Machine Learning (cs.LG)

We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly with a larger-than-exponentially-small probability. This unconditional near-optimal quantum-classical separation originates from the quantum nonlocality property that distinguishes quantum circuits from their classical counterparts. We further derive the noise thresholds for demonstrating such a separation on near-term quantum devices under the depolarization noise model. We prove that this separation will persist if the noise strength is upper bounded by an inverse polynomial with respect to the system size, and vanish if the noise strength is greater than an inverse polylogarithmic function. In addition, for quantum devices with constant noise strength, we prove that no super-polynomial classical-quantum separation exists for any classification task defined by shallow Clifford circuits, independent of the structures of the circuits that specify the learning models.

[106]  arXiv:2405.00781 (cross-list from quant-ph) [pdf, other]
Title: A Review of Barren Plateaus in Variational Quantum Computing
Comments: 21 pages, 10 boxes
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

Variational quantum computing offers a flexible computational paradigm with applications in diverse areas. However, a key obstacle to realizing its potential is the Barren Plateau (BP) phenomenon. When a model exhibits a BP, its parameter optimization landscape becomes exponentially flat and featureless as the problem size increases. Importantly, all the moving pieces of an algorithm -- choices of ansatz, initial state, observable, loss function and hardware noise -- can lead to BPs when ill-suited. Due to the significant impact of BPs on trainability, researchers have dedicated considerable effort to developing theoretical and heuristic methods to understand and mitigate their effects. As a result, the study of BPs has become a thriving area of research, influencing and cross-fertilizing other fields such as quantum optimal control, tensor networks, and learning theory. This article provides a comprehensive review of the current understanding of the BP phenomenon.

[107]  arXiv:2405.00782 (cross-list from math.DS) [pdf, other]
Title: Rigged Dynamic Mode Decomposition: Data-Driven Generalized Eigenfunction Decompositions for Koopman Operators
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Spectral Theory (math.SP)

We introduce the Rigged Dynamic Mode Decomposition (Rigged DMD) algorithm, which computes generalized eigenfunction decompositions of Koopman operators. By considering the evolution of observables, Koopman operators transform complex nonlinear dynamics into a linear framework suitable for spectral analysis. While powerful, traditional Dynamic Mode Decomposition (DMD) techniques often struggle with continuous spectra. Rigged DMD addresses these challenges with a data-driven methodology that approximates the Koopman operator's resolvent and its generalized eigenfunctions using snapshot data from the system's evolution. At its core, Rigged DMD builds wave-packet approximations for generalized Koopman eigenfunctions and modes by integrating Measure-Preserving Extended Dynamic Mode Decomposition with high-order kernels for smoothing. This provides a robust decomposition encompassing both discrete and continuous spectral elements. We derive explicit high-order convergence theorems for generalized eigenfunctions and spectral measures. Additionally, we propose a novel framework for constructing rigged Hilbert spaces using time-delay embedding, significantly extending the algorithm's applicability. We provide examples, including systems with a Lebesgue spectrum, integrable Hamiltonian systems, the Lorenz system, and a high-Reynolds number lid-driven flow in a two-dimensional square cavity, demonstrating Rigged DMD's convergence, efficiency, and versatility. This work paves the way for future research and applications of decompositions with continuous spectra.

[108]  arXiv:2405.00790 (cross-list from cs.AR) [pdf, other]
Title: SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)

Emerging multi-model workloads with heavy models like recent large language models have significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture has become a key problem. Among recent solutions, the 2.5D silicon interposer multi-chip module (MCM)-based AI accelerator has been actively explored as a promising scalable solution due to its significant benefits in low engineering cost and composability. However, previous MCM accelerators are based on homogeneous architectures with fixed dataflow, which encounter major challenges from highly heterogeneous multi-model workloads due to their limited workload adaptivity. Therefore, in this work, we explore the opportunity in heterogeneous dataflow MCM AI accelerators. We identify that scheduling multi-model workloads on heterogeneous dataflow MCM AI accelerators is an important and challenging problem due to its significance and scale, which reaches O(10^18) even for a single-model case on 6x6 chiplets. We develop a set of heuristics to navigate the huge scheduling space and codify them into a scheduler with advanced techniques such as inter-chiplet pipelining. Our evaluation on ten multi-model workload scenarios for datacenter multitenancy and AR/VR use-cases has shown the efficacy of our approach, achieving on average 35.3% and 31.4% less energy-delay product (EDP) for the respective application settings compared to homogeneous baselines.

[109]  arXiv:2405.00816 (cross-list from cs.SI) [pdf, ps, other]
Title: Sifting out communities in large sparse networks
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

Research data sets are growing to unprecedented sizes and network modeling is commonly used to extract complex relationships in diverse domains, such as genetic interactions involved in disease, logistics, and social communities. As the number of nodes increases in a network, an increasing sparsity of edges is a practical limitation due to memory restrictions. Moreover, many of these sparse networks exhibit very large numbers of nodes with no adjacent edges, as well as disjoint components of nodes with no edges connecting them. A prevalent aim in network modeling is the identification of clusters, or communities, of nodes that are highly interrelated. Several definitions of strong community structure have been introduced to facilitate this task, each with inherent assumptions and biases. We introduce an intuitive objective function for quantifying the quality of clustering results in large sparse networks. We utilize a two-step method for identifying communities which is especially well-suited for this domain as the first step efficiently divides the network into the disjoint components, while the second step optimizes clustering of the produced components based on the new objective. Using simulated networks, optimization based on the new objective function consistently yields significantly higher accuracy than those based on the modularity function, with the widest gaps appearing for the noisiest networks. Additionally, applications to benchmark problems illustrate the intuitive correctness of our approach. Finally, the practicality of our approach is demonstrated in real-world data in which we identify complex genetic interactions in large-scale networks comprised of tens of thousands of nodes. Based on these three different types of trials, our results clearly demonstrate the usefulness of our two-step procedure and the accuracy of our simple objective.

[110]  arXiv:2405.00820 (cross-list from cs.AR) [pdf, other]
Title: HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets.
In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks and curated open-source HLS designs.
We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking.
Code at https://github.com/sharc-lab/HLSFactory.

[111]  arXiv:2405.00842 (cross-list from math.ST) [pdf, other]
Title: Quickest Change Detection with Confusing Change
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)

In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of interest. Our objective is to detect a bad change as quickly as possible while avoiding raising a false alarm for pre-change or a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. Proposing novel CuSum-based detection procedures, S-CuSum and J-CuSum, leveraging two CuSum statistics, we offer solutions applicable across all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them by numerical results. Furthermore, both procedures are computationally efficient as they only require simple recursive updates.
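
For readers unfamiliar with the recursive update mentioned above, the following is a minimal sketch of the standard single-statistic CuSum test, not the S-CuSum or J-CuSum procedures proposed in the paper. The function names, the Gaussian mean-shift model, and the threshold value are illustrative assumptions.

```python
def cusum_stopping_time(observations, log_likelihood_ratio, threshold):
    """Return the 1-based index at which the CuSum statistic first
    reaches the threshold, or None if it never does."""
    statistic = 0.0
    for n, x in enumerate(observations, start=1):
        # Recursive update: W_n = max(0, W_{n-1} + log(f1(x_n) / f0(x_n)))
        statistic = max(0.0, statistic + log_likelihood_ratio(x))
        if statistic >= threshold:
            return n
    return None


def gaussian_mean_shift_llr(mu0, mu1, sigma):
    """Build log(f1(x) / f0(x)) for N(mu1, sigma^2) against N(mu0, sigma^2).
    (Hypothetical helper; the Gaussian model is an assumption.)"""
    def llr(x):
        return ((mu1 - mu0) / sigma ** 2) * (x - (mu0 + mu1) / 2.0)
    return llr


# Illustrative data: the mean shifts from 0 to 1 at the fourth sample.
samples = [0, 0, 0, 1, 1, 1, 1]
alarm_time = cusum_stopping_time(samples, gaussian_mean_shift_llr(0, 1, 1), 1.5)
```

Each new observation costs one addition and one comparison, which is the sense in which such procedures need only simple recursive updates.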

[112]  arXiv:2405.00846 (cross-list from cs.RO) [pdf, other]
Title: Gameplay Filters: Safe Robot Walking through Adversarial Imagination
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause it to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.

[113]  arXiv:2405.00871 (cross-list from eess.SY) [pdf, other]
Title: Learning to Boost the Performance of Stable Nonlinear Systems
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee Lp closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.

[114]  arXiv:2405.00876 (cross-list from cs.CV) [pdf, other]
Title: Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, a universal segmentation model by Meta AI, Segment Anything Model (SAM) shows unprecedented performance at isolating objects from unforeseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance and merit, it is indubitably essential to test the performance of VLMs and foundation models such as SAM, on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable at isolating artefacts in a general sense. However, the performance is not close to that of a domain expert - the models are readily encumbered by the introduction of impurities, defects, artefact overlaps and diversity present in the images.

[115]  arXiv:2405.00906 (cross-list from cs.CV) [pdf, other]
Title: LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets
Authors: Ojasw Upadhyay
Comments: 3 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vision transformers have revolutionized computer vision, but their computational demands present challenges for training and deployment. This paper introduces LOTUS (LOttery Transformers with Ultra Sparsity), a novel method that leverages data lottery ticket selection and sparsity pruning to accelerate vision transformer training while maintaining accuracy. Our approach focuses on identifying and utilizing the most informative data subsets and eliminating redundant model parameters to optimize the training process. Through extensive experiments, we demonstrate the effectiveness of LOTUS in achieving rapid convergence and high accuracy with significantly reduced computational requirements. This work highlights the potential of combining data selection and sparsity techniques for efficient vision transformer training, opening doors for further research and development in this area.

[116]  arXiv:2405.00908 (cross-list from cs.CV) [pdf, ps, other]
Title: Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Background and Purpose: Identifying the thromboembolism source in ischemic stroke is crucial for treatment and secondary prevention yet is often undetermined. This study describes a self-supervised deep learning approach in digital pathology of emboli for classifying ischemic stroke clot origin from histopathological images. Methods: The dataset included whole slide images (WSI) from the STRIP AI Kaggle challenge, consisting of retrieved clots from ischemic stroke patients following mechanical thrombectomy. Transformer-based deep learning models were developed using transfer learning and self-supervised pretraining for classifying WSI. Customizations included an attention pooling layer, weighted loss function, and threshold optimization. Various model architectures were tested and compared, and model performances were primarily evaluated using weighted logarithmic loss. Results: The model achieved a logloss score of 0.662 in cross-validation and 0.659 on the test set. Different model backbones were compared, with swin_large_patch4_window12_384 showing higher performance. Thresholding techniques for clot origin classification were employed to balance false positives and negatives. Conclusion: The study demonstrates the extent of efficacy of transformer-based deep learning models in identifying ischemic stroke clot origins from histopathological images and emphasizes the need for refined modeling techniques specifically adapted to thrombi WSI. Further research is needed to improve model performance and interpretability and to validate its effectiveness. Future enhancements could include integrating larger patient cohorts, advanced preprocessing strategies, and exploring ensemble multimodal methods for enhanced diagnostic accuracy.

[117]  arXiv:2405.00914 (cross-list from math.OC) [pdf, other]
Title: Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization
Authors: Chris Junchi Li
Comments: arXiv admin note: text overlap with arXiv:2307.00126
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper presents a new algorithm member for accelerating first-order methods for bilevel optimization, namely the (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation, abbreviated as (P)RAF${}^2$BA. The algorithm leverages fully first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, enhancing oracle complexity for efficient optimization. Theoretical guarantees for finding approximate first-order stationary points and second-order stationary points at the state-of-the-art query complexities are established, showcasing their effectiveness in solving complex optimization tasks. Empirical studies for real-world problems are provided to further validate the outperformance of our proposed algorithms. The significance of (P)RAF${}^2$BA in optimizing nonconvex-strongly-convex bilevel optimization problems is underscored by its state-of-the-art convergence rates and computational efficiency.

[118]  arXiv:2405.00915 (cross-list from cs.CV) [pdf, other]
Title: EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Comments: 25 pages. 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.

[119]  arXiv:2405.00934 (cross-list from eess.AS) [pdf, ps, other]
Title: Benchmarking Representations for Speech, Music, and Acoustic Events
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods, and is useful to pinpoint promising research directions.

[120]  arXiv:2405.00964 (cross-list from math.ST) [pdf, other]
Title: Deriving Lehmer and Hölder means as maximum weighted likelihood estimates for the multivariate exponential family
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)

The links between the mean families of Lehmer and Hölder and the weighted maximum likelihood estimator have recently been established in the case of a regular univariate exponential family. In this article, we will extend the outcomes obtained to the multivariate case. This extension provides a probabilistic interpretation of these families of means and could therefore broaden their uses in various applications.
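
As a reminder of the two families of means named above, here is a small sketch of their usual scalar definitions for positive numbers. It illustrates the definitions only, not the paper's likelihood-based derivation or its multivariate extension; the function names are illustrative.

```python
import math


def lehmer_mean(values, p):
    """Lehmer mean: sum(x^p) / sum(x^(p-1)), for positive values."""
    numerator = sum(x ** p for x in values)
    denominator = sum(x ** (p - 1) for x in values)
    return numerator / denominator


def holder_mean(values, p):
    """Hölder (power) mean: (mean(x^p))^(1/p), for positive values.
    The p = 0 case is taken as its limit, the geometric mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** p for x in values) / n) ** (1.0 / p)


data = [1, 2, 3]
arithmetic_via_lehmer = lehmer_mean(data, 1)   # p = 1 gives the arithmetic mean
arithmetic_via_holder = holder_mean(data, 1)   # p = 1 gives the arithmetic mean
harmonic_via_holder = holder_mean(data, -1)    # p = -1 gives the harmonic mean
```

Both families contain the arithmetic mean at p = 1, which is why they are often studied together.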

[121]  arXiv:2405.00972 (cross-list from cs.CL) [pdf, other]
Title: CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Quantitative Methods (q-bio.QM)

Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.

[122]  arXiv:2405.00988 (cross-list from cs.CL) [pdf, other]
Title: Context-Aware Clustering using Large Language Models
Comments: 16 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. We observed that powerful closed-source LLMs provide good quality clusterings of entity sets but are not scalable due to the massive compute power required and the associated costs. Thus, we propose CACTUS (Context-Aware ClusTering with aUgmented triplet losS), a systematic approach that leverages open-source LLMs for efficient and effective supervised clustering of entity subsets, particularly focusing on text-based entities. Existing text clustering methods fail to effectively capture the context provided by the entity subset. Moreover, though there are several language modeling based approaches for clustering, very few are designed for the task of supervised clustering. This paper introduces a novel approach towards clustering entity subsets using LLMs by capturing context via a scalable inter-entity attention mechanism. We propose a novel augmented triplet loss function tailored for supervised clustering, which addresses the inherent challenges of directly applying the triplet loss to this problem. Furthermore, we introduce a self-supervised clustering task based on text augmentation techniques to improve the generalization of our model. For evaluation, we collect ground truth clusterings from a closed-source LLM and transfer this knowledge to an open-source LLM under the supervised clustering framework, allowing a faster and cheaper open-source model to perform the same task. Experiments on various e-commerce query and product clustering datasets demonstrate that our proposed approach significantly outperforms existing unsupervised and supervised baselines under various external clustering evaluation metrics.

[123]  arXiv:2405.00989 (cross-list from cs.CV) [pdf, ps, other]
Title: Estimate the building height at a 10-meter resolution based on Sentinel data
Authors: Xin Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Building height is an important indicator for scientific research and practical application. However, building height products with a high spatial resolution (10m) are still very scarce. To meet the needs of high-resolution building height estimation models, this study established a set of spatial-spectral-temporal feature databases, combining SAR data provided by Sentinel-1, optical data provided by Sentinel-2, and shape data provided by building footprints. The statistical indicators on the time scale are extracted to form a rich database of 160 features. This study combined permutation feature importance, Shapley Additive Explanations, and Random Forest variable importance, and the final stable features were obtained through an expert scoring system. This study took 12 large, medium, and small cities in the United States as the training data. It used moving windows to aggregate the pixels to mitigate the impact of SAR image displacement and building shadows. This study built a building height model based on a random forest model and compared three model ensemble methods: bagging, boosting, and stacking. To evaluate the accuracy of the prediction results, this study collected Lidar data in the test area, and the evaluation results showed that its R-squared reached 0.78, indicating that building height can be estimated effectively. The fast production of high-resolution building height data can support large-scale scientific research and application in many fields.

[124]  arXiv:2405.01002 (cross-list from cs.CV) [pdf, other]
Title: Spider: A Unified Framework for Context-dependent Concept Understanding
Comments: Accepted by ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Unlike context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovation. Since there is a strong coupling relationship between foreground and background context in CD tasks, existing methods need to train separate models in their focused domains. This restricts their real-world CD concept understanding towards artificial general intelligence (AGI). We propose a unified model with a single set of parameters, Spider, which only needs to be trained once. With the help of the proposed concept filter driven by the image-mask group prompt, Spider is able to understand and distinguish diverse strong context-dependent concepts to accurately capture the Prompter's intention. Without bells and whistles, Spider significantly outperforms the state-of-the-art specialized models in 8 different context-dependent segmentation tasks, including 4 natural scenes (salient, camouflaged, and transparent objects and shadow) and 4 medical lesions (COVID-19, polyp, breast, and skin lesion with color colonoscopy, CT, ultrasound, and dermoscopy modalities). Besides, Spider shows obvious advantages in continuous learning. It can easily complete the training of new tasks by fine-tuning less than 1% of the parameters, with a tolerable performance degradation of less than 5% on all old tasks. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/Spider-UniCDSeg.

[125]  arXiv:2405.01004 (cross-list from cs.SD) [pdf, ps, other]
Title: Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the performance of inference for various ASR models on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs. We found that changing precision from FP32 to FP16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and number of parameters neither guarantees better resilience to noise, nor predicts the energy consumption for a given transcription load. These, along with several other findings, offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open sourced and available at https://github.com/zzadiues3338/ASR-energy-jetson.
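
Since the abstract reports results in terms of WER, a minimal sketch of how word-error rate is conventionally computed may help: the word-level edit distance between hypothesis and reference, divided by the reference length. The whitespace tokenization and the function name are simplifying assumptions, and the paper's exact text normalization is not stated here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.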

[126]  arXiv:2405.01015 (cross-list from stat.ML) [pdf, other]
Title: Network reconstruction via the minimum description length principle
Authors: Tiago P. Peixoto
Comments: 17 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Populations and Evolution (q-bio.PE)

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. As we demonstrate, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
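
The coupling between sparsity and shrinkage under $L_{1}$ regularization can be seen in the soft-thresholding operator, which is the proximal operator of the $L_{1}$ penalty. The sketch below is a generic illustration of that operator, not the MDL-based scheme proposed in the paper; the weights and the penalty value are made up for the example.

```python
def soft_threshold(weight, penalty):
    """Proximal operator of penalty * |w|: weights with magnitude at most
    `penalty` become exactly zero, and all others move toward zero by `penalty`."""
    if weight > penalty:
        return weight - penalty
    if weight < -penalty:
        return weight + penalty
    return 0.0


# Illustrative weights: one strong edge, one weak edge, one negative edge.
raw_weights = [2.0, 0.3, -1.0]
regularized = [soft_threshold(w, 0.5) for w in raw_weights]
```

The weak edge is removed, which is the desired sparsity, but the two surviving edges are also pulled toward zero by the same amount, which is the shrinkage bias the abstract refers to.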

[127]  arXiv:2405.01029 (cross-list from cs.AI) [pdf, other]
Title: MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Comments: Accepted at ICML 2024
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem, making them less generic and practical. In this paper, we aim to develop a unified neural solver that can cope with a range of VRP variants simultaneously. Specifically, we propose a multi-task vehicle routing solver with mixture-of-experts (MVMoE), which greatly enhances the model capacity without a proportional increase in computation. We further develop a hierarchical gating mechanism for the MVMoE, delivering a good trade-off between empirical performance and computational complexity. Experimentally, our method significantly improves the zero-shot generalization performance on 10 unseen VRP variants, and showcases decent results on the few-shot setting and real-world benchmark instances. We further provide extensive studies on the effect of MoE configurations in solving VRPs. Surprisingly, the hierarchical gating can achieve much better out-of-distribution generalization performance. The source code is available at: https://github.com/RoyalSkye/Routing-MVMoE.
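
To illustrate why a mixture-of-experts layer can add capacity without a proportional increase in computation, here is a minimal sketch of generic top-k gating, in which only the k highest-scoring experts are evaluated for each input. This is not the MVMoE architecture or its hierarchical gating; the experts, scores, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    largest = max(scores)
    exponentials = [math.exp(s - largest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def top_k_moe(x, gate_scores, experts, k):
    """Evaluate only the k experts with the highest gate scores and
    combine their outputs with softmax weights over those k scores."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    selected = ranked[:k]
    weights = softmax([gate_scores[i] for i in selected])
    output = sum(w * experts[i](x) for w, i in zip(weights, selected))
    return output, selected


# Three toy scalar experts; only one of them runs when k = 1.
toy_experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x]
moe_output, chosen = top_k_moe(3, [0.1, 3.0, -1.0], toy_experts, k=1)
```

Adding more experts increases the number of parameters, while the per-input cost stays tied to k rather than to the total number of experts.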

[128]  arXiv:2405.01035 (cross-list from cs.GT) [pdf, other]
Title: LOQA: Learning with Opponent Q-Learning Awareness
Comments: accepted to ICLR but still not in proceedings this https URL
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to their action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint, making it a promising approach for practical multi-agent applications.
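
One way to model an opponent whose action probabilities follow its action values is a softmax over the Q-values, sketched below. Treating "proportionally to Q" as a softmax is an assumption made for illustration and may differ from the exact form used in LOQA; the function names and the temperature parameter are hypothetical.

```python
import math
import random


def q_value_policy(q_values, temperature=1.0):
    """Map action values to action probabilities with a softmax
    (assumed form; higher Q means higher probability)."""
    scaled = [q / temperature for q in q_values]
    largest = max(scaled)
    exponentials = [math.exp(s - largest) for s in scaled]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def sample_action(q_values, rng, temperature=1.0):
    """Draw one action index according to the softmax policy."""
    probabilities = q_value_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probabilities, k=1)[0]


# Hypothetical Q-values for two actions, e.g. cooperate and defect.
opponent_probabilities = q_value_policy([1.0, 2.0])
opponent_action = sample_action([1.0, 2.0], random.Random(0))
```

Under this model, the opponent's policy is a differentiable function of its Q-values, so an agent can reason about how its own behavior changes what the opponent is likely to do.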

[129]  arXiv:2405.01054 (cross-list from cs.RO) [pdf, other]
Title: Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing
Comments: 8 pages, 6 figures, in 2024 International Joint Conference on Neural Networks (IJCNN)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which are common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with capabilities for continual learning. The envisaged approach amalgamates predictions of the gates' positional coordinates, distance, and orientation, encapsulating them into a cohesive pose tuple. A comprehensive number of tests serve to underscore the efficacy of this approach in confronting diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness in the face of illumination variations, thereby substantiating its effectiveness.

[130]  arXiv:2405.01063 (cross-list from cs.IR) [pdf, other]
Title: Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach
Comments: 8 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Machine Learning (cs.LG)

As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information.
Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.

[131]  arXiv:2405.01098 (cross-list from quant-ph) [pdf, ps, other]
Title: Multivariate trace estimation using quantum state space linear algebra
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Numerical Analysis (math.NA)

In this paper, we present a quantum algorithm for approximating multivariate traces, i.e. the traces of matrix products. Our research is motivated by the extensive utility of multivariate traces in elucidating spectral characteristics of matrices, as well as by recent advancements in leveraging quantum computing for faster numerical linear algebra. Central to our approach is a direct translation of a multivariate trace formula into a quantum circuit, achieved through a sequence of low-level circuit construction operations. To facilitate this translation, we introduce quantum Matrix States Linear Algebra (qMSLA), a framework tailored for the efficient generation of state preparation circuits via primitive matrix algebra operations. Our algorithm relies on sets of state preparation circuits for input matrices as its primary inputs and yields two state preparation circuits encoding the multivariate trace as output. These circuits are constructed utilizing qMSLA operations, which enact the aforementioned multivariate trace formula. We emphasize that our algorithm's inputs consist solely of state preparation circuits, eschewing harder-to-synthesize constructs such as Block Encodings. Furthermore, our approach operates independently of the availability of specialized hardware like QRAM, underscoring its versatility and practicality.

[132]  arXiv:2405.01108 (cross-list from cs.CV) [pdf, other]
Title: Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In the pursuit of refining precise perception models for fully autonomous driving, continual online model training becomes essential. Federated Learning (FL) within vehicular networks offers an efficient mechanism for model training while preserving raw sensory data integrity. Yet, FL struggles with non-identically distributed data (e.g., quantity skew), leading to suboptimal convergence rates during model training. In previous work, we introduced FedLA, an innovative Label-Aware aggregation method addressing data heterogeneity in FL for generic scenarios.
In this paper, we introduce FedProx+LA, a novel FL method building upon the state-of-the-art FedProx and FedLA to tackle data heterogeneity, which is specifically tailored for vehicular networks. We evaluate the efficacy of FedProx+LA in continuous online object detection model training. Through a comparative analysis against conventional and state-of-the-art methods, our findings reveal the superior convergence rate of FedProx+LA. Notably, if the label distribution is very heterogeneous, our FedProx+LA approach shows substantial improvements in detection performance compared to baseline methods, also outperforming our previous FedLA approach. Moreover, both FedLA and FedProx+LA increase convergence speed by 30% compared to baseline methods.

[133]  arXiv:2405.01109 (cross-list from math.NA) [pdf, other]
Title: Hypergraph $p$-Laplacian regularization on point clouds for data interpolation
Comments: 33 pages
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Analysis of PDEs (math.AP)

As a generalization of graphs, hypergraphs are widely used to model higher-order relations in data. This paper explores the benefit of the hypergraph structure for the interpolation of point cloud data that contain no explicit structural information. We define the $\varepsilon_n$-ball hypergraph and the $k_n$-nearest neighbor hypergraph on a point cloud and study the $p$-Laplacian regularization on the hypergraphs. We prove the variational consistency between the hypergraph $p$-Laplacian regularization and the continuum $p$-Laplacian regularization in a semisupervised setting when the number of points $n$ goes to infinity while the number of labeled points remains fixed. A key improvement compared to the graph case is that the results rely on weaker assumptions on the upper bound of $\varepsilon_n$ and $k_n$. To solve the convex but non-differentiable large-scale optimization problem, we utilize the stochastic primal-dual hybrid gradient algorithm. Numerical experiments on data interpolation verify that the hypergraph $p$-Laplacian regularization outperforms the graph $p$-Laplacian regularization in preventing the development of spikes at the labeled points.

[134]  arXiv:2405.01124 (cross-list from stat.ML) [pdf, other]
Title: Investigating Self-Supervised Image Denoising with Denaturation
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Statistics Theory (math.ST)

Self-supervised learning for image denoising problems in the presence of denaturation for noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of the approach that uses denatured data is lacking. To provide better understanding of the approach, in this paper, we analyze a self-supervised denoising algorithm that uses denatured data in depth through theoretical analysis and numerical experiments. Through the theoretical analysis, we discuss that the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that the algorithm training with denatured images works, and the empirical performance aligns with the theoretical results. These results suggest several insights for further improvement of self-supervised image denoising that uses denatured data in future directions.

[135]  arXiv:2405.01134 (cross-list from cs.RO) [pdf, other]
Title: Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space
Comments: Accepted for publication at the 2024 International Conference on Space Robotics (iSpaRo) | The source code is available at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The ability to autonomously assemble structures is crucial for the development of future space infrastructure. However, the unpredictable conditions of space pose significant challenges for robotic systems, necessitating the development of advanced learning techniques to enable autonomous assembly. In this study, we present a novel approach for learning autonomous peg-in-hole assembly in the context of space robotics. Our focus is on enhancing the generalization and adaptability of autonomous systems through deep reinforcement learning. By integrating procedural generation and domain randomization, we train agents in a highly parallelized simulation environment across a spectrum of diverse scenarios with the aim of acquiring a robust policy. The proposed approach is evaluated using three distinct reinforcement learning algorithms to investigate the trade-offs among various paradigms. We demonstrate the adaptability of our agents to novel scenarios and assembly sequences while emphasizing the potential of leveraging advanced simulation techniques for robot learning in space. Our findings set the stage for future advancements in intelligent robotic systems capable of supporting ambitious space missions and infrastructure development beyond Earth.

[136]  arXiv:2405.01200 (cross-list from eess.SY) [pdf, other]
Title: Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

This letter proposes a few-shot physics-guided spatial-temporal graph convolutional network (FPG-STGCN) to solve unit commitment (UC) quickly. First, an STGCN is tailored to parameterize UC. Then, a few-shot physics-guided learning scheme is proposed. It exploits a few typical UC solutions produced by a commercial optimizer to escape from local minima, and leverages the augmented Lagrangian method for constraint satisfaction. To further enable both feasibility and continuous relaxation of the integer variables during learning, a straight-through estimator for the Tanh-Sign composition is proposed to fully differentiate the mixed-integer solution space. A case study on the IEEE benchmark shows that our method outperforms mainstream learning approaches on UC feasibility and surpasses a traditional solver on efficiency.
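The straight-through idea mentioned in the abstract can be sketched for a single scalar. This is an illustrative reading, not the letter's formulation: it assumes the forward pass outputs Sign(Tanh(x)) and the backward pass substitutes the derivative of Tanh for the (almost everywhere zero) derivative of Sign. The function names are hypothetical.

```python
import math


def ste_forward(x):
    """Forward pass: discretise tanh(x) to +1 or -1.

    Assumption: zero is mapped to +1 so the output is always a valid sign.
    """
    return 1.0 if math.tanh(x) >= 0.0 else -1.0


def ste_backward(x, upstream_grad):
    """Backward pass: treat Sign as the identity and pass the gradient
    through Tanh, whose derivative is 1 - tanh(x)^2."""
    return upstream_grad * (1.0 - math.tanh(x) ** 2)
```

The discrete output keeps the decision feasible as an integer-like value, while the surrogate gradient stays informative near x = 0 and fades as the pre-activation saturates.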

[137]  arXiv:2405.01233 (cross-list from q-fin.MF) [pdf, other]
Title: Mathematics of Differential Machine Learning in Derivative Pricing and Hedging
Subjects: Mathematical Finance (q-fin.MF); Machine Learning (cs.LG); Computational Finance (q-fin.CP)

This article introduces the groundbreaking concept of the financial differential machine learning algorithm through a rigorous mathematical framework. Diverging from existing literature on financial machine learning, the work highlights the profound implications of theoretical assumptions within financial models on the construction of machine learning algorithms.
This endeavour is particularly timely as the finance landscape witnesses a surge in interest towards data-driven models for the valuation and hedging of derivative products. Notably, the predictive capabilities of neural networks have garnered substantial attention in both academic research and practical financial applications.
The approach offers a unified theoretical foundation that facilitates comprehensive comparisons, both at a theoretical level and in experimental outcomes. Importantly, this theoretical grounding lends substantial weight to the experimental results, affirming the differential machine learning method's optimality within the prevailing context.
By anchoring the insights in rigorous mathematics, the article bridges the gap between abstract financial concepts and practical algorithmic implementations.

[138]  arXiv:2405.01242 (cross-list from cs.SD) [pdf, other]
Title: TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

[139]  arXiv:2405.01249 (cross-list from cs.CL) [pdf, ps, other]
Title: Prompt engineering paradigms for medical applications: scoping review and recommendations for better practices
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Prompt engineering is crucial for harnessing the potential of large language models (LLMs), especially in the medical domain where specialized terminology and phrasing are used. However, the efficacy of prompt engineering in the medical domain remains to be explored. In this work, 114 recent studies (2022-2024) applying prompt engineering in medicine, covering prompt learning (PL), prompt tuning (PT), and prompt design (PD), are reviewed. PD is the most prevalent (78 articles). In 12 papers, PD, PL, and PT terms were used interchangeably. ChatGPT is the most commonly used LLM, with seven papers using it for processing sensitive clinical data. Chain-of-Thought emerges as the most common prompt engineering technique. While PL and PT articles typically provide a baseline for evaluating prompt-based approaches, 64% of PD studies lack non-prompt-related baselines. We provide tables and figures summarizing existing work, and reporting recommendations to guide future research contributions.

[140]  arXiv:2405.01277 (cross-list from cs.HC) [pdf, other]
Title: Quantifying Spatial Domain Explanations in BCI using Earth Mover's Distance
Comments: 8 pages, 3 figures, 3 tables, draft of the accepted work at IJCNN, WCCI 2024
Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It's crucial to assess and explain BCI performance, offering clear explanations for potential users to avoid frustration when it doesn't work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover's distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented - 1) Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer, and the observed trend in the model's accuracy across different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than with data-driven channel selection. This work draws attention to the necessity for interpretability and for incorporating metrics beyond accuracy, and underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust Brain-Computer Interfaces (BCIs).
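As a rough illustration of the distance involved, the sketch below computes the earth mover's distance between two relevance profiles in one dimension. It is a simplification under stated assumptions: channels are treated as equally spaced points on a line and both profiles are normalised to unit mass. The spatial setting in the abstract, defined over electrode positions, would need a general optimal-transport solver, and the function name is hypothetical.

```python
def emd_1d(p, q):
    """Earth mover's distance between two non-negative profiles on the same
    equally spaced 1D grid (unit spacing), each normalised to sum to 1."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    mass_p, mass_q = sum(p), sum(q)
    if mass_p <= 0 or mass_q <= 0:
        raise ValueError("profiles must have positive total mass")
    carried = 0.0   # mass that still has to move past the current bin
    distance = 0.0
    for a, b in zip(p, q):
        carried += a / mass_p - b / mass_q
        distance += abs(carried)
    return distance
```

For example, `emd_1d([1, 0, 0], [0, 0, 1])` moves all of the mass across two bins, while identical profiles are at distance zero.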

[141]  arXiv:2405.01284 (cross-list from cs.RO) [pdf, other]
Title: Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning
Authors: Liu Qiyuan
Comments: 50 pages, 30 figures, Final Year Project Report at Nanyang Technological University, Singapore. This article is an NTU FYP report. The formal paper is still in preparation
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

The existing Motion Imitation models typically require expert data obtained through MoCap devices, but the vast amount of training data needed is difficult to acquire, necessitating substantial investments of financial resources, manpower, and time. This project combines 3D human pose estimation with reinforcement learning, proposing a novel model that simplifies Motion Imitation into a prediction problem of joint angle values in reinforcement learning. This significantly reduces the reliance on vast amounts of training data, enabling the agent to learn an imitation policy from just a few seconds of video and exhibit strong generalization capabilities. It can quickly apply the learned policy to imitate human arm motions in unfamiliar videos. The model first extracts skeletal motions of human arms from a given video using 3D human pose estimation. These extracted arm motions are then morphologically retargeted onto a robotic manipulator. Subsequently, the retargeted motions are used to generate reference motions. Finally, these reference motions are used to formulate a reinforcement learning problem, enabling the agent to learn a policy for imitating human arm motions. This project excels at imitation tasks and demonstrates robust transferability, accurately imitating human arm motions from other unfamiliar videos. This project provides a lightweight, convenient, efficient, and accurate Motion Imitation model. While simplifying the complex process of Motion Imitation, it achieves notably outstanding performance.

[142]  arXiv:2405.01292 (cross-list from math.OC) [pdf, ps, other]
Title: Koopman Data-Driven Predictive Control with Robust Stability and Recursive Feasibility Guarantees
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

In this paper, we consider the design of data-driven predictive controllers for nonlinear systems from input-output data via linear-in-control input Koopman lifted models. Instead of identifying and simulating a Koopman model to predict future outputs, we design a subspace predictive controller in the Koopman space. This allows us to learn the observables minimizing the multi-step output prediction error of the Koopman subspace predictor, preventing the propagation of prediction errors. To avoid losing feasibility of our predictive control scheme due to prediction errors, we compute a terminal cost and terminal set in the Koopman space and we obtain recursive feasibility guarantees through an interpolated initial state. As a third contribution, we introduce a novel regularization cost yielding input-to-state stability guarantees with respect to the prediction error for the resulting closed-loop system. The performance of the developed Koopman data-driven predictive control methodology is illustrated on a nonlinear benchmark example from the literature.

[143]  arXiv:2405.01299 (cross-list from cs.CL) [pdf, other]
Title: The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation
Comments: LREC-COLING NLPerspectives workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) have emerged as powerful support tools across various natural language tasks and a range of application domains. Recent studies focus on exploring their capabilities for data annotation. This paper provides a comparative overview of twelve studies investigating the potential of LLMs in labelling data. While the models demonstrate promising cost and time-saving benefits, there exist considerable limitations, such as representativeness, bias, sensitivity to prompt variations and English language preference. Leveraging insights from these studies, our empirical analysis further examines the alignment between human and GPT-generated opinion distributions across four subjective datasets. In contrast to the studies examining representation, our methodology directly obtains the opinion distribution from GPT. Our analysis thereby supports the minority of studies that are considering diverse perspectives when evaluating data annotation tasks and highlights the need for further research in this direction.

[144]  arXiv:2405.01314 (cross-list from eess.SY) [pdf, other]
Title: Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is long-standing, apart from the fact that previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in the coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We circumvent this problem by converting the formulated problem into a Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without an initial trajectory choice. The proposed framework can incorporate various trajectory planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf.

[145]  arXiv:2405.01392 (cross-list from cs.RO) [pdf, other]
Title: LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration
Authors: David Maranto
Comments: B.A.Sc thesis
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Space Physics (physics.space-ph)

As spacecraft journey further from Earth with more complex missions, systems of greater autonomy and onboard intelligence are called for. Reducing reliance on human-based mission control becomes increasingly critical if we are to increase our rate of solar-system-wide exploration. Recent work has explored AI-based goal-oriented systems to increase the level of autonomy in mission execution. These systems make use of symbolic reasoning managers to make inferences from the state of a spacecraft and a handcrafted knowledge base, enabling autonomous generation of tasks and re-planning. Such systems have proven to be successful in controlled cases, but they are difficult to implement as they require human-crafted ontological models to allow the spacecraft to understand the world. Reinforcement learning has been applied to train robotic agents to pursue a goal. A new architecture for autonomy is called for. This work explores the application of Large Language Models (LLMs) as the high-level control system of a spacecraft. Using a systems engineering approach, this work presents the design and development of an agentic spacecraft controller by leveraging an LLM as a reasoning engine, to evaluate the utility of such an architecture in achieving higher levels of spacecraft autonomy. A series of deep space mission scenarios simulated within the popular game engine Kerbal Space Program (KSP) are used as case studies to evaluate the implementation against the requirements. It is shown the reasoning and planning abilities of present-day LLMs do not scale well as the complexity of a mission increases, but this can be alleviated with adequate prompting frameworks and strategic selection of the agent's level of authority over the host spacecraft. This research evaluates the potential of LLMs in augmenting autonomous decision-making systems for future robotic space applications.

[146]  arXiv:2405.01402 (cross-list from cs.RO) [pdf, other]
Title: Learning Force Control for Legged Manipulation
Comments: This work has been accepted to ICRA24, as well as the Loco-manipulation workshop at ICRA24
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.

[147]  arXiv:2405.01404 (cross-list from stat.ML) [pdf, other]
Title: Random Pareto front surfaces
Comments: The code is available at: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Methodology (stat.ME)

The Pareto front of a set of vectors is the subset which is comprised solely of all of the best trade-off points. By interpolating this subset, we obtain the optimal trade-off surface. In this work, we prove a very useful result which states that all Pareto front surfaces can be explicitly parametrised using polar coordinates. In particular, our polar parametrisation result tells us that we can fully characterise any Pareto front surface using the length function, which is a scalar-valued function that returns the projected length along any positive radial direction. Consequently, by exploiting this representation, we show how it is possible to generalise many useful concepts from linear algebra, probability and statistics, and decision theory to function over the space of Pareto front surfaces. Notably, we focus our attention on the stochastic setting where the Pareto front surface itself is a stochastic process. Among other things, we showcase how it is possible to define and estimate many statistical quantities of interest such as the expectation, covariance and quantile of any Pareto front surface distribution. As a motivating example, we investigate how these statistics can be used within a design of experiments setting, where the goal is to both infer and use the Pareto front surface distribution in order to make effective decisions. Besides this, we also illustrate how these Pareto front ideas can be used within the context of extreme value theory. Finally, as a numerical example, we applied some of our new methodology on a real-world air pollution data set.
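The opening definition, the subset of best trade-off points, can be made concrete with a short sketch. It assumes every objective is to be maximised (the abstract does not fix a convention), and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (assumption: all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Usage: three trade-off points survive; (1, 1) and (2, 1) are dominated.
front = pareto_front([(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)])
```

This brute-force check is quadratic in the number of points, which is enough to illustrate the definition on small sets.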

[148]  arXiv:2405.01413 (cross-list from cs.CV) [pdf, other]
Title: MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Comments: 17 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.

[149]  arXiv:2405.01425 (cross-list from cs.DS) [pdf, other]
Title: In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies
Comments: 32 pages
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $\chi^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to show contraction to the target distribution with the rate of convergence determined by functional isoperimetric constants of the stationary density.

[150]  arXiv:2405.01435 (cross-list from cs.NI) [pdf, other]
Title: Closed-form congestion control via deep symbolic regression
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

As mobile networks embrace the 5G era, the interest in adopting Reinforcement Learning (RL) algorithms to handle challenges in ultra-low-latency and high throughput scenarios increases. Simultaneously, the advent of packetized fronthaul networks imposes demanding requirements that traditional congestion control mechanisms cannot accomplish, highlighting the potential of RL-based congestion control algorithms. Although learning RL policies optimized for satisfying the stringent fronthaul requirements is feasible, the adoption of neural network models in real deployments still poses some challenges regarding real-time inference and interpretability. This paper proposes a methodology to deal with such challenges while maintaining the performance and generalization capabilities provided by a baseline RL policy. The method consists of (1) training a congestion control policy specialized in fronthaul-like networks via reinforcement learning, (2) collecting state-action experiences from the baseline, and (3) performing deep symbolic regression on the collected dataset. The proposed process overcomes the challenges related to inference-time limitations through closed-form expressions that approximate the baseline performance (link utilization, delay, and fairness) and which can be directly implemented in any programming language. Finally, we analyze the inner workings of the closed-form expressions.

[151]  arXiv:2405.01440 (cross-list from cs.RO) [pdf, other]
Title: A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving
Comments: Accepted at "Interaction-driven Behavior Prediction and Planning for Autonomous Vehicles" workshop in 35th IEEE Intelligent Vehicles Symposium (IV 2024)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reinforcement learning has emerged as an important approach for autonomous driving. A reward function is used in reinforcement learning to establish the learned skill objectives and guide the agent toward the optimal policy. Since autonomous driving is a complex domain with partly conflicting objectives with varying degrees of priority, developing a suitable reward function represents a fundamental challenge. This paper aims to highlight the gap in such function design by assessing different proposed formulations in the literature and dividing individual objectives into Safety, Comfort, Progress, and Traffic Rules compliance categories. Additionally, the limitations of the reviewed reward functions are discussed, such as objectives aggregation and indifference to driving context. Furthermore, the reward categories are frequently inadequately formulated and lack standardization. This paper concludes by proposing future research that potentially addresses the observed shortcomings in rewards, including a reward validation framework and structured rewards that are context-aware and able to resolve conflicts.

[152]  arXiv:2405.01453 (cross-list from cs.AI) [pdf, other]
Title: Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Comments: 9 pages, 7 figures, 2 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we discuss approaches for integrating Computational Creativity (CC) with research in large language and vision models (LLVMs) to address a key limitation of these models, i.e., creative problem solving. We present preliminary experiments showing how CC principles can be applied to address this limitation through augmented prompting. With this work, we hope to foster discussions of Computational Creativity in the context of ML algorithms for creative problem solving in LLVMs. Our code is at: https://github.com/lnairGT/creative-problem-solving-LLMs

[153]  arXiv:2405.01458 (cross-list from cs.CL) [pdf, other]
Title: UQA: Corpus for Urdu Question Answering
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset (SQuAD2.0), a large-scale English QA dataset, using a technique called EATS (Enclose to Anchor, Translate, Seek), which preserves the answer spans in the translated context paragraphs. The paper describes the process of selecting and evaluating the best translation model between two candidates: Google Translator and Seamless M4T. The paper also benchmarks several state-of-the-art multilingual QA models on UQA, including mBERT, XLM-RoBERTa, and mT5, and reports promising results. XLM-RoBERTa-XL achieves an F1 score of 85.99 and an exact-match (EM) score of 74.56. UQA is a valuable resource for developing and testing multilingual NLP systems for Urdu and for enhancing the cross-lingual transferability of existing models. Further, the paper demonstrates the effectiveness of EATS for creating high-quality datasets for other languages and domains. The UQA dataset and the code are publicly available at www.github.com/sameearif/UQA.
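
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of SQuAD-style exact match (EM) and token-level F1 for a single prediction. The normalization is an assumption: it only lowercases and collapses whitespace, which is simpler than the official SQuAD script and may differ from what the UQA authors use for Urdu.

```python
from collections import Counter

def normalize(text):
    # Simplified normalization (assumption): lowercase, collapse whitespace.
    return " ".join(text.lower().split())

def exact_match(prediction, reference):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Unanswerable questions: an empty prediction matches an empty reference.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are typically the average of these per-question values (taking the best score over all reference answers when several are provided), reported as percentages.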

[154]  arXiv:2405.01460 (cross-list from cs.CR) [pdf, other]
Title: Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders
Comments: Accepted by ICML 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Defenses against these poisoning attacks can be categorized based on whether specific interventions are adopted during training. The first approach is training-time defense, such as adversarial training, which can mitigate poisoning effects but is computationally intensive. The other approach is pre-training purification, e.g., image short squeezing, which consists of several simple compressions but often encounters challenges in dealing with various UEs. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method. First, we find that rate-constrained variational autoencoders (VAEs) show a clear tendency to suppress the perturbations in UEs. We subsequently conduct a theoretical analysis for this phenomenon. Building upon these insights, we introduce a disentangle variational autoencoder (D-VAE), capable of disentangling the perturbations with learnable class-wise embeddings. Based on this network, a two-stage purification approach is naturally developed. The first stage focuses on roughly eliminating perturbations, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios. Extensive experiments demonstrate the remarkable performance of our method across CIFAR-10, CIFAR-100, and a 100-class ImageNet-subset. Code is available at https://github.com/yuyi-sd/D-VAE.

[155]  arXiv:2405.01463 (cross-list from econ.EM) [pdf, ps, other]
Title: Dynamic Local Average Treatment Effects
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Methodology (stat.ME)

We consider Dynamic Treatment Regimes (DTRs) with one-sided non-compliance that arise in applications such as digital recommendations and adaptive medical trials. These are settings where decision makers encourage individuals to take treatments over time, but adapt encouragements based on previous encouragements, treatments, states, and outcomes. Importantly, individuals may choose to (not) comply with a treatment recommendation, whenever it is made available to them, based on unobserved confounding factors. We provide non-parametric identification, estimation, and inference for Dynamic Local Average Treatment Effects, which are expected values of multi-period treatment contrasts among appropriately defined complier subpopulations. Under standard assumptions in the Instrumental Variable and DTR literature, we show that one can identify local average effects of contrasts that correspond to offering treatment at any single time step. Under an additional cross-period effect-compliance independence assumption, which is satisfied in Staggered Adoption settings and a generalization of them, which we define as Staggered Compliance settings, we identify local average treatment effects of treating in multiple time periods.
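
As background for this abstract, the sketch below shows the textbook single-period local average treatment effect (the Wald estimator) with a binary encouragement `z`, treatment take-up `d`, and outcome `y`. It is not the paper's dynamic estimator; the function name and the toy data are illustrative assumptions. Under one-sided non-compliance nobody is treated without encouragement, so the denominator reduces to the take-up rate among the encouraged.

```python
import numpy as np

def wald_late(z, d, y):
    # z: 1 if encouraged, 0 otherwise; d: 1 if treatment taken; y: outcome.
    z, d, y = (np.asarray(a, dtype=float) for a in (z, d, y))
    encouraged, control = z == 1, z == 0
    # Intention-to-treat effect of the encouragement on the outcome.
    itt_outcome = y[encouraged].mean() - y[control].mean()
    # Effect of the encouragement on take-up (the share of compliers).
    itt_takeup = d[encouraged].mean() - d[control].mean()
    return itt_outcome / itt_takeup

# Toy data: half of the encouraged units comply, and treatment adds 2 to y.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 0, 0, 0, 0, 0]
y = [3, 3, 1, 1, 1, 1, 1, 1]
late = wald_late(z, d, y)
```

The ratio rescales the diluted intention-to-treat effect by the complier share, which is why the estimate applies only to compliers and not to the whole population.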

[156]  arXiv:2405.01481 (cross-list from cs.CL) [pdf, other]
Title: NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Comments: 13 pages, 4 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using hundreds of GPUs for training. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as: Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced with Apache 2.0 License and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner

[157]  arXiv:2405.01484 (cross-list from cs.HC) [pdf, other]
Title: Designing Algorithmic Recommendations to Achieve Human-AI Complementarity
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

Algorithms frequently assist, rather than replace, human decision-makers. However, the design and analysis of algorithms often focus on predicting outcomes and do not explicitly model their effect on human decisions. This discrepancy between the design and role of algorithmic assistants becomes of particular concern in light of empirical evidence that suggests that algorithmic assistants again and again fail to improve human decisions. In this article, we formalize the design of recommendation algorithms that assist human decision-makers without making restrictive ex-ante assumptions about how recommendations affect decisions. We formulate an algorithmic-design problem that leverages the potential-outcomes framework from causal inference to model the effect of recommendations on a human decision-maker's binary treatment choice. Within this model, we introduce a monotonicity assumption that leads to an intuitive classification of human responses to the algorithm. Under this monotonicity assumption, we can express the human's response to algorithmic recommendations in terms of their compliance with the algorithm and the decision they would take if the algorithm sends no recommendation. We showcase the utility of our framework using an online experiment that simulates a hiring task. We argue that our approach explains the relative performance of different recommendation algorithms in the experiment, and can help design solutions that realize human-AI complementarity.

[158]  arXiv:2405.01491 (cross-list from physics.chem-ph) [pdf, other]
Title: FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)

Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab-initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically-motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, making it easy to combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the JAX Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity graphics processing units (GPUs). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.

[159]  arXiv:2405.01494 (cross-list from cs.CV) [pdf, other]
Title: Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. However, FL faces challenges in terms of communication cost and data heterogeneity. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks. Nevertheless, data heterogeneity remains a significant challenge, impacting performance. This work explores the effectiveness of diffusion models in one-shot FL, demonstrating their applicability in addressing data heterogeneity and improving FL performance. Additionally, we investigate the utility of our diffusion model approach, FedDiff, compared to other one-shot FL methods under differential privacy (DP). Furthermore, to improve generated sample quality under DP settings, we propose a pragmatic Fourier Magnitude Filtering (FMF) method, enhancing the effectiveness of generated data for global model training.

[160]  arXiv:2405.01502 (cross-list from cs.CL) [pdf, other]
Title: Analyzing the Role of Semantic Representations in the Era of Large Language Models
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LLMs? Specifically, we investigate the effect of Abstract Meaning Representation (AMR) across five diverse NLP tasks. We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT, and find that it generally hurts performance more than it helps. To investigate what AMR may have to offer on these tasks, we conduct a series of analysis experiments. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction. We recommend focusing on these areas for future work in semantic representations for LLMs. Our code: https://github.com/causalNLP/amr_llm.

[161]  arXiv:2405.01521 (cross-list from cs.CV) [pdf, other]
Title: Transformer-Aided Semantic Communications
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication where proper encoding of the relevant data is critical, especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.

[162]  arXiv:2405.01536 (cross-list from cs.CV) [pdf, other]
Title: Customizing Text-to-Image Models with a Single Image Pair
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.

[163]  arXiv:2405.01538 (cross-list from cs.CV) [pdf, other]
Title: Multi-Space Alignments Towards Universal LiDAR Segmentation
Comments: CVPR 2024; 33 pages, 14 figures, 14 tables; Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine large-scale driving datasets acquired by different types of sensors from diverse scenes and then conduct alignments in three spaces, namely data, feature, and label spaces, during the training. As a result, M3Net is capable of taming heterogeneous data for training state-of-the-art LiDAR segmentation models. Extensive experiments on twelve LiDAR segmentation datasets verify our effectiveness. Notably, using a shared set of parameters, M3Net achieves 75.1%, 83.1%, and 72.4% mIoU scores, respectively, on the official benchmarks of SemanticKITTI, nuScenes, and Waymo Open.
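
The mIoU figures quoted in this abstract are mean intersection-over-union scores. The sketch below shows one common way to compute the metric from per-point class labels; the function name, the `ignore_index` handling, and the choice to average only over classes that appear are assumptions, and each benchmark's official evaluation script may differ in such details.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=None):
    # pred, target: integer class labels per point, values in [0, num_classes).
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    if ignore_index is not None:
        # Drop points whose ground-truth label is marked as ignored.
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    # confusion[i, j] counts points with true class i predicted as class j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    # Average only over classes that occur in the prediction or ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages a per-class IoU of 1/2 for class 0 and 2/3 for class 1.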

Replacements for Fri, 3 May 24

[164]  arXiv:1511.05240 (replaced) [pdf, ps, other]
Title: An extension of McDiarmid's inequality
Authors: Richard Combes
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
[165]  arXiv:2108.01295 (replaced) [pdf, other]
Title: MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning
Subjects: Machine Learning (cs.LG)
[166]  arXiv:2205.11787 (replaced) [pdf, other]
Title: Quadratic models for understanding catapult dynamics of neural networks
Comments: accepted in ICLR 2024; changed the title
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[167]  arXiv:2207.05209 (replaced) [pdf, other]
Title: Fourier Neural Operator with Learned Deformations for PDEs on General Geometries
Journal-ref: Journal of Machine Learning Research (2023) Volume 24, Issue 1, Article No. 388, pp 18593-18618
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
[168]  arXiv:2209.02555 (replaced) [pdf, other]
Title: Finite-Time Error Bounds for Greedy-GQ
Comments: Accepted by Springer Machine Learning Journal
Subjects: Machine Learning (cs.LG)
[169]  arXiv:2210.01422 (replaced) [pdf, other]
Title: Time-Varying Propensity Score to Bridge the Gap between the Past and Present
Comments: Published at ICLR 2024
Subjects: Machine Learning (cs.LG)
[170]  arXiv:2211.01025 (replaced) [pdf, other]
Title: DynamicLight: Two-Stage Dynamic Traffic Signal Timing
Comments: 18 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[171]  arXiv:2211.11695 (replaced) [pdf, other]
Title: Disentangled Representation Learning
Comments: 29 pages, 12 Figures, 4 Tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[172]  arXiv:2301.10226 (replaced) [pdf, other]
Title: A Watermark for Large Language Models
Comments: 13 pages in the main body. Published at ICML 2023. Code is available at github.com/jwkirchenbauer/lm-watermarking
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[173]  arXiv:2303.00638 (replaced) [pdf, other]
Title: MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
[174]  arXiv:2303.10828 (replaced) [pdf, other]
Title: DataLight: Offline Data-Driven Traffic Signal Control
Comments: 15 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[175]  arXiv:2303.11786 (replaced) [pdf, other]
Title: Skeleton Regression: A Graph-Based Approach to Estimation with Manifold Structure
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
[176]  arXiv:2306.04634 (replaced) [pdf, other]
Title: On the Reliability of Watermarks for Large Language Models
Comments: 9 pages in the main body. Published at ICLR 2024. Code is available at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[177]  arXiv:2306.06545 (replaced) [pdf, other]
Title: A Probabilistic Framework for Modular Continual Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[178]  arXiv:2306.11201 (replaced) [pdf, other]
Title: Adaptive Federated Learning with Auto-Tuned Clients
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
[179]  arXiv:2308.02001 (replaced) [pdf, other]
Title: Memory capacity of two layer neural networks with smooth activations
Comments: V3: the result was generalized to activations which are real analytic at a point by including a bias vector. The presentation and rigor were also improved
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[180]  arXiv:2309.01837 (replaced) [pdf, other]
Title: Delegating Data Collection in Decentralized Machine Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[181]  arXiv:2309.16391 (replaced) [pdf, other]
Title: 2-Cats: 2D Copula Approximating Transforms
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[182]  arXiv:2310.02246 (replaced) [pdf, other]
Title: Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances
Comments: ICLR 2024 Spotlight
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[183]  arXiv:2310.03150 (replaced) [pdf, other]
Title: Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Comments: Camera-ready version for DEEM'24. Please cite the official ACM paper via this https URL
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[184]  arXiv:2310.04420 (replaced) [pdf, other]
Title: BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity
Comments: ICLR 2024. Project page: this https URL
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[185]  arXiv:2310.18948 (replaced) [pdf, other]
Title: Probabilistic Feature Augmentation for AIS-Based Multi-Path Long-Term Vessel Trajectory Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Probability (math.PR)
[186]  arXiv:2311.04843 (replaced) [pdf, other]
Title: Bridging Dimensions: Confident Reachability for High-Dimensional Controllers
Subjects: Machine Learning (cs.LG)
[187]  arXiv:2311.14495 (replaced) [pdf, other]
Title: StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Comments: 27 pages, 7 figures, ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Dynamical Systems (math.DS)
[188]  arXiv:2312.04469 (replaced) [pdf, other]
Title: On the Learnability of Watermarks for Language Models
Comments: Accepted at ICLR 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[189]  arXiv:2312.05905 (replaced) [pdf, other]
Title: Improving Subgraph-GNNs via Edge-Level Ego-Network Encodings
Comments: TMLR, graph neural networks, weisfeiler-lehman, expressivity, higher-order GNNs, 3-WL, 1-WL, edge-level, ego-networks
Journal-ref: Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gomez. Improving Subgraph-GNNs via Edge-Level Ego-Network Encodings. In Transactions on Machine Learning Research, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[190]  arXiv:2312.06786 (replaced) [pdf, other]
Title: Mixture-of-Linear-Experts for Long-term Time Series Forecasting
Journal-ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4672-4680, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[191]  arXiv:2312.11875 (replaced) [pdf, other]
Title: Sparse is Enough in Fine-tuning Pre-trained Large Language Models
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[192]  arXiv:2312.14331 (replaced) [pdf, other]
Title: Maximum entropy GFlowNets with soft Q-learning
Journal-ref: 2024 Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2593-2601
Subjects: Machine Learning (cs.LG)
[193]  arXiv:2312.16427 (replaced) [pdf, other]
Title: Learning to Embed Time Series Patches Independently
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[194]  arXiv:2401.15240 (replaced) [pdf, ps, other]
Title: Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
Comments: AISTATS 2024 Oral
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
[195]  arXiv:2402.01344 (replaced) [pdf, other]
Title: Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG)
[196]  arXiv:2402.08290 (replaced) [pdf, other]
Title: The Effect of Data Poisoning on Counterfactual Explanations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[197]  arXiv:2402.12812 (replaced) [pdf, other]
Title: Scalable Decentralized Algorithms for Online Personalized Mean Estimation
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[198]  arXiv:2402.13505 (replaced) [pdf, other]
Title: SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[199]  arXiv:2402.16075 (replaced) [pdf, other]
Title: Don't Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[200]  arXiv:2403.00877 (replaced) [pdf, other]
Title: Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
[201]  arXiv:2403.04764 (replaced) [pdf, other]
Title: TS-RSR: A provably efficient approach for batch bayesian optimization
Authors: Zhaolin Ren, Na Li
Comments: Revised presentation and organization of theoretical results
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[202]  arXiv:2403.07815 (replaced) [pdf, other]
Title: Chronos: Learning the Language of Time Series
Comments: Code and model checkpoints available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[203]  arXiv:2403.12237 (replaced) [pdf, other]
Title: Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments
Comments: 7 pages, Submitted to IEEE Internet of Things Magazine
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[204]  arXiv:2403.15989 (replaced) [pdf, other]
Title: Knowledge-guided Machine Learning: Current Trends and Future Prospects
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
[205]  arXiv:2403.19050 (replaced) [pdf, other]
Title: Detecting Generative Parroting through Overfitting Masked Autoencoders
Comments: Accepted to CVPR 2024, Responsible Generative AI workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[206]  arXiv:2403.19561 (replaced) [pdf, other]
Title: Self-Improved Learning for Scalable Neural Combinatorial Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[207]  arXiv:2404.07446 (replaced) [pdf, other]
Title: Graph Attention Network for Lane-Wise and Topology-Invariant Intersection Traffic Simulation
Comments: T-TIS Journal, 12 pages, 8 figures, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[208]  arXiv:2404.11568 (replaced) [pdf, other]
Title: On the Scalability of GNNs for Molecular Graphs
Subjects: Machine Learning (cs.LG)
[209]  arXiv:2404.13663 (replaced) [pdf, other]
Title: Cumulative Hazard Function Based Efficient Multivariate Temporal Point Process Learning
Authors: Bingqing Liu
Comments: 8 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[210]  arXiv:2404.15656 (replaced) [pdf, other]
Title: MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[211]  arXiv:2404.16444 (replaced) [pdf, other]
Title: Automating the Discovery of Partial Differential Equations in Dynamical Systems
Comments: 18 pages, 6 figures, 1 table
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Applications (stat.AP); Machine Learning (stat.ML)
[212]  arXiv:2404.18311 (replaced) [pdf, ps, other]
Title: Towards Real-time Learning in Large Language Models: A Critical Review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[213]  arXiv:2404.19484 (replaced) [pdf, other]
Title: More Compute Is What You Need
Authors: Zhen Guo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[214]  arXiv:2404.19756 (replaced) [pdf, other]
Title: KAN: Kolmogorov-Arnold Networks
Comments: 48 pages, 20 figures. Codes are available at this https URL
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[215]  arXiv:2405.00142 (replaced) [pdf, other]
Title: Utilizing Machine Learning and 3D Neuroimaging to Predict Hearing Loss: A Comparative Analysis of Dimensionality Reduction and Regression Techniques
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[216]  arXiv:2209.10368 (replaced) [pdf, other]
Title: USC: Uncompromising Spatial Constraints for Safety-Oriented 3D Object Detectors in Autonomous Driving
Comments: 8 pages (IEEE double column format), 7 figures, 2 tables, submitted to ITSC 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[217]  arXiv:2211.09325 (replaced) [pdf, other]
Title: TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
Comments: Conference on Robot Learning (CoRL), 2022. Supplementary material is available at this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[218]  arXiv:2212.09201 (replaced) [pdf, other]
Title: Spectral Regularized Kernel Two-Sample Tests
Comments: 75 pages, to be published in the Annals of Statistics
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[219]  arXiv:2304.06027 (replaced) [pdf, other]
Title: Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
Comments: Transactions on Machine Learning Research (TMLR) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[220]  arXiv:2305.05281 (replaced) [pdf, other]
Title: Causal Discovery via Conditional Independence Testing with Proxy Variables
Comments: ICML 2024
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
[221]  arXiv:2305.16965 (replaced) [pdf, other]
Title: Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
Comments: full version; IJCAI 2024 accepted (main track)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[222]  arXiv:2305.19001 (replaced) [pdf, other]
Title: High-probability sample complexities for policy evaluation with linear function approximation
Comments: The first two authors contributed equally; paper accepted to IEEE Transactions on Information Theory
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
[223]  arXiv:2306.10835 (replaced) [pdf, other]
Title: Online Dynamic Submodular Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[224]  arXiv:2308.16471 (replaced) [pdf, other]
Title: Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
Comments: 11 pages, 6 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[225]  arXiv:2309.00733 (replaced) [pdf, other]
Title: TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models
Comments: Accepted to ICLR 2024, Reliable and Responsible Foundation Models workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[226]  arXiv:2309.08793 (replaced) [pdf, other]
Title: Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation
Comments: 8 pages, 4 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[227]  arXiv:2309.10621 (replaced) [pdf, other]
Title: Large language models can accurately predict searcher preferences
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[228]  arXiv:2309.11295 (replaced) [pdf, other]
Title: CPLLM: Clinical Prediction with Large Language Models
Comments: v2
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[229]  arXiv:2310.05921 (replaced) [pdf, other]
Title: Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions
Comments: 8 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Robotics (cs.RO); Methodology (stat.ME)
[230]  arXiv:2310.07815 (replaced) [pdf, other]
Title: Language Models As Semantic Indexers
Comments: 10 pages, 5 appendix pages
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[231]  arXiv:2311.09253 (replaced) [pdf, other]
Title: The Perception-Robustness Tradeoff in Deterministic Image Restoration
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
[232]  arXiv:2311.18044 (replaced) [pdf, other]
Title: Transfer Learning in Robotics: An Upcoming Breakthrough? A Review of Promises and Challenges
Comments: 21 pages, 7 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[233]  arXiv:2312.07516 (replaced) [pdf, other]
Title: Learning finitely correlated states: stability of the spectral reconstruction
Comments: 27+7 pages, 6 figures. Typos corrected, improved presentation
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
[234]  arXiv:2312.13933 (replaced) [pdf, other]
Title: Structured Probabilistic Coding
Comments: 11 pages, accepted by AAAI 2024 (Oral)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[235]  arXiv:2312.16503 (replaced) [pdf, other]
Title: Attention-Enhanced Reservoir Computing
Subjects: Emerging Technologies (cs.ET); Machine Learning (cs.LG)
[236]  arXiv:2401.01404 (replaced) [pdf, other]
Title: Scalable network reconstruction in subquadratic time
Authors: Tiago P. Peixoto
Comments: 12 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO); Machine Learning (stat.ML)
[237]  arXiv:2401.11632 (replaced) [pdf, other]
Title: What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[238]  arXiv:2401.15502 (replaced) [pdf, other]
Title: Differentially private Bayesian tests
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[239]  arXiv:2402.02972 (replaced) [pdf, other]
Title: Retrieval-Augmented Score Distillation for Text-to-3D Generation
Comments: Accepted to ICML 2024 / Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[240]  arXiv:2402.03008 (replaced) [pdf, other]
Title: Diffusive Gibbs Sampling
Comments: Accepted for publication at ICML 2024. Code available: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
[241]  arXiv:2402.04930 (replaced) [pdf, other]
Title: Blue noise for diffusion models
Comments: SIGGRAPH 2024 Conference Proceedings; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[242]  arXiv:2402.07031 (replaced) [pdf, other]
Title: Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[243]  arXiv:2402.11355 (replaced) [pdf, other]
Title: Converting Representational Counterfactuals to Natural Language
Comments: Preprint
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
[244]  arXiv:2402.14095 (replaced) [pdf, other]
Title: Zero-shot generalization across architectures for visual classification
Comments: Accepted as a Tiny Paper at ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[245]  arXiv:2403.03792 (replaced) [pdf, other]
Title: Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
Comments: v0.2
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[246]  arXiv:2403.15527 (replaced) [pdf, other]
Title: Conformal online model aggregation
Comments: 22 pages, 12 figures. arXiv admin note: text overlap with arXiv:2401.09379
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[247]  arXiv:2404.04916 (replaced) [pdf, other]
Title: Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
Comments: Accepted by ICML 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[248]  arXiv:2404.10727 (replaced) [pdf, other]
Title: How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
Comments: 9 pages, 6 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
[249]  arXiv:2404.10830 (replaced) [pdf, other]
Title: Fewer Truncations Improve Language Modeling
Comments: ICML 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[250]  arXiv:2404.14619 (replaced) [pdf, other]
Title: OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Comments: Minor corrections
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[251]  arXiv:2404.18253 (replaced) [pdf, other]
Title: Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Authors: Tengjun Huang
Comments: Accepted by the Twelfth International Conference on Learning Representations (ICLR) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[252]  arXiv:2404.18362 (replaced) [pdf, other]
Title: Physics-informed Convolutional Neural Network for Microgrid Economic Dispatch
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
[253]  arXiv:2405.00332 (replaced) [pdf, other]
Title: A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)