Optimization and Control

New submissions

New submissions for Fri, 3 May 24

[1]  arXiv:2405.00891 [pdf, other]
Title: An interacting particle consensus method for constrained global optimization
Subjects: Optimization and Control (math.OC); Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

This paper presents a particle-based optimization method designed for addressing minimization problems with equality constraints, particularly in cases where the loss function exhibits non-differentiability or non-convexity. The proposed method combines components from the consensus-based optimization algorithm with a newly introduced forcing term directed at the constraint set. A rigorous mean-field limit of the particle system is derived, and the convergence of the mean-field limit to the constrained minimizer is established. Additionally, we introduce a stable discretized algorithm and conduct various numerical experiments to demonstrate the performance of the proposed method.

[2]  arXiv:2405.00911 [pdf, other]
Title: Stabilization of infinite-dimensional systems under quantization and packet loss
Authors: Masashi Wakaiki
Comments: 26 pages, 8 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We study the problem of stabilizing infinite-dimensional systems with input and output quantization. The closed-loop system we consider is subject to packet loss in the sensor-to-controller channels, whose duration is assumed to be averagely bounded. Given a bound on the initial state, we propose design methods for dynamic quantizers with zoom parameters. We show that the closed-loop state starting in a given region exponentially converges to zero if the bounds of quantization errors and packet-loss duration satisfy suitable conditions. Since the norms of the operators representing the system dynamics are used in the proposed quantization schemes, we also present methods for approximately computing the operator norms.

[3]  arXiv:2405.00914 [pdf, other]
Title: Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization
Authors: Chris Junchi Li
Comments: arXiv admin note: text overlap with arXiv:2307.00126
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper presents a new algorithm member for accelerating first-order methods for bilevel optimization, namely the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation}, abbreviated as \texttt{(P)RAF${}^2$BA}. The algorithm leverages \emph{fully} first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, enhancing oracle complexity for efficient optimization. Theoretical guarantees for finding approximate first-order stationary points and second-order stationary points at the state-of-the-art query complexities are established, showcasing their effectiveness in solving complex optimization tasks. Empirical studies for real-world problems are provided to further validate the outperformance of our proposed algorithms. The significance of \texttt{(P)RAF${}^2$BA} in optimizing nonconvex-strongly-convex bilevel optimization problems is underscored by its state-of-the-art convergence rates and computational efficiency.

[4]  arXiv:2405.00947 [pdf, ps, other]
Title: Co-Optimization of EV Charging Control and Incentivization for Enhanced Power System Stability
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We study how high charging rate demands from electric vehicles (EVs) in a power distribution grid may collectively cause its dynamic instability, and, accordingly, how a price incentivization strategy can be used to steer customers to settle for lesser charging rate demands so that these instabilities can be avoided. We pose the problem as a joint optimization and optimal control formulation. The optimization determines the optimal charging setpoints for EVs to minimize the $\mathcal{H}_2$-norm of the transfer function of the grid model, while the optimal control simultaneously develops a linear quadratic regulator (LQR) based state-feedback control signal for the battery-currents of those EVs to jointly minimize the risk of grid instability. A subsequent algorithm is developed to determine how much customers may be willing to sacrifice their intended charging rate demands in return for financial incentives. Results are derived for both unidirectional and bidirectional charging, and validated using numerical simulations of multiple EV charging stations in the IEEE 33-bus power distribution model.

[5]  arXiv:2405.01047 [pdf, ps, other]
Title: Optimal Pricing for Linear-Quadratic Games with Nonlinear Interaction Between Agents
Comments: 7 pages, 2 figures, revisions under IEEE Control Systems Letters
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT)

This paper studies a class of network games with linear-quadratic payoffs and externalities exerted through a strictly concave interaction function. This class of game is motivated by the diminishing marginal effects with peer influences. We analyze the optimal pricing strategy for this class of network game. First, we prove the existence of a unique Nash Equilibrium (NE). Second, we study the optimal pricing strategy of a monopolist selling a divisible good to agents. We show that the optimal pricing strategy, found by solving a bilevel optimization problem, is strictly better when the monopolist knows the network structure as opposed to the best strategy agnostic to network structure. Numerical experiments demonstrate that in most cases, the maximum revenue is achieved with an asymmetric network. These results contrast with the previously studied case of linear interaction function, where a network-independent price is proven optimal with symmetric networks. Lastly, we describe an efficient algorithm to find the optimal pricing strategy.

[6]  arXiv:2405.01123 [pdf, ps, other]
Title: On some global implicit function theorems for set-valued inclusions with applications to parametric vector optimization
Authors: Amos Uderzo
Subjects: Optimization and Control (math.OC)

The present paper deals with the perturbation analysis of set-valued inclusion problems, a problem format whose relevance has recently emerged in such contexts as robust and vector optimization as well as in vector equilibrium theory. The set-valued inclusions here considered are parameterized by variables belonging to a topological space, with and without constraints. By proper techniques of variational analysis, some qualitative global implicit function theorems are established, which ensure global solvability of these problems and continuous dependence on the parameter of the related solutions. Applications to parametric vector optimization are discussed, aimed at deriving sufficient conditions for the existence of ideal efficient solutions that depend continuously on the parameter perturbations.

[7]  arXiv:2405.01208 [pdf, ps, other]
Title: On generators of $k$-PSD closures of the positive semidefinite cone
Subjects: Optimization and Control (math.OC)

The positive semidefinite (PSD) cone is the cone of positive semidefinite matrices, and is the object of interest in semidefinite programming (SDP). A computationally efficient approximation of the PSD cone is the $k$-PSD closure, $1 \leq k < n$, the cone of $n\times n$ real symmetric matrices such that all of their $k\times k$ principal submatrices are positive semidefinite. For $k=1$, one obtains a polyhedral approximation, while $k=2$ yields a second order conic (SOC) approximation of the PSD cone. These approximations of the PSD cone have been used extensively in real-world applications such as AC Optimal Power Flow (ACOPF) to address computational inefficiencies where SDP relaxations are utilized to convexify the non-convexities. However, a theoretical discussion about the geometry of these conic approximations of the PSD cone is rather sparse. In this short communication, we attempt to provide a characterization of some family of generators of the aforementioned conic approximations.
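
As a minimal sketch of the definition above (not code from the paper), membership in the $k$-PSD closure can be tested by checking that every principal minor of order at most $k$ is nonnegative, which is equivalent to every $k\times k$ principal submatrix being PSD. The function names `det` and `in_k_psd_closure`, the tolerance `tol`, and the example matrix are illustrative assumptions.

```python
from itertools import combinations


def det(m):
    """Determinant of a square matrix (list of lists) via Gaussian elimination."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        # Partial pivoting: pick the row with the largest entry in column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def in_k_psd_closure(m, k, tol=1e-9):
    """True if every principal minor of order <= k of symmetric m is >= -tol."""
    n = len(m)
    for size in range(1, k + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


# Illustrative matrix: all 2x2 principal submatrices are PSD, the full matrix is not.
M = [[1.0, 1.0, -1.0],
     [1.0, 1.0, 1.0],
     [-1.0, 1.0, 1.0]]
print(in_k_psd_closure(M, 2), in_k_psd_closure(M, 3))
```

The example matrix lies in the 2-PSD closure but has a negative determinant, so the $k=2$ approximation is strictly larger than the PSD cone here. The enumeration cost grows combinatorially in $k$, so this check is only practical for small matrices.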

[8]  arXiv:2405.01232 [pdf, other]
Title: Kinetic Theories for Metropolis Monte Carlo Methods
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA); Probability (math.PR)

We consider generalizations of the classical inverse problem to Bayesian type estimators, where the result is not one optimal parameter but an optimal probability distribution in parameter space. The practical computational tool to compute these distributions is the Metropolis Monte Carlo algorithm. We derive kinetic theories for the Metropolis Monte Carlo method in different scaling regimes. The derived equations yield a different point of view on the classical algorithm. It further inspired modifications to exploit the different scalings, shown on a simulation example of the Lorenz system.

[9]  arXiv:2405.01241 [pdf, ps, other]
Title: Port-Hamiltonian systems with energy and power ports
Comments: 6 pages
Subjects: Optimization and Control (math.OC); Mathematical Physics (math-ph); Symplectic Geometry (math.SG)

We extend the port-Hamiltonian framework defined with respect to a Lagrangian submanifold and a Dirac structure by augmenting the Lagrangian submanifold with the space of external variables. The new pair of conjugated variables is called energy port. We show that in the most general case, the extension describes constrained Hamiltonian systems whose Hamiltonian function depends on inputs.

[10]  arXiv:2405.01292 [pdf, ps, other]
Title: Koopman Data-Driven Predictive Control with Robust Stability and Recursive Feasibility Guarantees
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

In this paper, we consider the design of data-driven predictive controllers for nonlinear systems from input-output data via linear-in-control input Koopman lifted models. Instead of identifying and simulating a Koopman model to predict future outputs, we design a subspace predictive controller in the Koopman space. This allows us to learn the observables minimizing the multi-step output prediction error of the Koopman subspace predictor, preventing the propagation of prediction errors. To avoid losing feasibility of our predictive control scheme due to prediction errors, we compute a terminal cost and terminal set in the Koopman space and we obtain recursive feasibility guarantees through an interpolated initial state. As a third contribution, we introduce a novel regularization cost yielding input-to-state stability guarantees with respect to the prediction error for the resulting closed-loop system. The performance of the developed Koopman data-driven predictive control methodology is illustrated on a nonlinear benchmark example from the literature.

[11]  arXiv:2405.01387 [pdf, ps, other]
Title: Lexicographic Optimization: Algorithms and Stability
Subjects: Optimization and Control (math.OC)

A lexicographic maximum of a set $X \subseteq \mathbb{R}^n$ is a vector in $X$ whose smallest component is as large as possible, and subject to that requirement, whose second smallest component is as large as possible, and so on for the third smallest component, etc. Lexicographic maximization has numerous practical and theoretical applications, including fair resource allocation, analyzing the implicit regularization of learning algorithms, and characterizing refinements of game-theoretic equilibria. We prove that a minimizer in $X$ of the exponential loss function $L_c(\mathbf{x}) = \sum_i \exp(-c x_i)$ converges to a lexicographic maximum of $X$ as $c \rightarrow \infty$, provided that $X$ is stable in the sense that a well-known iterative method for finding a lexicographic maximum of $X$ cannot be made to fail simply by reducing the required quality of each iterate by an arbitrarily tiny degree. Our result holds for both near and exact minimizers of the exponential loss, while earlier convergence results made much stronger assumptions about the set $X$ and only held for the exact minimizer. We are aware of no previous results showing a connection between the iterative method for computing a lexicographic maximum and exponential loss minimization. We show that every convex polytope is stable, but that there exist compact, convex sets that are not stable. We also provide the first analysis of the convergence rate of an exponential loss minimizer (near or exact) and discover a curious dichotomy: While the two smallest components of the vector converge to the lexicographically maximum values very quickly (at roughly the rate $\frac{\log n}{c}$), all other components can converge arbitrarily slowly.
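
The relationship described above can be illustrated on a small finite set. This is only a sketch: the set `X`, the function names, and the values of `c` are assumptions chosen for illustration, and the paper's stability condition is not checked here.

```python
import math


def lex_max(X):
    """Vector whose sorted components (smallest first) are lexicographically largest."""
    return max(X, key=lambda x: sorted(x))


def exp_loss_minimizer(X, c):
    """Vector in X minimizing L_c(x) = sum_i exp(-c * x_i)."""
    return min(X, key=lambda x: sum(math.exp(-c * xi) for xi in x))


# Hypothetical finite set of candidate vectors.
X = [(1, 5), (2, 2), (2, 3), (0, 10)]

print(lex_max(X))
for c in (0.1, 5.0):
    print(c, exp_loss_minimizer(X, c))
```

For small `c` the minimizer favors the vector with one very large component, while for larger `c` the penalty on the smallest component dominates and the minimizer coincides with the lexicographic maximum of this set.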

[12]  arXiv:2405.01410 [pdf, other]
Title: Staggered Routing in Autonomous Mobility-on-Demand Systems
Comments: 32 pages, 10 figures
Subjects: Optimization and Control (math.OC)

In autonomous mobility-on-demand systems, effectively managing vehicle flows to mitigate induced congestion and ensure efficient operations is imperative for system performance and positive customer experience. Against this background, we study the potential of staggered routing, i.e., purposely delaying trip departures from a system perspective, in order to reduce congestion and ensure efficient operations while still meeting customer time windows. We formalize the underlying planning problem and show how to efficiently model it as a mixed integer linear program. Moreover, we present a matheuristic that allows us to efficiently solve large-scale real-world instances both in an offline full-information setting and its online rolling horizon counterpart. We conduct a numerical study for Manhattan, New York City, focusing on low- and highly-congested scenarios. Our results show that in low-congestion scenarios, staggering trip departures allows mitigating, on average, 94% of the induced congestion in a full information setting. In a rolling horizon setting, our algorithm allows us to reduce 90% of the induced congestion. In high-congestion scenarios, we observe an average reduction of 66% as the full information bound and an average reduction of 56% in our online setting. Surprisingly, we show that these reductions can be reached by shifting trip departures by a maximum of six minutes in both the low and high-congestion scenarios.

[13]  arXiv:2405.01424 [pdf, other]
Title: A Model Problem for First Order Mean Field Games with Discrete Initial Data
Subjects: Optimization and Control (math.OC)

In this article, we study a simplified version of a density-dependent first-order mean field game, in which the players face a penalization equal to the population density at their final position. We consider the problem of finding an equilibrium when the initial distribution is a discrete measure. We show that the problem becomes finite-dimensional: the final piecewise smooth density is completely determined by the weights and positions of the initial measure. We establish existence and uniqueness of a solution using classical fixed point theorems. Finally, we show that Newton's method provides an effective way to compute the solution. Our numerical simulations provide an illustration of how density penalization in a mean field game tends to smoothen the initial distribution.

[14]  arXiv:2405.01438 [pdf, other]
Title: Solving the train-platforming problem via a two-level Lagrangian Relaxation approach
Subjects: Optimization and Control (math.OC)

High-speed railway stations are crucial junctions in high-speed railway networks. Compared to operations on the tracks between stations, trains have more routing possibilities within stations. As a result, track allocation at a station is relatively complicated. In this study, we aim to solve the train platforming problem for a busy high-speed railway station by considering comprehensive track resources and interlocking configurations. A two-level space-time network is constructed to capture infrastructure information at various levels of detail from both macroscopic and microscopic perspectives. Additionally, we propose a nonlinear programming model that minimizes a weighted sum of total travel time and total deviation time for trains at the station. We apply a Two-level Lagrangian Relaxation (2-L LR) to a linearized version of the model and demonstrate how this induces a decomposable train-specific path choice problem at the macroscopic level that is guided by Lagrange multipliers associated with microscopic resource capacity violation. As case studies, the proposed model and solution approach are applied to a small virtual railway station and a high-speed railway hub station located on the busiest high-speed railway line in China. Through a comparison with other approaches that include Logic-based Benders Decomposition (LBBD), we highlight the superiority of the proposed method; on realistic instances, the 2-L LR method finds solutions that are, on average, approximately 2% from optimality. Finally, we test algorithm performance at the operational level and obtain near-optimal solutions, with optimality gaps of approximately 1%, in a very short time.

Cross-lists for Fri, 3 May 24

[15]  arXiv:2405.00782 (cross-list from math.DS) [pdf, other]
Title: Rigged Dynamic Mode Decomposition: Data-Driven Generalized Eigenfunction Decompositions for Koopman Operators
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Spectral Theory (math.SP)

We introduce the Rigged Dynamic Mode Decomposition (Rigged DMD) algorithm, which computes generalized eigenfunction decompositions of Koopman operators. By considering the evolution of observables, Koopman operators transform complex nonlinear dynamics into a linear framework suitable for spectral analysis. While powerful, traditional Dynamic Mode Decomposition (DMD) techniques often struggle with continuous spectra. Rigged DMD addresses these challenges with a data-driven methodology that approximates the Koopman operator's resolvent and its generalized eigenfunctions using snapshot data from the system's evolution. At its core, Rigged DMD builds wave-packet approximations for generalized Koopman eigenfunctions and modes by integrating Measure-Preserving Extended Dynamic Mode Decomposition with high-order kernels for smoothing. This provides a robust decomposition encompassing both discrete and continuous spectral elements. We derive explicit high-order convergence theorems for generalized eigenfunctions and spectral measures. Additionally, we propose a novel framework for constructing rigged Hilbert spaces using time-delay embedding, significantly extending the algorithm's applicability. We provide examples, including systems with a Lebesgue spectrum, integrable Hamiltonian systems, the Lorenz system, and a high-Reynolds number lid-driven flow in a two-dimensional square cavity, demonstrating Rigged DMD's convergence, efficiency, and versatility. This work paves the way for future research and applications of decompositions with continuous spectra.

[16]  arXiv:2405.00837 (cross-list from cs.LG) [pdf, other]
Title: Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations
Comments: 26 pages, 8 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)

Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$ when used as a regularization term. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of regularization parameter such that the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in comparable time to other methods that identify the containing Delaunay simplex.

[17]  arXiv:2405.00838 (cross-list from q-bio.GN) [pdf, other]
Title: Cross-modality Matching and Prediction of Perturbation Responses with Labeled Gromov-Wasserstein Optimal Transport
Comments: 16 pages, 4 figures, correspondence to Aviv Regev and Romain Lopez
Subjects: Genomics (q-bio.GN); Optimization and Control (math.OC)

It is now possible to conduct large scale perturbation screens with complex readout modalities, such as different molecular profiles or high content cell images. While these open the way for systematic dissection of causal cell circuits, integrating such data across screens to maximize our ability to predict circuits poses substantial computational challenges, which have not been addressed. Here, we extend two Gromov-Wasserstein Optimal Transport methods to incorporate the perturbation label for cross-modality alignment. The obtained alignment is then employed to train a predictive model that estimates cellular responses to perturbations observed with only one measurement modality. We validate our method for the tasks of cross-modality alignment and cross-modality prediction in a recent multi-modal single-cell perturbation dataset. Our approach opens the way to unified causal models of cell biology.

[18]  arXiv:2405.00842 (cross-list from math.ST) [pdf, other]
Title: Quickest Change Detection with Confusing Change
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)

In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of our interest. Our objective is to detect a bad change as quickly as possible while avoiding raising a false alarm for pre-change or a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. Proposing novel CuSum-based detection procedures, S-CuSum and J-CuSum, leveraging two CuSum statistics, we offer solutions applicable across all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them by numerical results. Furthermore, both procedures are computationally efficient as they only require simple recursive updates.
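
For context, the "simple recursive updates" mentioned above can be seen in the classical single-statistic CuSum procedure. The sketch below is that textbook procedure for a unit-variance Gaussian mean shift, not the S-CuSum or J-CuSum procedures of the paper; the function name, the distributions, and the threshold are illustrative assumptions.

```python
def cusum_alarm_time(xs, mu0, mu1, threshold):
    """Classical CuSum for a mean shift mu0 -> mu1 in unit-variance Gaussian data.

    Returns the 1-based index of the first sample at which the statistic
    reaches the threshold, or None if no alarm is raised.
    """
    w = 0.0
    for n, x in enumerate(xs, start=1):
        # Log-likelihood ratio of N(mu1, 1) against N(mu0, 1) at x.
        llr = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1))
        # Recursive update: accumulate evidence, resetting at zero.
        w = max(0.0, w + llr)
        if w >= threshold:
            return n
    return None


# Deterministic toy sequence: the mean jumps from 0 to 1 after the fifth sample.
xs = [0.0] * 5 + [1.0] * 10
print(cusum_alarm_time(xs, mu0=0.0, mu1=1.0, threshold=2.0))
```

Each step needs only the previous statistic and the current observation, which is what makes CuSum-type procedures cheap to run online.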

[19]  arXiv:2405.00867 (cross-list from cs.RO) [pdf, other]
Title: A Convex Formulation of the Soft-Capture Problem
Comments: Accepted to ISpaRo24
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)

We present a fast trajectory optimization algorithm for the soft capture of uncooperative tumbling space objects. Our algorithm generates safe, dynamically feasible, and minimum-fuel trajectories for a six-degree-of-freedom servicing spacecraft to achieve soft capture (near-zero relative velocity at contact) between predefined locations on the servicer spacecraft and target body. We solve a convex problem by enforcing a convex relaxation of the field-of-view constraint, followed by a sequential convex program correcting the trajectory for collision avoidance. The optimization problems can be solved with a standard second-order cone programming solver, making the algorithm both fast and practical for implementation in flight software. We demonstrate the performance and robustness of our algorithm in simulation over a range of object tumble rates up to 10°/s.

[20]  arXiv:2405.00951 (cross-list from cs.CV) [pdf, other]
Title: Hyperspectral Band Selection based on Generalized 3DTV and Tensor CUR Decomposition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA); Optimization and Control (math.OC)

Hyperspectral Imaging (HSI) serves as an important technique in remote sensing. However, high dimensionality and data volume typically pose significant computational challenges. Band selection is essential for reducing spectral redundancy in hyperspectral imagery while retaining intrinsic critical information. In this work, we propose a novel hyperspectral band selection model by decomposing the data into a low-rank and smooth component and a sparse one. In particular, we develop a generalized 3D total variation (G3DTV) by applying the $\ell_1^p$-norm to derivatives to preserve spatial-spectral smoothness. By employing the alternating direction method of multipliers (ADMM), we derive an efficient algorithm, where the tensor low-rankness is implied by the tensor CUR decomposition. We demonstrate the effectiveness of the proposed approach through comparisons with various other state-of-the-art band selection techniques using two benchmark real-world datasets. In addition, we provide practical guidelines for parameter selection in both noise-free and noisy scenarios.

[21]  arXiv:2405.00985 (cross-list from cs.LG) [pdf, other]
Title: Progressive Feedforward Collapse of ResNet Training
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)

Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming the degree of collapse increases during the forward propagation of DNNs. We derive a transparent model for the well-trained ResNet based on the observation that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new surrogate model, multilayer unconstrained feature model (MUFM), connecting intermediate layers by an optimal transport regularizer. The optimal solution of MUFM is inconsistent with NC but is more concentrated relative to the input data. Overall, this study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.

[22]  arXiv:2405.01031 (cross-list from cs.LG) [pdf, other]
Title: The Privacy Power of Correlated Noise in Decentralized Learning
Comments: Accepted as conference paper at ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)

Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., information hidden from all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
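
The idea of pairwise-canceling noise generated from shared seeds can be sketched as follows. This is an illustration of the general mechanism only, not the Decor algorithm: the function name, the seed values, and the three-user complete graph are assumptions, and Python's `random` module is not a cryptographically secure generator.

```python
import random


def user_noise(u, seeds, sigma=1.0):
    """Noise that user u derives locally from the seeds it shares with neighbors.

    seeds maps an edge (i, j) with i < j to the seed shared by users i and j.
    Both endpoints draw the same Gaussian sample; i adds it and j subtracts it.
    """
    total = 0.0
    for (i, j), seed in seeds.items():
        if u not in (i, j):
            continue
        z = random.Random(seed).gauss(0.0, sigma)
        total += z if u == i else -z
    return total


# Hypothetical seeds for a complete graph on three users.
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}
noises = [user_noise(u, seeds) for u in range(3)]
print(noises, sum(noises))
```

Each user's noise masks its own model, yet the terms cancel when summed over all users, so an average over the whole network is unaffected up to floating-point rounding. On a sparse graph, averages over a user's neighborhood no longer cancel exactly, which is the accumulation effect the abstract refers to.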

[23]  arXiv:2405.01127 (cross-list from math.PR) [pdf, other]
Title: Backward Map for Filter Stability Analysis
Subjects: Probability (math.PR); Optimization and Control (math.OC)

In this paper, a backward map is introduced for the purposes of analysis of the nonlinear (stochastic) filter stability. The backward map is important because the filter-stability in the sense of $\chi^2$-divergence follows from showing a certain variance decay property for the backward map. To show this property requires additional assumptions on the model properties of the hidden Markov model (HMM). The analysis in this paper is based on introducing a Poincar\'e Inequality (PI) for HMMs with white noise observations. In finite state-space settings, PI is related to both the ergodicity of the Markov process as well as the observability of the HMM. It is shown that the Poincar\'e constant is positive if and only if the HMM is detectable.

[24]  arXiv:2405.01229 (cross-list from cs.LG) [pdf, ps, other]
Title: Boosting Jailbreak Attack with Momentum
Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Optimization and Control (math.OC)

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented \textit{jailbreak} attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous iterations. Specifically, we introduce the \textbf{M}omentum \textbf{A}ccelerated G\textbf{C}G (\textbf{MAC}) attack, which incorporates a momentum term into the gradient heuristic. Experimental results showcase the notable enhancement achieved by MAC in gradient-based attacks on aligned language models. Our code is available at https://github.com/weizeming/momentum-attack-llm.

[25]  arXiv:2405.01404 (cross-list from stat.ML) [pdf, other]
Title: Random Pareto front surfaces
Comments: The code is available at: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Methodology (stat.ME)

The Pareto front of a set of vectors is the subset consisting solely of the best trade-off points. By interpolating this subset, we obtain the optimal trade-off surface. In this work, we prove a useful result which states that all Pareto front surfaces can be explicitly parametrised using polar coordinates. In particular, our polar parametrisation result tells us that we can fully characterise any Pareto front surface using the length function, which is a scalar-valued function that returns the projected length along any positive radial direction. Consequently, by exploiting this representation, we show how it is possible to generalise many useful concepts from linear algebra, probability and statistics, and decision theory to function over the space of Pareto front surfaces. Notably, we focus our attention on the stochastic setting where the Pareto front surface itself is a stochastic process. Among other things, we showcase how it is possible to define and estimate many statistical quantities of interest such as the expectation, covariance and quantile of any Pareto front surface distribution. As a motivating example, we investigate how these statistics can be used within a design of experiments setting, where the goal is to both infer and use the Pareto front surface distribution in order to make effective decisions. Besides this, we also illustrate how these Pareto front ideas can be used within the context of extreme value theory. Finally, as a numerical example, we apply some of our new methodology to a real-world air pollution data set.
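
For readers unfamiliar with the two objects named in this abstract, the following is a minimal sketch, not the authors' code. It assumes a maximisation convention, a reference point at the origin by default, and a finite set of points; the function names `pareto_front` and `length_along` and the specific formula used for the radial length are illustrative assumptions rather than definitions taken from the paper.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of `points` (assumed maximisation convention)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some point is >= p in every coordinate
        # and strictly > p in at least one coordinate.
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

def length_along(points, direction, reference=None):
    """Largest t such that reference + t * direction is weakly dominated by some point.

    `direction` must be strictly positive in every coordinate (hypothetical helper,
    one plausible reading of a "projected length along a positive radial direction").
    """
    pts = np.asarray(points, dtype=float)
    lam = np.asarray(direction, dtype=float)
    ref = np.zeros(pts.shape[1]) if reference is None else np.asarray(reference, dtype=float)
    # For each point, the ray leaves its dominated box at min_i (y_i - ref_i) / lam_i.
    scaled = np.maximum(pts - ref, 0.0) / lam
    return float(np.max(np.min(scaled, axis=1)))

pts = [[1, 3], [2, 2], [3, 1], [1, 1], [2, 1]]
front = pareto_front(pts)            # keeps the three trade-off points
t = length_along(pts, [1.0, 1.0])    # radial length along the diagonal
```

Evaluating `length_along` over a grid of positive directions gives a scalar function of the direction, which is the kind of polar description of the front that the abstract refers to.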

[26]  arXiv:2405.01480 (cross-list from cs.LG) [pdf, other]
Title: Common pitfalls to avoid while using multiobjective optimization in machine learning
Comments: 21 pages, 12 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of deep learning.
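
The weighted sum (WS) method mentioned in this abstract can be illustrated with a small sketch. This is not taken from the paper: the two toy objectives, the function name `ws_minimiser`, the step size, and the iteration count are all assumptions chosen so that the answer can be checked by hand.

```python
def ws_minimiser(w, lr=0.1, steps=200, x0=0.0):
    """Minimise w * f1(x) + (1 - w) * f2(x) by plain gradient descent.

    Toy objectives (assumed for illustration): f1(x) = x**2, f2(x) = (x - 2)**2.
    The exact minimiser is x = 2 * (1 - w), so each weight w in [0, 1]
    picks out one point of the Pareto set [0, 2].
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * w * x + 2.0 * (1.0 - w) * (x - 2.0)
        x -= lr * grad
    return x

# Sweep the weight to trace the trade-off between the two objectives.
trace = [(w, ws_minimiser(w)) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Both toy objectives are convex, so sweeping the weight recovers the whole Pareto set here. For non-convex fronts, a weighted sum can miss trade-off points, and stopping the inner optimisation before it has converged returns points that are not on the front at all; the abstract names neglected convergence as one such source of non-optimal solutions.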

Replacements for Fri, 3 May 24

[27]  arXiv:2108.06740 (replaced) [pdf, ps, other]
Title: A fast iterative PDE-based algorithm for feedback controls of nonsmooth mean-field control problems
Comments: Accepted for publication by SIAM Journal on Scientific Computing
Subjects: Optimization and Control (math.OC)
[28]  arXiv:2207.09969 (replaced) [pdf, other]
Title: A Unified Approach to Evaluation and Routing in Public Transport Systems
Subjects: Optimization and Control (math.OC); Physics and Society (physics.soc-ph)
[29]  arXiv:2306.10835 (replaced) [pdf, other]
Title: Online Dynamic Submodular Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[30]  arXiv:2402.07064 (replaced) [pdf, other]
Title: Piecewise SOS-Convex Moment Optimization and Applications via Exact Semi-Definite Programs
Comments: Moment Optimization; Sum-of-Squares Convex Polynomials; Piecewise Functions; Generalized moment problems; Semi-Definite Programming
Subjects: Optimization and Control (math.OC)
[31]  arXiv:2205.11787 (replaced) [pdf, other]
Title: Quadratic models for understanding catapult dynamics of neural networks
Comments: accepted in ICLR 2024; changed the title
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[32]  arXiv:2303.15121 (replaced) [pdf, ps, other]
Title: Learning linear dynamical systems under convex constraints
Comments: 29 pages; added Example 4 (Lipschitz regression), Corollary 4 and Remark 2; corrected minor typos
Subjects: Statistics Theory (math.ST); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
[33]  arXiv:2305.19001 (replaced) [pdf, other]
Title: High-probability sample complexities for policy evaluation with linear function approximation
Comments: The first two authors contributed equally; paper accepted to IEEE Transactions on Information Theory
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
[34]  arXiv:2306.11201 (replaced) [pdf, other]
Title: Adaptive Federated Learning with Auto-Tuned Clients
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
[35]  arXiv:2308.02001 (replaced) [pdf, other]
Title: Memory capacity of two layer neural networks with smooth activations
Comments: V3: the result was generalized to activations which are real analytic at a point by including a bias vector. The presentation and rigor were also improved
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[36]  arXiv:2308.07275 (replaced) [pdf, other]
Title: On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
[37]  arXiv:2401.15240 (replaced) [pdf, ps, other]
Title: Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
Comments: AISTATS 2024 Oral
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
[38]  arXiv:2403.04764 (replaced) [pdf, other]
Title: TS-RSR: A provably efficient approach for batch bayesian optimization
Authors: Zhaolin Ren, Na Li
Comments: Revised presentation and organization of theoretical results
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)