Publications & Pre-Prints

PUBLICATIONS

An Optimal Structured Zeroth-order Algorithm for Non-smooth Optimization

2023/05/25

M Rando, C Molinari, L Rosasco, S Villa

Finite-difference methods are a class of algorithms designed to solve black-box optimization problems by approximating a gradient of the target function on a set of directions. In black-box optimization, the non-smooth setting is particularly relevant since, in practice, differentiability and smoothness assumptions cannot be verified. To cope with nonsmoothness, several authors use a smooth approximation of the target function and show that finite difference methods approximate its gradient. Recently, it has been proved that imposing a structure in the directions allows improving performance. However, only the smooth setting was considered. To close this gap, we introduce and analyze O-ZD, the first structured finite-difference algorithm for non-smooth black-box optimization. Our method exploits a smooth approximation of the target function and we prove that it approximates its gradient on a subset of random {\em orthogonal} directions. We analyze the convergence of O-ZD under different assumptions. For non-smooth convex functions, we obtain the optimal complexity. In the non-smooth non-convex setting, we characterize the number of iterations needed to bound the expected norm of the smoothed gradient. For smooth functions, our analysis recovers existing results for structured zeroth-order methods for the convex case and extends them to the non-convex setting. We conclude with numerical simulations where assumptions are satisfied, observing that our algorithm has very good practical performances.

Regularization properties of dual subgradient flow

2023/05/11

V Apidopoulos, C Molinari, L Rosasco, S Villa

Dual gradient descent combined with early stopping represents an efficient alternative to the Tikhonov variational approach when the regularizer is strongly convex. However, for many relevant applications, it is crucial to deal with regularizers which are only convex. In this setting, the dual problem is non smooth, and dual gradient descent cannot be used. In this paper, we study the regularization properties of a subgradient dual flow, and we show that the proposed procedure achieves the same recovery accuracy as penalization methods, while being more efficient from the computational perspective.

Zeroth-order optimization with orthogonal random directions

2022/8/2

David Kozak, Cesare Molinari, Lorenzo Rosasco, Luis Tenorio, Silvia Villa

We propose and analyze a randomized zeroth-order optimization method based on approximating the exact gradient by finite differences computed in a set of orthogonal random directions that changes with each iteration. A number of previously proposed methods are recovered as special cases including spherical smoothing, coordinate descent, as well as discretized gradient descent. Our main contribution is proving convergence guarantees as well as convergence rates under different parameter choices and assumptions. In particular, we consider convex objectives, but also possibly non-convex objectives satisfying the Polyak-Łojasiewicz (PL) condition. Theoretical results are complemented and illustrated by numerical experiments.

Stochastic Zeroth order Descent with Structured Directions

2022/6/10

Marco Rando, Cesare Molinari, Silvia Villa, Lorenzo Rosasco

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach which approximates a stochastic gradient on a set of orthogonal directions, where is the dimension of the ambient space. These directions are randomly chosen, and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form for every , which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of number of iterations. Our bound also shows the benefits of using multiple directions instead of one. For non-convex functions satisfying the Polyak-{\L}ojasiewicz condition, we establish the first convergence rates for stochastic zeroth order algorithms under such an assumption. We corroborate our theoretical findings in numerical simulations where assumptions are satisfied and on the real-world problem of hyper-parameter optimization, observing that S-SZD has very good practical performances.

Fast iterative regularization by reusing data

2022/4/21

Cristian Vega, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Discrete inverse problems correspond to solving a system of equations in a stable way with respect to noise in the data. A typical approach to enforce uniqueness and select a meaningful solution is to introduce a regularizer. While for most applications the regularizer is convex, in many cases it is not smooth nor strongly convex. In this paper, we propose and study two new iterative regularization methods, based on a primal-dual algorithm, to solve inverse problems efficiently. Our analysis, in the noise free case, provides convergence rates for the Lagrangian and the feasibility gap. In the noisy case, it provides stability bounds and early-stopping rules with theoretical guarantees. The main novelty of our work is the exploitation of some a priori knowledge about the solution set, i.e. redundant information. More precisely we show that the linear systems can be used more than once along the iteration. Despite the simplicity of the idea, we show that this procedure brings surprising advantages in the numerical applications. We discuss various approaches to take advantage of redundant information, that are at the same time consistent with our assumptions and flexible in the implementation. Finally, we illustrate our theoretical findings with numerical simulations for robust sparse recovery and image reconstruction through total variation. We confirm the efficiency of the proposed procedures, comparing the results with state-of-the-art methods.

Iterative regularization for low complexity regularizers

2022/2/1

Cesare Molinari, Mathurin Massias, Lorenzo Rosasco, Silvia Villa

Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems but also in modern machine learning, where it provides both a new perspective on algorithms analysis, and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by non smooth and non strongly convex functionals, prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm of which we analyze convergence and stability properties, even in the case where the original problem is unfeasible. The general results are illustrated considering the special case of sparse recovery with the penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.

A stochastic bregman primal-dual splitting algorithm for composite optimization

2021/12/22

Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili

We study a stochastic first order primal-dual method for solving convex-concave saddle point problems over real reflexive Banach spaces using Bregman divergences and relative smoothness assumptions, in which we allow for stochastic error in the computation of gradient terms within the algorithm. We show ergodic convergence in expectation of the Lagrangian optimality gap with a rate of O(1/k) and that every almost sure weak cluster point of the ergodic sequence is a saddle point in expectation under mild assumptions. Under slightly stricter assumptions, we show almost sure weak convergence of the pointwise iterates to a saddle point. Under a relative strong convexity assumption on the objective functions and a total convexity assumption on the entropies of the Bregman divergences, we establish almost sure strong convergence of the pointwise iterates to a saddle point. Our framework is general and does not need strong convexity of the entropies inducing the Bregman divergences in the algorithm. Numerical applications are considered including entropically regularized Wasserstein barycenter problems and regularized inverse problems on the simplex.

Iterative regularization for convex regularization

2021/10/31

Silvia Villa, Mathurin Massias, Cesare Molinari, Lorenzo Rosasco

I will present iterative regularization for linear inverse problems, when the regularizer is convex but not necessarily strongly convex. I will describe the stability properties of a primal-dual gradient based approach, analyzing its convergence in the presence of worst case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. Theoretical results are complemented by experiments showing that state-of-the-art performances can be achieved with considerable computational speed-ups.

Inexact and stochastic generalized conditional gradient with augmented lagrangian and proximal step

2021/9/1

Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili

In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in [34], which we denote ICGALP, that allow for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, eg, in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lowersemicontinuous functions subject to an affine constraint of the form 𝐴𝑥= 𝑏 for some bounded linear operator 𝐴. Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible proximal operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian values (so-called convergence in the Bregman sense) and asymptotic feasibility of the affine constraint as well as strong convergence of the sequence of dual variables to a solution of the dual problem, in an almost sure sense. Almost sure convergence rates are given for the Lagrangian values and the feasibility gap for the ergodic primal variables. Rates in expectation are given for the Lagrangian values and the feasibility gap subsequentially in the pointwise sense. Numerical experiments verifying the predicted rates of convergence are shown as well.

Optimal distributed control of linear parabolic equations by spectral decomposition

2021/7

Martin Lazar, Cesare Molinari

We construct an algorithm for solving a constrained optimal control problem for a first‐order evolutionary system governed by a positive self‐adjoint operator. The problem consists in identifying distributed control that minimizes a given cost functional, which comprises a cost of the control and a trajectory regulation term, while steering the final state close to a given target. The approach explores the dual problem and it generalizes the Hilbert Uniqueness Method (HUM). The practical implementation of the algorithm is based on a spectral decomposition of the operator determining the dynamics of the system. Once this decomposition is available – which can be done offline and saved for future use – the optimal control problem is solved almost instantaneously. It is practically reduced to a scalar nonlinear equation for the optimal Lagrange multiplier. The efficiency of the algorithm is demonstrated through numerical …

Iterative regularization for convex regularizers

2021/3/18

Cesare Molinari, Mathurin Massias, Lorenzo Rosasco, Silvia Villa

We study iterative regularization for linear models, when the bias is convex but not necessarily strongly convex. We characterize the stability properties of a primal-dual gradient based approach, analyzing its convergence in the presence of worst case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. Theoretical results are complemented by experiments showing that state-of-the-art performances are achieved with considerable computational speed-ups.

Alternating forward–backward splitting for linearly constrained optimization problems

2020/7

Cesare Molinari, Juan Peypouquet, Fernando Roldan

We present an alternating forward–backward splitting method for solving linearly constrained structured optimization problems. The algorithm takes advantage of the separable structure and possibly asymmetric regularity properties of the objective functions involved. We also describe some applications to the study of non-Newtonian fluids and image reconstruction problems. We conclude with a numerical example, and its comparison with Condat’s algorithm. An acceleration heuristic is also briefly outlined.

Implicit regularization for convex regularizers

2020/6

Cesare Molinari, Mathurin Massias, Lorenzo Rosasco124, Silvia Villa

We study implicit regularization for over-parameterized linear models, when the bias is convex but not necessarily strongly convex. We characterize the regularization property of a primal-dual gradient based approach, analyzing convergence and especially stability in the presence of worst case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. Theoretical results are complemented by experiments showing that state-of-the-art performances are achieved with considerable computational speed-ups.

Generalized conditional gradient with augmented lagrangian for composite minimization

2020

Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili

In this paper we propose a splitting scheme which hybridizes the generalized conditional gradient with a proximal step and which we call the CGALP algorithm for minimizing the sum of three proper convex and lower-semicontinuous functions in real Hilbert spaces. The minimization is subject to an affine constraint, that, in particular, allows one to deal with composite problems (a sum of more than three functions) in a separable way by the usual product space technique. While classical conditional gradient methods require Lipschitz continuity of the gradient of the differentiable part of the objective, CGALP needs only differentiability (on an appropriate subset) and hence circumvents the intricate question of Lipschitz continuity of gradients. For the two remaining functions in the objective, we do not require any additional regularity assumption. The second function, possibly nonsmooth, is assumed simple; i.e., the …

Gradient conditionnel généralisé et lagrangien augmenté pour la minimisation composite

2019/8/26

Antonio Silveti-Falls, Cesare Molinari, Jalal M Fadili

Dans ce travail, nous proposons un schéma d'éclatement en optimisation non lisse, hybridant le gradient conditionnel avec une étape proximale que nous appelons CGALP, pour minimiser la somme de fonctions propres fermées et convexes sur un compact de . La minimisation est de plus sujette à une contrainte affine, que nous prenons en compte par un Lagrangien augmenté, en qui permet en particulier de traiter des problèmes composites à plusieurs fonctions par une technique d'espace produit. Certaines fonctions sont autorisées à être non lisses mais dont l'opérateur proximal est simple à calculer. Notre analyse et garanties de convergence sont assurées pour un large choix de paramètres en boucle ouverte. Comme résultats principaux, nous montrons la faisabilité asymptotique de la variable primale, la convergence de toute sous-suite vers une solution du problème primal, la convergence de la variable duale à une solution du problème dual, et la convergence du Lagrangien. Des taux de convergence sont aussi fournis. Les implications et illustrations de l'algorithme en traitement des données sont discutées.

Convergence rates of Forward–Douglas–Rachford splitting method

2019/8/15

Cesare Molinari, Jingwei Liang, Jalal Fadili

Over the past decades, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward–Douglas–Rachford splitting method and study both global and local convergence rates of this method. For the global rate, we establish a sublinear convergence rate in terms of a Bregman divergence suitably designed for the objective function. Moreover, when specializing to the Forward–Backward splitting, we prove a stronger convergence rate result for the objective function value. Then locally, based on the assumption that the non-smooth part of the optimization problem is partly smooth, we establish local linear convergence of the method. More precisely, we show that the sequence generated by Forward–Douglas–Rachford first (i) identifies a smooth manifold in a finite number of iteration and then (ii) enters a local linear …

Lagrangian Penalization Scheme with Parallel Forward–Backward Splitting

2018/5

Cesare Molinari, Juan Peypouquet

We propose a new iterative algorithm for the numerical approximation of the solutions to convex optimization problems and constrained variational inequalities, especially when the functions and operators involved have a separable structure on a product space, and exhibit some dissymmetry in terms of their component-wise regularity. Our method combines Lagrangian techniques and a penalization scheme with bounded parameters, with parallel forward–backward iterations. Conveniently combined, these techniques allow us to take advantage of the particular structure of the problem. We prove the weak convergence of the sequence generated by this scheme, along with worst-case convergence rates in the convex optimization setting, and for the strongly non-degenerate monotone operator case. Implementation issues related to the penalization of the constraint set are discussed, as well as applications …

Optimal control of parabolic equations by spectral decomposition

2017/8/3

Martin Lazar, C Molinari, J Peypouquet

This paper deals with an optimal control problem of Bolza type for a class of parabolic equations. It consists in finding the initial datum that minimizes a cost functional, which comprises an energy term on the control and a regulation term given by a distributed cost on the state, and such that the final state lies within a prescribed distance to a given target. Beyond the particularities of the chosen problem, our aim here is to propose a new methodology based on a spectral decomposition of the operator governing the evolution of the system, describe its implementation for this kind of problem, and finally perform some preliminary numerical experiments to assess its behaviour.

Minimization of quadratic functionals with TV Tikhonov regularization and applications to data recovery

2014/10/3

Cesare Molinari

Abstract in italiano ROF è un modello proposto per la prima volta dagli autori Rudin, Osher e Fatemi per la ricostruzione di dati danneggiati attraverso la minimizzazione di funzionali quadratici con l'aggiunta di una regolarizzazione di Tikhonov del tipo variazione totale; l'idea ha trovato successo soprattutto nel campo delle immagini, malgrado la difficoltà matematica del problema relativa alla non-differenziabilità del funzionale costo generata dalla norma TV. In questo lavoro proponiamo due alternative per la ricerca di una soluzione approssimata nello spazio H1 e testiamo entrambe prima nel campo discreto e poi direttamente sul problema continuo. La prima è un'idea già presente nella letteratura relativa all'argomento: attraverso l'introduzione di una regolarizzazione, si rende differenziabile l'intero funzionale costo; a questo punto è possibile l'utilizzo di tecniche iterative di ottimizzazione convessa come l'algoritmo del gradiente o la linearizzazione delle condizioni di ottimalità attraverso il metodo di Newton. La seconda tecnica è invece l'applicazione di un algoritmo proposto Peypouquet e Cabot (articolo non ancora pubblicato); nello stesso riferimento è contenuto il risultato di convergenza debole con relativa dimostrazione. L'algoritmo in questione si applica a problemi di ottimizzazione in spazi di Hilbert per funzionali convessi che possono essere scritti come somma di un termine differenziabile con gradiente Lipschitz continuo e di un termine a cui si richiede solo la inf-semicontinuità e prevede l'alternanza di un passo del metodo del gradiente (per sfruttare l'informazione sulla direzione di discesa contenuta nel termine differenziabile) e di …

Optimal control of parabolic equations by spectral decomposition

C Molinari, M Lazar

We present an algorithm for solving a constrained optimal control problem for a first order evolutionary system governed by a positive self-adjoint operator. The problem consists in identifying distributed control that minimizes a given cost functional, which comprises a cost of the control and a trajectory regulation term, while steering the final state close to a given target. More precisely, let H and U be two real Hilbert spaces. Given yT in H and ε> 0, denote L2

PRE-PRINTS

D. Kozak, C. Molinari, S. Villa, L. Rosasco, L. Tenorio,

Stochastic zero-th order optimization by spherical smoothing, Under review, https://arxiv.org/abs/2107.03941.

C. Vega, C. Molinari, L. Rosasco, S. Villa,

Iterative regularization for convex regularizers with random activations and diagonal scaling, In preparation.

C. Molinari, M. Massias, L. Rosasco, S. Villa,

Iterative regularization for low complexity models, In preparation.

A. Silveti-Falls, C. Molinari, J. Fadili,

A Stochastic Bregman Primal-Dual Splitting Algorithm for Composite Optimization, In preparation.

C. Molinari, J. Peypouquet,

An adjoint-based localization algorithm for sparse optimal control of heat equation via initial data, In preparation.