Search results for: first-order-methods-in-optimization

First Order Methods in Optimization

Author : Amir Beck
File Size : 45.50 MB
Format : PDF
Download : 671
Read : 1231
The primary goal of this book is to provide a self-contained, comprehensive study of the main first-order methods that are frequently used in solving large-scale problems. First-order methods exploit information on values and gradients/subgradients (but not Hessians) of the functions composing the model under consideration. With the increase in the number of applications that can be modeled as large or even huge-scale optimization problems, there has been a revived interest in using simple methods that require low iteration cost as well as low memory storage. The author has gathered, reorganized, and synthesized (in a unified manner) many results that are currently scattered throughout the literature, many of which cannot typically be found in optimization books. First-Order Methods in Optimization offers a comprehensive study of first-order methods with their theoretical foundations; provides plentiful examples and illustrations; emphasizes rates of convergence and complexity analysis of the main first-order methods used to solve large-scale problems; and covers both variable and functional decomposition methods.
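Not from the book itself, but as a minimal illustration of the kind of update first-order methods are built on: a plain gradient-descent loop that touches only gradients, never Hessians. The least-squares instance and the 1/L step size below are hypothetical choices made for the example.

```python
import numpy as np

def gradient_descent(grad, x0, step, iters=500):
    """Basic first-order method: uses only gradient information, no Hessians."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Hypothetical instance: minimize f(x) = 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
# Step size 1/L, with L the Lipschitz constant of the gradient (largest eigenvalue of A^T A)
L = np.linalg.eigvalsh(A.T @ A).max()
x_star = gradient_descent(grad, np.zeros(5), step=1.0 / L)
```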

First order and Stochastic Optimization Methods for Machine Learning

Author : Guanghui Lan
File Size : 65.29 MB
Format : PDF, ePub, Docs
Download : 496
Read : 219
This book covers not only foundational material but also the most recent progress made during the past few years in the area of machine learning algorithms. In spite of the intensive research and development in this area, there does not yet exist a systematic treatment introducing the fundamental concepts and recent progress on machine learning algorithms, especially those based on stochastic optimization methods, randomized algorithms, nonconvex optimization, distributed and online learning, and projection-free methods. This book will benefit a broad audience in the machine learning, artificial intelligence, and mathematical programming communities by presenting these recent developments in a tutorial style, starting from the basic building blocks and progressing to the most carefully designed and complicated algorithms for machine learning.
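As a small sketch of the stochastic-optimization building block such books start from, here is plain stochastic gradient descent on a hypothetical least-squares problem. The sampling scheme, step size, and data below are illustrative assumptions, not algorithms from the book.

```python
import numpy as np

def sgd(grad_i, n_samples, x0, step=0.01, epochs=20, seed=0):
    """Stochastic gradient descent: each step uses the gradient of one random sample."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            x = x - step * grad_i(x, i)
    return x

# Hypothetical least-squares example: f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(200)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = sgd(grad_i, n_samples=200, x0=np.zeros(10))
```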

First Order Methods in Large Scale Semidefinite Optimization

Author : Michael Bürgisser
File Size : 61.59 MB
Format : PDF, ePub, Docs
Download : 134
Read : 767
Semidefinite Optimization has attracted the attention of many researchers over the last twenty years. It nowadays has a huge variety of applications in fields as different as Control, Structural Design, and Statistics, as well as in the relaxation of hard combinatorial problems. In this thesis, we focus on the practical tractability of large-scale semidefinite optimization problems. From a theoretical point of view, these problems can be solved approximately by polynomial-time Interior-Point methods. The complexity estimate of Interior-Point methods grows logarithmically in the inverse of the solution accuracy, but with order 3.5 in both the matrix size and the number of constraints. The latter property prohibits the resolution of large-scale problems in practice. In this thesis, we present new approaches based on advanced First-Order methods such as Smoothing Techniques and Mirror-Prox algorithms for solving structured large-scale semidefinite optimization problems up to a moderate accuracy. These methods require a very specific problem format. However, generic semidefinite optimization problems do not comply with these requirements. In a preliminary step, we recast slightly structured semidefinite optimization problems in an alternative form to which these methods are applicable, namely as matrix saddle-point problems. The final methods have a complexity result that depends linearly on both the number of constraints and the inverse of the target accuracy. Smoothing Techniques constitute a two-stage procedure: we first derive a smooth approximation of the objective function and then apply an optimal First-Order method to the adapted problem. We present a refined version of this optimal First-Order method in this thesis. The worst-case complexity result for this modified scheme is of the same order as for the original method. However, numerical results show that this alternative scheme needs far fewer iterations than its original counterpart to find an approximate solution in practice. Using this refined version of the optimal First-Order method in Smoothing Techniques, we are able to solve randomly generated matrix saddle-point problems involving a hundred matrices of size 12'800 x 12'800 up to an absolute accuracy of 0.0012 in about four hours. Smoothing Techniques and Mirror-Prox methods require the computation of one or two matrix exponentials at every iteration when applied to the matrix saddle-point problems obtained from the above transformation step. Using standard techniques, the efficiency estimate for the exponentiation of a symmetric matrix grows cubically in the size of the matrix. Clearly, this operation limits the class of problems that can be solved by Smoothing Techniques and Mirror-Prox methods in practice. We present a randomized Mirror-Prox method in which we replace the exact matrix exponential by a stochastic approximation. This randomized method outperforms all its competitors with respect to the theoretical complexity estimate on a significant class of large-scale matrix saddle-point problems. Furthermore, we show numerical results where the randomized method needs only about 58% of the CPU time of its deterministic counterpart for approximately solving randomly generated matrix saddle-point problems with a hundred matrices of size 800 x 800. As a side result of this thesis, we show that the Hedge algorithm - a method that is heavily used in Theoretical Computer Science - can be interpreted as a Dual Averaging scheme.
The embedding of the Hedge algorithm in the framework of Dual Averaging schemes allows us to derive three new versions of this algorithm. The efficiency guarantees of these modified Hedge algorithms are at least as good as, sometimes even better than, the complexity estimates of the original method. We present numerical experiments where the refined methods significantly outperform their vanilla counterpart.
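The Hedge algorithm mentioned above is easy to state in its dual-averaging form: the played distribution is a softmax of the negative, scaled cumulative losses. The sketch below is the vanilla method on hypothetical random losses, not the refined variants derived in the thesis.

```python
import numpy as np

def hedge(loss_rounds, eta):
    """Hedge / multiplicative weights, written as dual averaging: the play at each
    round is a softmax of the (negative, eta-scaled) cumulative losses so far."""
    n = loss_rounds.shape[1]
    cum_loss = np.zeros(n)
    plays = []
    for losses in loss_rounds:
        logits = -eta * cum_loss
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # distribution over the n experts
        plays.append(p)
        cum_loss += losses                # accumulate the revealed losses
    return np.array(plays)

# Hypothetical example: 3 experts, 100 rounds of random losses in [0, 1]
rng = np.random.default_rng(2)
losses = rng.uniform(0.0, 1.0, size=(100, 3))
T, n = losses.shape
dist = hedge(losses, eta=np.sqrt(2 * np.log(n) / T))   # a standard step-size choice
```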

Stochastic First Order Methods in Smooth Convex Optimization

Author : Olivier Devolder
File Size : 24.85 MB
Format : PDF, Kindle
Download : 733
Read : 154

Enhanced First order Methods in Convex and Nonconvex Optimization

Author :
File Size : 82.82 MB
Format : PDF, ePub
Download : 246
Read : 1246
First-order methods for convex and nonconvex optimization have been an important research topic in the past few years. This talk studies and develops efficient algorithms of first-order type for solving a variety of problems. We first focus on the widely studied gradient-based methods for composite convex optimization problems, which arise extensively in compressed sensing and machine learning. In particular, we discuss an accelerated first-order scheme and its variants, which enjoy the "optimal" convergence rate for gradient methods in terms of complexity, as well as their practical behavior.
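A minimal sketch of an accelerated proximal-gradient scheme of the kind discussed (a FISTA-style method), applied to a hypothetical l1-regularized least-squares instance. The problem data, the 1/L step size, and the soft-thresholding prox are illustrative assumptions, not necessarily the scheme studied in this work.

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, iters=200):
    """Accelerated proximal gradient (FISTA-style) for min f(x) + g(x),
    with f smooth (gradient L-Lipschitz) and g having an easy prox."""
    x = y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x_new = prox_g(y - grad_f(y) / L, 1.0 / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum / extrapolation step
        x, t = x_new, t_new
    return x

# Hypothetical LASSO-type instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1
rng = np.random.default_rng(3)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - lam * s, 0.0)  # soft-thresholding
L = np.linalg.eigvalsh(A.T @ A).max()
x_sparse = fista(grad_f, prox_g, np.zeros(100), L)
```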

Second Order Methods for Neural Networks

Author : Adrian J. Shepherd
File Size : 57.93 MB
Format : PDF, ePub, Mobi
Download : 779
Read : 1013
This volume aims to develop the reader's understanding of the theoretical and practical issues involved in the development of efficient MLP training strategies, and to describe and evaluate the performance of a wide range of specific training algorithms. Particular emphasis is given to the development of methods which have a strong theoretical foundation, rather than heuristic, "rule of thumb" training strategies. Second-Order Methods for Neural Networks will be of interest to academic researchers and postgraduate students working with neural networks (especially supervised learning with multi-layer perceptrons), industrial researchers and programmers developing neural network software, and professionals using neural networks as optimisation tools.

Accelerated Optimization for Machine Learning

Author : Zhouchen Lin
File Size : 41.7 MB
Format : PDF, Docs
Download : 642
Read : 1094
This book on optimization includes forewords by Michael I. Jordan, Zongben Xu and Zhi-Quan Luo. Machine learning relies heavily on optimization to solve problems with its learning models, and first-order optimization algorithms are the mainstream approaches. The acceleration of first-order optimization algorithms is crucial for the efficiency of machine learning. Written by leading experts in the field, this book provides a comprehensive introduction to, and state-of-the-art review of, accelerated first-order optimization algorithms for machine learning. It discusses a variety of methods, deterministic and stochastic, synchronous and asynchronous, for unconstrained and constrained problems that can be convex or non-convex. Offering a rich blend of ideas, theories and proofs, the book is up-to-date and self-contained. It is an excellent reference resource for users who are seeking faster optimization algorithms, as well as for graduate students and researchers wanting to grasp the frontiers of optimization in machine learning in a short time.
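As a small illustration of the acceleration idea the book is about, here is Nesterov's accelerated gradient method with constant momentum on a hypothetical strongly convex quadratic; the instance and parameter choices are assumptions made for the example only.

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, L, mu, iters=300):
    """Nesterov's accelerated gradient for a smooth, strongly convex objective:
    a gradient step from an extrapolated point, with constant momentum."""
    x = y = x0.copy()
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    for _ in range(iters):
        x_new = y - grad(y) / L
        y = x_new + beta * (x_new - x)    # momentum / extrapolation
        x = x_new
    return x

# Hypothetical quadratic: f(x) = 0.5 * x^T Q x - c^T x with Q positive definite
rng = np.random.default_rng(4)
M = rng.standard_normal((30, 30))
Q = M.T @ M + np.eye(30)
c = rng.standard_normal(30)
grad = lambda x: Q @ x - c
eigs = np.linalg.eigvalsh(Q)
x_star = nesterov_accelerated_gradient(grad, np.zeros(30), L=eigs.max(), mu=eigs.min())
```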

First Order Methods for Large Scale Convex Optimization

Author : Zi Wang
File Size : 32.74 MB
Format : PDF, ePub, Mobi
Download : 845
Read : 859
The revolution of storage technology in the past few decades has made it possible to gather tremendous amounts of data, ranging from demand and sales records to web user behavior, customer ratings, software logs, and patient data in healthcare. Recognizing patterns and discovering knowledge from large amounts of data has become more and more important, and has attracted significant attention in the operations research (OR), statistics, and computer science fields. Mathematical programming is an essential tool within these fields, especially for data mining and machine learning, and it plays a significant role in data-driven predictions/decisions and pattern recognition. The major challenge while solving these large-scale optimization problems is to process large data sets within practically tolerable run-times. This is where the advantages of first-order algorithms become clearly apparent. These methods use only gradient information, and are particularly good at computing medium-accuracy solutions. In contrast, interior point method computations that exploit second-order information quickly become intractable, even for moderate-size problems, since the complexity of each factorization of an n × n matrix in interior point methods is O(n^3). The memory required for second-order methods can also be an issue in practice for problems with dense data matrices, due to limited RAM. Another benefit of using first-order methods is that one can exploit additional structural information of the problem to further improve the efficiency of these algorithms. In this dissertation, we studied convex regression and multi-agent consensus optimization problems, and developed new fast first-order iterative algorithms to efficiently compute ε-optimal and ε-feasible solutions to these large-scale optimization problems in parallel, distributed, or asynchronous computation settings while carefully managing memory usage. The proposed algorithms are able to take advantage of the structural information of the specific problems considered in this dissertation, and have a strong capability to deal with large-scale problems. Our numerical results showed the advantages of our proposed methods over other traditional methods in terms of speed, memory usage, and especially communication requirements for distributed methods.
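A minimal sketch of the multi-agent consensus setting mentioned above: decentralized gradient descent, where each agent averages its iterate with its neighbours' via a doubly stochastic mixing matrix and then takes a local gradient step. The ring topology, mixing weights, and local least-squares objectives are hypothetical; the dissertation's actual algorithms differ.

```python
import numpy as np

def decentralized_gradient_descent(grads, W, X0, step=0.01, iters=300):
    """Decentralized gradient descent: each agent i mixes with its neighbours
    (through the doubly stochastic matrix W) and steps on its own objective f_i."""
    X = X0.copy()                           # row i = agent i's local iterate
    for _ in range(iters):
        G = np.stack([g(X[i]) for i, g in enumerate(grads)])
        X = W @ X - step * G
    return X

# Hypothetical setup: 4 agents on a ring, each with a local least-squares objective
rng = np.random.default_rng(5)
n_agents, dim = 4, 3
A = [rng.standard_normal((10, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(10) for _ in range(n_agents)]
grads = [lambda x, A=A[i], b=b[i]: A.T @ (A @ x - b) for i in range(n_agents)]
W = np.array([[0.50, 0.25, 0.00, 0.25],     # symmetric, doubly stochastic ring mixing
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
X = decentralized_gradient_descent(grads, W, np.zeros((n_agents, dim)))
```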

Efficient Second order Methods for Machine Learning

Author : Peng Xu
File Size : 31.48 MB
Format : PDF, ePub, Docs
Download : 569
Read : 388
Due to the large-scale nature of many modern machine learning applications, including but not limited to deep learning problems, people have been focusing on studying and developing efficient optimization algorithms. Most of these are first-order methods, which use only gradient information. The conventional wisdom in the machine learning community is that second-order methods that use Hessian information are inappropriate to use since they cannot be efficient. In this thesis, we consider second-order optimization methods: we develop new sub-sampled Newton-type algorithms for both convex and non-convex optimization problems; we prove that they are efficient and scalable; and we provide a detailed empirical evaluation of their scalability as well as usefulness. In the convex setting, we present a sub-sampled Newton-type algorithm (SSN) that exploits non-uniformly sub-sampled Hessians as well as inexact updates to reduce the computational complexity. Theoretically, we show that our algorithms achieve a linear-quadratic convergence rate, and empirically we demonstrate the efficiency of our methods on several real datasets. In addition, we extend our methods to a distributed setting and propose a distributed Newton-type method, the Globally Improved Approximate NewTon method (GIANT). Theoretically, we show that GIANT is highly communication-efficient compared with existing distributed optimization algorithms. Empirically, we demonstrate the scalability and efficiency of GIANT in Spark. In the non-convex setting, we consider two classic non-convex Newton-type methods -- the Trust Region method (TR) and the Cubic Regularization method (CR). We relax the Hessian approximation condition that has been assumed in existing work on using inexact Hessians for those algorithms. Under the relaxed Hessian approximation condition, we show that the worst-case iteration complexities to converge to an approximate second-order stationary point are retained for both methods. Using a similar idea to SSN, we present sub-sampled TR and CR methods along with the sampling complexities needed to achieve the Hessian approximation condition. To understand the empirical performance of these methods, we conduct an extensive empirical study on some non-convex machine learning problems and showcase the efficiency and robustness of these Newton-type methods under various settings.
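A rough sketch of the sub-sampled Newton idea described above: the Hessian is built from a random subset of the data and the Newton system is solved against that approximation. The logistic-regression instance, batch size, and exact linear solve below are illustrative assumptions, not the thesis's SSN or GIANT algorithms.

```python
import numpy as np

def subsampled_newton(grad, hess_sample, x0, n_samples, batch, step=1.0, iters=50, seed=0):
    """Sub-sampled Newton sketch: full gradient, but a Hessian estimated from a
    random subset of the data; the Newton system uses that approximation."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        idx = rng.choice(n_samples, size=batch, replace=False)
        H = hess_sample(x, idx)              # Hessian from the sampled rows only
        d = np.linalg.solve(H, grad(x))      # (an inexact solver such as CG is also common)
        x = x - step * d
    return x

# Hypothetical ridge-regularized logistic regression on random data
rng = np.random.default_rng(6)
n, d, lam = 500, 10, 1e-2
A = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n) * 2 - 1       # labels in {-1, +1}
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
grad = lambda x: -A.T @ (y * sig(-y * (A @ x))) / n + lam * x
def hess_sample(x, idx):
    s = sig(A[idx] @ x)
    D = s * (1.0 - s)
    return (A[idx].T * D) @ A[idx] / len(idx) + lam * np.eye(d)
x_hat = subsampled_newton(grad, hess_sample, np.zeros(d), n_samples=n, batch=100)
```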

First order Methods of Smooth Convex Optimization with Inexact Oracle

Author : Olivier Devolder
File Size : 78.39 MB
Format : PDF, Mobi
Download : 791
Read : 1202

A Collection of Technical Papers Structures

Author :
File Size : 77.20 MB
Format : PDF, Kindle
Download : 431
Read : 252

On the Relationship Between Conjugate Gradient and Optimal First Order Methods for Convex Optimization

Author : Sahar Karimi
File Size : 50.98 MB
Format : PDF, Docs
Download : 315
Read : 998
In a series of works initiated by Nemirovsky and Yudin, and later extended by Nesterov, first-order algorithms for unconstrained minimization with optimal theoretical complexity bounds have been proposed. On the other hand, conjugate gradient algorithms, among the most widely used first-order techniques, suffer from the lack of a finite complexity bound; in fact their performance can possibly be quite poor. This dissertation is partially about tightening the gap between these two classes of algorithms, namely the traditional conjugate gradient methods and optimal first-order techniques. We derive conditions under which conjugate gradient methods attain the same complexity bound as Nemirovsky-Yudin's and Nesterov's methods. Moreover, we propose a conjugate gradient-type algorithm named CGSO, for Conjugate Gradient with Subspace Optimization, which achieves the optimal complexity bound at the cost of a little extra computation. We extend the theory of CGSO to convex problems with linear constraints. In particular, we focus on solving the $l_1$-regularized least-squares problem, often referred to as the Basis Pursuit Denoising (BPDN) problem in the optimization community. BPDN arises in many practical fields including sparse signal recovery, machine learning, and statistics. Solving BPDN is fairly challenging because the size of the involved signals can be quite large; therefore first-order methods are of particular interest for these problems. We propose a quasi-Newton proximal method for solving BPDN. Our numerical results suggest that our technique is computationally effective and can compete favourably with other state-of-the-art solvers.
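For reference, here is the classic linear conjugate gradient method that this line of work starts from, applied to a hypothetical positive definite system; this is the vanilla method, not the CGSO variant or the quasi-Newton proximal method proposed in the dissertation.

```python
import numpy as np

def conjugate_gradient(Q, b, x0=None, tol=1e-8, max_iter=None):
    """Linear CG for minimizing 0.5 * x^T Q x - b^T x (Q symmetric positive
    definite), i.e. solving Q x = b, using only matrix-vector products."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - Q @ x                       # residual = negative gradient
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter or len(b)):
        Qp = Q @ p
        alpha = rs / (p @ Qp)
        x += alpha * p
        r -= alpha * Qp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p       # new search direction, Q-conjugate to the old ones
        rs = rs_new
    return x

# Hypothetical symmetric positive definite system
rng = np.random.default_rng(7)
M = rng.standard_normal((40, 40))
Q = M.T @ M + np.eye(40)
b = rng.standard_normal(40)
x = conjugate_gradient(Q, b)
```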

Avoiding Communication in First Order Methods for Optimization

Author : Aditya Devarakonda
File Size : 41.73 MB
Format : PDF, ePub, Docs
Download : 190
Read : 1170
Machine learning has gained renewed interest in recent years due to advances in computer hardware (processing power and high-capacity storage) and the availability of large amounts of data which can be used to develop accurate, robust models. While hardware improvements have facilitated the development of machine learning models on a single machine, the analysis of large amounts of data still requires parallel computing, either to obtain shorter running times or because the dataset cannot be stored on a single machine. In addition to hardware improvements, algorithm redesign is also an important direction for further reducing running times. On modern computer architectures, the cost of moving data (communication) from main memory to caches in a single machine is orders of magnitude more expensive than the cost of performing floating-point operations (computation). On parallel machines, the cost of moving data from one processor to another over an interconnection network is the most expensive operation. The large gap between computation and communication suggests that algorithm redesign should be driven by the goal of avoiding communication and, if necessary, decreasing communication at the expense of additional computation. Many problems in machine learning solve mathematical optimization problems which, in most non-linear and non-convex cases, require iterative methods. This thesis is focused on deriving communication-avoiding variants of the block coordinate descent method, a first-order method that has strong convergence rates for many optimization problems. Block coordinate descent is an iterative algorithm which at each iteration samples a small subset of rows or columns of the input matrix, solves a subproblem using just the chosen rows or columns, and obtains a partial solution. This solution is then iteratively refined until the optimal solution is reached or until convergence criteria are met. In the parallel case, each iteration of block coordinate descent requires communication; therefore, avoiding communication is key to attaining high performance. This thesis adapts well-known techniques from existing work on communication-avoiding (CA) Krylov and s-step Krylov methods. CA-Krylov methods unroll vector recurrences and rearrange the sequence of computation in a way that defers communication for s iterations, where s is a tunable parameter. For CA-Krylov methods the reduction in communication cost comes at the expense of numerical instability for large values of s. We apply a similar recurrence-unrolling technique to block coordinate descent in order to obtain communication-avoiding variants which solve the L2-regularized least-squares, L1-regularized least-squares, Support Vector Machine, and kernel problems. Our communication-avoiding variants reduce the latency cost by a tunable factor of s at the expense of a factor of s increase in computational and bandwidth costs for the L2 and L1 least-squares and SVM problems. The CA variants for these problems require additional computation and bandwidth in order to update the residual vector. For CA-kernel methods the computational and bandwidth costs do not increase, because the CA variants of kernel methods can reuse elements of the kernel matrix already computed and therefore do not need to compute and communicate additional elements of the kernel matrix.
Our experimental results illustrate that our new, communication-avoiding methods can obtain speedups of up to 6.1x on a Cray XC30 supercomputer using MPI for parallel processing. For CA-kernel methods we show modeled speedups of 26x, 120x, and 197x for MPI on a predicted Exascale system, Spark on a predicted Exascale system, and Spark on a cloud system, respectively. Furthermore, we experimentally confirm that our algorithms are numerically stable for large values of s. Finally, we present an adaptive batch size technique which reduces the latency cost of training convolutional neural networks (CNNs). With this technique we achieved speedups of up to 6.25x when training CNNs on up to 4 NVIDIA P100 GPUs. Furthermore, we were able to train the ResNet-50 network on the ImageNet dataset with a batch size of up to 524,228, which would allow neural network training to attain a higher fraction of peak GPU performance than training with smaller batch sizes.
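A minimal sketch of the (non-communication-avoiding) block coordinate descent loop described above, on a hypothetical ridge-regression problem: sample a block of coordinates, solve the small subproblem exactly, and update a maintained residual. The residual update is the step whose communication the CA variants restructure; the problem instance and parameters here are illustrative assumptions.

```python
import numpy as np

def block_coordinate_descent(A, b, lam, block_size, iters=200, seed=0):
    """Randomized block coordinate descent for ridge regression
    min 0.5*||Ax - b||^2 + 0.5*lam*||x||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    r = A @ x - b                               # maintained residual
    for _ in range(iters):
        blk = rng.choice(d, size=block_size, replace=False)
        Ab = A[:, blk]
        # exact solve on the chosen block, keeping all other coordinates fixed
        H = Ab.T @ Ab + lam * np.eye(block_size)
        g = Ab.T @ r + lam * x[blk]
        delta = np.linalg.solve(H, -g)
        x[blk] += delta
        r += Ab @ delta                         # cheap residual update
    return x

# Hypothetical data
rng = np.random.default_rng(8)
A = rng.standard_normal((300, 50))
b = rng.standard_normal(300)
x_hat = block_coordinate_descent(A, b, lam=0.1, block_size=5)
```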

Numerical Optimization Techniques for Engineering Design

Author : Garret N. Vanderplaats
File Size : 75.94 MB
Format : PDF, ePub
Download : 112
Read : 1248

Numerical Methods and Optimization

Author : Éric Walter
File Size : 83.41 MB
Format : PDF, Docs
Download : 597
Read : 856
Initial training in pure and applied sciences tends to present problem-solving as the process of elaborating explicit closed-form solutions from basic principles, and then using these solutions in numerical applications. This approach is only applicable to very limited classes of problems that are simple enough for such closed-form solutions to exist. Unfortunately, most real-life problems are too complex to be amenable to this type of treatment. Numerical Methods – a Consumer Guide presents methods for dealing with them. Shifting the paradigm from formal calculus to numerical computation, the text makes it possible for the reader to
· discover how to escape the dictatorship of those particular cases that are simple enough to receive a closed-form solution, and thus gain the ability to solve complex, real-life problems;
· understand the principles behind recognized algorithms used in state-of-the-art numerical software;
· learn the advantages and limitations of these algorithms, to facilitate the choice of which pre-existing bricks to assemble for solving a given problem; and
· acquire methods that allow a critical assessment of numerical results.
Numerical Methods – a Consumer Guide will be of interest to engineers and researchers who solve problems numerically with computers or supervise people doing so, and to students of both engineering and applied mathematics.

Optimization for Machine Learning

Author : Suvrit Sra
File Size : 77.5 MB
Format : PDF, ePub
Download : 743
Read : 1057
An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.

Optimal First Order Methods for a Class of Non Smooth Convex Optimization with Applications to Image Analysis

Author : Yuyuan Ouyang
File Size : 43.97 MB
Format : PDF, Kindle
Download : 958
Read : 1299
This PhD Dissertation concerns optimal first-order methods in convex optimization and their applications in imaging science. The research is motivated by the rapid advances in technologies for digital data acquisition, which result in a high demand for efficient algorithms to solve non-smooth convex optimization problems. In this dissertation we develop theories and optimal numerical methods for solving a class of deterministic and stochastic saddle point problems, and more general variational inequalities, arising from large-scale data analysis problems. In the first part of this dissertation, we aim to solve a class of deterministic and stochastic saddle point problems (SPP), which has been considered as a framework for ill-posed inverse problems regularized by a non-smooth functional in many data analysis problems, such as image reconstruction in compressed sensing and machine learning. The proposed deterministic accelerated primal-dual (APD) algorithm is expected to have the same optimal rate of convergence as the one obtained by Nesterov for a different scheme. We also propose a stochastic APD algorithm that exhibits an optimal rate of convergence. To the best of our knowledge, no such stochastic primal-dual algorithms had previously been developed in the literature.
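A minimal sketch of a primal-dual scheme for saddle-point problems of the kind described, here in a Chambolle-Pock-style form on a hypothetical 1-D total-variation denoising instance. The operator, prox maps, and step sizes are assumptions for the example, not the APD algorithm proposed in the dissertation.

```python
import numpy as np

def primal_dual(K, prox_g, proj_dual, x0, y0, tau, sigma, iters=300):
    """Basic primal-dual scheme for min_x max_y <Kx, y> + g(x) - f*(y):
    a dual ascent step, a primal proximal step, and primal extrapolation."""
    x, y = x0.copy(), y0.copy()
    x_bar = x.copy()
    for _ in range(iters):
        y = proj_dual(y + sigma * (K @ x_bar))
        x_new = prox_g(x - tau * (K.T @ y), tau)
        x_bar = 2.0 * x_new - x            # theta = 1 extrapolation
        x = x_new
    return x, y

# Hypothetical 1-D total-variation denoising: min_x 0.5*||x - b||^2 + lam*||Dx||_1
n, lam = 100, 0.5
rng = np.random.default_rng(9)
b = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(n)
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)           # finite-difference operator
prox_g = lambda v, t: (v + t * b) / (1.0 + t)          # prox of 0.5*||x - b||^2
proj_dual = lambda y: np.clip(y, -lam, lam)            # projection onto the l_inf ball
norm_D = np.linalg.norm(D, 2)
tau = sigma = 0.99 / norm_D                            # tau * sigma * ||D||^2 < 1
x_den, _ = primal_dual(D, prox_g, proj_dual, np.zeros(n), np.zeros(n - 1), tau, sigma)
```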

First order Methods for Trace Norm Minimization

Author : Hsiao-Han Chao
File Size : 26.79 MB
Format : PDF, ePub, Docs
Download : 145
Read : 802
Minimizing the trace norm (sum of singular values) of a matrix has become popular as a convex heuristic for computing low rank approximations, with numerous applications in control, system theory, statistics, and machine learning. However, it is usually too expensive to solve these matrix optimization problems with second-order methods, such as the interior-point method, given that the scale of the problems is relatively large. In this thesis, we compare several first-order methods for nondifferentiable convex optimization based on splitting algorithms, and apply them to primal, dual, or primal-dual optimality conditions. The implementation aspects of the algorithms are discussed in detail and their performance is compared in experiments with randomly generated data as well as system identification test sets. We show that on large-scale problems, several of the first-order methods reach a modest accuracy within a shorter time than the interior-point solvers. Based on the experiments, the most promising methods are the Alternating Direction Method of Multipliers and the Pock-Chambolle semi-implicit algorithm.
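As a small sketch of the building block shared by first-order splitting methods for trace norm minimization: singular value thresholding, the proximal operator of the trace norm, shown here on a hypothetical matrix. The surrounding algorithms compared in the thesis (ADMM, the Pock-Chambolle scheme) would call such a routine repeatedly; the instance below is an assumption for illustration.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * (trace norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.maximum(s - tau, 0.0)          # soft-threshold the singular values
    return (U * s_thr) @ Vt

# Hypothetical instance: min_X 0.5*||X - B||_F^2 + lam*||X||_* is solved exactly by one prox
rng = np.random.default_rng(10)
B = rng.standard_normal((30, 20)) + 5.0 * np.outer(rng.standard_normal(30), rng.standard_normal(20))
lam = 2.0
X = svt(B, lam)
approx_rank = np.linalg.matrix_rank(X)        # thresholding typically produces a low-rank X
```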

Introduction to Optimization Methods and Tools for Multidisciplinary Design in Aeronautics and Turbomachinery

Author : Jacques Periaux
File Size : 90.62 MB
Format : PDF, ePub, Docs
Download : 354
Read : 993

First order Convex Optimization Methods for Signal and Image Processing

Author : Tobias Lindstrøm Jensen
File Size : 47.37 MB
Format : PDF, Kindle
Download : 995
Read : 561