Optimizer

An Optimizer is the most important entity in nsoptim as this computes the optimum. Different optimizer can handle different functions and have different methods, but every Optimizer implements the basic public interface for optimizers.

General Optimizer interface

nsoptim::Optimizer

template<typename _LossFunction, typename _PenaltyFunction, typename _Coefficients>
class Optimizer

Base class for all optimizers. The only use of this class is that it ensures that _LossFunction is a valid loss function class, _PenaltyFunction is a valid penalty function class, and that both of them can handle coefficient of type _Coefficients. The template parameters are made available as member types.

Public Types

using LossFunction = _LossFunction

Loss function type.

using PenaltyFunction = _PenaltyFunction

Penalty function type.

using Coefficients = _Coefficients

Coefficients type.

using Optimum = nsoptim::Optimum<LossFunction, PenaltyFunction, Coefficients>

Optimum type as returned by this optimizer.

Public methods

Optimum Optimize()

Find an optimum of the objective function.

Optimum Optimize(const Coefficients &start)

Find an optimum of the objective function, starting the optimization at start.

MM Optimizer

The MM (minimization by majorization) algorithm is a “black-box” algorithm which can optimize a very general class of objective functions, without having explicit knowledge about the intricates of the loss and/or penalty function. The MM optimzer only needs a loss and/or penalty function which provide a convex surrogate function. The algorithm works by successively minimizing the convex surrogate function which majorizes the true objective function at the current point . A function majorizes function at if

  • is greater than everywhere, i.e., for all in the domain, and

  • the functions coincide at , i.e., .

Either the loss function, the penalty function, or both must implement a ConvexSurrogate method according to the interface described in Convex Surrogate.

The MM optimizer requires an inner optimizer, i.e., an optimizer which can optimize the convex surrogate objective function for the given coefficients type.

The MM optimizer has several configuration parameters that are set on construction by supplying a nsoptim::MMConfiguration object.

Convex Surrogate

template<Coefficients>
typename LossFunction::ConvexSurrogateType LossFunction::ConvexSurrogate(const Coefficients &where)

Return a convex surrogate loss function which majorizes the true loss function at where.

Penalty functions can provide a similar member to return a convex surrogate.

nsoptim::MMConfiguration

struct MMConfiguration

Configuration options for the MM algorithm.

Public Types

enum TighteningType

Type of tightening for inner optimization.

Values:

kNone = 0

No tightening, i.e., always use the configured numeric tolerance for the inner optimization. This is automatically chosen if the inner optimizer does not support chaning the numeric tolerance.

kExponential = 1

Start with a large inner tolerance and at each iteration make the inner optimization more precise by reducing the tolerance level of the inner optimizer by a constant factor, up until the minimum inner tolerance level is reached.

kAdaptive = 2

Start with a large inner tolerance and reduce by a constant factor as soon as parameter change in the outer optimization is less than the inner tolerance.

Public Members

int max_it

Maximum number of iterations allowed.

TighteningType tightening

Type of tightening for inner optimization.

int adaptive_tightening_steps

Number of tightening steps if using adaptive thightening.

nsoptim::MMOptimizer

template<typename LossFunction, typename PenaltyFunction, typename InnerOptimizerType, typename Coefficients>
class MMOptimizer : public nsoptim::Optimizer<LossFunction, PenaltyFunction, Coefficients>

Compute the minimum of a non-convex objective function (the loss and/or the penalty can be non-convex) using the Minimization by Majorization (MM) algorithm.

Public Functions

MMOptimizer(const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)

Ininitialize the MM algorithm without loss or penalty function.

Parameters
  • config: configuration for the MM optimizer.

MMOptimizer(const InnerOptimizerType &optimizer, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)

Ininitialize the MM algorithm using the given inner optimizer,

Parameters
  • optimizer: optimizer to use for the inner optimization.

  • config: configuration for the MM optimizer.

MMOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)

Ininitialize the MM algorithm using the given loss function and the penalty function.

Parameters
  • loss: a loss function.

  • penalty: a penalty function.

  • config: configuration for the MM optimizer.

MMOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const InnerOptimizerType &optimizer, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)

Ininitialize the MM algorithm using the given loss function, the penalty function, and inner optimizer.

Parameters
  • loss: a loss function.

  • penalty: a penalty function.

  • optimizer: optimizer to use for the inner optimization.

  • config: configuration for the MM optimizer.

MMOptimizer(const MMOptimizer &other)

Default copy constructor.

MMOptimizer(MMOptimizer &&other)

Default move constructor.

MMOptimizer &operator=(MMOptimizer &&other)

Default move assignment operator.

void Reset()

Reset the optimizier. This compeletely purges the current state.

LossFunction &loss() const

Access the loss function.

Return

the loss function currently in use by the MM algorithm.

void loss(const LossFunction &loss)

Set the loss function.

Parameters
  • loss: the new loss function to optimize over.

PenaltyFunction &penalty() const

Access the penalty function.

Return

the penalty function currently in use by the optimizer.

void penalty(const PenaltyFunction &penalty)

Set the penalty function.

Parameters
  • penalty: the new penalty function to optimize over.

double convergence_tolerance() const

Get the convergence tolerance for the MM algorithm.

Return

convergence tolerance.

void convergence_tolerance(double convergence_tolerance)

Set the convergence tolerance for the MM algorithm.

Parameters
  • convergence_tolerance: convergene tolerance for the MM algorithm.

Optimum Optimize()

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.

Return

information about the optimum.

Optimum Optimize(const Coefficients &start)

Find the minimum of the objective function, using the given coefficients as starting point.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

Optimum Optimize(const Coefficients &start, const int max_it)

Find the minimum of the objective function, using the given coefficients as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

  • max_it: maximum number of iterations.

Optimum Optimize(const int max_it)

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • max_it: maximum number of iterations.

Linearized Alternative Direction Method of Moments (ADMM) Optimizer

  • Supported loss functions: LsRegressionLoss, WeightedLsRegressionLoss

  • Supported penalty functions: EnPenalty, AdaptiveEnPenalty

Linearized ADMM works for objective functions that can be written as and solves the problem

Especially if is “wide” (i.e., has more columns than rows), the proximal operator for this problem is usually much quicker to compute than for . More information on the properties of the algorithm can be found in [1].

The linearized ADMM algorithm requires a proper implementation of the proximal operator that can handle the given loss and penalty functions. A proximal operator needs to follow the following interface:

Proximal Operator

class ProxOp
void loss(const Lossfunction &loss)

Change the loss to loss.

arma::vec operator()(const arma::vec &u, const arma::vec &prev, const double intercept, const double lambda, nsoptim::Metrics *metrics)

Get the value of the proximal operator of the function scaled by lambda, evaluated at u. The argument prev is the previous value returned by the proximal operator and intercept is the current value of the intercept or 0, if the loss does not use an intercept term.

double ComputeIntercept(const arma::vec &fitted)

Compute the intercept term, given the fitted values.

double StepSize(const PenaltyFunction &penalty, const double norm_x)

Get the step size required for the set loss and the given the penalty.

nsoptim::AdmmLinearConfiguration

struct AdmmLinearConfiguration

Configuration options for the linearized ADMM algorithm.

Public Members

int max_it

maximum number of iterations allowed.

nsoptim::GenericLinearizedAdmmOptimizer

template<typename ProxOp, typename PenaltyFunction, typename Coefficients>
class GenericLinearizedAdmmOptimizer : public nsoptim::Optimizer<ProxOp::LossFunction, PenaltyFunction, Coefficients>

Compute the EN regression estimate using the alternating direction method of multiplier (ADMM) with linearization. This optimizer uses the given proximal operator class ProxOp.

Public Functions

GenericLinearizedAdmmOptimizer()

Initialize the ADMM optimizer without setting a loss or penalty function.

GenericLinearizedAdmmOptimizer(const LossFunction &loss, const PenaltyFunction &penalty)

Initialize the ADMM optimizer with the given loss and penalty functions.

Parameters
  • loss: the loss function object.

  • penalty: the penalty function object.

GenericLinearizedAdmmOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const ProximalOperator &prox, const AdmmLinearConfiguration &config = admm_optimizer::kDefaultLinConfig)

Initialize the ADMM optimizer with the given loss and penalty functions.

Parameters
  • loss: the loss function object.

  • penalty: the penalty function object.

  • prox: proximal operator object.

  • config: optional ADMM configuration object.

template<typename T, typename ...Args, typename = typename std::enable_if<!IsConfiguration<T>::value && !IsLossFunction<T>::value, void>::type>
GenericLinearizedAdmmOptimizer(T &&prox_arg_1, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • prox_arg_1: first argument to constructor of the proximal operator.

  • prox_args...: further arguments to the constructor of the proximal operator.

template<typename C, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>
GenericLinearizedAdmmOptimizer(const C &config, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • config: ADMM configuration object.

  • prox_args...: arguments to the constructor of the proximal operator.

template<typename T1, typename T2, typename ...Args, typename = typename std::enable_if<!IsConfiguration<T1>::value && !IsLossFunction<T1>::value, void>::type>
GenericLinearizedAdmmOptimizer(T1 &&prox_arg_1, T2 &&prox_arg_2, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • prox_arg_1: first argument to constructor of the proximal operator.

  • prox_arg_2: second argument to constructor of the proximal operator.

  • prox_args...: further arguments to the constructor of the proximal operator.

template<typename C, typename T1, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>
GenericLinearizedAdmmOptimizer(const C &config, T1 &&prox_arg_1, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • config: ADMM configuration object.

  • prox_arg_1: first argument to constructor of the proximal operator.

  • prox_args...: further arguments to the constructor of the proximal operator.

template<typename C, typename T1, typename T2, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>
GenericLinearizedAdmmOptimizer(const C &config, T1 &&prox_arg_1, T2 &&prox_arg_2, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • config: ADMM configuration object.

  • prox_arg_1: first argument to constructor of the proximal operator.

  • prox_arg_2: second argument to constructor of the proximal operator.

  • prox_args...: further arguments to the constructor of the proximal operator.

template<typename L, typename P, typename T1, typename ...Args, typename = typename std::enable_if<IsLossFunction<L>::value && !IsConfiguration<T1>::value, void>::type>
GenericLinearizedAdmmOptimizer(const L &loss, const P &penalty, T1 &&prox_arg_1, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • loss: the loss function object.

  • penalty: the penalty function object.

  • prox_arg_1: first argument to constructor of the proximal operator.

  • prox_args...: further arguments to the constructor of the proximal operator.

template<typename L, typename P, typename C, typename ...Args, typename = typename std::enable_if<IsLossFunction<L>::value && IsConfiguration<C>::value, void>::type>
GenericLinearizedAdmmOptimizer(const L &loss, const P &penalty, const C &config, Args&&... prox_args)

Initialize the ADMM optimizer without setting a loss or penalty function.

Parameters
  • loss: the loss function object.

  • penalty: the penalty function object.

  • config: ADMM configuration object.

  • prox_args...: further arguments to the constructor of the proximal operator.

GenericLinearizedAdmmOptimizer(const GenericLinearizedAdmmOptimizer &other)

Default copy constructor.

The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the loss() or penalty() methods), the two optimizers will not share the new loss/penalty function.

GenericLinearizedAdmmOptimizer &operator=(const GenericLinearizedAdmmOptimizer &other)

Default copy assignment.

The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the loss() or penalty() methods), the two optimizers will not share the new loss/penalty function.

GenericLinearizedAdmmOptimizer(GenericLinearizedAdmmOptimizer &&other)

Default move constructor.

GenericLinearizedAdmmOptimizer &operator=(GenericLinearizedAdmmOptimizer &&other)

Default move assignment operator.

double convergence_tolerance() const

Get the convergence tolerance for the ADMM algorithm.

Return

convergence tolerance.

void convergence_tolerance(double convergence_tolerance)

Set the convergence tolerance for the ADMM algorithm.

Parameters
  • convergence_tolerance: convergene tolerance for the ADMM algorithm.

Optimum Optimize()

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.

Return

information about the optimum.

Optimum Optimize(const Coefficients &start)

Find the minimum of the objective function, using the given coefficients as starting point.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

Optimum Optimize(const Coefficients &start, const int max_it)

Find the minimum of the objective function, using the given coefficients as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

  • max_it: maximum number of iterations.

Optimum Optimize(const int max_it)

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • max_it: maximum number of iterations.

Alternative Direction Method of Moments (ADMM) Optimizer with Variable Step-Size

  • Supported loss functions: LsRegressionLoss, WeightedLsRegressionLoss

  • Supported penalty functions: EnPenalty, AdaptiveEnPenalty

This implementation operates directly on the objective function , but adjusts the step size according to [2].

nsoptim::AdmmVarStepConfiguration

struct AdmmVarStepConfiguration

Configuration options for the variable-stepsize ADMM algorithm.

Public Members

int max_it

Maximum number of iterations allowed.

double tau

Initial step size. If negative (the default), use the square L_2 norm of x.

double tau_lower_mult

Lower bound for the step size, defined as a multiple of tau.

double tau_adjustment_lower

Lower bound of the step-size adjustment factor.

double tau_adjustment_upper

Upper bound of the step-size adjustment factor.

nsoptim::AdmmVarStepOptimizer

template<typename LossFunction, typename PenaltyFunction, typename Coefficients>
class AdmmVarStepOptimizer : public nsoptim::Optimizer<LossFunction, PenaltyFunction, Coefficients>

Compute the EN regression estimate using the alternating direction method of multiplier (ADMM) with variable step-size.

Public Functions

AdmmVarStepOptimizer(const AdmmVarStepConfiguration &config = admm_optimizer::kDefaultVarStepConfig)

Ininitialize the optimizer using the given (weighted) LS loss function and the Ridge penalty.

Parameters
  • loss: a weighted LS loss function.

  • penalty: the Ridge penalty.

AdmmVarStepOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const AdmmVarStepConfiguration &config = admm_optimizer::kDefaultVarStepConfig)

Ininitialize the optimizer using the given (weighted) LS loss function and the Ridge penalty.

Parameters
  • loss: a weighted LS loss function.

  • penalty: the Ridge penalty.

AdmmVarStepOptimizer(const AdmmVarStepOptimizer &other)

Default copy constructor.

The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the loss() or penalty() methods), the two optimizers will not share the new loss/penalty function.

AdmmVarStepOptimizer &operator=(const AdmmVarStepOptimizer &other)

Default copy assignment.

The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the loss() or penalty() methods), the two optimizers will not share the new loss/penalty function.

AdmmVarStepOptimizer(AdmmVarStepOptimizer &&other)

Default move constructor.

AdmmVarStepOptimizer &operator=(AdmmVarStepOptimizer &&other)

Default move assignment operator.

double convergence_tolerance() const

Get the convergence tolerance for the ADMM algorithm.

Return

convergence tolerance.

void convergence_tolerance(double convergence_tolerance)

Set the convergence tolerance for the ADMM algorithm.

Parameters
  • convergence_tolerance: convergene tolerance for the ADMM algorithm.

Optimum Optimize()

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.

Return

information about the optimum.

Optimum Optimize(const Coefficients &start)

Find the minimum of the objective function, using the given coefficients as starting point.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

Optimum Optimize(const Coefficients &start, const int max_it)

Find the minimum of the objective function, using the given coefficients as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

  • max_it: maximum number of iterations.

Optimum Optimize(const int max_it)

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • max_it: maximum number of iterations.

Dual Augmented Lagrangian (DAL)

The dual augmented lagrangian algorithm according to [3]. Supports only sparse coefficients.

nsoptim::DalEnConfiguration

struct DalEnConfiguration

Configuration options for the DAL algorithm.

Public Members

int max_it

Maximum number of iterations allowed.

int max_inner_it

Maximum number of inner iterations allowed.

double eta_start_numerator_conservative

Conservative setting for the numerator when computing the initial value of the proximity parameters.

double eta_start_numerator_aggressive

Aggressive setting for the numerator when computing the initial value of the proximity parameters.

double lambda_relchange_aggressive

Maximum relative change in lambda that allows the use of the aggressive numerator.

double eta_multiplier

Multiplier to scale the proximity parameters at each outer iteration.

nsoptim::DalEnOptimizer

template<class LossFunction, class PenaltyFunction>
class DalEnOptimizer : public nsoptim::Optimizer<LossFunction, PenaltyFunction, RegressionCoefficients<arma::sp_vec>>

Public Functions

DalEnOptimizer(const Configuration &config = _optim_dal_internal::kDefaultDalEnConfiguration)

Initialize the DAL optimizer for EN penalties without initializing the loss/penalty functions.

Parameters
  • config: configuration for the algorithm.

DalEnOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const Configuration &config = _optim_dal_internal::kDefaultDalEnConfiguration)

Initialize the DAL optimizer for EN penalties.

Parameters
  • loss: the loss function to optimize over.

  • penalty: the penalty function to optimizer over.

  • config: configuration for the algorithm.

DalEnOptimizer(const DalEnOptimizer &other)

Copy constructor.

This creates shallow copies of the loss and penalty functions and a deep copy of the hessian as well as possibly the weighted data.

void Reset()

Reset the optimizier. This compeletely purges the current state.

LossFunction &loss() const

Access the loss function.

Return

the loss function currently in use by the optimizer.

void loss(const LossFunction &loss)

Set the loss function.

Parameters
  • loss: the new loss function to optimize over.

PenaltyFunction &penalty() const

Access the penalty function.

Return

the penalty function currently in use by the optimizer.

void penalty(const PenaltyFunction &penalty)

Set the penalty function.

Parameters
  • penalty: the new penalty function to optimize over.

double convergence_tolerance() const

Get the convergence tolerance for the DAL algorithm.

Return

convergence tolerance.

void convergence_tolerance(double convergence_tolerance)

Set the convergence tolerance for the DAL algorithm.

Parameters
  • convergence_tolerance: convergene tolerance for the MM algorithm.

Optimum Optimize(const Coefficients &start)

Find the minimum of the objective function, using the given coefficients as starting point.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

Optimum Optimize(const Coefficients &start, const int max_it)

Find the minimum of the objective function, using the given coefficients as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • start: where to start the optimization from.

  • max_it: maximum number of iterations.

Optimum Optimize()

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.

Return

information about the optimum.

Optimum Optimize(const int max_it)

Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most max_it iterations.

Return

information about the optimum.

Parameters
  • max_it: maximum number of iterations.

References

[1] Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. (2010). Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Ojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457.

[2] Bartels, S. and Milicevic, M. (2017). Alternating direction method of multipliers with variable step sizes. arXiv:1704.06069.

[3] Tomioka, R., Suzuki, T., and Sugiyama, M. (2011). Super-linear convergence of dual augmented lagrangian algorithm for sparsity regularized estimation. Journal of Machine Learning Research, 12:1537–1586.