Optimizer¶
An Optimizer is the most important entity in nsoptim as this computes the optimum. Different optimizer can handle different functions and have different methods, but every Optimizer implements the basic public interface for optimizers.
General Optimizer interface¶
nsoptim::Optimizer¶
-
template<typename
_LossFunction, typename_PenaltyFunction, typename_Coefficients>
classOptimizer¶ Base class for all optimizers. The only use of this class is that it ensures that
_LossFunctionis a valid loss function class,_PenaltyFunctionis a valid penalty function class, and that both of them can handle coefficient of type_Coefficients. The template parameters are made available as member types.Public Types
-
using
LossFunction= _LossFunction¶ Loss function type.
-
using
PenaltyFunction= _PenaltyFunction¶ Penalty function type.
-
using
Coefficients= _Coefficients¶ Coefficients type.
-
using
Optimum= nsoptim::Optimum<LossFunction, PenaltyFunction, Coefficients>¶ Optimum type as returned by this optimizer.
Public methods
-
Optimum
Optimize(const Coefficients &start)¶ Find an optimum of the objective function, starting the optimization at start.
-
using
MM Optimizer¶
The MM (minimization by majorization) algorithm is a “black-box” algorithm which can optimize a very general class of objective functions, without having explicit knowledge about the intricates of the loss and/or penalty function.
The MM optimzer only needs a loss and/or penalty function which provide a convex surrogate function.
The algorithm works by successively minimizing the convex surrogate function which majorizes the true objective function at the current point .
A function
majorizes function
at
if
is greater than
everywhere, i.e.,
for all
in the domain, and
the functions coincide at
, i.e.,
.
Either the loss function, the penalty function, or both must implement a ConvexSurrogate method according to the interface described in Convex Surrogate.
The MM optimizer requires an inner optimizer, i.e., an optimizer which can optimize the convex surrogate objective function for the given coefficients type.
The MM optimizer has several configuration parameters that are set on construction by supplying a nsoptim::MMConfiguration object.
Convex Surrogate¶
-
template<Coefficients>
typename LossFunction::ConvexSurrogateTypeLossFunction::ConvexSurrogate(const Coefficients &where)¶ Return a convex surrogate loss function which majorizes the true loss function at where.
Penalty functions can provide a similar member to return a convex surrogate.
nsoptim::MMConfiguration¶
-
struct
MMConfiguration¶ Configuration options for the MM algorithm.
Public Types
-
enum
TighteningType¶ Type of tightening for inner optimization.
Values:
-
kNone= 0¶ No tightening, i.e., always use the configured numeric tolerance for the inner optimization. This is automatically chosen if the inner optimizer does not support chaning the numeric tolerance.
-
kExponential= 1¶ Start with a large inner tolerance and at each iteration make the inner optimization more precise by reducing the tolerance level of the inner optimizer by a constant factor, up until the minimum inner tolerance level is reached.
-
kAdaptive= 2¶ Start with a large inner tolerance and reduce by a constant factor as soon as parameter change in the outer optimization is less than the inner tolerance.
-
Public Members
-
int
max_it¶ Maximum number of iterations allowed.
-
TighteningType
tightening¶ Type of tightening for inner optimization.
-
int
adaptive_tightening_steps¶ Number of tightening steps if using adaptive thightening.
-
enum
nsoptim::MMOptimizer¶
-
template<typename
LossFunction, typenamePenaltyFunction, typenameInnerOptimizerType, typenameCoefficients>
classMMOptimizer: public nsoptim::Optimizer<LossFunction, PenaltyFunction, Coefficients>¶ Compute the minimum of a non-convex objective function (the loss and/or the penalty can be non-convex) using the Minimization by Majorization (MM) algorithm.
Public Functions
-
MMOptimizer(const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)¶ Ininitialize the MM algorithm without loss or penalty function.
- Parameters
config: configuration for the MM optimizer.
-
MMOptimizer(const InnerOptimizerType &optimizer, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)¶ Ininitialize the MM algorithm using the given inner optimizer,
- Parameters
optimizer: optimizer to use for the inner optimization.config: configuration for the MM optimizer.
-
MMOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)¶ Ininitialize the MM algorithm using the given loss function and the penalty function.
- Parameters
loss: a loss function.penalty: a penalty function.config: configuration for the MM optimizer.
-
MMOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const InnerOptimizerType &optimizer, const MMConfiguration &config = mm_optimizer::kDefaultMMConfiguration)¶ Ininitialize the MM algorithm using the given loss function, the penalty function, and inner optimizer.
- Parameters
loss: a loss function.penalty: a penalty function.optimizer: optimizer to use for the inner optimization.config: configuration for the MM optimizer.
-
MMOptimizer(const MMOptimizer &other)¶ Default copy constructor.
-
MMOptimizer(MMOptimizer &&other)¶ Default move constructor.
-
MMOptimizer &
operator=(MMOptimizer &&other)¶ Default move assignment operator.
-
void
Reset()¶ Reset the optimizier. This compeletely purges the current state.
-
LossFunction &
loss() const¶ Access the loss function.
- Return
the loss function currently in use by the MM algorithm.
-
void
loss(const LossFunction &loss)¶ Set the loss function.
- Parameters
loss: the new loss function to optimize over.
-
PenaltyFunction &
penalty() const¶ Access the penalty function.
- Return
the penalty function currently in use by the optimizer.
-
void
penalty(const PenaltyFunction &penalty)¶ Set the penalty function.
- Parameters
penalty: the new penalty function to optimize over.
-
double
convergence_tolerance() const¶ Get the convergence tolerance for the MM algorithm.
- Return
convergence tolerance.
-
void
convergence_tolerance(double convergence_tolerance)¶ Set the convergence tolerance for the MM algorithm.
- Parameters
convergence_tolerance: convergene tolerance for the MM algorithm.
-
Optimum
Optimize()¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.
- Return
information about the optimum.
-
Optimum
Optimize(const Coefficients &start)¶ Find the minimum of the objective function, using the given coefficients as starting point.
- Return
information about the optimum.
- Parameters
start: where to start the optimization from.
-
Optimum
Optimize(const Coefficients &start, const int max_it)¶ Find the minimum of the objective function, using the given coefficients as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
start: where to start the optimization from.max_it: maximum number of iterations.
-
Optimum
Optimize(const int max_it)¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
max_it: maximum number of iterations.
-
Linearized Alternative Direction Method of Moments (ADMM) Optimizer¶
Supported loss functions:
LsRegressionLoss,WeightedLsRegressionLossSupported penalty functions:
EnPenalty,AdaptiveEnPenalty
Linearized ADMM works for objective functions that can be written as and solves the problem
Especially if is “wide” (i.e., has more columns than rows), the proximal operator for this problem is usually much quicker to compute than for
.
More information on the properties of the algorithm can be found in [1].
The linearized ADMM algorithm requires a proper implementation of the proximal operator that can handle the given loss and penalty functions. A proximal operator needs to follow the following interface:
Proximal Operator¶
-
class
ProxOp¶ -
void
loss(const Lossfunction &loss)¶ Change the loss to loss.
-
arma::vec
operator()(const arma::vec &u, const arma::vec &prev, const double intercept, const double lambda, nsoptim::Metrics *metrics)¶ Get the value of the proximal operator of the function scaled by lambda, evaluated at u. The argument prev is the previous value returned by the proximal operator and intercept is the current value of the intercept or 0, if the loss does not use an intercept term.
-
double
ComputeIntercept(const arma::vec &fitted)¶ Compute the intercept term, given the fitted values.
-
double
StepSize(const PenaltyFunction &penalty, const double norm_x)¶ Get the step size required for the set loss and the given the penalty.
-
void
nsoptim::AdmmLinearConfiguration¶
nsoptim::GenericLinearizedAdmmOptimizer¶
-
template<typename
ProxOp, typenamePenaltyFunction, typenameCoefficients>
classGenericLinearizedAdmmOptimizer: public nsoptim::Optimizer<ProxOp::LossFunction, PenaltyFunction, Coefficients>¶ Compute the EN regression estimate using the alternating direction method of multiplier (ADMM) with linearization. This optimizer uses the given proximal operator class
ProxOp.Public Functions
-
GenericLinearizedAdmmOptimizer()¶ Initialize the ADMM optimizer without setting a loss or penalty function.
-
GenericLinearizedAdmmOptimizer(const LossFunction &loss, const PenaltyFunction &penalty)¶ Initialize the ADMM optimizer with the given loss and penalty functions.
- Parameters
loss: the loss function object.penalty: the penalty function object.
-
GenericLinearizedAdmmOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const ProximalOperator &prox, const AdmmLinearConfiguration &config = admm_optimizer::kDefaultLinConfig)¶ Initialize the ADMM optimizer with the given loss and penalty functions.
- Parameters
loss: the loss function object.penalty: the penalty function object.prox: proximal operator object.config: optional ADMM configuration object.
-
template<typename
T, typename ...Args, typename = typename std::enable_if<!IsConfiguration<T>::value && !IsLossFunction<T>::value, void>::type>GenericLinearizedAdmmOptimizer(T &&prox_arg_1, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
prox_arg_1: first argument to constructor of the proximal operator.prox_args...: further arguments to the constructor of the proximal operator.
-
template<typename
C, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>GenericLinearizedAdmmOptimizer(const C &config, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
config: ADMM configuration object.prox_args...: arguments to the constructor of the proximal operator.
-
template<typename
T1, typenameT2, typename ...Args, typename = typename std::enable_if<!IsConfiguration<T1>::value && !IsLossFunction<T1>::value, void>::type>GenericLinearizedAdmmOptimizer(T1 &&prox_arg_1, T2 &&prox_arg_2, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
prox_arg_1: first argument to constructor of the proximal operator.prox_arg_2: second argument to constructor of the proximal operator.prox_args...: further arguments to the constructor of the proximal operator.
-
template<typename
C, typenameT1, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>GenericLinearizedAdmmOptimizer(const C &config, T1 &&prox_arg_1, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
config: ADMM configuration object.prox_arg_1: first argument to constructor of the proximal operator.prox_args...: further arguments to the constructor of the proximal operator.
-
template<typename
C, typenameT1, typenameT2, typename ...Args, typename = typename std::enable_if<IsConfiguration<C>::value, void>::type>GenericLinearizedAdmmOptimizer(const C &config, T1 &&prox_arg_1, T2 &&prox_arg_2, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
config: ADMM configuration object.prox_arg_1: first argument to constructor of the proximal operator.prox_arg_2: second argument to constructor of the proximal operator.prox_args...: further arguments to the constructor of the proximal operator.
-
template<typename
L, typenameP, typenameT1, typename ...Args, typename = typename std::enable_if<IsLossFunction<L>::value && !IsConfiguration<T1>::value, void>::type>GenericLinearizedAdmmOptimizer(const L &loss, const P &penalty, T1 &&prox_arg_1, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
loss: the loss function object.penalty: the penalty function object.prox_arg_1: first argument to constructor of the proximal operator.prox_args...: further arguments to the constructor of the proximal operator.
-
template<typename
L, typenameP, typenameC, typename ...Args, typename = typename std::enable_if<IsLossFunction<L>::value && IsConfiguration<C>::value, void>::type>GenericLinearizedAdmmOptimizer(const L &loss, const P &penalty, const C &config, Args&&... prox_args)¶ Initialize the ADMM optimizer without setting a loss or penalty function.
- Parameters
loss: the loss function object.penalty: the penalty function object.config: ADMM configuration object.prox_args...: further arguments to the constructor of the proximal operator.
-
GenericLinearizedAdmmOptimizer(const GenericLinearizedAdmmOptimizer &other)¶ Default copy constructor.
The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the
loss()orpenalty()methods), the two optimizers will not share the new loss/penalty function.
-
GenericLinearizedAdmmOptimizer &
operator=(const GenericLinearizedAdmmOptimizer &other)¶ Default copy assignment.
The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the
loss()orpenalty()methods), the two optimizers will not share the new loss/penalty function.
-
GenericLinearizedAdmmOptimizer(GenericLinearizedAdmmOptimizer &&other)¶ Default move constructor.
-
GenericLinearizedAdmmOptimizer &
operator=(GenericLinearizedAdmmOptimizer &&other)¶ Default move assignment operator.
-
double
convergence_tolerance() const¶ Get the convergence tolerance for the ADMM algorithm.
- Return
convergence tolerance.
-
void
convergence_tolerance(double convergence_tolerance)¶ Set the convergence tolerance for the ADMM algorithm.
- Parameters
convergence_tolerance: convergene tolerance for the ADMM algorithm.
-
Optimum
Optimize()¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.
- Return
information about the optimum.
-
Optimum
Optimize(const Coefficients &start)¶ Find the minimum of the objective function, using the given coefficients as starting point.
- Return
information about the optimum.
- Parameters
start: where to start the optimization from.
-
Optimum
Optimize(const Coefficients &start, const int max_it)¶ Find the minimum of the objective function, using the given coefficients as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
start: where to start the optimization from.max_it: maximum number of iterations.
-
Optimum
Optimize(const int max_it)¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
max_it: maximum number of iterations.
-
Alternative Direction Method of Moments (ADMM) Optimizer with Variable Step-Size¶
Supported loss functions:
LsRegressionLoss,WeightedLsRegressionLossSupported penalty functions:
EnPenalty,AdaptiveEnPenalty
This implementation operates directly on the objective function , but adjusts the step size
according to [2].
nsoptim::AdmmVarStepConfiguration¶
-
struct
AdmmVarStepConfiguration¶ Configuration options for the variable-stepsize ADMM algorithm.
Public Members
-
int
max_it¶ Maximum number of iterations allowed.
-
double
tau¶ Initial step size. If negative (the default), use the square L_2 norm of
x.
-
double
tau_lower_mult¶ Lower bound for the step size, defined as a multiple of
tau.
-
double
tau_adjustment_lower¶ Lower bound of the step-size adjustment factor.
-
double
tau_adjustment_upper¶ Upper bound of the step-size adjustment factor.
-
int
nsoptim::AdmmVarStepOptimizer¶
-
template<typename
LossFunction, typenamePenaltyFunction, typenameCoefficients>
classAdmmVarStepOptimizer: public nsoptim::Optimizer<LossFunction, PenaltyFunction, Coefficients>¶ Compute the EN regression estimate using the alternating direction method of multiplier (ADMM) with variable step-size.
Public Functions
-
AdmmVarStepOptimizer(const AdmmVarStepConfiguration &config = admm_optimizer::kDefaultVarStepConfig)¶ Ininitialize the optimizer using the given (weighted) LS loss function and the Ridge penalty.
- Parameters
loss: a weighted LS loss function.penalty: the Ridge penalty.
-
AdmmVarStepOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const AdmmVarStepConfiguration &config = admm_optimizer::kDefaultVarStepConfig)¶ Ininitialize the optimizer using the given (weighted) LS loss function and the Ridge penalty.
- Parameters
loss: a weighted LS loss function.penalty: the Ridge penalty.
-
AdmmVarStepOptimizer(const AdmmVarStepOptimizer &other)¶ Default copy constructor.
The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the
loss()orpenalty()methods), the two optimizers will not share the new loss/penalty function.
-
AdmmVarStepOptimizer &
operator=(const AdmmVarStepOptimizer &other)¶ Default copy assignment.
The copied optimizer will share the identical loss and penalty functions after construction. In case the loss or penalty function are mutated in any way, the change will affect both optimizers. If the loss/penalty function is changed on one of the optimizers (using the
loss()orpenalty()methods), the two optimizers will not share the new loss/penalty function.
-
AdmmVarStepOptimizer(AdmmVarStepOptimizer &&other)¶ Default move constructor.
-
AdmmVarStepOptimizer &
operator=(AdmmVarStepOptimizer &&other)¶ Default move assignment operator.
-
double
convergence_tolerance() const¶ Get the convergence tolerance for the ADMM algorithm.
- Return
convergence tolerance.
-
void
convergence_tolerance(double convergence_tolerance)¶ Set the convergence tolerance for the ADMM algorithm.
- Parameters
convergence_tolerance: convergene tolerance for the ADMM algorithm.
-
Optimum
Optimize()¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.
- Return
information about the optimum.
-
Optimum
Optimize(const Coefficients &start)¶ Find the minimum of the objective function, using the given coefficients as starting point.
- Return
information about the optimum.
- Parameters
start: where to start the optimization from.
-
Optimum
Optimize(const Coefficients &start, const int max_it)¶ Find the minimum of the objective function, using the given coefficients as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
start: where to start the optimization from.max_it: maximum number of iterations.
-
Optimum
Optimize(const int max_it)¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
max_it: maximum number of iterations.
-
Dual Augmented Lagrangian (DAL)¶
The dual augmented lagrangian algorithm according to [3]. Supports only sparse coefficients.
nsoptim::DalEnConfiguration¶
-
struct
DalEnConfiguration¶ Configuration options for the DAL algorithm.
Public Members
-
int
max_it¶ Maximum number of iterations allowed.
-
int
max_inner_it¶ Maximum number of inner iterations allowed.
-
double
eta_start_numerator_conservative¶ Conservative setting for the numerator when computing the initial value of the proximity parameters.
-
double
eta_start_numerator_aggressive¶ Aggressive setting for the numerator when computing the initial value of the proximity parameters.
-
double
lambda_relchange_aggressive¶ Maximum relative change in
lambdathat allows the use of the aggressive numerator.
-
double
eta_multiplier¶ Multiplier to scale the proximity parameters at each outer iteration.
-
int
nsoptim::DalEnOptimizer¶
-
template<class
LossFunction, classPenaltyFunction>
classDalEnOptimizer: public nsoptim::Optimizer<LossFunction, PenaltyFunction, RegressionCoefficients<arma::sp_vec>>¶ Public Functions
-
DalEnOptimizer(const Configuration &config = _optim_dal_internal::kDefaultDalEnConfiguration)¶ Initialize the DAL optimizer for EN penalties without initializing the loss/penalty functions.
- Parameters
config: configuration for the algorithm.
-
DalEnOptimizer(const LossFunction &loss, const PenaltyFunction &penalty, const Configuration &config = _optim_dal_internal::kDefaultDalEnConfiguration)¶ Initialize the DAL optimizer for EN penalties.
- Parameters
loss: the loss function to optimize over.penalty: the penalty function to optimizer over.config: configuration for the algorithm.
-
DalEnOptimizer(const DalEnOptimizer &other)¶ Copy constructor.
This creates shallow copies of the loss and penalty functions and a deep copy of the hessian as well as possibly the weighted data.
-
void
Reset()¶ Reset the optimizier. This compeletely purges the current state.
-
LossFunction &
loss() const¶ Access the loss function.
- Return
the loss function currently in use by the optimizer.
-
void
loss(const LossFunction &loss)¶ Set the loss function.
- Parameters
loss: the new loss function to optimize over.
-
PenaltyFunction &
penalty() const¶ Access the penalty function.
- Return
the penalty function currently in use by the optimizer.
-
void
penalty(const PenaltyFunction &penalty)¶ Set the penalty function.
- Parameters
penalty: the new penalty function to optimize over.
-
double
convergence_tolerance() const¶ Get the convergence tolerance for the DAL algorithm.
- Return
convergence tolerance.
-
void
convergence_tolerance(double convergence_tolerance)¶ Set the convergence tolerance for the DAL algorithm.
- Parameters
convergence_tolerance: convergene tolerance for the MM algorithm.
-
Optimum
Optimize(const Coefficients &start)¶ Find the minimum of the objective function, using the given coefficients as starting point.
- Return
information about the optimum.
- Parameters
start: where to start the optimization from.
-
Optimum
Optimize(const Coefficients &start, const int max_it)¶ Find the minimum of the objective function, using the given coefficients as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
start: where to start the optimization from.max_it: maximum number of iterations.
-
Optimum
Optimize()¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point.
- Return
information about the optimum.
-
Optimum
Optimize(const int max_it)¶ Find the minimum of the objective function, using the previous solution (or the 0-vector if no previous solution exists) as starting point and at most
max_ititerations.- Return
information about the optimum.
- Parameters
max_it: maximum number of iterations.
-
References¶
[1] Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. (2010). Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Ojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457.
[2] Bartels, S. and Milicevic, M. (2017). Alternating direction method of multipliers with variable step sizes. arXiv:1704.06069.
[3] Tomioka, R., Suzuki, T., and Sugiyama, M. (2011). Super-linear convergence of dual augmented lagrangian algorithm for sparsity regularized estimation. Journal of Machine Learning Research, 12:1537–1586.