Optimization Module
Durbyn.Optimize provides a unified optimization API plus direct solver wrappers.
The module preserves a stable public API (optimize, nelder_mead, bfgs, lbfgsb, brent, option types, and result shapes), while each solver follows its own algorithmic implementation details.
Overview
| Algorithm | Function | Type | Typical Use |
|---|---|---|---|
| Nelder-Mead | nelder_mead | Derivative-free | No gradient available |
| BFGS | bfgs | Quasi-Newton | Smooth unconstrained problems |
| L-BFGS-B | lbfgsb | Bounded quasi-Newton | Box-constrained problems |
| Brent | brent | 1D derivative-free | Scalar bounded minimization |
| ITP | itp | 1D bracketed root finding | Robust zero finding with sign-change bracket |
Primary exports:
using Durbyn.Optimize
optimize, nelder_mead, bfgs, lbfgsb, brent, itp
NelderMeadOptions, BFGSOptions, LBFGSBOptions, BrentOptions, ITPOptions
OptimizeResult
numerical_hessian
scaler, descalerProvenance and Implementation Basis
This module is documented and maintained with an equation-first workflow:
- Define each algorithm from published mathematical formulations in cited references.
- Implement those formulations directly in Julia.
- Validate behavior with module tests and explicit solver-level checks.
Solver provenance in this module:
nelder_mead: implemented directly in this repository from the Nelder-Mead equations.bfgs: implemented directly in this repository from BFGS equations (inverse-Hessian rank-2 updates + strong Wolfe line search).lbfgsb: implemented directly in this repository from the L-BFGS-B equations (projected gradient, limited-memory updates, bound-feasible line search).itp: implemented directly in this repository from the ITP root-finding equations (interpolate-truncate-project).brent: implemented directly in this repository from Brent's bounded golden-section/parabolic interpolation equations.- This page includes the governing equations and references so the implementation basis is transparent.
Unified Interface: optimize
optimize(fn, x0, method=:nelder_mead; kwargs...)Supported methods
:nelder_mead:bfgs:lbfgsb:brent(requireslength(x0) == 1, finitelower, finiteupper):itp(requireslength(x0) == 1, finitelower, finiteupper, and a sign-change bracket)
Key keyword arguments
gradient=nothing: analytic gradient functiongradient(x) -> vectorlower=-Inf,upper=Inf: bounds (:lbfgsb,:brent,:itp)max_iterations: iteration/evaluation budgetparam_scale: parameter scaling vectorstep_sizes: finite-difference steps used for numerical Hessian estimationfn_scale: objective scaling (negative to maximize)trace: verbosity (0silent)report_interval: trace cadencereltol,abstol,gtolfactr,pgtol,memory_size(:lbfgsb)alpha,beta,gamma(:nelder_mead)hessian=false: include numerical Hessian at solution- Not supported for
method = :itp.
- Not supported for
Return value
optimize returns OptimizeResult with fields:
minimizer::Vector{Float64}minimum::Float64converged::Booliterations::Intf_calls::Intg_calls::Intmessage::String
If hessian=true, returns a named tuple with the same fields plus hessian.
Example
using Durbyn.Optimize
rosenbrock(x) = 100.0 * (x[2] - x[1]^2)^2 + (1.0 - x[1])^2
rosenbrock_grad(x) = [
-400.0 * x[1] * (x[2] - x[1]^2) - 2.0 * (1.0 - x[1]),
200.0 * (x[2] - x[1]^2),
]
r1 = optimize(rosenbrock, [-1.2, 1.0], :nelder_mead)
r2 = optimize(rosenbrock, [-1.2, 1.0], :bfgs; gradient=rosenbrock_grad)
r3 = optimize(rosenbrock, [0.5, 0.5], :lbfgsb; lower=[0.0, 0.0], upper=[2.0, 2.0])
r4 = optimize(x -> (x[1] - 2.0)^2, [0.0], :brent; lower=-10.0, upper=10.0)
r5 = optimize(x -> x[1]^2 - 2.0, [1.0], :itp; lower=0.0, upper=2.0)Direct Solver Wrappers
These are useful when you want solver-specific controls or named-tuple results.
nelder_mead
result = nelder_mead(f, x0, NelderMeadOptions(...))Returns:
(x_opt = ..., f_opt = ..., fncount = ..., fail = ...)fail == 0: convergedfail == 1: stopped before convergencefail == 10: shrink step failed to reduce simplex diameter (stagnation guard)
Key options:
abstol,reltol,alpha,beta,gammamaxit,trace,invalid_penalty- optional bound projection:
project_to_bounds=true,lower,upper
Implemented equations (Nelder & Mead, 1965):
\[\bar{P} = \frac{1}{n}\sum_{i \ne h} P_i,\qquad P^* = (1 + \alpha)\bar{P} - \alpha P_h\]
\[P^{**}_{\text{expand}} = \gamma P^* + (1-\gamma)\bar{P}\]
\[P^{**}_{\text{contract}} = \beta P_h + (1-\beta)\bar{P}\]
\[P_i \leftarrow \frac{P_i + P_l}{2}\quad\text{(shrink)}\]
Contraction acceptance follows:
\[y_{\text{contract}} \le \min(y_h, y^*)\]
Stopping criterion uses simplex function-value spread:
\[\sigma = \sqrt{\frac{1}{n}\sum_{i=0}^{n}(y_i - \bar{y})^2}\]
Default coefficients follow the original recommendations: alpha = 1, beta = 0.5, gamma = 2.
Initial simplex is built in axial form from x0: P_i = P_0 + \delta_i e_i for i=1,\dots,n.
bfgs
result = bfgs(f, g!, x0; options=BFGSOptions(...), mask=trues(length(x0)))g! mutates its first argument:
g!(grad, x) = (grad .= 2.0 .* x; nothing)You can also pass g! = nothing to use numerical gradients.
Returns:
(x_opt = ..., f_opt = ..., n_iter = ..., fail = ..., fn_evals = ..., gr_evals = ...)Mathematical basis (BFGS update):
\[p_k = -H_k \nabla f(x_k),\qquad x_{k+1} = x_k + \alpha_k p_k\]
\[s_k = x_{k+1} - x_k,\quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k),\quad \rho_k = \frac{1}{y_k^\top s_k}\]
\[H_{k+1} = (I - \rho_k s_k y_k^\top)H_k(I - \rho_k y_k s_k^\top) + \rho_k s_k s_k^\top\]
Strong Wolfe conditions used in line search:
\[f(x_k+\alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^\top p_k,\qquad c_1=10^{-4}\]
\[\left|\nabla f(x_k+\alpha p_k)^\top p_k\right| \le c_2 \left|\nabla f(x_k)^\top p_k\right|,\qquad c_2=0.9\]
Curvature acceptance for inverse-Hessian update:
\[y_k^\top s_k > 0\]
Secant equation
The BFGS update ensures the new approximation satisfies:
\[H_{k+1} y_k = s_k\]
This quasi-Newton condition guarantees that the approximate inverse Hessian models the most recent curvature information.
Initial inverse Hessian scaling
At iteration 0, $H_0 = I$. After the first step, the inverse Hessian is rescaled:
\[H_0 \leftarrow \frac{y_0^\top s_0}{y_0^\top y_0} \cdot I\]
This approximates the scale of the true inverse Hessian along the most recent direction, making subsequent step lengths closer to 1.
Convergence
The implementation checks gradient infinity-norm convergence:
\[\|\nabla f(x_k)\|_\infty \le \varepsilon_g\]
along with relative function decrease and step size stagnation tests. BFGS achieves superlinear convergence on smooth problems: $\|x_{k+1} - x^*\| / \|x_k - x^*\| \to 0$.
Implementation note:
- This module implements BFGS directly (search direction
p=-Hg, strong Wolfe bracketing/zoom, and in-place symmetric inverse-Hessian updates). - If
g! = nothing, gradients are computed by central finite differences. maskkeeps non-active coordinates fixed at their initial values.
lbfgsb
result = lbfgsb(f, g, x0; lower=..., upper=..., options=LBFGSBOptions(...))gcan benothing(internal bound-aware finite differences) or a gradient functiong(x) -> vector- Supports
mask(masked variables are fixed at initial values)
Returns:
(x_opt = ..., f_opt = ..., n_iter = ..., fail = ..., fn_evals = ..., gr_evals = ..., message = ...)fail conventions:
0: converged1: iteration limit reached52: abnormal/invalid state (for example infeasible bounds or non-finite values)
Mathematical basis (box-constrained optimization):
\[\min_{x \in \mathbb{R}^n} f(x)\quad\text{subject to}\quad l \le x \le u\]
Projected-gradient stationarity condition (used by L-BFGS-B style methods):
\[\|P_{[l,u]}(x - \nabla f(x)) - x\|_\infty \le \varepsilon\]
Limited-memory correction pairs:
\[s_k = x_{k+1} - x_k,\qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k)\]
\[\text{accept update if}\quad s_k^\top y_k > \epsilon_{\text{mach}}\|y_k\|^2\]
Two-loop recursion
The search direction is computed via the L-BFGS two-loop recursion using the $m$ most recent correction pairs $\{s_k, y_k\}$ and a scaling factor:
\[\theta = \frac{y_{k-1}^\top y_{k-1}}{y_{k-1}^\top s_{k-1}}\]
The implicit Hessian approximation has the compact form $B_k = \theta I - W_k M_k W_k^\top$ where $W_k$ and $M_k$ are constructed from the stored pairs. The two-loop recursion avoids forming this matrix explicitly, computing the direction in $O(mn)$ time where $m$ is the memory size and $n$ is the dimension.
Implementation note:
- This module implements the L-BFGS-B iteration directly (projected-gradient stop test, two-loop direction, bound-feasible line search, and limited-memory updates).
- For bounded
optimize(..., :lbfgsb)calls, a deterministic interior restart may be attempted to avoid poor boundary stationary points; the better solution is kept.
brent
result = brent(f, lower, upper; options=BrentOptions(...))Returns:
(x_opt = ..., f_opt = ..., n_iter = ..., fail = ..., fn_evals = ...)f is scalar style: f(x::Float64).
Mathematical basis (Brent):
\[x_{k+1} = \begin{cases} x_k + \frac{p_k}{q_k}, & \text{if parabolic step is acceptable}\\ x_k + c\,(b_k-a_k), & \text{otherwise (golden-section fallback)} \end{cases}\]
\[c = \frac{3-\sqrt{5}}{2}\]
Parabolic interpolation coefficients:
\[r = (x_{\text{best}} - x_{\text{second}})(f_{\text{best}} - f_{\text{prev}}), \qquad q = (x_{\text{best}} - x_{\text{prev}})(f_{\text{best}} - f_{\text{second}})\]
\[p = (x_{\text{best}} - x_{\text{prev}})q - (x_{\text{best}} - x_{\text{second}})r, \qquad q \leftarrow 2(q - r)\]
The parabolic step $p/q$ is accepted only when three conditions hold simultaneously:
- Damping: $|p| < \tfrac{1}{2}|q| \cdot |\text{old\_step}|$ — prevents oscillation by requiring the parabolic step to be less than half the previous step
- Lower feasibility: $p > q(a - x_{\text{best}})$
- Upper feasibility: $p < q(b - x_{\text{best}})$
Otherwise, a golden-section step is used as fallback.
Convergence properties:
- Exactly 1 function evaluation per iteration
- Worst-case: golden-section linear convergence with rate $c \approx 0.382$
- Best-case: superlinear convergence of order $\approx 1.324$ on smooth functions
Implementation note:
- This module standardizes tolerance handling, traces, non-finite guards, and return fields.
- The iteration engine follows Brent's Chapter 5 structure directly with
(x_best, x_second, x_prev)point tracking and guarded parabolic steps.
itp
result = itp(f, a, b; options=ITPOptions(...))Returns:
(x_root = ..., f_root = ..., n_iter = ..., fail = ..., fn_evals = ...)fail conventions:
0: converged (f(x_root) == 0or bracket width<= 2*tol)1: max iterations reached before tolerance
ITP iteration equations:
\[x_{1/2} = \frac{a+b}{2},\qquad x_f = \frac{b f(a) - a f(b)}{f(a)-f(b)}\]
\[\sigma = \operatorname{sign}(x_{1/2}-x_f),\qquad \delta = \kappa_1 |b-a|^{\kappa_2}\]
\[x_t = \begin{cases} x_f + \sigma\delta, & \delta \le |x_{1/2}-x_f|\\ x_{1/2}, & \text{otherwise} \end{cases}\]
\[r = \varepsilon\,2^{\,n_{1/2}+n_0-j} - \frac{b-a}{2},\qquad x_{\text{ITP}} = \begin{cases} x_t, & |x_t-x_{1/2}| \le r\\ x_{1/2} - \sigma r, & \text{otherwise} \end{cases}\]
Default hyperparameters:
\[\kappa_1 = \frac{0.2}{b_0 - a_0}, \qquad \kappa_2 = 2, \qquad n_0 = 1\]
Convergence guarantees:
- Worst-case: $n_{1/2} + n_0$ iterations (minmax optimal when $n_0 = 0$), where $n_{1/2} = \lceil\log_2((b_0 - a_0)/(2\varepsilon))\rceil$
- Order of convergence: with $\kappa_2 = 2$, achieves quadratic convergence on smooth functions (better than Brent's $\approx 1.324$)
- Average-case: strictly better than bisection under any continuous distribution on the root location
The truncation-projection strategy protects against catastrophic cancellation in the regula falsi interpolation step while maintaining the bracket guarantee of bisection.
Implementation note:
- This module implements ITP directly (interpolate-truncate-project) and preserves bracket guarantees.
Numerical Derivatives
Hessian
H = numerical_hessian(f, x; gradient=nothing)numerical_hessian uses Section 8.1 finite differences directly:
When gradient is provided (forward-difference Hessian columns, Eq. 8.7):
\[\nabla^2 f(x)e_j \approx \frac{\nabla f(x + \varepsilon_j e_j) - \nabla f(x)}{\varepsilon_j}\]
When gradient is not provided (second-order finite differences of f, Eq. 8.9):
\[\frac{\partial^2 f}{\partial x_i^2}(x) \approx \frac{f(x+\varepsilon_i e_i)-2f(x)+f(x-\varepsilon_i e_i)}{\varepsilon_i^2}\]
\[\frac{\partial^2 f}{\partial x_i \partial x_j}(x) \approx \frac{f(x+\varepsilon_i e_i+\varepsilon_j e_j)-f(x+\varepsilon_i e_i)-f(x+\varepsilon_j e_j)+f(x)} {\varepsilon_i\varepsilon_j}\]
Symmetry is enforced numerically by using 0.5 * (H + H') when the gradient-differencing path is used.
Step Size Selection
Optimal step sizes balance truncation error against roundoff error. For machine epsilon $u \approx 10^{-16}$:
| Method | Optimal $\varepsilon$ | Accuracy |
|---|---|---|
| Forward-difference gradient | $\varepsilon \approx \sqrt{u} \approx 10^{-8}$ | $O(\varepsilon)$ |
| Central-difference gradient | $\varepsilon \approx u^{1/3} \approx 6 \times 10^{-6}$ | $O(\varepsilon^2)$ |
| Function-only Hessian | $\varepsilon \approx u^{1/4} \approx 10^{-4}$ | $O(\varepsilon^2)$ |
The gradient-based Hessian (Eq. 8.7) requires $2n$ gradient evaluations. The function-only Hessian (Eq. 8.9) requires $O(n^2)$ function evaluations but needs no gradient.
Parameter Scaling Helpers
x_scaled = scaler(x, scale) # x ./ scale
x_orig = descaler(x_scaled, scale) # x_scaled .* scaleUse scaling when parameters are on very different magnitudes.
Practical Notes
- Exact function/gradient call counts may differ across methods because each algorithm uses method-specific line searches, simplex operations, or bound handling.
- The
optimizeAPI is the recommended entry point unless you explicitly need a direct solver tuple return.
References
- Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization.
- Lagarias, J. C. et al. (1998). Convergence properties of the Nelder-Mead simplex method in low dimensions.
- Gao, F. and Han, L. (2012). Implementing the Nelder-Mead simplex algorithm with adaptive parameters.
- Broyden, Fletcher, Goldfarb, and Shanno original quasi-Newton papers (1970).
- Byrd, Lu, Nocedal, and Zhu (1995). A limited memory algorithm for bound constrained optimization.
- Zhu, Byrd, Lu, and Nocedal (1997). Algorithm 778: L-BFGS-B.
- Brent, R. P. (1973/2002). Algorithms for Minimization Without Derivatives.
- Oliveira, I. F. D. & Takahashi, R. H. C. (2021). An Enhancement of the Bisection Method Average Performance Preserving Minmax Optimality. ACM TOMS, 47(1).
- Nocedal, J. and Wright, S. J. (2006). Numerical Optimization.