Thu, 29 Aug 2024 00:00:00 -0500
Radon FB + sliding improvements
/*! Solver for the point source localisation problem with primal-dual proximal splitting. This corresponds to the manuscript * Valkonen T. - _Proximal methods for point source localisation_, [arXiv:2212.02991](https://arxiv.org/abs/2212.02991). The main routine is [`pointsource_pdps`]. It is based on specilisatinn of [`generic_pointsource_fb_reg`] through relevant [`FBSpecialisation`] implementations. Both norm-2-squared and norm-1 data terms are supported. That is, implemented are solvers for <div> $$ \min_{μ ∈ ℳ(Ω)}~ F_0(Aμ - b) + α \|μ\|_{ℳ(Ω)} + δ_{≥ 0}(μ), $$ for both $F_0(y)=\frac{1}{2}\|y\|_2^2$ and $F_0(y)=\|y\|_1$ with the forward operator $A \in 𝕃(ℳ(Ω); ℝ^n)$. </div> ## Approach <p> The problem above can be written as $$ \min_μ \max_y G(μ) + ⟨y, Aμ-b⟩ - F_0^*(μ), $$ where $G(μ) = α \|μ\|_{ℳ(Ω)} + δ_{≥ 0}(μ)$. The Fenchel–Rockafellar optimality conditions, employing the predual in $ℳ(Ω)$, are $$ 0 ∈ A_*y + ∂G(μ) \quad\text{and}\quad Aμ - b ∈ ∂ F_0^*(y). $$ The solution of the first part is as for forward-backward, treated in the manuscript. This is the task of <code>generic_pointsource_fb</code>, where we use <code>FBSpecialisation</code> to replace the specific residual $Aμ-b$ by $y$. For $F_0(y)=\frac{1}{2}\|y\|_2^2$ the second part reads $y = Aμ -b$. For $F_0(y)=\|y\|_1$ the second part reads $y ∈ ∂\|·\|_1(Aμ - b)$. </p> Based on zero initialisation for $μ$, we use the [`Subdifferentiable`] trait to make an initialisation corresponding to the second part of the optimality conditions. In the algorithm itself, standard proximal steps are taking with respect to $F\_0^* + ⟨b, ·⟩$. */ use numeric_literals::replace_float_literals; use serde::{Serialize, Deserialize}; use nalgebra::DVector; use clap::ValueEnum; use alg_tools::iterate::{ AlgIteratorFactory, AlgIteratorState, }; use alg_tools::loc::Loc; use alg_tools::euclidean::Euclidean; use alg_tools::linops::Apply; use alg_tools::norms::{ Linfinity, Projection, }; use alg_tools::bisection_tree::{ BTFN, PreBTFN, Bounds, BTNodeLookup, BTNode, BTSearch, SupportGenerator, LocalAnalysis, }; use alg_tools::mapping::RealMapping; use alg_tools::nalgebra_support::ToNalgebraRealField; use alg_tools::linops::AXPY; use crate::types::*; use crate::measures::DiscreteMeasure; use crate::measures::merging::SpikeMerging; use crate::forward_model::ForwardModel; use crate::seminorms::DiscreteMeasureOp; use crate::plot::{ SeqPlotter, Plotting, PlotLookup }; use crate::fb::{ FBGenericConfig, insert_and_reweigh, postprocess, prune_and_maybe_simple_merge }; use crate::regularisation::RegTerm; use crate::dataterm::{ DataTerm, L2Squared, L1 }; /// Acceleration #[derive(Clone, Copy, Eq, PartialEq, Serialize, Deserialize, ValueEnum, Debug)] pub enum Acceleration { /// No acceleration #[clap(name = "none")] None, /// Partial acceleration, $ω = 1/\sqrt{1+σ}$ #[clap(name = "partial", help = "Partial acceleration, ω = 1/√(1+σ)")] Partial, /// Full acceleration, $ω = 1/\sqrt{1+2σ}$; no gap convergence guaranteed #[clap(name = "full", help = "Full acceleration, ω = 1/√(1+2σ); no gap convergence guaranteed")] Full } /// Settings for [`pointsource_pdps`]. #[derive(Clone, Copy, Eq, PartialEq, Serialize, Deserialize, Debug)] #[serde(default)] pub struct PDPSConfig<F : Float> { /// Primal step length scaling. We must have `τ0 * σ0 < 1`. pub τ0 : F, /// Dual step length scaling. We must have `τ0 * σ0 < 1`. pub σ0 : F, /// Accelerate if available pub acceleration : Acceleration, /// Generic parameters pub generic : FBGenericConfig<F>, } #[replace_float_literals(F::cast_from(literal))] impl<F : Float> Default for PDPSConfig<F> { fn default() -> Self { let τ0 = 0.5; PDPSConfig { τ0, σ0 : 0.99/τ0, acceleration : Acceleration::Partial, generic : Default::default(), } } } /// Trait for data terms for the PDPS #[replace_float_literals(F::cast_from(literal))] pub trait PDPSDataTerm<F : Float, V, const N : usize> : DataTerm<F, V, N> { /// Calculate some subdifferential at `x` for the conjugate fn some_subdifferential(&self, x : V) -> V; /// Factor of strong convexity of the conjugate #[inline] fn factor_of_strong_convexity(&self) -> F { 0.0 } /// Perform dual update fn dual_update(&self, _y : &mut V, _y_prev : &V, _σ : F); } #[replace_float_literals(F::cast_from(literal))] impl<F : Float, V : Euclidean<F> + AXPY<F>, const N : usize> PDPSDataTerm<F, V, N> for L2Squared { fn some_subdifferential(&self, x : V) -> V { x } fn factor_of_strong_convexity(&self) -> F { 1.0 } #[inline] fn dual_update(&self, y : &mut V, y_prev : &V, σ : F) { y.axpy(1.0 / (1.0 + σ), &y_prev, σ / (1.0 + σ)); } } #[replace_float_literals(F::cast_from(literal))] impl<F : Float + nalgebra::RealField, const N : usize> PDPSDataTerm<F, DVector<F>, N> for L1 { fn some_subdifferential(&self, mut x : DVector<F>) -> DVector<F> { // nalgebra sucks for providing second copies of the same stuff that's elsewhere as well. x.iter_mut() .for_each(|v| if *v != F::ZERO { *v = *v/<F as NumTraitsFloat>::abs(*v) }); x } #[inline] fn dual_update(&self, y : &mut DVector<F>, y_prev : &DVector<F>, σ : F) { y.axpy(1.0, y_prev, σ); y.proj_ball_mut(1.0, Linfinity); } } /// Iteratively solve the pointsource localisation problem using primal-dual proximal splitting. /// /// The `dataterm` should be either [`L1`] for norm-1 data term or [`L2Squared`] for norm-2-squared. /// The settings in `config` have their [respective documentation](PDPSConfig). `opA` is the /// forward operator $A$, $b$ the observable, and $\lambda$ the regularisation weight. /// The operator `op𝒟` is used for forming the proximal term. Typically it is a convolution /// operator. Finally, the `iterator` is an outer loop verbosity and iteration count control /// as documented in [`alg_tools::iterate`]. /// /// For the mathematical formulation, see the [module level](self) documentation and the manuscript. /// /// Returns the final iterate. #[replace_float_literals(F::cast_from(literal))] pub fn pointsource_pdps_reg<'a, F, I, A, GA, 𝒟, BTA, G𝒟, S, K, D, Reg, const N : usize>( opA : &'a A, b : &'a A::Observable, reg : Reg, op𝒟 : &'a 𝒟, pdpsconfig : &PDPSConfig<F>, iterator : I, mut plotter : SeqPlotter<F, N>, dataterm : D, ) -> DiscreteMeasure<Loc<F, N>, F> where F : Float + ToNalgebraRealField, I : AlgIteratorFactory<IterInfo<F, N>>, for<'b> &'b A::Observable : std::ops::Neg<Output=A::Observable> + std::ops::Add<A::Observable, Output=A::Observable>, //+ std::ops::Mul<F, Output=A::Observable>, // <-- FIXME: compiler overflow A::Observable : std::ops::MulAssign<F>, GA : SupportGenerator<F, N, SupportType = S, Id = usize> + Clone, A : ForwardModel<Loc<F, N>, F, PreadjointCodomain = BTFN<F, GA, BTA, N>> + Lipschitz<&'a 𝒟, FloatType=F>, BTA : BTSearch<F, N, Data=usize, Agg=Bounds<F>>, G𝒟 : SupportGenerator<F, N, SupportType = K, Id = usize> + Clone, 𝒟 : DiscreteMeasureOp<Loc<F, N>, F, PreCodomain = PreBTFN<F, G𝒟, N>>, 𝒟::Codomain : RealMapping<F, N>, S: RealMapping<F, N> + LocalAnalysis<F, Bounds<F>, N>, K: RealMapping<F, N> + LocalAnalysis<F, Bounds<F>, N>, BTNodeLookup: BTNode<F, usize, Bounds<F>, N>, PlotLookup : Plotting<N>, DiscreteMeasure<Loc<F, N>, F> : SpikeMerging<F>, D : PDPSDataTerm<F, A::Observable, N>, Reg : RegTerm<F, N> { // Set up parameters let config = &pdpsconfig.generic; let op𝒟norm = op𝒟.opnorm_bound(); let l = opA.lipschitz_factor(&op𝒟).unwrap().sqrt(); let mut τ = pdpsconfig.τ0 / l; let mut σ = pdpsconfig.σ0 / l; let γ = dataterm.factor_of_strong_convexity(); // We multiply tolerance by τ for FB since our subproblems depending on tolerances are scaled // by τ compared to the conditional gradient approach. let tolerance = config.tolerance * τ * reg.tolerance_scaling(); let mut ε = tolerance.initial(); // Initialise iterates let mut μ = DiscreteMeasure::new(); let mut y = dataterm.some_subdifferential(-b); let mut y_prev = y.clone(); let mut stats = IterInfo::new(); // Run the algorithm iterator.iterate(|state| { // Calculate smooth part of surrogate model. // Using `std::mem::replace` here is not ideal, and expects that `empty_observable` // has no significant overhead. For some reosn Rust doesn't allow us simply moving // the residual and replacing it below before the end of this closure. y *= -τ; let r = std::mem::replace(&mut y, opA.empty_observable()); let minus_τv = opA.preadjoint().apply(r); // Save current base point let μ_base = μ.clone(); // Insert and reweigh let (d, within_tolerances) = insert_and_reweigh( &mut μ, &minus_τv, &μ_base, None, op𝒟, op𝒟norm, τ, ε, config, ®, state, &mut stats ); // Prune and possibly merge spikes prune_and_maybe_simple_merge( &mut μ, &minus_τv, &μ_base, op𝒟, τ, ε, config, ®, state, &mut stats ); // Update step length parameters let ω = match pdpsconfig.acceleration { Acceleration::None => 1.0, Acceleration::Partial => { let ω = 1.0 / (1.0 + γ * σ).sqrt(); σ = σ * ω; τ = τ / ω; ω }, Acceleration::Full => { let ω = 1.0 / (1.0 + 2.0 * γ * σ).sqrt(); σ = σ * ω; τ = τ / ω; ω }, }; // Do dual update y = b.clone(); // y = b opA.gemv(&mut y, 1.0 + ω, &μ, -1.0); // y = A[(1+ω)μ^{k+1}]-b opA.gemv(&mut y, -ω, &μ_base, 1.0); // y = A[(1+ω)μ^{k+1} - ω μ^k]-b dataterm.dual_update(&mut y, &y_prev, σ); y_prev.copy_from(&y); // Update main tolerance for next iteration let ε_prev = ε; ε = tolerance.update(ε, state.iteration()); stats.this_iters += 1; // Give function value if needed state.if_verbose(|| { // Plot if so requested plotter.plot_spikes( format!("iter {} end; {}", state.iteration(), within_tolerances), &d, "start".to_string(), Some(&minus_τv), reg.target_bounds(τ, ε_prev), &μ, ); // Calculate mean inner iterations and reset relevant counters. // Return the statistics let res = IterInfo { value : dataterm.calculate_fit_op(&μ, opA, b) + reg.apply(&μ), n_spikes : μ.len(), ε : ε_prev, postprocessing: config.postprocessing.then(|| μ.clone()), .. stats }; stats = IterInfo::new(); res }) }); postprocess(μ, config, dataterm, opA, b) }