# Publications

## Journal articles

*Clason*,*Mazurenko*and*Valkonen*,*Acceleration and global convergence of a first-order primal–dual method for nonconvex problems*(2018).First-order primal–dual algorithms are the backbone for mathematical image processing and more general inverse problems that can be formulated as convex optimization problems. Recent advances have extended their applicability to areas previously dominated by second-order algorithms, such as nonconvex problems arising in optimal control. Nonetheless, the application of first-order primal–dual algorithms to nonconvex large-scale optimization still requires further investigation. In this paper, we analyze an extension of the primal–dual hybrid gradient method (PDHGM, also known as the Chambolle–Pock method) designed to solve problems with a nonlinear operator in the saddle term. Based on the idea of testing, we derive new step length parameter conditions for the convergence in infinite-dimensional Hilbert spaces and provide acceleration rules for suitably locally monotone problems. Importantly, we demonstrate linear convergence rates and prove global convergence in certain cases. We demonstrate the efficacy of these new step length rules on PDE-constrained optimization problems.Submitted*Zhu*and*Valkonen*,*Spherical function regularization for parallel MRI reconstruction*(2018).From the optimization point of view, a difficulty with parallel MRI with simultaneous coil sensitivity estimation is the multiplicative nature of the non-linear forward operator: the image being reconstructed and the coil sensitivities compete against each other, causing the optimization process to be very sensitive to small perturbations. This can, to some extent, be avoided by regularizing the unknown in a suitably “orthogonal” fashion. In this paper, we introduce such a regularization based on spherical function bases. To perform this regularization, we represent efficient recurrence formulas for spherical Bessel functions and associated Legendre functions. Numerically, we study the solution of the model with non-linear ADMM. We perform various numerical simulations to demonstrate the efficacy of the proposed model in parallel MRI reconstruction.Submitted*Valkonen*,*Preconditioned proximal point methods and notions of partial subregularity*(2017).Based on the needs of convergence proofs of preconditioned proximal point methods, we introduce notions of partial strong submonotonicity and partial (metric) subregularity of set-valued maps. We study relationships between these two concepts, neither of which is generally weaker or stronger than the other one. For our algorithmic purposes, the novel submonotonicity turns out to be easier to employ than more conventional error bounds obtained from subregularity. Using strong submonotonicity, we demonstrate the a posteriori linear convergence of the Primal–Dual Hybrid Gradient Method (Modified) to some strictly complementary solutions of example problems from image processing and data science. This is without the conventional assumption that all the objective functions of the involved saddle point problem are strongly convex.Submitted*Valkonen*,*Interior–proximal primal–dual methods*(2017).We study preconditioned proximal point methods for a class of saddle point problems, where the preconditioner decouples the overall proximal point method into an alternating primal–dual method. This is akin to the Chambolle–Pock method or the ADMM. In our work, we replace the squared distance in the dual step by a barrier function on a symmetric cone, while using a standard (Euclidean) proximal step for the primal variable. We show that under non-degeneracy and simple linear constraints, such a hybrid primal–dual algorithm can achieve linear convergence on originally strongly convex problems involving the second-order cone in their saddle point form. On general symmetric cones, we are only able to show an $O(1/N)$ rate. These results are based on estimates of strong convexity of the barrier function, extended with a penalty to the boundary of the symmetric cone.Submitted*Valkonen*,*Testing and non-linear preconditioning of the proximal point method*(2017).Employing the ideas of non-linear preconditioning and testing of the classical proximal point method, we formalise common arguments in convergence rate and convergence proofs of optimisation methods to the verification of a simple iteration-wise inequality. When applied to fixed point operators, the latter can be seen as a generalisation of firm non-expansivity or the $\alpha$-averaged property. The main purpose of this work is to provide the abstract background theory for our companion paper “Block-proximal methods with spatially adapted acceleration”. In the present account we demonstrate the effectiveness of the general approach on several classical algorithms, as well as their stochastic variants. Besides, of course, the proximal point method, these method include the gradient descent, forward–backward splitting, Douglas–Rachford splitting, Newton's method, as well as several methods for saddle-point problems, such as the Alternating Directions Method of Multipliers, and the Chambolle–Pock method.Submitted*Valkonen*,*Block-proximal methods with spatially adapted acceleration*(2017).We study and develop (stochastic) primal–dual block-coordinate descent methods for convex problems based on the method due to Chambolle and Pock. Our methods have known convergence rates for the iterates and the ergodic gap: $O(1/N^2)$ if each each block is strongly convex, $O(1/N)$ if no convexity is present, and more generally a mixed rate $O(1/N^2)+O(1/N)$ for strongly convex blocks, if only some blocks are strongly convex. Additional novelties of our methods include blockwise-adapted step lengths and acceleration, as well as the ability update both the primal and dual variables randomly in blocks under a very light compatibility condition. In other words, these variants of our methods are doubly-stochastic. We test the proposed methods on various image processing problems, where we employ pixelwise-adapted acceleration.Optimization (2017), Accepted*Clason*and*Valkonen*,*Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization*(2017).We study the extension of the Chambolle–Pock primal-dual algorithm to nonsmooth optimization problems involving nonlinear operators between function spaces. Local convergence is shown under technical conditions including metric regularity of the corresponding primal-dual optimality conditions. We also show convergence for a Nesterov-type accelerated variant provided one part of the functional is strongly convex. We show the applicability of the accelerated algorithm to examples of inverse problems with $L^1$ and $L^\infty$ fitting terms as well as of state-constrained optimal control problems, where convergence can be guaranteed after introducing an (arbitrary small, still nonsmooth) Moreau–Yosida regularization. This is verified in numerical examples.SIAM Journal on Optimization**27**(2017), 1313–1339, doi:10.1137/16M1080859*Benning*,*Schönlieb*,*Valkonen*and*Vlačić*,*Explorations on anisotropic regularisation of dynamic inverse problems by bilevel optimisation*(2016).We explore anisotropic regularisation methods for the application of video denoising in the spirit of Holler et al.. In order to explore the regularisation rather independent of the choice of the regularisation parameter, we propose a bilevel optimisation strategy to compute the optimal regularisation parameters of such a model based on ground truth data. The optimisation poses a challenge in itself, as the dependency on one of the regularisation parameters is non-linear such that the standard existence and convergence theory does not apply. Having obtained optimal regularisation parameters, we proceed to the main contribution of this work, which is a study of the anisotropic regularisation methods for three different types of video sequences. We conclude by discussing the corresponding numerical results and their impact on the actual modelling of dynamic inverse problems.submitted*Valkonen*and*Pock*,*Acceleration of the PDHGM on partially strongly convex functions*(2017).We propose several variants of the primal-dual method due to Chambolle and Pock. Without requiring full strong convexity of the objective functions, our methods are accelerated on subspaces with strong convexity. This yields mixed rates, $O(1/N^2)$ with respect to initialisation and $O(1/N)$ with respect to the dual sequence, and the residual part of the primal sequence. We demonstrate the efficacy of the proposed methods on image processing problems lacking strong convexity, such as total generalised variation denoising and total variation deblurring.Journal of Mathematical Imaging and Vision**59**(2017), 394–414, doi:10.1007/s10851-016-0692-2*Clason*and*Valkonen*,*Stability of saddle points via explicit coderivatives of pointwise subdifferentials*(2017).We derive stability criteria for saddle points of a class of nonsmooth optimization problems in Hilbert spaces arising in PDE-constrained optimization, using metric regularity of infinite-dimensional set-valued mappings. A main ingredient is an explicit pointwise characterization of the regular coderivative of the subdifferential of convex integral functionals. This is applied to several stability properties for parameter identification problems for an elliptic partial differential equation with non-differentiable data fitting terms.Set-valued and Variational Analysis**25**(2017), 69–112, doi:10.1007/s11228-016-0366-7*Gorokh*,*Korolev*and*Valkonen*,*Diffusion tensor imaging with deterministic error bounds*(2016).Errors in the data and the forward operator of an inverse problem can be handily modelled using partial order in Banach lattices. We present some existing results of the theory of regularisation in this novel framework, where errors are represented as bounds by means of the appropriate partial order. We apply the theory to diffusion tensor imaging (DTI), where correct noise modelling is challenging: it involves the Rician distribution and the nonlinear Stejskal-Tanner equation. Linearisation of the latter in the statistical framework would complicate the noise model even further. We avoid this using the error bounds approach, which preserves simple error structure under monotone transformations.Journal of Mathematical Imaging and Vision**56**(2016), 137–157, doi:10.1007/s10851-016-0639-7*De Los Reyes*,*Schönlieb*and*Valkonen*,*Bilevel parameter learning for higher-order total variation regularisation models*(2017).We consider a bilevel optimisation approach for parameter learning in higher-order total generalized variation (TGV) image reconstruction models. Apart of the classical PSNR fidelity measure, we propose and analyse an alternative cost functional, based on a Huber regularised TV-seminorm. Existence of Lagrange multipliers is proved and a quasi-Newton algorithm is proposed for the numerical solution of the bilevel problem. Exhaustive numerical experiments are carried out to show the suitability of our approach.Journal of Mathematical Imaging and Vision**57**(2017), 1–25, doi:10.1007/s10851-016-0662-8*De Los Reyes*,*Schönlieb*and*Valkonen*,*The structure of optimal parameters for image restoration problems*(2016).We study the qualitative properties of optimal regularisation parameters in variational models for image restoration. The parameters are solutions of bilevel optimisation problems with the image restoration problem as constraint. A general type of regulariser is considered, which encompasses total variation (TV), total generalized variation (TGV) and infimal-convolution total variation (ICTV). We prove that under certain conditions on the given data optimal parameters derived by bilevel optimisation problems exist. A crucial point in the existence proof turns out to be the boundedness of the optimal parameters away from $0$ which we prove in this paper. The analysis is done on the original – in image restoration typically non-smooth variational problem – as well as on a smoothed approximation set in Hilbert space which is the one considered in numerical computations. For the smoothed bilevel problem we also prove that it $\Gamma$ converges to the original problem as the smoothing vanishes. All analysis is done in function spaces rather than on the discretised learning problem.Journal of Mathematical Analysis and Applications**434**(2016), 464–500, doi:10.1016/j.jmaa.2015.09.023*Hintermüller*,*Valkonen*and*Wu*,*Limiting aspects of non-convex $\mbox{TV}^\varphi$ models*(2015).Recently, non-convex regularisation models have been introduced in order to provide a better prior for gradient distributions in real images. They are based on using concave energies $\phi$ in the total variation type functional $\mbox{TV}^\phi(u) := \int \phi(|\nabla u(x)|)\, d x$. In this paper, it is demonstrated that for typical choices of $\phi$, functionals of this type pose several difficulties when extended to the entire space of functions of bounded variation, $\mbox{BV}(\Omega)$. In particular, if $\phi(t)=t^q$ for $q \in (0, 1)$ and $\mbox{TV}^\phi$ is defined directly for piecewise constant functions and extended via weak* lower semicontinuous envelopes to $\mbox{BV}(\Omega)$, then still $\mbox{TV}^\phi(u)=\infty$ for $u$ not piecewise constant. If, on the other hand, $\mbox{TV}^\phi$ is defined analogously via continuously differentiable functions, then $\mbox{TV}^\phi \equiv 0$ (!). We study a way to remedy the models through additional multiscale regularisation and area strict convergence, provided that the energy $\phi(t)=t^q$ is linearised for high values. The fact, that this kind of energies actually better matches reality and improves reconstructions, is demonstrated by statistics and numerical experiments.SIAM Journal on Imaging Sciences**8**(2015), 2581–2621, doi:10.1137/141001457*Valkonen*,*The jump set under geometric regularisation. Part 2: Higher-order approaches*(2017).In Part 1, we developed a new technique based on Lipschitz pushforwards for proving the jump set containment property $\mathcal{H}^{m-1}(J_u \setminus J_f)=0$ of solutions $u$ to total variation denoising. We demonstrated that the technique also applies to Huber-regularised TV. Now, in this Part 2, we extend the technique to higher-order regularisers. We are not quite able to prove the property for total generalised variation (TGV) based on the symmetrised gradient for the second-order term. We show that the property holds under three conditions: First, the solution $u$ is locally bounded. Second, the second-order variable is of locally bounded variation, $w \in \mbox{BV}_{\text{loc}}(\Omega; R^m)$, instead of just bounded deformation, $w \in \mbox{BD}(\Omega)$. Third, $w$ does not jump on $J_u$ parallel to it. The second condition can be achieved for non-symmetric TGV. Both the second and third condition can be achieved if we change the Radon (or $L^1$) norm of the symmetrised gradient $Ew$ into an $L^p$ norm, $p>1$, in which case Korn's inequality holds. On the positive side, we verify the jump set containment property for second-order infimal convolution TV (ICTV) in dimension $m=2$. We also study the limiting behaviour of the singular part of $D u$, as the second parameter of $\mbox{TGV}^2$ goes to zero. Unsurprisingly, it vanishes, but in numerical discretisations the situation looks quite different. Finally, our work additionally includes a result on TGV-strict approximation in $\mbox{BV}(\Omega)$.Journal of Mathematical Analysis and Applications**453**(2017), 1044–1085, doi:10.1016/j.jmaa.2017.04.037*Valkonen*,*The jump set under geometric regularisation. Part 1: Basic technique and first-order denoising*(2015).Let $u \in \mbox{BV}(\Omega)$ solve the total variation denoising problem with $L^2$-squared fidelity and data $f$. Caselles et al. [Multiscale Model. Simul. 6 (2008), 879–894] have shown the containment $\mathcal{H}^{m-1}(J_u \setminus J_f)=0$ of the jump set $J_u$ of $u$ in that of $f$. Their proof unfortunately depends heavily on the co-area formula, as do many results in this area, and as such is not directly extensible to higher-order, curvature-based, and other advanced geometric regularisers, such as total generalised variation (TGV) and Euler's elastica. These have received increased attention in recent times due to their better practical regularisation properties compared to conventional total variation or wavelets. We prove analogous jump set containment properties for a general class of regularisers. We do this with novel Lipschitz transformation techniques, and do not require the co-area formula. In the present Part 1 we demonstrate the general technique on first-order regularisers, while in Part 2 we will extend it to higher-order regularisers. In particular, we concentrate in this part on TV and, as a novelty, Huber-regularised TV. We also demonstrate that the technique would apply to non-convex TV models as well as the Perona-Malik anisotropic diffusion, if these approaches were well-posed to begin with.SIAM Journal on Mathematical Analysis**47**(2015), 2587–2629, doi:10.1137/140976248*Lellmann*,*Lorenz*,*Schönlieb*and*Valkonen*,*Imaging with Kantorovich-Rubinstein discrepancy*(2014).We propose the use of the Kantorovich-Rubinstein norm from optimal transport in imaging problems. In particular, we discuss a variational regularisation model endowed with a Kantorovich-Rubinstein discrepancy term and total variation regularization in the context of image denoising and cartoon-texture decomposition. We point out connections of this approach to several other recently proposed methods such as total generalized variation and norms capturing oscillating patterns. We also show that the respective optimization problem can be turned into a convex-concave saddle point problem with simple constraints and hence, can be solved by standard tools. Numerical examples exhibit interesting features and favourable performance for denoising and cartoon-texture decomposition.SIAM Journal on Imaging Sciences**7**(2014), 2833–2859, doi:10.1137/140975528*Valkonen*,*A primal-dual hybrid gradient method for non-linear operators with applications to MRI*(2014).We study the solution of minimax problems $\min_x \max_y G(x) + \langle K(x), y \rangle - F^*(y)$ in finite-dimensional Hilbert spaces. The functionals $G$ and $F^*$ we assume to be convex, but the operator $K$ we allow to be non-linear. We formulate a natural extension of the modified primal-dual hybrid gradient method (PDHGM), originally for linear $K$, due to Chambolle and Pock. We prove the local convergence of the method, provided various technical conditions are satisfied. These include in particular the Aubin property of the inverse of a monotone operator at the solution. Of particular interest to us is the case arising from Tikhonov type regularisation of inverse problems with non-linear forward operators. Mainly we are interested in total variation and second-order total generalised variation priors. For such problems, we show that our general local convergence result holds when the noise level of the data $f$ is low, and the regularisation parameter $\alpha$ is correspondingly small. We verify the numerical performance of the method by applying it to problems from magnetic resonance imaging (MRI) in chemical engineering and medicine. The specific applications are in diffusion tensor imaging (DTI) and MR velocity imaging. These numerical studies show very promising performance.Inverse Problems**30**(2014), 055012, doi:10.1088/0266-5611/30/5/055012*Benning*,*Gladden*,*Holland*,*Schönlieb*and*Valkonen*,*Phase reconstruction from velocity-encoded MRI measurements – A survey of sparsity-promoting variational approaches*(2014).In recent years there has been significant developments in the reconstruction of magnetic resonance velocity images from sub-sampled $k$-space data. While showing a strong improvement in reconstruction quality compared to classical approaches, the vast number of different methods, and the challenges in setting them up, often leaves the user with the difficult task of choosing the correct approach, or more importantly, not selecting a poor approach. In this paper, we survey variational approaches for the reconstruction of phase-encoded magnetic resonance velocity images from sub-sampled $k$-space data. We are particularly interested in regularisers that correctly treat both smooth and geometric features of the image. These features are common to velocity imaging, where the flow field will be smooth but interfaces between the fluid and surrounding material will be sharp, but are challenging to represent sparsely. As an example we demonstrate the variational approaches on velocity imaging of water flowing through a packed bed of solid particles. We evaluate Wavelet regularisation against Total Variation and the relatively recent second order Total Generalised Variation regularisation. We combine these regularisation schemes with a contrast enhancement approach called Bregman iteration. We verify for a variety of sampling patterns that Morozov's discrepancy principle provides a good criterion for stopping the iterations, both in terms of PSNR and the structural similarity measure SSIM. Therefore, given only the noise level, we present a robust guideline for setting up a variational reconstruction scheme for MR velocity imaging.Journal of Magnetic Resonance**238**(2014), 26–43, doi:10.1016/j.jmr.2013.10.003*Valkonen*,*A method for weighted projections to the positive definite cone*(2015).We study the numerical solution of the problem $\min_{X \ge 0} \|B X-c\|_2$, where $X \in \mathcal{J}$ is a symmetric square matrix, and $B: \mathcal{J} \to \mathbb{R}^N$ is a linear operator, such that $M := B^*B$ is invertible. With $\rho$ the desired fractional duality gap, and $\kappa$ the condition number of $M$, we prove $O(\sqrt{m}\kappa^2\log\rho^{-1})$ iteration complexity for a simple primal-dual interior point method directly based on those for linear programs with semi-definite constraints. We do not, however, require the numerically expensive scalings inherent in these methods to force fast convergence. For low-dimensional problems ($m \le 10$), our numerical experiments indicate excellent performance and only a very slowly growing dependence of the convergence rate on $\kappa$. While our algorithm requires somewhat more iterations than existing interior point methods, the iterations are cheaper. This gives better computational times.Optimization**64**(2015), 2253–2275, doi:10.1080/02331934.2014.929680*Valkonen*,*Bredies*and*Knoll*,*Total generalised variation in diffusion tensor imaging*(2013).We study the extension of total variation (TV) and (second-order) total generalised variation ($\mbox{TGV}^2$) to tensor fields, and the application thereof to the regularisation of noisy diffusion tensor images from medical applications.SIAM Journal on Imaging Sciences**6**(2013), 487–525, doi:10.1137/120867172*Bredies*,*Kunisch*and*Valkonen*,*Properties of $L^1$-$\mbox{TGV}^2$: The one-dimensional case*(2013).We study properties of solutions to second order Total Generalized Variation (TGV2) regularized L1-fitting problems in dimension one. Special attention is paid to the analysis of the structure of the solutions, their regularity and the effect of the regularization parameters.Journal of Mathematical Analysis and Applications**398**(2013), 438–454, doi:10.1016/j.jmaa.2012.08.053*Valkonen*,*Strong polyhedral approximation of simple jump sets*(2012).We prove a strong approximation result for functions $u \in W^{1,\infty}(\Omega \setminus J)$, where $J$ is the union of finitely many Lipschitz graphs satisfying some further technical assumptions. We approximate $J$ by a polyhedral set in such a manner that a regularisation term $\eta(\mathop{Div} u^i)$, ($i=0,1,2,\ldots$), is convergent. The boundedness of this regularisation functional itself, introduced in [T. Valkonen: “Transport equation and image interpolation with SBD velocity fields”, (2011)] ensures the convergence in total variation of the jump part $\mathop{Div} u^i$ of the distributional divergence.Nonlinear Analysis: Theory, Methods, & Applications**75**(2012), 3641–3671, doi:10.1016/j.na.2012.01.022*Valkonen*,*Transport equation and image interpolation with SBD velocity fields*(2011).In this paper, we consider an extended formulation of the transport equation that remains meaningful with discontinuous velocity fields $b$, assuming that $(1, b)$ is a special function of bounded deformation (SBD). We study existence, uniqueness, and continuity/stability of the presented formulation. We then apply this study to the problem of fitting to available data a space-time image subject to the optical flow constraint. Moreover, in order to carry out these studies, we refine the SBD approximation theorem of Chambolle to show the convergence of traces.Journal de mathématiques pures et appliquées**95**(2011), 459–494, doi:10.1016/j.matpur.2010.10.010*Valkonen*,*Optimal transportation networks and stations*(2009).We study a model of optimal transportation networks that includes the cost of constructing stations. The model is expressed in terms of the Federer-Fleming language of currents.Interfaces and Free Boundaries**11**(2009), 569–597, doi:10.4171/IFB/223*Valkonen*,*Refined optimality conditions for differences of convex functions*(2010).We provide a necessary and sufficient condition for strict local minimisers of differences of convex (DC) functions, as well as related results pertaining to characterisation of (non-strict) local minimisers, and uniqueness of global minimisers. Additionally we derive related formulae for the sensitivity analysis of minimisers of DC functions subject to perturbations.Journal of Global Optimization**48**(2010), 311–321, doi:10.1007/s10898-009-9495-y*Valkonen*,*Extension of primal-dual interior point methods to diff-convex problems on symmetric cones*(2013).We consider the extension of primal dual interior point methods for linear programming on symmetric cones, to a wider class of problems that includes approximate necessary optimality conditions for functions expressible as the difference of two convex functions of a special form. Our analysis applies the Jordan-algebraic approach to symmetric cones, as well as graphical differentiation. As the basic method is local, we apply the idea of the filter method for a globalisation strategy.Optimization**62**(2013), doi:10.1080/02331934.2011.585465*Glowinski*,*Kärkkäinen*,*Valkonen*and*Ivannikov*,*Nonsmooth SOR for $L^1$-fitting: Convergence study and Discussion of Related Issues*(2008).In this article, the denoising of smooth (H1-regular) images is considered. To reach this objective, we introduce a simple and highly efficient over-relaxation technique for solving the convex, non-smooth optimization problems resulting from the denoising formulation. We describe the algorithm, discuss its convergence and present the results of numerical experiments, which validate the methods under consideration with respect to both efficiency and denoising capability. Several issues concerning the convergence of an Uzawa algorithm for the solution of the same problem are also discussed.Journal of Scientific Computing**37**(2008), 103–138, doi:10.1007/s10915-008-9229-1*Valkonen*and*Kärkkäinen*,*Continuous reformulations and heuristics for the Euclidean travelling salesperson problem*(2009).We consider continuous reformulations of the Euclidean travelling salesperson problem (TSP), based on certain clustering problem formulations. These reformulations allow us to apply a generalisation with perturbations of the Weiszfeld algorithm in an attempt to find local approximate solutions to the Euclidean TSP.ESAIM: Control, Optimization and Calculus of Variations**15**(2009), doi:10.1051/cocv:2008056*Valkonen*and*Kärkkäinen*,*Clustering and the perturbed spatial median*(2010).The diff-convex (DC) problem of perturbed spatial median and the Weiszfeld algorithm in a framework for incomplete data is studied, and some level set theorems for general DC problems are provided. These results are then applied to study certain multiobjective formulations of clustering problems, and to yield a new algorithm for solving the multisource Weber problem.Mathematical and Computer Modelling**52**(2010), 87–106, doi:10.1016/j.mcm.2010.01.018*Valkonen*,*Convergence of a SOR-Weiszfeld type algorithm for incomplete data sets*(2006).[ abstract ]We consider a generalisation of the Weiszfeld algorithm with successive over-relaxation for data sets where some data point fields may be missing, and prove its global convergence. A counterexample is also provided to the convergence of an alternative generalisation.Numerical Functional Analysis and Optimization**27**(2006), 931–952, doi:10.1080/01630560600791213, (Errata in vol. 29, no 9–10, 2008)

## Book chapters

*Valkonen*,*Optimising Big Images*(2016).[ abstract ]We take a look at big data challenges in image processing. Real-life photographs and other images, such ones from medical imaging modalities, consist of tens of million data points. Mathematically based models for their improvement – due to noise, camera shake, physical and technical limitations, etc. – are moreover often highly non-smooth and increasingly often non-convex. This creates significant optimisation challenges for the application of the models in quasi-real-time software packages, as opposed to more ad hoc approaches whose reliability is not as easily proven as that of mathematically based variational models. After introducing a general framework for mathematical image processing, we take a look at the current state-of-the-art in optimisation methods for solving such problems, and discuss future possibilities and challenges.Big Data Optimization: Recent Developments and Challenges, editors:*Emrouznejad*. Studies in Big Data, Springer-Verlag (2016), 97–131, doi:10.1007/978-3-319-30265-2_5

## Conference proceedings

*Benning*,*Knoll*,*Schönlieb*and*Valkonen*,*Preconditioned ADMM with nonlinear operator constraint*(2016).We are presenting a modification of the well-known Alternating Direction Method of Multipliers (ADMM) algorithm with additional preconditioning that aims at solving convex optimisation problems with nonlinear operator constraints. Connections to the recently developed Nonlinear Primal-Dual Hybrid Gradient Method (NL-PDHGM) are presented, and the algorithm is demonstrated to handle the nonlinear inverse problem of parallel Magnetic Resonance Imaging (MRI).System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, editors:*Bociu*,*Désidéri*and*Habbal*, Springer International Publishing, doi:10.1007/978-3-319-55795-3_10 (2016), 117–126*Calatroni*,*Cao*,*De Los Reyes*,*Schönlieb*and*Valkonen*,*Bilevel approaches for learning of variational imaging models*(2016).We review some recent learning approaches in variational imaging based on bilevel optimisation and emphasise the importance of their treatment in function space. The paper covers both analytical and numerical techniques. Analytically, we include results on the exis- tence and structure of minimisers, as well as optimality conditions for their characterisation. Based on this information, Newton type methods are studied for the solution of the problems at hand, combining them with sampling techniques in case of large databases. The computa- tional verification of the developed techniques is extensively documented, covering instances with different type of regularisers, several noise models, spatially dependent weights and large image databases.Variational Methods in Imaging and Geometric Control, editors:*Bergounioux*,*Peyré*,*Schnörr*,*Caillau*and*Haberkorn*. Radon Series on Computational and Applied Mathematics**18**(2016), 252–290, doi:10.1515/9783110430394-008*Papafitsoros*and*Valkonen*,*Asymptotic behaviour of total generalised variation*(2015).The recently introduced second order total generalised variation functional $\mbox{TGV}_{\beta,\alpha}$ has been a successful regulariser for image processing purposes. Its definition involves two positive parameters $\alpha$ and $\beta$ whose values determine the amount and the quality of the regularisation. In this paper we report on the behaviour of $\mbox{TGV}_{\beta,\alpha}$ in the cases where the parameters $\alpha, \beta$ as well as their ratio $\beta/\alpha$ becomes very large or very small. Among others, we prove that for sufficiently symmetric two dimensional data and large ratio $\beta/\alpha$, $\mbox{TGV}_{\beta,\alpha}$ regularisation coincides with total variation (TV) regularisation.Scale Space and Variational Methods in Computer Vision (SSVM 2015). Lecture Notes in Computer Science**9087**(2015), 702–714, doi:10.1007/978-3-319-18461-6_56*Knoll*,*Valkonen*,*Bredies*and*Stollberger*,*Higher order variational denoising for diffusion tensor imaging*(2013).It can be seen from the results that TGV based denoising of measurements with a single average leads to results with an SNR that is comparable to that of 9 averages. In practice, this leads to a significant reduction of scan-time. The price for the SNR gain is a slight reduction of sharp edges and loss of very fine features. This effect can be observed in both individually denoised DWIs and the reconstructed DTI maps, and is slightly reduced in the case of tensor based denoising. It should be noted that while our choice of the regularization parameter ensured that comparable values are used for the two different methods, the minimization of the error norm to the 9 averages data set cannot be considered absolutely optimal because the data sets were acquired sequentially and small movements of the volunteer occurred between the individual measurements.Proceedings of the International Society for Magnetic Resonance in Medicine, (ISMRM 2013 Magna Cum Laude Award) (2013)*Valkonen*and*Liebmann*,*GPU-accelerated regularisation of large diffusion tensor volumes*(2013).We discuss the benefits, difficulties, and performance of a GPU implementation of the Chambolle-Pock algorithm for TGV (total generalised variation) denoising of medical diffusion tensor images. Whereas we have previously studied the denoising of 2D slices of $2 \times 2$ and $3 \times 3$ tensors, attaining satisfactory performance on a normal CPU, here we concentrate on full 3D volumes of data, where each 3D voxel consists of a symmetric $3 \times 3$ tensor. One of the major computational bottle-necks in the Chambolle-Pock algorithm for these problems is that on each iteration at each voxel of the data set, a tensor potentially needs to be projected to the positive semi-definite cone. This in practise demands the QR algorithm, as explicit solutions are not numerically stable. For a $128 \times 128 \times 128$ data set, for example, the count is 2 megavoxels, which lends itself to massively parallel GPU implementation. Further performance enhancements are obtained by parallelising basic arithmetic operations and differentiation. Since we use the relatively recent OpenACC standard for the GPU implementation, the article includes a study and critique of its applicability.Computing**95**(2013), 771–784, doi:10.1007/s00607-012-0277-x, Special issue for ESCO 2012, Pilsen, Czech Republic*Valkonen*,*Knoll*and*Bredies*,*TGV for diffusion tensors: A comparison of fidelity functions*(2013).We study the total generalised variation regularisation of symmetric tensor fields from medical applications, namely diffusion tensor regularisation. We study the effect of the pointwise positivity constraint on the tensor field, as well as the difference between direct denoising of the tensor field first solved from the Stejskal-Tanner equation, as was done in our earlier work, and of incorporating this equation into the fidelity function. Our results indicate that the latter novel approach provides improved computational results.Journal of Inverse and Ill-Posed Problems**21**(2013), 355–377, doi:10.1515/jip-2013-0005, Special issue for IP:M&S 2012, Antalya, Turkey*Bredies*and*Valkonen*,*Inverse problems with second-order total generalized variation constraints*(2011).Total Generalized Variation (TGV) has recently been intro- duced as penalty functional for modelling images with edges as well as smooth variations. It can be interpreted as a “sparse” penalization of optimal balancing from the first up to the k- th distributional derivative and leads to desirable results when applied to image denoising, i.e., L2-fitting with TGV penalty. The present paper studies TGV of second order in the context of solving ill-posed linear inverse problems. Existence and sta- bility for solutions of Tikhonov-functional minimization with respect to the data is shown and applied to the problem of re- covering an image from blurred and noisy data.Proceedings of the 9th International Conference on Sampling Theory and Applications (SampTA) 2011, Singapore (2011)

## Thesis

*Valkonen*,*Diff-convex combinations of Euclidean distances: a search for optima*(2008).This work presents a study of optimisation problems involving differences of convex (diff-convex) functions of Euclidean distances. Results are provided in four themes: general theory of diff-convex functions, extensions of the Weiszfeld method, interior point methods, and applications to location problems.

Within the theme of general theory, new results on optimality conditions and sensitivity to perturbations of diff-convex functions are provided. Additionally, a characterisation of level-boundedness if provided, and the internal structure is studied for a class of diff-convex functions involving symmetric cones.

Based on this study, the Jordan-algebraic approach to interior point methods for linear programs on symmetric cones is extended. Local convergence of the method is proved, and a globalisation strategy is studied, based on the concept of the filter method.

The Weiszfeld method is extended to “perturbed spatial medians with incomplete data”, where the convex spatial median objective function with scaled Euclidean distances can be perturbed by a concave function. The convergence of the method is studied, along with application to location problems.

The location problems of interest include in particular clustering and the Euclidean travelling salesperson problem (TSP). The classical multisource Weber problem is studied, and a new clustering objective is presented, based on a multi-objective interpretation of the problem. It is then shown, that the Euclidean TSP can be presented as either of the above clustering objectives perturbed with a path length penalty. Some numerical results are provided, although the focus of this work is theoretical.

Jyväskylä Studies in Computing 99, University of Jyväskylä (2008), Ph.D Thesis

## Teaching material

*Valkonen*,*Optimisation for computer vision and data science*(2017).As we have already seen, modern approaches to image processing, machine learning, and various big data applications, almost invariably involve the solution of non-smooth optimisation problems. The main part of this course studies**two (and a half) tricks**to deal with the non-smoothness. These are: splitting methods and duality, as well as saddle point problems closely related to the latter. These tricks are the topics of the respective Chapters 3 and 4. Before this, we however start in Chapter 2 with the necessary basic convex analysis, including the convex subdifferential, and in Chapter 1 with introductory application problems. After this main part, in Chapter 5 we return to practical segmentation approaches based on the Mumford–Shah problem, and indeed introduce a further bag of tricks to deal non-convexity.Lecture Notes, Partially same material as “Set-valued analysis and optimisation”*Valkonen*,*Set-valued analysis and optimisation*(2015).Modern approaches to image processing, machine learning, and various big data applications, almost invariably involve the solution of non-smooth optimisation problems. Already at the start, in the characterisation of optimal solutions to these problems, and the development of numerical methods, we run into the most fundamental concept of set-valued analysis: the convex subdifferential, which is the topic of Chapter 2. We then develop some fundamental optimisation methods for convex problems, based the subdifferential and set-valued view in Chapter 3, with an eye to our image processing and data science example applications. For the understanding of the stability and sensitivity of solutions under perturbations of data and model parameters (Chapter 4), we need to delve further into the differentiation of general set-valued functions–-a fascinating concept faced with many challenges. In Chapter 5, we develop general set-valued differentiation and take a look at the central analytical results of this area.Lecture Notes*Valkonen*,*Measure and Image*(2013).Photographs and other natural images are usually not smooth maps, they contain edges (discontinuities) and other non-smooth geometric features that should be preserved by image enhancement techniques. The correct mathematical modelling of these features involves the space of functions of bounded variation and, in consequence, aspects of geometric measure theory. The aim of this course is to provide an introduction to functions of bounded variation and their applications in image processing.Lecture Notes

A feed of new preprints can be found on my arXiv page.