publications
Selected publications in reverse chronological order.
2026
- Learning geometry and topology via multi-chart flows. Hanlin Yu, Søren Hauberg, Marcelo Hartmann, and 2 more authors. In The 29th International Conference on Artificial Intelligence and Statistics, 2026
Real-world data often lie on low-dimensional Riemannian manifolds embedded in high-dimensional spaces. This motivates learning degenerate normalizing flows that map between the ambient space and a low-dimensional latent space. However, if the manifold has a non-trivial topology, it can never be correctly learned using a single flow. Instead, multiple flows must be ‘glued together’. In this paper, we first propose a general training scheme for learning such a collection of flows, and secondly we develop the first numerical algorithms for computing geodesics on such manifolds. Empirically, we demonstrate that this leads to highly significant improvements in topology estimation.
2025
- Stochastic Variance-Reduced Gaussian Variational Inference on the Bures-Wasserstein Manifold. Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, and 2 more authors. In The Thirteenth International Conference on Learning Representations, 2025
Optimization in the Bures-Wasserstein space has been gaining popularity in the machine learning community since it draws connections between variational inference and Wasserstein gradient flows. The variational inference objective function of Kullback–Leibler divergence can be written as the sum of the negative entropy and the potential energy, making forward-backward Euler the method of choice. Notably, the backward step admits a closed-form solution in this case, facilitating the practicality of the scheme. However, the forward step is not exact since the Bures-Wasserstein gradient of the potential energy involves "intractable" expectations. Recent approaches propose using the Monte Carlo method – in practice a single-sample estimator – to approximate these terms, resulting in high variance and poor performance. We propose a novel variance-reduced estimator based on the principle of control variates. We theoretically show that this estimator has a smaller variance than the Monte Carlo estimator in scenarios of interest. We also prove that variance reduction helps improve the optimization bounds of the current analysis. We demonstrate that the proposed estimator gains order-of-magnitude improvements over the previous Bures-Wasserstein methods.
- Geodesic Slice Sampler for Multimodal Distributions with Strong Curvature. Bernardo Williams, Hanlin Yu, Hoang Phuc Hau Luu, and 2 more authors. In The 41st Conference on Uncertainty in Artificial Intelligence, 2025
Traditional Markov Chain Monte Carlo sampling methods often struggle with sharp curvatures, intricate geometries, and multimodal distributions. Slice sampling can resolve local exploration inefficiency issues, and Riemannian geometries help with sharp curvatures. Recent extensions enable slice sampling on Riemannian manifolds, but they are restricted to cases where geodesics are available in a closed form. We propose a method that generalizes Hit-and-Run slice sampling to more general geometries tailored to the target distribution, by approximating geodesics as solutions to differential equations. Our approach enables the exploration of the regions with strong curvature and rapid transitions between modes in multimodal distributions. We demonstrate the advantages of the approach over challenging sampling problems.
- Conditional Noise-Contrastive Estimation of Energy-Based Models by Jumping between Modes. Hanlin Yu, Michael U. Gutmann, Arto Klami, and 1 more author. In EurIPS 2025 Workshop on Principles of Generative Modeling (PriGM), 2025
Learning Energy-Based Models (EBMs) is notoriously difficult when the data distribution is multi-modal. Standard methods such as Score Matching — even when amortized across many noisy versions of the data as in Energy-Based Diffusion Models — often fail to capture relative energies between modes because they rely solely on local energy differences. We address this limitation by also considering global energy differences. To do so, we use Conditional Noise-Contrastive Estimation (CNCE) which estimates energy differences between pairs of points drawn using a freely chosen noise distribution. We design this noise distribution to propose pairs of points from different modes, thus comparing the modes directly. We further obtain the asymptotic estimation error of CNCE, derive a theoretically optimal noise distribution, and provide a practical algorithm that combines local and global energy differences. Experiments show that this approach substantially improves estimation in multi-modal settings.
- Connecting Neural Models Latent Geometries with Relative Geodesic Representations. Hanlin Yu, Berfin Inal, Georgios Arvanitidis, and 3 more authors. In The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025
Neural models learn representations of high-dimensional data on low-dimensional manifolds. Multiple factors, including stochasticities in the training process, model architectures, and additional inductive biases, may induce different representations, even when learning the same task on the same data. However, it has recently been shown that when a latent structure is shared between distinct latent spaces, relative distances between representations can be preserved, up to distortions. Building on this idea, we demonstrate that, by exploiting the differential-geometric structure of latent spaces of neural models, it is possible to capture precisely the transformations between representational spaces trained on similar data distributions. Specifically, we assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the pullback metric that captures the intrinsic structure of the latent space, while scaling efficiently to large models. We experimentally validate our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models, across diverse architectures, datasets, pretraining schemes and modalities.
- Density Ratio Estimation with Conditional Probability Paths. Hanlin Yu, Arto Klami, Aapo Hyvarinen, and 2 more authors. In Forty-Second International Conference on Machine Learning, 2025
Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on samples from the two densities. However, existing methods for this problem remain computationally expensive and can yield inaccurate estimates. Inspired by recent advances in generative modeling, we introduce a novel framework for time score estimation, based on a conditioning variable. Choosing the conditioning variable judiciously enables a closed-form objective function. We demonstrate that, compared to previous approaches, our approach results in faster learning of the time score and competitive or better estimation accuracies of the density ratio on challenging tasks. Furthermore, we establish theoretical guarantees on the error of the estimated density ratio.
2024
- Non-Geodesically-Convex Optimization in the Wasserstein Space. Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, and 4 more authors. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024
We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. Specifically, the objective exhibits some difference-of-convex structure along these geodesics. The setting also encompasses sampling problems where the logarithm of the target distribution is difference-of-convex. We derive multiple convergence insights for a novel semi Forward-Backward Euler scheme under several nonconvex (and possibly nonsmooth) regimes. Notably, the semi Forward-Backward Euler is just a slight modification of the Forward-Backward Euler whose convergence is—to our knowledge—still unknown in our very general non-geodesically-convex setting.
- Geometric No-U-Turn Samplers: Concepts and Evaluation. Bernardo Williams, Hanlin Yu, Marcelo Hartmann, and 1 more author. In Proceedings of The 12th International Conference on Probabilistic Graphical Models, Sep 2024
We enhance geometric Markov Chain Monte Carlo methods, in particular making them easier to use by providing better tools for choosing the metric and various tuning parameters. We extend the No-U-Turn criterion for automatic choice of integration length for Lagrangian Monte Carlo and propose a modification to the computationally efficient Monge metric, as well as summarizing several previously proposed metric choices. Through extensive experimentation, including synthetic examples and posteriordb benchmarks, we demonstrate that Riemannian metrics can outperform Euclidean counterparts, particularly in scenarios with high curvature, while highlighting how the optimal choice of metric is problem-specific.
- Riemannian Laplace Approximation with the Fisher Metric. Hanlin Yu, Marcelo Hartmann, Bernardo Williams Moreno Sanchez, and 2 more authors. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, May 2024
Laplace’s method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry, providing a richer approximation family while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric; indeed, the metric adopted in previous work results in approximations that are overly narrow and also biased even in the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact in the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
2023
- Warped Geometric Information on the Optimisation of Euclidean Functions. Marcelo Hartmann, Bernardo Williams, Hanlin Yu, and 3 more authors. May 2023
We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use Riemannian geometry notions to redefine the optimisation problem of a function on the Euclidean space to a Riemannian manifold with a warped metric, and then find the function’s optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third order. In general, these approximations to the geodesic curve will not lie on the manifold; however, we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using a third-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in terms of number of iterations until convergence.
- Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics. Hanlin Yu, Marcelo Hartmann, Bernardo Williams, and 1 more author. Transactions on Machine Learning Research, May 2023
Stochastic-gradient sampling methods are often used to perform Bayesian inference on neural networks. It has been observed that methods incorporating notions of differential geometry tend to perform better, with the Riemannian metric improving posterior exploration by accounting for the local curvature. However, the existing methods often resort to simple diagonal metrics to remain computationally efficient, which loses some of the gains. We propose two non-diagonal metrics that can be used in stochastic-gradient samplers to improve convergence and exploration but have only a minor computational overhead over diagonal metrics. We show that for fully connected neural networks (NNs) with sparsity-inducing priors and convolutional NNs with correlated priors, using these metrics can provide improvements. For some other choices, the posterior is easy enough that the simpler metrics also suffice.