Enhancing Low Back Pain Assessment with Diffusion Models
for Lumbar Spine MRI Segmentation

MIDL 2025 · Proceedings of Machine Learning Research

Maria Monzon¹,²,*

Thomas Iff¹,*

Ender Konukoglu³

Catherine R. Jutzeler¹,²

* Contributed equally · ¹ Biomedical Data Science Lab, ETH Zürich · ² Swiss Institute of Bioinformatics (SIB) · ³ Computer Vision Lab, ETH Zürich


SpineSegDiff: a diffusion-based dual-encoder framework for semantic segmentation of lumbar spine MRI into vertebral bodies (VB), intervertebral discs (IVD), and spinal canal (SC) from T1- and T2-weighted scans, with uncertainty heatmaps for clinical reliability assessment.

Abstract

This study introduces SpineSegDiff, a diffusion-based framework for robust and accurate semantic segmentation of lumbar spine MRI scans from patients with low back pain (LBP), applicable to both T1- and T2-weighted scans. We compare it against state-of-the-art models for segmenting vertebrae, intervertebral discs (IVDs), and the spinal canal on the SPIDER dataset. SpineSegDiff achieves segmentation performance comparable to the state-of-the-art non-diffusion nnU-Net, with its clearest benefit in the identification of degenerated IVDs. The uncertainty maps generated by our model highlight regions that merit clinical review, which supports the robustness and reliability of the segmentation results. Our findings underscore the potential of diffusion models to improve the diagnosis and management of LBP through more precise analysis of pathological spine MRI.

Method

SpineSegDiff Architecture

SpineSegDiff is a 2D diffusion-based model that directly infers segmentation masks for the central sagittal slice of lumbar spine MRI at 320×320 resolution, without the need for sliding-window inference. The architecture combines a U-shaped denoising network with a dedicated multi-scale image encoder (IE) that extracts rich features from the MRI input y. At each diffusion timestep, the MRI scan is concatenated with the partially noised mask x_t and processed jointly through the dual-encoder to produce the denoised prediction. The model is trained with a composite loss combining MSE (reconstruction quality), Dice loss (boundary alignment), and Binary Cross-Entropy (probability calibration).
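The composite loss described above can be illustrated with a minimal sketch for a single binary mask. Everything below is an assumption made for illustration: the function name `composite_loss`, the equal default weights, the soft-Dice formulation, and the flat-list representation are not taken from the paper or its code, which builds on MONAI and would operate on tensors.

```python
import math

def composite_loss(probs, target, w_mse=1.0, w_dice=1.0, w_bce=1.0, eps=1e-6):
    """Sketch of an MSE + soft Dice + BCE loss for one binary mask.

    probs:  predicted foreground probabilities in [0, 1] (flat list)
    target: ground-truth labels, 0 or 1 (flat list of the same length)
    The weights w_* are hypothetical; the source does not state them.
    """
    n = len(probs)

    # Mean squared error between prediction and mask.
    mse = sum((p - t) ** 2 for p, t in zip(probs, target)) / n

    # Soft Dice loss: 1 minus the overlap ratio, smoothed by eps.
    intersection = sum(p * t for p, t in zip(probs, target))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(target) + eps)

    # Binary cross-entropy; clamp so log() never receives exactly 0 or 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clipped, target)
    ) / n

    return w_mse * mse + w_dice * dice + w_bce * bce
```

Under these assumptions, a prediction that matches the mask exactly gives a loss close to zero, and the loss grows as the probabilities drift toward 0.5 or toward the wrong class.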

Fig. 1 — Architecture overview (top): The MRI scan y is concatenated with the partially noised mask and fed into the dual-encoder (image encoder IE + denoising UNet). Inference (bottom): S=15 intermediate samples are generated via DDIM; the final prediction is computed as a timestep-weighted sum over the last T_S=10 steps, with weights scaled exponentially by time. Uncertainty heatmaps are derived as the maximum entropy across timesteps.
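The inference step in Fig. 1 can be sketched as follows, under stated assumptions. The exact form of the exponential weights is not given here, so the sketch uses weights proportional to exp(i / T_S) for the i-th of the last T_S steps. "Maximum entropy across timesteps" is read as the per-pixel maximum of the binary entropy over those steps. The function name `fuse_and_uncertainty` and the flat-list layout are hypothetical.

```python
import math

def fuse_and_uncertainty(step_probs, t_s=10):
    """Fuse the last t_s sampled predictions and derive per-pixel uncertainty.

    step_probs: list of S predictions ordered from earliest to latest step,
                each a flat list of foreground probabilities for one class.
    """
    last = step_probs[-t_s:]

    # Assumed weighting: exp(i / len(last)), so later steps contribute more.
    raw = [math.exp(i / len(last)) for i in range(1, len(last) + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]

    n_pix = len(last[0])
    fused = [
        sum(w * step[j] for w, step in zip(weights, last))
        for j in range(n_pix)
    ]

    def binary_entropy(p, eps=1e-8):
        p = min(max(p, eps), 1.0 - eps)
        return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

    # Uncertainty: the highest entropy seen at each pixel over the fused steps.
    uncertainty = [
        max(binary_entropy(step[j]) for step in last)
        for j in range(n_pix)
    ]
    return fused, uncertainty
```

With S=15 samples and `t_s=10`, the first five samples are ignored. A pixel whose probability stays near 0.5 in any of the remaining steps receives an uncertainty close to log 2.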

Presegmentation Training Strategy

Training SpineSegDiff from pure Gaussian noise requires a large number of diffusion timesteps (T=1000). To overcome this, we adapt a presegmentation strategy: a pre-trained nnU-Net first produces an initial segmentation x̂_pre, which is then partially noised using a cosine noise scheduler to obtain x_T. SpineSegDiff is trained to recover the original mask x₀ from this partially noised presegmentation, rather than from random noise. This strategy significantly reduces the number of diffusion steps needed — an ablation study showed that T=30 achieves comparable accuracy to T=1000, making inference substantially faster and more practical for clinical deployment.

Fig. 2 — Presegmentation training pipeline: nnU-Net generates an initial mask x_pre from the MRI input y, which is partially noised (cosine schedule) to produce x_T. SpineSegDiff then learns to denoise back to the ground-truth mask x₀ via a shortened diffusion process.
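The partial noising of the presegmentation can be sketched with the standard forward diffusion step, x_t = sqrt(ᾱ_t) · x_pre + sqrt(1 − ᾱ_t) · ε. The sketch below assumes the widely used cosine schedule with offset s = 0.008. The text above does not specify the schedule parameters or how the shortened process maps onto the schedule, so `t`, `T`, `s`, and both function names are illustrative only.

```python
import math
import random

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t of T under an assumed cosine schedule."""
    def f(u):
        return math.cos((u / T + s) / (1.0 + s) * math.pi / 2.0) ** 2
    return f(t) / f(0)

def noise_presegmentation(x_pre, t, T, rng=random):
    """Partially noise a presegmentation mask given as a flat list in [0, 1].

    Returns sqrt(a_bar) * x + sqrt(1 - a_bar) * eps with eps ~ N(0, 1),
    drawn independently for every pixel.
    """
    a_bar = cosine_alpha_bar(t, T)
    signal = math.sqrt(a_bar)
    noise = math.sqrt(1.0 - a_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x_pre]
```

In this sketch, choosing `t` well below `T` keeps most of the presegmentation signal, which reflects the idea of starting from a partially noised mask rather than pure noise. At `t = T` the signal term vanishes and the output is essentially Gaussian noise.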

Results

Visual comparison on challenging pathological cases (Pfirrmann grades 4–5, endplate irregularities, disc narrowing). Columns: original MRI, ground-truth label, SpineSegDiff, IISDM, Diff-UNet 2D, nnU-Net 2D, T1w heatmap, T2w heatmap. The SpineSegDiff uncertainty heatmaps (darker red = higher uncertainty) highlight anatomical ambiguity at the boundaries of spinal structures.

Paper and Supplementary Material

M. Monzon*, T. Iff*, E. Konukoglu, C. R. Jutzeler.
Enhancing Low Back Pain Assessment with Diffusion Models for Lumbar Spine MRI Segmentation.
Proceedings of Machine Learning Research, 2025.
MIDL 2025 · * Contributed equally
(arXiv)


Acknowledgements

This project was supported by grant #380 of the Strategic Focus Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain (Swiss Federal Institutes of Technology). We gratefully acknowledge the SPIDER dataset provided by Radboudumc, Jeroen Bosch Hospital, Rijnstate Hospital, and Sint Maartenskliniek. Our implementation builds upon MONAI and the Diff-UNet codebase.