| SpineSegDiff: a diffusion-based dual-encoder framework for semantic segmentation of lumbar spine MRI into vertebral bodies (VB), intervertebral discs (IVD), and spinal canal (SC) from T1- and T2-weighted scans, with uncertainty heatmaps for clinical reliability assessment. |
| This study introduces SpineSegDiff, a diffusion-based framework for robust and accurate semantic segmentation of lumbar spine MRI scans from patients with low back pain (LBP) that works on both T1- and T2-weighted scans. We compared it against advanced models for segmenting vertebrae, intervertebral discs (IVDs), and the spinal canal on the SPIDER dataset. SpineSegDiff achieved segmentation performance comparable to the state-of-the-art non-diffusion nnU-Net, with particular improvements in the identification of degenerated IVDs. The uncertainty maps generated by our model provide valuable insights for clinical review, enhancing the robustness and reliability of the segmentation results. Our findings underscore the potential of diffusion models to enhance the diagnosis and management of LBP through more precise analysis of pathological spine MRI. |
SpineSegDiff Architecture
SpineSegDiff is a 2D diffusion-based model that directly infers segmentation masks for the central sagittal slice of lumbar spine MRI at 320×320 resolution, without the need for sliding-window inference. The architecture combines a U-shaped denoising network with a dedicated multi-scale image encoder (IE) that extracts rich features from the MRI input y. At each diffusion timestep, the MRI scan is concatenated with the partially noised mask xt and processed jointly through the dual-encoder to produce the denoised prediction. The model is trained with a composite loss combining MSE (reconstruction quality), Dice loss (boundary alignment), and Binary Cross-Entropy (probability calibration). |
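The composite loss described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the equal term weights, the soft Dice formulation, and the function name `composite_loss` are assumptions made for the example.

```python
import numpy as np

def composite_loss(pred, target, w_mse=1.0, w_dice=1.0, w_bce=1.0, eps=1e-6):
    """Weighted sum of MSE, soft Dice loss, and binary cross-entropy.

    pred:   predicted probabilities in [0, 1], shape (C, H, W)
    target: one-hot ground-truth mask, same shape
    The weights w_* are illustrative assumptions, not values from the paper.
    """
    # MSE term: pixel-wise reconstruction error.
    mse = np.mean((pred - target) ** 2)

    # Soft Dice term: computed per class, then averaged over classes.
    inter = np.sum(pred * target, axis=(1, 2))
    denom = np.sum(pred, axis=(1, 2)) + np.sum(target, axis=(1, 2))
    dice = 1.0 - np.mean((2.0 * inter + eps) / (denom + eps))

    # BCE term: clip probabilities so the logarithms stay finite.
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

    return w_mse * mse + w_dice * dice + w_bce * bce

# Toy example with three classes on a 4x4 grid.
target = np.zeros((3, 4, 4))
target[0, :2] = 1.0
target[1, 2:] = 1.0
print(composite_loss(target, target) < composite_loss(1.0 - target, target))
```

In practice the three weights would be tuned, since the terms live on different scales.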
| Fig. 1 — Architecture overview (top): The MRI scan y is concatenated with the partially noised mask and fed into the dual-encoder (image encoder IE + denoising UNet). Inference (bottom): S=15 intermediate samples are generated via DDIM; the final prediction is computed as a timestep-weighted sum over the last TS=10 steps, with weights scaled exponentially by time. Uncertainty heatmaps are derived as the maximum entropy across timesteps. |
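The inference-time fusion and uncertainty estimate in Fig. 1 can be illustrated with a short NumPy sketch. The caption does not give the exact weighting formula, so the form `exp(k / TS)` (later steps weighted more heavily) is an assumption, as are the function name `fuse_and_uncertainty` and the array layout.

```python
import numpy as np

def fuse_and_uncertainty(samples, ts=10, eps=1e-8):
    """Fuse the last `ts` sampling steps and derive an uncertainty map.

    samples: per-step class probabilities, shape (S, C, H, W),
             ordered from the first to the last sampling step.
    Returns (fused, uncertainty) with shapes (C, H, W) and (H, W).
    """
    last = samples[-ts:]                           # (ts, C, H, W)
    k = np.arange(1, last.shape[0] + 1)

    # Assumed exponential weighting by step index, normalised to sum to 1.
    weights = np.exp(k / last.shape[0])
    weights /= weights.sum()

    # Timestep-weighted sum over the retained steps.
    fused = np.tensordot(weights, last, axes=1)    # (C, H, W)

    # Per-step pixel-wise entropy over classes, then the maximum across steps.
    entropy = -np.sum(last * np.log(last + eps), axis=1)   # (ts, H, W)
    uncertainty = entropy.max(axis=0)                        # (H, W)
    return fused, uncertainty

# Toy example: S=15 steps, 4 classes, 8x8 pixels of random softmax outputs.
rng = np.random.default_rng(0)
logits = rng.standard_normal((15, 4, 8, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
fused, uncertainty = fuse_and_uncertainty(probs, ts=10)
labels = fused.argmax(axis=0)                      # final segmentation map
```

Because the weights sum to one, the fused output remains a valid per-pixel probability distribution over classes.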
Presegmentation Training Strategy
Training SpineSegDiff from pure Gaussian noise requires a large number of diffusion timesteps (T=1000). To overcome this, we adapt a presegmentation strategy: a pre-trained nnU-Net first produces an initial segmentation x̂pre, which is then partially noised using a cosine noise scheduler to obtain xT. SpineSegDiff is trained to recover the original mask x0 from this partially noised presegmentation, rather than from random noise. This strategy significantly reduces the number of diffusion steps needed: an ablation study showed that T=30 achieves comparable accuracy to T=1000, making inference substantially faster and more practical for clinical deployment. |
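The partial noising step can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes the widely used cosine schedule with offset s=0.008 and the standard forward-diffusion formula, and the step at which noising stops, the mask encoding, and the names `cosine_alpha_bar` and `partially_noise` are all assumptions.

```python
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t under a cosine schedule (assumed offset s)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

def partially_noise(x_pre, t, T, rng):
    """Apply forward diffusion up to step t to a presegmentation mask x_pre."""
    a = cosine_alpha_bar(t, T)
    noise = rng.standard_normal(x_pre.shape)
    # Standard closed form: scale the signal by sqrt(a), add noise scaled by sqrt(1 - a).
    return np.sqrt(a) * x_pre + np.sqrt(1.0 - a) * noise

# Toy example: a 4-class one-hot presegmentation on an 8x8 grid, T=30 steps.
rng = np.random.default_rng(0)
x_pre = np.eye(4)[rng.integers(0, 4, size=(8, 8))].transpose(2, 0, 1)
x_noised = partially_noise(x_pre, t=15, T=30, rng=rng)   # t=15 is an arbitrary choice
```

Stopping the forward process early leaves part of the presegmentation signal intact, which is consistent with the shortened reverse process described above.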
| Fig. 2. Presegmentation training pipeline: nnU-Net generates an initial mask x̂pre from the MRI input y, which is partially noised (cosine schedule) to produce xT. SpineSegDiff then learns to denoise back to the ground-truth mask x0 via a shortened diffusion process. |
| Visual comparison on challenging pathological cases (Pfirrmann grades 4–5, endplate irregularities, disc narrowing). Columns: Original MRI, Ground-truth label, SpineSegDiff, IISDM, Diff-UNet 2D, nnU-Net 2D, T1w Heatmap, T2w Heatmap. SpineSegDiff uncertainty heatmaps (darker red = higher uncertainty) effectively highlight anatomical ambiguity at boundaries of spinal structures. |
M. Monzon*, T. Iff*, E. Konukoglu, C. R. Jutzeler. Enhancing Low Back Pain Assessment with Diffusion Models for Lumbar Spine MRI Segmentation. Proceedings of Machine Learning Research, 2025. MIDL 2025 · * Contributed equally (arXiv) |
Acknowledgements |