MSc Dissertation · Bournemouth NCCA · 2024–2025

SegSplat

3D instance segmentation for Gaussian splats. Trained on RGB-D scans, applied to photogrammetric captures with zero fine-tuning.

Results

Segmented Gaussian Splat Captures

Neither scene below was included in training. The models were trained on RGB-D sensor scans and applied to these photogrammetric Gaussian splat captures without any fine-tuning.

Scene Classification

A living room scan where each object is identified and assigned a distinct colour.

Extracted Chair

A single chair extracted from the scene and exported as an independent Gaussian splat with all original visual properties preserved.

About

What It Does

SegSplat labels every point in a Gaussian splat with a category and an instance ID, so "chair #2" can be extracted, edited, or replaced independently. The key result is cross-domain transfer: a frozen Sonata encoder from Meta is paired with a PointGroup decoder trained under heavy augmentation on clean RGB-D scans, and the same weights then segment noisy photogrammetric splats with zero fine-tuning.

Approach

How It Works

The architecture pairs Meta's Sonata encoder with a PointGroup decoder. Sonata is a self-supervised transformer pre-trained on 140,000 scenes across 32 GPUs. It stays frozen here, so only 13% of the model is trainable and training fits on a single A100 in Colab. PointGroup then predicts a category for each point and an offset toward the centre of the object it belongs to. Points of the same category whose offset-shifted positions land close together are clustered into one instance.
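The offset-and-cluster step can be illustrated with a small standard-library sketch. It is a simplification, not the project's code: the function name, the toy data, and the `radius` value are assumptions, and the real PointGroup also clusters on the unshifted coordinates and scores the resulting proposals.

```python
from collections import deque
from math import dist


def cluster_instances(points, offsets, labels, radius):
    """Group points into instances by clustering their offset-shifted positions.

    points, offsets: lists of (x, y, z) tuples; labels: per-point category ids.
    Returns a list of instance ids, one per point.
    """
    # Move every point toward its predicted object centre.
    shifted = [tuple(p + o for p, o in zip(pt, off))
               for pt, off in zip(points, offsets)]
    instance = [-1] * len(points)
    next_id = 0
    for seed in range(len(points)):
        if instance[seed] != -1:
            continue  # already claimed by an earlier instance
        instance[seed] = next_id
        queue = deque([seed])
        # Breadth-first growth: absorb same-category neighbours within radius.
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if (instance[j] == -1 and labels[j] == labels[i]
                        and dist(shifted[i], shifted[j]) <= radius):
                    instance[j] = next_id
                    queue.append(j)
        next_id += 1
    return instance


# Toy scene: four points of one category whose offsets pull them
# toward two different centres (hypothetical values).
points = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (2.0, 0.0, 0.0), (2.4, 0.0, 0.0)]
offsets = [(0.2, 0.0, 0.0), (-0.2, 0.0, 0.0), (0.2, 0.0, 0.0), (-0.2, 0.0, 0.0)]
labels = [0, 0, 0, 0]
instance_ids = cluster_instances(points, offsets, labels, radius=0.1)
```

The neighbour search here is quadratic in the number of points, which is fine for illustration. A full scene would need a spatial index or a GPU ball query.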

Key Discovery

Sonata's transformer features need a much wider clustering radius than PointGroup's original U-Net features, and heavy augmentation was what let the frozen encoder generalise from clean RGB-D scans to the noisy distributions of Gaussian splats.
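The page does not list which augmentations were used. As a rough illustration only, the sketch below applies four transforms that are common for point clouds: rotation about the vertical axis, uniform scaling, Gaussian jitter, and random point dropout. The function name and every parameter value are assumptions.

```python
import math
import random


def augment(points, rng, jitter_sigma=0.01, drop_prob=0.2, scale_range=(0.9, 1.1)):
    """Apply a random z-rotation, uniform scale, Gaussian jitter, and point dropout."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    scale = rng.uniform(*scale_range)
    out = []
    for x, y, z in points:
        if rng.random() < drop_prob:
            continue  # drop the point to mimic uneven coverage
        # Rotate about the vertical (z) axis, then scale uniformly.
        xr = (cos_a * x - sin_a * y) * scale
        yr = (sin_a * x + cos_a * y) * scale
        zr = z * scale
        # Per-coordinate Gaussian noise to mimic noisy reconstructions.
        out.append((xr + rng.gauss(0.0, jitter_sigma),
                    yr + rng.gauss(0.0, jitter_sigma),
                    zr + rng.gauss(0.0, jitter_sigma)))
    return out


rng = random.Random(0)  # fixed seed so the example is repeatable
cloud = [(float(i), float(i % 3), 0.5) for i in range(100)]
augmented = augment(cloud, rng)
```

In a training pipeline the per-point labels and offsets would have to be dropped and transformed together with the coordinates. That bookkeeping is left out here.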

Challenge

No Good Training Data Exists

No Gaussian splat dataset supports object-level instance segmentation, so SegSplat trains on ScanNet's clean RGB-D scans and transfers to the noisy, uneven distributions of photogrammetric captures. Freestanding objects segment cleanly, while objects that touch or overlap (chairs tucked under tables) are the hard cases.

RGB-D → Splat

Zero Fine-tuning

The same weights, trained on ScanNet RGB-D scans, transfer directly to photogrammetric Gaussian splats.

1 GPU

Google Colab A100

Frozen encoder reduces trainable parameters to 13% of the full model.

Non-destructive

Original Data Preserved

Labels are injected as new attributes. All visual data, including spherical harmonics and lighting, remains untouched.
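One way to picture this is a column-per-attribute view of the splat, where labelling adds two columns and extraction filters rows. This is a minimal sketch, not the project's exporter: the dictionary layout, the `category` and `instance_id` column names, and the toy values are assumptions.

```python
def inject_labels(splat, categories, instance_ids):
    """Return a copy of the splat with two extra per-Gaussian attribute columns."""
    n = len(next(iter(splat.values())))
    if len(categories) != n or len(instance_ids) != n:
        raise ValueError("label arrays must have one entry per Gaussian")
    labelled = dict(splat)  # existing columns are carried over unchanged
    labelled["category"] = list(categories)
    labelled["instance_id"] = list(instance_ids)
    return labelled


def extract_instance(splat, instance_id):
    """Keep only the Gaussians of one instance, with every attribute intact."""
    keep = [i for i, v in enumerate(splat["instance_id"]) if v == instance_id]
    return {name: [column[i] for i in keep] for name, column in splat.items()}


# Hypothetical three-Gaussian splat; real files carry many more columns.
splat = {
    "x": [0.0, 0.1, 2.0],
    "opacity": [0.9, 0.8, 0.7],
    "f_dc_0": [0.5, 0.4, 0.3],  # a spherical-harmonic colour coefficient
}
labelled = inject_labels(splat, categories=[4, 4, 7], instance_ids=[2, 2, 5])
chair = extract_instance(labelled, instance_id=2)
```

Because labelling only adds columns, the original attributes can be read back unchanged, and an extracted object keeps the same visual properties it had in the full scene.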

Built With

Sonata (Meta) · Point Transformer V3 · PointGroup · Pointcept · PyTorch · CUDA 11.8 · ScanNet v2 · Python 3.11 · Google Colab / A100

Key References

Sonata — Wu et al. Self-Supervised Learning of Reliable Point Representations. CVPR 2025

Point Transformer V3 — Wu et al. Simpler, Faster, Stronger. CVPR 2024

PointGroup — Jiang et al. Dual-Set Point Grouping for 3D Instance Segmentation. CVPR 2020

ScanNet — Dai et al. Richly-Annotated 3D Reconstructions of Indoor Scenes. CVPR 2017

Pointcept — A Codebase for Point Cloud Perception Research. 2023

GitHub