Download on Hugging Face. The dataset is described in the paper "A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning".
FOMO60K
is a subset of FOMO300K that includes 60,529 MRI scans collected from 13,900 MRI sessions across 11,187 subjects, aggregated from 16 publicly available datasets. In contrast to FOMO300K, all scans in FOMO60K were affinely co-registered within each session to the image with the highest spatial resolution. Additionally, each scan was either skull-stripped or defaced (details provided below). Table 3 summarizes the source datasets, including the number of subjects, sessions, and scans, as well as the MRI sequence types, applied preprocessing steps, and dataset licenses.
The preprocessing pipeline for FOMO60K consisted of three main stages: (1) reorienting images to RAS orientation (as performed in FOMO300K), (2) affine co-registration, and (3) skull stripping. First, all scans were reoriented to RAS and affinely co-registered using the mri_coreg command from FreeSurfer 7.4.1, with default parameters. Within each MRI session, scans were aligned to the image with the highest spatial resolution in order to preserve maximal anatomical detail.
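The "align to the image with the highest spatial resolution" rule in step (2) can be sketched as picking the scan with the smallest voxel volume. This is a minimal illustration, not the FOMO code: the scan names and spacings below are made up, and in practice the spacings would come from each NIfTI header (e.g. nibabel's img.header.get_zooms()).

```python
def pick_reference(scans: dict[str, tuple[float, float, float]]) -> str:
    """Return the scan whose voxel volume (product of spacings, mm^3) is smallest,
    i.e. the highest-resolution scan, used as the within-session registration target."""
    def voxel_volume(spacing):
        sx, sy, sz = spacing
        return sx * sy * sz
    return min(scans, key=lambda name: voxel_volume(scans[name]))

# Illustrative session: a 1 mm isotropic T1w beats thick-slice T2w/FLAIR
session = {
    "sub-01_ses-01_T1w.nii.gz":   (1.0, 1.0, 1.0),
    "sub-01_ses-01_T2w.nii.gz":   (0.9, 0.9, 3.0),
    "sub-01_ses-01_FLAIR.nii.gz": (1.0, 1.0, 5.0),
}
print(pick_reference(session))  # -> sub-01_ses-01_T1w.nii.gz
```

The other scans in the session would then be registered to this reference, e.g. via FreeSurfer's mri_coreg as described above.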
FOMO300K
Some scans are skull-stripped, some are not.
After downloading and unzipping to the original file structure, the script below:
- Picks one scan per dataset (PT001, PT002, etc.)
- Loads and plots a middle slice
- Lets you visually inspect which datasets have skulls, artifacts, etc.
"""
FOMO300K Skull Strip Visual Inspector
--------------------------------------
Samples one scan per dataset (PTxxx_DatasetName) from mapping.tsv,
loads a middle axial slice, and plots a grid so you can visually
identify which datasets have skulls vs. are skull-stripped/defaced.
Usage:
python visualize_skull_check.py \
--fomo_root /path/to/FOMO300K \
--mapping /path/to/FOMO300K/mapping.tsv \
--out_dir ./skull_check_plots
# Optional: only plot specific PT datasets
python visualize_skull_check.py \
--fomo_root /path/to/FOMO300K \
--mapping /path/to/FOMO300K/mapping.tsv \
--out_dir ./skull_check_plots \
--datasets PT001 PT002 PT005
"""
import argparse
import os
import math
import random
from pathlib import Path
from collections import defaultdict
import numpy as np
import pandas as pd
import nibabel as nib
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# ── helpers ──────────────────────────────────────────────────────────────────
def load_middle_slice(nii_path: Path):
    """Return the middle axial slice of a 3-D NIfTI (or the first volume of a 4-D one)."""
    img = nib.load(str(nii_path))
    # Reorient to RAS so the axial plane is along the last axis
    img_ras = nib.as_closest_canonical(img)
    data = img_ras.get_fdata(dtype=np.float32)
    # Handle 4-D volumes (take the first volume)
    if data.ndim == 4:
        data = data[..., 0]
    mid = data.shape[2] // 2
    return data[:, :, mid]
def norm(slc: np.ndarray) -> np.ndarray:
    """Normalise a slice to [0, 1] for display, using the 2nd/98th percentiles
    of the positive voxels (background-robust)."""
    pos = slc[slc > 0]
    if pos.size == 0:
        return np.zeros_like(slc)
    p2, p98 = np.percentile(pos, [2, 98])
    rng = p98 - p2
    if rng == 0:
        return np.zeros_like(slc)
    return (np.clip(slc, p2, p98) - p2) / rng
# ── main ─────────────────────────────────────────────────────────────────────
def main():
    parser = argparse.ArgumentParser(description="Visual skull-strip checker for FOMO300K")
    parser.add_argument("--fomo_root", required=True, help="Root directory of FOMO300K")
    parser.add_argument("--mapping", required=True, help="Path to mapping.tsv")
    parser.add_argument("--out_dir", default="./skull_check_plots", help="Output directory for plots")
    parser.add_argument("--datasets", nargs="*", default=None,
                        help="Subset of dataset prefixes to inspect, e.g. PT001 PT002. Default: all.")
    parser.add_argument("--seed", type=int, default=42, help="Random seed for scan sampling")
    parser.add_argument("--cols", type=int, default=6, help="Number of columns in the output grid")
    args = parser.parse_args()

    random.seed(args.seed)
    fomo_root = Path(args.fomo_root)
    out_dir = Path(args.out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # ── load mapping ──────────────────────────────────────────────────────────
    print(f"Loading mapping from: {args.mapping}")
    df = pd.read_csv(args.mapping, sep="\t", dtype=str)
    # Normalise column names (strip whitespace)
    df.columns = df.columns.str.strip()
    required = {"dataset", "new_path"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"mapping.tsv is missing columns: {missing}. Found: {list(df.columns)}")
    # ── filter to requested datasets ──────────────────────────────────────────
    all_datasets = sorted(df["dataset"].unique())
    print(f"Found {len(all_datasets)} datasets in mapping.tsv")
    if args.datasets:
        # Allow matching by prefix (e.g. "PT001") or full name
        keep = []
        for ds in all_datasets:
            for pat in args.datasets:
                if ds == pat or ds.startswith(pat):
                    keep.append(ds)
                    break
        datasets = sorted(set(keep))
        print(f"Filtered to {len(datasets)} datasets: {datasets}")
    else:
        datasets = all_datasets
    if not datasets:
        raise ValueError("No datasets matched. Check --datasets argument.")
    # ── sample one scan per dataset ───────────────────────────────────────────
    samples = {}  # dataset_name -> (nii_path, new_path_str)
    for ds in datasets:
        sub = df[df["dataset"] == ds]
        # Prefer T1w / MPRAGE scans for clearest skull visibility
        t1_mask = sub["new_path"].str.contains("T1w|t1w|T1|mprage|MPRAGE", na=False)
        sub_t1 = sub[t1_mask]
        pool = sub_t1 if len(sub_t1) > 0 else sub
        row = pool.sample(1, random_state=args.seed).iloc[0]
        rel_path = row["new_path"]  # e.g. sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
        nii_path = fomo_root / ds / rel_path
        samples[ds] = (nii_path, rel_path)
    # ── load slices ───────────────────────────────────────────────────────────
    print(f"\nLoading {len(samples)} scans …")
    slices = {}    # ds -> np array or None
    statuses = {}  # ds -> "ok" | "missing" | "error: ..."
    for ds, (nii_path, rel) in samples.items():
        if not nii_path.exists():
            print(f"  [MISSING] {ds}: {nii_path}")
            statuses[ds] = "missing"
            slices[ds] = None
            continue
        try:
            slc = load_middle_slice(nii_path)
            slices[ds] = norm(slc)
            statuses[ds] = "ok"
            print(f"  [OK] {ds}: {nii_path.name} shape={slc.shape}")
        except Exception as e:
            print(f"  [ERROR] {ds}: {e}")
            statuses[ds] = f"error: {e}"
            slices[ds] = None
    # ── plot grid ─────────────────────────────────────────────────────────────
    n = len(datasets)
    ncols = args.cols
    nrows = math.ceil(n / ncols)
    fig = plt.figure(figsize=(ncols * 3.2, nrows * 3.5), facecolor="#0d0d0d")
    fig.suptitle(
        "FOMO300K · Middle Axial Slice per Dataset\n"
        "(Visual check: with skull = skull visible, defaced = face region blanked, skull-stripped = brain only)",
        color="white", fontsize=11, y=0.995, va="top",
    )
    gs = gridspec.GridSpec(nrows, ncols, figure=fig, hspace=0.45, wspace=0.15)
    for idx, ds in enumerate(datasets):
        ax = fig.add_subplot(gs[idx // ncols, idx % ncols])
        ax.set_facecolor("#0d0d0d")
        # Derive a short two-line label, e.g. "PT001\nClevelandCCF"
        parts = ds.split("_", 1)
        label = f"{parts[0]}\n{parts[1]}" if len(parts) == 2 else ds
        slc = slices[ds]
        if slc is not None:
            ax.imshow(np.rot90(slc), cmap="gray", vmin=0, vmax=1, aspect="equal",
                      interpolation="nearest")
            ax.set_title(label, color="white", fontsize=6.5, pad=2, wrap=True)
        else:
            ax.text(0.5, 0.5, statuses[ds], color="red", fontsize=6,
                    ha="center", va="center", transform=ax.transAxes, wrap=True)
            ax.set_title(label, color="#888", fontsize=6.5, pad=2)
        ax.axis("off")

    # Hide empty cells
    for idx in range(n, nrows * ncols):
        fig.add_subplot(gs[idx // ncols, idx % ncols]).set_visible(False)
    out_path = out_dir / "skull_check_all_datasets.png"
    fig.savefig(str(out_path), dpi=130, bbox_inches="tight",
                facecolor=fig.get_facecolor())
    plt.close(fig)
    print(f"\nSaved grid plot → {out_path}")

    # ── also save a per-scan summary TSV ──────────────────────────────────────
    rows = []
    for ds, (nii_path, rel) in samples.items():
        rows.append({
            "dataset": ds,
            "sampled_scan": rel,
            "full_path": str(nii_path),
            "status": statuses[ds],
        })
    summary_df = pd.DataFrame(rows)
    summary_path = out_dir / "sampled_scans.tsv"
    summary_df.to_csv(str(summary_path), sep="\t", index=False)
    print(f"Saved sample summary → {summary_path}")
    print("\nDone. Open the PNG to visually identify skull-stripped datasets.")
if __name__ == "__main__":
    main()

The following are skull-stripped:
PT009 BraTS-GEN
PT015 MSD_BrainTumor
PT023 Infant_Development_Brain
PT025 MGH_Wild
PT030 OpenNeuro/ds00022
PT030 OpenNeuro/ds001110
PT030 OpenNeuro/ds001235
PT030 OpenNeuro/ds001339
PT030 OpenNeuro/ds001534
PT030 OpenNeuro/ds001551
PT030 OpenNeuro/ds001832
PT030 OpenNeuro/ds001882
PT030 OpenNeuro/ds002011
PT030 OpenNeuro/ds002076
PT030 OpenNeuro/ds002672
PT030 OpenNeuro/ds002675
PT030 OpenNeuro/ds002748
PT030 OpenNeuro/ds002995
PT030 OpenNeuro/ds003007
PT030 OpenNeuro/ds003340
PT030 OpenNeuro/ds003367
PT030 OpenNeuro/ds003511
PT030 OpenNeuro/ds003716
PT030 OpenNeuro/ds003777
PT030 OpenNeuro/ds003835
PT030 OpenNeuro/ds003972
PT030 OpenNeuro/ds004054
PT030 OpenNeuro/ds004187
PT030 OpenNeuro/ds004286
PT030 OpenNeuro/ds004312
PT030 OpenNeuro/ds004553
PT030 OpenNeuro/ds004564
PT030 OpenNeuro/ds004648
PT030 OpenNeuro/ds004666
PT030 OpenNeuro/ds004692
PT030 OpenNeuro/ds004710
PT030 OpenNeuro/ds004993
PT030 OpenNeuro/ds006188

The following do not have full skulls:
PT026 MICA_MICs
PT030 OpenNeuro/ds000228
PT030 OpenNeuro/ds000229
PT030 OpenNeuro/ds001168
PT030 OpenNeuro/ds002606

PT007 ATAG contains files like sub-04_ses-01_run-1_T2starw.nii.gz that upon inspection look like this:

Check also, before including them in the training data:
PT030 OpenNeuro/ds001912
PT002 Nigerian_Clinical
PT030 OpenNeuro/ds002367
PT030 OpenNeuro/ds003466
PT030 OpenNeuro/ds003763
PT030 OpenNeuro/ds003798
PT030 OpenNeuro/ds003836
PT030 OpenNeuro/ds003949
PT030 OpenNeuro/ds003967
PT030 OpenNeuro/ds003990
PT030 OpenNeuro/ds004798
PT030 OpenNeuro/ds004889
PT030 OpenNeuro/ds005205
PT030 OpenNeuro/ds005075
PT030 OpenNeuro/ds005138
PT030 OpenNeuro/ds005576
Some of the heads are a bit rotated. For example, PT030 OpenNeuro/ds001984, PT030 OpenNeuro/ds002006, PT030 OpenNeuro/ds002155, PT030 OpenNeuro/ds002711, PT030 OpenNeuro/ds002715, …
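A rough way to flag rotated heads without eyeballing every scan is to measure the obliquity of the affine: the angle between each image axis and its closest world axis. This is an assumption-laden sketch, not part of the FOMO pipeline; it uses only the 3×3 direction part of the affine, which with nibabel would be img.affine[:3, :3] (passed here as nested lists).

```python
import math

def max_obliquity_deg(affine3x3) -> float:
    """Largest angle (degrees) between an image axis (column of the affine's
    3x3 direction part) and its closest world axis. 0.0 means axis-aligned."""
    worst = 0.0
    for col in range(3):
        v = [affine3x3[row][col] for row in range(3)]
        length = math.sqrt(sum(x * x for x in v))
        # cosine of the angle to the closest world axis = largest |component| / length
        cos_best = max(abs(x) for x in v) / length
        worst = max(worst, math.degrees(math.acos(min(1.0, cos_best))))
    return worst

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(max_obliquity_deg(identity))  # 0.0 for an axis-aligned scan
```

Any threshold applied to this (say, flagging scans above 10-20 degrees for review) would be an arbitrary choice, not a FOMO value.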