Convert data from SpatialData into InSituPy#
Setup and Imports#
# Enable autoreload for development
%load_ext autoreload
%autoreload 2
Make sure SpatialData is installed#
If it is not installed yet, install it with:
pip install "spatialdata[extra]"
Make sure the installed version is 0.7.2 or later. For more information on installing SpatialData, see the SpatialData installation documentation.
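A quick way to confirm the version requirement is to compare the installed version against the minimum. The sketch below uses only the standard library. `parse_version` and `meets_minimum` are illustrative helpers, not part of SpatialData or InSituPy, and the simple parser assumes plain numeric versions such as `0.7.2`; any pre-release or development suffix is ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "0.7.2"  # minimum SpatialData version stated in this tutorial


def parse_version(text):
    """Turn '0.7.2' into (0, 7, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("spatialdata")
except PackageNotFoundError:
    print("spatialdata is not installed")
else:
    print(f"spatialdata {installed}, meets minimum: {meets_minimum(installed)}")
```

If the `packaging` library is available, its `Version` class is a more robust choice for the comparison than this hand-written parser.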
from pathlib import Path
from insitupy import InSituData, CACHE
from insitupy.spatialdata import convert_from_spatialdata
Load SpatialData#
First, let’s load a SpatialData object. We’ll use the spatialdata_io package to read data from common spatial transcriptomics platforms like Xenium or MERSCOPE.
In this tutorial, we’ll use a Xenium dataset as an example. If you have not downloaded the demo datasets yet, check out the demo dataset tutorial to learn how to do so.
from spatialdata_io import xenium
# Path to your Xenium output folder
datapath = CACHE / "demo_datasets/xenium_hbreastcancer/output-XETG00000__slide_id__hbreastcancer"
# Load the Xenium data as a SpatialData object
sdata = xenium(datapath)
WARNING The `feature_key` column feature_name is categorical with unknown categories. Please ensure the categories
are known before calling `PointsModel.parse()` to avoid significant performance implications due to the
need for dask to compute the categories. If you did not use PointsModel.parse() explicitly in your code
(e.g. this message is coming from a reader in `spatialdata_io`), please report this finding.
# Display the SpatialData object to see available elements
sdata
SpatialData object
├── Images
│ ├── 'morphology_focus': DataTree[cyx] (1, 25778, 35416), (1, 12889, 17708), (1, 6444, 8854), (1, 3222, 4427), (1, 1611, 2213)
│ └── 'morphology_mip': DataTree[cyx] (1, 25778, 35416), (1, 12889, 17708), (1, 6444, 8854), (1, 3222, 4427), (1, 1611, 2213)
├── Labels
│ ├── 'cell_labels': DataTree[yx] (25778, 35416), (12889, 17708), (6444, 8854), (3222, 4427), (1611, 2213)
│ └── 'nucleus_labels': DataTree[yx] (25778, 35416), (12889, 17708), (6444, 8854), (3222, 4427), (1611, 2213)
├── Points
│ └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│ ├── 'cell_boundaries': GeoDataFrame shape: (167780, 1) (2D shapes)
│ ├── 'cell_circles': GeoDataFrame shape: (167780, 2) (2D shapes)
│ └── 'nucleus_boundaries': GeoDataFrame shape: (167780, 1) (2D shapes)
└── Tables
└── 'table': AnnData (167780, 313)
with coordinate systems:
▸ 'global', with elements:
morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)
To better understand the SpatialData structure, see the SpatialData documentation.
Convert to InSituData#
The convert_from_spatialdata() function converts a SpatialData object into an InSituData object. You need to specify which elements from the SpatialData object should be mapped to InSituPy’s data structure.
Key Parameters#
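| Parameter | Description |
|---|---|
| `sdata` | The SpatialData object to convert |
| `image_data` | Dictionary mapping image names to tuples of `(spatialdata_key, pixel_size)` |
| `cells_key` | Key for cell shapes in SpatialData (e.g., `"cell_circles"`) |
| `table_key` | Key for the expression table (`"table"` in this tutorial) |
| `cell_boundaries_data` | Tuple of `(labels_key, pixel_size)` for the cell segmentation mask |
| `nucleus_boundaries_data` | Tuple of `(labels_key, pixel_size)` for the nucleus segmentation mask |
| `transcripts_key` | Key for transcript points (`"transcripts"` in this tutorial) |
| `slide_id` | Identifier for the slide |
| `sample_id` | Identifier for the sample |
| `method_name` | Name of the spatial method (e.g., `"Xenium"`) |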
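Because a mistyped key only surfaces once the conversion runs, it can help to check the `image_data` mapping up front. The following is a minimal sketch and not part of InSituPy: `check_image_data` is a hypothetical helper, and `available` stands in for the image names listed under `Images` in the SpatialData summary above. It assumes each value is a tuple whose first two entries are the SpatialData key and the pixel size; any further entries are ignored.

```python
def check_image_data(image_data, available):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    for name, spec in image_data.items():
        if not isinstance(spec, tuple) or len(spec) < 2:
            problems.append(f"{name}: expected a tuple (spatialdata_key, pixel_size, ...)")
            continue
        key, pixel_size = spec[0], spec[1]
        if key not in available:
            problems.append(f"{name}: unknown SpatialData key {key!r}")
        is_number = isinstance(pixel_size, (int, float)) and not isinstance(pixel_size, bool)
        if not is_number or pixel_size <= 0:
            problems.append(f"{name}: pixel size must be a positive number")
    return problems


# Image names copied from the SpatialData summary shown earlier
available = {"morphology_focus", "morphology_mip"}

# Mapping in the same shape as the one used in this tutorial
image_data = {
    "nuclei": ("morphology_mip", 0.2125, True),
    "mip": ("morphology_focus", 0.2125),
}

problems = check_image_data(image_data, available)
print(problems or "image_data looks consistent")
```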
# Define the pixel size (in micrometers per pixel)
# For Xenium, this is typically 0.2125 µm/pixel
pixel_size = 0.2125
# Convert SpatialData to InSituData
xd = convert_from_spatialdata(
sdata=sdata,
# Map images: {new_name: (spatialdata_key, pixel_size)}
image_data={
"nuclei": ("morphology_mip", pixel_size, True),
"mip": ("morphology_focus", pixel_size)
},
# Cell data configuration
cells_key="cell_circles",
table_key="table",
# Boundary masks
cell_boundaries_data=("cell_labels", pixel_size),
nucleus_boundaries_data=("nucleus_labels", pixel_size),
# Transcripts
transcripts_key="transcripts",
# Metadata
slide_id="slide_demo",
sample_id="sample_demo",
method_name="Xenium"
)
2026-02-24 10:01:31 | [INFO] Using 'global' coordinate system for pixel size extraction.
Adding images...
Adding cell data...
2026-02-24 10:01:31 | [WARNING] Spatial coordinates in `obsm['spatial']` are overwritten using centroids from `'cell_circles'`.
2026-02-24 10:01:31 | [WARNING] For the segmentation mask values of the boundaries, it is assumed that the order of the cells matches the ascending values of the segmentation mask.
Adding transcripts...
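The second warning deserves attention: the label masks carry integer values, and the conversion pairs them with cells by position. A minimal pure-Python sketch of that assumption follows. `pair_labels_with_cells` is an illustrative helper rather than InSituPy's implementation, the tiny mask and the cell IDs are made up, and 0 is assumed to mark background.

```python
def pair_labels_with_cells(mask, cell_ids, background=0):
    """Pair ascending mask values with cell IDs by position."""
    labels = sorted({value for row in mask for value in row if value != background})
    if len(labels) != len(cell_ids):
        raise ValueError(
            f"{len(labels)} labels in the mask but {len(cell_ids)} cells in the table"
        )
    return dict(zip(labels, cell_ids))


# Toy segmentation mask with three labelled objects (1, 3 and 7)
mask = [
    [0, 0, 3, 3],
    [1, 1, 3, 0],
    [1, 0, 0, 7],
]
cell_ids = ["cell_a", "cell_b", "cell_c"]  # order as it would appear in the table

mapping = pair_labels_with_cells(mask, cell_ids)
print(mapping)
```

If the rows of the table were ordered differently from the label values, this positional pairing would assign boundaries to the wrong cells, so it may be worth confirming the order for data from other sources.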
# Display the converted InSituData object
xd
InSituData
Method: Xenium
Slide ID: slide_demo
Sample ID: sample_demo
Path: None
➤ images
'nuclei': (25778, 35416)
'mip': (25778, 35416)
➤ cells
MultiCellData with main layer 'main'
table
AnnData object with n_obs × n_vars = 167780 × 313
obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatialdata_attrs'
obsm: 'spatial'
boundaries
BoundariesData object with 2 entries:
cells
nuclei
➤ transcripts
DataFrame with shape <dask_expr.expr.Scalar: expr=(Assign(frame=RenameFrame(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=ColumnsSetter(frame=Assign(frame=ReadParquetFSSpec(5462333))[['x_location', 'y_location', 'z_location']], columns=['x', 'y', 'z']))))))), columns={'x': 'x_location', 'y': 'y_location', 'z': 'z_location'}))).size() // 8, dtype=int64> x 8
Explore the Converted Data#
Let’s verify that all data modalities were correctly converted.
# Check available images
xd.images
'nuclei': (25778, 35416)
'mip': (25778, 35416)
# Check cell data and spatial coordinates
xd.cells.table.obsm['spatial']
array([[ 847.25991211, 326.19136505],
[ 826.34199524, 328.03182983],
[ 848.76691895, 331.74318695],
...,
[7470.15942383, 5119.13205566],
[7477.73720703, 5128.71281738],
[7489.3765625 , 5123.19777832]], shape=(167780, 2))
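The coordinates above appear to be in micrometers rather than pixels: the largest values shown (about 7489 and 5129) stay within the image extent once the pixel dimensions are multiplied by the pixel size. The arithmetic can be sketched as follows. The helper names are illustrative, and reading the two columns as x and y is an assumption based on the image shape `(25778, 35416)` being `(rows, columns)`.

```python
PIXEL_SIZE_UM = 0.2125  # micrometers per pixel, as set earlier in this tutorial


def pixels_to_um(n_pixels, pixel_size=PIXEL_SIZE_UM):
    """Convert a length in pixels to micrometers."""
    return n_pixels * pixel_size


def um_to_pixel_index(coord_um, pixel_size=PIXEL_SIZE_UM):
    """Convert a coordinate in micrometers to the index of the pixel containing it."""
    return int(coord_um // pixel_size)


# Image shape from the InSituData summary: (rows, columns)
height_px, width_px = 25778, 35416
height_um = pixels_to_um(height_px)
width_um = pixels_to_um(width_px)
print(f"image extent: {width_um:.1f} x {height_um:.1f} um")

# Last centroid from the array above, read as (x, y) in micrometers
x_um, y_um = 7489.3765625, 5123.19777832
col = um_to_pixel_index(x_um)
row = um_to_pixel_index(y_um)
inside = col < width_px and row < height_px
print(f"pixel column {col}, row {row}, inside image: {inside}")
```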
# Check cell boundaries
xd.cells.boundaries
BoundariesData object with 2 entries:
cells
nuclei
# Check transcripts
xd.transcripts
| | x_location | y_location | z_location | feature_name | cell_id | qv | overlaps_nucleus | transcript_id |
|---|---|---|---|---|---|---|---|---|
| npartitions=8 | | | | | | | | |
| | float32 | float32 | float32 | object | int32 | float32 | uint8 | uint64 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
Saving the Converted Data#
Once converted, you can save the InSituData object to disk in InSituPy’s native format for efficient storage and future use.
# Define output path
outpath = CACHE / "out/from_spatialdata_demo"
# Save the InSituData object
xd.saveas(outpath, overwrite=True)
print(f"Saved to: {outpath}")
Saving data to C:\Users\ge37voy\.cache\InSituPy\out\from_spatialdata_demo
Saved.
Saved to: C:\Users\ge37voy\.cache\InSituPy\out\from_spatialdata_demo
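To see what was written, a generic directory listing is often enough. The sketch below makes no assumption about the layout of InSituPy's format: `summarize_directory` is a hypothetical helper built on `pathlib`, and the demonstration runs on a throwaway directory with placeholder files rather than on the real output.

```python
import tempfile
from pathlib import Path


def summarize_directory(root):
    """Return sorted (relative_path, size_in_bytes) pairs for all files below root."""
    root = Path(root)
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


# Demonstration on a temporary directory with two placeholder files
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "sub").mkdir()
    (demo / "sub" / "a.txt").write_text("abc")
    (demo / "b.txt").write_text("hello")
    listing = summarize_directory(demo)

for relative_path, size in listing:
    print(f"{size:>8} B  {relative_path}")
```

In the tutorial, the same helper could be called as `summarize_directory(outpath)` after saving.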
Loading Saved Data#
The saved data can be loaded back using InSituPy’s standard reading functions.
# Load the saved data
xd_loaded = InSituData.read(outpath)
xd_loaded.load_all()
# Display the loaded data
xd_loaded
InSituData
Method: Xenium
Slide ID: slide_demo
Sample ID: sample_demo
Path: C:\Users\ge37voy\.cache\InSituPy\out\from_spatialdata_demo
➤ images
'mip': (25778, 35416)
'nuclei': (25778, 35416)
➤ cells
MultiCellData with main layer 'main'
table
AnnData object with n_obs × n_vars = 167780 × 313
obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatialdata_attrs'
obsm: 'spatial'
boundaries
BoundariesData object with 2 entries:
cells
nuclei
➤ transcripts
DataFrame with shape <dask_expr.expr.Scalar: expr=ReadParquetFSSpec(269c66f).size() // 8, dtype=int64> x 8
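One detail in the output: the images are listed in a different order after loading (`'mip'` before `'nuclei'`), which should not matter as long as names and shapes match. A small order-insensitive comparison can make such a round-trip check explicit. The sketch below works on plain dictionaries filled in by hand from the two printed summaries; `diff_summaries` is a hypothetical helper, not an InSituPy function.

```python
def diff_summaries(before, after):
    """Return the sorted keys whose values differ between two summaries."""
    keys = sorted(set(before) | set(after))
    return [key for key in keys if before.get(key) != after.get(key)]


# Values copied from the summary printed after conversion
before = {
    "method": "Xenium",
    "slide_id": "slide_demo",
    "sample_id": "sample_demo",
    "images": {"nuclei": (25778, 35416), "mip": (25778, 35416)},
    "table_shape": (167780, 313),
    "boundaries": {"cells", "nuclei"},
}

# Values copied from the summary printed after loading; note the image order
after = {
    "method": "Xenium",
    "slide_id": "slide_demo",
    "sample_id": "sample_demo",
    "images": {"mip": (25778, 35416), "nuclei": (25778, 35416)},
    "table_shape": (167780, 313),
    "boundaries": {"cells", "nuclei"},
}

differences = diff_summaries(before, after)
print(differences or "summaries match")
```

Dictionaries and sets compare by content rather than by order, which is why the reordered images do not show up as a difference.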
# Visualize the data
xd_loaded.show()
2026-02-24 12:53:35 | [INFO] Extracting unique gene names from Dask DataFrame...
2026-02-24 12:53:37 | [INFO] Found 541 unique genes
2026-02-24 12:53:45 | [INFO] Loading coordinates for gene 'ACTA2'...
2026-02-24 12:53:47 | [INFO] Loaded 439233 coordinates for gene 'ACTA2'
Alternative: Using the Xenium Reader with SpatialData Backend#
InSituPy also provides a convenient way to read Xenium data directly using SpatialData as the backend. This combines the loading and conversion into a single step.
from insitupy.io import read_xenium
# Read Xenium data using SpatialData backend
xd_direct = read_xenium(datapath, backend="spatialdata")
2026-02-24 13:13:36 | [INFO] Reading Xenium data with spatialdata-io backend...
WARNING The `feature_key` column feature_name is categorical with unknown categories. Please ensure the categories
are known before calling `PointsModel.parse()` to avoid significant performance implications due to the
need for dask to compute the categories. If you did not use PointsModel.parse() explicitly in your code
(e.g. this message is coming from a reader in `spatialdata_io`), please report this finding.
2026-02-24 13:15:21 | [INFO] Using 'global' coordinate system for pixel size extraction.
Adding images...
Adding cell data...
2026-02-24 13:15:21 | [WARNING] Spatial coordinates in `obsm['spatial']` are overwritten using centroids from `'cell_circles'`.
2026-02-24 13:15:21 | [WARNING] For the segmentation mask values of the boundaries, it is assumed that the order of the cells matches the ascending values of the segmentation mask.
Adding transcripts...
# Display the result
xd_direct
InSituData
Method: Xenium
Slide ID: slide_id
Sample ID: sample_id
Path: None
➤ images
'nuclei': (25778, 35416)
'mip': (25778, 35416)
➤ cells
MultiCellData with main layer 'main'
table
AnnData object with n_obs × n_vars = 167780 × 313
obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatialdata_attrs'
obsm: 'spatial'
boundaries
BoundariesData object with 2 entries:
cells
nuclei
➤ transcripts
DataFrame with shape <dask_expr.expr.Scalar: expr=(Assign(frame=RenameFrame(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=Assign(frame=ColumnsSetter(frame=Assign(frame=ReadParquetFSSpec(5462333))[['x_location', 'y_location', 'z_location']], columns=['x', 'y', 'z']))))))), columns={'x': 'x_location', 'y': 'y_location', 'z': 'z_location'}))).size() // 8, dtype=int64> x 8