Investigate gene expression and cell density along an axis

Investigate gene expression and cell density along an axis#

from pathlib import Path
from insitupy import InSituData, CACHE

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

Load Xenium data into `InSituData` object#

Now the Xenium data can be parsed by providing the data path to the InSituPy project folder.

insitupy_project = Path(CACHE / "out/demo_insitupy_project")

xd = InSituData.read(insitupy_project)
xd.load_all()

xd

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\ge37voy\.cache\InSituPy\out\demo_insitupy_project
Metadata file:	.ispy
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 157600 × 297
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'n_genes_by_counts', 'n_genes', 'leiden', 'cell_type_dc', 'cell_type_tacco', 'cell_type_dc_sub'
           var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells'
           uns: 'cell_type_dc_colors', 'cell_type_dc_sub', 'cell_type_dc_sub_colors', 'cell_type_tacco_colors', 'counts_location', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
           obsm: 'OT', 'X_pca', 'X_umap', 'annotations', 'ora_estimate', 'ora_pvals', 'regions', 'spatial'
           varm: 'OT', 'PCs'
           layers: 'counts', 'norm_counts'
           obsp: 'connectivities', 'distances'
       boundaries
           BoundariesData object with 2 entries:
               cells
               nuclei
    ➤ transcripts
       DataFrame with shape Delayed('int-dddca522-4bd4-4f58-ba45-37e8d414bb2c') x 8
    ➤ annotations
       TestKey:	8 annotations, 2 classes ('TestClass','points') ✔
       demo:	4 annotations, 2 classes ('Positive','Negative') ✔
       demo2:	5 annotations, 3 classes ('Negative','Positive','Other') ✔
       demo3:	7 annotations, 5 classes ('Stroma','Necrosis','Immune cells','unclassified','Tumor') ✔
       Demo:	28 annotations, 2 classes ('Tumor cells','Stroma') ✔
    ➤ regions
       test:	7 regions, 5 classes ('Stroma','Necrosis','Immune cells','unclassified','Tumor') ✔
       demo_regions:	3 regions, 3 classes ('Region1','Region2','Region3') ✔
       TMA:	6 regions, 6 classes ('B-2','A-3','B-1','B-3','A-1','A-2') ✔
       Demo:	3 regions, 3 classes ('Region 1','Region 3','Region 2') ✔

import scanpy as sc
sc.pl.umap(adata=xd.cells.matrix, color=["cell_type_dc_sub", "cell_type_dc", "leiden"], wspace=0.6)

../../_images/6c23ab3f65632a8efef881011d2f18cb6ffeea2e54d340808d73f4f5399d6e43.png

Import annotations#

xd.import_annotations(
    files=["./demo_annotations/annotations-Tumor.geojson",
           "./demo_annotations/demo_annotations.geojson",
           "./demo_annotations/demo_points.geojson"],
    keys=["Tumor", "Demo", "Demo"],
    scale_factor=0.2125
)

xd.import_regions(
    files="./demo_regions/regions-Tumor.geojson",
    keys="Tumor",
    scale_factor=0.2125
)

Select small region for demonstration purposes#

xdcrop = xd.crop(region_tuple=("Tumor", "Selected Tumor"))

# access transcriptomic data in anndata format from InSituData object
adata = xdcrop.cells.matrix

Calculate density using kernel density estimation or `Mellon`#

The kernel density can be used to identify regions with an increased density of a certain cell type (e.g. immune cells).

from insitupy.utils._calc import calc_density

Option 1: Kernel density estimation#

# Example usage using gaussian kernel density estimation
calc_density(
    adata,
    groupby='cell_type_dc',
    mode="gauss",
    inplace=True)

Option 2: `Mellon` package#

# Example usage using mellon density estimation
calc_density(
    adata,
    groupby='cell_type_dc',
    mode="mellon",
    inplace=True)

[2025-02-24 16:12:43,153] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (3,612) >= n_samples (3,612) and rank = 1.0.
[2025-02-24 16:12:43,154] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:12:43,430] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:12:43,682] [INFO    ] Using covariance function Matern52(ls=353.31708501505636).
[2025-02-24 16:12:43,684] [INFO    ] Computing Lp.
[2025-02-24 16:12:45,382] [INFO    ] Using rank 3,612 covariance representation.
[2025-02-24 16:12:46,596] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:12:49,150] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (2,619) >= n_samples (2,619) and rank = 1.0.
[2025-02-24 16:12:49,152] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:12:49,419] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:12:49,615] [INFO    ] Using covariance function Matern52(ls=279.92625703733836).
[2025-02-24 16:12:49,617] [INFO    ] Computing Lp.
[2025-02-24 16:12:51,055] [INFO    ] Using rank 2,619 covariance representation.
[2025-02-24 16:12:51,734] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:12:53,326] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (2,937) >= n_samples (2,937) and rank = 1.0.
[2025-02-24 16:12:53,327] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:12:53,563] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:12:53,731] [INFO    ] Using covariance function Matern52(ls=330.6008073330627).
[2025-02-24 16:12:53,733] [INFO    ] Computing Lp.
[2025-02-24 16:12:54,940] [INFO    ] Using rank 2,937 covariance representation.
[2025-02-24 16:12:55,407] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:12:56,703] [INFO    ] Using sparse Gaussian Process since n_landmarks (5,000) < n_samples (5,888) and rank = 1.0.
[2025-02-24 16:12:56,704] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:12:56,908] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:12:57,045] [INFO    ] Using covariance function Matern52(ls=224.74751107139122).
[2025-02-24 16:12:57,047] [INFO    ] Computing 5,000 landmarks with k-means clustering.
[2025-02-24 16:13:05,546] [INFO    ] Using rank 5,000 covariance representation.
[2025-02-24 16:13:07,541] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:13:10,299] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (1,862) >= n_samples (1,862) and rank = 1.0.
[2025-02-24 16:13:10,301] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:13:10,556] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:13:10,711] [INFO    ] Using covariance function Matern52(ls=203.75474503502824).
[2025-02-24 16:13:10,713] [INFO    ] Computing Lp.
[2025-02-24 16:13:11,819] [INFO    ] Using rank 1,862 covariance representation.
[2025-02-24 16:13:12,176] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:13:13,544] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (1,721) >= n_samples (1,721) and rank = 1.0.
[2025-02-24 16:13:13,546] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:13:13,790] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:13:13,973] [INFO    ] Using covariance function Matern52(ls=305.7046655864637).
[2025-02-24 16:13:13,975] [INFO    ] Computing Lp.
[2025-02-24 16:13:14,890] [INFO    ] Using rank 1,721 covariance representation.
[2025-02-24 16:13:15,089] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:13:16,485] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (1,382) >= n_samples (1,382) and rank = 1.0.
[2025-02-24 16:13:16,487] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:13:16,790] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:13:16,994] [INFO    ] Using covariance function Matern52(ls=183.22410121631899).
[2025-02-24 16:13:16,997] [INFO    ] Computing Lp.
[2025-02-24 16:13:18,020] [INFO    ] Using rank 1,382 covariance representation.
[2025-02-24 16:13:18,272] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:13:19,466] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (1,189) >= n_samples (1,189) and rank = 1.0.
[2025-02-24 16:13:19,468] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:13:19,693] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:13:19,866] [INFO    ] Using covariance function Matern52(ls=249.7099825844308).
[2025-02-24 16:13:19,869] [INFO    ] Computing Lp.
[2025-02-24 16:13:20,706] [INFO    ] Using rank 1,189 covariance representation.
[2025-02-24 16:13:20,897] [INFO    ] Running inference using L-BFGS-B.
[2025-02-24 16:13:21,758] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (654) >= n_samples (654) and rank = 1.0.
[2025-02-24 16:13:21,759] [INFO    ] Computing nearest neighbor distances.
[2025-02-24 16:13:21,941] [INFO    ] Using embedding dimensionality d=2. Use d_method="fractal" to enable effective density normalization.
[2025-02-24 16:13:22,064] [INFO    ] Using covariance function Matern52(ls=176.90525767048888).
[2025-02-24 16:13:22,066] [INFO    ] Computing Lp.
[2025-02-24 16:13:22,780] [INFO    ] Using rank 654 covariance representation.
[2025-02-24 16:13:22,872] [INFO    ] Running inference using L-BFGS-B.

The results are saved in .cells.matrix.obsm["density-{method}'] and can be viewed using .show().

xdcrop.show(keys=["cell_type_dc_sub", "ACTA2"])

Show and save color legend for particular layer#

The color legend is also shown in the napari viewer on the bottom left, but for saving them for a publication in the following the plot_colorlegend function is available.

from insitupy.plotting.plots import plot_colorlegend

Categorical data#

After adding the layer “cell_type” from “obs” in the napari viewer, one can do following:

plot_colorlegend(xdcrop.viewer,
                 layer_name="cell_type_dc_sub",
                 savepath="figures/colorbar-cell_type_dc_sub.pdf")

Saving figure to file figures/colorbar-cell_type_dc_sub.pdf
Saved.

../../_images/a0e42865741305a1568d7515cfbd18908ba00be6519936606fe5e8b1c6243bdf.png

Continuous data#

After adding the layer “density-mellon#Breast cancer” from “obsm” in the napari viewer, one can do following:

plot_colorlegend(xdcrop.viewer,
                 layer_name="ACTA2",
                 savepath="figures/demo_legend.pdf")

Saving figure to file figures/demo_legend.pdf
Saved.

../../_images/690d43fa5c3ccd067b0b6ca77e39299771622de995dd5aee82bed43596ab7744.png

Save currently displayed color legend#

To save the currently in the napari viewer displayed color legend, one can use save_current_colorlegend. But it is important that the color legend is fully displayed in the viewer. If the color legend is cropped because the window of the widget is not wide enough, this will also be the case in the saved file.

xdcrop.save_current_colorlegend("figures/testtest.pdf")

Figure saved as figures/testtest.pdf

xd.show()

Visualize effects along an axis#

Alternatively to visualizing the cellular gene expression or density in 2D as shown above, it is also possible to visualize it along an axis. In the example scenario below we calculate the distance of each cell to the tumor center and visualize the cellular density and gene expression along this axis.

Calculate distance of cells to annotations#

To generate an axis we use here a selected set of annotations and calculate the distance of all cells to these annotations. For demonstration purposes we selected a region of the breast cancer dataset and annotated tumor cells within this region:

These annotations and the region can be imported from files in the repository but of course it would be also possible to do own annotations and select an own region and save the results using .store_geometries().

from insitupy import calc_distance_of_cells_from

calc_distance_of_cells_from(
    data=xdcrop,
    annotation_key="Demo",
    annotation_class="Tumor center",
    # region_key="Tumor",
    # region_name="Selected Tumor"
)

Calculate the distance of cells from the annotation "Tumor center"
Saved distances to `.cells.matrix.obsm["distance_from"]["Tumor center"]`

The distances can be accessed in .cells.matrix.obsm["distance_from"]

xdcrop.cells.matrix.obsm["distance_from"]

	Tumor center
4353	231.644821
4356	226.342973
4358	213.661463
4359	216.604613
4390	226.413766
...	...
118567	135.302369
118569	156.354595
118570	162.485566
118575	147.131136
118576	164.307876

21864 rows × 1 columns

Visualize the results using `napari`#

Using .show() we can visualize the results and see the distance values per cell:

Plot cell abundance along axis#

In addition to exploring the density of a certain cell type in two dimension, one can also explore it along a certain axis.

from insitupy.plotting import cell_abundance_along_axis

cell_abundance_along_axis(
    adata=xdcrop.cells.matrix,
    axis=("distance_from", "Tumor center"),
    groupby="cell_type_dc_sub",
    kde=True
)

Retrieve `obs_val` from .obsm.

../../_images/c916a35a529c4ca488075f7759599cedc522be9bed435a5921b76d224679ec05.png

cell_abundance_along_axis(
    adata=xdcrop.cells.matrix,
    axis=("distance_from", "Tumor center"),
    groupby="cell_type_dc_sub",
    xlim=(0,400),
    kde=True,
    savepath="figures/cell_abundance_along_axis.pdf"
)

Retrieve `obs_val` from .obsm.
Saving figure to file figures/cell_abundance_along_axis.pdf
Saved.

../../_images/07d2222be37637d367b538902957ced5429006c435f456cdc17290d09ee56277.png

Explore cellular density and gene expression along axis#

In single-cell spatial transcriptomic data it is important to explore the gene expression on a cell type level. The cell_expression_along_axis function let’s you do so.

from insitupy.plotting import cell_expression_along_axis

# Example usage:
cell_expression_along_axis(
    adata=xdcrop.cells.matrix,
    axis=("distance_from", "Tumor center"),
    cell_type_column="cell_type_dc_sub",
    cell_type="Fibroblasts",
    genes=["ACTA2", "LUM", "MMP2", "CXCL12"],
    xlim=(50, 400),
    kde=False,
    fit_reg=True,
    min_expression=0,
    savepath="figures/gene_expr_along_axis_fibroblasts.pdf"
    )

Retrieve `obs_val` from .obsm.
Saving figure to file figures/gene_expr_along_axis_fibroblasts.pdf
Saved.

../../_images/a2314fb85a9ee2ff86a42467f99f17d0b22f03f3244ce9ef704c799bb3a86fb7.png