Notebook demonstrating the addition of data segmented with proseg


# Reload all functions and init files automatically before each cell is executed.
%load_ext autoreload
%autoreload 2

from pathlib import Path
from insitupy import InSituData, CACHE

Load data

insitupy_project = Path(CACHE / "out/demo_insitupy_project")
xd = InSituData.read(insitupy_project)
xd.load_all()

xd

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\ge37voy\.cache\InSituPy\out\demo_insitupy_project
Metadata file:	.ispy
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ cells
       MultiCellData with main layer 'main'
           matrix
               AnnData object with n_obs × n_vars = 156447 × 297
               obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'n_genes_by_counts', 'n_genes', 'leiden', 'cell_type_dc', 'cell_type_dc_sub', 'cell_type_tacco', 'cell_type_publ'
               var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells'
               uns: 'cell_type_dc_colors', 'cell_type_dc_sub', 'cell_type_dc_sub_colors', 'cell_type_publ_colors', 'cell_type_tacco_colors', 'counts_location', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
               obsm: 'OT', 'X_pca', 'X_umap', 'annotations', 'ora_estimate', 'ora_pvals', 'regions', 'spatial'
               varm: 'OT', 'PCs'
               layers: 'counts', 'norm_counts'
               obsp: 'connectivities', 'distances'
           boundaries
               BoundariesData object with 2 entries:
                   cells
                   nuclei
    ➤ transcripts
       DataFrame with shape <dask_expr.expr.Scalar: expr=ReadParquetFSSpec(52e75f9).size() // 8, dtype=int32> x 8
    ➤ annotations
       TestKey:	9 annotations, 2 classes ('TestClass','points') ✔
       demo:	4 annotations, 1 class ('None') ✔
       demo2:	5 annotations, 1 class ('None') ✔
       demo3:	7 annotations, 1 class ('None') ✔
       Demo:	28 annotations, 2 classes ('Tumor cells','Stroma') ✔
    ➤ regions
       demo_regions:	3 regions, 3 classes ('Region1','Region2','Region3') ✔
       TMA:	6 regions, 6 classes ('B-2','A-3','B-1','B-3','A-1','A-2') ✔
       Demo:	3 regions, 3 classes ('Region 1','Region 3','Region 2') ✔

Select a small region for demonstration

xdcrop = xd.crop(xlim=(2700,3000), ylim=(2700,3000))

Export transcripts for proseg

transcripts_out_path = Path(CACHE / "out/transcripts_for_proseg.csv")
transcripts_out_path.parent.mkdir(parents=True, exist_ok=True)

# export transcripts as csv
xdcrop.transcripts.to_csv(transcripts_out_path, single_file=True)

['C:\\Users\\ge37voy\\.cache\\InSituPy\\out\\transcripts_for_proseg.csv']
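Before handing the file to proseg, it can help to check that the export looks as expected. The following is a minimal sketch using only the standard library; `peek_csv` is a hypothetical helper and not part of InSituPy, and it makes no assumption about which columns the file contains.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read at most n_rows rows
    return header, rows
```

In this notebook it could be called as `header, rows = peek_csv(transcripts_out_path)` to inspect the column names without loading the whole file.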

Install proseg

For installation, check out the instructions in the proseg GitHub repository. In brief, proseg is a Rust package and can be installed with Cargo:

cargo install proseg
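After installation, one way to confirm that the `proseg` executable can be found from Python is to look it up on the `PATH`. This is a small sketch using `shutil.which` from the standard library; `find_executable` is a hypothetical helper name.

```python
import shutil


def find_executable(name="proseg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


proseg_path = find_executable()
if proseg_path is None:
    # Cargo usually installs binaries into ~/.cargo/bin, which must be on PATH.
    print("proseg was not found on PATH.")
else:
    print(f"Found proseg at {proseg_path}")
```

If the executable is not found, restarting the Jupyter kernel after installation may be necessary so that the updated `PATH` is picked up.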

Run proseg

output_path = transcripts_out_path.parent / "proseg_results"
output_path.mkdir(exist_ok=True)

import subprocess

# Start the process
process = subprocess.Popen([
    'proseg',
    '--xenium', str(transcripts_out_path),
    '--output-path', str(output_path),
    '--min-qv', str(20),
    '--excluded-genes', "^(Deprecated|NegControl|Unassigned|Intergenic|BLANK|antisense)"
    ], stdout=subprocess.PIPE)

# Continuously read the output
while True:
    output = process.stdout.readline()
    if output == b'' and process.poll() is not None:
        break
    if output:
        print(output.decode('utf-8', errors='replace').strip())

Using 16 threads
Read 109974 transcripts
587 cells
310 genes
Estimated full area: 94627.77
Full volume: 557341.2
Using grid size 123.81886. Chunks: 9
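The loop above prints the output but does not report whether the process failed. As a hedged alternative, the sketch below wraps the same idea in a hypothetical helper, `run_and_stream`, that also merges stderr into the printed output and raises an error on a non-zero exit code. It is demonstrated with the Python interpreter rather than proseg so that it runs anywhere.

```python
import subprocess
import sys


def run_and_stream(cmd):
    """Run cmd, print its output line by line, and raise on a non-zero exit code."""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error messages are shown too
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    lines = []
    for line in process.stdout:
        line = line.rstrip()
        print(line)
        lines.append(line)
    returncode = process.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return lines


# Self-contained demonstration with the Python interpreter instead of proseg
run_and_stream([sys.executable, "-c", "print('hello')"])
```

The same helper could be called with the proseg argument list from the cell above; a `CalledProcessError` would then indicate that the run did not finish successfully.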

Alternative approach: running proseg in the terminal

If the previous cell did not execute successfully (e.g., due to spaces in your file path), you can run proseg directly from the terminal.

Before proceeding, make sure you have the correct paths to the exported transcripts CSV file and to the output directory, then replace the placeholders in the command below:

proseg --xenium /path/to/transcripts.csv --output-path /path/to/output_path

Once the command has finished successfully in the terminal, continue with this tutorial.
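If the paths contain spaces, they need to be quoted in the terminal. The sketch below builds a command string with `shlex.join`, which applies POSIX shell quoting; `build_proseg_command` and the example paths are hypothetical. On Windows shells such as cmd or PowerShell the quoting rules differ, and wrapping the paths in double quotes is the usual approach there.

```python
import shlex


def build_proseg_command(transcripts_csv, output_dir):
    """Return a POSIX-shell-safe proseg command line for the given paths."""
    args = [
        "proseg",
        "--xenium", str(transcripts_csv),
        "--output-path", str(output_dir),
    ]
    return shlex.join(args)  # quotes arguments that contain spaces


# Hypothetical paths containing a space
print(build_proseg_command("/data/my project/transcripts.csv",
                           "/data/my project/proseg_results"))
```

The printed string can be pasted into a POSIX terminal as is.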

Add proseg results to `InSituData`

xdcrop.cells.add_proseg(path=output_path)
xdcrop.cells.add_proseg(path=output_path, key="test") # add the data a second time with another key

Convert counts to float32.
Convert counts to float32.

cropped_out = CACHE / "out/cropped"
xdcrop.saveas(cropped_out, overwrite=True)

Saving data to C:\Users\ge37voy\.cache\InSituPy\out\cropped
Saved.

Reload and visualize data

xdr = InSituData.read(cropped_out)
xdr.load_all()

# visualize data
xdr.show()