
Append a new shard of data#

We have one artifact in storage and are about to receive a new shard of data.

In this notebook, we’ll validate and register the new shard, then append it to the existing collection as a new version.

import lamindb as ln
import bionty as bt
import readfcs

bt.settings.organism = "human"
💡 connected lamindb: testuser1/test-facs
ln.transform.stem_uid = "SmQmhrhigFPL"
ln.transform.version = "0"
ln.track()
💡 notebook imports: anndata==0.9.2 bionty==0.42.4 lamindb==0.69.2 pytometry==0.1.4 readfcs==1.1.7 scanpy==1.10.0
💡 saved: Transform(uid='SmQmhrhigFPL6K79', name='Append a new shard of data', key='facs2', version='0', type=notebook, updated_at=2024-03-28 12:12:12 UTC, created_by_id=1)
💡 saved: Run(uid='1Js7h2elSoUgfd3d9O6h', transform_id=2, created_by_id=1)

Ingest a new artifact#

Access #

Let us validate and register another .fcs file from Oetjen18:

filepath = readfcs.datasets.Oetjen18_t1()

adata = readfcs.read(filepath)
adata
AnnData object with n_obs × n_vars = 241552 × 20
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG'
    uns: 'meta'

Transform: normalize #

import anndata as ad
import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
pm.pp.compensate(adata)
pm.tl.normalize_biExp(adata)
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]
adata.to_df().describe()
CD95 CD8 CD27 CXCR4 CCR7 LIVE/DEAD CD4 CD45RA CD3 CD49B CD14/19 CD69 CD103
count 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000
mean 887.579860 1302.985717 1221.257257 877.533482 977.505533 1883.358298 556.687953 929.493316 941.166747 966.012244 1210.769935 741.523184 1003.064857
std 573.549695 827.850302 672.851319 411.966073 584.217139 932.113729 480.875917 795.550133 658.984751 456.437094 694.622980 473.287558 642.728024
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 462.757715 493.413744 605.463427 588.047798 495.437303 1063.670965 240.623098 404.087640 477.932659 592.294399 575.401173 380.247262 475.108131
50% 774.350833 1207.624048 1110.367681 782.939692 782.981430 1951.855099 484.355203 557.904360 655.909639 800.280049 1124.574275 705.802991 775.101973
75% 1327.792103 2036.849496 1721.730010 1070.479036 1453.929567 2623.975657 729.754419 1345.771633 1218.445208 1347.042403 1742.288464 1069.175380 1420.744291
max 4053.903716 4065.495666 4095.351322 4025.827267 3999.075551 4096.000000 4088.719985 3961.255364 3940.061146 4089.445928 3982.769373 3810.774988 4023.968008
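
The subsetting step above keeps only events (rows) with no missing value in any channel. As a minimal sketch of the same row mask in plain Python, using a hypothetical toy matrix rather than the Oetjen18 data:

```python
import math

# Hypothetical toy matrix: rows are events, columns are channels.
rows = [
    [1.0, 2.0, 3.0],
    [4.0, float("nan"), 6.0],
    [7.0, 8.0, 9.0],
]


def count_nan(row):
    """Count the NaN entries in one row."""
    return sum(1 for value in row if math.isnan(value))


# Same idea as `adata.to_df().isna().sum(axis=1) == 0`:
# one boolean per row, True when the row has no NaN.
keep = [count_nan(row) == 0 for row in rows]
clean_rows = [row for row, ok in zip(rows, keep) if ok]

print(keep)
print(clean_rows)
```

In the summary above, count equals the original n_obs (241552) for every marker, so no rows were dropped for this file.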

Validate cell markers #

Let’s see how many markers validate:

validated = bt.CellMarker.validate(adata.var.index)
9 terms (69.20%) are not validated for name: CD95, CXCR4, CCR7, LIVE/DEAD, CD4, CD49B, CD14/19, CD69, CD103
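
Conceptually, validation returns one boolean per value, telling whether that value is found in the registry under the given field. The following is only a toy sketch of that idea and not the bionty implementation: the `validate` and `report` helpers and the in-memory `registry` set are hypothetical stand-ins, filled with the four names that validate in the output above.

```python
def validate(values, registry):
    """Return one boolean per value: True if the value is in the registry."""
    return [value in registry for value in values]


def report(values, mask):
    """Summarize the values that did not validate."""
    missing = [v for v, ok in zip(values, mask) if not ok]
    pct = 100 * len(missing) / len(values)
    return f"{len(missing)} terms ({pct:.1f}%) are not validated: {', '.join(missing)}"


# The 13 marker names of this dataset.
markers = [
    "CD95", "CD8", "CD27", "CXCR4", "CCR7", "LIVE/DEAD", "CD4",
    "CD45RA", "CD3", "CD49B", "CD14/19", "CD69", "CD103",
]
# Toy registry (assumption): only the names that validate above.
registry = {"CD8", "CD27", "CD45RA", "CD3"}

mask = validate(markers, registry)
summary = report(markers, mask)
print(summary)
```

The boolean mask is what makes expressions such as `adata.var.index[~validated]` possible later in the notebook.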

Let’s standardize and re-validate:

adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
7 terms (53.80%) are not validated for name: CD95, CXCR4, LIVE/DEAD, CD49B, CD14/19, CD69, CD103
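
Standardization maps synonyms onto the names stored in the registry, which is why two more markers validate afterwards. A toy sketch, assuming a hypothetical synonym table that contains only the two mappings visible in this notebook (CCR7 to Ccr7 and CD4 to Cd4); the real lookup is done by bionty against its own records:

```python
# Toy synonym table (assumption): inferred from the two names that
# start validating after standardization in this notebook.
SYNONYMS = {"CCR7": "Ccr7", "CD4": "Cd4"}


def standardize(values, synonyms):
    """Replace each known synonym by its registry name, keep the rest."""
    return [synonyms.get(value, value) for value in values]


markers = [
    "CD95", "CD8", "CD27", "CXCR4", "CCR7", "LIVE/DEAD", "CD4",
    "CD45RA", "CD3", "CD49B", "CD14/19", "CD69", "CD103",
]
standardized = standardize(markers, SYNONYMS)

# Toy registry (assumption): the names that validate at this point.
registry = {"CD8", "CD27", "CD45RA", "CD3", "Ccr7", "Cd4"}
not_validated = [name for name in standardized if name not in registry]

print(len(not_validated), "of", len(standardized), "names still need attention")
```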

Next, register non-validated markers from Bionty:

records = bt.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)
did not create CellMarker records for 2 non-validated names: 'CD14/19', 'LIVE/DEAD'

Manually create the remaining marker, CD14/19 (LIVE/DEAD is metadata and is moved to obs below):

bt.CellMarker(name="CD14/19").save()
❗ record with similar name exist! did you mean to load it?
uid synonyms score
name
Cd14 5JHfKNo5DC8y 90.0
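
The warning above comes from a similarity check of the new name against existing records. As a rough illustration only, the sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer. The `similar_names` helper, the threshold, and the candidate list are hypothetical, and the scores are on a 0 to 1 scale that is not comparable to the 90.0 shown above.

```python
from difflib import SequenceMatcher


def similar_names(name, existing, threshold=0.7):
    """Return (existing_name, score) pairs at or above the threshold, best first."""
    hits = []
    for candidate in existing:
        # Case-insensitive comparison, so "CD14" and "Cd14" count as equal.
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score >= threshold:
            hits.append((candidate, round(score, 2)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical list of names already in the registry.
hits = similar_names("CD14/19", ["Cd14", "CD8", "CD27"])
print(hits)
```

A hit is only a hint: here CD14/19 is a combined channel, so creating a separate record is a deliberate choice.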

Move metadata to obs:

validated = bt.CellMarker.validate(adata.var.index)
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
1 term (7.70%) is not validated for name: LIVE/DEAD
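
The split above partitions the columns by the validation mask: non-validated columns become per-event metadata and validated columns stay as measured variables. A minimal sketch with plain dictionaries, where the toy values are hypothetical:

```python
# Hypothetical toy data: column name -> values for two events.
columns = {
    "CD8": [1.0, 2.0],
    "LIVE/DEAD": [0.1, 0.9],
    "CD3": [3.0, 4.0],
}
# Validation result per column, as a boolean mask keyed by name.
validated = {"CD8": True, "LIVE/DEAD": False, "CD3": True}

# Columns that failed validation move to the metadata table ("obs").
obs = {name: vals for name, vals in columns.items() if not validated[name]}
# Columns that validated remain the measured variables.
var_data = {name: vals for name, vals in columns.items() if validated[name]}

print(list(obs))
print(list(var_data))
```

Note that assigning to `adata.obs` replaces whatever obs columns were there before, which is fine here because the freshly read file has none.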

Now all markers pass validation:

validated = bt.CellMarker.validate(adata.var.index)
assert all(validated)

Register #

features = ln.Feature.lookup()
efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()
markers = bt.CellMarker.lookup()
artifact = ln.Artifact.from_anndata(
    adata,
    description="Oetjen18_t1"
)
... storing '$PnR' as categorical
... storing '$PnE' as categorical
... storing '$PnV' as categorical
... storing '$PnG' as categorical
artifact.save()
artifact.features.add_from_anndata(var_field=bt.CellMarker.name)
1 term (100.00%) is not validated for name: LIVE/DEAD
❗ skip linking features to artifact in slot 'obs'
artifact.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
artifact.labels.add(organism.human, features.organism)
artifact.features
Features:
  var: FeatureSet(uid='PWtpOJ8T74fIHwbDuDue', n=12, type='number', registry='bionty.CellMarker', hash='YXolP9mtiV6-oHKhY4h6', updated_at=2024-03-28 12:12:17 UTC, created_by_id=1)
    'Cd4', 'CD8', 'CD95', 'CXCR4', 'CD49B', 'CD69', 'CD3', 'CD103', 'CD27', 'CD14/19', 'Ccr7', 'CD45RA'
  external: FeatureSet(uid='gv7vaXmo8UUNW7QPUbAY', n=2, registry='core.Feature', hash='6Hfp-eR18Cgrt_4tFh3E', updated_at=2024-03-28 12:12:17 UTC, created_by_id=1)
    🔗 assay (1, bionty.ExperimentalFactor): 'fluorescence-activated cell sorting'
    🔗 organism (1, bionty.Organism): 'human'

View data flow:

artifact.view_lineage()
_images/63c861c1c0ab26a19338a85933e8b6c77ce2666e9bc2834a092e2dd9c3c9a9c2.svg

Inspect a PCA for QC. This collection looks much like noise:

import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd8.name)
_images/e968a137a7bc3a205dea9a814215eea2617d72931ae6ff073bf7bf57bccb030a.png

Create a new version of the collection by appending an artifact#

Query the old version and create a new version that contains both the old and the new artifact:

collection_v1 = ln.Collection.filter(name="My versioned cytometry collection").one()
collection_v2 = ln.Collection(
    [artifact, collection_v1.artifact], is_new_version_of=collection_v1, version="2"
)
collection_v2
Collection(uid='TegJbNDfqQuGlYUqiUiP', name='My versioned cytometry collection', version='2', hash='ZKQxIw0uAvtMtdZk8SAj', visibility=1, transform_id=2, run_id=2, created_by_id=1)
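
The pattern used here is append-only versioning: the old version is left untouched, and the new version lists the new artifact together with the old ones and keeps a pointer back to its predecessor. A toy sketch of that pattern, not of lamindb internals; `ToyCollection`, `append_shard`, the version label "1", and the artifact label "shard_1" are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCollection:
    """Immutable stand-in for a versioned collection of artifacts."""
    name: str
    version: str
    artifacts: tuple
    previous: Optional["ToyCollection"] = None


def append_shard(old, new_artifact, version):
    """Create a new version holding the new artifact plus all old ones."""
    return ToyCollection(
        name=old.name,
        version=version,
        artifacts=(new_artifact, *old.artifacts),
        previous=old,  # link back to the version this one supersedes
    )


v1 = ToyCollection("My versioned cytometry collection", "1", ("shard_1",))
v2 = append_shard(v1, "Oetjen18_t1", "2")

print(v2.version, v2.artifacts)
print(v1.version, v1.artifacts)  # the old version is unchanged
```

Keeping old versions immutable means earlier analyses that reference version 1 stay reproducible after the new shard arrives.
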
collection_v2.features
Features:
  var: FeatureSet(uid='CMmezChLVLXJlTnCm0E4', n=41, type='number', registry='bionty.CellMarker', hash='n0jcZjyxOx4D0aylQKuM', created_by_id=1)
    'CD57', 'Cd19', 'Cd4', 'CD8', 'Igd', 'CD85j', 'CD11c', 'CD16', 'CD3', 'CD38', 'CD27', 'CD11B', 'Cd14', 'Ccr6', 'CD94', 'CD86', 'CXCR5', 'CXCR3', 'Ccr7', 'CD45RA', ...
  obs: FeatureSet(uid='cnjSQw05TjGVCO6LsjIt', n=5, registry='core.Feature', hash='ZfB8UZuEnOkXDRnaBY2s', updated_at=2024-03-28 12:12:06 UTC, created_by_id=1)
    Time (number)
    Cell_length (number)
    Dead (number)
    (Ba138)Dd (number)
    Bead (number)
  external: FeatureSet(uid='gv7vaXmo8UUNW7QPUbAY', n=2, registry='core.Feature', hash='6Hfp-eR18Cgrt_4tFh3E', updated_at=2024-03-28 12:12:17 UTC, created_by_id=1)
    🔗 assay (1, bionty.ExperimentalFactor): 'My versioned cytometry collection'
    🔗 organism (1, bionty.Organism): 'My versioned cytometry collection'
collection_v2.save()
collection_v2.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
collection_v2.labels.add(organism.human, features.organism)
collection_v2.view_lineage()
_images/149b2a12fff41f1f800e00ea69834fe5fbd06d37fa67a6c97501d7096a2b9089.svg