-This single cell pytorch dataloader / lightning datamodule is designed to be used with:
+This single cell pytorch dataloader / lightning datamodule is designed to be used
+with:

 - [lamindb](https://lamin.ai/)

@@ -24,11 +25,13 @@ and:
 It allows you to:

 1. load thousands of datasets containing millions of cells in a few seconds.
-2. preprocess the data per dataset and download it locally (normalization, filtering, etc.)
+2. preprocess the data per dataset and download it locally (normalization,
+filtering, etc.)
 3. create a more complex single cell dataset
 4. extend it to your need

-built on top of `lamindb` and the `.mapped()` function by Sergei: https://github.com/Koncopd
+built on top of `lamindb` and the `.mapped()` function by Sergei:
+https://github.com/Koncopd

 ```
 Portions of the mapped.py file are derived from Lamin Labs
@@ -39,11 +42,17 @@ Please see https://github.com/laminlabs/lamindb/blob/main/lamindb/core/_mapped_c
 for the original implementation
 ```

-The package has been designed together with the [scPRINT paper](https://doi.org/10.1101/2024.07.29.605556) and [model](https://github.com/cantinilab/scPRINT).
+The package has been designed together with the
+[scPRINT paper](https://doi.org/10.1101/2024.07.29.605556) and
+[model](https://github.com/cantinilab/scPRINT).

 ## More

-I needed to create this Data Loader for my PhD project. I am using it to load & preprocess thousands of datasets containing millions of cells in a few seconds. I believed that individuals employing AI for single-cell RNA sequencing and other sequencing datasets would eagerly utilize and desire such a tool, which presently does not exist.
+I needed to create this Data Loader for my PhD project. I am using it to load &
+preprocess thousands of datasets containing millions of cells in a few seconds.
+I believed that individuals employing AI for single-cell RNA sequencing and
+other sequencing datasets would eagerly utilize and desire such a tool, which
@@ -57,12 +66,14 @@ pip install scDataLoader[dev] # for dev dependencies
 lamin init --storage ./testdb --name test --schema bionty
 ```

-if you start with lamin and had to do a `lamin init`, you will also need to populate your ontologies. This is because scPRINT is using ontologies to define its cell types, diseases, sexes, ethnicities, etc.
+if you start with lamin and had to do a `lamin init`, you will also need to
+populate your ontologies. This is because scPRINT is using ontologies to define
+its cell types, diseases, sexes, ethnicities, etc.

 you can do it manually or with our function:

 ```python
-from scdataloader.utils import populate_my_ontology
+from scdataloader.utils import populate_my_ontology, _adding_scbasecamp_genes

 populate_my_ontology() #to populate everything (recommended) (can take 2-10mns)
+This is actually what I did in my own instance to create the full scPRINT-2
+corpus and you can see some of it in the notebooks above.
+
+### Getting even more
+
+They also host a perturbation atlas in `laminlabs/pertdata` that can be
+downloaded the same way.

-### command line usage
+### command line usage to train a model

 The main way to use

-> please refer to the [scPRINT documentation](https://www.jkobject.com/scPRINT/) and [lightning documentation](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_intermediate.html) for more information on command line usage
+> please refer to the [scPRINT documentation](https://www.jkobject.com/scPRINT/)