A sample table file (e.g. samples.csv) lists the samples included in an analysis run, their sample barcode (RT) sequences, and sample-specific analysis parameters.
It is a comma-separated (CSV) file with a header line (column names), followed by one sample per line.
The first column must be `sample` and contains the name of each sample. These sample names need to be UNIQUE. This is especially important in the case of multiple final distribution plates: the same sample, as encoded by its original RT wells, then appears in multiple libraries, and each of those entries needs its own unique name for downstream analysis to work properly. The required columns are described below. Column names are case sensitive!
| Column | Description | Example |
|---|---|---|
| sample | Sample name | Foobar-2 |
| barcodes | RT-plate wells used for this sample | 1A-2H |
| libName | Name for the overall sequencing library / fastq files | ScaleCROP |
| libIndex | Library barcode sequence corresponding to overall fastqName to separate out CROP reads | [BARCODE SEQUENCE] |
| guide | File name of the guide sequence .tsv (each file should be in the "guides" directory) | guides.txt |
| allCells | File name of the ScaleRna output SAMPLE.allcells.csv corresponding to the RNA library's cell metrics (ultimately placed in the allCells directory) | PBMC.allcells.csv |
- `sample` and `libName` should consist only of letters, numbers, dash (-) and dot (.)
- A single `libName` should be used for an entire ScaleCROP sequencing library, not one `libName` per sample loaded into the RT plate.
    - In the case of multiple final distribution plates, as in the Extended Throughput Kit, `libName` should correspond to each unique PLATE, and each PLATE will have its own `libIndex`. See samples.csv for illustration.
- When running from pre-existing fastq file input, `libName` should match the first part of the fastq file name for this sample, e.g. `Foo1` for `Foo1_*.fastq.gz`.
- See guides.txt.
- See allCells.csv.
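
The rules above (required columns, case-sensitive names, unique sample names, allowed characters) can be checked before starting a run. The following is a minimal Python sketch, not part of the workflow itself: the function name `validate_sample_table` is hypothetical, and treating all six columns from the table as required is an assumption.

```python
import csv
import io
import re

# Assumption: every column listed in the table above is required.
REQUIRED_COLUMNS = ["sample", "barcodes", "libName", "libIndex", "guide", "allCells"]
# Letters, numbers, dash and dot only (rule for sample and libName).
NAME_PATTERN = re.compile(r"[A-Za-z0-9.-]+")


def validate_sample_table(text):
    """Return a list of problems found in the contents of a samples.csv file."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = []
    if not header or header[0] != "sample":
        problems.append("first column must be 'sample'")
    for column in REQUIRED_COLUMNS:
        if column not in header:  # exact match, so case matters
            problems.append(f"missing column: {column}")
    seen = set()
    for row in reader:
        name = row.get("sample") or ""
        if name in seen:
            problems.append(f"duplicate sample name: {name}")
        seen.add(name)
        for column in ("sample", "libName"):
            value = row.get(column)
            if value is not None and not NAME_PATTERN.fullmatch(value):
                problems.append(f"invalid characters in {column}: {value!r}")
    return problems
```

An empty list means no problems were found. To check a file on disk, read it with `open("samples.csv", newline="")` and pass the text to the function.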
During analysis, the sequencing data is first converted into library fastq files (`libName` column). If multiple samples were included in one sequencing library, these are then demultiplexed based on the sample (RT) barcodes. For example:
| sample | barcodes |
|---|---|
| Foo | 1A-6H |
| Bar | 7A-12H |
The RT wells used for each sample are given in `barcodes` as either
- An individual value (`1A`)
- A range of wells (`1A-2H`)
    - Wells are sorted first by number then by letter, i.e. `1A-1H`.
    - Note that all ranges are read in column-wise order; e.g. `1A-2C` refers to 1A-1H (all of column 1) plus 2A-2C.
- A list of values or ranges, separated by semicolon (`;`) (`1A;2A-2D`)
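
To make the column-wise range semantics concrete, here is a minimal Python sketch that expands a `barcodes` entry into individual wells. It is an illustration only, not the workflow's own parser: the function name `expand_barcodes` is hypothetical, and the 96-well layout (rows A-H, columns 1-12) is an assumption inferred from the examples above.

```python
ROWS = "ABCDEFGH"  # assumed plate rows
N_COLUMNS = 12     # assumed number of plate columns

# Column-wise well order: 1A, 1B, ..., 1H, 2A, ..., 12H
WELL_ORDER = [f"{col}{row}" for col in range(1, N_COLUMNS + 1) for row in ROWS]
WELL_INDEX = {well: i for i, well in enumerate(WELL_ORDER)}


def expand_barcodes(spec):
    """Expand a barcodes entry such as '1A;2A-2D' into a list of wells."""
    wells = []
    for part in spec.split(";"):
        part = part.strip()
        first, _, last = part.partition("-")
        last = last or first  # a single well is a range of length one
        if first not in WELL_INDEX or last not in WELL_INDEX:
            raise ValueError(f"unknown well in {part!r}")
        start, end = WELL_INDEX[first], WELL_INDEX[last]
        if start > end:
            raise ValueError(f"range runs backwards: {part!r}")
        wells.extend(WELL_ORDER[start:end + 1])
    return wells


print(expand_barcodes("1A-2C"))
# ['1A', '1B', '1C', '1D', '1E', '1F', '1G', '1H', '2A', '2B', '2C']
print(expand_barcodes("1A;2A-2D"))
# ['1A', '2A', '2B', '2C', '2D']

# Well-to-sample lookup for the Foo/Bar example table
assignments = {"Foo": "1A-6H", "Bar": "7A-12H"}
well_to_sample = {
    well: sample
    for sample, spec in assignments.items()
    for well in expand_barcodes(spec)
}
print(well_to_sample["6H"], well_to_sample["7A"])  # Foo Bar
```

Under these assumptions, `1A-6H` and `7A-12H` each cover 48 wells, so the two samples split the plate in half without overlap.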