sc_seq pipeline should pull in or organize more TCR/BCR info #68

Per recent discussions, we want to use more data from the TCR and BCR libraries than just the clonotype_id and cdr3 amino acid sequences that are currently pulled in.

- This isn't as simple as just pulling in more columns, because cellranger doesn't put all the columns we want into the 'clonotypes.csv' file that we currently extract from.

- We've discussed minimally expanding to get:
`cdr1`, `cdr2`, `cdr3`, `v_gene`, `j_gene`

- But perhaps we instead output a metadata file, where we don't need to worry about Seurat metadata clutter, and then also grab all of:
`c("v_gene","d_gene","j_gene","c_gene","fwr1","cdr1","fwr2","cdr2","fwr3","cdr3","fwr4","reads","umis")`

(The updated function would build this dataframe by ';'-joining each of these columns across the multiple lines of the all_contig_annotations.csv file that have matching 'barcode' + 'raw_contig_annotation' values.)
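To make the ';' join concrete, here is a minimal base-R sketch of how the collapse could work. It is an illustration under stated assumptions, not the pipeline's actual function: `collapse_contigs`, `contig_cols`, and the `group_cols` argument are hypothetical names, the toy data frame stands in for a real all_contig_annotations.csv, and the toy grouping uses `barcode` + `raw_clonotype_id` as a placeholder for whichever key columns we settle on.

```r
# Columns proposed in this issue, as named in all_contig_annotations.csv.
contig_cols <- c("v_gene", "d_gene", "j_gene", "c_gene",
                 "fwr1", "cdr1", "fwr2", "cdr2", "fwr3", "cdr3", "fwr4",
                 "reads", "umis")

# Hypothetical helper: collapse per-contig rows into one row per group,
# joining each requested column's values with `sep`.
# `group_cols` is an assumption; pass whichever columns define a group.
collapse_contigs <- function(contigs, group_cols,
                             value_cols = contig_cols, sep = ";") {
  # Only keep requested columns that are actually present in the input.
  value_cols <- intersect(value_cols, names(contigs))
  # One key per row, built from the grouping columns.
  key <- do.call(paste, c(contigs[group_cols], sep = "\r"))
  # Row indices per group, in order of first appearance.
  groups <- split(seq_len(nrow(contigs)), factor(key, levels = unique(key)))
  rows <- lapply(groups, function(idx) {
    joined <- lapply(contigs[idx, value_cols, drop = FALSE],
                     function(x) paste(x, collapse = sep))
    cbind(contigs[idx[1], group_cols, drop = FALSE],
          as.data.frame(joined, stringsAsFactors = FALSE))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy input standing in for all_contig_annotations.csv (made-up values).
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "TTTG-1"),
  raw_clonotype_id = c("clonotype1", "clonotype1", "clonotype2"),
  v_gene = c("TRAV1", "TRBV2", "TRBV5"),
  cdr3 = c("CAVF", "CASSF", "CASSL"),
  umis = c(4, 9, 7),
  stringsAsFactors = FALSE
)

collapsed <- collapse_contigs(contigs,
                              group_cols = c("barcode", "raw_clonotype_id"))
print(collapsed)
```

Note that every joined column comes back as character (for example `umis` becomes "4;9"), which may matter for the choice between a separate metadata file and Seurat metadata.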

Questions:
1. Do we want all of the columns in the latter suggestion?
2. Should this be output to a metadata file, or should all of these columns be pulled into the Seurat object?
3. If a metadata file, where? `automated_processing/(T|B)CR_contigs.csv`?
