sc_seq pipeline should pull in or organize more TCR/BCR info #68

@dtm2451

Per recent discussions, we want to use more data from the TCR and BCR libraries than just the clonotype_id and cdr3 amino acid sequences that are currently pulled in.

  • This isn't as simple as pulling in more columns, because cellranger doesn't put all the columns we want into the 'clonotypes.csv' file that we currently extract from.

  • We've discussed minimally expanding to get:
    cdr1, cdr2, cdr3, v_gene, j_gene

  • Alternatively, we could output a separate metadata file, so that we don't need to worry about cluttering the Seurat metadata, and then grab all of:
    c("v_gene","d_gene","j_gene","c_gene","fwr1","cdr1","fwr2","cdr2","fwr3","cdr3","fwr4","reads","umis")

(The updated function would build this dataframe by ';'-joining these values across the multiple lines in the all_contig_annotations.csv file that have matching 'barcode' + 'raw_contig_annotation' values.)
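
To make the ';'-join concrete, here is a minimal base-R sketch of what such a collapse could look like. It is not the pipeline's existing code: the function name `collapse_contigs` and its arguments are hypothetical, and the toy data uses `raw_clonotype_id` as a stand-in for the second grouping column, which is an assumption about what the grouping key would be in practice.

```r
# Hypothetical helper (not existing pipeline code): collapse contig rows that
# share the same key columns into one row, joining each value column with `sep`.
collapse_contigs <- function(contigs, key_cols, value_cols, sep = ";") {
  # One grouping label per row, built from the key columns
  group <- do.call(paste, c(contigs[key_cols], sep = "\r"))
  first <- !duplicated(group)

  # Start from the first row of each group, keeping only the key columns
  out <- contigs[first, key_cols, drop = FALSE]
  group_order <- group[first]

  for (col in value_cols) {
    # Join all values of this column within each group, in file order
    joined <- tapply(as.character(contigs[[col]]), group, paste, collapse = sep)
    out[[col]] <- as.vector(joined[group_order])
  }
  rownames(out) <- NULL
  out
}

# Toy stand-in for all_contig_annotations.csv (values are made up)
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "TTTG-1"),
  raw_clonotype_id = c("clonotype1", "clonotype1", "clonotype2"),  # assumed key column
  v_gene = c("TRAV1", "TRBV2", "TRBV5"),
  cdr3 = c("CAVS", "CASS", "CASR"),
  umis = c(4, 9, 3),
  stringsAsFactors = FALSE
)

collapsed <- collapse_contigs(
  contigs,
  key_cols = c("barcode", "raw_clonotype_id"),
  value_cols = c("v_gene", "cdr3", "umis")
)
```

In this sketch every value column comes back as a character string, including numeric ones like reads and umis, so anything downstream that needs per-contig numbers would have to split on ';' again.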

Questions:

  1. All columns in the latter suggestion?
  2. Should this be output to a metadata file, or should all of these columns be pulled into the Seurat object?
  3. If a metadata file, where? automated_processing/(T|B)CR_contigs.csv?