About: Annotate regions in DST file with texts from overlapping regions in SRC file.
       The transfer of annotations can be conditioned on matching values in one or more
       columns (-m); multiple columns can be transferred (-t).
       In addition to column transfer and adding special annotations, the program can simply
       print (when neither -t nor -a is given) or drop (-x) matching lines.
       All indexes and coordinates are 1-based and inclusive.
Usage: annot-regs [OPTIONS] DST
Options:
       --allow-dups                Add annotations multiple times
   -a, --annotate list             Add special annotations:
                                       cnt  .. number of overlapping regions
                                       frac .. fraction of the destination region with an overlap
                                       nbp  .. number of source base pairs in the overlap
   -c, --core src:dst              Core columns [chr,beg,end:chr,beg,end]
   -d, --dst-file file             Destination file
   -H, --ignore-headers            Use numeric indexes, ignore the headers completely
   -m, --match src:dst             Require match in these columns
       --max-annots int            Add at most int annotations per column to save time in big regions
   -o, --overlap float             Minimum required overlap (non-reciprocal, unless -r is given)
   -r, --reciprocal                Require reciprocal overlap
   -s, --src-file file             Source file
   -t, --transfer src:dst          Columns to transfer. If src column does not exist, interpret
                                   as the default value to use. If the dst column does not exist,
                                   a new column is created. If the dst column exists, its values are
                                   overwritten when overlap is found and left as is otherwise.
       --version                   Print version string and exit
   -x, --drop-overlaps             Drop overlapping regions (precludes -t)
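
The overlap options above (-o, -r) and the 1-based inclusive coordinates can be illustrated with a small sketch. This is not the program's code: the function names are hypothetical, and it assumes that the non-reciprocal fraction is measured against the destination region and that -r additionally requires the same fraction of the source region.

```python
def overlap_len(beg1, end1, beg2, end2):
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(end1, end2) - max(beg1, beg2) + 1)


def passes_overlap(src, dst, min_frac, reciprocal=False):
    """Check a minimum-overlap requirement for (beg, end) tuples.

    Assumption: the fraction is taken relative to the destination region;
    with reciprocal=True it must also hold relative to the source region.
    """
    shared = overlap_len(src[0], src[1], dst[0], dst[1])
    if shared == 0:
        return False
    # Inclusive coordinates: a region beg..end covers end - beg + 1 positions
    dst_frac = shared / (dst[1] - dst[0] + 1)
    if not reciprocal:
        return dst_frac >= min_frac
    src_frac = shared / (src[1] - src[0] + 1)
    return dst_frac >= min_frac and src_frac >= min_frac
```

With inclusive coordinates, regions 1-10 and 10-20 share one position, while 1-10 and 11-20 share none.
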
Examples:
   # Header is present, match and transfer by column name
   annot-regs -s src.txt.gz -d dst.txt.gz -c chr,beg,end:chr,beg,end -m type,sample:type,smpl -t tp/fp:tp/fp

   # Header is not present, match and transfer by column index (1-based)
   annot-regs -s src.txt.gz -d dst.txt.gz -c 1,2,3:1,2,3 -m 4,5:4,5 -t 6:6

   # If the dst part is not given, the program assumes that the src:dst columns are identical
   annot-regs -s src.txt.gz -d dst.txt.gz -c chr,beg,end -m type,sample -t tp/fp

   # Either the source or the destination file can be streamed via stdin
   gunzip -c src.txt.gz | annot-regs -d dst.txt.gz -c chr,beg,end -m type,sample -t tp/fp
   gunzip -c dst.txt.gz | annot-regs -s src.txt.gz -c chr,beg,end -m type,sample -t tp/fp

   # Print matching regions as above but without modifying the records
   gunzip -c src.txt.gz | annot-regs -d dst.txt.gz -c chr,beg,end -m type,sample
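
The src:dst column specifications used in the examples above (-c, -m, -t) can be read as paired lists. The following is a minimal sketch of that pairing, not the program's parser; the name parse_column_spec is hypothetical, and rejecting lists of unequal length is an assumption.

```python
def parse_column_spec(spec):
    """Split a 'src:dst' specification into (src, dst) column pairs.

    If the dst part is not given, the src columns are reused for dst,
    as described in the examples. Columns may be names or 1-based indexes;
    both are kept as strings here.
    """
    src_part, sep, dst_part = spec.partition(":")
    src_cols = src_part.split(",")
    dst_cols = dst_part.split(",") if sep else list(src_cols)
    if len(src_cols) != len(dst_cols):
        # Assumption: both sides must list the same number of columns
        raise ValueError("src and dst column lists differ in length: " + spec)
    return list(zip(src_cols, dst_cols))
```

For instance, "type,sample:type,smpl" pairs type with type and sample with smpl, while "chr,beg,end" pairs each column with itself.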

See also the slides for more