Hi Mike,
Thanks for this awesome tool! I'm using it to teach an undergrad class on analyzing novel MAGs right now and LOVING it : )
Question for you: I'm wondering about adding a small feature to gtt-subset-GTDB-accessions. We're using it right now to build, say, a tree of a family with one representative from each genus. Since GTDB doesn't formally designate a "type species" for a genus (that I know of?), your script randomly pulls a single representative for each genus. This is fantastic for making a first rough tree. The hitch is that it sometimes pulls, say, an anonymous MAG from a genus that happens to have a well-described cultivated species.
I'm wondering about ways to make it pull something with a "real" name, as opposed to sp0145850, from a genus where a named species exists... Sadly, it seems there is no "cultivated" column in the GTDB metadata sheet. What about a flag so that, when gtt-subset-GTDB-accessions is run with --get-only-individuals-for-the-rank, it pulls a RefSeq entry first if one is available?
We could of course first restrict gtt-get-accessions-from-GTDB using --RefSeq-representatives-only, but then we'd lose diversity from genera with no RefSeq representative. That could obviously be fixed with some manual futzing, but I'm wondering if this would be straightforward to implement?
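To make the idea concrete, here is a rough sketch of the selection logic I have in mind. It is not how the script works internally (I haven't looked), and the function name, genus names, and accessions below are all made up for illustration. The only real-world assumption is that RefSeq assembly accessions start with "GCF_" and GenBank ones with "GCA_".

```python
import random
from collections import defaultdict


def pick_one_per_genus(rows, seed=None):
    """Pick one accession per genus, preferring RefSeq (GCF_) entries.

    rows: iterable of (accession, genus) pairs (hypothetical input shape).
    Returns a dict mapping genus -> chosen accession.
    """
    rng = random.Random(seed)

    # Group all candidate accessions by genus.
    by_genus = defaultdict(list)
    for accession, genus in rows:
        by_genus[genus].append(accession)

    picks = {}
    for genus, accessions in by_genus.items():
        # Prefer RefSeq entries when the genus has any; otherwise fall
        # back to a random pick from everything, so no genus is dropped.
        refseq = [a for a in accessions if a.startswith("GCF_")]
        picks[genus] = rng.choice(refseq if refseq else accessions)
    return picks


# Made-up example data: GenusA has a RefSeq entry, GenusB does not.
rows = [
    ("GCA_000000001.1", "GenusA"),
    ("GCF_000000002.1", "GenusA"),
    ("GCA_000000003.1", "GenusB"),
]
picks = pick_one_per_genus(rows, seed=0)
```

With this, GenusA would get its RefSeq entry and GenusB would still be represented by its only (GenBank) genome, which is the behavior I'm hoping for.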
Cheers,
Lizzy