Bugfixes
- ``ts.samples(population=...)`` now raises a ``ValueError`` if the population ID is e.g. a population name, rather than silently returning no samples. (:user:`hyanwong`, :pr:`3344`)
Features
- Displaying a summary of the tree sequence now shows the metadata codec and size of the metadata for each table. (:user:`hyanwong`, :pr:`3343`, :issue:`2637`)
Breaking changes
- The ``reference_sequence`` argument to ``TreeSequence.alignments`` is now required to be the same length as the tree sequence. Previously it was required to be the length of the requested interval. (:user:`benjeffery`, :pr:`3317`)
- ``TreeSequence.tables`` now returns a zero-copy immutable view of the tables. To get a mutable copy, use ``TreeSequence.dump_tables()``. (:user:`benjeffery`, :pr:`3288`, :issue:`760`)
- For a tree sequence to be valid, the mutation parents in the table collection must be correct and consistent with the topology of the tree at each mutation site. ``TableCollection.tree_sequence()`` will raise a ``_tskit.LibraryError`` if this is not the case. (:user:`benjeffery`, :issue:`2729`, :issue:`2732`, :pr:`3212`).
- Drop Python 3.9 support and require Python >= 3.10. (:pr:`3267`, :user:`benjeffery`)
- ``ltrim``, ``rtrim``, ``trim`` and ``shift`` raise an error if they are used on a tree sequence containing a reference sequence. (:user:`hyanwong`, :pr:`3210`, :issue:`2091`)
Features
- Add ``tskit.jit.numba.jitwrap`` and ``NumbaTreeSequence`` to allow simplified use and development of Numba-jitted functions with tree sequences. See the documentation for details. (:user:`andrewkern`, :pr:`3295`, :issue:`3294`)
- ``TreeSequence.map_to_vcf_model`` now also returns the transformed positions and contig length. (:user:`benjeffery`, :pr:`3174`, :issue:`3173`)
- ``draw_svg()`` methods now associate tree branches with edge IDs. (:user:`hyanwong`, :pr:`3193`, :issue:`557`)
- ``draw_svg()`` methods now allow the y-axis to be placed on the right-hand side using ``y_axis="right"``. (:user:`hyanwong`, :pr:`3201`)
- Add ``contig_id`` and ``isolated_as_missing`` to ``VcfModelMapping`` (:user:`benjeffery`, :pr:`3219`, :issue:`3177`).
- Add ``TreeSequence.mutations_edge``, which returns the edge ID for each mutation's edge. (:user:`benjeffery`, :pr:`3226`, :issue:`3189`)
- Add ``TreeSequence.sites_ancestral_state``, ``TreeSequence.mutations_derived_state`` and ``TreeSequence.mutations_inherited_state`` properties to return the ancestral state of sites, the derived state of mutations and the inherited state of mutations as NumPy arrays of the new NumPy 2.0 ``StringDType``. (:user:`benjeffery`, :pr:`3228`, :issue:`2632`, :pr:`3276`, :issue:`2631`)
- Tskit now requires NumPy version 2 or later. However, you can still use tskit with NumPy 1.x by building tskit from source with NumPy 1.x using ``pip install tskit --no-binary tskit``. With NumPy 1.x, any use of the new ``StringDType`` properties will result in a ``RuntimeError``. If you try to use another Python module that was compiled against NumPy 1.x with NumPy 2.x you may see the error "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash.". If no newer version of the module is available you will have to use the NumPy 1.x build as above.
- Add ``Mutation.inherited_state`` property which returns the inherited state for a single mutation. (:user:`benjeffery`, :pr:`3277`, :issue:`2631`)
- Add ``all_mutations`` and ``all_edges`` options to ``TreeSequence.union``, allowing greater flexibility in "disjoint union" situations. (:user:`hyanwong`, :user:`petrelharp`, :issue:`3181`)
- Add ``TreeSequence.divergence_matrix``, which was previously undocumented.
- ``TreeSequence.variants``, ``.genotype_matrix``, ``.haplotypes``, and ``.alignments`` methods now fully support ``isolated_as_missing`` behaviour with internal nodes. ``.alignments`` is also around 10% faster. (:user:`benjeffery`, :pr:`3313`, :pr:`3317`, :issue:`1896`)
Bugfixes
- In some tables with mutations out-of-order ``TableCollection.sort`` did not re-order the mutations so they formed a valid TreeSequence. ``TableCollection.sort`` and ``TableCollection.canonicalise`` now sort mutations by site, then time (if known), then the mutation's node's time, then number of descendant mutations (ensuring that parent mutations occur before children), then node, then their original order in the tables. (:user:`benjeffery`, :pr:`3257`, :issue:`3253`)
- Fix bug in ``TreeSequence.genetic_relatedness_vector`` that previously ignored ``span_normalise``: previously, ``span_normalise`` was always set to ``False``; now the default is ``True`` in agreement with other statistics, so the returned values will change. (:user:`petrelharp`, :pr:`3300`, :issue:`3241`)
- Fix bug in ``TreeSequence.pair_coalescence_counts`` when ``span_normalise=True`` and a window breakpoint falls within an internal missing interval. (:user:`nspope`, :pr:`3176`, :issue:`3175`)
- Fix metadata schemas that are equal but have different byte representations not being considered equal when using ``TableCollection.assert_equals`` and ``Table.assert_equals``. (:user:`benjeffery`, :pr:`3246`, :issue:`3244`)
- k-way statistics no longer require k sample sets, allowing in particular "self" comparisons for ``TreeSequence.genetic_relatedness``. This changes the error code returned in some situations. (:user:`andrewkern`, :user:`petrelharp`, :pr:`3235`, :issue:`3055`)
- Fix ``UnboundLocalError`` in ``draw_svg()`` when using numeric ``max_time`` values with mutations over roots. (:user:`benjeffery`, :pr:`3274`, :issue:`3273`)
- Prevent iterating over a ``TopologyCounter``. (:user:`benjeffery`, :pr:`3202`, :issue:`1462`)
- Fix ``TreeSequence.concatenate()`` to work with internal samples by using the ``all_mutations`` and ``all_edges`` parameters in ``union()``. (:user:`hyanwong`, :pr:`3283`, :issue:`3181`)
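The mutation sort order described in the first bugfix above can be pictured with a small stand-alone sketch. This is not tskit's implementation: ``ToyMutation`` and ``compare_mutations`` are hypothetical names, and the sort directions (older times first, more descendant mutations first) are assumptions made for illustration, since the entry only names the keys.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Optional


@dataclass
class ToyMutation:
    """Hypothetical stand-in for a mutation row, for illustration only."""
    site: int
    time: Optional[float]  # None stands in for an unknown time
    node_time: float
    num_descendants: int
    node: int
    original_index: int


def _cmp(a, b):
    # Classic three-way comparison: -1, 0 or 1.
    return (a > b) - (a < b)


def compare_mutations(a, b):
    # Key 1: site.
    result = _cmp(a.site, b.site)
    # Key 2: mutation time, only when both times are known (assumed: older first).
    if result == 0 and a.time is not None and b.time is not None:
        result = _cmp(b.time, a.time)
    # Key 3: time of the mutation's node (assumed: older first).
    if result == 0:
        result = _cmp(b.node_time, a.node_time)
    # Key 4: number of descendant mutations, so parents precede children.
    if result == 0:
        result = _cmp(b.num_descendants, a.num_descendants)
    # Key 5: node ID, then key 6: original position in the table.
    if result == 0:
        result = _cmp(a.node, b.node)
    if result == 0:
        result = _cmp(a.original_index, b.original_index)
    return result


mutations = [
    ToyMutation(site=1, time=None, node_time=2.0, num_descendants=0, node=4, original_index=0),
    ToyMutation(site=0, time=None, node_time=1.0, num_descendants=0, node=3, original_index=1),
    ToyMutation(site=0, time=None, node_time=1.0, num_descendants=1, node=3, original_index=2),
    ToyMutation(site=0, time=None, node_time=5.0, num_descendants=2, node=6, original_index=3),
]
ordered = sorted(mutations, key=cmp_to_key(compare_mutations))
sorted_order = [m.original_index for m in ordered]
```

In this toy input the two mutations on node 3 tie on site and node time, so the one with a descendant mutation is placed first.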
Features
- Add ``TreeSequence.sample_nodes_by_ploidy`` method to return the sample nodes in a tree sequence, grouped by a ploidy value. (:user:`benjeffery`, :pr:`3157`)
- Add ``TreeSequence.individuals_nodes`` attribute to return the nodes associated with each individual as a numpy array. (:user:`benjeffery`, :pr:`3153`)
- Add ``shift`` method to both ``TableCollection`` and ``TreeSequence`` classes allowing the coordinate system to be shifted, and ``TreeSequence.concatenate`` so a set of tree sequences can be added to the right of an existing one. (:user:`hyanwong`, :pr:`3165`, :issue:`3164`)
- Add ``TreeSequence.map_to_vcf_model`` method to return a mapping of the tree sequence to the VCF model. (:user:`benjeffery`, :pr:`3163`)
- Use a thin space as the thousands separator in HTML output, and a comma in CLI output. (:user:`hossam26644`, :pr:`3167`, :issue:`2951`)
Fixes
- Correct assertion message when tables are compared with metadata ignored. (:user:`benjeffery`, :pr:`3162`, :issue:`3161`)
Breaking changes
- ``TreeSequence.write_vcf`` now filters non-sample nodes from individuals by default, instead of raising an error. These nodes can be included using the new ``include_non_sample_nodes`` argument. By default individual names (sample IDs) in VCF output are now of the form ``tsk_{individual.id}``. Previously these were always ``"tsk_{j}" for j in range(num_individuals)``. This may break some downstream code if individuals are specified. To fix, manually specify ``individual_names`` to the required pattern. (:user:`benjeffery`, :pr:`3163`)
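The naming change can be illustrated with plain string formatting, without calling tskit at all. The individual IDs below are hypothetical; the sketch only contrasts the two patterns quoted in the entry above.

```python
# Hypothetical IDs of the individuals written to the VCF (not from real data).
individual_ids = [2, 5, 7]

# Previous pattern: names numbered by position in the output.
old_names = [f"tsk_{j}" for j in range(len(individual_ids))]

# Current default pattern: names built from each individual's own ID.
new_names = [f"tsk_{ind_id}" for ind_id in individual_ids]

# The two patterns only coincide when the IDs are exactly 0, 1, 2, ...
names_differ = old_names != new_names
```

Downstream code that relied on the positional numbering could pass a list like ``old_names`` as ``individual_names`` to restore it.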
Bugfixes
- ``TreeSequence.draw_svg(path=...)`` was failing due to a missing import of ``xml.dom.minidom`` (:user:`petrelharp`, :issue:`3144`, :pr:`3145`)
Bugfixes
- Metadata.schema was returning a modified schema; it now returns a copy of the original schema instead (:user:`benjeffery`, :issue:`3129`, :pr:`3130`)
Breaking Changes
- Support for legacy formats from msprime<0.6 (HDF5 formats) is dropped. This includes support for ``tskit upgrade`` (:user:`hossam26644`, :issue:`2812`, :pr:`3138`)
Bugfixes
- Fix to ``TreeSequence.pair_coalescence_counts`` output dimension when provided with time windows containing no nodes (:user:`nspope`, :issue:`3046`, :pr:`3058`)
- Fix to ``TreeSequence.pair_coalescence_counts`` to normalise by non-missing span if ``span_normalise=True``. This resolves a bug where ``TreeSequence.pair_coalescence_rates`` would return incorrect values for intervals with missing trees. (:user:`natep`, :issue:`3053`, :pr:`3059`)
- Fix to ``TreeSequence.pair_coalescence_rates`` causing an assertion to be triggered by floating point error, when all coalescence events are inside a single time window (:user:`natep`, :issue:`3035`, :pr:`3038`)
Features
- Add support for fixed-length arrays in metadata struct codec using the ``length`` property. (:user:`benjeffery`, :issue:`3088`, :pr:`3090`)
- Add a new ``TreeSequence.pca`` method that uses randomized linear algebra to find the top eigenvectors/values of the genetic relatedness matrix (:user:`hanbin973`, :user:`petrelharp`, :pr:`3008`)
- Add methods on TreeSequence to efficiently get table metadata as a numpy structured array. (:user:`benjeffery`, :pr:`3098`)
- Add Python 3.13 support (:user:`benjeffery`, :pr:`3107`)
- Add a preamble argument to draw_svg() methods to allow adding arbitrary extra graphics (e.g. legends) to SVG plots (:user:`hyanwong`, :issue:`3086`, :pr:`3121`)
Breaking Changes
- The definitions of ``TreeSequence.genetic_relatedness`` and ``TreeSequence.genetic_relatedness_weighted`` are changed to average over sample sets, rather than summing over them. For computation with diploid sample sets, this will change the result by a factor of four; for larger sample sets it will now produce sensible values that are comparable between sample sets of different sizes. The default for these methods is also changed to ``polarised=True``, but the output is unchanged for ``centre=True`` (the default). See the documentation for these methods for more discussion. (:user:`petrelharp`, :user:`mmosmond`, :pr:`1623`)
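The factor of four for diploid sample sets follows from simple counting: two sample sets of two nodes each give 2 x 2 = 4 pairs, so a sum over pairs is four times the average. The sketch below shows only that arithmetic; ``toy_pair_value`` is a hypothetical per-pair quantity and is not how tskit computes relatedness.

```python
from itertools import product


def toy_pair_value(a, b):
    # Hypothetical per-pair value, chosen only so the numbers are easy to follow.
    return 0.5 * (a + 1) + 0.25 * b


def summed(set_a, set_b):
    # Old-style aggregation: sum over all pairs drawn from the two sample sets.
    return sum(toy_pair_value(a, b) for a, b in product(set_a, set_b))


def averaged(set_a, set_b):
    # New-style aggregation: divide the sum by the number of pairs.
    return summed(set_a, set_b) / (len(set_a) * len(set_b))


# Two "diploid" sample sets, each holding two sample nodes.
diploid_a = [0, 1]
diploid_b = [2, 3]
ratio = summed(diploid_a, diploid_b) / averaged(diploid_a, diploid_b)
```

With larger sample sets the ratio grows with the number of pairs, which is why averaged values are comparable across sets of different sizes while summed values are not.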
Bugfixes
- Fix to ``TreeSequence.genetic_relatedness`` with ``indexes=None`` and ``proportion=True``. (:user:`petrelharp`, :issue:`2984`, :pr:`1623`)
- Fix to ``TreeSequence.general_stat`` when using non-strict summary functions in the presence of non-ancestral material (very rare). (:user:`petrelharp`, :issue:`2983`, :pr:`1623`)
- Printing ``tskit.MetadataSchema(schema=None)`` now shows ``"Null_schema"`` rather than ``None``, to avoid confusion (:user:`hyanwong`, :pr:`2720`)
- Limit output HTML when a tree sequence is displayed that has a large amount of metadata. (:user:`benjeffery`, :pr:`2999`)
- Fix warning in draw_svg to use correct warnings module. (:user:`duncanMR`, :issue:`2870`, :pr:`2871`)
Features
- Add the ``centre`` option to ``TreeSequence.genetic_relatedness`` and ``TreeSequence.genetic_relatedness_weighted``. (:user:`petrelharp`, :user:`mmosmond`, :pr:`1623`)
- Edges now have an ``.interval`` attribute returning a ``tskit.Interval`` object. (:user:`hyanwong`, :pr:`2531`)
- Variants now have a states() method that returns the genotypes as an (inefficient) array of strings, rather than integer indexes, to aid comparison of genetic variation (:user:`hyanwong`, :pr:`2617`)
- Added ``distance_between`` that calculates the total distance between two nodes in a tree. (:user:`Billyzhang1229`, :pr:`2771`)
- Added ``genetic_relatedness_matrix`` method to compute pairwise genetic relatedness between sample sets. (:user:`jeromekelleher`, :user:`petrelharp`, :pr:`2823`)
- Add ``TreeSequence.extend_haplotypes`` method that extends ancestral haplotypes using recombination information, leading to unary nodes in many trees and fewer edges. (:user:`petrelharp`, :user:`hfr1tz3`, :user:`nspope`, :user:`avabamf`, :pr:`2651`, :pr:`2938`)
- Add ``Table.drop_metadata`` to make clearing metadata from tables easy. (:user:`jeromekelleher`, :pr:`2944`)
- Add ``Interval.mid`` and ``Tree.mid`` properties to return the midpoint of the interval. (:user:`currocam`, :pr:`2960`)
- Added ``genetic_relatedness_vector`` method to compute product of genetic relatedness matrix and weight vector. (:user:`petrelharp`, :pr:`2980`)
- Added ``pair_coalescence_counts`` method to calculate coalescence events per node or time interval, ``pair_coalescence_quantiles`` method to estimate quantiles of pair coalescence times using empirical CDF inversion, and ``pair_coalescence_rates`` method to estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF. (:user:`nspope`, :pr:`2915`, :pr:`2976`, :pr:`2985`)
- Add provenance information to the HTML notebook representation of a tree sequence. (:user:`benjeffery`, :pr:`3001`)
- The ``.draw_svg()`` methods can add annotated genomic regions (e.g. genes) to the x-axis. (:user:`hyanwong`, :pr:`3002`)
- Added a ``node_titles`` and a ``mutation_titles`` parameter to ``.draw_svg()`` methods which assigns a string to node and mutation symbols, commonly shown on mouseover. This can reduce label clutter while retaining useful info (:user:`hyanwong`, :pr:`3007`)
- Added (currently undocumented) use of the order parameter in ``Tree.draw_svg()`` to pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option ``pack_untracked_polytomies`` allows large polytomies involving untracked samples to be summarised as a dotted line (:user:`hyanwong`, :issue:`3011`, :pr:`3010`, :pr:`3012`)
- Added a ``title`` parameter to ``.draw_svg()`` methods (:user:`hyanwong`, :pr:`3015`)
- Add comma separation to all display numbers. (:user:`benjeffery`, :issue:`3017`, :pr:`3018`)
- Added ``Tree.ancestors(u)`` method. (:user:`hyanwong`, :issue:`2706`, :pr:`3021`)
- Add ``resources`` section to provenance schema. (:user:`benjeffery`, :pr:`3016`)
- Add ``Tree.rf_distance`` method to calculate the unweighted Robinson-Foulds distance between two trees. (:user:`Billyzhang1229`, :issue:`995`, :pr:`2643`, :pr:`3032`)
- Add support for numpy 2 (:user:`jeromekelleher`, :user:`benjeffery`, :pr:`2964`)
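For readers unfamiliar with the unweighted Robinson-Foulds distance mentioned in the list above, here is a minimal sketch for rooted trees written as nested tuples. It counts the clades (sets of leaves below an internal node) found in one tree but not the other. The functions ``clades`` and ``toy_rf_distance`` are hypothetical and this is not tskit's implementation; it assumes both trees share the same leaf labels.

```python
def clades(tree):
    """Return the leaf set (as a frozenset) below every internal node."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            # A leaf is any non-tuple label.
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)
        return leaves

    leaves_below(tree)
    return found


def toy_rf_distance(tree_a, tree_b):
    # Size of the symmetric difference between the two clade sets.
    return len(clades(tree_a) ^ clades(tree_b))


balanced = ((0, 1), (2, 3))
reshuffled = ((0, 2), (1, 3))
caterpillar = (((0, 1), 2), 3)

same = toy_rf_distance(balanced, balanced)
different_pairs = toy_rf_distance(balanced, reshuffled)
partly_shared = toy_rf_distance(balanced, caterpillar)
```

The balanced and caterpillar trees share the clade {0, 1} and the root clade, so only one clade from each side is unmatched.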
Breaking Changes
- The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with position zero is encountered. The VCF spec does not allow zero position sites. Suppress this error with the allow_position_zero argument. (:user:`benjeffery`, :pr:`2901`, :issue:`2838`)
Bugfixes
- Fix to the folded, expected allele frequency spectrum (i.e., TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)), which was half as big as it should have been. (:user:`petrelharp`, :user:`nspope`, :pr:`2933`)
Breaking Changes
- tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
- Tree.tmrca now accepts >2 nodes and returns nicer errors (:user:`hyanwong`, :pr:`2808`, :issue:`2801`, :issue:`2070`, :issue:`2611`)
- Add ``TreeSequence.genetic_relatedness_weighted`` stats method. (:user:`petrelharp`, :user:`brieuclehmann`, :user:`jeromekelleher`, :pr:`2785`, :pr:`1246`)
- Add ``TreeSequence.impute_unknown_mutations_time`` method to return an array of mutation times based on the times of associated nodes (:user:`duncanMR`, :pr:`2760`, :issue:`2758`)
- Add ``asdict`` to all dataclasses. These are returned when you access a row or other tree sequence object. (:user:`benjeffery`, :pr:`2759`, :issue:`2719`)
Bugfixes
- Fix incompatibility with ``jsonschema>4.18.6`` which caused ``AttributeError: module jsonschema has no attribute _validators`` (:user:`benjeffery`, :pr:`2844`, :issue:`2840`)
Performance improvements
- Methods like ts.at() which seek to a specified position on the sequence from a new Tree instance are now much faster (:user:`molpopgen`, :pr:`2661`).
Features
- Add ``__repr__`` for variants to return a string representation of the raw data without spewing megabytes of text (:user:`chriscrsmith`, :pr:`2695`, :issue:`2694`)
Breaking Changes
Bugfixes
- Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform. (:user:`benjeffery`, :pr:`2754`, :issue:`2737`)
Features
- A new ``Tree.is_root`` method avoids the need to search the potentially large list of ``Tree.roots`` (:user:`hyanwong`, :pr:`2669`, :issue:`2620`)
- The ``TreeSequence`` object now has the attributes ``min_time`` and ``max_time``, which are the minimum and maximum among the node times and mutation times, respectively. (:user:`szhan`, :pr:`2612`, :issue:`2271`)
- The ``draw_svg`` methods now have a ``max_num_trees`` parameter to truncate the total number of trees shown, giving a readable display for tree sequences with many trees (:user:`hyanwong`, :pr:`2652`)
- The ``draw_svg`` methods now accept a ``canvas_size`` parameter to allow extra room on the canvas e.g. for long labels or repositioned graphical elements (:user:`hyanwong`, :pr:`2646`, :issue:`2645`)
- The ``Tree`` object now has the method ``siblings`` to get the siblings of a node. It returns an empty tuple if the node has no siblings, is not a node in the tree, is the virtual root, or is an isolated non-sample node. (:user:`szhan`, :pr:`2618`, :issue:`2616`)
- The ``msprime.RateMap`` class has been ported into tskit: functionality should be identical to the version in msprime, apart from minor changes in the formatting of tabular text output (:user:`hyanwong`, :user:`jeromekelleher`, :pr:`2678`)
- Tskit now supports and has wheels for Python 3.11. This Python version has a significant performance boost (:user:`benjeffery`, :pr:`2624`, :issue:`2248`)
- Add the update_sample_flags option to simplify, which ensures no node sample flags are changed, allowing calling code to manage sample status. (:user:`jeromekelleher`, :issue:`2662`, :pr:`2663`).
Breaking Changes
- The ``filter_populations``, ``filter_individuals``, and ``filter_sites`` parameters to simplify previously defaulted to ``True`` but now default to ``None``, which is treated as ``True``. Previously, passing ``None`` would result in an error. (:user:`hyanwong`, :pr:`2609`, :issue:`2608`)
Fixes
- The ``Variant`` object can now be initialized with 64 bit numpy ints as returned e.g. from np.where (:user:`hyanwong`, :pr:`2518`, :issue:`2514`)
- Fix tree.mrca for the case of a tree with multiple roots. (:user:`benjeffery`, :pr:`2533`, :issue:`2521`)
Features
- The ``ts.nodes`` method now takes an ``order`` parameter so that nodes can be visited in time order (:user:`hyanwong`, :pr:`2471`, :issue:`2370`)
- Add ``samples`` argument to ``TreeSequence.genotype_matrix``. Default is ``None``, where all the sample nodes are selected. (:user:`szhan`, :pr:`2493`, :issue:`678`)
- ``ts.draw`` and the ``draw_svg`` methods now have an optional ``omit_sites`` parameter, aiding drawing large trees with many sites and mutations (:user:`hyanwong`, :pr:`2519`, :issue:`2516`)
Breaking Changes
- Single statistics computed with ``TreeSequence.general_stat`` are now returned as numpy scalars if windows=None AND either samples is a single list or None (for a 1-way stat), OR indexes is None or a single list of length k (instead of a list of length-k lists). (:user:`gtsambos`, :pr:`2417`, :issue:`2308`)
- Accessor methods such as ts.edge(n) and ts.node(n) now allow negative indexes (:user:`hyanwong`, :pr:`2478`, :issue:`1008`)
- ``ts.subset()`` produces valid tree sequences even if nodes are shuffled out of time order (:user:`hyanwong`, :pr:`2479`, :issue:`2473`), and the same for ``tables.subset()`` (:user:`hyanwong`, :pr:`2489`). This involves sorting the returned tables, potentially changing the returned edge order.
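As a rough picture of what negative indexes in accessor methods mean, the sketch below maps an index onto a row number the way Python lists do. ``resolve_row_index`` is a hypothetical helper, and the assumption that the accessors follow list-style semantics (``-1`` is the last row) is an illustration, not a statement about tskit's internals.

```python
def resolve_row_index(index, num_rows):
    """Map a possibly negative index onto the range 0..num_rows-1."""
    if not -num_rows <= index < num_rows:
        raise IndexError(f"index {index} out of bounds for {num_rows} rows")
    # Negative indexes count back from the end, as with Python lists.
    return index + num_rows if index < 0 else index


last_row = resolve_row_index(-1, 5)
first_row = resolve_row_index(-5, 5)
unchanged = resolve_row_index(3, 5)
```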
Performance improvements
- TreeSequence.link_ancestors no longer continues to process edges once all of the sample and ancestral nodes have been accounted for, improving memory overhead and overall performance (:user:`gtsambos`, :pr:`2456`, :issue:`2442`)
Fixes
- Iterating over ``ts.variants()`` could cause a segfault in tree sequences with large numbers of alleles or very long alleles (:user:`jeromekelleher`, :pr:`2437`, :issue:`2429`).
- Various circular references fixed, lowering peak memory usage (:user:`jeromekelleher`, :pr:`2424`, :issue:`2423`, :issue:`2427`).
- Fix bugs in VCF output when there isn't a 1-1 mapping between individuals and sample nodes (:user:`jeromekelleher`, :pr:`2442`, :issue:`2257`, :issue:`2446`, :issue:`2448`).
Performance improvements
- TreeSequence.site position search performance greatly improved, with much lower memory overhead (:user:`jeromekelleher`, :pr:`2424`).
- TreeSequence.samples time/population search performance greatly improved, with much lower memory overhead (:user:`jeromekelleher`, :pr:`2424`, :issue:`1916`).
- The ``timeasc`` and ``timedesc`` orders for ``Tree.nodes`` have much improved performance and lower memory overhead (:user:`jeromekelleher`, :pr:`2424`, :issue:`2423`).
Features
- Variant objects now have a ``.num_missing`` attribute and ``.counts()`` and ``.frequencies`` methods (:user:`hyanwong`, :issue:`2390`, :pr:`2393`).
- Add the Tree.num_lineages(t) method to return the number of lineages present at time t in the tree (:user:`jeromekelleher`, :issue:`386`, :pr:`2422`)
- Efficient array access to table data now provided via attributes like TreeSequence.nodes_time, etc (:user:`jeromekelleher`, :pr:`2424`).
Breaking Changes
- Previously, accessing (e.g.) ``tables.edges`` returned a different instance of EdgeTable each time. This has been changed to return the same instance for the lifetime of a given TableCollection instance. This is technically a breaking change, although it's difficult to see how code would depend on the property that (e.g.) ``tables.edges is not tables.edges``. (:user:`jeromekelleher`, :pr:`2441`, :issue:`2080`).
Fixes
- Copies of a Variant object would cause a segfault when ``.samples`` was accessed. (:user:`benjeffery`, :issue:`2400`, :pr:`2401`)
Changes
- Tables in a table collection can be replaced using the replace_with method (:user:`hyanwong`, :issue:`1489`, :pr:`2389`)
- SVG drawing routines now return a special string object that is automatically rendered in a Jupyter notebook (:user:`hyanwong`, :pr:`2377`)
Features
- New ``Site.alleles()`` method (:user:`hyanwong`, :issue:`2380`, :pr:`2385`)
- The ``variants()``, ``haplotypes()`` and ``alignments()`` methods can now take a list of sample ids and a left and right position, to restrict the size of the output (:user:`hyanwong`, :issue:`2092`, :pr:`2397`)
Changes
- A ``min_time`` parameter in ``draw_svg`` enables the youngest node as the y axis min value, allowing negative times. (:user:`hyanwong`, :issue:`2197`, :pr:`2215`)
- ``VcfWriter.write`` now prints the site ID of variants in the ID field of the output VCF files. (:user:`roohy`, :issue:`2103`, :pr:`2107`)
- Make dumping of tables and tree sequences to disk a zero-copy operation. (:user:`benjeffery`, :issue:`2111`, :pr:`2124`)
- Add ``copy`` argument to ``TreeSequence.variants`` which if False reuses the returned ``Variant`` object for improved performance. Defaults to True. (:user:`benjeffery`, :issue:`605`, :pr:`2172`)
- ``tree.mrca`` now takes 2 or more arguments and gives the common ancestor of them all. (:user:`savitakartik`, :issue:`1340`, :pr:`2121`)
- Add an ``edge`` attribute to the ``Mutation`` class that gives the ID of the edge that the mutation falls on. (:user:`jeromekelleher`, :issue:`685`, :pr:`2279`).
- Add the ``TreeSequence.split_edges`` operation which inserts nodes into edges at a specific time. (:user:`jeromekelleher`, :issue:`2276`, :pr:`2296`).
- Add the ``TreeSequence.decapitate`` (and closely related ``TableCollection.delete_older``) operation to remove topology and mutations older than a given time. (:user:`jeromekelleher`, :issue:`2236`, :pr:`2302`, :pr:`2331`).
- Add the ``TreeSequence.individuals_time`` and ``TreeSequence.individuals_population`` methods to return arrays of per-individual times and populations, respectively. (:user:`petrelharp`, :issue:`1481`, :pr:`2298`).
- Add the ``sample_mask`` and ``site_mask`` to ``write_vcf`` to allow parts of an output VCF to be omitted or marked as missing data. Also add the ``as_vcf`` convenience function, to return VCF as a string. (:user:`jeromekelleher`, :pr:`2300`).
- Add support for missing data to ``write_vcf``, and add the ``isolated_as_missing`` argument. (:user:`jeromekelleher`, :pr:`2329`, :issue:`447`).
- Add ``Tree.num_children_array`` and ``Tree.num_children``. Returns the counts of the number of child nodes for each or a single node in the tree respectively. (:user:`GertjanBisschop`, :issue:`2318`, :issue:`2319`, :pr:`2332`)
- Add ``Tree.path_length``. (:user:`jeremyguez`, :issue:`2249`, :pr:`2259`).
- Add B1 tree balance index. (:user:`jeremyguez`, :user:`jeromekelleher`, :issue:`2251`, :pr:`2281`, :pr:`2346`).
- Add B2 tree balance index. (:user:`jeremyguez`, :user:`jeromekelleher`, :issue:`2252`, :pr:`2353`, :pr:`2354`).
- Add Sackin tree imbalance index. (:user:`jeremyguez`, :user:`jeromekelleher`, :pr:`2246`, :pr:`2258`).
- Add Colless tree imbalance index. (:user:`jeremyguez`, :user:`jeromekelleher`, :issue:`2250`, :pr:`2266`, :pr:`2344`).
- Add ``direction`` argument to ``TreeSequence.edge_diffs``, allowing iteration over diffs in the reverse direction. NOTE: this comes with a ~10% performance regression as the implementation was moved from C to Python for simplicity and maintainability. Please open an issue if this affects your application. (:user:`jeromekelleher`, :user:`benjeffery`, :pr:`2120`).
- Add ``Tree.edge_array`` and ``Tree.edge``. Returns the edge id of the edge encoding the relationship of each node with its parent. (:user:`GertjanBisschop`, :issue:`2361`, :pr:`2357`)
- Add ``position`` argument to ``TreeSequence.site``. Returns a ``Site`` object if there is one at the specified position. If not, it raises ``ValueError``. (:user:`szhan`, :issue:`2234`, :pr:`2235`)
Breaking Changes
- The JSON metadata codec now interprets the empty string as an empty object. This means that applying a schema to an existing table will no longer necessitate modifying the existing rows. (:user:`benjeffery`, :issue:`2064`, :pr:`2104`)
- Remove the previously deprecated ``as_bytes`` argument to ``TreeSequence.variants``. If you need genotypes in byte form this can be done following the code in the ``to_macs`` method on line ``5573`` of ``trees.py``. This argument was initially deprecated more than 3 years ago when the code was part of ``msprime``. (:user:`benjeffery`, :issue:`605`, :pr:`2172`)
- Arguments after ``ploidy`` in ``write_vcf`` marked as keyword only (:user:`jeromekelleher`, :pr:`2329`, :issue:`2315`).
- When metadata equal to ``b''`` is printed to text or HTML tables it will render as an empty string rather than ``"b''"``. (:user:`hyanwong`, :issue:`2349`, :pr:`2351`)
Changes
- ``TableCollection.name_map`` has been deprecated in favour of ``table_name_map``. (:user:`benjeffery`, :issue:`1981`, :pr:`2086`)
Fixes
- ``TreeSequence.dump_text`` now prints decoded metadata if there is a schema. (:user:`benjeffery`, :issue:`1860`, :issue:`1527`)
- Add missing ``ReferenceSequence.__eq__`` method. (:user:`benjeffery`, :issue:`2063`, :pr:`2085`)
Breaking changes
- The ``Tree.num_nodes`` method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. ``str(tree)``) now return the number of nodes in the tree, not in the entire tree sequence (:user:`hyanwong`, :issue:`1966`, :pr:`1968`)
- The CLI ``info`` command now gives more detailed information on the tree sequence (:user:`benjeffery`, :pr:`1611`)
- 64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error ``_tskit.FileFormatError: An incompatible type for a column was found in the file``. (:user:`jeromekelleher`, :issue:`343`, :issue:`1527`, :issue:`1528`, :issue:`1530`, :issue:`1554`, :issue:`1573`, :issue:`1589`, :issue:`1598`, :issue:`1628`, :pr:`1571`, :pr:`1579`, :pr:`1585`, :pr:`1590`, :pr:`1602`, :pr:`1618`, :pr:`1620`, :pr:`1652`).
- The Tree class now conceptually has an extra node, the "virtual root" whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (:user:`jeromekelleher`, :issue:`1691`, :pr:`1704`).
- Tree traversal orders returned by the ``nodes`` method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now globally across all roots. (:user:`jeromekelleher`, :pr:`1704`).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. ``TableCollection.sort`` no longer sorts individuals. (:user:`benjeffery`, :issue:`1774`, :pr:`1789`)
- Metadata encoding errors now raise ``MetadataEncodingError`` (:user:`benjeffery`, :issue:`1505`, :pr:`1827`).
- For ``TreeSequence.samples`` all arguments after ``population`` are now keyword only (:user:`benjeffery`, :issue:`1715`, :pr:`1831`).
- Remove the method ``TreeSequence.to_nexus`` and replace with ``TreeSequence.as_nexus``. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to ``to_nexus`` will result in a NotImplementedError, informing users of the change. See below for details on ``as_nexus``.
- Change default value for ``missing_data_char`` in the ``TreeSequence.haplotypes`` method from "-" to "N". This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (:user:`jeromekelleher`, :issue:`1893`, :pr:`1894`)
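The entry above on 64-bit ragged column sizes is easier to follow with a toy picture of a ragged column: the row contents are stored end to end, and an offset array records where each row starts and ends. The sketch below is a stand-alone illustration with hypothetical helper names, not tskit code; it only shows when the final offset stops fitting in an unsigned 32-bit integer.

```python
from itertools import accumulate

UINT32_MAX = 2**32 - 1


def build_offsets(row_lengths):
    """Offsets for a ragged column: row j spans data[offsets[j]:offsets[j + 1]]."""
    return [0, *accumulate(row_lengths)]


def fits_in_32_bits(offsets):
    # Offsets are non-decreasing, so only the last one needs checking.
    return offsets[-1] <= UINT32_MAX


# Three short metadata rows: lengths 3, 0 and 5 bytes.
small_offsets = build_offsets([3, 0, 5])

# Hypothetical very large rows whose total size exceeds 2**32 - 1 bytes.
large_offsets = build_offsets([2**31, 2**31, 1])
```

A column like ``large_offsets`` is the case the entry describes as requiring 64 bits and failing to load in earlier versions.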
Features
- Add the ``ibd_segments`` method and associated classes to compute, summarise and store segments of identity by descent from a tree sequence (:user:`gtsambos`, :user:`jeromekelleher`).
- Allow skipping of site and mutation tables in ``TableCollection.sort`` (:user:`benjeffery`, :issue:`1475`, :pr:`1826`).
- Add ``TableCollection.sort_individuals`` to sort the individuals as this is no longer done by the default sort (:user:`benjeffery`, :issue:`1774`, :pr:`1789`).
- Add ``__setitem__`` to all tables allowing single rows to be updated. For example ``tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)`` (:user:`jeromekelleher`, :user:`benjeffery`, :issue:`1545`, :pr:`1600`).
- Added a new parameter ``time`` to ``TreeSequence.samples()`` allowing samples to be selected at a specific time point or time interval. (:user:`mufernando`, :user:`petrelharp`, :issue:`1692`, :pr:`1700`)
- Add ``table.metadata_vector`` to all table classes to allow easy extraction of a single metadata key into an array (:user:`petrelharp`, :issue:`1676`, :pr:`1690`).
- Add ``time_units`` to ``TreeSequence`` to describe the units of the time dimension of the tree sequence. This is then used to generate an error if ``time_units`` is ``uncalibrated`` when using the branch lengths in statistics. (:user:`benjeffery`, :issue:`1644`, :pr:`1760`, :pr:`1832`)
- Add the ``virtual_root`` property to the Tree class (:user:`jeromekelleher`, :pr:`1704`).
- Add the ``num_edges`` property to the Tree class (:user:`jeromekelleher`, :pr:`1704`).
- Improved performance for tree traversal methods in the ``nodes`` iterator. Roughly a 10X performance increase for "preorder", "postorder", "timeasc" and "timedesc" (:user:`jeromekelleher`, :pr:`1704`).
- Substantial performance improvement for ``Tree.total_branch_length`` (:user:`jeromekelleher`, :issue:`1794`, :pr:`1799`)
- Add the ``discrete_genome`` property to the TreeSequence class which is true if all coordinates are discrete (:user:`jeromekelleher`, :issue:`1144`, :pr:`1819`)
- Add a ``random_nucleotides`` function. (:user:`jeromekelleher`, :pr:`1825`)
- Add the ``TreeSequence.alignments`` method. (:user:`jeromekelleher`, :pr:`1825`)
- Add alignment export in the FASTA and nexus formats using the ``TreeSequence.write_nexus`` and ``TreeSequence.write_fasta`` methods. (:user:`jeromekelleher`, :user:`hyanwong`, :pr:`1894`)
- Add the ``discrete_time`` property to the TreeSequence class which is true if all time coordinates are discrete or unknown (:user:`benjeffery`, :issue:`1839`, :pr:`1890`)
- Add the ``skip_tables`` option to ``load`` to support only loading top-level information from a file. Also add the ``ignore_tables`` option to ``TableCollection.equals`` and ``TableCollection.assert_equals`` to compare only top-level information. (:user:`clwgg`, :pr:`1882`, :issue:`1854`).
- Add the ``skip_reference_sequence`` option to ``load``. Also add the ``ignore_reference_sequence`` option to ``equals`` to compare two table collections without comparing their reference sequence. (:user:`clwgg`, :pr:`2019`, :issue:`1971`).
- tskit now supports python 3.10 (:user:`benjeffery`, :issue:`1895`, :pr:`1949`)
Fixes
- dump_tables omitted individual parents. (:user:`benjeffery`, :issue:`1828`, :pr:`1884`)
- Add the ``Tree.as_newick`` method and deprecate ``Tree.newick``. The ``as_newick`` method by default labels samples with the pattern ``"n{node_id}"`` which is much more useful than the behaviour of ``Tree.newick`` (which mimics ``ms`` output). (:user:`jeromekelleher`, :issue:`1671`, :pr:`1838`).
- Add the ``as_nexus`` and ``write_nexus`` methods to the TreeSequence class, replacing the broken ``to_nexus`` method (see above). This uses the same sample labelling pattern as ``as_newick``. (:user:`jeetsukumaran`, :user:`jeromekelleher`, :issue:`1785`, :pr:`1835`, :pr:`1836`, :pr:`1838`)
- load_text created additional populations even if the population table was specified, and didn't strip newlines from input text (:user:`hyanwong`, :issue:`1909`, :pr:`1910`)
Features
- map_mutations now allows the ancestral state to be specified (:user:`hyanwong`, :user:`jeromekelleher`, :issue:`1542`, :pr:`1550`)
Breaking changes
- Mutation.position and Mutation.index, which were deprecated in 0.2.2 (Sep '19), have been removed.
Features
- Add direct, copy-free access to the arrays representing the quintuply-linked structure
of
Tree (e.g. left_child_array). Allows performant algorithms over the tree structure using, for example, numba (:user:`jeromekelleher`, :issue:`1299`, :pr:`1320`). - Add fancy indexing to tables. E.g.
table[6:86] returns a new table with the specified rows. Supports slices, index arrays and boolean masks (:user:`benjeffery`, :issue:`1221`, :pr:`1348`, :pr:`1342`). - Add
Table.append method for adding rows from classes such as SiteTableRow and Site (:user:`benjeffery`, :issue:`1111`, :pr:`1254`). - SVG visualization of a tree sequence can be restricted to displaying between left
and right genomic coordinates using the
x_lim parameter. The default settings now mean that if the left or right flanks of a tree sequence are entirely empty, these regions will not be plotted in the SVG (:user:`hyanwong`, :pr:`1288`). - SVG visualization of a single tree allows all mutations on an edge to be plotted
via the
all_edge_mutations param (:user:`hyanwong`, :issue:`1253`, :pr:`1258`). - Entity classes such as
Mutation, Node are now Python dataclasses (:user:`benjeffery`, :pr:`1261`). - Metadata decoding for table row access is now lazy (:user:`benjeffery`, :pr:`1261`).
- Add html notebook representation for
Tree and change Tree.__str__ from dict representation to info table. (:user:`benjeffery`, :issue:`1269`, :pr:`1304`). - Improve display of tables when
printed, limiting lines set via tskit.set_print_options (:user:`benjeffery`, :issue:`1270`, :pr:`1300`). - Add
Table.assert_equals and TableCollection.assert_equals which give an exact report of any differences. (:user:`benjeffery`, :issue:`1076`, :pr:`1328`)
Changes
- In drawing methods
max_tree_height and tree_height_scale have been deprecated in favour of max_time and time_scale (:user:`benjeffery`, :issue:`1262`, :pr:`1331`).
Fixes
- Tree sequences were not properly initialised after unpickling (:user:`benjeffery`, :issue:`1297`, :pr:`1298`)
Features
- SVG visualization plots mutations at the correct time, if it exists, and a y-axis, with a label, can be drawn. Both x- and y-axes can be plotted on trees as well as tree sequences (:user:`hyanwong`, :issue:`840`, :issue:`580`, :pr:`1236`)
- SVG visualization now uses squares for sample nodes and red crosses for mutations, with the site/mutation positions marked on the x-axis. Additionally, an x-axis label can be set (:user:`hyanwong`, :issue:`1155`, :issue:`1194`, :pr:`1182`, :pr:`1213`)
- Add
parents column to the individual table to allow recording of pedigrees (:user:`ivan-krukov`, :user:`benjeffery`, :issue:`852`, :pr:`1125`, :pr:`866`, :pr:`1153`, :pr:`1177`, :pr:`1192`, :pr:`1199`). - Added
Tree.generate_random_binary static method to create random binary trees (:user:`hyanwong`, :user:`jeromekelleher`, :pr:`1037`). - Change the default behaviour of Tree.split_polytomies to generate the shortest possible branch lengths instead of a fixed epsilon of 1e-10. (:user:`jeromekelleher`, :issue:`1089`, :pr:`1090`)
- Default value metadata in
add_row functions is now schema-dependent, so that metadata={} is no longer needed as an argument when a schema is present (:user:`benjeffery`, :issue:`1084`). - default in metadata schemas is used to fill in missing values when encoding for the struct codec. (:user:`benjeffery`, :issue:`1073`, :pr:`1116`). - Added
canonical option to table collection sorting (:user:`mufernando`, :user:`petrelharp`, :issue:`705`) - Added various arguments to
TreeSequence.subset, to allow for stable population indexing and lossless node reordering with subset. (:user:`petrelharp`, :pr:`1097`)
Changes
- Allow mutations that have the same derived state as their parent mutation. (:user:`benjeffery`, :issue:`1180`, :pr:`1233`)
- File minor version change to support individual parents
Breaking changes
- tskit now requires Python 3.7 (:user:`benjeffery`, :pr:`1235`)
Minor bugfix release.
Bugfixes
- Reinstate the unused zlib_compression option to tskit.dump, as msprime < 1.0 still uses it (:user:`jeromekelleher`, :issue:`1067`).
Features
- Add
TreeSequence.genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (:user:`brieuclehmann`, :issue:`1021`, :pr:`1023`, :issue:`974`, :issue:`973`, :pr:`898`). - Expose
TreeSequence.coiterate() method to allow iteration over 2 sequences simultaneously, aiding comparison of trees from two sequences (:user:`jeromekelleher`, :user:`hyanwong`, :issue:`1021`, :pr:`1022`). - tskit is now supported on, and has wheels for, Python 3.9 (:user:`benjeffery`, :issue:`982`, :pr:`907`).
- Tree.newick() now has extra option include_branch_lengths to allow branch lengths to be omitted (:user:`hyanwong`, :pr:`931`). - Added
Tree.generate_star static method to create star-topologies (:user:`hyanwong`, :pr:`934`). - Added
Tree.generate_comb and Tree.generate_balanced methods to create example trees. (:user:`jeromekelleher`, :pr:`1026`). - Added
equals method to TreeSequence, TableCollection and each of the tables which provides more flexible equality comparisons, for example, allowing users to ignore metadata or provenance in the comparison (:user:`mufernando`, :user:`jeromekelleher`, :issue:`896`, :pr:`897`, :issue:`913`, :pr:`917`). - Added
__eq__ to TreeSequence (:user:`benjeffery`, :issue:`1011`, :pr:`1020`). - ts.dump and tskit.load now support reading and writing file objects such as FIFOs and sockets (:user:`benjeffery`, :issue:`657`, :pr:`909`). - Added
tskit.write_ms for writing to MS format (:user:`saurabhbelsare`, :issue:`727`, :pr:`854`). - Added
TableCollection.indexes for access to the edge insertion/removal order indexes (:user:`benjeffery`, :issue:`4`, :pr:`916`). - The dictionary representation of a TableCollection now contains its index (:user:`benjeffery`, :issue:`870`, :pr:`921`).
- Added
TreeSequence._repr_html_ for use in jupyter notebooks (:user:`benjeffery`, :issue:`872`, :pr:`923`). - Added
TreeSequence.__str__ to display a summary for terminal usage (:user:`benjeffery`, :issue:`938`, :pr:`985`). - Added
TableCollection.dump and TableCollection.load. This allows table collections that are not valid tree sequences to be manipulated (:user:`benjeffery`, :issue:`14`, :pr:`986`). - Added
nbytes method to tables, TableCollection and TreeSequence which reports the size in bytes of those objects (:user:`jeromekelleher`, :user:`benjeffery`, :issue:`54`, :pr:`871`). - Added
TableCollection.clear to clear data table rows and optionally provenances, table schemas and tree-sequence level metadata and schema (:user:`benjeffery`, :issue:`929`, :pr:`1001`).
Bugfixes
- LightWeightTableCollection.asdict and TableCollection.asdict now return copies of arrays (:user:`benjeffery`, :issue:`1025`, :pr:`1029`). - The
map_mutations method previously used the Fitch parsimony method, but this does not produce parsimonious results on non-binary trees. We now use the Hartigan parsimony algorithm, which does (:user:`jeromekelleher`, :issue:`987`, :pr:`1030`). - The
flag argument to tables' add_row was treating the value as signed (:user:`benjeffery`, :issue:`1027`, :pr:`1031`).
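To illustrate why the switch to Hartigan parsimony matters on non-binary trees, here is a minimal pure-Python sketch of the bottom-up pass of Hartigan's algorithm. It is not tskit's implementation: the function name hartigan_cost and the dictionary-based tree encoding are assumptions made for this example only.

```python
from collections import Counter


def hartigan_cost(children, leaf_state, root):
    """Return (optimal_state_set, minimum_mutations) for the subtree at root.

    children maps an internal node to a list of its children (any arity);
    leaf_state maps each leaf to its observed state. Both are hypothetical
    encodings chosen for this sketch.
    """
    kids = children.get(root, [])
    if not kids:
        return {leaf_state[root]}, 0
    counts = Counter()
    cost = 0
    for child in kids:
        states, child_cost = hartigan_cost(children, leaf_state, child)
        cost += child_cost
        counts.update(states)  # one vote per child for each optimal state
    best = max(counts.values())
    optimal = {s for s, c in counts.items() if c == best}
    # Every child that does not support the chosen state needs one mutation.
    return optimal, cost + len(kids) - best


# A polytomy: root 4 with four leaf children, three "A" and one "G".
children = {4: [0, 1, 2, 3]}
leaf_state = {0: "A", 1: "A", 2: "A", 3: "G"}
states, mutations = hartigan_cost(children, leaf_state, 4)
print(states, mutations)  # {'A'} 1
```

Applying the Fitch intersection/union rule to all four children at once would give the ambiguous set {A, G} at the root, since the intersection is empty. Picking G from that set would require three mutations, whereas the majority count used above selects A with a single mutation.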
Breaking changes
- The argument to
ts.dump and tskit.load has been renamed from path to file. - All arguments to
Tree.newick() except precision are now keyword-only. - Renamed
ts.trait_regression to ts.trait_linear_model.
Breaking changes
- The argument order of
Tree.unrank and combinatorics.num_labellings now positions the number of leaves before the tree rank (:user:`daniel-goldstein`, :issue:`950`, :pr:`978`) - Change several methods (
simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify(), can be made consistent (:user:`hyanwong`, :issue:`374`, :issue:`846`, :pr:`851`)
Features
- Add
split_polytomies method to the Tree class (:user:`hyanwong`, :user:`jeromekelleher`, :issue:`809`, :pr:`815`) - Tree accessor functions (e.g.
ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; also root_threshold can be specified when calling ts.trees() (:user:`hyanwong`, :issue:`847`, :pr:`848`) - Genomic intervals returned by Python functions are now namedtuples, allowing
.left, .right and .span usage (:user:`hyanwong`, :issue:`784`, :pr:`786`, :pr:`811`) - Added
include_terminal parameter to edge diffs iterator, to output the last edges at the end of a tree sequence (:user:`hyanwong`, :issue:`783`, :pr:`787`) - :issue:`832` - Add
metadata_bytes method to allow access to raw TableCollection metadata (:user:`benjeffery`, :pr:`842`) - New
tree.is_isolated(u) method (:user:`hyanwong`, :pr:`443`). - tskit.is_unknown_time can now check arrays. (:user:`benjeffery`, :pr:`857`).
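As a rough illustration of the namedtuple-style intervals mentioned in the features above, the following sketch defines a stand-in Interval type. The class here is hypothetical and written only for this example; it is meant to show the access pattern (tuple unpacking plus .left, .right and .span), not to reproduce tskit's own class.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a half-open genomic interval [left, right)."""

    left: float
    right: float

    @property
    def span(self) -> float:
        # Length of the interval, derived from the two stored fields.
        return self.right - self.left


iv = Interval(2.0, 7.5)
left, right = iv  # still unpacks like a plain tuple
print(iv.left, iv.right, iv.span)  # 2.0 7.5 5.5
```

Because the type is still a tuple, existing code that unpacks (left, right) pairs keeps working while new code can use the named attributes.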
Bugfixes
- :issue:`823` - Fix mutation time error when using
simplify(keep_input_roots=True) (:user:`petrelharp`, :pr:`823`). - :issue:`821` - Fix mutation rows with unknown time never being equal (:user:`petrelharp`, :pr:`822`).
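The unknown-time equality fix above comes down to a general property of NaN values: a NaN never compares equal to itself, so a sentinel encoded as NaN has to be recognised by its bit pattern. The sketch below shows the idea using only the standard library. The constant UNKNOWN_BITS and the helper names are assumptions for this example and are not taken from tskit.

```python
import struct

# Hypothetical sentinel: a quiet NaN with a distinguishing payload bit.
UNKNOWN_BITS = 0x7FF8000000000001
UNKNOWN = struct.unpack("<d", struct.pack("<Q", UNKNOWN_BITS))[0]


def bits(x):
    """Return the raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_unknown(x):
    """True only for the sentinel NaN, not for any other NaN."""
    return bits(x) == UNKNOWN_BITS


def times_equal(a, b):
    """Compare two times, treating two unknown values as equal."""
    if is_unknown(a) and is_unknown(b):
        return True
    return a == b


print(UNKNOWN == UNKNOWN)             # False: plain float comparison
print(times_equal(UNKNOWN, UNKNOWN))  # True: sentinel-aware comparison
```

A row comparison built on plain == would therefore report two otherwise identical rows as different whenever both carry the unknown marker, which matches the symptom described in the bugfix.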
Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others.
Breaking changes
- The default display order for tree visualisations has been changed to
minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree". - File system operations such as dump/load now raise an appropriate OSError
instead of
tskit.FileFormatError. Loading from an empty file now raises an EOFError. - Bad tree topologies are detected earlier, so that it is no longer possible
to create a
TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (:user:`jeromekelleher`, :pr:`709`). - The
TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (:user:`jeromekelleher`, :issue:`500`, :pr:`694`). - The arguments to
TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods. (:user:`benjeffery`, :issue:`716`, :pr:`794`)
New features
- New methods to perform set operations on TableCollections and TreeSequences.
TableCollection.subset subsets and reorders table collections by nodes (:user:`mufernando`, :user:`petrelharp`, :pr:`663`, :pr:`690`). TableCollection.union forms the node-wise union of two table collections (:user:`mufernando`, :user:`petrelharp`, :issue:`381`, :pr:`623`). - Mutations now have an optional double-precision floating-point
time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see :ref:`sec_mutation_requirements`. Also added function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (:user:`benjeffery`, :pr:`672`). - Add support for trees with internal samples for the Kendall-Colijn tree distance metric. (:user:`daniel-goldstein`, :pr:`610`)
- Add background shading to SVG tree sequences to reflect tree position along the sequence (:user:`hyanwong`, :pr:`563`).
- Tables with a metadata column now have a
metadata_schema that is used to validate and encode metadata that is passed to add_row and decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See :ref:`sec_metadata` (:user:`benjeffery`, :pr:`491`, :pr:`542`, :pr:`543`, :pr:`601`). - The tree-sequence now has top-level metadata with a schema (:user:`benjeffery`, :pr:`666`, :pr:`644`, :pr:`642`).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new
tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes :issue:`467` for duplicate SVG entity ids in Jupyter notebooks (:user:`hyanwong`, :pr:`555`). - Add a
to_nexus function that outputs a tree sequence in Nexus format (:user:`saunack`, :pr:`550`). - Add extension of Kendall-Colijn tree distance metric for tree sequences
computed by
TreeSequence.kc_distance (:user:`daniel-goldstein`, :pr:`548`). - Add an optional node traversal order in
tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (:user:`brianzhang01`, :pr:`411`). - Add an
order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex", which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (:user:`brianzhang01`, :user:`jeromekelleher`, :issue:`389`, :pr:`566`). - Add
_repr_html_ to tables, so that jupyter notebooks render them as html tables (:user:`benjeffery`, :pr:`514`). - Remove support for
kc_distance on trees with unary nodes (:user:`daniel-goldstein`, :pr:`508`). - Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (:user:`daniel-goldstein`, :pr:`490`).
- Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (:user:`benjeffery`, :pr:`505`).
- Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (:user:`benjeffery`, :pr:`496`).
- Allow sites with missing data to be output by the
haplotypes method, by default replacing with -. Errors are no longer raised for missing data with isolated_as_missing=True; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (:user:`hyanwong`, :pr:`426`). - Access the number of children of a node in a tree directly using
tree.num_children(u) (:user:`hyanwong`, :pr:`436`). - User specified allele mapping for genotypes in
variants and genotype_matrix (:user:`jeromekelleher`, :pr:`430`). - New
root_threshold option for the Tree class, which allows us to efficiently iterate over 'real' roots when we have missing data (:user:`jeromekelleher`, :pr:`462`). - Add pickle support for
TreeSequence (:user:`terhorst`, :pr:`473`). - Add
tree.as_dict_of_dicts() function to enable use with networkx. See :ref:`sec_tutorial_networkx` (:user:`winni2k`, :pr:`457`). - Add
tree_sequence.to_macs() function to convert tree sequence to MACS format (:user:`winni2k`, :pr:`727`) - Add a
keep_input_roots option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (:user:`jeromekelleher`, :issue:`775`, :pr:`782`).
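The "minlex" ordering described in the features above can be sketched in a few lines: children are visited in increasing order of the smallest leaf ID found beneath them, which removes the ambiguity of an ordinary postorder. The function below is an illustration under that reading of the entry, with a hypothetical name and a dictionary-based tree encoding. It is not tskit's traversal code.

```python
def minlex_postorder(children, root):
    """Return nodes in postorder, visiting children by smallest leaf ID below.

    children maps each internal node to a list of its children; nodes
    without an entry are treated as leaves (hypothetical encoding).
    """

    def visit(u):
        kids = children.get(u, [])
        if not kids:
            return u, [u]
        # Each result is (smallest leaf below child, postorder of that subtree).
        results = sorted((visit(c) for c in kids), key=lambda r: r[0])
        order = [node for _, nodes in results for node in nodes]
        order.append(u)
        return results[0][0], order

    return visit(root)[1]


# Child lists are deliberately given in a "scrambled" order.
children = {6: [5, 4], 5: [3, 0], 4: [2, 1]}
print(minlex_postorder(children, 6))  # [0, 3, 5, 1, 2, 4, 6]
```

Two trees with the same topology but differently ordered child lists yield the same traversal, which is what makes drawings produced with this ordering easier to compare.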
Bugfixes
- :issue:`453` - Fix LibraryError when
tree.newick() is called with large node time values (:user:`jeromekelleher`, :pr:`637`). - :issue:`777` - Mutations over isolated samples were incorrectly decoded as missing data. (:user:`jeromekelleher`, :pr:`778`)
- :issue:`776` - Fix a segfault when a partial list of samples
was provided to the
variants iterator. (:user:`jeromekelleher`, :pr:`778`)
Deprecated
- The
sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed. - For
TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument is deprecated and replaced with isolated_as_missing. Note that to get the same behaviour impute_missing_data=True should be replaced with isolated_as_missing=False. (:user:`benjeffery`, :issue:`716`, :pr:`794`)
Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.
New features
- Kendall-Colijn tree distance metric computed by
Tree.kc_distance (:user:`awohns`, :pr:`172`). - New "timeasc" and "timedesc" orders for tree traversals (:user:`benjeffery`, :issue:`246`, :pr:`399`).
- Up to 2X performance improvements to tree traversals (:user:`benjeffery`, :pr:`400`).
- Add
trim, delete_sites, keep_intervals and delete_intervals methods to edit tree sequence data. (:user:`hyanwong`, :pr:`364`, :pr:`372`, :pr:`377`, :pr:`390`). - Initial online documentation for CLI (:user:`hyanwong`, :pr:`414`).
- Various documentation improvements (:user:`hyanwong`, :user:`jeromekelleher`, :user:`petrelharp`).
- Rename the
map_ancestors function to link_ancestors (:user:`hyanwong`, :user:`gtsambos`; :pr:`406`, :issue:`262`). The original function is retained as a deprecated alias.
Bugfixes
- Fix height scaling issues with SVG tree drawing (:user:`jeromekelleher`, :pr:`407`, :issue:`383`, :pr:`378`).
- Do not reuse buffers in
LdCalculator (:user:`jeromekelleher`). See :pr:`397` and :issue:`396`.
Minor bugfix release.
Relaxes overly-strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see :issue:`351`).
New features
- Add log_time height scaling option for drawing SVG trees (:user:`marianne-aspbury`). See :pr:`324` and :issue:`303`.
Bugfixes
- Allow 4G metadata columns (:user:`jeromekelleher`). See :pr:`342` and :issue:`341`.
Major feature release, adding support for population genetic statistics, improved VCF output and many other features.
Note: Version 0.2.0 was skipped because of an error uploading to PyPI which could not be undone.
Breaking changes
- Genotype arrays returned by
TreeSequence.variants and TreeSequence.genotype_matrix have changed from unsigned 8 bit values to signed 8 bit values to accommodate missing data (see :issue:`144` for discussion). Specifically, the dtype of the genotype arrays has changed from numpy uint8 ("u1") to int8 ("i1"). This should not affect client code in any way unless it specifically depends on the type of the returned numpy array. - The VCF written by the
write_vcf method is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, and REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form tsk_j for sample ID j. Most of the legacy behaviour can be recovered with new options, however. - The positional parameter
reference_sets in genealogical_nearest_neighbours and mean_descendants TreeSequence methods has been renamed to sample_sets.
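The move from unsigned to signed 8 bit genotypes noted in the breaking changes above can be illustrated without numpy. The sketch below packs a few genotype values as signed bytes and then reinterprets the same bytes as unsigned, assuming that missing data is encoded as -1. The variable names are hypothetical.

```python
import struct

MISSING = -1  # assumed sentinel for a missing genotype
genotypes = [0, 1, MISSING, 0]

# Store the values as four signed 8 bit integers.
raw = struct.pack("4b", *genotypes)

as_signed = list(struct.unpack("4b", raw))    # how new code reads the bytes
as_unsigned = list(struct.unpack("4B", raw))  # how code assuming unsigned reads them

print(as_signed)    # [0, 1, -1, 0]
print(as_unsigned)  # [0, 1, 255, 0]
```

Client code that casts genotype arrays to an unsigned type would therefore see 255 where a missing value is intended, which is the kind of type dependence the entry warns about.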
New features
- Support for general windowed statistics. Implementations of diversity, divergence, segregating sites, Tajima's D, Fst, Patterson's F statistics, Y statistics, trait correlations and covariance, and k-dimensional allele frequency spectra (:user:`petrelharp`, :user:`jeromekelleher`, :user:`molpopgen`).
- Add the
keep_unary option to simplify (:user:`gtsambos`). See :issue:`1` and :pr:`143`. - Add the
map_ancestors method to TableCollection (:user:`gtsambos`). See :pr:`175`. - Add the
squash method to EdgeTable (:user:`gtsambos`). See :issue:`59` and :pr:`285`. - Add support for individuals to VCF output, and fix major issues with output
format (:user:`jeromekelleher`). Position values are transformed in a much
more straightforward manner and output has been generalised substantially.
Adds
individual_names and position_transform arguments. See :pr:`286`, and issues :issue:`2`, :issue:`30` and :issue:`73`. - Control height scale in SVG trees using 'tree_height_scale' and 'max_tree_height' (:user:`hyanwong`, :user:`jeromekelleher`). See :issue:`167`, :pr:`168`. Various other improvements to tree drawing (:pr:`235`, :pr:`241`, :pr:`242`, :pr:`252`, :pr:`259`).
- Add
Tree.max_root_time property (:user:`hyanwong`, :user:`jeromekelleher`). See :pr:`170`. - Improved input checking on various methods taking numpy arrays as parameters (:user:`hyanwong`). See :issue:`8` and :pr:`185`.
- Define the branch length over roots in trees to be zero (previously raised an error; :user:`jeromekelleher`). See :issue:`188` and :pr:`191`.
- Implementation of the genealogical nearest neighbours statistic (:user:`hyanwong`, :user:`jeromekelleher`).
- New
delete_intervals and keep_intervals methods for the TableCollection to allow slicing out of topology from specific intervals (:user:`hyanwong`, :user:`andrewkern`, :user:`petrelharp`, :user:`jeromekelleher`). See :pr:`225` and :pr:`261`. - Support for missing data via a topological definition (:user:`jeromekelleher`). See :issue:`270` and :pr:`272`.
- Add ability to set columns directly in the Tables API (:user:`jeromekelleher`). See :issue:`12` and :pr:`307`.
- Various documentation improvements from :user:`brianzhang01`, :user:`hyanwong`, :user:`petrelharp` and :user:`jeromekelleher`.
Deprecated
- Deprecate
Tree.length in favour of Tree.span (:user:`hyanwong`). See :pr:`169`. - Deprecate
TreeSequence.pairwise_diversity in favour of the new diversity method. See :issue:`215`, :pr:`312`.
Bugfixes
- Catch NaN and infinity values within tables (:user:`hyanwong`). See :issue:`293` and :pr:`294`.
This release removes support for Python 2, adds more flexible tree access and a
new tskit command line interface.
New features
- Remove support for Python 2 (:user:`hugovk`). See :issue:`137` and :pr:`140`.
- More flexible tree API (:pr:`121`). Adds
TreeSequence.at and TreeSequence.at_index methods to find specific trees, and efficient support for backwards traversal using reversed(ts.trees()). - Add initial
tskit CLI (:issue:`80`) - Add
tskit info CLI command (:issue:`66`) - Enable drawing SVG trees with coloured edges (:user:`hyanwong`; :issue:`149`).
- Add
Tree.is_descendant method (:issue:`120`) - Add
Tree.copy method (:issue:`122`)
Bugfixes
- Fixes to the low-level C API (:issue:`132` and :issue:`157`)
Minor feature update. Using the C API 0.99.1.
New features
- Add interface for setting TableCollection.sequence_length: #107
- Add support for building and dropping TableCollection indexes: #108
Bugfix release.
Bugfixes
- Fix missing provenance schema: #81
Bugfix release.
Bugfixes
- Fix memory leak in table collection. #76
Fixes broken distribution tarball for 0.1.0.
Initial release after separation from msprime 0.6.2. Code that reads tree sequence files and processes them should be able to work without changes.
Breaking changes
- Removal of the previously deprecated
sort_tables, simplify_tables and load_tables functions. All code should change to using corresponding TableCollection methods. - Rename
SparseTree class to Tree.
Initial alpha version posted to PyPI for bootstrapping.
Initial extraction of tskit code from msprime. Relicense to MIT.
Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23