[1.0.x] - YYYY-MM-DD

Bugfixes

  • ts.samples(population=...) now raises a ValueError if the population ID is e.g. a population name, rather than silently returning no samples. (:user:`hyanwong`, :pr:`3344`)
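
Code that previously passed a population name can look up the integer ID first. The following is a minimal sketch, assuming each population's metadata is a dict with a "name" key (a common convention, but not guaranteed by tskit). `population_id_from_name` is a hypothetical helper, not part of the tskit API.

```python
def population_id_from_name(populations_metadata, name):
    """Return the integer ID of the population whose metadata "name" matches.

    populations_metadata is a list of dicts, one per population, in ID order.
    This layout is an assumption; adapt it to your own metadata schema.
    """
    matches = [
        pop_id
        for pop_id, metadata in enumerate(populations_metadata)
        if metadata.get("name") == name
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one population named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in for [pop.metadata for pop in ts.populations()]
metadata = [{"name": "YRI"}, {"name": "CEU"}]
pop_id = population_id_from_name(metadata, "CEU")
# pop_id can then be passed as ts.samples(population=pop_id)
```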

Features

[1.0.0] - 2025-11-27

Breaking changes

  • The reference_sequence argument to TreeSequence.alignments is now required to be the same length as the tree sequence. Previously it was required to be the length of the requested interval. (:user:`benjeffery`, :pr:`3317`)
  • TreeSequence.tables now returns a zero-copy immutable view of the tables. To get a mutable copy, use TreeSequence.dump_tables(). (:user:`benjeffery`, :pr:`3288`, :issue:`760`)
  • For a tree sequence to be valid, the mutation parents in the table collection must be correct and consistent with the topology of the tree at each mutation site. TableCollection.tree_sequence() will raise a _tskit.LibraryError if this is not the case. (:user:`benjeffery`, :issue:`2729`, :issue:`2732`, :pr:`3212`).
  • Drop Python 3.9 support and require Python >= 3.10. (:pr:`3267`, :user:`benjeffery`)
  • ltrim, rtrim, trim and shift raise an error if they are used on a tree sequence containing a reference sequence. (:user:`hyanwong`, :pr:`3210`, :issue:`2091`)

Features

  • Add tskit.jit.numba.jitwrap and NumbaTreeSequence to allow simplified use and development of Numba-jitted functions with tree sequences. See the documentation for details. (:user:`andrewkern`, :pr:`3295`, :issue:`3294`)
  • TreeSequence.map_to_vcf_model now also returns the transformed positions and contig length. (:user:`benjeffery`, :pr:`3174`, :issue:`3173`)
  • draw_svg() methods now associate tree branches with edge IDs. (:user:`hyanwong`, :pr:`3193`, :issue:`557`)
  • draw_svg() methods now allow the y-axis to be placed on the right-hand side using y_axis="right". (:user:`hyanwong`, :pr:`3201`)
  • Add contig_id and isolated_as_missing to VcfModelMapping (:user:`benjeffery`, :pr:`3219`, :issue:`3177`).
  • Add TreeSequence.mutations_edge, which returns the edge ID for each mutation's edge. (:user:`benjeffery`, :pr:`3226`, :issue:`3189`)
  • Add TreeSequence.sites_ancestral_state, TreeSequence.mutations_derived_state and TreeSequence.mutations_inherited_state properties to return the ancestral state of sites, the derived state of mutations and the inherited state of mutations as NumPy arrays of the new NumPy 2.0 StringDType. (:user:`benjeffery`, :pr:`3228`, :issue:`2632`, :pr:`3276`, :issue:`2631`)
  • Tskit now requires NumPy version 2 or later. However, you can still use tskit with NumPy 1.x by building tskit from source with NumPy 1.x using pip install tskit --no-binary tskit. With NumPy 1.x, any use of the new StringDType properties will result in a RuntimeError. If you try to use another Python module that was compiled against NumPy 1.x with NumPy 2.x you may see the error "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash.". If no newer version of the module is available you will have to use the NumPy 1.x build as above.
  • Add Mutation.inherited_state property which returns the inherited state for a single mutation. (:user:`benjeffery`, :pr:`3277`, :issue:`2631`)
  • Add all_mutations and all_edges options to TreeSequence.union, allowing greater flexibility in "disjoint union" situations. (:user:`hyanwong`, :user:`petrelharp`, :issue:`3181`)
  • Add TreeSequence.divergence_matrix, which was previously undocumented.
  • TreeSequence.variants, .genotype_matrix, .haplotypes, and .alignments methods now fully support isolated_as_missing behaviour with internal nodes. .alignments is also around 10% faster. (:user:`benjeffery`, :pr:`3313`, :pr:`3317`, :issue:`1896`)

Bugfixes

  • In some tables with mutations out-of-order TableCollection.sort did not re-order the mutations so they formed a valid TreeSequence. TableCollection.sort and TableCollection.canonicalise now sort mutations by site, then time (if known), then the mutation's node's time, then number of descendant mutations (ensuring that parent mutations occur before children), then node, then their original order in the tables. (:user:`benjeffery`, :pr:`3257`, :issue:`3253`)
  • Fix bug in TreeSequence.genetic_relatedness_vector, which ignored span_normalise and always behaved as if it were False. The default is now True, in agreement with other statistics, so the returned values will change. (:user:`petrelharp`, :pr:`3300`, :issue:`3241`)
  • Fix bug in TreeSequence.pair_coalescence_counts when span_normalise=True and a window breakpoint falls within an internal missing interval. (:user:`nspope`, :pr:`3176`, :issue:`3175`)
  • Fix metadata schemas that are equal but have different byte representations not being considered equal when using TableCollection.assert_equals and Table.assert_equals. (:user:`benjeffery`, :pr:`3246`, :issue:`3244`)
  • k-way statistics no longer require k sample sets, allowing in particular "self" comparisons for TreeSequence.genetic_relatedness. This changes the error code returned in some situations. (:user:`andrewkern`, :user:`petrelharp`, :pr:`3235`, :issue:`3055`)
  • Fix UnboundLocalError in draw_svg() when using numeric max_time values with mutations over roots. (:user:`benjeffery`, :pr:`3274`, :issue:`3273`)
  • Prevent iterating over a TopologyCounter. (:user:`benjeffery`, :pr:`3202`, :issue:`1462`)
  • Fix TreeSequence.concatenate() to work with internal samples by using the all_mutations and all_edges parameters in union(). (:user:`hyanwong`, :pr:`3283`, :issue:`3181`)
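
The mutation sort order described in the TableCollection.sort entry above can be sketched as a comparator in plain Python. This is an illustration only, not tskit's implementation. It assumes that older times sort first, that more descendant mutations sort first, and that mutation time is compared only when both values are known; the `Mut` record and `sort_mutations` function are hypothetical names.

```python
from functools import cmp_to_key
from typing import NamedTuple, Optional


class Mut(NamedTuple):
    index: int              # original row order in the mutation table
    site: int
    node: int
    time: Optional[float]   # None stands in for an unknown time
    node_time: float        # time of the mutation's node
    num_descendants: int    # number of descendant mutations


def _cmp(a, b):
    return (a > b) - (a < b)


def compare(a, b):
    c = _cmp(a.site, b.site)
    if c == 0 and a.time is not None and b.time is not None:
        c = _cmp(b.time, a.time)                        # assumed: older first
    if c == 0:
        c = _cmp(b.node_time, a.node_time)              # assumed: older node first
    if c == 0:
        c = _cmp(b.num_descendants, a.num_descendants)  # parents before children
    if c == 0:
        c = _cmp(a.node, b.node)
    if c == 0:
        c = _cmp(a.index, b.index)                      # original table order
    return c


def sort_mutations(mutations):
    return sorted(mutations, key=cmp_to_key(compare))


mutations = [
    Mut(0, site=1, node=2, time=None, node_time=0.0, num_descendants=0),
    Mut(1, site=0, node=3, time=None, node_time=1.0, num_descendants=0),
    Mut(2, site=0, node=3, time=None, node_time=1.0, num_descendants=1),
    Mut(3, site=0, node=4, time=None, node_time=2.0, num_descendants=0),
]
order = [m.index for m in sort_mutations(mutations)]  # [3, 2, 1, 0]
```

In this toy input, mutations 1 and 2 sit on the same node with unknown times, so the descendant count is what places the parent (index 2) before its child (index 1).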

[0.6.4] - 2025-05-21

Features

  • Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes in a tree sequence, grouped by a ploidy value. (:user:`benjeffery`, :pr:`3157`)
  • Add TreeSequence.individuals_nodes attribute to return the nodes associated with each individual as a numpy array. (:user:`benjeffery`, :pr:`3153`)
  • Add shift method to both TableCollection and TreeSequence classes allowing the coordinate system to be shifted, and TreeSequence.concatenate so a set of tree sequences can be added to the right of an existing one. (:user:`hyanwong`, :pr:`3165`, :issue:`3164`)
  • Add TreeSequence.map_to_vcf_model method to return a mapping of the tree sequence to the VCF model. (:user:`benjeffery`, :pr:`3163`)
  • Use a thin space as the thousands separator in HTML output, and a comma in CLI output. (:user:`hossam26644`, :pr:`3167`, :issue:`2951`)

Fixes

Breaking changes

  • TreeSequence.write_vcf now filters non-sample nodes from individuals by default, instead of raising an error. These nodes can be included using the new include_non_sample_nodes argument. By default, individual names (sample IDs) in VCF output are now of the form tsk_{individual.id}. Previously these were always "tsk_{j}" for j in range(num_individuals). This may break some downstream code if individuals are specified. To fix, manually specify individual_names to the required pattern. (:user:`benjeffery`, :pr:`3163`)
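
The naming change only matters when a subset of individuals is written. The sketch below contrasts the two schemes in plain Python, assuming the old scheme numbered the output individuals by position in the order given. Both helper functions and the `selected` IDs are hypothetical.

```python
def legacy_individual_names(individual_ids):
    """Names in the older style: numbered by output position, not by ID."""
    return [f"tsk_{j}" for j in range(len(individual_ids))]


def current_individual_names(individual_ids):
    """Names in the new default style: numbered by individual ID."""
    return [f"tsk_{ind_id}" for ind_id in individual_ids]


selected = [3, 5, 8]  # hypothetical IDs passed as the individuals argument
old_names = legacy_individual_names(selected)   # ['tsk_0', 'tsk_1', 'tsk_2']
new_names = current_individual_names(selected)  # ['tsk_3', 'tsk_5', 'tsk_8']
# To keep the older naming, pass individual_names=old_names to write_vcf.
```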

[0.6.3] - 2025-04-28

Bugfixes

[0.6.2] - 2025-04-01

Bugfixes

Breaking Changes

[0.6.1] - 2025-03-31

Bugfixes

  • Fix to TreeSequence.pair_coalescence_counts output dimension when provided with time windows containing no nodes (:user:`nspope`, :issue:`3046`, :pr:`3058`)
  • Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing span if span_normalise=True. This resolves a bug where TreeSequence.pair_coalescence_rates would return incorrect values for intervals with missing trees. (:user:`natep`, :issue:`3053`, :pr:`3059`)
  • Fix to TreeSequence.pair_coalescence_rates causing an assertion to be triggered by floating point error, when all coalescence events are inside a single time window (:user:`natep`, :issue:`3035`, :pr:`3038`)

Features

[0.6.0] - 2024-10-16

Breaking Changes

  • The definitions of TreeSequence.genetic_relatedness and TreeSequence.genetic_relatedness_weighted have changed to average over sample sets, rather than summing over them. For computation with diploid sample sets, this will change the result by a factor of four; for larger sample sets it will now produce sensible values that are comparable between sample sets of different sizes. The default for these methods is also changed to polarised=True, but the output is unchanged for centre=True (the default). See the documentation for these methods for more discussion. (:user:`petrelharp`, :user:`mmosmond`, :pr:`1623`)
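
The factor of four for diploid sample sets comes from the number of sample pairs between two sets of size two. The following toy arithmetic illustrates only that scaling, not the relatedness estimator itself; `constant_pair_value` is a made-up per-pair contribution.

```python
from itertools import product


def summed(pair_value, set_a, set_b):
    """Sum a per-pair value over all pairs drawn from the two sample sets."""
    return sum(pair_value(a, b) for a, b in product(set_a, set_b))


def averaged(pair_value, set_a, set_b):
    """Average the same per-pair value over all pairs."""
    return summed(pair_value, set_a, set_b) / (len(set_a) * len(set_b))


def constant_pair_value(a, b):
    # Made-up contribution, identical for every pair of samples
    return 0.5


diploid_a = [0, 1]
diploid_b = [2, 3]
total = summed(constant_pair_value, diploid_a, diploid_b)   # 4 pairs * 0.5 = 2.0
mean = averaged(constant_pair_value, diploid_a, diploid_b)  # 2.0 / 4 = 0.5
```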

Bugfixes

Features

[0.5.8] - 2024-06-27

[0.5.7] - 2024-06-17

Breaking Changes

  • The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with position zero is encountered. The VCF spec does not allow zero position sites. Suppress this error with the allow_position_zero argument. (:user:`benjeffery`, :pr:`2901`, :issue:`2838`)

Bugfixes

  • Fix to the folded, expected allele frequency spectrum (i.e., TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)), which was half as big as it should have been. (:user:`petrelharp`, :user:`nspope`, :pr:`2933`)

[0.5.6] - 2023-10-10

Breaking Changes

  • tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27

Features

Bugfixes

[0.5.5] - 2023-05-17

Performance improvements

  • Methods like ts.at() which seek to a specified position on the sequence from a new Tree instance are now much faster (:user:`molpopgen`, :pr:`2661`).

Features

Breaking Changes

Bugfixes

[0.5.4] - 2023-01-13

Features

  • A new Tree.is_root method avoids the need to search the potentially large list of Tree.roots (:user:`hyanwong`, :pr:`2669`, :issue:`2620`)
  • The TreeSequence object now has the attributes min_time and max_time, which are the minimum and maximum among the node times and mutation times, respectively. (:user:`szhan`, :pr:`2612`, :issue:`2271`)
  • The draw_svg methods now have a max_num_trees parameter to truncate the total number of trees shown, giving a readable display for tree sequences with many trees (:user:`hyanwong`, :pr:`2652`)
  • The draw_svg methods now accept a canvas_size parameter to allow extra room on the canvas e.g. for long labels or repositioned graphical elements (:user:`hyanwong`, :pr:`2646`, :issue:`2645`)
  • The Tree object now has the method siblings to get the siblings of a node. It returns an empty tuple if the node has no siblings, is not a node in the tree, is the virtual root, or is an isolated non-sample node. (:user:`szhan`, :pr:`2618`, :issue:`2616`)
  • The msprime.RateMap class has been ported into tskit: functionality should be identical to the version in msprime, apart from minor changes in the formatting of tabular text output (:user:`hyanwong`, :user:`jeromekelleher`, :pr:`2678`)
  • Tskit now supports and has wheels for Python 3.11. This Python version has a significant performance boost (:user:`benjeffery`, :pr:`2624`, :issue:`2248`)
  • Add the update_sample_flags option to simplify, which ensures that no node sample flags are changed, allowing calling code to manage sample status. (:user:`jeromekelleher`, :issue:`2662`, :pr:`2663`).

Breaking Changes

  • The filter_populations, filter_individuals, and filter_sites parameters to simplify previously defaulted to True but now default to None, which is treated as True. Previously, passing None would result in an error. (:user:`hyanwong`, :pr:`2609`, :issue:`2608`)

[0.5.3] - 2022-10-03

Fixes

Features

Breaking Changes

Performance improvements

  • TreeSequence.link_ancestors no longer continues to process edges once all of the sample and ancestral nodes have been accounted for, reducing memory overhead and improving overall performance (:user:`gtsambos`, :pr:`2456`, :issue:`2442`)

[0.5.2] - 2022-07-29

Fixes

Performance improvements

Features

Breaking Changes

  • Previously, accessing (e.g.) tables.edges returned a different instance of EdgeTable each time. This has been changed to return the same instance for the lifetime of a given TableCollection instance. This is technically a breaking change, although it's difficult to see how code would depend on the property that (e.g.) tables.edges is not tables.edges. (:user:`jeromekelleher`, :pr:`2441`, :issue:`2080`).

[0.5.1] - 2022-07-14

Fixes

Changes

Features

[0.5.0] - 2022-06-22

Changes

Breaking Changes

  • The JSON metadata codec now interprets the empty string as an empty object. This means that applying a schema to an existing table will no longer necessitate modifying the existing rows. (:user:`benjeffery`, :issue:`2064`, :pr:`2104`)
  • Remove the previously deprecated as_bytes argument to TreeSequence.variants. If you need genotypes in byte form this can be done following the code in the to_macs method on line 5573 of trees.py. This argument was initially deprecated more than 3 years ago when the code was part of msprime. (:user:`benjeffery`, :issue:`605`, :pr:`2172`)
  • Arguments after ploidy in write_vcf marked as keyword only (:user:`jeromekelleher`, :pr:`2329`, :issue:`2315`).
  • When metadata equal to b'' is printed to text or HTML tables it will render as an empty string rather than "b''". (:user:`hyanwong`, :issue:`2349`, :pr:`2351`)
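
The empty-string rule for the JSON codec can be sketched in a few lines. This is an illustration of the behaviour described above, not tskit's codec; `decode_json_metadata` is a hypothetical name.

```python
import json


def decode_json_metadata(raw: bytes) -> dict:
    """Decode JSON metadata bytes, treating b"" as an empty object."""
    if raw == b"":
        return {}
    return json.loads(raw.decode("utf-8"))


empty_row = decode_json_metadata(b"")              # row written before a schema was set
filled_row = decode_json_metadata(b'{"name": "n0"}')
```

Rows that were written with empty metadata before a schema was applied therefore decode cleanly without being rewritten.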

[0.4.1] - 2022-01-11

Changes

Fixes

[0.4.0] - 2021-12-10

Breaking changes

  • The Tree.num_nodes method is now deprecated with a warning, because it confusingly returns the number of nodes in the entire tree sequence, rather than in the tree. Text summaries of trees (e.g. str(tree)) now return the number of nodes in the tree, not in the entire tree sequence (:user:`hyanwong`, :issue:`1966`, :pr:`1968`)
  • The CLI info command now gives more detailed information on the tree sequence (:user:`benjeffery`, :pr:`1611`)
  • 64 bits are now used to store the sizes of ragged table columns such as metadata, allowing them to hold more data. This change is fully backwards and forwards compatible for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with large offset arrays that require 64 bits will fail to load in previous versions with error _tskit.FileFormatError: An incompatible type for a column was found in the file. (:user:`jeromekelleher`, :issue:`343`, :issue:`1527`, :issue:`1528`, :issue:`1530`, :issue:`1554`, :issue:`1573`, :issue:`1589`, :issue:`1598`, :issue:`1628`, :pr:`1571`, :pr:`1579`, :pr:`1585`, :pr:`1590`, :pr:`1602`, :pr:`1618`, :pr:`1620`, :pr:`1652`).
  • The Tree class now conceptually has an extra node, the "virtual root" whose children are the roots of the tree. The quintuply linked tree arrays (parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array) all have one extra element. (:user:`jeromekelleher`, :issue:`1691`, :pr:`1704`).
  • Tree traversal orders returned by the nodes method have changed when there are multiple roots. Previously orders were defined locally for each root, but are now defined globally across all roots. (:user:`jeromekelleher`, :pr:`1704`).
  • Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence. TableCollection.sort no longer sorts individuals. (:user:`benjeffery`, :issue:`1774`, :pr:`1789`)
  • Metadata encoding errors now raise MetadataEncodingError (:user:`benjeffery`, :issue:`1505`, :pr:`1827`).
  • For TreeSequence.samples all arguments after population are now keyword only (:user:`benjeffery`, :issue:`1715`, :pr:`1831`).
  • Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus. As the old method was not generating standards-compliant output, it seems unlikely that it was used by anyone. Calls to to_nexus will result in a NotImplementedError, informing users of the change. See below for details on as_nexus.
  • Change default value for missing_data_char in the TreeSequence.haplotypes method from "-" to "N". This is a more idiomatic usage to indicate missing data rather than a gap in an alignment. (:user:`jeromekelleher`, :issue:`1893`, :pr:`1894`)
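
To make the ragged-column entry above concrete: a ragged column stores all rows in one flat buffer plus an array of offsets, and the final offset equals the total data size. The sketch below uses plain Python lists and bytes rather than tskit's own arrays, and all function names are hypothetical.

```python
UINT32_MAX = 2**32 - 1


def pack_ragged(rows):
    """Pack a list of bytes objects into (flat data, offsets)."""
    data = b"".join(rows)
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return data, offsets


def unpack_row(data, offsets, j):
    """Recover row j from the flat buffer."""
    return data[offsets[j]:offsets[j + 1]]


def fits_in_32_bits(offsets):
    """True if every offset can be stored as an unsigned 32-bit integer."""
    return offsets[-1] <= UINT32_MAX


data, offsets = pack_ragged([b"ab", b"", b"cde"])  # offsets == [0, 2, 2, 5]
third_row = unpack_row(data, offsets, 2)           # b"cde"
```

A column whose total size exceeds `UINT32_MAX` bytes is the case that needs 64-bit offsets.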

Features

Fixes

[0.3.7] - 2021-07-08

Features

[0.3.6] - 2021-05-14

Breaking changes

  • Mutation.position and Mutation.index which were deprecated in 0.2.2 (Sep '19) have been removed.

Features

Changes

  • In drawing methods max_tree_height and tree_height_scale have been deprecated in favour of max_time and time_scale (:user:`benjeffery`, :issue:`1262`, :pr:`1331`).

Fixes

[0.3.5] - 2021-03-16

Features

Changes

Breaking changes

[0.3.4] - 2020-12-02

Minor bugfix release.

Bugfixes

[0.3.3] - 2020-11-27

Features

Bugfixes

Breaking changes

  • The argument to ts.dump and tskit.load has been renamed from path to file.
  • All arguments to Tree.newick() except precision are now keyword-only.
  • Renamed ts.trait_regression to ts.trait_linear_model.

[0.3.2] - 2020-09-29

Breaking changes

  • The argument order of Tree.unrank and combinatorics.num_labellings now positions the number of leaves before the tree rank (:user:`daniel-goldstein`, :issue:`950`, :pr:`978`)
  • Change several methods (simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify() can be made consistent (:user:`hyanwong`, :issue:`374`, :issue:`846`, :pr:`851`)
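
For readers unfamiliar with keyword-only parameters, the sketch below shows the Python mechanism. `simplify_like` is a hypothetical stand-in with an illustrative signature, not the actual signature of any tskit method.

```python
def simplify_like(samples=None, *, filter_sites=True, keep_unary=False):
    """Hypothetical stand-in: parameters after `*` must be passed by name."""
    return {
        "samples": samples,
        "filter_sites": filter_sites,
        "keep_unary": keep_unary,
    }


result = simplify_like([0, 1], keep_unary=True)  # accepted

try:
    simplify_like([0, 1], True)  # a second positional argument is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Because callers must name these parameters, their order in the signature can change later without breaking existing code.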

Features

[0.3.1] - 2020-09-04

Bugfixes

[0.3.0] - 2020-08-27

Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others.

Breaking changes

  • The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
  • File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
  • Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (:user:`jeromekelleher`, :pr:`709`).
  • The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (:user:`jeromekelleher`, :issue:`500`, :pr:`694`).
  • The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods. (:user:`benjeffery`, :issue:`716`, :pr:`794`)

New features

Bugfixes

Deprecated

  • The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.
  • For TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument is deprecated and replaced with isolated_as_missing. Note that to get the same behaviour impute_missing_data=True should be replaced with isolated_as_missing=False. (:user:`benjeffery`, :issue:`716`, :pr:`794`)
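
Because the two flags have opposite meanings, migrating call sites means inverting the value as well as renaming the argument. The following is a minimal sketch of that translation; `migrate_kwargs` is a hypothetical helper, not part of tskit.

```python
def migrate_kwargs(kwargs):
    """Return a copy of kwargs with impute_missing_data replaced by
    the equivalent isolated_as_missing value (the logical inverse)."""
    kwargs = dict(kwargs)
    if "impute_missing_data" in kwargs:
        if "isolated_as_missing" in kwargs:
            raise ValueError(
                "Pass only one of impute_missing_data and isolated_as_missing"
            )
        kwargs["isolated_as_missing"] = not kwargs.pop("impute_missing_data")
    return kwargs


migrated = migrate_kwargs({"impute_missing_data": True})
# migrated == {"isolated_as_missing": False}
```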

[0.2.3] - 2019-11-22

Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.

New features

Bugfixes

[0.2.2] - 2019-09-01

Minor bugfix release.

Relaxes overly-strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see :issue:`351`).

New features

Bugfixes

[0.2.1] - 2019-08-23

Major feature release, adding support for population genetic statistics, improved VCF output and many other features.

Note: Version 0.2.0 was skipped because of an error uploading to PyPI which could not be undone.

Breaking changes

  • Genotype arrays returned by TreeSequence.variants and TreeSequence.genotype_matrix have changed from unsigned 8 bit values to signed 8 bit values to accommodate missing data (see :issue:`144` for discussion). Specifically, the dtype of the genotype arrays has changed from NumPy uint8 to int8. This should not affect client code in any way unless it specifically depends on the type of the returned numpy array.
  • The VCF written by write_vcf is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, and REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form tsk_j for sample ID j. Most of the legacy behaviour can be recovered with new options, however.
  • The positional parameter reference_sets in genealogical_nearest_neighbours and mean_descendants TreeSequence methods has been renamed to sample_sets.
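
The move to signed genotype values follows from representing missing data as a negative sentinel. The sketch below uses the standard library `array` module rather than NumPy to show that a signed 8-bit container accepts -1 while an unsigned one does not. The -1 value matches tskit.MISSING_DATA; the rest is illustrative.

```python
from array import array

MISSING = -1  # sentinel for missing data (tskit.MISSING_DATA is -1)

# Signed 8-bit storage accepts the sentinel
signed_genotypes = array("b", [0, 1, MISSING])

# Unsigned 8-bit storage cannot represent it
try:
    array("B", [0, 1, MISSING])
    unsigned_ok = True
except OverflowError:
    unsigned_ok = False
```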

New features

Deprecated

Bugfixes

[0.1.5] - 2019-03-27

This release removes support for Python 2, adds more flexible tree access and a new tskit command line interface.

New features

Bugfixes

[0.1.4] - 2019-02-01

Minor feature update. Using the C API 0.99.1.

New features

  • Add interface for setting TableCollection.sequence_length: #107
  • Add support for building and dropping TableCollection indexes: #108

[0.1.3] - 2019-01-14

Bugfix release.

Bugfixes

  • Fix missing provenance schema: #81

[0.1.2] - 2019-01-14

Bugfix release.

Bugfixes

  • Fix memory leak in table collection. #76

[0.1.1] - 2019-01-11

Fixes broken distribution tarball for 0.1.0.

[0.1.0] - 2019-01-11

Initial release after separation from msprime 0.6.2. Code that reads tree sequence files and processes them should be able to work without changes.

Breaking changes

  • Removal of the previously deprecated sort_tables, simplify_tables and load_tables functions. All code should change to using corresponding TableCollection methods.
  • Rename SparseTree class to Tree.

[1.1.0a1] - 2019-01-10

Initial alpha version posted to PyPI for bootstrapping.

[0.0.0] - 2019-01-10

Initial extraction of tskit code from msprime. Relicense to MIT.

Code copied at hash 29921408661d5fe0b1a82b1ca302a8b87510fd23