Skip to content

[FEATURE] t-digest for go #96

@proost

Description

@proost

Summary

Add Go version t-digest.

Design

Double

Adding Double Precision t-digest. Because Go doesn’t support meta programming like C++ does, pairing (float32, uint32) and (float64, uint64) is difficult. In this time, i will add double precision only.

// Double provides an implementation of t-Digest double precision type for estimating quantiles and ranks.
// This implementation is based on the paper:
// Ted Dunning, Otmar Ertl. "Extremely Accurate Quantiles Using t-Digests"
// and the reference implementation: https://github.com/tdunning/t-digest
// NOTE: This implementation is similar to MergingDigest in the above Java implementation
type Double struct {
	min               float64
	max               float64
	centroids         []doublePrecisionCentroid
	buffer            []float64
	centroidsWeight   uint64
	centroidsCapacity int
	k                 uint16
	reverseMerge      bool
}

type doublePrecisionCentroid struct {
	mean   float64
	weight uint64
}

func (c *doublePrecisionCentroid) add(other doublePrecisionCentroid) {
	c.weight += other.weight
	c.mean += (other.mean - c.mean) * float64(other.weight) / float64(c.weight)
}

// Update updates a value to the TDigest
func (d *Double) Update(value float64) error

// Merge merges another Double into this one
func (d *Double) Merge(other *Double) error

// IsEmpty returns true if the TDigest has not seen any data
func (d *Double) IsEmpty() bool

// MinValue returns the minimum value seen by the TDigest
func (d *Double) MinValue() (float64, error)

// MaxValue returns the maximum value seen by the TDigest
func (d *Double) MaxValue() (float64, error)

// TotalWeight returns the total weight of all values
func (d *Double) TotalWeight() uint64

// K returns the compression parameter k
func (d *Double) K() uint16

// Rank computes the approximate normalized rank of the given value
func (d *Double) Rank(value float64) (float64, error)

// Quantile computes the approximate quantile value corresponding to the given normalized rank
func (d *Double) Quantile(rank float64) (float64, error)

// PMF returns an approximation to the Probability Mass Function (PMF)
// of the input stream.
func (d *Double) PMF(splitPoints []float64) ([]float64, error)

// CDF returns an approximation to the Cumulative Distribution Function (CDF)
// which is the cumulative analog of the PMF of the input stream.
func (d *Double) CDF(splitPoints []float64) ([]float64, error)

// String returns a human-readable summary of the TDigest
func (d *Double) String(shouldPrintCentroids bool) string

// SerializedSizeBytes computes the serialized size in bytes of the TDigest.
func (d *Double) SerializedSizeBytes(withBuffer bool) int

Serialization/Deserialization

  • I will use Encoder/Decoder pattern like it was before.

Release Schedule

I sent a PR and discussion opens. When finish the PR, I will start upload implementation.

I will upload 2 PRs.

  1. A PR for double precision t-digest.
  2. A PR for and serialization / deserialization.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions