Skip to content
datasketches

datasketches

datasketches : Approximate analytics sketches and aggregates for PostgreSQL

Overview

ID Extension Package Version Category License Language
4690
datasketches
datasketches
1.7.0
FUNC
Apache-2.0
C++
Attribute Has Binary Has Library Need Load Has DDL Relocatable Trusted
--s-d-r
No
Yes
No
Yes
yes
no

Built against Apache DataSketches C++ core 5.0.0.

Packages

Type Repo Version PG Major Compatibility Package Pattern Dependencies
EXT
PIGSTY
1.7.0
18
17
16
15
14
datasketches -
RPM
PIGSTY
1.7.0
18
17
16
15
14
datasketches_$v -
DEB
PIGSTY
1.7.0
18
17
16
15
14
postgresql-$v-datasketches -
Linux / PG PG18 PG17 PG16 PG15 PG14
el8.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el8.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el9.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el9.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el10.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
el10.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d12.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d12.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u26.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u26.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
Package Version OS ORG SIZE File URL
datasketches_18 1.7.0 el8.x86_64 pigsty 324.4 KiB datasketches_18-1.7.0-1PIGSTY.el8.x86_64.rpm
datasketches_18 1.7.0 el8.aarch64 pigsty 314.1 KiB datasketches_18-1.7.0-1PIGSTY.el8.aarch64.rpm
datasketches_18 1.7.0 el9.x86_64 pigsty 309.4 KiB datasketches_18-1.7.0-1PIGSTY.el9.x86_64.rpm
datasketches_18 1.7.0 el9.aarch64 pigsty 315.1 KiB datasketches_18-1.7.0-1PIGSTY.el9.aarch64.rpm
datasketches_18 1.7.0 el10.x86_64 pigsty 319.1 KiB datasketches_18-1.7.0-1PIGSTY.el10.x86_64.rpm
datasketches_18 1.7.0 el10.aarch64 pigsty 319.4 KiB datasketches_18-1.7.0-1PIGSTY.el10.aarch64.rpm
postgresql-18-datasketches 1.7.0 d12.x86_64 pigsty 918.1 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~bookworm_amd64.deb
postgresql-18-datasketches 1.7.0 d12.aarch64 pigsty 920.0 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~bookworm_arm64.deb
postgresql-18-datasketches 1.7.0 d13.x86_64 pigsty 943.3 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~trixie_amd64.deb
postgresql-18-datasketches 1.7.0 d13.aarch64 pigsty 944.0 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~trixie_arm64.deb
postgresql-18-datasketches 1.7.0 u22.x86_64 pigsty 1017.0 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~jammy_amd64.deb
postgresql-18-datasketches 1.7.0 u22.aarch64 pigsty 1020.8 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~jammy_arm64.deb
postgresql-18-datasketches 1.7.0 u24.x86_64 pigsty 977.8 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~noble_amd64.deb
postgresql-18-datasketches 1.7.0 u24.aarch64 pigsty 991.3 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~noble_arm64.deb
postgresql-18-datasketches 1.7.0 u26.x86_64 pigsty 945.1 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~resolute_amd64.deb
postgresql-18-datasketches 1.7.0 u26.aarch64 pigsty 951.7 KiB postgresql-18-datasketches_1.7.0-1PIGSTY~resolute_arm64.deb

Source

pig build pkg datasketches;		# build rpm/deb

Install

Make sure PGDG and PIGSTY repo available:

pig repo add pgsql -u   # add both repo and update cache

Install this extension with pig:

pig install datasketches;		# install via package name, for the active PG version

pig install datasketches -v 18;   # install for PG 18
pig install datasketches -v 17;   # install for PG 17
pig install datasketches -v 16;   # install for PG 16
pig install datasketches -v 15;   # install for PG 15
pig install datasketches -v 14;   # install for PG 14

Create this extension with:

CREATE EXTENSION datasketches;

Usage

Sources: README, latest release 1.7.0, Apache DataSketches

datasketches adds approximate analytics sketch types and aggregates to PostgreSQL. The upstream README lists CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches; the 1.7.0 release is the latest published GitHub release, while the default branch has already moved on to 1.8.0-SNAPSHOT.

CREATE EXTENSION datasketches;

Core Sketch Families

  • cpc_sketch and hll_sketch for approximate distinct counting.
  • theta_sketch for distinct counting plus set operations such as union, intersection, and A-not-B.
  • aod_sketch for tuple-style metrics keyed by identifiers with arrays of doubles.
  • kll_*_sketch and quantiles_*_sketch for quantiles, ranks, PMF, and CDF.
  • frequent_strings_sketch for heavy-hitter detection.

Common Patterns

Build a sketch from raw values:

SELECT cpc_sketch_build(1);
SELECT kll_float_sketch_build(value) FROM normal;

Use one-shot approximate aggregates:

SELECT cpc_sketch_distinct(id) FROM random_ints_100m;

Merge sketches across groups or cube dimensions:

SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)
FROM kll_float_sketch_test;

Run set operations on Theta sketches:

SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2))
FROM theta_set_op_test;

Find frequent items above a threshold:

SELECT frequent_strings_sketch_result_no_false_negatives(
  frequent_strings_sketch_build(9, value),
  1000000
)
FROM zipf_1p1_8k_100m;

Caveats

  • Upstream documents PostgreSQL 9.6+ plus Boost 1.75.0 and DataSketches C++ core 5.0.0 or later as build dependencies.
  • These are approximate structures meant to be mergeable across dimensions; they are not exact replacements for COUNT(DISTINCT ...) or exact histograms.
Last updated on