datasketches
datasketches : Approximate analytics sketches and aggregates for PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 4690 | datasketches
|
datasketches
|
1.7.0 |
FUNC
|
Apache-2.0
|
C++
|
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
--s-d-r
|
No
|
Yes
|
No
|
Yes
|
yes
|
no
|
Built against Apache DataSketches C++ core 5.0.0.
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY
|
1.7.0 |
18
17
16
15
14
|
datasketches |
- |
| RPM | PIGSTY
|
1.7.0 |
18
17
16
15
14
|
datasketches_$v |
- |
| DEB | PIGSTY
|
1.7.0 |
18
17
16
15
14
|
postgresql-$v-datasketches |
- |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
el8.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
el8.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
el9.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
el9.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
el10.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
el10.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
d12.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
d12.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
d13.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
d13.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u22.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u22.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u24.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u24.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u26.x86_64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
u26.aarch64
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
PIGSTY 1.7.0
|
Source
pig build pkg datasketches; # build rpm/debInstall
Make sure PGDG and PIGSTY repo available:
pig repo add pgsql -u # add both repo and update cacheInstall this extension with pig:
pig install datasketches; # install via package name, for the active PG version
pig install datasketches -v 18; # install for PG 18
pig install datasketches -v 17; # install for PG 17
pig install datasketches -v 16; # install for PG 16
pig install datasketches -v 15; # install for PG 15
pig install datasketches -v 14; # install for PG 14Create this extension with:
CREATE EXTENSION datasketches;Usage
Sources: README, latest release 1.7.0, Apache DataSketches
datasketches adds approximate analytics sketch types and aggregates to PostgreSQL. The upstream README lists CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches; the 1.7.0 release is the latest published GitHub release, while the default branch has already moved on to 1.8.0-SNAPSHOT.
CREATE EXTENSION datasketches;Core Sketch Families
cpc_sketchandhll_sketchfor approximate distinct counting.theta_sketchfor distinct counting plus set operations such as union, intersection, and A-not-B.aod_sketchfor tuple-style metrics keyed by identifiers with arrays of doubles.kll_*_sketchandquantiles_*_sketchfor quantiles, ranks, PMF, and CDF.frequent_strings_sketchfor heavy-hitter detection.
Common Patterns
Build a sketch from raw values:
SELECT cpc_sketch_build(1);
SELECT kll_float_sketch_build(value) FROM normal;Use one-shot approximate aggregates:
SELECT cpc_sketch_distinct(id) FROM random_ints_100m;Merge sketches across groups or cube dimensions:
SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5)
FROM kll_float_sketch_test;Run set operations on Theta sketches:
SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2))
FROM theta_set_op_test;Find frequent items above a threshold:
SELECT frequent_strings_sketch_result_no_false_negatives(
frequent_strings_sketch_build(9, value),
1000000
)
FROM zipf_1p1_8k_100m;Caveats
- Upstream documents PostgreSQL 9.6+ plus Boost 1.75.0 and DataSketches C++ core 5.0.0 or later as build dependencies.
- These are approximate structures meant to be mergeable across dimensions; they are not exact replacements for
COUNT(DISTINCT ...)or exact histograms.