pg_pinyin

pg_pinyin : Pinyin romanization and search helpers for PostgreSQL

Overview

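| ID | Extension | Package | Version | Category | License | Language |
|----|-----------|---------|---------|----------|---------|----------|
| 2190 | pg_pinyin | pg_pinyin | 0.0.4 | FTS | MIT | Rust |

| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|-----------|------------|-------------|-----------|---------|-------------|---------|
| --s-d-r | No | Yes | No | Yes | Yes | No |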
Relationships

Schemas: pinyin

See Also: zhparser, pg_search, pg_trgm, pg_bigm, pgroonga, pgroonga_database, pg_tokenizer, fuzzystrmatch

Note: the optional tokenizer-input overload of pinyin_word_romanize integrates with pg_search. Packaging note: pgrx is patched to 0.18.1.

Packages

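| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|------|------|---------|------------------------|-----------------|--------------|
| EXT | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | pg_pinyin | - |
| RPM | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | pg_pinyin_$v | - |
| DEB | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | postgresql-$v-pinyin | - |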

| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|------------|------|------|------|------|------|
| el8.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el8.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el9.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el9.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el10.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el10.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d12.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d12.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d13.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d13.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u22.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u22.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u24.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u24.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u26.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u26.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |

| Package | Version | OS | Org | Size | File |
|---------|---------|----|-----|------|------|
| pg_pinyin_18 | 0.0.4 | el8.x86_64 | pigsty | 3.1 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el8.x86_64.rpm |
| pg_pinyin_18 | 0.0.4 | el8.aarch64 | pigsty | 3.0 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el8.aarch64.rpm |
| pg_pinyin_18 | 0.0.4 | el9.x86_64 | pigsty | 3.1 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el9.x86_64.rpm |
| pg_pinyin_18 | 0.0.4 | el9.aarch64 | pigsty | 3.1 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el9.aarch64.rpm |
| pg_pinyin_18 | 0.0.4 | el10.x86_64 | pigsty | 3.1 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el10.x86_64.rpm |
| pg_pinyin_18 | 0.0.4 | el10.aarch64 | pigsty | 3.0 MiB | pg_pinyin_18-0.0.4-2PIGSTY.el10.aarch64.rpm |
| postgresql-18-pinyin | 0.0.4 | d12.x86_64 | pigsty | 2.6 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~bookworm_amd64.deb |
| postgresql-18-pinyin | 0.0.4 | d12.aarch64 | pigsty | 2.3 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~bookworm_arm64.deb |
| postgresql-18-pinyin | 0.0.4 | d13.x86_64 | pigsty | 2.6 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~trixie_amd64.deb |
| postgresql-18-pinyin | 0.0.4 | d13.aarch64 | pigsty | 2.3 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~trixie_arm64.deb |
| postgresql-18-pinyin | 0.0.4 | u22.x86_64 | pigsty | 2.8 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~jammy_amd64.deb |
| postgresql-18-pinyin | 0.0.4 | u22.aarch64 | pigsty | 2.7 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~jammy_arm64.deb |
| postgresql-18-pinyin | 0.0.4 | u24.x86_64 | pigsty | 2.8 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~noble_amd64.deb |
| postgresql-18-pinyin | 0.0.4 | u24.aarch64 | pigsty | 2.7 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~noble_arm64.deb |
| postgresql-18-pinyin | 0.0.4 | u26.x86_64 | pigsty | 2.8 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~resolute_amd64.deb |
| postgresql-18-pinyin | 0.0.4 | u26.aarch64 | pigsty | 2.7 MiB | postgresql-18-pinyin_0.0.4-2PIGSTY~resolute_arm64.deb |

Source

pig build pkg pg_pinyin;		# build rpm/deb

Install

Make sure the PGDG and PIGSTY repositories are available:

pig repo add pgsql -u   # add both repos and update the package cache

Install this extension with pig:

pig install pg_pinyin;		# install via package name, for the active PG version

pig install pg_pinyin -v 18;   # install for PG 18
pig install pg_pinyin -v 17;   # install for PG 17
pig install pg_pinyin -v 16;   # install for PG 16
pig install pg_pinyin -v 15;   # install for PG 15
pig install pg_pinyin -v 14;   # install for PG 14

Create this extension with:

CREATE EXTENSION pg_pinyin;

Usage

Sources: pg_pinyin upstream README, Chinese README, local metadata.

pg_pinyin converts Chinese text to Pinyin, either character by character or by word. It is useful for generated search columns, trigram search, and pg_search BM25 queries that need Pinyin input.
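To make the character-level idea concrete, here is a toy Python model. It is not pg_pinyin's implementation. The two-entry dictionary and the space-joined output format are assumptions chosen for illustration, and the real function's output format may differ. Each mapped character is replaced by its reading, and runs of unmapped characters (such as ABC) pass through unchanged.

```python
# Toy model of character-level romanization (NOT pg_pinyin's code).
# CHAR_DICT and the space-separated output format are hypothetical.
CHAR_DICT = {"郑": "zheng", "爽": "shuang"}


def char_romanize(text: str, char_dict: dict[str, str]) -> str:
    """Replace each mapped character with its reading; keep unmapped runs intact."""
    parts: list[str] = []
    unmapped = ""  # accumulates consecutive characters that have no dictionary entry
    for ch in text:
        if ch in char_dict:
            if unmapped:
                parts.append(unmapped)
                unmapped = ""
            parts.append(char_dict[ch])
        else:
            unmapped += ch
    if unmapped:
        parts.append(unmapped)
    return " ".join(parts)


print(char_romanize("郑爽ABC", CHAR_DICT))  # zheng shuang ABC
```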

CREATE EXTENSION pg_pinyin;

Functions

Function Description
pinyin_char_romanize(text) Character-level Pinyin romanization
pinyin_char_romanize(text, suffix text) Character-level romanization with a custom dictionary suffix
pinyin_word_romanize(text) Word-level Pinyin romanization
pinyin_word_romanize(text, suffix text) Word-level romanization with a custom dictionary suffix
pinyin_word_romanize(tokenizer_input anyelement) Word-level romanization from a pg_search tokenizer input such as name::pdb.icu::text[]
pinyin_word_romanize(tokenizer_input anyelement, suffix text) Tokenizer-input romanization with a custom dictionary suffix
pinyin_regex_phrase(text, slope integer DEFAULT NULL, max_expansions integer DEFAULT NULL, generated_pinyin boolean DEFAULT false) pg_search query helper returning pdb.query, available when pg_search was enabled before CREATE EXTENSION pg_pinyin
pinyin_regex_phrase_patterns(text, generated_pinyin boolean DEFAULT false) Internal helper returning regex phrase tokens as text[]

Generated Column + Trigram Search

CREATE EXTENSION IF NOT EXISTS pg_pinyin;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE voice (
  id bigserial PRIMARY KEY,
  description text NOT NULL,
  pinyin text GENERATED ALWAYS AS (public.pinyin_char_romanize(description)) STORED
);

CREATE INDEX voice_pinyin_trgm_idx ON voice USING gin (pinyin gin_trgm_ops);

INSERT INTO voice (description) VALUES ('郑爽ABC');
SELECT id, description, pinyin FROM voice;
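The sketch below approximates pg_trgm's documented trigram rules to show how it compares a Pinyin query against the stored column. The text is lowercased and split into alphanumeric words. Each word is padded with two leading spaces and one trailing space. Similarity is the number of shared trigrams divided by the size of their union. This is a simplified model for intuition, not pg_trgm's C implementation. In particular, it assumes only ASCII letters and digits count as word characters, which is enough for Pinyin strings.

```python
import re

# Simplified model of pg_trgm trigram extraction and similarity().
# Assumption: only ASCII letters and digits are word characters,
# which is narrower than real pg_trgm but enough for Pinyin.


def trigrams(text: str) -> set[str]:
    """Trigram set: each word padded with two leading spaces and one trailing space."""
    result: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i : i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


print(sorted(trigrams("cat")))              # ['  c', ' ca', 'at ', 'cat']
print(similarity("zheng shuang", "zheng"))  # 0.5
```

The GIN index built with gin_trgm_ops lets PostgreSQL accelerate LIKE, ILIKE, and the % similarity operator on the pinyin column.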

Word Tokenization + pg_search

For word-oriented search, use pinyin_word_romanize. When pg_search is available, it can consume tokenizer input such as pdb.icu::text[].

CREATE EXTENSION IF NOT EXISTS pg_search;
CREATE EXTENSION IF NOT EXISTS pg_pinyin;

CREATE TABLE voice (
  id bigserial PRIMARY KEY,
  description text NOT NULL,
  pinyin text GENERATED ALWAYS AS (public.pinyin_word_romanize(description)) STORED
);

CREATE INDEX voice_pinyin_bm25_idx
ON voice
USING bm25 (id, pinyin)
WITH (key_field='id');

SELECT *
FROM voice
WHERE pinyin @@@ public.pinyin_regex_phrase('zhengshuang');

SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[]);

Because pinyin_regex_phrase returns pdb.query, pg_search must be enabled in the database before pg_pinyin is created. According to the upstream README, if pg_pinyin is created first, the romanization functions are still installed. In that case pinyin_regex_phrase is installed as a stub that raises a clear exception.
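With tokenizer input, segmentation comes from the tokenizer, and romanization operates on the resulting tokens. The toy model below shows one plausible shape of that step. Each token is looked up in a word dictionary first and falls back to per-character lookups otherwise. The dictionaries, the token lists (standing in for what a tokenizer such as pdb.icu might emit), and the fallback rule are all assumptions for illustration. The upstream README does not describe pg_pinyin's internal algorithm. The entry 重庆 shows why a word-level lookup can matter: 重 is polyphonic, commonly read zhong, but read chong in the place name Chongqing.

```python
# Toy model of word-level romanization over pre-tokenized input
# (NOT pg_pinyin's algorithm). Dictionaries and tokens are hypothetical.
WORD_DICT = {"重庆": "chong qing"}         # word entry overrides per-character readings
CHAR_DICT = {"重": "zhong", "庆": "qing"}


def romanize_tokens(tokens: list[str], word_dict: dict[str, str],
                    char_dict: dict[str, str]) -> str:
    """Romanize each token: whole-word lookup first, then per-character lookup."""
    parts: list[str] = []
    for token in tokens:
        if token in word_dict:
            parts.append(word_dict[token])
        elif all(ch in char_dict for ch in token):
            parts.append(" ".join(char_dict[ch] for ch in token))
        else:
            parts.append(token)  # tokens with any unmapped character (e.g. "ABC") pass through
    return " ".join(parts)


# Hypothetical token lists standing in for tokenizer output.
print(romanize_tokens(["重庆", "ABC"], WORD_DICT, CHAR_DICT))  # chong qing ABC
print(romanize_tokens(["重"], WORD_DICT, CHAR_DICT))           # zhong
```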

Dictionary Tables

The extension seeds bundled dictionary tables under schema pinyin during CREATE EXTENSION pg_pinyin; no separate data-load step is needed for normal extension usage. The bundled data covers character mappings, word tokens, and word mappings.

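To customize the dictionaries, create tables in schema pinyin whose names end with a suffix, then pass that suffix to the romanization functions. Calls made with the suffix merge the base dictionary with the suffixed tables, and entries in the suffixed tables take priority.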

CREATE TABLE IF NOT EXISTS pinyin.pinyin_mapping_suffix1 (
  character text PRIMARY KEY,
  pinyin text NOT NULL
);

CREATE TABLE IF NOT EXISTS pinyin.pinyin_words_suffix1 (
  word text PRIMARY KEY,
  pinyin text NOT NULL
);

INSERT INTO pinyin.pinyin_mapping_suffix1 (character, pinyin)
VALUES ('郑', '|zhengx|')
ON CONFLICT (character) DO UPDATE SET pinyin = EXCLUDED.pinyin;

INSERT INTO pinyin.pinyin_words_suffix1 (word, pinyin)
VALUES ('郑爽', '|zhengx| |shuangx|')
ON CONFLICT (word) DO UPDATE SET pinyin = EXCLUDED.pinyin;

SELECT public.pinyin_char_romanize('郑爽ABC', '_suffix1');
SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[], '_suffix1');
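The priority rule for suffixed dictionaries behaves like a key-by-key overlay. Every base entry stays visible unless the suffix table has a row for the same key, in which case the suffix row wins. The sketch below models that documented rule with Python dictionaries. It is not how pg_pinyin reads its tables, and the base readings are hypothetical stand-ins for the bundled data.

```python
# Model of "suffix entries take priority" using plain dicts.
# BASE stands in for the bundled dictionary (hypothetical contents);
# SUFFIX mirrors the pinyin.pinyin_mapping_suffix1 row inserted above.
BASE = {"郑": "zheng", "爽": "shuang"}
SUFFIX = {"郑": "|zhengx|"}


def merge_dictionaries(base: dict[str, str], suffix: dict[str, str]) -> dict[str, str]:
    """Return base overlaid with suffix; suffix wins on duplicate keys."""
    merged = dict(base)    # copy so the base dictionary is left untouched
    merged.update(suffix)  # suffix rows replace base rows with the same key
    return merged


merged = merge_dictionaries(BASE, SUFFIX)
print(merged["郑"], merged["爽"])  # |zhengx| shuang
```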