pg_pinyin
pg_pinyin: Pinyin romanization and search helpers for PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 2190 | pg_pinyin | pg_pinyin | 0.0.4 | FTS | MIT | Rust |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --s-d-r | No | Yes | No | Yes | Yes | No |
| Relationships | |
|---|---|
| Schemas | pinyin |
| See Also | zhparser, pg_search, pg_trgm, pg_bigm, pgroonga, pgroonga_database, pg_tokenizer, fuzzystrmatch |
The optional tokenizer-input overload can integrate with pg_search. pgrx is patched to 0.18.1 for this package.
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | pg_pinyin | - |
| RPM | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | pg_pinyin_$v | - |
| DEB | PIGSTY | 0.0.4 | 18, 17, 16, 15, 14 | postgresql-$v-pinyin | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el8.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el9.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el9.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el10.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| el10.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d12.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d12.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d13.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| d13.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u22.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u22.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u24.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u24.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u26.x86_64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
| u26.aarch64 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 | PIGSTY 0.0.4 |
Source
pig build pkg pg_pinyin; # build rpm/deb
Install
Make sure the PGDG and PIGSTY repos are available:
pig repo add pgsql -u # add both repos and update cache
Install this extension with pig:
pig install pg_pinyin; # install via package name, for the active PG version
pig install pg_pinyin -v 18; # install for PG 18
pig install pg_pinyin -v 17; # install for PG 17
pig install pg_pinyin -v 16; # install for PG 16
pig install pg_pinyin -v 15; # install for PG 15
pig install pg_pinyin -v 14; # install for PG 14
Create this extension with:
CREATE EXTENSION pg_pinyin;
Usage
Sources: pg_pinyin upstream README, Chinese README, local metadata.
pg_pinyin converts Chinese text to Pinyin, either character by character or by word. It is useful for generated search columns, trigram search, and pg_search BM25 queries that need Pinyin input.
CREATE EXTENSION pg_pinyin;
Functions
| Function | Description |
|---|---|
| pinyin_char_romanize(text) | Character-level Pinyin romanization |
| pinyin_char_romanize(text, suffix text) | Character-level romanization with a custom dictionary suffix |
| pinyin_word_romanize(text) | Word-level Pinyin romanization |
| pinyin_word_romanize(text, suffix text) | Word-level romanization with a custom dictionary suffix |
| pinyin_word_romanize(tokenizer_input anyelement) | Word-level romanization from a pg_search tokenizer input such as name::pdb.icu::text[] |
| pinyin_word_romanize(tokenizer_input anyelement, suffix text) | Tokenizer-input romanization with a custom dictionary suffix |
| pinyin_regex_phrase(text, slope integer DEFAULT NULL, max_expansions integer DEFAULT NULL, generated_pinyin boolean DEFAULT false) | pg_search query helper returning pdb.query, available when pg_search was enabled before CREATE EXTENSION pg_pinyin |
| pinyin_regex_phrase_patterns(text, generated_pinyin boolean DEFAULT false) | Internal helper returning regex phrase tokens as text[] |
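As a quick way to see how the character-level and word-level functions in the table differ, the two can be called side by side on the same input. This is a minimal sketch: it assumes pg_pinyin is already created in the current database and that the functions live in schema public, as the examples on this page do. The explicit ::text cast is there only to make sure the text overload is chosen rather than the anyelement one. The exact output format is not shown here because this page does not document it.

```sql
-- Assumption: CREATE EXTENSION pg_pinyin has already been run in this database.
-- The ::text casts select the plain-text overloads explicitly.
SELECT public.pinyin_char_romanize('郑爽ABC'::text) AS by_character,
       public.pinyin_word_romanize('郑爽ABC'::text) AS by_word;
```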
Generated Column + Trigram Search
CREATE EXTENSION IF NOT EXISTS pg_pinyin;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE voice (
    id bigserial PRIMARY KEY,
    description text NOT NULL,
    pinyin text GENERATED ALWAYS AS (public.pinyin_char_romanize(description)) STORED
);
CREATE INDEX voice_pinyin_trgm_idx ON voice USING gin (pinyin gin_trgm_ops);
INSERT INTO voice (description) VALUES ('郑爽ABC');
SELECT id, description, pinyin FROM voice;
Word Tokenization + pg_search
For word-oriented search, use pinyin_word_romanize. When pg_search is available, the function can also consume tokenizer input, such as a value cast with ::pdb.icu::text[].
CREATE EXTENSION IF NOT EXISTS pg_search;
CREATE EXTENSION IF NOT EXISTS pg_pinyin;
CREATE TABLE voice (
    id bigserial PRIMARY KEY,
    description text NOT NULL,
    pinyin text GENERATED ALWAYS AS (public.pinyin_word_romanize(description)) STORED
);
CREATE INDEX voice_pinyin_bm25_idx
    ON voice
    USING bm25 (id, pinyin)
    WITH (key_field='id');
SELECT *
FROM voice
WHERE pinyin @@@ public.pinyin_regex_phrase('zhengshuang');
SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[]);
pinyin_regex_phrase returns pdb.query, so pg_search must be enabled in the database before pg_pinyin is created. If pg_pinyin is created first, upstream documents that the romanization functions are still installed, but pinyin_regex_phrase is installed as a stub that raises a clear exception.
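Because the creation order matters, it can help to inspect what is actually installed before relying on pinyin_regex_phrase. The sketch below uses only standard PostgreSQL catalogs (pg_extension, pg_proc) and the built-in pg_get_function_result. It reports which of the two extensions exist and what return type pinyin_regex_phrase was declared with. Treating a return type other than pdb.query as a sign of the stub is an assumption made here, not something upstream documents.

```sql
-- Which of the two extensions are present in this database?
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pg_search', 'pg_pinyin');

-- What return type was pinyin_regex_phrase declared with?
-- Assumption: anything other than pdb.query suggests the error stub was installed.
SELECT p.oid::regprocedure AS signature,
       pg_get_function_result(p.oid) AS returns
FROM pg_proc AS p
WHERE p.proname = 'pinyin_regex_phrase';
```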
Dictionary Tables
The extension seeds bundled dictionary tables under schema pinyin during CREATE EXTENSION pg_pinyin; no separate data-load step is needed for normal extension usage. The bundled data covers character mappings, word tokens, and word mappings.
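To see which dictionary tables were seeded, the pinyin schema can be listed through the standard information_schema. This is a generic catalog query rather than anything specific to pg_pinyin, and it only shows tables the current role has privileges on.

```sql
-- List the tables that exist under schema pinyin after CREATE EXTENSION.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'pinyin'
ORDER BY table_name;
```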
Provide custom dictionary tables in schema pinyin with a suffix. Calls using that suffix merge the base dictionary with the suffix tables, and suffix entries take priority.
CREATE TABLE IF NOT EXISTS pinyin.pinyin_mapping_suffix1 (
    character text PRIMARY KEY,
    pinyin text NOT NULL
);
CREATE TABLE IF NOT EXISTS pinyin.pinyin_words_suffix1 (
    word text PRIMARY KEY,
    pinyin text NOT NULL
);
INSERT INTO pinyin.pinyin_mapping_suffix1 (character, pinyin)
    VALUES ('郑', '|zhengx|')
    ON CONFLICT (character) DO UPDATE SET pinyin = EXCLUDED.pinyin;
INSERT INTO pinyin.pinyin_words_suffix1 (word, pinyin)
    VALUES ('郑爽', '|zhengx| |shuangx|')
    ON CONFLICT (word) DO UPDATE SET pinyin = EXCLUDED.pinyin;
SELECT public.pinyin_char_romanize('郑爽ABC', '_suffix1');
SELECT public.pinyin_word_romanize('郑爽ABC'::pdb.icu::text[], '_suffix1');
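One way to confirm that suffix entries take priority is to run the same input through the base dictionary and through the suffix dictionary in a single query. This sketch assumes the _suffix1 tables and rows from the statements above already exist. The expected values are not shown because the output format is not documented on this page.

```sql
-- Assumption: pinyin.pinyin_mapping_suffix1 has been created and populated as above.
-- If the suffix override is applied, the two columns should differ for this input.
SELECT public.pinyin_char_romanize('郑爽'::text) AS base_dictionary,
       public.pinyin_char_romanize('郑爽'::text, '_suffix1') AS with_suffix1;
```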