zhhz for Node.js

The npm package zhhz ships the same conversion engine as the zhhz CLI, compiled to WebAssembly. OpenCC dictionaries are baked into the .wasm blob, so there is no data directory to ship alongside and no network fetch at startup — npm install zhhz is the entire setup.

The API surface is strictly richer than opencc-js:

Capability zhhz opencc-js
16 OpenCC conversion configs
Custom words (array form)
Custom words (string form)
Reusable converter instance
Per-instance convertWithCustom
Script-variant detection
Config / locale introspection partial
Semantic region flags (from/to)
Zero native dependencies
Same bytes in / same bytes out as the CLI partial

Install

npm install zhhz

Requires Node.js ≥ 18 (the package targets the ESM bundler artifact).

Quick start (ESM)

import { convert, detect, Converter, listConfigs } from "zhhz";

// One-shot
console.log(convert("汉字", "s2t")); // 漢字

// Script-variant detection (unique to zhhz)
console.log(detect("他去了西維珍尼亞州"));
// { region: "cn-hk", confidence: 70 }

// Reusable instance — no need to rebuild per call
const c = new Converter("s2twp");
console.log(c.convert("信息"));     // 資訊
console.log(c.convert("鼠标"));     // 滑鼠  (Taiwan phrase projection)

// One-line: which configs are available?
console.log(listConfigs());
// ["s2t", "t2s", "s2tw", "tw2s", "s2hk", "hk2s", "s2twp", "tw2sp",
//  "s2hkp", "hk2sp", "t2tw", "tw2t", "t2hk", "hk2t", "t2jp", "jp2t"]

Quick start (CommonJS)

(async () => {
  const { convert, Converter } = await import("zhhz");
  // or: const { convert, Converter } = require("zhhz");
  console.log(convert("汉字", "s2t")); // 漢字
  const c = new Converter("s2twp");
  console.log(c.convert("信息"));     // 資訊
})();

Conversion configs

All 16 OpenCC configs are supported. The phrase-aware variants (s2twp, tw2sp, s2hkp, hk2sp) also convert regional vocabulary, not just individual characters: for example, s2twp maps 鼠标 → 滑鼠 and 信息 → 資訊.

Config Direction
s2t Simplified → Traditional (OpenCC standard)
t2s Traditional (OpenCC standard) → Simplified
s2tw Simplified → Traditional (Taiwan)
tw2s Traditional (Taiwan) → Simplified
s2hk Simplified → Traditional (Hong Kong)
hk2s Traditional (Hong Kong) → Simplified
s2twp Simplified → Traditional (Taiwan, with phrases)
tw2sp Traditional (Taiwan, with phrases) → Simplified
s2hkp Simplified → Traditional (Hong Kong, with phrases)
hk2sp Traditional (Hong Kong, with phrases) → Simplified
t2tw Traditional (OpenCC standard) → Traditional (Taiwan)
tw2t Traditional (Taiwan) → Traditional (OpenCC standard)
t2hk Traditional (OpenCC standard) → Traditional (Hong Kong)
hk2t Traditional (Hong Kong) → Traditional (OpenCC standard)
t2jp Traditional → Japanese Shinjitai
jp2t Japanese Shinjitai → Traditional

Custom words

Custom-word overrides take the highest priority during conversion and can be supplied in two equivalent forms:

import { convert_with_custom, Converter } from "zhhz";

// Array form (preferred in JS)
convert_with_custom("买软件", "s2t", [["软件", "軟體"]]);
// "買軟體"

// String form — same as opencc-js's DictLike. Entries separated by "|",
// each entry is "key value" (first space splits key from value).
convert_with_custom("买软件", "s2t", "软件 軟體|苹果 蘋果");
// "買軟體"
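The string form can be unpacked into the array form with a few lines of plain JavaScript. The sketch below is illustrative only: parseCustomWords is a hypothetical helper, not a zhhz export, and it assumes exactly the grammar described in the comment above (entries separated by "|", first space splits key from value). How zhhz itself treats malformed entries is not documented here, so the sketch simply skips them.

```javascript
// Hypothetical helper (not part of zhhz): turn "key value|key value"
// into [[key, value], ...] following the grammar described above.
function parseCustomWords(spec) {
  const pairs = [];
  for (const entry of spec.split("|")) {
    const space = entry.indexOf(" ");
    // Assumption: entries with no space, an empty key, or an empty value are skipped.
    if (space <= 0 || space === entry.length - 1) continue;
    // Only the first space splits; any later spaces stay in the value.
    pairs.push([entry.slice(0, space), entry.slice(space + 1)]);
  }
  return pairs;
}

console.log(parseCustomWords("软件 軟體|苹果 蘋果"));
// [["软件", "軟體"], ["苹果", "蘋果"]]
```

The result has the same shape as the array form, so it could be passed wherever the array form is accepted.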

On the Converter instance, you can inject custom words per call or bake them into a new instance:

const c = new Converter("s2t");

// Per call — original instance unchanged
console.log(c.convertWithCustom("买软件", [["软件", "軟體"]])); // 買軟體
console.log(c.convert("买软件"));                              // 買軟件 (no custom)

// Baked in — new instance with custom always applied
const cCustom = c.withCustom([["软件", "軟體"]]);
console.log(cCustom.convert("买软件")); // 買軟體

Script-variant detection

import { detect } from "zhhz";

detect("汉字计算机软件");
// { region: "cn-s", confidence: 90 }

detect("漢字計算機軟體");
// { region: "cn-t", confidence: 80 }

detect("他去了西維珍尼亞州");
// { region: "cn-hk", confidence: 70 }

detect("こんにちは世界");
// { region: "jp-n", confidence: 80 }

detect("hello world");
// null  (no CJK characters)

region is one of cn-s / cn-t / cn-tw / cn-hk / jp-n / jp-t; confidence is 0–100.
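Because detect returns either null or an object with region and confidence, callers that run it over several chunks of text (for example, one call per paragraph) need to handle both cases. The following is a minimal sketch of one way to aggregate such results; dominantRegion is a hypothetical helper, and summing confidence values per region is an assumption made for illustration, not something zhhz prescribes.

```javascript
// Hypothetical helper (not part of zhhz): given an array of detect() results,
// return the region with the highest summed confidence, or null if every
// result was null.
function dominantRegion(detections) {
  const totals = new Map();
  for (const d of detections) {
    if (d === null) continue; // null means no CJK characters were found
    totals.set(d.region, (totals.get(d.region) ?? 0) + d.confidence);
  }
  let best = null;
  for (const [region, score] of totals) {
    if (best === null || score > best.score) best = { region, score };
  }
  return best;
}

// Result shapes copied from the examples above.
const results = [
  { region: "cn-s", confidence: 90 },
  null,
  { region: "cn-t", confidence: 80 },
  { region: "cn-s", confidence: 40 },
];
console.log(dominantRegion(results)); // { region: "cn-s", score: 130 }
```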

Semantic region flags

The zhhz CLI accepts semantic region flags such as --from cn-s --to cn-tw, so you do not have to memorise config names. The same lookup is exposed to JS:

import { configForRegionPair, Converter } from "zhhz";

configForRegionPair("cn-s", "cn-tw"); // "s2twp"  (phrase-aware variant)
configForRegionPair("cn-s", "cn-hk"); // "s2hkp"
configForRegionPair("cn-t", "cn-s");  // "t2s"

// Or directly build a Converter from a region pair:
const c = Converter.forRegion("cn-s", "cn-tw");
console.log(c.config);  // "s2twp"
console.log(c.convert("鼠标")); // 滑鼠
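Detection and region-pair lookup combine naturally: detect the source variant, then look up the config that converts it to a target region. The sketch below shows one possible shape for that glue code. planConversion, its minConfidence threshold of 60, and the stand-in lookup table are all hypothetical; the lookup function is passed in as a parameter so the logic can be read and run without the package installed.

```javascript
// Hypothetical helper (not part of zhhz): decide which config, if any, to apply.
// `detection` is shaped like a detect() result: { region, confidence } or null.
// `lookupConfig` is expected to behave like configForRegionPair(from, to).
function planConversion(detection, target, lookupConfig, minConfidence = 60) {
  if (detection === null) return null;                   // no CJK text detected
  if (detection.confidence < minConfidence) return null; // too uncertain to act on
  if (detection.region === target) return null;          // already in the target variant
  return lookupConfig(detection.region, target);
}

// Stand-in for configForRegionPair, limited to the pairs documented above.
const documentedPairs = { "cn-s>cn-tw": "s2twp", "cn-s>cn-hk": "s2hkp", "cn-t>cn-s": "t2s" };
const lookup = (from, to) => documentedPairs[`${from}>${to}`] ?? null;

console.log(planConversion({ region: "cn-t", confidence: 80 }, "cn-s", lookup)); // t2s
console.log(planConversion(null, "cn-s", lookup));                               // null
```

With the package installed, the call would presumably look like planConversion(detect(text), "cn-s", configForRegionPair). What configForRegionPair returns for an unsupported pair is not stated here, so callers should check the result before constructing a Converter.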

Introspection

import { listConfigs, listLocales } from "zhhz";

listConfigs();
// ["s2t", "t2s", "s2tw", "tw2s", "s2hk", "hk2s", "s2twp", "tw2sp",
//  "s2hkp", "hk2sp", "t2tw", "tw2t", "t2hk", "hk2t", "t2jp", "jp2t"]

listLocales();
// ["cn-s", "cn-t", "cn-tw", "cn-hk", "jp-t", "jp-n"]
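One use for the introspection calls is validating user input (a CLI argument, a request parameter) before building a Converter. This README does not say how the Converter constructor reacts to an unknown config name, so the sketch below checks up front. resolveConfig is a hypothetical helper, and the trimming and lower-casing are assumptions about what a forgiving front end might want.

```javascript
// Hypothetical helper (not part of zhhz): check a user-supplied config name
// against a list such as the one returned by listConfigs().
function resolveConfig(name, available) {
  const normalized = name.trim().toLowerCase();
  if (!available.includes(normalized)) {
    throw new RangeError(
      `Unknown config "${name}". Available: ${available.join(", ")}`
    );
  }
  return normalized;
}

// A subset of the list shown above, hard-coded so the sketch runs on its own.
const available = ["s2t", "t2s", "s2twp", "tw2sp"];
console.log(resolveConfig(" S2TWP ", available)); // s2twp
```

In real use, the second argument would be the array returned by listConfigs(), and the returned name would be handed to new Converter(...).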

Notes

License

Apache 2.0. Dictionary data vendored from BYVoid/OpenCC at the pinned commit in data/UPSTREAM.