Benchmarks
zhhz ships the same Rust conversion core in four channels (CLI, Rust library, npm/WebAssembly, Python). The numbers below compare each channel’s throughput on representative Chinese text.
Methodology
- Corpus: a 5.18 MiB mixed CJK / Latin corpus (~50 / 50 by character count), built by repeating 10 hand-written news and literary sentences.
- Configs tested: s2t (no phrases), s2twp (Taiwan phrase projection, the heaviest), t2s (reverse direction, no phrases).
- Method: each channel warms up 3 times, runs 5 timed iterations, and reports the median wall time.
- Platform: M2 (arm64), Node 22, CPython 3.10, Rust 1.83 release build.
- Source: tests/bench-node.mjs for npm; cargo run --release --example bench_perf for the native CLI; the Python number uses the convert() API in a tight loop with the same corpus.
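The method above (3 warm-ups, 5 timed iterations, median wall time) can be sketched roughly as follows. This is a minimal illustration, not the contents of tests/bench-node.mjs: the helper names median and medianThroughput are hypothetical, and the sketch assumes that MB means 10^6 bytes of UTF-8 input.

```javascript
// Hypothetical harness; names are illustrative and not taken from the repo.

// Median of a list of numbers (mean of the two middle values for even length).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs fn(input) `warmups` times untimed, then `runs` times timed,
// and returns throughput in MB/s based on the median wall time.
function medianThroughput(fn, input, { warmups = 3, runs = 5 } = {}) {
  // Assumption: throughput is measured against the UTF-8 size of the input.
  const bytes = new TextEncoder().encode(input).length;

  for (let i = 0; i < warmups; i++) fn(input);

  const timesMs = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn(input);
    timesMs.push(performance.now() - start);
  }

  const seconds = median(timesMs) / 1000;
  return bytes / 1e6 / seconds;
}
```

For the npm channel, fn could be something like (s) => c.convert(s) with a Converter built beforehand, so that construction cost stays outside the timed region.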
Results (MB/s, median of 5 runs)
| config | CLI (native) | npm (WASM) | Deno (WASM) | Python (PyO3) | Rust (rlib) |
|---|---|---|---|---|---|
| s2t | 88 | 63 | 58 | ~85 (same engine) | ~85 (same engine) |
| s2twp | ~88 | 41 | 39 | ~85 | ~85 |
| t2s | ~88 | 104 | 108 | ~85 | ~85 |
The CLI / Python / Rust numbers are all roughly the same — they share the Rust conversion core; the per-binding overhead is small.
The WASM columns (npm + Deno) are interesting:
- t2s is faster than the native CLI (~120-125 %). The WASM run has no subprocess overhead (no fork + exec + stdout pipe): the conversion runs in-process with the dictionaries already loaded.
- Deno and Node.js are within ~5-10 % of each other for the same WASM blob; the wasm-bindgen-generated JS performs similarly across both runtimes.
- s2twp is ~47 % of native. Taiwan phrase projection is the most WASM-unfriendly config because it does a lot of string scanning and rebuilding. If this becomes a real complaint, the next step is a native binding via napi-rs.
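The relative figures quoted above can be recomputed from the results table. The snippet below is a small sketch of that arithmetic; the results object and the percentOfNative helper are illustrative names, and because the native CLI figure is itself approximate (~88 MB/s), the percentages are only indicative.

```javascript
// Median throughput in MB/s, copied from the results table.
// The native CLI figure is approximate (~88) in the source table.
const results = {
  s2t: { cli: 88, npm: 63, deno: 58 },
  s2twp: { cli: 88, npm: 41, deno: 39 },
  t2s: { cli: 88, npm: 104, deno: 108 },
};

// Throughput of a WASM channel as a percentage of the native CLI,
// rounded to the nearest integer.
function percentOfNative(config, channel) {
  const row = results[config];
  return Math.round((row[channel] / row.cli) * 100);
}

for (const config of Object.keys(results)) {
  console.log(
    config,
    `npm: ${percentOfNative(config, "npm")} %`,
    `deno: ${percentOfNative(config, "deno")} %`
  );
}
```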
One-shot vs instance (npm)
For the npm channel specifically, the per-call cost of building a Converter instance matters:
// Slower: each convert() call builds a new Converter (~1.3x slower).
import { convert } from "zhhz";
convert(text, "s2t");
// Faster: build once, reuse many times.
import { Converter } from "zhhz";
const c = new Converter("s2t");
c.convert(text);
The benchmark script (tests/bench-node.mjs) measures both. For a hot loop over many texts, build the Converter once at the top.
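One way to apply the "build once, reuse" advice when several configs are in play is to cache one instance per config. The sketch below is a hypothetical pattern, not part of the zhhz API: makeConverterCache and convertAll are illustrative names, and the factory is injected so the snippet runs without the package installed. With zhhz, the factory would presumably be (config) => new Converter(config).

```javascript
// Hypothetical helper: caches one converter per config name.
// `create` is injected so the sketch does not depend on zhhz directly.
function makeConverterCache(create) {
  const cache = new Map();
  return function getConverter(config) {
    let converter = cache.get(config);
    if (converter === undefined) {
      converter = create(config); // construction cost is paid once per config
      cache.set(config, converter);
    }
    return converter;
  };
}

// Converts many texts with a single, reused converter instance.
function convertAll(texts, config, getConverter) {
  const converter = getConverter(config);
  return texts.map((text) => converter.convert(text));
}
```

The cache only pays off when the same config is requested repeatedly; for a single call, the one-shot convert() is simpler.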
Why no full opencc-js comparison?
opencc-js doesn’t expose a programmatic benchmark surface comparable to zhhz’s. An apples-to-apples comparison would require either re-implementing opencc-js’s dictionary loader (significant work) or relying on published numbers from third parties.
The points that are comparable:
- Both opencc-js and zhhz compile the same OpenCC dictionaries to the same target (16 configs, same FMM segmentation).
- opencc-js is JS-based (no WASM); zhhz is WASM. On modern V8, JIT-compiled JS can be competitive with WASM for simple trie walks; we haven’t measured this directly.
- The one thing zhhz has that opencc-js doesn’t: script-variant detection (detect() returning {region, confidence}). opencc-js has no equivalent.
Why no full opencc binary comparison?
We have it, internally: see examples/parity.rs (CI step “Differential parity against opencc”) which runs opencc 1.3.1 against the same dictionary data. Byte-for-byte equality on all 538 supported-config cases is verified on every PR.
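The idea behind a differential parity check can be illustrated in a few lines. The real check lives in Rust (examples/parity.rs); the JavaScript below is only a sketch of the technique, with hypothetical names (firstByteMismatch, runParity) and with both implementations passed in as plain functions of the form (config, input) => string.

```javascript
// Returns the offset of the first differing UTF-8 byte, or -1 if the
// two strings are byte-for-byte equal.
function firstByteMismatch(a, b) {
  const x = new TextEncoder().encode(a);
  const y = new TextEncoder().encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return i;
  }
  // Equal prefix: the strings differ only if one is longer than the other.
  return x.length === y.length ? -1 : n;
}

// Runs every case through both implementations and collects mismatches.
// `reference` and `candidate` are assumptions of this sketch, e.g. thin
// wrappers around opencc and zhhz.
function runParity(cases, reference, candidate) {
  const failures = [];
  for (const { config, input } of cases) {
    const offset = firstByteMismatch(
      reference(config, input),
      candidate(config, input)
    );
    if (offset !== -1) failures.push({ config, input, offset });
  }
  return failures;
}
```

An empty failures array corresponds to full parity on the given cases.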
For raw MB/s, the native CLI v0.7.7 cross-platform report is in the release notes; for the npm channel, the data above is the canonical local measurement.
See also
- Why zhhz: design goals, scope
- Node.js / npm API: npm install zhhz
- Python integration: pip install zhhz
- CLI reference: zhhz --bench