Rust library

zhhz is a regular Rust crate. The CLI is just one binary that calls the library; everything in src/ is also exposed as public API.

Quick start

Add to Cargo.toml:

[dependencies]
zhhz = "0.7"

Convert some text:

use zhhz::{Config, Converter};

let c = Converter::new(Config::S2t);
assert_eq!(c.convert("汉字"), "漢字");

Custom words override the built-in tables at the highest priority:

let c = Converter::with_custom(
    Config::S2t,
    &[("软件".into(), "軟體".into())],
);
assert_eq!(c.convert("软件"), "軟體");
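
One way to picture "highest priority" is a lookup that tries the custom table first at every position and falls back to the built-in table only when no custom entry matches. The sketch below is a toy model under that assumption, not zhhz's actual engine: `ToyConverter`, its two `HashMap` tables, and the greedy longest-match loop are all hypothetical names and choices made for illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary converter (hypothetical; not zhhz's engine).
struct ToyConverter {
    custom: HashMap<String, String>,
    builtin: HashMap<String, String>,
    max_key_chars: usize,
}

fn to_map(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs.iter().map(|&(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Longest key in `table` that is a prefix of `chars`, with its replacement.
fn longest_match<'a>(
    table: &'a HashMap<String, String>,
    chars: &[char],
    max_len: usize,
) -> Option<(usize, &'a str)> {
    (1..=max_len.min(chars.len())).rev().find_map(|len| {
        let key: String = chars[..len].iter().collect();
        table.get(&key).map(|v| (len, v.as_str()))
    })
}

impl ToyConverter {
    fn new(builtin: &[(&str, &str)], custom: &[(&str, &str)]) -> Self {
        let max_key_chars = builtin
            .iter()
            .chain(custom.iter())
            .map(|&(k, _)| k.chars().count())
            .max()
            .unwrap_or(1);
        ToyConverter { custom: to_map(custom), builtin: to_map(builtin), max_key_chars }
    }

    fn convert(&self, text: &str) -> String {
        let chars: Vec<char> = text.chars().collect();
        let mut out = String::new();
        let mut i = 0;
        while i < chars.len() {
            let rest = &chars[i..];
            // Custom entries are consulted first; built-ins are the fallback.
            let hit = longest_match(&self.custom, rest, self.max_key_chars)
                .or_else(|| longest_match(&self.builtin, rest, self.max_key_chars));
            match hit {
                Some((len, replacement)) => {
                    out.push_str(replacement);
                    i += len;
                }
                None => {
                    // No table covers this character: copy it through unchanged.
                    out.push(rest[0]);
                    i += 1;
                }
            }
        }
        out
    }
}
```

With a built-in table containing 软件 → 軟件, adding the custom pair 软件 → 軟體 changes the result for that word only; every other entry still comes from the built-in table.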

Conversion

pub enum Config {
    S2t, T2s,
    S2tw, Tw2s,
    S2hk, Hk2s,
    S2twp, Tw2sp,
    S2hkp, Hk2sp,
    T2tw, Tw2t,
    T2hk, Hk2t,
    T2jp, Jp2t,
}

pub struct Converter { /* ... */ }

impl Converter {
    pub fn new(config: Config) -> Self;
    pub fn with_custom(config: Config, custom: &[(String, String)]) -> Self;
    pub fn convert(&self, text: &str) -> String;
}

All 16 OpenCC configs are variants of Config. Use Config::parse(name) if you have a string from a CLI flag or config file:

let cfg = Config::parse("s2twp").expect("known config");
let converter = Converter::new(cfg);

Semantic region flags

For UI code, prefer region codes over config names:

use zhhz::{region_pair_config, Converter, Region};

let cfg = Region::parse("cn-s")
    .and_then(|from| region_pair_config(from, Region::CnTw))
    .expect("supported pair");
let converter = Converter::new(cfg);

Region::ALL is [CnS, CnT, CnTw, CnHk, JpT, JpN]. region_pair_config(from, to) returns the Config that performs the conversion (preferring phrase-aware variants when they exist).

Detection

use zhhz::{detect_text, Detection};

let d: Option<Detection> = detect_text("他去了西維珍尼亞州");
// Some(Detection { region: "cn-hk", confidence: 70 })

if let Some(d) = d {
    println!("region = {}, confidence = {}", d.region, d.confidence);
}

Detection::region is one of "cn-s" / "cn-t" / "cn-tw" / "cn-hk" / "jp-n" / "jp-t". Detection::confidence is an integer from 0 to 100 (the share of signature characters in the input). detect_text returns None when the input contains no CJK characters or kana.
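
To make "share of signature characters" concrete, here is a minimal sketch of how such a percentage could be computed. It is an illustration under stated assumptions, not the crate's detector: `is_cjk_or_kana`, `signature_share`, the Unicode ranges, and the choice of CJK/kana characters as the denominator are all hypothetical.

```rust
/// Rough CJK/kana check for this sketch (assumed ranges: the basic
/// CJK Unified Ideographs block plus hiragana and katakana).
fn is_cjk_or_kana(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3040}'..='\u{30FF}')
}

/// Percentage (0 to 100) of CJK/kana characters that satisfy `is_signature`,
/// or None when the text contains no CJK/kana characters at all.
fn signature_share(text: &str, is_signature: impl Fn(char) -> bool) -> Option<u8> {
    let mut total = 0usize;
    let mut hits = 0usize;
    for c in text.chars().filter(|&c| is_cjk_or_kana(c)) {
        total += 1;
        if is_signature(c) {
            hits += 1;
        }
    }
    if total == 0 {
        None
    } else {
        // Integer division rounds down; hits <= total keeps the value <= 100.
        Some((hits * 100 / total) as u8)
    }
}
```

Under these assumptions, a four-character input with one signature character scores 25, and pure ASCII input yields None.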

For raw bytes, use detect_bytes(&[u8]) (rejects invalid UTF-8).
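
Rejecting invalid UTF-8 up front needs only the standard library. The sketch below shows that validation step in isolation; `decode_for_detection` is a hypothetical helper, and how detect_bytes itself reports the failure is not specified here.

```rust
use std::str;

/// Hypothetical helper: validate raw bytes as UTF-8 before text detection.
/// Returns None for invalid input rather than lossily replacing bad bytes.
fn decode_for_detection(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}
```

Strict validation avoids feeding replacement characters (U+FFFD) into a detector, which could otherwise skew any character-share statistic.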

Reuse and cost

Converter holds parsed dictionaries; building one is the expensive step, calling convert() is cheap. Build a Converter once at startup and reuse it across many inputs:

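use std::io::{self, BufRead};
use zhhz::{Config, Converter};

fn main() -> io::Result<()> {
    // Build once, then reuse the same instance for every line.
    let converter = Converter::new(Config::S2twp);
    for line in io::stdin().lock().lines() {
        println!("{}", converter.convert(&line?));
    }
    Ok(())
}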

Conversion is pure (no I/O, no network, no filesystem access). It’s safe to call from any thread; you can also build separate Converter instances per thread.
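
When there is no convenient startup hook, the "build once, reuse everywhere" pattern can be expressed with std::sync::OnceLock (Rust 1.70 or later). This is a sketch with a stand-in type: `ExpensiveConverter`, `shared_converter`, and the `BUILDS` counter are hypothetical and exist only to show that the build step runs a single time.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the build step ran (for demonstration only).
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a converter that is costly to construct (hypothetical type).
struct ExpensiveConverter {
    table: HashMap<char, char>,
}

impl ExpensiveConverter {
    fn build() -> Self {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        ExpensiveConverter { table: HashMap::from([('汉', '漢'), ('语', '語')]) }
    }

    fn convert(&self, text: &str) -> String {
        text.chars()
            .map(|c| self.table.get(&c).copied().unwrap_or(c))
            .collect()
    }
}

/// Returns the process-wide instance, building it on first use only.
fn shared_converter() -> &'static ExpensiveConverter {
    static INSTANCE: OnceLock<ExpensiveConverter> = OnceLock::new();
    INSTANCE.get_or_init(ExpensiveConverter::build)
}
```

Every caller of `shared_converter()` gets the same reference, so the construction cost is paid once no matter how many call sites there are.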

Feature flags

[dependencies]
# Pick one of the following entries (a key may appear only once per table):
zhhz = { version = "0.7", default-features = false }   # just the core
# zhhz = { version = "0.7", features = ["wasm"] }      # + WebAssembly bindings

The default feature set is empty. The wasm feature pulls in wasm-bindgen and js-sys, and exposes the same conversion and detection API to JavaScript through the wasm module (gated by #[cfg(feature = "wasm")]). The native path is unaffected by this flag: Converter::convert produces the same output whether or not the crate is built with --features wasm.

Threading

Converter::convert takes &self. The conversion engine is pure and thread-safe; you can share a single Converter across threads (it’s Sync). For maximum throughput, build one per worker thread and avoid contention.
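
Sharing one instance across threads can be sketched with std::thread::scope (Rust 1.63 or later), which lets worker threads borrow a value without wrapping it in Arc. The type below is a stand-in, not zhhz's Converter: `ReadOnlyConverter` and `convert_all` are hypothetical names, and the example relies only on the stated property that conversion takes &self and the type is Sync.

```rust
use std::collections::HashMap;
use std::thread;

/// Stand-in for a read-only converter (hypothetical; not zhhz's type).
struct ReadOnlyConverter {
    table: HashMap<char, char>,
}

impl ReadOnlyConverter {
    fn new() -> Self {
        ReadOnlyConverter { table: HashMap::from([('汉', '漢'), ('语', '語')]) }
    }

    fn convert(&self, text: &str) -> String {
        text.chars()
            .map(|c| self.table.get(&c).copied().unwrap_or(c))
            .collect()
    }
}

/// Convert each input on its own scoped thread, borrowing one shared converter.
fn convert_all(converter: &ReadOnlyConverter, inputs: &[&str]) -> Vec<String> {
    thread::scope(|s| {
        // Spawn every worker first, then join them in input order.
        let handles: Vec<_> = inputs
            .iter()
            .map(|&text| s.spawn(move || converter.convert(text)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Results come back in input order because the handles are joined in the order they were spawned. A real workload would more likely use a fixed-size pool than one thread per input.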

Where to go next