There are two major types of Chinese dictionaries: Character (字典) and Word (辞典/词典). Character Dictionaries are more common.

Chinese words are composed of one or more characters; on average, a written Mandarin word has 2.1 characters. The purpose of a character dictionary is to teach characters, not words. For example, when trying to look up the word 方言 (dialect), one will find the characters 方 and 言 individually, but not the combined 方言. Thus for words, we need word dictionaries. In a typical word dictionary, one would look up the character 方 first. There would be a list of words with 方 as the lead character, and 方言 would be in that list.
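The lead-character lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular dictionary is built; the function name build_index and the sample word list are made up for the example.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their lead (first) character, like a word dictionary."""
    index = defaultdict(list)
    for word in words:
        if word:  # skip empty strings
            index[word[0]].append(word)
    return dict(index)


# A tiny illustrative word list (not taken from a real dictionary).
words = ["方言", "方法", "方向", "语言"]
index = build_index(words)

# Looking up 方言 means going to 方 first, then scanning its word list.
print(index["方"])  # ['方言', '方法', '方向']
print("方言" in index["方"])  # True
```

Note that 言 has no entry of its own here, because no sample word starts with it; a character dictionary, by contrast, would list 方 and 言 separately and never 方言.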

Language studies in China strongly favor the written form of a language. Thus popular word dictionaries contain mostly written Mandarin words. One could easily find 梯子 and 炒菜铲 (ladder and spatula in Mandarin), but definitely not 凭企 and 锅脷 (ladder and spatula in Taishanese).

Motivated by Dr. Anne Yue-Hashimoto's word list and the CantoDict website, I came up with the idea a few years ago of building an online Taishanese word dictionary with audio. With this in mind, I started to analyze Taishanese (in an ad hoc manner), and jotted down notes along the way (which I posted on this blog to share with everyone). Bit by bit, I also learned the basics of web hosting, databases, and scripting. Everything finally came together when I stumbled upon a time-efficient way to record a large number of audio samples.

Amazingly, all the software used is free. I'm so thankful to those who make the tools available. By hosting the dictionary, I hope to do my part to keep the open and sharing culture of the internet thriving.

There are over 8,000 entries at the moment and I continue to add more. I must ask everyone to let me know of any errors you may find. In particular, I worry that whole categories of words may be missing. Since the database was not built systematically, one may sometimes find unexpected results -- kind of like hitting the I'm Feeling Lucky button on Google Search, which may actually be fun.


Only Simplified Characters are supported.

Phonetic Transcriptions:
Taishanese 台山话: IPA 国际音标, in brackets*
Cantonese 粤语: Jyutping 粤拼
Mandarin 普通话/国语: Pinyin 拼音

To save typing, the following simplified IPA symbols are used:

Standard IPA | Simplified IPA | Position in Syllable | Description
p/b̥  t/d̥  k/g̊ | b  d  g | Initial (onset) | Voiceless stops
pʰ  tʰ  kʰ | p  t  k | Initial (onset) | Aspirated voiceless stops
p̚  t̚  k̚ | p  t  k | Final (coda) | Voiceless stops with no audible release
ts/dz̥  tsʰ | dz  ts | Initial (onset) | Alveolar affricates
æ | ia | Vowel (nucleus) |
(u)ɔ  (u)ə  əuŋ | ɔ  ə  əŋ | Vowel (nucleus) | Accent and syllable dependent
s  ɕ | s | Initial (onset) | Before a high vowel, the alveolar [s] is pronounced as [ɕ]
ʔ | Ø | Initial | [ʔ] precedes all zero-initial syllables

Note that although both the aspirated voiceless stops and the voiceless stops with no audible release are reduced to the same symbols, there's no ambiguity because they occupy different positions in a syllable. For example, the IPA representation of cut (切) is [tɛt33]. We can deduce that the first [t] is the simplified form of [tʰ], and the second (trailing) [t] is the simplified form of [t̚].
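To make the position rule concrete, here is a rough Python sketch that expands the simplified stops and affricates back to standard IPA. It is an illustration only, not the script used by the dictionary; the name expand_stops is made up, and it assumes each syllable is written as letters followed by optional tone digits. The vowel, [s]/[ɕ] and glottal-stop rows of the table are left untouched.

```python
import re

ASPIRATION = "\u02b0"   # modifier letter small h (ʰ)
NO_RELEASE = "\u031a"   # combining diacritic for "no audible release"

# Simplified onset -> standard onset; two-letter affricates are tried first.
ONSETS = [
    ("dz", "ts"), ("ts", "ts" + ASPIRATION),
    ("b", "p"), ("d", "t"), ("g", "k"),
    ("p", "p" + ASPIRATION), ("t", "t" + ASPIRATION), ("k", "k" + ASPIRATION),
]
# Simplified coda -> standard coda (unreleased stops).
CODAS = {c: c + NO_RELEASE for c in "ptk"}


def expand_stops(syllable):
    """Expand simplified stops/affricates using their position in the syllable."""
    match = re.fullmatch(r"(\D*)(\d*)", syllable)
    if match is None:  # digits in the middle: not a shape this sketch handles
        return syllable
    body, tone = match.groups()
    onset = ""
    for simple, standard in ONSETS:
        if body.startswith(simple):
            onset, body = standard, body[len(simple):]
            break
    if body and body[-1] in CODAS:  # a trailing p/t/k is a coda
        body = body[:-1] + CODAS[body[-1]]
    return onset + body + tone


print(expand_stops("tɛt33"))  # tʰɛt̚33
```

The same letter t comes out as [tʰ] at the start and [t̚] at the end, which is exactly why the simplified spelling stays unambiguous.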