There are two major types of Chinese dictionaries: character dictionaries (字典) and word dictionaries (辞典/词典). Character dictionaries are the more common type.
Chinese words are composed of one or more characters; on average, a written Mandarin word has 2.1 characters. The purpose of a character dictionary is to help readers learn characters, not words. For example, when looking up the word 方言 (dialect), one finds the characters 方 and 言 individually, but not the combination 方言. Words therefore call for a word dictionary. In a typical word dictionary, one looks up the character 方 first. Under it is a list of words with 方 as the lead character, and 方言 is among them.
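The lookup described above amounts to an index keyed on each word's first character. The sketch below is a minimal illustration of that idea, not how this or any real dictionary is built. The sample entries and the function names `build_lead_index` and `lookup` are hypothetical, and sorting by Unicode code point is an assumption standing in for a printed dictionary's real collation (usually pinyin or stroke count).

```python
from collections import defaultdict

# Hypothetical sample entries (word -> English gloss), for illustration only.
SAMPLE_WORDS = {
    "方言": "dialect",
    "方法": "method",
    "方向": "direction",
    "言语": "speech",
}


def build_lead_index(words):
    """Group words under their first (lead) character, as a word dictionary does."""
    index = defaultdict(list)
    # Sorting by code point is only a stand-in for a real dictionary's ordering.
    for word in sorted(words):
        index[word[0]].append(word)
    return dict(index)


def lookup(index, lead_char):
    """Return every word headed by lead_char, or an empty list if there are none."""
    return index.get(lead_char, [])


index = build_lead_index(SAMPLE_WORDS)
print(lookup(index, "方"))  # → ['方向', '方法', '方言']
```

A character dictionary would stop at the entry for 方 itself; the word index is what makes 方言 reachable from its lead character.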
Language studies in China strongly favor the written form of a language. Thus popular word dictionaries contain mostly written Mandarin words. One could easily find 梯子 and 炒菜铲 (ladder and spatula in Mandarin), but definitely not 凭企 and 锅脷 (ladder and spatula in Taishanese).
Motivated by Dr. Anne Yue-Hashimoto's word list and the CantoDict website, I came up with the idea a few years ago to build an online Taishanese word dictionary with audio. With this in mind, I started to analyze Taishanese (in an ad hoc manner) and jotted down notes along the way, which I posted on this blog to share with everyone. Bit by bit, I also learned the basics of web hosting, databases, and scripting. Everything finally came together when I stumbled upon a time-efficient way to record a large number of audio samples.
Amazingly, all of the software used is free. I'm very thankful to those who make these tools available. By hosting the dictionary, I hope to do my part to keep the open, sharing culture of the internet thriving.
There are over 8000 entries at the moment, and I continue to add more. Please let me know of any errors you find. In particular, I worry that whole categories of words may be missing. Since the database was not built systematically, searches may sometimes return unexpected results, a bit like hitting the I'm Feeling Lucky button on Google Search, which may actually be fun.
Only simplified characters are supported. Phonetic transcriptions use the International Phonetic Alphabet (国际音标) and are enclosed in square brackets. To save typing, the following simplified IPA symbols are used:
IPA | Simplified | Notes
p/b̥, t/d̥, k/g̊ | b, d, g | Unaspirated voiceless stops
pʰ, tʰ, kʰ | p, t, k | Aspirated voiceless stops (syllable-initial)
p̚, t̚, k̚ | p, t, k | Voiceless stops with no audible release (syllable-final)
(u)ɔ, (u)ə, əuŋ | ɔ, ə, əŋ | Depends on accent and syllable
Two further conventions apply: before a high vowel, the alveolar [s] is pronounced [ɕ], and a glottal stop [ʔ] precedes all zero-initial syllables.
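Applying the table is mostly string substitution. The following is a minimal Python sketch under several assumptions not stated in the table: the input is a single syllable in full IPA followed by tone digits; diacritics are separate combining code points (U+0325, U+030A, U+031A); a bare p/t/k at the start of a syllable is the unaspirated stop; affricates such as [ts] are left untouched because the table does not cover them; the predictable [ʔ] is simply dropped; and the [s]/[ɕ] note is not applied. The name `simplify_syllable` is hypothetical.

```python
import re

# Rows of the simplification table as (full IPA, simplified) pairs.
# Combining marks are spelled out: U+0325 ring below, U+030A ring above,
# U+031A left angle above (no audible release).
STOP_RULES = [
    ("b\u0325", "b"), ("d\u0325", "d"), ("g\u030A", "g"),  # b̥ d̥ g̊
    ("p\u02B0", "p"), ("t\u02B0", "t"), ("k\u02B0", "k"),  # pʰ tʰ kʰ
    ("p\u031A", "p"), ("t\u031A", "t"), ("k\u031A", "k"),  # p̚ t̚ k̚
]
# The optional [u] glide is dropped; "əuŋ" is listed first so it is matched whole.
VOWEL_RULES = [("əuŋ", "əŋ"), ("uɔ", "ɔ"), ("uə", "ə")]
UNASPIRATED = {"p": "b", "t": "d", "k": "g"}


def simplify_syllable(ipa: str) -> str:
    """Convert one full-IPA syllable (e.g. 'tʰɛt̚33') to the simplified scheme."""
    # Assumption: the glottal stop before zero-initial syllables is predictable,
    # so the simplified form does not write it.
    s = ipa.removeprefix("ʔ")
    # A bare initial p/t/k (not followed by ʰ, the no-release mark, or the "s" of "ts")
    # is the unaspirated stop. Rewrite it first, before "pʰ" -> "p" creates a new bare p.
    s = re.sub(r"^([ptk])(?![\u02B0\u031As])", lambda m: UNASPIRATED[m.group(1)], s)
    for full, simple in STOP_RULES + VOWEL_RULES:
        s = s.replace(full, simple)
    return s


print(simplify_syllable("t\u02B0\u025Bt\u031A33"))  # → tɛt33
```

The order of the steps is the main design choice: the unaspirated stops must be renamed to b, d, g before the aspirated stops lose their ʰ, otherwise the two series would collapse. `str.removeprefix` requires Python 3.9 or later.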
Although both the aspirated voiceless stops and the voiceless stops with no audible release are reduced to the same symbols, there is no ambiguity, because they occupy different positions in a syllable: aspirated stops occur only at the beginning of a syllable, and unreleased stops only at the end. For example, the simplified IPA representation of cut (切) is [tɛt33], where 33 marks the tone. We can deduce that the first [t] is the simplified form of [tʰ], and the second is the simplified form of [t̚].
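The positional argument can be made concrete by running it in reverse. This minimal sketch assumes a single simplified syllable followed by optional tone digits. It writes unaspirated stops as plain [p t k] (the table also allows [b̥ d̥ g̊]), leaves an initial "ts" alone, and ignores the vowel and [ʔ] conventions. The name `expand_stops` is hypothetical.

```python
import re

UNASPIRATED = {"b": "p", "d": "t", "g": "k"}
ASPIRATION = "\u02B0"  # ʰ
NO_RELEASE = "\u031A"  # combining mark for an unreleased stop, as in t̚


def expand_stops(simple: str) -> str:
    """Restore full-IPA stop symbols in one simplified syllable using position only."""
    # Split off trailing tone digits, e.g. "tɛt33" -> ("tɛt", "33").
    body, tone = re.fullmatch(r"(.*?)(\d*)", simple).groups()
    if body[:1] in ("p", "t", "k") and not body.startswith("ts"):
        # An initial p/t/k can only be an aspirated stop.
        body = body[0] + ASPIRATION + body[1:]
    elif body[:1] in UNASPIRATED:
        # An initial b/d/g is the unaspirated stop.
        body = UNASPIRATED[body[0]] + body[1:]
    if len(body) > 1 and body[-1] in "ptk":
        # A final p/t/k can only be a stop with no audible release.
        body += NO_RELEASE
    return body + tone


print(expand_stops("tɛt33"))  # → tʰɛt̚33
```

Each rule inspects only the first or last symbol of the syllable, so the expansion never has to guess which kind of [t] it is looking at.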