New dictionary files for Plan 9's dict(7)


This page provides a patch for Plan 9's dict(7) dictionary browser that lets it read the free dictd dictionary files (in a slightly modified format).
Dictd is dictionary software for Linux that has recently gathered enough momentum to be barely usable as a general cross-language dictionary. Fortunately, Plan 9 can already search dictionary files with dict(7), so all we need to do is write the glue that makes dict understand dictd files.
Available here:
- Patches for Plan 9's dict.
- English/Bulgarian and Bulgarian/English dictionary files and indexes
- Other dictionary files from freedict.de
- Instructions on configuring/recompiling dict(7) to use your own dict files
- Instructions on how to convert dictd's .dz files into a flat file for dict(7) and software for creating the index.
- Explanation of the format used for the dictionary file.
Patches (note: if this gets into sources, I'll remove this section and leave only the dictionary files and instructions):
The following tar archive adds the ability to read flat dictionary files to dict(7). It also fixes a small problem with the mkfile: 'mk clean' does not remove mkindex, so the binary shows up on sources.cs.bell-labs.com compiled for whichever architecture the system was last rebuilt for (MIPS, as of this writing).
- dict.patch.tgz -- the modified files
You can safely unarchive this in /sys/src/cmd/dict (or your equivalent thereof) and type mk; mk install. This adds support for the Bulgarian/English and English/Bulgarian dictionaries. To use any other dictionary, edit utils.c and recompile, following the instructions further down this page.
The archive contains:
- dict.h
- dictd.c -- support for dictd dictionary flat files
- mkfile -- add dictd and fix mkindex dangling after 'mk clean'
- utils.c -- add English/Bulgarian and Bulgarian/English dictionary entries
Eng-Bul and Bul-Eng dictionary files. Uncompress in /lib/dict.
- bgen.tgz(800K) -- Bulgarian-English dictionary, 17828 words.
- enbg.tgz(2.9M) -- English-Bulgarian dictionary, 46794 words.
Converted dictionary files from freedict.de. Note: these files are far from complete and probably not ready for everyday use, but they can serve as a starting point for your search. For example, the (much more complete) Bulgarian/English dictionaries above were found after asking around among people on IRC who care about linguistics (thanks, Pavel Mihaylov ;). NB: I've removed the files below due to space constraints.
- afr-deu.tgz -- Afrikaans-German ( 3799 words)
- afr-eng.tgz -- Afrikaans-English ( 5127 words)
- dan-eng.tgz -- Danish-English ( 3996 words)
- deu-eng.tgz -- German-English (81697 words)
- deu-fra.tgz -- German-French ( 8167 words)
- deu-ita.tgz -- German-Italian ( 4453 words)
- deu-nld.tgz -- German-Dutch (12811 words)
- deu-por.tgz -- German-Portuguese ( 8741 words)
- eng-afr.tgz -- English-Afrikaans ( 6395 words)
- eng-deu.tgz -- English-German (93297 words)
- eng-fra.tgz -- English-French ( 8798 words)
- eng-hun.tgz -- English-Hungarian (89676 words)
- eng-iri.tgz -- English-Irish ( 2716 words)
- eng-ita.tgz -- English-Italian ( 4518 words)
- eng-lat.tgz -- English-Latin ( 3025 words)
- eng-nld.tgz -- English-Dutch ( 7611 words)
- eng-por.tgz -- English-Portuguese ( 9307 words)
- eng-rom.tgz -- English-Romanian ( 988 words)
- eng-rus.tgz -- English-Russian ( 3384 words)
- eng-scr.tgz -- English-Croatian ( 598 words)
- eng-spa.tgz -- English-Spanish ( 5906 words)
- eng-swe.tgz -- English-Swedish ( 5482 words)
- eng-wel.tgz -- English-Welsh ( 2120 words)
- fra-deu.tgz -- French-German ( 6113 words)
- fra-eng.tgz -- French-English ( 7830 words)
- fra-nld.tgz -- French-Dutch ( 9603 words)
- hun-eng.tgz -- Hungarian-English (137016 words)
- iri-eng.tgz -- Irish-English ( 1184 words)
- ita-eng.tgz -- Italian-English ( 3428 words)
- jpn-deu.tgz -- Japanese-German ( 451 words)
- kha-deu.tgz -- Khasi-German ( 994 words)
- lat-deu.tgz -- Latin-German ( 1797 words)
- lat-eng.tgz -- Latin-English ( 2304 words)
- nld-deu.tgz -- Dutch-German (17223 words)
- nld-eng.tgz -- Dutch-English (22745 words)
- nld-fra.tgz -- Dutch-French (16769 words)
- por-deu.tgz -- Portuguese-German ( 8293 words)
- por-eng.tgz -- Portuguese-English (10397 words)
- sco-deu.tgz -- Scottish-German ( 256 words)
- scr-eng.tgz -- Croatian-English ( 394 words)
- slo-eng.tgz -- Slovenian-English ( 826 words)
- spa-eng.tgz -- Spanish-English ( 4501 words)
- swa-eng.tgz -- Swahili-English ( 613 words)
- swe-eng.tgz -- Swedish-English ( 5219 words)
- tur-deu.tgz -- Turkish-German ( 940 words)
- tur-eng.tgz -- Turkish-English ( 1025 words)
- wel-eng.tgz -- Welsh-English ( 727 words)
To add one of the above dictionaries, add an entry for it at the bottom of the Dicts array in utils.c. For example, to add wel-eng (Welsh-English), do the following:
- Uncompress wel-eng.tgz in /lib/dict. Two files will appear: wel-eng.dict and wel-eng.index.
- Edit utils.c and put the following at the bottom of Dicts[] (right before the NUL entry):
{"wel-eng", "Welsh-English Dictionary", "/lib/dict/wel-eng.dict", "/lib/dict/wel-eng.index", dictdnextoff, dictdprintentry, dictdprintkey },
- Run mk; mk install
- Add a new function to lib/profile for running this dictionary only:
% fn wdict { dict -d 'wel-eng' $* }
Converting a .dz file to a flat file suitable for dict as defined above:
- Gunzip the .dz file:
gunzip < somedict.dz > some.dict
- Strip the 00- entries from the top of the file up until the first actual word
- Run the mkdictd command described below to create the index:
mkdictd < some.dict > some.index
- Add some.dict and some.index to utils.c as described in the previous section
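The "strip the 00- entries" step can be done by hand in an editor. As an illustration only, here is a minimal sketch of a filter that does it in portable C (not Plan 9 C). It assumes the flat format described on this page, where each entry starts with a headword line and ends with two empty lines, and it assumes that the header entries are exactly those whose headword starts with "00-". The names strip_headers and STRIP00_MAIN are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy `in` to `out`, dropping leading entries whose headword starts
 * with "00-". Copying starts at the first other headword and continues
 * verbatim to end of file. Returns the number of entries dropped.
 * Assumption: an entry ends with two empty lines.
 */
long strip_headers(FILE *in, FILE *out)
{
	char line[4096];
	int blanks = 2;     /* empty lines just seen; start of file is a boundary */
	int bol = 1;        /* does the next chunk begin a new line? */
	int copying = 0;    /* set once the first real headword is found */
	long skipped = 0;

	while (fgets(line, sizeof line, in) != NULL) {
		size_t len = strlen(line);

		if (!copying && bol) {
			if (line[0] == '\n') {
				blanks++;
			} else {
				if (blanks >= 2) {      /* this line is a headword */
					if (strncmp(line, "00-", 3) == 0)
						skipped++;
					else
						copying = 1;
				}
				blanks = 0;
			}
		}
		if (copying)
			fputs(line, out);
		/* an over-long line arrives in chunks; only the last ends in '\n' */
		bol = (len > 0 && line[len - 1] == '\n');
	}
	return skipped;
}

#ifdef STRIP00_MAIN     /* hypothetical switch: build as a stdin/stdout filter */
int main(void)
{
	strip_headers(stdin, stdout);
	return 0;
}
#endif
```

Built with the hypothetical STRIP00_MAIN switch defined, this would act as a filter between the gunzip step and the indexing step. If your dictd file marks its header entries differently, the prefix test is the only line that needs to change.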
Format of the flat file:
The index file contains a single entry per word, pointing to the offset of the first character of that word's entry from the beginning of the dictionary file. Thus the first entry of the file has offset 0, the second has offset 0 + the length of the first entry, and so on.
The dictionary file contains entries in the following format:
WORD \n              # beginning of entry, sometimes contains pronunciation
Description \n       # multiline description
Description \n
Description \n
...
Description \n
\n                   # two empty lines denote end of entry
\n
How to create the index file:
The following program can be used to create index files from dictionaries in the above format. It expects a dictionary file on stdin and outputs the index on stdout:
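The mkdictd listing itself is not reproduced here. As a stand-in, the following is a minimal sketch of such an indexer in portable C, not the author's program. It relies on the format described above (headword line, description lines, two empty lines ending the entry) and records the byte offset of each headword. The output layout, one "headword, tab, offset" line per entry, is an assumption about what dict expects, and build_index and MKINDEX_MAIN are names invented for this example.

```c
#include <stdio.h>

/*
 * Read a flat dictionary from `in` and write one line per entry to
 * `out`: the headword, a tab, and the byte offset of the headword's
 * first character. Returns the number of entries indexed.
 * Assumption: entries end with two empty lines, so a headword is the
 * first text after three consecutive newlines (or at start of file).
 */
long build_index(FILE *in, FILE *out)
{
	long off = 0;       /* offset of the character just read */
	long start = 0;     /* offset where the current headword began */
	long count = 0;     /* entries indexed so far */
	int nl = 3;         /* consecutive newlines seen; start of file is a boundary */
	int inhead = 0;     /* nonzero while inside a headword line */
	int c;

	while ((c = getc(in)) != EOF) {
		if (c == '\n') {
			if (inhead) {           /* headword line ends here */
				fprintf(out, "\t%ld\n", start);
				inhead = 0;
				count++;
			}
			nl++;
		} else {
			if (nl >= 3 && !inhead) {   /* first character of a new entry */
				inhead = 1;
				start = off;
			}
			if (inhead)
				putc(c, out);
			nl = 0;
		}
		off++;
	}
	if (inhead) {               /* file ended in the middle of a headword */
		fprintf(out, "\t%ld\n", start);
		count++;
	}
	return count;
}

#ifdef MKINDEX_MAIN     /* hypothetical switch: build as a stdin/stdout filter */
int main(void)
{
	build_index(stdin, stdout);
	return 0;
}
#endif
```

For a dictionary consisting of the entries "cat" (one description line, "feline") and "dog", the first entry occupies 4 + 7 + 2 = 13 bytes, so the sketch writes offset 0 for "cat" and offset 13 for "dog", matching the offset rule given above. The sketch does not sort its output; if dict needs the index in a particular order, that would be a separate step.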
Last Modified: Sept 21 2003
mirtchovski at gmail