Who's doing this Server?

This server was being developed by Jeffrey Friedl, but has been handed off to someone else. The prose is unlikely to change, but at least the server will continue to be maintained and features added, as time permits. If you'd like to see Jeffrey, you can as a 43 k-byte jpeg or a 165 k-byte gif of the same picture. He's also known to sometimes appear as a smaller black and white 6.5 k-byte gif. This is mostly Jeffrey, in his own words:

The company where I used to work, Omron Corporation, was kind enough to allow me time and facilities to play with this stuff. Omron is a world leader in switches, relays, PLCs, etc., but is not well known by the non-Japanese consumer world. I have run into a few Omron cash registers in The States, and medical personnel tend to know the name.

Server Technology

This was my first WWW application, but I find it all quite cool, so have ended up developing some interesting technology to implement it. The bottom of the change log has the story about the early days of the server's development. (It's much less cool now, relatively speaking, but before Java and HTML2.0, server-side includes and cookies, it was pretty rocking stuff.)

For example, if you look at the HTML files, you'll find that they don't exist... all the ``files'' and directories you think you see exist only in the virtual world of a CGI I've written. Mmmm..., a virtual world within the virtual world of the Web... I guess it's a quasi-pseudo-neo-demi-reality! :-)

Besides doing the dictionary searches for you, the CGI massages the prose files (in a source format I call quasi-HTML) on the fly, adjusting things for your browser type and your selection of language, Japanese support, favorite image size, etc.

Self Addressed Stamped Envelope

At the bottom of each page, smart warp-to buttons are provided, based upon various things including a SASE (Self-Addressed Stamped Envelope) that might be attached to the URL. In fact, if you add
   ?SASE=http://Your_Favorite_Place
to the end of the URL that you use to contact the server, you'll have it placed in with the buttons at the bottom of the initial page. Neat!
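As a rough illustration of the SASE convention described above, the following Python sketch appends a return address to a server URL. The function name and the example addresses are hypothetical; only the `?SASE=` form itself comes from the text.

```python
def add_sase(server_url: str, return_url: str) -> str:
    """Append a SASE return address to the URL used to contact the server.

    The return address is appended unencoded, exactly as in the example
    above; whether the server also accepts an encoded form is not stated.
    """
    return f"{server_url}?SASE={return_url}"


if __name__ == "__main__":
    # Both URLs here are placeholders, not real server addresses.
    print(add_sase("http://example.org/cgi-bin/j-e", "http://example.com/home.html"))
```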

Customization

The virtual path in the URL is used to maintain state about what options the user has chosen. You can use the customization page to construct your favorite.
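The idea of carrying user options in the path can be sketched as follows. This is only an illustration of the technique: the `key=value` segment layout and the option names are assumptions, not the format this server actually uses.

```python
def encode_options(options: dict) -> str:
    """Pack options into path segments, e.g. /images=small/lang=ja (hypothetical layout)."""
    return "/" + "/".join(f"{key}={value}" for key, value in sorted(options.items()))


def decode_options(path: str) -> dict:
    """Recover the options from a virtual path built by encode_options."""
    options = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            options[key] = value
    return options


if __name__ == "__main__":
    path = encode_options({"lang": "ja", "images": "small"})
    print(path)
    print(decode_options(path))
```

Because every link the CGI emits can carry the same path prefix, the state travels with the user from page to page without anything being stored on the server.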

Server Processes

In addition to the CGI, I have the actual dictionary search done by a separate process which is running all the time (perhaps on a separate machine)... that way, its access of the several megabytes of dictionary data is kept hot. The CGI merely opens a socket to the lookup process (which just happens to be a version of my lookup program), spits over a request and slurps up the reply which it then formats, spindles and mutilates before it splats it back at you.
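The split between a short-lived CGI and a long-running lookup process might look roughly like the sketch below. The one-line-request, one-line-reply protocol, the sample entries, and all names are assumptions made for illustration; the real lookup protocol is not described here.

```python
import socket
import socketserver
import threading

# Made-up sample data standing in for the dictionary the lookup process keeps loaded.
ENTRIES = {"neko": "猫 [ねこ] /cat/", "inu": "犬 [いぬ] /dog/"}


class LookupHandler(socketserver.StreamRequestHandler):
    """Long-running side: read one request line, write one reply line."""

    def handle(self):
        word = self.rfile.readline().decode("utf-8").strip()
        reply = ENTRIES.get(word, "")
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_lookup_server():
    """Start the lookup process stand-in on a free local port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LookupHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def cgi_lookup(address, word):
    """CGI side: open a socket, send the request, read back the reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((word + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


if __name__ == "__main__":
    server = start_lookup_server()
    try:
        print(cgi_lookup(server.server_address, "neko"))
    finally:
        server.shutdown()
        server.server_close()
```

The point of the design is that the expensive part, loading the data, happens once in the server process, while each CGI invocation only pays for a socket round trip.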

All the images of Japanese text are produced on the fly by a perl routine that generates gifs (monochrome only, although they can be ``transparent'' as an option) from random bitmap data (where here ``random'' means ``from the font file'' :-) To reduce the load on the Web server machine, the CGI first tries to access a Japanese GIF Server running on a different system. This will hopefully keep things fast. If the gif server is down, the CGI will just go ahead and generate the text itself.
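The "try the image server first, otherwise render locally" behavior can be sketched as below. Everything here is a stand-in: the toy 3x3 font table, the row-of-bits output (instead of real GIF data), and the function names are all assumptions chosen to keep the example short.

```python
# Toy font table: each glyph is three rows of '0'/'1' pixels (hypothetical data).
FONT = {
    "A": ["010", "111", "101"],
    "B": ["110", "111", "110"],
}


def local_render(text):
    """Build a monochrome bitmap by placing the glyphs of text side by side."""
    return ["".join(FONT[ch][row] for ch in text) for row in range(3)]


def render_text_image(text, remote_render, local_renderer):
    """Prefer the remote image server; fall back to local rendering if it is down."""
    try:
        return remote_render(text)
    except OSError:
        # Connection failures (refused, timed out, ...) are OSError subclasses.
        return local_renderer(text)


if __name__ == "__main__":
    def remote_down(text):
        raise ConnectionRefusedError("image server is not running")

    for row in render_text_image("AB", remote_down, local_render):
        print(row)
```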

Thanks

I'd like to thank Michael Chachich (mdchachi@japan.ml.com) for the suggestion to add filename-like glob patterns, which he rightly points out most people know better than raw regular expressions. Many others have helped with bug reports and suggestions of all kinds.
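To show why glob patterns are a friendlier front end to regular expressions, here is a small Python sketch that uses the standard `fnmatch` module as a stand-in. The exact glob syntax this server accepts is not documented here, so the patterns below follow Python's rules, and the sample words are made up.

```python
import fnmatch
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a filename-style glob into a compiled regular expression."""
    return re.compile(fnmatch.translate(pattern))


def search(words, pattern):
    """Return the words that match the glob pattern in full."""
    regex = glob_to_regex(pattern)
    return [word for word in words if regex.match(word)]


if __name__ == "__main__":
    words = ["taberu", "tabemono", "toru", "neko"]
    print(search(words, "ta*ru"))
    print(search(words, "?eko"))
```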

Where did the dictionary data come from?

Main and Name Dictionary Data

Most of the data used for this server has been tirelessly developed, collected, and maintained over the years by Dr. Jim Breen (j.breen@csse.monash.edu.au). This includes the main dictionary data, the name data (originally part of the main data), and the kanji data.

Jim's work was once copyrighted by him, but in 2000, he assigned the copyright to the Electronic Dictionary Research and Development Group. Consequently, you should be aware of the licence governing the data used at this site.

Many, many people have contributed to the quantity and quality of the main dictionary data, edict. This includes not only adding new entries, but correcting current ones. I personally have spent hundreds of hours over the last half dozen years working on scripts to check for errors, consistency, etc., besides adding numerous entries. And when it was at about the 100,000 entry level, Dr. Yo Tomita proofread the entire thing, fixing countless errors. These are just two examples -- others have expended even more energy. Please see edict's documentation file for a list of those that deserve your appreciation for making this data possible.

The kanji dictionary data, kanjidic, is also the collaborative effort of many. Coordinating and verifying the data has been done mostly by Jim and friends, but perhaps most of the work has been done behind the scenes. For example, in 1992 I spent three months of my life entering a whole host of data as I found it in Jack Halpern's ``New Japanese-English Character Dictionary''. Yet this mechanical work is nothing compared to the 18 years Jack spent creating his dictionary. The Korean readings, added by Jim in one quick moment in March 1996, were painstakingly developed over the course of 10 years of research by Charles Muller. These are just two examples from the credits found in kanjidic's documentation file.

About 18 years, 6 months ago (February '96), Jim split the data into two files, the main dictionary which kept the name edict, and name-only entries (which had swelled in recent months) to the new file enamdict. These files, as well as lots of other goodies, can be found at ftp.cc.monash.edu.au. Among the ``lots of other goodies'' you'll find jdic, Jim's own DOS interface to edict and kanjidic. It allows file viewing and dictionary searches on your local PC without the need for any other Japanese support (and since it's local, is much faster than this server!). There are also versions for DOS/Windows, UNIX/X-Windows, (2.1 k-byte blurb here), and the Macintosh. The jdic family has quite a different look and feel from this server, and if your platform supports it, you're encouraged to try it. If you find you like the look and feel of this server, and are on a Unix System, you might consider my lookup program.

If you'd like to see the good doctor, you can as a 8.5 k-byte jpeg or a 76 k-byte gif of the same shot.

Note: please don't contact Jim in regards to this WWW server - contact me (Comments appreciated) with questions or comments about this server. Questions about the dictionary data, or the tools mentioned above, should go to Jim (perhaps with a CC to me, if you like).

Dictionary of Legal Terms

Also available at Monash are the lawgloss.* files, produced by the University of Washington School of Law, Asian Law Program (their Copyright, 1995). I massaged the file, turned it into edict format, corrected a number of errors, and added it to my server. Here's Jim's note about lawgloss.

Dictionary of Life-Science Terms

Similarly, Monash has the lifscdic.* files, produced by a group of Japanese bioscientists from Bio-Net, led by Dr Shuji Kaneko of Kyoto University. Here's Jim's note about lifscdic.

Dictionary of Four-Character Idiomatic Compounds

The first dictionary in a series of new additions to this server is a compilation based upon a list created by Kanji Haitani. These are yojijukugo, or four-character compounds, that have been singled out as commonly occurring. Jim's commentary and Kanji Haitani's note about 4jwords.

Dictionary of Aviation Terms

This is another EDICT-formatted dictionary by Jim Breen of Ron Schei's English/Japanese Aviation Dictionary. Jim's note about aviation. This server is running a newer dictionary that was recently proof-read by Teijo Kaakinen.

Dictionary of Computer Terms

This dictionary was compiled by Jim in 1997. It is a glossary of terms used in the computing and telecommunications fields. Jim's detailed note describing the sources used to create this file.

Dictionary of Compound Verbs

This is actually already included in the EDICT file, but is reproduced on this server for convenient searching. Jim Breen created this file from the book "Handbook of Japanese Compound Verbs" by Yoshiko Tagashira and Jean Hoff (Hokuseido Press, 1986). Jim's note on compverb.

Dictionary of Concrete Terms

Gururaj Rao produced this file in the course of his translation work dealing with technical reports mainly related to concrete and concrete structures. Gururaj Rao's note about concrete.

Dictionary of Financial Terms

This file contains a listing of financial terms compiled by Kevin Seaver, released to the Honyaku WWW page and converted to the EDICT format by Jim Breen. Jim's note on findic.

Dictionary of Geological Terms

This is a list of geology terms put together by Jim Breen, based on two sources. Jim's note on geodic.

Dictionary of Japanese Place Names

This is Jim Breen's compilation of Japanese place names that were extracted from the web pages of the Japanese Ministry of Posts and Telecommunications. Jim's note about j_places.

Dictionary of Computational Linguistic Terms

This dictionary, compiled and maintained by Francis Bond, is a list of terms used in theoretical and computational linguistics. Francis Bond's note explains the history behind this compilation and how to get in touch to contribute.

Dictionary of Marketing Terms

This is Jim Breen's compilation of Adam Rice's business and marketing terms. Jim's note about mktdic has more details about the tags used to denote their origins in the Honyaku WWW pages.

Dictionary of Pulp and Paper Terms

This is Jim Minor's compilation of pulp and paper industry terms. Jim's note about pandpgls.

Dictionary of Constellation Names

Raphael Garrouty's list of constellation names. There's no further documentation on this file.

Dictionary of Engineering and Science Terms

This is a conversion of an original Macintosh text file that came as part of a conference kit at an acoustics conference supplied to James Friend. He passed this on to Jim for conversion. Jim's note about engscidic.

Dictionary of Forestry Terms

Juan Manuel Cardona Granda provides a collection of forestry terms originating from Japanese forestry journals. Mr Granda's note in English about forsdic.

Dictionary of Environmental Terms

This is a glossary of environmental terms which frequently appear in Japanese environmental reports, etc. Patrick Oblander's note about envgloss.

Dictionary of River and Water Resources Terms

This is a compilation of the River and Water Resources Glossary produced by the Infrastructure Development Institute of Japan. Jim Breen converted the compilation into EDICT format. Jim's note about riverwater.

Dictionary of Manufacturing Terms

This file was prepared by Jim Breen and is derived from several web pages describing paper, molding and general manufacturing. Jim's note about manufdic.

Dictionary of Buddhist Terms

Chuck Muller has assembled a large Digital Dictionary of Buddhism. Thanks to Jim Breen, the XML extract that Chuck provided him was converted to EDICT-style format, so that it could be made easily available to tools that could already deal with EDICT files. Jim's note about buddhdic.

Dictionary of Chemical Terms

This dictionary is a compilation gathered from an online catalogue of chemicals produced by Showa Chemicals Ltd. This catalogue is volume 28, consolidated as of August 2002.

http://www.st.rim.or.jp/~shw/cat_dex.html

The original catalogue was produced in Adobe PDF format. Using Adobe Acrobat Distiller, the individual files were then saved as RTF documents, which were then read into Microsoft Word 2000. Using that word processor, they were then saved as encoded text, in Japanese EUC encoding. After that, a perl script was used to extract the contents and produce a rough draft of chemdict in files separated according to Showa Chemicals' index.
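The extraction step above was done with a perl script that is not reproduced here. As a loose illustration of the same kind of conversion, this Python sketch decodes one line of EUC-encoded text and writes it in the EDICT layout (`KANJI [KANA] /gloss/`). The tab-separated input layout and the sample entry are assumptions; the real catalogue text was surely messier.

```python
def catalogue_line_to_edict(raw: bytes) -> str:
    """Convert one EUC-JP line of 'name<TAB>reading<TAB>gloss' into an EDICT entry.

    The three-field, tab-separated input is a hypothetical layout chosen
    for this example only.
    """
    name, reading, gloss = raw.decode("euc_jp").rstrip("\r\n").split("\t")
    return f"{name} [{reading}] /{gloss}/"


if __name__ == "__main__":
    sample = "塩酸\tえんさん\thydrochloric acid\n".encode("euc_jp")
    print(catalogue_line_to_edict(sample))
```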

Final proofing was done manually. William F. Maton, 20090321

How about all those fonts?

I use the publicly-available ``kanji##.snf'' JISX0208-1983 fonts. Frankly, I don't know their origin, but I'm thankful to whoever provided them. The COPYRIGHT says ``Public domain font. Share and enjoy''. So I do.

The vertical fonts were generated by converting the SNF format files back to BDF on a Sun. Then, using Mark Leisher's gBDFEd editor, I created a vertically set version of each of these fonts, and converted them back to their SNF counterparts. I am grateful to Dr. Ken Lunde for his invaluable assistance in July 1997 in identifying which glyphs in JIS-X-0208 have vertical variants. -wfms

The JIS-X-0212 fonts were created by obtaining the Sazanami open TrueType font produced by the EFont project, and using Mark Leisher's TTF2BDF program to extract BDF versions of the appropriate glyphs on a Linux machine. These were then converted into their SNF counterparts on a Sun using the bdftosnf program. The Sazanami font is copyrighted by Wada Lab, and here is a blurb about that. -wfms


Japanese text stuff

I owe everything I know about using Japanese text on a computer to Ken Lunde and his book Understanding Japanese Information Processing (O'Reilly and Associates' blowfish book) and its successors, CJKV Information Processing and CJKV Information Processing, Second Edition.

Comments appreciated
(this page's master source last modified 1 year, 11 months ago)
This reply to request 115,391,327 made just for you Wed Oct 1st 2014 9:13pm JST [load currently averaging 17018 requests/day over a 264-second sample]