What to Do with Those Russians (Texts)
Intro
As usual, Russians can't do things in a straightforward way.
Currently, there are four widespread encodings in use
(that I know of):
- KOI-8
- Mostly used on the Internet; widely accepted as the de facto
e-mail standard
- Alternative PC
- MS-DOS users invented it to use stolen software and
view pseudographics the same way as in ASCII
- Some MS-Windows coding
- Of which I have little knowledge
- ISO-8859-5
- Like the ISO network protocols, a VERY rare beast.
I believe a couple of others exist :)
Naturally, when dealing with Russian text, one can never be sure
that the thing is readable. You must accept this strange situation,
as you have to accept Russia.
As for me, I prefer Unices and therefore KOI.
It's relatively easy to russify xterm and the X11 Window System in
general. Since the most popular Web browsers supporting forms are
the various Mosaic flavours, I refer you to the
Mosaic localization document, a useful though not exhaustive guide.
If You Are Lazy
There is a quick and dirty method to extract (and, to a certain degree,
read and understand) Russian text with WORA.
Apply a function to the character field you want to select from a database.
You'll get KOI text with the 8th bit cut, bearing some resemblance
to Russian spelling (that was one of the reasons for introducing this
coding: to cure the problems arising from dumb American 7-bit
sendmails and their ASCII-centric owners). The evident candidates are:
- INITCAP
- Capitalizes the first letter of each word
- LOWER
- Transforms text to lower case
- UPPER
- Transforms text to upper case
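Why the stripped text stays half-readable can be sketched in a few lines of Python. This is only an illustration: it assumes the KOI-8 variant is KOI8-R (Python's `koi8_r` codec) and that the 8th bit is simply cleared. The helper name `strip_high_bit` is made up for this note and is not part of WORA or Oracle.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R (assumed variant) and clear the 8th bit of each byte."""
    raw = text.encode("koi8_r")
    # Masking with 0x7F keeps only the low 7 bits, so the result is plain ASCII.
    return bytes(b & 0x7F for b in raw).decode("ascii")


seven_bit = strip_high_bit("Иван")
print(seven_bit)          # iWAN
print(seven_bit.lower())  # iwan
```

Under these assumptions the letter case comes out inverted (capital Cyrillic letters land on lowercase Latin ones and vice versa), which is one more reason a case-folding function such as LOWER or UPPER is convenient here.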
In the same way you can also define a condition for
information retrieval (specify a "where" clause).
Naturally, you should be able to translate the Russian word
to 7-bit ASCII. For example, look at this
WORA form fragment for the "DUBNA_PHONES" table:
WHERE LOWER(last_name) LIKE 'iwan%'
That will return the records whose "last_name" field begins with
"iwan", "IWAN", "IwAn" and similar strings.
Note that such names ("iwan") are usually written as "Ivan".
It seems complicated, but in most cases you need not think
about it.
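The effect of that condition can be imitated outside the database. The sketch below is a rough Python stand-in, not what WORA or Oracle actually executes; the stored values are hypothetical 7-bit strings invented for the example, and `lower_like_prefix` is a made-up helper name.

```python
# Hypothetical 7-bit values of a last_name column (not real data).
stored_last_names = ["iWAN", "iWANOW", "pETROW", "IWANOWA"]


def lower_like_prefix(value: str, prefix: str) -> bool:
    """Rough stand-in for: LOWER(value) LIKE prefix || '%'."""
    return value.lower().startswith(prefix)


matches = [name for name in stored_last_names if lower_like_prefix(name, "iwan")]
print(matches)  # ['iWAN', 'iWANOW', 'IWANOWA']
```

Whatever the case of the stored string, folding it to lower case first lets a single lowercase prefix catch all the variants.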
I'd like to add that search conditions for text inclusion
must conform to Oracle's strange notion of
regular expressions.
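For readers more used to ordinary regular expressions, the two LIKE wildcards can be mapped onto them: "%" matches any run of characters (including none) and "_" matches exactly one character. The translation below is a minimal sketch for illustration only; it ignores the ESCAPE clause, and the function names `like_to_regex` and `like` are made up for this note.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if value matches the LIKE pattern as a whole."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("iwanow", "iwan%"))     # True: prefix match
print(like("petr iwanow", "%iwan%"))  # True: text inclusion
print(like("iwan", "iw_n"))        # True: "_" stands for one character
print(like("petrow", "iwan%"))     # False
```

So a search for text inclusion is written with "%" on both sides of the fragment, rather than with the ".*" a regex user might expect.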
See also the document on how to use WORA forms.
ocr