iktsoft.com [Multilingual String]

Keiji Ikuta Software Laboratory

KStr: Multilingual String Class
By Keiji Ikuta (Jul. 5, 2001)

Objectives

KStr is a string class designed to:

Handle multiple languages at the same time.
Handle strings of variable length.

Structure

A KStr is a byte sequence.
It carries length information (a long).
ESC sequences are used to change character codes.
Text is assumed to be Windows 1252 unless an ESC sequence says otherwise.
To enable character search, each ESC sequence stores both the previous and the next character code.
Each character code has its own byte length per character,
e.g. Shift JIS = 2 bytes, Windows 1252 = 1 byte.

e.g.

Below, an ESC sequence is written as [Prev,Next].
The actual bytes are stored in the order [ESC] [Next] [Prev].

representation: ABC[20,22]漢字[22,20]XYZ

hex: 41 42 43 1B 22 20 8A BF 8E 9A 1B 20 22 58 59 5A
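
The layout above can be sketched in code. This is a minimal illustration only, not KStr's actual implementation: the names Run, charWidth, encode and decode are hypothetical, and reading 0x20 as the Windows 1252 code and 0x22 as the Shift JIS code is an inference from the example bytes rather than a documented table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kEsc = 0x1B;
constexpr std::uint8_t kCodeWin1252 = 0x20;   // assumed: default code in the example
constexpr std::uint8_t kCodeShiftJis = 0x22;  // assumed: Shift JIS code in the example

// One run of text in a single character code (hypothetical helper type).
struct Run {
    std::uint8_t code;
    std::vector<std::uint8_t> bytes;
};

// Bytes per character; only the two codes used in the example are modelled.
inline std::size_t charWidth(std::uint8_t code) {
    return code == kCodeShiftJis ? 2 : 1;
}

// Concatenate runs, emitting [ESC][Next][Prev] whenever the code changes.
std::vector<std::uint8_t> encode(const std::vector<Run>& runs) {
    std::vector<std::uint8_t> out;
    std::uint8_t cur = kCodeWin1252;  // text starts in the default code
    for (const Run& r : runs) {
        if (r.code != cur) {
            out.push_back(kEsc);
            out.push_back(r.code);  // [Next]
            out.push_back(cur);     // [Prev]
            cur = r.code;
        }
        out.insert(out.end(), r.bytes.begin(), r.bytes.end());
    }
    return out;
}

// Walk the bytes forward, one character at a time, and split them into runs.
std::vector<Run> decode(const std::vector<std::uint8_t>& in) {
    std::vector<Run> runs;
    std::uint8_t cur = kCodeWin1252;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == kEsc && i + 2 < in.size()) {
            cur = in[i + 1];  // [Next]; in[i + 2] is [Prev], unused going forward
            i += 3;
            continue;
        }
        const std::size_t w = charWidth(cur);
        if (i + w > in.size()) break;  // truncated character: stop
        if (runs.empty() || runs.back().code != cur) runs.push_back(Run{cur, {}});
        runs.back().bytes.insert(runs.back().bytes.end(),
                                 in.data() + i, in.data() + i + w);
        i += w;
    }
    return runs;
}
```

Encoding the three runs 41 42 43 (code 0x20), 8A BF 8E 9A (code 0x22) and 58 59 5A (code 0x20) with this sketch reproduces the 16 bytes of the hex line, and decoding those bytes gives the same three runs back.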

This structure has several merits:

Each block of a string can be drawn with a different font for each character set.
Structured strings can be drawn. As an extension of this structure, I developed a block structure for expressing equations.
[Please see Equation]
It can hold more than UNICODE can. UNICODE is controversial because of its unification of Kanji characters.
(Japanese Kanji and Chinese Kanji are different, even though they are similar in form!)
Because KStr uses ESC sequences as ISO-2022 does, its capacity can be extended.

Character Code

ASCII
Windows Code Page
ISO 8859
JIS X 0201
JIS X 0208 (Japanese, Kanji)
  These revisions are almost the same, but some characters differ.
  JIS C 6226-1978
  JIS X 0208-1983
  JIS X 0208-1990
  JIS X 0208-1997
JIS X 0212 (Japanese, Kanji)
  JIS X 0212-1990
SJIS (Japanese, Kanji)
  Standard used on PCs and AIX.
EUC
  Standard on UNIX (Sun). It can hold any character code.
ISO-2022-JP
  Similar in concept to KStr. It uses ESC sequences.
ISO 10646/UNICODE
  UNICODE has many problems!
  e.g. CJK: Even though the shapes are similar, they are NOT the same!
  e.g. Combining mechanism: Why was such a mechanism defined in a character code?
GB2312-80 (Chinese, Kanji)
BIG 5 (Chinese, Kanji)
KS C 5601 (Korea)
  KS C 5601-1992
VISCII (Vietnam)
Mojikyo (Kanji and any characters)

EBCDIC
JEF (Japanese, Kanji)

Character Code and Language References

Unicode Home Page
Fonts in Cyberspace
Ancient Scripts of the World
CJK.INF Version 2.1 (July 12, 1996)
Copyright (C) 1995-1996 Ken Lunde. All Rights Reserved.
CJK Tables and Character Set Tables by Koichi Yasuoka
Mojikyo Net: Chinese Characters for all the world
MS Software Globalization: Microsoft's Software Globalization home page.

The Writing Systems of the World by NAKANISHI Printing Co., Ltd. (Japanese only)

String Manipulations

Definition:

KStr s,s2;      // Variable length. Allocated from heap memory.
KStr_<100> s3;  // Fixed length. Allocated on stack.

Assignment:

s="abcde"; s2='A';

Concatenation:

s<<"xyz"<<'A';

String Functions:

s2=Mid(s,3,2);

Substring:

To work with substrings, use KStr_i, the iterator class for KStr.
KChar kc;
KStr s;
KStr_i si(s);
while(si){     // true while characters remain
  kc=si();     // read the current character
  si++;        // advance to the next character
}
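
To show how such an iterator might step over the byte structure described under Structure, here is a self-contained sketch that follows the same while(si){ kc=si(); si++; } protocol. It is not the real KStr_i: CharIter, CharRef (a stand-in for KChar, whose definition is not shown on this page) and collect are hypothetical names, and the code values 0x20 (1 byte per character) and 0x22 (2 bytes per character) are assumptions taken from the hex example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for KChar: the character code plus the raw value.
struct CharRef {
    std::uint8_t code;
    std::uint32_t value;
};

// Hypothetical forward iterator over [ESC][Next][Prev]-delimited bytes.
// It keeps a reference, so the byte vector must outlive the iterator.
class CharIter {
public:
    explicit CharIter(const std::vector<std::uint8_t>& bytes) : b_(bytes) { skipEsc(); }

    // True while a complete character remains at the current position.
    explicit operator bool() const { return pos_ + width() <= b_.size(); }

    // Read the current character without advancing.
    CharRef operator()() const {
        std::uint32_t v = 0;
        for (std::size_t k = 0; k < width(); ++k) v = (v << 8) | b_[pos_ + k];
        return {code_, v};
    }

    // Advance one character, then consume any ESC sequences that follow.
    CharIter& operator++() { pos_ += width(); skipEsc(); return *this; }
    void operator++(int) { ++*this; }

private:
    std::size_t width() const { return code_ == 0x22 ? 2 : 1; }  // assumed widths

    void skipEsc() {
        while (pos_ + 2 < b_.size() && b_[pos_] == 0x1B) {
            code_ = b_[pos_ + 1];  // [Next] becomes the current code
            pos_ += 3;             // skip [ESC][Next][Prev]
        }
    }

    const std::vector<std::uint8_t>& b_;
    std::size_t pos_ = 0;
    std::uint8_t code_ = 0x20;  // assumed default code (Windows 1252)
};

// Usage in the style of the loop above: gather every character in order.
std::vector<CharRef> collect(const std::vector<std::uint8_t>& bytes) {
    std::vector<CharRef> out;
    CharIter si(bytes);
    while (si) {
        out.push_back(si());
        si++;
    }
    return out;
}
```

Run over the 16 example bytes, this sketch visits eight characters: three 1-byte characters, two 2-byte characters (0x8ABF and 0x8E9A), then three more 1-byte characters.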

String Functions

long Delete(long len);
long Insert(long st,rcKStr s);
KStrp Mid(rcKStr s,long st,long len);
KStrp Mid(rcKStr s,long st);
KStrp Left(rcKStr s,long len);
KStrp Right(rcKStr s,long len);
long Delete(rKStr s,long st,long len);
long Insert(rKStr s,long st,rcKStr src);
KStrf UL2Str(ulong n,long bs=10,char bc='a');
KStrf L2Str(long n,long bs=10,char bc='a');
KStrf Hex(ulong n,char bc='a');
KStrf Oct(ulong n);
long Str2L(rcKStr s,long bs=10);
ulong Str2UL(rcKStr s,long bs=10);
KStrf Format(long n,rcKStr fmt);
KStrf Format(rcKStr s,rcKStr fmt);
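
The number conversion signatures suggest a radix conversion. The sketch below shows one plausible reading using std::string in place of KStrf. It assumes that bs is the base (2 to 36) and that bc is the character used for the digit ten ('a' for lower case, 'A' for upper case). Neither assumption is confirmed by this page, and the lower-case function names are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Unsigned value to text in base bs; an unsupported base yields an empty string.
std::string ul2str(unsigned long n, long bs = 10, char bc = 'a') {
    if (bs < 2 || bs > 36) return std::string();
    const unsigned long base = static_cast<unsigned long>(bs);
    std::string out;
    do {
        const unsigned long d = n % base;
        out.push_back(d < 10 ? static_cast<char>('0' + d)
                             : static_cast<char>(bc + (d - 10)));
        n /= base;
    } while (n != 0);
    std::reverse(out.begin(), out.end());  // digits were produced lowest first
    return out;
}

// Signed variant: a leading '-' followed by the magnitude.
std::string l2str(long n, long bs = 10, char bc = 'a') {
    if (n >= 0) return ul2str(static_cast<unsigned long>(n), bs, bc);
    // Negate in unsigned arithmetic so the most negative long does not overflow.
    const unsigned long mag = 0UL - static_cast<unsigned long>(n);
    const std::string digits = ul2str(mag, bs, bc);
    return digits.empty() ? digits : "-" + digits;
}

// Convenience wrappers mirroring Hex and Oct in the list above.
std::string toHex(unsigned long n, char bc = 'a') { return ul2str(n, 16, bc); }
std::string toOct(unsigned long n) { return ul2str(n, 8); }
```

Under these assumptions, ul2str(255, 16) gives "ff", ul2str(255, 16, 'A') gives "FF", and l2str(-10, 2) gives "-1010".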

Test Application

A multilingual text editor built with KStr, using Win32 NLS and Win32 IME support under Windows 2000.
I admit that Windows 2000 is great at language support, including its various IMEs and fonts!