Objectives
KStr is a string class designed to:
- Handle multiple languages at the same time.
- Handle variable-length strings.
Structure
- A KStr is a byte sequence.
- It holds its length (as a long).
- ESC sequences switch between character codes.
- Normal text is assumed to be Windows 1252.
- To allow characters to be scanned in both directions, each ESC sequence records both the previous and the next character code.
- Each character code has its own byte length per character,
e.g. Shift JIS = 2 bytes, Windows 1252 = 1 byte.
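The rules above can be illustrated with a small C++ cursor that walks a KStr-style byte sequence in both directions. This is a sketch under stated assumptions, not KStr's implementation: the names `Cursor`, `charWidth`, `CODE_1252` and `CODE_SJIS` are hypothetical, the code IDs 0x20 (Windows 1252) and 0x22 (Shift JIS) are inferred from the example below, and the byte 0x1B is assumed never to occur inside a character or as a code ID. Moving forward, the [Next] byte tells the cursor which code follows an ESC; moving backward, the [Prev] byte restores the earlier code without rescanning from the start of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical code IDs, inferred from the example in this section.
constexpr std::uint8_t ESC = 0x1B;
constexpr std::uint8_t CODE_1252 = 0x20;  // default: Windows 1252
constexpr std::uint8_t CODE_SJIS = 0x22;  // Shift JIS

// Byte length of one character in a given code (assumed fixed per code).
inline std::size_t charWidth(std::uint8_t code) {
    return code == CODE_SJIS ? 2 : 1;
}

// Bidirectional cursor over a well-formed KStr-style byte sequence.
// It tracks the code in effect at its position, so it never has to
// rescan from the beginning of the string.
struct Cursor {
    const std::vector<std::uint8_t>& bytes;
    std::size_t pos = 0;             // byte offset of a character boundary
    std::uint8_t code = CODE_1252;   // code in effect at pos

    explicit Cursor(const std::vector<std::uint8_t>& b) : bytes(b) {}

    // Steps over one character. An ESC [Next] [Prev] sequence switches
    // the current code to Next. Returns false at the end of the string.
    bool next() {
        while (pos < bytes.size() && bytes[pos] == ESC) {
            code = bytes[pos + 1];   // [Next] directly follows ESC
            pos += 3;
        }
        if (pos >= bytes.size()) return false;
        pos += charWidth(code);
        return true;
    }

    // Steps back over one character. Crossing an ESC sequence backwards
    // restores the earlier code from its [Prev] byte.
    bool prev() {
        while (pos >= 3 && bytes[pos - 3] == ESC) {
            code = bytes[pos - 1];   // [Prev] is the last byte of the sequence
            pos -= 3;
        }
        if (pos == 0) return false;
        pos -= charWidth(code);
        return true;
    }
};
```

With the 16 example bytes shown below, stepping forward visits 8 characters (A, B, C, two Kanji, X, Y, Z), and stepping back from the end visits the same 8 while switching the code back at each ESC sequence.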
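Example:
In the representation below an ESC sequence is written as [Prev,Next], but the actual bytes are stored as [ESC] [Next] [Prev]. Here code 20 (hex) is the default Windows 1252 and code 22 is Shift JIS, in which 漢字 ("Kanji") is encoded as 8A BF 8E 9A.
representation: |
ABC[20,22]漢字[22,20]XYZ
|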
hex: |
41 42 43 1B 22 20 8A BF 8E 9A 1B 20 22 58 59 5A
|
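As a hedged sketch, a hypothetical `KStrBuilder` in C++ can reproduce the hex dump above: it appends runs of already-encoded bytes and emits `ESC [Next] [Prev]` whenever the character code changes. The default code 0x20 is an assumption taken from the example; KStr itself may construct its byte sequences differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t ESC = 0x1B;

// Hypothetical builder: appends encoded text runs and inserts an
// ESC [Next] [Prev] sequence whenever the character code changes.
class KStrBuilder {
public:
    explicit KStrBuilder(std::uint8_t defaultCode = 0x20) : code_(defaultCode) {}

    KStrBuilder& append(std::uint8_t code, const Bytes& run) {
        if (code != code_) {
            bytes_.push_back(ESC);
            bytes_.push_back(code);   // [Next]: code of the text that follows
            bytes_.push_back(code_);  // [Prev]: code of the text before
            code_ = code;
        }
        bytes_.insert(bytes_.end(), run.begin(), run.end());
        return *this;
    }

    const Bytes& bytes() const { return bytes_; }

private:
    Bytes bytes_;
    std::uint8_t code_;  // code in effect at the end of bytes_
};
```

Appending "ABC" as code 0x20, the Shift JIS bytes 8A BF 8E 9A as code 0x22, and "XYZ" as code 0x20 yields exactly the 16 bytes of the hex dump. Storing [Next] right after ESC lets a forward scan learn the new code immediately, while [Prev] sits last so a backward scan meets it first.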
This structure has several merits:
- Each string block can be drawn with a different font suite for its character set.
- Structured strings can be drawn. As an extension of this structure, I developed a block structure for expressing equations.
[Please see Equation]
- It can hold more than Unicode. Unicode is controversial because it unifies Kanji characters (Japanese Kanji and Chinese Kanji are different even though they are similar in form!).
Because KStr uses ESC sequences like ISO-2022, its capacity can be extended with further character codes.
Character Code
- ASCII
- Windows Code Page
- ISO 8859
- JIS X 0201 (Japanese, Roman letters and half-width Katakana)
- JIS X 0208 (Japanese, Kanji)
These editions are almost the same, but some characters differ:
JIS C 6226-1978
JIS X 0208-1983
JIS X 0208-1990
JIS X 0208-1997
- JIS X 0212 (Japanese, Kanji)
JIS X 0212-1990
- SJIS (Japanese, Kanji)
The standard encoding on PCs and AIX.
- EUC
The standard encoding on UNIX (Sun). It can combine up to four character sets (code sets 0 to 3).
- ISO-2022-JP
Similar in concept to KStr: it switches character sets with ESC sequences.
- ISO 10646/Unicode
Unicode has many problems!
e.g. CJK unification: even though the shapes are similar, the characters are NOT the same!
e.g. Combining characters: why was such a mechanism defined in a character code?
- GB2312-80 (Chinese, Kanji)
- Big5 (Chinese, Kanji)
- KS C 5601 (Korean)
KS C 5601-1992
- VISCII (Vietnamese)
- Mojikyo (Kanji and any characters)
- EBCDIC
- JEF (Japanese, Kanji)
Character Code and Language References
String Manipulation
Definition: |
KStr s,s2; // Variable length. Allocated from heap memory.
KStr_<100> s3; // Fixed length. Allocated on stack.
|
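KStr_<100> keeps its storage inside the object itself, which is why it can live on the stack. A minimal C++ sketch of that idea follows, assuming the template argument is a capacity in bytes (the text does not say whether it counts bytes or characters) and that an over-long append is refused rather than truncated; `FixedStr` is a hypothetical name, not part of KStr.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-capacity string: the N bytes are a member array,
// so a local FixedStr<N> needs no heap allocation.
template <std::size_t N>
struct FixedStr {
    std::array<std::uint8_t, N> bytes{};
    long len = 0;  // bytes in use (KStr keeps its length as a long)

    // Appends raw bytes; returns false instead of overflowing the buffer.
    bool append(const std::uint8_t* p, std::size_t n) {
        if (static_cast<std::size_t>(len) + n > N) return false;
        std::memcpy(bytes.data() + len, p, n);
        len += static_cast<long>(n);
        return true;
    }
};
```

The variable-length KStr would instead keep its bytes in heap memory and grow on demand (a `std::vector` is the standard-library analogue), trading an allocation for an unlimited length.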
Assignment:
Concatenation:
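One detail concatenation has to handle can be sketched under assumptions: if every KStr is taken to start in the default Windows 1252 code (0x20 in the example above), then joining a + b must switch back to that code whenever a ends inside another code. The helpers `endCode` and `concat` are hypothetical, and the sketch assumes 0x1B appears only as the start of an ESC sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
constexpr std::uint8_t ESC = 0x1B;
constexpr std::uint8_t DEFAULT_CODE = 0x20;  // Windows 1252, as in the example

// Code in effect at the end of s: the [Next] byte of the last ESC sequence.
std::uint8_t endCode(const Bytes& s) {
    std::uint8_t code = DEFAULT_CODE;
    for (std::size_t i = 0; i < s.size();) {
        if (s[i] == ESC) { code = s[i + 1]; i += 3; }
        else ++i;
    }
    return code;
}

// Hypothetical a + b: b is assumed to start in DEFAULT_CODE, so if a ends
// in another code, an ESC [Next=default] [Prev=last] sequence is inserted.
Bytes concat(const Bytes& a, const Bytes& b) {
    Bytes out = a;
    std::uint8_t last = endCode(a);
    if (last != DEFAULT_CODE) out.insert(out.end(), {ESC, DEFAULT_CODE, last});
    out.insert(out.end(), b.begin(), b.end());
    return out;
}
```

If a already ends in the default code, as the example string does, concatenation is a plain byte append.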
String Functions:
Substring:
Substrings are handled with KStr_i, the iterator class of KStr: |
KChar kc;
KStr s;
KStr_i si(s);
while(si){    // loop while the iterator has characters left
    kc=si();  // current character
    si++;     // advance to the next character
}
|
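Assuming `st` and `len` in `Mid` count characters rather than bytes (the text does not say), a substring cannot simply copy a byte range: the slice may start inside a Shift JIS run, so the code switch has to be re-emitted. The hypothetical C++ `mid` below walks the source the way the KStr_i loop does, with the code IDs 0x20 and 0x22 taken from the example; it is a sketch, not KStr's `Mid`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
constexpr std::uint8_t ESC = 0x1B;
constexpr std::uint8_t DEFAULT_CODE = 0x20;  // Windows 1252, as in the example
constexpr std::uint8_t SJIS_CODE = 0x22;     // Shift JIS, as in the example

inline std::size_t charWidth(std::uint8_t code) { return code == SJIS_CODE ? 2 : 1; }

// Hypothetical Mid: copies characters [st, st+len) of a well-formed string
// and re-emits ESC [Next] [Prev] whenever the copied text changes code,
// so the result is valid on its own (it starts in DEFAULT_CODE).
Bytes mid(const Bytes& s, long st, long len) {
    Bytes out;
    std::uint8_t srcCode = DEFAULT_CODE;  // code in effect in the source
    std::uint8_t outCode = DEFAULT_CODE;  // code in effect at the end of out
    long index = 0;                       // character index in the source
    std::size_t pos = 0;                  // byte offset in the source
    while (pos < s.size() && index < st + len) {
        if (s[pos] == ESC) {              // remember the switch, copy nothing yet
            srcCode = s[pos + 1];
            pos += 3;
            continue;
        }
        std::size_t w = charWidth(srcCode);
        if (index >= st) {
            if (srcCode != outCode) {
                out.insert(out.end(), {ESC, srcCode, outCode});
                outCode = srcCode;
            }
            out.insert(out.end(), s.begin() + pos, s.begin() + pos + w);
        }
        pos += w;
        ++index;
    }
    return out;
}
```

On the example string, `mid(s, 3, 1)` returns `1B 22 20 8A BF` (the first Kanji with its own code switch), and `mid(s, 5, 3)` returns plain `58 59 5A`. Under the same assumption, `Left(s, n)` would behave like `mid(s, 0, n)`.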
String Functions
long Delete(long len);
long Insert(long st,rcKStr s);
KStrp Mid(rcKStr s,long st,long len);
KStrp Mid(rcKStr s,long st);
KStrp Left(rcKStr s,long len);
KStrp Right(rcKStr s,long len);
long Delete(rKStr s,long st,long len);
long Insert(rKStr s,long st,rcKStr src);
KStrf UL2Str(ulong n,long bs=10,char bc='a');
KStrf L2Str(long n,long bs=10,char bc='a');
KStrf Hex(ulong n,char bc='a');
KStrf Oct(ulong n);
long Str2L(rcKStr s,long bs=10);
ulong Str2UL(rcKStr s,long bs=10);
KStrf Format(long n,rcKStr fmt);
KStrf Format(rcKStr s,rcKStr fmt);
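The parameters `bs` and `bc` are not documented. A plausible reading is that `bs` is the base (radix) and `bc` is the character used for digit value 10, so `'a'` gives lowercase and `'A'` uppercase digits. The C++ sketch below uses `std::string` in place of KStr, assumes 2 <= bs <= 36, and its lowercase names (`ul2str`, `l2str`, `hexStr`, `octStr`, `str2ul`, `str2l`) are hypothetical stand-ins, not the KStr API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Sketch of UL2Str: digits 0-9, then bc, bc+1, ... for digit values 10 and up.
std::string ul2str(unsigned long n, long bs = 10, char bc = 'a') {
    const unsigned long base = static_cast<unsigned long>(bs);
    std::string out;
    do {
        unsigned long d = n % base;
        out.push_back(d < 10 ? static_cast<char>('0' + d)
                             : static_cast<char>(bc + (d - 10)));
        n /= base;
    } while (n != 0);
    std::reverse(out.begin(), out.end());  // digits were produced least significant first
    return out;
}

// Sketch of L2Str: negate in unsigned arithmetic so LONG_MIN does not overflow.
std::string l2str(long n, long bs = 10, char bc = 'a') {
    if (n >= 0) return ul2str(static_cast<unsigned long>(n), bs, bc);
    return "-" + ul2str(0UL - static_cast<unsigned long>(n), bs, bc);
}

std::string hexStr(unsigned long n, char bc = 'a') { return ul2str(n, 16, bc); }
std::string octStr(unsigned long n) { return ul2str(n, 8); }

// Sketch of Str2UL: accepts letters in either case; stops at the first invalid digit.
unsigned long str2ul(const std::string& s, long bs = 10) {
    unsigned long v = 0;
    for (char c : s) {
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'z') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z') d = c - 'A' + 10;
        else break;
        if (d >= bs) break;
        v = v * static_cast<unsigned long>(bs) + static_cast<unsigned long>(d);
    }
    return v;
}

// Sketch of Str2L: an optional leading '-' followed by digits.
long str2l(const std::string& s, long bs = 10) {
    if (!s.empty() && s[0] == '-') return -static_cast<long>(str2ul(s.substr(1), bs));
    return static_cast<long>(str2ul(s, bs));
}
```

For example, `ul2str(255, 16, 'A')` returns `"FF"`, `hexStr(3054)` returns `"bee"`, and `str2l("-42")` returns -42. Under this reading, `Hex` and `Oct` are simply `UL2Str` with a fixed base.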
Test Application
A multilingual text editor built with KStr, using Win32 NLS and Win32 IME support under Windows 2000.
I must admit that Windows 2000 is great when it comes to language support, including its various IMEs and fonts!