DBCS data & Unicode



Designing application programs for DBCS data

When you design application programs that process double-byte data, or convert alphanumeric application programs to double-byte programs, consider the following information.

DBCS stands for Double-Byte Character Set.

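A double-byte character set (DBCS), also known as an "expanded 8-bit character set", is an extension of the single-byte character set (SBCS) design in which a character can be represented by two bytes instead of one. DBCSs were originally developed to handle languages such as Japanese, Chinese, and Korean, whose large character repertoires do not fit in the 256 values that a single byte can represent.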
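To make the one-byte versus two-byte difference concrete, the following Python sketch encodes single characters with several of Python's built-in codecs. It is only an illustration: the DBCS codecs used here (cp932, gbk, cp949) are ASCII-based PC code pages, not the EBCDIC DBCS CCSIDs used on IBM i, and the sample characters are arbitrary.

```python
# Bytes needed for one character in a single-byte code page versus
# several double-byte (mixed SBCS/DBCS) code pages bundled with Python.
# These are PC code pages, used here only as stand-ins for IBM i DBCS CCSIDs.
SAMPLES = [
    ("cp037", "A"),   # EBCDIC US/Canada: single-byte only
    ("cp932", "A"),   # Japanese Shift-JIS variant: Latin letters stay 1 byte
    ("cp932", "日"),  # ...but kanji take 2 bytes
    ("gbk", "中"),    # Simplified Chinese
    ("cp949", "한"),  # Korean
]


def bytes_per_char(codec: str, ch: str) -> int:
    """Return the number of bytes that ch occupies when encoded with codec."""
    return len(ch.encode(codec))


if __name__ == "__main__":
    for codec, ch in SAMPLES:
        print(f"{codec:6} {ch} -> {bytes_per_char(codec, ch)} byte(s)")
```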

Designing DBCS application programs

You can design your application programs for processing double-byte data in the same way you design application programs for processing alphanumeric data, with some additional considerations.

  • Identify double-byte data used in the database files, if any.
  • Design display and printer formats that can be used with double-byte data.
  • Write double-byte error messages to be displayed by the program.
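For the first consideration, one rough way to find candidate double-byte fields is to test whether each value fits in a single-byte EBCDIC code page. The Python sketch below assumes CCSID 37 (Python's cp037) as the SBCS code page; the record layout and the helper names `needs_dbcs` and `dbcs_fields` are hypothetical. The test flags any character outside the SBCS code page, so non-DBCS scripts such as Greek would also be flagged.

```python
# Hypothetical helpers: flag record fields whose values cannot be stored
# in a single-byte EBCDIC code page (CCSID 37 / Python "cp037" assumed).
def needs_dbcs(text: str, sbcs_codec: str = "cp037") -> bool:
    """Return True if text contains a character the SBCS code page lacks."""
    try:
        text.encode(sbcs_codec)
    except UnicodeEncodeError:
        return True
    return False


def dbcs_fields(record: dict) -> list:
    """Return the names of fields whose values need more than SBCS."""
    return [name for name, value in record.items() if needs_dbcs(value)]


if __name__ == "__main__":
    customer = {"CUSTNO": "000123", "NAME": "山田太郎", "CITY": "Tokyo"}
    print(dbcs_fields(customer))  # ['NAME']
```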

Converting alphanumeric programs to process DBCS data

If an alphanumeric application program uses externally described display files, you can change that application program to a double-byte application program by changing only the files.

To change an application program to a double-byte application program, perform these steps:

1. Create a duplicate copy of the source statements for the alphanumeric file you want to change.

2. Change alphanumeric constants and literals to double-byte constants and literals.

3. Change fields in the file to one of the following data types so that they can accept DBCS data:

   • DBCS-open (O) data type
   • DBCS-only (J) data type
   • DBCS-either (E) data type

You do not have to change the length of the fields.

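4. Store the converted display file in a separate library, and give it the same name as its alphanumeric version. If the program locates the file through the library list, it uses whichever version is found first, so the library order of the job selects the alphanumeric or double-byte display.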


Unicode support in control language

Control language (CL) supports Unicode (UTF-16) parameter values.

With this support, your programs can pass values drawn from the entire Unicode character set instead of only the characters in the job's EBCDIC CCSID.
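The difference shows up clearly when a multilingual value is encoded both ways. This Python sketch assumes the job CCSID is 37 (US EBCDIC, Python's cp037) and uses big-endian UTF-16 as a stand-in for the UTF-16 value a CL program would receive. The helper name `try_encode` is hypothetical.

```python
from typing import Optional


# Hypothetical helper: encode text, or return None when the target
# code page has no mapping for some character.
def try_encode(text: str, codec: str) -> Optional[bytes]:
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    value = "Ελληνικά / Привет / English"
    print(try_encode(value, "cp037"))           # None: Greek and Cyrillic missing
    print(len(try_encode(value, "utf-16-be")))  # 2 bytes per character here
```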

Unicode overview

Unicode is a standard that precisely defines a character set, along with a small number of encodings for it, such as UTF-8, UTF-16, and UTF-32.

It enables you to handle text in any language efficiently.
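As a quick illustration of one character set with several encodings, this Python sketch encodes the same characters in UTF-8, UTF-16, and UTF-32. It uses the big-endian variants so the bytes read in order, and the function name is illustrative only.

```python
# One Unicode character, three standard encoding forms.
FORMS = ("utf-8", "utf-16-be", "utf-32-be")


def encodings(ch: str) -> dict:
    """Map each encoding form to the hex bytes it produces for ch."""
    return {form: ch.encode(form).hex() for form in FORMS}


if __name__ == "__main__":
    print(encodings("\u00e9"))      # é: 2 bytes in UTF-8 and UTF-16
    print(encodings("\U0001F600"))  # emoji: a surrogate pair in UTF-16
```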

What is Unicode

Because Unicode covers text in any language, a single application can work for users all over the world.

Before Unicode, no single encoding system covered all the numbers, characters, and symbols in use; each existing encoding handled only a subset of them.

Unicode assigns a unique number, called a code point, to every character, regardless of the platform, language, or program.
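In Python, `ord()` returns a character's code point, which makes the "unique number" concrete. The `code_point` helper is a hypothetical convenience that prints the number in the conventional U+XXXX notation.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of ch in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # Latin, Greek, Cyrillic, and CJK characters each get their own number.
    for ch in ("A", "Κ", "Д", "日"):
        print(ch, code_point(ch))
```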

By using Unicode, you can develop a software product that works with various platforms, languages, and countries.

Why use Unicode

The operating system provides multilingual support. With Unicode, you can store and retrieve data in any national language you choose within a single file.

Therefore, one Unicode database file can meet all your text needs, regardless of the language of the input device. For example, the same file can hold text in Greek, Russian, and English.
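A minimal Python sketch of the idea, assuming a plain UTF-8 stream file as a stand-in for a Unicode database file. The file name and the `round_trip` helper are hypothetical.

```python
import tempfile
from pathlib import Path

# Greek, Russian, and English text in one file, stored as UTF-8.
LINES = ["Καλημέρα", "Доброе утро", "Good morning"]


def round_trip(lines: list) -> list:
    """Write lines to a temporary UTF-8 file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "greetings.txt"  # hypothetical file name
        path.write_text("\n".join(lines), encoding="utf-8")
        return path.read_text(encoding="utf-8").split("\n")


if __name__ == "__main__":
    print(round_trip(LINES) == LINES)  # True
```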

Design of Unicode in control language

With the Unicode support in control language (CL), the command processing program (CPP) can receive each parameter value in either extended binary-coded decimal interchange code (EBCDIC) or UTF-16, whichever the command definition requests.

Parameter support

With the PARM support in CL, you can specify whether each parameter value is passed as EBCDIC or Unicode. EBCDIC is the default.

Parser support

The CL parser converts the input it is given, so that the CPP always receives the value in the encoding you specified, either EBCDIC or Unicode.

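Unicode support is an option on the CCSID of value (CCSID) parameter of the Parameter (PARM) command: CCSID(*UTF16) passes the value to the CPP in UTF-16, and CCSID(*JOB), the EBCDIC default, passes it in the job's CCSID.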
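The conversion can be modeled in Python to show what a CPP might receive for a TYPE(*CHAR) LEN(40) parameter. This is a sketch under stated assumptions, not the parser's actual behavior: the job CCSID is taken to be 37 (cp037), UTF-16 values are taken to be big-endian, values are assumed to be blank-padded to LEN characters, and `value_for_cpp` is a hypothetical name.

```python
# Hypothetical model of the value a CPP receives for one PARM.
# Assumptions: *JOB means CCSID 37 (cp037); *UTF16 means big-endian
# UTF-16; the value is padded with blanks to LEN characters.
CODECS = {"*JOB": "cp037", "*UTF16": "utf-16-be"}


def value_for_cpp(value: str, ccsid: str, length: int) -> bytes:
    """Pad value to length characters and encode it per the CCSID option."""
    if len(value) > length:
        raise ValueError("value is longer than LEN")
    # Note: len() counts code points; characters outside the Basic
    # Multilingual Plane would need two UTF-16 code units each.
    return value.ljust(length).encode(CODECS[ccsid])


if __name__ == "__main__":
    ebcdic = value_for_cpp("ABC123", "*JOB", 40)
    utf16 = value_for_cpp("ABC123", "*UTF16", 40)
    print(len(ebcdic), ebcdic[:6].hex())  # 40 c1c2c3f1f2f3
    print(len(utf16), utf16[:4].hex())    # 80 00410042
```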

Example: Passing the EBCDIC and Unicode value

This example shows a command definition that passes one parameter in extended binary-coded decimal interchange code (EBCDIC) and another in Unicode.

START: CMD PROMPT('EXAMPLE FOR UNICODE')
       PARM KWD(STRING1) TYPE(*CHAR) LEN(40) DFT(ABC123) +
       MIN(0) CCSID(*JOB) PROMPT('String one') +
       /* Passed in job CCSID (EBCDIC) */
       PARM KWD(STRING2) TYPE(*CHAR) LEN(40) DFT(ABC123) +
       MIN(0) CCSID(*UTF16) PROMPT('String two') +
       /* Passed in Unicode (UTF-16) */
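On the receiving side, a CPP for this command would interpret STRING1 as EBCDIC and STRING2 as UTF-16. The Python sketch below only models that interpretation; a real CPP would be a CL, RPG, C, or other IBM i program. It assumes job CCSID 37, big-endian UTF-16, blank padding to 40 characters, and hypothetical function names.

```python
# Model of how a CPP for the EXAMPLE command might decode its two
# parameters. Assumptions: STRING1 arrives in CCSID 37 (cp037),
# STRING2 in big-endian UTF-16, each blank-padded to 40 characters.
LEN = 40


def decode_string1(buf: bytes) -> str:
    """Decode the CCSID(*JOB) parameter and drop trailing blanks."""
    return buf.decode("cp037").rstrip(" ")


def decode_string2(buf: bytes) -> str:
    """Decode the CCSID(*UTF16) parameter and drop trailing blanks."""
    return buf.decode("utf-16-be").rstrip(" ")


if __name__ == "__main__":
    # Simulated buffers: the default for STRING1, a Greek value for STRING2.
    string1 = "ABC123".ljust(LEN).encode("cp037")
    string2 = "Κέντρα".ljust(LEN).encode("utf-16-be")
    print(decode_string1(string1), decode_string2(string2))
```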


Here is an example of a source member, encoded in UTF-8, that creates directories with Russian and Greek names:

//BCHJOB
MKDIR DIR('~/sample')
MKDIR DIR('~/sample/one')
MKDIR DIR('~/sample/Κέντρα πληροφοριών')
MKDIR DIR('~/sample/πο⌂⌂⌂⌂⌂⌂⌂ ⌂οπο⌂⌂⌂⌂⌂⌂⌂⌂ο⌂')
MKDIR DIR('~/sample/my backup info')
//ENDBCHJOB
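Because the member is stored in UTF-8, each Greek or Cyrillic letter in a name takes two bytes, while ASCII letters and blanks take one. This small Python check illustrates the difference using sample names; `utf8_size` is a hypothetical helper.

```python
# Compare character count with UTF-8 byte count for sample directory names.
NAMES = ["my backup info", "Κέντρα πληροφοριών"]


def utf8_size(name: str) -> tuple:
    """Return (characters, UTF-8 bytes) for name."""
    return len(name), len(name.encode("utf-8"))


if __name__ == "__main__":
    for name in NAMES:
        chars, size = utf8_size(name)
        print(f"{name}: {chars} characters, {size} bytes")
```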

You can create the UTF-8 source physical file with the following command (CCSID 1208 is UTF-8):

CRTSRCPF FILE(MYLIB/UTF8TEST) MBR(TEST) TEXT('test of utf8 file') CCSID(1208)
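The CCSID tells the system how to interpret the member's bytes, and that matters whenever the data is converted. The following Python sketch illustrates this with a small table that maps a few IBM i CCSIDs to Python codec names: 37 to cp037, 1200 to big-endian UTF-16, and 1208 to UTF-8. The table and the `convert` function are hypothetical. The sketch shows that Greek text survives conversion to UTF-16 but not to CCSID 37.

```python
# Hypothetical CCSID-to-codec table for a few IBM i CCSIDs.
CCSID_CODECS = {37: "cp037", 1200: "utf-16-be", 1208: "utf-8"}


def convert(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Re-encode data from one CCSID to another via Unicode text."""
    text = data.decode(CCSID_CODECS[from_ccsid])
    return text.encode(CCSID_CODECS[to_ccsid])


if __name__ == "__main__":
    line = "MKDIR DIR('~/sample/Κέντρα')".encode("utf-8")
    print(convert(line, 1208, 1200).decode("utf-16-be"))
    try:
        convert(line, 1208, 37)
    except UnicodeEncodeError:
        print("CCSID 37 cannot represent the Greek characters")
```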

The system command entry line (QCMD) does not support Unicode.
