UTF-8 Encoding

General Information

With EcoStruxure Machine Expert V2.2 and later versions, the STRING data type can be encoded in Latin 1 or in UTF-8 format. For details, refer to the paragraphs Project-Wide UTF-8 Encoding and Encoding Single Literals in UTF-8 Format.

Since UTF-8 encoding provides the most comprehensive character set, it is a good practice to enable UTF-8 encoding for new projects, as well as for existing projects that are to be used in a new context.

EcoStruxure Machine Expert can process a wide variety of characters to display diagnostic messages or visualizations in different languages as well as to accept user input in these languages and characters or symbols.

If a comprehensive character set is not required or if a project is not subject to change, strings can be encoded in Latin 1 format. Additionally, the following character sets are supported:

Character set: ASCII
Code page number: 20127
Description:
  • 128 characters
  • Suitable for English texts.
Character encoding: 7-bit encoded character

Character set: DOS Latin 1
Code page number: 819, 850
Description:
  • Complies with ISO/IEC 8859.
  • Suitable for Western European languages in the Windows command line.
Character encoding: 8-bit encoded character

Character set: Latin 1
Code page number: 28591
Description:
  • Complies with ISO/IEC 8859–1.
  • Frequently used for HTML pages with äöüß, but does not include the euro sign (€) or, for example, certain French special characters.
Character encoding: 8-bit encoded character

Character set: Windows 1252 Encoding
Code page number: 1252
Description:
  • Default Windows character set for Western European countries.
  • Windows internally uses the UTF-16 format.
  • Contains all characters from ISO/IEC 8859–1 and ISO/IEC 8859–15, but some of them are encoded differently.
Character encoding: 8-bit encoded character

Character set: Unicode
Code page number: -
Description:
  • Universal character set for a wide range of languages, including historical languages, Braille, musical notation, and emojis.
  • More than 100,000 characters can be displayed.
  • Each character has a numeric code.
  • In contrast to ASCII, a distinction is made between the assignment of code points to characters and the encoding of the characters.
  • Numeric codes < 128 are ASCII compatible.
  • Numeric codes < 256 are ISO/IEC 8859–1 compatible.
  For more information, refer to https://home.unicode.org/.
Character encoding: -

Character set: Unicode 14.0
Code page number: -
Description: 144,697 characters
Character encoding: -

Character set: UTF-16
Code page number: 1200
Description:
  • Special Unicode encoding format.
  • Used in some operating systems (Windows, OS X) and programming languages (Java, .NET) for internal character representation.
  • Different computer architectures encode the 4-byte characters differently. UTF-16LE uses little-endian byte order.
Character encoding: 16-bit encoded characters. The characters are encoded in either 2 bytes or 4 bytes.

Character set: UTF-8
Code page number: 65001
Description:
  • Byte-oriented encoding format of Unicode characters.
  • Most widely used.
  • Used in GNU/Linux and Unix operating systems, and in various Internet services (email, web, browser).
  • Compatible with ASCII characters in the first 128 characters (0...127).
Character encoding: Tuple of 8-bit words per character. The characters are encoded in lengths of 1 to 4 bytes.
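
The byte sizes given for the character sets above can be checked outside the controller. The following is a minimal Python sketch, not EcoStruxure Machine Expert code; the sample characters and the helper name encoded_length are chosen for illustration only, and Python's codec names stand in for the code pages listed above.

```python
# Compare how many bytes one character occupies in several encodings.
samples = ["A", "ä", "€", "你", "😀"]
encodings = ("ascii", "latin-1", "cp1252", "utf-8", "utf-16-le")


def encoded_length(char: str, encoding: str):
    """Return the byte count of char in encoding, or None if not encodable."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


for char in samples:
    sizes = {enc: encoded_length(char, enc) for enc in encodings}
    print(f"U+{ord(char):04X}", sizes)
```

A value of None means that the character does not exist in that character set, which is the case for € in Latin 1. Characters outside ASCII occupy 2 to 4 bytes in UTF-8, so the number of bytes in a UTF-8 string can exceed the number of characters.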

Project-Wide UTF-8 Encoding

The Project Settings > Compile options dialog box provides the parameter UTF8 Encoding for STRING that allows you to configure UTF-8 encoding for all strings of data type STRING throughout the project. Refer to Project Settings - Compile Options in the Menu Commands Online Help.

NOTE: Before you set the encoding format to UTF-8, execute the static analysis rule SA0175 on your code to help detect constructs that may cause issues with UTF-8 encoding.

WSTRING data types are not affected by this setting. They are always encoded as Unicode in UTF-16 format.

For data type STRING, project-wide encoding is as follows:

Option UTF8 Encoding for STRING selected:

  • UTF-8

Option UTF8 Encoding for STRING unselected:

  • Windows 1252 encoding (default Windows encoding)

  • Latin 1

With project-wide UTF-8 encoding enabled, STRING_TO conversion operators can be used as described under STRING_TO Conversions.

If project-wide UTF-8 encoding is enabled, then this setting also applies to library functions and add-ons.

Encoding Single Literals in UTF-8 Format

If Latin 1 encoding is used throughout the project (UTF8 Encoding for STRING is unselected), you can encode single literals in UTF-8 format. To achieve this, add the UTF8# type prefix to each literal.

{attribute 'monitoring_encoding' := 'UTF-8'}
strVarUtf8: STRING := UTF8#'你好,世界!ÜüÄäÖö';

NOTE: If you use single UTF-8 encoded strings, ensure that they are interpreted correctly wherever they are used.

Example: If UTF8 Encoding for STRING is not selected, the OPC UA server converts a string variable to UTF-8 before transferring it to a client. A literal that is already UTF-8 encoded, such as UTF8#'äöü', is thereby converted a second time and is misinterpreted by the client.

A similar condition can occur with strings that are displayed in the visualization.
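
The misinterpretation described above can be reproduced outside the controller. The following is a minimal Python sketch under the assumption that the consumer treats the stored bytes as Latin 1 and then converts them to UTF-8; it does not model the actual OPC UA server or visualization implementation, and the variable names are illustrative.

```python
# The literal UTF8#'äöü' is stored as UTF-8 bytes in the STRING variable.
stored_bytes = "äöü".encode("utf-8")

# A consumer that assumes Latin 1 reads every byte as a separate character.
misread = stored_bytes.decode("latin-1")

# Converting that misread text to UTF-8 encodes the content a second time.
transferred = misread.encode("utf-8")

print(len(stored_bytes), len(transferred))   # byte counts before and after
print(transferred.decode("utf-8") == "äöü")  # False: the text is corrupted
```

The three characters occupy 6 bytes after the first encoding and 12 bytes after the second, and the client no longer receives the original text.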

For further information, refer to

String Manipulation

Use library functions to manipulate strings.

NOTICE
UNINTENDED STRING MANIPULATION
Do NOT use index access on a variable of type STRING for string manipulation, as this may lead to unintended results when project-wide UTF-8 encoding is enabled.
Failure to follow these instructions can result in equipment damage.
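
Why index access is unreliable with UTF-8 can be illustrated outside the controller. The following is a minimal Python sketch, not EcoStruxure Machine Expert code; it assumes that an index into a STRING addresses individual bytes, and the sample word is chosen for illustration only.

```python
text = "Tür"                        # 3 characters; ü is outside ASCII
utf8_bytes = text.encode("utf-8")   # byte layout of the UTF-8 encoded text

print(len(text), len(utf8_bytes))   # character count versus byte count

# Index 1 addresses the first byte of ü, not the whole character.
print(hex(utf8_bytes[1]))

# Cutting the byte sequence after index 1 splits ü in the middle.
truncated = utf8_bytes[:2]
try:
    truncated.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)
```

Because ü occupies 2 bytes in UTF-8, a byte index no longer corresponds to a character position, and writing or cutting at such an index can leave an incomplete byte sequence in the string.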