Coding types

General idea

The idea is to make the serialized message strongly typed, so the receiving program can deserialize the message without a template or recipe for how to do so. Strong typing also provides an extra check on correctness, and on the correspondence between the expected fields and the received fields. Several choices exist: defining message types using separate messages, preceding each message with its structure, or preceding each field with its structure. The current version of DJUTILS-SERIALIZATION implements the last option: every field is preceded by a byte that indicates the type of the byte(s) that follow.
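To illustrate the last option, here is a minimal Python sketch in which each field is written as a one-byte type code followed by its big-endian value. It is not the DJUTILS-SERIALIZATION API, which is Java. The type codes 0 (BYTE_8), 2 (INT_32) and 5 (DOUBLE_64) come from the v1 table below. The helper names `encode_field` and `decode_fields` are hypothetical, and only these three types are handled.

```python
import struct

# Type codes taken from the v1 table; the helper names below are hypothetical.
BYTE_8, INT_32, DOUBLE_64 = 0, 2, 5

# struct format per type code; '>' selects big-endian order without padding
FORMATS = {BYTE_8: ">b", INT_32: ">i", DOUBLE_64: ">d"}


def encode_field(type_code: int, value) -> bytes:
    """Prefix the encoded value with its one-byte type code."""
    return bytes([type_code]) + struct.pack(FORMATS[type_code], value)


def decode_fields(data: bytes) -> list:
    """Read (type code, value) pairs until the buffer is exhausted.

    No template is needed: the type byte tells the reader how many
    bytes follow and how to interpret them.
    """
    fields, pos = [], 0
    while pos < len(data):
        code = data[pos]
        fmt = FORMATS[code]  # raises KeyError for an unknown type code
        (value,) = struct.unpack_from(fmt, data, pos + 1)
        fields.append((code, value))
        pos += 1 + struct.calcsize(fmt)
    return fields


message = encode_field(INT_32, 824) + encode_field(DOUBLE_64, 2.5) + encode_field(BYTE_8, -1)
print(message.hex(" "))
print(decode_fields(message))  # [(2, 824), (5, 2.5), (0, -1)]
```

Because the reader dispatches on the type byte, a field of an unknown type is detected immediately (here as a `KeyError`) rather than silently misread.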

Endianness

When coding multi-byte values, endianness is of the utmost importance. Endianness indicates whether the most significant byte or the least significant byte is sent first. The Internet and languages like Java use big endian, also known as network byte order: the most significant byte comes first. Microsoft products and Intel processors use little endian internally: the least significant byte comes first. As an example, an int (4 bytes) with the value 824 is coded as follows in decimal notation, since $824 = 0 \cdot 256^3 + 0 \cdot 256^2 + 3 \cdot 256 + 56 \cdot 1$:

  • | 0 | 0 | 3 | 56 | Big endian
  • | 56 | 3 | 0 | 0 | Little endian
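The two byte orders above can be reproduced with Python's standard `struct` module, where `>` selects big endian and `<` selects little endian. The sketch also shows why agreeing on the byte order matters: reading big-endian bytes as little endian silently produces a different number.

```python
import struct

value = 824
big = struct.pack(">i", value)     # big endian: most significant byte first
little = struct.pack("<i", value)  # little endian: least significant byte first

print(list(big))     # [0, 0, 3, 56]
print(list(little))  # [56, 3, 0, 0]

# int.to_bytes gives the same result without the struct module
assert big == value.to_bytes(4, "big", signed=True)
assert little == value.to_bytes(4, "little", signed=True)

# Decoding with the wrong byte order yields a different value, with no error
print(struct.unpack("<i", big)[0])  # 939720704
```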

Implemented types, big endian

The following types have been implemented in the v1-version of the standard:

code name description
0 BYTE_8 Byte, 8 bit signed two's complement integer
1 SHORT_16 Short, 16 bit signed two's complement integer, big endian order
2 INT_32 Integer, 32 bit signed two's complement integer, big endian order
3 LONG_64 Long, 64 bit signed two's complement integer, big endian order
4 FLOAT_32 Float, single-precision 32-bit IEEE 754 floating point, big endian order
5 DOUBLE_64 Double, double-precision 64-bit IEEE 754 floating point, big endian order
6 BOOLEAN_8 Boolean, sent / received as a byte; 0 = false, 1 = true
7 CHAR_8 Char, 8-bit ASCII character
8 CHAR_16 Char, 16-bit Unicode character, big-endian order for the 2 parts
9 STRING_8 String, 32-bit big-endian number-preceded byte array of 8-bit characters
10 STRING_16 String, 32-bit big-endian number-preceded char array of 16-bit characters, each 2-byte character in big-endian order
11 BYTE_8_ARRAY Byte array, preceded by a 32-bit big-endian number indicating the number of bytes
12 SHORT_16_ARRAY Short array, preceded by a 32-bit big-endian number indicating the number of shorts, big-endian coded shorts
13 INT_32_ARRAY Integer array, preceded by a 32-bit big-endian number indicating the number of integers, big-endian coded ints
14 LONG_64_ARRAY Long array, preceded by a 32-bit big-endian number indicating the number of longs, big-endian coded longs
15 FLOAT_32_ARRAY Float array, preceded by a 32-bit big-endian number indicating the number of floats, big-endian coded floats
16 DOUBLE_64_ARRAY Double array, preceded by a 32-bit big-endian number indicating the number of doubles, big-endian coded doubles
17 BOOLEAN_8_ARRAY Boolean array, preceded by a 32-bit big-endian number indicating the number of booleans
18 BYTE_8_MATRIX Byte matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count
19 SHORT_16_MATRIX Short matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count, big-endian coded shorts
20 INT_32_MATRIX Integer matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count, big-endian coded ints
21 LONG_64_MATRIX Long matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count, big-endian coded longs
22 FLOAT_32_MATRIX Float matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count, big-endian coded floats
23 DOUBLE_64_MATRIX Double matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count, big-endian coded doubles
24 BOOLEAN_8_MATRIX Boolean matrix, preceded by a 32-bit big-endian number row count and a 32-bit big-endian number column count
25 FLOAT_32_UNIT Float stored internally as a big-endian float in the corresponding SI unit, with unit type and display unit attached. The total size of the object is 7 bytes plus 1 or 2 extra bytes when a money unit is involved.
26 DOUBLE_64_UNIT Double stored internally as a big-endian double in the corresponding SI unit, with unit type and display unit attached. The total size of the object is 11 bytes plus 1 or 2 extra bytes when a money unit is involved.
27 FLOAT_32_UNIT_ARRAY Dense float array, preceded by a big-endian 32-bit number indicating the number of floats, with unit type and display unit attached to the entire float array. Each float is stored in big-endian order.
28 DOUBLE_64_UNIT_ARRAY Dense double array, preceded by a big-endian 32-bit number indicating the number of doubles, with unit type and display unit attached to the entire double array. Each double is stored in big-endian order.
29 FLOAT_32_UNIT_MATRIX Dense float matrix, preceded by a 32-bit big-endian row count int and a 32-bit big-endian column count int, with unit type and display unit attached to the entire float matrix. Each float is stored in big-endian order.
30 DOUBLE_64_UNIT_MATRIX Dense double matrix, preceded by a 32-bit big-endian row count int and a 32-bit big-endian column count int, with unit type and display unit attached to the entire double matrix. Each double is stored in big-endian order.
31 FLOAT_32_UNIT2_MATRIX Dense big-endian float matrix, preceded by a 32-bit big-endian row count int and a 32-bit big-endian column count int, with a unique unit type and display unit per row of the float matrix.
32 DOUBLE_64_UNIT2_MATRIX Dense big-endian double matrix, preceded by a 32-bit big-endian row count int and a 32-bit big-endian column count int, with a unique unit type and display unit per row of the double matrix.
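The array and matrix layouts in the table can be sketched as follows for INT_32_ARRAY (code 13) and DOUBLE_64_MATRIX (code 23). This is a hypothetical Python illustration, not the library's code. One assumption is flagged: the table does not say in which order matrix elements are written, so this sketch writes them row by row (row-major).

```python
import struct

INT_32_ARRAY = 13      # type codes from the v1 table
DOUBLE_64_MATRIX = 23


def encode_int_array(values: list[int]) -> bytes:
    """INT_32_ARRAY: type byte, 32-bit big-endian count, big-endian ints."""
    return (bytes([INT_32_ARRAY])
            + struct.pack(">i", len(values))
            + struct.pack(f">{len(values)}i", *values))


def encode_double_matrix(rows: list[list[float]]) -> bytes:
    """DOUBLE_64_MATRIX: type byte, row count, column count, then the doubles.

    ASSUMPTION: elements are written row by row (row-major order);
    the table does not specify the element order.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same number of columns")
    flat = [x for row in rows for x in row]
    return (bytes([DOUBLE_64_MATRIX])
            + struct.pack(">ii", n_rows, n_cols)
            + struct.pack(f">{len(flat)}d", *flat))


arr = encode_int_array([1, -1, 824])
print(arr.hex(" "))
mat = encode_double_matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(len(mat))  # 1 + 4 + 4 + 6 * 8 = 57
```

The count fields tell the reader exactly how many bytes belong to the field, so no terminator or separator is needed.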


Implemented types, little endian (v2)

The following types have been implemented in the v2-version of the standard. After each decimal value for the code, the corresponding signed (two's complement) byte value is given in parentheses.

code name description
128 (-128) BYTE_8 Byte, 8 bit signed two's complement integer; equal to code 0
129 (-127) SHORT_16_LE Short, 16 bit signed two's complement integer, little endian order
130 (-126) INT_32_LE Integer, 32 bit signed two's complement integer, little endian order
131 (-125) LONG_64_LE Long, 64 bit signed two's complement integer, little endian order
132 (-124) FLOAT_32_LE Float, single-precision 32-bit IEEE 754 floating point, little endian order
133 (-123) DOUBLE_64_LE Double, double-precision 64-bit IEEE 754 floating point, little endian order
134 (-122) BOOLEAN_8 Boolean, sent / received as a byte; 0 = false, 1 = true; equal to code 6
135 (-121) CHAR_8 Char, 8-bit ASCII character; equal to code 7
136 (-120) CHAR_16_LE Char, 16-bit Unicode character, little-endian order for the 2 parts
137 (-119) STRING_8_LE String, 32-bit little-endian number-preceded byte array of 8-bit characters
138 (-118) STRING_16_LE String, 32-bit little-endian number-preceded char array of 16-bit characters, each 2-byte character in little-endian order
139 (-117) BYTE_8_ARRAY_LE Byte array, preceded by a 32-bit little-endian number indicating the number of bytes
140 (-116) SHORT_16_ARRAY_LE Short array, preceded by a 32-bit little-endian number indicating the number of shorts, little-endian coded shorts
141 (-115) INT_32_ARRAY_LE Integer array, preceded by a 32-bit little-endian number indicating the number of integers, little-endian coded ints
142 (-114) LONG_64_ARRAY_LE Long array, preceded by a 32-bit little-endian number indicating the number of longs, little-endian coded longs
143 (-113) FLOAT_32_ARRAY_LE Float array, preceded by a 32-bit little-endian number indicating the number of floats, little-endian coded floats
144 (-112) DOUBLE_64_ARRAY_LE Double array, preceded by a 32-bit little-endian number indicating the number of doubles, little-endian coded doubles
145 (-111) BOOLEAN_8_ARRAY_LE Boolean array, preceded by a 32-bit little-endian number indicating the number of booleans
146 (-110) BYTE_8_MATRIX_LE Byte matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count
147 (-109) SHORT_16_MATRIX_LE Short matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count, little-endian coded shorts
148 (-108) INT_32_MATRIX_LE Integer matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count, little-endian coded ints
149 (-107) LONG_64_MATRIX_LE Long matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count, little-endian coded longs
150 (-106) FLOAT_32_MATRIX_LE Float matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count, little-endian coded floats
151 (-105) DOUBLE_64_MATRIX_LE Double matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count, little-endian coded doubles
152 (-104) BOOLEAN_8_MATRIX_LE Boolean matrix, preceded by a 32-bit little-endian number row count and a 32-bit little-endian number column count
153 (-103) FLOAT_32_UNIT_LE Float stored internally as a little-endian float in the corresponding SI unit, with unit type and display unit attached. The total size of the object is 7 bytes plus 1 or 2 extra bytes when a money unit is involved.
154 (-102) DOUBLE_64_UNIT_LE Double stored internally as a little-endian double in the corresponding SI unit, with unit type and display unit attached. The total size of the object is 11 bytes plus 1 or 2 extra bytes when a money unit is involved.
155 (-101) FLOAT_32_UNIT_ARRAY_LE Dense float array, preceded by a little-endian 32-bit number indicating the number of floats, with unit type and display unit attached to the entire float array. Each float is stored in little-endian order.
156 (-100) DOUBLE_64_UNIT_ARRAY_LE Dense double array, preceded by a little-endian 32-bit number indicating the number of doubles, with unit type and display unit attached to the entire double array. Each double is stored in little-endian order.
157 (-99) FLOAT_32_UNIT_MATRIX_LE Dense float matrix, preceded by a 32-bit little-endian row count int and a 32-bit little-endian column count int, with unit type and display unit attached to the entire float matrix. Each float is stored in little-endian order.
158 (-98) DOUBLE_64_UNIT_MATRIX_LE Dense double matrix, preceded by a 32-bit little-endian row count int and a 32-bit little-endian column count int, with unit type and display unit attached to the entire double matrix. Each double is stored in little-endian order.
159 (-97) FLOAT_32_UNIT2_MATRIX_LE Dense little-endian float matrix, preceded by a 32-bit little-endian row count int and a 32-bit little-endian column count int, with a unique unit type and display unit per row of the float matrix.
160 (-96) DOUBLE_64_UNIT2_MATRIX_LE Dense little-endian double matrix, preceded by a 32-bit little-endian row count int and a 32-bit little-endian column count int, with a unique unit type and display unit per row of the double matrix.
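In the two tables, every v2 code equals the corresponding v1 code plus 128, and the value in parentheses is that code read as a signed byte. In Java, for example, the `byte` type is signed. The Python sketch below makes both relations explicit and compares an INT_32 field with an INT_32_LE field. The helper names `le_code` and `as_signed_byte` are hypothetical.

```python
import struct


def le_code(v1_code: int) -> int:
    """Code of the little-endian (v2) counterpart of a v1 type code (hypothetical helper)."""
    if not 0 <= v1_code <= 32:
        raise ValueError("v1 codes run from 0 to 32")
    return v1_code + 128


def as_signed_byte(code: int) -> int:
    """Value of an unsigned code 0..255 when read as a two's complement byte."""
    return code - 256 if code >= 128 else code


INT_32, INT_32_LE = 2, le_code(2)  # 2 and 130
print(INT_32_LE, as_signed_byte(INT_32_LE))  # 130 -126

# Same value and type, but a different type byte and byte order
field_be = bytes([INT_32]) + struct.pack(">i", 824)
field_le = bytes([INT_32_LE]) + struct.pack("<i", 824)
print(list(field_be))  # [2, 0, 0, 3, 56]
print(list(field_le))  # [130, 56, 3, 0, 0]

# Reading the type byte as a signed byte gives the bracketed value
print(struct.unpack("b", field_le[:1])[0])  # -126
```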


Unicode characters

Unicode characters can be coded in different formats: UTF-8, UTF-16 with a byte-order mark (BOM), UTF-16BE (big-endian), UTF-16LE (little-endian), UTF-32 with a byte-order mark (BOM), UTF-32BE (big-endian), and UTF-32LE (little-endian). UTF-8 needs one to four bytes per character to code all Unicode characters, using multi-byte sequences. UTF-16 needs one or two 16-bit code units per character; characters outside the Basic Multilingual Plane are coded as a surrogate pair. In UTF-32, every Unicode character is coded directly in a single 32-bit unit. Because of these multi-byte sequences and surrogates, the same characters and strings have quite different byte representations in UTF-8, UTF-16, and UTF-32. The current version of DJUTILS-SERIALIZATION (and Sim0MQ) supports UTF-8, UTF-16BE (big-endian), and UTF-16LE (little-endian). More about the differences between the encodings is explained in the Unicode FAQ: https://unicode.org/faq/utf_bom.html#gen6.
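To make the differences concrete, the Python sketch below encodes one four-character string in several of these formats using only the standard codecs. The sample characters are an arbitrary choice: 'A' needs one UTF-8 byte, 'é' two, '€' three, and the musical G clef (U+1D11E) four UTF-8 bytes or a UTF-16 surrogate pair.

```python
# 'A' (U+0041), 'é' (U+00E9), '€' (U+20AC) and the musical G clef (U+1D11E),
# which lies outside the Basic Multilingual Plane
text = "A\u00e9\u20ac\U0001D11E"

encoded = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be")}
for enc, data in encoded.items():
    print(f"{enc:10} {len(data):2} bytes: {data.hex(' ')}")
```

In the UTF-16 outputs the G clef appears as the surrogate pair D834 DD1E, with the byte order of each 16-bit unit reversed between UTF-16BE and UTF-16LE. UTF-8 has no such byte-order variants.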

For a discussion of little and big endianness for UTF-8 and UTF-16 strings, see the following question on Stack Overflow: https://stackoverflow.com/questions/3833693/isn-t-on-big-endian-machines-utf-8s-byte-order-different-than-on-little-endian, as well as https://unicode.org/faq/utf_bom.html#utf8-2.

Note that because of multi-byte sequences in UTF-8 and surrogate pairs in UTF-16, the number of characters in a String is in general not equal to the number of bytes in UTF-8, nor to the number of bytes divided by two in UTF-16. The numbers in STRING_8, STRING_16, STRING_8_LE and STRING_16_LE refer to the number of bytes or shorts (16-bit code units) in the representation, and not to the number of characters in the resulting String.
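The following Python sketch shows how the count in front of a STRING_8 or STRING_16 field differs from the number of characters. It assumes that STRING_8 carries UTF-8 bytes and STRING_16 carries UTF-16BE code units, which is how the text above reads. The function names are hypothetical and not part of DJUTILS-SERIALIZATION.

```python
import struct

STRING_8, STRING_16 = 9, 10  # type codes from the v1 table


def encode_string_8(s: str) -> bytes:
    """Type byte, big-endian count of bytes, then the bytes (assumed UTF-8)."""
    data = s.encode("utf-8")
    return bytes([STRING_8]) + struct.pack(">i", len(data)) + data


def encode_string_16(s: str) -> bytes:
    """Type byte, big-endian count of 16-bit code units, then the UTF-16BE code units."""
    data = s.encode("utf-16-be")
    return bytes([STRING_16]) + struct.pack(">i", len(data) // 2) + data


text = "A\u00e9\u20ac\U0001D11E"  # 4 characters (code points)
s8, s16 = encode_string_8(text), encode_string_16(text)
print(len(text))                            # 4
print(struct.unpack_from(">i", s8, 1)[0])   # 10 (UTF-8 bytes)
print(struct.unpack_from(">i", s16, 1)[0])  # 5 (UTF-16 code units; the G clef is a surrogate pair)
```

A reader should therefore use the count to know how many bytes or shorts to consume, and only then decode them into a String.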