The LDM binary format is similar to MessagePack (with inspiration from LuaJIT's String Buffers), but specialized for Lua/LuaJIT and my methodology. It is intended for small- and medium-sized data, ranging from network messages to immutable file formats [1]. It doesn't aim to be a standard, but it aims to be portable and durable.
Another document explains why this format has been created.
The format must be backwards compatible and should be frozen.
Generalities
Each object/value is encoded using a byte tag.
The first 216 tag values (0 to 215) are for tags with embedded unsigned integers. They are grouped in ranges, one range per tag kind.
The last 40 tag values (216 to 255) are for unique tags.
A <n>b suffix designates the number of bits of an unsigned integer which is a parameter of the object format.
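As an illustration of how a range tag carries its parameter, here is a minimal sketch in Python (not the Lua implementation). The range layout is copied from the Tags Overview tables of this document; the names `RANGE_TAGS` and `classify_tag` are hypothetical and only serve the example.

```python
# Range tags: (name, first tag value, number of embedded bits).
# Values copied from the "Range tags" table in Tags Overview.
RANGE_TAGS = [
    ("positive int", 0x00, 6),
    ("negative int", 0x40, 5),
    ("internal object ref", 0x60, 5),
    ("external object ref", 0x80, 5),
    ("string", 0xA0, 5),
    ("array", 0xC0, 4),
    ("map", 0xD0, 3),
]


def classify_tag(tag):
    """Return (name, embedded value) for a range tag, or ("unique", None)."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("a tag is a single byte")
    for name, base, bits in RANGE_TAGS:
        # A range with <n> embedded bits covers 2^n consecutive tag values.
        if base <= tag < base + (1 << bits):
            return name, tag - base
    return "unique", None
```

For example, `classify_tag(0xA3)` identifies a string header with an embedded length of 3, while any value from 0xd8 upwards is reported as a unique tag. The range sizes add up to the 216 values mentioned above.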
Dictionaries
There are two external dictionaries and one internal dictionary.
Internal Object Dictionary
This dictionary is internal; it is part of the encoded data.
In the context of this section, an object is a string or a table.
Each time an object is packed (or unpacked), what happens depends on how many times it has already appeared:
- The first time, the object is added to an implicit dictionary. This dictionary associates each object with an index, between 1 and 2^32 inclusive, in order of appearance. The object is encoded as is.
- The second time, the object is added to the (explicit) internal dictionary. Like the implicit dictionary, it associates each added object with an index, between 1 and 2^32 inclusive, in order. The object is encoded as an internal object entry (a reference), which holds the index (0-based) from the implicit dictionary.
- The third time and afterwards, the object is encoded as an internal object reference, which holds the index (0-based) from the internal dictionary.
The motivation for having two dictionaries is better compression. The heuristic is that an object either appears once or appears many times (e.g. a list of objects with similar fields), so only repeated objects occupy the internal dictionary and its indexes stay small.
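The bookkeeping described above can be sketched as follows. This is a minimal illustration in Python, not the Lua implementation: the class `InternalDictionary` and its `visit` method are hypothetical names, indexes are kept 0-based as they appear in the encoded data, and the sketch only handles hashable values such as strings (tables would need to be tracked by identity).

```python
class InternalDictionary:
    """Tracks how an object should be encoded on each appearance."""

    def __init__(self):
        self.implicit = {}  # object -> 0-based index, in order of first appearance
        self.internal = {}  # object -> 0-based index, in order of second appearance

    def visit(self, obj):
        """Return ("raw", None), ("entry", index) or ("ref", index)."""
        if obj in self.internal:
            # Third appearance or later: reference into the internal dictionary.
            return ("ref", self.internal[obj])
        if obj in self.implicit:
            # Second appearance: register in the internal dictionary and
            # emit an entry holding the implicit dictionary index.
            self.internal[obj] = len(self.internal)
            return ("entry", self.implicit[obj])
        # First appearance: register implicitly; the object is encoded as is.
        self.implicit[obj] = len(self.implicit)
        return ("raw", None)
```

Visiting `"a"`, `"b"`, `"b"`, `"b"` in that order yields a raw encoding twice, then an entry holding implicit index 1, then a reference holding internal index 0. An unpacker would maintain the same two dictionaries in the same order so that both sides agree on the indexes.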
This dictionary is ultimately about reference handling. Here are noteworthy benefits:
- Repeated strings are deduplicated.
- Table identities and references are preserved.
- Directed graphs, including ones with cycles, can be represented.
External Object and Metatable Dictionaries
These dictionaries are external. They represent data which must stay available and backwards compatible for the lifespan of the packed data pieces using them.
Each dictionary associates an index, between 1 and 2^32 inclusive, with an external resource. When packing, a reference (the index) to the resource is encoded. Lower indexes use fewer bytes.
Backwards compatibility in this context means that the dictionary can only be modified by adding new entries; previously defined entries must not change. It is still possible to mark an entry as removed with nil, to explicitly drop support for packed data pieces using it.
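The compatibility rule can be expressed as a small check between two versions of a dictionary. This is a minimal sketch in Python under stated assumptions: a dictionary is modelled as a list of resources in index order, `None` stands in for Lua's nil, and `is_backwards_compatible` is a hypothetical helper, not part of the format or of any implementation.

```python
def is_backwards_compatible(old, new):
    """Check that `new` only appends entries to `old` or marks entries removed.

    Both arguments are lists of resources in index order; None marks a
    removed entry (the counterpart of nil in the text above).
    """
    if len(new) < len(old):
        # Entries cannot disappear; removal must be an explicit None marker.
        return False
    for before, after in zip(old, new):
        if after is None:
            continue  # explicitly removed (or still removed): allowed
        if after != before:
            return False  # an existing index now means something else
    return True
```

Under this model, `["hp", "mana"]` followed by `["hp", None, "stamina"]` is accepted, while reordering entries or reusing the index of a removed entry is rejected.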
Example use cases for the external object dictionary:
- String compression for small network messages by putting frequently used strings in the dictionary.
- References to runtime resources, e.g. userdata values, functions, etc.
Example use cases for the metatable dictionary:
- Tagging tables with metadata from the application.
- Serialization of a game's state containing objects with types and methods.
Tags Overview
Range tags:
name | bits | count | value | hexadecimal | Lua type |
---|---|---|---|---|---|
positive int 6b | 00xxxxxx | 64 | 0 - 63 | 0x00 - 0x3f | number |
negative int 5b | 010xxxxx | 32 | 64 - 95 | 0x40 - 0x5f | number |
internal object ref 5b | 011xxxxx | 32 | 96 - 127 | 0x60 - 0x7f | table, string |
external object ref 5b | 100xxxxx | 32 | 128 - 159 | 0x80 - 0x9f | table, string, userdata, function, thread |
string 5b | 101xxxxx | 32 | 160 - 191 | 0xa0 - 0xbf | string |
array 4b | 1100xxxx | 16 | 192 - 207 | 0xc0 - 0xcf | table |
map 3b | 11010xxx | 8 | 208 - 215 | 0xd0 - 0xd7 | table |
Unique tags:
name | value | hexadecimal | Lua type |
---|---|---|---|
positive int 8,16,32b | 216 - 218 | 0xd8 - 0xda | number |
negative int 8,16,32b | 219 - 221 | 0xdb - 0xdd | number |
unsigned int 64b | 222 | 0xde | cdata<uint64_t> |
signed int 64b | 223 | 0xdf | cdata<int64_t> |
internal object entry 8,16,32b | 224 - 226 | 0xe0 - 0xe2 | table, string |
internal object ref 8,16,32b | 227 - 229 | 0xe3 - 0xe5 | table, string |
external object ref 8,16,32b | 230 - 232 | 0xe6 - 0xe8 | table, string, userdata, function, thread |
metatable ref 8,16,32b | 233 - 235 | 0xe9 - 0xeb | table |
string 8,16,32b | 236 - 238 | 0xec - 0xee | string |
array 8,16,32b | 239 - 241 | 0xef - 0xf1 | table |
map 8,16,32b | 242 - 244 | 0xf2 - 0xf4 | table |
mixed table | 245 | 0xf5 | table |
unused | 246 - 251 | 0xf6 - 0xfb | |
float64 | 252 | 0xfc | number |
true | 253 | 0xfd | boolean |
false | 254 | 0xfe | boolean |
nil | 255 | 0xff | nil |
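To show how the integer tags from the two tables fit together, here is a minimal sketch in Python (not the Lua implementation). It assumes that a packer picks the shortest available form, which the tables suggest but do not mandate, and that the multi-byte parameters are little-endian as stated in the grammar. The function name `encode_int` is hypothetical, and magnitudes beyond 32 bits are simply rejected rather than mapped to another tag.

```python
import struct


def encode_int(n):
    """Encode an integer with the smallest positive/negative int tag."""
    magnitude = abs(n)
    if magnitude > 0xFFFFFFFF:
        raise ValueError("outside the 32-bit range handled by this sketch")
    if n >= 0:
        base, embedded_limit, tags = 0x00, 64, (0xD8, 0xD9, 0xDA)
    else:
        # Negative integers store their absolute value.
        base, embedded_limit, tags = 0x40, 32, (0xDB, 0xDC, 0xDD)
    if magnitude < embedded_limit:
        return bytes([base + magnitude])  # value embedded in the tag byte
    if magnitude <= 0xFF:
        return bytes([tags[0], magnitude])
    if magnitude <= 0xFFFF:
        return bytes([tags[1]]) + struct.pack("<H", magnitude)
    return bytes([tags[2]]) + struct.pack("<I", magnitude)
```

With these assumptions, 63 still fits in a single byte (0x3f), 64 needs the 8-bit form (0xd8 0x40), and -32 is the first negative value that no longer fits in the 5-bit range (0xdb 0x20).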
Grammar
|     : alternatives
{...} : repetition
<...> : special
0x..  : byte
value = nil | true | false | number | string | table |
internal-object-entry | internal-object-ref |
external-object-ref | metatable-ref table
nil = 0xff
true = 0xfd
false = 0xfe
number = positive-int | negative-int | 0xde u64 | 0xdf i64 | 0xfc f64
positive-int = <0x00 + value> |
0xd8 u8 | 0xd9 u16 | 0xda u32
negative-int = <0x40 + absolute value> |
0xdb u8 | 0xdc u16 | 0xdd u32
string = string-header <string>
string-header = <0xa0 + length> |
0xec u8 | 0xed u16 | 0xee u32
table = array | map | mixed
array = array-header array-content
array-header = <0xc0 + length> |
0xef u8 | 0xf0 u16 | 0xf1 u32
array-content = {value}
map = map-header map-content
map-header = <0xd0 + pairs> |
0xf2 u8 | 0xf3 u16 | 0xf4 u32
map-content = {value value}
mixed = 0xf5 array-header map-header array-content map-content
internal-object-entry = 0xe0 u8 | 0xe1 u16 | 0xe2 u32
internal-object-ref = <0x60 + index> |
0xe3 u8 | 0xe4 u16 | 0xe5 u32
external-object-ref = <0x80 + index> |
0xe6 u8 | 0xe7 u16 | 0xe8 u32
metatable-ref = 0xe9 u8 | 0xea u16 | 0xeb u32
u8 = <8 bits unsigned integer>
u16 = <16 bits unsigned integer, little-endian>
u32 = <32 bits unsigned integer, little-endian>
u64 = <64 bits unsigned integer, little-endian>
i64 = <64 bits signed integer, two's complement, little-endian>
f64 = <IEEE 754 double precision floating point number, little-endian>
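The grammar can be exercised with a small recursive decoder. The following is a minimal sketch in Python (not the Lua implementation) covering only a subset: nil, booleans, integers up to 32 bits, float64, strings, arrays and maps. Dictionary-related tags, 64-bit integers and mixed tables are left out, and a real unpacker would also register strings and tables in the implicit dictionary. The names `decode` and `_param` are hypothetical; Lua strings are returned as `bytes`, arrays as lists and maps as `dict`.

```python
import struct


def _param(data, pos, tag, base, bits, first_unique):
    """Read a parameter embedded in a range tag or stored as u8/u16/u32."""
    if base <= tag < base + (1 << bits):
        return tag - base, pos
    size = 1 << (tag - first_unique)  # 1, 2 or 4 bytes
    fmt = {1: "<B", 2: "<H", 4: "<I"}[size]
    return struct.unpack_from(fmt, data, pos)[0], pos + size


def decode(data, pos=0):
    """Decode one value starting at `pos`; return (value, next position)."""
    tag = data[pos]
    pos += 1
    if tag == 0xFF:
        return None, pos
    if tag == 0xFD:
        return True, pos
    if tag == 0xFE:
        return False, pos
    if tag == 0xFC:
        return struct.unpack_from("<d", data, pos)[0], pos + 8
    if tag < 0x40 or 0xD8 <= tag <= 0xDA:  # positive-int
        return _param(data, pos, tag, 0x00, 6, 0xD8)
    if 0x40 <= tag < 0x60 or 0xDB <= tag <= 0xDD:  # negative-int
        n, pos = _param(data, pos, tag, 0x40, 5, 0xDB)
        return -n, pos
    if 0xA0 <= tag < 0xC0 or 0xEC <= tag <= 0xEE:  # string
        n, pos = _param(data, pos, tag, 0xA0, 5, 0xEC)
        return bytes(data[pos:pos + n]), pos + n
    if 0xC0 <= tag < 0xD0 or 0xEF <= tag <= 0xF1:  # array
        n, pos = _param(data, pos, tag, 0xC0, 4, 0xEF)
        items = []
        for _ in range(n):
            value, pos = decode(data, pos)
            items.append(value)
        return items, pos
    if 0xD0 <= tag < 0xD8 or 0xF2 <= tag <= 0xF4:  # map
        n, pos = _param(data, pos, tag, 0xD0, 3, 0xF2)
        pairs = {}
        for _ in range(n):
            key, pos = decode(data, pos)
            value, pos = decode(data, pos)
            pairs[key] = value
        return pairs, pos
    raise NotImplementedError("tag 0x%02x is outside this sketch" % tag)
```

For instance, the six bytes `c3 01 41 a2 68 69` decode to an array holding 1, -1 and the string "hi".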
[1] Files that are not modified in-place, unlike a SQLite database.