This document is primarily for myself. It may not be up to date or well structured. I wanted to improve on that before sharing, but I'm afraid I might never do it with this way of thinking. Just like my projects, this document is important to me; it can be improved over time, and it is shared even if not advertised to others. It may be of interest to people who are trying to understand what I'm doing in my projects, especially if I am no longer here to tell.
General
- Try to work at the highest levels of abstraction to produce sources that are as understandable and concise as possible. Mix authoring and algorithms (which are also a form of authoring). E.g. in the case of sound synthesis, don't use data produced by captures (sampling, etc.). Same for the production of textures, animations, geometry, etc. Focus on synthesis from algorithms, parameters and other high-level authored data instead of versioning concrete, indecipherable blobs into repositories (this excludes high-level blobs like the encoding of Lua data model objects). Concrete data can then be generated from those high-level sources.
Programming
General:
- Lua/LuaJIT (deep understanding of the JIT, interpreter, VM, etc.)
- C data model / FFI
- Processor/CPU behavior (understanding of instructions, cache, memory, etc.)
Extended:
- LÖVE
- GLSL
- GPU behavior
Heuristics
Heuristics that I consider true or useful to apply in my projects.
- Most time is spent reading code, not writing it. Optimize for that first.
- Similarly, in a database or state, most time is spent reading, not writing.
Guidelines
- Work at different levels of abstraction, composition and detail. Balance the graph of function dependencies so that it is neither too wide (not enough composition, functions are too long, too much context in the body) nor too deep (too many indirections to understand what is actually happening).
- Interface abstraction: define an interface somewhere (e.g. source comments or web documentation) and use it as the reference for usage and implementation, so that the user doesn't need to think about the implementation, and vice versa.
- Be careful of over-generalization (e.g. using something to solve everything, when something specific would be better) and over-specialization (e.g. using something too specific when a more general solution is good).
- Limit pre-formatted text lines to 80 characters (with exceptions). Better for side-by-side diffs, pocket editing and readability.
- Most of the time, put the MIT license in a LICENSE file and in the main source file (as short comments; watch the 80-character limit). For the other files, just put a license name header (with references to the local files containing the entire license) and a copyright notice.
- Use spaces instead of tabs. See textual programming as editing a matrix of characters (with minor nitpicks like avoiding trailing spaces), which gives formatting power. Tabs not only take multiple cells of the matrix, which impairs that idea, but they also take a variable number of them (dependent on a specific configuration).
- Use snake_case (lower case only, with the exception of constants) for variable, field, function and method identifiers. I was using camelCase for methods to distinguish them from the rest, but a single-word camelCase is equivalent to a single-word snake_case, and I try to use single words most of the time, so it is not relevant. Those identifiers are keywords: it is possible to avoid underscores (spaces) when relevant or to shorten words (seldom). For literal string keywords (labels), use a hyphen instead of the underscore. The general idea is to build a keyword by concatenating words, in all lower case or all upper case, with a separator (or none) that depends on the context (underscore, hyphen, etc.); see the sketch after this list.
- Don't use plain text Fossil check-in comments; use the timeline-hard-newlines and timeline-truncate-at-blank options, without inserting line breaks for pre-formatting purposes when editing the comment. This allows using 3 levels of detail for check-ins: the first line summary, an optional comment body (after a blank line) and an optional wiki page, with the additional ability to use plain text formatting, e.g. hyphen lists, to mostly avoid the Fossil wiki format.
- Use a Fossil wiki page to track a tree of tasks (to work at different levels of composition); the Fossil tickets are the leaf nodes of the tree. E.g. for a roadmap.
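A minimal sketch of these identifier conventions; the names themselves are illustrative:

    -- Variables, fields, functions and methods: snake_case, single words preferred.
    local max_count = 10
    -- Constants: upper case.
    local DEFAULT_SEED = 42
    local function spawn_entity(kind)
      return { kind = kind, hit_points = max_count, seed = DEFAULT_SEED }
    end
    -- Literal string keywords (labels): hyphen as the separator.
    local event_name = "entity-spawned"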
Code Complexity (experimental)
Use gzip/DEFLATE to measure code complexity. The rough reasoning is as follows:
- Hypothesis: code complexity is related to "information complexity".
- The same identifier or construct used multiple times can be simple to understand (matches compression).
- Comments are more abstract information: additional details and explanations about the code. They are not optional and can be seen as part of the overall complexity.
- Use general purpose lossless compression to measure irreducible information complexity.
- Lossless compression has logarithmic improvements, thus gzip is probably already a relatively good measure of irreducible information complexity.
- Compress files separately to match per-file understanding of the information and to avoid global compression (limited context). Use the resulting byte count as the measure. Sum over all files of a project for the total complexity.
This method measures information complexity; it doesn't take into account the reasoning behind the code or the relations within it. But I believe it is still much better than line count, and it is simple to perform reliably on any kind of language or format (gzip/DEFLATE is everywhere).
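A minimal sketch of this measure, assuming a POSIX shell with gzip and wc available and simple file paths passed as script arguments:

    -- Compress each file separately and sum the compressed byte counts.
    local total = 0
    for _, path in ipairs(arg) do
      local pipe = assert(io.popen(("gzip -c -9 %q | wc -c"):format(path)))
      local bytes = assert(tonumber(pipe:read("*a")))
      pipe:close()
      print(("%8d  %s"):format(bytes, path))
      total = total + bytes
    end
    print(("%8d  total"):format(total))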
Correctness
There are multiple ways to reach a certain degree of correctness for a program; this is my model.
In my current methodology, the necessary degree of correctness is low, and formalism and static analysis are almost absent. The preferred ways are relation testing and consistency testing (assertions).
Analysis
- Manual: understanding the code.
- Automatic: static analysis, etc.
Testing
- Manual testing: not formalized.
- Automatic testing:
- By relation.
- By consistency / coherence.
Testing by relation is about checking a relation over the space of possible inputs / outputs of an operation. Cases can be generated, or they can be produced manually (commonly called unit testing).
Examples of relation types:
- Identity: Check that function A has the same behavior as function B. E.g. two versions of the same function; one implementation is naive and easy to understand, the other is optimized and hard to understand. The first is used to check the behavior of the other.
- Inverse: Check that A(B(x)) == x or other combinations. E.g. pack and unpack functions.
- ... or anything else.
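A minimal sketch of relation testing with generated cases; the pack / unpack pair and the sum functions are hypothetical operations under test:

    -- Inverse relation: unpack2(pack(a, b)) == a, b over generated cases.
    local function pack(a, b) return a * 0x10000 + b end
    local function unpack2(v) return math.floor(v / 0x10000), v % 0x10000 end
    for _ = 1, 1000 do
      local a, b = math.random(0, 0xffff), math.random(0, 0xffff)
      local ra, rb = unpack2(pack(a, b))
      assert(ra == a and rb == b, "inverse relation violated")
    end

    -- Identity relation: a naive version checks another version
    -- (which stands in for a harder-to-understand, optimized one).
    local function sum_naive(t)
      local s = 0
      for i = 1, #t do s = s + t[i] end
      return s
    end
    local function sum_other(t)
      local s = 0
      for i = #t, 1, -1 do s = s + t[i] end
      return s
    end
    for _ = 1, 100 do
      local t = {}
      for i = 1, math.random(0, 50) do t[i] = math.random(0, 1000) end
      assert(sum_naive(t) == sum_other(t), "identity relation violated")
    end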
Testing by consistency / coherence is about checking that the behavior of an operation or program is consistent with its rules and constraints. This can be done with ahead-of-time testing or with runtime checks: a lot of assertions about the state of a program can be inserted to detect issues as soon as possible. For example, if a program crashes, it is considered incoherent and counts as a test failure.
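A minimal sketch of such runtime checks; the inventory logic is purely illustrative:

    local inventory = { count = 0, capacity = 8 }
    local function add_item(inv, n)
      assert(n > 0, "invalid argument: n must be positive")
      inv.count = inv.count + n
      -- Consistency check: the state must respect the system's constraints.
      assert(inv.count <= inv.capacity, "incoherent state: capacity exceeded")
    end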
Performance
This section lists domains, ways and principles to follow when dealing with performance (computation, storage, networking, etc.).
Locality
Manage things to increase the locality of information when computing.
From closest to furthest: registers, L1-2-3 caches, RAM, drives, network, etc.
Caching
Caching can be used to improve locality, but also to factorize computations (prevent recomputing reused outputs).
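A minimal memoization sketch; expensive stands in for a costly pure computation:

    local cache = {}
    local function expensive(x)
      local s = 0
      for i = 1, 1e6 do s = s + (x * i) % 7 end
      return s
    end
    local function cached(x)
      local v = cache[x]
      if v == nil then
        v = expensive(x)
        cache[x] = v   -- factorize the computation for later reuse
      end
      return v
    end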
Pre-computation
A kind of caching by computing costly elements which will or might be needed later.
Algorithmic Improvements
Improve algorithmic complexity.
Overhead Reduction
Remove simple unnecessary operations. E.g. use lower-level constructs, avoid checks at different layers of abstractions, etc.
Approximation
Use an approximate process instead of a costly accurate one (heuristics, etc.).
Data Layout
Inspired [1] by Data Oriented Design (if not the actual thing). This optimization can be very effective in my methodology!
The way I design things in Lua is inspired by Object Oriented Programming, or at least I am using objects, more specifically a directed graph of objects. Each object/table has fields. When dealing with a lot of elements, it means creating a lot of tables. Instead of doing that, the data layout can be transposed / inverted, i.e. instead of having an array of tables with fields, we have an array for each field of the tables. This is a simplification, because we can put multiple fields in the same Lua or FFI array if they tend to be accessed together.
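A minimal sketch of the transposed layout; the particle-like fields are illustrative:

    -- Object-like layout: one table per element.
    -- particles[i] = { x = ..., y = ..., vx = ..., vy = ... }

    -- Transposed layout: one array per field (or per group of fields that
    -- tend to be accessed together), indexed by the element's index.
    local xs, ys, vxs, vys = {}, {}, {}, {}
    local count = 0

    local function add(x, y, vx, vy)
      count = count + 1
      xs[count], ys[count], vxs[count], vys[count] = x, y, vx, vy
    end

    local function update(dt)
      for i = 1, count do
        xs[i] = xs[i] + vxs[i] * dt
        ys[i] = ys[i] + vys[i] * dt
      end
    end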
I have found in testing that it can be much more efficient even without the FFI and that the increase in efficiency is still present even without JIT compilation. And it seems already very effective without having to go into the specifics of the hardware (e.g. cache line size).
It has a significant effect on multiple metrics:
- Memory usage: It can reduce the amount of memory used, probably because tables and their management have a significant overhead.
- Allocation cost: It seems to be faster even for one item, and may be improved further with bulk allocation.
- Processing: By composing/arranging data for bulk processing, cache usage is improved. I suspect that it also helps the JIT compiler which seems to excel with "linear" bulk processing loops (e.g. decomposition of the data can remove unbiased branches).
- Garbage Collection: Because the number of tables/objects is a function of the number of fields and not of the number of elements, the GC has few objects and references to manage, which greatly decreases the work it has to perform for a collection cycle.
The drawback is that this transposition / inversion of the data layout seems to imply the removal of the object oriented abstraction and requires manual bookkeeping. But, this removal of abstraction also comes with access to the context, i.e. how the objects (now just data) are used, which can reveal possible optimizations.
If we believe the heuristic that hotspots are mostly in loops, this data layout fits well, and it fits LuaJIT's heuristics.
It's also not just an optimization, it has interesting design properties. For example, it can be used as an alternative to a hierarchy of classes by using dynamic "components" which are assigned to entities, a design commonly called Entity Component System (but in my case I just care about systems and entities).
Distributed Computations (Parallelism)
Multi-threading, networks, etc.
For Multi-Threading:
- Don't do it yet; try to improve single-thread performance first.
- Do it, but avoid the need for shared memory and synchronization as much as possible. Copy the data by default.
Concurrent Computations
E.g. coroutines. These concurrent computations do not happen in parallel; the point is to reduce the synchronous latency of an operation, which will probably increase its asynchronous latency and use more compute overall. But in a lot of cases, the synchronous latency is the problem. An example of that is the Lua incremental GC, which works concurrently with user code and uses (slightly?) more compute than a stop-the-world GC, but with much less synchronous latency, i.e. it will not freeze the program as the latter does.
Another way to see it is that the slow synchronous operation is divided into smaller ones and it becomes a slow asynchronous operation.
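A minimal sketch of this slicing with a coroutine; heavy_step and the step budget are illustrative:

    -- Placeholder for one small unit of work.
    local function heavy_step(i) end

    local work = coroutine.create(function()
      for i = 1, 1e6 do
        heavy_step(i)
        if i % 1000 == 0 then coroutine.yield() end
      end
    end)

    -- Called every frame/tick: the slow synchronous operation becomes a slow
    -- asynchronous one, with bounded synchronous latency per call.
    local function update()
      if coroutine.status(work) ~= "dead" then
        assert(coroutine.resume(work))
      end
    end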
Distributing Computations Over Time
The following is based on my experience in Factorio. Resources can be produced and consumed with different profiles. For example, the consumption can have spikes; in that case, storage (a buffer) may be used to mitigate those spikes.
Available computing resources for a specific system, if we ignore saving energy, are a function of time. A specific amount of time is an amount of computing units available. But those units fundamentally cannot be stored in a buffer in preparation for a consumption spike. Therefore, the periods when the processor is unused are computing units lost forever.
By using the idle time of the processor to compute elements that will or might be used in the future, time can be saved and latency reduced.
Fluidity
In the case of most video games, they need to produce a certain number of frames per second, at a steady pace. For the game to be fluid, each frame cannot exceed a certain time budget. A spike in computing consumption would result in lost frames (stuttering) or higher latency for concurrent operations.
Ideally, instead of consuming a low or medium amount of compute most of the time and spiky amounts of compute sporadically, the game would have a way to consume a steady amount of compute all the time. This not only preserves the fluidity of the experience, but can also increase the amount of compute available to build the experience, because those computing units are no longer lost to idle time.
Profiling
For general CPU profiling, use a statistical/sampling profiler. Avoid as much as possible the profiling of a modified version of the original system, e.g. avoid instrumentation. If the profile doesn't sufficiently match the original system, it can't be used to guide optimizations.
When it is about synchronous latency, use manual and specific instrumentation, progressively from the shallowest to the deepest level of detail, searching for the culprit with the smallest amount of change to the system.
For non-CPU profiling, like network bandwidth, use automatic/general instrumentation (e.g. count the number of bytes sent by a specific line of code).
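A minimal sketch of such instrumentation; send_tracked and the site labels are illustrative, and the socket is any object with a send method:

    local sent_bytes = {}
    local function send_tracked(socket, data, site)
      sent_bytes[site] = (sent_bytes[site] or 0) + #data
      return socket:send(data)
    end
    -- e.g. send_tracked(client, payload, "world-sync")
    -- Later, dump sent_bytes to see which call sites dominate the bandwidth.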
Versioning
This section describes how I should version my projects.
There are two kinds of versions: release and development.
Release
A release is a named specific point in time of a project to be distributed to users.
The version has the form <major>.<minor>, where major and minor are continuously incremented integers (starting at 1 for the major number).
Major versions are not backwards compatible and should be considered different branches of the project (which may be simultaneously maintained).
Minor versions are backwards compatible. The notion of backwards compatibility is subtle; thus it is not described here.
The frequency of major version releases must be inversely proportional to the number of users.
Development
A development version is a commit / check-in hash. Don't use the 0.<minor> form to describe a pre-version of the project.
FFI Bindings
In the case of bindings, don't see the LuaJIT FFI as a replacement for a C compiler, but as a better method than the Lua C API (less friction with C and higher level) to write a wrapper. The FFI has limitations and is not able to load/check the C interface (headers) of loaded libraries. But if a C interface is designed specifically for the FFI (a wrapper) around a C library, the C compiler can do the work (e.g. checking, use of complex or platform-specific headers, etc.).
Making a C wrapper may be more work, but is safer and can be adapted to FFI needs.
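A minimal sketch of binding such a wrapper with the FFI; mylib_wrap, its types and functions are hypothetical, and the wrapper itself would be compiled with a C compiler against the real library:

    local ffi = require("ffi")

    -- Declarations for the FFI-friendly wrapper interface only, not for the
    -- original library headers.
    ffi.cdef[[
    typedef struct wrap_ctx wrap_ctx;
    wrap_ctx *wrap_open(const char *path);
    int wrap_process(wrap_ctx *ctx, const double *samples, size_t count);
    void wrap_close(wrap_ctx *ctx);
    ]]

    local lib = ffi.load("mylib_wrap")

    local ctx = lib.wrap_open("data/config")
    local samples = ffi.new("double[?]", 256)
    lib.wrap_process(ctx, samples, 256)
    lib.wrap_close(ctx)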
Coroutines
As a rule for interoperability in the general case: only the entity or system that yielded the coroutine should be the one responsible for resuming it. This prevents resuming conflicts, which allows the composition of yielding operations.
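A minimal sketch of this rule with a hypothetical timer system: the wait operation yields on behalf of the system, and only that system resumes the coroutines it suspended, so other yielding operations can compose with it:

    local timer = { waiting = {} }

    -- Called from inside a coroutine.
    function timer.wait(seconds)
      table.insert(timer.waiting, { co = coroutine.running(), t = seconds })
      coroutine.yield()
    end

    -- Called by the main loop; the timer system is the only one resuming.
    function timer.update(dt)
      for i = #timer.waiting, 1, -1 do
        local w = timer.waiting[i]
        w.t = w.t - dt
        if w.t <= 0 then
          table.remove(timer.waiting, i)
          assert(coroutine.resume(w.co))
        end
      end
    end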
Entity Systems
As mentioned in the performance section, changing the data layout can be more than an optimization. This section explains a method which removes my need for OOP (my version of it). I should never use a class system again; I can still use objects, i.e. direct use of metatables, but not a class system with the intent to build some kind of class hierarchy.
The idea is to have systems. A system is an object, it exposes an interface made of methods and it implements processes and manages data related to them. The system is an abstraction over its data, it can manipulate its structures as necessary. A system can use other systems.
The following is a specific implementation of this idea. Entity systems are more specific systems dealing with unique shared entities. In a specific context we could call a world, each entity is identified by a unique identifier which cannot be reused: a Lua number incremented at each entity creation. This number is then given to various entity systems to register the entity into them. The components of an entity are implicitly given by the systems it is registered in.
It implies that components are dynamic and use-after-free / double-free are not really an issue. But, there is still bookkeeping to be done to avoid memory leaks.
Entity systems can have different purposes and they can own multiple data structures associated to different processes. An important purpose would be to perform a specific computation on all of its entities. To make that efficient, the system can use arrays, with or without the FFI.
The allocation strategy of these arrays is: add a new element at the end and remove an element by moving the last element into the removed element's slot. This is simple and prevents holes. To do that, a map from entity_id to internal_id and a map from internal_id to entity_id (effectively an array) are used. This is possible because the system's data is behind an abstraction. The internal_id can then be used to index the different arrays.
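A minimal sketch of this strategy for one entity system; the single field array xs is illustrative:

    local system = {
      entity_to_internal = {},  -- entity_id -> internal_id
      internal_to_entity = {},  -- internal_id -> entity_id (an array)
      xs = {},                  -- one array per field
      count = 0,
    }

    function system.add(entity_id, x)
      local id = system.count + 1
      system.count = id
      system.entity_to_internal[entity_id] = id
      system.internal_to_entity[id] = entity_id
      system.xs[id] = x
    end

    function system.remove(entity_id)
      local id = system.entity_to_internal[entity_id]
      local last = system.count
      local last_entity = system.internal_to_entity[last]
      -- Move the last element into the removed slot to prevent holes.
      system.xs[id] = system.xs[last]
      system.internal_to_entity[id] = last_entity
      system.entity_to_internal[last_entity] = id
      -- Clear the freed slot and the removed entity's mapping.
      system.xs[last] = nil
      system.internal_to_entity[last] = nil
      system.entity_to_internal[entity_id] = nil
      system.count = last - 1
    end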
In preliminary testing, the performance is better on multiple metrics compared to objects, even without the FFI and without the JIT. The following are my observations.
Memory usage should be lower:
- Let's call a slot what can store a Lua value, i.e. 8 bytes.
- A key-value pair in the hash part can be considered a cost of around 3 slots (3*8 bytes) in my tests.
- The bookkeeping of one entity system is 4 (3 + 1) slots per entity.
- A table has a high initial overhead, but let's say it is negligible if the table has a lot of fields. For each field of an object, 3 slots are used.
- At only 2 fields stored in one entity system or one object, the number of slots is now the same: 6 slots per entity (system: 4 + 2, object: 3 * 2).
- With more than 2 fields in one entity system, the amount of memory will go towards 3x less than for objects (system: +1, object: +3).
GC collection should be faster:
- Less memory used overall.
- For the content of an FFI array or a Lua array, there seems to be a small overhead. (It should be better with the FFI.)
- For each table (object), there seems to be a significant overhead.
For processing, the optimization opportunities seem to be much better as I understand it from Data Oriented Design. E.g. batch operations, weave together related data or split unrelated data, improve cache density, decrease (instruction-)cache misses, remove some branches, do manual loop hoisting to optimize for the interpreter, etc.
In conclusion, it should not be less efficient and it gives the opportunity to be much more efficient. But the impact of this method with regard to simplicity and performance still needs to be determined in a real project.
Recommendations:
- Don't use the FFI by default. Using the FFI is harder (although not that much in this case) and is commonly much slower in interpreted code while not necessarily being faster in compiled code.
- Don't weave together fields by default; keep them separated. It's easier to design (especially without the FFI) and gives good results with sequential accesses.
Documentation
- Lua
- Markdown
- HTML / CSS
- Pikchr / SVG
Data
- Filesystem model
- Lua data model (LDM)
- C data model
LÖVE and Simplicity
I don't consider making (and playing) multiplayer experiences, especially like a game as a service or even a dedicated server like Minecraft, as essential anymore. I like that an experience can be enjoyed alone, offline, customized and with the data stored locally. It also simplifies things a lot, because multiplayer is more complex (synchronization, authentication, security, data responsibility, etc.).
I can still make single player games with multiplayer features, but at a lower scale, with a server running with LÖVE, for LAN (see ZeroTier) or Internet, whether it is embedded in the game, launched as a separate application on the desktop or on a dedicated GNU/Linux server. In all cases, the server and the client would use the same methodology, which simplifies things a lot, especially for distribution.
This means that it is possible to no longer use a system event loop like ljuv and to rely instead on LÖVE threads, channels and elscheduler to build what is needed for a specific game, which is simpler.
I can probably do all I want with LÖVE, LuaJIT and the FFI, without any AOT compilation, which would dramatically increase the distribution complexity. I want to follow this constraint of not doing AOT compilation or using additional native dependencies, for simplicity and to master optimizations restricted to a deep understanding of the interpreter, the tracing JIT, C data semantics, processors (Linux perf), etc.
(As a last resort, I could do AOT compilation of C code for performance reasons and it could be an optional library loaded at runtime with a fallback path written in Lua.)
Furthermore, a good chunk of computation is or can be performed on the GPU dynamically with GLSL, making this constraint even more reasonable.
All I Need Is LÖVE. Not exactly, because Lua/LuaJIT and C are still at the core of my methodology and LÖVE is not completely aligned with how I want to do things, but it sounds good.
This allows me to focus on what really matters to me: developing interactive and non-interactive audiovisual experiences, prototypes and games in a way that seeks simplicity. It is about programming, simulation, algorithmic generation, fiction, lore, world building, storytelling, visuals, animations, text, music, audio atmospheres, sound design, etc.
A lot can already be done with this constraint.
- [1] Mostly from CppCon 2014: Mike Acton, "Data-Oriented Design and C++". I don't agree with everything; I just took the fundamental ideas.