I am not sure if you are old enough to remember the 1977 IBM movie Powers of Ten (trippy version, without narration) [also at the IMDB and Wikipedia], but that’s a movie that sure put things in perspective. Thinking in terms of powers of ten helps me sort things out when I am considering a design problem. Comparing the scale of a problem to a physical scale is a good way to assess its true importance for a project. Sometimes the problem is the one to solve; sometimes it is not. A problem being fun, enticing, or challenging does not mean it has to be solved optimally right away: in the correct context, considering its true scale, it may not be as important as first thought.
Maybe comparing problems’ scales to powers of ten in the physical realm helps in understanding where to put your efforts. So here are the different scales and what I think they should contain:
- Subatomic. That’s the level of everything over which you do not have direct control, such as how the processor breaks instructions into micro-instructions and executes them out-of-order to exploit instruction level parallelism, how it (re)sequences memory accesses, how it uses branch prediction to avoid stalls as much as possible, or whether or not speculative execution takes place. All those things are handled by the processor and you have little or no control over how each of these features is used. Moreover, the behavior is likely to change significantly from one processor model to another, or from one brand to another, even if you remain in the same architectural family.
- Atomic. The level just above lets you control more things. This is the level of assembly language instructions, address generation and the memory management unit. At this level, you control computation at the level of registers, and can map algorithms very precisely to CPU instructions in order to minimize code size, or minimize execution time, or both. Very few implementation details are hidden. At this level, only the most basic data types exist (such as byte, word, etc.). Pointers are merely address-sized registers or memory locations. If you’re lucky, you may have IEEE 754 floating point instructions.
- Molecular. This is the level where atomic elements combine to create slightly more complex structures. At this level, richer basic data types appear, such as strings, arrays, and maybe PODs. Pointer arithmetic also appears at this level, as do bit-twiddling and arithmetic expressions built on atomic-level concepts.
- Microscopic. This level sees flow control, branching, loops, and simple functions appear. Controlling the flow of the program by testing and branching, while loops, and other mechanisms are part of the microscopic behavior of the code. PODs are definitely part of the microscopic level. This level allows one to structure information more coherently, but still in a very basic way. PODs allow the structuring of information, but do not yet export methods as classes do; they are still somewhat inert by themselves. Simple functions providing very simple services, such as strcat, mktime, and rand from the C Standard Library, live at the microscopic level.
- Mesoscopic. At this level, we see classes, groups of functions (or modules), and more elaborate flow control such as threading. Libraries, that is, groups of functions possibly using a context, either static or dynamic, provide specialized functionality. Classes extend PODs with methods that are potentially called automatically, making the data much more “alive”. Flow control is more complex: we find devices such as coroutines, fibers, and threads. Synchronisation mechanisms like mutexes and semaphores are somewhat more basic, and maybe should live in the microscopic level, but as they are of no use by themselves, they are in the mesoscopic level with the fibers and threads.
- Macroscopic. This level corresponds to sizable modules, groups of libraries, and small applications such as those found in the GNU collection. Dæmons, command line utilities, and modules that are part of larger applications are at the macroscopic level. At this level, the details of implementation are partially overshadowed by the specification of the software’s behavior. The focus is clearly on what it does rather than how it does it.
- System. Operating systems, frameworks, execution and desktop environments are all larger than single applications. They all provide extended run-time environments to applications so that they integrate well with the operating system and possibly the graphical user interface. Abstractions such as file systems, device drivers, resource sharing or network connectivity exist at this level. All these abstractions provide services while hiding the complexity of each individual component as much as possible, to help applications integrate completely with the system. At this level, the “what” also includes integration.
- Ecosystem. At this level, we are interested in how applications, processes, and resources communicate with each other. Here we have networking and inter-process communication that extends from mere synchronisation and signaling to data sharing through shared memory, message passing, and common data formats. The focus at this level is the cooperation between applications, work-flow optimization, and improved productivity. The focus also includes interoperability with entirely different computer systems, such as Windows, OS X, BSD Unix, and Linux¹.
- Global. At this level, we are considering seamless integration of different programs on different operating systems, massive parallelism, and great numbers of simultaneous users using different client software. This is the level of social apps that integrate into various clients, which are not even necessarily designed for interoperability with the social apps. At this level, the focus is to hide interoperability altogether by getting things to just work, regardless of the web browser, operating system or instant messaging program the users use, and to provide the best user experience.
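To make the lower levels a bit more concrete, here is a minimal C++ sketch of how the same small piece of data might look at the molecular, microscopic, and mesoscopic levels. All the names (`is_power_of_two`, `point_pod`, `manhattan_length`, `point`, `usage_example`) are hypothetical and chosen only for illustration; where exactly each item belongs is, as with the rest of the list, open to debate.

```cpp
#include <cassert>
#include <string>

// Molecular: bit-twiddling built directly on atomic-level integer operations.
// Returns true when x has exactly one bit set.
bool is_power_of_two(unsigned x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Molecular or microscopic: a POD. It structures information but is inert;
// it exports no methods.
struct point_pod {
    int x;
    int y;
};

// Microscopic: a simple function providing one very small service.
int manhattan_length(const point_pod &p) {
    const int ax = p.x < 0 ? -p.x : p.x;
    const int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

// Mesoscopic: a class wraps the POD and adds methods, one of which
// (the constructor) is called automatically.
class point {
public:
    point(int x, int y) : data_{x, y} {}

    int length() const { return manhattan_length(data_); }

    std::string to_string() const {
        return "(" + std::to_string(data_.x) + ", " +
               std::to_string(data_.y) + ")";
    }

private:
    point_pod data_;
};

// Usage: the POD plus free function and the class give the same answer;
// the class only packages data and behavior together.
void usage_example() {
    assert(is_power_of_two(64u));
    assert(!is_power_of_two(48u));

    point_pod raw{-3, 4};
    assert(manhattan_length(raw) == 7);

    point p(-3, 4);
    assert(p.length() == 7);
    assert(p.to_string() == "(-3, 4)");
}
```

Nothing above the class boundary needs to know how `manhattan_length` is written, which is roughly the point at which I would stop worrying about the levels below.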
This (non-exhaustive) list of levels and topics can help you gauge how far up or down the hierarchy you must concern yourself with, and identify the level for which you are currently designing a solution to a problem. The following diagram summarizes the list graphically:
The diagram shows that the levels are not mutually exclusive and that there is quite an overlap between them. Of course, you may agree or disagree with the particular levels I assigned to an item, but the details are somewhat less important than the whole picture.
So if you are facing the design of a command line utility, you most likely must concern yourself with the next level up from macroscopic, the system (and maybe ecosystem) level, and most likely with the next level down, mesoscopic. Only later, if performance proves a problem, will you consider optimization and finer design at the microscopic level.
If you’re designing a high-performance number crunching library, you will concern yourself greatly with the atomic and microscopic levels because they will prove decisive in the success of your endeavor; but you will not want to lose the perspective of system-level libraries or frameworks.
Very often, you’re designing at a rather high level, say the macroscopic level, and you spot a possible performance problem. If you’re anything like me, the temptation to explore this problem is often very high; however, it must be resisted so that you correctly address the problem at hand and are not diverted to other, relatively unimportant areas of the design. I am not talking about a show-stopper technological risk (something that you do not know how to build, compute, solve, or circumvent and that is critical to the survival of the project²); I am talking about a mere brain-puzzle, a hack of sorts, a premature optimization.
Focusing one’s attention at the right level will most likely help you avoid premature optimization and yield a better overall design. You do need to concern yourself with the details, but in due time. Unless they are spectacularly bad, design decisions at the microscopic or atomic level are unlikely to affect the overall performance of the whole project; algorithms and better data structures will be more important than deciding whether you’re going to use ++i or i++, or any similar nit-pick.
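A small sketch may help show why the algorithm matters more than the nit-pick. The C++ below counts how many elements a linear scan and a binary search examine on the same sorted data. The function names (`make_sorted`, `linear_probes`, `binary_probes`) are hypothetical, and counting probes is only a rough stand-in for real cost, not a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the sorted sequence 0, 1, ..., n-1.
std::vector<int> make_sorted(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v;
}

// Linear scan: examines elements one by one, so the count grows with n.
// Returns the number of elements examined.
std::size_t linear_probes(const std::vector<int> &v, int key) {
    std::size_t probes = 0;
    // Writing i++ instead of ++i here would not change the count at all.
    for (std::size_t i = 0; i < v.size(); ++i) {
        ++probes;
        if (v[i] == key) break;
    }
    return probes;
}

// Binary search: halves the candidate range at each step, so the count
// grows with log2(n). Requires sorted input.
// Returns the number of elements examined.
std::size_t binary_probes(const std::vector<int> &v, int key) {
    assert(std::is_sorted(v.begin(), v.end()));  // precondition check
    std::size_t lo = 0, hi = v.size(), probes = 0;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ++probes;
        if (v[mid] == key) break;
        if (v[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return probes;
}
```

Searching for the last element of `make_sorted(1024)`, the linear scan examines all 1024 elements while the binary search examines about ten. No choice between ++i and i++ will ever buy you two orders of magnitude.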
1 Not necessarily in that order.
2 I once worked at a place where the concept of technical risk was totally unknown. Instead of focusing on core product research to eliminate the technological risks, the management, very pointy-haired-boss style, allocated resources to other aspects of the project that would not be viable unless a small number of very important technological risks were eliminated, while neglecting the resolution of those critical issues. Of course, the project ultimately soft-failed: an endless succession of failing deliverables and no new technologies.