
Getting better performance

Nearly every new software project goes through a phase where performance falls well short of the original plan. Here, performance means the number of tasks and the volume of data handled, divided by the time required to achieve the desired result. When this shortfall threatens overall project success, solutions must be found, and it’s not always easy to retrofit them.

A common programming approach is to build complex multi-purpose objects and give them many properties and methods. This creates significant overhead, and it causes performance problems when handling very large data sources or when the methods are too generic to be optimized for the task at hand.
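As a rough illustration of this overhead, the Python sketch below compares a multi-purpose object with a direct computation on raw values. The `Measurement` class, its fields, and the two `total_*` functions are hypothetical names invented for this example, not taken from any particular library.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    """Hypothetical multi-purpose object: most fields are unused by the sum."""
    value: float
    unit: str = "m"
    label: str = ""
    tags: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def converted(self, unit: str) -> float:
        # Generic method: handles several units even when the caller needs only one.
        factors = {"m": 1.0, "cm": 100.0, "mm": 1000.0}
        return self.value / factors[self.unit] * factors[unit]


def total_generic(values):
    # Builds one full object per value, then goes through the generic method.
    objects = [Measurement(v) for v in values]
    return sum(obj.converted("m") for obj in objects)


def total_direct(values):
    # Use-case-specific path: no objects, no unit lookup.
    return sum(values)


values = [float(i) for i in range(1000)]
generic_result = total_generic(values)
direct_result = total_direct(values)
```

Both functions return the same total. The generic version allocates an object with unused fields and performs two dictionary lookups per value, which is the kind of cost that only becomes visible at large data volumes. Timing both with the standard `timeit` module on realistic data would show how much it matters in a given environment.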

The opposite approach is to create numerous reusable building blocks, each with limited functionality, and then chain many of them together into an application, commonly known as a workflow. This requires significant inter-process data transfer, which is typically limited to one-way transfers. Managing each block also carries its own overhead, so performance drops markedly as the workflow grows more complex and the number of blocks increases.
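The sketch below shows the same idea in miniature, under simplifying assumptions: the blocks are plain Python functions running in one process rather than separate processes, and all names (`block_parse`, `block_filter`, `block_scale`, `run_workflow`, `run_direct`) are hypothetical. It counts how many items are handed from one block to the next and compares the result with a single use-case-specific pass.

```python
def block_parse(lines):
    # Block 1: convert text to numbers, materializing a full intermediate list.
    return [float(line) for line in lines]


def block_filter(numbers):
    # Block 2: keep non-negative values, producing a second intermediate list.
    return [n for n in numbers if n >= 0]


def block_scale(numbers):
    # Block 3: apply a scale factor, producing a third intermediate list.
    return [n * 2.0 for n in numbers]


def run_workflow(lines, blocks):
    """Chain the blocks one way: each block's output is the next block's input."""
    data = lines
    items_transferred = 0
    for block in blocks:
        data = block(data)
        items_transferred += len(data)  # items handed over by this block
    return sum(data), items_transferred


def run_direct(lines):
    """Use-case-specific version: one pass, no intermediate collections."""
    total = 0.0
    for line in lines:
        n = float(line)
        if n >= 0:
            total += n * 2.0
    return total


lines = ["1.5", "-2.0", "3.0", "4.5"]
workflow_total, transferred = run_workflow(
    lines, [block_parse, block_filter, block_scale]
)
direct_total = run_direct(lines)
```

For four input lines, the three-block workflow hands over ten items in total, while the direct version hands over none. In a real workflow engine each handover may also involve serialization and scheduling, so the gap would likely be wider than this in-process sketch suggests.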

The data structures used can also be a significant usability or performance constraint. Examples include flat data representations where a hierarchy exists, sequentially accessed data files where random access is required, and data compression schemes that interfere with optimal data access (such as column compression when row access to tables is required).
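The row-versus-column case can be sketched with a small in-memory table stored in both layouts. The table contents and function names here are made up for illustration, and no compression is modelled.

```python
# Same hypothetical table stored two ways.
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]
columns = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age": [30, 25, 41],
}


def row_from_rows(index):
    # Row layout: a whole record is one lookup.
    return rows[index]


def row_from_columns(index):
    # Column layout: a record must be reassembled from every column.
    return tuple(col[index] for col in columns.values())


def column_from_columns(name):
    # Column layout: a whole column is one lookup.
    return columns[name]


def column_from_rows(position):
    # Row layout: a column requires visiting every row.
    return [row[position] for row in rows]
```

Each layout answers both questions, but only one of them cheaply. If the columns were also compressed, reassembling a single row would presumably require decompressing part of every column, which is the mismatch described above.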

The highest performance can only be achieved with software that supports the use case in the most direct way, with the fewest unused options and the smallest number of building blocks. In complex environments, this can lead to a large number of use-case-specific programs, so a balance has to be found with other techniques.

My advice is to carefully align the software approach with the expected use cases, and to accept fundamental changes that correct wrong early assumptions, instead of implementing workarounds that inevitably lead to lower performance. And when ultimate performance is required, specific development can bypass existing methods and objects to “get the job done fast”.