On Separation of Concerns (SoC)

Separation of concerns is a key design principle, much like our homes have different rooms for different purposes. In computer science, separation of concerns leads to modularization, encapsulation, and programming language constructs like functions and objects. It’s also behind software design patterns like layering and model-view-controller (MVC).

The importance of breaking software down into smaller, loosely coupled parts with high cohesion has been accepted for decades. Yourdon & Constantine’s 1978 book Structured Design: Fundamentals of a Discipline of Computer Program and System Design provides fundamental thinking on the relationships between coupling, cohesion, changeability, and software system lifecycle cost.

Modularization begins with the decomposition of the problem domain. For digital information systems (software) it continues with the identification of abstractions that match both the problem domain and the execution machinery, and the packaging of these abstractions into working assemblies, a.k.a. applications, that can run on a computer.

Figure 1: Applications serving multiple domains

Figure 1 shows that each application addresses three concerns: the user interface, the functional logic, and its data. While software architects talk about their systems using such strict layering, everybody who has looked inside a real-world application knows reality is much muddier.

What figure 1 does not show is how a company is divided into different areas, and how these areas are used to define which applications are responsible for which tasks. It also doesn’t explain how information can be exchanged between applications, and it doesn’t illustrate how data gravity corrupts even the best design as time goes by: what began as a well-defined application becomes, over the years, an entangled, unmanageable behemoth.

Separating the wrong concerns?

When we try to figure out why certain things are missing, we need to ask a key question about how we design business applications: are we dividing things up in the best way? To answer this, we have to think about the different forces at work and consider different ways to deal with them.

Data gravity captures the hard fact that data attracts data. There are many reasons why this is so. Firstly, it’s always easier to extend an existing database with a new table than to build a new one from scratch. Secondly, the new table can leverage the values of the existing tables in that database; in other words, it makes data exchange simpler. Finally, the energy cost of adding a new table is lower than the energy cost of creating a complete new database and establishing the required exchange mechanisms.

Enterprises are complex systems without boundaries. Organisational charts show how the enterprise is organized into verticals such as sales, engineering, and manufacturing, verticals that are served by horizontals such as IT and finance and control. In reality, the enterprise is a web of changing interactions that continuously consume and produce data. Mergers, divestments, and reorganizations change the corporate landscape continuously, adding to the underpinning complexity.

Applications are snapshots of the enterprise. They represent how the enterprise works at a given point in time, and as such, they are pulled apart by data gravity on one side and enterprise dynamics on the other. This insight is captured by the old saying that data ages like wine, while software ages like fish. Using that analogy, applications are best described as barrels holding both fish and wine.

The effect of these forces playing out over time is entangled, monolithic behemoth applications that are almost impossible to change and adapt to new business needs. The microservices architectural style was developed to deal with this effect.

Microservices

Microservices are based on decomposing the problem space into small independent applications that communicate using messaging. The approach is reductionistic and driven by development velocity. One observed effect is distributed data management, and microservices do not solve the underpinning problem of data and software entanglement.

Sticking to the fish analogy, microservices enable us to create small barrels that contain Merlot and salmon, Barbera and cod, and Riesling and shark, or whatever combination you prefer. They do not separate data from code; quite the opposite, they are based on data and software entanglement.

The history of the microservices architectural style goes back to 2011–2012, and for the record, the author was present at the 2011 workshop. It’s also important to mention that the style has evolved and matured since its inception, and it is not seen as a silver bullet that solves everything that is bad with monoliths. Quite the opposite: microservices can lead to more problems than they solve.

To get a deeper and more profound understanding of microservices, read Software Architecture: The Hard Parts and pay attention to what is defined as an architectural quantum: an independently deployable artifact with high functional cohesion, high static coupling, and synchronous dynamic coupling. The essence of an architectural quantum is the independent deployment of units that perform a useful function. A quantum can be built from one or many microservices.

Introducing the library architectural style

The proposed solution to the problems listed above is to separate data from applications and to establish a library as described in the reinventing the library post, and to call this the library architectural style as illustrated by figure 2 below.

The essence of the library style is to separate data from the applications and to provide a home for the data and knowledge management services that are often neglected. The library style is compatible with the microservices style, and in many ways strengthens it, as the individual microservice is relieved of data management tasks. The library itself can also be built using microservices, but it’s important that it’s deployed as architectural quanta independent of the application space, whose applications are quanta of their own.

Figure 2: Separating applications from data by adopting the library architectural style

There are many factors that make the implementation of a functional library difficult. Firstly, all data stored in the library must conform to the library’s data definitions. Secondly, the library must serve two conflicting needs: the verticals’ need for efficient work and easy access to relevant data, and the library’s own need for data inventory control, transformation, and processing of data into consumable insights and knowledge.

The library protocol

A classical library lending out books provides basically four operations to its users:

  • Search i.e., what books exist, their loan status, and where (rack & shelf) they can be found.
  • Loan i.e., you take a book and register a loan.
  • Return i.e., you return the book and it’s placed back by the librarian.
  • Wait i.e., you register for a book to borrow and are notified when it’s available.

These operations constitute the protocol between the library and its users, i.e., the rules and procedures for lending out books. In the pre-computer age, the protocol was implemented by the librarian using index cards and handwritten lists. The librarians in addition performed many back-office services, such as acquiring new books, replacing damaged copies, and interacting with publishers. These services are left out for now.
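
As a sketch, the four operations might look like this in code; the class and method names are illustrative, not taken from any real library system:

```python
class Library:
    """Minimal sketch of the four-operation library protocol."""

    def __init__(self):
        self.catalogue = {}   # title -> location (rack, shelf)
        self.loans = set()    # titles currently on loan
        self.waitlist = {}    # title -> callbacks to notify on return

    def search(self, title):
        """What books exist, their loan status, and where they can be found."""
        if title not in self.catalogue:
            return None
        return {"location": self.catalogue[title],
                "on_loan": title in self.loans}

    def loan(self, title):
        """Take a book and register a loan; fails if it is already out."""
        if title not in self.catalogue or title in self.loans:
            return False
        self.loans.add(title)
        return True

    def return_book(self, title):
        """Return the book; the librarian places it back and notifies waiters."""
        self.loans.discard(title)
        for notify in self.waitlist.pop(title, []):
            notify(title)

    def wait(self, title, notify):
        """Register interest in a book and be notified when it is available."""
        self.waitlist.setdefault(title, []).append(notify)
```

Note how the waitlist turns the protocol from pure request/response into a notification mechanism, a point that becomes important when we get to tuple spaces below.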

Linda, Tuple spaces and the Web

At the risk of turning this into a history lesson: in 1986 David Gelernter at Yale University released what is called the Linda coordination language for distributed and parallel computing. Linda’s underpinning model is the tuple space.

A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing, where the “space” provides a repository of tuples, i.e., sequences or ordered lists of elements that can be manipulated by a set of basic operations. Sun Microsystems (now Oracle) implemented the tuple space paradigm in their JavaSpaces service, which provides four basic operations:

  • write (Entry e, ..): Write the given entry into this space instance
  • read (Entry tmpl, ..): Read a matching entry from this space instance
  • take (Entry tmpl, ..): Read an entry that matches the template and remove it from the space
  • notify(Entry tmpl, ..): Notify the application when a matching entry is written to the space instance

Entry objects define the data elements that a JavaSpace can store. The strength of this concept is that the JavaSpace can store linked and loosely coupled data structures. Since entries are Java objects, they can also be executed by the readers. This makes it possible to create linked data structures of executable elements, an example being arrays whose cells can be manipulated independently.
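
A toy tuple space capturing the four operations might look like this (a simplified sketch; the real JavaSpaces API also deals with leases, transactions, and typed Entry matching):

```python
class TupleSpace:
    """Toy tuple space: entries are tuples, templates use None as wildcard."""

    def __init__(self):
        self.entries = []
        self.listeners = []   # (template, callback) pairs

    @staticmethod
    def matches(template, entry):
        """Associative matching: a template matches entries of the same
        length whose elements equal the non-None template elements."""
        return len(template) == len(entry) and all(
            t is None or t == e for t, e in zip(template, entry))

    def write(self, entry):
        """Write the given entry into this space instance."""
        self.entries.append(entry)
        for template, callback in self.listeners:
            if self.matches(template, entry):
                callback(entry)

    def read(self, template):
        """Read (but do not remove) a matching entry, or None."""
        return next((e for e in self.entries if self.matches(template, e)), None)

    def take(self, template):
        """Read a matching entry and remove it from the space."""
        entry = self.read(template)
        if entry is not None:
            self.entries.remove(entry)
        return entry

    def notify(self, template, callback):
        """Invoke callback whenever a matching entry is written."""
        self.listeners.append((template, callback))
```

The associative matching is the key: readers never address a producer directly, they describe the shape of the data they want, which is what decouples the communicating parties.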

JavaSpaces is bound to the Java programming language, something that contributed to its commercial failure. Another factor was that around year 2000 few if any enterprises were interested in parallel/distributed computing, nor were the enterprise software suppliers such as Microsoft, Oracle, SAP, and IBM.

The evolution of the Internet has changed this. HTTP (the Hypertext Transfer Protocol) provides an interface quite similar to the one associated with tuple spaces, though with some exceptions:

  • GET: requests the target resource state to be transferred
  • PUT: incurs changes to the target resource state
  • DELETE: requests a deletion of the target resource state

The main difference is that HTTP is a pure transport protocol between a client and a server, where the client can request access to an addressable resource. A tuple space implements behavior on the server side, behavior that could be made accessible on the Internet using HTTP.
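
One purely illustrative way to see the parallel is to map the tuple-space operations onto HTTP requests against a hypothetical space resource; the paths and the event mechanism below are assumptions for the sake of the sketch, not part of any standard:

```python
# Hypothetical mapping of tuple-space operations onto HTTP requests.
# The /space paths and the notification mechanism are illustrative only.
SPACE_OVER_HTTP = {
    "write":  ("POST",   "/space/entries"),            # add an entry to the space
    "read":   ("GET",    "/space/entries?template=t"), # fetch a match, non-destructively
    "take":   ("DELETE", "/space/entries?template=t"), # fetch a match and remove it
    "notify": ("GET",    "/space/events"),             # e.g. long-polling or server-sent events
}
```

The fit is imperfect (take is a combined fetch-and-delete, and notify needs a push channel on top of HTTP), which is exactly the server-side behavior a tuple space adds over plain transport.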

Roy Fielding’s doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures, defines the REST (Representational State Transfer) architectural style that is used by most Web APIs today.

REST provides an abstraction of how hypermedia resources (data, metadata, links, and executables) can be moved back and forth between where they’re stored and used. Most REST implementations use the HTTP protocol for transport but are not bound by it. REST should not be confused with Remote Procedure Calls (RPC) that can also be implemented using HTTP. Distinguishing RESTful from RPC can be difficult; the crux is that REST is bound to access of hypermedia resources.

The JavaSpace API is a RESTful RPC API as its operations are about moving Entry objects between a space and its clients, as shown in figure 3. The implication is that RESTful systems and space-based systems are architecturally close and may be closer than first thought when we explore what a data platform can do.

Data platforms

According to ChatGPT, a data platform is a comprehensive infrastructure that facilitates the collection, storage, processing, and analysis of data within an organization. It serves as a central hub for managing and leveraging data to derive insights, make informed decisions, and support various business operations.

The OSDU® Data Platform, as an example, provides the following capabilities:

  • Provenance aka lineage that tracks the origin of a dataset and all its transformations
  • Entitlement that ensures that only those entitled have access to data
  • Contextualisation by sector specific scaffolding structures such as area, field, well, wellbore, survey
  • Standardised data definitions and APIs
  • Immutability and versioning as the foundation of provenance and enrichment
  • Dataset linking by use of geolocations including transformation between coordinate systems
  • Unit of measure and transformations between unit systems
  • Adaptable to new functional needs by defining new data models aka schemas
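
To make capabilities such as immutability, versioning, and provenance concrete, here is a minimal sketch of an append-only record store; the class and field names are illustrative assumptions, not the actual OSDU® APIs:

```python
import copy

class RecordStore:
    """Append-only store: every write creates a new immutable version."""

    def __init__(self):
        self.versions = {}   # record id -> list of version dicts

    def put(self, record_id, data, parents=()):
        """Store a new version; `parents` records lineage to source records."""
        version = {"data": copy.deepcopy(data),   # snapshot, never mutated
                   "version": len(self.versions.get(record_id, [])) + 1,
                   "parents": list(parents)}
        self.versions.setdefault(record_id, []).append(version)
        return version["version"]

    def get(self, record_id, version=None):
        """Fetch the latest (or a specific) version of a record."""
        history = self.versions[record_id]
        return history[-1] if version is None else history[version - 1]
```

Because old versions are never overwritten, lineage queries reduce to walking the `parents` links backwards through the version history.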

The OSDU® Data Platform is an industry-driven initiative aimed at creating an open-source data platform that standardizes the way data is handled in the upstream oil and gas sector. The platform demonstrates how a library can be built. For more details, it can be found on the OSDU® Forum’s homepage. Those needing a deep-dive are recommended to read the primer part one and two.

The OSDU® Data Platform functions in many ways as a collaborative data space. It can also be understood as a system for learning and transfer of knowledge within and between disciplines. It is in many ways a starting point for developing a digital variant of the ancient library where scribes captured and transformed the knowledge of their time for future use.

From JavaSpaces to DataSpaces

JavaSpaces failed commercially for many reasons outside the scope of this blog post. Despite that, the value of space-based computing lives on in initiatives such as the European Strategy for Data and the European Data Space initiative.

The OSDU® Data Platform has demonstrated that it’s possible to create industry-wide data platforms that can act as the foundation of what can be called the library architectural style, a style that is basically a reincarnation of space-based computing.

The true power of space-based computing disappears in technical discussions related to how to construct the spaces, discussions on who owns what data, and so on. What is lost in those discussions is the simplicity space-based computing offers application developers.

A simplicity that is very well illustrated in this quarter-century-old picture from Sun Microsystems showing how “Duke the JavaBean” interacts with JavaSpace instances, moving elements around.

This is also a good example of how a library works: some create new books that are made available in the library, others wait for books to read, and others take books on loan. Still others work behind the scenes, moving books from one section (space) to another, or for that matter, from one library to another.

Figure 3: JavaSpaces at work

Figure 3 illustrates what this blog post started out with: separation of concerns and the importance of separating the right concerns. There is huge potential in decoupling workers (applications) from the inventory management that is better done by a library.

A digression: my soon 25-year-old master’s thesis demonstrated that a JavaSpace-based implementation of a collaborative business problem was half the size of the solution built with the more traditional technology of the time. That, in many ways, shows the effect of having a data platform in place and the value of separating data from business logic.

The OSDU® Data Platform work has taught us that the development of an industry-wide data platform does not come for free and requires collaboration on a global scale. The OSDU® Data Platform development has also demonstrated that this is possible within one of the most conservative industries in the world, which means it’s doable in other sectors as well. Their benefit being that they can stand on the shoulders of what has already been achieved: a working technological framework is available for those who want to explore it.

What’s wrong with Microservices?

This is the first post in a longer series on service-oriented software architecture. The story begins with a retrospective of a large software project I was part of at the beginning of the century. The task was to implement a new merchandise software suite for a large European retailer in line with their “Delta” service architecture.

Delta

The Delta service architecture was based on healthy design principles such as encapsulation, autonomy, independent deployment, and contracts. The principles had been applied at the business level and created fine-grained, independent services with names like Merchandising Store, Assortment, Promotion, Retail Price, and Store Replenishment, each service being responsible for a limited but clearly defined business capability. If I remember correctly, there were more than 20 services to be made for the whole suite, including buying and selling, representing a functional decomposition of the business.

Figure 1: The Delta Service Architecture

Since all services were to be deployed in the same Enterprise Java production environment, the team decided on a product line approach and established a set of principles supported by common components: the Object Relational Bridge for object persistence, JMS (Java Message Service) for asynchronous contracts processing, and implementation of domain logic using a rich object model guarded by transactional boundaries. This approach turned out to work reasonably well from a technical point of view; the blueprint is shown in figure 2.

Figure 2: Technical architecture anno 2003.

Today some of these choices might look weird, but there was no Cloud, no Spring framework, and no Docker when this was made. The client had in addition made several architectural decisions, such as the choice of application server, database, and message broker, choices that reduced our ability to exploit the technology as we could have by starting out with an open-source strategy.

One lesson learned here is, if possible, to avoid premature, high-profile political decisions on commercial technology. Such decisions always involve senior management, as serious money changes hands, and at the same time they need to be architecturally healthy to stand the test of time.

Core Domain

The development team was new to the retail domain, and we therefore decided to begin with the simplest service, Retail Store. In retrospect this was a bad decision because it made us begin at the fringe of the domain instead of at the core.

Item is the core of retail. Deciding what items to sell in a particular store is the essence of the retailing business. A can of beer, the six-pack, and the case of six-packs are all items; the same is true of a bundle of a pizza and a bottle of water. The effect is that there are tens of thousands of items in play: some of them are seasonal, others geographical, and others are bundles and promotions such as 3-for-2.

Items can be nested structures organised into categories such as dairy, meat, fish, fruit, and beverage. Which items are found in a particular store is defined by store type or format, size, season, and geography. A simplified domain model is found in figure 3.
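
The nesting of items can be sketched as a simple composite structure; the class, the item names, and the prices below are made up for illustration:

```python
class Item:
    """An item that may contain other items (six-packs, cases, bundles)."""

    def __init__(self, name, price, parts=()):
        self.name = name
        self.price = price
        self.parts = list(parts)

    def leaf_units(self):
        """Count the sellable base units contained in this item."""
        if not self.parts:
            return 1
        return sum(part.leaf_units() for part in self.parts)

# A can is an item, and so are the six-pack and the case built from it.
can = Item("beer can", 1.5)
six_pack = Item("six pack", 8.0, parts=[can] * 6)
case = Item("case", 45.0, parts=[six_pack] * 6)
```

Note that each level carries its own price: the composite is itself a sellable item, not just packaging, which is what makes Item the core abstraction rather than a leaf detail.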

Figure 3: Retail assortment hierarchy from 35000 feet

When the work began we had a high-level business service architecture, but we had no domain model showing the main business objects and their relationships to guide the work forward. The effect was that the team learnt the domain the hard way as it dug itself in from the edges toward the core.

A fun fact is that the team met Eric Evans when he presented his book on Domain-Driven Design in 2004, discovering that we had faced many of the same challenges as he had. The difference was that he had been able to articulate those challenges and turn them into a book. Had the book been out earlier, we would have been in a better position to ask hard questions about our own approach.

Learnings

I have identified five major learnings from this endeavour that are relevant for those who are considering Microservices as the architectural style for their enterprise business applications. At first glance the idea of small independent services sounds great, but it comes with some caveats and food for thought.

Firstly, a top-down, business-capability-based service decomposition without a thorough bottom-up analysis of the underpinning domain model is dangerous. In Domain-Driven Design speak, this means that the identification of bounded contexts requires a top-down, bottom-up, middle-out strategic design exercise, since business capability and domain model boundaries are seldom the same. Cranking out those boundaries early is crucial for the system’s architectural integrity. It’s the key to evolution.

Secondly, begin with the core of the domain and work toward the edges. Retail is about the management of items: how to source them, and how to bring them to the appropriate shelves with the correct price given season, geography, and campaigns. Beginning with the Retail Store service because it was simple was OK as a technical spike, but not as a strategy.

Thirdly, fine-grained services lead to quadratic connectivity growth and a need to copy data between services. The number of connections in a fully connected graph grows according to f(n) = n(n-1)/2. Therefore 5 services have 10 connections, doubling to 10 services gives 45 connections, doubling again to 20 services gives 190 connections, and so on. The crux is to understand how many connections must be operational for the system as a whole to work, and to balance this out with a healthy highway system providing the required transport capacity and endpoint management.
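
The connection growth is easy to verify with a few lines of code:

```python
def connections(n):
    """Point-to-point connections in a fully meshed graph of n services."""
    return n * (n - 1) // 2

# Each doubling of the service count roughly quadruples the connections.
for n in (5, 10, 20):
    print(n, "services ->", connections(n), "connections")
```

Running this prints 10, 45, and 190 connections for 5, 10, and 20 services, matching the figures above.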

Fourthly, the development team was happy when the store service worked, but a single working service at the fringe of the domain does not serve the business. The crucial question to ask is: what set of functionality must be present for the system as a whole to be useful? The ugly worst-case answer is all of it, including a cross-cutting user interface that we leave out for now. The lesson is that Microservices might give developers a perception of speed, but for the business, which needs the whole to be operational, the opposite might be the case. Operational needs should therefore drive the architecture, as functional wholes must be put into production.

Fifthly, the service architecture led to a distributed and fragmented domain model, since service boundaries were not aligned with the underpinning domain model shown in figure 3. Price, Assortment, and Promotion have the same data foundation and share 80% of the logic, which we ended up replicating across those services.

To sum it all up: understand the whole before the parts, and then carefully slice the whole into cohesive modules with well-defined boundaries, remembering that the business makes its money from an operational whole, not from fragmented services that were easy to build.

Microservices

Microservices is an architectural style, introduced around 2013, that promotes small autonomous services that work together, modelled around a business domain and supported by a set of principles or properties: a culture of automation, hiding implementation details, decentralising all the things, independent deployment, consumer first, and isolating failure. These principles are difficult to argue against, but a literal interpretation might cause more harm than needed.

Philosophically, Microservices follow the principles of Cartesian reductionism, as did Lord Nelson in his divide-and-conquer strategy. The big difference: Lord Nelson’s task was to dismantle the French fleet, not to build a new fleet from the rubble, and this difference coins, IMHO, the major challenge with the Microservice style. It’s aimed at independent development and deployment of the parts, pushing the assembly of the whole to operations. Some might argue that this is then fixed by the DevOps model, but if there are 20 services supported by 20 teams, the coordination problem is inevitable.

Conclusion

Service-oriented architectures and the Microservice architectural style offer opportunities for those who need independent deployment. For applications that do not need independent deployment, a more cohesive or monolithic deployment approach might be better. Independent of style, the crux is to get the partitioning of the domain right at design time and in operations. The key question to answer is: what must be operational as a functional whole?

This means that the design-time boundaries and the operational boundaries are not identical, and for a solution to be successful, the operational boundary is the most important one. That said, to secure healthy operational boundaries, the internals of the system need to be well designed. Approaching a large, complex domain with transactional scripts will most likely create problems.

In the next post the plan is to address how data management can be moved out of the functional services, enabling data-less services along the lines that data ages like wine, while software ages like fish …

See you all next time, any comments and questions are more than welcome.