One of the major issues with integrating systems is the amount of code required to make things work. Conceptually integration is simple, yet actual implementations usually end up being quite complicated.
There are a number of reasons for this, including the fact that many integrations tightly couple business logic with data transformation and transportation code, but the main culprit is glue code.
System Communication
Anyone who has ever tried to make two different systems talk to one another has written or implemented glue code. It’s the code that handles all the little idiosyncrasies of individual applications.
And that code has a tendency to grow quickly and become unwieldy if not carefully managed. Unfortunately, we tend to dismiss it as unimportant to the overall architecture, so it’s rarely managed at all, let alone carefully.
Alas, this line of thinking comes back to bite us as the project grows. Glue code grows in proportion to the square of the number of integration points and creates invisible dependencies as well as technical debt that’s rarely repaid.
The application code, which is the main focus of the initial integration, can be dwarfed by the “support” code: customizations to pipelines, configurations, middleware, database access, queues and other code added just to glue applications together.
To get out of this dilemma, we need to create suitable abstractions. In other words, we want to hide complicated lower-level implementation details behind easy-to-use APIs thereby creating standardized integration touchpoints that can be reused as additional applications are integrated.
But there are different levels of abstractions and these must be carefully managed if we’re to gain a benefit and not simply replace one type of unwieldy codebase with another. The important point is that each layer of abstraction must be cohesive – that is, concentrate on doing one thing only and doing it well.
Transporting Data
The first abstraction is to insulate developers from having to write code to transport data. Developers should be able to work with objects and not be concerned with how those objects are serialized and transported between systems.
Under the covers, of course, these details still have to be handled, but because they are handled over and over again in every integration project, they form a prevalent pattern and a good place to start pulling out common functionality. Therefore we need a standard abstraction for transporting data.
Over the years there have been many attempts to do this but, for the transportation piece, the two most commonplace ways today involve RESTful APIs and GraphQL. These will surely be replaced by other technologies in the future, but for now at least, they represent a semi-universal, standardized method of communicating between applications.
GraphQL is the newer of the two and has a number of advantages over REST. However, since GSS does not yet support GraphQL (and we usually want to use GSS for Smallworld integrations), we’ll concentrate on using RESTful APIs. If we write the underlying code correctly, we should be able to unplug REST and plug in GraphQL (or any other technology) once support is available, without affecting the application.
In fact, if we REALLY get things right, we should be able to unplug an entire GSS implementation and replace it with another product that does something similar. For an idea on how this might be accomplished, look at this article (but let’s not get too far ahead of ourselves just yet).
So that’s the first abstraction: develop a layer that takes an object and sends it to another system in a standardized, lightweight, loosely-coupled manner.
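To make that concrete, here is a minimal sketch of such a transport layer, written in Python purely for illustration (the real implementation would be Magik). The Transport, RestTransport and publish_work_order names are invented; the point is that application code only ever sees the abstract interface, so REST could later be swapped for GraphQL without touching the application.

```python
# Illustrative sketch only -- hypothetical names, not part of GSS or any real library.
from abc import ABC, abstractmethod
import json
import urllib.request


class Transport(ABC):
    """The only thing application code ever sees."""

    @abstractmethod
    def send(self, endpoint: str, payload: dict) -> dict:
        ...


class RestTransport(Transport):
    """One possible implementation: an HTTP POST with a JSON body."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def send(self, endpoint: str, payload: dict) -> dict:
        request = urllib.request.Request(
            f"{self.base_url}/{endpoint.lstrip('/')}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))


# Application code depends only on Transport, so a GraphQL transport
# (or anything else) could be dropped in later without changes here.
def publish_work_order(transport: Transport, work_order: dict) -> dict:
    return transport.send("work-orders", work_order)
```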
Transforming Data
Next we need to represent data in a standardized format (the transformation piece). This is very difficult and much of the aforementioned glue code exists because of this requirement. In fact entire companies have arisen promising to solve this exact problem. Yet it still exists – even while many of these companies no longer do.
Delivering data from a source to a target application requires that the two applications understand what’s being sent and received. The source needs to send data in a format the target can understand – or, in most systems, a middle layer is added to parse and reformat the source’s data into the target’s domain.
This is the data integration piece – and it is a really difficult problem.
How does the target understand what the source is sending?
At a high level, this can be broken down into two parts: standardized parsing and data semantics.
Standardized Parsing
Standardized parsing is relatively easy and formats such as XML and JSON handle this well. XML was the granddaddy of formats, but JSON has supplanted it and is the current cool kid on the block.
JSON exists to deliver a data payload between systems. The idea is to find the correct abstraction for JSON (on the Magik side we can use a beeble object; on the external system’s end we should use whatever makes sense in that domain).
That’s the second abstraction: create a layer that serializes an object to a standard format that can be parsed by other systems.
Of course we know something else will come along to replace JSON, so we better keep this in mind and abstract the format layer so when, not if, it changes, we can plug in the newest shiny technology without affecting our applications.
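Here is what that layer might look like, again sketched in Python with invented names (Serializer, JsonSerializer); on the Magik side the beeble object plays this role.

```python
# Illustrative sketch -- a stand-in for the Magik/beeble layer, with
# hypothetical names. The application works with objects; only the
# serializer knows (or cares) that the wire format happens to be JSON.
from abc import ABC, abstractmethod
import json


class Serializer(ABC):
    @abstractmethod
    def dumps(self, obj: dict) -> bytes: ...

    @abstractmethod
    def loads(self, data: bytes) -> dict: ...


class JsonSerializer(Serializer):
    """Today's format of choice. If JSON is ever replaced, only this
    class changes -- the applications above it do not."""

    def dumps(self, obj: dict) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))
```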
So far we’re doing quite well with our abstractions. We’ve created RESTful API and beeble object abstraction layers that move reusable data transportation and transformation functionality out of the applications so they can focus solely on business logic.
At this point, however, the free ride is over and things move up quite a few notches in difficulty. Despite the fact that our applications only have to deal with objects and we can Magikally (that never gets old, does it?) send data in a standard JSON format via HTTP and TCP/IP, we still have the glaring problem of the target application needing to understand what the source application is sending it.
Data Semantics
The difficulty is that disparate systems have different semantics and meanings for data and just understanding these differences is hard enough – codifying them shoots the complexity level into the stratosphere.
It’s not uncommon to see integrations that implement thousands of lines of glue code in an attempt to solve the issue. This is definitely one part that can’t be standardized because each application is as individual as a fingerprint or snowflake.
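As a hypothetical example, here is the same physical pole as two different systems might describe it (both records are invented):

```python
# Two invented records describing the same physical pole. Neither
# system is "wrong" -- they simply speak different dialects, and that
# gap is where glue code traditionally accumulates.
source_record = {          # e.g. the GIS side
    "asset_id": "P-1042",
    "material": "WOOD",
    "height_m": 12.5,
}

target_record = {          # e.g. the work-management side
    "equipmentNumber": "0000P1042",
    "materialCode": 3,     # 3 happens to mean "wood" in this system
    "heightFeet": 41.0,
}
```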
There is no standard for universally integrating semantics across the board. Yes, there have been attempts to create standard grammars (or vocabularies) for various industries, but these vocabularies just about always need to be customized in much the same way “industry standard” out-of-the-box applications need to be customized.
Unless every application is forced to adhere to a standard (much like browsers adhere to HTML, HTTP and TCP/IP standards), this problem will remain unsolved at the application level.
Of course there’s no foreseeable road to universal application agreement so, as I’ve already mentioned, glue code grows as the square of the number of applications being integrated.
And that’s why the mountain of glue code exists. In the old days of tightly coupled monolithic applications, the glue was intermixed with the business logic. That was, and still is, a very, very, very bad way of doing things.
As applications get larger, via enhancement requests, or as the need to communicate with new applications arises, complexity increases exponentially. It can get to a point where changing one small part of the monolith breaks multiple other pieces because the number of moving parts is just too much for anyone to understand or reason about.
Loosely Coupled Applications
Rather than creating gigantic monoliths, a better way is to create loosely coupled applications that focus mainly on business logic and ideally don’t know about one another. But that doesn’t mean the glue code disappears. In fact it’s simply moved out of the individual applications and into other parts of the system – such as the middleware.
This may mean glue code becomes even more complex, but, in theory at least, it’s reusable and more manageable so if an application needs to be enhanced, it can be done without breaking the transformation, transportation and business logic pieces.
In olden times (well… let’s say last year), a company’s applications were usually on premises, and thus they could directly access the internal network, a shared file system or perhaps even a database, all managed by the company. So integrations were usually done on an ad hoc basis in a quintessentially non-standard, point-to-point manner.
Today, however, systems are distributed and many companies have their applications running remotely (such as in the cloud). This plays a significant role in determining an effective integration architecture and its associated implementation.
The network, now no longer completely internal, connects far-flung applications that could just as easily be half a world away as in the next room. A company has no control over the global network’s availability, so architectures must account for cases where remote applications are unavailable if integrations are to function correctly.
The rush to distributed systems has led to a marked increase in external dependencies and, therefore, integration points. With microservices especially, availability, latency and bandwidth must all be carefully considered lest data end up disappearing into the gaping maw of some remote API. With myriad moving parts, the chance of something going wrong increases in proportion to the number of dependencies.
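One simple way to account for an unavailable or slow remote application is to combine timeouts with bounded retries. The sketch below (plain Python, with invented names) shows the idea; a production system would likely layer queuing or a circuit breaker on top.

```python
# Minimal sketch of defensive remote calls: a timeout so a slow API
# can't hang the caller, and a bounded retry so a transient outage
# doesn't silently swallow the data. Names are hypothetical.
import time
import urllib.error
import urllib.request


def call_with_retry(url: str, data: bytes, attempts: int = 3, timeout: float = 5.0) -> bytes:
    for attempt in range(1, attempts + 1):
        try:
            request = urllib.request.Request(url, data=data, method="POST")
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts:
                raise  # surface the failure instead of losing the payload
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("unreachable")
```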
That’s not to say the old way was better, though. The cloud (and its ilk) provides access to vastly more services, resources and raw power, and lets us reuse functionality in ways not even dreamt of during the reign of the monstrous monolithic applications of yesteryear, where mounting technical debt constantly added complexity to maintaining and enhancing systems.
Of course if you’ve ever worked on a production system, you already know it’s rare that technical debt is ever repaid. Nobody comes in and refactors the code, because there is no time or budget, given things are “working.” The result is that systems evolve into big, tangled balls of mud that everyone eventually avoids touching because the complexity is enormous, nobody understands them and maintenance developers are afraid of breaking something.
So what’s the solution?
While it’s true we can’t eliminate the glue, we can certainly contain it. And we can do that with further abstractions implemented by libraries and frameworks our applications can run on top of – in much the same way Java runs on a virtual machine. If the Java Virtual Machine (JVM) decides to implement, say, garbage collection in another way or use a completely different underlying technology, the millions of applications running on it don’t break or need to change; only the JVM’s code changes.
And that’s the third abstraction: create frameworks and libraries to hide the underlying complexities of data semantics. While not as cut-and-dried as the first two, it is still useful.
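As a rough sketch of the idea (hypothetical names, not the actual GSS Magik Framework API), the semantic glue can be packaged as small mapping plugins registered with a common engine:

```python
# Illustrative sketch of a plugin-style mapping layer (hypothetical
# names). Each mapping is a small, registered unit, so the semantic
# glue is contained in one place instead of being smeared across the
# applications themselves.
from typing import Callable, Dict

MAPPERS: Dict[str, Callable[[dict], dict]] = {}


def mapper(name: str):
    """Register a source-to-target mapping under a well-known name."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        MAPPERS[name] = fn
        return fn
    return register


@mapper("gis_pole->wms_equipment")
def pole_to_equipment(pole: dict) -> dict:
    return {
        "equipmentNumber": pole["asset_id"].replace("-", "").rjust(9, "0"),
        "materialCode": {"WOOD": 3, "STEEL": 7}.get(pole["material"], 0),
        "heightFeet": round(pole["height_m"] * 3.28084, 1),
    }


def transform(name: str, record: dict) -> dict:
    return MAPPERS[name](record)
```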
Using frameworks and libraries that accept standardized plugins while carefully architecting interfaces, observing current standards and making liberal use of abstractions is the best way to contain runaway glue code and keep systems maintainable and manageable well into the future.
Magik Libraries and Frameworks
To that end I’ve written the MagikFP library as well as the GSS Magik Framework.
Libraries are self-explanatory. Developers write code that calls library functions in order to use the functionality the library exposes. Application code is always in control and decides when, and which, library functions to invoke.
The framework, on the other hand, inverts control. So once application code triggers, or registers with, the framework, the framework takes over and calls back into the application code by way of hooks registered with it (passing the application simple objects containing the external data and metadata).
This is how applications are notified when an external system sends them data and how return values from asynchronous calls to target applications are handled.
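Here is roughly what that inversion of control looks like, sketched in Python with invented names rather than the framework’s real API:

```python
# Sketch of inversion of control (hypothetical names, not the real
# GSS Magik Framework API). The application registers a hook once; the
# framework then drives everything and calls back with a simple object
# holding the external data plus metadata about where it came from.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InboundMessage:
    data: dict        # the payload, already parsed
    metadata: dict    # e.g. source system, timestamp, correlation id


class IntegrationFramework:
    def __init__(self):
        self._hooks: Dict[str, List[Callable[[InboundMessage], None]]] = {}

    def register(self, topic: str, hook: Callable[[InboundMessage], None]) -> None:
        self._hooks.setdefault(topic, []).append(hook)

    def dispatch(self, topic: str, message: InboundMessage) -> None:
        # In a real framework this would be driven by the transport layer.
        for hook in self._hooks.get(topic, []):
            hook(message)


framework = IntegrationFramework()
framework.register("work-orders",
                   lambda msg: print("work order received:", msg.data))
```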
The GSS Magik Framework provides a flexible and intuitive way to use GSS to communicate with external systems. It allows applications to be integrated in a standardized and effective manner in order to efficiently deliver Smallworld solutions at unprecedented speed, and it insulates an application from the underlying, low-level GSS code.
In effect it provides a mechanism for loose coupling between applications and GSS. The benefit is that applications need not be aware of GSS, so if a future version of GSS introduces a breaking change, only the framework needs to be altered to work with it.
Applications running on the framework will continue to run without modification because the underlying data transportation, transformation and semantic pieces have been abstracted away by the framework.
The MagikFP library brings the Functional Programming paradigm to Magik. As applications become more complex, creating loosely coupled components is paramount so that applications can be easily maintained and enhanced.
The objective of MagikFP is to provide a standard library of functions that can be reused to build functionality in a highly readable, straightforward manner using industry standard concepts and processes. Customized library functions can be independently created and plugged in to handle non-standard requirements when necessary.
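To give a feel for the style (in Python, with made-up helpers rather than the actual MagikFP API), small reusable functions are composed into pipelines, and a custom step can be plugged in wherever a non-standard requirement demands it:

```python
# The flavor of the functional approach, sketched with hypothetical
# helpers: small, reusable functions composed into a pipeline, plus one
# site-specific step plugged into an otherwise standard chain.
from functools import reduce
from typing import Callable


def compose(*steps: Callable) -> Callable:
    """Compose unary functions left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


strip_blanks = lambda record: {k: v for k, v in record.items() if v not in ("", None)}
upper_keys   = lambda record: {k.upper(): v for k, v in record.items()}

# A custom, site-specific step handling a non-standard requirement.
add_region   = lambda record: {**record, "REGION": "NORTH"}

prepare = compose(strip_blanks, upper_keys, add_region)
print(prepare({"asset_id": "P-1042", "note": ""}))
# -> {'ASSET_ID': 'P-1042', 'REGION': 'NORTH'}
```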
MagikFP also includes the MagikCheck Property-based Testing library that hooks into MUnit so automated testing can be easily implemented. Once properties are defined, hundreds, or thousands, of randomized tests can be executed with the push of a button.
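MagikCheck itself is Magik, but the idea translates directly; the sketch below expresses the same concept using Python’s Hypothesis library purely as an analogue: state a property once and let the tool generate the cases.

```python
# Property-based testing in miniature, using Hypothesis as a stand-in
# for MagikCheck: the property is written once, and the tool generates
# hundreds of randomized inputs that must all satisfy it.
import json

from hypothesis import given, strategies as st


def encode(record: dict) -> str:
    return json.dumps(record)


def decode(text: str) -> dict:
    return json.loads(text)


@given(st.dictionaries(st.text(), st.integers()))
def test_roundtrip(record):
    # Property: decoding an encoded record always yields the original.
    assert decode(encode(record)) == record
```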
The combination of MagikFP, MagikCheck and the GSS Magik Framework provides a powerful set of tools that enable developers to create cohesive functions that exhibit a high degree of reusability, quickly implement industry-standard integrations and write tests that automatically cover nearly one hundred percent of the code.
Further, the functional programming paradigm supports a high degree of scalability and the ability to directly map functionality to requirements in a one-to-one manner.
If you are thinking of implementing a GSS-based Smallworld solution to integrate applications, using libraries and frameworks to abstract away low-level, tightly coupled code is an excellent way to ensure your applications are highly maintainable and can be readily enhanced well into the future.