The Magik programming language is imperative and leans heavily toward the Object Oriented Programming (OOP) paradigm. It relies on mutation, hidden state, side-effects and tight coupling between data and behavior.
Code-reuse is achieved primarily through inheritance.
The Functional Programming (FP) paradigm, on the other hand, is declarative: it relies on immutable objects, minimizes side-effects and achieves loose coupling by chaining functions together so they act as a pipeline through which data flows and is operated on.
Code-reuse is achieved by creating small, well-defined, cohesive functions and
changing behavior using higher-order functions.
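To make that concrete, here is a minimal sketch using plain Magik procedures (double and apply_twice are illustrative names, not part of any library):

# A small, cohesive function: it does one thing.
_global double << _proc(x)
    >> x * 2
_endproc
$

# A higher-order procedure: it accepts another procedure as an argument,
# so behaviour can be changed without editing existing code.
_global apply_twice << _proc(f, x)
    >> f(f(x))
_endproc
$

apply_twice(double, 5)   # => 20
$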
Both paradigms can be used to produce Smallworld GIS software, but FP has several advantages because it:
- Makes writing automated tests easier and allows programs to be more thoroughly tested.
- Makes it simpler and less error-prone to write parallel and concurrent programs.
- Makes code easier to read and comprehend because of its higher level of abstraction, which also makes debugging easier.
- Allows developers to write less code.
- Allows for better code reuse.
This results in higher quality and more scalable programs.
The Case for Functional Programming
Software today is more complex than at any time in the past, and that has led to an explosion in difficult-to-find bugs and incorrect programs that increase project and maintenance costs, introduce performance problems and fail to meet users’ expectations.
Therefore it’s critical to develop software in a manner that is easy to write and
understand, simple to maintain and enhance, and straightforward to test and debug.
Popular imperative paradigms such as procedural programming and OOP have proliferated over the years and dominate the software landscape. GE’s Smallworld GIS language, Magik, is no exception in its support for OOP.
And although FP has been around for more than six decades1, it has, until recently,
been relegated to writing programs for academia and non-enterprise-level software.
However that’s beginning to change as tech giants such as Facebook, Amazon and
Twitter have chosen to go the FP route.
To understand why, it’s necessary to look at the different paradigms and identify why FP makes it easier to write better, more reliable programs with less effort2.
To start, we need to realize humans are not good at keeping even small amounts of information in their heads, and that leads to software that falls short of acceptable quality.
Studies3 have shown the average person can only hold about four concurrent items in working memory. So as the number of control paths, variables and other pieces of program state grows, programmers lose track of what’s happening – allowing bugs to creep in.
It is far easier to understand code if we only need to remember what that code does rather than trying to recall how it does it (such as what variables are changed or how data are flowing through conditionals and loops).
One of the main problems is mutable state. State changes (such as variables being
given new values) make programs more difficult to understand and harder to reason about. But it’s even worse than that because, usually, that mutable state is scattered throughout the application and shared between objects.
It’s no wonder programmers can’t keep track of what’s happening or write meaningful tests. And if we can’t understand our program’s behavior, testing and debugging become difficult.
Stateful programs are also more difficult to run concurrently or in parallel, requiring
us to jump through all sorts of hoops (such as thread locking, mutex and other such mechanisms) to ensure different threads and processes don’t overwrite each other’s memory locations or file data.
Functional Programs, on the other hand, are ready to be run concurrently or in parallel because data is generally immutable and so won’t be modified by different threads (or even by the same thread). The upshot is that deadlocks and race conditions can’t occur because we don’t need to implement locks.
Think about that for a minute. When you write in the FP style, your code is automatically thread safe.
Now before we continue, let’s take a step back and look at a real-world example.
Suppose we want someone to change the oil in our car. Is it easier to list all the steps (such as, jack up the car, remove the oil filler cap, put a pan under the drain, use a wrench to undo the drain plug, let the oil drain into the pan, place the oil filter
wrench over the oil filter and loosen it…) or simply to tell that person, “change the
oil”?
Obviously the latter is easier to understand and remember. Instead of telling the
person how to change oil, in small detailed steps, we’re saying what we want done.
Or to take the example further, suppose the various parts we need are scattered across different suppliers. Now we have to keep track of which supplier has which part and then string together a plan to obtain the parts before we can do the oil change. Wouldn’t it be better if we could encapsulate all that in one place, such as an oil change shop, and simply tell them we want our oil changed?
Of course it would.
The shop can handle the underlying details of obtaining the correct parts and performing the work. We don’t have to worry about these things so our cognitive load is significantly reduced.
And in much the same way, that’s the difference between imperative and declarative programming.
Declarative Programs are Easier to Write and Understand than Imperative Ones
Declarative paradigms describe what needs to be done rather than how it needs to be done, so requirements map more readily to the code and aren’t obscured by lower-level details.
Of course declarative methods still require small detailed steps that under the covers map to imperative code, but we don’t have to worry about that lower-level code and so we’re freed from having to keep too much information in our working memory (in fact, in our example, we only needed one piece of information: “change the oil”).
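As a small code-level illustration of the same idea (the procedure names are purely illustrative), compare an imperative sum of squares, where we manage the loop and the accumulator ourselves, with a declarative version in which small, named helpers hide those details:

# Imperative: we spell out how, using an accumulator, a loop and mutation.
_global sum_of_squares << _proc(numbers)
    _local total << 0
    _for n _over numbers.fast_elements()
    _loop
        total +<< n * n
    _endloop
    >> total
_endproc
$

# Declarative: the loops live inside small, named, reusable helpers,
# so the caller only states what is wanted.
_global square_all << _proc(numbers)
    _local result << rope.new()
    _for n _over numbers.fast_elements()
    _loop
        result.add_last(n * n)
    _endloop
    >> result
_endproc
$

_global sum << _proc(numbers)
    _local total << 0
    _for n _over numbers.fast_elements()
    _loop
        total +<< n
    _endloop
    >> total
_endproc
$

sum(square_all({1, 2, 3}))   # reads like the requirement: sum the squares
$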
The specialist that changes our oil understands what we want and follows the
detailed steps. But the specialist has been doing this for a long time and does it
frequently, so the process is efficient and has been thoroughly tested and debugged.
In a similar way, by creating higher levels of program abstraction, the underlying nitty-gritty details about how to perform the necessary tasks are hidden (in much the same way an Operating System hides the details of writing to a disk). And since that code is generally tested, debugged and proven, it’s almost certain to do the task better than any code a typical programmer would write from scratch.
Further, with FP, state is typically isolated to a function’s scope and not promiscuously shared outside that scope. A good way to do that is to create immutable soft objects (i.e. objects containing only data, not methods) to transfer data through a chain of functions. Note the word immutable. That’s important. Each function makes a copy of the soft object passed into it, performs its work on that copy and then passes the copy to the next function. This is the opposite of what OOP does because mutability is inherent in OOP.
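Here is a minimal sketch of that idea, using a property_list as the soft object (the field names and procedures are illustrative only):

# Each step copies the soft object it receives, works on the copy and
# passes the copy along, so no step ever mutates data owned by another.
_global with_tax << _proc(order)
    _local result << order.copy()
    result[:total] << result[:subtotal] * 1.1
    >> result
_endproc
$

_global with_label << _proc(order)
    _local result << order.copy()
    result[:label] << "Order for " + result[:customer]
    >> result
_endproc
$

_block
    _local order << property_list.new()
    order[:customer] << "Acme"
    order[:subtotal] << 100.0

    _local invoice << with_label(with_tax(order))
    write(invoice[:total])   # roughly 110.0; the original order is untouched
_endblock
$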
We also generally try to write most of an FP application using pure functions and keep the impure code separated and well managed.
Easy-to-Read Code is Easier to Comprehend
According to Robert Martin, reading code takes more than 10 times the effort of
writing that code4.
And since imperative programs expose relatively low-level details to the programmer, who must then keep track of every single detail (such as loop indexes and the state of variables), readability is sacrificed.
In addition, as programs grow, complex structures with hidden dependencies are created and that results in it being very difficult to understand what happens when methods are invoked and what the side effects are.
If we want easy-to-understand programs then we must minimize code complexity, but OOP makes this difficult because it is fundamentally based on mutable state, abstractions that don’t map well to the real world and lots of boilerplate code.
We’ve already discussed mutable state, so let’s turn our attention to abstractions.
Languages with OOP-support are inherently imperative but add the class abstraction, and inheritance, in an attempt to improve code sharing – unfortunately that causes a number of additional problems.
Inheritance is a good idea for sharing code when problem structures are static, fully understood and the hierarchy is just one or two levels deep. Otherwise trouble is brewing. This is because inheritance breaks encapsulation. Changes to a superclass require recompilation of all subclasses and could potentially break them — in other words, the classes are tightly coupled.
For example, different types of cats share common attributes, so creating a Cat class and then sub-classing Siamese, Persian and Scottish Fold appears to make sense because we can define data and behavior, such as 4 legs, fur, run and meow, on the Cat class and share them with subclasses via inheritance.
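In Magik, that design might be sketched roughly as follows (the slots, defaults and breeds are illustrative only):

# A Cat exemplar holding the shared data, plus a shared behaviour.
def_slotted_exemplar(:cat,
    {
        {:legs, 4},
        {:fur,  _true}
    })
$

_method cat.speak()
    write("meow")
_endmethod
$

# Subclasses inherit legs, fur and speak() from cat.
def_slotted_exemplar(:siamese, {}, {:cat})
$
def_slotted_exemplar(:persian, {}, {:cat})
$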
The problem, however, is when we don’t fully understand the structure or the structure changes in ways we didn’t anticipate. If, for example, we recently discovered the Sphynx cat (which lacks fur), then the fact that the Cat class shares fur with its subclasses presents a problem.
Now we must go back and handle the exception.
Overriding the fur method to return nothing might come to mind, but if Sphynx inherits from Cat, it will still have a fur method that is inappropriate (after all, you wouldn’t want to take your Sphynx cat to an expensive cat salon, execute the fur method and then discover, to your great embarrassment, no fur is returned).
If, on the other hand, we remove fur from the Cat class, that may break existing code because objects depend on classes and assume specific data and methods will be inherited from their superclasses. If we change a superclass, that could cause problems.
But perhaps we decide to create two new classes: CatWithFur and CatWithoutFur, then reorganize the inheritance tree appropriately. However, we’re now cluttering our program with extra classes meant to handle a small exception. If we expand this trivial example to a real-world application, these problems increase by orders of magnitude.
Unfortunately this occurs frequently in the OOP realm because programs constantly change. Requirements change, enhancements must be implemented and new, unexpected code has to be written. But because OO programs are built as hierarchies of classes, unexpected requirements can be difficult to fit into the existing hierarchical structure.
The fact is that most real-world requirements do not fit nicely into a hierarchical structure — and even if a simple case initially did, as it evolves over time, the hierarchy no longer applies and then we have to jump through hoops and implement kludges in order to get things to work.
Erlang creator Joe Armstrong said, “I think the lack of reusability comes in object-oriented languages, not in functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”
In other words, you might want to invoke one method on a Class, but you can’t do that without loading the Class and all of its dependencies (such as Super Classes and other methods). And that leads to further complexity because the code for Super Classes and their Super Classes, and so on and so on and so on, is usually spread out across the codebase.
In reality, real-world objects are relatively independent of one another and don’t normally inherit behaviour from parents. Behaviour should also be separate from data, yet OOP tightly couples them together within a Class. And since that’s the structure OOP presents to developers, developers try to use it even when it doesn’t make much sense.
What we really want to do is decouple classes in order to eliminate strict dependencies, so rather than implementing an is-a relationship (such as Siamese is-a Cat), we should be implementing a has-a relationship (such as Siamese has-a fur attribute) for the majority of things. And that leads us to composition over inheritance. This was debated for many years, but the dust has finally settled and most developers now advocate composition over inheritance, because inheritance creates problems when it comes to maintenance and testing. And because maintenance is the highest cost over the life of a system, and automated testing is crucial for delivering high-quality software, we want to eliminate or minimize issues in those areas.
With the FP paradigm, we use functions to share code and those functions are loosely coupled with data. We then compose these functions in different ways to form various applications. So our Cat class can be replaced with a set of small, cohesive Cat attribute functions executed sequentially in a pipeline.
If we come across a hairless cat, we simply create a new function to represent it and reuse all the relevant functions created for other cat types, except we remove the fur function from the hairless cat’s pipeline — or if we’re feeling particularly clever, make use of higher order functions.
Because the new function is independent and loosely coupled not only to the data but also to other functions, we don’t break hierarchies – because we didn’t have static, pre-defined hierarchies to begin with. If we choose to implement hierarchies (via closures5, for example), they are flexible and dynamic.
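Sketched in Magik (again, the procedure and field names are purely illustrative), the cat example becomes a handful of small attribute functions composed into pipelines:

# Small attribute functions, each copying the data it receives.
_global with_legs << _proc(cat)
    _local c << cat.copy()
    c[:legs] << 4
    >> c
_endproc
$

_global with_fur << _proc(cat)
    _local c << cat.copy()
    c[:fur] << _true
    >> c
_endproc
$

_global make_siamese << _proc()
    >> with_fur(with_legs(property_list.new()))
_endproc
$

# A hairless breed simply omits with_fur() from its pipeline;
# there is no hierarchy to rearrange and nothing else can break.
_global make_sphynx << _proc()
    >> with_legs(property_list.new())
_endproc
$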
As a real-world example, think about an Apple MacBook that has no upgradeable parts — everything is soldered to the motherboard and nothing can be independently replaced. This is an example of tight coupling. If something breaks, the entire system needs to be sent out for repair or scrapped. A tower PC, on the other hand, has individual components (such as RAM, storage, CPU, graphics cards) that can be unplugged and swapped at will. If you want faster graphics, simply upgrade the graphics card. If RAM dies, replace it. This is loose-coupling in action and demonstrates why our software should not be tightly coupled.
The other major issue with OOP is the amount of boilerplate code it requires. Let’s say we want to add two numbers; think of what that entails using the OOP paradigm. We first need boilerplate to create a Class, then more boilerplate to define the constructor and then, finally, we can implement the add method. With FP, we simply define the add function, no boilerplate required, and we’re done.
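A rough side-by-side sketch in Magik (the adder exemplar and its constructor are illustrative only):

# OOP flavour: an exemplar, a constructor and an init() method
# (all boilerplate) before we ever reach the logic.
def_slotted_exemplar(:adder, {})
$

_method adder.new()
    >> _clone.init()
_endmethod
$

_private _method adder.init()
    >> _self
_endmethod
$

_method adder.add(a, b)
    >> a + b
_endmethod
$

adder.new().add(1, 2)   # => 3
$

# FP flavour: just the function.
_global add << _proc(a, b)
    >> a + b
_endproc
$

add(1, 2)   # => 3
$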
Boilerplate hampers readability by requiring additional code that has nothing to do with the business logic. If you’ve ever found yourself copying and pasting reams of boilerplate in your code, then you understand what I’m talking about.
Of course we can implement composition in OOP through, say, dependency injection, but it’s much clunkier than doing it with FP (using, for example, higher order functions) and there is quite a bit of additional boilerplate required.
Therefore, since OOP is built around mutating state (via variables, class variables, instance variables and such), relies on unsuitable hierarchical abstractions and requires boilerplate, we can see how OO programs become difficult to reason about, especially if there are myriad classes and a legion of objects all interacting with one another.
On the other hand, since higher FP abstractions encapsulate lower-level code and behaviour is loosely coupled with data and other behaviour, programs tend to be shorter, which further aids in understanding.
Functional Programs are Easier to Test
FP uses, unsurprisingly, functions as its main unit of encapsulation. Program
requirements are decomposed into small, cohesive functions, each doing one thing
well, and then these functions are composed into actual programs.
Functions can be of two types: Pure and Impure.
Pure functions rely solely on their inputs to produce their output, unlike OOP
methods that often return values based on class variables, superclass instance
variables and additional hidden, mutable inputs or side-effects from other methods.
Since pure functions don’t depend on hidden external values or mutable state, they
always return the same output given the same input. They are the closest thing in
programming to a mathematical function.
Because input(s) directly map to the output, the function’s output is easily predicted
from just its inputs. There is no state to keep track of (or forget to keep track of) and
code tends to be obvious and clear.
In fact, the value returned from a pure function can replace that function (called with specific arguments) anywhere in a piece of code (this is known as referential transparency), which makes functional programs far easier to test and debug.
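For example, a trivial sketch:

# A pure procedure: its result depends only on its arguments;
# no slots, no globals, no side-effects.
_global circle_area << _proc(radius)
    >> 3.14159 * radius * radius
_endproc
$

# Referential transparency: wherever circle_area(2.0) appears we may
# substitute its value without changing the program's behaviour.
write(circle_area(2.0) + circle_area(2.0))
write(12.56636 + 12.56636)   # prints the same result
$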
The result is we can test functions in an application by only looking at the arguments they require and the values they return. There’s no need to call functions in a particular order or consider external state that may have been modified by another function or method (since our functions don’t rely on external state, held in slots or global variables for example, our tests are significantly simplified). And because our functions don’t modify external state, we only need to test the returned values. We just pass in arguments to exercise the function’s code (including edge cases) and if the tests pass, we’re confident our software will work.
Why?
It’s because we’ve thoroughly tested small pieces of code with a high degree of coverage. Since it’s easy to understand small chunks of code, we can write meaningful tests. Then, once all our functions have been individually unit tested, composing them inside a pipe to form an application does not add any errors.
Read that last part again. If our functions all pass, creating an application by composing these functions in a pipe will not add any errors. So we don’t have to test the entire application from end-to-end. We know our application is fully tested because we’ve tested the individual functions that make it up.
This is a huge advantage over OOP. With OOP, you can’t simply test a method in isolation because it relies on shared, mutable state and hidden inputs. Therefore you need to test the corresponding Class and all other Classes with which it interacts. So you’re writing tests for a huge amount of code… and that is very, very difficult to do. What usually ends up happening is developers write a few meaningless tests just to keep the project managers happy and move on. However meaningless tests don’t uncover bugs and these bugs don’t magically disappear. Even testers working through test scripts don’t end up finding a large number of bugs, which means bugs end up in production code and are found by end users — and that is never a good thing.
In addition, functional programs minimize mutable state (more on that shortly) and
since pure functions are free of side-effects, this leads to safer programming
techniques – which includes eliminating entire classes of bugs – and seamless
concurrent and parallel programming without the need for locks, semaphores and
other such techniques.
Of course not all functions can be effect-free, stateless, non-mutable pure
functions.
Writing data to files and databases or interacting with users, for example, are impure events by their very nature — however we want these particular impure events to occur.
For a functional program to be useful, it must contain some impure behavior. Therefore we can divide impure functions into two types: Managed Effects and Side Effects.
Managed Effects represent impure behaviour that is useful and expected – although impure, such functions still decouple data from behavior and thus are usually easier to test than their imperative counterparts.
Side Effects, on the other hand, represent unexpected behaviour which is something we want to get rid of.
In the FP world, we write as much of the application as possible using pure functions, separate that piece from the part containing impure, Managed Effects, functions and strive to eliminate impure functions that have Side Effects.
Pure functions make up the core and are surrounded by an impure, Managed Effects, shell that interacts with external components, users and the rest of the outside world.
This creates a definite separation between the pure and impure.
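A minimal sketch of that shape (the procedure names are illustrative):

# Pure core: builds the report text from its inputs alone.
_global format_report << _proc(name, total)
    >> "Customer: " + name + ", total: " + total.write_string
_endproc
$

# Impure shell (a managed effect): the only place that touches the
# outside world, here by writing to the terminal.
_global print_report << _proc(name, total)
    write(format_report(name, total))
_endproc
$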
Testing is Not an Afterthought
As we’ve seen, pure functions are easier to test and lend themselves well to testing
methodologies such as Test-Driven Development and Property-Based Testing.
When we aren’t concerned with mimicking hidden state and wrestling with side-effects, testing becomes effortless because we know if we supply specific inputs, we’ll get a specific output.
And that is a game changer.
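For instance, because a pure procedure needs no setup, fixtures or mocked state, a property can be checked across many generated inputs. The hand-rolled loop below only hints at what a property-based testing library automates for you:

_global add << _proc(a, b)
    >> a + b
_endproc
$

_block
    # Property: addition is commutative for every input we generate.
    _for i _over 1.upto(1000)
    _loop
        _if add(i, i + 1) ~= add(i + 1, i)
        _then
            write("property failed for ", i)
        _endif
    _endloop
    write("property checked for 1000 generated inputs")
_endblock
$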
For decades testing has been an afterthought in the software world. Programmers
wrote code, performed cursory unit tests and then sent it to the testing group who
threw different inputs at the code hoping to find bugs.
Unfortunately, even today, Magik developers usually follow this line of thinking.
But that ad hoc way of testing has no place in today’s complex development cycle.
Testing must be built in from the very beginning. It is a foundational part of software development. Meaningful tests and high code coverage are no longer optional or done simply to check boxes in a SOW.
Testing must be done to find bugs and verify they’re eliminated.
However, if we write code that is difficult to test properly, we won’t write proper tests or achieve satisfactory coverage.
It’s much easier and simpler to test small functions that rely only on their inputs than it is to test large classes and subclasses consisting of behavior tightly coupled to data and other classes; classes that rely on hidden dependencies, mutated state and side effects (such code’s output is difficult to determine and thus difficult to test).
Testing OOP code can also be labour intensive because it is primarily example-based and manual; as a result it covers only a small (sometimes insignificant) portion of the code, since most developers don’t want to write hundreds or thousands of test cases.
This results in many projects being delivered despite having hundreds of bug reports still open. And the reason for that is because when bugs are found late in the game, they’re more difficult to fix. If a bug is found by a unit test written at the same time the code was created, it’s a trivial fix. However if that same bug is found just before a production implementation (or worse, in production), it is at least an order of magnitude more expensive to fix.
Furthermore, we’ve not even discussed bugs that are lurking below the surface but haven’t yet been found because of insufficient testing.
But that’s not all. The fact is, imperative code has far more execution paths than functional code (which usually has one or two paths of execution), so code coverage is likely to be inferior.
The bottom line is, if we make it difficult, or impractical, to test our programs, we
won’t properly test them. Sure, we may run a few rudimentary tests (with a handful
of very specific data inputs) simply to keep the project managers happy, but we won’t find bugs as efficiently as if we developed generalized, fundamental tests and used a property-based testing library, for example, to automatically generate thousands of test data inputs.
Of course we can do this best with pure functions (we can also do it with impure functions, though it’s not as effective; and yes, we can sometimes do it with OOP methods, but it’s even less effective).
Testing code that relies on hidden state and side-effects is a bit like changing the oil in a Bugatti Veyron, where you have to remove 16 drain plugs, but only after removing parts of the underbody. Then you need to remove the rear fender liners and deck (by removing scores of bolts). It takes more than 20 hours. Just to change the oil.
A Toyota Corolla, on the other hand, takes less than 30 minutes and you don’t even
have to jack it up.
Which oil change would you be more likely to do?
In Conclusion…
Functional Programming is desirable because it promotes practices that make programming easier and safer. Application state flows through a series of functions, which means state is confined to one place rather than scattered across the application. Furthermore, FP includes powerful ideas that eliminate entire classes of errors and make programs easier to reason about, and therefore easier to maintain and enhance.
I’ve listed a few of these ideas below…
- FP avoids mutations.
- FP avoids side effects.
- FP avoids shared state.
- FP avoids hidden dependencies.
- FP uses pure functions as much as possible.
- FP uses functional composition.
- FP uses declarative rather than imperative code.
In addition, it has a sound theoretical background because it’s based on lambda calculus.
But there’s more… today it’s generally accepted that testing is a foundational requirement of software development. FP, with its pure functions and loosely coupled components, makes it significantly easier to properly test code in an automated fashion.
The MagikCheck property-based testing library works far better with FP-style
programs than with imperative ones (in fact just about all testing libraries and
frameworks do).
And when programs are easier to understand and reason about (because of their
higher level of abstraction), refactoring, maintenance and enhancements become
easier and that usually results in a lower total cost of ownership.
In addition, concurrent and parallel programming become very simple, because if we don’t mutate state, code is automatically thread-safe and doesn’t suffer from
synchronization issues6.
These benefits work together to produce more bulletproof code with fewer bugs.
Unfortunately Magik has almost no support for FP. However it does contain a basic
building block, the procedure, that can be used to construct a library that supports the FP paradigm.
MagikFP is such a library. It contains a number of powerful components, including prototypal objects, observables and monadic error handling, and it makes it easy to use currying and closures. The goal is to give all Magik developers an easy way to start using FP without having to reinvent the wheel.
Of course OOP is not all bad and brings a unique set of benefits to the table if used
correctly. Unfortunately OOP programmers have bought into a certain way of writing code that promotes bad habits, which leads to bad code (but that’s another article in itself). Still, some problems are better suited to an OOP solution while others favor FP.
By switching to a combination of FP and well-written OOP code, Magik applications can achieve the benefits of both paradigms.
If you are interested in seeing, first-hand, how FP can provide the benefits just described to your Smallworld Magik code, feel free to reach out and let me know.
End Notes
1. LISP, the first Functional Programming Language, was invented in 1958, https://en.wikipedia.org/wiki/Lisp_programming_language
2. B. Ray, D. Posnett, V. Filkov, P. Devanbu, “A Large Scale Study of Programming Languages and Code Quality in Github,” 2014, https://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf
3. Clara Moskowitz, “Mind’s Limit Found: Four Things at Once,” April 2008, https://www.livescience.com/2493-mind-limit-4.html
4. Robert C. Martin, “Clean Code,” Chapter 1, 2009, https://www.amazon.ca/Clean-Code-Handbook-SoftwareCraftsmanship/dp/0132350882
5. Closures, https://en.wikipedia.org/wiki/Closure_(computer_programming)
6. Neal Ford, “Immutability,” July 2011, https://www.ibm.com/developerworks/library/j-ft4/