My first computer was an 8-bit Netronics ELF II with 256 Bytes (yes, BYTES), that ran an RCA 1802 CPU at 1.76 MHz (yes, MegaHertz). It didn’t have any external storage and provided a hex keypad for input of Machine Language (yes, Machine Language – not even assembly language) programs.

Suffice it to say, I really couldn’t do much with it. But it served to teach me how to program, how to deal with binary numbers and how computers actually worked at the hardware level.
I spent countless hours filling those 256 bytes of RAM with all sorts of programs – that would disappear the moment I disconnected the power.
It was the best $149 I have ever spent on computer gear.
My next computer was an 8-bit Radio Shack TRS-80 Color Computer with 4 KB of RAM (2340 bytes usable, because the OS reserved some RAM for itself), running a Motorola 6809E at 1.79 MHz and featuring cassette tape storage, a chiclet keyboard and built-in BASIC.
Neither of these computers promoted program readability over how much you could cram into their limited memories using all sorts of shortcuts and unintelligible tricks. If you wanted to run a program, you had to take into account memory size and performance. Otherwise your program wouldn’t work.
A prime example of this was when, in 1983, I wrote a text Adventure game in BASIC called, “The Ring Adventure.” I sold it through ads in a computer magazine, so it had to run on various Color Computer configurations — some with a limited amount of memory.

In order for that to happen, I had to remove unnecessary spaces and combine lines. The result was tightly packed code that saved memory, but was not very easy to read.
Take a look at one page of the program printout…

Not a model of readability, but back then readability came in second to getting the program to fit into memory.
Today, however, computers are massively powerful with tons of RAM and storage and we have optimizing compilers for performance, so there is no excuse to create cryptic source code.
None whatsoever.
And because the lion’s share of an application’s cost is incurred in the maintenance phase of its life, readability (and thus maintainability) should take precedence over other concerns.
As such, I’ve looked back on my decades of writing programs and put together this guide explaining what I’ve found that works. I’ve tailored this to Magik, but most of the concepts are transferable to other languages.
Keep in mind this is a guide, not a set of laws. If there’s a good reason to stray, by all means, go ahead and stray. But following these guidelines will help produce cleaner, more understandable code that can make your programs easier to maintain and better performing.
Variable Names
Variable names should explicitly state what the variable stands for; a good name is usually a noun, or an adjective followed by a noun. Avoid vague abbreviations: prefer first_name to fn and postal_code to pc. In addition, I like to prefix my variables with a scope identifier. If a variable has local scope in a method or procedure, I use the “l_” prefix (e.g. l_first_name).
For parameters, I use “p_” (e.g. p_postal_code).
I prefix global variables with “g_” (e.g. g_cli_pointer) and write constants in all caps (e.g. PI = 3.14159), unless the constant refers to a procedure, in which case I write it in lower case with no prefix, as in the following…
_constant apply_fn <<
    _proc @apply_fn(x, f)
        _return f(x)
    _endproc
As an example, try to guess what the following variable names represent…
- fn: does this represent a function? A field name? Something else?
- add: is this an arithmetic operation?
- s: this can represent so many things.
- tn: is this a table name?
- vld: ummm….
For the examples above, you’d most likely have to dig around in the code to determine what these variables contain.
But what about if we renamed them…
- l_first_name: a local holding someone’s first name, most likely a string.
- p_address: a parameter passed in holding an address.
- l_sum: a local holding a sum, probably a number.
- g_table_name: a global holding the name of a table.
- MAX_NUM: a constant holding the maximum value a number can be.
- l_is_valid?: a local holding the result of a validation. Probably a Boolean. Note I’ve used the Magik convention of appending a question mark to denote Boolean values.
Even without the surrounding context, I’m confident most people could figure out what values these variables were designed to hold. So rather than wasting effort trying to find clues about what’s going on in your programs, just use meaningful names. It will pay dividends to you and your colleagues.
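To see the prefixes working together, here is a minimal sketch. The procedure format_full_name and its SEPARATOR constant are hypothetical names invented for this illustration; they are not part of Smallworld.

```magik
# Hypothetical example: every name below is invented for illustration.
_global format_full_name <<
    _proc @format_full_name(p_first_name, p_last_name)
        # Constant in all caps, local prefixed with l_, parameters with p_.
        _constant SEPARATOR << " "
        _local l_full_name << p_first_name + SEPARATOR + p_last_name

        _return l_full_name
    _endproc
```

Even in a procedure this small, the prefixes tell you at a glance which values were passed in and which were created locally.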
The other decision is whether to use underscores, camelCase or PascalCase.
As you can see from my examples, I prefer underscores. However this wasn’t always the case.
When I first started learning Magik I had already been programming in other languages for about 12 years, and most of those languages were statically typed.
I was working on a project with a very experienced Magik programmer. He left for the day and I worked late. The next morning he came in and smiled when he saw my code. “Is that a convention you use for another language?” he asked.
“Yes.”
“I see,” he replied and didn’t say anything else.
I had used camelCase names with type prefixes to identify what type a variable should contain. Although Magik isn’t statically typed, I named my variables to denote the type I expected them to hold, such as strFirstName and intCounter.
It made sense to me at the time, however I’ve since discovered, for Magik, descriptive names with underscores are the way to go.
Of course when I write in other languages, such as JavaScript, I automatically revert to the standard conventions they use – and therein lies the rub: you should use the conventions of the environments you are writing in because it ensures others in those environments quickly understand what your code is doing.
But whatever you decide to use, be consistent.
And, finally, although I shouldn’t have to say it, I will anyways. Don’t use variable names that can be easily confused with Magik keywords. Here’s a list of names you shouldn’t use.
_pragma
_block _endblock
_handling _default
_protect _protection _endprotect
_try _with _when _endtry
_catch _endcatch
_throw
_lock _endlock
_if _then _elif _else _endif
_for _over _while _loop _endloop _continue _leave _finally
_loopbody
_return
_local _constant _recursive _global _dynamic _import
_private _iter _abstract _method _endmethod
_proc _endproc
_gather _scatter _allresults _optional
_thisthread _self _clone _super
_primitive
_unset _true _false _maybe
_is _isnt _not _and _or _xor _cf _andif _orif
_div _mod
_package
_no_way _locking
And a few additional names you should avoid…
def_mixin
def_slotted_exemplar
define_slot_access
define_pseudo_slot
define_property
def_property
define_shared_variable
define_shared_constant
define_condition
register_new
register_application
sw_patch_software
Method and Procedure Names
The same logic holds when naming methods and procedures (as well as classes and, in reality, everything else).
Since methods and procedures encapsulate functionality that is executed, their names should indicate what that functionality is. Therefore use a verb (e.g. log), a verb followed by a noun (e.g. create_user) or a verb, adjective and noun (e.g. parse_json_string). However, you should limit names to 20 characters or fewer, and preferably use fewer than 15.
If you find your method can’t be described by one succinct name, then it’s probably doing too much. Methods and procedures should do one thing only; if they’re doing more, split them.
So parse_json_string() is preferable to parse() or even parse_string(). The more specific you can get, the better.
You’re telling the computer what to do and what data to do it on, so put this information in the name when possible.
Avoid Generic Names
You should usually avoid generic names such as process, handle or get. Outside of rare special cases, it’s generally better to be more specific (e.g. process_users, handle_error, or get_price).
Don’t Use Unnamed Procedures
Magik procedures are severely underutilized by programmers. It’s almost as if they’re taboo because Magik is supposed to be Object Oriented. But Magik implements methods using procedures, so under the covers, procedures are the engine that runs the Magik machine.
To see this in action, look at the following code.
Magik> polygon_mixin.method(:check_dimension|()|).value.class_name
:procedure
The method check_dimension on polygon_mixin is a procedure. This holds true for all methods written on all classes. So when you invoke a method, you’re running a procedure under the covers.
Pretty cool eh?
Of course using procedures to write procedural code is rarely a good idea, but using them to implement functional programming techniques can make your code more robust, readable and easier to maintain.
However many programmers simply define procedures and assign them to global or local variables. The procedures below do just that.
_global unnamed_proc <<
    _proc()
        write("unnamed 1")

        _local l_second <<
            _proc()
                write("unnamed 2")

                _local l_third <<
                    _proc()
                        write("unnamed 3 error...")
                        gis_program_manager.beeble()
                    _endproc

                l_third()
            _endproc

        l_second()
    _endproc
The call to gis_program_manager.beeble() throws an error and causes a traceback. But since we didn’t label our procedures, they’re unnamed and therefore we don’t know, from the traceback alone, which procedure contains the error. This makes it harder to understand what’s happening when debugging and forces us to look through the code.
Executing unnamed_proc() results in the following output.
Magik> unnamed_proc()
unnamed 1
unnamed 2
unnamed 3 error...
**** Error: Object a sw:gis_program_manager does not understand message beeble()
does_not_understand(object=a sw:gis_program_manager, selector=:|beeble()|, arguments=sw:simple_vector:[1-0], iterator?=False, private?=False)
---- traceback: Alchemy-REPL (light_thread 425772074) ----
time=24/01/2020 14:23:16
sw!version=5.2.1.0 (swaf)
os_text_encoding=cp1252
!snapshot_traceback?!=unset
condition.raise() (sys_src/guts/condition.magik:616)
object.does_not_understand() (sys_src/guts/object.magik:810)
object.sys!does_not_understand() (sys_src/guts/object.magik:684)
_()
_()
<unnamed proc>()
<unknown exemplar>.<unknown method> (Evaluated-inline:1)
magik_rep.process_command() (sys_src/misc/magik_rep.magik:136)
magik_rep.cli() (sys_src/misc/magik_rep.magik:90)
system.session_start() (sys_src/guts/system.magik:3187)
Magik>
Notice the three frames shown as _(), _() and <unnamed proc>(). We can see procedures being invoked, but we don’t know which ones… and that means we have to trace through the code to determine what’s happening.
Fortunately, procedures can have labels attached to them. Look at the rewritten procedures (renamed named_proc) below.
_global named_proc <<
    _proc @first()
        write("named 1")

        _local l_second <<
            _proc @second()
                write("named 2")

                _local l_third <<
                    _proc @third()
                        write("named 3 error...")
                        gis_program_manager.beeble()
                    _endproc

                l_third()
            _endproc

        l_second()
    _endproc
We’ve added the labels @first, @second and @third to our procedures. Now when we invoke them, we see the following output.
Magik> named_proc()
named 1
named 2
named 3 error...
**** Error: Object a sw:gis_program_manager does not understand message beeble()
does_not_understand(object=a sw:gis_program_manager, selector=:|beeble()|, arguments=sw:simple_vector:[1-0], iterator?=False, private?=False)
---- traceback: Alchemy-REPL (light_thread 425772074) ----
time=24/01/2020 14:45:42
sw!version=5.2.1.0 (swaf)
os_text_encoding=cp1252
!snapshot_traceback?!=unset
condition.raise() (sys_src/guts/condition.magik:616)
object.does_not_understand() (sys_src/guts/object.magik:810)
object.sys!does_not_understand() (sys_src/guts/object.magik:684)
third()
second()
first()
<unknown exemplar>.<unknown method> (Evaluated-inline:1)
magik_rep.process_command() (sys_src/misc/magik_rep.magik:136)
magik_rep.cli() (sys_src/misc/magik_rep.magik:90)
system.session_start() (sys_src/guts/system.magik:3187)
Magik>
As the frames third(), second() and first() show, we can see exactly which procedures were invoked (as well as their order of invocation) and we have a pretty good idea the problem occurred in the procedure labelled third().
But surely, you might be thinking, the global or local variable name explains what the procedure does.
Yes, that’s true, but only if you are looking at the assignment in the code.
If the procedure shows up in a traceback, or in other places, additional digging will have to be performed to determine what it does. However, adding a meaningful name tells you at a glance where the problem lies.
Now imagine you revisit this code two years from now. Wouldn’t a nice name help you re-familiarize yourself with the code?
Sure it would. And it will help anyone else that has to work on your code.
And while there is a case to be made that short inline procedures don’t necessarily have to be named, because they’re generally trivial to understand and the potential for bugs in such short procedures is less likely, it doesn’t hurt to label them anyways… and that’s what I recommend.
Always label procedures!
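Even a one-line helper can carry a label. The sketch below uses a hypothetical local, l_double, labelled @double; if the multiplication ever failed, the traceback would show double() rather than an anonymous frame.

```magik
_block
    _local l_numbers << {3, 1, 2}

    # A short inline procedure, labelled so it is identifiable in a traceback.
    _local l_double << _proc @double(p_number)
                           _return p_number * 2
                       _endproc

    _for e _over l_numbers.fast_elements()
    _loop
        write(l_double(e))
    _endloop
_endblock
```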
Always Explicitly Declare Local Variables
Local variables and constants in a method or procedure should be declared, using _local or _constant respectively, as close as practical to the top of the block to which they belong.
If you need a variable in a block (such as a _loop or _if block) declare it at the top of that block (not outside the block).
Keep your variables scoped to the block in which they will be used and ensure you don’t scope them any wider. By doing this, you minimize unintended side-effects and make your code easier to read and debug.
Look at the code below.
_method area.as_beeble_geojson(p_excludes, _optional p_out_stream)
    _constant COMMA << beeble_geojson.comma
    _constant COLON << beeble_geojson.colon
    _constant LEFT_CURLY << beeble_geojson.left_curly_bracket
    _constant RIGHT_CURLY << beeble_geojson.right_curly_bracket
    _constant LEFT_SQUARE << beeble_geojson.left_square_bracket
    _constant RIGHT_SQUARE << beeble_geojson.right_square_bracket

    _local l_out << p_out_stream.default(!output!)
    _local l_not_first? << _false

    _local number_format << _proc @number_format(n)
                                >> beeble_geojson.number_format.format(n)
                            _endproc

    beeble_geojson.write_elements(l_out,
                                  LEFT_CURLY,
                                  beeble_geojson.as_string("type"),
                                  COLON,
                                  beeble_geojson.as_string("Polygon"),
                                  COMMA,
                                  beeble_geojson.as_string("coordinates"),
                                  COLON,
                                  LEFT_SQUARE)

    _for l_boundary _over _self.boundaries()
    _loop
        _local l_nf? << _false

        _if l_not_first?
        _then
            beeble_geojson.write_elements(l_out, COMMA)
        _else
            l_not_first? << _true
        _endif

        beeble_geojson.write_elements(l_out, LEFT_SQUARE)

        _for l_sec _over l_boundary.sectors()
        _loop
            _for l_coord _over l_sec.fast_elements()
            _loop
                _if l_nf?
                _then
                    beeble_geojson.write_elements(l_out, COMMA)
                _endif

                l_nf? << _true

                beeble_geojson.write_elements(l_out,
                                              LEFT_SQUARE,
                                              number_format(l_coord.x),
                                              COMMA,
                                              number_format(l_coord.y),
                                              RIGHT_SQUARE)
            _endloop
        _endloop

        beeble_geojson.write_elements(l_out, RIGHT_SQUARE)
    _endloop

    beeble_geojson.write_elements(l_out, RIGHT_SQUARE, RIGHT_CURLY)
_endmethod
Notice the six constants (COMMA through RIGHT_SQUARE) are declared at the top of the method. The locals l_out and l_not_first? are declared right after the constants. The number_format procedure comes next; since it is still a local variable, it is also declared at the top of the method.
The three _for statements declare the local variables l_boundary, l_sec and l_coord in their loop control statements, and all three follow the naming convention for local variables.
The local l_nf? is scoped to the outer loop block because it is not required outside that scope (this allows the same variable name to be re-used in other blocks without side-effects). Note it is also declared at the top of the block.
Finally, all locals are declared with either the _local or _constant keywords (or as part of a loop control statement) so the intent of their scope and use is explicitly stated, which makes the code cleaner and easier to understand.
Having said all that, there are exceptions. For well-known index variables (such as e and i), I declare them in the loop control statement and don’t usually follow the naming conventions (as such I won’t name these variables l_element or l_index simply because they’re so well known it’s obvious what they represent and how they’re scoped).
To further illustrate the reason to explicitly declare variables rather than relying on implicit declarations, look at the following code.
_global decl <<
    _proc @decl()
        _global name

        name << "Zaphod"

        _proc()
            _block
                name << "Beeblebrox"
            _endblock

            write(name)
        _endproc()
    _endproc
We declare the global variable name with the _global statement and assign it the value “Zaphod” in the statement that follows. Then, inside the _block, we change the value of the variable.
When we invoke the procedure, we get the following output.
Magik> decl()
Beeblebrox
Magik>
“Beeblebrox” is output because we changed the global variable from within the block. Now this might be what we intended, but it’s also possible we actually wanted to reuse name as a variable that’s local to this block — shadowing the global name.
This could potentially cause a problem. If we reused name inside the _block with the intent of it being scoped locally to the block, it would cause a side-effect: rather than using a shadowing local variable as we intended, we’d be assigning to the globally-scoped variable instead, which could result in a bug somewhere else in the program. And this is the sort of error that can be difficult to test for and find.
However if we changed the assignment inside the _block to declare name locally, using the _local keyword…
_global decl <<
    _proc @decl()
        _global name

        name << "Zaphod"

        _proc()
            _block
                _local name << "Beeblebrox"
            _endblock

            write(name)
        _endproc()
    _endproc
… then the output is…
Magik> decl()
Zaphod
Magik>
Notice how the global variable is not modified from within the block (so the output is, “Zaphod”). There is no side-effect that changes the global variable’s value. The local name variable is scoped only to the block. Anyone reading this code explicitly understands the variable is locally scoped.
If you really wanted to modify the global name, then you would omit the _local keyword. But if your usual habit is to rely on implicitly declaring variables, it would be difficult to determine if your intent was to use the global variable or implicitly create a local one. These are the types of things that allow insidious, difficult-to-find bugs to creep into your code.
Additionally, accessing local variables is much faster than accessing globals, dynamics and slots. So if your code repeatedly references a value from a global, dynamic or slot (especially in loops), declare a local variable with the value and use the local instead.
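As a minimal sketch of that last point, the procedure below copies a global into a local once, before the loop starts. Both total_with_tax and g_tax_rate are hypothetical names made up for this example.

```magik
# Hypothetical example: g_tax_rate stands in for any global, dynamic or slot.
_global total_with_tax <<
    _proc @total_with_tax(p_prices)
        _global g_tax_rate

        # Read the global once, rather than on every pass through the loop.
        _local l_tax_rate << g_tax_rate
        _local l_total << 0

        _for e _over p_prices.fast_elements()
        _loop
            l_total +<< e * (1 + l_tax_rate)
        _endloop

        _return l_total
    _endproc
```

The loop body now touches only locals, which also makes it obvious that the rate cannot change halfway through the calculation.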
I’ve found these rules make code more readable, group variables by scope so they’re easy to find, reduce side-effects and reduce the possibility of erroneous re-declarations (because the compiler will tell you in no uncertain terms).
Always Initialize Variables
It is good coding practice to initialize variables when you declare them, even if you initialize them to _unset.
Why?
Because it will make your intentions clear and you can say stuff such as…
“Yes, I meant to initialize this variable to _unset, so there!”
rather than…
“Whoops, I forgot to initialize this variable and now it’s _unset.”
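Here is a small sketch of a deliberate _unset. The procedure find_first_negative is a hypothetical example; the explicit initialization tells the reader that returning _unset when nothing matches is intended.

```magik
# Hypothetical example: returns the first negative number, or _unset.
_global find_first_negative <<
    _proc @find_first_negative(p_numbers)
        # Deliberately _unset: it stays that way if nothing matches.
        _local l_found << _unset

        _for e _over p_numbers.fast_elements()
        _loop
            _if e < 0
            _then
                l_found << e
                _leave
            _endif
        _endloop

        _return l_found
    _endproc
```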
Furthermore, if you’re importing a variable, performance is better if you initialize the variable when you define it rather than at a later time.
.
.
.
_local l_self << _self

_proc @add_zaphod()
    _import l_self

    write("Zaphod ", l_self.last_name)
_endproc()
.
.
.
Notice how the local variable l_self was initialized in the same statement that declares it. This is more efficient than doing the following…
.
.
.
_local l_self

l_self << _self

_proc @add_zaphod()
    _import l_self

    write("Zaphod ", l_self.last_name)
_endproc()
.
.
.
Here the local l_self is declared in one statement and assigned a value in the next. Doing it this way is less efficient than initializing the value when you declare the variable.
So by initializing local variables you can get better performance, write cleaner code, avoid inadvertent _unset values and provide a single place for variable initialization.
All good things.
Avoid Global Variables
Back in the days of BASIC, every variable was global. And our code was littered with GOTO statements. This produced difficult-to-follow code that was also hard to debug because of the spaghetti-like control-flow and the massive potential for side-effects.
Today, we rarely see GOTO statements, but, unfortunately, global variables are still being used when they shouldn’t be. I can’t think of many scenarios where a global is required. Variables should be scoped to the block of code they’ll be used in – and no more.
Beyond accidentally changing state for another routine linked by a global, globals might be overwritten, removed or changed by external methods and procedures. Therefore it’s a necessity to use locally scoped variables.
Look at the code below.
_global test_global <<
    _proc ()
        _global g_var

        write("g_var from outside proc= ", g_var)

        _if g_var _is 100
        _then
            write("Your very important imaginary files are safe!")
        _else
            write("Your very important imaginary files have all been deleted!")
        _endif

        g_var << 404
    _endproc
The first write() call prints the value of the global variable retrieved from the global environment.
The _if statement tests whether the value is equal to a specific value (i.e. 100). If so, things are good, otherwise things are bad. Very, very bad.
The final statement changes the global variable to 404.
Now look what happens when we invoke the procedure…
Magik> g_var << 100
100
Magik> test_global()
g_var from outside proc= 100
Your very important imaginary files are safe!
Magik> g_var
404
Magik> test_global()
g_var from outside proc= 404
Your very important imaginary files have all been deleted!
Magik> g_var
404
Magik>
First, we set the value of the global variable g_var to 100 (which is the sunny day outcome we want to see when we invoke test_global).
The first line of output from test_global() shows the value of g_var that was retrieved from the global environment. It’s 100, so things are good and your very important imaginary files are safe.
However, evaluating g_var afterwards shows the global variable’s value is now 404. Although test_global() made that change, keep in mind something else could have just as easily done this. Because g_var is a global variable, other methods or procedures could have changed the value from that magic 100.
When we invoke test_global() again, this time things are not so rosy. The value of g_var is now 404 and therefore your very important imaginary files have all been deleted. Go directly to jail! Do not pass Go! Oh No!
And all this happened because your program relied on global state that could be changed from anywhere. So avoid global variables!
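One way to remove the dependency is to pass the value in explicitly. The sketch below is a hypothetical rewrite of test_global; the name report_file_status, its parameter and the SAFE_STATUS constant are my own inventions for illustration.

```magik
# Hypothetical rewrite: the status arrives as a parameter, not a global.
_global report_file_status <<
    _proc @report_file_status(p_status_code)
        _constant SAFE_STATUS << 100

        _if p_status_code = SAFE_STATUS
        _then
            write("Your very important imaginary files are safe!")
        _else
            write("Your very important imaginary files have all been deleted!")
        _endif
    _endproc
```

Now the outcome depends only on the argument the caller supplies, so nothing outside the procedure can silently change its behaviour, and each branch is easy to test.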
Avoid Dynamic Variables, Except…
Dynamic variables are global variables with a twist. They act as globals, but if their values are changed within a block, that change persists only within the scope of that block. Outside that scope, dynamics retain their original values.
Look at the following code…
_global test_dynamic <<
    _proc ()
        _dynamic !print_length!

        write("print_length from outside proc= ", !print_length!)

        !print_length! << 100

        write("print_length changed in proc= ", !print_length!)
    _endproc
In the first write() call we’re getting the value of the !print_length! dynamic variable from the global environment and writing it out. We then change the value to 100 and write it again.
Now look at what happens when we run the code.
Magik> !print_length!
200
Magik> test_dynamic()
print_length from outside proc= 200
print_length changed in proc= 100
Magik> !print_length!
200
Magik>
The first result shows the value of !print_length! in the global environment (i.e. 200).
When we invoke test_dynamic(), it prints the value it retrieved from the global environment (i.e. 200) and then writes the modified value (i.e. 100).
After the procedure completes, we evaluate !print_length! again in the global environment and see it is unchanged from its original value of 200.
So what does this really mean for developers?
Basically it’s another way to pass a parameter to a method or procedure. However I would refrain from using it for the same reason I wouldn’t use a global — it adds an implicit dependency to your code.
If your code requires a value, it should be explicitly passed in as a parameter or retrieved from the object’s state. Using global (or dynamic) variables makes your code dependent on the value of that variable… and that variable might change at any time outside your code’s control, so your code will be difficult to follow and test.
Unfortunately Smallworld uses a few dynamic variables to hold important data, so my recommendation is to use what Smallworld has provided when necessary, but don’t define new dynamic variables in your own code unless there is a very good reason to do so.
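When you do need one of Smallworld’s dynamics, declare it with _dynamic so any change stays inside your own scope. The sketch below assumes, as I understand it, that !print_length! caps how many elements print() shows; print_all is a hypothetical helper name.

```magik
# Sketch: print_all is a hypothetical helper, not part of Smallworld.
_global print_all <<
    _proc @print_all(p_collection)
        # Declaring the dynamic here limits the change to this procedure.
        _dynamic !print_length!

        # Assumption: !print_length! limits the elements print() displays.
        !print_length! << p_collection.size

        print(p_collection)
    _endproc
```

Once print_all() returns, !print_length! is back to whatever value the caller had, just as in the test_dynamic() example.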
Make Good Use of White Space
The overarching point to keep in mind is that readability is paramount when writing code.
Good use of white space makes code understandable and easy to read, and it reduces strain on readers’ eyes and fatigue on their brains. So just because the compiler will ignore most white space, it doesn’t mean you should.
Take a look at the code below…
_method ro_indexed_collection_mixin.beeble_reduce(p_callback,_optional p_initial_value)
_if _self.empty? _then _return _self _endif
_local l_indx<<1
_local l_accumulated_value<<p_initial_value.default(0)
_for e _over _self.fast_elements() _loop _if e _isnt _unset _then l_accumulated_value<<p_callback(l_accumulated_value,e,l_indx,_self) _endif
l_indx +<<1 _endloop
_local l_return<<_self.new(1)
l_return[1]<<l_accumulated_value
_return l_return
_endmethod
Although it’s quite short, can you easily follow the logic? Not so much?
Here’s that same code, but with white space added…
_method ro_indexed_collection_mixin.beeble_reduce(p_callback, _optional p_initial_value)
    _if _self.empty?
    _then
        _return _self
    _endif

    _local l_indx << 1
    _local l_accumulated_value << p_initial_value.default(0)

    _for e _over _self.fast_elements()
    _loop
        _if e _isnt _unset
        _then
            l_accumulated_value << p_callback(l_accumulated_value, e, l_indx, _self)
        _endif

        l_indx +<< 1
    _endloop

    _local l_return << _self.new(1)

    l_return[1] << l_accumulated_value

    _return l_return
_endmethod
Much better, isn’t it? That’s what strategically adding white space can do for you.
So what changed? First, I indented nested blocks, one indent per block, so we could quickly follow the logic in the related block at a glance.
Next I put spaces around operators and one space after each comma in parameter lists. I also added blank lines between blocks and some statements to reduce eyestrain when reading the code.
The code still compiles to the same bytecodes as the first version, but working with it is so much easier.
There are many ways to utilize whitespace and I’m of the opinion that as long as the code is easy to understand and the formatting looks good, use whatever suits your style and sense of aesthetics.
However when it comes to indentation, I have a recommendation: don’t use tabs, use spaces instead.
Why?
Because not all editors and viewers treat tabs the same way (different editors use different values for tabstops), so someone opening your code in Notepad might see tabs differently from someone opening your code in VSCode or emacs. Whereas spaces are treated the same way in all editors and viewers.
Plus most editors and IDEs have a setting that allows you to still use the tab key to indent, but they convert tab characters to spaces, so you don’t have to continually pound on the space bar.
And as far as how many spaces to indent, I like to use 4. That ensures indents are easily visible, yet keeps lines compact enough so longer lines don’t overflow to the next line.
The easier it is for someone to scan your code, rather than having to expend effort checking each line individually, the easier it is to pick out control flow and other details, thus leading to better understanding and more effective debugging.
Well formatted code is a joy to read, even if it’s not very good code. Poorly formatted code makes people wonder why they got out of bed, even if the code is exceptional. Of course your code is exceptional, so make it easy to read too.
Favour Readability Over Everything Else
As I mentioned earlier, when I started writing code, memory and other resources were scarce, so I had to find ways to pack as much code into as little memory as possible.
In time it became something of a contest to see how many tricks a programmer knew to get more code into less memory. Other programmers would gaze at you with awe if you did something tricky but unintelligible. It almost seemed the more arcane and unreadable your code was, the higher your status in the programming world.
But that was then, and this is now.
There is absolutely no reason to use tricks that obfuscate your intent in today’s world. Computing power has progressed to a point where you have as many resources as you need. And optimizing compilers will take verbose code and make it efficient.
So if you’re still writing unreadable code today, you’re doing something wrong.
Nobody cares if you can save a few bytes by packing 8 flags into a byte and then defining a mask to retrieve or set the appropriate bit — not when it takes them 10 times longer to figure out what you’ve done.
What everybody cares about today is that your code is easy to understand and simple to maintain.
That’s it.
So keep your code clear and easy to understand, use good formatting techniques and try to keep each line under 80 characters long, because not every developer has a 43-inch, 4K monster monitor sitting on their desk.
And remember… it’s far easier to write code than to read code. So try to close that gap whenever you can.
Always Use a Source Code Control System
You know, there are times when I think things are so obvious, they just don’t need mentioning.
Yet I invariably discover someone not doing the obvious.
As an example, I once did some work at a major company, with a large Smallworld installation and lots of custom code, and they weren’t using a source control system. Oh… they had one installed… they just weren’t using it to manage their code.
So here goes… if you write code, even if it’s only one module, always use a source code control system.
There. I said it.
‘Nuff said!
Only Use These Methods for Debugging
One of the main ideas behind OO is encapsulation. Classes encapsulate behaviour and data so, theoretically at least, we can hide underlying implementation details and separate what is to be done from how it is done.
Sometimes, however, you might need to break that encapsulation — usually when debugging. The following methods allow you to do just that.
But…
Never use these methods in production code. Yes, it might be tempting when you’re facing a hard deadline and nothing else seems to be working, but resist the urge and ignore the Sirens’ song. Breaking encapsulation increases the probability of introducing side-effects, bugs and other insidious and undesirable traits.
Plus these methods are very inefficient and slow.
So just don’t do it!
sys!perform()
Methods marked as _private can only be invoked on _self, _clone or _super.
Let’s say we have the code below…
def_slotted_exemplar(:beeble_debugger, {{:lines, _unset}}, {})

_private _method beeble_debugger.debug_line(p_line_num)
    write("this is a _private method.")
_endmethod
The _private keyword in front of _method defines the debug_line() method as private. So if we try to invoke it, we get the following traceback…
Magik> beeble_debugger.debug_line(1)
**** Error: Cannot invoke private method debug_line() on object a beeble_debugger with arguments: 1
does_not_understand(object=a beeble_debugger, selector=:|debug_line()|, arguments=sw:simple_vector:[1-1], iterator?=False, private?=False)
---- traceback: Alchemy-REPL (light_thread 1284089996) ----
time=2020/01/28 10:42:23
sw!version=5.2.0.0 (swaf)
os_text_encoding=cp1252
!snapshot_traceback?!=unset
condition.raise() (sys_src/guts/condition.magik:616)
object.does_not_understand() (sys_src/guts/object.magik:810)
object.sys!does_not_understand() (sys_src/guts/object.magik:684)
<unknown exemplar>.<unknown method> (Evaluated-inline:1)
magik_rep.process_command() (sys_src/misc/magik_rep.magik:136)
magik_rep.cli() (sys_src/misc/magik_rep.magik:90)
system.session_start() (sys_src/guts/system.magik:3160)
We can get around this by using sys!perform().
Magik> beeble_debugger.sys!perform(:debug_line|()|, 1)
this is a _private method.
This bypasses the privacy check and allows us to invoke the method, while leaving it designated as private.
Another way to invoke a private method is to change its designation from private to public. Look at the following statements.
Magik> beeble_debugger.method(:debug_line|()|)
method (private) debug_line(p_line_num) in beeble_debugger
Magik> beeble_debugger.method(:debug_line|()|).set_private(_false)
unset
Magik> beeble_debugger.method(:debug_line|()|)
method debug_line(p_line_num) in beeble_debugger
Magik> beeble_debugger.debug_line(1)
this is a _private method.
The first query shows debug_line() is private. We then call set_private(_false) on the method to flag it as public.
Now the second query shows debug_line() is no longer private and we can invoke it directly without issues. Doing it this way changes the method from private to public (whereas sys!perform() leaves the private designation untouched).
Interestingly enough, shared variables (and shared constants) are implemented as methods. So we can define a private shared variable named current_statement as follows…
beeble_debugger.define_shared_variable(
    :current_statement,
    42,
    :private)
Since this is really a private method (indicated by setting the third argument to :private), we can override its private nature in the same way we did for the debug_line() method.
Magik> beeble_debugger.current_statement
**** Error: Cannot invoke private method current_statement on object a beeble_debugger
does_not_understand(object=a beeble_debugger, selector=:current_statement, arguments=sw:simple_vector:[1-0], iterator?=False, private?=False)
Magik> beeble_debugger.sys!perform(:current_statement)
42
Magik> beeble_debugger.method(:current_statement).set_private(_false)
unset
Magik> beeble_debugger.current_statement
42
Magik>
Pretty slick, eh?
So if you need to invoke private methods while debugging, you now have two ways to do it.
Slots
Similarly, if you need to look at the value of a slot that’s private, while debugging, you can use sys!slot(:<slot_name>), and if you need to set the value of such a slot, use sys!slot(:<slot_name>) << value (where <slot_name> is the name of the actual slot).
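Here is a minimal sketch of both uses, assuming the beeble_debugger exemplar defined earlier (whose lines slot starts out as _unset). The l_current variable is just a name I made up for the illustration.

```magik
# For debugging only. Assumes the beeble_debugger exemplar defined
# earlier, whose lines slot starts out as _unset.

# Read the slot directly, bypassing any accessor methods.
l_current << beeble_debugger.sys!slot(:lines)
write(l_current)

# Overwrite the slot directly with a new value.
beeble_debugger.sys!slot(:lines) << rope.new()
```

As with sys!perform(), treat this as a debugging aid and keep it out of production code.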
Property List, Hash Table or Concurrent Hash Map?
A good rule of thumb is to use the simplest collection that will do the job. So if a simple_vector will do, don’t use a rope. But what about property_list, hash_table and concurrent_hash_map?
Don’t these do the same thing?
Conceptually yes. In reality no. Fortunately there are some hard-and-fast rules for when to use each one.
Before the Concurrent Hash Map was available, property lists were more efficient for smaller collections containing, say, 10 or fewer elements while hash tables were better for larger collections. So…
- If you expect 10 or fewer elements, use a property_list.
- If you expect 20 or more elements, use a hash_table.
- If you expect between 10 and 20 elements and it’s a Tuesday or Thursday, use a property_list, otherwise use a hash_table (no, no, no… I’m joking… use whichever you feel like using despite what day it happens to be).
But, if you need to preserve the order of your entries, you will have to use a property_list. The following code adds entries to all three collection types…
_global hash_test <<
_proc @hash_test()
    _constant SV << {:one, :two, :three, :four, :five, :six}
    _local l_value << 1

    _constant PROP_LIST << property_list.new()
    _constant HASH_TBL << hash_table.new()
    _constant CONC_HASH_MAP << concurrent_hash_map.new()

    _for e _over SV.fast_elements()
    _loop
        PROP_LIST[e] << l_value
        HASH_TBL[e] << l_value
        CONC_HASH_MAP[e] << l_value
        l_value +<< 1
    _endloop

    print(PROP_LIST)
    write(newline_char)
    print(HASH_TBL)
    write(newline_char)
    print(CONC_HASH_MAP)
_endproc
Notice how we add the same key and value to all three collections during each loop iteration. Now pay close attention to what happens when we execute this code.
Magik> hash_test()
property_list:
:one 1
:two 2
:three 3
:four 4
:five 5
:six 6
hash_table:
:six 6
:two 2
:four 4
:five 5
:one 1
:three 3
concurrent_hash_map:
:six 6
:two 2
:three 3
:five 5
:one 1
:four 4
Magik>
Will you look at that… the property_list preserved the order of the entries while the hash_table and concurrent_hash_map didn’t. Sometimes you need this, sometimes you don’t. So choose appropriately depending on your requirements.
In reality, however, the concurrent_hash_map is what you should use when running Smallworld 5, because it’s highly optimized on the performance front — even for smaller data sets — and it’s thread safe.
There are a few specific use-cases for the hash_table and, of course, if you want to preserve order then use a property_list, but if insertion order is not important, just use the concurrent_hash_map.
Use Defensive Programming Techniques
A few days ago a user reported a traceback when using production code. I took a look at the code and noticed it depended on a slot being set to a particular object. It wasn’t, and therefore the dreaded does_not_understand() traceback occurred.
There was no error checking to see if the slot was _unset and no try block or other error handling mechanisms protecting that piece of code.
I think that’s unreasonable.
Users should not have to deal with tracebacks in production code. At the very least the code should handle errors and fail gracefully, displaying a human readable message with some suggested actions.
Or better yet, take a page from Joel Spolsky’s book and pop up a small window asking the user to describe, in one sentence, what he or she was doing when the bug appeared. Then package the answer with the traceback behind the scenes and write it to a log (or email it to the support desk).
The key phrase is, behind the scenes. End users should never have to deal with tracebacks.
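As a rough sketch of the simplest kind of guard, here is how a method might check a slot before relying on it. It assumes the beeble_debugger exemplar from earlier. The show_line() method and the message wording are hypothetical, invented purely for illustration.

```magik
_method beeble_debugger.show_line(p_line_num)
    ## Writes line P_LINE_NUM, or a readable message if no
    ## lines have been loaded yet.

    # The lines slot starts out as _unset, so check it before use
    # instead of letting the user hit a does_not_understand() traceback.
    _if .lines _is _unset
    _then
        write("No program is loaded. Load a program and try again.")
        _return
    _endif

    write(.lines[p_line_num])
_endmethod
```

The point is not the specific message, but that the failure is anticipated and reported in terms the user can act on.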
So use try blocks to catch errors. Use protection blocks to clean up after an error.
And use default values on optional parameters.
If a method or procedure is called with missing optional arguments, those come through as parameters with values of _unset — which can quickly raise unexpected errors and break your code.
So it’s a good idea to get into the habit of setting default values.
_method chain.as_beeble_geojson(p_excludes, _optional p_out_stream)
    _local l_out << p_out_stream.default(!output!)
    .
    .
    .
_endmethod
See how easy that was?
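The try and protection blocks mentioned above can be sketched in the same spirit. This is only an illustration under my own assumptions: the write_report_safely procedure, its parameters and the message text are made up, and a real application would log the condition or send it to the support desk rather than write it to the terminal.

```magik
_global write_report_safely <<
_proc @write_report_safely(p_file_name, p_text)
    _local l_stream

    _try _with cond
        _protect
            l_stream << external_text_output_stream.new(p_file_name)
            l_stream.write(p_text)
        _protection
            # Runs whether or not an error occurred, so the file
            # is always closed if it was opened.
            _if l_stream _isnt _unset
            _then
                l_stream.close()
            _endif
        _endprotect
    _when error
        # Fail gracefully with a readable message instead of a traceback.
        write("Unable to write the report: ", cond.report_contents_string)
    _endtry
_endproc
```

The protection block handles the clean-up, while the try block decides what the user gets to see.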
Avoid Using evaluate()
The evaluate() method on ro_charindex_mixin allows you to execute Magik statements and expressions held in strings. Take a look at the following code.
Magik> cmd << "answer << 42"
"answer << 42"
Magik> cmd.evaluate()
42
Magik> answer
42
Magik>
Or how about this code…
Magik> txt << magik_text.new()
a sw:magik_text
Magik> txt.add_last("# evaluate()")
Magik> txt.add_last("_proc()")
Magik> txt.add_last("_constant c_sv << {42,52,62,72}")
Magik> txt.add_last("_for e _over c_sv.fast_elements()")
Magik> txt.add_last("_loop")
Magik> txt.add_last(" write(e)")
Magik> txt.add_last(" _endloop")
Magik> txt.add_last(" _endproc")
Magik> print(txt)
magik_text(1,8):
1 "# evaluate()"
2 "_proc()"
3 "_constant c_sv << {42,52,62,72}"
4 "_for e _over c_sv.fast_elements()"
5 "_loop"
6 " write(e)"
7 " _endloop"
8 " _endproc"
Magik> txt.evaluate()()
42
52
62
72
Magik>
You like that, don’t you? As cool as it appears, stay away from using it. There are very, very, very few cases when you need to turn strings into Magik code.
Not only is performance poor, but because it can allow arbitrary code to run, it represents a security risk.
Think long and hard and… carefully, *ahem*, evaluate your use case before using evaluate().
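If the only thing that varies at run time is which method gets called, one possible alternative is to hold the method name as a symbol and invoke it with perform(), rather than building a string of Magik code. This is my own suggestion and won't fit every use case. The variable names below are made up for the example.

```magik
# Hypothetical example: choose the method by name without evaluate().
l_numbers << {42, 52, 62, 72}
l_selector << :size

# Equivalent to l_numbers.size, but with the selector held in a variable.
write(l_numbers.perform(l_selector))
```

Because only a method name is supplied, not arbitrary source text, there is far less room for unexpected code to run.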
Make Good Use of Private and Public Comments
While comments are essential in creating understandable code, they should be used judiciously. If you’ve followed the guidelines for naming variables, methods, classes and procedures (above), then the majority of your code will be self-documenting.
In Magik, there are two types of comments. Private (introduced with #) and Public (introduced with ##). The snippet below shows an example of each one.
# this is a private comment only visible from...
# within the source file.
## this is a public comment visible from within the source file...
## and from within the class browser.
Private comments should be used to explain complex logic that might not be understandable at first glance. Too many developers add superfluous comments that either restate something that is obvious, or have to explain something that could have been explained by using proper naming conventions.
Another valid use of private comments is to explain why you did something that may not be obvious. For example, if you had to implement a work-around as a custom method rather than use an out-of-the-box method, a concise comment would be in order.
A not-so-valid use of private comments is the ubiquitous TODO. These should only be used while developing your code, yet they tend to stick around and make it into production code.
There shouldn’t be any TODO comments in production.
When you finish your code, remove the TODO comments. You should not have anything left to do when promoting your code to production — either remove the incomplete functionality or finish what’s left to do.
If you’ve been around the software industry for any length of time, you’ll know people don’t come back to fix TODOs when the code is in production.
Another invalid use of private comments is to comment out code you don’t want compiled, but still want in the source code.
Don’t do this.
You should be using a source code control system that gives you the ability to go back to a version that contains the old code. Commented out code reduces readability and increases maintenance costs, so use your source control for what it’s meant to do and use comments for what they’re meant to do.
In addition, always keep your comments up to date. If you change code with related comments, update or remove the old comments. Otherwise your comments and code won’t match and this will cause confusion somewhere down the line.
Finally, use public comments effectively. Public comments (i.e. ##) are useful to describe how to use a method or procedure when viewed in the class browser.
Of course if you follow good variable, method, class and procedure naming, the functionality should be self-evident, so public comments shouldn’t be necessary in the majority of cases. However if there is something that might be unclear, clarify it with a public comment.
At the end of the day, comments are meant to help others (and even help you when you revisit your code a year later) understand what is going on. They should be clear, concise and accurate. Otherwise they can easily clutter the code or, even worse, mislead others.
So keep a tight rein on your comments and diligently cull them when they are no longer fulfilling their mandate.
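To make that concrete, here is a small sketch of both comment types working together, assuming the beeble_debugger exemplar from earlier. The line_count() method is hypothetical.

```magik
_method beeble_debugger.line_count()
    ## Returns the number of lines loaded into the debugger.
    ## Returns 0 if no lines have been loaded yet.

    # The lines slot starts out as _unset (see the exemplar
    # definition), so fall back to an empty vector rather than
    # raising an error.
    _return .lines.default({}).size
_endmethod
```

The public comment tells a caller what to expect, while the private comment explains why the code is written the way it is.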
Understand Floating Point Numbers
A floating point number is an approximate representation of a number, not the actual number. To see this, look at the code below.
Magik> !print_float_precision! << 20
20
Magik> a << 0.1
0.10000000000000000000
Magik> b << 0.2
0.20000000000000000000
Magik> c << a + b
0.30000000000000004000
Magik> c = 0.3
False
Magik>
First we set the precision for printing floats to 20. Then we do some simple arithmetic by adding 0.1 and 0.2. However, as the value of c demonstrates, the result is not 0.3, as you might expect, but rather 0.30000000000000004000, because of how floats are stored.
The final comparison shows c is not equal to 0.3.
If you only need approximate numbers, you can get away with using floats this way. However, if you need exact results, scale the operands up to whole numbers, do the arithmetic and then divide.
Magik> c << (a * 10 + b * 10) / 10
0.30000000000000000000
Magik> c = 0.3
True
Magik>
Now c compares equal to 0.3, as the final comparison shows.
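A different technique, and not the one shown above, is to compare floats within a small tolerance rather than testing for strict equality. The sketch below is my own illustration. The nearly_equal? procedure and its default tolerance are assumptions you would tune to your data.

```magik
_global nearly_equal? <<
_proc @nearly_equal?(p_a, p_b, _optional p_tolerance)
    ## Returns _true if P_A and P_B differ by no more than
    ## P_TOLERANCE (defaults to one billionth).
    _local l_tolerance << p_tolerance.default(0.000000001)
    _return (p_a - p_b).abs <= l_tolerance
_endproc
```

With this in place, nearly_equal?(0.1 + 0.2, 0.3) treats the tiny storage error as insignificant instead of reporting a mismatch.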
Optimize Loops
Loops are essential control structures in programming. However, they can quickly become bloated and hamper performance if you’re not careful. I don’t know how many times I’ve seen code similar to the following — in production code.
_global loop_test <<
_proc()
    _local l_v << gis_program_manager.databases[:gis]
    _local l_footpaths << l_v.collections[:footpath]
    _local l_min << l_footpaths.an_element()

    _for e _over l_footpaths.fast_elements()
    _loop
        _if e.length.write_string.as_number() > l_min.length.write_string.as_number()
        _then
            write("found a footpath of greater length.")
        _endif
    _endloop
_endproc
Check out the _if condition.
The expression on the right hand side (l_min.length.write_string.as_number()) is retrieving the length, converting it to a string and then to a number on each iteration. But the value is the same each time it’s evaluated. It’s a constant.
Now the Cambridge DB footpath collection only has 29 records, so for such a small number of iterations, you might not notice a difference. But if the footpath collection had, say, 10 million records, that’s a lot of unnecessary overhead being added, so you’d definitely see a difference.
Therefore anything that can be evaluated outside a loop should be removed from the loop body because it will make the loop faster. By removing l_min.length.write_string.as_number() from the loop body, we cut down on the number of invocations inside the loop and thus increase its performance.
Here’s the improved code.
_global loop_test_improved <<
_proc()
    _local l_v << gis_program_manager.databases[:gis]
    _local l_footpaths << l_v.collections[:footpath]

    _constant MIN_LENGTH << l_footpaths.an_element().length.write_string.as_number()

    _for e _over l_footpaths.fast_elements()
    _loop
        _if e.length.write_string.as_number() > MIN_LENGTH
        _then
            write("found a footpath of greater length.")
        _endif
    _endloop
_endproc
We’ve moved the minimum length calculation outside the loop and declared it as a local constant, MIN_LENGTH. Then, in the _if condition, we compare the length of the current footpath element to the constant value, thereby removing the need to calculate that value on each iteration.
And just like that, we’ve improved our code’s performance. So pay special attention to loops and ensure they’re as efficient as possible.
Avoid Unnecessary Variables
If you don’t need to use a value later on in the code, don’t create a variable for it.
So rather than writing code like this…
_global useless_vars <<
_proc(p_first_name, p_last_name)
    _local l_full_name << write_string(p_first_name, " ", p_last_name)
    write(l_full_name)
_endproc
Opt to write it this way instead…
_global no_vars <<
_proc(p_first_name, p_last_name)
    write(p_first_name, " ", p_last_name)
_endproc
See how we’ve eliminated the unnecessary l_full_name variable?
There are two good reasons to get into this habit. First, it’s easier to understand what’s happening, especially if there are lots of statements between when the unnecessary variable is defined and when it’s used. Others reading the code don’t have to search to find where the l_full_name variable is defined.
The other reason is performance related. When Magik is compiled to Java bytecodes, the statement to declare and assign a value to a variable requires at least 2 bytecode instructions be generated (in some cases 4 instructions may be generated). So eliminating the unnecessary variable reduces the number of instructions needed and, thus, improves performance.
Of course this example is trivial, so generating 2 or 4 additional instructions wouldn’t result in any noticeable performance degradation. However, if there were numerous such unnecessary variables, the result would be numerous unnecessary instructions, and that could cause noticeable issues.
The reason is two-fold. The first is obvious: additional instructions take more time to execute. The second, and arguably more important, reason is not so obvious and has to do with how the Java Just-in-time (JIT) compiler works.
At this point, let’s step back and look at what happens when you compile Magik code.
Compilation is a two-step process.
1. Magik code is compiled to Java bytecode instructions. This is a typical example of static compilation.
2. During runtime, bytecode instructions may be compiled to native machine language instructions that are executed on the underlying hardware. This is an example of dynamic compilation.
After step 1 completes, bytecode instructions can run via the Java Virtual Machine (JVM). The JVM takes bytecode instructions and interprets them by dynamically mapping the bytecodes to native instructions the underlying microprocessor can directly execute — as you might imagine, there is overhead required to do the interpretation.
Therefore, in order to optimize frequently used code, the JVM also implements the JIT compiler. This means parts of the program (in bytecode format) may be compiled to native machine language instructions, while the program is running, so they can be directly executed by the microprocessor (that’s step 2).
For example, if a method call (say in a loop) is executed 15,000 times, rather than interpreting this code on each iteration (a relatively slow process), the JVM may decide to compile the method using the JIT compiler. Since the code no longer needs to be interpreted, but can be executed directly, performance can be significantly improved.
But the JIT compiler doesn’t just blindly compile the bytecodes, it also attempts to perform some basic optimizations.
One of these is inlining, which replaces a method call with the body of the method, thereby avoiding the overhead of a method invocation. So if our method that was executed 15,000 times was inlined, we’d have saved the overhead on approximately 15,000 method calls (I say, “approximately,” because the profiling mechanism won’t inline methods until after several iterations, but you get the idea).
However the JIT compiler has a number of limitations, given its dynamic nature, and one such limitation is an upper bound on how many bytecodes it can inline. If the number exceeds its limit, it won’t be able to perform that optimization.
So… if your code’s length is near the JIT compiler’s inlining limit and you add unnecessary variables, that may push it over and cause your code to be less-than-fully-optimized — which could result in significantly decreased performance.
Obviously there are a number of other items that can affect optimization, but if you simply avoid introducing unnecessary variables (particularly in frequently executed code), the JIT compiler will be able to perform optimizations to the best of its ability (at least in terms of reduced bytecode instructions size) and your code may run faster… sometimes much faster.
By doing this you give the JIT compiler the best chance of applying useful optimizations to your code.
_finally
This has been a whirlwind tour describing some best practices to use when writing Magik code. It is by no means exhaustive, nor is it required for you to follow. It’s simply a list of ideas I’ve discovered that allow me to write better code. If you have another way of doing this, by all means, use whatever you’re most comfortable with.
On the other hand, if you follow these guidelines, your code will be clearer and easier to maintain. Others will understand what you are doing and you’ll gain a reputation as a master of coding, a man (or woman) amongst boys (or girls) and a dynamic hero of epic proportions who doesn’t just get one or two results, but gets !all_results!
(Okay, perhaps that’s a bit of an exaggeration.)
So use meaningful names for variables, classes, methods, procedures and other constructions. Avoid “smart” tricks and write code that clearly shows your intent.
Use comments carefully and always be watching for traps that make your intentions and code difficult to decipher or less efficient.
Software development is hard enough at the best of times, so don’t make it any more difficult for those who come after and have to work on your code.