In the world of Version Control Systems (VCS), two types dominate: Centralized (CVCS) and Distributed (DVCS). Common centralized VCSs include Subversion and ClearCase, while the most popular distributed VCSs are Git and Mercurial (Hg).
I know I shouldn’t have to say this, but since I’ve seen large companies that don’t use VCSs or use them in ways that defeat their purpose, I’ll say it: Always use a VCS, and use it the way it was intended to be used. Always!
With that out of the way, the obvious question becomes, “which type should I use?”
DVCSs have a number of advantages over CVCSs — for details simply Google “DVCS vs CVCS” — but the most significant is that DVCSs make it very simple to merge different branches back into a master branch.
I’ve used both types and I’ll state it plainly: a DVCS is better. I know there are those who think Centralized is the way to go, and that might have been true 20 years ago. But with today’s DevOps practices (Agile development techniques, continuous testing and continuous deployment, to name just a few items), there is no question in my mind that you either go Distributed or you go home.
If you’re still with me, the next question is which DVCS to use. I don’t think you can go wrong with either Git or Hg. Git is used by a large majority of the open source community and enjoys wide support. But Hg is also a good choice, so download both and play around with each one. Then choose the one with which you are most comfortable.
I’ve used Hg more than I’ve used Git, but if I had to start from scratch, I think I’d go with Git just because it enjoys more support. But, as I’ve mentioned, either one is fine. In the following examples I’ll use Hg, but there are analogous commands in Git.
Let’s assume you’ve settled on a DVCS and you’re ready to get going. What’s next?
Well, if you’re setting up a DVCS for use in your company (or department), then I think the following information will help.
The basic concepts are:
- All source code and important documents must be version managed (okay, sometimes you don’t want to manage certain source files, but you get the idea).
- New code should never be shared with other developers or put into a stable repository branch until it has been tested.
- Developers should commit changes at logical points and not wait until the end to commit large, possibly unrelated, changes. This allows a developer to go back to a particular revision when needed.
To implement these concepts, being able to merge quickly and easily is very important. With Git or Hg, you have your own private branch (which is a private copy of the entire repository and all its associated history) and can therefore commit your code whenever you’d like.
Once your code is completed and tested, you then push it to a shared repository so other developers can see it. This also ensures your code does not break builds or cause issues for other developers.
Keep in mind the shared repository is just another distributed repository. It’s no different from developers’ private repositories other than we designate it as, “shared,” and tell everyone it’s the main repo.
The last thing I’ll mention about CVCSs and DVCSs, before describing the actual workflow for managing source, is that CVCSs look at revisions (each of which is basically a snapshot of what the entire file structure looked like at a particular point in time) while DVCSs look at changesets (each of which is a list of changes between one revision and another).
So are there any advantages of one over the other? As it turns out, there are significant advantages in the way DVCSs do things.
When we create separate branches with lots of changes, the CVCS’s way of doing things looks at the before and after snapshots and tries to guess how things progressed from the before snapshot to the after snapshot. This is usually not possible to determine based simply on the snapshots and that’s why merging is so difficult when using a CVCS.
The result is lots of “merge conflicts” (which aren’t really conflicts, but are flagged as conflicts because the CVCS can’t determine how to get from the before state to the after state) which must be resolved manually.
On the other hand, Hg maintains a complete series of changesets and therefore knows exactly how to move from the before state to the after state. So when it merges, it will only report actual conflicts – not “conflicts” that are a result of it not knowing how to move from one state to another.
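To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy, not how Hg or any CVCS is actually implemented: it assumes both versions have the same number of lines as their common ancestor, and the function names are made up for illustration. Comparing two snapshots alone flags every line that differs, while a merge that knows the common ancestor can tell who changed what.

```python
def two_way_compare(mine, theirs):
    """Compare two snapshots only: every differing line looks like a conflict."""
    return [i for i, (m, t) in enumerate(zip(mine, theirs)) if m != t]


def three_way_merge(base, mine, theirs):
    """Merge line by line, using the common ancestor to see who changed what."""
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither touched the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed the same line in different ways
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base = ["alpha", "beta", "gamma"]
mine = ["ALPHA", "beta", "gamma"]    # developer A edits line 0
theirs = ["alpha", "beta", "GAMMA"]  # developer B edits line 2

print(two_way_compare(mine, theirs))        # -> [0, 2]
print(three_way_merge(base, mine, theirs))  # -> (['ALPHA', 'beta', 'GAMMA'], [])
```

With the ancestor available, the two edits combine cleanly. A conflict is reported only when both developers change the same line differently.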
And since merging is so painless, it allows us to branch whenever we want and therefore use the source control system to its full advantage, such as keeping stable master source code in a separate branch from unstable development code and keeping patches in another branch and so on.
This lets each developer work in a separate branch until they are certain their code has been tested and is working as intended. It allows us to have branches for different development projects, patches, QA and experiments and not worry about how to merge everything together when the time comes.
In addition to the main repository, there are usually multiple local developer repositories. These are generally copied (the Hg term is, “cloned”) from the main shared repository.
In fact a single developer can clone multiple local repositories if necessary, so he can work on multiple independent projects at once (each in its own clone). Each clone can be viewed as a separate, independent branch.
This brings up an important concept. A branch in Git or Hg can be contained in one repository or it can be a separate cloned repository. In either case, it is very easy to merge these branches back into the main branch.
Once you have a cloned repository, you can modify, add and delete files as necessary.
After you’ve modified, added or deleted files, you need to commit your changes so the DVCS can create a changeset (which you can think of as making a copy of the entire directory structure at that instant in time as well as the rules for moving from the previous changeset to the current one).
The rule of thumb is, after you make changes that implement a specific piece of functionality, you should commit. On one hand you don’t want to commit too often because you’ll create lots of changesets, but you want to commit often enough so you can revert back to a particular discrete revision if necessary.
When you choose to commit, you must supply a commit message. You can use any message format you want, but I like to use a message in the following format:
v[YYYY-MM-DD-x] [message] – [UID]
where [message] is a short description of what you are committing (e.g. “December Maintenance Release”), [UID] is your user id (e.g. MHING), YYYY-MM-DD is the current date (e.g. 2019-12-25) and x is an optional number (for commits that occur on the same day).
As an example, a commit message may be, “v2019-12-25-1 December Maintenance Release – MHING”
Note that the “v” before the date is mandatory because it allows us to search for versions easily in the future. You should also tag important commits with the same message.
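If you want to keep everyone honest about the format, a small helper can build and check messages. The following is only a sketch: the function names and the regular expression are my own assumptions, not part of Hg.

```python
import re
from datetime import date


def commit_message(message, uid, day, seq=None):
    """Build 'v[YYYY-MM-DD-x] [message] – [UID]'; seq is the optional x."""
    stamp = day.isoformat()  # YYYY-MM-DD
    if seq is not None:
        stamp = f"{stamp}-{seq}"
    return f"v{stamp} {message} – {uid}"


# Mandatory "v", the date, an optional same-day counter, the message, the user id.
PATTERN = re.compile(r"^v\d{4}-\d{2}-\d{2}(-\d+)? .+ – \S+$")


def follows_convention(text):
    """Return True if a commit message matches the convention."""
    return PATTERN.match(text) is not None


msg = commit_message("December Maintenance Release", "MHING", date(2019, 12, 25), 1)
print(msg)                               # v2019-12-25-1 December Maintenance Release – MHING
print(follows_convention(msg))           # True
print(follows_convention("fixed stuff")) # False
```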
A tag is simply a user-friendly way to identify revisions. For example, Hg assigns a unique identifier to each revision, but this identifier is a hexadecimal number that is very difficult to remember. It’s far easier to tell someone to go to revision “December Maintenance Release v2019-12-25-1” rather than asking them to go to revision “77480a59c732”. It’s also easier to remember what that revision represents if you’ve intelligently tagged it.
Hg also uses decimal numbers in local repositories, but you can only use these within that local repository. They are not globally unique across repositories like the hexadecimal numbers and tags are.
Also keep in mind that when you add a tag, it creates a new changeset (and since it will be the latest changeset, it will be named, “tip”).
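The difference between the local decimal numbers and the global hexadecimal identifiers can be sketched in a few lines of Python. This is a simplification for illustration, not Hg's exact algorithm: here an identifier is just a hash of the parent's identifier plus the content, and the local number is the position at which a changeset happened to arrive in a given repository.

```python
import hashlib


def changeset_id(parent_id, content):
    """Toy global identifier: a hash of the parent's id and the content."""
    digest = hashlib.sha1((parent_id + content).encode("utf-8")).hexdigest()
    return digest[:12]  # short form, 12 hexadecimal digits


root = changeset_id("", "initial import")
fix = changeset_id(root, "bug fix")
feature = changeset_id(root, "new feature")

# Two repositories receive the same changesets in a different order.
repo_a = [root, fix, feature]
repo_b = [root, feature, fix]

# Local numbers are just positions in each repository's own list.
print(repo_a.index(fix), repo_b.index(fix))  # -> 1 2
# The hexadecimal identifier is the same in both repositories.
print(fix in repo_a and fix in repo_b)       # -> True
```

This is why you can safely quote a hexadecimal identifier or a tag to a colleague, but not a local revision number.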
Each repository stores all old versions of every relevant file within its particular history. At certain points repositories are merged. Let’s assume we have a shared repository called, QA. At a high level the merging process is as follows:
- Local repositories, after their code has been fully tested by their developers, are merged with the QA shared repository which only contains code that is fully unit tested, but might not be integration, fully regression or user tested. This repository is where the work is checked against QA standards and users can test it.
- The QA shared repository is merged with the PROD Master repository only once new code has been integration, fully regression and user tested as well as signed off by the powers-that-be (in addition, whatever processes are in effect, such as change controls, must be followed).
PROD Master contains the production source code used to deploy production builds that users run. At most times, PROD Master and QA will be identical. However there are two times when these repositories will differ:
- Just prior to a production implementation where changes have been completed but not yet put into the production environment because they are being tested.
- When patches are implemented. Patches are implemented directly in QA (and then, after testing, pushed to PROD Master)
Following are some useful Hg commands:
- Revert: go back to the last committed revision.
- Status: see which files have been modified, removed or added.
- Diff: see differences between files at different revisions.
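- Remove: schedules files for removal from the repository at the next commit and deletes them from the working folder – they will still remain in earlier revisions (use forget to stop tracking a file while keeping it in the folder).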
- Commit: create a changeset based on the current state of the files.
- Log: view a list of committed revisions.
- Cat: shows any revision of any file.
- Update: move the working folder to a particular revision. This command actually modifies every file in the folder structure so it represents the state of these files at the time the selected revision was committed. Using update without any parameters moves to the latest revision (the one tagged as “tip”).
The Update command can be used to move between revisions. When you update, your working folder is updated to the revision selected. If you update without providing a revision, your working folder is updated to the latest revision.
As an example, let’s say you have a March 2019 Maintenance Release revision and you’re currently working on the August 2019 Maintenance Release revision. Let’s also say a user tells you a new bug was introduced sometime after the March 2019 Maintenance Release and he would like it fixed. You do some investigation and find out exactly where it is.
The issue is you don’t want to add the fix to the revision you’re currently working on because then the user will have to wait until the August 2019 Maintenance Release revision is released.
So you could update to the March 2019 Maintenance Release revision, make the change and then commit it as March 2019 Maintenance Release v1.1. Hg will now tell you there are two heads. One is the regular branch you’re currently working on (i.e. the August 2019 Maintenance Release revision) and the other is the March 2019 Maintenance Release v1.1 you just committed.
You can then update to the March 2019 Maintenance Release v1.1 so the correct files are in your working folder, perform a build from your working folder and then tell the user to use that build.
Afterwards you can update to the August 2019 Maintenance Release revision and continue working on that release.
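The “two heads” situation is easier to picture with a tiny model of the history. The sketch below is an illustration only: the changeset names are hypothetical, and a head is simply a changeset that has no children.

```python
def heads(parents):
    """Return the changesets that are not the parent of any other changeset."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


# Each changeset maps to the list of its parents (hypothetical names).
history = {
    "march-release": [],
    "august-work-1": ["march-release"],
    "august-work-2": ["august-work-1"],
}
print(heads(history))  # -> ['august-work-2']

# Update to the March release, fix the bug and commit on top of it.
history["march-release-v1.1"] = ["march-release"]
print(heads(history))  # -> ['august-work-2', 'march-release-v1.1']
```

Committing on top of an older revision does not disturb the August work. It simply starts a second line of development from the March release.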
One word of caution. Sometimes Hg requires you to merge the .hgtags file (a version-managed file that contains the Hg tags). If you ever have to resolve a conflict involving .hgtags, always choose to keep both versions.
Sharing Code Changes
One of the main benefits of having a version control system is to allow multiple developers to independently work on the same code at the same time. Developers can then merge their changes when complete and resolve any merge conflicts.
As you’ve seen, I recommend using shared repositories to coordinate changes. Recall, in Hg, there is nothing technically special about the “shared repositories” other than we’ve agreed they are “shared.” Unlike a CVCS, where there is a technical “central repository,” Hg doesn’t require one. Keep that in mind when we use the term: it is merely a convention.
Recall we have set up 2 shared repositories:
- QA
- PROD Master
We use these shared repositories as a place to publish our code to other developers. If you don’t already have a local, private repository (also known as a branch), you can easily make one using the clone command. This will make a complete copy of the, say, QA repository (or any other repository you choose), including all its history, and store it locally wherever you specify (usually your local computer).
Once you have a local repository, you can add, modify or delete files as required. When you feel you’ve made enough changes that constitute a package of work, you can commit that work (don’t forget to add a commit message following the convention described earlier. You may also want to add a tag if your changes are something relatively major).
As you develop new code, your local repository will change and you will most likely have multiple revisions (note that each time you commit, a new changeset is created and that’s what we are calling a revision. Also note your changes/revisions are only in your local repository, not in the shared repository from which you cloned.).
To move changes from your local repository to a shared repository, use the push command. This pushes new changes from your local repository to the shared repository (in actual fact you can push from your local repository to any other repository, assuming you have permission to do so, so if you are developing on a team and each member of your team has a local repository, you could share your changes with a specific team member by pushing your revisions to that person’s local repository. However to keep things simple, we won’t discuss peer-to-peer pushing here).
If you’ve just cloned a shared repository, you can push your changes without doing anything else. However you should get into the habit of always using the pull command to get any changes other developers may have pushed to the shared repository since you cloned it.
When you pull, changesets that are in the shared repository, but not in your local repository, are brought in so you can merge them in your local repository. The upshot is, after a successful merge, your local repository will be the same as, or a superset of, the shared repository from which you just pulled. At this point you can then safely push your changes to the shared repository.
To summarize, when you create new revisions by committing them in your local repository, they stay in your local repository. If you want to share with others, you need to first pull from the designated shared repository of interest, to get any new changes from other developers, and then push your changes to that repository. At that point, other developers can see your changes when they pull from the shared repository into their local repositories.
But what happens if someone changes the same file you do?
Well, in that case Hg will say you have a merge conflict when you merge the changes you pulled from the shared repository. Since you are the person adding your changes to the shared repository last, it is up to you to decide how to merge the file (in practice you would want to discuss this with whoever made the other changes, but Hg won’t allow you to commit the merge until you resolve the conflict).
You can use the outgoing command to see which changes in your local repository are waiting to be pushed. This is a list of the revisions that a push command would send to the shared repository, or thought of another way, it is a list of changes that are in the local repository but not in the shared repository.
If you neglect to merge, and simply push, and if another developer has pushed his changes to the central repository, your push can fail with a message saying the, “push creates new remote heads.” Hg will also suggest you can use push -f to force the push. DO NOT DO THIS!
Always pull and merge from the other repository first, resolve any merge conflicts and then push. Never. Ever. Use. push -f.
Hg knows when changes are made that create a merge conflict and it will not let you push conflicting changes unless you resolve the merge conflicts first (unless you use push -f, which, of course, you know never to use).
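Here is a rough Python model of why an unmerged push is refused. It is a simplification under my own assumptions (real Hg applies more detailed rules, for example around named branches): the push is rejected when it would leave the shared repository with more heads than it had before.

```python
def heads(parents):
    """Changesets that are not the parent of any other changeset."""
    has_child = {p for ps in parents.values() for p in ps}
    return {c for c in parents if c not in has_child}


def push_allowed(remote, outgoing):
    """Toy check: refuse a push that would leave the remote with more heads."""
    combined = {**remote, **outgoing}
    return len(heads(combined)) <= len(heads(remote))


# Another developer has already pushed "theirs" on top of "base".
shared = {"base": [], "theirs": ["base"]}

# You committed "mine" on top of "base" without pulling first.
unmerged = {"mine": ["base"]}

# You pulled, merged the two heads and committed the merge.
merged = {"mine": ["base"], "merge": ["mine", "theirs"]}

print(push_allowed(shared, unmerged))  # -> False (push creates a new remote head)
print(push_allowed(shared, merged))    # -> True
```

Forcing the first push through would leave the shared repository with two competing heads for everyone else to untangle, which is exactly why you pull and merge first.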
Before you actually pull (and merge) from the shared repository, you might want to see which revisions will be copied to your local repository without actually performing the copy. Hg has a command for that. It’s called incoming.
This is similar to the outgoing command we discussed earlier, but does the opposite. It lists all revisions that are in the shared repository but not in your local repository (these are the revisions that will end up in your local repository if you do a pull).
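Conceptually, outgoing and incoming are two set differences. The sketch below treats each repository as a plain set of changeset names (hypothetical ones), which ignores ordering and history but captures what the two commands list.

```python
def outgoing(local, shared):
    """Changesets in the local repository but not in the shared one."""
    return sorted(local - shared)


def incoming(local, shared):
    """Changesets in the shared repository but not in the local one."""
    return sorted(shared - local)


local = {"c1", "c2", "c3-mine"}
shared = {"c1", "c2", "c3-theirs", "c4-theirs"}

print(outgoing(local, shared))  # -> ['c3-mine']
print(incoming(local, shared))  # -> ['c3-theirs', 'c4-theirs']

# After a pull, everything from the shared repository is also local.
local |= shared
print(incoming(local, shared))  # -> []
print(outgoing(local, shared))  # -> ['c3-mine']
```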
As an example, if you changed line 42 in a file called marvin.magik and another developer changed the same line in the same file and he pushed that change to the shared repository, Hg will not let you push without first resolving the conflict.
To resolve the conflict you must first get the changed marvin.magik file from the shared repository, resolve the conflict (by choosing to keep your change, keep the shared repository’s change or keep both changes) and then tell Hg the conflict has been resolved.
To do this, issue the merge command. Most of the time Hg will be able to do the merge automatically (for example, if different parts of the file were changed by each developer) but in cases like our example, Hg won’t know what to do. So you will have to tell it.
Merging with Conflicts
One of the great benefits a DVCS has over a CVCS is it lets you merge much more easily. Merging in Hg is simple, easy and fast. Merging in Subversion is complex, difficult and can take hours, days or weeks.
However there are times when merging in Hg will cause conflicts.
As I’ve mentioned, when two developers make changes to the same file from a common parent changeset, Hg will now have two versions of the same file. One version will be the one with developer A’s changes and the other will be the one with developer B’s changes.
Hg refers to this as having two heads. When you issue the merge command, Hg takes the two heads and combines them into one if it knows how to do that (for example, if developer A and developer B changed different parts of the same file, Hg will be able to merge them automatically).
Hg will then put the results in the working directory but will not commit. This allows you to check the merge did what it was supposed to do in a correct manner. After you’ve verified the merge is correct, you can commit and push as desired.
If, on the other hand, both developers changed the same line of a file, then Hg won’t know which change to use. That’s where you come in. If a conflict is detected and Hg doesn’t know how to handle it, Hg hands the file to a merge conflict resolution tool. KDiff3 is a common choice, though which tool opens depends on how your installation is configured.
You’ll be shown four panes: the original file, the version developer A modified, the version developer B modified and the version that will resolve the conflicts. Your options are to keep developer A’s version, keep developer B’s version, keep both versions or manually intervene and do something else entirely.
Once the conflict has been resolved, don’t forget to commit. Hg never commits for you, so don’t forget to do it when you’re happy with the resolution.
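For reference, the command sequence for a merge that stops on a conflict looks roughly like the following. The Python helper only assembles the command lines so they can be printed or reviewed; the function name, file name and commit message are illustrative assumptions. If your merge tool resolves the file successfully, Hg may already have marked it as resolved, in which case the resolve --mark step is unnecessary.

```python
import shlex


def conflict_resolution_commands(conflicted_file, message):
    """Return the Hg commands for a merge that stops on a conflict."""
    return [
        ["hg", "merge"],                               # combine the two heads
        ["hg", "resolve", "--list"],                   # list resolved and unresolved files
        ["hg", "resolve", "--mark", conflicted_file],  # after fixing the file by hand
        ["hg", "commit", "-m", message],               # Hg never commits the merge for you
    ]


commands = conflict_resolution_commands(
    "marvin.magik", "v2019-12-25-2 Merge shared changes – MHING"
)
for argv in commands:
    print(shlex.join(argv))
```

To actually run the commands instead of printing them, each list could be passed to subprocess.run from inside the repository.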
Summarizing the Workflow
Once all merge conflicts have been resolved and you’ve committed the changes, you can push your changes to the shared repository and it will be updated to contain the same things as your local repository.
Of course, you have the other developer’s changes and the shared repository has his changes and your changes, but the other developer will not have your changes until he pulls from the shared repository. So let’s say he pulls in order to get your changes. He’s good to go, right?
Not so fast bucko! He may have all your changes in his repository, but he does not yet have them in his working folder – and it’s his working folder that stores the files which will be used to make builds from his source code.
To update his working folder, he needs to run the, appropriately named, update command. This will make all the files in his working folder exactly the same as when the last revision was created.
Hg will never overwrite your working folder until you tell it to by using the update command. This means you can pull from the shared repository as often as you want and still continue working on your unmodified files. When you’re ready to see the pulled in changes, simply update to the relevant revision and your working folder will be updated to reflect the changes at that revision. In essence, you have complete control over what changes you see and when you see them.
Note that Hg automatically tags the latest revision as, “tip.” You can’t rename “tip” to anything else. The nice thing about this is you always know the name of the latest revision. Further, if you run an update command and don’t specify a revision, you will be moved to tip.
To summarize the workflow, once you have a local repository:
- Pull from the shared repository to get the latest revisions other developers may have created.
- Update to ensure the changes are in your working folder.
- Modify/Create/Delete files to implement your functionality.
- Commit your changes at the appropriate times. Your revisions will only be committed in your local repository.
- Test your code.
- Repeat the previous three steps (modify, commit and test) until your code is stable.
- Pull again from the shared repository to get new changes (if any) that were pushed by other developers since your last pull.
- Merge and resolve conflicts (if any).
- Update to ensure changes are in your working folder.
- Test the merged changes to ensure everything is working.
- Commit the merged changes (again, this will only create a revision in your local repository).
- Push the changes to the shared repository (in other words, publish it so other developers can see it).
- Test the changes in the shared repository (this should include integration, regression and user testing).
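The steps above can be sketched as a sequence of Hg commands. The Python helper below only builds the list so it can be printed as a checklist; the function name, the "../QA" path and the commit messages are assumptions for illustration. Steps with no command are manual.

```python
import shlex


def workflow(shared_repo, work_message, merge_message):
    """Return (description, command) pairs; command is None for manual steps."""
    return [
        ("Get the latest revisions", ["hg", "pull", shared_repo]),
        ("Bring them into the working folder", ["hg", "update"]),
        ("Modify, create or delete files (new files also need hg add)", None),
        ("Commit locally", ["hg", "commit", "-m", work_message]),
        ("Test your code", None),
        ("Pull again before sharing", ["hg", "pull", shared_repo]),
        ("Merge, only needed if the pull added a new head", ["hg", "merge"]),
        ("Test the merged result", None),
        ("Commit the merge", ["hg", "commit", "-m", merge_message]),
        ("Publish to the shared repository", ["hg", "push", shared_repo]),
    ]


steps = workflow(
    "../QA",
    "v2019-12-25-1 December Maintenance Release – MHING",
    "v2019-12-25-2 Merge from QA – MHING",
)
for description, command in steps:
    action = shlex.join(command) if command else "(manual step)"
    print(f"{description}: {action}")
```

Notice that push never appears before the second pull and merge, which is the whole point of the workflow.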
Of course there are many other things you can do with a DVCS, but I’ve covered the basics to get you going. I strongly recommend reading through your chosen DVCS’s documentation and searching online for a few tutorials that can get you up to speed quickly and with relative ease.
Once you’re properly set up, you won’t have to worry about making unrecoverable code changes or having other developers overwrite your changes again.