Git trees

And now for something a little different: let’s talk about version control.

Let’s say you have a file—an essay, maybe, or a spreadsheet—that you edit a lot. You want to make sure that you always save your changes to the disk, so that when you update something the file reflects that change. Pretty obvious stuff.

Now let’s say that at some point you accidentally delete a page of the essay (or a sheet from the spreadsheet, and so on). You can click “Undo” furiously and hope that fixes it, but Undo doesn’t always work. It relies on a stack of change states (that is, a list of changes that can be reverted), and sometimes that stack is cleared, or isn’t updated when you think it should be.

So instead, rather than saving a temporary Undo buffer and hoping that any bad changes you make will be cached there for reversal, what if you stored a bunch of versions of the file?

That is, rather than having your editor keep track of a couple of changes, you now store a separate copy of the file every time you make a big change. This way, if you want to revert to a previous version, you can do so easily by copying the version you want over the current one.
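Here's a rough sketch of that "keep a copy of every version" idea in Python. The names `save_version` and `restore_version` and the `essay.v1.txt` naming scheme are made up for illustration; no real tool necessarily works this way.

```python
import shutil
from pathlib import Path


def save_version(path: Path, versions_dir: Path) -> Path:
    """Copy the current file into versions_dir under the next free number."""
    versions_dir.mkdir(parents=True, exist_ok=True)
    # Count the copies already stored for this file (hypothetical naming scheme).
    existing = list(versions_dir.glob(f"{path.stem}.v*{path.suffix}"))
    target = versions_dir / f"{path.stem}.v{len(existing) + 1}{path.suffix}"
    shutil.copy2(path, target)
    return target


def restore_version(path: Path, version: Path) -> None:
    """Overwrite the working file with a previously saved copy."""
    shutil.copy2(version, path)
```

You'd call `save_version` after each big change and `restore_version` when something goes wrong. Note that every saved copy is a full copy of the file.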

This has more applications than just reverting drastic changes. It can also be used as a way of keeping track of “important” versions of the file—that is, versions you might want to revert to at some point in the future. Maybe after finishing every paragraph, you store that version of the file, so that if you realize that you got completely off-track four paragraphs ago, you can just revert to the version before you wrote those four paragraphs.

Some of you might be thinking that this is incredibly bad storage-wise. You’re storing a bunch of files that are all roughly the same size, and if you have k of them, only about 1/k of that space goes to the final product, which isn’t super efficient.

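So instead, what if you stored not the entire file, but the difference between one version and the next? That uses far less space and preserves the same information. Version control systems (the programs that automate everything I’ve just been talking about) lean on this idea in one form or another. Git, for example, conceptually records a full snapshot for each version and then compresses similar snapshots into deltas behind the scenes.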
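To make the "store the difference" idea concrete, here's a minimal sketch that uses Python's standard `difflib` module. The delta format (a list of "copy" and "add" instructions) and the function names are my own invention for illustration, and this is not how Git or any particular tool encodes its deltas.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` (a list of lines) as instructions relative to `old`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines: remember only where they sit in the old version.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed lines: these are the only lines stored in full.
            delta.append(("add", new[j1:j2]))
        # "delete" needs no entry: those lines are simply never copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version plus a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result
```

If only one line of a long essay changed, the delta holds that one line plus a couple of "copy" ranges, rather than the whole essay again.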

The big advantage comes when multiple people edit the same file. Google Drive handles this pretty easily for documents and spreadsheets, but for files that its editors don’t support (that is, pretty much everything else), version control fills the gap. You store a single “master” version of the file; people download it to their local machines, edit it as necessary, and then re-upload it. If two people try to re-upload the same file at the same time, that can get sticky, but you can pick which version you like better (or, with most tools, merge the two).
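The "sticky" case can be sketched in a few lines. This is a deliberately simplified, whole-file check that assumes both people started from the same base version; the function name `merge_whole_file` is hypothetical, and real tools compare files line by line rather than all at once.

```python
def merge_whole_file(base: str, mine: str, theirs: str) -> str:
    """Pick the result of two edits that both started from `base`."""
    if mine == theirs:
        return mine      # both people made the same change (or none at all)
    if mine == base:
        return theirs    # only the other person changed anything
    if theirs == base:
        return mine      # only I changed anything
    # Both changed the file in different ways: a human has to decide.
    raise ValueError("conflict: both versions changed the file differently")
```

The last branch is the case where somebody has to pick which version they like better.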

With code, this is tremendously useful. Most VCSs (version control systems) let you tag versions with messages: short explanations of what changed since the previous version. It’s another way of commenting your code, and is arguably more useful, since you can see exactly who introduced a new function, where it went, and when it was introduced. If you get a bunch of bugs after updating your server, you can trace them back to the changes that went in around the time the first bug was reported.
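As a toy illustration of why those messages and authors are handy, here's a sketch of a history you can search. The `Commit` class and `first_commit_containing` are hypothetical names for this example, not part of any real VCS, and the history is assumed to be ordered oldest first.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    author: str
    message: str
    content: str  # full text of the file at this version


def first_commit_containing(history, needle):
    """Return the earliest commit whose content contains `needle`, or None."""
    for commit in history:  # assumes history is ordered oldest first
        if needle in commit.content:
            return commit
    return None
```

Given a list of `Commit` objects, `first_commit_containing(history, "def parse")` would hand back the commit (and so the author and message) where that function first showed up.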

At hackathons, when it’s 5 AM and you’re operating mostly on Red Bull and cold pizza, you might accidentally introduce a bug into your code that you don’t have the brainpower to find at the time. When everything crashes and your team is frantically searching for a fix, rather than trying to deconstruct what happened, you can just revert to an earlier version and take a second stab at writing the destructive function (hopefully with more sleep the second time around).

There are a bunch of version control systems (Git, Mercurial, Subversion, and so on), but Git is probably the most frequently used, although it varies company-to-company and even project-to-project. It’s one of those things you absolutely should know before starting a job: regardless of your technical abilities, managers always like to see knowledge of workplace tools, since it makes onboarding that much easier.

Anyway, to finish up, here are some great Git tutorials from around the web. Hopefully one of them proves useful—most of them explain everything in a slightly more technical way than I did above, but they all get the basic point across.

The views, opinions and positions expressed by the authors and those providing comments on these blogs are theirs alone, and do not necessarily reflect the views, opinions or positions of UT Computer Science, The University of Texas or any employee thereof.