I spend probably a third of my maker time writing code (another third doing root cause analysis on bugs and another third reviewing code, more or less). In my coding time, I do almost all of my work in Vim and Bash, mostly Vim. You IDE buffs out there can scoff if you will. Vim is a tool that has proven its worth over two decades. Two decades. That’s like two millennia in computer years. I’m an admittedly bad Vim user, but I’m making strides to recover. One stride I’ve just taken is using Vim macros to automate stuff, like converting the format of text files.
If you’ve never used macros in any context, you may not have a sense of how powerful a tool they really are. Macros give you the ability to take a series of actions that achieve some goal and put them into a recording. You can then replay those actions again and again. This means that complex transformations on some text only have to be done correctly one time. The instructions can be captured and reused. We humans are pretty good at getting something hard done correctly once. Our error rate goes up the more times we do the thing. So, macros are helpful because when you get the steps right once, you get to use your own actions again and again. The replay is exact. The computer doesn’t accidentally skip step three like you or I might the fifteenth time we walk through the process.
The other benefit is that the steps get executed at human speed (i.e., slow and clumsy speed) only the first time. Each time the actions are replayed, all the time you spent head scratching and double checking that you’re doing the right thing disappears and the actions get executed as fast as the computer can do them. Vim has macros built in, as do a lot of applications. Oh how I long for a real-life macrobot. Imagine being able to program a robot, one time, to fold laundry. I would never match a pair of socks again. Baby steps.
The background for all this macro talk is a task I took on recently. I needed to get the data out of a bunch of hand-rolled, inconsistently formatted PDFs and into some reasonable format. The goal was to have a structured data representation of the data that could be easily maintained over time. I guess it’s not strictly insane to try to keep a bunch of willy-nilly PDF files full of timetables up to date, but it’s far from efficient. To achieve this textual cleaning of the stables, I wanted to use Vim Macros because I knew they were a good tool for the job and I didn’t already know how to use them. Before this little jaunt, my exposure to Vim’s macros was pretty much just accidentally starting the recording process and trying every combination of colon, escape, and q to get out of it. Like I said, I aspire to be a competent Vim user. Baby steps.
Anyway, my tactic was to copy and paste the tables from the PDF files into a Vim document and determine an automated way to turn the data into a CSV format. Okay, actually I’m lying. I started by pasting that data into TextMate and trying to use find and replace hacks in addition to my amazing repetitive typing abilities to get the data into a CSV format. As I started to get into the rhythm of it, I could feel my inner smarty pants engineer sighing at me. Yes, I know there is no good reason to be using my pitiful human brain for simple, repeated tasks, but I don’t already know how to use the tools that would do it for me. Yes, I should use the generic tools available for automating the process so I can get it right and do it fast, but I just want to get it done. This is a classic dilemma. Option one: Do it the slow, hard way that is guaranteed to get finished in a fixed amount of time but may introduce errors at worst and at best is a pain in the ass. Option two: Maybe do it the super fast way, assuming you can actually figure out how to use the scary new tools properly. XKCD 974 comes to mind.
After a few moments of complete philosophical system lock, I decided it was time to brain-up and make the computer do my bidding. Since I knew I needed to learn macros, I decided that would be the weapon with which I would dispatch these gobs of poorly structured nonsense.
Copying and pasting from the PDF, I get pretty garbled data, like this:
9:00 AM Location 1 6:30 PM Location 5
9:05 AM Location 2 6:35 PM Location 4
9:20 AM Location 3 7:45 PM Location 6
10:25 AM Location 4 7:55 PM Location 2
10:30 AM Location 5 8:00 PM Location 1
9:30 AM Location 1 7:30 PM Location 5
9:35 AM Location 2 7:35 PM Location 4
9:50 AM Location 3 8:45 PM Location 6
10:55 AM Location 4 8:55 PM Location 2
11:00 AM Location 5 9:00 PM Location 1
10:15 AM Location 1 8:10 PM Location 5
10:20 AM Location 2 8:15 PM Location 4
10:35 AM Location 3 9:25 PM Location 6
11:35 AM Location 4 9:35 PM Location 2
11:40 AM Location 5 9:40 PM Location 1
...and so on for about a hundred lines.
Though it’s bad, it’s a bearable starting place because it’s consistent in at least one way. For every time, the location that follows it is correct. The ideal CSV output would have the locations as the first row and the times for each location in the correct CSV column on the following rows. Of course there are some hurdles here. In the PDF paste, there are two columns because the PDF was in a two-column layout and whoever created it decided to just stretch one table across the page rather than create two separate tables. So, step one is to get the columns cleaned up. Easy enough, right? After all, I know VIM MACROS!!!
It’s macro time.
I actually didn’t know much about macros in Vim, but I’ve used them in other contexts. The basic idea is that a macro is a recording of some actions that can be replayed. There are some basic rules one has to know to get full value out of macros. As far as I can tell, there are two:
Remember the computer has no intuition so it must be told exactly what to do
Maybe this is obvious, but it’s worth reiterating that when you, as a human, say “select this text and do this with it” in your head, you’re glossing over tons and tons of steps that a computer will have to be told. To write a macro, you have to break down complex actions into their sub-actions until you arrive at steps simple enough for the computer to carry out. Baby steps.
Use the most general commands possible to get the effect you want
Once you have simple steps the computer can handle, you have to figure out how to make those actions work on all the text you want to modify. For instance, I could read the first line of my data and say, “There are 19 characters in the first column. Computer, go forward 19 characters to the start of the second column.” The problem is that this is a specific rather than a general command. It’s specific to the first line only. If you skip ahead to the fourth line, you can see it wouldn’t work correctly. Instead, I have to come up with a general way of getting to the position I want. This will require finding something that is true of every line of the data and using that consistency to my advantage. In this dataset, I know that every line has exactly two colons in it. I can also see that the second column always starts at the first space before the second colon. I can use this consistency to write general commands that always put me in the column space. As you can probably tell, this part of macros requires some cleverness.
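For readers who think better in code, here is a minimal Python sketch of that "general command" idea. It is not part of the Vim workflow; the function name `split_at_second_column` is made up for illustration, and it assumes every row has at least two colons, as this dataset does.

```python
def split_at_second_column(line):
    """Split a pasted row at the first space before its second colon."""
    first_colon = line.index(":")                    # like pressing f:
    second_colon = line.index(":", first_colon + 1)  # like pressing f: again
    gap = line.rindex(" ", 0, second_colon)          # nearest space to the left
    return line[:gap], line[gap + 1:]


rows = [
    "9:00 AM Location 1 6:30 PM Location 5",
    "10:25 AM Location 4 7:55 PM Location 2",
]
for row in rows:
    print(split_at_second_column(row))
```

Because the split point is found relative to the colons rather than counted from the left edge, the same function handles both the short "9:00" rows and the longer "10:25" rows.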
Keeping these two rules in mind, here’s exactly how I think my macro should work:
- Move to the beginning of the line
- Set a marker at the current line
- Jump to the second column
- Delete from the start of the second column to the end of the line
- Jump to the bottom of the file
- Create a new line and paste the column data
- Jump back up to my marker
- Go down to the next line
These are pretty simple steps, but the real test will be converting them to Vim commands. As Vim commands, this is all pretty straightforward except for the “jump to the second column” part, but I’ll explain it in detail below.
Here’s what the guts of my macro are going to look like, starting from Normal mode (i.e., not Insert or Ex mode) on the first line of the data.
- 0 Jump to the beginning of the line
- mm Create a marker labelled “m”
- f:f: Jump forward to the second colon; my cursor is now in the middle of the time string in the second column
- bb Go back one word to the start of the time string, then back one more word; this puts my cursor on the number character at the end of the first column
- f<space> Forward to the space; my cursor is now right where I want it, in between the two columns
- d$ Delete from the cursor position to the end of the line; the text is in-memory and I can paste it somewhere else
- G Jump to the last line of the file
- o Create a new empty line, which puts me in Insert mode
- <escape> Go back to Normal mode
- p Paste the data into the line
- 0x Jump to the beginning of the line and delete the space that I don’t need
- 'm Jump back to the line I started the process with
- j Go down to the next line to start the process over
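As a cross-check on the list above, here is a rough Python sketch of what one pass of those keystrokes does to the file, with the file modelled as a list of strings. The name `run_macro_once` is hypothetical, and the mapping from keystrokes to Python calls is an approximation that only holds for data shaped like this sample.

```python
def run_macro_once(buffer, row):
    """Apply one pass of the recorded steps to buffer[row]; return the next row."""
    line = buffer[row]                    # 0 and mm: start of the line, remember the row
    first = line.index(":")               # f:
    second = line.index(":", first + 1)   # f: again; ValueError if there is no second colon
    gap = line.rindex(" ", 0, second)     # bb then f<space>: the space between the columns
    buffer[row] = line[:gap]              # d$: cut from the gap to the end of the line
    cut = line[gap:]                      # the cut text still has its leading space
    buffer.append(cut[1:])                # G, o, <escape>, p, then 0x drops that space
    return row + 1                        # 'm then j: back to the mark, down one line


buffer = [
    "9:00 AM Location 1 6:30 PM Location 5",
    "9:05 AM Location 2 6:35 PM Location 4",
]
next_row = run_macro_once(buffer, 0)
print(next_row)
print(buffer)
```

After one pass the first row keeps only its left-hand column, the right-hand column sits on a new last line, and the returned row index points at the next line to process.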
Cool, I have steps that can be repeated line after line. Now it’s time to hit the docs. According to the Macros page on the Vim Tips Wiki, to record a macro, hit “q” and then a label key, just like with markers. So, I choose “s” as my label. It’s a good, memorable letter. I now type “qs”, which gives me the familiar “recording” text at the bottom left of my editor. Only this time, I actually intend for it to show up. Huzzah. I’m a pro.
If I’ve designed my macro correctly, I should be able to follow it exactly, then hit the “q” key to stop recording. Of course, as a frail human, I botched the steps up the first time and had to start over. Sad panda. Life goes on. On the second try, I got the commands right. Hitting the “q” key finishes the macro.
It’s the moment of truth. I’m on the second line of my data set. The first line has already been converted. According to the macro documentation, if I want to replay my macro, I hit “@” and the macro label. Drumroll. I hit “@s” and instantly my cursor is on the third line and the second line has been converted in exactly the way I want. At the bottom of the file, the second column of the second line has been appended. It’s magic!
I know I started the process willing to do all that work myself, but now I’m spoiled. The computer is my bitch and I want it to get me some cookies and milk while I lie on the couch basking in my genius. Translation: I realize that I’m still going to have to type “@s” a bunch of times and I’m sad. There has to be a better way! And, reading the documentation further, it turns out that you can replay a macro as many times in a row as you want by typing the number of repetitions, then “@”, then the label key. So, doing some quick math, I see I have 116 more lines to convert. I type “116@s” and BAM! The file is now one column of data. Macros are awesome.
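The counted replay can be sketched the same way in Python. This is only an analogy: `move_second_column` and `replay` are made-up names, and the early exit is an assumption modelled on Vim abandoning a running macro when a motion such as f: finds nothing.

```python
def move_second_column(buffer, row):
    """One pass: cut the second column of buffer[row] and append it at the bottom."""
    line = buffer[row]
    second = line.index(":", line.index(":") + 1)  # ValueError without two colons
    gap = line.rindex(" ", 0, second)
    buffer[row] = line[:gap]
    buffer.append(line[gap + 1:])
    return row + 1


def replay(buffer, row, count):
    """Rough analogue of typing <count>@s with the cursor on `row`."""
    for _ in range(count):
        try:
            row = move_second_column(buffer, row)
        except (ValueError, IndexError):
            break  # assumption: stands in for Vim aborting the macro on a failed motion
    return row


buffer = [
    "9:00 AM Location 1 6:30 PM Location 5",
    "9:05 AM Location 2 6:35 PM Location 4",
    "9:20 AM Location 3 7:45 PM Location 6",
]
last_row = replay(buffer, 0, 10)  # deliberately ask for more passes than needed
print(last_row)
print("\n".join(buffer))
```

In this sketch an overly generous count is harmless: once the replay reaches the already-moved lines, which have only one colon, the step fails and the loop stops.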
When you think about it, if you’re doing work in Vim already, macros are pretty much just a recording of the steps you would take anyway. From now on, I’m going to think about every editing procedure I do in terms of simple, sequential actions and general commands.
The time it took me to devise the macro plus the time it took me to get it recorded was about 15 minutes, including the time it took to read the docs. Sure, that’s longer than it would’ve taken to do the conversion by hand. But, I got smarter in that time. Manual work would’ve made me more inclined to take the route of drudgery, which I’ll submit is the equivalent of getting dumber. Fifteen minutes spent learning and applying a new skill is definitely a more leveraged use of the time than reapplying a well-worn skill. If you factor in the damage done by the error I probably would’ve made doing it all manually as well as the time it would take to find and correct the error, learning to use automation is easily the cheaper way to go. The next time around it’ll be an even better option. Speaking of the next time around, now I need to create a macro to convert this one-column state I’m in into a multicolumn file using CSV format. This is going to be fun.
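To make that target shape concrete, here is a small Python sketch of one way the one-column data could map onto CSV. It is not the macro, and several things are guesses: the function name `to_csv`, the idea that every five lines form one trip (taken from the sample rows), and the choice to leave a cell blank when a trip skips a location.

```python
import csv
import io


def to_csv(lines, trip_length):
    """Group one-column 'time location' lines into trips and return CSV text."""
    stops = []
    for line in lines:
        # Assumption: every line looks like "9:00 AM Location 1".
        clock, meridiem, location = line.split(" ", 2)
        stops.append((clock + " " + meridiem, location))

    # Header row: locations in order of first appearance.
    header = list(dict.fromkeys(location for _, location in stops))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    for start in range(0, len(stops), trip_length):
        trip = stops[start:start + trip_length]
        # One row per trip; locations the trip never visits stay blank.
        writer.writerow({location: time for time, location in trip})
    return out.getvalue()


one_column = [
    "9:00 AM Location 1", "9:05 AM Location 2", "9:20 AM Location 3",
    "10:25 AM Location 4", "10:30 AM Location 5",
    "6:30 PM Location 5", "6:35 PM Location 4", "7:45 PM Location 6",
    "7:55 PM Location 2", "8:00 PM Location 1",
]
print(to_csv(one_column, trip_length=5))
```

With the ten sample lines this yields a header of six locations and two rows, one per trip, which matches the "locations as the first row, times underneath" layout described earlier.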