Undo that mess

During the last marking season (at the end of the semester), I had, of course, a lot of assignments to grade. For some reason, every semester, a good number of my students write code like they just don’t care. I get code that looks like this:

int fonction              (int random_spacing)^M{           ^M
  int            niaiseuses;

  for (int i=0;i<random_spacing;         i++){
                    {
       {
        std::cout
         << bleh
         << std::endl;
    }}

  }
}

There’s a bit of everything: random spacing, traces of conversion from one OS to another, braces at the end of lines. Of course, they lose points, but that doesn’t make the code any easier to read. In a previous installment, I proposed something that only rebuilt the whitespace. Now, let’s see how we can repair as many defects as possible with an Emacs function.

Let’s start at the beginning, with a list of the things to repair:

  • OS-related line endings. Linux and other *nixes end lines with \n, Windows with \r\n. Other platforms may use something else. Let’s not concern ourselves with the ZX80.
  • Replace long runs of whitespace with a single space.
  • Deal with braces at the end of lines.
  • Reindent everything else using the defined style.

The first two items can be combined. Since transforming \r\n into \n only requires removing the \r, we can replace runs of whitespace and stray \r in a single pass. I’m not a regex ninja; I came up with this:

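;; replace runs of whitespace and stray ^M (\r) by a single space
(while (re-search-forward "[[:space:]\r]+" nil t)
  (replace-match " " nil nil))

Keep the character class minimal: inside brackets, characters such as | and ? are literals, so adding them would also swallow operators like || and ?:. Also note that [[:space:]] matches the characters that have whitespace syntax in the current major mode. In C and C++ modes, newlines and \r are given comment-end syntax instead (for // comments), which is why \r has to be listed explicitly, and why the lines aren’t all joined into one.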

Trailing braces are trickier. They may or may not be preceded and followed by spaces, so this time the regex is a bit more involved:

;; move a fiendish { at the end of a (non-empty) line to its own line
(while (re-search-forward
        "\\([^[:space:]{\n]+\\)\\([[:space:]]*\\)\\({\\)\\([[:space:]]*$\\)" nil t)
  (replace-match "\\1\n{" nil nil))

It matches three parts: something that is not whitespace, followed by something that is whitespace, the brace {, then whitespace up to the end of the line. OK, that makes four. The only one we keep is the first (the \\1 in the replacement). Everything else, mostly whitespace, is replaced by a newline followed by {; the line’s own newline, which $ does not consume, stays in place after it.

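At this point, the buffer is still in a rather messy state, possibly with trailing whitespace and destroyed indentation. Calls to whitespace-cleanup and indent-region finish the job: the first removes trailing blanks (among other things, depending on whitespace-style), and the second reindents every line according to the current major mode and indentation style.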

Putting all that together:

(defun cleanup-whole-buffer ()
  "Remove ^M, collapse whitespace, split trailing braces, and reindent the buffer."
  (interactive)
  (save-excursion
    (undo-boundary)

    (goto-char (point-min))
    ;; replace runs of whitespace and stray ^M (\r) by a single space
    (while (re-search-forward "[[:space:]\r]+" nil t)
      (replace-match " " nil nil))

    (goto-char (point-min))
    ;; move a fiendish { at the end of a (non-empty) line to its own line
    (while (re-search-forward
            "\\([^[:space:]{\n]+\\)\\([[:space:]]*\\)\\({\\)\\([[:space:]]*$\\)" nil t)
      (replace-match "\\1\n{" nil nil))

    (whitespace-cleanup)
    (indent-region (point-min) (point-max) nil)))

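A few explanations on the parts we haven’t discussed yet. The save-excursion primitive saves the cursor position (and the current buffer), so that when the function ends, we are still where we called it from. The undo-boundary call makes sure the cleanup starts a fresh undo group; since Emacs groups the changes made by a single command, one undo reverts the whole cleanup. Each regex pass starts from the top of the buffer because re-search-forward only searches forward from point; in Lisp code, (goto-char (point-min)) is the preferred way to get there, as beginning-of-buffer is meant for interactive use and also sets the mark.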

Applying it to the above code snippet, we end up with:

int fonction (int random_spacing)
{
  int niaiseuses;

  for (int i=0;i<random_spacing; i++)
   {
    {
     {
      std::cout
       << bleh
       << std::endl;
     }}

   }
}

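There are still a number of issues. For example, i++ still has an extraneous space before it, and we still have two closing braces on the same line. The whitespace pass also has no notion of string literals or comments, so runs of spaces inside them get collapsed as well. Maybe we should fix that sometime.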

6 Responses to Undo that mess

  1. A whole lot simpler solution would be to run something like astyle (http://astyle.sourceforge.net/) on the file. It does a lot more than what you can do with a few regexes.

    Another option is clang-format, but I haven’t used that one.

    That is what I have used when I had to check/grade assignments.

  2. ivankanis2 says:

    Do you know M-x delete-trailing-whitespace?

  3. tjwhaynes says:

    A second vote for astyle. There are many automated code formatters out there – I also like uncrustify.

  4. […] at the Harder, Better, Faster, Stronger site, Steven Pigeon wrote this post about cleaning up code […]
