TIL: The Importance of RegEx

It’s no secret: I’m terrible with regex. It’s one of those things where I always say “Eh, I’ll just google it when I need it” and brush it off every time it comes up. I found out the hard way today that that’s a pretty poor excuse not to put some effort into learning it.

As I mentioned a few days ago, I’ve started working on a set of sample problems to try to improve my coding skills (I’ll cover that site in a later blog). Basically, the site gives you a bunch of different problems to solve however you want. The only criterion is that their unit tests pass against your code.

So yesterday I got to a problem that asked you to deal with letters that appear in sequence. Basically, if you see AAAAAABCCDDDE, you need to translate that to 6AB2C3DE (this is run-length encoding). And if given the shortened version, you need to convert it back to the longer version.

Without thinking, I dove headfirst into writing this ugly, complex, brute-force, nested if/else monster of garbage. I figured that was the best way to accomplish it: pretty straightforward. I’d completely forgotten that regular expressions were even a tool I could use…

I was able to force my way through the initial encode part of the problem, but I was completely stumped once I got to the decode. “How can I split when I’ve got these numbers here?” I thought to myself. That’s when it hit me that I’d made a huge mistake.

Let’s take a look at just how ugly the initial code I wrote was for the encode block:

const encode = (toEncode) => {
   if (!toEncode) {
       return toEncode;
   } else {
       let final = '';
       let last = '';
       let count = 0;
       let letters = toEncode.split('');
       // console.log(letters);
       for (let i = 0; i < letters.length; i++) {
           // console.log(letters[i]);
           if (i === 0) {
               last = letters[i];
               count += 1;
           }  else if (i === letters.length - 1) {
               if (letters[i] === last) {
                   count += 1;
                   if (count > 1) {
                       final += `${count}${last}`;
                   } else {
                       final += `${last}${letters[i]}`;
                   }
               } else {
                   if (count > 1) {
                       final += `${count}${last}${letters[i]}`;
                   } else {
                       final += `${last}${letters[i]}`;
                   }
               }
           } else if (letters[i] === last) {
               count += 1;
           } else {
               if (count > 1) {
                   final += `${count}${last}`;
               } else {
                   final += `${last}`;
               }
               count = 1;
               last = letters[i];
           }
       }
       console.log(final);
       return final;
   }
};

Oh my god, it’s hideous. Functional, but absolutely hideous.

After spending about an hour and a half watching videos and reading through some Stack Overflow posts, I had a much better idea of how I could handle the problem at hand.

The code refactoring was huge:

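const encode = (toEncode) => {
   if (!toEncode) {
       return toEncode;
   }
   let encoded = '';
   // Each match is one run of a repeated character, e.g. 'AAA' or 'B'.
   // match() returns null when nothing matches, so fall back to [].
   const runs = toEncode.match(/(.)\1*/g) || [];
   for (const run of runs) {
       // Only prefix a count when the character actually repeats.
       if (run.length > 1) {
           encoded += run.length;
       }
       encoded += run[0];
   }
   return encoded;
};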
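
To see why that one pattern does so much of the work, here is a small breakdown of what it matches. This is just a sketch for illustration, and `splitRuns` is a made-up helper name, not something from the problem site.

```javascript
// (.)  captures any single character (other than a line break)
// \1*  matches zero or more repeats of that same captured character
// g    keeps going and finds every run, not just the first one
const splitRuns = (text) => text.match(/(.)\1*/g) || [];

console.log(splitRuns('AAAAAABCCDDDE'));
// [ 'AAAAAA', 'B', 'CC', 'DDD', 'E' ]
```

Each element is one run, so the length of the element is the count and its first character is the letter.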

Roughly 45 lines down to about 15, and completely readable now. No mystery as to what is going on with the code.
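
The decode direction can be sketched with regex too. The version below is one possible approach, not necessarily what the site's tests expect. It assumes the original text never contains digits (otherwise the encoding would be ambiguous), and the name `decode` is just a guess at what the problem wants.

```javascript
// Hypothetical decode sketch: expand every "count + character" pair.
// (\d+) captures the run length, (\D) captures the non-digit character
// that follows it. Characters with no count in front are left untouched.
const decode = (toDecode) => {
    if (!toDecode) {
        return toDecode;
    }
    return toDecode.replace(/(\d+)(\D)/g, (match, count, letter) =>
        letter.repeat(Number(count))
    );
};

console.log(decode('6AB2C3DE')); // AAAAAABCCDDDE
```

Letting `replace` take a callback means there is no need to split the string around the numbers at all.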

This was a pretty good wake-up call for me, as I wasted quite a bit of time writing that terrible chunk of encoding code. I’m going to spend a bit more time getting at least some foundation set with regex before I start running away from it again.

Expect tomorrow’s blog to be more focused on the patterns I pick up between now and then.

💚