A regex I submitted to Reddit recently climbed to the top of /r/programming and made quite a few heads explode in the process. As delightful as this was, I couldn’t help but feel a little guilty for subjecting tens of thousands of people to this disgraceful pile of electronic fecal matter. Absolutely zero effort was put into making it something that even remotely resembled a useful, constructive demonstration. Instead, I lured you all into my own private lemon party and left you shocked, bewildered, and fearing for your lives. And for this I apologize.

- First, compare the excess part of A or B with the excess part of C. They will either be equal, or C’s will be greater by 1.

- Next, iterate through digits in A and match corresponding digits in B with their sums in C. Again, there may be differences of 1 depending upon the rest of the digits in A and B.

- These potential differences of 1 (“carrying”) are determined by moving through pairs of digits that sum to 9 until a pair is found whose sum exceeds 9.
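The three steps above can be sketched outside of regex in plain Python. This is only an illustration of the idea, not a translation of the regex itself: the function names `carries`, `plus_one`, and `verify_sum` are made up for this sketch, and the inputs are assumed to be strings of decimal digits with no sign or separators.

```python
def carries(a: str, b: str) -> bool:
    """True if adding equal-length digit strings a and b carries out of
    their leftmost position."""
    for x, y in zip(a, b):
        pair_sum = int(x) + int(y)
        if pair_sum > 9:
            return True   # found a pair whose sum exceeds 9
        if pair_sum < 9:
            return False  # a carry from the right cannot get past this pair
        # pair_sum == 9: undecided, keep moving right
    return False


def plus_one(digits: str) -> str:
    """Add 1 to a digit string by turning trailing '9's into '0's."""
    head = digits.rstrip("9")
    zeros = "0" * (len(digits) - len(head))
    if not head:  # empty or all '9's: the result is one digit longer
        return "1" + zeros
    return head[:-1] + str(int(head[-1]) + 1) + zeros


def verify_sum(a: str, b: str, c: str) -> bool:
    """Check A + B = C digit by digit, without adding the whole numbers."""
    a, b, c = (s.lstrip("0") for s in (a, b, c))
    n = min(len(a), len(b))  # number of right-aligned digit pairs
    if len(c) < n:
        return False
    # At most one of these two slices is non-empty: the "excess" digits.
    excess = a[:len(a) - n] + b[:len(b) - n]
    a_tail, b_tail = a[len(a) - n:], b[len(b) - n:]
    c_excess, c_tail = c[:len(c) - n], c[len(c) - n:]

    # Step 1: C's excess equals the excess of A/B, or is greater by 1.
    expected = plus_one(excess) if carries(a_tail, b_tail) else excess
    if c_excess != expected:
        return False

    # Steps 2 and 3: each digit of C is the pair sum, plus 1 if the
    # digits to its right produce a carry.
    for i in range(n):
        carry_in = carries(a_tail[i + 1:], b_tail[i + 1:])
        digit = (int(a_tail[i]) + int(b_tail[i]) + carry_in) % 10
        if c_tail[i] != str(digit):
            return False
    return True
```

For instance, `verify_sum("12345", "768", "13113")` takes the carrying path: the aligned digits are `345` and `768`, the excess is `12`, and C's excess must therefore be `13`. The carry check is recomputed for every position to mirror the lookahead-heavy style of the regex; a practical implementation would simply add from right to left.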

************************# I wrapped the entire expression in (?! (?!)) just to do away with all # captured substrings and cleanly match a verified line. (?! (?! ^ 0 * # Here we essentially right-align A, B, and C, ignoring leading zeros, # and populate backreferences with components that will be useful later. # # 1 2 3 4/5/5 (?=( d *?) ((?: (?= d 0 * ( d *?) ( d (? (4) 4)) 0 * ( d *?) ( d (? (6) 6)) $) d) ) ) # # Taking ” 768 41300 “as an example: # # 1=”21, ie. the extra digits in A if A is longer than B. Empty otherwise. # 2=” “, ie. the rest of the digits in A that match up with those in B and C. # 3=””, ie. the extra digits in B if B is longer than A. Empty otherwise. # 4=”768 “, ie. the rest of the digits in B that match up with those in A. # 5=”21 “, ie. the extra digits in C that match up with the longer of A and B. # 6=”37 “, ie. the rest of the digits in C. # This next part checks the extra digit portions to make sure everything is in order. # # There are two main paths to take: # Easy: Adding 2 to 4 results in no “carrying”; the length stays the same. # 5 should then exactly equal either 1 or 3, whichever was longer. # An example of this is when matching “12345 768 “, since 456 =579. # Then 5= 1=”5”. # OR # Hard: Adding 2 to 4 results in “carrying”; the length increases by 1. In this case, # 5 should equal 1 more than either 1 or 3 (which is non-empty). # This is the case we need to handle for our example of ” (****************************************************** 1000001. # Here, 5=”21 “and 1=” “, and so we need to verify 5= 1 1. (?=(? (?! # First thing to check is whether 2 4 results in carrying. # To do this, we must inspect 2 and 4 from the left and match # optional pairs of digits that sum to 9 until we find a pair that # sum to>9. # # In our example, “456 “and” “, we find that ‘3’ and ‘6’ sum to 9, # then ‘4’ and ‘7’ sum to>9. Therefore we have carrying. # Consume the extra digits in A; they’re not important here. 
1 # Move through all pairs of digits that sum to 9. (?: # Collect the next digit of interest in B. (?= d 0 * 3 (( g {-2}? ) d)) # This lookahead is used to set up a backreference that goes from one digit # of interest to the next, in the interests of simplifying the long check ahead. (?= d ( d * 0 * 3 g {-2})) # Now to use that backreference to match pairs of digits that sum to 9. (?=0 g {-1} 9 | 1 g {-1} 8 | 2 g {-1} 7 | 3 g {-1} 6 | 4 g {-1} 5 | 5 g {-1} 4 | 6 g {-1} 3 | 7 g {-1} 2 | 8 g {-1} 1 | 9 g {-1} 0) # Consume a digit so we can move forward. d ) * # Now that we’ve gone through all pairs that sum to 9, let’s try to match one # that sums to>9. # First set up our backreference of convenience. (?= d ( d * 0 * 3 g {-3}? )) # Now find a pair that sums to>9. (?=[5-9] g {-1} [5-9] | 1 g {-1} 9 | 2 g {-1} [89] | 3 g {-1} [7-9] | 4 g {-1} [6-9] | 6 g {-1} 4 | 7 g {-1} [34] | 8 g {-1} [2-4] | 9 g {-1} [1-4]) ) # The above was a negative lookahead, so if it matched successfully then there is no # carrying and it’s smooth sailing ahead. # Since either 1 or 3 (or both) is empty, the concatenation of the two will produce # what we need to match at the front of C. Then, 6 is the rest of C. (?= d d 0 * 1 3 6 $) | # Carrying. This is where it gets complicated. # First let’s move forward to the extra digits of interest. # “. * ” matches up to the end of the line with no backtracking. The only way # 3 can be found at that position is if 3=””. # So if the negative lookahead succeeds, 3 isn’t empty and B contains the # extra digits of interest, so we consume A and a space in that case. (? (?!. * 3) d ) # More declarations for convenience. # (?= d *? ( 2 | 4) (. *? 0 * ) d $) # =the rest of the digits in A or B, 2 or 4 , depending on where we’re at. # This anchor is important so we know where to stop matching the extra digits. # 19=The part between the end of A / B and the beginning of C. # Another decision tree. 
Are the extra digits of interest composed solely of ‘9’s, # such as in the example ” 1600 4129990? # If so, the strategy is somewhat simplified. # This also handles zero ‘9’s, when A and B are of equal length. (? (?=9 * 150534 ) # If the extra digits of interest are composed solely of ‘9’s, all we need # to do is pair ‘9’s in A / B with’ 0’s in C, and match a ‘1’ at the start of C. # So, start pairing ‘9’s and’ 0’s. (?: 9 (?= D *? [1] ( g {-1}? 0))) *? # Stop when we exhaust the extra digits of interest. (?= 19) # Now verify C actually starts with a ‘1’, then match the ‘0’s we’ve collected, # and also make sure all that follows is 6 (the rest of C). 19 [1] g {-1}? 6 $ | # Now the trickier path. We need to add 1 to extra digits in A / B and match it to C. # Because we know these extra digits are not composed solely of ‘9’s, we know the # extra digits in C will be the same length. # # How do you check if a number is 1 more than another given they’re equal length? # First, iterate through the digits and match pairs of equivalent digits. # When you reach a position where they differ, it must be the case that C’s # digit is 1 greater than A / B’s. After this point, you need to pair ‘9’s in A / B # with ‘0’s in C until you exhaust the extra digits of interest. # # To see why this last part is necessary, consider the example “4129990

Demo on regex (comments removed)

If anyone out there shares my zany passion for things of this nature, I invite you to subscribe to this blog and/or follow me on Twitter. I do have plenty more madness lined up to share with the world.

Also, please know that I do actually spend time creating and helping others create regular expressions that are useful and serve a practical purpose. You are welcome to fire up your favorite IRC client and pop by Freenode’s #regex if you ever need any advice or just want to shoot the shit. We’ve got a great team there who are always happy to help you conquer your regex woes.

Thanks for reading.
