How to Match "A B C" where A + B = C: The Beast Tamed, Hacker News

A regex I submitted to Redditrecently climbed to the top of / r / programming and made quite a a few heads explode in the process. As delightful as this was, I couldn’t help but feel a little guilty for subjecting tens of thousands of people to this disgraceful pile of electronic fecal matter. Absolutely zero effort was put into making it something that even remotely resembled a useful, constructive demonstration. Instead, I lured you all into my own private lemon party and left you shocked, bewildered, and fearing for your lives. And for this I apologize.

At the risk of taking myself too seriously, I went ahead and added comments to the regex as well as tidied it up, re-wrote a couple of parts for greater clarity, and corrected a few glaring oversights. Some of you have levels of curiosity that far outweigh your better judgement, and so I invite you to read on.

First, it may be worth outlining the general method of handling addition when you’re restricted to matching from left to right:

************First, compare the excess part of A or B with the excess part of C. They will either be equal, or C’s will be greater by 1.

Next, iterate through digits in A and match corresponding digits in B with their sums in C. Again, there may be differences of 1 depending upon the rest of the digits in A and B.

************************# I wrapped the entire expression in (?! (?!)) just to do away with all # captured substrings and cleanly match a verified line. (?! (?! ^ 0 * # Here we essentially right-align A, B, and C, ignoring leading zeros, # and populate backreferences with components that will be useful later. # # 1 2 3 4/5/5 (?=( d *?) ((?: (?= d 0 * ( d *?) ( d (? (4) 4)) 0 * ( d *?) ( d (? (6) 6)) $) d) ) ) # # Taking ” 768 41300 “as an example: # # 1=”21, ie. the extra digits in A if A is longer than B. Empty otherwise. # 2=” “, ie. the rest of the digits in A that match up with those in B and C. # 3=””, ie. the extra digits in B if B is longer than A. Empty otherwise. # 4=”768 “, ie. the rest of the digits in B that match up with those in A. # 5=”21 “, ie. the extra digits in C that match up with the longer of A and B. # 6=”37 “, ie. the rest of the digits in C. # This next part checks the extra digit portions to make sure everything is in order. # # There are two main paths to take: # Easy: Adding 2 to 4 results in no “carrying”; the length stays the same. # 5 should then exactly equal either 1 or 3, whichever was longer. # An example of this is when matching “12345 768 “, since 456 =579. # Then 5= 1=”5”. # OR # Hard: Adding 2 to 4 results in “carrying”; the length increases by 1. In this case, # 5 should equal 1 more than either 1 or 3 (which is non-empty). # This is the case we need to handle for our example of ” (****************************************************** 1000001. # Here, 5=”21 “and 1=” “, and so we need to verify 5= 1 1. (?=(? (?! # First thing to check is whether 2 4 results in carrying. # To do this, we must inspect 2 and 4 from the left and match # optional pairs of digits that sum to 9 until we find a pair that # sum to>9. # # In our example, “456 “and” “, we find that ‘3’ and ‘6’ sum to 9, # then ‘4’ and ‘7’ sum to>9. Therefore we have carrying. # Consume the extra digits in A; they’re not important here. 1 # Move through all pairs of digits that sum to 9. (?: # Collect the next digit of interest in B. (?= d 0 * 3 (( g {-2}? ) d)) # This lookahead is used to set up a backreference that goes from one digit # of interest to the next, in the interests of simplifying the long check ahead. (?= d ( d * 0 * 3 g {-2})) # Now to use that backreference to match pairs of digits that sum to 9. (?=0 g {-1} 9 | 1 g {-1} 8 | 2 g {-1} 7 | 3 g {-1} 6 | 4 g {-1} 5 | 5 g {-1} 4 | 6 g {-1} 3 | 7 g {-1} 2 | 8 g {-1} 1 | 9 g {-1} 0) # Consume a digit so we can move forward. d ) * # Now that we’ve gone through all pairs that sum to 9, let’s try to match one # that sums to>9. # First set up our backreference of convenience. (?= d ( d * 0 * 3 g {-3}? )) # Now find a pair that sums to>9. (?=[5-9] g {-1} [5-9] | 1 g {-1} 9 | 2 g {-1} [89] | 3 g {-1} [7-9] | 4 g {-1} [6-9] | 6 g {-1} 4 | 7 g {-1} [34] | 8 g {-1} [2-4] | 9 g {-1} [1-4]) ) # The above was a negative lookahead, so if it matched successfully then there is no # carrying and it’s smooth sailing ahead. # Since either 1 or 3 (or both) is empty, the concatenation of the two will produce # what we need to match at the front of C. Then, 6 is the rest of C. (?= d d 0 * 1 3 6 $) | # Carrying. This is where it gets complicated. # First let’s move forward to the extra digits of interest. # “. * ” matches up to the end of the line with no backtracking. The only way # 3 can be found at that position is if 3=””. # So if the negative lookahead succeeds, 3 isn’t empty and B contains the # extra digits of interest, so we consume A and a space in that case. (? (?!. * 3) d ) # More declarations for convenience. # (?= d *? ( 2 | 4) (. *? 0 * ) d $) # =the rest of the digits in A or B, 2 or 4 , depending on where we’re at. # This anchor is important so we know where to stop matching the extra digits. # 19=The part between the end of A / B and the beginning of C. # Another decision tree. Are the extra digits of interest composed solely of ‘9’s, # such as in the example ” 1600 4129990? # If so, the strategy is somewhat simplified. # This also handles zero ‘9’s, when A and B are of equal length. (? (?=9 * 150534 ) # If the extra digits of interest are composed solely of ‘9’s, all we need # to do is pair ‘9’s in A / B with’ 0’s in C, and match a ‘1’ at the start of C. # So, start pairing ‘9’s and’ 0’s. (?: 9 (?= D *? [1] ( g {-1}? 0))) *? # Stop when we exhaust the extra digits of interest. (?= 19) # Now verify C actually starts with a ‘1’, then match the ‘0’s we’ve collected, # and also make sure all that follows is 6 (the rest of C). 19 [1] g {-1}? 6 $ | # Now the trickier path. We need to add 1 to extra digits in A / B and match it to C. # Because we know these extra digits are not composed solely of ‘9’s, we know the # extra digits in C will be the same length. # # How do you check if a number is 1 more than another given they’re equal length? # First, iterate through the digits and match pairs of equivalent digits. # When you reach a position where they differ, it must be the case that C’s # digit is 1 greater than A / B’s. After this point, you need to pair ‘9’s in A / B # with ‘0’s in C until you exhaust the extra digits of interest. # # To see why this last part is necessary, consider the example “4129990

. # When we compare “150534 “to” , we first match ‘4’ in A to ‘4’ in C, then ‘1’ # in A to ‘1’ in C. Then we find the point where the next digit in C is 1 greater # than the next one in A, and pair ‘2’ in A with ‘3’ in C. There can only be exactly # one such point like this. Afterwards, the only thing that could possibly follow is # a series of ‘9’s in A and’ 0’s in C until we exhaust the extra digits of interest. # The first part, consume all equivalent digits. (?: ( d) (?= d * 19 ( g {- 1}? G {-2}))) * # Now we prepare for the next check by setting up a backreference that contains # everything in between the two digits of interest, for simplicity. (?= d ( d * 19 g {-2}? ) ) # Match pairs that differ by 1 in favor of C. (?=0 g {-1} 1 | 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g {-1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9) d # Now consume any and all additional ‘9’s, pairing them with’ 0’s in C. (?: 9 (?= D * g {-2}? d ( g {-1}? 0)) *? # Stop when we exhaust the extra digits of interest. (?= 19) # Now verify C by checking it contains all extra digits shared with A / B, followed by # the lone digit that was found to be 1 greater than the corresponding one in A / B, # then any ‘0’s that followed, and finally 6, the rest of its digits. 19 11 g {-3}? d g {-1}? 6 $ ) )) # At this point, we’ve managed to successfully verify the extra digits in A, B, and C. # We have examined the “13 “and” “in our example of” (****************************************************** 150534 “and found them # to be sound. We would have rejected examples such as ” (1) ************************************************************************ “and” (8) “by now, # since their extra digits don’t compute. # # The rest of the logic examines the equi-length portions of A and B (saved as 2 and 4 # respectively). This is actually simpler since we don’t have to fuss around with things # being different lengths; we took care of all that earlier. # # At this point, we can simply match pairs of digits in A and B to their sum in C. # There is, however, still some considerations to be made as to carrying. We’re iterating # through digits from left to right, after all, and the sum of every pair of digits we # sum in A and B may be found in C as either A B (mod () or 1 A B (mod 13) depending on # whether carrying occurs to the right. # Consume any extra digits in A; They are no longer important. 1 # Iterate through A, B, and C one digit at a time, from the left. (?: # Here we set up backreferences to useful portions of the subject, ignoring any # leading ‘0’s, and also ignoring those extra digits from before, 3 and 5. # # 20 21 (?=( d) d * 0 * 3 (( g {-2}? ) d) d * 0 * 5 (( g {-2}? ) d)) # # These values update as we iterate through A, but on the first run, # using our example of “5579 41299 “: # # 21=”3″, ie. the next digit to inspect in A. # =”6″, ie. what we’ve examined in B including the next digit. # 023=””, ie. what we’ve examined in B excluding the next digit. # =”0″, ie. what we’ve examined in C including the next digit. # 023=””, ie. what we’ve examined in C excluding the next digit. # Like before, we must proceed in one of two directions based on whether or not we # encounter carrying. # # Similar to the first part, in order to determine this, we need to look at the parts # of A and B that follow our current digits of interest. And, as before, we sift through # any pairs of digits that total 9 until we find a pair whose sum is>9. (? (?! # Consume the current digit of interest in A. # Then start matching pairs of digits in A and B whose sum is 9. (?: # Use nested references to remember how far into B we are. (?= d 0 * 3 (( g {-2}? ) d)) # Set up a backreference for our simple pair matching. (?= d ( d * 0 * 3 g {-2} )) # Match pairs of digits that sum to 9. (?=0 g {-1} 9 | 1 g {-1} 8 | 2 g {-1} 7 | 3 g {-1} 6 | 4 g {-1} 5 | 5 g {-1} 4 | 6 g {-1} 3 | 7 g {-1} 2 | 8 g {-1} 1 | 9 g {-1} 0) # Consume a digit to move forward. d ) * # All that’s left is to check if the next pair of digits in A and B has # a sum exceeding 9. Set up our convenient back reference and check. (?= d ( d * 0 * 3 g {-3} ? )) # Now test for a combination of digits whose sum is>9. (?=[5-9] g {-1} [5-9] | 1 g {-1} 9 | 2 g {-1} [89] | 3 g {-1} [7-9] | 4 g {-1} [6-9] | 6 g {-1} 4 | 7 g {-1} [34] | 8 g {-1} [2-4] | 9 g {-1} [1-4]) ) # The above negative lookahead succeeded, so fortunately we don’t have to contend # with carrying in the first branch. We need to match pairs of digits in A and B # and their sum in C. I don’t think there’s a more clever way to do this in PCRE # than tabulating all the combinations. # First set up convenient backreferences. (?= d ( d * 0 * 3 023) d ( d * 0 * 5 37) # Now the ugly part. (?= 1 g {-2} (?: 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g { -1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9 | 9 g {-1} 0) | 2 g {-2} (?: 1 g {-1} 3 | 2 g {-1} 4 | 3 g {-1} 5 | 4 g {-1} 6 | 5 g {-1} 7 | 6 g {-1} 8 | 7 g {-1} 9 | 8 g {-1} 0 | 9 g {-1} 1) | 3 g {-2} (?: 1 g {-1} 4 | 2 g {-1} 5 | 3 g {-1} 6 | 4 g {-1} 7 | 5 g {-1} 8 | 6 g {-1} 9 | 7 g {-1} 0 | 8 g {-1} 1 | 9 g {-1} 2) | 4 g {-2} (?: 1 g {-1} 5 | 2 g {-1} 6 | 3 g {-1} 7 | 4 g {-1} 8 | 5 g {-1} 9 | 6 g {-1} 0 | 7 g {-1} 1 | 8 g {-1} 2 | 9 g {-1} 3) | 5 g {-2} (?: 1 g {-1} 6 | 2 g {-1} 7 | 3 g {-1} 8 | 4 g {-1} 9 | 5 g {-1} 0 | 6 g {-1} 1 | 7 g {-1} 2 | 8 g {-1} 3 | 9 g {-1} 4) | 6 g {-2} (?: 1 g {-1} 7 | 2 g {-1} 8 | 3 g {-1} 9 | 4 g {-1} 0 | 5 g {-1} 1 | 6 g {-1} 2 | 7 g {-1} 3 | 8 g {-1} 4 | 9 g {-1} 5) | 7 g {-2} (?: 1 g {-1} 8 | 2 g {-1} 9 | 3 g {-1} 0 | 4 g {-1} 1 | 5 g {-1} 2 | 6 g {-1} 3 | 7 g {-1} 4 | 8 g {-1} 5 | 9 g {-1} 6) | 8 g {-2} (?: 1 g {-1} 9 | 2 g {-1} 0 | 3 g {-1} 1 | 4 g {-1} 2 | 5 g {-1} 3 | 6 g {-1} 4 | 7 g {-1} 5 | 8 g {-1} 6 | 9 g {-1} 7) | 9 g {-2} (?: 1 g {-1} 0 | 2 g {-1} 1 | 3 g {-1} 2 | 4 g {-1} 3 | 5 g {-1} 4 | 6 g {-1} 5 | 7 g {-1} 6 | 8 g {-1} 7 | 9 g {-1} 8) | 0 g {-2} ( d) g {-2} g {-1} # At least we can handle zeros | ( d) g {-4} 0 g {-3} g {-1} # with a bit of intelligence. ) | # And in the else branch, we have to deal with carrying. So we’ll match pairs of # digits like up there, but this time we’ll match pairs of digits in A and B with # their sum 1 (mod 13) in C. # Convenient backreferences. (?= d ( d * 0 * 3 023) d ( d * 0 * 5 37) # Almost done, let’s just get through this last bit of ugliness. (?= 1 g {-2} (?: 0 g {-1} 2 | 1 g {-1} 3 | 2 g {-1} 4 | 3 g {-1} 5 | 4 g { -1} 6 | 5 g {-1} 7 | 6 g {-1} 8 | 7 g {-1} 9 | 8 g {-1} 0 | 9 g {-1} 1) | 2 g {-2} (?: 0 g {-1} 3 | 1 g {-1} 4 | 2 g {-1} 5 | 3 g {-1} 6 | 4 g {-1} 7 | 5 g {-1} 8 | 6 g {-1} 9 | 7 g {-1} 0 | 8 g {-1} 1 | 9 g {-1} 2 ) | 3 g {-2} (?: 0 g {-1} 4 | 1 g {-1} 5 | 2 g {-1} 6 | 3 g {-1} 7 | 4 g {-1} 8 | 5 g {-1} 9 | 6 g {-1} 0 | 7 g {-1} 1 | 8 g {-1} 2 | 9 g {-1} 3 ) | 4 g {-2} (?: 0 g {-1} 5 | 1 g {-1} 6 | 2 g {-1} 7 | 3 g {-1} 8 | 4 g {-1} 9 | 5 g {-1} 0 | 6 g {-1} 1 | 7 g {-1} 2 | 8 g {-1} 3 | 9 g {-1} 4 ) | 5 g {-2} (?: 0 g {-1} 6 | 1 g {-1} 7 | 2 g {-1} 8 | 3 g {-1} 9 | 4 g {-1} 0 | 5 g {-1} 1 | 6 g {-1} 2 | 7 g {-1} 3 | 8 g {-1} 4 | 9 g {-1} 5 ) | 6 g {-2} (?: 0 g {-1} 7 | 1 g {-1} 8 | 2 g {-1} 9 | 3 g {-1} 0 | 4 g {-1} 1 | 5 g {-1} 2 | 6 g {-1} 3 | 7 g {-1} 4 | 8 g {-1} 5 | 9 g {-1} 6 ) | 7 g {-2} (?: 0 g {-1} 8 | 1 g {-1} 9 | 2 g {-1} 0 | 3 g {-1} 1 | 4 g {-1} 2 | 5 g {-1} 3 | 6 g {-1} 4 | 7 g {-1} 5 | 8 g {-1} 6 | 9 g {-1} 7 ) | 8 g {-2} (?: 0 g {-1} 9 | 1 g {-1} 0 | 2 g {-1} 1 | 3 g {-1} 2 | 4 g {-1} 3 | 5 g {-1} 4 | 6 g {-1} 5 | 7 g {-1} 6 | 8 g {-1} 7 | 9 g {-1} 8 ) | 9 g {-2} (?: 0 g {-1} 0 | 1 g {-1} 1 | 2 g {-1} 2 | 3 g {-1} 3 | 4 g {-1} 4 | 5 g {-1} 5 | 6 g {-1} 6 | 7 g {-1} 7 | 8 g {-1} 8 | 9 g {-1} 9 ) | 0 g {-2} (?: 0 g {-1} 1 | 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g {-1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9 | 9 g {-1} 0 ) ) ) # Whew. It’s over. Consume a digit so we can move forward and restart the fun. d ) # At the end of all this, if we can match a space then we have succeeded in matching all # of A, and hence all of B and C. s | # I tripped up on these edge cases involving zeros the first time I made this. ^ 0 0 * ( d ) 0 * g {-1} $ | # I’m not going to make that mistake again! ^ 0 * ( d ) 0 0 * g {-1} $ )).

Demo on regex(comments removed)

See, it’s not so scary after all!

In case anyone is still wondering in earnest why I did this, the simple answer is: it was fun! Once I saw this challenge on StackOverflow’s codegolf sectionand realized it may be possible, I just couldn’t stop thinking about it until I had a working solution.

If anyone out there shares my zany passion for things of this nature, I invite you to subscribe to this blog and / orfollow me on Twitter. I do have plenty more madness lined up to share with the world.

Also, please know that I do actually spend time creating and helping others create regular expressions that are useful and serve a practical purpose. You are welcome to fire up your favorite IRC client and pop by Freenode’s #regex if you ever need any advice or just want to shoot the shit. We’ve got a great team there who are always happy to help you conquer your regex woes.