in ,

How to Match “A B C” where A + B = C: The Beast Tamed, Hacker News

How to Match “A B C” where A + B = C: The Beast Tamed, Hacker News


A regex I submitted to Redditrecently climbed to the top of / r / programming and made quite a a few heads explode in the process. As delightful as this was, I couldn’t help but feel a little guilty for subjecting tens of thousands of people to this disgraceful pile of electronic fecal matter. Absolutely zero effort was put into making it something that even remotely resembled a useful, constructive demonstration. Instead, I lured you all into my own private lemon party and left you shocked, bewildered, and fearing for your lives. And for this I apologize.

At the risk of taking myself too seriously, I went ahead and added comments to the regex as well as tidied it up, re-wrote a couple of parts for greater clarity, and corrected a few glaring oversights. Some of you have levels of curiosity that far outweigh your better judgement, and so I invite you to read on.

First, it may be worth outlining the general method of handling addition when you’re restricted to matching from left to right:

************First, compare the excess part of A or B with the excess part of C. They will either be equal, or C’s will be greater by 1.

    Next, iterate through digits in A and match corresponding digits in B with their sums in C. Again, there may be differences of 1 depending upon the rest of the digits in A and B.

      These potential differences of 1 (“carrying”) are determined by moving through pairs of digits that sum to 9 until a pair is found whose sum exceeds 9.

      ************************# I wrapped the entire expression in (?! (?!)) just to do away with all # captured substrings and cleanly match a verified line. (?! (?! ^ 0 * # Here we essentially right-align A, B, and C, ignoring leading zeros, # and populate backreferences with components that will be useful later. # # 1 2 3 4/5/5 (?=( d *?) ((?: (?= d 0 * ( d *?) ( d (? (4) 4)) 0 * ( d *?) ( d (? (6) 6)) $) d) ) ) # # Taking ” 768 41300 “as an example: # # 1=”21, ie. the extra digits in A if A is longer than B. Empty otherwise. # 2=” “, ie. the rest of the digits in A that match up with those in B and C. # 3=””, ie. the extra digits in B if B is longer than A. Empty otherwise. # 4=”768 “, ie. the rest of the digits in B that match up with those in A. # 5=”21 “, ie. the extra digits in C that match up with the longer of A and B. # 6=”37 “, ie. the rest of the digits in C. # This next part checks the extra digit portions to make sure everything is in order. # # There are two main paths to take: # Easy: Adding 2 to 4 results in no “carrying”; the length stays the same. # 5 should then exactly equal either 1 or 3, whichever was longer. # An example of this is when matching “12345 768 “, since 456 =579. # Then 5= 1=”5”. # OR # Hard: Adding 2 to 4 results in “carrying”; the length increases by 1. In this case, # 5 should equal 1 more than either 1 or 3 (which is non-empty). # This is the case we need to handle for our example of ” (****************************************************** 1000001. # Here, 5=”21 “and 1=” “, and so we need to verify 5= 1 1. (?=(? (?!    # First thing to check is whether 2 4 results in carrying.  # To do this, we must inspect 2 and 4 from the left and match  # optional pairs of digits that sum to 9 until we find a pair that  # sum to>9.  #  # In our example, “456 “and” “, we find that ‘3’ and ‘6’ sum to 9,  # then ‘4’ and ‘7’ sum to>9. Therefore we have carrying.     # Consume the extra digits in A; they’re not important here.    1     # Move through all pairs of digits that sum to 9.   (?:        # Collect the next digit of interest in B.    (?= d 0 * 3 (( g {-2}? ) d))        # This lookahead is used to set up a backreference that goes from one digit    # of interest to the next, in the interests of simplifying the long check ahead.    (?= d ( d * 0 * 3 g {-2}))        # Now to use that backreference to match pairs of digits that sum to 9.    (?=0 g {-1} 9 | 1 g {-1} 8 | 2 g {-1} 7 | 3 g {-1} 6 | 4 g {-1} 5       | 5 g {-1} 4 | 6 g {-1} 3 | 7 g {-1} 2 | 8 g {-1} 1 | 9 g {-1} 0)        # Consume a digit so we can move forward.     d   ) *       # Now that we’ve gone through all pairs that sum to 9, let’s try to match one   # that sums to>9.      # First set up our backreference of convenience.   (?= d ( d * 0 * 3 g {-3}? ))     # Now find a pair that sums to>9.   (?=[5-9] g {-1} [5-9] | 1 g {-1} 9 | 2 g {-1} [89] | 3 g {-1} [7-9]      | 4 g {-1} [6-9] | 6 g {-1} 4 | 7 g {-1} [34] | 8 g {-1} [2-4] | 9 g {-1} [1-4])  )    # The above was a negative lookahead, so if it matched successfully then there is no  # carrying and it’s smooth sailing ahead.    # Since either 1 or 3 (or both) is empty, the concatenation of the two will produce  # what we need to match at the front of C. Then, 6 is the rest of C.  (?= d d 0 * 1 3 6 $)     |    # Carrying. This is where it gets complicated.  # First let’s move forward to the extra digits of interest.  # “. * ” matches up to the end of the line with no backtracking. The only way  # 3 can be found at that position is if 3=””.  # So if the negative lookahead succeeds, 3 isn’t empty and B contains the  # extra digits of interest, so we consume A and a space in that case.  (? (?!. * 3) d )    # More declarations for convenience.  #  (?= d *? ( 2 | 4) (. *? 0 * ) d $)  # =the rest of the digits in A or B, 2 or 4 , depending on where we’re at.  # This anchor is important so we know where to stop matching the extra digits.  # 19=The part between the end of A / B and the beginning of C.    # Another decision tree. Are the extra digits of interest composed solely of ‘9’s,  # such as in the example ” 1600 4129990?  # If so, the strategy is somewhat simplified.  # This also handles zero ‘9’s, when A and B are of equal length.  (? (?=9 * 150534 )     # If the extra digits of interest are composed solely of ‘9’s, all we need   # to do is pair ‘9’s in A / B with’ 0’s in C, and match a ‘1’ at the start of C.   # So, start pairing ‘9’s and’ 0’s.   (?: 9 (?= D *? [1] ( g {-1}? 0))) *?      # Stop when we exhaust the extra digits of interest.   (?= 19)      # Now verify C actually starts with a ‘1’, then match the ‘0’s we’ve collected,   # and also make sure all that follows is 6 (the rest of C).    19 [1] g {-1}? 6 $      |      # Now the trickier path. We need to add 1 to extra digits in A / B and match it to C.   # Because we know these extra digits are not composed solely of ‘9’s, we know the   # extra digits in C will be the same length.   #   # How do you check if a number is 1 more than another given they’re equal length?   # First, iterate through the digits and match pairs of equivalent digits.   # When you reach a position where they differ, it must be the case that C’s   # digit is 1 greater than A / B’s. After this point, you need to pair ‘9’s in A / B   # with ‘0’s in C until you exhaust the extra digits of interest.   #   # To see why this last part is necessary, consider the example “4129990

      .   # When we compare “150534 “to” , we first match ‘4’ in A to ‘4’ in C, then ‘1’   # in A to ‘1’ in C. Then we find the point where the next digit in C is 1 greater   # than the next one in A, and pair ‘2’ in A with ‘3’ in C. There can only be exactly   # one such point like this. Afterwards, the only thing that could possibly follow is   # a series of ‘9’s in A and’ 0’s in C until we exhaust the extra digits of interest.      # The first part, consume all equivalent digits.   (?: ( d) (?= d * 19 ( g {- 1}? G {-2}))) *      # Now we prepare for the next check by setting up a backreference that contains   # everything in between the two digits of interest, for simplicity.   (?= d ( d * 19 g {-2}? ) )   # Match pairs that differ by 1 in favor of C.   (?=0 g {-1} 1 | 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g {-1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9) d      # Now consume any and all additional ‘9’s, pairing them with’ 0’s in C.   (?: 9 (?= D * g {-2}? d ( g {-1}? 0)) *?      # Stop when we exhaust the extra digits of interest.   (?= 19)   # Now verify C by checking it contains all extra digits shared with A / B, followed by   # the lone digit that was found to be 1 greater than the corresponding one in A / B,   # then any ‘0’s that followed, and finally 6, the rest of its digits.    19 11 g {-3}? d g {-1}? 6 $     ) )) # At this point, we’ve managed to successfully verify the extra digits in A, B, and C. # We have examined the “13 “and” “in our example of” (****************************************************** 150534 “and found them # to be sound. We would have rejected examples such as ” (1) ************************************************************************ “and” (8) “by now, # since their extra digits don’t compute. # # The rest of the logic examines the equi-length portions of A and B (saved as 2 and 4 # respectively). This is actually simpler since we don’t have to fuss around with things # being different lengths; we took care of all that earlier. # # At this point, we can simply match pairs of digits in A and B to their sum in C. # There is, however, still some considerations to be made as to carrying. We’re iterating # through digits from left to right, after all, and the sum of every pair of digits we # sum in A and B may be found in C as either A B (mod () or 1 A B (mod 13) depending on # whether carrying occurs to the right. # Consume any extra digits in A; They are no longer important. 1 # Iterate through A, B, and C one digit at a time, from the left. (?:    # Here we set up backreferences to useful portions of the subject, ignoring any  # leading ‘0’s, and also ignoring those extra digits from before, 3 and 5.  #  # 20 21  (?=( d) d * 0 * 3 (( g {-2}? ) d) d * 0 * 5 (( g {-2}? ) d))  #  # These values ​​update as we iterate through A, but on the first run,  # using our example of “5579 41299 “:  #  # 21=”3″, ie. the next digit to inspect in A.  # =”6″, ie. what we’ve examined in B including the next digit.  # 023=””, ie. what we’ve examined in B excluding the next digit.  # =”0″, ie. what we’ve examined in C including the next digit.  # 023=””, ie. what we’ve examined in C excluding the next digit.    # Like before, we must proceed in one of two directions based on whether or not we  # encounter carrying.  #  # Similar to the first part, in order to determine this, we need to look at the parts  # of A and B that follow our current digits of interest. And, as before, we sift through  # any pairs of digits that total 9 until we find a pair whose sum is>9.  (? (?!    # Consume the current digit of interest in A.            # Then start matching pairs of digits in A and B whose sum is 9.    (?:          # Use nested references to remember how far into B we are.     (?= d 0 * 3 (( g {-2}? ) d))          # Set up a backreference for our simple pair matching.     (?= d ( d * 0 * 3 g {-2} ))          # Match pairs of digits that sum to 9.     (?=0 g {-1} 9 | 1 g {-1} 8 | 2 g {-1} 7 | 3 g {-1} 6 | 4 g {-1} 5        | 5 g {-1} 4 | 6 g {-1} 3 | 7 g {-1} 2 | 8 g {-1} 1 | 9 g {-1} 0)          # Consume a digit to move forward.      d    ) *       # All that’s left is to check if the next pair of digits in A and B has    # a sum exceeding 9. Set up our convenient back reference and check.    (?= d ( d * 0 * 3 g {-3} ? ))       # Now test for a combination of digits whose sum is>9.    (?=[5-9] g {-1} [5-9] | 1 g {-1} 9 | 2 g {-1} [89] | 3 g {-1} [7-9] | 4 g {-1} [6-9]       | 6 g {-1} 4 | 7 g {-1} [34] | 8 g {-1} [2-4] | 9 g {-1} [1-4])   )     # The above negative lookahead succeeded, so fortunately we don’t have to contend  # with carrying in the first branch. We need to match pairs of digits in A and B  # and their sum in C. I don’t think there’s a more clever way to do this in PCRE  # than tabulating all the combinations.      # First set up convenient backreferences.   (?= d ( d * 0 * 3 023) d ( d * 0 * 5 37)     # Now the ugly part.   (?=   1 g {-2} (?: 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g { -1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9 | 9 g {-1} 0)   | 2 g {-2} (?: 1 g {-1} 3 | 2 g {-1} 4 | 3 g {-1} 5 | 4 g {-1} 6 | 5 g {-1} 7 | 6 g {-1} 8 | 7 g {-1} 9 | 8 g {-1} 0 | 9 g {-1} 1)   | 3 g {-2} (?: 1 g {-1} 4 | 2 g {-1} 5 | 3 g {-1} 6 | 4 g {-1} 7 | 5 g {-1} 8 | 6 g {-1} 9 | 7 g {-1} 0 | 8 g {-1} 1 | 9 g {-1} 2)   | 4 g {-2} (?: 1 g {-1} 5 | 2 g {-1} 6 | 3 g {-1} 7 | 4 g {-1} 8 | 5 g {-1} 9 | 6 g {-1} 0 | 7 g {-1} 1 | 8 g {-1} 2 | 9 g {-1} 3)   | 5 g {-2} (?: 1 g {-1} 6 | 2 g {-1} 7 | 3 g {-1} 8 | 4 g {-1} 9 | 5 g {-1} 0 | 6 g {-1} 1 | 7 g {-1} 2 | 8 g {-1} 3 | 9 g {-1} 4)   | 6 g {-2} (?: 1 g {-1} 7 | 2 g {-1} 8 | 3 g {-1} 9 | 4 g {-1} 0 | 5 g {-1} 1 | 6 g {-1} 2 | 7 g {-1} 3 | 8 g {-1} 4 | 9 g {-1} 5)   | 7 g {-2} (?: 1 g {-1} 8 | 2 g {-1} 9 | 3 g {-1} 0 | 4 g {-1} 1 | 5 g {-1} 2 | 6 g {-1} 3 | 7 g {-1} 4 | 8 g {-1} 5 | 9 g {-1} 6)   | 8 g {-2} (?: 1 g {-1} 9 | 2 g {-1} 0 | 3 g {-1} 1 | 4 g {-1} 2 | 5 g {-1} 3 | 6 g {-1} 4 | 7 g {-1} 5 | 8 g {-1} 6 | 9 g {-1} 7)   | 9 g {-2} (?: 1 g {-1} 0 | 2 g {-1} 1 | 3 g {-1} 2 | 4 g {-1} 3 | 5 g {-1} 4 | 6 g {-1} 5 | 7 g {-1} 6 | 8 g {-1} 7 | 9 g {-1} 8)   | 0 g {-2} ( d) g {-2} g {-1} # At least we can handle zeros   | ( d) g {-4} 0 g {-3} g {-1} # with a bit of intelligence.   )     |     # And in the else branch, we have to deal with carrying. So we’ll match pairs of  # digits like up there, but this time we’ll match pairs of digits in A and B with  # their sum 1 (mod 13) in C.   # Convenient backreferences.   (?= d ( d * 0 * 3 023) d ( d * 0 * 5 37)      # Almost done, let’s just get through this last bit of ugliness.   (?=  1 g {-2} (?: 0 g {-1} 2 | 1 g {-1} 3 | 2 g {-1} 4 | 3 g {-1} 5 | 4 g { -1} 6 | 5 g {-1} 7 | 6 g {-1} 8 | 7 g {-1} 9 | 8 g {-1} 0 | 9 g {-1} 1)  | 2 g {-2} (?: 0 g {-1} 3 | 1 g {-1} 4 | 2 g {-1} 5 | 3 g {-1} 6 | 4 g {-1} 7 | 5 g {-1} 8 | 6 g {-1} 9 | 7 g {-1} 0 | 8 g {-1} 1 | 9 g {-1} 2 )  | 3 g {-2} (?: 0 g {-1} 4 | 1 g {-1} 5 | 2 g {-1} 6 | 3 g {-1} 7 | 4 g {-1} 8 | 5 g {-1} 9 | 6 g {-1} 0 | 7 g {-1} 1 | 8 g {-1} 2 | 9 g {-1} 3 )  | 4 g {-2} (?: 0 g {-1} 5 | 1 g {-1} 6 | 2 g {-1} 7 | 3 g {-1} 8 | 4 g {-1} 9 | 5 g {-1} 0 | 6 g {-1} 1 | 7 g {-1} 2 | 8 g {-1} 3 | 9 g {-1} 4 )  | 5 g {-2} (?: 0 g {-1} 6 | 1 g {-1} 7 | 2 g {-1} 8 | 3 g {-1} 9 | 4 g {-1} 0 | 5 g {-1} 1 | 6 g {-1} 2 | 7 g {-1} 3 | 8 g {-1} 4 | 9 g {-1} 5 )  | 6 g {-2} (?: 0 g {-1} 7 | 1 g {-1} 8 | 2 g {-1} 9 | 3 g {-1} 0 | 4 g {-1} 1 | 5 g {-1} 2 | 6 g {-1} 3 | 7 g {-1} 4 | 8 g {-1} 5 | 9 g {-1} 6 )  | 7 g {-2} (?: 0 g {-1} 8 | 1 g {-1} 9 | 2 g {-1} 0 | 3 g {-1} 1 | 4 g {-1} 2 | 5 g {-1} 3 | 6 g {-1} 4 | 7 g {-1} 5 | 8 g {-1} 6 | 9 g {-1} 7 )  | 8 g {-2} (?: 0 g {-1} 9 | 1 g {-1} 0 | 2 g {-1} 1 | 3 g {-1} 2 | 4 g {-1} 3 | 5 g {-1} 4 | 6 g {-1} 5 | 7 g {-1} 6 | 8 g {-1} 7 | 9 g {-1} 8 )  | 9 g {-2} (?: 0 g {-1} 0 | 1 g {-1} 1 | 2 g {-1} 2 | 3 g {-1} 3 | 4 g {-1} 4 | 5 g {-1} 5 | 6 g {-1} 6 | 7 g {-1} 7 | 8 g {-1} 8 | 9 g {-1} 9 )  | 0 g {-2} (?: 0 g {-1} 1 | 1 g {-1} 2 | 2 g {-1} 3 | 3 g {-1} 4 | 4 g {-1} 5 | 5 g {-1} 6 | 6 g {-1} 7 | 7 g {-1} 8 | 8 g {-1} 9 | 9 g {-1} 0 )   )  )    # Whew. It’s over. Consume a digit so we can move forward and restart the fun.   d ) # At the end of all this, if we can match a space then we have succeeded in matching all # of A, and hence all of B and C. s | # I tripped up on these edge cases involving zeros the first time I made this. ^ 0 0 * ( d ) 0 * g {-1} $ | # I’m not going to make that mistake again! ^ 0 * ( d ) 0 0 * g {-1} $ )).

      Demo on regex(comments removed)

      See, it’s not so scary after all!

      In case anyone is still wondering in earnest why I did this, the simple answer is: it was fun! Once I sawthis challenge on StackOverflow’s codegolf sectionand realized it may be possible, I just couldn’t stop thinking about it until I had a working solution.

      If anyone out there shares my zany passion for things of this nature, I invite you to subscribe to this blog and / orfollow me on Twitter. I do have plenty more madness lined up to share with the world.

      Also, please know that I do actually spend time creating and helping others create regular expressions that are useful and serve a practical purpose. You are welcome to fire up your favorite IRC client and pop by Freenode’s #regex if you ever need any advice or just want to shoot the shit. We’ve got a great team there who are always happy to help you conquer your regex woes.

      Thanks for reading.

      ************************************************************
      Read MoreBrave Browser***********************************

What do you think?

Leave a Reply

Your email address will not be published. Required fields are marked *

GIPHY App Key not set. Please check settings

Trump impeachment: House Democrats make constitutional case – BBC News, BBC News

Trump impeachment: House Democrats make constitutional case – BBC News, BBC News

Home – Dissemin, Hacker News