Extracting IP addresses from free-format text with ASL

ASL's pattern matching is a powerful way of verifying text and extracting information from it. In this respect it is a little like Perl and other languages that support regular expressions. However, ASL's parsing logic is very different from Perl's, and you can hit an apparent obstacle when tackling a challenge that should be trivial.

One requirement I have met from time to time is that of extracting an IP address (or some other recognizable string) from free-format text.

One way to do this is to start by writing a rule that can match the required information. For an IP address, the following will match a string of four integers (each with a value between 0 and 255), delimited by dots:

IP_ADDRESS { N . "." . N . "." . N . "." . N }
N { i:integer i>=0 && i<= 255 }

The next part of the logic is to write a rule that scans through an input string until a sub-string matching the pattern is found. One syntax that can do this is:

rep(ip : IP_ADDRESS | char)?

This works by testing whether the input, at the current position, starts with something that matches the IP_ADDRESS pattern. If it does, the matched sub-string is copied into the "ip" variable. If it doesn't, a single character is consumed from the input instead and the test is retried. This logic is repeated until the end of the string is hit.

If a match isn't found anywhere in the input, then the "ip" variable will continue to hold the value it had before. For this reason it needs to be reset first, like this:

do { ip = ""; }

Building these elements into a single script, we get something like the following. For each line of the input, it extracts the last IP address (if any) and prints it.

START {
    do { ip = ""; }
    rep(ip: IP_ADDRESS | char)?
    .. eol
} do {
  if (ip != "") { print(ip); }
}
IP_ADDRESS { N . "." . N . "." . N . "." . N }
N { i:integer i>=0 && i<= 255 }
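For readers more at home in a general-purpose language, the same scanning logic can be sketched in Python. This is only an analogue of the script above, not ASL: the names match_ip_at and last_ip are made up for illustration, and the regular expression stands in for the IP_ADDRESS and N rules on the assumption that "integer" matches a plain run of digits.

```python
import re

# Four dot-separated runs of digits. The 0..255 range check is done
# separately, much as the N rule tests the value after matching an integer.
_CANDIDATE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")

def match_ip_at(text, pos):
    """Return the IP-like sub-string starting exactly at pos, or None."""
    m = _CANDIDATE.match(text, pos)
    if m and all(int(part) <= 255 for part in m.groups()):
        return m.group(0)
    return None

def last_ip(line):
    """Scan left to right, keeping the last address found ("" if none)."""
    ip = ""                      # reset first, like do { ip = ""; }
    pos = 0
    while pos < len(line):
        found = match_ip_at(line, pos)
        if found is not None:
            ip = found           # a later match overwrites an earlier one
            pos += len(found)    # carry on after the matched sub-string
        else:
            pos += 1             # consume a single character and retry
    return ip
```

Calling last_ip on a line that contains two addresses returns the second one, and a line with no address returns the empty string, which is why the ASL script tests for "" before printing.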

With a little adjustment, the same technique can readily be used to extract other recognizable sub-strings, or to print all recognized sub-strings rather than just the last one.
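As a rough illustration of the "all matches" variation, here is the idea sketched in Python rather than ASL. The function name all_ips is hypothetical, and the regular expression is again an assumed stand-in for the IP_ADDRESS rule; the only real change from a "last match" scan is that each hit is recorded instead of overwriting the previous one.

```python
import re

# Assumed stand-in for the IP_ADDRESS rule: four dot-separated digit runs,
# with the 0..255 range check applied after the match.
_CANDIDATE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")

def all_ips(line):
    """Return every address found in line, in order of appearance."""
    found = []
    pos = 0
    while pos < len(line):
        m = _CANDIDATE.match(line, pos)
        if m and all(int(part) <= 255 for part in m.groups()):
            found.append(m.group(0))   # record the hit instead of overwriting
            pos = m.end()              # continue after the matched sub-string
        else:
            pos += 1                   # consume one character and retry
    return found
```

In the ASL script, the equivalent change would presumably be to act on "ip" each time the pattern matches, rather than once at the end of the line.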
