Awk in 20 Minutes (2015), Hacker News

What’s Awk

Awk is a tiny programming language and a command line tool. It’s particularly appropriate for log parsing on servers, mostly because Awk will operate on files, usually structured in lines of human-readable text.

I say it’s useful on servers because log files, dump files, or whatever text format servers end up dumping to disk will tend to grow large, and you’ll have many of them per server. If you ever get into the situation where you have to analyze gigabytes of files from different servers without tools like Splunk or its equivalents, it would feel fairly bad to have to download all these files locally to then run some forensics on them.

This personally happens to me when some Erlang nodes tend to die and leave a crash dump of 1598 MB to 4GB behind, or on smaller individual servers (say a VPS) where I need to quickly go through logs, looking for a common pattern.

In any case, Awk does more than finding data (otherwise, grep or ack would be enough) - it also lets you process the data and transform it.


 Code Structure 


 An Awk script is structured simply, as a sequence of patterns and actions:

    # comment
    Pattern1 { ACTIONS; }

    # comment
    Pattern2 { ACTIONS }

    # comment
    Pattern3 { ACTIONS }

    # comment
    Pattern4 { ACTIONS }

Every line of the document to scan will have to go through each of the patterns, one at a time. So if I pass in a file that contains the following content:

    this is line 1
    this is line 2

Then the content this is line 1 will be matched against Pattern1. If it matches, ACTIONS will be executed. Then this is line 1 will be matched against Pattern2. If it doesn't match, it skips to Pattern3, and so on.

Once all patterns have been cleared, this is line 2 will go through the same process, and so on for the other lines, until the input has been read entirely.

This, in short, is Awk’s execution model.
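
To make that execution model concrete, here is a small sketch run from the shell. The three patterns and the helper name run_patterns are made up for illustration; only the two sample lines come from the text above.

```shell
# run_patterns is a hypothetical helper name, not something from the article.
run_patterns() {
  printf 'this is line 1\nthis is line 2\n' | awk '
    /line 1/ { print "Pattern1 matched: " $0 }   # tried first on every line
    /this/   { print "Pattern2 matched: " $0 }   # tried next, match or not
    /line 2/ { print "Pattern3 matched: " $0 }   # tried last
  '
}

run_patterns
```

Each input line is run through all three patterns before the next line is read, so the output is grouped by line rather than by pattern.
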
 Data Types 
  

 Awk only has two main data types: strings and numbers. And even then, Awk likes to convert them into each other. Strings can be interpreted as numerals to convert their values to numbers. If the string doesn’t look like a numeral, it’s  0 . 

Both can be assigned to variables in ACTIONS parts of your code with the = operator. Variables can be declared anywhere, at any time, and used even if they're not initialized: their default value is "", the empty string.

Finally, Awk has arrays. They're unidimensional associative arrays that can be created dynamically. Their syntax is just var[key] = value. Awk can simulate multidimensional arrays, but it's all a big hack anyway.
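
A quick sketch of those conversion rules, assuming any POSIX-style awk. The helper name types_demo and the variable names missing and count are invented for the example.

```shell
# types_demo is a hypothetical helper; everything runs in BEGIN, so no input is needed.
types_demo() {
  awk 'BEGIN {
    print "42" + 1            # numeric-looking string is converted to a number
    print "abc" + 1           # a string that is not a numeral counts as 0
    print "[" missing "]"     # an uninitialized variable is the empty string
    print missing + 0         # and it behaves as 0 in a numeric context
    count["GET"] = 3          # arrays spring into existence on first use
    count["POST"] = 1
    print count["GET"] + count["POST"]
  }'
}

types_demo
```
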
   Patterns 
  
 The patterns that can be used will fall into three broad categories: regular expressions, Boolean expressions, and special patterns. 
Regular and Boolean Expressions

The Awk regular expressions are your run of the mill regexes. They're not PCRE under awk (but gawk will support the fancier stuff - it depends on the implementation! See with awk --version), though for most usages they'll do plenty:

    /admin/ { ... }       # any line that contains 'admin'
    /^admin/ { ... }      # lines that begin with 'admin'
    /admin$/ { ... }      # lines that end with 'admin'
    /^[0-9.]+ / { ... }   # lines beginning with series of numbers and periods
    /(POST|PUT|DELETE)/   # lines that contain specific HTTP verbs
  

And so on. Note that the patterns cannot capture specific groups to make them available in the ACTIONS part of the code. They are specifically to match content.
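
Here is a small sketch of regex patterns at work. The four input lines and the helper name regex_demo are invented for the example.

```shell
# regex_demo is a hypothetical helper feeding made-up log lines to awk.
regex_demo() {
  printf '%s\n' \
    'admin logged in' \
    'user is admin' \
    '10.0.0.1 GET /index.html' \
    'client sent DELETE /users' |
  awk '
    /^admin/            { print "starts: " $0 }    # begins with admin
    /admin$/            { print "ends: " $0 }      # ends with admin
    /^[0-9.]+ /         { print "ip-like: " $0 }   # leading digits and periods
    /(POST|PUT|DELETE)/ { print "verb: " $0 }      # one of the listed verbs
  '
}

regex_demo
```
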
Boolean expressions are similar to what you would find in PHP or Javascript. Specifically, the operators && ("and"), || ("or"), and ! ("not") are available. This is also what you'll find in pretty much all C-like languages. They'll operate on any regular data type.

What's specifically more like PHP and Javascript is the comparison operator, ==, which will do fuzzy matching, so that the string "42" compares equal to the number 42, such that "42" == 42 is true. The operator != is also available, without forgetting the other common ones: >, <, >=, and <=.
You can also mix up the patterns: Boolean expressions can be used along with regular expressions. The pattern /admin/ || debug == 1 is valid and will match when a line that contains the word 'admin' is met, or whenever the variable debug is set to 1. (Awk has no true keyword: a bare true is just another uninitialized variable, so compare against a number instead.)

Note that if you have a specific string or variable you'd want to match against a regex, the operators ~ and !~ are what you want, to be used as string ~ /regex/ and string !~ /regex/.
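
A sketch mixing regexes, Boolean expressions, and the ~ family of operators. The log lines, the debug flag passed with -v, and the helper name bool_demo are assumptions made for the example.

```shell
# bool_demo is a hypothetical helper; the three-field log format is made up.
bool_demo() {
  printf '%s\n' \
    'GET /admin.html 200' \
    'GET /index.html 404' \
    'POST /login 200' |
  awk -v debug=0 '
    /admin/ || debug == 1       { print "admin or debug: " $0 }
    $3 == "404"                 { print "not found: " $2 }
    $1 !~ /^GET$/ && $3 == 200  { print "non-GET success: " $2 }
  '
}

bool_demo
```

Setting debug=1 on the command line would make the first pattern fire on every line, which is a cheap way to toggle verbose output.
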

Also note that all patterns are optional. An Awk script that contains the following:

    { ACTIONS }

Would simply run ACTIONS for every line of input.
 Special Patterns 

 There are a few special patterns in Awk, but not that many.   

The first one is BEGIN, which matches only before any line has been read from the input. This is basically where you can initiate variables and all other kinds of state in your script.

There is also END, which as you may have guessed, will match after the whole input has been handled. This lets you clean up or do some final output before exiting.
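
A minimal sketch of both special patterns around a pattern-less block. The helper name begin_end_demo and the lines counter are invented for the example.

```shell
# begin_end_demo is a hypothetical helper counting three made-up input lines.
begin_end_demo() {
  printf 'a\nb\nc\n' | awk '
    BEGIN { print "start"; lines = 0 }        # runs once, before any input
          { lines++ }                         # no pattern: runs for every line
    END   { print "lines seen: " lines }      # runs once, after all input
  '
}

begin_end_demo
```
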
 
 Finally, the last kind of pattern is a bit hard to classify. It's halfway between variables and special values, and they're called  Fields , which deserve a section of their own. 

 Fields 









 
Fields are best explained with a visual example:

    # According to the following line
    #
    # $1           $2    $3
    # <timestamp>  GET   /foo/bar.html
    # \_____________ _____________/
    #                $0

    # Hack attempt?
    /admin.html$/ && $2 == "DELETE" {
      print "Hacker Alert!";
    }
The fields are (by default) separated by white space. The field $0 represents the entire line on its own, as a string. The field $1 is then the first bit (before any white space), $2 is the one after, and so on.

A fun fact (and a thing to avoid in most cases) is that you can modify the line by assigning to its field. For example, if you go $0 = "HAHA THE LINE IS GONE" in one block, the next patterns will now operate on that line instead of the original one, and similarly for any other field variable!
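
A sketch of field access and of the "fun fact" about assigning to a field. The timestamps in the sample lines and the helper name fields_demo are made up for the example.

```shell
# fields_demo is a hypothetical helper; the two log lines are invented.
fields_demo() {
  printf '%s\n' \
    '00:34:23 GET /foo/bar.html' \
    '00:34:25 DELETE /admin.html' |
  awk '
    /admin.html$/ && $2 == "DELETE" { print "Hacker Alert! " $1 }
    { $2 = "[verb]"; print $0 }     # assigning to a field rewrites the line
  '
}

fields_demo
```

The second block sees the original line only because it comes after the first one; any pattern placed after it would see the rewritten line instead.
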
  Actions 
 
 There's a bunch of possible actions, but the most common and useful ones (in my experience) are: 
    { print $0; }   # prints $0. In this case, equivalent to 'print' alone
    { exit; }       # ends the program
    { next; }       # skips to the next line of input
    { a=$1; b=$0 }  # variable assignment
    { c[$1] = $2 }  # variable assignment (array)

    { if (BOOLEAN) { ACTION }
      else if (BOOLEAN) { ACTION }
      else { ACTION }
    }
    { for (i=1; i<x; i++) { ACTION } }
    { for (item in c) { ACTION } }
 This alone will contain a major part of your Awk toolbox for casual usage when dealing with logs and whatnot. 
The variables are all global. Whatever variables you declare in a given block will be visible to other blocks, for each line. This severely limits how large your Awk scripts can become before they're unmaintainable horrors. Keep it minimal.
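
Here is a sketch combining several of these actions: next, array assignment, if/else, and for-in. The input lines, the names hits and label, and the helper actions_demo are invented for the example. The output is piped through sort because for-in visits keys in no particular order.

```shell
# actions_demo is a hypothetical helper counting made-up requests per verb.
actions_demo() {
  printf '%s\n' 'GET /a' 'POST /b' 'GET /c' 'skip me' 'GET /d' |
  awk '
    $1 == "skip" { next }            # jump straight to the next input line
    { hits[$1]++ }                   # global array, keyed by the first field
    END {
      for (verb in hits) {           # iteration order is unspecified
        if (hits[verb] > 1) { label = "many" } else { label = "one" }
        print verb, hits[verb], label
      }
    }
  ' | sort
}

actions_demo
```

Note that hits is filled in the per-line block and read in END: that is the global-variable behaviour described above, used on purpose.
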
  Functions 
 
Functions can be called with the following syntax:

    { somecall($2) }

There is a somewhat restricted set of built-in functions available, so I like to point to regular documentation for these.

 User-defined functions are also fairly simple: 
    # function arguments are call-by-value (arrays are passed by reference)
    function name(parameter-list) {
         ACTIONS; # same actions as usual
    }

    # return is a valid keyword
    function add1(val) {
         return val + 1;
    }
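
A short sketch calling the add1 function from above next to two built-ins, length and toupper. The helper name functions_demo and the input numbers are made up for the example.

```shell
# functions_demo is a hypothetical helper; add1 mirrors the definition above.
functions_demo() {
  printf '1\n41\n' | awk '
    function add1(val) { return val + 1 }
    { print add1($1), length($1), toupper("done") }
  '
}

functions_demo
```
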
 Special Variables 
 

Outside of regular variables (global, instantiated anywhere), there is a set of special variables acting a bit like configuration entries:

    BEGIN { # Can be modified by the user
      FS = ",";   # Field Separator
      RS = "\n";  # Record Separator (lines)
      OFS = " ";  # Output Field Separator
      ORS = "\n"; # Output Record Separator (lines)
    }
    { # Maintained by Awk itself; normally only read
      NF          # Number of Fields in the current Record (line)
      NR          # Number of Records seen so far
      ARGV / ARGC # Script Arguments
    }
 I put the modifiable variables in  BEGIN  because that's where I tend to override them, but that can be done anywhere in the script to then take effect on follow-up lines. 
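
A sketch of these variables on comma-separated input. The sample records, the " | " output separator, and the helper name special_vars_demo are assumptions made for the example.

```shell
# special_vars_demo is a hypothetical helper; the CSV records are invented.
special_vars_demo() {
  printf 'alice,30,admin\nbob,25\n' | awk '
    BEGIN { FS = ","; OFS = " | " }   # split on commas, join output with " | "
    { print NR, NF, $1 }              # record number, field count, first field
  '
}

special_vars_demo
```

The commas in the print statement are what insert OFS between the values; concatenating with plain spaces would ignore it.
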

  Examples 
 
 That's it for the core of the language. I don't have a whole lot of examples there because I tend to use Awk for quick one-off tasks. 
 I still have a few files I carry around for some usage and metrics, my favorite one being a script used to parse Erlang crash dumps shaped like this: 

    =erl_crash_dump:0.3
    Tue Nov  : :  4319928
    Slogan: init terminating in do_boot ()
    System version: Erlang/OTP  ()  [source] [64-bit] [smp:8:8] [async-threads:10] [kernel-poll:false]
    Compiled: Fri Sep  [erts-6.2] : :
    Taints:
    Atoms:
    =memory
    total:
    processes: 8384804
    processes_used: 4319928
    system:
    atom: 382552
    atom_used: 339441
    binary: 1367680
    code: 8384804
    ets: 2183837
    =hash_table:atom_tab
    size:
    used: 12167
    ...
    =allocator:instr
    option m: false
    option s: false
    option t: false
    =proc:
    State: Running
    Name: init
    Spawned as: otp_ring0:start/2
    Run queue: 0
    Spawned by: []
    Started: Tue Nov  [kernel-poll:false] : :  18584
    Message queue length: 0
    Number of heap fragments: 0
    Heap fragment data: 0
    Link list: [, , ]
    Reductions: 382552
    Stack+heap:
    OldHeap:
    Heap unused:
    OldHeap unused:
    Memory: 29265
    Program counter: 0x  f  (f) (init:boot_loop/2   102
    CP: 0x 03 (invalid)
    =proc:
    State: Waiting
    ...
    =port:#Port
    Slot: 0
    Connected:
    Links:
    Port controls linked-in driver: efile
    =port:#Port
    Slot:
    Connected:
    ...

To yield the following result:

    $ awk -f queue_fun.awk $PATH_TO_DUMP
    MESSAGE QUEUE LENGTH: CURRENT FUNCTION
    ======================================
    : io:wait_io_mon_reply/2
    : io:wait_io_mon_reply/2
    80194: io:wait_io_mon_reply/2
    4319928: io:wait_io_mon_reply/2
    1367680: io:wait_io_mon_reply/2
    331087: io:wait_io_mon_reply/2
    ...

Which is a list of functions running in Erlang processes that caused mailboxes to be too large. Here's the script:
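
What follows is a minimal sketch of how such a script could look, not necessarily the actual queue_fun.awk. It assumes each process entry has a "Message queue length: N" line followed later by a "Program counter: 0x... (module:function/arity + offset)" line, and it takes a threshold argument (default 1000, an arbitrary choice) for what counts as a large mailbox. The shell wrapper and the tiny dump used to exercise it are made up.

```shell
# queue_fun is a hypothetical wrapper: queue_fun DUMP_FILE [THRESHOLD]
queue_fun() {
  awk -v threshold="${2:-1000}" '
    BEGIN {
      print "MESSAGE QUEUE LENGTH: CURRENT FUNCTION"
      print "======================================"
    }
    # Remember the queue length, and whether it is large enough to report.
    /^Message queue length: / { big = ($4 >= threshold); len = $4 }
    # Field 4 looks like "(module:function/arity"; drop the leading parenthesis.
    big && /^Program counter: / { print len ": " substr($4, 2); big = 0 }
  ' "$1"
}

# Usage with a tiny invented dump:
dump=$(mktemp)
printf '%s\n' \
  '=proc:<0.1.0>' \
  'Message queue length: 5' \
  'Program counter: 0x01 (init:boot_loop/2 + 64)' \
  '=proc:<0.2.0>' \
  'Message queue length: 2000' \
  'Program counter: 0x02 (io:wait_io_mon_reply/2 + 56)' > "$dump"
queue_fun "$dump" 1000
rm -f "$dump"
```

The whole thing is two patterns and a flag: one pattern records state when a queue is big, the other prints when that state is set. That is the pattern/action model from the start of this article and nothing more.
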
    
 Can you follow along? If so, you can understand Awk. Congratulations. 
         
