awk is a great tool for working with columnar data a line at a time,
but the language itself is a bit limited. Usually when it's not quite
powerful enough, I turn to perl. But... choice is good. Enter
tawk, which uses tcl as the scripting language. It's designed to
be very familiar to anyone coming from an awk background.
Invocation is much like awk:
tawk [OPTIONS] ['script'] [var=value | filename] ...
Reads from standard input if no filenames are given on the command line.
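As a rough illustration of the synopsis, the sketch below prints the first field of each line, once from a named file and once from standard input. The sample file and its contents are made up for the example, and it assumes the line and print commands behave as described further down. The guard only skips the calls on machines where tawk is not on the PATH.

```shell
# Hypothetical sample data, created only for this example.
sample=$(mktemp)
printf 'alice 30 london\nbob 25 paris\n' > "$sample"

if command -v tawk >/dev/null 2>&1; then
    # Script given inline, input read from a named file.
    tawk 'line { print $F(1) }' "$sample"
    # No filename given, so input is read from standard input.
    tawk 'line { print $F(1) }' < "$sample"
fi
```

The script is wrapped in single quotes so that the shell leaves $F(1) alone and passes it through to tcl.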
Dependencies are tcl 8.6 and tcllib. Copy the tawk script to
/usr/local/bin or wherever - it's a single, self-contained script.
tawk accepts the following options:
-F regexp
    Sets the field separator (FS).
-f filename
    Read the script from the given file instead of it being the first non-option command line argument.
-safe
    Run the script in a safe tcl interpreter. Meant for untrusted code.
-timeout N
    Exit with an error if a script takes more than N seconds to complete.
-csv
    Turn on CSV line parsing. Prefer this over setting FS to a comma.
-quotechar C
    Use the given character instead of double quote for quoted CSV fields.
-quoteall
    Always quote every CSV field when printing.
tawk adds the following commands on top of basic tcl:
BEGIN script
    Executed at the beginning of processing, before any data.
END script
    Executed at the end, after processing all data.
BEGINFILE script
    Executed at the beginning of each file.
ENDFILE script
    Executed at the end of reading each file.
line script
    Executed for every line read.
line test script
    If test returns true when evaluated by expr, execute the script.
rline [-field N] re script
    If the regular expression re matches the line (or the specified field), execute the script.

print [arg ...]
    Print out all its arguments joined by $OFS, or $F(0) if called with no arguments.
csv_join arglist [delim] [quotechar] [quotemode]
    Return the list joined into a CSV-formatted string.
csv_split string [delim] [quotechar]
    Split a CSV-formatted string into a list.

continue
    Stops processing the current line and goes to the next. Like next in awk.
break
    Stops processing the current file and goes on to the next.
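To show how these commands might fit together, here is a minimal sketch that counts error responses in a made-up log file and prints the path of every POST request. The sample data and the errors variable are invented for the example. It also assumes that a variable set in BEGIN stays visible in later blocks, which the descriptions above do not state explicitly. The calls are skipped if tawk is not installed.

```shell
# Hypothetical log data: method, path, status.
sample=$(mktemp)
printf 'GET /index.html 200\nGET /missing 404\nPOST /login 200\n' > "$sample"

if command -v tawk >/dev/null 2>&1; then
    tawk '
        BEGIN { set errors 0 }
        # Runs only when the expr test is true for the current line.
        line {$F(3) >= 400} { incr errors }
        # Runs when the regular expression matches field 1.
        rline -field 1 {^POST$} { print $F(2) }
        # Assumes the counter set in BEGIN is still in scope here.
        END { print "errors:" $errors }
    ' "$sample"
fi
```

The test in the second block is a plain expr expression, so any tcl comparison or function call that expr accepts should work there.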
tawk sets the following variables. Most of the names are lifted straight from awk.
F
    An array holding the columns of the line. $F(0) is the whole line. Setting a new element above NF fills in the missing interval with empty strings. Setting F(0) rebuilds the rest of the array based on splitting the new value.
NF
    The number of fields in the current line. Modifying this adjusts F.
NR
    The current line number.
FNR
    The line number within the current file.
FILENAME
    The name of the current file, - for standard input.
INFILE
    The file handle of the current file. Only set in BEGINFILE, line and rline blocks.
FS
    If set, a single character or regular expression that is used to indicate field delimiters. If a single space, or not set, any amount of whitespace is used, and leading and trailing whitespace is first stripped. If an empty string, splits every character into its own field. Can only be a single character in CSV mode.
OFS
    Used to separate fields in F(0) when other elements of F are written to or NF is changed. Also used to separate arguments of print. Can only be a single character in CSV mode.
CSV
    1 if in CSV mode, 0 if in normal mode. (Read-only)
CSVQUOTECHAR
    When in CSV mode, the character used to quote fields. Set by the -quotechar option. Defaults to a double quote.
CSVQUOTE
    Set to always to always quote CSV fields (turned on by the -quoteall argument), or auto to only quote when needed. Attempting to set other values raises an error. Defaults to auto.
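The variables are easier to follow in a small run. The sketch below overwrites one field, so that F(0) is rebuilt with OFS, and then prints a few of the bookkeeping variables. The sample data is made up. Passing the separator as a separate argument after -F and setting OFS inside BEGIN are assumptions based on the descriptions above, not confirmed behavior. The calls are skipped if tawk is not installed.

```shell
# Hypothetical colon-separated sample data.
sample=$(mktemp)
printf 'root:x:0:0\ndaemon:x:1:1\n' > "$sample"

if command -v tawk >/dev/null 2>&1; then
    # -F sets FS; OFS is set in BEGIN (an assumption) so rebuilt lines use ":" too.
    tawk -F : '
        BEGIN { set OFS : }
        line {
            # Writing to a field rebuilds F(0) using OFS.
            set F(2) hidden
            # With no arguments, print writes F(0).
            print
            # Line number, field count and file name, joined by OFS.
            print $NR $NF $FILENAME
        }
    ' "$sample"
fi
```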
If invoked with the -csv option, the output field separator (OFS)
is set to a comma instead of a space, and print joins its arguments
with CSV escaping.
When reading fields, the default field separator (FS), if not
explicitly set, is a comma, and only single-character separators are
supported. Lines are split by a CSV-aware parser, so commas in quoted
fields don't count as separators, unlike when FS is simply set to a
comma in normal mode. The CSVQUOTECHAR variable controls the character
used to quote fields (it defaults to a double quote and is set by the
-quotechar option).
Also, the gets command reads a full CSV record, which may span
multiple lines.
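A short sketch of CSV mode, assuming the options combine as the descriptions above suggest: it swaps the two columns of a made-up file whose first field contains a quoted comma, first with default quoting and then with -quoteall. The calls are skipped if tawk is not installed.

```shell
# Hypothetical CSV data with a comma and doubled quotes inside quoted fields.
sample=$(mktemp)
printf '%s\n' 'name,comment' '"Smith, Jane","said ""hi"""' > "$sample"

if command -v tawk >/dev/null 2>&1; then
    # In CSV mode the quoted comma stays inside field 1.
    tawk -csv 'line { print $F(2) $F(1) }' "$sample"
    # Same fields, with every field quoted on output.
    tawk -csv -quoteall 'line { print $F(2) $F(1) }' "$sample"
fi
```

Running the same script with -F , instead of -csv would split "Smith, Jane" at the comma, which is why -csv is the better choice for real CSV files.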