XMLgawk is a very useful tool when you want to quickly extract specific data from an XML file. As the name suggests, XMLgawk combines XML and gawk. Awk is a scripting language that makes it easy to process data stored in text files. Gawk is the GNU implementation of awk. By default, awk processes a text file line by line, and for each line it carries out the actions whose patterns match. XMLgawk works in much the same way, except that it processes XML files node by node. A simple example is given below.
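Before turning to XML, a minimal sketch of this pattern-action model in plain awk may help. It needs no XML extension. The helper name report_matches and the three input lines are made up for illustration:

```shell
#!/bin/sh
# report_matches (a hypothetical helper name) reads text on stdin.
# awk reads the input one line (record) at a time and tests every
# pattern against each line; an action runs only when its pattern matches.
report_matches() {
  awk '
    # Pattern /A27/: this action runs only for lines containing "A27".
    /A27/ { n++; print NR ": " $0 }
    # END: runs once, after the last line has been read.
    # (n + 0) prints 0 instead of an empty string when nothing matched.
    END { print (n + 0) " of " NR " lines matched" }
  '
}

# Three made-up input lines.
printf '%s\n' 'A27 North' 'N-Roads West' 'A27 South' | report_matches
```

Running it prints "1: A27 North", "3: A27 South" and "2 of 3 lines matched". XMLgawk keeps this structure. Its patterns, however, test XML events such as the start or end of an element instead of text lines.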
XMLgawk was written by Jürgen Kahrs, with help from Stefan Tramm, Manuel Collado and Andrew Schorr. More information can be found on the XMLgawk home page.
I started building my own XMLgawk binaries because I found the concept very useful and no binaries were available that run on native Windows. Compiling under Windows required some modifications to the source files, as explained on the page Building XMLgawk. The binaries are available in the Downloads section.
Consider the XML file VectorDb.xml, which is an index file for a very limited geographical database of the Netherlands. The following simple script, Parse-VectorDb.awk, extracts the paths and file names of all files listed in the index file.
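# Parse the vector database
@load xml
# When a "file" element starts, remember its attributes
XMLSTARTELEM == "file" {
    path = XMLATTR["path"]
    filename = XMLATTR["filename"]
}
# When the "file" element ends, print what was collected
XMLENDELEM == "file" {
    print path, filename
}
Running the command:
gawk -f Parse-VectorDb.awk VectorDb.xml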
gives the following output:
Roads\Highways A27-North.tab
Roads\Highways A27-South.tab
Roads\N-Roads N-Roads-West.tab
Roads\N-Roads N-Roads-East.tab
Cities\Large-cities Amsterdam.tab
Cities\Large-cities Rotterdam.tab
Cities\Large-cities Den Haag.tab
The command @load xml loads the XML extension, which puts gawk into "XML mode". The pattern XMLSTARTELEM == "file" executes the following code block whenever an element named "file" is entered.
With path = XMLATTR["path"], the script variable path is set to the value of the attribute "path" of the "file" element. The next line does the same for the attribute "filename". Finally, the pattern XMLENDELEM == "file" executes the print statement when the "file" element is closed.
The full possibilities of XMLgawk are explained in the XMLgawk manual.
The zip file contains all binaries needed to run XMLgawk on a PC with a standard Windows installation.
Download:
xgawk_w32bin_316a_alpha-4.zip
The files can be placed in the following locations:
gawk.exe, libexpat-1.dll and libiconv-2.dll need to be in the current directory or in a directory contained in your PATH (e.g. c:\WINDOWS\system32 or c:\Programs\MinGW\bin).
xml.dll needs to be in the current directory, in a directory listed in AWKLIBPATH, or in a directory listed in deflibpath. On my system it is in c:\Programs\MinGW\lib\awk.
You can set AWKLIBPATH yourself by issuing:
set AWKLIBPATH=[your_path]
deflibpath is defined in gawk.exe, with the current definition:
c:\Windows;c:\Windows\System32;c:\Programs\MinGW\bin;c:\Programs\MinGW\lib;c:\Programs\MinGW\lib\awk
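When xml.dll cannot be found, it may help to list a search path such as deflibpath or AWKLIBPATH one directory at a time. The following is a minimal sketch, assuming a POSIX shell such as the MSYS shell often used alongside MinGW. The helper name list_dirs is hypothetical and not part of XMLgawk. The sketch only splits the list. It does not reproduce how gawk itself searches for extensions.

```shell
#!/bin/sh
# list_dirs (a hypothetical helper name) prints each directory of a
# semicolon-separated search path on its own line, using awk with ";"
# as the field separator.
list_dirs() {
  printf '%s\n' "$1" | awk -F';' '{
    for (i = 1; i <= NF; i++) {
      dir = $i
      sub(/^[ \t]+/, "", dir)      # trim leading blanks from the entry
      if (dir != "") print dir     # skip empty entries such as ";;"
    }
  }'
}

list_dirs 'c:\Windows;c:\Windows\System32;c:\Programs\MinGW\lib\awk'
```

Each printed directory can then be checked for xml.dll.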