Importing tables into AWK

Data Mining
CS 510 (DM)
Winter, 2004

 copyleft() {
        cat<<-EOF
        readTableEg:  importing tables into AWK
        Copyright (C) 2004 Tim Menzies
        This program is free software; you can redistribute it and/or
        modify it under the terms of the GNU General Public License
        as published by the Free Software Foundation, version 2.
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
        You should have received a copy of the GNU General Public License
        along with this program; if not, write to the Free Software
        Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
        EOF
  }

Motivation

Sometimes you want to import tables of data where the first row is a list of x-labels, the first column is a set of y-labels, and the other cells are data at position x-y. For example:

  ,  xl,   vl,    l,    n,    h,   vh,  xh
  prec,    , 6.20, 4.96, 3.72, 2.48, 1.24,
  flex,    , 5.07, 4.05, 3.04, 2.03, 1.01,
  arch,    , 7.07, 5.65, 4.24, 2.83, 1.41,
 ...

These tables have N fields and L lines. Internally, these are converted to tables with M=N-1 columns and K=L-1 rows (since field one and line one are for labels, not data).
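As a quick check of that arithmetic, the sketch below counts the fields on the label line and the lines in the file, then reports M and K. The function name tableDims and the file name dims.dat are made up for this illustration; it assumes every line has the same number of commas, and it calls plain awk so that it does not depend on gawk.

```shell
# tableDims FILE: hypothetical helper, not part of readTableEg.
# Prints M (data columns) and K (data rows) for a comma-separated table.
tableDims() {
  awk -F, '
    NR == 1 { n = NF }                               # N: fields on the label line
    END     { print "M=" (n - 1), "K=" (NR - 1) }    # drop field one and line one
  ' "$1"
}

# Sample data copied from the table above.
dir=$(mktemp -d)
cat > "$dir/dims.dat" <<'EOF'
  ,  xl,   vl,    l,    n,    h,   vh,  xh
  prec,    , 6.20, 4.96, 3.72, 2.48, 1.24,
  flex,    , 5.07, 4.05, 3.04, 2.03, 1.01,
  arch,    , 7.07, 5.65, 4.24, 2.83, 1.41,
EOF

tableDims "$dir/dims.dat"    # prints: M=7 K=3
```

Here N=8 and L=4, so M=7 and K=3. Blank cells still count as columns, which is why the empty xl and xh cells do not change M.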


Usage

This demo's usage is as follows:

 usage() {
        cat<<-EOF
        Usage: readTableEg
        Show gawk reading in tables of numbers
        Flags: 
          -h        print this help text
          -l        copyright notice
        EOF
        exit
 }

Example

A standard usage would be to have some config file storing the table file and name with lines like:

 %TABLE postArchScaleFactors2000.dat       sf2000
 %TABLE postArchScaleFactors1983.dat       sf1983

Which could be imported into the Headers and Table arrays as follows:

 /%TABLE/ {readTable($2,$3, Headers,Table)}

For the above table, the expression Table["sf2000","prec","vl"] would yield 6.2: the key is the table name, then the row label, then the column label.
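A lookup like this works because awk stores a multi-part subscript as one string whose parts are joined by SUBSEP. The sketch below fills two cells by hand instead of calling readTable, using the key order that readTable and the demo code use (table name, row label, column label). The function name lookupDemo is made up for this illustration.

```shell
# lookupDemo: hypothetical demo; fills two cells by hand instead of calling readTable.
lookupDemo() {
  awk 'BEGIN {
    # Keys follow the order readTable uses: table name, row label, column label.
    Table["sf2000", "prec", "vl"] = 6.20 + 0
    Table["sf2000", "flex", "vl"] = 5.07 + 0

    print Table["sf2000", "prec", "vl"]

    # Test membership with "in" so that a miss does not create an empty entry.
    if (("sf2000", "arch", "vl") in Table) {
      print "arch found"
    } else {
      print "arch missing"
    }

    # Each key is one string joined by SUBSEP; split recovers the parts.
    for (key in Table) {
      split(key, part, SUBSEP)
      if (part[3] == "vl") n++
    }
    print n " cells in column vl"
  }'
}

lookupDemo
```

This should print 6.2, then "arch missing", then "2 cells in column vl". Swapping the last two subscripts silently returns an empty string rather than an error, so it pays to keep the key order consistent.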

Installation

First, if you have installed anything from this site before, save your config file to somewhere safe.

Second, copy the following files to your directory (from either ~timm/public_html/dm or http://www.cs.pdx.edu/~timm/dm or from http://www.cs.pdx.edu/~timm/dm/readTableEg.zip): config, readTableEg, lib.awk, readTable.awk.

Third, make readTableEg executable:

 chmod +x readTableEg

Fourth, compare your safe version of config with the new version you just copied and fix up any paths.

Fifth, edit this file and config. The first line of this file should point to your local bash shell, and you'll need to check at least the #paths section in config.

Check that it all works:

 readTableEg

If the installation worked, then you should see 6.2 printed (the prec/vl cell, 6.20, converted to a number).


Source code

Settings

 . config

Demo code

 demoReadTable() {
        cat<<-EOF> readTableEg.dat
             ,  xl,   vl,    l,    n,    h,   vh,  xh
         prec,    , 6.20, 4.96, 3.72, 2.48, 1.24,
         flex,    , 5.07, 4.05, 3.04, 2.03, 1.01,
         arch,    , 7.07, 5.65, 4.24, 2.83, 1.41,
        EOF
        cat<<-EOF> readTableEgSpec.dat
        %TABLE readTableEg.dat   sf2000
        EOF
        cat <<-"EOF"> readTableEg.awk
        /%TABLE/ {readTable($2,$3, Headers,Table)}
        END      {#for(i in Table) print  i ;
                 print Table["sf2000","prec","vl"]}
        EOF
        gawk -f lib.awk -f readTable.awk \
             -f readTableEg.awk readTableEgSpec.dat
 }

readTable.awk: The Worker

This function accepts as arguments the table file, a name for the table, and two arrays: header and table. On execution, the function adds header[name,1..M], one entry per column label. It also adds table[name,y,x] for each non-blank data cell, where y is the cell's row label and x is its column label (so at most K*M entries per table).

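 function  readTable(file,name,header,table,   a,i,line,lines,n,what) {
  # exists, die, blank and trim are not defined here (presumably lib.awk supplies them).
  # lines counts the lines already read, so it is 0 (false) on the label line.
  if (! exists(file)) die("missing file " file);
  while ((getline line < file) > 0) {
    n=split(line,a,/,/);
    for(i=1;i<=n;i++) {
      if (! blank(a[i]) ){                 # skip empty cells
        if (i> 1) {
          # data line: store the cell as a number; label line: store the column label
          if (lines) {table[name,what,header[name,i-1]]=a[i]+0}
          else {header[name,i-1]=trim(a[i])}}
        else {
          # field one of a data line is the row label
          if (lines) {what=trim(a[i])}}}}
    lines++;
  }
  close(file);
 }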
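To try the function without the rest of the site's files, the sketch below wraps it in a shell function together with stand-ins for trim, blank, exists and die. Those four helpers are assumptions written for this illustration; their real definitions (presumably in lib.awk) are not shown on this page and may differ. The wrapper name cellOf and the file name sf.dat are also made up, and plain awk is used so that the sketch does not depend on gawk.

```shell
# cellOf FILE NAME ROW COL: hypothetical wrapper that loads FILE with
# readTable and prints one cell.
cellOf() {
  awk -v file="$1" -v name="$2" -v row="$3" -v col="$4" '
    # Stand-ins for the lib.awk helpers (assumed behaviour, not the originals).
    function trim(s)   { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    function blank(s)  { return s ~ /^[ \t]*$/ }
    function die(msg)  { print msg > "/dev/stderr"; exit 1 }
    function exists(f,   line, ok) {
      ok = ((getline line < f) >= 0)   # getline returns -1 if f cannot be read
      close(f)                         # so that readTable starts at line one
      return ok
    }

    # readTable as listed above, reformatted.
    function readTable(file, name, header, table,   a, i, line, lines, n, what) {
      if (!exists(file)) die("missing file " file)
      while ((getline line < file) > 0) {
        n = split(line, a, /,/)
        for (i = 1; i <= n; i++) {
          if (blank(a[i])) continue
          if (i > 1) {
            if (lines) { table[name, what, header[name, i-1]] = a[i] + 0 }
            else       { header[name, i-1] = trim(a[i]) }
          } else if (lines) {
            what = trim(a[i])
          }
        }
        lines++
      }
      close(file)
    }

    BEGIN {
      readTable(file, name, Headers, Table)
      print Table[name, row, col]
    }
  '
}

# Sample data copied from the demo code.
tmp=$(mktemp -d)
cat > "$tmp/sf.dat" <<'EOF'
     ,  xl,   vl,    l,    n,    h,   vh,  xh
 prec,    , 6.20, 4.96, 3.72, 2.48, 1.24,
 flex,    , 5.07, 4.05, 3.04, 2.03, 1.01,
 arch,    , 7.07, 5.65, 4.24, 2.83, 1.41,
EOF

cellOf "$tmp/sf.dat" sf2000 prec vl    # prints: 6.2
```

Asking for a blank cell, such as row prec and column xl, prints an empty line, because readTable never stores blank cells.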

Command line processing

 while getopts "hl" flag
 do case "$flag" in
        l)  copyleft; exit;;
        h)  usage; exit ;;
    esac
 done
 demoReadTable


Credits

Author

Tim Menzies, tim@menzies.us, http://menzies.us

Software

This page generated by Site: see http://www.cs.pdx.edu/~timm/dm/site.html

Acknowledgements

This site is built using PerlPod.

Style sheet switching method taken from Eddie Traversa's excellent and simple-to-apply tutorial: http://dhtmlnirvana.com/content/styleswitch/styleswitch1.html.

Search engine powered by ATOMZ http://www.atomz.com/search/. Note, the indexes to this site are only updated weekly (heh, it's a free service; what more ja want?).

Icons on this site come from http://www.sql-news.de/rubriken/olap.asp and http://www.ifnet.it/webif/centrodi/eng/toolbar.htm.

The JAVA machine learners used at this site come from the extensive data mining libraries found in the University of Waikato's Environment for Knowledge Analysis (the WEKA) http://www.cs.waikato.ac.nz/ml/weka/


Legal

Copyright

Copyright (C) Tim Menzies 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; see http://www.gnu.org/copyleft/gpl.html. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Disclaimer

The content from or through this web page is provided 'as is' and the author makes no warranties or representations regarding the accuracy or completeness of the information. Your use of this web page and information is at your own risk. You assume full responsibility and risk of loss resulting from the use of this web page or information. If your use of materials from this page results in the need for servicing, repair or correction of equipment, you assume any costs thereof. Follow all external links at your own risk and liability.
