Unixy goodness: use awk to group and average data

As a software tester, you are sometimes called upon to performance-test a web service and present the results in a nice chart to impress your manager. JMeter is commonly used to thrash your server and produce insane amounts of throughput data. If you’re running 1000 tpm (transactions per minute), this can be rather a lot of data (180,000 transactions for a 3-hour test run). This is beyond the capability of JMeter’s inbuilt graphics package and is too much to import into Excel.

My solution is to group throughput per minute and average transaction time for each minute. Attached below is a script for processing a JTL log file from JMeter. It reduces a 3-hour test run to 180 data points, which is much easier to represent with a chart program such as Excel.

The script uses a few neat awk tricks, such as:

  • Round Unix timestamps down to the start of the minute
  • Collect timestamps grouped by minute
  • Convert Unix timestamps to YYYY-MM-dd etc.
  • Print throughput for each one-minute increment
  • Print average response time for each one-minute increment
  • Do all of the above in an efficient single pass through awk (this was the hardest bit!)
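
The grouping and averaging tricks listed above can be sketched in a few lines of portable awk. This is a condensed illustration, not the full script: the function name `summarize_per_minute` is hypothetical, and it assumes each input line starts with a 13-digit millisecond timestamp followed by the elapsed time in milliseconds.

```shell
# Hypothetical helper (name chosen for illustration only).
# Reads "epoch_ms,elapsed_ms,..." CSV lines on stdin and prints, per bucket:
#   bucket_start_epoch <TAB> transaction_count <TAB> average_elapsed_ms
summarize_per_minute() {
  awk -F',' -v step="${1:-60}" '
    {
      sec = int($1 / 1000)        # drop the milliseconds
      bucket = sec - sec % step   # floor to the start of the bucket
      count[bucket]++             # throughput for this bucket
      total[bucket] += $2         # running sum of response times
    }
    END {
      # array order is unspecified in awk, so the output is sorted afterwards
      for (b in count)
        printf("%d\t%d\t%d\n", b, count[b], total[b] / count[b])
    }' | sort -n
}

printf '%s\n' \
  '1162242888889,101,TEST_applyCredit,200' \
  '1162242889190,30,TEST_applyCredit,200' \
  '1162242918672,30,TEST_applyCredit,200' | summarize_per_minute 60
# → 1162242840	2	65
# → 1162242900	1	30
```

The two associative arrays share the same key (the bucket start), so one pass over the data is enough to produce both the count and the average. The `%d` format truncates the average, so 65.5 prints as 65.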

Hat tip: Jadu Saikia for excellent awk tips.

Recommended link: Improve the quality of your JMeter scripts

::Code is below the fold::

#!/bin/bash
# jtlmin.sh :
#   JMeter log processing script
#   Collects & Averages throughput data using 1-minute increments
#   Requires CSV-formatted log from JMeter "Simple Data Writer".
#   Requires gawk (strftime is a gawk extension).
#   Version   Date          Author      Comment
#       2.0   2009-02-17    R. Papesch  Refined awk procedure, renamed variables
#       1.0   2006-11-28    R. Papesch
#set -x  #debug

USAGE="Usage: jtlmin.sh <filename>\nSummarizes JMeter JTL output into 1-minute blocks"
[ -n "$1" ] || { echo -e "$USAGE"; exit 1 ; }
echo -e "Processing \c"
ls "$1" || { exit 1 ; }

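main()
{
  # NOTE: these two file names are assumptions; the post does not show the originals
  WORKFILE="$1.jtlmin.$$.WORKING"
  OUTFILE="$1.jtlmin.$$.OUT"
  STEP=60       # <-- can modify this value for larger time increments

  # Workfile: chop milliseconds, round timestamps down to the start of the minute.
  # sed drops line 1 of the input; cut assumes a 13-digit millisecond timestamp.
  sed -n '2,$ p' "$1" | cut -c -10,14- | sort | awk -F',' -v step=$STEP '{print $1-$1%step,$2}' > "$WORKFILE"

  echo "Outputting data to $OUTFILE .."
  echo "$PWD/$1" >> "$OUTFILE"
  echo -e "unixtime \tdate \ttime \tthruput(tpm) \tresponse(ms) " >> "$OUTFILE"
  awk_routine | sort >> "$OUTFILE"
}

awk_routine()
{
  # minute[t] counts transactions in bucket t; rsum[t] sums their response times.
  # NR!=1 skips the first workfile line (kept as in the original).
  awk '
    NR!=1 {minute[$1]++; rsum[$1]=rsum[$1]+$2}
    END {
      for (i in minute) {
        printf("%d\t", i);
        printf("%s\t", strftime("%Y.%b.%d",i));
        printf("%s\t", strftime("%H:%M",i));
        printf("%d\t", minute[i]);
        printf("%d\n", rsum[i]/minute[i])
      }
    }' "$WORKFILE"
}

main "$@"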
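
The `strftime` calls in the script above are a gawk extension, so the date and time columns will not work with every awk. As a minimal sketch of that conversion on its own, the hypothetical helper `epoch_to_columns` below formats one epoch-seconds value the same way. Pinning `TZ=UTC` and `LC_ALL=C` is an assumption made here so the output is reproducible; the script itself uses the local time zone and locale.

```shell
# Hypothetical helper (name chosen for illustration only); requires gawk.
# Prints the date and time columns for one epoch-seconds value, in UTC.
epoch_to_columns() {
  TZ=UTC LC_ALL=C gawk -v ts="$1" 'BEGIN {
    printf("%s\t%s\n", strftime("%Y.%b.%d", ts), strftime("%H:%M", ts))
  }'
}

if command -v gawk >/dev/null 2>&1; then
  epoch_to_columns 1162242840
  # → 2006.Oct.30	21:14
else
  echo "gawk not found; strftime is unavailable in this awk" >&2
fi
```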

The input is a CSV-formatted log from JMeter 2.2. Example (anonymized):

1162242888889,101,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,1,7,
1162242889190,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,12,
1162242918672,30,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162242949176,20,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162242978669,20,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243009182,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243038665,30,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243069179,20,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243098671,20,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243129165,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243158667,30,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243189171,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243218664,20,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243249177,31,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243278660,30,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243309174,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
1162243338666,30,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,2,14,
1162243369170,30,TEST_applyCredit,200,,applyCredit 1-2,text,true,672,2,14,
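
To see what the first stage of the script does to one of these lines, the two steps can be run on their own. The helper names `chop_ms` and `floor_to_step` are hypothetical, introduced only for this sketch; the commands inside them are the ones the script uses.

```shell
# Hypothetical helpers (names chosen for illustration only), one per pipeline stage.
# chop_ms: keep characters 1-10 and 14 onward, dropping the 3 millisecond digits.
chop_ms()       { cut -c -10,14- ; }
# floor_to_step: print the timestamp floored to the step, then the elapsed time.
floor_to_step() { awk -F',' -v step="${1:-60}" '{print $1-$1%step,$2}' ; }

line='1162242888889,101,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,1,7,'

echo "$line" | chop_ms
# → 1162242888,101,TEST_applyCredit,200,,applyCredit 1-1,text,true,672,1,7,

echo "$line" | chop_ms | floor_to_step 60
# → 1162242840 101
```

Because `cut` works on fixed character positions, this only holds while the timestamp is exactly 13 digits long, which is the case for millisecond timestamps like the ones in the sample.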

7 thoughts on “Unixy goodness: use awk to group and average data”

  1. This is great, however it’s missing quite a bit of information about the input file. An example of the input would be invaluable here.. I have no idea what fields you’ve included/excluded in JMeter, and without trying to parse what you’ve written here it’s impossible to figure out what to feed into your script.
