Elementary execution timing using QueryPerformanceCounter

At some point in your career as a developer, you will need to optimise your code to make it run faster or more economically. If you have not been asked by a user to “make that report or feature run/finish faster”, hang on in there, your time will not be far off.

There are many ways of optimising your code such that it offers the user a more responsive experience (i.e. it runs or feels faster). You may well have implemented an algorithm in the knowledge that it is slow, but gets the job done (e.g. a bubble sort). You have then gone on to optimise the algorithm by replacing it with something that you know to be much faster (e.g. the quicksort). Alternatively, your application may have grown over a long period of time, some calculations are now relying on so much data that it is time to re-think how they work. Or, as is the case here, two factors have resulted in part of my application taking a performance hit:

Growth of code that performs progressively cumulative calculations, i.e. calculate a & b, c is based on a percentage of (a+b), d is based on a percentage of (a+b+c), where a, b and c are recalculated when d is requested – yes I know it’s by no means the best way, however the code was reasonably elegant and more importantly, it was very understandable.
Acceptance that the calculations work and that the code looks reasonably elegant, albeit the calculations take “some time” for relatively few records.

How did the need for optimisation raise its head? Well, in this case, simply adding 20-30 records to my application then viewing the on-screen reports was sufficient enough to make me augment the code behind the reports with an hourglass (waitcursor). Whilst the reports worked, the lack of user feedback after the hourglass appeared was of concern to me. For this particular application, 20-30 records is a little more than the average that I would expect it to have to cope with.

Elementary Execution Timing
Luckily, the Win32 API provides a couple of useful methods that we can use to “time” our function calls. Here’s a class that surfaces those methods for use in C#:

[code lang=”C#”]
using System.Runtime.InteropServices;

namespace uSimpleExecutionTimer
{
public class SimpleExecutionTimer
{
[DllImport(“kernel32.dll”)]
extern static short QueryPerformanceCounter(ref long x);
[DllImport(“kernel32.dll”)]
extern static short QueryPerformanceFrequency(ref long x);

private long counter1 = 0;
private long counter2 = 0;
private long freq = 0;

public long TickCount
{
get { return counter2 – counter1; }
}

public long TickFrequency
{
get { return freq; }
}

public long StartTick
{
get { return counter1; }
}

public long EndTick
{
get { return counter2; }
}

public double TickCountSeconds
{
get { return (counter2 – counter1) * 1.0 / freq; }
}

public bool Start()
{
return QueryPerformanceCounter(ref counter1) != 0;
}

public void Stop()
{
QueryPerformanceCounter(ref counter2);
QueryPerformanceFrequency(ref freq);
}
}
}
[/code]

In use, it looks like this:

[code lang=”‘C#”]
SimpleExecutionTimer set = new SimpleExecutionTimer();

set.Start();
PopulateSummaryGrid();
PopulateDetailGrid();
set.Stop();

MessageBox.Show(set.TickCount.ToString());
[/code]

Granted, you might want to do something a little more scientific than a simple MessageBox.Show, however this is enough to give us our first clue relating to the execution time of the two Populate…() methods.

Before optimisation
When I measured the execution time of the two Populate…() methods in my un-optimised application, the TickCount was 14,870,258. With a QueryPerformanceFrequency of 3579545, that made for a delay of over 4 seconds. Given that there were only 20 records being processed, 4 seconds is unacceptable, even if the results are correct. With a few other things thrown into the equation, we were at nearly 30 seconds as noted earlier.

The problem with augmentation
Of course, whilst the use of QueryPerformanceCounter/SimpleExecutionTimer is the need to add extra lines of code to your application. Removing the lines of code often means “commenting out” ot physically deleting the lines – both of which involve you touching your code thus introducing the possibility of accidental error. Perhaps more obviously, adding extra lines of code to your application necessitates at least a partial re-compile.

The problem: well, without going into detail, I’ll save that for another posting/review, it was compounded calculations.
Basically, I had a series of about 15 calculations like so:

a = CalcA(); // Iterates over a recordset performing up to 15 calculations per record
b = CalcB(); // Iterates over the same recordset performing a different calculation, again 15 times per record
c = (a + b) + 5% of (a+b);
d = c + 10% of c
…

My application required access to a and b on their own, but also required access to c. Simplicity and neatness meant that my calculations, whilst elegant, were repetitive and rather slow. Fortunately I had identified this up front, and wanted to move the application to “feature complete” before spending time optimising it. Thus when my application requested the value of c it actually went and recalculated a and b. Twice. And then we have the calculation for d…you can see the repetition taking its toll, however the calculations are reasonably elegant and obvious.

The solution: I created a new class that performed the calculations once and once only. With the problem identified and a solution in place, how long did the two Populate…() methods take? Well, the results were somewhat good: the TickCount was only 75,000…fractions of a second. So fast, the populating of the two grids appeared instantaneous. So much so, I could have disabled the code that turned the hourglass on and off! (But I didn’t!)

Code augmentation (adding extra lines of code) is not too bad if you are looking to perform a series of benchmarking exercises. For example, a few years ago I did some work that benchmarked the XML Document Object Model (DOM) versus the Simple API for XML (SAX). I wrote a small application that used QueryPerformanceCounter to monitor the results of loading small and large XML documents whilst taking into account the effects of caching. In this scenario, code augmentation prove to be rather useful, the code itself was never going to reach “Production” and was merely used for the benchmarking exercise.

Anyway, I hope that you’ll find QueryPerformanceCounter of some use in your applications.

The Social Programmer

Elementary execution timing using QueryPerformanceCounter

Craig Murphy: author, blogger, community evangelist, developer, speaker, runner