Using a Calculation Engine for .NET

Using a Calculation Engine for .NET

CheapASPNETHostingReview.com | Best and cheap ASP.NET hosting. “Creating a mathematical expression evaluator is one of the most interesting exercises in computer science, whatever the language used. This is the first step towards really understanding what sort of magic is hidden behind compilers and interpreters….“.

I agree completely, and hope that you do too.

Using the CalcEngine class

The CalcEngine class performs two main tasks:

  1. Parses strings that contain formulas into Expression objects that can be evaluated.
  2. Evaluates Expression objects and returns the value they represent.

To evaluate a string that represents a formula, you call the CalcEngine.Parse method to get an Expression, then call the Expression.Evaluate method to get the value. For example:

Function names are case-insensitive (as in Excel), and the parameters are themselves expressions. This allows the engine to calculate expressions such as “=ATAN(2+2, 4+4*SIN(4))”.

The CalcEngine class also provides a Functions property that returns a dictionary containing all the functions currently defined. This can be useful if you ever need to enumerate remove functions from the engine.

Notice how the method implementation listed above casts the expression parameters to the expected type (double). This works because the Expression class implements implicit converters to several types (string, double, bool, and DateTime). I find that the implicit converters allow me to write code that is concise and clear.

If you don’t like implicit converters, the alternative would be to override ToString in the Expression class and add ToDouble, ToDateTime, ToBoolean, etc.

Variables: Binding to simple values

Most calculation engines provide a way for you to define variables which can be used in expressions. The CalcEngine class implements a Variables dictionary that associates keys (variable names) and values (variable contents).

For example, the code below defines a variable named angle and calculates a short sine table:

Variables: Binding to CLR objects

In addition to simple values, the CalcEngine class implements a DataContext property that allows callers to connect a regular .NET object to the engine’s evaluation context. The engine uses Reflection to access the properties of the object so they can be used in expressions.

This approach is similar to the binding mechanism used in WPF and Silverlight, and is substantially more powerful than the simple value approach described in the previous section. However, it is also slower than using simple values as variables.

For example, if you wanted to perform calculations on an object of type Customer, you could do it like this:

CalcEngine supports binding to sub-properties and collections. The object assigned to the DataContext property can represent complex business objects and entire data models.

This approach makes it easier to integrate the calculation engine into the application, because the variables it uses are just plain old CLR objects. You don’t have to learn anything new in order to apply validation, notifications, serialization, etc.

Variables: Binding to dynamic objects

The original usage scenario for the calculation engine was an Excel-like application, so it had to be able to support cell range objects such as “A1” or “A1:B10”. This requires a different approach, since the cell ranges have to be parsed dynamically (it would not be practical to define a DataContext object with properties A1, A2, A3, etc).

To support this scenario, the CalcEngine implements a virtual method called GetExternalObject. Derived classes can override this method to parse identifiers and dynamically build objects that can be evaluated.

If the object returned implements the CalcEngine.IValueObject interface, the engine evaluates it by calling the IValueObject.GetValue method. Otherwise, the object itself is used as the value.

If the object returned implements the IEnumerable interface, then functions that take multiple values (such as Sum, Count, or Average) use the IEnumerable implementation to get all the values represented by the object.

For example, the sample application included with this article defines a DataGridCalcEngine class that derives from CalcEngine and overrides GetExternalObject to support Excel-style ranges. This is described in detail in a later section (“Adding Formula Support to the DataGridView Control”).

Optimizations

I mentioned earlier that the CalcEngine class performs two main functions: parsing and evaluating.

If you look at the CalcEngine code, you will notice that the parsing methods are written for speed, sometimes even at the expense of clarity. The GetToken method is especially critical, and has been through several rounds of profiling and tweaking.

For example, GetToken detects characters and digits using logical statements instead of the convenient char.IsAlpha or char.IsDigit methods. This does make a difference that shows up clearly in the benchmarks.

In addition to this, CalcEngine implements two other optimization techniques:

Expression caching

The parsing process typically consumes more time than the actual evaluation, so it makes sense to keep track of parsed expressions and avoid parsing them again, especially if the same expressions are likely to be used over and over again (as in spreadsheet cells or report fields, for example).

The CalcEngine class implements an expression cache that handles this automatically. The CalcEngine.Evaluate method looks up the expression in the cache before trying to parse it. The cache is based on WeakReference objects, so unused expressions eventually get removed from the cache by the .NET garbage collector. (This technique is also used in the NCalc library.)

Expression caching can be turned off by setting the CalcEngine.CacheExpressions property to false.

Expression optimization

After a string has been parsed, the resulting expression can sometimes be optimized by replacing parts of the expression that refer only to constant values. For example, consider the expression:

This expression contains several constants and functions of constants. It can be simplified to:

This second expression is equivalent to the first, but can be evaluated much faster.

Expression simplification was surprisingly easy to implement. It consists of a virtual Expression.Optimize method that is called immediately after an expression is parsed.

The base Expression class provides an Optimize method that does nothing:

This simply allows all derived classes that derive from Expression to implement their own optimization strategy.

For example, the BinaryExpression class implements the Optimize method as follows:

The method calls the Optimize method on each of the two operand expressions. If the resulting optimized expressions are both literal values, then the method calculates the result (which is a constant) and returns a literal expression that represents the result.

To illustrate further, function call expressions are optimized as follows:

First, all parameters are optimized. Next, if all optimized parameters are literals, the function call itself is replaced with a literal expression that represents the result.

Expression optimization reduces evaluation time at the expense of a slight increase in parse time. It can be turned off by setting the CalcEngine.OptimizeExpressions property to false.

Globalization

The CalcEngine class has a CultureInfo property that allows you to define how the engine should parse numbers and dates in expressions.

By default, the CalcEngine.CultureInfo property is set to CultureInfo.CurrentCulture, which causes it to use the settings selected by the user for parsing numbers and dates. In English systems, numbers and dates look like “123.456” and “12/31/2011”. In German or Spanish systems, numbers and dates look like “123,456” and “31/12/2011”. This is the behavior used by Microsoft Excel.

If you prefer to use expressions that look the same on all systems, you can set the CalcEngine.CultureInfo property to CultureInfo.InvariantCulture for example, or to whatever your favorite culture happens to be.

Sample: A DataGridView control with formula support

The sample included with this article shows how the CalcEngine class can be used to extend the standard Microsoft DataGridView control to support Excel-style formulas. The image at the start of the article shows the sample in action.

Note that the formula support described here is restricted to typing formulas into cells and evaluating them. The sample does not implement Excel’s more advanced features like automatic reference adjustment for clipboard operations, selection-style formula editing, reference coloring, and so on.

The DataGridCalcEngine class

The sample defines a DataGridCalcEngine class that extends CalcEngine with a reference to the grid that owns the engine. The grid is responsible for storing the cell values which are used in the calculations.

The DataGridCalcEngine class adds cell range support by overriding the CalcEngine.GetExternalObject method as follows:

The method analyzes the identifier passed in as a parameter. If the identifier can be parsed as a cell reference (e.g., “A1” or “AZ123:XC23”), then the method builds and returns a CellRangeReference object. If the identifier cannot be parsed as an expression, the method returns null.

The CellRangeReference class is implemented as follows:

The CellRangeReference class implements the IValueObject interface to return the value in the first cell in the range. It does this by calling the owner grid’s Evaluate method.

The CellRangeReference also implements the IEnumerable interface to return the value of all cells in the range. This allows the calculation engine to evaluate expressions such as “Sum(A1:B10)”.

Notice that the GetValue method listed above uses an _evaluating flag to keep track of ranges that are currently being evaluated. This allows the class to detect circular references, where cells contain formulas that reference the cell itself or other cells that depend on the original cell.

The DataGridCalc class

The sample also implements a DataGridCalc class that derives from DataGridView and adds a DataGridCalcEngine member.

To display formula results, the DataGridCalc class overrides the OnCellFormatting method as follows:

The method starts by retrieving the value stored in the cell. If the cell is not in edit mode, and the value is a string that starts with an equals sign, the method uses CalcEngine to evaluate the formula and assigns the result to the cell.

If the cell is in edit mode, then the editor displays the formula rather than the value. This allows users to edit the formulas by typing into in the cells, just like they do in Excel.

If the expression evaluation causes any errors, the error message is displayed in the cell.

At this point, the grid will evaluate expressions and show their results. But it does not track dependencies, so if you type a new value into cell “A1” for example, any formulas that use the value in “A1” will not be updated.

To address this, the DataGridCalc class overrides the OnCellEditEnded method to invalidate the control. This causes all visible cells to be repainted and automatically recalculated after any edits.

Let’s not forget that implementation of the Evaluate method used by the CellRangeReference class listed earlier. The method starts by retrieving the cell content. If the content is a string that starts with an equals sign, the method evaluates it and returns the result; otherwise it returns the content itself:

That is all there is to the DataGridCalc class. Notice that calculated values are never stored anywhere . All formulas are parsed and evaluated on demand.

The sample application creates a DataTable with 50 columns and 50 rows, and binds that table to the grid. The table stores the values and formulas typed by users.

The sample also implements an Excel-style formula bar across the top of the form that shows the current cell address, content, and has a context menu that shows the functions available and their parameters.

Finally, the sample has a status bar along the bottom that shows summary statistics for the current selection (Sum, Count, and Average, as in Excel 2010). The summary statistics are calculated using the grid’s CalcEngine as well.

Testing

I built some testing methods right into the CalcEngine class. In debug builds, these are called by the class constructor:

This ensures that tests are performed whenever the class is used (in debug mode), and that derived classes do not break any core functionality when they override the base class methods.

The Test method is implemented in a Tester.cs file that extends the CalcEngine using partial classes. All test methods are enclosed in an #if DEBUG/#endif block, so they are not included in release builds.

This mechanism worked well during development. It helped detect many subtle bugs that might have gone unnoticed if I had forgotten to run my unit tests when working on separate projects.

Benchmarks

While implementing the CalcEngine class, I used benchmarks to compare its size and performance with alternate libraries and make sure CalcEngine was doing a good job. A lot of the optimizations that went into the CalcEngine class came from these benchmarks.

I compared CalcEngine with two other similar libraries which seem to be among the best available. Both of these started as CodeProject articles and later moved to CodePlex:

  • NCalc: This is a really nice library, small, efficient, and feature-rich. I could not use NCalc in my Silverlight project because it relies on the ANTLR runtime DLL, which cannot be used in Silverlight projects (at least I couldn’t figure out how to do it).
  • Flee: Unlike CalcEngine and NCalc, Flee keeps track of formulas, their values, and dependencies. When a formula changes, Flee re-evaluates all cells that depend on it. One of the interesting features of Flee is that it actually compiles formulas into IL. This represents a trade-off since compilation is quite slow, but evaluation is extremely fast. I decided not to use Flee in my Silverlight project because it is relatively large and the parsing times were too long for the type of application I had in mind.

The benchmarking method was similar to the one described by Gary Beene in his 2007 Equation Parsers article. Each engine was tested for parsing and evaluating performance using three expressions. The total time spent was used to calculate a “Meps” (million expressions parsed or evaluated per second) index that represents the engine speed.

The expressions used were the following:

Where “a” and “b” are variables set to 2 and 4.

Each engine parsed and evaluated the expressions 500,000 times. The times were added and used to calculate the “Meps” index by dividing the number of repetitions by the time consumed. The results were as follows:

Time in secondsSpeeds in “Meps”
LibraryParseEvaluateParseEvaluate
CalcEngine1.41.31.041.18
NCalc7.15.70.210.26
Flee1,283.00.50.002.91
CalcEngine*10.71.50.140.99
NCalc*145.25.70.010.27

speed

Some comments about the benchmark results:

  • CalcEngine performed well, being the fastest parser and the second fastest evaluator (after Flee).
  • Flee is literally “off the charts” on both counts, almost 900 times slower parsing and 2.5 times faster evaluating than CalcEngine. Because Flee compiles formulas to IL, I expected slow parsing and fast evaluation, but the magnitude of the difference was surprising.
  • Entries marked with asterisks were performed with optimizations off. They are included to demonstrate the impact of the optimization options.

In addition to speed, size is important, especially for Silverlight applications that need to be downloaded to the client machine. Here is a comparison of library sizes:

LibrarySize (kB)
CalcEngine26
NCalc188
Flee202

size

CalcEngine is the smallest library by far, more than seven times smaller than NCalc. If necessary, it could be trimmed even further by removing some of the less important built-in functions.

Conclusion

The CalcEngine class is compact, fast, extensible, and multi-platform. I think it is different enough from NCalc and Flee to add value for many types of projects, especially Silverlight applications like the one it was created for. You can see the Silverlight app in action in the image below.

excelBookDemo

I hope others will find CalcEngine useful and interesting as well.