In this blog I will be demonstrating the use of Tail End Recursion in F#. In functional languages, looping constructs such as *while *and *for* loops are rarely seen in code. Instead code is written to act upon streams of data which can have either a finite or infinite number of elements. LINQ follows the same model where it “pretends” that database records and XML elements are a stream of objects.

Functional languages process streams by “pealing” elements out of the lists one at a time using the *head::tail *syntax. In this blog I will demonstrate how to write a simple program using both a recursive and a non-recursive approach.

# Linear Regression Model

Assume we have collected some data then plotted it onto a graph. Within the scatter diagram it appears that the data follows a straight line. A straight line is defined by the following simple formula **y = mx + c**.

Assuming that the sample above can be modeled by a simple line. We can assume that each sample **y(i) = mx(i) + c + noise(i)**. We want to estimate what the values of **m** and **c** are for the graph above. To do this we need to make some assumptions about the noise generator:

- Average noise value is zero
- Noise is random and uncorrelated (noise samples are unrelated and are independent)

If the assumptions are valid then we can use the Linear Regression Model which can be found here. **N** is used in the formulas to indicate the number of samples.

##

We can load the samples into F# using a list of tuples of (x,y).

let samples = [

(1.0,15.95944862);

(2.0,14.70482969);

(3.0,19.57532692);

(4.0,24.37143171); …]

## Simple implementation

The simplest solution is the calculate the values by using built in F# functions to find the:

- Sum of
**x** elements - sum of
**y** elements - sum of
**x*x** elements - sum of
**x*y** elements

And using the results to populate the **m** and **c** values

type linearRegressionCoeffs = {

m:float;

c:float;

}

1: let RegressionSimple l : linearRegressionCoeffs =

2: // Create list of x and y by themselves is we can easily calculate the sum of x & y elements

3: let (x, y) = List.unzip l

4: let sumX = List.sum x

5: let sumY = List.sum y

6:

7: // Calculate the products on the X*Y and X*X

8: let sumXY = List.sum_by(fun f ->

9: let (x, y) = f

10: x * y

11: ) l

12: let sumXX = List.sum_by(fun f ->

13: let (x, y) = f

14: x * x

15: ) l

16:

17: // Calculate the m and c coefficients

18: let n = (float)(List.length l)

19: let m = (n*sumXY - (sumX *sumY))/(n*sumXX - sumX*sumX)

20: let c = (sumY - m*sumX)/n

21: {m = m; c=c;}

## Optimising using a while loop

Obviously iterating over the list five times is extremely inefficient (four for the calculates and a further time in the unzip method). In C# we would use a while or for loop to calculate the values and we can follow the same pattern in F#.

1: type regressionCoeffs = {

2: mutable sumXiXi:float;

3: mutable sumXiYi:float;

4: mutable sumXi:float;

5: mutable sumYi:float;

6: }

7:

8: let public regressionWithLoop l : linearRegressionCoeffs =

9: let accumilator:regressionCoeffs = {sumXiXi = 0.0; sumXiYi =0.0; sumXi = 0.0; sumYi = 0.0;}

10:

11: for sample in l do

12: let (x,y) = sample

13: accumilator.sumXiYi <- accumilator.sumXiYi + x*y;

14: accumilator.sumYi <- accumilator.sumYi + y;

15: accumilator.sumXi <- accumilator.sumXi + x;

16: accumilator.sumXiXi <- accumilator.sumXiXi + x*x;

17:

18: let length = (float) (List.length l)

19: let meanX = accumilator.sumXi/length

20: let meanY = accumilator.sumYi/length

21: let m = (accumilator.sumXiYi - length*meanX*meanY) / (accumilator.sumXiXi - length* meanX* meanX)

22: let c = meanY - m*meanX

23: {m = m; c=c;}

Unfortunately the solution above resorts to having to update a temporary variable using the mutable keyword which defeats one of the main benefits of using a safe language like F#.

## Using Tail End Recursion

Tail end recursion is the process of recursively calling a method (identified by the **rec** keyword) that eats through the list and processing one element at a time.

1: let public regressionFast l : linearRegressionCoeffs =

2: let rec calcRegressionCoeffs l (coeffs:regressionCoeffs) =

3: match l with

4: |head::tail ->

5: let (x,y) = head

6:

7: let nextCoeffs = {

8: sumXiYi = coeffs.sumXiYi + x*y;

9: sumYi = coeffs.sumYi + y;

10: sumXi = coeffs.sumXi + x;

11: sumXiXi = coeffs.sumXiXi + x*x;}

12:

13: calcRegressionCoeffs tail nextCoeffs

14: |[]-> coeffs

15:

16: let t = calcRegressionCoeffs l {sumXiYi = 0.0; sumXi = 0.0; sumYi = 0.0; sumXiXi = 0.0;}

17: let length = (float) (List.length l)

18: let meanX = t.sumXi/length

19: let meanY = t.sumYi/length

20: let m = (t.sumXiYi - length*meanX*meanY) / (t.sumXiXi - length* meanX* meanX)

21: let c = meanY - m*meanX

22: {m = m; c=c;}

The **head::tail** syntax within the **match** statement is used to peal off the first element off the list and place the rest of the list in the **tail**. **calcRegressionCoeffs **is repeatedly executed until the code has eaten through all the elements until the list is empty **[]** at which point is is found by the second part of the **match** statement. In this version of the code the **regressionCoeffs** members can be immutable as a new variable (nextCoeffs) is created each time the code recursively eats through the list.

The code now appears very inefficient compared to the previous version that used a while loop. For instance it appears that the **calcRegressionCoeffs** method is executed for once for each element in the list and within each method call a temporary variable is created. However this is not the case as tail end recursion is a core idiom within functional languages and for this reason the compiler AUTOMATICALLY converts the code into a very fast while loop.

Using reflector and drilling into the compiled code we find the implementation of the **regressionFast** method

1: public static linearRegressionCoeffs regressionFast(List<Tuple<double, double>> l)

2: {

3: regressionCoeffs t = new regressionCoeffs(0.0, 0.0, 0.0, 0.0);

4: ListModule.iter<Tuple<double, double>>(new clo@31(t), l);

5: double length = ListModule.length<Tuple<double, double>>(l);

6: double meanX = t._sumXi / length;

7: double meanY = t._sumYi / length;

8: double b2 = (t._sumXiYi - ((length * meanX) * meanY)) / (t._sumXiXi - ((length * meanX) * meanX));

9: return new linearRegressionCoeffs(meanY - (b2 * meanX), b2);

10: }

11:

12:

The recursive code has been automatically converted into a call to the **iter **method which in turn calls the closure 15 class. The class itself is below:

1: [Serializable]

2: internal class clo@15 : OptimizedClosures.FastFunc2<List<Tuple<double, double>>, Regression.regressionCoeffs, Regression.regressionCoeffs>

3: {

4: // Methods

5: public clo@15();

6: public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs);

7: }

8:

9:

10: Expand Methods

11:

Notice that the compiler overrides the **Invoke** method.

1: public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs)

2: {

3: while (true)

4: {

5: List<Tuple<double, double>> that = l;

6: if (that is List<Tuple<double, double>>._Empty)

7: {

8: return coeffs;

9: }

10: List<Tuple<double, double>> list = ((List<Tuple<double, double>>._Cons) that).get___Tail();

11: Tuple<double, double> tuple2 = ((List<Tuple<double, double>>._Cons) that).get___Head();

12: double num = tuple2.Item2;

13: double num2 = tuple2.Item1;

14: Regression.regressionCoeffs coeffs = new Regression.regressionCoeffs(coeffs._sumXiXi + (num2 * num2), coeffs._sumXiYi + (num2 * num), coeffs._sumXi + num2, coeffs._sumYi + num);

15: coeffs = coeffs;

16: l = list;

17: }

18: }

19:

20:

Interestingly the **Invoke** method directly updates the local **coeff** instance**.** So within F# the **regressionCoeffs **type is immutable, however the compiler has created a mutable value that is updated. In summary the F# code that included recursion and temporary variables has been optimised away into a simple while loop and the temporary variable has replaced with a single instance.

## Finally here is the test

1: [<Test>]

2: let verify_regressionfast_using_static_data =

3: let samples = [

4: (1.0,15.95944862);

5: (2.0,14.70482969);

6: (3.0,19.57532692);

7: (4.0,24.37143171);

8: (5.0,18.32077464);

9: (6.0,27.80173622);

10: (7.0,29.20721381);

11: (8.0,40.07369171);

12: (9.0,39.10475664);

13: (10.0,46.4686677);

14: (11.0,35.46231593);

15: (12.0,52.42103983);

16: (13.0,54.16002796);

17: (14.0,56.40865623);

18: (15.0,48.77890855);

19: (16.0,57.80104059);

20: (17.0,57.79730541);

21: (18.0,72.22923153);

22: (19.0,57.63875286);

23: (20.0,75.45859069);

24: (21.0,70.21595096)]

25:

26: let resultSimple = RegressionSimple samples

27: Assert.AreApproximatelyEqual (10.0, resultSimple.c, 3.0)

28: Assert.AreApproximatelyEqual (3.0, resultSimple.m, 0.2)

29:

30: let resultWithLoop = regressionWithLoop samples

31: Assert.AreApproximatelyEqual (10.0, resultWithLoop.c, 3.0)

32: Assert.AreApproximatelyEqual (3.0, resultWithLoop.m, 0.2)

33:

34: let resultFast = regressionFast samples

35: Assert.AreApproximatelyEqual (10.0, resultFast.c, 3.0)

36: Assert.AreApproximatelyEqual (3.0, resultFast.m, 0.2)

37:

## Summary

F# allows the developer to write safe code using recursion and the compiler will optimise the code into a form similar to a hand crafted while loop.