Programming Assignment 4
CS-150, Fall ’02, Dr. C. S. Tritt
Due Wednesday, 11/13
Imagine you’re still stuck in 1000 B.C.E. with Dr. Viets and company. He asks you to perform a statistical analysis of some trebuchet data. In particular, he asks you to do what is known as least squares linear regression. A least squares linear regression provides the “best” constants a and b for the equation y = a + bx for a set of data. This equation can then be used to predict approximate values of y for specified values of x.
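For reference, “best” has a precise meaning here. Over the n data pairs, a and b are chosen to minimize the sum of the squared vertical differences between each measured y value and the value the equation predicts:

$$S(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a + b\,x_i)\bigr)^2$$

Setting the partial derivatives of S with respect to a and b to zero and solving the resulting pair of linear equations gives the formulas for the regression constants used in this assignment (Equations 3 and 4).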
In this case, y represents the distance to the point of impact and x the trebuchet range setting (on a scale from 1 to 10). Sample data is given in the following table:
| Setting | Distance (m) | | Setting | Distance (m) |
|---------|--------------|-|---------|--------------|
| 1.0 | 190 | | 6.0 | 444 |
| 1.5 | 242 | | 6.5 | 485 |
| 2.0 | 257 | | 7.0 | 504 |
| 2.5 | 268 | | 7.5 | 527 |
| 3.0 | 319 | | 8.0 | 546 |
| 3.5 | 327 | | 8.5 | 584 |
| 4.0 | 349 | | 9.0 | 599 |
| 4.5 | 371 | | 9.5 | 631 |
| 5.0 | 393 | | 10.0 | 652 |
| 5.5 | 425 | | | |
This data is available in tab-delimited form on the course website as prob4data.txt.
The equations you should use to solve this problem are:
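$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{2}$$

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3}$$

$$a = \bar{y} - b\,\bar{x} \tag{4}$$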
where n is the number of data pairs. Note that there are computationally more efficient systems of equations for doing linear regression, but the given equations work well for the purposes of this course. Equations 1 and 2 simply calculate the mean of a given collection of values. Equations 3 and 4 find the constants for the regression.
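As a hand check on the sample data (n = 19), the intermediate quantities work out as shown below. The two sums are the numerator and denominator of the slope written in deviation-from-the-mean form. They can be handy for testing each function on its own before testing the whole program:

$$\bar{x} = \frac{104.5}{19} = 5.5 \qquad \bar{y} = \frac{8113}{19} = 427$$

$$\sum_{i=1}^{19} (x_i - \bar{x})(y_i - \bar{y}) = 7119.5 \qquad \sum_{i=1}^{19} (x_i - \bar{x})^2 = 142.5$$

$$b = \frac{7119.5}{142.5} \approx 49.96 \qquad a = 427 - (49.96)(5.5) \approx 152.2$$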
For full credit, your program must make logical use of arrays and functions. I suggest your program follow these general steps:
1. Read the given data from the input file and store it in two arrays (one for x values and one for y values). You may also want to store n, the number of data points.
2. Call a function that returns the mean of the array of x values.
3. Call the same function again to find the mean of the y values.
4. Call a function that returns b given n, $\bar{x}$, $\bar{y}$, and the arrays of x and y values.
5. Call a function that returns a given b, $\bar{x}$, and $\bar{y}$.
6. Display the calculated values of a and b for the given data.
For testing purposes, I get values of a = 152.2 and b = 49.96 for the given data set.