map{|x|x.to_f}
samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map{|x|x.to_f}
These two data sets in fact have the same mean, 10.2. But they clearly represent
wildly different performance profiles, as can be seen from their graph (see
Figure 6-1).
We need a new statistic to measure how much the data varies from the mean. That
statistic is the standard deviation. The standard deviation of a sample is calculated by
taking the root mean square deviation from the sample mean. In Ruby, it looks like this:
module Enumerable
def population_stdev
Math.sqrt( map{|x| (x - mean) ** 2}.mean )
end
end
This code maps over the collection, taking the square of the deviation of each element
from the mean. It then takes the mean of those squared values, and takes the
square root of the mean, yielding the standard deviation.
However, this is only half the story. What has been introduced so far is the
population standard deviation, while what we really want is the sample standard
deviation. Without completely diving into the relevant mathematics, the basic difference
between the two is whether the data represent an entire population or only a
portion of it.
Figure 6-1. Two vastly different response-time profiles with the same mean
samples1
16
2
6
11
20
samples2
150 | Chapter 6: Performance
As our data set represents application response times, from which we want to infer a
mean and confidence interval applicable to data points we have not sampled, we want
to use the sample standard deviation.
Pages:
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233