Class: MoreMath::Sequence
Includes: Enumerable, MovingAverage
Defined in: lib/more_math/sequence.rb, lib/more_math/sequence/moving_average.rb
Overview
A sequence class for statistical analysis and mathematical operations.
This class provides comprehensive statistical functionality including:
- Basic sequence operations (iteration, size, etc.)
- Statistical measures (mean, variance, standard deviation)
- Advanced statistical methods (percentiles, confidence intervals)
- Time series analysis (moving averages, autocorrelation)
- Hypothesis testing (t-tests, confidence intervals)
- Data visualization tools (histograms)
Defined Under Namespace
Modules: MovingAverage, Refinement
Instance Attribute Summary
- #elements ⇒ Array (readonly)
  Returns the array of elements.
Instance Method Summary
- #arithmetic_mean ⇒ Float (also: #mean)
  Returns the arithmetic mean of the elements.
- #autocorrelation ⇒ Array<Float>
  Returns the array of autocorrelation values.
- #autovariance ⇒ Array<Float>
  Returns the array of autovariances.
- #common_standard_deviation(other) ⇒ Float
  Returns an estimation of the common standard deviation of this and another sequence.
- #common_variance(other) ⇒ Float
  Returns an estimation of the common variance of this and another sequence.
- #compute_student_df(other) ⇒ Integer
  Computes the degrees of freedom for Student’s t-test.
- #compute_welch_df(other) ⇒ Float
  Computes the degrees of freedom for Welch’s t-test.
- #confidence_interval(alpha = 0.05) ⇒ Range
  Returns the confidence interval for the arithmetic mean.
- #cover?(other, alpha = 0.05) ⇒ Boolean
  Returns true if Welch’s t-test finds no significant difference between the means of this sequence and another at the given alpha level.
- #detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Hash?
  Detects autocorrelation using the Ljung-Box statistic.
- #detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Hash?
  Detects outliers using the boxplot algorithm.
- #durbin_watson_statistic ⇒ Float
  Returns the d-value for the Durbin-Watson statistic.
- #each {|element| ... } ⇒ self
  Calls the block for every element of this Sequence.
- #empty? ⇒ Boolean
  Returns true if this sequence is empty, otherwise false.
- #geometric_mean ⇒ Float
  Returns the geometric mean of the elements.
- #harmonic_mean ⇒ Float
  Returns the harmonic mean of the elements.
- #histogram(bins) ⇒ Histogram
  Creates a Histogram instance from this sequence.
- #initialize(elements) ⇒ Sequence (constructor)
  Initializes a new Sequence instance with the given elements.
- #interquartile_range ⇒ Float
  Returns the interquartile range for this sequence.
- #linear_regression ⇒ LinearRegression
  Returns the LinearRegression object for this sequence.
- #ljung_box_statistic(lags = 20) ⇒ Float?
  Returns the q value of the Ljung-Box statistic.
- #max ⇒ Object
  Returns the maximum of the elements.
- #min ⇒ Object
  Returns the minimum of the elements.
- #percentile(p = 50) ⇒ Float (also: #median)
  Returns the p-percentile of the elements.
- #push(element) ⇒ Sequence (also: #<<)
  Pushes an element onto this Sequence and returns a new Sequence instance.
- #reset ⇒ self
  Resets all memoized values of this sequence.
- #sample_standard_deviation ⇒ Float
  Returns the sample standard deviation of the elements.
- #sample_standard_deviation_percentage ⇒ Float
  Returns the sample standard deviation as a percentage of the arithmetic mean.
- #sample_variance ⇒ Float
  Returns the sample variance of the elements.
- #size ⇒ Integer
  Returns the number of elements in this sequence.
- #sorted ⇒ Array
  Returns a sorted array of the elements.
- #standard_deviation ⇒ Float
  Returns the standard deviation of the elements.
- #standard_deviation_percentage ⇒ Float
  Returns the standard deviation as a percentage of the arithmetic mean.
- #suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Float
  Computes the suggested sample size for detecting a mean difference.
- #sum ⇒ Float
  Returns the sum of all elements.
- #sum_of_squares ⇒ Float
  Returns the sum of squared deviations of the elements from the arithmetic mean.
- #t_student(other) ⇒ Float
  Returns the t value of the Student’s t-test between this sequence and another.
- #t_welch(other) ⇒ Float
  Returns the t value of the Welch’s t-test between this sequence and another.
- #to_ary ⇒ Array (also: #to_a)
  Converts the sequence to an array.
- #variance ⇒ Float
  Returns the variance of the elements.
- #z_score ⇒ Sequence
  Returns the Z-score sequence derived from the current sequence.
Methods included from MovingAverage
Constructor Details
#initialize(elements) ⇒ Sequence
Initializes a new Sequence instance with the given elements.
# File 'lib/more_math/sequence.rb', line 31
def initialize(elements)
  @elements = elements.dup.freeze
end
Instance Attribute Details
#elements ⇒ Array (readonly)
Returns the array of elements.
# File 'lib/more_math/sequence.rb', line 38
def elements
  @elements
end
Instance Method Details
#arithmetic_mean ⇒ Float Also known as: mean
Returns the arithmetic mean of the elements.
# File 'lib/more_math/sequence.rb', line 178
memoize method:
def arithmetic_mean
  sum / size
end
#autocorrelation ⇒ Array<Float>
Returns the array of autocorrelation values.
# File 'lib/more_math/sequence.rb', line 397
def autocorrelation
  c = autovariance
  Array.new(c.size) { |k| c[k] / c[0] }
end
#autovariance ⇒ Array<Float>
Returns the array of autovariances.
# File 'lib/more_math/sequence.rb', line 384
def autovariance
  Array.new(size - 1) do |k|
    s = 0.0
    0.upto(size - k - 1) do |i|
      s += (@elements[i] - arithmetic_mean) * (@elements[i + k] - arithmetic_mean)
    end
    s / size
  end
end
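The relationship between the autocovariances and the autocorrelation values can be sketched outside the class. The helper names `autocovariance` and `autocorrelation` below are illustrative stand-ins that take a plain array; they are not part of the library's API.

```ruby
# Hypothetical standalone helpers operating on plain arrays.
def autocovariance(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  # Lag k pairs element i with element i + k; every lag is divided by n.
  Array.new(n - 1) do |k|
    (0...(n - k)).sum { |i| (xs[i] - mean) * (xs[i + k] - mean) } / n
  end
end

def autocorrelation(xs)
  c = autocovariance(xs)
  # Normalise by the lag-0 autocovariance so that the first value is 1.
  c.map { |ck| ck / c[0] }
end

acf = autocorrelation([1, 2, 3, 4, 5])
```

For this input the lag-0 autocovariance is 2.0, so the autocorrelation at lag 1 is 0.8 / 2.0 = 0.4.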
#common_standard_deviation(other) ⇒ Float
Returns an estimation of the common standard deviation of this and another sequence.
# File 'lib/more_math/sequence.rb', line 309
def common_standard_deviation(other)
  Math.sqrt(common_variance(other))
end
#common_variance(other) ⇒ Float
Returns an estimation of the common variance of this and another sequence.
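# File 'lib/more_math/sequence.rb', line 317
def common_variance(other)
  # Weight each sample variance by its degrees of freedom, then divide the
  # whole sum by the pooled degrees of freedom.
  ((size - 1) * sample_variance + (other.size - 1) * other.sample_variance) /
    (size + other.size - 2)
end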
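As a worked example, the textbook pooled variance weights each sample variance by its degrees of freedom and divides the sum by the combined degrees of freedom. The sketch below uses hypothetical standalone helpers (`sample_variance`, `pooled_variance`) on plain arrays rather than the library's classes.

```ruby
# Hypothetical helpers; names are illustrative only.
def sample_variance(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
end

def pooled_variance(a, b)
  ((a.size - 1) * sample_variance(a) + (b.size - 1) * sample_variance(b)) /
    (a.size + b.size - 2)
end

a = [1, 2, 3, 4, 5] # sample variance 2.5
b = [2, 4, 6]       # sample variance 4.0
pooled = pooled_variance(a, b) # (4 * 2.5 + 2 * 4.0) / 6
```

Here the pooled estimate is 18 / 6 = 3.0, which lies between the two sample variances, as a weighted average should.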
#compute_student_df(other) ⇒ Integer
Computes the degrees of freedom for Student’s t-test.
# File 'lib/more_math/sequence.rb', line 326
def compute_student_df(other)
  size + other.size - 2
end
#compute_welch_df(other) ⇒ Float
Computes the degrees of freedom for Welch’s t-test.
# File 'lib/more_math/sequence.rb', line 286
def compute_welch_df(other)
  (sample_variance / size + other.sample_variance / other.size) ** 2 / (
    (sample_variance ** 2 / (size ** 2 * (size - 1))) +
    (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
end
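The expression above is the Welch-Satterthwaite approximation. A minimal sketch with a hypothetical helper `welch_df`, which takes the two sample variances and sizes directly, shows how it behaves:

```ruby
# Hypothetical helper: Welch-Satterthwaite degrees of freedom from
# sample variances (var_a, var_b) and sample sizes (n_a, n_b).
def welch_df(var_a, n_a, var_b, n_b)
  se_a = var_a / n_a.to_f # squared standard error of the first mean
  se_b = var_b / n_b.to_f # squared standard error of the second mean
  (se_a + se_b)**2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
end

equal   = welch_df(4.0, 10, 4.0, 10)
unequal = welch_df(1.0, 5, 10.0, 20)
```

With equal variances and equal sizes the result reduces to 2(n - 1), which is 18 here and matches the Student degrees of freedom. With unequal inputs it is generally not an integer.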
#confidence_interval(alpha = 0.05) ⇒ Range
Returns the confidence interval for the arithmetic mean.
# File 'lib/more_math/sequence.rb', line 374
def confidence_interval(alpha = 0.05)
  td = TDistribution.new(size - 1)
  t = td.inverse_probability(alpha / 2).abs
  delta = t * sample_standard_deviation / Math.sqrt(size)
  (arithmetic_mean - delta)..(arithmetic_mean + delta)
end
#cover?(other, alpha = 0.05) ⇒ Boolean
Returns true if Welch’s t-test finds no significant difference between the means of this sequence and another at the given alpha level.
# File 'lib/more_math/sequence.rb', line 364
def cover?(other, alpha = 0.05)
  t = t_welch(other)
  td = TDistribution.new(compute_welch_df(other))
  t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
end
#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Hash?
Detects autocorrelation using the Ljung-Box statistic.
# File 'lib/more_math/sequence.rb', line 428
def detect_autocorrelation(lags = 20, alpha_level = 0.05)
  if q = ljung_box_statistic(lags)
    p = ChiSquareDistribution.new(lags).probability(q)
    return {
      :lags        => lags,
      :alpha_level => alpha_level,
      :q           => q,
      :p           => p,
      :detected    => p >= 1 - alpha_level,
    }
  end
end
#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Hash?
Detects outliers using the boxplot algorithm.
# File 'lib/more_math/sequence.rb', line 455
def detect_outliers(factor = 3.0, epsilon = 1E-5)
  half_factor = factor / 2.0
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  iqr = quartile3 - quartile1
  iqr < epsilon and return
  result = @elements.inject(Hash.new(0)) do |h, t|
    extreme =
      case t
      when -Infinity..(quartile1 - factor * iqr)
        :very_low
      when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
        :low
      # Upper fences are measured from the third quartile.
      when (quartile3 + half_factor * iqr)..(quartile3 + factor * iqr)
        :high
      when (quartile3 + factor * iqr)..Infinity
        :very_high
      end and h[extreme] += 1
    h
  end
  unless result.empty?
    result[:median] = median
    result[:iqr] = iqr
    result[:factor] = factor
    result
  end
end
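The boxplot rule can be illustrated with a small classifier. This is a sketch that assumes the usual convention of measuring the lower fences from the first quartile and the upper fences from the third quartile; `classify` is a hypothetical helper, not a library method.

```ruby
# Hypothetical helper: classify one value against boxplot fences.
# Inner fences sit at factor / 2 IQRs from the quartiles, outer fences at
# factor IQRs (1.5 and 3.0 with the default factor).
def classify(value, q1, q3, factor = 3.0)
  iqr = q3 - q1
  inner = factor / 2.0 * iqr
  outer = factor * iqr
  if value <= q1 - outer
    :very_low
  elsif value <= q1 - inner
    :low
  elsif value >= q3 + outer
    :very_high
  elsif value >= q3 + inner
    :high
  end
end

labels = [-25, -10, 15, 40, 60].map { |v| classify(v, 10, 20) }
```

With quartiles 10 and 20 the IQR is 10, so the inner fences are at -5 and 35 and the outer fences at -20 and 50. Values inside the inner fences yield nil.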
#durbin_watson_statistic ⇒ Float
Returns the d-value for the Durbin-Watson statistic.
# File 'lib/more_math/sequence.rb', line 405
def durbin_watson_statistic
  e = linear_regression.residuals
  e.size <= 1 and return 2.0
  (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
    e.inject(0.0) { |s, x| s + x ** 2 }
end
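To see how the d-value responds to different residual patterns, the statistic can be computed on a plain array of residuals. The helper `durbin_watson` below is a hypothetical standalone sketch that skips the regression step and takes the residuals as given.

```ruby
# Hypothetical helper: Durbin-Watson d-value for an array of residuals.
def durbin_watson(residuals)
  return 2.0 if residuals.size <= 1
  # Sum of squared differences between consecutive residuals ...
  num = residuals.each_cons(2).sum { |prev, cur| (cur - prev)**2 }
  # ... divided by the sum of squared residuals.
  den = residuals.sum { |e| e**2 }
  num.to_f / den
end

alternating = durbin_watson([1, -1, 1, -1]) # negative autocorrelation
constant    = durbin_watson([1, 1, 1, 1])   # positive autocorrelation
```

Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. The alternating residuals above give 12 / 4 = 3.0.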
#each {|element| ... } ⇒ self
Calls the block for every element of this Sequence.
# File 'lib/more_math/sequence.rb', line 45
def each(&block)
  @elements.each(&block)
end
#empty? ⇒ Boolean
Returns true if this sequence is empty, otherwise false.
# File 'lib/more_math/sequence.rb', line 53
def empty?
  @elements.empty?
end
#geometric_mean ⇒ Float
Returns the geometric mean of the elements.
The geometric mean is useful for sets of positive numbers that are to be multiplied together. Returns NaN if any element is negative, 0 if any element is zero.
# File 'lib/more_math/sequence.rb', line 208
memoize method:
def geometric_mean
  sum = @elements.inject(0.0) { |s, t|
    case
    when t > 0
      s + Math.log(t)
    when t == 0
      break :null
    else
      break nil
    end
  }
  case sum
  when :null
    0.0
  when Float
    Math.exp(sum / size)
  else
    0 / 0.0
  end
end
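The log-sum approach can be tried on a plain array. The sketch below is a hypothetical standalone helper that follows the documented contract (NaN for any negative element, 0 for any zero); unlike the method above, it checks for negative elements first instead of stopping at whichever special value comes first.

```ruby
# Hypothetical helper: geometric mean via the mean of logarithms.
def geometric_mean(xs)
  return Float::NAN if xs.any? { |x| x < 0 }
  return 0.0 if xs.any?(&:zero?)
  # exp(mean(log x)) equals the n-th root of the product, without the
  # overflow risk of multiplying all elements together.
  Math.exp(xs.sum { |x| Math.log(x) } / xs.size)
end

gm = geometric_mean([1, 4, 16]) # cube root of 64
```

The result is approximately 4.0, since 1 * 4 * 16 = 64 and the cube root of 64 is 4.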
#harmonic_mean ⇒ Float
Returns the harmonic mean of the elements.
The harmonic mean is useful for rates and ratios. Returns NaN if any element is <= 0.
# File 'lib/more_math/sequence.rb', line 190
memoize method:
def harmonic_mean
  sum = @elements.inject(0.0) { |s, t|
    if t > 0
      s + 1.0 / t
    else
      break nil
    end
  }
  sum ? size / sum : 0 / 0.0
end
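A classic use of the harmonic mean is averaging speeds over equal distances. The helper below is a hypothetical standalone version for plain arrays, shown only to illustrate the arithmetic.

```ruby
# Hypothetical helper: harmonic mean, NaN unless all elements are positive.
def harmonic_mean(xs)
  return Float::NAN unless xs.all? { |x| x > 0 }
  xs.size / xs.sum { |x| 1.0 / x }
end

# Driving one leg at 40 km/h and the same distance back at 60 km/h.
average_speed = harmonic_mean([40, 60])
```

The result is approximately 48 km/h rather than the arithmetic mean of 50, because more time is spent on the slower leg.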
#histogram(bins) ⇒ Histogram
Creates a Histogram instance from this sequence.
# File 'lib/more_math/sequence.rb', line 495
def histogram(bins)
  Histogram.new(self, bins)
end
#interquartile_range ⇒ Float
Returns the interquartile range for this sequence.
# File 'lib/more_math/sequence.rb', line 444
def interquartile_range
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  quartile3 - quartile1
end
#linear_regression ⇒ LinearRegression
Returns the LinearRegression object for this sequence.
# File 'lib/more_math/sequence.rb', line 486
memoize method:
def linear_regression
  LinearRegression.new @elements
end
#ljung_box_statistic(lags = 20) ⇒ Float?
Returns the q value of the Ljung-Box statistic.
# File 'lib/more_math/sequence.rb', line 416
def ljung_box_statistic(lags = 20)
  r = autocorrelation
  lags >= r.size and return
  n = size
  n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
end
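The Q statistic is n(n + 2) times the sum of the squared autocorrelations at lags 1 to h, each divided by n - k. A minimal sketch, assuming the autocorrelation values are already available as a plain array (`ljung_box_q` is a hypothetical helper name):

```ruby
# Hypothetical helper: Ljung-Box Q from autocorrelations acf (acf[0] == 1),
# the sample size n and the number of lags to include.
def ljung_box_q(acf, n, lags)
  return nil if lags >= acf.size # not enough lags available
  n * (n + 2) * (1..lags).sum { |k| acf[k]**2 / (n - k) }
end

# Autocorrelations of the five-element series 1, 2, 3, 4, 5.
q = ljung_box_q([1.0, 0.4, -0.1, -0.4], 5, 2)
```

Here Q = 35 * (0.16 / 4 + 0.01 / 3), roughly 1.517. Asking for four lags returns nil, because only lags 0 to 3 exist.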
#max ⇒ Object
Returns the maximum of the elements.
# File 'lib/more_math/sequence.rb', line 241
memoize method:
def max
  @elements.max
end
#min ⇒ Object
Returns the minimum of the elements.
# File 'lib/more_math/sequence.rb', line 233
memoize method:
def min
  @elements.min
end
#percentile(p = 50) ⇒ Float Also known as: median
Returns the p-percentile of the elements.
Interpolates linearly between adjacent sorted elements, using the weighted average at rank (n + 1)p, where p is the percentile expressed as a fraction and n is the number of elements.
# File 'lib/more_math/sequence.rb', line 261
def percentile(p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  p /= 100.0
  sorted_elements = sorted
  r = p * (sorted_elements.size + 1)
  r_i = r.to_i
  r_f = r - r_i
  if r_i >= 1
    result = sorted_elements[r_i - 1]
    if r_i < sorted_elements.size
      result += r_f * (sorted_elements[r_i] - sorted_elements[r_i - 1])
    end
  else
    result = sorted_elements[0]
  end
  result
end
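A worked example makes the rank arithmetic concrete. The standalone `percentile` helper below mirrors the steps of the listing for a plain array; it is an illustrative sketch rather than the library method.

```ruby
# Hypothetical helper: percentile by weighted average at rank (n + 1) * p.
def percentile(xs, p)
  raise ArgumentError, "p must be in 0...100" unless (0...100).cover?(p)
  sorted = xs.sort
  rank = p / 100.0 * (sorted.size + 1) # 1-based, possibly fractional
  whole = rank.to_i
  frac = rank - whole
  return sorted.first if whole < 1     # below the first element
  result = sorted[whole - 1]
  # Move the fractional part of the way towards the next element, if any.
  result += frac * (sorted[whole] - sorted[whole - 1]) if whole < sorted.size
  result
end

data = [4, 1, 3, 2]
quartiles = [25, 50, 75].map { |p| percentile(data, p) }
```

For four elements the median has rank 0.5 * 5 = 2.5, halfway between the second and third sorted values, giving 2.5. The quartiles land at 1.25 and 3.75.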
#push(element) ⇒ Sequence Also known as: <<
Pushes an element onto this Sequence and returns a new Sequence instance.
# File 'lib/more_math/sequence.rb', line 85
def push(element)
  Sequence.new(@elements.dup.push(element))
end
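Because the constructor freezes a copy of its input and `push` builds a new instance, pushing never changes the receiver. The class `TinySequence` below is a hypothetical stand-in that reproduces only these two methods to show the effect.

```ruby
# Hypothetical minimal class mirroring the constructor and push shown above.
class TinySequence
  attr_reader :elements

  def initialize(elements)
    @elements = elements.dup.freeze # private, immutable copy
  end

  def push(element)
    # dup returns an unfrozen copy, so the append happens on the copy only.
    self.class.new(@elements.dup.push(element))
  end
  alias << push
end

original = TinySequence.new([1, 2])
extended = original << 3
```

After this, `original.elements` is still `[1, 2]` while `extended.elements` is `[1, 2, 3]`.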
#reset ⇒ self
Resets all memoized values of this sequence.
# File 'lib/more_math/sequence.rb', line 67
def reset
  self.class.mize_cache_clear
  self
end
#sample_standard_deviation ⇒ Float
Returns the sample standard deviation of the elements.
# File 'lib/more_math/sequence.rb', line 154
memoize method:
def sample_standard_deviation
  Math.sqrt(sample_variance)
end
#sample_standard_deviation_percentage ⇒ Float
Returns the sample standard deviation as a percentage of the arithmetic mean.
# File 'lib/more_math/sequence.rb', line 162
memoize method:
def sample_standard_deviation_percentage
  100.0 * sample_standard_deviation / arithmetic_mean
end
#sample_variance ⇒ Float
Returns the sample variance of the elements.
Sample variance is used when the data represents a sample rather than a population.
Uses the formula: Σ(xi - μ)² / (n-1), where μ is the arithmetic mean of the elements. Returns 0.0 for sequences with fewer than two elements.
# File 'lib/more_math/sequence.rb', line 108
memoize method:
def sample_variance
  size > 1 ? sum_of_squares / (size - 1.0) : 0.0
end
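The difference between the sample and population formulas is only the divisor. The sketch below computes both from the same sum of squared deviations; the helper names are illustrative and operate on a plain array.

```ruby
# Hypothetical helpers showing the n versus n - 1 divisor.
def sum_of_squared_deviations(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 }
end

def population_variance(xs)
  sum_of_squared_deviations(xs) / xs.size
end

def sample_variance(xs)
  xs.size > 1 ? sum_of_squared_deviations(xs) / (xs.size - 1) : 0.0
end

data = [2, 4, 4, 4, 5, 5, 7, 9] # mean 5, squared deviations sum to 32
pop = population_variance(data) # 32 / 8
smp = sample_variance(data)     # 32 / 7
```

The population variance is 4.0 and the sample variance is about 4.571. The sample figure is always the larger of the two for n > 1.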
#size ⇒ Integer
Returns the number of elements in this sequence.
# File 'lib/more_math/sequence.rb', line 60
def size
  @elements.size
end
#sorted ⇒ Array
Returns a sorted array of the elements.
# File 'lib/more_math/sequence.rb', line 249
memoize method:
def sorted
  @elements.sort
end
#standard_deviation ⇒ Float
Returns the standard deviation of the elements.
Standard deviation measures the amount of variation or dispersion in a set of values.
# File 'lib/more_math/sequence.rb', line 128
memoize method:
def standard_deviation
  Math.sqrt(variance)
end
#standard_deviation_percentage ⇒ Float
Returns the standard deviation as a percentage of the arithmetic mean.
# File 'lib/more_math/sequence.rb', line 146
memoize method:
def standard_deviation_percentage
  100.0 * standard_deviation / arithmetic_mean
end
#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Float
Computes the suggested sample size for detecting a mean difference.
# File 'lib/more_math/sequence.rb', line 349
def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
  alpha, beta = alpha.abs, beta.abs
  signal = arithmetic_mean - other.arithmetic_mean
  df = size + other.size - 2
  pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
  td = TDistribution.new df
  (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
    Math.sqrt(pooled_variance_estimate)) / signal) ** 2
end
#sum ⇒ Float
Returns the sum of all elements.
# File 'lib/more_math/sequence.rb', line 170
memoize method:
def sum
  @elements.inject(0.0) { |s, t| s + t }
end
#sum_of_squares ⇒ Float
Returns the sum of squared deviations of the elements from the arithmetic mean.
This sum is used in the variance and standard deviation calculations.
# File 'lib/more_math/sequence.rb', line 118
memoize method:
def sum_of_squares
  @elements.inject(0.0) { |s, t| s + (t - arithmetic_mean) ** 2 }
end
#t_student(other) ⇒ Float
Returns the t value of the Student’s t-test between this sequence and another.
# File 'lib/more_math/sequence.rb', line 334
def t_student(other)
  signal = arithmetic_mean - other.arithmetic_mean
  # The noise term combines the reciprocal sizes of both sequences.
  noise = common_standard_deviation(other) *
    Math.sqrt(size ** -1 + other.size ** -1)
  signal / noise
rescue Errno::EDOM
  0.0
end
#t_welch(other) ⇒ Float
Returns the t value of the Welch’s t-test between this sequence and another.
# File 'lib/more_math/sequence.rb', line 296
def t_welch(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = Math.sqrt(sample_variance / size +
                    other.sample_variance / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
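The signal-to-noise structure of Welch's t value can be reproduced on two plain arrays. The helpers `sample_stats` and `welch_t` below are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helpers: Welch's t value for two plain arrays.
def sample_stats(xs)
  mean = xs.sum.to_f / xs.size
  var = xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
  [mean, var]
end

def welch_t(a, b)
  mean_a, var_a = sample_stats(a)
  mean_b, var_b = sample_stats(b)
  # Each sample contributes its own variance scaled by its own size.
  noise = Math.sqrt(var_a / a.size + var_b / b.size)
  (mean_a - mean_b) / noise
end

t = welch_t([1, 2, 3, 4, 5], [2, 4, 6])
```

The means are 3 and 4 and the sample variances 2.5 and 4.0, so t = -1 / sqrt(0.5 + 4/3), which is about -0.739.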
#to_ary ⇒ Array Also known as: to_a
Converts the sequence to an array.
# File 'lib/more_math/sequence.rb', line 75
def to_ary
  @elements.dup
end
#variance ⇒ Float
Returns the variance of the elements.
Variance measures how far each number in the set is from the mean.
Uses the formula: Σ(xi - μ)² / n, where μ is the arithmetic mean of the elements.
# File 'lib/more_math/sequence.rb', line 97
memoize method:
def variance
  sum_of_squares / size
end
#z_score ⇒ Sequence
Returns the Z-score sequence derived from the current sequence.
Z-scores standardize data by transforming it to have a mean of 0 and standard deviation of 1.
# File 'lib/more_math/sequence.rb', line 138
memoize method:
def z_score
  # Subtract the mean first, then scale by the standard deviation.
  self.class.new(elements.map { |t| (t.to_f - mean) / standard_deviation })
end
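Standardisation can be checked on a small data set: after subtracting the mean and dividing by the population standard deviation, the values should have mean 0 and variance 1. `z_scores` below is a hypothetical helper for plain arrays.

```ruby
# Hypothetical helper: z-scores using the population standard deviation.
def z_scores(xs)
  mean = xs.sum.to_f / xs.size
  sd = Math.sqrt(xs.sum { |x| (x - mean)**2 } / xs.size)
  xs.map { |x| (x - mean) / sd }
end

z = z_scores([2, 4, 4, 4, 5, 5, 7, 9]) # mean 5, standard deviation 2
z_mean = z.sum / z.size
z_variance = z.sum { |v| (v - z_mean)**2 } / z.size
```

The first element becomes (2 - 5) / 2 = -1.5 and the last (9 - 5) / 2 = 2.0.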