Class: MoreMath::Sequence

Inherits:
Object
Includes:
Enumerable, MovingAverage
Defined in:
lib/more_math/sequence.rb,
lib/more_math/sequence/moving_average.rb

Overview

A sequence class for statistical analysis and mathematical operations.

This class provides comprehensive statistical functionality including:

  • Basic sequence operations (iteration, size, etc.)

  • Statistical measures (mean, variance, standard deviation)

  • Advanced statistical methods (percentiles, confidence intervals)

  • Time series analysis (moving averages, autocorrelation)

  • Hypothesis testing (t-tests, confidence intervals)

  • Data visualization tools (histograms)

Examples:

Basic usage

sequence = Sequence.new([1, 2, 3, 4, 5])
puts sequence.mean        # => 3.0
puts sequence.variance    # => 2.0
sequence.simple_moving_average(3) # => [2.0, 3.0, 4.0]

Statistical analysis

data = Sequence.new([10, 15, 20, 25, 30])
puts data.percentile(90)      # => 30
puts data.confidence_interval(0.05) # => approximately 10.18..29.82

Defined Under Namespace

Modules: MovingAverage, Refinement

Methods included from MovingAverage

#simple_moving_average
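
As a rough illustration of what a simple moving average computes, the sketch below averages each full window of consecutive elements in plain Ruby. It is not the MovingAverage module's code: `simple_moving_average_sketch` is a hypothetical helper, and the assumption that only full windows are averaged is inferred from the output shown in the overview example.

```ruby
# Standalone sketch, independent of the more_math gem.
# simple_moving_average_sketch is a hypothetical name used for illustration.
def simple_moving_average_sketch(values, window)
  unless (1..values.size).cover?(window)
    raise ArgumentError, "window must be between 1 and #{values.size}"
  end
  # Average every run of `window` consecutive elements.
  values.each_cons(window).map { |slice| slice.sum.to_f / window }
end

averages = simple_moving_average_sketch([1, 2, 3, 4, 5], 3)
puts averages.inspect # => [2.0, 3.0, 4.0]
```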

Constructor Details

#initialize(elements) ⇒ Sequence

Initializes a new Sequence instance with the given elements.

Parameters:

  • elements (Array)

    The array of elements to store in this sequence



# File 'lib/more_math/sequence.rb', line 31

def initialize(elements)
  @elements = elements.dup.freeze
end

Instance Attribute Details

#elements ⇒ Array (readonly)

Returns the array of elements.

Returns:

  • (Array)

    The frozen array of elements in this sequence



# File 'lib/more_math/sequence.rb', line 38

def elements
  @elements
end

Instance Method Details

#arithmetic_mean ⇒ Float Also known as: mean

Returns the arithmetic mean of the elements.

Returns:

  • (Float)

    The arithmetic mean (average) of the elements



# File 'lib/more_math/sequence.rb', line 178

memoize method:
def arithmetic_mean
  sum / size
end

#autocorrelation ⇒ Array<Float>

Returns the array of autocorrelation values.

Returns:

  • (Array<Float>)

    Array of autocorrelation values (each autovariance divided by the lag-0 autovariance)



# File 'lib/more_math/sequence.rb', line 397

def autocorrelation
  c = autovariance
  Array.new(c.size) { |k| c[k] / c[0] }
end

#autovariance ⇒ Array<Float>

Returns the array of autovariances.

Returns:

  • (Array<Float>)

    Array of autovariance values



# File 'lib/more_math/sequence.rb', line 384

def autovariance
  Array.new(size - 1) do |k|
    s = 0.0
    0.upto(size - k - 1) do |i|
      s += (@elements[i] - arithmetic_mean) * (@elements[i + k] - arithmetic_mean)
    end
    s / size
  end
end
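
To make the relationship between autovariance and autocorrelation concrete, here is a minimal plain-Ruby sketch of the same arithmetic: autovariances for lags 0 to n - 2, each divided by n, then normalized by the lag-0 value. The helper names `autovariance_sketch` and `autocorrelation_sketch` are hypothetical and are not part of the library.

```ruby
# Standalone sketch, independent of the more_math gem.
# Both helper names are hypothetical and used only for illustration.
def autovariance_sketch(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  # One value per lag k = 0 .. n - 2, each divided by n.
  Array.new(n - 1) do |k|
    (0...(n - k)).sum { |i| (xs[i] - mean) * (xs[i + k] - mean) } / n
  end
end

def autocorrelation_sketch(xs)
  c = autovariance_sketch(xs)
  # Normalizing by the lag-0 autovariance makes the first entry 1.0.
  c.map { |ck| ck / c[0] }
end

r = autocorrelation_sketch([1, 2, 3, 4, 5])
puts r.inspect # roughly [1.0, 0.4, -0.1, -0.4]
```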

#common_standard_deviation(other) ⇒ Float

Returns an estimate of the common (pooled) standard deviation of this and another sequence.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Float)

    The pooled standard deviation estimate



# File 'lib/more_math/sequence.rb', line 309

def common_standard_deviation(other)
  Math.sqrt(common_variance(other))
end

#common_variance(other) ⇒ Float

Returns an estimate of the common (pooled) variance of this and another sequence.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Float)

    The pooled variance estimate



# File 'lib/more_math/sequence.rb', line 317

def common_variance(other)
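  # Weight each sample variance by its degrees of freedom, then divide
  # the whole sum by the combined degrees of freedom.
  ((size - 1) * sample_variance +
    (other.size - 1) * other.sample_variance) / (size + other.size - 2)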
end
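
The textbook pooled variance weights each sample variance by its degrees of freedom and divides the total by the combined degrees of freedom. The following plain-Ruby sketch works through that formula on two small samples; `sample_variance_sketch` and `pooled_variance_sketch` are hypothetical helpers, not library methods.

```ruby
# Standalone sketch, independent of the more_math gem.
# Helper names are hypothetical.
def sample_variance_sketch(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
end

def pooled_variance_sketch(a, b)
  ((a.size - 1) * sample_variance_sketch(a) +
   (b.size - 1) * sample_variance_sketch(b)) / (a.size + b.size - 2)
end

a = [1, 2, 3, 4, 5] # sample variance 2.5
b = [2, 4, 6, 8]    # sample variance 20/3
pooled = pooled_variance_sketch(a, b)
puts pooled # (4 * 2.5 + 3 * 20/3) / 7 = 30/7
```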

#compute_student_df(other) ⇒ Integer

Computes the degrees of freedom for Student’s t-test.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Integer)

    The degrees of freedom for Student’s t-test



# File 'lib/more_math/sequence.rb', line 326

def compute_student_df(other)
  size + other.size - 2
end

#compute_welch_df(other) ⇒ Float

Computes the degrees of freedom for Welch’s t-test.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Float)

    The degrees of freedom for Welch’s t-test



# File 'lib/more_math/sequence.rb', line 286

def compute_welch_df(other)
  (sample_variance / size + other.sample_variance / other.size) ** 2 / (
    (sample_variance ** 2 / (size ** 2 * (size - 1))) +
    (other.sample_variance ** 2 / (other.size ** 2 * (other.size - 1))))
end
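
The expression above is the Welch-Satterthwaite approximation. A minimal sketch in plain Ruby, using hypothetical helper names, shows the same calculation step by step; the result always lies between the smaller of the two per-sample degrees of freedom and their sum.

```ruby
# Standalone sketch, independent of the more_math gem.
# Helper names are hypothetical.
def sample_variance_of(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
end

def welch_df_sketch(a, b)
  va = sample_variance_of(a) / a.size # squared standard error of mean(a)
  vb = sample_variance_of(b) / b.size # squared standard error of mean(b)
  (va + vb)**2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
end

df = welch_df_sketch([1, 2, 3, 4, 5], [2, 4, 6, 8])
puts df # about 4.75, between 3 and 7
```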

#confidence_interval(alpha = 0.05) ⇒ Range

Returns the confidence interval for the arithmetic mean.

Parameters:

  • alpha (Float) (defaults to: 0.05)

    The significance level (default: 0.05)

Returns:

  • (Range)

    The confidence interval as a range object



# File 'lib/more_math/sequence.rb', line 374

def confidence_interval(alpha = 0.05)
  td = TDistribution.new(size - 1)
  t = td.inverse_probability(alpha / 2).abs
  delta = t * sample_standard_deviation / Math.sqrt(size)
  (arithmetic_mean - delta)..(arithmetic_mean + delta)
end
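
The interval is the mean plus or minus t * s / sqrt(n), where s is the sample standard deviation. The sketch below reproduces that arithmetic in plain Ruby. Because it does not use TDistribution, the critical value is passed in by hand; 2.776 is the standard two-sided 95% table value for 4 degrees of freedom, and `mean_confidence_interval_sketch` is a hypothetical helper.

```ruby
# Standalone sketch, independent of the more_math gem.
# The helper name and the hand-supplied critical value are assumptions.
def mean_confidence_interval_sketch(xs, t_critical)
  n = xs.size
  mean = xs.sum.to_f / n
  sample_sd = Math.sqrt(xs.sum { |x| (x - mean)**2 } / (n - 1))
  delta = t_critical * sample_sd / Math.sqrt(n)
  (mean - delta)..(mean + delta)
end

ci = mean_confidence_interval_sketch([10, 15, 20, 25, 30], 2.776)
puts ci # roughly 10.19..29.81
```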

#cover?(other, alpha = 0.05) ⇒ Boolean

Determines whether this sequence covers another sequence at the given alpha level, using Welch’s t-test on the two means.

Parameters:

  • other (Sequence)

    The other sequence to compare against

  • alpha (Float) (defaults to: 0.05)

    The significance level (default: 0.05)

Returns:

  • (Boolean)

    true if the difference between the two means is not statistically significant at the given alpha level



# File 'lib/more_math/sequence.rb', line 364

def cover?(other, alpha = 0.05)
  t = t_welch(other)
  td = TDistribution.new(compute_welch_df(other))
  t.abs < td.inverse_probability(1 - alpha.abs / 2.0)
end

#detect_autocorrelation(lags = 20, alpha_level = 0.05) ⇒ Hash?

Detects autocorrelation using the Ljung-Box statistic.

Parameters:

  • lags (Integer) (defaults to: 20)

    The number of lags to consider (default: 20)

  • alpha_level (Float) (defaults to: 0.05)

    The significance level (default: 0.05)

Returns:

  • (Hash, nil)

    Results hash or nil if insufficient data



# File 'lib/more_math/sequence.rb', line 428

def detect_autocorrelation(lags = 20, alpha_level = 0.05)
  if q = ljung_box_statistic(lags)
    p = ChiSquareDistribution.new(lags).probability(q)
    return {
      :lags         => lags,
      :alpha_level  => alpha_level,
      :q            => q,
      :p            => p,
      :detected     => p >= 1 - alpha_level,
    }
  end
end

#detect_outliers(factor = 3.0, epsilon = 1E-5) ⇒ Hash?

Detects outliers using the boxplot algorithm.

Parameters:

  • factor (Float) (defaults to: 3.0)

    The multiplier for IQR to define outlier boundaries (default: 3.0)

  • epsilon (Float) (defaults to: 1E-5)

    Threshold below which the IQR is treated as zero and nil is returned (default: 1E-5)

Returns:

  • (Hash, nil)

    Outlier statistics or nil if no outliers or insufficient data



# File 'lib/more_math/sequence.rb', line 455

def detect_outliers(factor = 3.0, epsilon = 1E-5)
  half_factor = factor / 2.0
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  iqr = quartile3 - quartile1
  iqr < epsilon and return
  result = @elements.inject(Hash.new(0)) do |h, t|
    extreme =
      case t
      when -Infinity..(quartile1 - factor * iqr)
        :very_low
      when (quartile1 - factor * iqr)..(quartile1 - half_factor * iqr)
        :low
      when (quartile3 + half_factor * iqr)..(quartile3 + factor * iqr)
        :high
      when (quartile3 + factor * iqr)..Infinity
        :very_high
      end and h[extreme] += 1
    h
  end
  unless result.empty?
    result[:median] = median
    result[:iqr] = iqr
    result[:factor] = factor
    result
  end
end
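
The classification can be pictured as inner and outer fences around the quartiles. The sketch below is a simplified plain-Ruby version that classifies a single value, assuming the lower fences are measured from the first quartile and the upper fences from the third quartile; `classify_outlier_sketch` is a hypothetical helper, and the quartiles are passed in directly rather than computed.

```ruby
# Standalone sketch, independent of the more_math gem.
# classify_outlier_sketch is a hypothetical name; q1 and q3 are supplied by hand.
def classify_outlier_sketch(value, q1, q3, factor = 3.0)
  iqr = q3 - q1
  half = factor / 2.0
  if    value <= q1 - factor * iqr then :very_low
  elsif value <= q1 - half * iqr   then :low
  elsif value >  q3 + factor * iqr then :very_high
  elsif value >= q3 + half * iqr   then :high
  end # returns nil for values inside the inner fences
end

# With q1 = 10 and q3 = 20 the IQR is 10, so the fences sit at
# -20, -5, 35 and 50 for the default factor of 3.0.
labels = [12, -6, -25, 40, 60].map { |v| classify_outlier_sketch(v, 10, 20) }
puts labels.inspect # => [nil, :low, :very_low, :high, :very_high]
```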

#durbin_watson_statistic ⇒ Float

Returns the d-value for the Durbin-Watson statistic.

Returns:

  • (Float)

    The Durbin-Watson statistic value (close to 2 indicates no autocorrelation)



# File 'lib/more_math/sequence.rb', line 405

def durbin_watson_statistic
  e = linear_regression.residuals
  e.size <= 1 and return 2.0
  (1...e.size).inject(0.0) { |s, i| s + (e[i] - e[i - 1]) ** 2 } /
    e.inject(0.0) { |s, x| s + x ** 2 }
end
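
The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A small plain-Ruby sketch, taking the residuals as given instead of deriving them from a regression, shows how the value moves away from 2; `durbin_watson_sketch` is a hypothetical helper.

```ruby
# Standalone sketch, independent of the more_math gem.
# durbin_watson_sketch is a hypothetical name; residuals are supplied by hand.
def durbin_watson_sketch(residuals)
  return 2.0 if residuals.size <= 1
  numerator = residuals.each_cons(2).sum { |a, b| (b - a)**2 }
  denominator = residuals.sum { |e| e**2 }
  numerator.to_f / denominator
end

alternating = durbin_watson_sketch([1.0, -1.0, 1.0, -1.0]) # sign flips every step
persistent  = durbin_watson_sketch([1.0, 1.0, -1.0, -1.0]) # runs of equal sign
puts alternating # => 3.0 (above 2: negative autocorrelation)
puts persistent  # => 1.0 (below 2: positive autocorrelation)
```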

#each {|element| ... } ⇒ self

Calls the block for every element of this Sequence.

Yields:

  • (element)

    Yields each element to the block

Yield Parameters:

  • element (Object)

    Each element in the sequence

Returns:

  • (self)

    Returns self to allow method chaining



# File 'lib/more_math/sequence.rb', line 45

def each(&block)
  @elements.each(&block)
end

#empty? ⇒ Boolean

Returns true if this sequence is empty, otherwise false.

Returns:

  • (Boolean)

    true if sequence has no elements, false otherwise



# File 'lib/more_math/sequence.rb', line 53

def empty?
  @elements.empty?
end

#geometric_mean ⇒ Float

Returns the geometric mean of the elements.

The geometric mean is useful for sets of positive numbers that are to be multiplied together. Returns NaN if any element is negative, 0 if any element is zero.

Returns:

  • (Float)

    The geometric mean, or NaN if invalid input



# File 'lib/more_math/sequence.rb', line 208

memoize method:
def geometric_mean
  sum = @elements.inject(0.0) { |s, t|
    case
    when t > 0
      s + Math.log(t)
    when t == 0
      break :null
    else
      break nil
    end
  }
  case sum
  when :null
    0.0
  when Float
    Math.exp(sum / size)
  else
    0 / 0.0
  end
end
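
The method works in log space: the geometric mean is the exponential of the mean of the logarithms. Here is a simplified plain-Ruby sketch of that idea. Note one assumption that differs from the code above: `geometric_mean_sketch` (a hypothetical helper) checks for negatives before zeros, whereas the listing stops at the first non-positive element it meets.

```ruby
# Standalone sketch, independent of the more_math gem.
# geometric_mean_sketch is a hypothetical name.
def geometric_mean_sketch(xs)
  return Float::NAN if xs.any? { |x| x < 0 }
  return 0.0 if xs.any? { |x| x == 0 }
  # exp(mean(log x)) avoids overflow from multiplying many values.
  Math.exp(xs.sum { |x| Math.log(x) } / xs.size)
end

gm = geometric_mean_sketch([2, 8])
puts gm # close to 4.0, the square root of 2 * 8
```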

#harmonic_mean ⇒ Float

Returns the harmonic mean of the elements.

The harmonic mean is useful for rates and ratios. Returns NaN if any element is <= 0.

Returns:

  • (Float)

    The harmonic mean, or NaN if invalid input



# File 'lib/more_math/sequence.rb', line 190

memoize method:
def harmonic_mean
  sum = @elements.inject(0.0) { |s, t|
    if t > 0
      s + 1.0 / t
    else
      break nil
    end
  }
  sum ? size / sum : 0 / 0.0
end
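
A classic use of the harmonic mean is averaging speeds over equal distances. The plain-Ruby sketch below uses a hypothetical helper, `harmonic_mean_sketch`, to compute n divided by the sum of reciprocals.

```ruby
# Standalone sketch, independent of the more_math gem.
# harmonic_mean_sketch is a hypothetical name.
def harmonic_mean_sketch(xs)
  return Float::NAN unless xs.all? { |x| x > 0 }
  xs.size / xs.sum { |x| 1.0 / x }
end

# Driving the same distance at 60 km/h and then at 40 km/h averages
# about 48 km/h overall, not the arithmetic mean of 50 km/h.
average_speed = harmonic_mean_sketch([60, 40])
puts average_speed
```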

#histogram(bins) ⇒ Histogram

Creates a Histogram instance from this sequence.

Parameters:

  • bins (Integer)

    The number of bins for the histogram

Returns:

  • (Histogram)

    A Histogram built from this sequence with the given number of bins



# File 'lib/more_math/sequence.rb', line 495

def histogram(bins)
  Histogram.new(self, bins)
end

#interquartile_range ⇒ Float

Returns the interquartile range for this sequence.

Returns:

  • (Float)

    The difference between 75th and 25th percentiles



# File 'lib/more_math/sequence.rb', line 444

def interquartile_range
  quartile1 = percentile(25)
  quartile3 = percentile(75)
  quartile3 - quartile1
end

#linear_regression ⇒ LinearRegression

Returns the LinearRegression object for this sequence.

Returns:

  • (LinearRegression)

    The LinearRegression computed from the elements of this sequence



# File 'lib/more_math/sequence.rb', line 486

memoize method:
def linear_regression
  LinearRegression.new @elements
end

#ljung_box_statistic(lags = 20) ⇒ Float?

Returns the q value of the Ljung-Box statistic.

Parameters:

  • lags (Integer) (defaults to: 20)

    The number of lags to consider (default: 20)

Returns:

  • (Float, nil)

    The Ljung-Box statistic value or nil if insufficient data



# File 'lib/more_math/sequence.rb', line 416

def ljung_box_statistic(lags = 20)
  r = autocorrelation
  lags >= r.size and return
  n = size
  n * (n + 2) * (1..lags).inject(0.0) { |s, i| s + r[i] ** 2 / (n - i) }
end
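
The Q value is n(n + 2) times the sum over lags k = 1..h of r_k² / (n - k), where r_k is the lag-k autocorrelation. A minimal plain-Ruby sketch, taking the autocorrelations as an input array (index 0 is lag 0), is shown below; `ljung_box_sketch` is a hypothetical helper.

```ruby
# Standalone sketch, independent of the more_math gem.
# ljung_box_sketch is a hypothetical name; r holds autocorrelations by lag.
def ljung_box_sketch(r, n, lags)
  return nil if lags >= r.size # not enough lags available
  n * (n + 2) * (1..lags).sum { |k| r[k]**2 / (n - k) }
end

r = [1.0, 0.4, -0.1, -0.4] # autocorrelations of [1, 2, 3, 4, 5]
q = ljung_box_sketch(r, 5, 2)
puts q # 35 * (0.16 / 4 + 0.01 / 3), about 1.517
```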

#max ⇒ Object

Returns the maximum of the elements.

Returns:

  • (Object)

    The maximum element in the sequence



# File 'lib/more_math/sequence.rb', line 241

memoize method:
def max
  @elements.max
end

#min ⇒ Object

Returns the minimum of the elements.

Returns:

  • (Object)

    The minimum element in the sequence



# File 'lib/more_math/sequence.rb', line 233

memoize method:
def min
  @elements.min
end

#percentile(p = 50) ⇒ Float Also known as: median

Returns the p-percentile of the elements.

Computes the rank r = (n + 1) * p / 100 in the sorted elements and interpolates linearly between the two neighbouring elements, weighted by the fractional part of r.

Parameters:

  • p (Integer, Float) (defaults to: 50)

    The percentile to calculate (0 <= p < 100)

Returns:

  • (Float)

    The p-th percentile value

Raises:

  • (ArgumentError)

    If p is not in the range (0...100)



# File 'lib/more_math/sequence.rb', line 261

def percentile(p = 50)
  (0...100).include?(p) or
    raise ArgumentError, "p = #{p}, but has to be in (0...100)"
  p /= 100.0
  sorted_elements = sorted
  r = p * (sorted_elements.size + 1)
  r_i = r.to_i
  r_f = r - r_i
  if r_i >= 1
    result = sorted_elements[r_i - 1]
    if r_i < sorted_elements.size
      result += r_f * (sorted_elements[r_i] - sorted_elements[r_i - 1])
    end
  else
    result = sorted_elements[0]
  end
  result
end
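
To see the interpolation at work, the following plain-Ruby sketch mirrors the same rank arithmetic on a five-element sample. `percentile_sketch` is a hypothetical helper written for illustration only.

```ruby
# Standalone sketch, independent of the more_math gem.
# percentile_sketch is a hypothetical name.
def percentile_sketch(xs, p)
  raise ArgumentError, "p must satisfy 0 <= p < 100" unless p >= 0 && p < 100
  sorted = xs.sort
  rank = p / 100.0 * (sorted.size + 1)
  whole = rank.to_i    # 1-based position of the lower neighbour
  frac = rank - whole  # weight given to the gap up to the upper neighbour
  return sorted.first if whole < 1
  return sorted.last if whole >= sorted.size
  sorted[whole - 1] + frac * (sorted[whole] - sorted[whole - 1])
end

data = [10, 15, 20, 25, 30]
puts percentile_sketch(data, 25) # => 12.5 (rank 1.5: halfway between 10 and 15)
puts percentile_sketch(data, 50) # => 20.0 (rank 3.0: the middle element)
puts percentile_sketch(data, 90) # => 30   (rank 5.4: clamped to the largest element)
```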

#push(element) ⇒ Sequence Also known as: <<

Returns a new Sequence instance with the element appended; this Sequence itself is left unchanged.

Parameters:

  • element (Object)

    The element to add to the sequence

Returns:

  • (Sequence)

    A new Sequence instance with the element added



# File 'lib/more_math/sequence.rb', line 85

def push(element)
  Sequence.new(@elements.dup.push(element))
end

#reset ⇒ self

Reset all memoized values of this sequence.

Returns:

  • (self)

    Returns self after clearing memoization cache



# File 'lib/more_math/sequence.rb', line 67

def reset
  self.class.mize_cache_clear
  self
end

#sample_standard_deviation ⇒ Float

Returns the sample standard deviation of the elements.

Returns:

  • (Float)

    The sample standard deviation



# File 'lib/more_math/sequence.rb', line 154

memoize method:
def sample_standard_deviation
  Math.sqrt(sample_variance)
end

#sample_standard_deviation_percentage ⇒ Float

Returns the sample standard deviation as a percentage of the arithmetic mean.

Returns:

  • (Float)

    Sample standard deviation expressed as a percentage of the mean



# File 'lib/more_math/sequence.rb', line 162

memoize method:
def sample_standard_deviation_percentage
  100.0 * sample_standard_deviation / arithmetic_mean
end

#sample_variance ⇒ Float

Note:

Uses the formula: Σ(xi - μ)² / (n-1)

Returns the sample variance of the elements.

Sample variance is used when the data represents a sample rather than a population.

Returns:

  • (Float)

    The sample variance of the elements



# File 'lib/more_math/sequence.rb', line 108

memoize method:
def sample_variance
  size > 1 ? sum_of_squares / (size - 1.0) : 0.0
end
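
The only difference between the sample and population variance is the divisor applied to the sum of squared deviations: n - 1 versus n. A short plain-Ruby sketch with a hypothetical helper makes the contrast explicit.

```ruby
# Standalone sketch, independent of the more_math gem.
# sum_of_squared_deviations is a hypothetical name.
def sum_of_squared_deviations(xs)
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 }
end

xs = [1, 2, 3, 4, 5]
ss = sum_of_squared_deviations(xs)
population_variance = ss / xs.size       # divides by n
sample_variance     = ss / (xs.size - 1) # divides by n - 1
puts population_variance # => 2.0
puts sample_variance     # => 2.5
```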

#size ⇒ Integer

Returns the number of elements in this sequence.

Returns:

  • (Integer)

    The count of elements in the sequence



# File 'lib/more_math/sequence.rb', line 60

def size
  @elements.size
end

#sorted ⇒ Array

Returns a sorted array of the elements.

Returns:

  • (Array)

    A new array containing elements sorted in ascending order



# File 'lib/more_math/sequence.rb', line 249

memoize method:
def sorted
  @elements.sort
end

#standard_deviation ⇒ Float

Returns the standard deviation of the elements.

Standard deviation measures the amount of variation or dispersion in a set of values.

Returns:

  • (Float)

    The population standard deviation



# File 'lib/more_math/sequence.rb', line 128

memoize method:
def standard_deviation
  Math.sqrt(variance)
end

#standard_deviation_percentage ⇒ Float

Returns the standard deviation as a percentage of the arithmetic mean.

Returns:

  • (Float)

    Standard deviation expressed as a percentage of the mean



# File 'lib/more_math/sequence.rb', line 146

memoize method:
def standard_deviation_percentage
  100.0 * standard_deviation / arithmetic_mean
end

#suggested_sample_size(other, alpha = 0.05, beta = 0.05) ⇒ Float

Computes the suggested sample size for detecting a mean difference.

Parameters:

  • other (Sequence)

    The other sequence to compare against

  • alpha (Float) (defaults to: 0.05)

    The significance level (default: 0.05)

  • beta (Float) (defaults to: 0.05)

    The Type II error probability (default: 0.05)

Returns:

  • (Float)

    The suggested sample size



# File 'lib/more_math/sequence.rb', line 349

def suggested_sample_size(other, alpha = 0.05, beta = 0.05)
  alpha, beta = alpha.abs, beta.abs
  signal = arithmetic_mean - other.arithmetic_mean
  df = size + other.size - 2
  pooled_variance_estimate = (sum_of_squares + other.sum_of_squares) / df
  td = TDistribution.new df
  (((td.inverse_probability(alpha) + td.inverse_probability(beta)) *
    Math.sqrt(pooled_variance_estimate)) / signal) ** 2
end

#sum ⇒ Float

Returns the sum of all elements.

Returns:

  • (Float)

    The sum of all elements in the sequence



# File 'lib/more_math/sequence.rb', line 170

memoize method:
def sum
  @elements.inject(0.0) { |s, t| s + t }
end

#sum_of_squares ⇒ Float

Returns the sum of squared deviations of the elements from their arithmetic mean.

Sum of squares is used in variance and standard deviation calculations.

Returns:

  • (Float)

    The sum of squared deviations from the mean



# File 'lib/more_math/sequence.rb', line 118

memoize method:
def sum_of_squares
  @elements.inject(0.0) { |s, t| s + (t - arithmetic_mean) ** 2 }
end

#t_student(other) ⇒ Float

Returns the t value of the Student’s t-test between this sequence and another.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Float)

    The t-statistic value



# File 'lib/more_math/sequence.rb', line 334

def t_student(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = common_standard_deviation(other) *
    Math.sqrt(size ** -1 + other.size ** -1)
  signal / noise
rescue Errno::EDOM
  0.0
end

#t_welch(other) ⇒ Float

Returns the t value of the Welch’s t-test between this sequence and another.

Parameters:

  • other (Sequence)

    The other sequence to compare against

Returns:

  • (Float)

    The t-statistic value



# File 'lib/more_math/sequence.rb', line 296

def t_welch(other)
  signal = arithmetic_mean - other.arithmetic_mean
  noise = Math.sqrt(sample_variance / size +
    other.sample_variance / other.size)
  signal / noise
rescue Errno::EDOM
  0.0
end
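
Welch’s t value is the difference of the means divided by the square root of the summed squared standard errors. The plain-Ruby sketch below follows that signal-over-noise structure with hypothetical helper names; it does not handle zero variance.

```ruby
# Standalone sketch, independent of the more_math gem.
# Helper names are hypothetical.
def mean_of(xs)
  xs.sum.to_f / xs.size
end

def sample_variance_for(xs)
  m = mean_of(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

def t_welch_sketch(a, b)
  signal = mean_of(a) - mean_of(b)
  noise = Math.sqrt(sample_variance_for(a) / a.size +
                    sample_variance_for(b) / b.size)
  signal / noise
end

t = t_welch_sketch([1, 2, 3, 4, 5], [2, 4, 6, 8])
puts t # about -1.36: the first mean is lower than the second
```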

#to_ary ⇒ Array Also known as: to_a

Converts the sequence to an array.

Returns:

  • (Array)

    A duplicate of the internal elements array



# File 'lib/more_math/sequence.rb', line 75

def to_ary
  @elements.dup
end

#variance ⇒ Float

Note:

Uses the formula: Σ(xi - μ)² / n

Returns the variance of the elements.

Variance measures how far each number in the set is from the mean.

Returns:

  • (Float)

    The population variance of the elements



# File 'lib/more_math/sequence.rb', line 97

memoize method:
def variance
  sum_of_squares / size
end

#z_score ⇒ Sequence

Returns the Z-score sequence derived from the current sequence.

Z-scores standardize data by transforming it to have a mean of 0 and standard deviation of 1.

Returns:

  • (Sequence)

    A new Sequence with z-score values



# File 'lib/more_math/sequence.rb', line 138

memoize method:
def z_score
  # Subtract the mean first, then divide by the standard deviation.
  self.class.new(elements.map { |t| (t.to_f - mean) / standard_deviation })
end
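
Standardizing means subtracting the mean from each value and dividing the result by the population standard deviation. The sketch below checks in plain Ruby that the transformed values have mean 0 and standard deviation 1; `z_scores_sketch` is a hypothetical helper and does not guard against a zero standard deviation.

```ruby
# Standalone sketch, independent of the more_math gem.
# z_scores_sketch is a hypothetical name.
def z_scores_sketch(xs)
  n = xs.size
  mean = xs.sum.to_f / n
  sd = Math.sqrt(xs.sum { |x| (x - mean)**2 } / n) # population standard deviation
  xs.map { |x| (x - mean) / sd }
end

z = z_scores_sketch([1, 2, 3, 4, 5])
z_mean = z.sum / z.size
z_sd = Math.sqrt(z.sum { |v| (v - z_mean)**2 } / z.size)
puts z.inspect # roughly [-1.414, -0.707, 0.0, 0.707, 1.414]
```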