Module: MoreMath::Entropy
Included in: Functions
Defined in: lib/more_math/entropy.rb
Overview
Provides entropy calculation utilities for measuring information content and randomness in text data.
This module implements Shannon entropy calculations to quantify the unpredictability or information content of text strings. It’s commonly used in cryptography, data compression, and information theory applications.
The entropy measures help determine how “random” or “predictable” a text is, which can be useful for:
- Password strength analysis
- Data compression efficiency estimation
- Cryptographic security assessment
- Text analysis and classification
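A minimal usage sketch (assuming the gem is loaded via require "more_math" and the module's instance methods are mixed in directly; the library also mixes this module into MoreMath::Functions):

```ruby
require "more_math"

include MoreMath::Entropy

entropy("hello")                  # => ~1.92 (Shannon entropy in bits per symbol)
entropy_ratio("hello", size: 26)  # => ~0.41 (normalized against a 26-letter alphabet)
```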
Instance Method Summary
- #collision_entropy_per_symbol(symbols) ⇒ Float
  Calculates the collision entropy per symbol in the given symbols.
- #collision_entropy_total(symbols) ⇒ Float
  Calculates the total collision entropy for a sequence of symbols.
- #entropy_ideal(size) ⇒ Float
  Calculates the ideal (maximum) entropy for a given character set size.
- #entropy_maximum(text, size:) ⇒ Integer
  Calculates the maximum possible entropy for a given text and alphabet size.
- #entropy_per_symbol(symbols) ⇒ Float (also: #entropy)
  Calculates the Shannon entropy per symbol in the given symbols.
- #entropy_probabilities(symbols) ⇒ Hash<String, Float>
  Calculates the probability distribution of symbols in the given input.
- #entropy_ratio(text, size:) ⇒ Float
  Calculates the normalized entropy ratio of a text string.
- #entropy_total(symbols) ⇒ Float
  Calculates the total entropy for a sequence of symbols.
- #minimum_entropy_per_symbol(symbols) ⇒ Float
  Calculates the minimum entropy per symbol in the given symbols.
- #minimum_entropy_total(symbols) ⇒ Float
  Calculates the total minimum entropy for a sequence of symbols.
Instance Method Details
#collision_entropy_per_symbol(symbols) ⇒ Float
Calculates the collision entropy per symbol in the given symbols.
This method computes the collision entropy (Rényi entropy of order 2), which measures how likely two independently drawn symbols from the sequence are to coincide. It is the negative base-2 logarithm of the sum of the squared probabilities of each symbol.
```ruby
# File 'lib/more_math/entropy.rb', line 93

def collision_entropy_per_symbol(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  symbols.empty? and return 0.0
  probs = entropy_probabilities(symbols)
  -Math.log2(probs.values.sum { |p| p * p })
end
```
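For example (assuming the setup from the Overview sketch, with the module's methods in scope):

```ruby
# "aab": p(a) = 2/3, p(b) = 1/3, so H2 = -log2(4/9 + 1/9) = log2(9/5)
collision_entropy_per_symbol("aab")   # => ~0.848
collision_entropy_per_symbol("abab")  # => 1.0 (uniform distribution over two symbols)
```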
#collision_entropy_total(symbols) ⇒ Float
Calculates the total collision entropy for a sequence of symbols.
This method computes the total information content of a symbol sequence by multiplying the collision entropy per symbol by the total number of symbols. Collision entropy measures how likely two independently drawn symbols are to coincide.
```ruby
# File 'lib/more_math/entropy.rb', line 146

def collision_entropy_total(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  collision_entropy_per_symbol(symbols) * symbols.size
end
```
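A quick check (same assumed setup):

```ruby
collision_entropy_total("abab")  # => 4.0 (1.0 bit per symbol times 4 symbols)
```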
#entropy_ideal(size) ⇒ Float
Calculates the ideal (maximum) entropy for a given character set size.
This represents the maximum possible entropy when all characters in the alphabet have equal probability of occurrence.
```ruby
# File 'lib/more_math/entropy.rb', line 163

def entropy_ideal(size)
  size <= 1 and return 0.0
  frequency = 1.0 / size
  -1.0 * size * frequency * Math.log2(frequency)
end
```
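Since all size symbols share probability 1/size, the expression simplifies to log2(size). For example (same assumed setup):

```ruby
entropy_ideal(2)   # => 1.0
entropy_ideal(26)  # => ~4.7 (log2(26))
```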
#entropy_maximum(text, size:) ⇒ Integer
Calculates the maximum possible entropy for a given text and alphabet size.
This represents the theoretical maximum entropy that could be achieved if all characters in the text were chosen uniformly at random from the alphabet. It’s used to determine the upper bound of security strength for tokens.
```ruby
# File 'lib/more_math/entropy.rb', line 210

def entropy_maximum(text, size:)
  size > 1 or return 0
  (text.size * Math.log2(size)).floor
end
```
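For example, a five-character token drawn from a 26-letter alphabet (same assumed setup):

```ruby
entropy_maximum("hello", size: 26)  # => 23 (floor of 5 * log2(26), which is ~23.5)
```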
#entropy_per_symbol(symbols) ⇒ Float Also known as: entropy
Calculates the Shannon entropy per symbol in the given symbols.
This method computes the entropy of a sequence of symbols, measuring the average information content or unpredictability of the symbols.
```ruby
# File 'lib/more_math/entropy.rb', line 54

def entropy_per_symbol(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  symbols.empty? and return 0.0
  probs = entropy_probabilities(symbols)
  -probs.values.sum { |p| p * Math.log2(p) }
end
```
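This is the classic Shannon formula H = -Σ p(i) * log2(p(i)). For example (same assumed setup):

```ruby
entropy_per_symbol("abab")  # => 1.0 (two equiprobable symbols)
entropy("hello")            # => ~1.92 (via the #entropy alias)
```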
#entropy_probabilities(symbols) ⇒ Hash<String, Float>
Calculates the probability distribution of symbols in the given input.
This method computes the frequency of each symbol in the input and converts these frequencies into probabilities by dividing by the total number of symbols.
```ruby
# File 'lib/more_math/entropy.rb', line 38

def entropy_probabilities(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  freq = symbols.tally
  total = symbols.size
  freq.transform_values { |c| c.to_f / total }
end
```
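For example (same assumed setup):

```ruby
entropy_probabilities("hello")
# => {"h"=>0.2, "e"=>0.2, "l"=>0.4, "o"=>0.2}
```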
#entropy_ratio(text, size:) ⇒ Float
Calculates the normalized entropy ratio of a text string.
The ratio is calculated as actual entropy divided by ideal entropy, giving a value between 0 and 1 where:
- 0 indicates no entropy (all characters are identical)
- 1 indicates maximum entropy (uniform distribution across the alphabet)
The normalization uses the specified alphabet size to calculate the theoretical maximum entropy for that character set.
```ruby
# File 'lib/more_math/entropy.rb', line 190

def entropy_ratio(text, size:)
  size <= 1 and return 0.0
  entropy(text) / entropy_ideal(size)
end
```
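For example (same assumed setup):

```ruby
entropy_ratio("hello", size: 26)  # => ~0.41 (1.92 bits actual / 4.70 bits ideal)
```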
#entropy_total(symbols) ⇒ Float
Calculates the total entropy for a sequence of symbols.
This method computes the total information content of a symbol sequence by multiplying the entropy per symbol by the total number of symbols.
```ruby
# File 'lib/more_math/entropy.rb', line 112

def entropy_total(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  entropy_per_symbol(symbols) * symbols.size
end
```
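For example (same assumed setup):

```ruby
entropy_total("hello")  # => ~9.61 (1.92 bits per symbol times 5 symbols)
```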
#minimum_entropy_per_symbol(symbols) ⇒ Float
Calculates the minimum entropy per symbol in the given symbols.
This method computes the min-entropy of a sequence of symbols, a worst-case measure determined solely by the probability of the most likely symbol. It never exceeds the Shannon entropy and equals it exactly when all symbols are equally likely.
```ruby
# File 'lib/more_math/entropy.rb', line 74

def minimum_entropy_per_symbol(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  symbols.empty? and return 0.0
  probs = entropy_probabilities(symbols)
  -Math.log2(probs.values.max)
end
```
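For example (same assumed setup):

```ruby
# "aab": the most likely symbol is "a" with p = 2/3, so H_min = -log2(2/3)
minimum_entropy_per_symbol("aab")   # => ~0.585
minimum_entropy_per_symbol("abab")  # => 1.0 (equal to the Shannon entropy here)
```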
#minimum_entropy_total(symbols) ⇒ Float
Calculates the total minimum entropy for a sequence of symbols.
This method computes the total information content of a symbol sequence by multiplying the minimum entropy per symbol by the total number of symbols. It serves as a conservative (worst-case) lower bound on the information content of the sequence.
```ruby
# File 'lib/more_math/entropy.rb', line 129

def minimum_entropy_total(symbols)
  symbols = symbols.chars if symbols.respond_to?(:chars)
  minimum_entropy_per_symbol(symbols) * symbols.size
end
```
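For example (same assumed setup):

```ruby
minimum_entropy_total("aab")  # => ~1.75 (0.585 bits per symbol times 3 symbols)
```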