Module: Tins::FileBinary

Defined in:
lib/tins/file_binary.rb

Overview

A module for detecting and analyzing binary files based on content patterns.

This module provides functionality to determine whether a file contains binary data by examining its content against various thresholds for null bytes and high-order bits. It’s useful for identifying files that are not plain text, such as images, executables, or other binary formats.

The detection is performed by scanning a specified portion of the file and calculating the percentage of bytes that match binary criteria.
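As a rough illustration of the percentage calculation described above, the sketch below counts null bytes and bytes with the high-order bit set in a string. The helper name `byte_percentages` is hypothetical and not part of Tins, and the byte classes are assumptions taken from this overview; the library's own `Constants::ZERO_RE` and `Constants::BINARY_RE` may define them differently.

```ruby
# Hypothetical helper, not part of Tins: returns the percentage of null bytes
# and of bytes with the high-order bit set, or nil for an empty string.
def byte_percentages(data)
  size = data.bytesize
  return nil if size.zero?

  zeros = data.each_byte.count(0)                 # bytes equal to 0x00
  high  = data.each_byte.count { |b| b >= 0x80 }  # bytes with bit 7 set

  {
    zeros:  zeros.to_f / size * 100.0,
    binary: high.to_f / size * 100.0,
  }
end

sample = "abc\x00\xFF\xFE".b   # 6 bytes: 1 null byte, 2 high-bit bytes
result = byte_percentages(sample)
```

Each percentage would then be compared against its threshold to decide whether the data counts as binary.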

Defined Under Namespace

Modules: ClassMethods, Constants

Class Attribute Details

.default_options ⇒ Object

Accessor for the default options hash, which binary? merges with the options passed by the caller



# File 'lib/tins/file_binary.rb', line 33

def default_options
  @default_options
end
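Per-call options are merged over these defaults, so any key a caller leaves out keeps its default value. A minimal sketch of that merge, using a local hash that mirrors the documented defaults (it is a stand-in, not the library's actual hash):

```ruby
# Local stand-in for the documented defaults; the real hash is the one
# returned by Tins::FileBinary.default_options.
defaults = {
  offset:            0,
  buffer_size:       8192,
  percentage_binary: 30.0,
  percentage_zeros:  0.0,
}

# Hash#merge returns a new hash: caller-supplied keys win, the rest keep
# their defaults, and the defaults hash itself is left unchanged.
effective = defaults.merge(percentage_zeros: 5.0, buffer_size: 4096)
```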

Class Method Details

.included(modul) ⇒ Object

Module inclusion hook that extends the including class with ClassMethods

Parameters:

  • modul (Module)

    The module that is including FileBinary



# File 'lib/tins/file_binary.rb', line 114

def self.included(modul)
  modul.instance_eval do
    extend ClassMethods
  end
  super
end
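The same hook pattern can be tried in isolation. The sketch below uses a hypothetical `Shouting` module and `Greeter` class (neither exists in Tins) to show how including a module can also add class-level methods to the including class.

```ruby
# Hypothetical mixin demonstrating the included/extend pattern.
module Shouting
  module ClassMethods
    # Becomes a class-level method on every class that includes Shouting.
    def shout(text)
      new.shout(text)
    end
  end

  # Called by Ruby whenever Shouting is included somewhere.
  def self.included(base)
    base.extend(ClassMethods)
    super
  end

  # Instance-level method provided by the mixin.
  def shout(text)
    text.upcase + "!"
  end
end

class Greeter
  include Shouting
end

instance_result = Greeter.new.shout("hi")
class_result    = Greeter.shout("hi")
```

Both calls go through the same instance method; the class-level variant only adds a convenient entry point.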

Instance Method Details

#ascii?(options = {}) ⇒ Boolean?

Returns the logical opposite of binary?: true if the file is not binary (ASCII/text), false if it is binary, and nil when binary? returns nil

Examples:

Basic usage (io is an open handle on script.rb whose class includes FileBinary)

io.ascii?  # => true

With custom options (io opened on data.bin)

io.ascii?(percentage_zeros: 5.0)  # => false

Parameters:

  • options (Hash) (defaults to: {})

    Configuration options for ASCII detection

Options Hash (options):

  • :offset (Integer) — default: 0

    Starting position in file to begin analysis

  • :buffer_size (Integer) — default: 8192

    Number of bytes to read for analysis

  • :percentage_binary (Float) — default: 30.0

    Percentage of binary bytes above which the data is classified as binary

  • :percentage_zeros (Float) — default: 0.0

    Percentage of null bytes above which the data is classified as binary; with the default of 0.0, a single null byte is enough

Returns:

  • (Boolean, nil)

    true if ASCII/text, false if binary, nil if no data could be read (for example, the file is empty or the offset lies at or beyond its end)



# File 'lib/tins/file_binary.rb', line 104

def ascii?(options = {})
  case binary?(options)
  when true   then false
  when false  then true
  end
end
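Because the result has three possible values, callers should treat nil separately from false. A small sketch with a hypothetical `describe` helper (not part of Tins) that maps each outcome to a label:

```ruby
# Hypothetical helper: turns the three possible ascii? results into a label.
def describe(ascii_result)
  case ascii_result
  when true  then "text"
  when false then "binary"
  else            "empty"   # nil: there was no data to classify
  end
end
```

A plain `if io.ascii?` check would lump the nil case together with binary files, which may or may not be what the caller wants.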

#binary?(options = {}) ⇒ Boolean?

Determines if a file is considered binary based on content analysis

A file is classified as binary if either:

  1. The percentage of null bytes exceeds the configured threshold

  2. The percentage of binary bytes (bytes with high-order bit set) exceeds the configured threshold

Examples:

Basic usage (io is an open handle on large_binary_file.dat whose class includes FileBinary)

io.binary?  # => true

Custom thresholds (io opened on file.txt)

io.binary?(percentage_binary: 10.0)  # => false

With offset and buffer size (io opened on file.log)

io.binary?(offset: 1024, buffer_size: 4096)  # => true

Parameters:

  • options (Hash) (defaults to: {})

    Configuration options for binary detection

Options Hash (options):

  • :offset (Integer) — default: 0

    Starting position in file to begin analysis

  • :buffer_size (Integer) — default: 8192

    Number of bytes to read for analysis

  • :percentage_binary (Float) — default: 30.0

    Percentage of binary bytes above which the data is classified as binary

  • :percentage_zeros (Float) — default: 0.0

    Percentage of null bytes above which the data is classified as binary; with the default of 0.0, a single null byte is enough

Returns:

  • (Boolean, nil)

    true if binary, false if not binary, nil if no data could be read (for example, the file is empty or the offset lies at or beyond its end)



# File 'lib/tins/file_binary.rb', line 74

def binary?(options = {})
  options = FileBinary.default_options.merge(options)
  old_pos = tell
  seek options[:offset], Constants::SEEK_SET
  data = read options[:buffer_size]
  !data or data.empty? and return nil
  data_size = data.size
  data.count(Constants::ZERO_RE).to_f / data_size >
    options[:percentage_zeros] / 100.0 and return true
  data.count(Constants::BINARY_RE).to_f / data_size >
    options[:percentage_binary] / 100.0
ensure
  old_pos and seek old_pos, Constants::SEEK_SET
end
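The seek, read, classify, and restore flow can be exercised without the library. The following is a simplified sketch on `StringIO` objects; `binary_sketch?` is a hypothetical stand-in, not the library's code. Its byte classes (null bytes, and bytes with the high-order bit set) are assumptions taken from the description above, since the actual `Constants::ZERO_RE` and `Constants::BINARY_RE` definitions are not shown here.

```ruby
require "stringio"

# Hypothetical stand-in for the documented algorithm. Keyword defaults mirror
# the documented default options.
def binary_sketch?(io, offset: 0, buffer_size: 8192,
                   percentage_binary: 30.0, percentage_zeros: 0.0)
  old_pos = io.tell                      # remember where the caller was
  io.seek(offset, IO::SEEK_SET)
  data = io.read(buffer_size)
  return nil if data.nil? || data.empty? # nothing to classify

  bytes = data.bytes
  zeros = bytes.count(0)
  high  = bytes.count { |b| b >= 0x80 }

  # Either threshold being exceeded is enough to call the data binary.
  return true if zeros.to_f / bytes.size > percentage_zeros / 100.0
  high.to_f / bytes.size > percentage_binary / 100.0
ensure
  # Runs on every exit path, so the caller's position is always restored.
  io.seek(old_pos, IO::SEEK_SET) if old_pos
end

text   = StringIO.new("plain text\n")
binary = StringIO.new("\x00\x01\x02".b)
empty  = StringIO.new("")

text_result   = binary_sketch?(text)
binary_result = binary_sketch?(binary)
empty_result  = binary_sketch?(empty)
```

The `ensure` clause is what keeps the check free of side effects: a caller that is partway through reading the stream can run the test and continue from the same position.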