Module: OllamaChat::SourceFetching

Included in:
Chat
Defined in:
lib/ollama_chat/source_fetching.rb

Overview

A module that provides functionality for fetching and processing various types of content sources.

The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.

Examples:

Fetching content from a URL

chat.fetch_source('https://example.com/document.html') do |source_io|
  # Process the fetched content
end

Importing a local file

chat.fetch_source('/path/to/local/file.txt') do |source_io|
  # Process the imported file content
end

Executing a shell command

chat.fetch_source('!ls -la') do |source_io|
  # Process the command output
end

Instance Method Details

#add_image(images, source_io, source) ⇒ Object

Adds an image to the images collection from the given source IO and source identifier.

This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.

Parameters:

  • images (Array)

    The collection of images to which the new image will be added

  • source_io (IO)

    The input stream containing the image data

  • source (String, #to_s)

    The identifier or path for the source of the image



# File 'lib/ollama_chat/source_fetching.rb', line 99

def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end
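
One detail of the listing above: Array#uniq! returns nil when it removes nothing, so add_image returns the images array only when a duplicate was actually dropped. A minimal sketch of that behavior, with plain strings standing in for Ollama::Image instances (an assumption for illustration; how Ollama::Image defines equality is not shown on this page):

```ruby
# Plain strings stand in for Ollama::Image objects (assumption for illustration).
images = []

# Appending a new element: there is nothing to deduplicate, so uniq! returns nil.
first_result = (images << 'cat.png').uniq!

# Appending the same element again: the duplicate is removed and the
# array itself is returned.
second_result = (images << 'cat.png').uniq!
```

Callers that need the collection should therefore read the images array itself rather than rely on the return value.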

#embed(source) ⇒ String?

Embeds content from the specified source.

This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

Returns:

  • (String, nil)

    The formatted embedding result or summary message, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 244

def embed(source)
  if @embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source)
    end
    config.prompts.embed % { source: }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end

#embed_source(source_io, source, count: nil) ⇒ Array, ...

Embeds content from the given source IO and source identifier.

This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.

Parameters:

  • source_io (IO)

    The input stream containing the document content to embed

  • source (String, #to_s)

    The identifier or path for the source of the content

  • count (Integer, nil) (defaults to: nil)

    An optional counter for tracking processing order

Returns:

  • (Array, String, nil)

    The result of adding the chunks to the document store; the parsed content if embedding is disabled; or nil if the source yields no text or no configured splitter matches



# File 'lib/ollama_chat/source_fetching.rb', line 187

def embed_source(source_io, source, count: nil)
  @embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  text = parse_source(source_io) or return
  text.downcase!
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
  end
  @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
end
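
The last step of the listing above turns a shell-command source such as "!ls -la" into a short label before storing the chunks. A minimal sketch of that transformation follows. The helper name document_label is hypothetical, and String#[] with a length is only a simplified stand-in for Kramdown::ANSI::Width.truncate, which the real method uses and which may behave differently (for example with wide characters).

```ruby
# Hypothetical helper, not part of OllamaChat.
def document_label(source, length: 10)
  source = source.to_s
  # Only shell-command sources (leading "!") are rewritten.
  return source unless source.start_with?('!')
  # Drop the leading "!" and collapse every run of non-word characters to "_".
  label = source[1..-1].gsub(/\W+/, '_')
  # Simplified stand-in for Kramdown::ANSI::Width.truncate (assumption).
  label[0, length]
end

command_label = document_label('!ls -la /tmp')
long_label    = document_label('!echo hello world')
path_label    = document_label('/path/to/file.txt')
```

Under these assumptions, file paths and URLs pass through unchanged, while command sources become short identifiers made of word characters and underscores.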

#fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object

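Retrieves content from a command, URL, or file path. The method dispatches on the form of the source identifier and yields a temporary file handle to the block for further processing. Any StandardError raised while fetching or inside the block is rescued and reported on standard error, in which case the method returns nil.

Parameters:

  • source (String, #to_path)

    the source identifier which can be a command, URL, or file path

  • check_exist (Boolean) (defaults to: false)

    when true, a file source that does not exist is skipped and the method returns nil without yielding

Yields:

  • (tmp)

    the temporary file handle containing the fetched content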


# File 'lib/ollama_chat/source_fetching.rb', line 35

def fetch_source(source, check_exist: false, &block)
  source = source.ask_and_send_or_self(:to_path).to_s
  case source
  when %r{\A!(.*)}
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r{\Ahttps?://\S+}
    links.add(source.to_s)
    get_url(source, cache:) do |tmp|
      block.(tmp)
    end
  when %r{\Afile://([^\s#]+)}
    filename = $1
    filename = URI.decode_www_form_component(filename)
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  when  %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    filename = $1
    filename = filename.gsub('\ ', ' ')
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    filename = $1
    filename = filename.gsub('\"', ?")
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
end
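
The dispatch order in the listing above can be explored without fetching anything. The following sketch reuses the same patterns in the same order but only reports which branch would be taken. The helper name classify_source and the returned symbols are hypothetical and not part of OllamaChat.

```ruby
require 'uri'

# Hypothetical helper: mirrors the pattern order of fetch_source, but
# returns a [kind, value] pair instead of fetching content.
def classify_source(source)
  source = source.to_s
  case source
  when %r{\A!(.*)}
    # Shell command: everything after the leading "!".
    [ :command, $1 ]
  when %r{\Ahttps?://\S+}
    [ :url, source ]
  when %r{\Afile://([^\s#]+)}
    # file:// URL: percent-decode, then expand to an absolute path.
    [ :file, File.expand_path(URI.decode_www_form_component($1)) ]
  when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    # Bare path; backslash-escaped spaces are unescaped.
    [ :file, File.expand_path($1.gsub('\ ', ' ')) ]
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    # Double-quoted path; escaped quotes are unescaped.
    [ :file, File.expand_path($1.gsub('\"', '"')) ]
  else
    [ :invalid, source ]
  end
end

command  = classify_source('!ls -la')
url      = classify_source('https://example.com/document.html')
file_url = classify_source('file:///tmp/a%20b.txt')
escaped  = classify_source('/tmp/my\ file.txt')
quoted   = classify_source('"/tmp/a b.txt"')
invalid  = classify_source('document.html')
```

Note that a relative name without a leading "./", "../", "~/", or "/" matches none of the patterns, which is the case where fetch_source raises "invalid source".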

#fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)

Reads a file and extends it with header extension metadata. It then yields the file to the provided block for processing. If the file does not exist, it outputs an error message to standard error.

Parameters:

  • filename (String)

    the path to the file to be read

Yields:

  • (file)

    yields the opened file with header extension

Returns:

  • (nil)

    returns nil if the file does not exist

  • (Object)

    returns the result of the block execution if the file exists



# File 'lib/ollama_chat/source_fetching.rb', line 83

private def fetch_source_as_filename(filename, &block)
  OllamaChat::Utils::Fetcher.read(filename) do |tmp|
    block.(tmp)
  end
end

#import(source) ⇒ String?

Imports content from the specified source and processes it.

This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

Returns:

  • (String, nil)

    A formatted message indicating the import result and parsed content, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 132

def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end

#import_source(source_io, source) ⇒ String

The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.

Parameters:

  • source_io (IO)

    the input stream containing the document content

  • source (String)

    the source identifier or path

Returns:

  • (String)

    a formatted message indicating the import result and the parsed content



# File 'lib/ollama_chat/source_fetching.rb', line 114

def import_source(source_io, source)
  source        = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end

#links ⇒ Set

Returns the links set for this object, initializing it lazily if needed.

The links set is memoized, meaning it will only be created once per object instance and subsequent calls will return the same Set instance.

Returns:

  • (Set)

    A Set object containing all links associated with this instance



# File 'lib/ollama_chat/source_fetching.rb', line 267

def links
  @links ||= Set.new
end
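
The memoization described above can be seen in isolation with a small stand-in class. LinkHolder is a hypothetical name used only for this sketch; in the library the method is mixed into Chat.

```ruby
require 'set'

# Hypothetical class used only to demonstrate the memoization pattern.
class LinkHolder
  def links
    # Created on first call, reused on every later call.
    @links ||= Set.new
  end
end

holder = LinkHolder.new
holder.links.add('https://example.com/document.html')
holder.links.add('https://example.com/document.html') # duplicate, ignored by Set
```

Because the collection is a Set, fetching the same URL more than once records its link only once.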

#summarize(source, words: nil) ⇒ String?

Summarizes content from the specified source.

This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; nil or any value below 1 falls back to 100

Returns:

  • (String, nil)

    The formatted summary message or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 167

def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end

#summarize_source(source_io, source, words: nil) ⇒ String?

Summarizes content from the given source IO and source identifier.

This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.

Parameters:

  • source_io (IO)

    The input stream containing the document content to summarize

  • source (String, #to_s)

    The identifier or path for the source of the content

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; nil or any value below 1 falls back to 100

Returns:

  • (String, nil)

    The formatted summary message or nil if content is empty or cannot be processed



# File 'lib/ollama_chat/source_fetching.rb', line 149

def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  config.prompts.summarize % { source_content:, words: }
end
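
The word-count fallback and the template substitution in the listing above can be sketched without a chat instance. The template text and the helper name build_summary_prompt are hypothetical (the real template comes from config.prompts.summarize), and a plain emptiness check stands in for present?.

```ruby
# Hypothetical template; the real one is read from config.prompts.summarize.
SUMMARIZE_TEMPLATE = "Write a summary of about %{words} words:\n\n%{source_content}"

# Hypothetical helper mirroring the fallback and the template fill.
def build_summary_prompt(source_content, words: nil)
  words = words.to_i        # nil.to_i is 0
  words = 100 if words < 1  # anything below 1 falls back to 100
  # Stand-in for present?: treat nil and blank content as missing.
  return if source_content.nil? || source_content.strip.empty?
  # String#% with a hash fills the %{name} references in the template.
  SUMMARIZE_TEMPLATE % { source_content: source_content, words: words }
end

default_prompt = build_summary_prompt('Some document text')
short_prompt   = build_summary_prompt('Some document text', words: 25)
empty_prompt   = build_summary_prompt('   ')
```

The named references make the template order-independent: a configured prompt can mention the word count before or after the content without any change to the calling code.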