Module: OllamaChat::SourceFetching
- Included in: Chat
- Defined in: lib/ollama_chat/source_fetching.rb
Overview
A module that provides functionality for fetching and processing various types of content sources.
The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.
Instance Method Summary
- #add_image(images, source_io, source) ⇒ Object
  Adds an image to the images collection from the given source IO and source identifier.
- #embed(source) ⇒ String?
  Embeds content from the specified source.
- #embed_source(source_io, source, count: nil) ⇒ Array, ...
  Embeds content from the given source IO and source identifier.
- #fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
  Retrieves content from various source types, including commands, URLs, and file paths.
- #fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
  Reads a file and extends it with header extension metadata.
- #http_options(url) ⇒ Hash
  Prepares HTTP options for requests based on configuration settings.
- #import(source) ⇒ String?
  Imports content from the specified source and processes it.
- #import_source(source_io, source) ⇒ String
  Processes and imports content from a given source, prints the document type, and returns a formatted string containing the import notice and the parsed content.
- #summarize(source, words: nil) ⇒ String?
  Summarizes content from the specified source.
- #summarize_source(source_io, source, words: nil) ⇒ String?
  Summarizes content from the given source IO and source identifier.
Instance Method Details
#add_image(images, source_io, source) ⇒ Object
Adds an image to the images collection from the given source IO and source identifier.
This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.
    # File 'lib/ollama_chat/source_fetching.rb', line 124

    def add_image(images, source_io, source)
      STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
      image = Ollama::Image.for_io(source_io, path: source.to_s)
      (images << image).uniq!
    end
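The last expression of add_image, `(images << image).uniq!`, also determines the method's return value. The following minimal sketch illustrates the behavior of Array#uniq! in that position. Plain strings stand in for Ollama::Image objects here, which is an assumption for illustration only; whether two real image objects count as duplicates depends on how that class defines equality, which this page does not show.

```ruby
# Plain strings stand in for Ollama::Image objects (illustrative assumption).
images = ['cat.png']

# Appending a new entry: nothing is removed, so uniq! returns nil.
result_new = (images << 'dog.png').uniq!

# Appending a duplicate: uniq! removes it in place and returns the array.
result_dup = (images << 'cat.png').uniq!

puts result_new.inspect
puts result_dup.inspect
puts images.inspect
```

Because uniq! returns nil when it removes nothing, callers should probably rely on the mutated images array rather than on the return value.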
#embed(source) ⇒ String?
Embeds content from the specified source.
This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.
The source may be a command, URL, or file path.
Returns nil if the operation fails.
    # File 'lib/ollama_chat/source_fetching.rb', line 270

    def embed(source)
      if @embedding.on?
        STDOUT.puts "Now embedding #{source.to_s.inspect}."
        fetch_source(source) do |source_io|
          content = parse_source(source_io)
          content.present? or return
          source_io.rewind
          embed_source(source_io, source)
        end
        config.prompts.embed % { source: }
      else
        STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
        summarize(source)
      end
    end
#embed_source(source_io, source, count: nil) ⇒ Array, ...
Embeds content from the given source IO and source identifier.
This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.
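Returns nil if parsing or splitting yields nothing. When embedding is disabled, the method returns the parsed source content instead of embedding it.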
    # File 'lib/ollama_chat/source_fetching.rb', line 212

    def embed_source(source_io, source, count: nil)
      @embedding.on? or return parse_source(source_io)
      m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
      if count
        STDOUT.puts '%u. %s' % [ count, m ]
      else
        STDOUT.puts m
      end
      text = parse_source(source_io) or return
      text.downcase!
      splitter_config = config.embedding.splitter
      inputs = nil
      case splitter_config.name
      when 'Character'
        splitter = Documentrix::Documents::Splitters::Character.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'RecursiveCharacter'
        splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'Semantic'
        splitter = Documentrix::Documents::Splitters::Semantic.new(
          ollama:, model: config.embedding.model.name,
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(
          text,
          breakpoint: splitter_config.breakpoint.to_sym,
          percentage: splitter_config.percentage?,
          percentile: splitter_config.percentile?,
        )
      end
      inputs or return
      source = source.to_s
      if source.start_with?(?!)
        source = Kramdown::ANSI::Width.truncate(
          source[1..-1].gsub(/\W+/, ?_),
          length: 10
        )
      end
      @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
    end
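Near the end of embed_source, a command source (one starting with `!`) is turned into a short label before the chunks are stored. The sketch below approximates that step. The name source_label is hypothetical, and plain string slicing stands in for Kramdown::ANSI::Width.truncate, whose exact output (for example, whether it appends an ellipsis) is not shown on this page.

```ruby
# Hypothetical helper approximating how a command source becomes a label.
def source_label(source, length: 10)
  source = source.to_s
  # Non-command sources (URLs, file paths) are used unchanged.
  return source unless source.start_with?('!')

  # Drop the leading "!" and collapse runs of non-word characters to "_".
  label = source[1..].gsub(/\W+/, '_')

  # Plain slicing is a stand-in for Kramdown::ANSI::Width.truncate.
  label[0, length]
end

puts source_label('!ls -la /tmp')
puts source_label('!cat /etc/hostname')
puts source_label('./README.md')
```

With this stand-in, long commands are cut to the first ten characters, so distinct commands that share a prefix would end up under the same label.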
#fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing.
    # File 'lib/ollama_chat/source_fetching.rb', line 55

    def fetch_source(source, check_exist: false, &block)
      case source
      when %r{\A!(.*)}
        command = $1
        OllamaChat::Utils::Fetcher.execute(command) do |tmp|
          block.(tmp)
        end
      when %r{\Ahttps?://\S+}
        links.add(source.to_s)
        OllamaChat::Utils::Fetcher.get(
          source,
          headers:      config.request_headers?.to_h,
          cache:        @cache,
          debug:        debug,
          http_options: http_options(OllamaChat::Utils::Fetcher.normalize_url(source))
        ) do |tmp|
          block.(tmp)
        end
      when %r{\Afile://([^\s#]+)}
        filename = $1
        filename = URI.decode_www_form_component(filename)
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
        filename = $1
        filename = filename.gsub('\ ', ' ')
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
        filename = $1
        filename = filename.gsub('\"', ?")
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      else
        raise "invalid source #{source.inspect}"
      end
    rescue => e
      STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
    end
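To make the dispatch order in fetch_source easier to follow, here is a small sketch that reuses the same five patterns but only classifies the source instead of fetching it. The method name classify_source and the returned symbols are hypothetical. Unlike the original, the sketch skips File.expand_path so that the results do not depend on the current directory.

```ruby
require 'uri'

# Hypothetical classifier mirroring the pattern order used by fetch_source.
def classify_source(source)
  case source
  when %r{\A!(.*)}
    [:command, $1]                                  # "!" prefix: shell command
  when %r{\Ahttps?://\S+}
    [:url, source]                                  # HTTP or HTTPS URL
  when %r{\Afile://([^\s#]+)}
    [:file, URI.decode_www_form_component($1)]      # file:// URI, percent-decoded
  when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    [:file, $1.gsub('\ ', ' ')]                     # bare path, "\ " unescaped
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    [:file, $1.gsub('\"', '"')]                     # double-quoted path
  else
    [:invalid, source]                              # fetch_source raises here
  end
end

puts classify_source('!ls -la').inspect
puts classify_source('https://example.com/a').inspect
puts classify_source('file:///tmp/a%20b.txt').inspect
puts classify_source('./notes/my\ file.txt').inspect
puts classify_source('"./my file.txt"').inspect
puts classify_source('plain text').inspect
```

Note that a bare path must begin with `/`, `./`, `../`, or `~/` to be recognized; anything else falls through to the invalid branch.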
#fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
Reads a file and extends it with header extension metadata. It then yields the file to the provided block for processing. If the file does not exist, it outputs an error message to standard error.
    # File 'lib/ollama_chat/source_fetching.rb', line 108

    private def fetch_source_as_filename(filename, &block)
      OllamaChat::Utils::Fetcher.read(filename) do |tmp|
        block.(tmp)
      end
    end
#http_options(url) ⇒ Hash
The http_options method prepares HTTP options for requests based on configuration settings. It determines whether SSL peer verification should be disabled for a given URL and whether a proxy should be used, then returns a hash of options.
Returns a hash of HTTP options, including SSL verification and proxy settings.
    # File 'lib/ollama_chat/source_fetching.rb', line 36

    def http_options(url)
      options = {}
      if ssl_no_verify = config.ssl_no_verify?
        hostname = URI.parse(url).hostname
        options |= { ssl_verify_peer: !ssl_no_verify.include?(hostname) }
      end
      if proxy = config.proxy?
        options |= { proxy: }
      end
      options
    end
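The option-building logic can be tried in isolation with the sketch below. It is not the module's method: build_http_options is a hypothetical name, the configuration values are passed in as keyword arguments instead of being read from config, and Hash#merge replaces the `|=` operator, since core Ruby's Hash does not define `|` (the original presumably relies on a library extension).

```ruby
require 'uri'

# Hypothetical standalone variant of the option-building logic.
# ssl_no_verify: list of hostnames for which peer verification is skipped.
# proxy:         proxy URL, or nil for a direct connection.
def build_http_options(url, ssl_no_verify: nil, proxy: nil)
  options = {}
  if ssl_no_verify
    hostname = URI.parse(url).hostname
    # Verify the peer unless the hostname is on the no-verify list.
    options = options.merge(ssl_verify_peer: !ssl_no_verify.include?(hostname))
  end
  options = options.merge(proxy: proxy) if proxy
  options
end

puts build_http_options('https://localhost:8443/x', ssl_no_verify: ['localhost']).inspect
puts build_http_options('https://example.com', ssl_no_verify: ['localhost'], proxy: 'http://proxy:3128').inspect
puts build_http_options('https://example.com').inspect
```

When no no-verify list is configured, the `ssl_verify_peer` key is absent entirely, so the HTTP client's own default applies.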
#import(source) ⇒ String?
Imports content from the specified source and processes it.
This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.
The source may be a command, URL, or file path.
    # File 'lib/ollama_chat/source_fetching.rb', line 157

    def import(source)
      fetch_source(source) do |source_io|
        content = import_source(source_io, source) or return
        source_io.rewind
        content
      end
    end
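The `or return` inside the block is what makes import return nil early: a `return` in a block exits the enclosing method, not just the block. The sketch below illustrates this control flow with hypothetical stand-ins (with_source for fetch_source, import_like for import); it assumes, as the listing suggests, that the fetching method passes the block's value back to its caller.

```ruby
# Hypothetical stand-in for fetch_source: yields the value and
# returns whatever the block returns.
def with_source(value)
  yield value
end

# Hypothetical stand-in for import.
def import_like(value)
  with_source(value) do |io|
    # A falsy result triggers `return`, which leaves import_like with nil.
    content = (io.empty? ? nil : "Imported: #{io}") or return
    content
  end
end

puts import_like('abc').inspect
puts import_like('').inspect
```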
#import_source(source_io, source) ⇒ String
The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
Returns a string made up of the import notice followed by the parsed content.
    # File 'lib/ollama_chat/source_fetching.rb', line 139

    def import_source(source_io, source)
      source        = source.to_s
      document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
      STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
      source_content = parse_source(source_io)
      "Imported #{source.inspect}:\n\n#{source_content}\n\n"
    end
#summarize(source, words: nil) ⇒ String?
Summarizes content from the specified source.
This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.
    # File 'lib/ollama_chat/source_fetching.rb', line 192

    def summarize(source, words: nil)
      fetch_source(source) do |source_io|
        content = summarize_source(source_io, source, words:) or return
        source_io.rewind
        content
      end
    end
#summarize_source(source_io, source, words: nil) ⇒ String?
Summarizes content from the given source IO and source identifier.
This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.
    # File 'lib/ollama_chat/source_fetching.rb', line 174

    def summarize_source(source_io, source, words: nil)
      STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
      words = words.to_i
      words < 1 and words = 100
      source_content = parse_source(source_io)
      source_content.present? or return
      config.prompts.summarize % { source_content:, words: }
    end
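The word-count default and the template substitution can be sketched on their own as follows. The method name build_summary_prompt and the template text are hypothetical; the real template comes from config.prompts.summarize and is not shown on this page. A simple blank check stands in for the `present?` call used in the original.

```ruby
# Hypothetical helper mirroring the defaulting and formatting steps.
def build_summary_prompt(template, source_content, words: nil)
  words = words.to_i           # nil and non-numeric strings become 0
  words = 100 if words < 1     # fall back to 100 words
  # Stand-in for `present?`: bail out on nil or blank content.
  return if source_content.nil? || source_content.strip.empty?

  # String#% fills the %{name} references from the hash.
  template % { source_content: source_content, words: words }
end

template = 'Summarize in %{words} words: %{source_content}'

puts build_summary_prompt(template, 'Hello')
puts build_summary_prompt(template, 'Hello', words: 25)
puts build_summary_prompt(template, '').inspect
```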