Module: OllamaChat::SourceFetching
- Included in: Chat
- Defined in: lib/ollama_chat/source_fetching.rb
Overview
A module that provides functionality for fetching and processing various types of content sources.
The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.
Instance Method Summary
- #add_image(images, source_io, source) ⇒ Object
  Adds an image to the images collection from the given source IO and source identifier.
- #embed(source) ⇒ String?
  Embeds content from the specified source.
- #embed_source(source_io, source, count: nil) ⇒ Array, ...
  Embeds content from the given source IO and source identifier.
- #fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
  The fetch_source method retrieves content from various source types including commands, URLs, and file paths.
- #fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
  Reads a file and extends it with header extension metadata.
- #import(source) ⇒ String?
  Imports content from the specified source and processes it.
- #import_source(source_io, source) ⇒ String
  The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
- #links ⇒ Set
  Returns the links set for this object, initializing it lazily if needed.
- #summarize(source, words: nil) ⇒ String?
  Summarizes content from the specified source.
- #summarize_source(source_io, source, words: nil) ⇒ String?
  Summarizes content from the given source IO and source identifier.
Instance Method Details
#add_image(images, source_io, source) ⇒ Object
Adds an image to the images collection from the given source IO and source identifier.
This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.
    # File 'lib/ollama_chat/source_fetching.rb', line 99
    def add_image(images, source_io, source)
      STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
      image = Ollama::Image.for_io(source_io, path: source.to_s)
      (images << image).uniq!
    end
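The last expression of add_image, `(images << image).uniq!`, appends and deduplicates in place. A minimal sketch of that idiom follows, using plain strings instead of Ollama::Image instances; the helper name `add_unique` is hypothetical, and whether two Ollama::Image objects count as duplicates depends on how that class defines equality, which is not shown here.

```ruby
# Hypothetical helper mirroring the `(images << image).uniq!` idiom.
# Array#uniq! returns the array when it removed duplicates and nil when
# nothing had to be removed, so the return value is not always the array.
def add_unique(items, item)
  (items << item).uniq!
end

items = ['a.png']
add_unique(items, 'a.png') # duplicate is dropped again, returns the array
add_unique(items, 'b.png') # nothing to drop, returns nil
p items
```

Because `uniq!` can return nil, callers in this sketch should rely on the mutated array rather than on the return value.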
#embed(source) ⇒ String?
Embeds content from the specified source.
This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.
    # File 'lib/ollama_chat/source_fetching.rb', line 244
    def embed(source)
      if @embedding.on?
        STDOUT.puts "Now embedding #{source.to_s.inspect}."
        fetch_source(source) do |source_io|
          content = parse_source(source_io)
          content.present? or return
          source_io.rewind
          embed_source(source_io, source)
        end
        config.prompts. % { source: }
      else
        STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
        summarize(source)
      end
    end
#embed_source(source_io, source, count: nil) ⇒ Array, ...
Embeds content from the given source IO and source identifier.
This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.
    # File 'lib/ollama_chat/source_fetching.rb', line 187
    def embed_source(source_io, source, count: nil)
      @embedding.on? or return parse_source(source_io)
      m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
      if count
        STDOUT.puts '%u. %s' % [ count, m ]
      else
        STDOUT.puts m
      end
      text = parse_source(source_io) or return
      text.downcase!
      splitter_config = config..splitter
      inputs = nil
      case splitter_config.name
      when 'Character'
        splitter = Documentrix::Documents::Splitters::Character.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'RecursiveCharacter'
        splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'Semantic'
        splitter = Documentrix::Documents::Splitters::Semantic.new(
          ollama:, model: config..model.name,
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(
          text,
          breakpoint: splitter_config.breakpoint.to_sym,
          percentage: splitter_config.percentage?,
          percentile: splitter_config.percentile?,
        )
      end
      inputs or return
      source = source.to_s
      if source.start_with?(?!)
        source = Kramdown::ANSI::Width.truncate(
          source[1..-1].gsub(/\W+/, ?_),
          length: 10
        )
      end
      @documents.add(inputs, source:, batch_size: config..batch_size?)
    end
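Before the chunks are stored, embed_source turns a command source (one starting with `!`) into a short label. The sketch below illustrates only that step. The helper name `command_label` is hypothetical, and the plain string slice is a stand-in for `Kramdown::ANSI::Width.truncate`, which may behave differently (for example by accounting for display width or adding an ellipsis).

```ruby
# Hypothetical helper: derives a short label from a command source.
# Non-command sources are returned unchanged, as in embed_source.
def command_label(source, length: 10)
  source = source.to_s
  return source unless source.start_with?('!')
  # Drop the leading "!", replace each run of non-word characters with "_",
  # then cut to `length` characters (stand-in for the real truncation).
  source[1..-1].gsub(/\W+/, '_')[0, length]
end

puts command_label('!ls -la /tmp')
puts command_label('./README.md')
```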
#fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing.
    # File 'lib/ollama_chat/source_fetching.rb', line 35
    def fetch_source(source, check_exist: false, &block)
      source = source.ask_and_send_or_self(:to_path).to_s
      case source
      when %r{\A!(.*)}
        command = $1
        OllamaChat::Utils::Fetcher.execute(command) do |tmp|
          block.(tmp)
        end
      when %r{\Ahttps?://\S+}
        links.add(source.to_s)
        get_url(source, cache:) do |tmp|
          block.(tmp)
        end
      when %r{\Afile://([^\s#]+)}
        filename = $1
        filename = URI.decode_www_form_component(filename)
        filename = File.(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
        filename = $1
        filename = filename.gsub('\ ', ' ')
        filename = File.(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
        filename = $1
        filename = filename.gsub('\"', ?")
        filename = File.(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      else
        raise "invalid source #{source.inspect}"
      end
    rescue => e
      STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
    end
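The branch selection in fetch_source can be tried in isolation. The following is a minimal sketch that reuses the regular expressions from the listing above but only reports which branch a source string would take. The helper name `classify_source` and its return shape are assumptions made for illustration; it performs no command execution, HTTP request, path expansion, or file access.

```ruby
require 'uri'

# Hypothetical helper (not part of OllamaChat): returns the branch that
# fetch_source would select, plus the payload extracted by the pattern.
def classify_source(source)
  source = source.to_s
  case source
  when %r{\A!(.*)}
    [:command, $1]
  when %r{\Ahttps?://\S+}
    [:url, source]
  when %r{\Afile://([^\s#]+)}
    # Percent-escapes in file:// sources are decoded.
    [:file, URI.decode_www_form_component($1)]
  when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    # Unquoted path: backslash-escaped spaces become plain spaces.
    [:file, $1.gsub('\ ', ' ')]
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    # Double-quoted path: escaped quotes become plain quotes.
    [:file, $1.gsub('\"', '"')]
  else
    [:invalid, source]
  end
end

p classify_source('!ls -la')
p classify_source('file:///tmp/my%20notes.txt')
p classify_source('./docs/my\ file.md')
```

Note that a bare word such as `notes.txt` matches none of the patterns: file sources need a leading `/`, `./`, `../`, or `~/`, or the `file://` scheme.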
#fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
Reads a file and extends it with header extension metadata. It then yields the file to the provided block for processing. If the file does not exist, it outputs an error message to standard error.
    # File 'lib/ollama_chat/source_fetching.rb', line 83
    private def fetch_source_as_filename(filename, &block)
      OllamaChat::Utils::Fetcher.read(filename) do |tmp|
        block.(tmp)
      end
    end
#import(source) ⇒ String?
Imports content from the specified source and processes it.
This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.
    # File 'lib/ollama_chat/source_fetching.rb', line 132
    def import(source)
      fetch_source(source) do |source_io|
        content = import_source(source_io, source) or return
        source_io.rewind
        content
      end
    end
#import_source(source_io, source) ⇒ String
The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
    # File 'lib/ollama_chat/source_fetching.rb', line 114
    def import_source(source_io, source)
      source = source.to_s
      document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
      STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
      source_content = parse_source(source_io)
      "Imported #{source.inspect}:\n\n#{source_content}\n\n"
    end
#links ⇒ Set
Returns the links set for this object, initializing it lazily if needed.
The links set is memoized, meaning it will only be created once per object instance and subsequent calls will return the same Set instance.
    # File 'lib/ollama_chat/source_fetching.rb', line 267
    def links
      @links ||= Set.new
    end
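The memoization described above can be observed with a small stand-alone class. This is a sketch only: `LinkCollector` is a hypothetical name standing in for any object that includes the module, and the URL is a placeholder.

```ruby
require 'set'

# Hypothetical class with the same lazily initialized links accessor.
class LinkCollector
  def links
    @links ||= Set.new
  end
end

collector = LinkCollector.new
collector.links.add('https://example.com')
collector.links.add('https://example.com') # a Set ignores the duplicate

# Every call returns the same Set instance.
puts collector.links.equal?(collector.links)
puts collector.links.size
```

Using a Set means a URL fetched several times is recorded once, which matches how fetch_source calls `links.add` on every URL fetch.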
#summarize(source, words: nil) ⇒ String?
Summarizes content from the specified source.
This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.
    # File 'lib/ollama_chat/source_fetching.rb', line 167
    def summarize(source, words: nil)
      fetch_source(source) do |source_io|
        content = summarize_source(source_io, source, words:) or return
        source_io.rewind
        content
      end
    end
#summarize_source(source_io, source, words: nil) ⇒ String?
Summarizes content from the given source IO and source identifier.
This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.
    # File 'lib/ollama_chat/source_fetching.rb', line 149
    def summarize_source(source_io, source, words: nil)
      STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
      words = words.to_i
      words < 1 and words = 100
      source_content = parse_source(source_io)
      source_content.present? or return
      config.prompts.summarize % { source_content:, words: }
    end
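The word-count default and the template interpolation in summarize_source can be sketched without the surrounding chat object. The template text and the helper name `build_summary_prompt` below are assumptions; the real template comes from `config.prompts.summarize`, and the blank check here is a plain-Ruby stand-in for `present?`.

```ruby
# Hypothetical template using named references, as String#% expects.
SUMMARIZE_TEMPLATE = "Summarize in about %{words} words:\n\n%{source_content}"

# Hypothetical helper mirroring the logic of summarize_source.
def build_summary_prompt(source_content, words: nil)
  words = words.to_i          # nil and non-numeric strings become 0
  words = 100 if words < 1    # fall back to 100 words
  return if source_content.nil? || source_content.strip.empty?
  SUMMARIZE_TEMPLATE % { source_content: source_content, words: words }
end

puts build_summary_prompt('Ruby is a dynamic language.')
puts build_summary_prompt('Ruby is a dynamic language.', words: '25')
```

As in the original method, empty content yields nil rather than a prompt, so callers can skip the model request entirely.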