Module: OllamaChat::SourceFetching
Included in: Chat
Defined in: lib/ollama_chat/source_fetching.rb
Overview
A module that provides functionality for fetching and processing various types of content sources.
The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.
Instance Method Summary
- #add_image(images, source_io, source) ⇒ Object
  Adds an image to the images collection from the given source IO and source identifier.
- #embed(source, tags: []) ⇒ String?
  Embeds content from the specified source.
- #embed_source(source_io, source, tags: [], count: nil) ⇒ Array, ...
  Embeds content from the given source IO and source identifier.
- #fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
  The fetch_source method retrieves content from various source types including commands, URLs, and file paths.
- #fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
  Reads a file and extends it with header extension metadata.
- #import(source) ⇒ String?
  Imports content from the specified source and processes it.
- #import_source(source_io, source) ⇒ String
  The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
- #summarize(source, words: nil) ⇒ String?
  Summarizes content from the specified source.
- #summarize_source(source_io, source, words: nil) ⇒ String?
  Summarizes content from the given source IO and source identifier.
Instance Method Details
#add_image(images, source_io, source) ⇒ Object
Adds an image to the images collection from the given source IO and source identifier.
This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.
    # File 'lib/ollama_chat/source_fetching.rb', line 87
    def add_image(images, source_io, source)
      STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
      image = Ollama::Image.for_io(source_io, path: source.to_s)
      (images << image).uniq!
    end
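The last expression of add_image relies on Array#uniq!, which returns nil when no duplicate was removed, so the method's return value is not always the array. The following is a minimal sketch of that behavior only. It assumes plain strings as stand-ins for Ollama::Image objects, and add_unique is a hypothetical helper name, not part of the module.

```ruby
# Hypothetical helper; plain strings stand in for Ollama::Image objects.
def add_unique(collection, item)
  # Append, then drop duplicates in place. Array#uniq! returns nil when
  # nothing was removed and the array itself when something was removed.
  (collection << item).uniq!
end

images = ["cat.png"]
first  = add_unique(images, "dog.png") # new entry, nothing removed
second = add_unique(images, "cat.png") # duplicate is appended, then removed
```

Callers that need the collection itself should read it from the images argument rather than from the return value.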
#embed(source, tags: []) ⇒ String?
Embeds content from the specified source.
This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.
    # File 'lib/ollama_chat/source_fetching.rb', line 242
    def embed(source, tags: [])
      if @embedding.on?
        STDOUT.puts "Now embedding #{source.to_s.inspect} in collection #{collection.to_s.inspect}."
        fetch_source(source) do |source_io|
          content = parse_source(source_io)
          content.present? or return
          source_io.rewind
          embed_source(source_io, source, tags:)
        end
        prompt(:embed).to_s % { source:, collection: collection }
      else
        STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
        summarize(source)
      end
    end
#embed_source(source_io, source, tags: [], count: nil) ⇒ Array, ...
Embeds content from the given source IO and source identifier.
This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.
    # File 'lib/ollama_chat/source_fetching.rb', line 175
    def embed_source(source_io, source, tags: [], count: nil)
      @embedding.on? or return parse_source(source_io)
      m = "Embedding #{italic { source_io&.content_type }} document "\
        "#{source.to_s.inspect} in collection #{collection.to_s.inspect}."
      if count
        STDOUT.puts '%u. %s' % [ count, m ]
      else
        STDOUT.puts m
      end

      unless @documents.source_modified?(source)
        STDOUT.puts "Source #{source.to_s.inspect} already up-to-date. => Skipping."
        return
      end

      text = parse_source(source_io) or return
      splitter_config = config.embedding.splitter
      inputs = nil
      case splitter_config.name
      when 'Character'
        splitter = Documentrix::Documents::Splitters::Character.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'RecursiveCharacter'
        splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(text)
      when 'Semantic'
        splitter = Documentrix::Documents::Splitters::Semantic.new(
          ollama:, model: config.embedding.model.name,
          chunk_size: splitter_config.chunk_size,
        )
        inputs = splitter.split(
          text,
          breakpoint: splitter_config.breakpoint.to_sym,
          percentage: splitter_config.percentage?,
          percentile: splitter_config.percentile?,
        )
      end
      inputs or return
      source = source.to_s
      command = false
      if source.start_with?(?!)
        source = Kramdown::ANSI::Width.truncate(
          source[1..-1].gsub(/\W+/, ?_),
          length: 10
        )
        command = true
      end
      if !command
        @documents.source_update(inputs, source:, tags:, batch_size: config.embedding.batch_size?)
      else
        @documents.add(inputs, source:, tags:, batch_size: config.embedding.batch_size?)
      end
    end
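The final step of embed_source turns a command source (one starting with "!") into a short label and stores it with add, while every other source goes through source_update. A rough sketch of that labeling step follows. The names source_label and store_method are hypothetical, and plain String#[] slicing is used here as an assumed stand-in for Kramdown::ANSI::Width.truncate, whose exact output (for example an added ellipsis) may differ.

```ruby
# Hypothetical helper mirroring the labeling step of embed_source.
# Returns [label, command_flag].
def source_label(source, length: 10)
  source = source.to_s
  return [source, false] unless source.start_with?("!")

  # Drop the leading "!", collapse each run of non-word characters into
  # "_", then cut to `length` characters (a simplification of the
  # Kramdown::ANSI::Width.truncate call in the real method).
  label = source[1..-1].gsub(/\W+/, "_")[0, length]
  [label, true]
end

label, command = source_label("!ls -la /tmp")
# Commands are stored with add, all other sources with source_update.
store_method = command ? :add : :source_update
```

With this input the label is "ls_la_tmp" and store_method is :add; a URL or file path is returned unchanged with the flag set to false.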
#fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object
The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing.
    # File 'lib/ollama_chat/source_fetching.rb', line 35
    def fetch_source(source, check_exist: false, &block)
      source = source.ask_and_send_or_self(:to_path).to_s
      case source
      when %r{\A!(.*)}
        command = $1
        OllamaChat::Utils::Fetcher.execute(command) do |tmp|
          block.(tmp)
        end
      when %r{\Ahttps?://\S+}
        get_url(source, cache:) do |tmp|
          block.(tmp)
        end
      when %r{\Afile://([^\s#]+)}
        filename = $1
        filename = URI.decode_www_form_component(filename)
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
        filename = $1
        filename = filename.gsub('\ ', ' ')
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
        filename = $1
        filename = filename.gsub('\"', ?")
        filename = File.expand_path(filename)
        check_exist && !File.exist?(filename) and return
        fetch_source_as_filename(filename, &block)
      else
        raise "invalid source #{source.inspect}"
      end
    rescue => e
      msg = "Fetching source #{source.to_s.inspect}: #{e.class} #{e}"
      STDERR.puts "#{msg}\n#{e.backtrace * ?\n}"
      confirm?(prompt: '⏎ Press any key to continue (%s). ', output: STDERR, timeout: 3)
      msg = OllamaChat::Utils::Fetcher::ResponseMetadata.failed(msg)
      block.(msg)
      msg
    end
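The dispatch in fetch_source depends entirely on the shape of the source string. The sketch below reuses the five patterns from the listing to show which kind of source each one recognizes. classify_source is a hypothetical helper, not part of the module; it returns a type and a cleaned-up string, does no fetching or path expansion, and raises ArgumentError where the real method raises a RuntimeError and then rescues it.

```ruby
require "uri"

# Hypothetical classifier built on the patterns used by fetch_source.
# Returns [type, cleaned_source].
def classify_source(source)
  case source.to_s
  when %r{\A!(.*)}                  # "!command": shell command
    [:command, $1]
  when %r{\Ahttps?://\S+}           # HTTP or HTTPS URL
    [:url, source.to_s]
  when %r{\Afile://([^\s#]+)}       # file:// URI, percent-decoded
    [:file, URI.decode_www_form_component($1)]
  when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}    # bare path, "\ " escapes
    [:file, $1.gsub('\ ', ' ')]
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"} # double-quoted path
    [:file, $1.gsub('\"', '"')]
  else
    raise ArgumentError, "invalid source #{source.inspect}"
  end
end

classify_source("!date")                 # command source
classify_source("file:///tmp/a%20b.txt") # file URI with an encoded space
classify_source('/tmp/my\ file.txt')     # bare path with an escaped space
```

Note that a bare or quoted path must begin with "/", "./", "../", or "~/"; a plain relative name such as "notes.md" matches none of the patterns and is rejected.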
#fetch_source_as_filename(filename) {|file| ... } ⇒ nil, Object (private)
Reads a file and extends it with header extension metadata. It then yields the file to the provided block for processing. If the file does not exist, it outputs an error message to standard error.
    # File 'lib/ollama_chat/source_fetching.rb', line 270
    def fetch_source_as_filename(filename, &block)
      OllamaChat::Utils::Fetcher.read(filename) do |tmp|
        block.(tmp)
      end
    end
#import(source) ⇒ String?
Imports content from the specified source and processes it.
This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.
    # File 'lib/ollama_chat/source_fetching.rb', line 120
    def import(source)
      fetch_source(source) do |source_io|
        content = import_source(source_io, source) or return
        source_io.rewind
        content
      end
    end
#import_source(source_io, source) ⇒ String
The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
    # File 'lib/ollama_chat/source_fetching.rb', line 102
    def import_source(source_io, source)
      source        = source.to_s
      document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
      STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
      source_content = parse_source(source_io)
      "Imported #{source.inspect}:\n\n#{source_content}\n\n"
    end
#summarize(source, words: nil) ⇒ String?
Summarizes content from the specified source.
This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.
    # File 'lib/ollama_chat/source_fetching.rb', line 155
    def summarize(source, words: nil)
      fetch_source(source) do |source_io|
        content = summarize_source(source_io, source, words:) or return
        source_io.rewind
        content
      end
    end
#summarize_source(source_io, source, words: nil) ⇒ String?
Summarizes content from the given source IO and source identifier.
This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.
    # File 'lib/ollama_chat/source_fetching.rb', line 137
    def summarize_source(source_io, source, words: nil)
      STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
      words = words.to_i
      words < 1 and words = 100
      source_content = parse_source(source_io)
      source_content.present? or return
      prompt(:summarize).to_s % { source_content:, words: }
    end
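The word-count handling and template substitution in summarize_source can be tried in isolation. The sketch below is an approximation under stated assumptions: build_summary_prompt and the template text are hypothetical (the real template comes from prompt(:summarize)), and a simple blank check replaces the present? call.

```ruby
# Hypothetical helper illustrating the defaulting and formatting steps.
def build_summary_prompt(source_content, words: nil)
  # Assumed template; the real one is looked up via prompt(:summarize).
  template = "Summarize the following in about %{words} words:\n\n%{source_content}"

  words = words.to_i        # nil and non-numeric strings become 0
  words = 100 if words < 1  # anything below 1 falls back to 100 words

  # Stand-in for `source_content.present? or return`.
  return if source_content.to_s.strip.empty?

  # String#% with a hash fills the named %{...} references.
  template % { source_content: source_content, words: words }
end

default_prompt = build_summary_prompt("Hello world")
short_prompt   = build_summary_prompt("Hello world", words: "25")
empty_prompt   = build_summary_prompt("   ")
```

Here default_prompt asks for 100 words, short_prompt asks for 25, and empty_prompt is nil because there is nothing to summarize.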