When adding acts_as_ferret to an existing application with a few thousand entries in the DB, the initial indexing time is prohibitive: on a MacBook, 15000 entries are indexed in about 215s.
This may lead to the improper conclusion that ferret / acts_as_ferret is slow (but other experiments show that it is not!).
In fact, the bulk indexer does not really perform bulk indexing, as the documents are basically indexed one by one.
The solution I have found is to slightly change the index_records function in bulk_indexer.rb (revision 316) to:
    def index_records(records, offset)
      docs = {}
      batch_time = measure_time {
        records.each { |rec| docs[rec.id] = rec.to_doc if rec.ferret_enabled?(true) }
        @index.update_batch(docs)
      }.to_f
      ...
    end
This uses a new function, update_batch, that I added to index.rb in ferret 0.11.6:
    def update_batch(docs)
      @dir.synchrolock do
        ensure_writer_open()
        commit = false
        docs.each do |id, value|
          delete(id)
          commit = true if id.is_a?(String) or id.is_a?(Symbol)
        end
        if commit
          @writer.commit
        end
        ensure_writer_open()
        docs.each do |id, new_doc|
          @writer << new_doc
        end
        flush() if @auto_flush
      end
    end
This function performs the same operation as update, but on a set of documents instead of one document at a time.
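To illustrate the idea without needing ferret installed, here is a minimal sketch using a toy in-memory index. ToyIndex, its commit counter and the sample records are all hypothetical and are not part of the ferret API; the sketch only assumes that the per-document cost comes from finishing each update with its own commit/flush step, which I have not measured in isolation.

```ruby
# Toy in-memory stand-in for an index. This is NOT the ferret API,
# only an illustration of "delete all, then add all, then commit once".
class ToyIndex
  attr_reader :commits, :docs

  def initialize
    @docs = {}
    @commits = 0
  end

  # One-at-a-time path: delete, add and commit for every single document.
  def update(id, doc)
    @docs.delete(id)
    @docs[id] = doc
    commit
  end

  # Batched path: delete every id first, add every document, commit once.
  def update_batch(docs)
    docs.each_key { |id| @docs.delete(id) }
    docs.each { |id, doc| @docs[id] = doc }
    commit
  end

  private

  # Stands in for the expensive step (hypothetical cost model).
  def commit
    @commits += 1
  end
end

# Hypothetical sample data: 1000 small documents keyed by id.
records = (1..1000).map { |i| [i, { title: "entry #{i}" }] }.to_h

one_by_one = ToyIndex.new
records.each { |id, doc| one_by_one.update(id, doc) }

batched = ToyIndex.new
batched.update_batch(records)

puts "one by one: #{one_by_one.commits} commits"
puts "batched:    #{batched.commits} commits"
```

Both indexes end up with the same content; only the number of commit steps differs, which is the effect the patch is after.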
The result is that initial indexing took only 17s instead of 215s, a nice improvement.
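To put those timings in perspective, the arithmetic can be written as a few lines of Ruby. The figures are the approximate ones quoted above; the variable names are mine.

```ruby
entries     = 15_000   # number of indexed entries
before_secs = 215.0    # initial indexing time before the patch (approximate)
after_secs  = 17.0     # initial indexing time with update_batch

speedup     = (before_secs / after_secs).round(1)
rate_before = (entries / before_secs).round
rate_after  = (entries / after_secs).round

puts "speedup: #{speedup}x"
puts "throughput: #{rate_before} docs/s -> #{rate_after} docs/s"
```

That is a speedup of about 12.6x, or roughly 70 documents per second before the patch against roughly 880 after.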
This of course would need some consistency validation on the ferret side.
There may also be some subtler improvements to make, as the delete part of this patch is not totally "batched".
I suppose this could also be used to speed up other operations in acts_as_ferret.
Francois Lagunas
francois.lagunas@gmail.com
http://www.tourteaser.com