
Ticket #201 (closed enhancement: fixed)

Opened 8 months ago

Last modified 8 months ago

Potential for data corruption when restarting drb server?

Reported by: aaf
Assigned to: jk
Priority: major
Milestone: 0.5
Component: 0plugin
Version:
Keywords:
Cc: j@jjb.cc

Description

Consider a high-volume website that uses the capistrano tasks (as of r309), in particular this line:

after "deploy:restart", "ferret:restart"

In the window after the mongrels have restarted but before ferret has restarted, if there have been schema/model changes, is there potential for corruption of the ferret index? Or is the only concern reindexing existing objects (as discussed in #200)?

Let's assume that system availability and bad user experience are not a concern (for this discussion). I am wondering about data/index corruption only.

Change History

02/03/08 00:36:07 changed by aaf

And if this is a problem, the easiest solution I can think of is:

before "deploy:restart", "ferret:stop"
after  "deploy:restart", "ferret:start"

(And perhaps similar declarations for deploy:start and deploy:stop).

(follow-up: ↓ 3 ) 02/04/08 13:42:00 changed by jk

  • status changed from new to closed.
  • resolution set to fixed.

Either way, you have a time window where the application is running but the DRb server is not. With the recent changes that let aaf handle DRb connection errors by default, the worst things that can happen while the DRb server isn't there are lost index updates and empty search results.

The lost updates could even be handled by aaf if it queued index updates on the Rails side when DRb isn't reachable and retried sending them later on. I'll think about that.
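A minimal sketch of what such queueing could look like, purely for illustration. This is not aaf code: `BufferedIndexClient`, its `Unreachable` error (a stand-in for `DRb::DRbConnError`), and the `update(id)` interface of the remote object are all hypothetical names chosen so the example runs without a DRb server.

```ruby
# Hypothetical sketch, not part of acts_as_ferret: buffer index updates
# while the remote index server is unreachable and replay them later.
class BufferedIndexClient
  # Stand-in for DRb::DRbConnError so the sketch needs no DRb server.
  class Unreachable < StandardError; end

  attr_reader :pending

  def initialize(remote)
    @remote  = remote # any object responding to #update(id)
    @pending = []     # ids not yet delivered, oldest first
  end

  # Try to deliver the update. Returns true on success, or false if the
  # server was unreachable and the update was queued instead.
  def update(id)
    flush              # deliver older updates first to preserve order
    @remote.update(id)
    true
  rescue Unreachable
    @pending << id
    false
  end

  # Replay queued updates in order. Raises Unreachable at the first
  # failure, leaving the undelivered updates in the queue.
  def flush
    until @pending.empty?
      @remote.update(@pending.first)
      @pending.shift
    end
  end
end
```

The queue here lives in memory, so updates queued in a mongrel that is itself restarted would still be lost; a real implementation would need to persist them somewhere.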

The real problem here is not the application restart, but migrations. Rails apps in general suffer from this problem: it is possible that the app is using model classes that do not match the current db schema before or after calling db:migrate (depending on when the code deployment/app restart happens).

The safest thing to do is to shut down the app (show some maintenance notice in the meantime), then shut down Ferret, update/migrate/whatever, bring up ferret, then bring up your app again.

We have now

after  "deploy:stop",    "ferret:stop"
before "deploy:start",   "ferret:start"

before "deploy:restart", "ferret:stop"
after  "deploy:restart", "ferret:start"

So if you run

cap deploy:stop deploy:update deploy:migrate deploy:start

DRb will be stopped *after* bringing down the app and started just *before* bringing it up again, and no data corruption should ever happen.
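To make the resulting order concrete, here is a toy Ruby model of the four hooks listed above. It is not Capistrano's implementation: `HookPlan` and `expand` are hypothetical names, and hooks attached to other hooks are not modeled.

```ruby
# Toy model of before/after task hooks (NOT Capistrano's implementation).
class HookPlan
  def initialize
    @before = Hash.new { |hash, task| hash[task] = [] }
    @after  = Hash.new { |hash, task| hash[task] = [] }
  end

  # Register `hook` to run immediately before `task`.
  def before(task, hook)
    @before[task] << hook
    self
  end

  # Register `hook` to run immediately after `task`.
  def after(task, hook)
    @after[task] << hook
    self
  end

  # Expand the requested tasks into the full order of execution.
  def expand(*tasks)
    tasks.flat_map { |task| @before[task] + [task] + @after[task] }
  end
end

plan = HookPlan.new
plan.after  "deploy:stop",    "ferret:stop"
plan.before "deploy:start",   "ferret:start"
plan.before "deploy:restart", "ferret:stop"
plan.after  "deploy:restart", "ferret:start"

puts plan.expand("deploy:stop", "deploy:update", "deploy:migrate", "deploy:start")
# prints, one per line:
#   deploy:stop, ferret:stop, deploy:update, deploy:migrate,
#   ferret:start, deploy:start
```

In this model the schema change (deploy:migrate) falls entirely inside the window where both the app and ferret are down.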

(in reply to: ↑ 2 ) 02/04/08 17:31:48 changed by aaf

  • status changed from closed to reopened.
  • resolution deleted.

Replying to jk:

> So if you run
>
> cap deploy:stop deploy:update deploy:migrate deploy:start
>
> DRb will be stopped *after* bringing down the app and started just *before* bringing it up again, and no data corruption should ever happen.

This is a good solution, unless one's migrations call ferret_update, as you suggested in #200 :)

Maybe the environment can be put into single-thread / no-drb mode during migrations? (seems more efficient anyway)

02/04/08 17:42:03 changed by aaf

Or actually-- maybe calling ferret_update is not an option, because the drb server has to be restarted against the new schema beforehand?

The challenge is to achieve both of these:

Safe/functional order of operations for updating the index of old objects when the schema changes:

  1. change schema
  2. restart drb server
  3. update old objects

Safe/functional order of operations for restarting rails with schema changes and no ferret corruption / missed data:

  1. stop mongrels
  2. stop ferret
  3. change schema
  4. start ferret
  5. start mongrels
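One way to reason about the two lists together is to check a candidate sequence against both orderings. The sketch below does that in plain Ruby. It assumes that "restart drb server" in the first list is satisfied once "start ferret" from the second list has run; that mapping, the `in_order?` helper, and the candidate sequences are illustrative assumptions, not something the recipes provide.

```ruby
# True if every step in `wanted` occurs in `sequence`, in the same
# relative order (other steps may be interleaved).
def in_order?(sequence, wanted)
  positions = wanted.map { |step| sequence.index(step) }
  return false if positions.any?(&:nil?)
  positions == positions.sort
end

# Assumption: "restart drb server" is complete once ferret has started.
index_refresh = ["change schema", "start ferret", "update old objects"]
safe_restart  = ["stop mongrels", "stop ferret", "change schema",
                 "start ferret", "start mongrels"]

# Candidate: update old objects after ferret is back, before the mongrels.
candidate = ["stop mongrels", "stop ferret", "change schema",
             "start ferret", "update old objects", "start mongrels"]

# Updating old objects inside the migration, while ferret is still down.
inside_migration = ["stop mongrels", "stop ferret", "change schema",
                    "update old objects", "start ferret", "start mongrels"]

puts in_order?(candidate, index_refresh)        # prints true
puts in_order?(candidate, safe_restart)         # prints true
puts in_order?(inside_migration, index_refresh) # prints false
```

Whether updating old objects between "start ferret" and "start mongrels" is practical in a given deployment is a separate question that this check does not answer.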

02/05/08 17:28:38 changed by aaf

  • status changed from reopened to closed.
  • resolution set to fixed.

Don't get me wrong-- I like the new cap recipes/documentation a lot-- perhaps it should be the user's responsibility to keep their migrations/updates straight. But I just wanted to document the Dream-- a system where both of the above-listed concerns are accommodated. Maybe it should go into a new ticket for future discussion (and maybe I'll find time to put together a solution). -jjb
