Wednesday, January 2, 2013

Re: Bulk delete - performance / object collection

A note about the upcoming 1.5: The delete() was optimized so that it
doesn't fetch the objects if it doesn't have to. The "doesn't have to"
translates to:
- no delete signals defined for the model
- no cascades

That being said using DSE might be the right choice for you.

- Anssi

On 2 tammi, 14:17, George Lund <gl...@mintel.com> wrote:
> Thank you very much for this. I'll catch up with those threads and read
> more about DSE, which looks really interesting work.
>
> regards
>
> George
>
> On Wednesday, 2 January 2013 11:50:54 UTC, Cal Leeming [Simplicity Media
>
>
>
>
>
>
>
> Ltd] wrote:
>
> > Hi George,
>
> > This is one area I spent quite a lot of time in personally, see;
>
> >https://groups.google.com/forum/?fromgroups=#!msg/django-users/iRhWD0...
> >https://groups.google.com/forum/#!topic/django-users/hgLrwMoFLII
>
> > Bulk operations using the ORM isn't always the right thing to do - and it
> > entirely depends on what you consider bulk and acceptable performance.
>
> > You might want to look at the source code for this, to see how they handle
> > bulk operations (they implemented the same bulk_update approach mentioned
> > in the above threads)
> >http://pypi.python.org/pypi/dse
>
> > Although bypassing the ORM might feel wrong at first, sometimes it is
> > completely acceptable - you just need to make sure you don't abuse/misuse
> > it unnecessarily.
>
> > Cal
>
> > On Wed, Jan 2, 2013 at 11:29 AM, George Lund <gl...@mintel.com<javascript:>
> > > wrote:
>
> >> I'm trying to bulk-delete several million rows from my database.
>
> >> The docs for Django 1.3<https://docs.djangoproject.com/en/1.3/topics/db/queries/#deleting-obj...>say "this will, whenever possible, be executed purely in SQL". A pure-SQL
> >> delete is what I want in this case, so that's fine.
>
> >> However, the code is never getting as far as running any SQL.
>
> >> Interrupting the script shows that the delete method on the QuerySet is
> >> trying to use a "Collector" to construct model instances for each row I'm
> >> trying to delete. This is going to take too long (and may in fact consume
> >> all the memory available) -- I don't think it's practical to wait in this
> >> case. (I've tried waiting over half an hour!)
>
> >> (I'm looking at django.db.models.query.QuerySet.delete and django.db.models.deletion.Collector.collect
> >> / Collector.add.)
>
> >> What's the point in doing the delete "purely in SQL" if all of the
> >> objects are getting constructed anyway? Why do they need to be "collected"
> >> before the SQL DELETE is run? The model instance in this case has no
> >> child rows, to which the delete might need to be cascaded.
>
> >> Meanwhile I can construct the SQL by hand easily enough, but I feel this
> >> isn't doing things the right way.
>
> >> thanks for any help
>
> >> George
>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "Django users" group.
> >> To view this discussion on the web visit
> >>https://groups.google.com/d/msg/django-users/-/W4LqKzcnlaYJ.
> >> To post to this group, send email to django...@googlegroups.com<javascript:>
> >> .
> >> To unsubscribe from this group, send email to
> >> django-users...@googlegroups.com <javascript:>.
> >> For more options, visit this group at
> >>http://groups.google.com/group/django-users?hl=en.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

No comments:

Post a Comment