Tuesday, October 29, 2013

Re: Django Transactions Performance increase

On Tue, Oct 29, 2013 at 2:39 PM, Robin Fordham <gingebot@gmail.com> wrote:
> Hi,
>
> I have been doing some reading and am looking to increase the performance of
> updating existing database entries via incoming datafeeds.
>
> I am finding conflicting opinions if wrapping updates in a transaction
> context manager helps improve performance, some sources say it does, others
> say it simply provides data integrity across the queryset within the
> transaction and no performance improvements and others have cited the
> transaction management overhead actually degrades performance;
>
> for instance:
>
> with transaction.commit_on_success():
>     for row in updatedata:
>         i = Item.objects.get(id=row[0])
>         i.foo = row[1]
>         i.baa = row[2]
>         i.save()
>
> for row in updatedata:
>     i = Item.objects.get(id=row[0])
>     i.foo = row[1]
>     i.baa = row[2]
>     i.save()
>
> Some clarification on this matter would be greatly appreciated. Also any
> pointers to improve my updating efficiency would be appreciated (although I
> know I cannot do a filter and a .update() on the queryset, as each row's
> update data is distinct).
>

This is inefficient: each row you are updating requires two queries,
one to fetch the existing row and one to update it.

for row in updatedata:
    Item.objects.filter(id=row[0]).update(foo=row[1], baa=row[2])

This variant would execute only one query per row, and each row's
update is atomic even without transactions, since the
fetch/update/store happens entirely in the database (although the
update operation as a whole is not atomic; making it so would require
a transaction).
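
If you do want the whole run to be all-or-nothing, something along
these lines should work - a minimal sketch using the
commit_on_success context manager from your example (Django 1.6
replaces it with transaction.atomic):

from django.db import transaction

# Sketch: wrap the whole run in one transaction so it either all
# commits or all rolls back; Item and updatedata are assumed to be
# defined as in your example.
with transaction.commit_on_success():
    for row in updatedata:
        Item.objects.filter(id=row[0]).update(foo=row[1], baa=row[2])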

As to speed, adding transactions does have some cost; test it and
see. The variant I wrote above is extremely unlikely to ever throw an
exception, however - if the database server disappears or crashes, or
the table doesn't have the indicated columns, that could cause one.

Since commit_on_success only refrains from committing when an
exception is raised, and this code is highly unlikely to raise an
exception worth rolling back, a transaction buys you little here. If
the DB server crashes or goes away, you are going to find it
difficult to roll back the transaction (!), and if the columns aren't
there it will fail on the first query, so nothing will have changed
and there is nothing to roll back.

If you cannot update the database without first creating the model
instance - perhaps you need to call a method on the model instance in
order to compute one of the values to update, for example - then you
can still speed things up slightly by loading the model instances in
bulk, reducing the number of queries needed to load the model data.
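
A rough sketch of that, using in_bulk() (which fetches all the
instances in a single query and returns a dict mapping id to
instance); compute_foo() is a hypothetical method standing in for
whatever instance logic you need:

ids = [row[0] for row in updatedata]
items = Item.objects.in_bulk(ids)  # one query: {id: instance}

for row in updatedata:
    i = items[row[0]]
    i.foo = i.compute_foo(row[1])  # hypothetical instance method
    i.baa = row[2]
    i.save()  # still one UPDATE per row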

There is a limit on how many objects you would want to fetch in one
go, normally set by the maximum length of the query you are allowed
to send to the database server, so you might need some logic to work
out the batches to load from your long list of ids. E.g. you probably
cannot load a million objects at a time, but you can load 10,000 at a
time no bother, so instead of a million queries to look up items, you
make 100.
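
A sketch of that batching logic, assuming a batch size of 10,000 is
safe for your server (adjust it to your database's query-length
limit):

BATCH_SIZE = 10000  # assumption: tune to your server's limits

for start in range(0, len(updatedata), BATCH_SIZE):
    batch = updatedata[start:start + BATCH_SIZE]
    # one lookup query per batch instead of one per row
    items = Item.objects.in_bulk([row[0] for row in batch])
    for row in batch:
        i = items[row[0]]
        i.foo = row[1]
        i.baa = row[2]
        i.save()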

The DB server still does much the same work, so you are saving
(1000000 - 100) * query_overhead.

Cheers

Tom
