Tuesday, October 29, 2013

Re: Django Transactions Performance increase

Hi Robin,

As far as I can tell, wrapping the whole process in one transaction
should improve its performance. The reason is that you issue just one
COMMIT for the whole process instead of one per UPDATE. As an added
benefit, it helps with data integrity.
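
To make the COMMIT point concrete, here is a minimal sketch. It uses the
standard-library sqlite3 module instead of Django so that it runs on its
own; the item table and the update_rows helper are made up for the
example. It only counts COMMITs, it does not measure timings.

```python
import sqlite3


def update_rows(conn, rows, single_transaction=True):
    """Apply (id, foo, baa) updates and return the number of COMMITs issued."""
    commits = 0
    cur = conn.cursor()
    for row_id, foo, baa in rows:
        cur.execute(
            "UPDATE item SET foo = ?, baa = ? WHERE id = ?",
            (foo, baa, row_id),
        )
        if not single_transaction:
            conn.commit()  # one COMMIT per UPDATE
            commits += 1
    if single_transaction:
        conn.commit()  # one COMMIT for the whole batch
        commits += 1
    return commits


# Hypothetical table, standing in for the model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, foo TEXT, baa TEXT)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

updatedata = [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")]
n_commits = update_rows(conn, updatedata)
result = conn.execute("SELECT id, foo, baa FROM item ORDER BY id").fetchall()
```

In Django, the single-commit branch corresponds to running your loop
inside the transaction context manager you already have.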

Beyond that, I know of two main ways to improve the performance:
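- Use executemany
(http://initd.org/psycopg/docs/cursor.html#cursor.executemany), which
still issues O(n) queries but sends them through a single call. How much
that saves depends on the driver: the psycopg documentation warns that,
in its current implementation, executemany is not faster than calling
execute() in a loop. One app that uses executemany is django-bulk:
https://github.com/KMahoney/django-bulk and
https://github.com/transifex/django-bulk for some updated code.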
- or use COPY (for PostgreSQL), which uses three queries, but has
other kinds of overhead. I guess that other RDBMSs provide similar
functionality.
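
A minimal sketch of the executemany approach, again using the
standard-library sqlite3 module so that it runs without a database
server. The item table is made up for the example. With psycopg the call
has the same shape, but the placeholder is %s instead of ?.

```python
import sqlite3

# Hypothetical table, standing in for the model's table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, foo TEXT, baa TEXT)")
cur.executemany("INSERT INTO item (id) VALUES (?)", [(1,), (2,), (3,)])

updatedata = [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")]

# Reorder each row to match the placeholder order (foo, baa, id).
params = [(foo, baa, row_id) for row_id, foo, baa in updatedata]

# One call, one statement text, one parameter tuple per row.
cur.executemany("UPDATE item SET foo = ?, baa = ? WHERE id = ?", params)
conn.commit()

rows = cur.execute("SELECT id, foo, baa FROM item ORDER BY id").fetchall()
```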

The COPY method works as follows: you create a new table (probably a
temporary one), COPY all entries into it, and then update the real table
from it in one query. You can find an implementation of COPY for Django in
https://github.com/mpessas/django-pg-extensions/blob/master/djangopg/copy.py
(that's mine).
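
The three steps could be sketched roughly like this. This is not the
code from the link above, just an illustration under some assumptions:
conn is a psycopg2 connection with autocommit off, the target table is
called app_item, and the staging table name and column types are made
up. Only build_copy_buffer can be tried without a PostgreSQL server.

```python
import io


def _escape(value):
    """Escape one value for PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    text = str(value)
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_copy_buffer(rows):
    """Serialise rows as tab-separated lines (the default COPY text format)."""
    lines = ["\t".join(_escape(v) for v in row) for row in rows]
    return io.StringIO("".join(line + "\n" for line in lines))


def copy_update(conn, rows):
    """Update app_item (hypothetical name) from rows of (id, foo, baa)."""
    cur = conn.cursor()
    # 1. Staging table, dropped automatically when the transaction ends.
    cur.execute(
        "CREATE TEMP TABLE item_staging (id integer, foo text, baa text) "
        "ON COMMIT DROP"
    )
    # 2. Bulk-load all rows with COPY (psycopg2's cursor.copy_from).
    cur.copy_from(
        build_copy_buffer(rows), "item_staging", columns=("id", "foo", "baa")
    )
    # 3. One UPDATE that joins the real table against the staging table.
    cur.execute(
        "UPDATE app_item SET foo = s.foo, baa = s.baa "
        "FROM item_staging s WHERE app_item.id = s.id"
    )
    conn.commit()
```

You would call it as copy_update(connection, updatedata), with each row
in the same (id, foo, baa) order as in your loop.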

Which one is preferable depends on the number of queries you have.

Hope this helps a bit,
Apostolis

On Tue, Oct 29, 2013 at 4:39 PM, Robin Fordham <gingebot@gmail.com> wrote:
> Hi,
>
> I have been doing some reading and am looking to increase the performance of
> updating existing database entries via incoming datafeeds.
>
> I am finding conflicting opinions on whether wrapping updates in a
> transaction context manager improves performance. Some sources say it
> does; others say it simply provides data integrity across the queryset
> within the transaction, with no performance improvement; and others say
> the transaction management overhead actually degrades performance.
>
> for instance:
>
> from django.db import transaction
>
> # With a transaction:
> with transaction.commit_on_success():
>     for row in updatedata:
>         i = item.objects.get(id=row[0])
>         i.foo = row[1]
>         i.baa = row[2]
>         i.save()
>
> # Without a transaction:
> for row in updatedata:
>     i = item.objects.get(id=row[0])
>     i.foo = row[1]
>     i.baa = row[2]
>     i.save()
>
> Some clarification on this matter would be greatly appreciated. Also any
> pointers to improve my updating efficiency would be appreciated (although I
> know I cannot do a filter and a .update() on the queryset, as each row's
> update data is distinct).
>
> Thanks.
>
> Regards,
>
> Robin.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users+unsubscribe@googlegroups.com.
> To post to this group, send email to django-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/8dd77008-1715-4c63-9860-d82ce5c65131%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

