On Wed, Nov 2, 2011 at 10:46 AM, Tom Evans <tevans.uk@googlemail.com> wrote:
OK, take this example. I have a django model table with 70 million
rows in it. Doing any kind of query on this table is slow, and
typically the query is date restrained - which mysql will use as the
optimum key, meaning any further filtering is a table scan on the
filtered rows.
Pulling a large query (say, all logins in a month, ~1 million rows)
takes only a few seconds longer than counting the number of rows the
query would find - after all, the database still has to do precisely
the same amount of work, it just doesn't have to deliver the data.
Say I have n entries whose presence in that resultset I want to test, and I
also want to iterate through the list, calculating some data and
printing out the row, I can do the existence tests either in python or
in the database. If I do it in the database, I have n+1 expensive
queries to perform. If I do it in python, I have 1 expensive query to
perform, and (worst case) n+1 full scans of the data retrieved (and I
avoid locking the table for n+1 expensive queries).
Depending on the size of the data set, as the developer I have the
choice of which will be more appropriate for my needs. Sometimes I
need "if qs.filter(pk=obj.pk).exists()", sometimes I need "if obj in
qs".
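To make the trade-off above concrete, here is a rough sketch of the two strategies. It uses the standard library's sqlite3 module instead of the Django ORM so that it runs on its own; the login table, its columns, and the candidate ids are all made up for illustration. The per-candidate query stands in for "qs.filter(pk=obj.pk).exists()", and the single fetch stands in for evaluating the queryset once and testing in Python.

```python
import sqlite3

# Hypothetical stand-in for the large table: 1000 logins, all in October 2011.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (id INTEGER PRIMARY KEY, day TEXT)")
conn.executemany(
    "INSERT INTO login (id, day) VALUES (?, ?)",
    [(i, "2011-10-%02d" % (i % 28 + 1)) for i in range(1, 1001)],
)

candidates = [5, 500, 5000]  # hypothetical primary keys to test


def exists_in_db(pk):
    """One date-restricted query per candidate (the database-side test)."""
    row = conn.execute(
        "SELECT 1 FROM login "
        "WHERE day BETWEEN '2011-10-01' AND '2011-10-31' AND id = ? LIMIT 1",
        (pk,),
    ).fetchone()
    return row is not None


# Strategy 1: n existence queries against the database.
in_db = [exists_in_db(pk) for pk in candidates]

# Strategy 2: one query that pulls the rows, then membership tests in Python.
rows = conn.execute(
    "SELECT id, day FROM login WHERE day BETWEEN '2011-10-01' AND '2011-10-31'"
).fetchall()
fetched_ids = {row[0] for row in rows}  # a set avoids rescanning the rows per test
in_python = [pk in fetched_ids for pk in candidates]

print(in_db, in_python)
```

Both strategies give the same answers; which one is cheaper depends on how large the result set is compared with the number of candidates being tested.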
Just looking at the source to QuerySet (finally), and it looks like the __contains__ method actually does something different than this: It evaluates the whole QuerySet in bulk at the database level, and starts creating model instances based on that, but only until it finds a matching one. So, after running "if obj in qs", you might end up with one object created, or you might end up with 70M objects, or anywhere in between.
Again: odd, undocumented, and potentially surprising behaviour, and I'd recommend explicit over implicit, especially in this case.
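As a toy model of the behaviour described above (this is not Django's source, and the Row and LazyResults names are invented for the example), the class below builds objects from an already fetched batch of rows one at a time and stops as soon as it finds a match. How many objects end up in memory then depends entirely on where the match sits.

```python
class Row:
    """Hypothetical stand-in for a model instance, compared by primary key."""

    def __init__(self, pk):
        self.pk = pk

    def __eq__(self, other):
        return isinstance(other, Row) and self.pk == other.pk


class LazyResults:
    """Toy container: raw rows are already fetched; instances are built lazily."""

    def __init__(self, raw_pks):
        self._raw = iter(raw_pks)  # pretend this is the bulk database result
        self._cache = []           # instances built so far

    def __contains__(self, obj):
        # Check the instances that have already been built.
        if any(row == obj for row in self._cache):
            return True
        # Keep building instances, but only until a match turns up.
        for pk in self._raw:
            row = Row(pk)
            self._cache.append(row)
            if row == obj:
                return True
        return False

    def built(self):
        return len(self._cache)


results = LazyResults(range(1, 101))

found_early = Row(3) in results      # match is the third row
built_after_hit = results.built()

found_missing = Row(1000) in results  # no match: every remaining row is built
built_after_miss = results.built()

print(found_early, built_after_hit, found_missing, built_after_miss)
```

In this sketch an early match leaves only three objects built, while a miss builds all one hundred, which is the same "one object, or 70M, or anywhere in between" spread scaled down.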
Regards,
Ian Clelland
<clelland@gmail.com>
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.