Tuesday, November 1, 2011

Django aggregation does excessive GROUP BY clauses

I am doing a very simple aggregation using the Django ORM, and it is
producing a GROUP BY clause that includes the data field, which is
very large, and is slowing down the query by over 100-fold.

Here is a simplified version of the model:

class Document(models.Model):
data = models.TextField()

class Attachment(models.Model):
document = models.ForeignKey(Document)

And the query I am running:

Document.objects.annotate(num_attachments=Count('attachment'))

And the SQL output:

SELECT
`document_document`.`id`,
`document_document`.`data`,
COUNT(`document_attachment`.`id`) AS `num_attachments`
FROM `document_document`
LEFT OUTER JOIN `document_attachment`
ON (`document_document`.`id` =
`document_attachment`.`document_id`)
GROUP BY
`document_document`.`id`,
`document_document`.`id`,
`document_document`.`data`
ORDER BY NULL

Doing GROUP BY on the data field is unnecessary and ridiculous. I can
stop this by doing a values query:

Document.objects.values('pk').annotate(num_attachments=Count('attachment'))

But then how do I get a real, annotated Document query as the result?

Note: I cross-posted this to StackOverflow:
http://stackoverflow.com/questions/7973461/django-aggregation-does-excessive-group-by-clauses

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

No comments:

Post a Comment