Tuesday, August 26, 2014

Re: Django ORM generate inefficient SQL

This is already tracked in #19259.

Le jeudi 21 août 2014 06:49:41 UTC-4, Alex Lebedev a écrit :
Hi, guys!

I have encountered a problem. When I use the following code in django shell:

    from django.contrib.auth.models import Group
    from django.db.models import Count
    print Group.objects.annotate(cnt=Count('user')).values('id', 'cnt').query.sql_with_params()
 
Django ORM generate the following SQL query:

    'SELECT `auth_group`.`id`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id`, `auth_group`.`name` ORDER BY NULL'

"auth_group.name" occurs in "group by" statement. But this column isn't represented in "select" statement. Such query is inefficient (expecially for large tables with many columns and rows).

Debuging of Django SQLCompiler ( https://github.com/django/django/blob/stable/1.6.x/django/db/models/sql/compiler.py#L568 ) gives me the following information:
- postgresql:
    "self.query.select" == "self.query.group_by" == "[(u'auth_group', u'id'), (u'auth_group', 'name')]"
    "self.connection.features.allows_group_by_pk" is False
    "
len(self.query.get_meta().concrete_fields) == len(self.query.select)" is False
   
'auth_group'.'name' appears in result because of "cols = self.query.group_by + having_group_by + select_cols"
- mysql:
    "self.query.select" == "self.query.group_by" == "[(u'auth_group', u'id'), (u'auth_group', 'name')]"
    "self.connection.features.allows_group_by_pk" is True
    "
len(self.query.get_meta().concrete_fields) == len(self.query.select)" is False
   
'auth_group'.'name' appears in result because of "cols = self.query.group_by + having_group_by + select_cols"

In the same time, the following code (without .values()):

    from django.contrib.auth.models import Group
    from django.db.models import Count
    print Group.objects.annotate(cnt=Count('user')).query.sql_with_params()

gives the right SQL query for mysql (because "len(self.query.get_meta().concrete_fields) == len(self.query.select)" is True):

'SELECT `auth_group`.`id`, `auth_group`.`name`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id` ORDER BY NULL'

Is it a bug? Should I create a bug report or something?

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/e884cbce-58ff-4704-bf0f-990b85fca407%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment