Thursday, August 28, 2014

Re: Django ORM generate inefficient SQL

The issue on MySQL should be tracked in another ticket I guess -- you're right that it should be able to GROUP only by `auth_group`.`id`.

Le jeudi 28 août 2014 07:28:18 UTC-4, Alex Lebedev a écrit :
Thanks for the answer! Yes, but this problem occurs regardless of database backend (I tested it on PostgreSQL and MySQL)

среда, 27 августа 2014 г., 12:37:00 UTC+6 пользователь Simon Charette написал:
This is already tracked in #19259.

Le jeudi 21 août 2014 06:49:41 UTC-4, Alex Lebedev a écrit :
Hi, guys!

I have encountered a problem. When I use the following code in django shell:

    from django.contrib.auth.models import Group
    from django.db.models import Count
    print Group.objects.annotate(cnt=Count('user')).values('id', 'cnt').query.sql_with_params()
 
Django ORM generate the following SQL query:

    'SELECT `auth_group`.`id`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id`, `auth_group`.`name` ORDER BY NULL'

"auth_group.name" occurs in "group by" statement. But this column isn't represented in "select" statement. Such query is inefficient (expecially for large tables with many columns and rows).

Debuging of Django SQLCompiler ( https://github.com/django/django/blob/stable/1.6.x/django/db/models/sql/compiler.py#L568 ) gives me the following information:
- postgresql:
    "self.query.select" == "self.query.group_by" == "[(u'auth_group', u'id'), (u'auth_group', 'name')]"
    "self.connection.features.allows_group_by_pk" is False
    "
len(self.query.get_meta().concrete_fields) == len(self.query.select)" is False
   
'auth_group'.'name' appears in result because of "cols = self.query.group_by + having_group_by + select_cols"
- mysql:
    "self.query.select" == "self.query.group_by" == "[(u'auth_group', u'id'), (u'auth_group', 'name')]"
    "self.connection.features.allows_group_by_pk" is True
    "
len(self.query.get_meta().concrete_fields) == len(self.query.select)" is False
   
'auth_group'.'name' appears in result because of "cols = self.query.group_by + having_group_by + select_cols"

In the same time, the following code (without .values()):

    from django.contrib.auth.models import Group
    from django.db.models import Count
    print Group.objects.annotate(cnt=Count('user')).query.sql_with_params()

gives the right SQL query for mysql (because "len(self.query.get_meta().concrete_fields) == len(self.query.select)" is True):

'SELECT `auth_group`.`id`, `auth_group`.`name`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id` ORDER BY NULL'

Is it a bug? Should I create a bug report or something?

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/8711e8b3-f869-4ab7-8166-4618bd173da0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment