Wednesday, June 7, 2017

Is this a bug on annotate+filter+annotate over M2M field.

Hello, everyone, I just want to get some more feedback on this, before posting it into the tracker.

So according to the docs, the order of annotate and filter matters, since they are not commutative operations.

The annotation is over the full queryset if applied before .filter, and the annotation gets filtered if applied after .filter
We can even use both strategies together, i.e.

publishers = (Publisher.objects
                       
.annotate(num_books=Count('book', distinct=True))
                       
.filter(book__rating__gt=3.0)
                       
.annotate(num_rated=Count('book', distinct=True))
                       
.filter(num_books=F('num_rated'))
)

will produce the following SQL

SELECT
   
"store_publisher"."id",
   
"store_publisher"."name",
   
"store_publisher"."num_awards",
    COUNT
(DISTINCT "store_book"."id") AS "num_books",
    COUNT
(DISTINCT T3."id") AS "num_rated"
FROM
"store_publisher"
LEFT OUTER JOIN
"store_book" ON ("store_publisher"."id" = "store_book"."publisher_id")
INNER JOIN
"store_book" T3 ON ("store_publisher"."id" = T3."publisher_id")
WHERE
    T3
."rating" > 3.0
GROUP BY
   
"store_publisher"."id",
   
"store_publisher"."name",
   
"store_publisher"."num_awards"
HAVING COUNT
(DISTINCT "store_book"."id") = (COUNT(DISTINCT T3."id"))

which is exactly what I expected.

However, in this example the annotation + filter is done over a ForeignKey, if do this over a M2M field:

stores = (Store.objects
               
.annotate(num_books=Count('books', distinct=True))
               
.filter(books__rating__gt=3.0)
               
.annotate(num_rated=Count('books', distinct=True))
               
.filter(num_books=F('num_rated'))
)

we get the following SQL:

SELECT
   
"store_store"."id",
   
"store_store"."name",
   
"store_store"."registered_users",
    COUNT
(DISTINCT "store_store_books"."book_id") AS "num_books",
    COUNT
(DISTINCT "store_store_books"."book_id") AS "num_rated"
FROM
"store_store"
LEFT OUTER JOIN
"store_store_books" ON ("store_store"."id" = "store_store_books"."store_id")
INNER JOIN
"store_store_books" T4 ON ("store_store"."id" = T4."store_id")
INNER JOIN
"store_book" T5 ON (T4."book_id" = T5."id")
WHERE
    T5
."rating" > 3.0
GROUP BY
   
"store_store"."id",
   
"store_store"."name",
   
"store_store"."registered_users"
HAVING
    COUNT
(DISTINCT "store_store_books"."book_id") = (COUNT(DISTINCT "store_store_books"."book_id"))

which is incorrect compared to the first one.
The second annotation is not over the filtered set (i.e. its not using the T4/T5 tables).
What I would have expected here, is the same result as the previous query with publishers.

Any thoughts?

PS. I'm testing this on Django 1, 11, 1

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/88096e99-d3d8-4b56-8934-21e23fbbe2c8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment