Friday, July 27, 2018

Re: Optimizing Prefetch for Postgres IN Limit

There's two open tickets to work around this issue.

https://code.djangoproject.com/ticket/25464 which allows passing queryset override to be used for retrieval and another one that I can't currently find that allows specifying that a subquery should be used instead of a an IN clause.

Simon


Le vendredi 27 juillet 2018 07:05:17 UTC-4, Jason a écrit :
well, prefetch explicitly does joining in python, as in the docs


prefetch_related, on the other hand, does a separate lookup for each relationship, and does the 'joining' in Python.

since it doesn't use joins, there's nothing really to generate a match on an index.  FWIW, I'm facing a similar issue when optimizing a slow django query.


On Friday, July 27, 2018 at 1:10:35 AM UTC-4, Ram Jayaraman wrote:
Hi Xof,

The issue is that, when you do a naive Prefetch you are left with nothing but Django's auto generated IN query. I was asking about he recommended way to make this into a JOIN, which is not obvious how to do with Django

Also in my case the naive IN query on the PK *did not* generate a Index scan and ended up being a full table scan

I could have however used a Prefetch object to force PG to use a different index apart from the PK but a naive Prefetch of the form queryset.prefetch('table__table2') does not use any index beyond 100 values

Also it's hard to estimate how many values will be there in a nested  prefetch, to state the obvious, to do anything different on a query by query basis


On Thu, Jul 26, 2018 at 19:35 Christophe Pettus <x...@thebuild.com> wrote:

> On Jul 25, 2018, at 02:59, Jason <jjohn...@gmail.com> wrote:
>
> Where do you get that in the pg documentation? I can't find that anywhere (google-fu may be failing me), and we do have some queries with more than 100 values using IN.

It's slightly more complicated than that.  Above 100 entries, the PostgreSQL optimizer won't try to do optimizations of the form changing i IN (1, 2 ..., 101) to (i = 1) OR (i = 2) OR  (i = 3)... to see if there's a better way of executing the query.  It *can* still do an index scan in those cases, although the more entries in the IN list, the less efficient that will be.  In general, large IN clauses aren't a great idea; they're better replaced with a join.

--
-- Christophe Pettus
   x...@thebuild.com

--
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/y9sVr9Pbr-o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/8A21227D-94A9-459F-89D0-771D3052C0C0%40thebuild.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/231b4aba-3b79-4c39-985d-08399c2d5a7a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment