Sunday, March 1, 2015

Re: Can the new `Prefetch` solve my problem?

Ask and you shall receive (eventually). Another post on this list has an example using Prefetch(); perhaps that will help you:

https://groups.google.com/d/msgid/django-users/3daddb38-3260-4f7d-9559-7d0d3f17b59e%40googlegroups.com?utm_medium=email&utm_source=footer

On Feb 27, 2015 2:38 AM, "James Schneider" <jrschneider83@gmail.com> wrote:


On Feb 27, 2015 12:51 AM, "aRkadeFR" <contact@arkade.info> wrote:
>
> Yeah, but from my experience, your example triggers a new
> query. You have to (and please correct me if I'm wrong or
> there are other ways) use self.chair_set.all() in order
> to avoid triggering a new query when you have prefetched the
> chairs.

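AFAIK, self.chair_set.all() would always spawn a second query unless something like prefetch_related() had been used previously to cache that query result (select_related() only follows forward FK and one-to-one relations, so it can't cache a reverse set like chair_set).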
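
To make the query-count point concrete, here is a rough sketch that uses the standard-library sqlite3 module instead of the Django ORM. The schema, the run() helper and the function names are all made up for illustration. It compares one query per desk against the two-query approach that, as I understand it, prefetch_related() takes: fetch the desks, fetch all of their chairs in one go, then glue the two together in Python.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema for illustration only; not the OP's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE desk (id INTEGER PRIMARY KEY);
    CREATE TABLE chair (
        id INTEGER PRIMARY KEY,
        desk_id INTEGER REFERENCES desk(id)
    );
    INSERT INTO desk (id) VALUES (1), (2), (3);
    INSERT INTO chair (id, desk_id) VALUES (10, 1), (11, 1), (12, 2);
""")

query_log = []  # every statement issued through run() is recorded here


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def chairs_per_desk_naive():
    """One query for the desks, plus one more query per desk."""
    desk_ids = [row[0] for row in run("SELECT id FROM desk ORDER BY id")]
    return {
        desk_id: [row[0] for row in run(
            "SELECT id FROM chair WHERE desk_id = ? ORDER BY id", (desk_id,))]
        for desk_id in desk_ids
    }


def chairs_per_desk_batched():
    """One query for the desks, one for all their chairs, glued in Python."""
    desk_ids = [row[0] for row in run("SELECT id FROM desk ORDER BY id")]
    if not desk_ids:
        return {}
    placeholders = ", ".join("?" for _ in desk_ids)
    grouped = defaultdict(list)
    for chair_id, desk_id in run(
            "SELECT id, desk_id FROM chair "
            "WHERE desk_id IN (" + placeholders + ") ORDER BY id", desk_ids):
        grouped[desk_id].append(chair_id)
    return {desk_id: grouped[desk_id] for desk_id in desk_ids}


# Compare how many statements each approach issues for the same result.
for fetch in (chairs_per_desk_naive, chairs_per_desk_batched):
    query_log.clear()
    result = fetch()
    print(fetch.__name__, len(query_log), result)
```

The naive version grows by one query for every desk in the result, while the batched version stays at two queries no matter how many desks come back.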

I provided several examples, some of which spawn a second query (and may be appropriate; I can't give the OP a definitive answer without knowing how many queries are being run per page load, the average query time, etc.). You'll need to be more specific about which example you mean.

I think I accidentally referred to some of the model fields as if they were FKs, but they probably should have had .all() after all of the references since everything was an M2M relationship.

>
> To be simple and answer the problem: the only solution I
> have in mind is to prefetch all the objects (chairs), and
> then filter them in Python with properties like I said. But as you
> said, it will load too many objects...

I think there was some miscommunication here. The OP stated that loading all of the chairs was infeasible, and I would tend to agree.

My only clarification would be that loading all of the chairs via something like Chair.objects.all() would be a bad idea, since you have no idea how many chairs you may have in the entire database.

Loading >10k Chair rows into memory, having Django coerce them into model objects, and then post-processing them in custom app code to filter the list back down to something reasonable will give you a bad time, every time. Your users will be unhappy with pages that take seconds to load, and your server processes will be unhappy even assuming you have plenty of RAM/CPU to handle such a request. Now multiply that load by X number of users... and you're quickly hitting CPU and per-process RAM limits.
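
As a rough illustration (plain sqlite3 again, with a hypothetical chair table rather than the OP's real schema), the sketch below compares pulling every chair row and filtering in Python with letting the database apply the WHERE clause. Both return the same chairs; what differs is how many rows have to be loaded into Python to get there.

```python
import sqlite3

# Hypothetical data: 10,000 chairs spread evenly over 1,000 desks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chair (id INTEGER PRIMARY KEY, desk_id INTEGER)")
conn.executemany(
    "INSERT INTO chair (id, desk_id) VALUES (?, ?)",
    [(i, i % 1000) for i in range(10000)],
)


def chairs_filtered_in_python(desk_id):
    """Load every chair row, then throw most of them away in Python."""
    rows = conn.execute("SELECT id, desk_id FROM chair").fetchall()
    matching = sorted(cid for cid, did in rows if did == desk_id)
    return matching, len(rows)


def chairs_filtered_in_database(desk_id):
    """Let the database do the filtering and return only matching rows."""
    rows = conn.execute(
        "SELECT id FROM chair WHERE desk_id = ? ORDER BY id", (desk_id,)
    ).fetchall()
    return [cid for (cid,) in rows], len(rows)


# Same chairs either way; the second number is how many rows were loaded.
print(chairs_filtered_in_python(7)[1])
print(chairs_filtered_in_database(7)[1])
```

In the ORM the same difference shows up between iterating over Chair.objects.all() and iterating over a filtered queryset, with the added cost of building a model instance for every row that comes back.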

However, "loading all of the chairs" via an M2M or FK relationship such as desk_obj.nearby_chairs.all() (assuming the OP reconfigured the model fields as I suggested before, or even using the existing Chair M2M to Desk) would likely be perfectly valid and would probably return a small subset of the total chairs in the system (or maybe none or all of them).
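
For what it's worth, this is roughly the kind of query I would expect a call like desk_obj.nearby_chairs.all() to boil down to: a join through the M2M link table, restricted to one desk. It is sketched with sqlite3 and invented table names (desk_nearby_chairs is an assumption, not what Django would necessarily call the table), so treat it as an approximation rather than the SQL Django actually emits.

```python
import sqlite3

# Hypothetical tables: desks, chairs, and an M2M link table between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE desk (id INTEGER PRIMARY KEY);
    CREATE TABLE chair (id INTEGER PRIMARY KEY);
    CREATE TABLE desk_nearby_chairs (
        desk_id INTEGER REFERENCES desk(id),
        chair_id INTEGER REFERENCES chair(id)
    );
    INSERT INTO desk (id) VALUES (1), (2);
    INSERT INTO chair (id) VALUES (10), (11), (12), (13);
    INSERT INTO desk_nearby_chairs (desk_id, chair_id)
        VALUES (1, 10), (1, 12), (2, 13);
""")


def nearby_chairs(desk_id):
    """Return only the chairs linked to one desk; the database does the filtering."""
    rows = conn.execute(
        "SELECT chair.id FROM chair "
        "JOIN desk_nearby_chairs AS link ON link.chair_id = chair.id "
        "WHERE link.desk_id = ? ORDER BY chair.id",
        (desk_id,),
    )
    return [row[0] for row in rows]


print(nearby_chairs(1))
```

Only the linked chairs ever leave the database, which is why going through the relationship is fine even when the chair table as a whole is huge.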

>
> Still watching the thread because I have a couple of problems
> like this one :)
>

I don't necessarily run into this specific problem, so I'm not sure how much more I can contribute. A lot of the model changes I suggested were educated guesses and may not be valid at all given other requirements or design considerations in the project.

> aRkadeFR
>
>
> On 02/26/2015 08:27 PM, James Schneider wrote:
>>
>> Heh, I just realized that aRkadeFR had replied with a similar idea to use a property. At least I know I'm not too far off on my thinking. :-D
>>
>> -James
>>
>> On Thu, Feb 26, 2015 at 11:02 AM, James Schneider <jrschneider83@gmail.com> wrote:
>>>
>>> Whoops, accidentally sent that last one too early, here's the continuation:
>>>
>>> However, that probably doesn't buy you much since you are still doing an extra query for every Desk you pull from your original query.
>>>
>>> Funny enough, I was googling around for an answer here, and stumbled across this:
>>>
>>> https://docs.djangoproject.com/en/1.7/ref/models/queries/#django.db.models.Prefetch
>>>
>>> which I think is what you were referring to initially in your OP. I wasn't even aware of its existence. Prefetch() is a helper class for prefetch_related(). Taking a quick glance through the source code, I would imagine that it probably won't help you much, since the functionality of that class only controls the action of prefetch_related().
>>>
>>>
>>> Zooming out a bit, the crux of your problem is this: An attribute you wish to populate is not an FK or M2M field, it is an entirely separate queryset with some moderately complex filters. The built-in ORM functionality for pre-loading via prefetch_related()/select_related() is expecting an FK or M2M relationship and can't use another queryset AFAIK. The high-level functionality of prefetch_related() is probably close to what you want, which is to run a single second query to collect all of the favorite_or_nearby_chairs for all of the Desks in your original query, and then glue everything together behind the scenes in Python to make the desk_obj.favorite_or_nearby_chairs available seamlessly.
>>>
>>> I would then wonder if there is another way to organize this data to make it easier to work with. How about adding a 'favorite_chairs' field to the Desk model that has an M2M to Chair? I would also update your 'nearby_desks' model field to use 'nearby_chairs' as the related_name.
>>>
>>> Then you could do something like the following:
>>>
>>> desks = Desk.objects.filter(<filter here>).prefetch_related('nearby_chairs', 'favorite_chairs')
>>>
>>> Then, you can modify your model with a property that will return the concatenation of nearby_chairs and favorite_chairs:
>>>
>>> class Desk(models.Model):
>>>     @property
>>>     def favorite_or_nearby_chairs(self):
>>>         # .all() reuses the prefetched results; list() makes the two results concatenable
>>>         return list(self.nearby_chairs.all()) + list(self.favorite_chairs.all())
>>>
>>> I don't believe this will spawn another query, since prefetch_related() will have already run and cached the results (select_related() only follows FK and one-to-one relations, so it won't work for these M2M fields). You may also want to consider moving 'nearby_desks' out of Chair and renaming it to 'nearby_chairs' in Desk, using a related_name of 'nearby_desks' instead. The property still needs .all() on each M2M manager, but with the prefetch in place it won't spawn a query. Obviously you'll need to create other processes that will populate desk.favorite_chairs, which may or may not be feasible.
>>>
>>> TL;DR: I don't believe you can pre-fetch anything because of the extra SQL logic needed to calculate the favorite_or_nearby_chairs attribute. It might be possible via raw SQL though. Reformatting your data models may lead to an easier time since you can then take advantage of some of the optimizations Django offers.
>>>
>>> I'm slightly out in right field on this one, so YMMV, but taking a hard look at the current model design would be where I would start to try and eliminate the need for that custom queryset.
>>>
>>> Again, the django-debug-toolbar is your friend in these cases, but obviously a high number of even relatively fast queries can have a detrimental effect on your load times. Also ensure that the fields you are using to filter contain indexes, if appropriate/available.
>>>
>>> -James
>>>
>>>
>>>
>>>
>>> On Thu, Feb 26, 2015 at 10:17 AM, James Schneider <jrschneider83@gmail.com> wrote:
>>>>
>>>> Yep, looks like I misunderstood.
>>>>
>>>> So, you want to have something like this pseudo code:
>>>>
>>>> desks = Desk.objects.filter(<some filter>)
>>>> for desk in desks:
>>>>     print(desk.favorite_or_nearby_chairs)
>>>>
>>>> And have favorite_or_nearby_chairs be pre-populated with your Chair queryset mentioned earlier?
>>>>
>>>> Due to the nature of the custom Chair queryset, I doubt you can do any sort of pre-fetching that would reduce the number of queries. You can probably simulate the effect of prefetch_related() though by overriding the __init__() method of your Desk model, and having the Desk model populate favorite_or_nearby_chairs whenever a Desk object is created. 
>>>>
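>>>> class Desk(models.Model):
>>>>     def __init__(self, *args, **kwargs):
>>>>         # let Django populate the model fields first
>>>>         super(Desk, self).__init__(*args, **kwargs)
>>>>         # using list() to force the queryset to be evaluated
>>>>         self.favorite_or_nearby_chairs = list(<custom Chair queryset>)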
>>>>
>>>>
>>>> However, that probably doesn't buy you much since you are still doing an extra query for every Desk you pull from your original query.
>>>>
>>>> Funny enough, I was googling around for an answer here, and stumbled across this:
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Feb 26, 2015 at 4:19 AM, aRkadeFR <contact@arkade.info> wrote:
>>>>>
>>>>> got it, so you want to prefetch but not all chairs.
>>>>>
>>>>> I will def follow this thread to see the possibilities of Prefetch :)
>>>>>
>>>>>
>>>>> On 02/26/2015 12:52 PM, Ram Rachum wrote:
>>>>>>
>>>>>> There may be a big number of chairs, and I don't want all the chairs prefetched. I want to have the database filter them according to the queryset I specified in a single call, I don't want to filter them in Python or make a new call to filter them.
>>>>>>
>>>>>> Thanks,
>>>>>> Ram.
>>>>>>
>>>>>> On Thu, Feb 26, 2015 at 1:48 PM, aRkadeFR <contact@arkade.info> wrote:
>>>>>>>
>>>>>>> I may not have completely understood your problem, but
>>>>>>> why not prefetch all the chairs, and then have the (new)
>>>>>>> attribute favorite_or_nearby_chairs load only the favorite
>>>>>>> or nearby ones?
>>>>>>>
>>>>>>> like:
>>>>>>> @property
>>>>>>> def favorite_or_nearby_chairs(self):
>>>>>>>     ans = []
>>>>>>>     for chair in self.chair_set.all():
>>>>>>>         #filter...
>>>>>>>         ans += ...
>>>>>>>     return ans
>>>>>>>
>>>>>>> It will only hit the DB once thanks to the first join of desk
>>>>>>> <-> chair.
>>>>>>>
>>>>>>>
>>>>>>> On 02/26/2015 11:28 AM, cool-RR wrote:
>>>>>>>>
>>>>>>>> James, you misunderstood me.
>>>>>>>>
>>>>>>>> There isn't supposed to be a `favorite_or_nearby_chairs` attribute. That's the new attribute I want the prefetching to add to the `Desk` queryset that I need. Also, I don't understand why you'd tell me to add a `.select_related('nearby_desks')` to my query. Are you talking about the query that starts with `Chair.objects`? I'm not looking to get a `Chair` queryset. I'm looking to get a `Desk` queryset, which has a prefetched attribute `favorite_or_nearby_chairs` which contains the `Chair` queryset I wrote down.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ram.
>>>>>>>>
>>>>>>>> On Thursday, February 26, 2015 at 6:02:15 AM UTC+2, James Schneider wrote:
>>>>>>>>>
>>>>>>>>> Well, the Desk model you provided is blank, but I'll believe you that there's a favorite_or_nearby_chairs attribute. ;-)
>>>>>>>>>
>>>>>>>>> Should be relatively simple. Just add a .select_related('nearby_desks') to your existing query and that should pull in the associated Desk object in a single query. You can also substitute in prefetch_related(), although you'll still have two queries at that point.
>>>>>>>>>
>>>>>>>>> If you are trying to profile your site, I would recommend the Django-debug-toolbar. That should tell you whether or not that query set is the culprit.
>>>>>>>>>
>>>>>>>>> -James
>>>>>>>>>
>>>>>>>>> On Feb 25, 2015 1:28 PM, "Ram Rachum" <r...@rachum.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi James,
>>>>>>>>>>
>>>>>>>>>> I've read the docs but I still couldn't figure it out. My queryset works great in production, I'm trying to optimize it because our pageloads are too slow. I know how to use querysets in Django pretty well, I just don't know how to use `Prefetch`. 
>>>>>>>>>>
>>>>>>>>>> Can you give me the solution for the simplified example I gave? This might help me figure out what I'm not understanding. One thing that might be unclear with the example I gave is that I meant I want to get a queryset for `Desk` where every desk has an attribute named `favorite_or_nearby_chairs` which contains the queryset of chairs that I described, prefetched.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ram.
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 25, 2015 at 11:18 PM, James Schneider <jrschn...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I assume that you are talking about the select_related() and prefetch_related() queryset methods?
>>>>>>>>>>>
>>>>>>>>>>> https://docs.djangoproject.com/en/1.7/ref/models/querysets/#select-related
>>>>>>>>>>> https://docs.djangoproject.com/en/1.7/ref/models/querysets/#prefetch-related
>>>>>>>>>>>
>>>>>>>>>>> Both of those sections have excellent examples, and detail what the differences are (primarily joins vs. separate queries, respectively).
>>>>>>>>>>>
>>>>>>>>>>> For better help, you'll need to go into more detail about the queries you are trying to make, what you've tried (with code examples if possible), and the results/errors you are seeing.
>>>>>>>>>>>
>>>>>>>>>>> In general, I would try to get an initial queryset working and gathering the correct results first before looking at optimizations such as select_related(). Any sort of pre-fetching will only confuse the situation if the base queryset is incorrect.
>>>>>>>>>>>
>>>>>>>>>>> -James
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 25, 2015 at 12:05 PM, cool-RR <ram.r...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to solve a problem using the new `Prefetch` but I can't figure out how to use it. 
>>>>>>>>>>>>
>>>>>>>>>>>> I have these models:
>>>>>>>>>>>>
>>>>>>>>>>>>     class Desk(django.db.models.Model):
>>>>>>>>>>>>         pass
>>>>>>>>>>>>     
>>>>>>>>>>>>     class Chair(django.db.models.Model):
>>>>>>>>>>>>         desk = django.db.models.ForeignKey('Desk', related_name='chair',)
>>>>>>>>>>>>         nearby_desks = django.db.models.ManyToManyField(
>>>>>>>>>>>>             'Desk',
>>>>>>>>>>>>             blank=True,
>>>>>>>>>>>>         )
>>>>>>>>>>>>
>>>>>>>>>>>> I want to get a queryset for `Desk`, but it should also include a prefetched attribute `favorite_or_nearby_chairs`, whose value should be equal to: 
>>>>>>>>>>>>
>>>>>>>>>>>>     Chair.objects.filter(
>>>>>>>>>>>>         (django.db.models.Q(nearby_desks=desk) | django.db.models.Q(desk=desk)),
>>>>>>>>>>>>         some_other_lookup=whatever,
>>>>>>>>>>>>     )
>>>>>>>>>>>>
>>>>>>>>>>>> Is this possible with `Prefetch`? I couldn't figure out how to use the arguments.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Ram.
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "Django users" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
>>>>>>>>>>>> To post to this group, send email to django...@googlegroups.com.
>>>>>>>>>>>> Visit this group at http://groups.google.com/group/django-users.
>>>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/46d9fdb7-c008-4496-acda-ac7cb30b4a89%40googlegroups.com.
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.

