Thursday, September 17, 2015

Re: Prefetch() with through models

> Den 16/09/2015 kl. 16.45 skrev Mike Dewhirst <miked@dewhirst.com.au>:
>
> On 16/09/2015 9:53 AM, Erik Cederstrand wrote:
>> Hi folks,
>>
>> I'm working on a school timetable app. I want to fetch hundreds of
>> thousands of Lesson instances with prefetched m2m relations (e.g.
>> subjects). My m2m relations use through models (I'm not sure this
>> actually makes a difference here), and I'm running into performance
>> issues because the prefetch query does something along the lines of
>> "SELECT ... FROM lesson_subjects WHERE lesson_id IN
>> [insane_list_of_lesson_ids]".
>
> I'm no expert so I'm wondering if len([insane_list_of_lesson_ids]) == "hundreds of thousands"?

In my, case, yes. I think the backend might process the query in chunks if the length of the SQL exceeds the max SQL query size.

I've had a look at the prefetch code in Django 1.8. The prefetcher is designed to kick in *after* the QuerySet has constructed the list of objects. It only has access to the resulting list of items, not the original SQL filters, so the only sane way to fetch related objects is to use an IN clause. I can't see any way to replace this with the filters without rewriting the whole prefetch code.

I'll try the route Simon suggested and run the prefetch queries myself. I can probably populate the _prefetched_objects_cache on each object so the optimization is reasonably transparent to consumers. Prefetching 'foo__bar' this way gets more complicated, so I think I'll go with a simple solution first.

> I'm not much help but I do remember many years ago a school timetable programming guru who told me they are definitely not trivial. I think he was referring to timetable creation given enrolments, student preferences, curriculum/course requirements, availability of teachers and location of campuses. One of the main issues (ISTR) was memory - or lack of it.

Constraints solvers to calculate the optimal timetable given a set of (soft or hard) requirements and costs is mathematically well understood but insanely difficult to get right in practice. Lack of memory and the limited life span of human beings are just some of the issues. Luckily, I'm just working on the output of that calculation :-)

Erik

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/69D8C444-936A-46CA-904A-FC9B4C0CA39D%40cederstrand.dk.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment