Thursday, September 1, 2022

Performance regression when moving from 3.1 to 3.2 (same with 4)

Hi,

I am seing a sharp increase in execution time for some of my queries when moving from Django 3.1 to 3.2. And the performance hit appears to be the same for Django 4.

My backend is Postgres 14.2. 

My Django project has forums for different types of content. The forum app consists essentially of 5 tables:
  1. a Post table that contains forum posts (essentially a text field);
  2. a Thread table where rows point to a specific content and post;
  3. a FollowUp table where rows point to a specific thread and post;
  4. a ThreadEntry table where rows point to a thread, a user, and the last seen thread post for this user;
  5. a FollowUpEntry table where rows point to a followup, a user, and the last seen followup post for this user.
Here is an example query that executes 2 times slower on 3.2 than on 3.1:
Content.objects.all().annotate(
  has_unread_posts=Greatest(
    # a content is unread if at least one condition is true
    Exists(
      # a thread has never been read (has no user entry)
      Thread.objects.filter(
        content=OuterRef('pk'),
      ).exclude(threadentry__user=user)
    ),
    Exists(
      # a thread entry is not up-to-date
      ThreadEntry.objects.filter(
        thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('thread__post'))
    ),
    Exists(
      # a followup has no user entry
      FollowUp.objects.filter(
        thread__content=OuterRef('pk')
      ).exclude(followupentry__user=user)
    ),
    Exists(
      # a followup entry is not up-to-date
      FollowUpEntry.objects.filter(
        followup__thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('followup__post'))
    ),
  )
).filter(
  has_unread_posts=True,
).order_by(
  'course__uid',
  '-version__start',
).select_related(
  'course',
  'version',
)

Course and Version are other tables related to Content.

I want to know with this query, for each of the user's content, whether or not there is something new in the corresponding forum.  There is something new if any one of the following condition is true:
  1. there exists a thread for which the user has no thread entry (an entry is added when the thread is first read by the user);
  2. there exists a user thread entry for which the last read post is not up to date with the current thread post (the thread owner has modified the post since);
  3. there exists a followup for which the user has no followup entry (an entry is added when the followup is first read by the user);
  4. there exists a user followup entry for which the last read post is not up to date with the followup post (the followup owner has modified the post since).
On my machine, just by changing the Django version using pip, and nothing else, this query takes about 1 second of execution on Django 3.1.14, and a little more than 2 seconds on Django 3.2.15, so about a 2x increase. Here are the current table sizes for these execution times:
  1. Content: 33
  2. Thread: ~30K
  3. FollowUp: ~46K
  4. ThreadEntry: ~1.3M
  5. FollowUpEntry: ~4.5M
  6. Post: ~103K
Am I the only one observing such performance regressions with Django 3.2? On other more complex queries that contain subqueries inside subqueries, I have seen up to 30x execution time increases. 

Did something major happen in SQL generation from 3.1 to 3.2?

Am I doing something wrong? How can this happen?

Any help on understanding what is going on with Django 3.2 would be much appreciated.

Best regards,

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/54287aec-dcf2-4179-b939-f099876e05a9n%40googlegroups.com.

No comments:

Post a Comment