Sunday, May 20, 2018

Re: Selecting data from intermediate table in prefetch_related results in duplicate rows

I completely forgot about the `extra` method, and it seems to solve the problem:

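from django.db.models import Prefetch

roles = Prefetch(
    'users',
    # "users_role" is the intermediate table the prefetch query already joins
    queryset=User.objects.extra(select={'role': 'users_role.role'})
)
qs = Project.objects.prefetch_related(roles)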

However, I'm still curious why `annotate` doesn't work when used inside `prefetch_related`.


1. Is it a bug? Should it work the same way as it does outside `prefetch_related`?
2. Is there a way to avoid the `extra` method? It should be avoided at all costs, after all.


Thank you in advance!
   Tom


On 20 May 2018 at 11:41, Tomáš Ehrlich <tomas.ehrlich@gmail.com> wrote:

Hello,
I have two models (User, Project) in an m2m relationship with an intermediate (Role) table.

When I select all users in a project and also want to select the corresponding role, I simply annotate one field from the Role table using an F expression:

from django.db.models import F

User.objects.filter(projects__name='Django').annotate(role=F('roles__role'))

In this case `annotate` doesn't create a new join, because the `roles` table is already joined to filter on data from the `projects` table. This works well.


However, when I try the same inside `prefetch_related`, I get duplicate rows, because a new JOIN is added. (Use case: selecting all projects in the DB with all users per project.)

The query with `prefetch_related` but without `annotate` looks like this, together with the SQL it generates:

roles = Prefetch(
    'users',
    queryset=User.objects.all()
)
qs = Project.objects.prefetch_related(roles)

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id",
  -- other fields here
FROM "users_user"
  INNER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
WHERE "users_role"."project_id" IN (1, 2, 3, 4, 5)


As you can see, the table `users_role` is already joined, so I'm basically looking for a Django ORM expression that generates the following SQL query:

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id",
  "users_role"."role",
  -- other fields here
FROM "users_user"
  INNER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
WHERE "users_role"."project_id" IN (1, 2, 3, 4, 5)
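To check that a query of this shape returns exactly one row per (user, project) membership, carrying the role from that same membership, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The stripped-down table layouts, the `username` column and the sample rows (hypothetical users alice and bob) are assumptions made for illustration; only the table and column names used in the query come from the post. An `ORDER BY` is added so the output is deterministic.

```python
import sqlite3


def desired_rows():
    """Run the hoped-for single-join query against hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, minimal versions of the Django-generated tables.
    conn.executescript("""
        CREATE TABLE users_user (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE users_role (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users_user(id),
            project_id INTEGER,
            role TEXT
        );
        INSERT INTO users_user VALUES (1, 'alice'), (2, 'bob');
        -- alice belongs to projects 1 and 2, bob only to project 1
        INSERT INTO users_role (user_id, project_id, role) VALUES
            (1, 1, 'owner'), (1, 2, 'member'), (2, 1, 'member');
    """)
    rows = conn.execute("""
        SELECT
          ("users_role"."project_id") AS "_prefetch_related_val_project_id",
          "users_role"."role",
          "users_user"."username"
        FROM "users_user"
          INNER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
        WHERE "users_role"."project_id" IN (1, 2, 3, 4, 5)
        ORDER BY 1, "users_user"."id"
    """).fetchall()
    conn.close()
    return rows


for row in desired_rows():
    print(row)
```

With this data the query yields three rows, one per `users_role` row: alice as owner of project 1, bob as member of project 1, and alice as member of project 2. The project key and the role always come from the same joined row.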


Unfortunately, the following expression generates incorrect SQL:

from django.db.models import F, Prefetch

roles = Prefetch(
    'users',
    queryset=User.objects.all().annotate(role=F('roles__role'))
)
qs = Project.objects.prefetch_related(roles)

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id",
  "users_role"."role"         AS "role",
  -- other fields here
FROM "users_user"
  LEFT OUTER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
  INNER JOIN "users_role" T3 ON ("users_user"."id" = T3."user_id")
WHERE T3."project_id" IN (1, 2, 3, 4, 5)

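The extra `LEFT OUTER JOIN` causes duplicate entries: it is independent of the `INNER JOIN` aliased as `T3`, so each user comes back once for every combination of their `users_role` rows in the two joins.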
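A minimal reproduction of the duplicates, again using Python's built-in `sqlite3` with an in-memory database. The schema and data are made up for illustration (hypothetical user alice has roles in projects 1 and 2, bob only in project 1); the query is the generated one from above, with `username` selected and an `ORDER BY` added so the output is deterministic.

```python
import sqlite3


def generated_rows():
    """Run the double-join query shape against hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, minimal versions of the Django-generated tables.
    conn.executescript("""
        CREATE TABLE users_user (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE users_role (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users_user(id),
            project_id INTEGER,
            role TEXT
        );
        INSERT INTO users_user VALUES (1, 'alice'), (2, 'bob');
        -- alice belongs to projects 1 and 2, bob only to project 1
        INSERT INTO users_role (user_id, project_id, role) VALUES
            (1, 1, 'owner'), (1, 2, 'member'), (2, 1, 'member');
    """)
    # Two independent joins to users_role: every role row from the
    # LEFT OUTER JOIN is paired with every matching role row from T3.
    rows = conn.execute("""
        SELECT
          ("users_role"."project_id") AS "_prefetch_related_val_project_id",
          "users_role"."role" AS "role",
          "users_user"."username"
        FROM "users_user"
          LEFT OUTER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
          INNER JOIN "users_role" T3 ON ("users_user"."id" = T3."user_id")
        WHERE T3."project_id" IN (1, 2, 3, 4, 5)
        ORDER BY "users_role"."project_id", "users_user"."id", T3."project_id"
    """).fetchall()
    conn.close()
    return rows


for row in generated_rows():
    print(row)
```

Here the query returns five rows instead of three: alice has two role rows and matches two `T3` rows, so she appears 2 × 2 = 4 times (twice under project 1, twice under project 2), while bob, with a single role, appears once.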


I've found one ticket (https://code.djangoproject.com/ticket/27144) which seems to be relevant, but it's old and closed.

Any ideas? Is it a bug, or is there really a reason to include the extra JOIN? I'm not very skilled in relational algebra.

Thank you in advance!


Cheers,
   Tom



