On Sat, 30 May 2020 at 13:49, Shaheed Haque <shaheedhaque@gmail.com> wrote:
-- Hi,I have a model MyModel which has a JSONField() called 'snapshot'. In Python terms, each snapshot looks like this:======================snapshot = {'pay_definition' : {'1234': {..., 'name': 'foo', ...},'99': {..., 'name': 'bar', ...},}======================I'd like to find all unique values of 'name' in all instances of MyModel. I have this working using native JSON functions like this:class PayDefs(Func):
function='to_jsonb'
template="%(function)s(row_to_json(jsonb_each((%(expressions)s->'pay_definition')))->'value'->'name')"
MyModel.objects.annotate(xxx=PayDefs(F('snapshot'))).order_by().distinct('xxx').values_list('xxx', flat=True)My question is if this the best way to solve this problem? The way my current logic works, reading from insider out is, I think:
- Pass in the 'snapshot'.
- Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this key.
- To skip the unknown numeric keys, "jsonb_each()" turns each key, value pair into an inner row like ['1234', {...}].
- To get to the value column of the inner row "row_to_json()->'value'".
- To get the name field's value "->'name'".
- A final call to "to_jsonb" in the PayDefs class.
For example, since all I care about is the string value of 'name', is there a way to get rid of the PayDefs class, and its invocation of to_jsonb? Likewise, is there a better way to do the inner parts? To provide context on what "better" might be:
- Snapshot JSONs might easily be 20MB in size.
- Each 'pay_definition' is probablyonly about 1kB in size, and there might be 50 of them in a snapshot.
- There might be 1000 MyModel instances in a given query.
- I'm using PostgreSQL 12
so my concern is not have the database server or Django perform extraneous work converting between strings and JSON for example.
It has been pointed out that a JOIN LATERAL like this can also perform the iteration over the keys:
SELECT DISTINCT snapshot->'pay_definition'->k.value->'name'
FROM mymodel
JOIN LATERAL jsonb_object_keys(snapshot->'pay_definition') AS k(value) ON true
FROM mymodel
JOIN LATERAL jsonb_object_keys(snapshot->'pay_definition') AS k(value) ON true
But Google suggests that a raw query might be the only way to get this to work under the ORM. Pondering further, I was able to simplify that to this:
SELECT DISTINCT snapshot -> 'pay_definition' -> k.value -> 'name'
FROM mymodel, jsonb_object_keys(snapshot -> 'pay_definition') AS k(value);
FROM mymodel, jsonb_object_keys(snapshot -> 'pay_definition') AS k(value);
Is there a way to wrap this form in the embrace of the ORM?
Thanks, Shaheed
=================
Thanks, Shaheed
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAHAc2jfv9z5AKQm8%3D%3DxWJD9e924%3DnANs7WSdaY_OEcPYyZOcMQ%40mail.gmail.com.
No comments:
Post a Comment