Thursday, August 17, 2023

Re: Case-insensitive non-deterministic collation

On 18/08/2023 12:34 pm, Vitor Freitas wrote:
Hi Mike,

On Tue, Aug 15, 2023 at 4:30 AM Mike Dewhirst <miked@dewhirst.com.au> wrote:

This is a great reference. It helped me out with the migration from postgresql ci fields to db collations.

Everything about this is new for me as well. I'm sure the db collation strategy is more powerful and I can see the benefits.

However, the postgresql ci fields were way easier to implement.

I agree


Right now I'm testing it out on a smaller project. One problem I'm currently facing is that exposing fields that have the db_collation configuration to django-filters or to Django Admin search parameters causes an exception:

NotSupportedError
nondeterministic collations are not supported for LIKE

Yes. I got the same error ...

https://code.djangoproject.com/ticket/33901   closed Bug (fixed)

From the discussion in that ticket I got the impression that maybe I should postpone using collations until after I upgrade from Django 3.2 to 4.2.

I was using a very similar collation.

I have already removed my CI fields so I won't put them back.

I manually (__iexact) check to prevent duplicate names for users and companies and the sorting is not critical for me at the moment.
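For anyone following along, a manual check of that kind might look roughly like the sketch below. The function name and its parameters are hypothetical; the only Django features it relies on are the __iexact lookup and the model's _default_manager.

```python
def name_is_taken(model, name, field="name", exclude_pk=None):
    """Return True if a row already holds `name` in `field`, ignoring case.

    `model` is expected to be a Django model class. `exclude_pk` lets an
    existing row be skipped so that editing it does not count as a clash.
    """
    # __iexact compares case-insensitively whatever the column's collation is.
    queryset = model._default_manager.filter(**{f"{field}__iexact": name})
    if exclude_pk is not None:
        queryset = queryset.exclude(pk=exclude_pk)
    return queryset.exists()
```

A check like this is not atomic: two requests can both pass it before either one saves, so it narrows the window for duplicates but does not replace a constraint enforced by the database.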

I think I'll avoid collations for now. I'm running out of brainspace and don't have the time to do the exhaustive research needed to correctly define a bug for a new ticket. I'll press on with workarounds until I'm absolutely forced to upgrade to 4.2 and hope that some generous soul has sorted it out by then.

The fact is it is new for PostgreSQL as well as Django so it isn't surprising to see such wrinkles.

Cheers

Mike


This is the collation that I'm using:

CreateCollation(
  "case_insensitive",
  provider="icu",
  locale="und-u-ks-level2",
  deterministic=False,
)

Anyway, all the icu / und-u-ks stuff looks a little bit confusing. It would be good to have some guidelines or some quick recipes in the docs that would help us with the migration :-)

Kind regards,
Vitor
 


Covers all the bases.

Thank you Adam

Cheers

Mike

On 7/08/2023 12:28 pm, Mike Dewhirst wrote:
On 6/08/2023 9:17 pm, Chetan Ganji wrote:

Thanks Chetan

I have seen that 'icu' and 'und-whatever...' in various places on the web - so it seems to be spreading - but I haven't had the brainspace to understand it yet.

I'll try an experiment with provider='C' and locale='C' because that is how most of my databases are already established. If that passes my tests I might move on to other things.

From what I can see, PostgreSQL are likely to deprecate citext as inelegant. That would be why Django has deprecated it.

Thanks again.

Mike

Check this out.
https://gist.github.com/hleroy/2f3c6b00f284180da10ed9d20bf9240a

# According to the Django documentation, non-deterministic collations are preferred
# over the citext extension on PostgreSQL 12+.
# Example migration to create the case-insensitive collation
from django.contrib.postgres.operations import CreateCollation
from django.db import migrations, models


class Migration(migrations.Migration):
    operations = [
        CreateCollation(
            'case_insensitive',
            provider='icu',
            locale='und-u-ks-level2',
            deterministic=False,
        ),
    ]


# Example model using the db_collation parameter introduced in Django 3.2
class Tag(models.Model):
    name = models.CharField(max_length=50, db_collation='case_insensitive')

    class Meta:
        ordering = ['name']

    def __str__(self):
        return self.name

Regards,
Chetan Ganji
+91-900-483-4183


On Sun, Aug 6, 2023 at 12:32 PM Mike Dewhirst <miked@dewhirst.com.au> wrote:
On 5/08/2023 7:58 pm, Chetan Ganji wrote:
Hi Mike

RE: The primary use case is to establish case-insensitivity when checking names - including usernames, company names and abbreviations/acronyms. 

I don't know anything about db_collation.

Me neither

The 4 lookups below should solve most common scenarios.

Actually that was how I did it originally. I switched to using the PostgreSQL CI field because it is all done in the database - much faster - and my code is much reduced and therefore fewer possibilities for bugs etc.

Judging from the Django release notes and the PostgreSQL docs there should be a straightforward answer to my question. Researching the correct answer is complex enough to make me ask here first.

Cheers

Mike


On Sat, Aug 5, 2023 at 1:35 PM Mike Dewhirst <miked@dewhirst.com.au> wrote:
The following warning triggered a bit of research, and it looks like a significant amount of study will be required to find the collation needed ...


django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1.
        HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead.


Does anyone have experience they would like to share? What replaces that ellipsis?

The primary use case is to establish case-insensitivity when checking names - including usernames, company names and abbreviations/acronyms. Maybe there is a better way to handle that?

This is my typical PostgreSQL database spec ...

CREATE DATABASE xxxx
    WITH
    OWNER = miked
    ENCODING = 'UTF8'
    LC_COLLATE = 'C'
    LC_CTYPE = 'C'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1
    IS_TEMPLATE = False;

Many thanks for any help

Cheers

Mike

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/2eccab9e-e296-55e0-05de-e8d4cf708262%40dewhirst.com.au.


--   Signed email is an absolute defence against phishing. This email has  been signed with my private key. If you import my public key you can  automatically decrypt my signature and be sure it came from me. Your  email software can handle signing.  
