Tuesday, June 14, 2022

Re: How to hash fields and detect changes in a record

On 6/12/22 11:40 PM, Mike Dewhirst wrote:

-------- Original message --------
From: Ryan Nowakowski <tubaman@fattuba.com>
Date: 13/6/22 07:09 (GMT+10:00)
Subject: Re: How to hash fields and detect changes in a record

On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > I think the solution might be to hash note.title and note.note into a new
> > > field note.hash on being auto-created. On subsequent saves, compare the
> > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > prior to generating the next set. Those subsequent saves could be months or
> > > years later.
> > Hashing is useful if you want to check that something has been
> > unexpectedly changed.  I assume the note can only be changed through
> > your web app so you know when a user is changing a note.
>
> These are automatically generated notes which taken together constitute
> advice on how to deal with the analysis. Users can edit them. For example,
> someone might record some action taken regarding the advice. I don't want to
> delete that. If nothing has been edited, it is safe to delete.
>
> So how do I know it is the same as when originally generated - and safe to
> delete - except by storing a hash of the interesting fields.

Because when the user edits a note, during form.save() (assuming
you're using Django forms), you'll set `altered_by_user` to True.

Notes can also be altered in the Admin.


You have a couple of choices then.  You could alter the note detail view in the admin to set the altered_by_user field.  Alternatively, and more generically, you could check the pk field in your model's save method.  If it is None, you are creating a new note.  If it is not None, you are updating an existing note, so you can set altered_by_user to True.
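A rough sketch of the pk check, in case it helps. It is written as a plain function with stand-in objects so it runs without a Django project; in a real model the same test would sit at the top of save(), before calling super().save(). The name flag_user_edit is made up for illustration, and the sketch assumes the default auto-incrementing primary key, where pk stays None until the first save.

```python
from types import SimpleNamespace


def flag_user_edit(note):
    """Set note.altered_by_user when the note already exists in the database.

    Hypothetical helper: meant to be called at the top of a model's save(),
    before super().save(). Assumes pk is None until the first save.
    """
    if note.pk is not None:
        # An existing row is being saved again, so treat it as an edit.
        note.altered_by_user = True
    return note


# Stand-ins for model instances, so the sketch runs without Django.
new_note = SimpleNamespace(pk=None, altered_by_user=False)
saved_note = SimpleNamespace(pk=42, altered_by_user=False)

flag_user_edit(new_note)    # first save: flag stays False
flag_user_edit(saved_note)  # later save: flag becomes True
```

This only works if the code that generates the notes creates them and never re-saves them; if it does re-save existing notes, it would have to reset the flag itself.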


> And if that is the best approach, what sort of hashing will survive Python
> upgrades etc?

Pick a hash algorithm[1] (e.g. sha256).  The output will remain the same
even across Python upgrades.
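If you do go the hashing route, something like the following would do, using hashlib from the standard library. The function name note_digest and the sample strings are only placeholders. Note that this is hashlib and not the built-in hash(): hash() on strings is salted per process by default, so its values are not suitable for storing in the database.

```python
import hashlib


def note_digest(title, note):
    """Return a SHA-256 hex digest covering both fields."""
    h = hashlib.sha256()
    for field in (title, note):
        data = field.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Placeholder values; store the result in a 64-character text field.
digest = note_digest("Storage", "Keep away from heat.")
```

The digest depends only on the bytes fed in, so the same title and note give the same 64-character string on any Python version.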

So the mechanism doesn't need to be a hash - as you said. I now just sum ord(char) for the title and the note and keep that in a flag field.

Summing the ordinal of the characters won't catch transposition:

>>> chars = 'ab'
>>> sum([ord(c) for c in chars])
195
>>> chars = 'ba'
>>> sum([ord(c) for c in chars])
195

Better to use a real hash algorithm if you're trying to detect changes.  My earlier point that hashing isn't required was that you don't need to detect changes at all, since you already know explicitly when they are being made.
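For comparison, here is a small side-by-side of the ordinal sum and sha256 on the same transposed strings. The helper names ord_sum and sha256_hex are just for illustration.

```python
import hashlib


def ord_sum(text):
    """The checksum discussed above: sum of the character ordinals."""
    return sum(ord(c) for c in text)


def sha256_hex(text):
    """SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original, edited = "ab", "ba"

# The ordinal sums collide, so the edit goes unnoticed.
same_sum = ord_sum(original) == ord_sum(edited)

# The digests differ, so the edit is detected.
same_digest = sha256_hex(original) == sha256_hex(edited)
```

Here same_sum comes out True and same_digest comes out False.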
