Friday, January 21, 2022

Re: Search & replace an object in the database

Yes this exactly. A couple of possible additions:

I've had great luck using python and SQL queries to manipulate a database including converting from other schemas and merging duplicate records.

Make a test database from your backup, and beat it up with your scripted code until you are happy with the result. Just keep deleting the test database and reconstructing it from the backup to try again. The python code can do things easily such as sucking in all records, looking for overlaps on various fields, and merging the results.

If this is not a one-time thing, then you might consider doing something similar through the admin interface, but breaking the problem down into separate phases using intermediate models. Then you could review each possible merge before going on to the next phase of actually doing a merge, or choosing to ignore a merge candidate. You could also do additional screening to, for example, select *which* customer record should be considered complete and which ones should be retired.

hth

- Tom

On Jan 21, 2022, at 12:16 PM, Bernard Mallala <bernard@autofyle.com> wrote:

I would approach this differently and from the database side mostly
  1. Backup the database so you can restore if something happens
  2. In Django, Create an an empty copy of the Customer database e.g CustomerB
  3. Using a SQL query tool, select all duplicate copies from Customer and insert into CustomerB
  4. If needed, update the model for CustomerB so that you are able to associate or reference every duplicate in CustomerB with the right record in Customer
  5. Using a SQL query too delete all duplicate copies of Customers from Customer table
  6. If this breaks your application in any way, account for that by ensuring that every duplicate record in CustomerB has a primary record in Customer that suffices and returns results. This will ensure that all relations with other models and records are honored.
  7. Update or fix, both models to ensure that duplicates can no longer be inserted
  8. Perform any other schema updates


On Friday, January 21, 2022 at 12:43:03 PM UTC-7 yebo...@gmail.com wrote:
Okay bro try this script hope it works

On Fri, 21 Jan 2022 at 10:05, jools <jul...@pinabausch.org> wrote:
Dear Django users,

here's an interesting case and I'm curious whether somebody can point me in the right direction.

Here's the problem:
My company has a database of customers. Over the years, many duplicates have been created. For cleaning up duplicates, I'd like to have a search-and-replace functionality.

I.e.: Replace all references in the database to customer <old> with customer <new>.
The database schema has quite a bit of complexity, so I'm aiming to find a very generic solution.

Approach:
Django has a bit of functionality to find all references to an existing object, namely django.db.models.deletion.Collector and I'm using it to find all references.

Though, the "replace" logic seems quite hard to get right:
  • It has to keep parent links intact
  • It has to recognize references in parent models (Customer model is derived from the concrete model Actor)
  • It has to recognize generic relations, built with Django's content types

My stub implementation comes below.

  • Has anybody else implemented sth like this, e.g. in helper library?
  • Do you think the approach is right?
  • What would you differently?
  • Any caveats that you know of?
Best regards
Jools

Stub implementation:

from django.db.models.deletion import Collector, ProtectedError, RestrictedError


def replace_object(old_obj, new_obj, using='default'):
    """
    Replace an old object with a new object throughout the database.
    """
    collector = Collector(using)

    try:
        collector.collect([old_obj])
    except ProtectedError:
                pass
    except RestrictedError:
                pass

    for model, obj in collector.instances_with_model():
        for field_opts in obj._meta.get_fields():
            if __is_qualified(field_opts, new_obj):
                setattr(obj, field_opts.name, new_obj)
                obj.save()

def __is_qualified(field_opts, new_obj):
    if not hasattr(field_opts, 'related_model'):
        return False
    
    # This check is wrong for multi-table inheritance, I suppose?
    if field_opts.related_model != new_obj.__class__:
        return False
    if hasattr(field_opts, 'parent_link') and field_opts.parent_link is True:
        return False

    return True


-- 
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/4cd485cd-cc20-481d-a7be-3179bcc5c98bn%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/32c497c1-0640-4cae-9fdd-16fb4fb62e70n%40googlegroups.com.

No comments:

Post a Comment