Friday, January 21, 2022

Search & replace an object in the database

Dear Django users,

Here's an interesting case, and I'm curious whether somebody can point me in the right direction.

Here's the problem:
My company has a database of customers. Over the years, many duplicates have been created. For cleaning up duplicates, I'd like to have a search-and-replace functionality.

In other words: replace all references in the database to customer <old> with customer <new>.
The database schema has quite a bit of complexity, so I'm aiming to find a very generic solution.

Approach:
Django already has some functionality for finding all references to an existing object, namely django.db.models.deletion.Collector, and I'm using it to collect them.

The "replace" logic, though, seems quite hard to get right:
  • It has to keep parent links intact
  • It has to recognize references in parent models (Customer model is derived from the concrete model Actor)
  • It has to recognize generic relations, built with Django's content types
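
To make the first two requirements concrete, here is a minimal sketch of the check as a predicate. It uses plain Python stand-ins instead of real Django models, so the classes, the make_field() helper and the name is_replaceable() are assumptions for illustration only. The sketch relies on the fact that a multi-table child model is a Python subclass of its parent, and it reads the parent_link flag from remote_field, which is where forward relation fields carry it.

```python
from types import SimpleNamespace


class Actor:
    """Stand-in for the concrete parent model."""


class Customer(Actor):
    """Stand-in for the model derived from Actor."""


def make_field(related_model, parent_link=False):
    """Hypothetical helper: build a stand-in for a forward relation field."""
    return SimpleNamespace(
        related_model=related_model,
        remote_field=SimpleNamespace(parent_link=parent_link),
    )


def is_replaceable(field, new_obj):
    """Return True if the field may be repointed at new_obj."""
    related = getattr(field, "related_model", None)
    if related is None:
        return False  # not a relation at all
    if not isinstance(new_obj, related):
        return False  # points at an unrelated model
    if field.remote_field.parent_link:
        return False  # keep the child-to-parent link intact
    return True


invoice_customer = make_field(Customer)                    # FK to Customer
note_actor = make_field(Actor)                             # FK to the parent model
customer_actor_ptr = make_field(Actor, parent_link=True)   # inheritance link
plain_column = SimpleNamespace(related_model=None)         # e.g. a text column

new_customer = Customer()
results = [
    is_replaceable(invoice_customer, new_customer),
    is_replaceable(note_actor, new_customer),
    is_replaceable(customer_actor_ptr, new_customer),
    is_replaceable(plain_column, new_customer),
]
```

Using isinstance() rather than comparing classes for equality is what lets a reference to the parent model qualify, while the parent link itself is still left alone.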

My stub implementation comes below.

  • Has anybody else implemented something like this, e.g. in a helper library?
  • Do you think the approach is right?
  • What would you do differently?
  • Any caveats that you know of?

Best regards
Jools

Stub implementation:

from django.db.models.deletion import Collector, ProtectedError, RestrictedError


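def replace_object(old_obj, new_obj, using='default'):
    """
    Replace an old object with a new object throughout the database.
    """
    collector = Collector(using)

    try:
        collector.collect([old_obj])
    except (ProtectedError, RestrictedError):
        # Note: objects held back by PROTECT or RESTRICT are reported on the
        # exception rather than collected, so the loop below does not visit them.
        pass

    # Caveat: objects that qualify for a fast delete are kept as querysets in
    # collector.fast_deletes and are not yielded by instances_with_model().
    for model, obj in collector.instances_with_model():
        changed = False
        for field_opts in obj._meta.get_fields():
            if not __is_qualified(field_opts, new_obj):
                continue
            # Only touch fields that actually point at the old object.
            if getattr(obj, field_opts.name) == old_obj:
                setattr(obj, field_opts.name, new_obj)
                changed = True
        if changed:
            obj.save()


def __is_qualified(field_opts, new_obj):
    # get_fields() also returns reverse and generic relations, which cannot
    # be assigned to; keep only forward ForeignKey/OneToOneField columns.
    if not getattr(field_opts, 'concrete', False):
        return False
    if not (field_opts.many_to_one or field_opts.one_to_one):
        return False

    # This check is probably wrong for multi-table inheritance: a field that
    # points at the parent model (Actor) is skipped.
    if field_opts.related_model != new_obj.__class__:
        return False
    # Forward fields carry the parent_link flag on their remote_field.
    if field_opts.remote_field.parent_link:
        return False

    return True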
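
The stub does not cover generic relations yet. Since a generic foreign key stores a content type together with an object id, the replacement would have to match on both values. Below is a rough sketch of that matching rule on in-memory stand-ins; GenericRow, repoint_generic_rows() and the "crm.customer" labels are hypothetical and only illustrate the idea, they are not Django API.

```python
from dataclasses import dataclass


@dataclass
class GenericRow:
    """Stand-in for a row that uses a generic foreign key."""
    content_type: str  # Django would store a ContentType reference here
    object_id: int


def repoint_generic_rows(rows, content_type, old_id, new_id):
    """Repoint rows referencing (content_type, old_id); return how many changed."""
    changed = 0
    for row in rows:
        # Both parts must match: the same id under another type is a different object.
        if row.content_type == content_type and row.object_id == old_id:
            row.object_id = new_id
            changed += 1
    return changed


rows = [
    GenericRow("crm.customer", 7),
    GenericRow("crm.customer", 8),
    GenericRow("crm.supplier", 7),  # same id, different type: must stay untouched
]
count = repoint_generic_rows(rows, "crm.customer", old_id=7, new_id=8)
```

With multi-table inheritance the content type could refer to either Customer or Actor, so a real implementation would probably have to run this match once per model in the inheritance chain.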

