Thursday, September 3, 2020

Re: DB migration: Loading a massive fixture dump

Just to follow up: I ended up using pgloader, and after some argument tweaking, it worked. Cheers.
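For reference, in case someone hits the same wall: the exact arguments will depend on your schema, but the general shape of a pgloader run is either a one-liner or a load file. The hostnames, credentials, and database names below are placeholders, not my actual setup:

# one-liner form
$ pgloader mysql://user:pass@mysql-host/sourcedb postgresql://user:pass@pg-host/targetdb

# or a load file (e.g. migrate.load), run with "pgloader migrate.load"
LOAD DATABASE
     FROM mysql://user:pass@mysql-host/sourcedb
     INTO postgresql://user:pass@pg-host/targetdb
WITH include drop, create tables, create indexes, reset sequences;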

On Wednesday, September 2, 2020 at 7:26:04 AM UTC+2 Fred Blaise wrote:
Hi Mike,

Thanks for the answer.

Yes, I was aware of the size issue, but hopeful nonetheless.

Regarding the converter, I would be interested in taking a look at your py3 work, if you could forward it along.

Thanks.


On Wed, Sep 2, 2020, 00:21 Fred Blaise <fred....@gmail.com> wrote:
Hi,

I have to migrate a database from MySQL to PostgreSQL. Unfortunately, there is quite a bit of data in there.

As a strategy, I thought of using Django's ORM: dump the data from the MySQL setup and load the fixtures on the PostgreSQL setup, like so:

# on mysql
$ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py dumpdata --indent=4 -e sessions -e admin --natural-foreign --natural-primary -e contenttypes -e auth.permission -v2' > ~/Downloads/datadump_orm_dump_1.json

This yields a 4 GB JSON file.
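For what it's worth, dumpdata also accepts app labels, so the dump could be split into smaller per-app files; "auth" below is just an example label:

# dump a single app instead of everything (app label is an example)
$ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py dumpdata auth --natural-foreign --natural-primary --indent=4' > ~/Downloads/datadump_auth.json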

Then reload on the other side:

# on postgresql
$ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py loaddata datadump_orm_dump_small.json'

I let loaddata run for 4 hours, and it eventually returned with no error -- but no success either. I now think the "Installed x object(s) from 1 fixture(s)" message never actually showed; I will check again.
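Next time I will bump the verbosity so the outcome is explicit, something like:

# -v 2 makes loaddata chattier about what it installs
$ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py loaddata -v 2 datadump_orm_dump_small.json'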

Looking at the PostgreSQL side, querying system catalogs such as:

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;

tells me that plenty of data is getting in there. A "\d+" does not agree, however (weird, but I don't know enough to explain the difference between the two). And yet, I cannot see anything through any SELECT statement (hinting at transaction isolation).
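One way to test the isolation theory (a sketch; it assumes a role allowed to see other sessions) would be to watch pg_stat_activity while loaddata runs -- a single long-lived transaction doing all the inserts would support it:

-- sessions with an open transaction, oldest first
SELECT pid, state, xact_start, query
  FROM pg_stat_activity
 WHERE state <> 'idle'
 ORDER BY xact_start;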

All things being equal, this strategy works fine with small JSON files; I just did it successfully with a file under 1 MB.

I guess I have a few questions, as I am stuck here:

1. Are there any hidden args/switches to manage.py loaddata, or anything else I could look at?
2. Are there specific Django configurations that could help me load the data?
3. Does loaddata actually run as one massive single transaction? (It looks like it, which sounds crazy to me. :))
4. Is there a better way to migrate my data? (One naive idea is sketched right after this list.)
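On question 4, the naive idea: split the fixture into smaller files and loaddata them one invocation at a time, so each chunk commits on its own. A rough sketch -- it still reads the whole JSON into memory, and with natural keys the chunks must be loaded in their original order:

# split_fixture.py -- naive fixture splitter (a sketch, not battle-tested)
import json

CHUNK = 20000  # objects per output file; arbitrary

with open('datadump_orm_dump_1.json') as f:
    objects = json.load(f)  # caution: reads the entire 4 GB file into memory

for i in range(0, len(objects), CHUNK):
    with open('chunk_%04d.json' % (i // CHUNK), 'w') as out:
        json.dump(objects[i:i + CHUNK], out)

# then load each chunk in its own loaddata run, in order:
# $ for f in chunk_*.json; do python manage.py loaddata "$f"; done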

Thanks a lot for any pointers!
Cheers.

