Tuesday, September 1, 2020

Re: DB migration: Loading a massive fixture dump

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEE/NCg7Xf1UydoVFgpGvW31BqCOLMFAl9PHh4ACgkQGvW31BqC
OLNbvgf+P1OQ/vR5hq0SqGiZjc4Wqjnprnl9YG9GSZ/fmK/8SfritRI2CzI13F19
DIGDXGuvxOKXo0KblQDihPVzBbndNtFKD0HTXRbDNyuDGOeTS2uDAk+I2G+zoJYS
jXaPD5SUJazfS04j1q75kZ8kYW6L4qITzU/PYh0CTixv0SY9r20JGIfE2jX7v9zs
fezuUzgxU2TZvJradugKoqw87UWeMxwq5YR3yBs+fU2h6xq8ZjPMIgtY815tPn/V
uyPAL2msWr7zAymFSWFsjb4R9rJmLlXb0hvuLhNMsptus5yocfu/GL6iXp2/DSpe
xvhqk5U9SLv9cTTA6aqOM9s082LJmA==
=aARv
-----END PGP SIGNATURE-----
On 2/09/2020 3:51 am, Fred Blaise wrote:
> Hi,
>
> I have to migrate a database from mysql to postgresql. Unfortunately,
> there is quite a bit of data in there.
>
> As a strategy, I thought of using Django's ORM feature, dump the data
> from my setup using mysql, and load the fixtures on the other setup
> using postgresql, such as:
>
> # on mysql
> $ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py
> dumpdata --indent=4 -e sessions -e admin --natural-foreign
> --natural-primary -e contenttypes -e auth.permission -v2' >
> ~/Downloads/datadump_orm_dump_1.json
>
> This yields a 4GB json file.
>
> Reload on the other side, such as:
>
> # on postgresql
> $  docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py
> loaddata datadump_orm_dump_small.json'
>
> I let the loaddata run for 4hours, and it eventually returns with no
> error -- but no success either. I actually now think that the message
> indicating "Installed x object(s) from 1 fixture(s)" does not show, I
> will see again.
>
> Looking at the postgresql side, querying metatables such as:
>
> SELECT nspname || '.' || relname AS "relation",
>     pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
>   FROM pg_class C
>   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
>   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>     AND C.relkind <> 'i'
>     AND nspname !~ '^pg_toast'
>   ORDER BY pg_total_relation_size(C.oid) DESC
>   LIMIT 5;
>
> tells me loads of data gets in there. a "\d+" does not agree, however
> (weird, but I don't know enough to get the difference between the
> two). And yet, I cannot see anything through any SELECT statement
> (hinting to the transaction isolation).
>
> All things equal, this strategy works fine when working with small
> json files. I just successfully did it with a file less than 1MB.
>
> I am guessing I have a few questions, I am stuck here.
>
> 1. Are there any hidden args/switches to manage.py loaddata or else
> that I could look at?

loaddata is intended for small imports such as skeleton data for unit
testing


> 2. Are there specific django configuration that could help me load the
> data?
> 3. Does a loaddata actually result in a massive single transaction?
> (looks like so, sounds crazy to me :))

Probably does because that would be what you most likely want. But as
you have discovered it probably runs out of resources if the load is too
big.

> 4. Is there a better way to migrate my data?

Yes.

https://github.com/lanyrd/mysql-postgresql-converter/blob/master/db_converter.py

I used this successfully a few months ago. I had to convert it from
py2.7 to 3.x.

Briefly, you will have to dump MySQL again using a similar incantation
to this  ...

mysqldump --compatible=postgresql --default-character-set=utf8 -r
databasename.mysql -u root databasename

If you (or anyone) would like a copy of my adjusted db_converter.py,
drop me a line off-list.

Cheers

Mike

> Thanks a lot for any pointers!
> Cheers.
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users+unsubscribe@googlegroups.com
> <mailto:django-users+unsubscribe@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/d6f5e874-e96b-4984-b69d-d7857e95a0a4n%40googlegroups.com
> <https://groups.google.com/d/msgid/django-users/d6f5e874-e96b-4984-b69d-d7857e95a0a4n%40googlegroups.com?utm_medium=email&utm_source=footer>.


--
Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically decrypt my signature and be sure it came from me. Just
ask and I'll send it to you. Your email software can handle signing.


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/83fa51d5-7e52-a1fd-694b-d20ab295bb51%40dewhirst.com.au.

No comments:

Post a Comment