Monday, July 1, 2019

Re: Accessing data from a 30 GB file in json format

Hi,
To be able to traverse the JSON structure you'd normally need the entire structure in memory.
Not necessarily; it depends on what is to be done with the data.

The same problem exists with XML, which is why SAX parsers were created in addition to DOM ones.
 
If the processing can handle the data on the fly, implementing a callback-based parser could solve the problem. Maybe have a look at projects such as Naya, UltraJSON and the like; they could be time (and memory 😉) savers.
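
For what it's worth, here is a minimal sketch of the callback idea using only the standard library (json.JSONDecoder.raw_decode). It assumes the file is one top-level JSON array whose elements are flat objects; the original post does not say how the data is laid out, so that is a guess. The names stream_array and json_array_to_csv and the column names are made up for illustration.

```python
import csv
import io
import json


def stream_array(fp, on_item, chunk_size=65536):
    """Call on_item(obj) for each element of a top-level JSON array in fp.

    Assumes every element is an object or an array, so that an element
    cut off at a chunk boundary can never be mistaken for a complete one.
    Returns the number of elements seen.
    """
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    pos = 1  # index just past the opening '['
    count = 0
    while True:
        # Skip whitespace and the commas that separate elements.
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf):
            if buf[pos] == "]":
                return count
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                pass  # element is probably incomplete, fetch more input
            else:
                on_item(obj)
                count += 1
                continue
        # Drop what has been consumed and read the next chunk.
        chunk = fp.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of JSON input")
        buf = buf[pos:] + chunk
        pos = 0


def json_array_to_csv(src, dst, fieldnames, chunk_size=65536):
    """Write one CSV row per JSON object, keeping only the given columns."""
    writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    return stream_array(src, writer.writerow, chunk_size)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (hypothetical data).
    sample = io.StringIO('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
    out = io.StringIO()
    json_array_to_csv(sample, out, ["id", "name"])
    print(out.getvalue())
```

With a real file you would pass open(path, encoding="utf-8") as src and open(csv_path, "w", newline="") as dst. Only the current chunk and one decoded element are held in memory at a time. Note that this sketch re-parses an element from its start whenever it spans a chunk boundary, and it reads to the end of the file before reporting malformed input, so a dedicated streaming parser is likely the better choice for production use.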

HTH

Eric


From: django-users@googlegroups.com <django-users@googlegroups.com> on behalf of Cornelis Poppema <c.poppema@gmail.com>
Sent: Monday, July 1, 2019 15:38
To: Django users
Subject: Re: Accessing data from a 30 GB file in json format
 
To be able to traverse the JSON structure you'd normally need the entire structure in memory. For this reason you can't (easily) apply the usual suggestions for iterating over a file efficiently to a JSON file: you can perhaps read the file efficiently, but the parsed structure will still grow in memory. After a quick search I found these packages, made for efficiently reading large JSON files: https://github.com/ICRAR/ijson and https://github.com/kashifrazzaqui/json-streamer. https://stackoverflow.com/a/17326199/248891 shows a simple example using ijson.



On Monday, 1 July 2019 12:07:39 UTC+2, Nibil Ashraf wrote:
Hey,

I have a file with a size of around 30 GB. The file is in JSON format. I have to access the data and write it to a CSV file. When I tried to do that on my laptop, which has 4 GB of RAM, I got an error. I tried to load the JSON file like this: json_parsed = json.loads(json_data)

Can someone help me with this? How should I do this? If I should use a server, please let me know what specifications I should use.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/00dd3f6a-85da-4942-97bb-eae2652cfe96%40googlegroups.com.
