Saturday, June 30, 2012

Handling millions of rows + large bulk processing (now 700+ mil rows)

Hi all,

As some of you know, I did a live webcast last year (July 2011) on our LLG project, explaining how we overcame some of the problems associated with large-scale data processing.

After reviewing the video, I found that the sound quality was poor, the slides weren't well structured, and some of the information is now out of date (at the time it was 40 mil rows; we're now dealing with 700+ mil).

Therefore, I'm considering doing another live webcast - except this time it'll be recorded and posted the next day, the stream will be available in 1080p, it'll be far better structured, and it will only last 50 minutes.

The topics I'd like to cover are:

* Bulk data processing where bulk_insert() is still not viable (we went from 30 rows/sec to 8,000 rows/sec on bulk data processing, whilst still using the ORM - no raw SQL here!)
* Applying a faux child/parent relationship when a standard ORM relation is too expensive (gives you the ORM-style interface without the cost).
* Applying a faux read-only ORM structure to legacy applications (allows ORM usage on schemas that weren't properly designed and cannot be changed - for example, vendor software with no source code).
* New Relic is beautiful, but expensive. Hear more about our plans to make an open source version.
* Appropriate use cases for IaaS vs. colo with SSDs.
* Percona is amazing - some of the tips and tricks we've learned along the way.
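To give a flavour of the first topic: the core of the technique is batching - rather than paying commit overhead once per row, rows are grouped and flushed in fixed-size batches. This is an illustrative sketch only (function names and the batch size are mine, not the LLG code); in the real thing `flush` would save Django model instances inside a single transaction.

```python
# Illustrative sketch of the batch-flush pattern (not the actual LLG code):
# group incoming rows into fixed-size batches, then hand each batch to a
# flush callback that commits once per batch instead of once per row.

def batched(iterable, batch_size=500):
    """Yield lists of up to batch_size items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def process_in_batches(rows, flush, batch_size=500):
    """Call flush(batch) once per batch and return the total row count.

    With Django, flush would typically open one transaction, save every
    instance in the batch, and commit once - taking per-row commit
    overhead out of the hot path.
    """
    count = 0
    for batch in batched(rows, batch_size):
        flush(batch)
        count += len(batch)
    return count
```

The win comes entirely from how often you commit, which is why the throughput jump is possible without dropping down to raw SQL.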
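On the faux child/parent topic, one common shape for this idea (a hypothetical sketch, not necessarily how LLG does it) is to store the parent's primary key as a plain integer and resolve it lazily through a cache, which keeps the `child.parent` ergonomics without a real foreign key and the joins or per-row queries it can trigger:

```python
# Hypothetical sketch: a "faux" parent link. The child stores parent_id as
# a plain integer (no FK constraint, no automatic joins); the parent object
# is looked up lazily from a cache and memoised on the instance.

class Parent:
    cache = {}  # stand-in for a real lookup table or cache layer

    def __init__(self, pk, name):
        self.pk, self.name = pk, name
        Parent.cache[pk] = self

class Child:
    def __init__(self, pk, parent_id):
        self.pk = pk
        self.parent_id = parent_id  # just an integer column in the schema
        self._parent = None

    @property
    def parent(self):
        if self._parent is None:  # resolve once, then reuse
            self._parent = Parent.cache[self.parent_id]
        return self._parent
```

Because the relationship lives in application code, you decide exactly when (and whether) the parent lookup happens.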
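For the faux read-only structure: in Django, the usual starting point for reading an unchangeable schema is an unmanaged model (`managed = False` plus `db_table` in the model's `Meta`). The read-only guarantee itself can be sketched generically like this (illustrative names, not the project's code) - attribute reads are served from the row data, and writes are refused so application code can't accidentally mutate vendor tables:

```python
# Illustrative sketch: a read-only row object over a legacy schema.
# Reads go through to the underlying field dict; any assignment raises,
# enforcing read-only access at the application layer.

class ReadOnlyRow:
    def __init__(self, **fields):
        object.__setattr__(self, "_fields", dict(fields))

    def __getattr__(self, name):
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("rows from the legacy schema are read-only")
```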

If you'd like to see this happen, please leave a reply in the thread - if enough people want it, we'll hold a public vote on the scheduled date.



