Tuesday, October 30, 2012

Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

On Tue, Oct 30, 2012 at 7:35 AM, Isaac XXX <vyrphan@gmail.com> wrote:
> Hi Tom,
>
> you're right, I was not really explicit about what information I'm
> missing. Right now, these are the points I can't find a howto for in
> the deployment I want:
>
> - Create a master-slave system on postgresql, keeping all systems up to
> date, distributing reads, and centralizing writes
> - How to configure a cluster of reverse proxies (a single reverse proxy may
> not be enough, and I need to plan to deploy more than one load balancer)
>
> The rest (configuring apache with mod_wsgi, configuring nginx to serve static
> content and so on) is already solved in my current deployments, so it should
> not be a problem in a distributed environment.
>
> Cheers,
>
> Isaac
>

Hi Isaac

We use a similar setup at $JOB. We run a pair of apache httpd servers,
which serve static content and reverse proxy to other http servers
(usually apache again, sometimes not) for dynamic content. The
requests are distributed evenly between the two proxies by our
routers, which round robin connections between the two of them. This
is basically your setup, but we use Apache because we know it and can
tune it to give nginx-like performance anyway.

Actually, that's only half of it - each proxy has all of the public
IPs we serve allocated on lo0 (loopback), and the requests are round
robin routed via a pair of high availability addresses. Therefore, on
both boxes, apache listens on the 'right' IPs. If we ever want to run
with just one proxy, we can 'down' one of the HA addresses on the
server we wish to update, which moves that HA address to be active on
the other proxy, which then serves all the requests.
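
To make the failover behaviour concrete, here is a toy Python model of
it. This is only a sketch: the names (ProxyPair, "ha1", "proxy1" and so
on) are made up for illustration, and the real work is done by the
routers and the HA software, not by application code.

```python
from itertools import cycle


class ProxyPair:
    """Toy model of two proxies sharing two high-availability (HA) addresses."""

    def __init__(self):
        # Each HA address starts out active on its "home" proxy.
        # All names here are hypothetical placeholders.
        self.active_on = {"ha1": "proxy1", "ha2": "proxy2"}
        # The routers alternate connections between the two HA addresses.
        self._addresses = cycle(["ha1", "ha2"])

    def down(self, address, on_proxy):
        """Take an HA address down on one proxy; it becomes active on the other."""
        if self.active_on[address] == on_proxy:
            other = "proxy2" if on_proxy == "proxy1" else "proxy1"
            self.active_on[address] = other

    def route(self):
        """Return the proxy that serves the next incoming connection."""
        return self.active_on[next(self._addresses)]


pair = ProxyPair()
normal = [pair.route() for _ in range(4)]    # alternates between the proxies

pair.down("ha1", on_proxy="proxy1")          # e.g. before updating proxy1
failover = [pair.route() for _ in range(4)]  # everything lands on proxy2
```

Because both boxes already hold every public IP on loopback, moving the
HA address is the only thing that has to change for one proxy to take
all the traffic.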

Now, did we need to do any of this? Probably not! We serve in the
region of 3-5 million requests a day, with peaks of around 200
concurrent requests/s going through the proxies. Apache uses in total
500MB of RAM, pre-tuned to serve up to 768 concurrent req/s without
requiring extra resources. Load average on the boxes never goes above
0.05, even if we put all the requests through one machine. It's nice
having a spare for such a critical part of infrastructure though.
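
For a sense of scale, the figures above can be turned into average rates
with a few lines of Python. This is back-of-the-envelope arithmetic that
takes the numbers as quoted and treats the peak and the tuned limit as
directly comparable; the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_rate(requests_per_day):
    """Average requests per second if the load were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


low = average_rate(3_000_000)    # about 34.7 requests/s
high = average_rate(5_000_000)   # about 57.9 requests/s

peak = 200       # observed peak quoted above
capacity = 768   # limit Apache is pre-tuned for, as quoted above

# How many times the observed peak fits into the tuned limit.
headroom = capacity / peak       # 3.84

print(f"average: {low:.1f} to {high:.1f} req/s, headroom at peak: {headroom:.2f}x")
```

So the peak is a few times the daily average, and the tuned limit is
nearly four times the peak, which matches the very low load averages.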

Scaling pgsql is a little more involved than scaling MySQL, which is what
we've always used here, mainly because its replication systems are
so good. You will typically need to use some external software to
manage the replication, e.g. Slony, but don't take my word for it: I have
very limited pgsql experience.
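
On the Django side, whatever replication tool you end up with, the
"distribute reads, centralize writes" part can be handled by a database
router. The following is a minimal sketch, assuming the master is
configured under the alias "default" and two read-only replicas under
the made-up aliases "replica1" and "replica2" in DATABASES; the class
name is also hypothetical.

```python
import random

# Hypothetical aliases; they must match the keys used in settings.DATABASES.
MASTER_ALIAS = "default"
REPLICA_ALIASES = ["replica1", "replica2"]


class ReadReplicaRouter:
    """Send writes to the master and spread reads across the replicas."""

    def db_for_read(self, model, **hints):
        # Pick a replica at random for each read query.
        return random.choice(REPLICA_ALIASES)

    def db_for_write(self, model, **hints):
        # All writes go to the single master.
        return MASTER_ALIAS

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations are always allowed.
        return True
```

Django picks the router up through the DATABASE_ROUTERS setting, which
takes a list of dotted paths to router classes. Bear in mind that a
replica may lag behind the master, so a read made immediately after a
write can return stale data.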

Cheers

Tom

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
