Thursday, December 27, 2018

Re: best way to generate PDFs in django - handling concurrency

On 12/27/18 6:05 PM, Danny Blaker wrote:
We're building an app for the council where users fill in a form, then we generate a PDF (containing a page of text), and users get a download link on the homepage.
We expect many users to submit forms concurrently.

I see 2 approaches:

1. Generate in the view, as per the documentation: https://docs.djangoproject.com/en/2.1/howto/outputting-pdf/
2. Use a separate script to generate the PDF and use a Django API: https://www.django-rest-framework.org/api-guide/parsers/#fileuploadparser

However, to handle concurrency we'll also need a broker, such as RabbitMQ with Celery.

Is there a "best practice" way to approach this, or has anyone had experience with generating PDFs in Django and can recommend an approach?

Thanks!!

"Best" is relative.

Simplified, we do the following:

  • Use Django templates to create an HTML page.
  • Save the HTML output as a file.
  • Use subprocess to call wkhtmltopdf to convert the HTML file to a PDF file.
  • Return a PDF response using the saved PDF file.
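
The middle two steps could look roughly like the sketch below. It assumes the wkhtmltopdf binary is on the PATH and that the HTML string has already been rendered (in a Django view that would typically come from django.template.loader.render_to_string, with the result returned via django.http.FileResponse). The function names build_command and html_to_pdf are made up for illustration and are not our actual code.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(html_path, pdf_path, binary="wkhtmltopdf"):
    """Return the argument list for: wkhtmltopdf <input.html> <output.pdf>."""
    return [binary, str(html_path), str(pdf_path)]


def html_to_pdf(html, pdf_path, binary="wkhtmltopdf"):
    """Write `html` to a temporary file and convert it to a PDF at `pdf_path`."""
    with tempfile.TemporaryDirectory() as tmp:
        html_path = Path(tmp) / "page.html"
        html_path.write_text(html, encoding="utf-8")
        # check=True raises CalledProcessError if wkhtmltopdf exits non-zero.
        subprocess.run(build_command(html_path, pdf_path, binary), check=True)
    return Path(pdf_path)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file names.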

It works, though I'm not sure it is the fastest way; we see a response time of about 4-5 seconds. Because some of the generated PDF files can be reused, we keep them on disk and serve them again for recurring requests. This bypasses PDF generation and brings the response time down to about 1 second.
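
A minimal sketch of that kind of on-disk reuse, assuming the rendered HTML fully determines the PDF so its hash can serve as the file name. The cached_pdf function and the generate callable are hypothetical names, and this is not necessarily how our keys are chosen.

```python
import hashlib
from pathlib import Path


def cached_pdf(html, cache_dir, generate):
    """Return the path of the PDF for `html`, generating it only on a cache miss.

    `generate(html, pdf_path)` is a hypothetical callable that writes the PDF.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Identical HTML hashes to the same file name, so repeat requests hit the cache.
    key = hashlib.sha256(html.encode("utf-8")).hexdigest()
    pdf_path = cache_dir / f"{key}.pdf"
    if not pdf_path.exists():
        generate(html, pdf_path)
    return pdf_path
```

Note that two simultaneous requests for the same uncached document would both generate it; writing to a temporary name and renaming afterwards is one way to avoid serving a half-written file.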

With Celery you don't know exactly when a PDF will be generated, because the task runs whenever a worker picks it up.
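
In practice that means the page has to check whether the PDF is ready before showing the download link. The sketch below illustrates the idea with the standard library's ThreadPoolExecutor standing in for a Celery worker; it is not Celery code, and submit_job, job_status and the in-memory jobs registry are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a pool of background workers; this is NOT Celery itself.
executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job id -> Future (hypothetical in-memory registry)


def submit_job(job_id, generate, *args):
    """Queue `generate(*args)` in the background and return immediately."""
    jobs[job_id] = executor.submit(generate, *args)
    return job_id


def job_status(job_id):
    """Return 'pending', 'ready', or 'failed' for a queued job."""
    future = jobs[job_id]
    if not future.done():
        return "pending"
    return "failed" if future.exception() else "ready"
```

The homepage would show the download link only once job_status reports 'ready'.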

There are several commercial applications for generating PDFs that might be much faster, but for us the open source application works fine.

wkhtmltopdf: https://wkhtmltopdf.org/

--
Peter van der Does
o: 410-584-2500
m: 732-425-3102
ONeil Interactive, Inc 
