Saturday, January 28, 2012

Re: Parsing HTML

On 2012-01-27, at 23:40 , jondbaker wrote:
> Chapter 8 of Dive Into Python demonstrates what you're describing
> using sgmllib.
> http://www.diveintopython.net/

None of these libraries is very good at parsing "real-world" (broken) HTML, though. For that you'd be better off with html5lib, lxml.html or BeautifulSoup, in decreasing order of recommendation. lxml.html is probably the fastest, but I don't think it implements the HTML5 parsing rules.

