Friday, January 27, 2012

Re: Parsing HTML

Chapter 8 of Dive Into Python demonstrates what you're describing
using sgmllib.
http://www.diveintopython.net/

On Jan 27, 3:31 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Fri, 27 Jan 2012 13:35:42 +0700, ddtopgun <ddto...@gmail.com> wrote:
> >I am new to Django and I want to get the content of an HTML page.
> >Can someone help me with how to do that?
>
>         <blink><blink>
>
>         Django is meant to generate HTML pages, not parse HTML content.
>
> >import urllib.request
> >
> >f = urllib.request.urlopen("http://site_name.com")
> >s = f.read()  # bytes: the full HTML source of the page
> >f.close()
>
> >But this code returns the entire HTML source. I want to take just the
> >contents of the HTML tags.
>
>         You'll have to define "contents" more precisely. Only stuff inside
> <p></p> tags (and you then may have to worry about old HTML that doesn't
> use closing </p> tags)? Is an image reference ( <img
> src="somefile.name"> </img>) content, or only the text between the tags?
>
>         If the HTML is well-formed, you might be able to use ElementTree to
> traverse the nodes. Or define callbacks for HTMLParser or htmllib (see
> section 19 [for Python 2.7]: Structured Markup Processing in the
> Standard Library reference manual) to capture the portion in which you
> are interested.
> --
>         Wulfraed                 Dennis Lee Bieber         AF6VN
>         wlfr...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
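
A minimal sketch of the HTMLParser callback approach Dennis mentions,
assuming Python 3 (the original snippet uses urllib.request) and assuming
that "contents" means the text inside <p> tags. The names ParagraphText
and paragraph_text are made up for illustration; only html.parser.HTMLParser
and its handle_* callbacks come from the standard library.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False       # True while we are inside a <p> element
        self.paragraphs = []    # finished paragraphs, one string each
        self._chunks = []       # text pieces of the paragraph in progress

    def _flush(self):
        # Join the pieces and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Old HTML may omit </p>, so a new <p> ends the previous one.
            if self.in_p:
                self._flush()
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self._flush()
            self.in_p = False

    def handle_data(self, data):
        # Text nested in inline tags such as <b> also arrives here.
        if self.in_p:
            self._chunks.append(data)

    def close(self):
        super().close()
        # Keep a trailing paragraph that was never closed.
        if self.in_p:
            self._flush()
            self.in_p = False


def paragraph_text(html):
    """Return a list with the text of each <p> element in an HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

To use it with the urlopen() result, the bytes from f.read() have to be
decoded first, for example paragraph_text(s.decode("utf-8")); the
encoding is an assumption and depends on the page. Tags such as <img>
carry no text, so they contribute nothing under this definition of
"contents".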

--
You received this message because you are subscribed to the Google Groups "Django users" group.
