Saturday, October 2, 2010

Re: how to retrieve only the body text of a webpage

On 02-10-2010 at 15:10:33, jimgardener <jimgardener@gmail.com> wrote:

> hi
> I am writing an application to find out whether the body text of the
> web page at a given URL has been modified (text added or removed). Is
> there some way to find out the length of the body text alone?
> I wrote a function that reads the page and returns the data length. But
> is there some way to read only the body text? The idea is to ignore
> length changes caused by tracking ids etc. that come with the page and
> take into account only the body text.
>
> import urllib
>
> def get_page_data(url):
>     f = urllib.urlopen(url)  # Python 2; urllib.request.urlopen in Python 3
>     return len(f.read())
>
> Any help appreciated
> thanks
> jim
>

Check out BeautifulSoup (BS):
http://www.crummy.com/software/BeautifulSoup/
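A minimal sketch of the idea, assuming Python 3 and using the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages. It collects only the text inside `<body>`, skips the contents of `<script>` and `<style>` (where tracking code usually lives), and ignores tag attributes, so a tracking id that changes inside a URL or a script does not affect the result. The class and function names (`BodyTextExtractor`, `body_text`, `body_fingerprint`, `get_body_fingerprint`) are hypothetical, chosen for illustration. With BeautifulSoup 4 installed, the corresponding starting point is `BeautifulSoup(html, "html.parser").body.get_text()`.

```python
import hashlib
import urllib.request
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text inside <body>, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes (src, href, data-* ids, ...) are deliberately ignored.
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0:
            self.chunks.append(data)


def body_text(html):
    """Return the body text with runs of whitespace collapsed to one space."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def body_fingerprint(html):
    """Return (length, SHA-256 hex digest) of the body text."""
    text = body_text(html)
    return len(text), hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_body_fingerprint(url):
    """Hypothetical Python 3 counterpart of get_page_data(): fetch and fingerprint."""
    with urllib.request.urlopen(url) as f:
        charset = f.headers.get_content_charset() or "utf-8"
        return body_fingerprint(f.read().decode(charset, errors="replace"))
```

Storing the `(length, digest)` pair from each fetch and comparing it with the previous one tells you whether the body text changed; comparing lengths alone would miss an edit that happens to keep the text the same length.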

--
Linux user

--
You received this message because you are subscribed to the Google Groups "Django users" group.
