Web Scraper | Python
For the Golang version, please check here.
And here is a Python version:
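The rest of this walkthrough relies on a module that exposes getstats(url), where getstats(url)[2] is the list of feed items and each item's first element is its headline. Below is a minimal sketch of such a module, saved as, say, scraper.py (a hypothetical name). It assumes an RSS 2.0 feed and uses only the standard library. The (status, feed_title, items) tuple layout is a guess that simply fits those indices, and parse_feed is a hypothetical helper.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Parse RSS 2.0 text into (feed_title, items).

    items is a list of (title, link) tuples in the order the feed lists
    them, which for most news feeds means newest first.
    """
    root = ET.fromstring(xml_text)
    feed_title = (root.findtext("channel/title") or "").strip()
    items = [
        ((item.findtext("title") or "").strip(), (item.findtext("link") or "").strip())
        for item in root.iter("item")
    ]
    return feed_title, items


def getstats(url, timeout=10):
    """Fetch an RSS feed and return (status, feed_title, items).

    Assumed layout: getstats(url)[2] is the item list, so
    getstats(url)[2][0][0] is the title of the first (newest) item.
    """
    # The User-Agent value is an arbitrary placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "tiny-scraper"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        status = resp.status
        body = resp.read()
    feed_title, items = parse_feed(body)
    return status, feed_title, items
```

Splitting the parsing out of getstats keeps the network call separate, so parse_feed can be tried on a saved copy of the feed without hitting the internet.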
Now call the Python module you just created. Store the newest headline as title_init, and then poll the feed in a while loop:
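import time

from scraper import getstats  # hypothetical module name: the module you created above

url = "https://example.com/rss"  # placeholder: replace with your RSS feed URL

# Remember the newest headline seen so far.
title_init = getstats(url)[2][0][0]

i = 0
while True:
    i += 1
    print(i, "no new feeds yet, sleep 2 seconds")
    time.sleep(2)
    result = getstats(url)[2]  # list of items, newest first
    latest_title = result[0][0]
    if latest_title != title_init:
        print("new news:")
        title_init = latest_title  # start watching for the next new headline
        for r in result:
            print(r[0])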
Then you have a tiny web scraper that prints the headlines whenever the feed gets a new one.
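The loop above reprints the whole feed whenever the top headline changes. If you only want the headlines that are actually new, a small helper could stop at the last title already seen. This is a sketch, assuming the feed lists items newest first as (title, link) tuples; new_headlines is a hypothetical name.

```python
def new_headlines(items, last_seen_title):
    """Return the items listed before last_seen_title.

    Assumes items are (title, link) tuples ordered newest first.
    If last_seen_title has dropped out of the feed, every item is returned.
    """
    fresh = []
    for title, link in items:
        if title == last_seen_title:
            break
        fresh.append((title, link))
    return fresh
```

Inside the loop, you could print the titles from new_headlines(result, title_init) just before updating title_init. If the feed scrolls so fast that the old headline disappears between polls, the helper falls back to returning the whole list, which matches the original behaviour.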
Good! Now that we have the RSS feed and the article links, let’s grab each article’s content from the web using its HTML attribute. For that we will need BeautifulSoup.
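Before BeautifulSoup can parse anything, each article link has to be downloaded. A minimal standard-library sketch, assuming the links are plain HTTP(S) URLs. The User-Agent string is an arbitrary placeholder (some sites reject urllib's default one), and build_request and fetch_html are hypothetical helper names.

```python
import urllib.request

DEFAULT_UA = "Mozilla/5.0 (compatible; tiny-scraper)"  # placeholder value


def build_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode it to text.

    Uses the charset from the Content-Type header, falling back to UTF-8
    when the server does not send one.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, fetch_html(link) for each link from the feed items returns the raw page HTML, ready to hand to BeautifulSoup.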
Now let’s create a class that captures that text by parsing the HTML for an attribute such as {'class': 'StandardArticleBody_container'}.
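A minimal sketch of such a class, assuming BeautifulSoup 4 (installed with pip install beautifulsoup4) and Python's built-in "html.parser", and assuming the article text sits in <p> tags inside the matching element. ArticleScraper and parse are hypothetical names, and the default attribute is just the example above; real sites rename their CSS classes over time.

```python
from bs4 import BeautifulSoup


class ArticleScraper:
    """Extract article text from the element that matches an attribute filter."""

    def __init__(self, attrs=None):
        # Default mirrors the example attribute; treat it as a placeholder.
        self.attrs = attrs or {"class": "StandardArticleBody_container"}

    def parse(self, html):
        """Return the text of each <p> inside the matching element, one per line.

        Returns an empty string when no element matches the filter.
        """
        soup = BeautifulSoup(html, "html.parser")
        container = soup.find(attrs=self.attrs)  # any tag with these attributes
        if container is None:
            return ""
        paragraphs = (p.get_text(strip=True) for p in container.find_all("p"))
        return "\n".join(text for text in paragraphs if text)
```

Because "class" is a multi-valued attribute, BeautifulSoup matches the filter even when the element carries extra classes alongside StandardArticleBody_container. Passing a different attrs dict lets the same class target another site.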
Let’s hit the button and give it a go 😎😎😎!
