2021-03-31

Python web scraping: websites from google search result

A newbie to Python here. I want to extract info from multiple websites (e.g. 100+) from a google search page. I just want to extract the key info, e.g. those with <h1>, <h2> or <b> or <li> HTML tags etc. But I don't want to extract the entire paragraph <p>.

I know how to gather a list of website URLs from that google search; and I know how to web scrape individual website after looking at the page's HTML. I use the Request and BeautifulSoup for these tasks.

However, I want to know how can I extract key info from all these (100+ !) websites without having to look at their html one by one. Is there a way to automatically find out the HTML tags the website used to emphasize key messages? e.g. some websites may use <h1>, while some may use <b> , or something else...

All I can think of is to come up with a list of possible "emphasis-typed" HTML tags and then just use BeautifulSoup.find_all() to do a wide-scale extraction. But surely there must be an easier way?



from Recent Questions - Stack Overflow https://ift.tt/2PGeuIT
https://ift.tt/eA8V8J

No comments:

Post a Comment