Python web scraping: extracting key info from websites in a Google search result
I'm new to Python. I want to extract info from multiple websites (e.g. 100+) found on a Google search page. I only want to extract the key info, e.g. text inside <h1>, <h2>, <b> or <li> HTML tags, etc. I don't want to extract entire paragraphs (<p>).
I know how to gather a list of website URLs from that Google search, and I know how to scrape an individual website after looking at the page's HTML. I use Requests and BeautifulSoup for these tasks.
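For a single page, my Requests + BeautifulSoup flow looks roughly like this (a minimal sketch; the function names and the hard-coded tag list are just placeholders for illustration):

```python
import requests
from bs4 import BeautifulSoup

def parse_headings(html):
    """Return the text of every <h1> and <h2> in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(["h1", "h2"])]

def fetch_headings(url):
    """Download a page and extract its headings (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx
    return parse_headings(response.text)
```

This works fine when I already know which tags the site uses, which is exactly the problem with doing it for 100+ sites.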
However, I want to know how I can extract key info from all of these (100+!) websites without having to inspect their HTML one by one. Is there a way to automatically find out which HTML tags a website uses to emphasize key messages? E.g. some websites may use <h1>, while others may use <b>, or something else...
All I can think of is to compile a list of likely "emphasis" HTML tags and pass it to BeautifulSoup's find_all() for a wide-scale extraction. But surely there must be an easier way?
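To be concrete, the tag-list approach I have in mind looks like this (a sketch; find_all() accepts a list of tag names, and the particular tags in EMPHASIS_TAGS are my own guess at what counts as "emphasis"):

```python
from bs4 import BeautifulSoup

# My guess at a catch-all list of "emphasis-typed" tags (assumption, not exhaustive)
EMPHASIS_TAGS = ["h1", "h2", "h3", "b", "strong", "li"]

sample_html = """
<html><body>
  <h1>Main Title</h1>
  <p>A long paragraph we want to skip.</p>
  <ul><li>First point</li><li>Second point</li></ul>
  <b>Bold note</b>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# find_all() with a list matches any of the given tags, in document order
key_info = [(tag.name, tag.get_text(strip=True))
            for tag in soup.find_all(EMPHASIS_TAGS)]
# key_info → [('h1', 'Main Title'), ('li', 'First point'),
#             ('li', 'Second point'), ('b', 'Bold note')]
```

Running this over every fetched page would give a rough "key info" dump per site, but it treats every site identically, which is why I'm asking whether something smarter exists.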
from Recent Questions - Stack Overflow https://ift.tt/2PGeuIT