2023-02-14

Parsing HTML with BeautifulSoup: cannot extract the 'href' as a single string

I'm trying to parse the HTML to get the 'href' link. My code splits the 'href' value into separate strings, but I want each link as one complete string.

Here is my code:

import requests
from bs4 import BeautifulSoup

# user_agent is defined elsewhere (not shown)
data = requests.get("https://www.chewy.com/b/food_c332_p2",
                    auth = ('user', 'pass'),
                    headers = {'User-Agent': user_agent})

with open("dogfoodpage/dg2.html","w+") as f:
    f.write(data.text)

with open("dogfoodpage/dg2.html") as f:
    page = f.read()
    soup = BeautifulSoup(page,"html.parser")
     
test = soup.find('a',class_= "kib-product-title")

productlink = []

for items in test:
   for link in items.get("href"):
       productlink.append(link)

Here is my output:

[image: output of my code]

Here is the html structure for test:

[image: html for test]
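
A possible explanation, offered as a sketch rather than a confirmed diagnosis: `items.get("href")` returns a string, and looping over a string in Python yields one character at a time, so the inner `for` loop appends single characters instead of the whole link. In addition, `soup.find` returns only the first matching tag, while `soup.find_all` returns every match. The example below uses a small hypothetical HTML snippet (only the class name `kib-product-title` comes from the code above; the URLs and the helper name `extract_product_links` are made up for illustration) and appends each `href` as one complete string.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page. Only the class name is taken
# from the question; the URLs and link text are invented for this sketch.
SAMPLE_HTML = """
<div>
  <a class="kib-product-title" href="https://www.example.com/product-one/dp/1">Product one</a>
  <a class="kib-product-title" href="https://www.example.com/product-two/dp/2">Product two</a>
  <a class="other" href="https://www.example.com/ignored">Ignored</a>
</div>
"""


def extract_product_links(html):
    """Return the href of every <a class="kib-product-title"> as a whole string."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all returns all matching tags; find would return only the first one.
    for anchor in soup.find_all("a", class_="kib-product-title"):
        href = anchor.get("href")  # a single string, or None if the attribute is missing
        if href:
            # Append the whole string; do not loop over it, since iterating
            # a string yields individual characters.
            links.append(href)
    return links


product_links = extract_product_links(SAMPLE_HTML)
print(product_links)
```

With the real page, the same function could be called on the contents of `dogfoodpage/dg2.html`, assuming the product links there carry the `kib-product-title` class as in the original code.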


