Scraping data from https://ift.tt/Fq2rypk in Python

By Ritesh Sahu - February 25, 2023

I'm trying to follow the steps in this article to scrape data from the Transfermarkt website, but I'm not getting the desired output. Some of the class names seem to have changed since the article was written, so I've had to change

Players = pageSoup.find_all("a", {"class": "spielprofil_tooltip"}) to

Players = pageSoup.find_all("td", {"class": "hauptlink"})

from bs4 import BeautifulSoup
import requests
import pandas as pd

headers = {'User-Agent': 
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.5481.100 Safari/537.36'}

page = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik/top/plus/0/galerie/0?saison_id=2000"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

Players = pageSoup.find_all("td", {"class": "hauptlink"})
Values = pageSoup.find_all("td", {"class": "rechts hauptlink"})

PlayersList = []
ValuesList = []

for i in range(0,25):
    PlayersList.append(Players[i].text)
    ValuesList.append(Values[i].text)
    
df = pd.DataFrame({"Players":PlayersList,"Values":ValuesList})

df.head(10)

The problem is that this selector also matches other cells with the same class and adds them to the Players variable. For example, Players[0].text returns '\nLuís Figo ' and Players[1].text returns '\nReal Madrid', because team names use the same class as player names. How can I select only the first hauptlink element, or otherwise tell apart the ones I want when they share a class?
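
One possible way to tell the cells apart, as a rough sketch rather than a confirmed fix: walk the table row by row and take the player, team and value from within each row instead of from three page-wide lists. The HTML below is a hypothetical, simplified stand-in for the real page (whose actual markup may differ), the function name extract_rows is made up for illustration, and "Player B", "Club B", VALUE_1 and VALUE_2 are placeholders, not real data.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the transfer table. The real page's
# markup may differ (for example, it may nest tables inside cells).
# "Player B", "Club B", VALUE_1 and VALUE_2 are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Joined</th><th>Fee</th></tr>
  <tr>
    <td class="hauptlink">
Luís Figo </td>
    <td class="hauptlink">
Real Madrid</td>
    <td class="rechts hauptlink">VALUE_1</td>
  </tr>
  <tr>
    <td class="hauptlink">
Player B </td>
    <td class="hauptlink">
Club B</td>
    <td class="rechts hauptlink">VALUE_2</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return one dict per table row, pairing player, team and value by row."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        # class_="hauptlink" matches any <td> whose class list contains it,
        # including "rechts hauptlink", so split the two kinds of cell apart.
        cells = row.find_all("td", class_="hauptlink")
        names = [td for td in cells if "rechts" not in td.get("class", [])]
        values = [td for td in cells if "rechts" in td.get("class", [])]
        if len(names) < 2 or not values:
            continue  # header rows or rows with an unexpected layout
        records.append({
            "Players": names[0].get_text(strip=True),
            "Teams": names[1].get_text(strip=True),
            "Values": values[0].get_text(strip=True),
        })
    return records


records = extract_rows(SAMPLE_HTML)
for record in records:
    print(record)
```

If the live page follows a similar row layout, the same function could be called with pageTree.content, and pd.DataFrame(records) would build the table. Pairing values per row avoids relying on positions such as "every second match", and get_text(strip=True) removes the leading '\n' and trailing space seen in Players[0].text.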
