2020-12-29

Using Scrapy to iterate through Boxscore links on footballdb

I need to iterate through all the boxscore links with Scrapy and then extract the passing, rushing, and receiving tables from each boxscore to create a dataset. The main problem is that my code returns nothing when I run it.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class Nfl20Spider(CrawlSpider):
    name = 'nfl20'
    allowed_domains = ['www.footballdb.com']
    start_urls = ['http://www.footballdb.com/games']
    # fixed to iterate through all box scores
    rules = (
        Rule(LinkExtractor(restrict_xpaths='.//table/tbody/tr[1]/td[7]/a'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = {}
        # Table of stats.
        # Need to fix so that it only prints out the text and not the HTML elements.
        item['table'] = response.xpath('//table/tbody').extract_first()
        print(item['table'])
        yield item

I was able to get it to iterate and save to a file, but I wasn't able to limit the crawl to just the boxscore pages, and the output still contains the HTML tags. I need help cleaning it up so that it extracts only the text and follows only the boxscore links. Thanks for any help.



from Recent Questions - Stack Overflow https://ift.tt/3aPz3eE
