[SOLVED] Is scrapy.Spider or CrawlSpider a good fit for this task?


This Content is from Stack Overflow. Question asked by hareko

I am trying to scrape soccer players’ data using Python’s Scrapy package. The website I’m scraping has the format

https://www.example.com/players — I’ll refer to it as “Homepage”

Here, there is a list of the players in the league. To get from the start URL to the data I’m looking for, I have to click a player’s name, which takes me to that player’s overview page. To scrape the data for the second player, I have to go back to the Homepage and click the second player’s name, then go back to the Homepage again for the third player, and so on. How should I go about this? Should I use a basic Spider or a CrawlSpider? How do I tell Scrapy to go into a specific page (a player’s overview page) and then back out to the Homepage, where the list of all players is, so that I can move on to the next player and repeat the process? Thank you in advance!


Assuming the page isn’t rendered with JavaScript, Scrapy would be a great tool for this.

I would suggest reading the installation docs and the tutorial to get a general understanding of how it works, where to begin and how to start a new project.

Here is an example of what your spider could look like:

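import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    # The "Homepage" from the question, i.e. the page that lists all players.
    start_urls = ["https://www.example.com/players"]

    def parse(self, response):
        # "a.player" is a placeholder selector; replace it with one that
        # matches the player links on the real page.
        for href in response.css("a.player::attr(href)").getall():
            # response.follow resolves relative URLs against the current page.
            # Every request is scheduled from this one listing response, so
            # there is no need to navigate "back" to the Homepage.
            yield response.follow(href, callback=self.parse_player)

    def parse_player(self, response):
        # Scrape the player data into a dictionary and yield it as an item.
        # The "h1" selector is also a placeholder.
        yield {
            "name": response.css("h1::text").get(),
            "url": response.url,
        }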

Installation docs

Scrapy Tutorial

This question was asked on Stack Overflow by hareko and answered by Alexander. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
