[SOLVED] Oh no – Scrapy CSS selector used several times on a product detail page?


This Content is from Stack Overflow. Question asked by Legion Inc.

I am trying to scrape products (nothing surprising), but defining a CSS selector for the product description that works on every product page is giving me a headache.

I am looking for a selector that matches the product description on the following page:


The selector is:

#inner > div > div.col-lg-12-full.col-md-12-full > div:nth-child(1) > div:nth-child(12)

Alternatively, the selector can be:


But sometimes the selector changes:


Here is the selector:

#inner > div > div.col-lg-12-full.col-md-12-full > div:nth-child(1) > div:nth-child(11)

Alternatively, the selector can be:


When I look at the source code, the product description section is defined with


But that is too general: the same markup is used for other sections in the source code as well.

I can’t figure out how to solve this problem.

My spider runs correctly, but for some products I get empty descriptions (because of the issue described above).

def parse_product(self, response):
    # response.css("body") matches one element, so this loop yields one item per page.
    for product in response.css("body"):
        yield {
            "brand": product.css("div.pd_inforow:nth-of-type(4) span::text").extract(),
            "item_name": product.css("h1::text").extract(),
            # Positional selector: only matches when the description is the 12th child.
            "description": product.css("#inner > div > div.col-lg-12-full.col-md-12-full > div:nth-child(1) > div:nth-child(12)").extract_first(),
        }

Why doesn’t my CSS selector match the product description on every page?


Use an XPath selector (get the div with class pd_description that contains an h4 with the text Produktbeschreibung):
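A minimal sketch of such an expression, written against the standard library's ElementTree so that it runs on its own. The sample markup, the section headings other than Produktbeschreibung, the sample texts, and the `find_description` helper are assumptions for illustration, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: several sections share the class
# "pd_description", and the description sits at a different position per page.
PAGE_A = """
<html><body><div id="inner">
  <div class="pd_description"><h4>Versand</h4><p>Ships in 2-3 days.</p></div>
  <div class="pd_description"><h4>Produktbeschreibung</h4><p>Sturdy backpack, 20 l.</p></div>
</div></body></html>
"""
PAGE_B = """
<html><body><div id="inner">
  <div class="pd_description"><h4>Produktbeschreibung</h4><p>Steel water bottle.</p></div>
  <div class="pd_description"><h4>Versand</h4><p>Ships in 2-3 days.</p></div>
</div></body></html>
"""

# Match the div by class, then keep only the one whose h4 child has the
# wanted heading text. Position in the page no longer matters.
DESCRIPTION_XPATH = './/div[@class="pd_description"][h4="Produktbeschreibung"]'


def find_description(markup):
    """Return the description text, or None if the section is missing."""
    root = ET.fromstring(markup)
    node = root.find(DESCRIPTION_XPATH)
    if node is None:
        return None
    # Join the paragraph texts of the section, leaving out the heading.
    return " ".join(p.text.strip() for p in node.findall("p") if p.text)


description_a = find_description(PAGE_A)
description_b = find_description(PAGE_B)
print(description_a)
print(description_b)
```

Inside the spider, the same expression could presumably be passed to `response.xpath(DESCRIPTION_XPATH).extract_first()`. Note that `@class="pd_description"` only matches when that is the element's sole class; if the real markup carries additional classes, a `contains(@class, "pd_description")` test would be needed instead.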


This question was asked on Stack Overflow by Legion Inc. and answered by gangabass. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
