Author: Ashwin Joseph

Web Scraping Zara: Extracting Product Data using Python & Selenium



In the ever-evolving world of fashion, staying updated with the latest trends is not just a passion—it's a necessity for many. And when we speak of trendsetting, Zara inevitably enters the conversation. A Spanish multinational retail clothing chain, this globally recognized brand has consistently kept fashionistas on their toes, eagerly awaiting its next collection.


But what if there was a way to analyze these trends systematically, ensuring that we're not just catching up, but also forecasting the next big thing? This is precisely what web scraping helps to do.


Zara's website is a treasure trove of data that can help us understand evolving fashion trends, consumer preferences, and market dynamics. This kind of information is important for making smart decisions.


In this blog post, we will learn how to scrape Zara product data. The scraped data can then be used to look into customer preferences, popular product choices, and price ranges for a particular category in Zara Women: Jackets.



The Attributes

We'll be extracting the following attributes from Zara's product pages:

  • product_url: It is the unique address of a jacket on the Zara website.

  • product_name: It specifies the name and model of the jacket.

  • mrp: It is the selling price of the jacket.

  • color: It is the color of the jacket.

  • description: It is a short description of the jacket.
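
As a rough sketch of what one scraped row looks like, the five attributes can be grouped into a small record type. The `ProductRecord` class and the sample values below are hypothetical and only illustrate the shape of the data; they are not taken from Zara's site.

```python
from dataclasses import dataclass, astuple, fields


@dataclass
class ProductRecord:
    """One row of scraped data (hypothetical helper, not part of Selenium)."""
    product_url: str
    product_name: str
    mrp: str          # kept as text, exactly as it appears on the page
    color: str
    description: str


# Column headings derived from the field names, in declaration order
HEADINGS = [f.name for f in fields(ProductRecord)]

# Made-up sample values, for illustration only
sample = ProductRecord(
    product_url="https://example.com/jacket",
    product_name="SAMPLE JACKET",
    mrp="$ 0.00",
    color="Black",
    description="Placeholder description.",
)

# A plain list, ready to be passed to a csv writer's writerow()
row = list(astuple(sample))
```

Keeping the price as text avoids guessing at the currency format at scraping time; converting it to a number can be left to the analysis step.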

Step 1: Importing the Required Libraries

After identifying the attributes to be scraped, we need to import the required libraries. Here, we will use Selenium, a browser automation tool, to scrape the Zara website. The libraries to be imported are:

  • Selenium web driver is a tool used for web automation. It allows a user to automate web browser actions such as clicking a button, filling in fields, and navigating to different websites.

  • The By class from selenium.webdriver.common.by is used to locate elements on a web page using different strategies such as ID, class name, and XPath.

  • The writer function from the csv library is used to write tabular data in CSV format.

  • The sleep function from the time library is used to provide a pause or delay in the execution of a program for a specified number of seconds.

# Importing the required libraries
from selenium import webdriver
from time import sleep
from csv import writer
from selenium.webdriver.common.by import By

Step 2: Initialization Process

After importing the required libraries, we need to initialize a few things before we can start the actual scraping process. First, we create an instance of the Chrome web driver using the path to the ChromeDriver executable. The driver establishes a connection with the web browser, which here is Google Chrome. Once it is initialized, a Chrome window opens, and we load Zara's search results page with the get() function so that Selenium can interact with it. Finally, the window is maximized using the maximize_window() function.


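# Specify the full path to the ChromeDriver executable
chrome_driver_path = r"C:\Users\Dell\Downloads\chromedriver_win32\chromedriver.exe"

# Selenium 4.10+ no longer accepts executable_path; pass the path through a Service object
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(chrome_driver_path))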

driver.get('https://www.zara.com/us/en/search?searchTerm=women%20jackets&section=WOMAN')
driver.maximize_window()
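
One optional refinement, sketched here under assumptions, is to wrap the driver in a context manager so that quit() runs even if the scraping code raises an error partway through. The `managed_driver` helper and the `FakeDriver` stand-in are hypothetical names used only to show the pattern without opening a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield a driver created by make_driver() and always call quit() afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in that mimics get() and quit() so the pattern runs without a browser."""

    def __init__(self):
        self.closed = False
        self.visited = []

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


fake = FakeDriver()
with managed_driver(lambda: fake) as d:
    d.get("https://example.com")
```

With Selenium, the function passed to `managed_driver` would create the Chrome driver as in the snippet above, and the explicit quit() call at the end of the script would no longer be needed.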

Step 3: Getting the Product Links

Zara’s website loads its content dynamically: only a few products are present initially, and the rest are loaded as the page is scrolled. To scroll down the page, we first read the current height of the webpage and store it in a variable named ‘height’. Then, inside a loop, we scroll to the bottom of the page using a JavaScript command and pause for 5 seconds to allow new content to load. The script then reads the new height of the page and compares it to the previous height. If they match, no further content was loaded, which signifies that all products are on the page, and the loop ends. Otherwise, the new height becomes the value to compare against on the next pass.


# Scrolling the web page
height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    sleep(5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if height == new_height:
        break
    height = new_height
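
The same loop can be sketched as a reusable function with an upper limit on the number of scrolls, so it cannot run forever on a page that keeps growing. This is a minimal sketch, not part of the original script: `scroll_until_stable`, `max_rounds`, and the injected callables are hypothetical names, and the page is simulated so the example runs without a browser.

```python
from time import sleep


def scroll_until_stable(get_height, scroll_to_bottom, pause=5, max_rounds=50, wait=sleep):
    """Scroll until the page height stops changing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        wait(pause)                  # give new content time to load
        new_height = get_height()
        if new_height == height:     # nothing new was loaded
            return rounds
        height = new_height
    return max_rounds


# Simulated page: it grows on the first two scrolls and then stops growing
heights = iter([1000, 2000, 3000, 3000])
rounds = scroll_until_stable(
    get_height=lambda: next(heights),
    scroll_to_bottom=lambda: None,
    wait=lambda seconds: None,       # skip real pauses in this demo
)
```

With Selenium, `get_height` and `scroll_to_bottom` would wrap the two execute_script() calls shown above.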

After all the products have loaded, we create an empty list to store the product links. The product elements are located on the web page using XPath, and the find_elements() function returns them as a list. To get the actual product link from each element, we call its get_attribute() method to extract the ‘href’ attribute and append it to the list we created.


product_links = []

# Getting the product elements
page_product_links = driver.find_elements(By.XPATH, '//div[@class="product-grid-product__figure"]/a')

# Getting the product links
for product in page_product_links:
   product_link = product.get_attribute('href')
   product_links.append(product_link)
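
Before visiting the links, it can be worth dropping empty values and duplicates so no product page is scraped twice. Whether Zara's grid actually repeats products is an assumption; the `clean_links` helper and the URLs below are hypothetical and only illustrate the idea.

```python
def clean_links(raw_links):
    """Drop missing hrefs and duplicates while keeping first-seen order."""
    # dict.fromkeys keeps insertion order, so the first occurrence of each link wins
    return list(dict.fromkeys(link for link in raw_links if link))


raw = [
    "https://example.com/p/1",
    "https://example.com/p/2",
    None,                         # an element without an href
    "https://example.com/p/1",    # the same product listed twice
]
unique_links = clean_links(raw)
```

In the script above, this would be applied as `product_links = clean_links(product_links)` after the loop.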

Step 4: Defining Functions

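We will now define functions to extract each attribute. Each function tries to locate one element on the product page using XPath and returns its text. If the element cannot be found, the function returns "Not available" instead of stopping the script.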


# Extracting product name
def get_product_name():
    try:
        product_name = driver.find_element(By.XPATH, '//h1[@class="product-detail-info__header-name"]').text
    except Exception:
        product_name = "Not available"
    return product_name

# Extracting product mrp
def get_mrp():
    try:
        mrp = driver.find_element(By.XPATH, '//span[@class="money-amount__main"]').text
    except Exception:
        mrp = "Not available"
    return mrp

# Extracting product color
def get_color():
    try:
        color = driver.find_element(By.XPATH, '//p[@class="product-color-extended-name product-detail-info__color"]').text
    except Exception:
        color = "Not available"
    return color

# Extracting product description
def get_desc():
    try:
        desc = driver.find_element(By.XPATH, '//div[@class="expandable-text__inner-content"]/p').text
    except Exception:
        desc = "Not available"
    return desc
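
The four functions share the same try/except shape, so the pattern could be factored into one helper. The sketch below is one possible way to do that, not the approach used in this post: `text_or_default` is a hypothetical name, and a fake element stands in for a Selenium element so the example runs on its own.

```python
def text_or_default(find, default="Not available"):
    """Return find().text, or default if the lookup raises an exception."""
    try:
        return find().text
    except Exception:
        return default


class FakeElement:
    """Stand-in for a located element; only the .text attribute is needed."""
    text = "SAMPLE JACKET"


def missing():
    """Simulates a lookup that fails because the element is not on the page."""
    raise LookupError("no such element")


found = text_or_default(lambda: FakeElement())
fallback = text_or_default(missing)
```

With Selenium, the argument would be a lambda wrapping the find_element() call, for example `text_or_default(lambda: driver.find_element(By.XPATH, '//span[@class="money-amount__main"]'))`.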

Step 5: Writing to a CSV File

The extracted data needs to be stored so that it can be used later for purposes such as analysis. We will now see how to store it in a CSV file.


First, we will open a file named “women_jacket_data.csv” in write mode and create a writer object named theWriter. The column headings of the CSV file are defined as a list and then written to the file using the writerow() function.


Now we will extract the information about each product. For this, we iterate through each link in product_links, load the page with the driver's get() function, and call the functions defined earlier to extract the required attributes. The attribute values returned are first stored as a list and then written into the CSV file using the writerow() function. After the process is completed, the quit() command is called, which closes the web browser that the Selenium web driver opened.


Note that the sleep() function is called between the different function calls. The pause after get() gives the page time to load, and the pauses overall slow down the request rate to reduce the chance of getting blocked by the website.


# Writing to a CSV File
with open('women_jacket_data.csv','w',newline='', encoding='utf-8') as f:
    theWriter = writer(f)
    heading = ['product_url', 'product_name', 'mrp', 'color', 'description']
    theWriter.writerow(heading)
    for product in product_links:
        driver.get(product)
        sleep(5)
        product_name = get_product_name()
        sleep(3)
        mrp = get_mrp()
        sleep(3)
        color = get_color()
        sleep(3)
        desc = get_desc()
        sleep(3)
        record = [product, product_name, mrp, color, desc]
        theWriter.writerow(record)

driver.quit()
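
To check the output, the file can be read back with csv.DictReader, which uses the heading row as keys. The sketch below writes to an in-memory buffer instead of a file so that it runs on its own, and the two records are made up for illustration. It then lists the products whose name could not be extracted.

```python
import csv
import io

heading = ['product_url', 'product_name', 'mrp', 'color', 'description']

# Made-up records, for illustration only
records = [
    ["https://example.com/p/1", "SAMPLE JACKET", "$ 0.00", "Black", "Line one, with a comma"],
    ["https://example.com/p/2", "Not available", "Not available", "Not available", "Not available"],
]

# Write the rows the same way the script does, but into memory
buffer = io.StringIO(newline='')
the_writer = csv.writer(buffer)
the_writer.writerow(heading)
the_writer.writerows(records)

# Read the data back as dictionaries keyed by the heading row
buffer.seek(0)
rows = list(csv.DictReader(buffer))

# URLs of products whose name could not be extracted
missing_names = [r['product_url'] for r in rows if r['product_name'] == "Not available"]
```

For the real file, `buffer` would be replaced by `open('women_jacket_data.csv', newline='', encoding='utf-8')`. A long list of missing names is a hint that the XPath expressions no longer match the page.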

Final Thoughts

In the dynamic landscape of the fashion industry, understanding consumer preferences and emerging trends can provide a competitive edge to brands looking to build a strong footing in the industry. Through this guide, we've not only shown the technique of scraping Zara using Python and Selenium but also highlighted its potential to be adapted for various other product categories and e-commerce platforms.


While the techniques described above are excellent for data enthusiasts looking to perform small-scale extractions, larger projects demand dedicated solutions. That’s where Datahut web scraping services step in. At Datahut, we specialize in providing comprehensive web scraping services, assisting retailers in acquiring essential information seamlessly. By partnering with us, businesses can focus on interpretation and strategy, leaving the heavy lifting of data extraction to our experts.


Dive into the world of informed decision-making with Datahut and unlock the true potential of data in retail! Contact us today!

