Everybody loves to buy products on Amazon at the lowest possible price. I have a bucket list full of electronic gadgets that I am waiting to buy at the right price. Price wars between e-commerce marketplaces are forcing online retailers to change their prices frequently.
The smart thing would be to know the moment the price of an item drops and buy it immediately. But how do I know when the price of an item on my bucket list has dropped? There is commercial Amazon price tracker software available as Chrome extensions, but why pay when you can get price drop alerts for free?
It's time to give my programming skills a workout. The goal is to track the prices of the products on my bucket list programmatically, and if there is a price drop, have the link sent to me via SMS. Let's build ourselves an Amazon price tracker - a basic price tracking tool to experiment with.
The Process
In this blog, we will build a web scraper from scratch using Python to create a master file containing the product name, product price, and URL.
We will then build a second web scraper, also in Python, that checks the prices every hour and compares them against the master file to detect a price drop.
Sellers on Amazon automate their pricing, so we expect at least one of our bucket list items to have a price drop. The script will send me a price alert SMS if there is a significant drop (say, more than 10%).
How to build an Amazon web scraper in Python
We are going to start with the attributes we need to extract. To build the master list, we will use Python's requests, BeautifulSoup, and lxml libraries, and the csv library to write the data.
Attributes we will be scraping from Amazon.
We will scrape only two attributes from each Amazon page for the master list: the price and the product name. Note that the price is the sale price, not the listing price.
Importing the libraries
import requests
from bs4 import BeautifulSoup
from lxml import etree as et
import time
import random
import csv
Adding a header to the code
Websites, especially Amazon, hate web scrapers and bots that access their data programmatically. Amazon has a robust anti-scraping mechanism to detect and block web scrapers. The best way around this for our case is to send realistic headers.
Headers are a vital part of every HTTP request, as they provide essential metadata about the incoming request to the target website. We inspected the headers using Postman and defined ours as below.
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}
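To confirm the header gets us past the bot check, a quick sanity test helps. A minimal sketch - any Amazon product URL will do here:
# Quick sanity check: pass the header with every request
test_url = 'https://www.amazon.in/Garmin-010-02064-00-Instinct-Monitoring-Graphite/dp/B07HYX9P88/'
response = requests.get(test_url, headers=header)
print(response.status_code)  # 200 means Amazon served the page; a block is often a 503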
Building my bucket list.
The next step is to add the bucket list for processing. In my case, I have five items on my bucket list, and I have added them to the program as a Python list. You could instead keep the URLs in a text file and read them in with Python (see the sketch after the list below). A Python list is enough for tracking a small bucket list, but a file is the better choice if you have an extensive list.
We will be tracking only the price and product name from Amazon.
bucket_list = ['https://www.amazon.in/Garmin-010-02064-00-Instinct-Monitoring-Graphite/dp/B07HYX9P88/',
'https://www.amazon.in/Rockerz-370-Headphone-Bluetooth-Lightweight/dp/B0856HRTJG/',
'https://www.amazon.in/Logitech-MK215-Wireless-Keyboard-Mouse/dp/B012MQS060/',
'https://www.amazon.in/Logitech-G512-Mechanical-Keyboard-Black/dp/B07BVCSRXL/',
'https://www.amazon.in/BenQ-inch-Bezel-Monitor-Built/dp/B073NTCT4R/'
]
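If you prefer the file route mentioned above, here is a minimal sketch, assuming a plain-text file named bucket_list.txt (my choice of name) with one URL per line:
# Read the bucket list from a text file, skipping blank lines
with open('bucket_list.txt') as f:
    bucket_list = [line.strip() for line in f if line.strip()]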
Extracting Pricing and Product name from Amazon
We will define two functions: one that returns the price and one that returns the product name. We are using the BeautifulSoup and lxml libraries to extract the information, and we locate the elements on the page using XPath.
See the image below. Open Chrome developer tools and select the price. The price is available in a span with the class "a-offscreen". We write XPath expressions to locate the data and test them using the developer tools.
We need to extract the price data and compare it with the master data to see if there is a price drop. A few string manipulation techniques get the data into the desired form.
def get_amazon_price(dom):
    try:
        # The sale price lives in a span with the class "a-offscreen"
        price = dom.xpath('//span[@class="a-offscreen"]/text()')[0]
        # Strip the thousands separator, currency symbol, and decimals
        price = price.replace(',', '').replace('₹', '').replace('.00', '')
        return int(price)
    except Exception:
        # Price not available on the page
        return None
def get_product_name(dom):
    try:
        # The product title sits in a span with the id "productTitle"
        name = dom.xpath('//span[@id="productTitle"]/text()')
        # The title comes with surrounding whitespace, so strip it
        return name[0].strip()
    except Exception:
        # Product name not available on the page
        return None
Building the master file by writing the data
We use Python's csv module to write the scraped data to the master file. The code is shown below.
A few things to note:
The master file has three columns: product name, price, and product URL.
We iterate through the bucket list and parse the information from each URL.
We also add a random time delay, giving a helpful gap between each request.
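Here is a minimal sketch of the writer, reusing the header and the functions defined above. The column names product_name, price, and url are my choice; the tracker script below looks prices up by these names:
with open('master_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # The header row doubles as the DataFrame column names for the tracker script
    writer.writerow(['product_name', 'price', 'url'])
    for url in bucket_list:
        response = requests.get(url, headers=header)
        soup = BeautifulSoup(response.content, 'html.parser')
        dom = et.HTML(str(soup))
        writer.writerow([get_product_name(dom), get_amazon_price(dom), url])
        # Random delay so we don't hammer Amazon with back-to-back requests
        time.sleep(random.randint(2, 10))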
When you run the code snippets above, you'll see a csv file named master_data.csv generated. You need to run this program only once.
Building the Amazon price tracker tool
We now have the master data to compare fresh scrapes against. So let's begin writing the second script, which extracts data from Amazon and compares it with the data in the master file.
Importing the required libraries
For the tracker script, we need to import two additional libraries: pandas and Twilio.
import requests
from bs4 import BeautifulSoup
from lxml import etree as et
import pandas as pd
from twilio.rest import Client
import sys
Pandas
Pandas is an open-source Python library for data analysis and data manipulation. The package is known for a handy data structure called the pandas DataFrame, which lets Python developers work quickly with tabular data (like spreadsheets) inside a script. Pandas is a must-learn library if you're planning to build a career in data science.
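For our tracker, loading the master file into a DataFrame is a one-liner. A quick illustration, assuming the master_data.csv file generated above:
df = pd.read_csv('master_data.csv')
print(df.head())  # preview the product_name, price, and url columns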
Twilio
Twilio's APIs make it easy to send SMS notifications programmatically. We chose Twilio because it gives free trial credits, which are enough for our purposes.
Starting the data extraction
We will reuse many of the functions defined above. The one addition is a function that looks up the master-file price for the URL currently being scraped.
def get_master_price(url):
    for row in df.itertuples():
        if row.url == url:
            return row.price
    return None
We also define two lists for storing products with a price drop. We will be storing their URL and name.
price_drop_products = []
price_drop_list_url = []
Starting to check the price drops on Amazon
We will go through each page, get the current price, compare it against the master data, and check whether there is a price drop of more than 10%. If there is, we add the product to the lists defined above.
df = pd.read_csv('master_data.csv')  # load the master data once, before the loop

for product_url in bucket_list:
    response = requests.get(product_url, headers=header)
    soup = BeautifulSoup(response.content, 'html.parser')
    main_dom = et.HTML(str(soup))
    price = get_amazon_price(main_dom)
    product_name = get_product_name(main_dom)
    master_price = get_master_price(product_url)
    # Skip pages where the price or the master entry could not be found
    if price is None or master_price is None:
        continue
    if price < master_price:
        change_percentage = round((master_price - price) * 100 / master_price)
        if change_percentage > 10:
            print('There is a {}% drop in price for {}'.format(change_percentage, product_name))
            print('Click here to purchase: {}'.format(product_url))
            price_drop_products.append(product_name)
            price_drop_list_url.append(product_url)
If there is no price drop, we want the program to exit without invoking the Twilio API.
if len(price_drop_products) == 0:
    sys.exit('No Price drop found')
But if there is a change, we need to invoke the Twilio API and send a message. The first thing is to define a message body. This will be the content of our SMS.
message_body = "There is a drop in price for {} products. Click to purchase:".format(len(price_drop_products))
for item in price_drop_list_url:
    message_body = message_body + "\n" + item
This is what I wrote for my message body; you can use a different message - it is totally customizable. The next step is signing up for Twilio and getting the Account SID and Auth Token, which you'll find on the console dashboard when you sign up and log in.
account_sid = 'Add your Account SID here'
auth_token = 'Add your auth token here'
client = Client(account_sid, auth_token)
message = client.messages.create(
    from_='Your Twilio phone number',
    body=message_body,
    to='Your personal mobile number'
)
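Hardcoding credentials is fine for a quick experiment, but if you plan to share the script, consider reading them from environment variables instead. A small sketch - the variable names TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN are my own convention:
import os

# Set these in your shell before running the script
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']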
Automating the scraper to run every hour
Since I have a full-time job, manually running the program every hour is not something I'm cool with. What I want is to schedule the program to run every hour.
We can use the Python schedule library to do just that. Let us see the code snippet below.
import schedule
import time
from os import system

def job():
    # Run the tracker script as a separate process
    system("python3 Amazon_price_tracker.py")

schedule.every(1).hours.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
I can just run the script when I start working in the morning, and the scheduling module will run the Amazon price tracking program for me every hour.
Testing the program
Manually change the price values in the master data file and run the tracker program. You should see the SMS come in. If not, there is some debugging to do.
I manually changed the master file, and here is the SMS I received.
Download the source code
The code to extract master data
The code to check for prices and send SMS via Twilio.
The code for scheduling the price tracking software to run every hour
Conclusion
This is a good hobby project for those who are learning to program. However, if you have a lot of products to track on Amazon, this script might not work. At scale, Amazon data extraction requires IP rotation and a few other techniques to get the data. In that case, you need experts like Datahut to get the data for you - contact Datahut today.