Tony Paul

How to Build an Amazon Price Tracker using Python

Updated: Nov 6, 2023



Everybody loves to buy products on Amazon at the lowest price. I have a bucket list full of electronic gadgets that I am waiting to buy at the right price. Price wars between e-commerce marketplaces are forcing online retailers to change their prices frequently.


The smart move is to know when the price of an item drops and buy it immediately. But how do I know when the price of an item on my bucket list has dropped? Commercial Amazon price tracker software is available as Chrome extensions, but why pay when you can get price drop alerts for free?


It's time to give my programming skills a workout. The goal is to track the prices of the products on my bucket list with code: if a price drops, the link is sent to me via SMS. Let's build ourselves a basic Amazon price tracker to experiment with.



The Process

  1. In this blog, we will build a web scraper from scratch in Python to create a master file containing the product name, price, and URL.

  2. We will build a second Python web scraper that checks the prices every hour, compares them against the master file, and detects price drops.

  3. Sellers on Amazon use automated pricing, so we expect at least one of our bucket list items to drop in price. The script will send me a price alert SMS if there is a significant price drop (say, more than 10%).


How to build an Amazon web scraper in Python


We are going to start with the attributes we need to extract. To build the master list, we will use Python's requests, BeautifulSoup, and lxml libraries, and the csv library to write the data.


Attributes we will be scraping from Amazon.

We will scrape only two attributes from each Amazon page for the master list: the price and the product name. Note that the price is the sale price, not the list price.


Importing the libraries

import requests
from bs4 import BeautifulSoup
from lxml import etree as et
import time
import random
import csv

Adding headers to the request

Websites, especially Amazon, dislike web scrapers and bots that access their data programmatically. Amazon has strong anti-scraping mechanisms to detect and block web scrapers. For our small use case, the simplest way to get around this is to send browser-like headers.

Headers are a vital part of every HTTP request because they provide essential metadata about the request to the target website. We inspected the headers using Postman and defined ours as below.


header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}

Building my bucket list.

The next step is to add the bucket list for processing. I have five items on my bucket list, and I have added them to the program as a Python list. You could instead store them in a text file and read it with Python. A list is enough for tracking a small bucket list, but a file is the better choice if you have an extensive one.

We will be tracking only pricing and product name from Amazon.



bucket_list = ['https://www.amazon.in/Garmin-010-02064-00-Instinct-Monitoring-Graphite/dp/B07HYX9P88/',
               'https://www.amazon.in/Rockerz-370-Headphone-Bluetooth-Lightweight/dp/B0856HRTJG/',
               'https://www.amazon.in/Logitech-MK215-Wireless-Keyboard-Mouse/dp/B012MQS060/',
               'https://www.amazon.in/Logitech-G512-Mechanical-Keyboard-Black/dp/B07BVCSRXL/',
               'https://www.amazon.in/BenQ-inch-Bezel-Monitor-Built/dp/B073NTCT4R/'
               ]
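If you keep the bucket list in a text file instead, as suggested above, a minimal sketch could look like this. The file name bucket_list.txt, the helper name load_bucket_list, and the one-URL-per-line format with # comments are all assumptions for illustration, not part of the original script.

```python
from pathlib import Path


def load_bucket_list(path="bucket_list.txt"):
    """Read one product URL per line (hypothetical helper and file format)."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Ignore empty lines and lines commented out with '#'
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

With this in place, bucket_list = load_bucket_list() would replace the hard-coded list, and adding a product becomes a one-line edit to the text file.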

Extracting Pricing and Product name from Amazon

We will define two functions: one returns the price and the other returns the product name. We use Python's BeautifulSoup and lxml libraries to parse the page and XPath expressions to locate the elements on it.


See the image below. Open Chrome Developer Tools and select the price. The price sits in a span with the class "a-offscreen". We write XPaths to locate the data and test them in Chrome Developer Tools.


Taking the XPath for the script


We need to extract the price and compare it with the master data to see if there is a price drop, so we apply a few string manipulations to turn the price text into a number.


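def get_amazon_price(dom):
    """Return the sale price as an int, or None if it cannot be found or parsed."""
    try:
        # The sale price sits in the first <span class="a-offscreen"> on the page
        price = dom.xpath('//span[@class="a-offscreen"]/text()')[0]
        # Strip thousands separators, the rupee symbol and surrounding whitespace
        price = price.replace(',', '').replace('₹', '').strip()
        return int(float(price))
    except (IndexError, ValueError):
        return None


def get_product_name(dom):
    """Return the product title, or None if it is missing."""
    try:
        name = dom.xpath('//span[@id="productTitle"]/text()')
        # Amazon pads the title with whitespace and newlines, so strip it
        return name[0].strip()
    except IndexError:
        return None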
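The string clean-up inside get_amazon_price assumes the price text looks like ₹1,299.00. A slightly more tolerant variant is sketched below with a regular expression; parse_price is my own hypothetical helper, not part of the original script. It pulls out the first number regardless of the currency symbol or stray whitespace, assuming commas are thousands separators (it would misread European formats such as 1.299,00).

```python
import re


def parse_price(text):
    """Turn a price string such as '₹1,299.00' into an int (1299).

    Returns None when no number can be found.
    """
    if not text:
        return None
    # Drop thousands separators, then grab the first number (with optional decimals)
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        return None
    # Truncate paise/cents so prices stay whole numbers, like the master file
    return int(float(match.group()))
```

Inside get_amazon_price you could then return parse_price(price) in place of the replace() chain. Truncating rather than rounding is a design choice; it keeps comparisons against the master file consistent as long as both scripts use the same function.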

Building the master file by writing the data

We use Python's csv module to write the scraped data to the master file. The code is shown below.


A few points to note:

  1. The master file has three columns: product name, price, and product URL.

  2. We iterate through the bucket list and parse the information from each URL.

  3. We also add a random delay between requests so they are not sent in rapid succession.
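A minimal sketch of this step, following the three points above, might look like the following. The helper name build_master_file and the column names are assumptions: url and price match the row.url and row.price attributes the tracker uses later, while product_name is my guess. To keep the sketch testable without hitting Amazon, the scraping is passed in as a scrape callable that returns (name, price) for a URL.

```python
import csv
import random
import time


def build_master_file(urls, scrape, path="master_data.csv", min_delay=2.0, max_delay=5.0):
    """Scrape each URL and write one (product_name, price, url) row per product.

    `scrape` is any callable that takes a URL and returns (name, price).
    """
    rows = []
    for i, url in enumerate(urls):
        name, price = scrape(url)
        rows.append((name, price, url))
        # Random pause between requests (skipped after the last URL)
        if i < len(urls) - 1:
            time.sleep(random.uniform(min_delay, max_delay))

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row: column names assumed to match the tracker's row.url / row.price
        writer.writerow(["product_name", "price", "url"])
        writer.writerows(rows)
    return rows
```

In the real script, scrape(url) would call requests.get(url, headers=header), parse the response with BeautifulSoup and lxml as above, and return get_product_name(dom) and get_amazon_price(dom); you would then call build_master_file(bucket_list, scrape).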

When you run the code snippets above, a CSV file named master_data.csv is generated. You need to run this program only once.



Building the Amazon price tracker tool


We now have master data to compare fresh scrapes against, so let's write the second script, which extracts data from Amazon and compares it with the data in the master file.


Importing the required libraries

For the tracker script, we need to import two additional libraries: pandas and Twilio.


import requests
from bs4 import BeautifulSoup
from lxml import etree as et
import pandas as pd
from twilio.rest import Client
import sys

Pandas

Pandas is an open-source Python library for data analysis and manipulation, best known for its handy DataFrame data structure. Pandas lets Python developers work quickly with tabular data (like spreadsheets) within a Python script, and it is a must-learn library if you're planning to build a career in data science.


Twilio

Twilio's APIs make it easy to send SMS notifications programmatically. We chose Twilio because its free credits are enough for our needs.


Starting the data extraction


We will reuse the header and the functions defined above. The one new function looks up the master-file price for the URL being scraped.


def get_master_price(url):
    # Return the master-file price for this URL (df is the master DataFrame)
    for row in df.itertuples():
        if row.url == url:
            return row.price
    return None
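get_master_price scans the whole DataFrame on every call. With a handful of products that is fine; if the list grows, a dictionary keyed by URL gives a direct lookup. A stdlib-only sketch follows; load_master_prices is a hypothetical helper, and it assumes url and price columns, as the row.url and row.price attributes above imply.

```python
import csv


def load_master_prices(path="master_data.csv"):
    """Map each product URL to its master price (int), or None if blank."""
    prices = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # int(float(...)) also accepts prices written as e.g. '1299.0'
            prices[row["url"]] = int(float(row["price"])) if row["price"] else None
    return prices
```

Load it once with master_prices = load_master_prices(), then master_prices.get(product_url) returns the stored price, or None for a URL that is not in the file.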

We also define two lists to store the names and URLs of products whose price has dropped.


price_drop_products = []
price_drop_list_url = []

Checking for price drops on Amazon

We go through each page, get the current price, and compare it against the price in the master file. If the price has dropped by more than 10%, we add the product to the lists defined above.


# Load the master file once; get_master_price() looks prices up in this DataFrame
df = pd.read_csv('master_data.csv')

for product_url in df['url']:

    response = requests.get(product_url, headers=header)
    soup = BeautifulSoup(response.content, 'html.parser')
    main_dom = et.HTML(str(soup))

    price = get_amazon_price(main_dom)
    product_name = get_product_name(main_dom)
    master_price = get_master_price(product_url)

    # Skip products whose current or master price could not be read
    if price is None or pd.isna(master_price):
        continue

    if price < master_price:
        change_percentage = round((master_price - price) * 100 / master_price)

        # Only alert on significant drops (more than 10%)
        if change_percentage > 10:
            print(f'There is a {change_percentage}% drop in price for {product_name}')
            print(f'Click here to purchase {product_url}')
            price_drop_products.append(product_name)
            price_drop_list_url.append(product_url)
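The 10% rule boils down to one line of arithmetic. Pulling it into small helpers (hypothetical names price_drop_percent and is_significant_drop, not part of the original script) makes it easy to check in isolation. For example, a fall from ₹1,000 to ₹850 is a 15% drop and clears the threshold, while ₹1,000 to ₹900 is exactly 10% and does not, because the comparison is strictly greater than.

```python
def price_drop_percent(master_price, current_price):
    """Percentage drop from master_price to current_price, rounded to an int.

    Returns 0 when the price is unchanged or has gone up.
    """
    if current_price >= master_price:
        return 0
    return round((master_price - current_price) * 100 / master_price)


def is_significant_drop(master_price, current_price, threshold=10):
    """True when the price fell by more than `threshold` percent."""
    return price_drop_percent(master_price, current_price) > threshold
```

Making threshold a parameter lets you try a stricter or looser alert level without touching the loop.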

If there is no price drop, we want the program to exit without invoking the Twilio API.



if len(price_drop_products) == 0:
    sys.exit('No Price drop found')

But if there is a price drop, we invoke the Twilio API to send a message. The first step is to define the message body, which will be the content of our SMS.


messege = "There is a drop in price for {} products. Click to purchase:".format(len(price_drop_products))

# Append one product link per line
for items in price_drop_list_url:
    messege = messege + "\n" + items

This is the message body I wrote, but it is totally customizable. The next step is to sign up for Twilio and get your Account SID and Auth Token. After you sign up and log in to the console, this is what you will see:


Twilio integration with the Amazon price tracker

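# Credentials from the Twilio console; keep them out of version control
account_sid = 'Add your Account sid here'
auth_token = 'Add your auth token here'

client = Client(account_sid, auth_token)
# Phone numbers use E.164 format, e.g. '+919876543210'
message = client.messages.create(
    from_='Your twilio phone number',
    body=messege,
    to='Your personal mobile number'
)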

Automating the scraper to run every hour

Since I have a full-time job, running the program manually every hour is not something I'm cool with. Instead, I want to schedule it to run every hour automatically.


We can use the Python schedule library to do just that, as the code snippet below shows.


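import schedule
import time
from os import system

def job():
    # Launch the tracker script as a separate process
    system("python3 Amazon_price_tracker.py")


# The first run happens one hour after this script starts
schedule.every(1).hours.do(job)

# Keep the process alive, checking once a second for due jobs
while True:
    schedule.run_pending()
    time.sleep(1)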
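If you would rather not install the schedule library, a rough stdlib-only equivalent is sketched below. Unlike schedule.every(1).hours, which waits an hour before the first run, this version (hypothetical helpers run_tracker and run_every) runs the tracker immediately and then sleeps between runs. It uses subprocess with sys.executable instead of os.system, so it does not depend on a python3 command being on the PATH.

```python
import subprocess
import sys
import time


def run_tracker(script="Amazon_price_tracker.py"):
    """Run the tracker script with the same Python interpreter as this process."""
    # sys.executable avoids depending on a 'python3' command being on PATH
    result = subprocess.run([sys.executable, script])
    return result.returncode


def run_every(interval_seconds, job, iterations=None):
    """Call job() immediately, then every interval_seconds.

    iterations=None loops forever; pass a number to stop after that many runs.
    """
    count = 0
    while iterations is None or count < iterations:
        job()
        count += 1
        # Sleep only if another run is coming
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
    return count
```

Start it with run_every(3600, run_tracker). Because it sleeps after each run, the schedule drifts by however long the scrape takes, which is fine for an hourly price check. Note that the tracker's sys.exit('No Price drop found') exits with code 1, so a non-zero return code from run_tracker does not necessarily mean the script failed.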

I can just run the script when I start working in the morning, and the scheduling module will run the Amazon price tracking program for me every hour.


Testing the program

Manually raise the prices in the master data file (so the current Amazon prices look like a drop of more than 10%) and run the tracker program. You should see the SMS come in. If not, there is some debugging to do.
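Editing the CSV by hand works fine. If you prefer to script it, a small helper (hypothetical, inflate_master_prices) can raise every master price by 20%, which makes an unchanged Amazon price look like a drop of about 17%, above the 10% threshold. It assumes the master file has a header row with a price column, as the tracker's row.price lookup implies. It overwrites the file in place, so run it on a copy of master_data.csv or rebuild the master file afterwards.

```python
import csv


def inflate_master_prices(path="master_data.csv", factor=1.2):
    """Multiply every price in the master file by `factor` (default +20%).

    With inflated master prices, unchanged Amazon prices look like a
    drop of roughly 17%, so the tracker should send an SMS.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        # Leave blank prices (products whose price could not be scraped) alone
        if row["price"]:
            row["price"] = str(round(float(row["price"]) * factor))

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```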


I manually changed the master file, and here is the SMS I received.



SMS from Twilio



Download the source code

The code to extract master data: amazon_master_data_extractor.ipynb

The code to check for prices and send SMS via Twilio: Amazon_price_tracker.ipynb

The code for scheduling the price tracking software to run every hour: price_tracker_scheduler (1).ipynb

Conclusion

This is a good hobby project for those who are learning to program. However, if you have a lot of products to track on Amazon, this script might not work. At scale, Amazon data extraction requires IP rotation and a few other techniques. In that case, you need experts like Datahut to get the data for you; contact Datahut today.


