By Bhagyeshwari Chauhan

PIM and Web Scraping: How Bad Data Is Destroying Online Retail

Updated: Feb 3, 2021

According to B2X, 80% of companies are not confident in their data.

E-commerce inventory management, or e-commerce catalogue management, is not as simple as it seems. In theory, you just have to maintain a single catalogue of all your products. In practice, it is a cumbersome task for retailers, and it is costing them a fortune.

Good content is instrumental in driving traffic to a retail website and influencing purchase decisions. Wrong product information can therefore hurt retailers and marketplaces badly. Even the smallest errors and gaps in product data can disrupt retail operations or silently eat away at your market share.

Unlike online marketplaces, traditional retailers have a direct relationship with suppliers. The supplier provides data to the retailer in the form of a spreadsheet. Even so, in most cases even a reasonably tech-savvy internet retailer has ended up with wrong product information. Some studies show that over 75% of internet retailers have this problem.

Informatica, a leader in PIM, defines product information management (PIM) as the software-based orchestration of data dissemination related to a business’s products and its suppliers’ products. PIM systems keep sales, marketing, merchandising and other stakeholders on the same page when a supplier changes or updates product data.

What problems does a PIM system solve?

To understand the importance of a clean PIM system, you first need to understand what a PIM system is for. A PIM process has to deal with the following:

  1. A supplier usually sells to multiple retailers, and each retailer requires different content and a different format. The supplier typically keeps its product data in a single format, usually the one its biggest customer requires.

  2. The supplier updates product information on a regular basis. Unless there is a clean process in place, you have a lot of changes to make in different places.

  3. Supplier data sometimes needs to be enriched with extra attributes and information, and formatted, before it goes on the website.

  4. Retailers often have large teams who spend hours updating and publishing product data. If the team members are not all on the same page, the data can get messy. This can hurt both you and your customers.

  5. The supplier can give you the wrong information, so you need a way to verify it.
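To make the first and last points concrete, here is a minimal Python sketch of mapping differently shaped supplier spreadsheets onto one retailer schema and flagging gaps. Everything in it is an assumption made for illustration: the field names, the supplier names and the column mappings are hypothetical and do not come from any real PIM product.

```python
# Hypothetical retailer schema; the field names are illustrative only.
RETAILER_FIELDS = ("sku", "title", "color", "size")

# Hypothetical per-supplier mappings: supplier column name -> retailer field.
SUPPLIER_MAPPINGS = {
    "supplier_a": {"Item No": "sku", "Name": "title", "Colour": "color", "Size": "size"},
    "supplier_b": {"SKU": "sku", "Product Title": "title", "Color": "color", "Dimensions": "size"},
}


def normalize_row(supplier, row):
    """Rename a supplier's columns to retailer fields and trim whitespace."""
    mapping = SUPPLIER_MAPPINGS[supplier]
    normalized = {}
    for column, value in row.items():
        field = mapping.get(column)
        if field is not None:  # columns the retailer does not use are dropped
            normalized[field] = str(value).strip()
    return normalized


def missing_fields(record):
    """Return the retailer fields that are absent or empty in a record."""
    return [field for field in RETAILER_FIELDS if not record.get(field)]


# One spreadsheet row from the hypothetical "supplier_a", with no size given.
row = {"Item No": " A-100 ", "Name": "Cotton T-Shirt", "Colour": "Navy"}
record = normalize_row("supplier_a", row)
print(record)
print(missing_fields(record))
```

A mapping table per supplier keeps the format differences in one place, so onboarding a new supplier means adding a mapping rather than changing the code.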

How Datahut solved a PIM software problem

One of our customers came to us with a unique problem. They were seeing a lot of product returns because wrong product information was displayed on their website, so customers did not receive the products they thought they had ordered. The firm already had costly PIM software in place, so the problem lay with the supplier data rather than the software.

The information their suppliers provided was inaccurate, which in turn affected the accuracy of their inventory. As a result, their content team had to put in a lot of hours to fix it. The firm could not see the data discrepancies up front and mostly relied on product return information to locate the errors.

This affected the firm in several ways:

  1. Large number of product returns: The firm saw a staggering number of product returns. The main reason was the difference between the product specifications on the website and the product delivered to the customer. This mismatch arose from errors in the spreadsheets provided by suppliers.

  2. The content team was working overtime to fix things: As the firm onboarded new vendors, the errors accumulated, forcing the content team to work extra hours to fix the discrepancies. In one month there were around 1700 errors from the suppliers, and finding and fixing them was becoming a cumbersome task for the content team.

  3. Delay in new launches: New product launches were being postponed because of the existing content problem.

  4. High cart abandonment rate: The cart abandonment rate was high, and the firm was not able to find the root cause.

  5. An onslaught of negative reviews: The site received numerous negative reviews as customers became frustrated with data discrepancies that resulted in wrong purchases.

How Datahut solved the problem

The PIM process was managed by a large IT consulting firm, and we had to work with them to solve the issue. We set up data extractors and started extracting data from the websites in the same structures the suppliers used. That meant 43 different data structures and 43 extractors ready to deploy.

The suppliers delivered data files at different times and in different formats. Initially the files came by email; later the retailer asked suppliers to provide them in its preferred format. We had to start scraping as soon as a supplier provided its data via a Dropbox upload.

Figure: Datahut’s solution lifecycle

We then helped the client build a solution to compare the supplier files with the extracted data, which surfaced the differences between the two. With the discrepancies in hand, the retailer was able to ask the supplier specific questions, and the discrepancies were fixed quickly. Once the data issues were resolved, the data was pushed to the PIM, and the company's consulting partner took things from there.
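The comparison step can be pictured with a short Python sketch. This is not the actual solution built for the client; it is a minimal illustration that assumes both the supplier file and the extracted data have already been loaded as lists of dictionaries sharing a `sku` key. The function name, field names and sample records are all hypothetical.

```python
def _norm(value):
    """Normalize for comparison: None becomes empty, case and outer spaces are ignored."""
    return "" if value is None else str(value).strip().casefold()


def find_discrepancies(supplier_records, extracted_records, fields):
    """List field-level differences between supplier and extracted records, matched by SKU."""
    extracted_by_sku = {rec["sku"]: rec for rec in extracted_records}
    issues = []
    for supplier in supplier_records:
        sku = supplier["sku"]
        extracted = extracted_by_sku.get(sku)
        if extracted is None:
            # field=None marks a SKU that has no extracted counterpart at all
            issues.append({"sku": sku, "field": None, "supplier": None, "extracted": None})
            continue
        for field in fields:
            if _norm(supplier.get(field)) != _norm(extracted.get(field)):
                issues.append({
                    "sku": sku,
                    "field": field,
                    "supplier": supplier.get(field),
                    "extracted": extracted.get(field),
                })
    return issues


# Made-up sample data for illustration only.
supplier_file = [
    {"sku": "A-100", "color": "Navy", "size": "M"},
    {"sku": "A-101", "color": "Red", "size": "L"},
    {"sku": "A-102", "color": "Green", "size": "S"},
]
extracted_data = [
    {"sku": "A-100", "color": "navy ", "size": "M"},
    {"sku": "A-101", "color": "Red", "size": "XL"},
]

issues = find_discrepancies(supplier_file, extracted_data, ["color", "size"])
for issue in issues:
    print(issue)
```

Reporting each difference with the SKU, the field and both values is what lets a retailer put a specific question to a supplier instead of a general complaint about data quality.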

The time needed to find a problem dropped from days, and sometimes weeks, to less than an hour.


Within 30 days of implementing our solution, they were able to see significant results.

  1. The return rate was reduced significantly: Updated product information not only led to better sales but also resulted in fewer returns. Our client saw a significant drop in the number of returned products. In industries like fashion, where return rates exceed 20%, data errors can be catastrophic.

  2. The content team was able to reduce their manual effort: It is estimated that 25 minutes are spent on manual data synchronization per SKU per year. With PIM software coupled with a web scraping solution, you can get that under five minutes per SKU per year. That’s a lot of hours saved.

  3. Data accuracy was drastically improved: Reliable product information leads to increased sales and fewer headaches.

  4. Cart abandonment was significantly reduced: The client saw significantly fewer instances of cart abandonment.
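To put the per-SKU estimate from the second point in perspective, here is a quick back-of-the-envelope calculation in Python. The 25-minute and 5-minute figures are the ones quoted above; the catalogue size of 10,000 SKUs is a hypothetical number chosen only for illustration. Since the text says "under five minutes", treat the result as a lower bound.

```python
def hours_saved_per_year(sku_count, minutes_before=25, minutes_after=5):
    """Yearly hours saved when per-SKU sync time drops from minutes_before to minutes_after."""
    return sku_count * (minutes_before - minutes_after) / 60


# Hypothetical catalogue of 10,000 SKUs.
saved = hours_saved_per_year(10_000)
print(f"About {saved:.0f} hours saved per year")
```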

Retailers have a content problem that goes beyond their PIM software. It calls for data extraction via web scraping to fight retail data errors. Are you a retailer suffering from bad product data despite having a dedicated PIM system in place? Get in touch. We should talk.
