Datahut Blog
A blog for people & companies looking to make a big business impact with data acquired using web scraping and web crawling. Learn the best practices, business use cases, legality, and how you can do your job better with data.
Recommended Posts


Y Combinator 2025: How AI is Reshaping Startups and Markets
In 2025, over 72% of new startups in Y Combinator are powered by artificial intelligence, signaling a seismic shift in how technology is...
Aarathi J
Apr 9 · 6 min read


Why Every Amazon Seller Must Scrape Their Competitor’s Reviews
Monitoring your product’s reviews is incredibly useful for assessing customer satisfaction and identifying areas of improvement.
Ashmi Subair
Mar 11 · 11 min read


Scraping Decathlon using Playwright in Python
Decathlon is a renowned sporting goods retailer that offers a diverse range of products, including sports apparel, shoes and equipment....

Thasni M A
May 5, 2023 · 13 min read


How to Build an Amazon Price Tracker using Python
Everybody loves to get their products on Amazon at the lowest prices. I have a bucket list full of...

Tony Paul
Jul 22, 2022 · 8 min read


How Predictive Analytics is Transforming the Retail Industry
Remember that time when you were browsing through a blog online and that beautiful dress you had been coveting for months suddenly popped up in one of the corners of the screen? How could you possibly resist? You just had to purchase it! ‘What a coincidence!’ you would think. It’s actually smart business. Predictive analytics, the new holy grail in town, is a rather clever way to boost your business. It analyses the data obtained from buyer preferences and past purchasi

Tony Paul
Jun 5, 2017 · 3 min read


Scrape The Internet To Get Training Data For Your Machine Learning Model
Have you seen the latest season of the Silicon Valley series? In the series, Erlich Bachman asks Jian Yang to scrape the internet to train the classifier for his food app. He literally asked Jian to do it; however, Jian refused and his food discovery app idea was crushed. The moral of the story is, if you need scraped data to train your classifier, better contact someone like Datahut. Be smart and don’t be like Erlich Bachman. Machine learning is a member of the famil

Tony Paul
May 26, 2017 · 3 min read


Product Assortment For Retailers In Layman’s Terms
Product assortment is a topic that a lot of retailers don’t pay attention to. Here is a blog post that explains why it is relevant and how you can get it. The constant question that sticks in the mind of a retailer is “how to determine the product to offer for sale?” The idea of keeping huge stocks of every product doesn’t work anymore; one needs to be smart with such choices now. You should know what the consumer wants. What factors will give you a competitive advantage
Jezeel MK
May 1, 2017 · 4 min read


Boost Your Online Retail Marketing ROI Through Competitive Intelligence
Who doesn’t love to play detective? The thrill of sneaking around in enemy territory and getting what you need right under the enemy’s nose; there is a certain fun to it, isn’t there? Although it might be a little late to join the secret services, spying on your competitor might not be the worst idea in the world, especially if you are trying to one-up them and boost your sales. One clever way to do it is through competitive intelligence. Boosting ROI is on the top of the priori
Jezeel MK
Mar 21, 2017 · 3 min read


How To Automate The Recruiting Of Tech Talent
Recruiting tech talent is a very tough job. Automating a major portion of the recruiting process can save HR teams a ton of time and money. Have you ever pondered the work of a Human Resource Manager? One of the most time-consuming jobs an HR manager goes through is the lengthy task of selecting candidates. In this day and age, when the process of searching for a potential candidate for a job has become digital in its nature and form, it doesn’t necessarily

Tony Paul
Feb 3, 2017 · 2 min read


3 ways web scraping can help you make product launch successful
You might have seen the video of the iPhone product launch by Steve Jobs. It was one of the most successful launches ever. The iPhone 3G sold over a million units on its launch weekend and the iPhone 4 set a new record as 1.7 million units were sold within the first 3 days! Those certainly are some great numbers. There were a lot of mobile phones launched before and after the iPhone, but have you ever wondered why the iPhone was more successful than the rest? That is because Steve Jobs had

Tony Paul
Jan 26, 2017 · 2 min read


Free Web Scraping And Free Web Crawling Is A Myth My Friend- Period!
You must have come across free web scraping tools or browser plugins. Have you ever wondered how their makers manage to offer these tools for free when they have costs in developing, maintaining and running them? Pricing is one of the most common objections you’ll hear from a potential buyer. People are naturally attracted to products that are cheaper. This psychology is used by companies to trick people into buying products that are priced cheaper b
Jezeel MK
Jan 25, 2017 · 2 min read


How to improve your online distribution strategy using product data feeds
The internet has changed the way people buy and sell things. For brands, this is an opportunity and a problem at the same time. The biggest problem of any brand is getting their product in the hands of users. Websites like Amazon & eBay helped solve this major distribution problem. However, it created some other problems for the brands. Here are some of those problems and how to solve them. How do you track sellers online? Brands are selling their products in multiple marke
Jezeel MK
Dec 21, 2016 · 2 min read


Most Of The Genuine Apple Products Sold Online Are Counterfeits – Says Apple
Nearly 90% of so-called genuine Apple products sold online are fake. How do you identify counterfeits online using data and save the reputation of your brand? Apple says it has been buying Apple chargers and cables labeled as genuine on Amazon.com and has found most of them to be counterfeit. Apple filed a lawsuit against a New Jersey company on Monday over what Apple says are counterfeit chargers and cables for its products that were sold on Amazon. Some sources on the we

Tony Paul
Oct 27, 2016 · 2 min read


Beginner’s guide to Web Scraping with Python lxml
Web Scraping with Python is a popular subject among data science enthusiasts. Here is a piece of content aimed at beginners who want to learn Web Scraping with the Python lxml library. What is lxml? lxml is a feature-rich and easy-to-use library for processing XML and HTML in the Python programming language. lxml is a Pythonic binding for two C libraries, libxml2 and libxslt. lxml is un

Tony Paul
Sep 7, 2016 · 6 min read


Follow Amazon to grow your online retail business
Amazon, with a $358 billion market cap, is one of the most successful online retailers. The online retailing business has huge potential; however, it is also very competitive. A lot of companies came and vanished into thin air. Giants like Amazon and startups from the garage are fighting fiercely for the biggest piece of the market. A strategy powered by real and factual information about your competitors is the difference between your success and failure. Amazon is an example wi

Tony Paul
Aug 23, 2016 · 2 min read


How Xpath Plays Vital Role In Web Scraping
XPath is a language for finding information in structured documents like XML or HTML. You can say that XPath is (sort of) SQL for XML or HTML files. XPath is used to navigate through elements and attributes in an XML or HTML document. To understand XPath, we must be clear about elements and nodes, which are the building blocks of XML and HTML. Let’s talk about them. Here is an example element in an HTML document: <a class="hyperlink" href="http://www.google.com">google</a>

Tony Paul
Aug 8, 2016 · 3 min read
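
To make the XPath teaser above concrete, here is a minimal sketch of selecting that example anchor element by its class attribute. It is not taken from the post itself: it uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath, and the wrapping `<body>` element, the second link, and the helper name `find_links` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the example element (straight quotes, quoted href).
# The <body> wrapper and the second <a> are hypothetical, added so there is
# a small tree to navigate and something for the predicate to filter out.
DOC = (
    "<body>"
    '<a class="hyperlink" href="http://www.google.com">google</a>'
    '<a class="other" href="http://example.com">example</a>'
    "</body>"
)


def find_links(markup, css_class):
    """Return (href, text) pairs for <a> elements with the given class value."""
    root = ET.fromstring(markup)
    # ".//a" selects every <a> descendant of the root;
    # "[@class='...']" keeps only those whose class attribute equals the value.
    matches = root.findall(f".//a[@class='{css_class}']")
    return [(a.get("href"), a.text) for a in matches]


links = find_links(DOC, "hyperlink")
print(links)
```

The expression reads much like a query: a path to the elements, followed by a predicate on an attribute. Real-world HTML is often not well-formed XML, so a dedicated HTML parser with full XPath support is usually a better fit than this sketch.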


Web Scraping Helps Real Estate Portals To Stay Ahead Of The Competitors
Buying a home is one of the decisions of a lifetime and people don’t take it lightly. According to a survey conducted by SurveyMonkey, 21% of people made an offer on a house without ever having seen it in person. To grab the biggest slice of the pie, real estate brokers should understand what people really want. The real estate market is still growing and technology is disrupting this industry. Investors are pouring millions of dollars into real estate start

Tony Paul
Jan 20, 2016 · 2 min read


Market Research For Startups – A Data Driven Approach
When starting a technology company, understanding the needs of the market can be the difference between success and failure. The best way to understand whether you have a viable business idea is through market research. When it comes to market research for your product, most people trust their instinct rather than the data. This happens because you are excited about the product. Under stress and excitement, the hormones will force you to trust the instinct rather than the logic. G
Jezeel MK
Dec 16, 2015 · 3 min read


Datahut is rolling out today with an important mission
Datahut is on a mission to make web scraping painless, affordable and super comfortable. Our focus is primarily on non-technical customers or companies who do not know how to go about data extraction and processing. We will take care of all the technicalities involved and just deliver clean, structured and ready-to-use data. We want to democratize data and make it more accessible and useful to people. Any company, from big blue chip corporations to the tiniest start-up

Tony Paul
Jul 28, 2015 · 2 min read