Web Scraping with Pandas: How to Scrape Monkeypox Data
When extracting web data, there are cases where you need to pull tabular information from websites. The usual approach is to write a web scraper using a Python library such as BeautifulSoup, Selenium, or Scrapy.
However, there is an easy way to get tabular information out of web pages using only pandas, in under a minute and in five lines of Python code.
In our example, we will work with the monkeypox data available at https://www.monkeypox.global.health/. Conveniently, there is only one table on the page, containing the monkeypox infection data. Note, however, that this method works regardless of how many tables a page contains.
Step 1 - Install pandas
If you haven't already installed pandas, install it using the command below in your terminal. Under the hood, pandas' read_html needs an HTML parser such as lxml, so it is worth installing that too.
pip install pandas lxml
Step 2 - Let's start scraping
The code below scrapes data from the page into a CSV file. See the explanation below to understand how the code works.
import pandas as pd

url = 'https://www.monkeypox.global.health/'
df_list = pd.read_html(url)
monkeypox = df_list[0]
monkeypox.to_csv('monkeypox.csv')
In the first line, we import the pandas library. In the next line, we tell the scraper that the table we want to scrape lives at the URL 'https://www.monkeypox.global.health/'.
The following line is the most important. It tells pandas to use the read_html function to fetch all the tables on the web page. read_html() returns a list of DataFrames, one for each table found on the page.
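To see what read_html() returns without hitting the network, you can feed it a small HTML snippet. The table below is made up purely for illustration, not real monkeypox data:

```python
import io

import pandas as pd

# A toy HTML page with a single table (made-up data, for illustration only).
html = """
<html><body>
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>25</td></tr>
</table>
</body></html>
"""

# read_html parses every <table> it finds and returns a list of DataFrames.
df_list = pd.read_html(io.StringIO(html))
print(len(df_list))      # 1 -- one table on the page
print(df_list[0].shape)  # (2, 2) -- two data rows, two columns
```

Because the result is a list, a page with three tables would give you a list of three DataFrames, and you would pick the one you want by index.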
Since there is only one table in our case, the first element of the list contains the monkeypox data. We access it by index using the following line of code.
monkeypox = df_list[0]
The final step is to save the data as a CSV file; we use the to_csv function to write the monkeypox DataFrame to disk.
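Here is a quick, self-contained sketch of what to_csv does, using a made-up DataFrame in place of the scraped table. The index=False argument is an optional extra that stops pandas from writing its row index as the first column of the file:

```python
import pandas as pd

# Made-up data standing in for the scraped table.
monkeypox = pd.DataFrame({"Country": ["A", "B"], "Cases": [10, 25]})

# Write the DataFrame to a CSV file. By default pandas also writes the
# row index as an unnamed first column; index=False omits it.
monkeypox.to_csv("monkeypox_sample.csv", index=False)

# Read the file back to confirm the round trip.
print(pd.read_csv("monkeypox_sample.csv"))
```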
That's it. That is how we extract tabular data from a web page into a CSV with just five lines of code and under a minute.
In this guide, we've covered how to use Python and pandas to scrape monkeypox data from the Global Health website. We've shown you how to get started with web scraping, what tools to use, how to format your code and extract the right information.
Web scraping is a great way to save time and money in your business by automating tasks that would otherwise take hours or days to complete manually. With Datahut's expertise in web scraping at massive scales, we can help you get up and running with the best web scraping solution for your needs.
Contact Datahut for your web scraping needs today!