RSelenium Tutorial: A Tutorial to Basic Web Scraping With RSelenium

Scraping data from the web is a common way to collect data for analysis. It also lets us build a unique data set that may not have been analysed before. Often, packages such as rvest, scrapeR, or Rcrawler get the job done. However, sometimes we want to scrape dynamic web pages, where the content only appears after the browser runs JavaScript or after we interact with the page. For those pages we can use RSelenium. This RSelenium tutorial will introduce you to how web scraping works with the R package.

RSelenium automates a web browser and lets us scrape content that is dynamically altered, for example by JavaScript.

In this RSelenium tutorial, we will be going over two examples of how it can be used.

  • For example #1, we want to get latitude and longitude coordinates for some street addresses we have in our data set. To do that, we let RSelenium type in our addresses, click the Find button, and then scrape the latitude and longitude coordinates from the website.
  • For example #2, we are doing something similar with postal codes.

Let’s jump into our examples and this RSelenium tutorial!

Example #1

Step 1: Navigate to the URL

For the first example, we are going to visit https://www.latlong.net/.

[Screenshot: the latlong.net page with the Place Name box, the Find button, and the Latitude and Longitude boxes]

In the picture above, we can see the text box Place Name, where we are going to let RSelenium type in our street addresses. Afterwards, we have to let RSelenium click the Find button, and then we have to scrape the results that appear in the Latitude and Longitude boxes.

Step 2: Let RSelenium Type in the Necessary Fields

First, we have to load the library. Then we connect to the Chrome driver and navigate to the URL we want to scrape data from.

Now, we have to look at where the Place Name box is located in the html code.
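A minimal sketch of this step is shown here. It assumes that the RSelenium package and Google Chrome are installed, and it calls the package through the `RSelenium::` prefix, which has the same effect as loading the library first. The function name `start_session`, the object names, and the port number are our own choices for illustration.

```r
# Start a Chrome session through RSelenium and open the page we want to scrape.
# Assumption: RSelenium and Google Chrome are installed on this machine.
start_session <- function(url = "https://www.latlong.net/", port = 4567L) {
  # rsDriver() starts a Selenium server and opens a browser window
  driver <- RSelenium::rsDriver(browser = "chrome", port = port, verbose = FALSE)
  # The client is the object we use to control the browser
  remote_driver <- driver[["client"]]
  remote_driver$navigate(url)
  # Return both parts: the client drives the browser, the server is stopped at the end
  list(driver = driver, remote_driver = remote_driver)
}

# Usage (opens a real browser window, so run it interactively):
# session <- start_session()
# remote_driver <- session$remote_driver
```

When we are done scraping, the browser can be closed with `remote_driver$close()` and the server stopped with `session$driver$server$stop()`.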

[Screenshot: the html code of the Place Name box]

Looking at the html code, we can see that the box is located in the snippet above under the xpath @class = “width70”. We can use this xpath to navigate to that particular text box.
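Locating the text box could look roughly like this. The sketch assumes that `remote_driver` is an open RSelenium client and that the class attribute of the box is exactly `width70`. The function name `find_place_name_box` is hypothetical.

```r
# Locate the Place Name text box by its class attribute.
# Assumption: `remote_driver` is an open RSelenium client on the latlong.net page.
find_place_name_box <- function(remote_driver) {
  # "//*" matches any tag, so we do not have to know the tag name of the box
  remote_driver$findElement(using = "xpath", value = "//*[@class = 'width70']")
}

# Usage:
# address_element <- find_place_name_box(remote_driver)
```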

Now, we have to let RSelenium type in the address we want to get coordinates for.
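Typing into the box can be sketched as follows, assuming `address_element` is the text box element we located before. The function name `type_address` and the example address are made up for illustration.

```r
# Type an address into a text box element.
# Assumption: `address_element` is the web element of the Place Name box.
type_address <- function(address_element, address) {
  # Remove any text left over from a previous search
  address_element$clearElement()
  # sendKeysToElement() expects a list of the keys to send
  address_element$sendKeysToElement(list(address))
  invisible(address_element)
}

# Usage (the address is only an example):
# type_address(address_element, "Lombard Street, San Francisco")
```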

We are almost done. Now we have to press the Find button in order to get the coordinates.

[Screenshot: the html code of the Find button]

We use the xpath @class = “button” to locate the button.

After we have located the button, we have to click it.
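Both steps, locating the button and clicking it, can be sketched in one small function. It assumes that `remote_driver` is an open RSelenium client and that the class attribute of the button is exactly `button`. The function name `click_find_button` is hypothetical.

```r
# Locate the Find button by its class attribute and click it.
# Assumption: `remote_driver` is an open RSelenium client on the latlong.net page.
click_find_button <- function(remote_driver) {
  button_element <- remote_driver$findElement(using = "xpath", value = "//*[@class = 'button']")
  button_element$clickElement()
  invisible(button_element)
}

# Usage:
# click_find_button(remote_driver)
```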

Step 3: Scrape the Coordinates From the Website

When we scroll down, we see the coordinates like this:

[Screenshot: the latitude and longitude coordinates shown on the page]

They are located here in the html code:

[Screenshot: the html code that contains the coordinates]

We find them under the xpath @class = “coordinatetxt”.
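Reading the coordinates from the page could look like this. The sketch assumes that `remote_driver` is an open RSelenium client and that the Find button has already been clicked. The function name `read_coordinates_text` is hypothetical.

```r
# Read the text of the element that holds the coordinates.
# Assumption: the search has already run, so the element contains the result.
read_coordinates_text <- function(remote_driver) {
  out <- remote_driver$findElement(using = "xpath", value = "//*[@class = 'coordinatetxt']")
  # getElementText() returns a list of length one; keep the string inside it
  out$getElementText()[[1]]
}

# Usage:
# lat_long_text <- read_coordinates_text(remote_driver)
```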

When we have a lot of addresses we want to get coordinates for, we can repeat these steps for each address in a loop.
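One possible way to write such a loop is sketched here. It repeats the three steps (type, click, read) for every address and returns the scraped text, named by address. The function name `scrape_coordinates` and the two second pause are assumptions. The pause is a simple way to give the page time to update, and a slow connection may need a longer one.

```r
# Scrape the coordinates text for several addresses, one after the other.
# Assumption: `remote_driver` is an open RSelenium client on the latlong.net page.
scrape_coordinates <- function(remote_driver, addresses, wait_seconds = 2) {
  vapply(addresses, function(address) {
    # Step 1: type the address into the Place Name box
    box <- remote_driver$findElement(using = "xpath", value = "//*[@class = 'width70']")
    box$clearElement()
    box$sendKeysToElement(list(address))
    # Step 2: click the Find button
    button <- remote_driver$findElement(using = "xpath", value = "//*[@class = 'button']")
    button$clickElement()
    # Give the page time to show the new coordinates
    Sys.sleep(wait_seconds)
    # Step 3: read the coordinates text
    out <- remote_driver$findElement(using = "xpath", value = "//*[@class = 'coordinatetxt']")
    out$getElementText()[[1]]
  }, character(1))
}

# Usage (the addresses are only examples):
# street_names <- c("Lombard Street, San Francisco", "Abbey Road, London")
# lat_long_text <- scrape_coordinates(remote_driver, street_names)
```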

Afterwards, we can extract the latitude and longitude values from the scraped text.
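How the values are extracted depends on what the scraped text looks like. As a sketch, we assume here that the text contains the latitude first and the longitude second, for example in the form "(37.802139, -122.418739)". This format and the sample values are assumptions, so check them against your own output. The function name `parse_lat_long` is hypothetical.

```r
# Turn a scraped coordinates string into two numbers.
# Assumption: the first number in the text is the latitude, the second the longitude.
parse_lat_long <- function(coordinate_text) {
  # Pull every signed decimal number out of the text, in order of appearance
  matches <- gregexpr("-?[0-9]+(\\.[0-9]+)?", coordinate_text)
  numbers <- regmatches(coordinate_text, matches)[[1]]
  if (length(numbers) < 2) {
    stop("Expected at least two numbers in: ", coordinate_text)
  }
  c(latitude = as.numeric(numbers[1]), longitude = as.numeric(numbers[2]))
}

# Usage with a made-up sample string:
coordinates <- parse_lat_long("(37.802139, -122.418739)")
```

For several addresses, `t(sapply(lat_long_text, parse_lat_long))` gives a matrix with one row per address.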

Let’s jump to the next example of this RSelenium tutorial.

Example #2

Step 1: Navigate to the URL

As before, we first go to the website we want to scrape data from. In our second example, we will be using the URL https://www.canadapost.ca/cpo/mc/personal/postalcode/fpc.jsf#.

[Screenshot: the Canada Post postal code page with the address box and the search button]

Again, we can see the box where we have to enter our address and the search button we have to click after we have entered our address.

Step 2: Let RSelenium Type in the Necessary Fields

First, we have to navigate to the desired url.
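Navigating works the same way as in the first example. The sketch below assumes that `remote_driver` is an RSelenium client that is already open. The function name `open_postal_code_page` is hypothetical.

```r
# Open the Canada Post postal code page in the browser.
# Assumption: `remote_driver` is an RSelenium client that is already open.
open_postal_code_page <- function(remote_driver) {
  remote_driver$navigate("https://www.canadapost.ca/cpo/mc/personal/postalcode/fpc.jsf#")
  invisible(remote_driver)
}

# Usage:
# open_postal_code_page(remote_driver)
```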

Then, we have to tell RSelenium to put the desired address in the box. We do that by locating where the box lies in the html code.

[Screenshot: the html code of the address box, with its xpath underlined in green]

The xpath is underlined in green. With it, we can locate the box and type in our address.
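A sketch of this step follows. Because the xpath is only visible in the screenshot, the function takes it as an argument instead of hard-coding it. The function name `enter_address` and the argument names are hypothetical.

```r
# Locate a text box by its xpath and type an address into it.
# Assumption: `box_xpath` is the xpath of the address box, copied from the html code.
enter_address <- function(remote_driver, box_xpath, address) {
  box <- remote_driver$findElement(using = "xpath", value = box_xpath)
  box$sendKeysToElement(list(address))
  invisible(box)
}

# Usage (both values are placeholders that you have to fill in yourself):
# enter_address(remote_driver, address_box_xpath, my_address)
```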

Now, we have to locate the Search button in order to get the postal code for the address.

[Screenshot: the html code of the Search button, with its xpath underlined in green]

The xpath is underlined in green.

To click the Search button, we locate it by its xpath and then click it.
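Clicking the button can be sketched like this. As before, the xpath is passed in as an argument because it is only visible in the screenshot. The function name `click_element` is hypothetical.

```r
# Locate an element by its xpath and click it.
# Assumption: `button_xpath` is the xpath of the Search button, copied from the html code.
click_element <- function(remote_driver, button_xpath) {
  button <- remote_driver$findElement(using = "xpath", value = button_xpath)
  button$clickElement()
  invisible(button)
}

# Usage (the xpath is a placeholder that you have to fill in yourself):
# click_element(remote_driver, search_button_xpath)
```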

After that, we only have to extract the desired information and we are done!

Step 3: Scrape the Postal Code From the Website


In order to get the address, we have to read the text of the element that contains it.
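A sketch of reading the result is given here. The xpath of the result element is not stated in the text, so it is an argument that you have to look up in the html code yourself. The function name `read_element_text` is hypothetical.

```r
# Locate an element by its xpath and return its text as a single string.
# Assumption: `result_xpath` points to the element that shows the address.
read_element_text <- function(remote_driver, result_xpath) {
  result <- remote_driver$findElement(using = "xpath", value = result_xpath)
  # getElementText() returns a list of length one; keep the string inside it
  result$getElementText()[[1]]
}

# Usage (the xpath is a placeholder that you have to fill in yourself):
# address_text <- read_element_text(remote_driver, result_xpath)
```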


To get only the postal code, we can extract it from that text.
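One way to do this is a regular expression, sketched below. Canadian postal codes follow the pattern letter, digit, letter, space, digit, letter, digit, and the sketch assumes that the scraped text contains exactly one such code. The function name `extract_postal_code` and the sample address are made up for illustration.

```r
# Extract a Canadian postal code (for example "K1A 0B1") from a piece of text.
# Assumption: the text contains one postal code; the space in the middle is optional.
extract_postal_code <- function(address_text) {
  pattern <- "[A-Za-z][0-9][A-Za-z] ?[0-9][A-Za-z][0-9]"
  # regexpr() finds the first match; regmatches() returns it, or nothing if there is none
  found <- regmatches(address_text, regexpr(pattern, address_text))
  if (length(found) == 0) NA_character_ else toupper(found)
}

# Usage with a made-up address string:
postal_code <- extract_postal_code("123 EXAMPLE ST OTTAWA ON K1A 0B1")
```

The function returns `NA` when no postal code is found, which makes failed searches easy to spot afterwards.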

I hope you have enjoyed this short RSelenium tutorial about web scraping. If you have any questions or suggestions then let me know in the comments below.
