What is Data Scraping using selenium?

Selenium can be classified as the automation tool that facilitates scraping of information from the HTML web pages to perform web scraping utilizing google chrome.

In this tutorial, you will learn:

How to prepare Excel Macro before performing Data Scraping using Selenium?

There are certain prerequisites that has to be performed on the excel macro file before getting into the process of data scraping in excel.

These prerequisites are as follows: -

Step 1) Open an Excel-based Macro and access the developer option of excel.

Step 2) Select Visual Basic option under Developer ribbon.

Step 3) Insert a new module.

Step 4) Initialize a new subroutine and name it as test2.

Sub test2()
End sub

Following would be the results in the module: -

Step 5) Access the reference option under the tool tab and reference Selenium type library. The following libraries are to be referenced to the module as it helps in opening google chrome and facilitates the development of macro scripting.

Now the Excel file is ready to interact with the internet explorer. Next steps would be to incorporate a macro script that would facilitate data scraping in HTML.

How to Open Google Chrome using VBA?

Here, are step to open Google Chrome using VBA

Step 1) Declare and initialize the variables in the subroutine as displayed below

Sub test2()
Dim driver as new webdriver
Dim rowc, cc, columnC as integer

Step 2) To open google chrome using selenium and VBA, write driver.start "chrome" and press F5.

The following would be the code.

Sub test2()
Dim driver as new webdriver
Dim rowc, cc, columnC as integer
Driver.start "Chrome"
Application.Wait Now+Timevalue("00:00:20")
End sub

The module would result as follows: -

How to Open Website in Google chrome using VBA?

Once you are able to access the google chrome using VBA, the next step would be to incorporate the accessing of a website using VBA. This facilitated by get function wherein the URL has to pass as double quotes in the attribute.

Follow the following steps as displayed

The module would look as follows: -

Press F5 to execute the macro.

The following webpage would be opened in google chrome as displayed

Sub test2()
Dim driver as new webdriver
Dim rowc, cc, columnC as integer
Driver.start "Chrome"
Driver.get "http://demo.guru99.com/test/web-table-element.php"
Application.Wait Now+Timevalue("00:00:20")
End sub

Now the excel macro is ready with respect to performing the scraping tasks. The next step would display how the information can be extracted by applying selenium and VBA.

How to Scrape information from Website using VBA?

Suppose the day trader wants to access the data from the website on a daily basis. Each time the day trader presses the click the button, it should auto pull the market data into excel.

From the above website, it would be necessary to inspect an element and observe how the data is structured. Access the below source code of HTML by pressing control + Shift + I

<table class="datatable">
<thead>
<tr>
<th>Company</th>
<th>Group</th>
<th>Pre Close (Rs)</th>
<th>Current Price (Rs)</th>
<th>% Change</th>
</tr>

The source code would be as follows: -

As it can be seen that the data is structured as a single HTML Table. Therefore, in order to pull entire data from the HTML table, it would require designing of macro which pulls the header information of the HTML table and the corresponding data associated with the table. Perform the following tasks as displayed: -

Step 1) Formulate a for loop that runs through the HTML header information as a collection. The selenium driver has to find the header information of the HTML table. To do this, we utilize the FindElementByClass() and FindElementByTag() method to perform the task as displayed

The VBA module would look as follows: -

Sub test2()
Dim driver As New WebDriver
Dim rowc, cc, columnC As Integer
rowc = 2
Application.ScreenUpdating = False
driver.Start "chrome"
driver.Get "http://demo.guru99.com/test/web-table-element.php"
For Each th In driver.FindElementByClass("dataTable").FindElementByTag("thead").FindElementsByTag("tr")
cc = 1
For Each t In th.FindElementsByTag("th")
Sheet2.Cells(1, cc).Value = t.Text
cc = cc + 1
Next t
Next th

Step 2) Next, the selenium driver would locate the table data using the similar approach, as mentioned above. You have to write the following code: -

Sub test2()
Dim driver As New WebDriver
Dim rowc, cc, columnC As Integer
rowc = 2
Application.ScreenUpdating = False
driver.Start "chrome"
driver.Get"http://demo.guru99.com/test/web-table-element.php"
For Each th In driver.FindElementByClass("dataTable").FindElementByTag("thead").FindElementsByTag("tr")
cc = 1
For Each t In th.FindElementsByTag("th")
Sheet2.Cells(1, cc).Value = t.Text
cc = cc + 1
Next t
Next th
For Each tr In driver.FindElementByClass("dataTable").FindElementByTag("tbody").FindElementsByTag("tr")
columnC = 1
For Each td In tr.FindElementsByTag("td")
Sheet2.Cells(rowc, columnC).Value = td.Text
columnC = columnC + 1
Next td
rowc = rowc + 1
Next tr
Application.Wait Now + TimeValue("00:00:20")
End Sub

The vba module would look as follows: -

The excel can be initialized by means of the Range attribute of the excel sheet or through cells attribute of the excel sheet. To reduce the complexity of the VBA script, the collection data is initialized to the excel cells attribute of the sheet 2 present in the workbook. Further, the text attribute helps in getting the text information placed under HTML tag.

Sub test2()
Dim driver As New WebDriver
Dim rowc, cc, columnC As Integer
rowc = 2
Application.ScreenUpdating = False
driver.Start "chrome"
driver.Get"http://demo.guru99.com/test/web-table-element.php"
For Each th In driver.FindElementByClass("dataTable").FindElementByTag("thead").FindElementsByTag("tr")
cc = 1
For Each t In th.FindElementsByTag("th")
Sheet2.Cells(1, cc).Value = t.Text
cc = cc + 1
Next t
Next th
For Each tr In driver.FindElementByClass("dataTable").FindElementByTag("tbody").FindElementsByTag("tr")
columnC = 1
For Each td In tr.FindElementsByTag("td")
Sheet2.Cells(rowc, columnC).Value = td.Text
columnC = columnC + 1
Next td
rowc = rowc + 1
Next tr
Application.Wait Now + TimeValue("00:00:20")
End Sub

The vba module would look as follows: -

Step 3) Once the macro script is ready, pass and assign the subroutine to excel button and exit the module of VBA. Label the button as refresh or any suitable name that could be initialized to it. For this example, the button is initialized as refresh.

Step 4) Press the refresh button to get the below mentioned output

Step 5) Compare the results in excel with the results of google chrome

Summary:

  • Selenium can be classified as the automation tool that facilitates scraping of information from the HTML web pages to perform web scraping utilizing google chrome.
  • The scraping on the internet should be performed carefully.
  • It is normally against the terms of the website to scrape out information.
  • When scraping is done through selenium, then it offers multiple browser support.
  • In other words, the scraper can perform similar tasks of scraping through Firefox, internet explorer as well.

 

YOU MIGHT LIKE: