What is Data Scraping?

Data scraping is a technique for extracting the desired information from an HTML web page into a local file on your machine. Typically, the local file is an Excel workbook, a Word document, or a file from another Microsoft Office application. It helps channel critical information from the web page.

Data scraping is particularly useful when you work daily on a research-based project that depends entirely on the internet and websites. To illustrate, consider a day trader who runs an Excel macro that uses VBA to pull market information from a finance website into an Excel sheet.

How to Prepare an Excel Macro before Performing Data Scraping using Internet Explorer?

Certain prerequisites must be completed in the Excel macro file before starting the data scraping process in Excel.

These prerequisites are as follows:

Step 1) Open a macro-enabled Excel workbook and go to the Developer tab of Excel.

Step 2) Select the Visual Basic option on the Developer ribbon.

Step 3) Insert a new module.

Step 4) Initialize a new subroutine

Sub test()
End Sub

Step 5) In the Visual Basic editor, open Tools > References and tick Microsoft HTML Object Library and Microsoft Internet Controls.

These two libraries need to be referenced because they make it possible to open Internet Explorer from the module and to work with HTML documents in the macro script.

Now the Excel file is ready to interact with Internet Explorer. The next step is to add the macro script that scrapes data from the HTML page.

How to Open Internet Explorer using Excel VBA?

Step 1) Declare the variables in the subroutine as shown below

Sub test()
    ' Requires the references to Microsoft Internet Controls and Microsoft HTML Object Library
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
End Sub

Step 2) To open Internet Explorer using VBA, write ie.Visible = True and press F5.

Sub test()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    ' Show the browser window
    ie.Visible = True
End Sub

How to Open a Website in Internet Explorer using VBA?

Here are the steps to open a website in Internet Explorer using VBA.

Step 1) Once you can open Internet Explorer from Excel VBA, the next step is to open a website. This is done with the Navigate method, to which the URL is passed as a string in double quotes. Follow the steps shown below.

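Sub test()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    Dim ecoll As Object
    ie.Visible = True
    ' Open the page; the URL is passed as a string in double quotes
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    ' Wait until the page has finished loading
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
End Sub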

Step 2) Press F5 to execute the macro. The web page at the URL above opens in Internet Explorer.

Now the Excel macro is ready to perform the scraping functions. The next step shows how the information can be extracted from Internet Explorer using VBA.
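
The Do ... Loop in Step 1 keeps running until the page reports READYSTATE_COMPLETE, so it would wait indefinitely if the page never finishes loading. As a minimal sketch, the wait could be wrapped in a helper that gives up after a timeout. The function name WaitForPage and the 30-second default are assumptions made for illustration and are not part of the original macro. The helper relies on the Microsoft Internet Controls reference set up in the preparation steps.

```vba
' Hypothetical helper (name and default timeout are illustrative assumptions).
' Returns True if the page finished loading, False if the timeout elapsed first.
Function WaitForPage(ie As InternetExplorer, Optional timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer                      ' seconds elapsed since midnight

    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents                           ' keep Excel responsive while waiting
        If Timer - startTime > timeoutSeconds Then Exit Function   ' result stays False
    Loop

    WaitForPage = True
End Function
```

It could be called right after ie.navigate, for example as If WaitForPage(ie) Then Set doc = ie.document. Because Timer restarts at midnight, the timeout check would not fire for a run that spans midnight, although the loop would still end once the page loads.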

How to Scrape Information from a Website using VBA?

Suppose the day trader wants to access the data from the website on a daily basis. Each time the day trader clicks the button, the market data should be pulled into Excel automatically.

On the above website, inspect an element to see how the data is structured.

Step 1) Press Ctrl + Shift + I to view the HTML source code shown below

<table class="datatable">
<thead>
<tr>
<th>Company</th>
<th>Group</th>
<th>Pre Close (Rs)</th>
<th>Current Price (Rs)</th>
<th>% Change</th>
</tr>

As can be seen, the data is structured as a single HTML table. Therefore, to pull all of the data from the HTML table, the macro needs to gather it in the form of a collection.

The collection is then pasted into Excel. To achieve the desired result, perform the steps below:

Step 2) Initialize the HTML document in the subroutine

The VBA module would look as follows:

Sub test()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    Dim ecoll As Object
    ie.Visible = True
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    ' Point doc at the HTML document loaded in the browser
    Set doc = ie.document
End Sub

Step 3) Initialize the collection of table elements present in the HTML document

The VBA module would look as follows:

Sub test()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    Dim ecoll As Object
    ie.Visible = True
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Set doc = ie.document
    ' Collect every <table> element on the page
    Set ecoll = doc.getElementsByTagName("table")
End Sub
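
To confirm that the collection actually holds the table before anything is written to the sheet, a quick check along the following lines could be run. This is only a sketch: the subroutine name InspectTables is an assumption, and the output goes to the Immediate window of the Visual Basic editor (Ctrl + G).

```vba
Sub InspectTables()
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument
    Dim ecoll As Object

    ie.Visible = True
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Set doc = ie.document
    Set ecoll = doc.getElementsByTagName("table")

    ' Report how many tables were found and how many rows the first one has
    Debug.Print "Tables found: " & ecoll.Length
    If ecoll.Length > 0 Then
        Debug.Print "Rows in first table: " & ecoll.Item(0).Rows.Length
    End If

    ie.Quit                                ' close the browser window when done
End Sub
```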

Step 4) Populate the Excel sheet cells with the help of a nested loop

The VBA module would look as follows:

Sub test()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    Dim ecoll As Object

    ie.Visible = True
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE
    Set doc = ie.document
    Set ecoll = doc.getElementsByTagName("table")
    ' The nested loop over the table rows and cells goes here;
    ' it writes each cell value to Sheet1 of the workbook.
End Sub

The Excel sheet can be filled using either the Range property or the Cells property of the worksheet. To reduce the complexity of the VBA script, the collection data is written through the Cells property of Sheet1 in the workbook.
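
The nested loop itself is not listed in the steps above, so the following is a minimal sketch of how it might look rather than the original macro. It assumes that the first table on the page is the one of interest, that Sheet1 is the first worksheet of the workbook running the macro, and that writing should start at cell A1. The subroutine name ScrapeTable and the variables tbl, r and c are illustrative.

```vba
Sub ScrapeTable()
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument
    Dim ecoll As Object
    Dim tbl As Object
    Dim r As Long, c As Long

    ie.Visible = True
    ie.navigate "http://demo.guru99.com/test/web-table-element.php"
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Set doc = ie.document
    Set ecoll = doc.getElementsByTagName("table")

    ' Stop early if the page contains no table at all
    If ecoll.Length = 0 Then
        ie.Quit
        Exit Sub
    End If
    Set tbl = ecoll.Item(0)                ' assumption: the first table holds the data

    ' Outer loop walks the table rows, inner loop walks the cells of each row.
    ' HTML collections are 0-based; worksheet rows and columns are 1-based.
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1).Value = _
                tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r

    ie.Quit                                ' close the browser window when done
End Sub
```

Using Cells(row, column) keeps the loop indices and the sheet coordinates in step, which is why it tends to be simpler here than building Range addresses.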

Once the macro script is ready, assign the subroutine to an Excel button and exit the VBA editor. Label the button Refresh or give it any other suitable name. In this example, the button is labeled Refresh.
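
A button is normally drawn from Developer > Insert and linked through the Assign Macro dialog. As a rough sketch, it could also be created from code. The subroutine name AddRefreshButton and the position and size values are assumptions chosen for illustration. The macro name "test" matches the subroutine used in this tutorial.

```vba
Sub AddRefreshButton()
    Dim btn As Object

    ' Left, Top, Width and Height are given in points; these values are arbitrary
    Set btn = ThisWorkbook.Worksheets(1).Buttons.Add(300, 10, 80, 24)

    btn.Caption = "Refresh"                ' label shown on the button
    btn.OnAction = "test"                  ' subroutine to run when the button is clicked
End Sub
```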

Step 5) Press the Refresh button to pull the table data into the Excel sheet

Step 6) Compare the results in Excel with the table shown in Internet Explorer

Summary:

  • Data scraping allows the user to extract only the information that the user wants.
  • Scraping can be performed using Internet Explorer.
  • Scraping is slower with Internet Explorer; however, it delivers the desired results to the user.
  • Scraping should be performed with care, as it can harm or crash the system being used for scraping.

 
