
With Python, you can access and retrieve data from the internet in formats such as XML, HTML, and JSON, and then work with that data directly. In this tutorial, we will see how to retrieve data from the web. As an example, we use a Guru99 YouTube URL: we will open this URL with Python and print the HTML it returns.

In this tutorial we will learn

  • How to connect to Internet data
  • How to read HTML file for your URL in Python

How to connect to Internet data

Before we run the code to connect to Internet data, we need to import the URL library module, "urllib2". It is a Python 2 standard-library module that provides utilities for connecting to web addresses and retrieving data from them.
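Because "urllib2" exists only in Python 2, the import can be written so that it also works on Python 3, where the same urlopen function lives in urllib.request. The following is a minimal sketch of that idea, not part of the original tutorial:

```python
# Import urlopen under one name on either Python version.
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3: urllib2 was split up

# From here on, urlopen(url) can be called the same way on both versions.
```

With this import, later code calls urlopen(...) directly instead of urllib2.urlopen(...).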

Internet Access with Python Tutorial: Open, Parse & Read URL

  • Import urllib2
  • Define your main function
  • Declare the variable webUrl
  • Then call the urlopen function from the urllib2 library
  • The URL we are opening is the Guru99 page on YouTube
  • Next, we are going to print the result code
  • The result code is retrieved by calling the getcode function on the webUrl variable we created
  • We convert it to a string so that it can be concatenated with our string "result code: "
  • This will be the regular HTTP code "200", indicating that the HTTP request was processed successfully
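The steps above can be sketched for Python 3 as follows. This is only an illustration under stated assumptions: it assumes Python 3 (urllib.request instead of urllib2), and the function names get_result_code and describe_result are hypothetical helpers, not part of the original tutorial.

```python
import urllib.request


def get_result_code(url):
    """Open url and return the HTTP result code of the response."""
    # urlopen returns a response object; the with-block closes the connection.
    with urllib.request.urlopen(url) as web_url:
        # getcode() returns the HTTP status as an integer.
        return web_url.getcode()


def describe_result(code):
    """Build the printable message from the integer result code."""
    # The integer must be converted to a string before concatenation.
    return "result code: " + str(code)
```

A call such as print(describe_result(get_result_code("https://www.youtube.com/user/guru99com"))) mirrors the tutorial's steps. Note that urlopen raises urllib.error.HTTPError for error statuses such as 404, so get_result_code only returns for successful responses.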

How to read HTML file for your URL in Python

You can also read the HTML by calling the read function on the opened URL. When you run the code, the HTML will be printed to the console.


  • Call the read function on the webUrl variable
  • The read function returns the contents of the response
  • Read the entire content of the URL into a variable called data
  • Run the code; it will print the HTML data to the console
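The reading steps above can be sketched for Python 3 as shown below. This is a hedged illustration rather than the tutorial's own code: it assumes Python 3, where read() returns bytes, and it assumes the page is encoded as UTF-8. The function name read_html and its encoding parameter are hypothetical.

```python
import urllib.request


def read_html(url, encoding="utf-8"):
    """Return the body of url as text (encoding is an assumption)."""
    with urllib.request.urlopen(url) as web_url:
        # read() pulls the entire response body into memory as bytes.
        data = web_url.read()
    # Decode the bytes to text; undecodable bytes are replaced, not fatal.
    return data.decode(encoding, errors="replace")
```

Calling print(read_html("https://www.youtube.com/user/guru99com")) would then print the HTML to the console, as in the last step of the list.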

Here is the complete code

# Python 2 code: open a URL, print the result code, then print the HTML
import urllib2

def main():
   # open a connection to a URL using urllib2
   webUrl = urllib2.urlopen("https://www.youtube.com/user/guru99com")
   # get the result code and print it
   print "result code: " + str(webUrl.getcode())
   # read the data from the URL and print it
   data = webUrl.read()
   print data

# run main() only when this file is executed as a script
if __name__ == "__main__":
   main()