How to Find Broken Links in Selenium
⚡ Smart Summary
Finding broken links in Selenium WebDriver involves collecting every anchor tag, sending an HTTP HEAD request to each URL, and reading the response code. Links returning 400 or above are flagged as broken, while valid links return 2xx codes.

What are Broken Links?
Broken links are links or URLs that are not reachable. They may be down or not functioning due to some server error.
A valid URL responds with a 2xx HTTP status code. HTTP status codes are grouped into classes that serve different purposes; a request that fails returns a 4xx or 5xx status.
The 4xx class of status codes indicates client-side errors, and the 5xx class indicates server-side errors.
Without clicking a link or requesting its URL, we usually cannot tell whether it works.
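The status classes described above can be sketched as a small helper. This is a minimal illustration, not part of the tutorial's script: the class name `HttpStatusClass` and its method names are hypothetical, and the `>= 400` threshold simply mirrors the rule used later in this article.

```java
// Hypothetical helper: maps an HTTP status code to its class.
final class HttpStatusClass {
    private HttpStatusClass() {}

    // Returns a short label for the class the status code belongs to.
    static String describe(int code) {
        if (code >= 100 && code < 200) return "informational";
        if (code >= 200 && code < 300) return "success";
        if (code >= 300 && code < 400) return "redirect";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "unknown";
    }

    // A link is treated as broken when the status is 400 or above.
    static boolean isBroken(int code) {
        return code >= 400;
    }
}
```

For example, `HttpStatusClass.describe(404)` yields "client error" and `HttpStatusClass.isBroken(404)` is true, while a 301 is labeled "redirect" and is not counted as broken under this rule.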
Why should you check Broken links?
You should always make sure that there are no broken links on the site, because the user should not land on an error page.
Such errors occur when server rules (for example, redirect rules) are not updated correctly, or when the requested resource does not exist on the server.
Checking links manually is tedious, because each web page may contain a large number of links and the process has to be repeated for every page.
A Selenium script that automates the process is a better solution.
How to check the Broken Links and Images in Selenium
To check for broken links, follow these steps:
- Collect all the links in the web page based on the <a> tag.
- Send an HTTP request for the link and read the HTTP response code.
- Find out whether the link is valid or broken based on the HTTP response code.
- Repeat this for all the links captured.
Code to Find the Broken links on a webpage
Below is the WebDriver code which tests our use case:
package automationPractice;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class BrokenLinks {

    private static WebDriver driver = null;

    public static void main(String[] args) {
        String homePage = "http://www.zlti.com";
        String url = "";
        HttpURLConnection huc = null;
        int respCode = 200;

        driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get(homePage);

        // Collect every anchor element on the page
        List<WebElement> links = driver.findElements(By.tagName("a"));
        Iterator<WebElement> it = links.iterator();

        while (it.hasNext()) {
            url = it.next().getAttribute("href");
            System.out.println(url);

            // Skip anchors without a usable href
            if (url == null || url.isEmpty()) {
                System.out.println("URL is either not configured for anchor tag or it is empty");
                continue;
            }

            // Skip links that point outside the site under test
            if (!url.startsWith(homePage)) {
                System.out.println("URL belongs to another domain, skipping it.");
                continue;
            }

            try {
                // Send a HEAD request and read the response code
                huc = (HttpURLConnection) (new URL(url).openConnection());
                huc.setRequestMethod("HEAD");
                huc.connect();
                respCode = huc.getResponseCode();

                if (respCode >= 400) {
                    System.out.println(url + " is a broken link");
                } else {
                    System.out.println(url + " is a valid link");
                }
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        driver.quit();
    }
}
Explaining the code of Broken Links
Step 1: Import Packages
Import the following class in addition to the default imports:
import java.net.HttpURLConnection;
Using the methods of this class, we can send HTTP requests and capture the HTTP response code from the response.
Step 2: Collect all links in web page
Identify all links in a webpage and store them in a List.
List<WebElement> links = driver.findElements(By.tagName("a"));
Obtain an Iterator to traverse through the List.
Iterator<WebElement> it = links.iterator();
Step 3: Identifying and Validating URL
In this part, we will check if a URL belongs to a third party domain or whether the URL is empty/null.
Get the href of the anchor tag and store it in the url variable.
url = it.next().getAttribute("href");
Check whether the URL is null or empty, and skip the remaining steps if it is.
if (url == null || url.isEmpty()) {
    System.out.println("URL is either not configured for anchor tag or it is empty");
    continue;
}
Check whether the URL belongs to the main domain or to a third party. Skip the remaining steps if it belongs to a third-party domain.
if (!url.startsWith(homePage)) {
    System.out.println("URL belongs to another domain, skipping it.");
    continue;
}
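The prefix check above is simple, but it treats `http://` and `https://` versions of the same site as different, and it accepts any URL that merely begins with the same characters. A possible alternative, sketched below, compares host names instead. The class `SameSite` and its methods are hypothetical names introduced for illustration; only `java.net.URI` from the standard library is used.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: decides whether a link stays on the home page's host.
final class SameSite {
    private SameSite() {}

    // True when both URLs parse and share the same host (case-insensitive).
    static boolean isSameHost(String homePage, String href) {
        String homeHost = hostOf(homePage);
        String linkHost = hostOf(href);
        return homeHost != null && homeHost.equalsIgnoreCase(linkHost);
    }

    // Returns the host part, or null for unparsable or host-less URLs
    // such as mailto: and javascript: links.
    static String hostOf(String url) {
        if (url == null) {
            return null;
        }
        try {
            return new URI(url.trim()).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

With this sketch, `SameSite.isSameHost("http://www.zlti.com", "https://www.zlti.com/about")` is true, while a look-alike such as `http://www.zlti.com.evil.example/` (a made-up address) is rejected even though it passes a `startsWith` test.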
Step 4: Send HTTP request
The HttpURLConnection class has methods to send an HTTP request and capture the HTTP response code. So, the output of the openConnection() method (URLConnection) is type cast to HttpURLConnection.
huc = (HttpURLConnection)(new URL(url).openConnection());
We can set the Request type as “HEAD” instead of “GET”, so that only headers are returned and not the document body.
huc.setRequestMethod("HEAD");
On invoking the connect() method, the actual connection to the URL is established and the request is sent.
huc.connect();
Step 5: Validating Links
Using the getResponseCode() method, we can get the response code for the request.
respCode = huc.getResponseCode();
Based on the response code, we determine the link status.
if (respCode >= 400) {
    System.out.println(url + " is a broken link");
} else {
    System.out.println(url + " is a valid link");
}
Thus, we can obtain all links from a web page and print whether the links are valid or broken.
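Steps 4 and 5 can also be wrapped in a reusable method. The sketch below is one possible hardening rather than the tutorial's own code: it adds connect and read timeouts so a slow server cannot stall the run, retries with GET when a server answers HEAD with 405 (Method Not Allowed), and returns -1 when no HTTP response is obtained. The class name `LinkProbe`, the method names, and the -1 convention are assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper around HttpURLConnection.
final class LinkProbe {
    private LinkProbe() {}

    // Returns the HTTP status code, or -1 if the URL is malformed,
    // is not an HTTP(S) URL, or the request fails.
    static int fetchStatus(String url, int timeoutMs) {
        try {
            int code = request(url, "HEAD", timeoutMs);
            // Some servers do not allow HEAD; retry once with GET.
            if (code == HttpURLConnection.HTTP_BAD_METHOD) {
                code = request(url, "GET", timeoutMs);
            }
            return code;
        } catch (IOException e) {
            return -1;
        }
    }

    private static int request(String url, String method, int timeoutMs)
            throws IOException {
        URLConnection conn = new URL(url).openConnection();
        // Guard the cast: mailto: and similar URLs are not HTTP connections.
        if (!(conn instanceof HttpURLConnection)) {
            throw new IOException("Not an HTTP(S) URL: " + url);
        }
        HttpURLConnection huc = (HttpURLConnection) conn;
        huc.setRequestMethod(method);
        huc.setConnectTimeout(timeoutMs);
        huc.setReadTimeout(timeoutMs);
        try {
            return huc.getResponseCode();
        } finally {
            huc.disconnect();
        }
    }
}
```

Inside the loop, a call such as `int respCode = LinkProbe.fetchStatus(url, 5000);` could replace the try/catch block, with -1 reported as an unreachable link. The 5000 ms value is only an example.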
How to get ALL Links of a Web Page
One of the common procedures in web testing is to test whether all the links present within the page are working. This can be conveniently done using a combination of the Java for-each loop, findElements() and By.tagName("a").
The findElements() method returns a list of Web Elements with the tag a. Using a for-each loop, each element is accessed.
The WebDriver code below checks each link from the Mercury Tours homepage to determine those that are working and those that are still under construction.
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class P1 {

    public static void main(String[] args) {
        String baseUrl = "https://demo.guru99.com/test/newtours/";
        System.setProperty("webdriver.chrome.driver", "G:\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        String underConsTitle = "Under Construction: Mercury Tours";

        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get(baseUrl);

        List<WebElement> linkElements = driver.findElements(By.tagName("a"));
        String[] linkTexts = new String[linkElements.size()];
        int i = 0;

        // extract the link texts of each link element
        for (WebElement e : linkElements) {
            linkTexts[i] = e.getText();
            i++;
        }

        // test each link
        for (String t : linkTexts) {
            driver.findElement(By.linkText(t)).click();
            if (driver.getTitle().equals(underConsTitle)) {
                System.out.println("\"" + t + "\"" + " is under construction.");
            } else {
                System.out.println("\"" + t + "\"" + " is working.");
            }
            driver.navigate().back();
        }

        driver.quit();
    }
}
The output lists each link text in quotes, followed by whether the link is working or under construction.
- Accessing image links is done using the By.cssSelector() and By.xpath() methods.
Troubleshooting
In an isolated case, the first link accessed by the code could be the “Home” link. In that case, the driver.navigate().back() action shows a blank page, because opening the home page was the first action in the browser. The driver cannot find the remaining links on a blank page, so it throws an exception and the rest of the code does not execute. This can be handled with an if statement.
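One way to express such a guard is to filter the collected link texts before the click loop. The sketch below is an assumption about how the check might look, not the tutorial's own fix: the class `LinkTextFilter`, the method `clickable`, and the choice to skip a link by its visible text are all hypothetical. It also drops blank texts, which image-only anchors typically produce and which `By.linkText` cannot use to identify a link.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: prepares link texts for the click loop.
final class LinkTextFilter {
    private LinkTextFilter() {}

    // Keeps each non-blank link text once, in page order,
    // and leaves out the link whose text equals skipText.
    static List<String> clickable(List<String> linkTexts, String skipText) {
        Set<String> kept = new LinkedHashSet<>();
        for (String text : linkTexts) {
            if (text == null) {
                continue;
            }
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                continue; // no visible text to locate the link by
            }
            if (trimmed.equalsIgnoreCase(skipText)) {
                continue; // the guard: do not click this link
            }
            kept.add(trimmed);
        }
        return new ArrayList<>(kept);
    }
}
```

In the script above, the second loop could then iterate over `LinkTextFilter.clickable(Arrays.asList(linkTexts), "Home")` instead of `linkTexts`, so the "Home" link is never clicked and `driver.navigate().back()` always has a previous page to return to.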

