XML Tutorial for Beginners
What is XML?
XML stands for eXtensible Markup Language. It is a markup language (not a programming language) whose vocabulary can be extended with your own tags. It is derived from the Standard Generalized Markup Language (SGML). XML can use DTDs (Document Type Definitions) to define the structure of an XML document.
XML is not designed for computation or algorithms, so it is not a programming language. Its main goal is to transport data, not to display information. XML bridges the gap between human readability and machine readability. Unlike HTML tags, XML tags are self-descriptive.
XML is an open format. The filename extension of XML is .xml
History of XML
XML started way back in 1996 and was first published in 1998. World Wide Web Consortium (W3C) is the developer of XML, and it became a W3C recommendation in 1998.
There are two versions of XML.
- XML 1.0
- XML 1.1
XML 1.1 is the latest version. Yet, XML 1.0 is the most used version.
Editors of XML are:
- Tim Bray,
- Jean Paoli,
- C. M. Sperberg-McQueen,
- Eve Maler,
- François Yergeau.
XML Features
Here are some important features of XML:
- It is extensible and human-readable.
- It is platform and language independent.
- It preserves white space.
- Overall simplicity.
- Self-descriptive nature.
- It separates data from HTML.
- XML tags are not predefined. You need to define your customized tags.
- XML was designed to carry data, not to display that data.
- Mark-up code of XML is easy to understand for a human.
- Well-structured format is easy to read and write from programs.
- XML is a markup language like HTML, but unlike HTML it is extensible.
XML Encoding
Encoding is the conversion of Unicode characters to their binary representation. UTF is used for XML encoding. UTF stands for UCS Transformation Format, where UCS stands for Universal Character Set.
Mainly, there are two types of UTF encoding.
- UTF-8: uses 8-bit code units. A character takes one to four bytes.
Example:
<?xml version="1.0" encoding="UTF-8"?>
- UTF-16: uses 16-bit code units. A character takes two or four bytes.
Example:
<?xml version="1.0" encoding="UTF-16"?>
You specify the encoding inside the XML declaration. UTF-8 is the default encoding in XML.
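As a rough illustration of how the two encodings differ in size, the following Python sketch encodes the same text both ways. The sample string is an arbitrary choice made for this example, not something taken from the XML specification.

```python
# Compare how many bytes the same text needs in UTF-8 and UTF-16.
# The sample text is an arbitrary choice for illustration.
text = "XML é"

# UTF-8: one byte per ASCII character, two bytes for "é".
utf8_bytes = text.encode("utf-8")

# UTF-16 (little-endian, without a byte order mark): two bytes per character here.
utf16_bytes = text.encode("utf-16-le")

print(len(utf8_bytes))   # 6
print(len(utf16_bytes))  # 10
```

For mostly ASCII content, UTF-8 tends to produce the smaller file, which is one reason it is the usual choice.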
XML Syntax
The below code segment shows the basic XML syntax.
<?xml version = "1.0" encoding = "UTF-8" ?>
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Declaration
XML declaration consists of the XML version, character encoding or/and standalone status. The declaration is optional.
Syntax for XML Declaration
The below code segment shows the syntax for XML declaration.
<?xml version="version_number" encoding="character_encoding" standalone="yes_or_no" ?>
XML Declaration Rules
Following are XML declaration rules.
- If the XML declaration is present, it must be the first thing that appears.
- The XML declaration is case sensitive, and it must start with the lowercased <?xml.
- It has no closing tag.
Example of XML Declaration
Following code segment shows an example of an XML declaration.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
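The declaration rules can be tried out with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the `note` element is a made-up example, and the exact quoting style of the generated declaration is up to the library.

```python
import xml.etree.ElementTree as ET

# A document whose first line is an XML declaration parses normally.
with_declaration = b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?><note>hi</note>'
note = ET.fromstring(with_declaration)
print(note.text)  # hi

# If anything (even a space) comes before the declaration, the parser rejects it.
try:
    ET.fromstring(b' <?xml version="1.0" encoding="UTF-8" ?><note>hi</note>')
    misplaced_rejected = False
except ET.ParseError:
    misplaced_rejected = True
print(misplaced_rejected)  # True

# Serializing with xml_declaration=True (Python 3.8+) writes a declaration first.
declared = ET.tostring(note, encoding="UTF-8", xml_declaration=True)
print(declared.decode("utf-8"))
```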
XML Comments
Comments are optional. Adding comments helps readers understand the document content.
Syntax for XML Comments
A comment begins with <!-- and ends with -->.
Following code segment shows the syntax for XML comments.
<!-- Add your comment here -->
XML Tags and Elements
Tags work as pairs except for declarations. Every tag pair consists of an opening tag (also known as the start tag) and a closing tag (also known as the end tag).
Tag names are enclosed in <>. For a particular tag pair, the start and end tags must be identical except the end tag has / after the <.
<name>...</name>
Anything between the opening and closing tags is referred to as content.
Opening tag, content, and closing tag, altogether, is referred to as an element.
Opening tag + content + closing tag = an element
Note: Elements may also contain attributes. You will learn the attributes very soon.
Let us consider the below element.
<age>20</age>
In the above element,
- age is the name of the element.
- <age> – opening tag
- 20 – content
- </age> – closing tag.
Note: A tag name is also referred to as an element name.
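If there is no content between the tags, as shown below, the element is referred to as an empty element (or empty tag).
<result></result>
An empty element can also be written in the self-closing form <result/>.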
XML Tag and Element Rules
Following list shows XML tag and element rules.
- Tags are case sensitive.
Example:
Correct:
<age>20</age>
Wrong:
<age>20</Age>
Note: AGE, Age, and age are three different names in XML.
- All XML documents must contain a single root element.
- All elements must have a closing tag (except for declarations).
- A tag name must begin with a letter or an underscore. Names that start with the letters xml (in any combination of upper and lower case) are reserved.
- A tag name can contain letters, digits, hyphens, underscores, and periods. Hyphens, underscores, and periods are the only punctuation marks you should use; the colon is reserved for namespaces.
- A tag name cannot contain spaces.
- All elements must be nested properly.
Example:
Correct:
<b><u>This text is bold and underlined.</u></b>
Wrong:
<b><u>This text is bold and underlined.</b></u>
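The case-sensitivity and nesting rules above can be checked with any XML parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<age>20</age>"))            # True
print(is_well_formed("<age>20</Age>"))            # False: tag names differ in case
print(is_well_formed("<b><u>text</u></b>"))       # True: properly nested
print(is_well_formed("<b><u>text</b></u>"))       # False: improperly nested
print(is_well_formed("<a>1</a><b>2</b>"))         # False: more than one root element
```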
XML Attributes
An attribute of an element is placed after the tag name in the start tag. You can add more than one attribute to a single element, as long as the attribute names differ.
Let’s consider the below XML document.
<company name="ABC Holdings" location="London">
  <chairman>Mr. John</chairman>
  <gm>Mr. Wood</gm>
</company>
There are two attributes in the company element, i.e. name and location.
Let’s study the name attribute,
- name="ABC Holdings" – an attribute
- name – attribute name
- ABC Holdings – attribute value
Note: An attribute name is also known as an attribute.
Also, note that in the above example, company is the root element.
XML Attribute Rules
The below list shows XML attribute rules.
- Attribute values must be within quotes.
- An element cannot contain several attributes with the same name.
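To see how a program reads attributes, and what happens when the rules are broken, here is a small sketch using Python's standard `xml.etree.ElementTree` module with the company document from above.

```python
import xml.etree.ElementTree as ET

xml_text = """<company name="ABC Holdings" location="London">
  <chairman>Mr. John</chairman>
  <gm>Mr. Wood</gm>
</company>"""

company = ET.fromstring(xml_text)

# Attributes are exposed as a dictionary on the element.
print(company.attrib)           # {'name': 'ABC Holdings', 'location': 'London'}
print(company.get("location"))  # London

# Child elements are read differently: by tag name, then by text content.
print(company.findtext("chairman"))  # Mr. John

# Two attributes with the same name on one element are rejected by the parser.
try:
    ET.fromstring('<company name="A" name="B"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```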
Attribute versus Element
Are you still confused about the difference between an attribute and an element? Here is another example.
Let’s consider documents A and B given below.
Document A:
<teacher subject="English">
  <name>Mr. John</name>
  <qualification>Graduate</qualification>
</teacher>
Document B:
<teacher>
  <subject>English</subject>
  <name>Mr. John</name>
  <qualification>Graduate</qualification>
</teacher>
In document A, the subject is an attribute.
In document B, the subject is an element.
XML Entities
What are XML Entities?
In simple terms, entities are a way of representing special characters. Entities are also known as entity references.
Why You Need XML Entities?
Some characters (such as ", &, <, and so on) are reserved in XML. They are referred to as special characters and cannot be directly used for other purposes.
For example, the < and > symbols are used for tags. You cannot type them directly into your data as less-than and greater-than signs. Instead, you need to use entities.
Following table shows some of the popular XML entities.
Character | Description | Entity Name | Usage |
---|---|---|---|
" | Quotation mark (double quote) | quot | &quot; |
& | Ampersand | amp | &amp; |
' | Apostrophe (single quote) | apos | &apos; |
< | Less than sign | lt | &lt; |
> | Greater than sign | gt | &gt; |
Example:
<friend>
  <name>My friends are Alice &amp; Jane.</name>
</friend>
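In practice you rarely write entities by hand; most XML libraries can escape special characters for you. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "My friends are Alice & Jane."

# escape() replaces &, < and > with their entities.
escaped = escape(raw)
print(escaped)  # My friends are Alice &amp; Jane.

# Quotes are only escaped when you ask for it, which matters inside attribute values.
escaped_quotes = escape('He said "5 < 6"', {'"': "&quot;"})
print(escaped_quotes)  # He said &quot;5 &lt; 6&quot;

# A parser turns the entities back into the original characters.
name = ET.fromstring(f"<name>{escaped}</name>")
print(name.text)           # My friends are Alice & Jane.
print(unescape(escaped))   # My friends are Alice & Jane.
```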
HTML versus XML
Similarities between HTML and XML
Following list shows the similarities between HTML and XML.
- Both are open formats.
- Both are markup languages.
- Both use tags and attributes to describe the content.
Differences between HTML and XML
Even though XML is like HTML, XML is not a replacement for HTML. There are some significant differences between HTML and XML as well.
The following table shows a comparison between HTML and XML.
Aspect | HTML | XML |
---|---|---|
Stands for | Hypertext Markup Language | Extensible Markup Language |
Type of language | A predefined markup language. | A framework for specifying markup languages. |
Structural details | Not provided. | Provided. |
Purpose | Used to display data. | Used to transport data |
Driven by | Format driven. | Content-driven. |
Nature | Has a static nature. | Has a dynamic nature. |
Tag type | Predefined tags. | User-defined tags. |
Tag limit | A limited number of tags are available. | Tags are extensible. |
Closing tags | It is not necessary to use closing tags (but recommended to use closing tags). | Closing tags are mandatory. |
Namespace support | Not supported. | Supported. |
Case sensitivity | Tags are not case-sensitive. | Tags are case-sensitive. |
White space | White space is not preserved (runs of white space are collapsed). | White space is preserved (it cannot be ignored). |
Parsing in JavaScript | No extra application is needed. | Needs a DOM implementation. |
Code nesting | Not necessarily needed. | Needed. |
Errors | Can ignore small errors. | Errors are not allowed. |
Filename Extension | .html or .htm | .xml |
Size | Comparatively large. | Comparatively small. |
Quotes | Quotes are not required for attribute values. | Required for XML attribute values. |
Object support | Offers native object support. | Objects have to be expressed by conventions. |
Null support | Natively recognizes the null value. | Need to use xsi:nil on elements. |
Formatting decisions | Provides direct mapping for application data. | Require more significant effort. |
Learning curve | Less steep learning curve compared to XML. | Steep learning curve. |
Website | https://html.spec.whatwg.org/ | https://www.w3.org/TR/xml11/ |
Basic HTML Syntax
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
</head>
<body>
</body>
</html>
Basic XML Syntax
<?xml version = "1.0" encoding = "UTF-8" ?>
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
Same example with HTML and XML
With HTML
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>
</head>
<body>
  <p>Book</p>
  <p>Name: Anna Karenina</p>
  <p>Author: Leo Tolstoy</p>
  <p>Publisher: The Russian Messenger</p>
</body>
</html>
With XML
<?xml version = "1.0" encoding = "UTF-8" ?>
<book>
  <name>Anna Karenina</name>
  <author>Leo Tolstoy</author>
  <publisher>The Russian Messenger</publisher>
</book>
JSON versus XML
Similarities between JSON and XML
The below list shows the similarities between JSON and XML.
- Both are open formats.
- Both are self-describing.
- Both have a hierarchical structure.
- Both can be parsed and used by many programming languages.
Differences between JSON and XML
There are several differences between XML and JSON as well.
The below table shows a comparison between JSON and XML.
Aspect | JSON | XML |
---|---|---|
Stands for | JavaScript Object Notation | Extensible Markup Language |
Extended from | JavaScript | SGML |
Data storage | Data stored as key-value pairs. | Data stored as a tree structure. |
Namespaces | No support for namespaces. | Supports namespaces. |
Comments | Adding comments is not supported. | Can add comments. |
Data accessibility | Readily accessible as JSON objects. | Data need to be parsed. |
Metadata | Adding metadata is not supported. | Can write metadata. |
Types | JSON types: string, number, object, array, Boolean, null. | All XML data should be strings. |
Data type support | Supports text and number data types only. | Supports many data types (text, numbers, images, and so on). |
Array support | Better support for arrays than XML. | Little or no support for arrays. |
Object support | Native support for objects. | Objects have to be expressed by conventions. |
AJAX toolkit support | Supported. | Not fully supported. |
Retrieving values | Easy. | Difficult. |
Deserializing/serializing | Fully automated. | Developers have to write JavaScript code. |
Browser support | Supported by most browsers. | Cross-browser XML parsing can be tricky. |
Encoding | Only supports UTF-8 encoding. | Supports various encodings. |
Display capabilities | No display capabilities. | Offers display capabilities. |
Document size | Smaller than XML. | Larger than JSON. |
Filename Extension | .json | .xml |
Security | Less secure. | More secure than JSON. |
Easy to read | Relatively easy. | Relatively difficult. |
Learning curve | Easy to learn. | Steep learning curve. |
Website | https://www.json.org/json-en.html | https://www.w3.org/TR/xml11/ |
Basic JSON Syntax
{string:value, .......}
Same example with JSON and XML
With JSON
{"books":[
  {"name":"Anna Karenina", "author":"Leo Tolstoy"},
  {"name":"One Hundred Years of Solitude", "author":"Gabriel Garcia Marquez"},
  {"name":"The Great Gatsby", "author":"Scott Fitzgerald"},
  {"name":"Invisible Man", "author":"Ralph Ellison"}
]}
With XML
<?xml version = "1.0" encoding = "UTF-8" ?>
<books>
  <book>
    <name>Anna Karenina</name>
    <author>Leo Tolstoy</author>
  </book>
  <book>
    <name>One Hundred Years of Solitude</name>
    <author>Gabriel Garcia Marquez</author>
  </book>
  <book>
    <name>The Great Gatsby</name>
    <author>Scott Fitzgerald</author>
  </book>
  <book>
    <name>Invisible Man</name>
    <author>Ralph Ellison</author>
  </book>
</books>
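To compare how the two formats are read by a program, the sketch below loads a shortened version of the book list from both JSON and XML using only Python's standard library. It is an illustration under simple assumptions (every book has only text children), not a general-purpose converter.

```python
import json
import xml.etree.ElementTree as ET

json_text = """{"books": [
  {"name": "Anna Karenina", "author": "Leo Tolstoy"},
  {"name": "Invisible Man", "author": "Ralph Ellison"}
]}"""

xml_text = """<books>
  <book><name>Anna Karenina</name><author>Leo Tolstoy</author></book>
  <book><name>Invisible Man</name><author>Ralph Ellison</author></book>
</books>"""

# JSON maps directly onto lists and dictionaries.
json_books = json.loads(json_text)["books"]

# XML gives a tree of elements; we build the dictionaries ourselves.
xml_books = [
    {child.tag: child.text for child in book}
    for book in ET.fromstring(xml_text).findall("book")
]

print(json_books == xml_books)  # True
print(xml_books[1]["author"])   # Ralph Ellison
```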
XML DTD
What is DTD?
DTD stands for Document Type Definition. It defines the structure of an XML document by declaring its legal elements and attributes. A DTD is optional.
DTD Rules
Following list shows DTD rules.
- If DTD is present, it must appear at the start of the document (only the XML declaration can appear above the DTD).
- The element declaration must start with an ! mark.
- The DTD name and element type of the root element must be the same.
Examples of DTD
Example of an internal DTD:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE student [
  <!ELEMENT student (firstname,lastname,school)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT school (#PCDATA)>
]>
<student>
  <firstname>Mark</firstname>
  <lastname>Wood</lastname>
  <school>Hills College</school>
</student>
In the above example,
- !DOCTYPE student indicates the beginning of the DTD declaration, and student is the root element of the XML document.
- !ELEMENT student indicates the student element must contain firstname, lastname and school elements.
- !ELEMENT firstname indicates the firstname element is of type #PCDATA (Parsed Character Data).
- !ELEMENT lastname indicates the lastname element is of type #PCDATA.
- !ELEMENT school indicates the school element is of type #PCDATA.
Example of an external DTD:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE student SYSTEM "student.dtd">
<student>
  <firstname>Mark</firstname>
  <lastname>Wood</lastname>
  <school>Hills College</school>
</student>
The content of the DTD file (student.dtd) is as follows.
<!ELEMENT student (firstname,lastname,school)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT school (#PCDATA)>
XML DOM
What is DOM?
DOM stands for Document Object Model. It defines a standard manner of accessing and manipulating XML documents. DOM has a (hierarchical) tree structure.
Example of DOM
Let’s consider the below XML document.
<?xml version="1.0" encoding="UTF-8" ?>
<school>
  <student>
    <name>
      <first_name>Alex</first_name>
      <last_name>Clarke</last_name>
    </name>
    <age>14</age>
    <address>No. 35, Flower Road, Leeds</address>
  </student>
</school>
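The tree structure of the above XML file looks like the following:
school
└── student
    ├── name
    │   ├── first_name: Alex
    │   └── last_name: Clarke
    ├── age: 14
    └── address: No. 35, Flower Road, Leeds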
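The same tree can be walked programmatically. The following sketch uses Python's standard `xml.dom.minidom` module on the school document; the helper name `element_paths` is made up for this example.

```python
from xml.dom.minidom import parseString

xml_text = """<school>
  <student>
    <name>
      <first_name>Alex</first_name>
      <last_name>Clarke</last_name>
    </name>
    <age>14</age>
    <address>No. 35, Flower Road, Leeds</address>
  </student>
</school>"""


def element_paths(node, prefix=""):
    """Collect the path of every element below the given DOM node, in document order."""
    paths = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            path = f"{prefix}/{child.tagName}"
            paths.append(path)
            paths.extend(element_paths(child, path))
    return paths


doc = parseString(xml_text)
paths = element_paths(doc)
for path in paths:
    print(path)

# DOM nodes can also be changed in memory, then serialized again.
age = doc.getElementsByTagName("age")[0]
age.firstChild.data = "15"
print(age.toxml())  # <age>15</age>
```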
XML Validation
What are Well-formed XML Documents?
Well-formed XML documents are XML documents with correct syntax.
What are Valid XML Documents?
Valid XML documents are well-formed and also conform to the DTD rules.
XML Namespaces
Why Namespaces?
Namespaces help to avoid element name conflicts.
Namespace Declaration
Following shows the syntax for the namespace declaration.
<element xmlns:name="URL">
In the above declaration,
- The xmlns keyword indicates the beginning of the namespace.
- The name is the prefix of the namespace.
- The URL is the namespace identifier.
Examples of Namespaces
Following code segment shows an example of namespaces.
<?xml version="1.0" encoding="UTF-8" ?>
<abt:about xmlns:abt="https://www.guru99.com/about-us.html">
  <abt:founder>Krishna</abt:founder>
  <abt:vision>Fun and Free Education for ALL</abt:vision>
</abt:about>
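When a program reads a namespaced document, it looks elements up by namespace identifier rather than by prefix. The sketch below shows this with Python's standard `xml.etree.ElementTree` module; the prefix used in the lookup mapping is our own choice and does not have to match the one in the document.

```python
import xml.etree.ElementTree as ET

xml_text = """<abt:about xmlns:abt="https://www.guru99.com/about-us.html">
  <abt:founder>Krishna</abt:founder>
  <abt:vision>Fun and Free Education for ALL</abt:vision>
</abt:about>"""

root = ET.fromstring(xml_text)

# ElementTree stores the full name as {namespace-identifier}local-name.
print(root.tag)  # {https://www.guru99.com/about-us.html}about

# A prefix-to-identifier mapping lets us search with short names.
# The prefix "a" is deliberately different from "abt" to show only the identifier matters.
ns = {"a": "https://www.guru99.com/about-us.html"}
founder = root.find("a:founder", ns).text
print(founder)  # Krishna

# Searching without the namespace finds nothing.
print(root.find("founder"))  # None
```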
XML Editors
There are several XML editors available. Any text editor (such as Notepad) can be used as an XML editor.
The following list shows some of the popular XML editors in 2021.
1) XML Notepad
XML Notepad is an open-source editor for XML. It has a tree view and XSL Output on the left pane and node text on the right. It has an error-debugging window at the bottom.
Key Statistics:
- Type – XML editor
- Developer – Microsoft
- Supported operating system – Microsoft Windows.
- Price – Free
Link: http://microsoft.github.io/XmlNotepad/
2) Stylus Studio
Stylus Studio is an IDE written in C++ for Extensible Markup Language (XML). It allows a user to edit and transform XML documents and other data such as electronic data interchange (EDI), CSV, and relational data.
Key Statistics:
- Type – Integrated development environment (IDE) for XML
- Developer – Progress Software Corporation
- Supported operating system – Microsoft Windows.
- Price – Paid (Please refer to the website given below for the latest price), Free trial available.
Link: http://www.stylusstudio.com/
3) Altova XMLSpy
XMLSpy is primarily marketed as a JSON and XML Editor. It has a built-in schema designer and editor. It includes Visual Studio and Eclipse integration.
Key Statistics:
- Type – XML Editor
- Developer – Altova
- Supported operating system – Microsoft Windows.
- Price – Paid (Please refer to the website given below for the latest price), Free trial available.
Link: https://www.altova.com/xmlspy-xml-editor
4) Oxygen XML Editor
Oxygen XML is a cross-platform editor developed in Java. It can validate documents against DTD, W3C XML Schema, RELAX NG, Schematron, NRL, and NVDL schemas.
Key Statistics:
- Type – XML editor
- Developer – SyncRO Soft Ltd
- Supported operating system – Windows, Linux, and Mac OS X
- Price – Paid (Please refer to the website given below for the latest price)
Link: https://www.oxygenxml.com/
5) Xmplify
Xmplify XML Editor provides a fully XML-aware editing environment with DTD and XML Schema-based auto-completion, automatic document validation, etc.
Key Statistics:
- Type – XML Editor
- Developer – MOSO Corporation
- Supported operating system – Mac OS.
- Price – Paid (Please refer to the website given below for the latest price)
Link: http://xmplifyapp.com/
XML Parsers
An XML parser is a software library that provides an interface to work with XML documents. It checks whether the format of the XML document is correct. Some parsers can also validate the XML documents. Modern-day browsers come with XML parsers.
SAX
SAX stands for Simple API for XML. It is an application program interface (API) for parsing XML documents. The parser reports each piece of the document to callback methods, which behave similarly to event handlers in Java.
Unlike DOM, SAX is an example of an event-based XML parser.
Here are some important differences between the SAX and DOM.
Aspect | SAX | DOM |
---|---|---|
Stands for | Simple API for XML | Document Object Model |
Type of parser | Event-based | Object-based |
Read and write XML | Read-only | Both read and write |
Insert/update/delete nodes | Cannot insert/update/delete nodes | Can insert/update/delete nodes |
Memory efficiency | Good memory efficiency | Varies; the whole document is held in memory |
Speed | Generally faster, since it streams the document | Generally slower, since it builds the whole tree first |
Suitable for | Large-sized files | Small-sized files |
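To make the event-based model concrete, here is a minimal sketch using Python's standard `xml.sax` module. The handler class `EventRecorder` and the tiny input document are made up for this example. The parser calls the handler as it reads, and nothing is kept in memory unless the handler stores it.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record start and end events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventRecorder()
xml.sax.parseString(b"<school><student><age>14</age></student></school>", handler)

for kind, name in handler.events:
    print(kind, name)
```

The handler sees `start school`, `start student`, `start age`, then the matching end events in reverse order. There is no tree to modify afterwards, which is why SAX is read-only.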
XML Data Binding
XML data binding is the representation of data in an XML document as a business object in the memory of a computer.
There are three approaches for XML data binding.
- XML schema-based data binding: Corresponding XML classes are created based on the schema.
- Class-based data binding: A corresponding XML schema is created based on classes.
- Mapping-based data binding: It describes how an existing XML schema maps to a set of classes (and vice-versa).
There are also XML data binding frameworks, which make data binding easier. You feed in a DTD or XML schema, and the framework generates a large amount of code for you.
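The idea of data binding can be sketched by hand without any framework. The example below maps a student document to a Python dataclass and back using only the standard library. The `Student` class and both helper functions are hypothetical names chosen for this illustration; a real framework would generate similar code from a DTD or schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Student:
    """In-memory business object bound to a <student> element."""
    firstname: str
    lastname: str
    school: str


def student_from_xml(xml_text: str) -> Student:
    """Read each child element into the field with the same name."""
    root = ET.fromstring(xml_text)
    return Student(**{f.name: root.findtext(f.name) for f in fields(Student)})


def student_to_xml(student: Student) -> str:
    """Write each field back out as a child element with the same name."""
    root = ET.Element("student")
    for f in fields(Student):
        ET.SubElement(root, f.name).text = getattr(student, f.name)
    return ET.tostring(root, encoding="unicode")


xml_text = (
    "<student><firstname>Mark</firstname><lastname>Wood</lastname>"
    "<school>Hills College</school></student>"
)
student = student_from_xml(xml_text)
print(student.school)                      # Hills College
print(student_to_xml(student) == xml_text)  # True
```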
XML Schemas
An XML schema (also known as XML Schema Definition or XSD) is used to describe the structure of an XML document. It is an alternative to DTD.
Why Schema is Important?
A DTD is not as powerful as a schema because it is not extensible or flexible enough, so it may not be suitable for some situations. In such situations, a schema is the better choice. The main purpose of an XML schema is to define the elements and attributes of an XML document.
How XML Schema is Different from DTD?
The following comparison shows how XSD (XML Schema) is different from DTD.
Aspect | DTD | XSD |
---|---|---|
Stands for | Document Type Definition | XML Schema Definition |
Extensibility | Not extensible | Extensible |
Control on XML structure | Less control | More control |
Data type support | Not supported | Supported |
Namespace Support | Not supported | Supported |
Following code segment shows an example of XML schema.
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name = "employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name = "firstname" type = "xs:string" />
        <xs:element name = "lastname" type = "xs:string" />
        <xs:element name = "phone" type = "xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
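Unlike a DTD, a schema is itself an XML document, so it can be read with an ordinary XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module to list the typed elements the schema declares. Note that it only reads the schema; it does not validate a document against it, which the Python standard library cannot do on its own.

```python
import xml.etree.ElementTree as ET

# Namespace identifier of XML Schema, in ElementTree's {identifier}name notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string" />
        <xs:element name="lastname" type="xs:string" />
        <xs:element name="phone" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Collect every declared element that has a type attribute.
declared_types = {
    element.get("name"): element.get("type")
    for element in schema.iter(XS + "element")
    if element.get("type") is not None
}
print(declared_types)
# {'firstname': 'xs:string', 'lastname': 'xs:string', 'phone': 'xs:int'}
```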
Advantages of XML
Here are the pros/benefits of XML:
- It makes it easy to transport and share data.
- XML improves the exchange of data between various platforms.
- It is a markup language, which is a set of characters or/and symbols placed in a text document.
- Together with a style sheet (such as XSLT or CSS), an XML document can also be formatted for display.
- It simplifies the platform change process.
- It enhances data availability.
- It supports multilingual documents and Unicode.
- It is relatively easy to learn and code.
- It performs validation using DTD and Schema.
- Makes documents transportable across systems and applications. With the help of XML, you can exchange data quickly between different platforms.
- XML separates the data from HTML.
Disadvantages of XML
Here are the cons/drawbacks of using XML:
- XML requires a processing application.
- The XML syntax is similar to other text-based data transmission formats, which is sometimes confusing.
- No intrinsic data type support
- The XML syntax is redundant.
- XML itself provides no ready-made tags, so every application has to define and document its own.
Summary
- XML stands for eXtensible Markup Language. XML is a markup language (not a programming language) whose vocabulary can be extended with your own tags.
- The main goal is to transport data, not to display data.
- XML 1.1 is the latest version. Yet, XML 1.0 is the most used version.
- Tags work as pairs except for declarations.
- Opening tag + content + closing tag = an element
- Entities are a way of representing special characters.
- DTD stands for Document Type Definition. It defines the structure of an XML document by declaring its legal elements and attributes. A DTD is optional.
- DOM stands for Document Object Model. It defines a standard manner of accessing and manipulating XML documents.
- Well-formed XML documents are XML documents with correct syntax.
- Valid XML documents are well-formed and also conform to the DTD rules.
- Namespaces help to avoid element name conflicts.