Wednesday, August 24, 2011

eXtensible Markup Language (XML)

I am looking at the W3Schools tutorial on XML. I like the clarification on the roles of HTML and XML:
  • HTML was designed to display data, with focus on how data looks.
  • HTML collapses multiple white-space characters into a single white-space.
  • XML was designed to transport and store data, with focus on what data is.
  • White-space is preserved in XML, and a new line is stored as LF.
  • XML tags are not predefined. You must define your own tags.
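  • XML documents should begin with an XML declaration (recommended, though optional in XML 1.0) and can include a DOCTYPE declaration, which in turn refers to a "system" of tags/elements set out in a DTD file. A Schema can define the same thing, but it is referenced through attributes on an element (usually the root) rather than through the DOCTYPE.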
  • XML does not DO anything other than to transport and store data.
  • XML data is stored in plain text format. This provides a software- and hardware-independent way of storing data.
  • XML documents must contain a root element. This element is "the parent" of all other elements. All elements can have sub elements (child elements).
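
The two white-space points above are easy to check for yourself. Here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser (my choice for illustration, not something the tutorial uses); the <note> element and its text are made up for the test:

```python
import xml.etree.ElementTree as ET

# Runs of spaces inside an element survive parsing unchanged
# (an HTML renderer would show them as a single space).
spaced = ET.fromstring("<note>Hello     Tove</note>")
print(repr(spaced.text))

# A Windows-style CR+LF line break in the input is handed back
# by the parser as a single LF.
multiline = ET.fromstring("<note>line one\r\nline two</note>")
print(repr(multiline.text))
```

The repr() calls are only there to make the spaces and the line-break character visible when printed.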

The tutorial gives an example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Here the first line is the XML declaration and the root element is <bookstore>. The <bookstore> contains 3 <book> elements, each of which has a category attribute and 4 child elements; the <title> child in turn has a lang attribute. It seems to me analogous to a data table entitled "bookstore", with 6 columns and 3 rows. I say 6 columns because, in a data table, the lang attribute and the book category would be stored as columns, possibly as integer indices linked to other tables.
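
To make the table analogy concrete, here is a rough sketch that flattens the bookstore document into rows of six "columns". It assumes Python's standard xml.etree.ElementTree module; the function name book_to_row and the column names are my own, and the XML declaration is left out only to keep the string short:

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>"""


def book_to_row(book):
    """Flatten one <book> element into a dict with six 'columns' (hypothetical helper)."""
    title = book.find("title")
    return {
        "category": book.get("category"),   # attribute on <book>
        "lang": title.get("lang"),          # attribute on <title>
        "title": title.text,
        "author": book.findtext("author"),
        "year": int(book.findtext("year")),
        "price": float(book.findtext("price")),
    }


root = ET.fromstring(BOOKSTORE_XML)
rows = [book_to_row(book) for book in root.findall("book")]
for row in rows:
    print(row)
```

Each <book> becomes one row, and the two attributes sit alongside the four child elements as ordinary columns.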

My interpretation is confirmed in a section entitled Elements vs. Attributes, where the following examples are said to be equivalent:

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

Indeed the tutorial goes on to say:

There are no rules about when to use attributes or when to use elements [but while] attributes are handy in HTML. In XML my advice is to avoid them. Use elements instead.
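
One way to see that the two <person> examples carry the same information is to reduce each to a plain record and compare them. This is only a sketch, again assuming Python's xml.etree.ElementTree; person_record is a hypothetical helper of mine, not something from the tutorial:

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""


def person_record(xml_text):
    """Collect the attributes and child elements of <person> into one dict."""
    person = ET.fromstring(xml_text)
    record = dict(person.attrib)        # attributes first, e.g. sex="female"
    for child in person:                # then each child element's text
        record[child.tag] = child.text
    return record


record_a = person_record(AS_ATTRIBUTE)
record_b = person_record(AS_ELEMENT)
print(record_a == record_b)
```

Once flattened like this, a program reading the data cannot tell which form the document used, which is the sense in which the two are equivalent.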

My first impression is that this is not an "efficient" way to store data, not least because the column "headers" (the tag names) have to be repeated in every row. But then HTML tables are messy to look at (in HTML code) and nightmarish to produce (manually).
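
The repetition is easy to measure. The sketch below writes the same three books once as CSV (headers stated once) and once as XML (tag names repeated per book), then compares the lengths. It assumes Python's standard csv and xml.etree.ElementTree modules, and I have dropped the attributes to keep the comparison simple:

```python
import csv
import io
import xml.etree.ElementTree as ET

books = [
    {"title": "Everyday Italian", "author": "Giada De Laurentiis", "year": "2005", "price": "30.00"},
    {"title": "Harry Potter", "author": "J K. Rowling", "year": "2005", "price": "29.99"},
    {"title": "Learning XML", "author": "Erik T. Ray", "year": "2003", "price": "39.95"},
]

# CSV: the column names appear once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(books[0]))
writer.writeheader()
writer.writerows(books)
csv_text = buffer.getvalue()

# XML: every <book> repeats the opening and closing tag for each field.
root = ET.Element("bookstore")
for book in books:
    book_element = ET.SubElement(root, "book")
    for field, value in book.items():
        ET.SubElement(book_element, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

print("CSV characters:", len(csv_text))
print("XML characters:", len(xml_text))
print("<title> tags in XML:", xml_text.count("<title>"))
```

What the extra characters buy is that each value is labelled where it sits, so the XML stays readable even if the order of the fields changes.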

Interestingly, although a browser does not render XML in the same way as HTML, it does display it a bit like code in a code editor (see below), so "parent" elements can be expanded or collapsed to show or hide their "child" elements. In the image below, the "cooking" book element has been expanded, but the other two book elements are left closed.

It's interesting, but I am still no closer to connecting my Applet to a database.
