Chapter 8: STxT and XML

Warning to the Reader

This chapter may offend the sensibilities of XML enthusiasts. Read at your own risk ;-)

The Legend

The origin of evil: SGML.

It all started a long, long time ago in a dark land called "informatics." Encodings roamed this world. Sowing incompatibilities. Hindering communications. Laughing at those who were not like them. Some tried to make peace with all of them.

And then they created the monster: SGML.

The monster began by stating which encoding it would stick with. And there was no problem. But the effort was too much. A bunch of strange characters were needed. Texts filled with "<", ">", "&xxx;". Unintelligible.

And they have reproduced. The world is full of their offspring. Little monsters that fill the earth with these characters, hindering reading for humans. Only a handful of elites can deal with the monsters. They are called "The IT People." They are the terror. Nobody wants to see them, but everyone needs them. They are dark. They speak strange tongues.

But the monsters are still here. They have mutated. Now they are called XMLs. And people like them!!

...

But a small group of people gathered and decided this couldn’t go on. They would create a champion. Someone to enlighten them. There could be only one.

The time for STxT had arrived.

The Present

The world is full of ugly documents, XMLs that we like... even though it's a horrible format!

We are inheriting an outdated format, with outdated encodings, with a very IT-centric mindset. And it's everywhere. And we keep using it. And we want it for everything... Wait a minute! Stop! Is it necessary? If we knew nothing about SGML and wanted to have a quick-to-parse TXT, with structure, with namespaces, would we have created XML? I don't think so. If you asked a child to invent something, I suppose they would never have created XML... they would have made something more natural... THEY WOULD HAVE CREATED STxT!

Well, we have more experience, and we’ve learned. Let's create it!

First Contact

I don't want to scare you, but STxT, compared to XML, is:

Prettier
Simpler
More compact
Faster

And it can also express the same things.

Let's take a look at an XML example, and we’ll compare it with STxT to start. It’ll be an easy example:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a line comment  -->
<email>
    <from>John Smith</from>
    <to>Mery Adams</to>
    <cc>Keyla Brown</cc>
    <title>Project report</title>
    <body>Hello Mery!! The book is finished!!</body>
</email>

Let's look at the STxT version:

# This is a line comment
Email (www.example.com/email.stxt):
    From: John Smith
    To: Mery Adams
    Cc: Keyla Brown
    Title: Project report
    Body: Hello Mery!! The book is finished!!

I think a first characteristic is immediately apparent:

STxT is much prettier than XML and easier to understand.
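To make the structure concrete, here is a minimal Python sketch of how the STxT example above could be read into a tree. It is not the official STxT parser. It assumes one node per line, four-space indentation, "#" comments, and an optional namespace in parentheses after the name; the function name parse_stxt and the dictionary layout are my own inventions for illustration. Multi-line text values are not handled.

```python
import re

# Assumed line shape: "Name (namespace): value"; namespace and value are optional.
LINE = re.compile(r"^(?P<name>[^:(]+?)\s*(?:\((?P<ns>[^)]*)\))?\s*:\s*(?P<value>.*)$")


def parse_stxt(text, indent="    "):
    """Parse the one-line-per-node subset shown above into nested dicts."""
    root = {"name": None, "ns": None, "value": "", "children": []}
    stack = [root]  # stack[d] is the parent for a node found at depth d
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        depth = 0
        while line.startswith(indent):
            line = line[len(indent):]
            depth += 1
        match = LINE.match(line)
        if match is None or depth > len(stack) - 1:
            raise ValueError(f"cannot parse line: {raw!r}")
        node = {
            "name": match["name"],
            "ns": match["ns"],
            "value": match["value"],
            "children": [],
        }
        del stack[depth + 1:]  # close every node deeper than the parent
        stack[depth]["children"].append(node)
        stack.append(node)
    return root["children"]


doc = """# This is a line comment
Email (www.example.com/email.stxt):
    From: John Smith
    To: Mery Adams
"""
tree = parse_stxt(doc)
print(tree[0]["name"], [child["name"] for child in tree[0]["children"]])
```

The whole reader fits in about thirty lines because indentation alone carries the nesting; there are no closing tags to match.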

Let's talk about size.

XML Length: 254
STxT Length: 190

Let’s see what happens when we compact it as much as possible:

<?xml version="1.0" encoding="UTF-8" ?><!-- This is a line comment  -->
<email><from>John Smith</from><to>Mery Adams</to><cc>Keyla Brown</cc>
<title>Project report</title><body>Hello Mery!! The book
is finished!!</body></email>

# This is a line comment
Email(www.example.com/email.stxt):
    From:John Smith
    To:Mery Adams
    Cc:Keyla Brown
    Title:Project report
    Body:Hello Mery!! The book is finished!!

Minimum XML Length: 225
Minimum STxT Length: 172

If we compare without XML header or STxT namespace:

Minimum XML Length: 186
Minimum STxT Length: 149

I think it’s clear that:

STxT is more compact than XML

But also:

STxT is more understandable than XML when compacted.

Or put another way:

STxT maintains its readability when compacted,
while XML does not.
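If you want to repeat the measurement yourself, a small Python sketch like the following will do. It assumes the compacted XML is a single line with no whitespace between elements and that the STxT version uses four spaces per indent; the exact totals depend on indentation width and line endings, so they need not match the figures quoted above. The helper name byte_length is mine.

```python
XML_MIN = (
    '<?xml version="1.0" encoding="UTF-8" ?><!-- This is a line comment  -->'
    "<email><from>John Smith</from><to>Mery Adams</to><cc>Keyla Brown</cc>"
    "<title>Project report</title>"
    "<body>Hello Mery!! The book is finished!!</body></email>"
)

STXT_MIN = "\n".join([
    "# This is a line comment",
    "Email(www.example.com/email.stxt):",
    "    From:John Smith",
    "    To:Mery Adams",
    "    Cc:Keyla Brown",
    "    Title:Project report",
    "    Body:Hello Mery!! The book is finished!!",
])


def byte_length(text, encoding="utf-8"):
    """Size of the document once encoded, which is what gets stored or sent."""
    return len(text.encode(encoding))


xml_size = byte_length(XML_MIN)
stxt_size = byte_length(STXT_MIN)
print(f"XML: {xml_size} bytes, STxT: {stxt_size} bytes, "
      f"saving: {1 - stxt_size / xml_size:.0%}")
```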

By the way, did we lose any information? No, but what would happen if there were attributes? We'll talk about that later. For now, trust me:

With STxT, you can express the same as with XML.

Now let’s try something fun... What does an XML document look like with a text node that itself contains XML?

Let's see:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a line comment  -->
<email>
    <from>John Smith</from>
    <to>Mery Adams</to>
    <cc>Keyla Brown</cc>
    <title>Project report</title>
    <body>
        Hello Mery!! The book is finished!!
        &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;
        &lt;!-- This is a line comment  --&gt;
        &lt;email&gt;
        &lt;from&gt;John Smith&lt;/from&gt;
        &lt;to&gt;Mery Adams&lt;/to&gt;
        &lt;cc&gt;Keyla Brown&lt;/cc&gt;
        &lt;title&gt;Project report&lt;/title&gt;
        &lt;body&gt;Hello Mery!! The book is finished!!&lt;/body&gt;
        &lt;/email&gt;
    </body>
</email>

And in STxT:

Email (www.example.com/email.stxt):
    From: John Smith
    To: Mery Adams
    Cc: Keyla Brown
    Title: Project report
    Body:
        Hello Mery!! The book is finished!!
        <email>
        <from>John Smith</from>
        <to>Mery Adams</to>
        <cc>Keyla Brown</cc>
        <title>Project report</title>
        <body>Hello Mery!! The book is finished!!</body>
        </email>

I think no further comment is needed. It’s obvious, but:

STxT is simpler than XML.
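The difference can be sketched in a few lines of Python. The XML side uses the standard library's escape and unescape helpers; the STxT side assumes, as the example above suggests, that an indented block under a node is taken as literal text, so embedding only means shifting the payload to the right. The eight-space prefix is an assumption chosen for illustration.

```python
import textwrap
from xml.sax.saxutils import escape, unescape

inner = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    "<email>\n"
    "<from>John Smith</from>\n"
    "</email>"
)

# XML text node: every markup character has to become an entity.
xml_body = escape(inner, {'"': "&quot;"})

# Indentation-delimited text: the payload is only shifted to the right.
stxt_body = textwrap.indent(inner, " " * 8)

# Both encodings are reversible, so no information is lost either way.
assert unescape(xml_body, {"&quot;": '"'}) == inner
assert textwrap.dedent(stxt_body) == inner

print(xml_body)
print(stxt_body)
```

Both forms round-trip, but only one of them is still readable by a human in between.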

XML Attributes

Let’s take an example:

<example id="1" show="false">
    Hello World
</example>

Attributes in XML have always been a source of controversy. What should be an attribute and what should be a node? It’s always hard to decide. It is widely accepted that attributes are metadata about the node, that is, they provide information but not content. There are cases where this is acceptable, but (in my humble opinion) in most cases they are a source of problems.

Another added problem is that everyone has to keep in mind that attributes exist. This affects DTDs, XSDs, libraries, programmers... And for what? They really don’t add much. Just unnecessary complexity. Furthermore, this is STxT. Everything has meaning, and everything is important.

STxT does not have attributes in the style of XML.

This makes it BETTER, not worse. Sometimes, LESS IS MORE ;-)

The previous example in STxT would be:

example(...):
    id:1
    show:false
    text:Hello World

In general, we can always do something in the form:

node_name:
    metadata:
        m1:xxx
        m2:xxx
        m3:xxx
    node2:
    node3:
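As a rough illustration of that rule, the following Python sketch turns an XML element into STxT-style lines, writing each attribute as an ordinary child node. It uses the standard xml.etree.ElementTree module; the function name element_to_stxt and the choice of "text" as the name for the element's own text follow the example above and are otherwise my assumptions. Namespaces are left out because the example above elides them.

```python
import xml.etree.ElementTree as ET


def element_to_stxt(elem, depth=0, indent="    "):
    """Return STxT-style lines for elem, with attributes written as child nodes."""
    lines = [f"{indent * depth}{elem.tag}:"]
    child_pad = indent * (depth + 1)
    for key, value in elem.attrib.items():
        lines.append(f"{child_pad}{key}:{value}")  # attribute becomes a node
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{child_pad}text:{text}")  # the element's own text
    for child in elem:
        lines.extend(element_to_stxt(child, depth + 1, indent))
    return lines


xml_doc = '<example id="1" show="false">\n    Hello World\n</example>'
print("\n".join(element_to_stxt(ET.fromstring(xml_doc))))
```

Once attributes are plain nodes, a consumer only needs one way of looking things up instead of two.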

DTD, XSD

XSD, DTD, and STxT

Let’s talk about XML’s DTDs and XSDs. They are documents that describe what a valid XML document must look like. They work more or less well, and each has its advantages and disadvantages. If you look around the Internet a bit, you’ll see what I mean.

In summary, DTD is simpler than XSD, less powerful, and is written in a language different from XML. XSD is more powerful, harder to make and understand, but it’s written in XML, so there’s no need to learn another language.

And what about STxT? You should know by now :-D It has the best of both. The grammar is itself a STxT document, integrated into the language. Powerful and easy to learn. Like STxT itself ;-)

Where is it?

I would like to see a typical XML problem disappear:

Where is the grammar of this document?

In STxT, EVERYTHING is on the web. When you want to create a document, you must automatically be able to see its definition. The grammar must always be accessible.

A STxT document by definition has an associated grammar.
If not, it’s not STxT.

XSD of the XSD (*)

Does anyone want to compare the XSD of an XSD? I was tempted to include it in the book, but it would have taken up more than 50 pages, and I’m not exaggerating :-D

I’ll leave the link here for anyone who wants to see it:

http://www.w3.org/2001/XMLSchema.xsd

Is there anyone who understands it? Sorry, sorry. Is there anyone who isn’t a superhero who understands it?

Everything is much more complicated. With STxT we have sought simplicity. We like to do things by hand! It shouldn’t be mandatory to have an XML editor or reader!

Parsers and Validators

Does anyone want to compare parsers and validators? It’s unnecessary. A STxT parser is MUCH faster and easier to implement. Why? It’s very simple:

STxT has far fewer rules than XML.

This makes everything easier. Parsers are faster, have fewer errors, and are easier to write and maintain. Another advantage is that parsing and validating can be done simultaneously. With XML, you have to decide whether you really need to validate against the XSD... and where is it? We’re back to the same old problem. With STxT, everything is much clearer. Every document must have a definition of its structure, but writing one is easy and doesn’t take long. It’s worth it. It’s one of the big advantages of STxT.
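Here is a small Python sketch of what "parsing and validating simultaneously" can look like. It is only an illustration under my own assumptions: the grammar is a plain dictionary that maps each node name to the child names it allows, which is not the real STxT grammar format, and the function name parse_and_validate is invented. Each line is checked against the grammar at the moment it is read, so an invalid document is rejected at the first offending line.

```python
# Hypothetical grammar: each node name maps to the child names it may contain.
EMAIL_GRAMMAR = {
    "Email": {"From", "To", "Cc", "Title", "Body"},
    "From": set(), "To": set(), "Cc": set(), "Title": set(), "Body": set(),
}


def parse_and_validate(text, grammar, indent="    "):
    """Read one-line nodes and reject the first line the grammar does not allow."""
    path = []   # names of the nodes that are currently open, outermost first
    nodes = []  # accepted nodes as (depth, name, value)
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # blank line or comment
        content = line.lstrip(" ")
        depth = (len(line) - len(content)) // len(indent)
        head, sep, value = content.partition(":")
        name = head.split("(", 1)[0].strip()  # drop an optional "(namespace)"
        if not sep or not name:
            raise ValueError(f"line {number}: expected 'Name: value'")
        if depth > len(path):
            raise ValueError(f"line {number}: indentation skips a level")
        del path[depth:]  # close nodes at this depth or deeper
        if name not in grammar:
            raise ValueError(f"line {number}: unknown node {name!r}")
        if path and name not in grammar[path[-1]]:
            raise ValueError(f"line {number}: {name!r} not allowed in {path[-1]!r}")
        path.append(name)
        nodes.append((depth, name, value.strip()))
    return nodes


good = "Email (www.example.com/email.stxt):\n    From: John Smith\n    To: Mery Adams\n"
print(parse_and_validate(good, EMAIL_GRAMMAR))
```

A document containing, say, a Bcc node would raise a ValueError naming the line, without any separate validation pass.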

Note: For those who love private information: if the STxT parser or editor has its own STxT grammar repository, the grammar doesn’t have to be visible on the web. I don’t like it and I’m against it, but nothing and no one prevents it. In fact, in some cases it will be necessary, as it was for my own book before the website was published.

Namespaces

I’ve always hated XML namespaces. They are cumbersome, difficult to control, and usually tell us nothing.

Namespaces in STxT are completely different because they are not hard to make, they provide all the necessary information, and you don’t lose the expressiveness that you get with XML. In fact, you don’t even see them! But they are always there, helping. Completing the work.

In XML, namespaces are annoying, add complexity, and give very little in return.

By the way, we’ve banished the {{{http:}}} prefix from the namespace. Keeping it would imply that a grammar obtained over {{{http:}}} is different from the same grammar obtained over {{{https:}}}, and that doesn’t make sense. What does make sense is that it’s from {{{}}}.

The Future

I’ve been very honest in my analysis of XML vs. STxT, but there’s something we must not forget. XML has been around for a long time. It’s tested, and it’s used everywhere. It’s a standard. STxT is new; it has just been born and has a long way to go.

But I think it’s worth it.

STxT: The Book

A language for the web