By: Terence Catapano
Location: Studio@Butler
Time: 5 - 6 pm
Weather:
Markup adds structure and commentary to a text. Markup languages include GML, SGML, COCOA, HTML, XML, and LMNL. GML was the oldest one. HTML is the most famous one. XML came out in 1998. LMNL is a non-hierarchical language in which layers substitute for hierarchies. Conferences on markup discuss its function, future, and so on.
Types of markup include procedural vs descriptive. Other distinctions include proleptic vs metaleptic, indicative vs imperative, and prospective vs retrospective. We will attempt retrospective markup, which works on an existing text. One of the editors of TEI was a humanities professor, a paleographer.
XML is a simplified SGML. It is maintained by the World Wide Web Consortium (W3C) and is an open standard, and therefore non-proprietary. It is platform-independent and non-binary (plain text). In most cases, the fundamental unit of XML is a document, which is both a string of text and a sequence of nodes arranged like a tree. XML specifically allows mixed content (text interleaved with child elements), a feature that is difficult to process.
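To see why mixed content is awkward to process, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample line and the EMPH element name are made up for illustration and were not part of the session.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: text and a child element sit side by side in LINE.
doc = "<LINE>Now is the <EMPH>winter</EMPH> of our discontent</LINE>"
line = ET.fromstring(doc)

emph = line[0]
leading = line.text    # text before the first child element
inner = emph.text      # text inside the child element
trailing = emph.tail   # text after the child, but still inside LINE

# The full text has to be stitched back together from all three pieces.
full_text = "".join(line.itertext())
print(repr(leading), repr(inner), repr(trailing))
print(full_text)
```

The text of the line ends up spread over three places in the tree, so any processing has to walk the nodes to recover it.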
The building blocks of XML are elements, eg: a line element <LINE>Now is the winter of our discontent</LINE>, with a start tag and an end tag. Element names may not contain spaces or colons, and may not start with a number. Elements may contain text or other elements: <LINE><WORD>now</WORD><WORD>is</WORD></LINE>
Elements delineate and label segments of text. Order is significant.
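A small sketch of reading elements back out of a document, again assuming Python's standard xml.etree.ElementTree as the parser (the session did not prescribe a tool). It uses the nested LINE/WORD example from these notes.

```python
import xml.etree.ElementTree as ET

# A LINE element containing two WORD elements, each with a start and end tag.
doc = "<LINE><WORD>now</WORD><WORD>is</WORD></LINE>"
line = ET.fromstring(doc)

# Child elements come back in the order they appear in the document.
tags = [child.tag for child in line]
words = [child.text for child in line]
print(line.tag, tags, words)
```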
Attributes: values can only be text (an element cannot be squeezed into an attribute), names must begin with a letter, underscore or colon, and order is not significant. eg: <LINE n="1"> trans-de=...</LINE>
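A minimal sketch of how attributes look to a parser, assuming Python's xml.etree.ElementTree. The speaker attribute is a hypothetical addition for illustration; only n appears in the notes.

```python
import xml.etree.ElementTree as ET

# The same element written twice with its attributes in a different order.
a = ET.fromstring('<LINE n="1" speaker="Richard">Now is the winter</LINE>')
b = ET.fromstring('<LINE speaker="Richard" n="1">Now is the winter</LINE>')

# Attribute values are always plain text (strings), never elements.
line_number = a.get("n")

# Attributes are exposed as a name-to-value mapping, so order does not matter.
same_attributes = a.attrib == b.attrib
print(line_number, same_attributes)
```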
There should be no overlap: elements must nest properly. An element that starts inside another element must end before the enclosing element ends.
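The no-overlap rule can be checked with any XML parser. The sketch below assumes Python's xml.etree.ElementTree; is_well_formed is a helper name invented here for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# WORD opens and closes inside LINE: properly nested.
nested = "<LINE><WORD>now</WORD> is</LINE>"
# LINE closes while WORD is still open: the two elements overlap.
overlapping = "<LINE><WORD>now</LINE></WORD>"

print(is_well_formed(nested), is_well_formed(overlapping))
```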
XML character encoding + entities: Use Unicode. Predefined character entities are used when characters reserved for markup must appear as literal characters in the text itself:
1. &lt; (<) --- & and ; are the delimiters.
2. &gt; (>)
3. &amp; (&)
4. &apos; (')
5. &quot; (")
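A short sketch of the predefined entities in practice, assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The NOTE element and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that would otherwise be read as markup.
raw = "if a < b & b > c"

# escape() replaces &, < and > with their predefined entities.
escaped = escape(raw)
print(escaped)

# The parser turns the entities back into the literal characters.
note = ET.fromstring("<NOTE>" + escaped + "</NOTE>")
round_trip = note.text

# All five predefined entities are understood without any declaration.
all_five = ET.fromstring("<NOTE>&lt;&gt;&amp;&apos;&quot;</NOTE>").text
```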
XML can also include comments, which need not be processed: <!--...-->
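As a rough illustration, Python's xml.etree.ElementTree drops comments by default when it builds the tree. This is one parser's default behaviour, not something the session covered; the comment text is made up.

```python
import xml.etree.ElementTree as ET

# A comment placed inside LINE, next to a WORD element.
doc = "<LINE><!-- editorial note --><WORD>now</WORD></LINE>"
line = ET.fromstring(doc)

# With the default settings, the comment does not appear in the parsed tree.
child_tags = [child.tag for child in line]
text = "".join(line.itertext())
print(child_tags, text)
```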
More on:
https://github.com/cu-mkp/GR8975/blob/master/presentations/XML_intro.ppt