<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en-GB">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">TEI by Example</title>
        <title type="sub">Module 0: Introduction to Text Encoding and the TEI</title>
        <author xml:id="EV">Edward Vanhoutte</author>
        <editor xml:id="RvdB">Ron Van den Branden</editor>
        <editor xml:id="MT">Melissa Terras</editor>
        <sponsor>Association for Literary and Linguistic Computing (ALLC)</sponsor>
        <sponsor>Centre for Data, Culture and Society, University of Edinburgh, UK</sponsor>
        <sponsor>Centre for Digital Humanities (CDH), University College London, UK</sponsor>
        <sponsor>Centre for Computing in the Humanities (CCH), King’s College London, UK</sponsor>
        <sponsor>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</sponsor>
        <funder>
          <address>
            <addrLine>Centre for Scholarly Editing and Document Studies (CTB)</addrLine>
            <addrLine>Royal Academy of Dutch Language and Literature</addrLine>
            <addrLine>Koningstraat 18</addrLine>
            <addrLine>9000 Gent</addrLine>
            <addrLine>Belgium</addrLine>
          </address>
          <email>ctb@kantl.be</email>
        </funder>
        <principal>Edward Vanhoutte</principal>
        <principal>Melissa Terras</principal>
      </titleStmt>
      <publicationStmt>
        <publisher>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</publisher>
        <distributor>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</distributor>
        <pubPlace>Gent</pubPlace>
        <address>
          <addrLine>Centre for Scholarly Editing and Document Studies (CTB)</addrLine>
          <addrLine>Royal Academy of Dutch Language and Literature</addrLine>
          <addrLine>Koningstraat 18</addrLine>
          <addrLine>9000 Gent</addrLine>
          <addrLine>Belgium</addrLine>
        </address>
        <availability status="free">
          <p>Licensed under a <ref target="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution ShareAlike 3.0 License</ref>
                    </p>
        </availability>
        <date when="2010-07-09">9 July 2010</date>
      </publicationStmt>
      <seriesStmt>
        <title>TEI by Example.</title>
        <respStmt>
          <name>Edward Vanhoutte</name>
          <resp>editor</resp>
        </respStmt>
        <respStmt>
          <name>Ron Van den Branden</name>
          <resp>editor</resp>
        </respStmt>
        <respStmt>
          <name>Melissa Terras</name>
          <resp>editor</resp>
        </respStmt>
      </seriesStmt>
      <sourceDesc>
        <p>Digitally born</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>TEI by Example offers a series of freely available online tutorials that walk individuals through the different stages of marking up a document in TEI (Text Encoding Initiative). Besides a general introduction to text encoding, step-by-step tutorial modules provide example-based introductions to eight different aspects of electronic text markup for the humanities. Each tutorial module is accompanied by a dedicated examples section, illustrating actual TEI encoding practice with real-life examples. The theory of the tutorial modules can be tested in interactive tests and exercises.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="en-GB">en-GB</language>
      </langUsage>
    </profileDesc>
    <revisionDesc>
      <change when="2020-06-23" who="#RvdB">technical revision</change>
      <change when="2010-07-23" who="#RvdB">fixed broken link and (example) character encoding</change>
      <change when="2010-07-13" who="#RvdB">
                <list>
                    <item>added distinction <gi>gi</gi> — <tag>gi scheme="..."</tag> — <gi>tag</gi>
                    </item>
        <item>final spellcheck</item>
                </list>
            </change>
      <change when="2010-07-08" who="#RvdB">release</change>
      <change when="2009-12-16" who="#EV">Added documentation on how to associate entity declarations with a document instance under 4.2.3.</change>
      <change when="2009-11-20" who="#EV">
                <list>
                    <item>Added new section 4. XML ground rules: to be finished</item>
                    <item>Added new section 5.3 Using TEI: to be revised</item>
                </list>
            </change>
      <change when="2009-06-11" who="#RvdB">reshuffled modules: TBED01v00 has become TBED00v00; updated TBED00v00.xml</change>
      <change when="2009-09-10" who="#EV">Revision</change>
      <change when="2008-02-19" who="#EV">XML-izing text</change>
    </revisionDesc>
  </teiHeader>
  <text xml:id="TBED00v00" type="tutorials">
    <body>
            <div xml:id="markuplanguages">
        <head>Markup Languages in the Humanities</head>
        <div xml:id="descriptive">
          <head>Procedural and Descriptive Markup</head>
          <p>When human beings read texts, they perceive both the information stored in the linguistic code of the text and the meta-information that they infer from the appearance and interpretation of the text. By convention, for instance, italics signal the title of a book, play, or movie; a foreign word or phrase; or emphatic use of language. Drawing on their cognitive abilities, readers usually have no problem selecting the most appropriate interpretation of an italicised string of text. Computers, however, must be told explicitly which interpretation applies before they can process it. This can be done by way of a markup language that provides rules to formally separate information (the text in a document) from meta-information (information about the text in a document). Whereas the markup languages used in the typesetting community were mainly procedural, that is, they indicate procedures that a particular application should follow (e.g., printing a string of text in italics), the humanities were also, and mainly, concerned with descriptive markup, which identifies the entity type of tokens (e.g., identifying that a string of text is the title of a book or a foreign word). Unlike procedural or presentational markup, descriptive markup establishes a one-to-one mapping between logical elements in the text and their markup. To achieve this, descriptive markup languages depend on the formal separation of information from meta-information described above.</p>
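          <p>The difference can be sketched with a hypothetical sentence, encoded with TEI elements chosen for this illustration rather than taken from any particular project. The first encoding records only that some words are printed in italics, using <tag>hi rend="italic"</tag>; the second records why they are italicised, using <gi>emph</gi> for emphasis, <gi>title</gi> for the title of a book, and <gi>foreign</gi> for a French phrase.</p>
          <egXML xmlns="http://www.tei-c.org/ns/Examples">
            <!-- presentational: records only the appearance of the italic strings -->
            <p>She <hi rend="italic">did</hi> reread <hi rend="italic">Middlemarch</hi> with <hi rend="italic">joie de vivre</hi>.</p>
            <!-- descriptive: records the entity type of each italic string -->
            <p>She <emph>did</emph> reread <title>Middlemarch</title> with <foreign xml:lang="fr">joie de vivre</foreign>.</p>
          </egXML>
          <p>Only the second encoding would let a program list every book title in a text without also collecting emphasised words and foreign phrases, while a stylesheet can still render all three in italics.</p>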
        </div>
        <div xml:id="earlyattempts">
          <head>Early Attempts</head>
          <p>A degree of standardisation of markup for the encoding and analysis of literary texts was reached by the COCOA encoding scheme, originally developed for the COCOA program in the 1960s and 1970s (<ref type="bibl" target="#russel1967">Russel 1967</ref>) and later used as an input standard by the Oxford Concordance Program (OCP) in the 1980s (<ref type="bibl" target="#hockey1980">Hockey 1980</ref>) and by the Text Analysis Computing Tools (TACT) in the 1990s (<ref type="bibl" target="#lancashire1996">Lancashire et al. 1996</ref>). For the transcription and encoding of classical Greek texts, the Beta-transcription/encoding system reached some level of standardised use (<ref type="bibl" target="#berkowitz1986">Berkowitz, Squitier, and Johnson 1986</ref>).</p>
        </div>
        <div xml:id="SGML">
          <head>The Standard Generalized Markup Language (SGML)</head>
          <p>The call for a markup language that could guarantee reusability, interchange, system- and software-independence, portability and collaboration in the humanities was answered by the publication of the Standard Generalized Markup Language (SGML) as an ISO standard in 1986 (ISO 8879:1986) (<ref type="bibl" target="#goldfarb1990">Goldfarb 1990</ref>). Based on IBM’s Document Composition Facility Generalized Markup Language, SGML was developed mainly by Charles Goldfarb as a metalanguage for the description of markup schemes that satisfied at least seven requirements for an encoding standard (<ref type="bibl" target="#barnard1988">Barnard, Fraser, and Logan 1988, 28–29</ref>):
            <list rend="ordered">
              <item>The requirement of comprehensiveness;</item>
              <item>The requirement of simplicity;</item>
              <item>The requirement that documents be processable by software of moderate complexity;</item>
              <item>The requirement that the standard not be dependent on any particular character set or text-entry device;</item>
              <item>The requirement that the standard not be geared to any particular analytic program or printing system;</item>
              <item>The requirement that the standard should describe text in editable form;</item>
              <item>The requirement that the standard allow the interchange of encoded texts across communication networks.</item>
            </list>
          </p>
          <p>In order to achieve universal exchangeability and software and platform independence, SGML made exclusive use of the ASCII codes. As mentioned above, SGML is not itself a markup language, but a metalanguage with which one can create separate markup languages for separate purposes. This means that SGML defines the rules and procedures for specifying the vocabulary and the syntax of a markup language in a formal Document Type Definition (DTD). Such a DTD is a formal description of, for instance, the names of all elements, the names and default values of their attributes, the rules governing how elements can nest and how often they can occur, and the names of reusable pieces of data (entities). The DTD enables full control, parsing, and validation of SGML-encoded documents. By far the most popular SGML DTD was the Hypertext Markup Language (HTML), developed for the exchange of hypertext documents over the internet.</p>
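          <p>As a rough sketch of what such declarations look like, the example below defines a small, hypothetical poetry vocabulary; the element names <gi>poem</gi>, <gi>stanza</gi>, and <gi>l</gi> and the entity <code>poet</code> are invented for this illustration and belong to no published DTD. It uses the declaration syntax that XML later inherited from SGML, places the DTD in the document’s own DOCTYPE declaration, and follows it with a document instance that a validating parser should accept.</p>
          <eg><![CDATA[<!DOCTYPE poem [
  <!-- element names and content models: one title, then one or more stanzas -->
  <!ELEMENT poem    (title, stanza+)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT stanza  (l+)>
  <!-- mixed content: text, optionally interspersed with foreign phrases -->
  <!ELEMENT l       (#PCDATA | foreign)*>
  <!ELEMENT foreign (#PCDATA)>
  <!-- attribute names and a default value -->
  <!ATTLIST stanza  n    CDATA           #IMPLIED
                    type (verse|refrain) "verse">
  <!-- a reusable piece of data, referenced below as &poet; -->
  <!ENTITY  poet "Anonymous">
]>
<poem>
  <title>Untitled, by &poet;</title>
  <stanza n="1">
    <l>A first line of verse,</l>
    <l>and a <foreign>mot juste</foreign> in the second.</l>
  </stanza>
</poem>]]></eg>
          <p>A validating parser would reject this instance if, for example, the <gi>title</gi> were moved after the <gi>stanza</gi>, because the content model <code>(title, stanza+)</code> fixes their order.</p>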
          <p>A markup scheme with all these qualities was exactly what the humanities were looking for in their quest for a descriptive encoding standard for the preparation and interchange of electronic texts for scholarly research. There was a strong consensus among humanities computing scholars that SGML offered a better foundation for research-oriented text encoding than other such schemes (<ref type="bibl" target="#barnard1988">Barnard, Fraser, and Logan 1988, 26–31</ref>; <ref type="bibl" target="#barnard1988b">Barnard et al. 1988</ref>). From the beginning, however, SGML was also criticised on at least two counts: its hierarchical perspective on text, i.e., the representation of text as a hierarchical tree structure, and its verbose markup system (<ref type="bibl" target="#barnard1988b">Barnard et al. 1988</ref>). These two issues have since been central to the theoretical and educational debates on markup languages in the humanities.</p>
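          <p>The first of these problems can be illustrated with a hypothetical fragment of verse in which a sentence (<gi>s</gi>) begins in one verse line (<gi>l</gi>) and ends in the next, so that the two hierarchies overlap. Because every element must be closed before its parent is, this encoding is not well-formed and a conforming parser will refuse it.</p>
          <eg><![CDATA[<!-- NOT well-formed: <s> opens inside the first <l> but closes inside the second -->
<l>The lamps were lit. <s>Along the quay</l>
<l>the boats came in.</s> The tide was high.</l>]]></eg>
          <p>One common workaround, shown here as one option among several rather than as the solution Barnard et al. proposed, keeps one hierarchy as elements and reduces the other to empty milestone elements, such as TEI’s <gi>lb</gi> (line beginning):</p>
          <egXML xmlns="http://www.tei-c.org/ns/Examples">
            <!-- well-formed: sentences are elements, line beginnings are empty milestones -->
            <p>
              <lb/><s>The lamps were lit.</s> <s>Along the quay
              <lb/>the boats came in.</s> <s>The tide was high.</s>
            </p>
          </egXML>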
        </div>
        <div xml:id="XML">
          <head>The eXtensible Markup Language (XML)</head>
          <p>The publication of the eXtensible Markup Language (XML) 1.0 as a W3C recommendation in 1998 (<ref type="bibl" target="#bray1998">Bray, Paoli, and Sperberg-McQueen 1998</ref>) brought together the best features of SGML and HTML, and XML soon achieved huge popularity. Among the strengths XML borrowed from SGML are the explicitness of descriptive markup, the expressive power of hierarchical models, the extensibility of markup languages, and the possibility of validating a document against a DTD. From HTML it borrowed simplicity and the possibility of working without a DTD. Technically speaking, XML is a subset of SGML, and the recommendation was developed by a group of people with long-standing experience in SGML, many of whom were TEI members.</p>
          <p>Because of its advantages and widespread popularity, XML became the metalanguage of choice for expressing the rules for descriptive text encoding in TEI.</p>
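          <p>For illustration, a minimal sketch of a TEI document in XML follows. Its header contains the parts that TEI P5 requires inside <gi>fileDesc</gi> (<gi>titleStmt</gi>, <gi>publicationStmt</gi>, and <gi>sourceDesc</gi>), while the title and text are invented placeholders. It carries no DOCTYPE declaration: an XML document only has to be well-formed to be processed, and validating it against a TEI schema is a separate step. As a stand-alone file, its root <gi>TEI</gi> element would also declare the TEI namespace, <code>http://www.tei-c.org/ns/1.0</code>.</p>
          <egXML xmlns="http://www.tei-c.org/ns/Examples">
            <TEI>
              <teiHeader>
                <fileDesc>
                  <titleStmt>
                    <title>A Minimal TEI Document</title>
                  </titleStmt>
                  <publicationStmt>
                    <p>Unpublished example.</p>
                  </publicationStmt>
                  <sourceDesc>
                    <p>Born digital.</p>
                  </sourceDesc>
                </fileDesc>
              </teiHeader>
              <text>
                <body>
                  <p>The descriptively encoded text goes here.</p>
                </body>
              </text>
            </TEI>
          </egXML>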
        </div>
      </div>
        </body>
    <back>
      <div type="bibliography">
        <listBibl>
          <bibl xml:id="barnard1988">
                        <author>Barnard, David T.</author>, <author>Cheryl A. Fraser</author>, and <author>George M. Logan</author>. <date>1988</date>. <title level="a">Generalized Markup for Literary Texts</title>. <title level="j">Literary and Linguistic Computing</title> <biblScope unit="volume">3</biblScope> (<biblScope unit="issue">1</biblScope>): <biblScope unit="page">26–31</biblScope>. <idno type="DOI">10.1093/llc/3.1.26</idno>.</bibl>
          <bibl xml:id="barnard1988b">
                        <author>Barnard, David T.</author>, <author>Ron Hayter</author>, <author>Maria Karababa</author>, <author>George M. Logan</author>, and <author>John McFadden</author>. <date>1988</date>. <title level="a">SGML-Based Markup for Literary Texts: Two Problems and Some Solutions</title>. <title level="j">Computers and the Humanities</title> <biblScope unit="volume">22</biblScope> (<biblScope unit="issue">4</biblScope>): <biblScope unit="page">265–276</biblScope>.</bibl>
          <bibl xml:id="berkowitz1986">
                        <author>Berkowitz, Luci</author>, <author>Karl A. Squitier</author>, and <author>William H. A. Johnson</author>. <date>1986</date>. <title level="m">Thesaurus Linguae Graecae, Canon of Greek Authors and Works.</title> <pubPlace>New York/Oxford</pubPlace>: <publisher>Oxford University Press</publisher>.</bibl>
          <bibl xml:id="bray1998">
                        <editor>Bray, Tim</editor>, <editor>Jean Paoli</editor>, and <editor>C. M. Sperberg-McQueen</editor> (eds.). <date>1998</date>. <title level="m">Extensible Markup Language (XML) 1.0</title>. W3C Recommendation 10 February 1998. <ptr target="http://www.w3.org/TR/1998/REC-xml-19980210"/> (accessed September 2008).</bibl>
          <bibl xml:id="burnard1988">
                        <author>Burnard, Lou</author>. <date>1988</date>. <title level="a">Report of Workshop on Text Encoding Guidelines</title>. <title level="j">Literary and Linguistic Computing</title> <biblScope unit="volume">3</biblScope> (<biblScope unit="issue">2</biblScope>): <biblScope unit="page">131–133</biblScope>. <idno type="DOI">10.1093/llc/3.2.131</idno>.</bibl>
          <bibl xml:id="burnard2006">
                        <author>Burnard, Lou</author>, and <author>C. M. Sperberg-McQueen</author>. <date>2006</date>. <title level="u">TEI Lite: Encoding for Interchange: an introduction to the TEI Revised for TEI P5 release</title>. February 2006 <ptr target="https://tei-c.org/release/doc/tei-p5-exemplars/html/tei_lite.doc.html"/>.</bibl>
          <bibl xml:id="derose1999">
                        <author>DeRose, Steven J.</author> <date>1999</date>. <title level="a">XML and the TEI</title>. <title level="j">Computers and the Humanities</title> <biblScope unit="volume">33</biblScope> (<biblScope unit="issue">1–2</biblScope>): <biblScope unit="page">11–30</biblScope>.</bibl>
          <bibl xml:id="goldfarb1990">
                        <author>Goldfarb, Charles F.</author> <date>1990</date>. <title level="m">The SGML Handbook</title>. <pubPlace>Oxford</pubPlace>: <publisher>Clarendon Press</publisher>.</bibl>
          <bibl xml:id="hockey1980">
                        <author>Hockey, Susan</author>. <date>1980</date>. <title level="m">Oxford Concordance Program Users’ Manual</title>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford University Computing Service</publisher>.</bibl>
          <bibl xml:id="ide1988">
                        <author>Ide, Nancy M.</author>, and <author>C. M. Sperberg-McQueen</author>. <date>1988</date>. <title level="a">Development of a Standard for Encoding Literary and Linguistic Materials</title>. In <title level="m">Cologne Computer Conference 1988. Uses of the Computer in the Humanities and Social Sciences. Volume of Abstracts.</title> Cologne, Germany, Sept 7–10 1988, p. <biblScope unit="page">E.6-3-4</biblScope>.</bibl>
          <bibl xml:id="ide1995">
                        <author>Ide, Nancy M.</author>, and <author>C. M. Sperberg-McQueen</author>. <date>1995</date>. <title level="a">The TEI: History, Goals, and Future</title>. <title level="j">Computers and the Humanities</title> <biblScope unit="volume">29</biblScope> (<biblScope unit="issue">1</biblScope>): <biblScope unit="page">5–15</biblScope>.</bibl>
          <bibl xml:id="kay1967">
                        <author>Kay, Martin</author>. <date>1967</date>. <title level="a">Standards for Encoding Data in a Natural Language</title>. <title level="j">Computers and the Humanities</title> <biblScope unit="volume">1</biblScope> (<biblScope unit="issue">5</biblScope>): <biblScope unit="page">170–177</biblScope>.</bibl>
          <bibl xml:id="lancashire1996">
                        <author>Lancashire, Ian</author>, <author>John Bradley</author>, <author>Willard McCarty</author>, <author>Michael Stairs</author>, and <author>Terence Russon Woolridge</author>. <date>1996</date>. <title level="m">Using TACT with Electronic Texts</title>. <pubPlace>New York</pubPlace>: <publisher>Modern Language Association of America</publisher>.</bibl>
          <bibl xml:id="russel1967">
                        <author>Russel, D. B.</author> <date>1967</date>. <title level="m">COCOA: A Word Count and Concordance Generator for Atlas</title>. <pubPlace>Chilton</pubPlace>: <publisher>Atlas Computer Laboratory</publisher>.</bibl>
          <bibl xml:id="msmq1991">
                        <author>Sperberg-McQueen, C. M.</author> <date>1991</date>. <title level="a">Text in the Electronic Age: Textual Study and Text Encoding with examples from Medieval Texts</title>. <title level="j">Literary and Linguistic Computing</title> <biblScope unit="volume">6</biblScope> (<biblScope unit="issue">1</biblScope>): <biblScope unit="page">34–46</biblScope>. <idno type="DOI">10.1093/llc/6.1.34</idno>.</bibl>
          <bibl xml:id="msmq1990">
                        <editor>Sperberg-McQueen, C. M.</editor>, and <editor>Lou Burnard</editor> (eds.). <date>1990</date>. <title level="m">TEI P1: Guidelines for the Encoding and Interchange of Machine Readable Texts</title>. <pubPlace>Chicago/Oxford</pubPlace>: <publisher>ACH-ALLC-ACL Text Encoding Initiative</publisher>. <ptr target="https://tei-c.org/Vault/Vault-GL.html"/> (accessed October 2008).</bibl>
          <bibl xml:id="msmq1993">
                        <editor>Sperberg-McQueen, C. M.</editor>, and <editor>Lou Burnard</editor> (eds.). <date>1993</date>. <title level="m">TEI P2 Guidelines for the Encoding and Interchange of Machine Readable Texts</title>. Draft P2 (published serially 1992–1993); Draft Version 2 of April 1993: 19 chapters. <ptr target="https://tei-c.org/Vault/Vault-GL.html"/> (accessed October 2008).</bibl>
          <bibl xml:id="msmq1994">
                        <editor>Sperberg-McQueen, C. M.</editor>, and <editor>Lou Burnard</editor> (eds.). <date>1994</date>. <title level="m">Guidelines for Electronic Text Encoding and Interchange. TEI P3.</title> <pubPlace>Oxford, Providence, Charlottesville, Bergen</pubPlace>: <publisher>Text Encoding Initiative</publisher>.</bibl>
          <bibl xml:id="msmq1999">
                        <editor>Sperberg-McQueen, C. M.</editor>, and <editor>Lou Burnard</editor> (eds.). <date>1999</date>. <title level="m">Guidelines for Electronic Text Encoding and Interchange. TEI P3. Revised reprint.</title> <pubPlace>Oxford, Providence, Charlottesville, Bergen</pubPlace>: <publisher>Text Encoding Initiative</publisher>.</bibl>
          <bibl xml:id="msmq2002">
                        <editor>Sperberg-McQueen, C. M.</editor>, and <editor>Lou Burnard</editor> (eds.). <date>2002</date>. <title level="m">TEI P4: Guidelines for Electronic Text Encoding and Interchange. XML-compatible edition.</title> XML conversion by Syd Bauman, Lou Burnard, Steven DeRose, and Sebastian Rahtz. <pubPlace>Oxford, Providence, Charlottesville, Bergen</pubPlace>: <publisher>Text Encoding Initiative Consortium</publisher>. <ptr target="https://tei-c.org/Vault/P4/doc/html/"/> (accessed October 2008).</bibl>
          <bibl xml:id="tei2007">
                        <orgName>TEI Consortium</orgName>. <date>2007</date>. <title level="m">TEI P5: Guidelines for Electronic Text Encoding and Interchange</title>. <pubPlace>Oxford, Providence, Charlottesville, Nancy</pubPlace>: <publisher>TEI Consortium</publisher>. <ptr target="https://tei-c.org/Vault/P5/1.0.0/doc/tei-p5-doc/en/html/"/> (accessed October 2008).</bibl>
        </listBibl>
      </div>
    </back>
  </text>
  <!-- 
        $Date: 2020-07-08 02:33:20 +0200 (Wed, 08 Jul 2020) $
        $Id: TBED00v00.xml 425 2020-07-08 00:33:20Z ron.vandenbranden $  -->
</TEI>