<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">TEI by Example</title>
        <title type="sub">Module 7: Critical Editing</title>
        <author xml:id="RvdB">Ron Van den Branden</author>
        <editor xml:id="EV">Edward Vanhoutte</editor>
        <editor xml:id="MT">Melissa Terras</editor>
        <sponsor>Association for Literary and Linguistic Computing (ALLC)</sponsor>
        <sponsor>Centre for Data, Culture and Society, University of Edinburgh, UK</sponsor> 
        <sponsor>Centre for Digital Humanities (CDH), University College London, UK</sponsor>
        <sponsor>Centre for Computing in the Humanities (CCH), King’s College London, UK</sponsor>
        <sponsor>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</sponsor>
        <funder>
          <address>
            <addrLine>Centre for Scholarly Editing and Document Studies (CTB)</addrLine>
            <addrLine>Royal Academy of Dutch Language and Literature</addrLine>
            <addrLine>Koningstraat 18</addrLine>
            <addrLine>9000 Gent</addrLine>
            <addrLine>Belgium</addrLine>
          </address>
          <email>ctb@kantl.be</email>
        </funder>
        <principal>Edward Vanhoutte</principal>
        <principal>Melissa Terras</principal>
      </titleStmt>
      <publicationStmt>
        <publisher>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</publisher>
        <distributor>Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium</distributor>
        <pubPlace>Gent</pubPlace>
        <address>
          <addrLine>Centre for Scholarly Editing and Document Studies (CTB)</addrLine>
          <addrLine>Royal Academy of Dutch Language and Literature</addrLine>
          <addrLine>Koningstraat 18</addrLine>
          <addrLine>9000 Gent</addrLine>
          <addrLine>Belgium</addrLine>
        </address>
        <availability status="free">
          <p>Licensed under a <ref target="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution ShareAlike 3.0 License</ref>.</p>
        </availability>
        <date when="2010-07-09">9 July 2010</date>
      </publicationStmt>
      <seriesStmt>
        <title>TEI By Example.</title>
        <respStmt>
          <name>Edward Vanhoutte</name>
          <resp>editor</resp>
        </respStmt>
        <respStmt>
          <name>Ron Van den Branden</name>
          <resp>editor</resp>
        </respStmt>
        <respStmt>
          <name>Melissa Terras</name>
          <resp>editor</resp>
        </respStmt>
      </seriesStmt>
      <sourceDesc>
        <p>Digitally born</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>TEI By Example offers a series of freely available online tutorials walking individuals through the different stages in marking up a document in TEI (Text Encoding Initiative). Besides a general introduction to text encoding, step-by-step tutorial modules provide example-based introductions to eight different aspects of electronic text markup for the humanities. Each tutorial module is accompanied by a dedicated examples section, illustrating actual TEI encoding practice with real-life examples. The theory of the tutorial modules can be tested in interactive tests and exercises.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="en-GB">en-GB</language>
      </langUsage>
    </profileDesc>
    <revisionDesc>
      <change when="2020-07-02" who="#RvdB">proofing corrections</change>
      <change when="2020-06-12" who="#RvdB">technical revision</change>
      <change when="2010-07-13" who="#RvdB">
                <list>
                    <item>added distinction <gi>gi</gi> — <tag>gi scheme="..."</tag> — <gi>tag</gi>
                    </item>
        <item>final spellcheck</item>
                </list>
            </change>
      <change when="2010-07-09" who="#RvdB">release</change>    
      <change when="2009-09-28" who="#RvdB">authoring</change>
    </revisionDesc>
  </teiHeader>
  <text xml:id="TBED07v00" type="tutorials">
    <body>
            <div xml:id="appCaveat">
        <head>Caveats</head>
        <p>Until the release of <ref target="https://tei-c.org/Vault/P5/2.9.1/doc/tei-p5-doc/en/html/">version 2.9.1</ref> of the TEI Guidelines in 2015, <gi>lem</gi> and <gi>rdg</gi> could only contain phrase-level elements. This long caused problems for variants that involve larger structural units. Since version 2.9.1, however, <gi>lem</gi> and <gi>rdg</gi> can also contain chunk-level elements such as <gi>div</gi>, <gi>p</gi>, <gi>ab</gi>, <gi>lg</gi>, and <gi>l</gi>. This addition has made <gi>lem</gi> and <gi>rdg</gi> considerably more useful for encoding real-life textual variation.</p>
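        <p>As a minimal sketch of what this content model allows, consider a whole paragraph that is present in one witness and absent from another; it can be recorded as a single reading. The witnesses <ident>A</ident> and <ident>B</ident> and the text below are hypothetical and are not part of the example material of this module.
          <figure>
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <!-- hypothetical witnesses A and B, invented text -->
              <p>This paragraph occurs in both witnesses.</p>
              <app>
                <rdg wit="#A">
                  <p>This paragraph occurs in witness A only.</p>
                </rdg>
                <rdg wit="#B"/>
              </app>
            </egXML>
            <head type="legend">A chunk-level element inside <gi>rdg</gi> (hypothetical witnesses).</head>
          </figure>
        </p>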
        <p>One tough problem remains, however, when textual variation occurs on a structural level. For example, if you look closely at the facsimiles of the TEI Guidelines above (see <ptr type="crossref" target="#figure1 #figure2 #figure3 #figure4"/>), you’ll notice that there is a paragraph shift at the sentence starting with <q>Historically, the word markup has been used</q>:
          <list rend="bulleted">
            <item>in the <ident>p2</ident> and <ident>p3</ident> versions, this sentence starts the third paragraph</item>
            <item>in the <ident>p4</ident> and <ident>p5</ident> versions, this sentence is part of the second paragraph</item>
          </list>
        </p>
        <p>This poses a harder encoding problem, as it involves the markup itself (i.e., the end tag of the second paragraph and the start tag of the third are themselves subject to variation). As XML requires proper nesting of elements, this is a problem in any XML representation of this kind of structural variation. Again, two strategies could be followed (neither of which is ideal, however): 
          <list rend="bulleted">
            <item>Encode structural variants as variant structures. However, this may obscure their alignment.</item>
            <item>Encode structural variants using milestone elements instead of full-blown XML structures. However, depending on your view of the texts, this could be considered a less orthodox approach, as it implies some notion of a base text that determines the encoding of the others.</item>
          </list>
        </p>
        <p>The first option would compare the individual transcriptions of these text witnesses, some of which spread more or less the same textual contents over three paragraphs, while others use only two. In a parallel segmented apparatus, this might look as follows:
          <figure xml:id="example29">
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <app>
                <rdg wit="#p2 #p3">
                  <p>SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a <app>
                                            <rdg wit="#p2">metalanguage</rdg>
                                            <rdg wit="#p3">
                                                <hi>metalanguage</hi>
                                            </rdg>
                                        </app>, that is, a means of formally describing a language, in this case, a <app>
                                            <rdg wit="#p2">markup language</rdg>
                                            <rdg wit="#p3">
                                                <hi>markup language</hi>
                                            </rdg>
                                        </app>. Before going any further we should define these terms.</p>
                </rdg>
                <rdg wit="#p4 #p5"/>
              </app>
              <p>
                                <app>
                                    <rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly, XML is a <hi>metalanguage</hi>, that is, a means of formally describing a language, in this case, a <hi>markup language</hi>.</rdg>
                                    <rdg wit="#p5">Strictly speaking, XML is a metalanguage, that is, a language used to describe other languages, in this case, markup languages.</rdg>
                                    <rdg wit="#p2 #p3"/>
                                </app>Historically, the word<!-- ... -->
              </p>
            </egXML>
            <head type="legend">Encoding structural variants as variant structures.</head>
          </figure>
        </p>
        <p>This approach treats the shifting paragraph as a variant in its own right, which is present in some witnesses (<ident>p2</ident> and <ident>p3</ident>) and absent from the others (<ident>p4</ident> and <ident>p5</ident>). The second apparatus entry then omits the text of <ident>p2</ident> and <ident>p3</ident>, while including the corresponding text of <ident>p4</ident> and <ident>p5</ident>. However, as this example illustrates, the alignment of the corresponding text fragments between both groups of witnesses (those starting a new paragraph and those that don’t) is lost: there is no way of telling how the phrases <q>SGML is an international standard <gap/>. More exactly, SGML <gap/></q> (in <ident>p2</ident> and <ident>p3</ident>) and <q>XML is an extensible markup language <gap/>. More exactly, XML <gap/></q> (in <ident>p4</ident>) correspond. This kind of encoding could be less problematic when <emph>generating</emph> an electronic critical edition, in which case the more complicated apparatus encoding could be produced by an automatic collation routine. When <emph>creating</emph> a digital edition, the construction of such a more complex apparatus entry could be less desirable.</p>
        <p>The other solution would be to encode the paragraph break in the <ident>p2</ident> and <ident>p3</ident> versions using an empty <soCalled>milestone</soCalled> marker: an empty element that indicates some kind of structural boundary in the text where it occurs, as in this parallel segmented example:
          <figure xml:id="example30">
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <p>
                                <app>
                                    <rdg wit="#p2 #p3">SGML is an international standard for the description of marked-up electronic text. More exactly</rdg>
                                    <rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly</rdg>
                                    <rdg wit="#p5">Strictly speaking</rdg>
                                </app>, <app>
                                    <rdg wit="#p2 #p3">SGML</rdg>
                                    <rdg wit="#p4 #p5">XML</rdg>
                                </app> is a <app>
                                    <rdg wit="#p2 #p5">metalanguage</rdg>
                                    <rdg wit="#p3 #p4">
                                        <hi>metalanguage</hi>
                                    </rdg>
                                </app>, that is, a <app>
                                    <rdg wit="#p2 #p3 #p4">means of formally describing a language</rdg>
                                    <rdg wit="#p5">language used to describe other languages</rdg>
                                </app>, in this case, <app>
                                    <rdg wit="#p2">a markup language</rdg>
                                    <rdg wit="#p3 #p4">a <hi>markup language</hi>
                                    </rdg>
                                    <rdg wit="#p5">markup languages</rdg>
                                </app>. <app>
                                    <rdg wit="#p2 #p3">Before going any further we should define these terms. <milestone unit="p"/>
                                    </rdg>
                                    <rdg wit="#p4 #p5"/>
                                </app>Historically, the word <!-- ... -->
                            </p>
            </egXML>
            <head type="legend">Encoding structural variation with <soCalled>milestone</soCalled> markers.</head>
          </figure>
        </p>
        <p>Since the milestone paragraph boundary marker (<tag type="empty">milestone unit="p"</tag>) removes the intrusive XML boundaries, it allows us to compare the text across all versions. However, this implies that the encoding of the third paragraph in the <ident>p2</ident> and <ident>p3</ident> versions is <emph>suppressed</emph>, in contrast to the other paragraphs in these text versions. This could be less of a problem when <emph>creating</emph> an electronic critical edition than when generating one. In the latter case, the milestone encoding would reflect a dependency on a base text (one that does not have the paragraph break). Moreover, it presupposes some kind of structural alignment prior to the encoding of the individual texts.</p>
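        <p>As a sketch of what this suppression means in practice, the following shows the paragraph structure a processor would have to rebuild for <ident>p2</ident> from the milestone encoding above: the <ident>p2</ident> reading is selected in each apparatus entry, and the <tag type="empty">milestone unit="p"</tag> marker is turned back into a paragraph boundary. This is a hand-made illustration, not the output of an actual stylesheet, and the elided continuation of the text is left elided.
          <figure>
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <!-- hand-reconstructed reading of witness p2; not generated output -->
              <p>SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language. Before going any further we should define these terms.</p>
              <p>Historically, the word <!-- ... --></p>
            </egXML>
            <head type="legend">The <ident>p2</ident> reading with the milestone restored to a paragraph boundary (illustrative reconstruction).</head>
          </figure>
        </p>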
        <note type="summary">Problems can arise when the variation also involves text structures, as this leads to overlapping XML structures. This can be avoided either by ignoring the possible alignment of such structures in the apparatus, or by representing some structural boundaries with empty milestone elements.</note>
      </div>
        </body>
    <back>
      <div type="bibliography">
        <listBibl>
          <bibl xml:id="vanhoutte2009">
                        <author>Vanhoutte, Edward</author>, and <author>Ron Van den Branden</author>. <date>2009</date>. <title level="a">Describing, Transcribing, Encoding, and Editing Modern Correspondence Material: a Textbase Approach</title>. <title level="j">Literary and Linguistic Computing</title> <biblScope unit="volume">24</biblScope> (<biblScope unit="issue">1</biblScope>): <biblScope unit="page">77–98</biblScope>. <idno type="DOI">10.1093/llc/fqn035</idno>.</bibl>
        </listBibl>
      </div>
    </back>
  </text>
  <!--
    $Date: 2020-11-16 12:48:08 +0100 (Mon, 16 Nov 2020) $
    $Id: TBED07v00.xml 462 2020-11-16 11:48:08Z ron.vandenbranden $  -->
</TEI>