Markup by way of example

The service provider Editura GmbH produces one TEI file for every volume whose text has been captured. These TEI files are supplemented with structural data. In a two-step process this data is then refined further, that is, made fit for scholarly use. In addition to the TEI data, a METS file is produced. This container format holds structural data and is needed for presentation in Goobi as well as in the DFG-Viewer. The tagging strategies are explained below using three examples.

1) Registry

At the end of every volume, the "Polytechnisches Journal" contains an extensive index of persons and subjects. In addition, four so-called 'real indices' were published, each covering about 40 issues and functioning as a subject index.
These indexes are likewise captured through OCR and tagged.
Later in the process they will be refined further and merged into the following TEI-based registry.

The TEI-based Registry by Name

For every person named in the journal, an entry is created in the central administrative XML file. In accordance with the TEI guidelines, these entries have the following structure:

<person xml:id="pers00033">
    <persName>
        <roleName>Prof. Dr.</roleName>
        <surname>Schönbein</surname>
        <forename>Christian Friedrich</forename>
    </persName>
    <birth>
        <date when="1799-10-18"/>
        <placeName>Metzingen bei Reutlingen</placeName>
    </birth>
    <death>
        <date when="1868-08-29"/>
        <placeName>Bad Wildbad (Schwarzwald)</placeName>
    </death>
    <occupation>
        <ref target="http://mdz10.bib-bvb.de/~db/bsb00008390/images/index.html?seite=258">ADB|NDB</ref>
    </occupation>
</person>

The xml:id gives each entry a unique project ID. Whenever this person is mentioned in the journal, the mention is linked to the corresponding record with ref="&persons;#pers00033". Where possible, other sources with further information on the person (e.g. the ADB, Allgemeine Deutsche Biographie, or the NDB, Neue Deutsche Biographie) are added to the registry. In the future, each record is to be completed with the corresponding PND (Personennamendatei) identifier.
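The linking mechanism described above can be illustrated with a minimal Python sketch. It is not the project's own tooling: the wrapper element `listPerson`, the absence of a TEI namespace, and the function names are assumptions made for the example. The sketch indexes the person entries by their xml:id and resolves a reference such as `&persons;#pers00033` by taking the part after the `#`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shortened copy of the entry above; the <listPerson> wrapper is an assumption.
PERSONS_XML = """
<listPerson>
  <person xml:id="pers00033">
    <persName>
      <roleName>Prof. Dr.</roleName>
      <surname>Schönbein</surname>
      <forename>Christian Friedrich</forename>
    </persName>
    <birth>
      <date when="1799-10-18"/>
      <placeName>Metzingen bei Reutlingen</placeName>
    </birth>
  </person>
</listPerson>
"""


def build_person_index(xml_text):
    """Map every xml:id to its <person> element."""
    root = ET.fromstring(xml_text)
    return {person.get(XML_ID): person for person in root.iter("person")}


def resolve_person(ref, index):
    """Resolve a reference like '&persons;#pers00033'; None if unknown."""
    person_id = ref.rsplit("#", 1)[-1]
    return index.get(person_id)


def display_name(person):
    """Join role, forename and surname, skipping missing parts."""
    name = person.find("persName")
    parts = [name.findtext(tag) for tag in ("roleName", "forename", "surname")]
    return " ".join(part.strip() for part in parts if part)


index = build_person_index(PERSONS_XML)
person = resolve_person("&persons;#pers00033", index)
print(display_name(person))
```

Keeping the index keyed by xml:id mirrors the idea that each person has exactly one record, however often he or she is mentioned in the text.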

Sources

A similar procedure is applied to the sources named in the "Polytechnisches Journal". The journal was a reference publication, and its editors cited their sources carefully. After initial editing, a typical source statement in the journal might look like this:

<titlePart type="sub" rendition="#center">
    Aus dem <hi rendition="#wide">Mechanics' Magazine</hi>
    <hi rendition="#roman">N.</hi> 441. S. 290.
</titlePart>


A different strategy marks the role of each text element. In this example the information is bibliographic: the title of a journal, a specific issue, and a page number are named.

According to TEI P5, the structure would be as follows:


<titlePart type="sub" rendition="#center">
    Aus dem
    <bibl type="source">
        <title level="j" ref="&journals;#jour0011">Mechanics' Magazine</title>
        <biblScope type="iss">N. 441.</biblScope>
        <biblScope type="pp">S. 290.</biblScope>
    </bibl>
</titlePart>

In addition, an entry is added to an XML file that serves to manage all sources centrally and unambiguously:

<bibl xml:id="jour0011">
    <title level="j">Mechanics' Magazine: museum, register, journal, and gazette</title>
    <pubPlace>
        <country>England</country>
    </pubPlace>
    <date from="1823" to="1873">1823-1873</date>
    <ref target="#ZDB-ID">423434-0</ref>
    <ref target="e-journal">http://rzblx1.uni-regensburg.de/ezeit/?2248618</ref>
</bibl>
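How a reference such as `&journals;#jour0011` might be resolved against the central source file can be sketched in Python as follows. This is an illustration under assumptions, not the project's implementation: the `listBibl` wrapper, the missing TEI namespace, and the names `load_sources` and `describe_source` are invented for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Copy of the central entry above; the <listBibl> wrapper is an assumption.
SOURCES_XML = """
<listBibl>
  <bibl xml:id="jour0011">
    <title level="j">Mechanics' Magazine: museum, register, journal, and gazette</title>
    <pubPlace><country>England</country></pubPlace>
    <date from="1823" to="1873">1823-1873</date>
    <ref target="#ZDB-ID">423434-0</ref>
  </bibl>
</listBibl>
"""


def load_sources(xml_text):
    """Map every xml:id to its <bibl> element."""
    root = ET.fromstring(xml_text)
    return {bibl.get(XML_ID): bibl for bibl in root.iter("bibl")}


def describe_source(ref, sources):
    """Collect the central data for a reference like '&journals;#jour0011'."""
    bibl = sources[ref.rsplit("#", 1)[-1]]
    date = bibl.find("date")
    return {
        "title": bibl.findtext("title"),
        "country": bibl.findtext("pubPlace/country"),
        "years": (int(date.get("from")), int(date.get("to"))),
        "zdb_id": bibl.findtext("ref[@target='#ZDB-ID']"),
    }


info = describe_source("&journals;#jour0011", load_sources(SOURCES_XML))
print(info["title"], info["years"], info["zdb_id"])
```

Because the short form in the running text only carries the identifier, corrections to the full title or the ZDB-ID need to be made in one place only.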

2) Links inside the "Polytechnisches Journal"

The editors of the "Polytechnisches Journal" tried to assist their readers by structuring and connecting the heterogeneous topics covered in the journal. Every issue therefore contains a large number of references to articles on related subjects.
During scholarly processing, the digitized material is searched specifically for these references, which are tagged and given a link that appears as a hyperlink in the text view.
This process now runs automatically using a rather complex find-and-replace pattern. We applied the following search pattern, which is based on a regular expression.

 

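my $pat = qr/
    (                                       # group 1: lead-in ending in "Journal"
        (?:\s+)
        (?:
            (?:polyt[^<]*)
            |
            (?:diesem)
            |
            (?:unser[^<]*)
            \s+
        )?
        journ(?:ale?)?
        (?:<\/hi>)?
        [,.]?\s+
    )
    (                                       # group 2: the reference itself
        [BV](?:an)?d?\.?\s+                 # "Bd.", "Band", ...
        (?:<hi\s+rendition="\#roman"\s*>)?
        (                                   # group 3: volume number
            (?:[ivxlcdm]{1,8})              # Roman numerals
            |
            (?:[1-9][0-9]{0,2})             # Arabic numerals
        )
        (?:<\/hi>)?
        [,.]*
        (?:
            \s+Heft\s+\d[.,]?               # optional issue number
        )?
        \s+
        S(?:eite)?\.?\s+(\d{1,3})           # group 4: page number
        (?!<\/ref>)                         # not directly followed by a closing ref tag
    )/isx;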


This pattern searches for occurrences of phrases such as »polyt. Journal Bd. VI, S. 342«. The regular expression also finds the other possible variants of this expression: for example, the volume number does not have to be written in Roman numerals, and the exact issue does not have to be named.
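To make the behaviour of the pattern easier to try out, here is a hedged Python port of it, reading the issue part as a digit followed by an optional period or comma. The helper names `volume_number` and `find_cross_refs` are assumptions for illustration; the sketch only extracts volume and page and does not reproduce the project's actual replacement step or link targets.

```python
import re

# Port of the Perl pattern to Python (re.VERBOSE corresponds to the x flag).
CROSS_REF = re.compile(
    r"""
    (                                   # group 1: lead-in ending in "Journal"
        \s+
        (?: (?:polyt[^<]*) | diesem | (?:unser[^<]*) \s+ )?
        journ(?:ale?)?
        (?:</hi>)?
        [,.]?\s+
    )
    (                                   # group 2: the reference itself
        [BV](?:an)?d?\.?\s+
        (?:<hi\s+rendition="\#roman"\s*>)?
        ( [ivxlcdm]{1,8} | [1-9][0-9]{0,2} )    # group 3: volume number
        (?:</hi>)?
        [,.]*
        (?: \s+Heft\s+\d[.,]? )?
        \s+
        S(?:eite)?\.?\s+(\d{1,3})       # group 4: page number
        (?!</ref>)
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def volume_number(token):
    """Convert an Arabic or Roman volume token to an integer."""
    if token.isdigit():
        return int(token)
    values = [ROMAN[ch] for ch in token.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def find_cross_refs(text):
    """Return the matched reference, volume and page for every hit."""
    return [
        {
            "match": m.group(2),
            "volume": volume_number(m.group(3)),
            "page": int(m.group(4)),
        }
        for m in CROSS_REF.finditer(text)
    ]


print(find_cross_refs("Vgl. polyt. Journal Bd. VI, S. 342."))
```

Normalizing the volume to an integer means that Roman and Arabic spellings of the same volume can later be mapped to the same target.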

3) Text-Image Relations

The "Polytechnisches Journal" was a reference publication intended to be accessible to a broad audience. Considerable attention was paid to this aim, and as a result the journal was furnished with high-quality, colored illustrations on the last few pages of every issue.

These plates are made accessible through links in the text.


<titlePart type="sub" rendition="#center">
    Mit Abbildungen auf
    <ref target="#tab044493">Tab. V</ref>.
</titlePart>
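As a small illustration of how such links could be collected for display, the following Python sketch lists all plate references in a fragment. It assumes, based only on the single example above, that plate identifiers begin with `tab`; the function name `plate_links` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<titlePart type="sub" rendition="#center">
    Mit Abbildungen auf
    <ref target="#tab044493">Tab. V</ref>.
</titlePart>
"""


def plate_links(xml_text):
    """Return (target id, label) pairs for refs pointing at plates."""
    root = ET.fromstring(xml_text)
    return [
        (ref.get("target")[1:], (ref.text or "").strip())
        for ref in root.iter("ref")
        # Assumption: plate ids start with "tab", as in the example.
        if ref.get("target", "").startswith("#tab")
    ]


links = plate_links(SNIPPET)
print(links)
```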