Semi-automatic collation of medieval texts.
The role of the base manuscript


ESTS 2016 / DiXiT 3 - Antwerp, 5-7 October

Awareness and critical understanding
in using
computational tools


understand the algorithm,
based on the Gothenburg model

semi-automatic collation software,
here Juxta and CollateX

Why?

  • choose the most appropriate application
  • understand the results correctly
  • recognize and make use of the innovative potential

Gothenburg model

"Interedition" - Gothenburg, 2009

Juxta + CollateX


  1. Tokenization
  2. Normalization
  3. Alignment
  4. Analysis and feedback
  5. Visualization
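
The first three steps can be sketched in a few lines of Python. This is only an illustration, not the way Juxta or CollateX implement them: the whitespace tokenizer, the normalization rule (lowercasing and stripping punctuation) and the use of the standard-library `difflib` aligner are all assumptions made for the example, and steps 4 and 5 are reduced to printing the result.

```python
import difflib
import string

def tokenize(text):
    """Step 1: split a witness into tokens (here simply on whitespace)."""
    return text.split()

def normalize(token):
    """Step 2: assumed rule, lowercase and strip surrounding punctuation."""
    return token.lower().strip(string.punctuation)

def align(tokens_a, tokens_b):
    """Step 3: align two token lists by comparing their normalized forms."""
    norm_a = [normalize(t) for t in tokens_a]
    norm_b = [normalize(t) for t in tokens_b]
    matcher = difflib.SequenceMatcher(a=norm_a, b=norm_b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Report the original tokens, not the normalized ones.
        rows.append((tag, tokens_a[i1:i2], tokens_b[j1:j2]))
    return rows

witness_a = "Dalla collina si vede una grande casa rossa."
witness_c = "Dalla collina si vede una piccola casa rossa."
alignment = align(tokenize(witness_a), tokenize(witness_c))

# Steps 4-5 (analysis, visualization) are reduced to a plain printout.
for tag, a_part, c_part in alignment:
    print(f"{tag:8} {' '.join(a_part):28} {' '.join(c_part)}")
```

Keeping normalization separate from tokenization means the comparison can ignore spelling or punctuation noise while the output still shows the original readings.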

Why do we collate?


  • Investigate the varia lectio
  • Understand the relationships among the witnesses

Manual collation

  • Choose a base manuscript, e.g. ms. A
  • Compare all witnesses (B, C, D, ...) with A

Why do we choose a base manuscript?


  • availability of the witnesses
  • time
  • difficulty of recording, organizing and visualizing the variants among all the witnesses


practical reasons

Attention!


base manuscript for collation

≠

base manuscript for critical edition

Is this the best way to collate (when the aim of the collation is to understand the relationships among the witnesses)?

Spencer, Matthew, and Christopher J. Howe. 2004. "Collating Texts Using Progressive Multiple Alignment." Computers and the Humanities 38 (3): 253-70.

Distance between the witnesses and the base manuscript

Distance among all the witnesses

Pairwise alignment


Example

A: Dalla collina si vede una grande casa rossa.
B: Dal belvedere si vede una grande casa azzurra.
C: Dalla collina si vede una piccola casa rossa.
D: Dal belvedere si vedono tante case.

Step 1: Pairwise alignment
using 'A' as the base manuscript.


A Dalla collina si vede una grande casa rossa
B Dal belvedere si vede una grande casa azzurra

Dalla] dal B
collina] belvedere B
rossa] azzurra B

A Dalla collina si vede una grande casa rossa
C Dalla collina si vede una piccola casa rossa

grande] piccola C

A Dalla collina si vede una grande casa rossa
D Dal belvedere si vedono tante case

Dalla] dal D
collina] belvedere D
vede] vedono D
una] tante D
grande] om. D
casa] case D
rossa] om. D

Step 2: the results of the pairwise alignment are merged


Dalla] dal B, D
collina] belvedere B, D
vede] vedono D
una] tante D
grande] piccola C, om. D
casa] case D
rossa] azzurra B, om. D
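
The two steps above can be imitated with a minimal Python sketch. It assumes the standard-library `difflib` aligner, which groups adjacent differing tokens, so its entries are coarser than the word-by-word apparatus shown on the slide; the function names are hypothetical. Note that B, C and D are only ever compared with A, never with each other.

```python
import difflib
from collections import defaultdict

witnesses = {
    "A": "Dalla collina si vede una grande casa rossa",
    "B": "Dal belvedere si vede una grande casa azzurra",
    "C": "Dalla collina si vede una piccola casa rossa",
    "D": "Dal belvedere si vedono tante case",
}

def variants_against_base(base, other):
    """Step 1: pairwise alignment, returning (position, lemma, reading)."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            lemma = " ".join(base[i1:i2]) or "[addition]"
            reading = " ".join(other[j1:j2]) or "om."
            found.append((i1, lemma, reading))
    return found

def collate_with_base(texts, base_siglum):
    """Step 2: merge the pairwise results, keyed by the base text."""
    base = texts[base_siglum].split()
    apparatus = defaultdict(lambda: defaultdict(list))
    for siglum, text in texts.items():
        if siglum == base_siglum:
            continue
        for position, lemma, reading in variants_against_base(base, text.split()):
            apparatus[(position, lemma)][reading].append(siglum)
    return apparatus

apparatus = collate_with_base(witnesses, "A")
for (position, lemma), readings in sorted(apparatus.items()):
    entries = "; ".join(f"{r} {', '.join(s)}" for r, s in readings.items())
    print(f"{lemma}] {entries}")
```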

Distance between the witnesses and the base manuscript

Distance among all witnesses
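
The difference between the two kinds of distance can be made concrete with a small sketch. It assumes a token-level Levenshtein distance as the measure, which is one possible choice among many, and reuses the four example witnesses.

```python
from itertools import combinations

witnesses = {
    "A": "Dalla collina si vede una grande casa rossa",
    "B": "Dal belvedere si vede una grande casa azzurra",
    "C": "Dalla collina si vede una piccola casa rossa",
    "D": "Dal belvedere si vedono tante case",
}

def token_edit_distance(a, b):
    """Levenshtein distance counted in whole tokens, not characters."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]

tokens = {siglum: text.split() for siglum, text in witnesses.items()}

# Distance between each witness and the base manuscript A only.
to_base = {s: token_edit_distance(tokens["A"], tokens[s])
           for s in tokens if s != "A"}

# Distance among all the witnesses.
all_pairs = {(x, y): token_edit_distance(tokens[x], tokens[y])
             for x, y in combinations(sorted(tokens), 2)}

print(to_base)
print(all_pairs)
```

With A as the base, D simply looks far away (7 edits); only the full matrix shows that D is closer to B (5 edits) than to A or C.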

Multiple alignment

Progressive multiple alignment




  1. pairwise alignment
  2. guide tree
  3. order of similarity
  4. alignment following the order of similarity
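
Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Spencer and Howe: the distance is taken as one minus the `difflib` similarity ratio, and the guide tree is built by a greedy average-linkage clustering. Step 4, the actual alignment along the tree, is left out.

```python
import difflib
from itertools import combinations

witnesses = {
    "A": "Dalla collina si vede una grande casa rossa",
    "B": "Dal belvedere si vede una grande casa azzurra",
    "C": "Dalla collina si vede una piccola casa rossa",
    "D": "Dal belvedere si vedono tante case",
}

def distance(a, b):
    """Assumed measure: 1 minus difflib's similarity ratio on tokens."""
    return 1.0 - difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()

def build_guide_tree(tokens):
    # 1. pairwise comparison of all the witnesses
    pairwise = {frozenset((x, y)): distance(tokens[x], tokens[y])
                for x, y in combinations(tokens, 2)}
    # Each cluster is (tree, member sigla); start with one per witness.
    clusters = [(siglum, [siglum]) for siglum in sorted(tokens)]
    merge_order = []

    def linkage(c1, c2):
        """Average distance between the members of two clusters."""
        values = [pairwise[frozenset((x, y))] for x in c1[1] for y in c2[1]]
        return sum(values) / len(values)

    # 2.-3. repeatedly join the two most similar clusters
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: linkage(*pair))
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(((left[0], right[0]), left[1] + right[1]))
        merge_order.append((left[0], right[0]))
    return clusters[0][0], merge_order

tokens = {siglum: text.split() for siglum, text in witnesses.items()}
tree, merge_order = build_guide_tree(tokens)
print(tree)
print(merge_order)
```

For the four example witnesses, A and C are joined first, then B, then D, so the most similar texts would be aligned before the most divergent one.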

Drawbacks




  1. pairwise alignment (expensive calculation)
  2. guide tree
  3. order of similarity (NP-complete problem)
  4. alignment following the order of similarity

Among the drawbacks

Order of the tokens in the super-witness created by the serialization of the graph

Dalla  collina  si    vede  una   grande  piccola  casa  rossa
A, C   A, C     A, C  A, C  A, C  A       C        A, C  A, C

Dalla  collina  si    vede  una   piccola  grande  casa  rossa
A, C   A, C     A, C  A, C  A, C  C        A       A, C  A, C
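
Why both orders are possible can be shown with a small sketch. It assumes a simplified variant graph for witnesses A and C, stored as a plain dictionary of successors, and enumerates every serialization that respects the edges (every topological order). The data structure and function name are hypothetical and do not reproduce CollateX's internals.

```python
# Simplified variant graph for A and C: token -> set of following tokens.
graph = {
    "Dalla": {"collina"},
    "collina": {"si"},
    "si": {"vede"},
    "vede": {"una"},
    "una": {"grande", "piccola"},   # the two witnesses diverge here
    "grande": {"casa"},             # reading of A
    "piccola": {"casa"},            # reading of C
    "casa": {"rossa"},
    "rossa": set(),
}

def serializations(graph):
    """Yield every token order that respects the edges of the graph."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    def extend(order, indegree):
        if len(order) == len(graph):
            yield list(order)
            return
        for node in sorted(graph):
            # A node can be placed once all its predecessors are placed.
            if indegree[node] == 0 and node not in order:
                reduced = dict(indegree)
                for successor in graph[node]:
                    reduced[successor] -= 1
                yield from extend(order + [node], reduced)

    yield from extend([], indegree)

orders = list(serializations(graph))
for order in orders:
    print(" ".join(order))
```

The graph itself says nothing about whether "grande" precedes "piccola" in the super-witness, so any tool that serializes it has to pick one of the two orders by some additional rule.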

Non-progressive
multiple alignment

E.g. PicXAA
A            B                C
the drought  the first march  the first march
of           of               of
march        drought pierced  drought
hath perced  -                hath perced
to the root  to the root      to the root
and          and              -
is           -                -
this         this             -
the right    is the           -

B                   C                   A
the first march of  the first march of  the
drought             drought             drought
pierced             -                   of march
-                   hath perced         hath perced
to the root         to the root         to the root
and                 -                   and
-                   -                   is
this                -                   this
is the              -                   the right

Elena Spadini


DiXiT

Huygens ING

Sapienza Università di Roma

elena.spadini@huygens.knaw.nl

elenaspadini.com