Title: | Stemmatological Analysis of Textual Traditions |
---|---|
Description: | Explore and analyse the genealogy of textual or musical traditions, from their variants, with various stemmatological methods, mainly the disagreement-based algorithms suggested by Camps and Cafiero (2015) <doi:10.1484/M.LECTIO-EB.5.102565>. |
Authors: | Jean-Baptiste Camps ; Florian Cafiero |
Maintainer: | Jean-Baptiste Camps <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.3.2 |
Built: | 2025-01-19 03:44:59 UTC |
Source: | https://github.com/jean-baptiste-camps/stemmatology |
Data from the tradition of Richart de Fournival, Bestiaire d'Amours, from C. Segre's edition, limited to archetype y with only substantive readings selected.
data(fournival)
A matrix with 292 observations on the following 10 variables.
A
a numeric vector
B
a numeric vector
C
a numeric vector
D
a numeric vector
E
a numeric vector
H
a numeric vector
I
a numeric vector
J
a numeric vector
K
a numeric vector
O
a numeric vector
Only the manuscripts from archetype y have been retained, in order to have a tradition with limited contamination, and a 10% sample has been taken from the full text. The variant locations have been selected to retain only substantial readings. The data is presented here as used in Camps & Cafiero 2015, without further modifications or corrections. Readings have been converted to numeric codes (0 being omission, and NA an absence of value).
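The coding convention described here (0 for an omission, NA for an absent value) can be checked on any matrix of this shape. The sketch below uses a small hand-made matrix, not the fournival data itself; the object name toy is hypothetical.

```r
# Hypothetical 3 x 3 matrix following the same coding as the package data sets:
# positive integers are readings, 0 is an omission, NA is an absent value.
toy <- matrix(
  c(1, 1, 2,
    1, 0, NA,
    2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("VL1", "VL2", "VL3"), c("A", "B", "C"))
)

# Number of omissions and of absent values per witness (column)
omissions <- colSums(toy == 0, na.rm = TRUE)
missing   <- colSums(is.na(toy))
```

Counting both separately matters because the two codes have a different status in the PCC functions.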
Richart de Fornival. Li Bestiaires d’Amours di maistre Richart de Fornival e li response du bestiaire. Edited by Cesare Segre, Milano & Napoli, 1957.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
data(fournival)
Data from the artificial textual tradition Heinrichi
data(heinrichi)
A matrix with 1208 observations on the following 37 variables.
A
a numeric vector
Ab
a numeric vector
Ac
a numeric vector
Ad
a numeric vector
Ae
a numeric vector
B
a numeric vector
Ba
a numeric vector
Bb
a numeric vector
Bd
a numeric vector
Be
a numeric vector
C
a numeric vector
Ca
a numeric vector
Cb
a numeric vector
Cc
a numeric vector
Cd
a numeric vector
Ce
a numeric vector
Cf
a numeric vector
Da
a numeric vector
E
a numeric vector
F
a numeric vector
G
a numeric vector
H
a numeric vector
I
a numeric vector
J
a numeric vector
K
a numeric vector
L
a numeric vector
M
a numeric vector
N
a numeric vector
O
a numeric vector
P
a numeric vector
R
a numeric vector
S
a numeric vector
T
a numeric vector
V
a numeric vector
W
a numeric vector
X
a numeric vector
Z
a numeric vector
The data comes from an artificial tradition, created under controlled circumstances. The data is presented here as used in Camps & Cafiero 2015, without further modifications or corrections. Readings have been converted to numeric codes (0 being omission, and NA an absence of value).
Roos, Teemu, and Tuomas Heikkilä. ‘Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets’. Literary and Linguistic Computing 24/4 (2009), p. 417–433.
Roos, Teemu, Tuomas Heikkilä, and Petri Myllymäki. ‘Computer-Assisted Stemmatology Challenge’. Helsinki, 2007, https://www.cs.helsinki.fi/u/ttonteri/casc/data.html.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
data(heinrichi)
Import a TEI encoded parallel-segmentation apparatus.
import.TEIApparatus(file = "", appTypes = c("substantive"))
file |
Path to a valid TEI file. |
appTypes |
a vector of strings giving the types of entries that should be retained (as per |
This function takes as input the path to a TEI file, with a <listWit>, with <witness> elements identified by an @xml:id, and with <app> entries encoded in parallel-segmentation mode. Using the witness sigla, it then creates a database of variant locations, with witnesses in columns and variant locations in rows.
The @types attributes of the <app> elements are used to assess if they should be included in the variant locations matrix (default: only ‘substantive’ app entries).
The readings are identified by a code reflecting their order in the file (1 … n), and omissions by 0. If <app> entries have an @xml:id, it is used as the row name. Otherwise, the index is used.
Either a numeric matrix, with witnesses in columns, variant locations in rows, and readings coded by a number;
or, if alternative readings were found at some point, a character matrix, with witnesses in columns, variant locations in rows, and, in each cell, one or several readings, coded by numbers and separated by a comma (e.g. '1,2,3', if the witness has three different readings).
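When the result is a character matrix, each cell may need to be split back into individual reading codes before further processing. The following is a minimal sketch on a hand-made matrix (the object vl is hypothetical and was not produced by import.TEIApparatus); it only assumes the comma-separated cell format described above.

```r
# Hypothetical character matrix with one cell holding alternative readings
vl <- matrix(
  c("1", "1,2", "2",
    "1", "0",   NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("VL1", "VL2"), c("A", "B", "C"))
)

# Does any cell contain more than one reading?
has_alternatives <- any(grepl(",", vl))

# Split the cells of one variant location into integer reading codes
readings <- lapply(vl["VL1", ], function(cell) {
  as.integer(strsplit(cell, ",")[[1]])
})
```

Here readings is a list with one element per witness, so a witness carrying several readings is kept as a vector of codes.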
The output of this function can be used as input for the PCC functions.
If you want more control over the conversion, you can directly use the stylesheets available at https://github.com/Jean-Baptiste-Camps/stemmatology-utils.
Jean-Baptiste Camps ([email protected]) & Florian Cafiero
Jean-Baptiste Camps, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://hal.archives-ouvertes.fr/hal-01695903/document.
layout_as_stemma creates a tree-like layout from an edgelist, where nodes are placed horizontally according to a measure of distance from their parent node.
layout_as_stemma(x)
x |
an edgelist containing, as a third column, the distance between the two nodes. |
The distance between the nodes will usually correspond to the number of different readings (disagreements and omissions). If a node has several parents, the function will consider only the distance from the last parent in topological order.
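One plausible way to turn such an edgelist into coordinates is to give each node a depth equal to its parent's depth plus the distance on the edge. The sketch below illustrates that general idea in base R only; it is an assumption made for illustration, not the code of layout_as_stemma, and it assumes that every node has a single parent.

```r
# Edgelist in the format expected by layout_as_stemma:
# column 1 = parent, column 2 = child, column 3 = distance.
edgelist <- matrix(
  c("{ABC}", "{ABC}", "{ABC}", "D", "A", "A", "G",
    "A", "B", "C", "E", "F", "G", "H",
    1, 5, 3, 10, 3, 4, 5),
  ncol = 3
)
parents  <- edgelist[, 1]
children <- edgelist[, 2]
dists    <- as.numeric(edgelist[, 3])

nodes <- unique(c(parents, children))
depth <- setNames(rep(NA_real_, length(nodes)), nodes)
depth[setdiff(nodes, children)] <- 0  # nodes without a parent are roots

# Propagate depths from parents to children until nothing changes.
# Assumption: each node has exactly one parent (no contamination).
repeat {
  changed <- FALSE
  for (i in seq_along(children)) {
    if (!is.na(depth[[parents[i]]]) && is.na(depth[[children[i]]])) {
      depth[[children[i]]] <- depth[[parents[i]]] + dists[i]
      changed <- TRUE
    }
  }
  if (!changed) break
}
depth
```

A node with several parents would need an explicit rule, such as the "last parent in topological order" rule mentioned above, which this sketch does not implement.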
A layout, i.e. a matrix of two columns, giving x,y coordinates for each node.
This function is experimental. Horizontal overlapping may occur as a result.
Jean-Baptiste Camps
PCC.Stemma, PCC.reconstructModel.
edgelist = structure(
  c("{ABC}", "{ABC}", "{ABC}", "D", "A", "A", "G",
    "A", "B", "C", "E", "F", "G", "H",
    1, 5, 3, 10, 3, 4, 5),
  .Dim = c(7L, 3L)
)
g = igraph::graph_from_edgelist(edgelist[, 1:2], directed = TRUE)
layout = layout_as_stemma(edgelist)
plot(g, layout = layout)
Data from the artificial tradition Notre-Besoin
data(notreBesoin)
A matrix with 42 observations on the following 13 variables.
nb1
a numeric vector
nb2
a numeric vector
nb3
a numeric vector
nb4
a numeric vector
nb5
a numeric vector
nb6
a numeric vector
nb7
a numeric vector
nb8
a numeric vector
nb9
a numeric vector
nb10
a numeric vector
nb11
a numeric vector
nb12
a numeric vector
nb13
a numeric vector
The data comes from an artificial tradition, created under controlled circumstances. The variant locations have been selected to retain only substantial readings. The data is presented here as used in Camps & Cafiero 2015, without further modifications or corrections. Readings have been converted to numeric codes (0 being omission, and NA an absence of value).
Baret, Philippe V., P. Robinson, and C. Macé. ‘Testing methods on an artificially created textual tradition’. Linguistica computazionale 24 (2004), p. 1000–1029.
Roos, Teemu, Tuomas Heikkilä, and Petri Myllymäki. ‘Computer-Assisted Stemmatology Challenge’. Helsinki, 2007, https://www.cs.helsinki.fi/u/ttonteri/casc/data.html.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
data(notreBesoin)
Data for the tradition of the Chanson d'Otinel
data(otinel)
A matrix with 36 observations on the following 8 variables.
A
a numeric vector
B
a numeric vector
M
a numeric vector
NOt
a numeric vector
WOt
a numeric vector
EOtA
a numeric vector
EOtF
a numeric vector
EOtT
a numeric vector
A sample of 36 substantial readings, taken from the tradition (Old French and translations) of the Chanson d'Otinel. Data will be subject to updates. Readings have been converted to numeric codes (0 being omission, and NA an absence of value).
Camps, Jean-Baptiste. La 'Chanson d’Otinel’: édition complète du corpus manuscrit et prolégomènes à l’édition critique, PhD thesis, dir. Dominique Boutet, Paris-Sorbonne, 2016, http://www.theses.fr/2016PA040173.
data(otinel)
Data from the artificial textual tradition Parzival
data(parzival)
A matrix with 139 observations on the following 16 variables.
p1
a numeric vector
p2
a numeric vector
p3
a numeric vector
p4
a numeric vector
p5
a numeric vector
p6
a numeric vector
p7
a numeric vector
p8
a numeric vector
p9
a numeric vector
p10
a numeric vector
p11
a numeric vector
p12
a numeric vector
p13
a numeric vector
p14
a numeric vector
p15
a numeric vector
p16
a numeric vector
The data comes from an artificial tradition, created under controlled circumstances. The variant locations have been selected to retain only substantial readings. The data is presented here as used in Camps & Cafiero 2015, without further modifications or corrections. Readings have been converted to numeric codes (0 being omission, and NA an absence of value).
M. Spencer, E. A. Davidson, A. C. Barbrook, and C. J. Howe. ‘Phylogenetics of artificial manuscripts’. Journal of Theoretical Biology, 227 (2004), p. 503–11, https://doi.org/10.1016/j.jtbi.2003.11.022.
Roos, Teemu, Tuomas Heikkilä, and Petri Myllymäki. ‘Computer-Assisted Stemmatology Challenge’. Helsinki, 2007, https://www.cs.helsinki.fi/u/ttonteri/casc/data.html.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
data(parzival)
Global shell for all the PCC functions, both exploratory and stemma-building. This command successively executes PCC.Exploratory and PCC.Stemma, asking the user for input when necessary.
PCC(x, omissionsAsReadings = FALSE, alternateReadings = FALSE, limit = 0, recoverNAs = TRUE, layout_as_stemma = FALSE, pauseAtPlot = FALSE, ask = TRUE, threshold = NULL, verbose = FALSE)
x |
if |
omissionsAsReadings |
logical; if |
alternateReadings |
logical; if |
limit |
The maximum number of severe disagreements expected for witnesses to be in the same group. Default: |
recoverNAs |
logical; if |
layout_as_stemma |
logical; if TRUE, the witnesses will be placed vertically
according to the distance from their parent, as per the function |
pauseAtPlot |
logical; if |
ask |
logical; if |
threshold |
numeric; the centrality threshold above which variant locations are considered to be over-conflicting. Used only with |
verbose |
logical; if |
This function provides a single entry point to all the algorithms used in the PCC method. It successively calls PCC.Exploratory and PCC.Stemma.
The algorithmic principles of the PCC method are described in Camps & Cafiero 2015. It builds on the propositions of Poole 1974, 1979.
In a first stage, problematic configurations in the tradition (i.e. configurations that cannot be linked to a normal genealogy, without contamination or polygenesis) are identified by crossing every possible pair of variant locations, and are then plotted as a network. Once the over-conflicting (i.e. unreliable) variant locations are identified, different methods for eliminating them are offered.
In a second stage, a stemma is iteratively built, using the variant locations selected in the first stage. At each step, witnesses with no severe disagreements (i.e. disagreements between two witnesses, on two readings both shared with at least one other witness, cf. Trennfehler, errores separativi) are grouped together. A model is then reconstructed for each group, and identified either with a witness of the group or with a hypothetical subarchetype.
The option recoverNAs=TRUE is a novelty not described in the original paper (Camps & Cafiero 2015).
For more information about the underlying principles behind the method applied here, particularly the distinction between severe and benign disagreement, the different status given to readings, omissions and lacunae, the notion of conflict between variant locations or the way the stemma is built, see the references section.
The function returns either a single object of class "pccStemma", or a list containing several objects of class "pccStemma" (if multiple stemmata were drawn); see PCC.Stemma.
Jean-Baptiste Camps ([email protected]) & Florian Cafiero ([email protected])
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
# Load data
data("fournival")
# or alternatively, import it
# fournival = import.TEIApparatus(file = "myFournival.xml",
#                                 appTypes = c("substantive"))
# Analyse it with the PCC functions
# Non interactive mode
PCC(fournival, ask = FALSE, threshold = 0.06)
## Not run:
# Interactive mode
PCC(fournival)
## End(Not run)
PCC.buildGroup groups together witnesses in relevant clusters, based on the absence of severe disagreements between them (or on their number not exceeding a limit).
PCC.buildGroup(x, limit = 0, ask = TRUE)
x |
A PCC.disagreement object. |
limit |
The maximum number of severe disagreements allowed for two witnesses in the same group. Default (and advised) value: |
ask |
logical; if FALSE, decisions will be made without asking the user for input. Default: TRUE |
Witnesses with a number of severe disagreements between them less than or equal to limit are grouped together. This disagreement-based method is described in Camps & Cafiero 2015.
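The role of limit can be illustrated on a small table of severe disagreement counts. The sketch below is a simplified illustration in base R, not the grouping procedure of PCC.buildGroup: the matrix severe is hypothetical, and the code only lists, for each witness, the witnesses that stay within the limit.

```r
# Hypothetical symmetric matrix of severe disagreement counts between witnesses
wits <- c("A", "B", "C", "D")
severe <- matrix(
  c(NA, 0,  2,  3,
    0,  NA, 1,  4,
    2,  1,  NA, 0,
    3,  4,  0,  NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(wits, wits)
)

limit <- 0

# For each witness, the other witnesses whose count does not exceed the limit
# (which() silently drops the NA on the diagonal)
compatible <- lapply(wits, function(w) names(which(severe[w, ] <= limit)))
names(compatible) <- wits
compatible
```

With limit = 0, A pairs with B and C pairs with D; raising limit to 1 would also make B and C compatible, which shows why a strict value is advised.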
The function returns a list containing:
database |
The original database. |
groups |
A list of the groups that were created, identified by their labels. |
Jean-Baptiste Camps ([email protected]) & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.Stemma, PCC.disagreement, PCC.reconstructModel.
# A fictional simple tradition
x = matrix(
  c(
    1,0,1,1,1,1,1,1,
    1,0,1,2,2,2,1,2,
    1,0,0,3,2,1,NA,3,
    2,0,1,4,NA,1,1,1,
    2,1,2,5,2,1,1,4
  ), nrow = 8, ncol = 5,
  dimnames = list(c("VL1","VL2","VL3","VL4","VL5","VL6","VL7","VL8"),
                  c("A","B","C","D","E")))
# Compute disagreement(s)
x = PCC.disagreement(x)
# And now build the groups
PCC.buildGroup(x)
Given a matrix of variant locations, this function compares them by pairs to identify conflicting genealogical information between them.
PCC.conflicts(x, omissionsAsReadings = FALSE, alternateReadings = FALSE)
x |
if |
omissionsAsReadings |
logical; if |
alternateReadings |
logical; if |
This function tries to identify conflicts between variant locations, understood as contradictions in the genealogical information they might contain. In order to do that, every possible pair of variant locations is analysed in order to see if both can denote at least one possible normal genealogy (i.e. a genealogy without contamination or polygenesis). If not, they are considered "conflicting".
A network representing all the conflicts between variant locations is drawn, and the total number of conflicts and the centrality index of each variant location are given, as a help to estimate which variant locations are unreliable. This output can then be passed to the function PCC.overconflicting.
See Camps & Cafiero 2015 for more details.
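To give an intuition of what a conflict between two variant locations looks like, the sketch below uses a simplified four-combination test: two variant locations are flagged when two readings of the first each co-occur with the same two readings of the second. This criterion, and the helper name vl_conflict, are assumptions made for illustration; the test actually implemented in PCC.conflicts is the one described in Camps & Cafiero 2015 and may differ.

```r
# Hypothetical helper: TRUE if two variant locations (numeric vectors with one
# value per witness) show a crossed configuration of readings.
vl_conflict <- function(vl1, vl2) {
  # Keep witnesses with an actual reading (not NA, not an omission) at both places
  keep <- !is.na(vl1) & !is.na(vl2) & vl1 != 0 & vl2 != 0
  if (sum(keep) < 4) return(FALSE)  # four combinations need four witnesses
  # Contingency table of readings, reduced to attested (1) / not attested (0)
  counts   <- unclass(table(vl1[keep], vl2[keep]))
  attested <- (counts > 0) * 1
  # shared[i, j] = number of readings of vl2 attested with both reading i and j of vl1
  shared <- attested %*% t(attested)
  diag(shared) <- 0
  any(shared >= 2)
}

# Toy tradition: VL1 and VL3 group witnesses the same way, VL2 cuts across them
toy <- rbind(VL1 = c(1, 1, 2, 2),
             VL2 = c(1, 2, 1, 2),
             VL3 = c(1, 1, 2, 2))

# Test every pair of variant locations and keep the conflicting ones as edges
pairs     <- combn(rownames(toy), 2)
conflicts <- apply(pairs, 2, function(p) vl_conflict(toy[p[1], ], toy[p[2], ]))
edge_list <- t(pairs[, conflicts, drop = FALSE])
edge_list
```

The resulting two-column matrix has the same shape as the edgelist element described below, with one row per conflicting pair.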
An object of class "pccConflicts", a list containing
edgelist |
a two-column character matrix, giving the edges between variant locations in the network of conflicts (adjacency list) |
conflictsTotal |
a one-column numeric matrix, giving the total number of conflicts per variant location |
database |
the original database used for the calculations |
Jean-Baptiste Camps ([email protected]) & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.Exploratory, PCC.overconflicting.
# Load data
data(fournival)
# Analyse its conflicts
myConflicts = PCC.conflicts(fournival)
Detect possible contaminations by assessing the role of each witness in conflicting information between variant locations.
PCC.contam(x, omissionsAsReadings = FALSE, alternateReadings = FALSE, pauseAtPlot = FALSE)
x |
if |
omissionsAsReadings |
logical; if |
alternateReadings |
logical; if |
pauseAtPlot |
logical; if |
To help assess the role of each witness in the conflicting information between variant locations, this function computes the number of conflicting variant locations when removing one of the witnesses, for each witness. If removing a witness makes the number of conflicting variant locations significantly drop, then contamination can be considered as plausible. Be aware that this function will be most efficient for contaminations limited to a single manuscript.
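The leave-one-out principle can be sketched in base R as follows. The helpers vl_conflict and count_conflicts are hypothetical and rely on a simplified four-combination test for conflicts (an assumption made for illustration, not the package's own criterion); the point is only to show how removing each witness in turn reveals the one responsible for the conflicts.

```r
# Hypothetical helper: simplified conflict test between two variant locations
vl_conflict <- function(vl1, vl2) {
  keep <- !is.na(vl1) & !is.na(vl2) & vl1 != 0 & vl2 != 0
  if (sum(keep) < 4) return(FALSE)
  attested <- (unclass(table(vl1[keep], vl2[keep])) > 0) * 1
  shared <- attested %*% t(attested)
  diag(shared) <- 0
  any(shared >= 2)
}

# Hypothetical helper: number of conflicting pairs of variant locations
count_conflicts <- function(x) {
  pairs <- combn(nrow(x), 2)
  sum(apply(pairs, 2, function(p) vl_conflict(x[p[1], ], x[p[2], ])))
}

# Toy tradition in which witness E alone creates the crossed configurations
toy <- matrix(
  c(1, 1, 1, 1, 2, 2, 2,
    1, 1, 2, 2, 1, 2, 2,
    1, 1, 1, 1, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("VL1", "VL2", "VL3"), c("A", "B", "C", "D", "E", "F", "G"))
)

baseline <- count_conflicts(toy)

# Decrease in the number of conflicts when each witness is removed in turn
drops <- sapply(colnames(toy), function(w) {
  baseline - count_conflicts(toy[, colnames(toy) != w, drop = FALSE])
})
drops
```

In this toy case only the removal of E lowers the number of conflicts, which is the pattern that would make contamination of E plausible.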
An object of class "pccContam", a list containing
totalByMs |
a numeric matrix, with, in rows, each variant location, and, in columns, the number of conflicts and centrality in the full database, followed by the difference in total conflicts and centrality caused by the removal of each witness. |
conflictsDifferences |
a one row numeric matrix, containing, for each witness, the total decrease in conflicts caused by its removal from the computations |
database |
the original database used for the calculations |
The execution of this command can be time-consuming for large databases.
Additional contamination detection methods will be implemented in the future.
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste. ‘Detecting Contaminations in Textual Traditions Computer Assisted and Traditional Methods’. Leeds, International Medieval Congress, 2013, unpublished paper, https://www.academia.edu/3825633/Detecting_Contaminations_in_Textual_Traditions_Computer_Assisted_and_Traditional_Methods.
PCC.Exploratory, PCC.equipollent.
# load a data set
data("fournival")
# identify conflicts
x = PCC.conflicts(fournival)
# identify problematic variant locations
x = PCC.overconflicting(x, ask = FALSE, threshold = 0.06)
# eliminate them
x = PCC.elimination(x)
# examine the rest of the problematic cases, to detect
# plausible contaminations
PCC.contam(x)
The PCC.disagreement function helps to spot disagreements (and agreements) between manuscripts. For a given numeric matrix, representing the variants in different manuscripts, it locates disagreements (benign or severe), agreements and omissions in common between manuscripts.
PCC.disagreement(x, omissionsAsReadings = FALSE)
x |
a numeric matrix, with manuscripts in columns, variants in rows, and readings coded by a number. |
omissionsAsReadings |
logical; if TRUE, omissions are considered as readings. |
A distinction is made between severe and benign disagreements (see Camps & Cafiero 2015). Severe disagreements are disagreements between witnesses on two readings that are each shared by at least two witnesses. They have stronger genealogical implications than benign disagreements, which involve at least one singular reading. This distinction is used by the methods of the PCC family.
This function also gives common omissions, and oriented omissions (i.e. omissions present in one witness but not in another). No distinction is made between omission and addition, as this would require establishing the orientation of the genealogical relationship.
Agreements are given as well, mostly with an indicative value, as they cannot be taken as a direct measure of similarity.
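The distinction between severe and benign disagreements can be sketched in base R for a single pair of witnesses. The helper count_disagreements is hypothetical and is not the package's implementation; it assumes that omissions (0) and absent values (NA) are skipped, as with omissionsAsReadings = FALSE.

```r
# Toy tradition: witnesses in columns, variant locations in rows
toy <- matrix(
  c(1, 1, 2, 2,
    1, 1, 1, 2,
    1, 0, 2, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("VL1", "VL2", "VL3"), c("A", "B", "C", "D"))
)

# Hypothetical helper: count severe and benign disagreements between two witnesses
count_disagreements <- function(x, w1, w2) {
  severe <- 0
  benign <- 0
  for (i in seq_len(nrow(x))) {
    r1 <- x[i, w1]
    r2 <- x[i, w2]
    # Skip absent values, omissions and agreements
    if (is.na(r1) || is.na(r2) || r1 == 0 || r2 == 0 || r1 == r2) next
    # Number of witnesses sharing each of the two readings at this place
    n1 <- sum(x[i, ] == r1, na.rm = TRUE)
    n2 <- sum(x[i, ] == r2, na.rm = TRUE)
    if (n1 >= 2 && n2 >= 2) {
      severe <- severe + 1   # both readings are shared: severe
    } else {
      benign <- benign + 1   # at least one singular reading: benign
    }
  }
  c(severe = severe, benign = benign)
}

res_AC <- count_disagreements(toy, "A", "C")
res_AB <- count_disagreements(toy, "A", "B")
```

For A and C, VL1 counts as severe (both readings are shared by two witnesses) and VL3 as benign (both readings are singular), while A and B never disagree.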
The function returns:
database |
The original database. |
severeDisagreement |
A list of the severe disagreements between manuscripts. |
benignDisagreement |
A list of the benign disagreements between manuscripts. |
agreements |
A list of agreements between manuscripts. |
omissionsInCommon |
A list of all the omissions in common between manuscripts (if omissionsAsReadings is set to TRUE, this will be NA). |
omissionsOriented |
A list of the omissions present in a manuscript but not in another (if omissionsAsReadings is set to TRUE, this will be NA). |
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.Stemma, PCC.buildGroup, PCC.reconstructModel.
# Load a tradition
data("fournival")
# Option: explore the tradition to see problems in variant locations
# PCC.Exploratory(fournival)
# Calculate disagreements
PCC.disagreement(fournival)
This function removes from the database the variant locations labelled as over-conflicting by the PCC.overconflicting function.
PCC.elimination(x)
x |
an object of class pccOverconflicting. |
When PCC.overconflicting has been applied to a PCC.conflicts object, it returns a table where over-conflicting variants are labeled as such. The PCC.elimination function simply removes those variants.
A numeric matrix, with witnesses in columns, variant locations in rows, and readings coded by a number, from which over-conflicting variant locations have been removed.
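The effect of the elimination step amounts to dropping the flagged rows from the matrix. The sketch below shows this in base R on hand-made objects; the names db and flagged are hypothetical, and the internal structure of a pccOverconflicting object is not assumed here.

```r
# Hypothetical database: witnesses in columns, variant locations in rows
db <- matrix(
  c(1, 1, 2,
    1, 2, 2,
    2, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("VL1", "VL2", "VL3"), c("A", "B", "C"))
)

# Hypothetical labels: TRUE marks a variant location deemed over-conflicting
flagged <- c(VL1 = FALSE, VL2 = TRUE, VL3 = FALSE)

# Keep only the variant locations that were not flagged;
# drop = FALSE preserves the matrix shape even if a single row remains
cleaned <- db[!flagged[rownames(db)], , drop = FALSE]
cleaned
```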
The notion of using a centrality threshold for the identification of over-conflicting variant locations is found in Camps & Cafiero 2015. Other formulas for this centrality might be implemented in the future.
Jean-Baptiste Camps ([email protected]) & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
PCC.Exploratory, PCC.conflicts, PCC.overconflicting, PCC.contam.
# Load data
data("fournival")
# Analyse its conflicts
myConflicts = PCC.conflicts(fournival)
## Not run:
# Interactive mode: identify over-conflicting VL
PCC.overconflicting(myConflicts)
## End(Not run)
# Non interactive mode
myConflicts = PCC.overconflicting(myConflicts, ask = FALSE, threshold = 0.06)
# Create a new DB without problematic VL
myNewDB = PCC.elimination(myConflicts)
A single table of variant locations can sometimes reflect different competing genealogies, due to contamination, either for a single manuscript, or for the whole tradition. PCC.equipollent identifies the variant locations without internal conflicts, and allows separate databases to be created for each internally consistent configuration.
PCC.equipollent(x, ask = TRUE, scope = NULL, wits = NULL, verbose = FALSE)
x |
A |
ask |
logical; if |
scope |
should the inconsistent variant locations be neutralised for the
whole tradition ( |
wits |
a vector containing the names of the witnesses for which
inconsistent variant locations should be neutralised.
Use only with |
verbose |
logical; if |
Some over-conflicting variant locations can be algorithmically ruled out for the building of a stemma (see PCC.conflicts, PCC.overconflicting and PCC.elimination). Yet, in some cases, choosing between conflicting variables is algorithmically undecidable. This might sometimes be due to contamination (see PCC.contam). PCC.equipollent helps address such cases.
It first tries to identify the sets of variant locations that are internally consistent (no conflict among themselves), and then creates as many different databases as sets were found. In creating these new databases, the variant locations that have conflicting information with the current set are either fully neutralised (scope = "T") or neutralised only for some witnesses (scope = "W").
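The difference between the two scopes can be pictured on a small matrix. The sketch below assumes that "neutralising" a variant location means replacing its values with NA; this is an assumption made for illustration, as are the object names, and it does not reproduce how PCC.equipollent selects the consistent sets.

```r
# Hypothetical database: witnesses in columns, variant locations in rows
db <- matrix(
  c(1, 1, 2, 2,
    1, 2, 1, 2,
    1, 1, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("VL1", "VL2", "VL3"), c("A", "B", "C", "D"))
)

# Hypothetical: VL2 conflicts with the internally consistent set {VL1, VL3}
conflicting <- "VL2"

# Whole tradition (as with scope = "T"): the variant location is
# neutralised for every witness
db_T <- db
db_T[conflicting, ] <- NA

# Selected witnesses only (as with scope = "W" and wits = "D")
wits <- "D"
db_W <- db
db_W[conflicting, wits] <- NA
```

In the second case the readings of A, B and C at VL2 remain available, so less information is discarded when the conflict can be attributed to a single witness.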
An object of class pccEquipollent, a list containing
databases |
a list with all alternative databases that have been created, if any |
notInConflict |
a list with the set(s) of VL without internal conflicts |
This function is still experimental, and will work optimally only for simple cases, where competing genealogies can be easily identified.
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste. ‘Detecting Contaminations in Textual Traditions Computer Assisted and Traditional Methods’. Leeds, International Medieval Congress, 2013, unpublished paper, https://www.academia.edu/3825633/Detecting_Contaminations_in_Textual_Traditions_Computer_Assisted_and_Traditional_Methods.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
PCC.Exploratory, PCC.conflicts, PCC.overconflicting, PCC.elimination, PCC.contam.
# load data data("fournival") # look for conflicts y = PCC.conflicts(fournival) # identify and eliminate overconflicting VL y = PCC.overconflicting(y, ask = FALSE, threshold = 0.06) y = PCC.elimination(y) # look for further conflicts y = PCC.conflicts(y) # and now, create configurations for competing genealogies # for instance, for one witness newDB = PCC.equipollent(y, ask = FALSE, scope = "W", wits = "D") # Alternatively, you can create them for the whole tradition newDB = PCC.equipollent(y, ask = FALSE, scope = "T") # or for several witnesses newDB = PCC.equipollent(y, ask = FALSE, scope = "W", wits = c("A", "D")) # and then you proceed to create one or several stemmata, e.g. # PCC.Stemma(newDB$databases[[1]], ask = FALSE)
This is the global function for exploratory methods of the PCC family. It interactively makes use of the lower-level exploratory functions, to assess conflicts between variant locations, eliminate problematic configurations or identify likely contaminations.
PCC.Exploratory(x, omissionsAsReadings = FALSE, alternateReadings = FALSE, pauseAtPlot = FALSE, ask = TRUE, threshold = NULL, verbose = FALSE)
x |
if |
omissionsAsReadings |
logical; if |
alternateReadings |
logical; if |
pauseAtPlot |
logical; if |
ask |
logical; if FALSE, decisions will be made without asking the user for input. Default: TRUE |
threshold |
numeric; the centrality threshold above which variant locations are considered to be over-conflicting. Used only with ask = FALSE. Default: NULL |
verbose |
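logical; if FALSE, the function will only return the results, without information on the operations. Default: FALSE |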
This function is meant to guide the user through the process of assessing and eliminating unreliable variant locations and/or identifying competing genealogies (i.e. contamination), as described in Camps & Cafiero 2015.
It starts by computing and plotting the network of conflicting variant locations (i.e. variant locations that present contradictory genealogical information), by calling PCC.conflicts, and then interactively aids the user in determining overconflicting variant locations (with PCC.overconflicting), eliminating problematic variant locations (PCC.elimination), detecting contamination (PCC.contam) or creating new databases reflecting competing genealogies (PCC.equipollent).
According to the choices made by the user, the output can be an object belonging to one of the following classes: pccConflicts, pccOverconflicting, pccContam or pccEquipollent.
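Because the class of the result depends on the choices made, downstream code may need to check it before going further. The following base R sketch illustrates one way to do so with `inherits()`. The object `result` is a mock standing in for a real return value, and `next_step` is a hypothetical helper, not a function of the package.

```r
# Mock result standing in for the output of PCC.Exploratory
# (a real call would return an object of one of the classes listed above)
result <- structure(list(databases = list()), class = "pccEquipollent")

# Hypothetical helper: describe a plausible next step for each output class
next_step <- function(x) {
  if (inherits(x, "pccEquipollent")) {
    "several databases: build one stemma per database"
  } else if (inherits(x, "pccContam")) {
    "contamination detected: inspect the result before building a stemma"
  } else if (inherits(x, "pccOverconflicting")) {
    "over-conflicting VL flagged: pass to PCC.elimination"
  } else if (inherits(x, "pccConflicts")) {
    "conflicts computed: no elimination performed yet"
  } else {
    stop("unexpected class: ", class(x)[1])
  }
}

next_step(result)
```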
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.conflicts, PCC.overconflicting, PCC.elimination, PCC.contam, PCC.equipollent.
# Load data
data(fournival)
## Not run:
# Interactive mode
# or alternatively, import it
#fournival = import.TEIApparatus(file = "myFournival.xml",
#                    appTypes = c("substantive"))
# Analyse it with the PCC functions
PCC.Exploratory(fournival)
## End(Not run)
# Non interactive mode
PCC.Exploratory(fournival, ask = FALSE, threshold = 0.06)
Given a network of conflicts (contradictions) between variant locations, this function helps in assessing which are the problematic ones.
PCC.overconflicting(x, ask = TRUE, threshold = NULL)
x |
an object of class pccConflicts. |
ask |
logical; if FALSE, decisions will be made without asking the user for input. Default: TRUE |
threshold |
numeric; the centrality threshold above which variant locations are considered to be over-conflicting. Used only with ask = FALSE. Default: NULL |
This function is dedicated to the identification of problematic variant locations, as defined in Poole 1974 and Camps & Cafiero 2015. It helps the user define a threshold, expressed as a centrality index, above which variant locations are considered to be over-conflicting. The output can then be passed to the function PCC.elimination to remove them from the database.
An object of class "pccOverconflicting", a list containing the same first three objects as the "pccConflicts" input,
edgelist |
a two-column character matrix, giving the edges between variant locations in the network of conflicts |
conflictsTotal |
a one-column numeric matrix, giving the total number of conflicts per variant location |
database |
the original database used for the calculations |
and adding
vertexAttributes |
a two-column character matrix, with one row per vertex of the network (i.e. variant location), giving its label and colour |
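To make the relation between these elements concrete, the base R sketch below derives a per-location conflict count from a mock edgelist and applies a threshold to it. The data are invented, and the centrality used here (the share of all edge endpoints held by each variant location) is an assumption chosen for illustration; the index actually computed by the package may be defined differently.

```r
# A mock edgelist: each row is a conflict between two variant locations
edgelist <- matrix(
  c("VL1", "VL2",
    "VL1", "VL3",
    "VL1", "VL4",
    "VL2", "VL3"),
  ncol = 2, byrow = TRUE
)

# Total number of conflicts per variant location (degree in the network)
degree <- table(as.vector(edgelist))
conflictsTotal <- matrix(as.numeric(degree), ncol = 1,
                         dimnames = list(names(degree), "conflicts"))

# Hypothetical centrality: share of all edge endpoints held by each VL.
# This is an illustrative assumption, not the package's definition.
centrality <- conflictsTotal[, 1] / sum(conflictsTotal)

# Variant locations above an (arbitrary) threshold of 0.3
overconflicting <- names(centrality)[centrality > 0.3]
```

With these mock data, only the location involved in three of the four conflicts exceeds the threshold.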
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.Exploratory, PCC.conflicts, PCC.elimination.
# Load data
data("fournival")
# Analyse its conflicts
myConflicts = PCC.conflicts(fournival)
## Not run:
# Interactive mode: identify over-conflicting VL
PCC.overconflicting(myConflicts)
## End(Not run)
# Non interactive mode
PCC.overconflicting(myConflicts, ask = FALSE, threshold = 0.06)
PCC.reconstructModel examines coherent clusters of witnesses (PCC.buildGroup), either to identify their model in the tradition or to suggest a reconstructed model for the group.
PCC.reconstructModel(x, omissionsAsReadings = FALSE, recoverNAs = TRUE, ask = TRUE, verbose = FALSE)
x |
The output of PCC.buildGroup |
omissionsAsReadings |
logical; if |
recoverNAs |
logical; if |
ask |
logical; if FALSE, decisions will be made without asking the user for input. Default: TRUE |
verbose |
logical; if FALSE, the function will only return the results, without information on the operations. Default: FALSE |
This function takes PCC.buildGroup objects as input. It assesses the characteristics of the model of each group, and compares it to the existing witnesses. If a witness has the same characteristics as the computed model, it is identified as the model for the group. If no witness seems to be a good fit, the function adds a reconstructed model to the tradition.
The function returns a list containing
fullDatabase |
The full database, with the new reconstructed models and recovered NAs (if applicable). |
database |
The same with the descripti removed. |
edgelist |
An edgelist expressing the relations between the witnesses of each group with, as a third column, the distances between witnesses. |
models |
A list containing the database of readings for each model at the time of their reconstruction (i.e., before they are compared to extant witnesses). |
modelsByGroup |
A matrix with the groups in columns and a single row containing the label of their model. |
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.Stemma, PCC.disagreement, PCC.buildGroup.
# A fictional simple tradition
x = list(database = matrix(
    c(
      1,0,1,1,1,1,1,1,
      1,0,1,2,2,2,1,2,
      1,0,0,3,2,1,NA,3,
      2,0,1,4,NA,1,1,1,
      2,1,2,5,2,1,1,4
    ), nrow = 8, ncol = 5,
    dimnames = list(c("VL1","VL2","VL3","VL4","VL5","VL6","VL7","VL8"),
                    c("A","B","C","D","E"))),
  groups = list(c("A", "B", "C"), c("D", "E")))
# And now, reconstruct the groups
PCC.reconstructModel(x)
Builds a stemma codicum of the tradition, following the Poole-Camps-Cafiero method.
PCC.Stemma(x, omissionsAsReadings = FALSE, limit = 0, recoverNAs = TRUE, layout_as_stemma = FALSE, ask = TRUE, verbose = FALSE)
x |
a numeric matrix, with witnesses in columns, variant locations in rows, and readings coded by a number; or a character matrix, with witnesses in columns, variant locations in rows, and, in each cell, one or several readings, coded by numbers and separated by a comma (e.g. '1,2,3', if the witness has three different readings). |
omissionsAsReadings |
logical; if Default: |
limit |
The maximum number of severe disagreements expected for witnesses to be in the same group. Default: 0 |
recoverNAs |
logical; if |
layout_as_stemma |
logical; if TRUE, the witnesses will be placed vertically according to the distance from their parent, as per the function layout_as_stemma. Default: FALSE |
ask |
logical; if FALSE, decisions will be made without asking the user for input. Default: TRUE |
verbose |
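logical; if FALSE, the function will only return the results, without information on the operations. Default: FALSE |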
The PCC.Stemma function successively calls the functions PCC.disagreement, PCC.buildGroup and PCC.reconstructModel to build a stemma codicum of the tradition studied. By default, it stops when fewer than four witnesses remain to be compared, as the risk of error becomes high. The user can nevertheless ask the algorithm for its final answer on those last witnesses.
The function returns either a single list, or a list containing several lists (if multiple stemmata were drawn). Each list contains:
fullDatabase |
The full database, with the new reconstructed models and recovered NAs (if applicable). |
database |
The same with the descripti removed. |
edgelist |
An edgelist expressing the relations between the witnesses of each group with, as a third column, the distances between witnesses. |
models |
A list containing the database of readings for each model at the time of their reconstruction (i.e., before they are compared to extant witnesses). |
modelsByGroup |
A matrix with the groups in columns and a single row containing the label of their model. |
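Since the return value is either a single result or a list of results, code that processes it may find it convenient to normalise both cases first. The sketch below does this in base R; `as_stemma_list` is a hypothetical helper, and the mock objects only reuse the element names documented above, with invented contents.

```r
# Hypothetical helper: always return a list of stemma results
as_stemma_list <- function(res) {
  # A single result exposes the documented element names directly;
  # a list of several results has no such names at its top level
  if (!is.null(names(res)) && "edgelist" %in% names(res)) list(res) else res
}

# Mock results built from the documented element names only
one <- list(fullDatabase = NULL, database = NULL,
            edgelist = matrix(c("x", "A", "1"), ncol = 3),
            models = list(), modelsByGroup = NULL)
several <- list(one, one)

# Both cases can now be handled by the same loop
for (stemma in as_stemma_list(several)) {
  print(stemma$edgelist)
}
```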
Jean-Baptiste Camps & Florian Cafiero
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. La pratique des ordinateurs dans la critique des textes, Paris, 1979, p. 151-161.
Poole, Eric, ‘The Computer in Determining Stemmatic Relationships’. Computers and the Humanities, 8-4 (1974), p. 207-16.
PCC.disagreement, PCC.buildGroup, PCC.reconstructModel, layout_as_stemma.
# Load data
data("parzival")
# or alternatively, import it
#fournival = import.TEIApparatus(file = "myFournival.xml",
#                    appTypes = c("substantive"))
## Not run:
# Interactive mode
# Analyse it with the PCC functions
myDBs = PCC.Exploratory(parzival)
# draw a stemma
PCC.Stemma(myDBs$databases[[2]])
## End(Not run)
# Non interactive mode
myDBs = PCC.Exploratory(parzival, ask = FALSE, threshold = 0.06)
PCC.Stemma(myDBs$databases[[2]], ask = FALSE)
Build and analyse the genealogy of textual or musical traditions.
Package: | stemmatology |
Type: | Package |
Version: | 0.3.0 |
Date: | 2018-05-20 |
License: | GPL-3 |
Stemmatology is the field dedicated to studying text genealogies and establishing genealogical tree-like graphs known as stemmata codicum.
This package includes various functions for stemmatological analysis.
Most of the functions take as input a numeric matrix, with witnesses in columns, variant locations in rows, and readings coded by a number, e.g.
VL | A | B | C | D | E | H | I | J | K | O |
1 | 0 | 1 | 1 | 1 | NA | 1 | 1 | NA | 1 | 1 |
2 | 1 | 1 | 1 | 1 | NA | 1 | 1 | NA | 1 | 1 |
3 | 1 | 1 | 1 | 1 | NA | 1 | 1 | NA | 1 | 1 |
4 | 1 | 1 | 1 | 2 | NA | 1 | 1 | NA | 1 | 1 |
5 | 1 | 1 | 1 | 2 | NA | 1 | 1 | NA | 1 | 1 |
6 | 1 | 1 | 1 | 1 | NA | 1 | 1 | NA | 1 | 1 |
where A, B, …, O are the various witnesses, in columns, 1…6 the various variant locations, in rows, and the different readings are coded either 0 (omission), 1, 2, …, n. NA is used for the lack of information (physical lacuna, absence of observation, variant location not applicable to a given witness, etc.).
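As a minimal sketch, the matrix shown above can be built by hand in base R as follows. The object name `tradition` is only illustrative.

```r
# Witnesses in columns, variant locations (VL) in rows.
# 0 codes an omission; NA codes a lack of information.
tradition <- matrix(
  c(0, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 2, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 2, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(as.character(1:6),
                  c("A", "B", "C", "D", "E", "H", "I", "J", "K", "O"))
)

# Number of missing observations per witness
colSums(is.na(tradition))
```

In this excerpt, witnesses E and J carry no information at any of the six variant locations.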
Alternatively, if alternateReadings = TRUE, the input can be a character matrix, with witnesses in columns, variant locations in rows, and, in each cell, one or several readings, coded by numbers and separated by a comma (e.g. '1,2,3', if the witness has three different readings), e.g.
VL | A | D | F | T | P |
1 | "1" | "2" | "2" | "2" | "1,2" |
2 | "1" | "2" | "1,2" | "2" | "1" |
3 | "1" | "1" | "1" | "1" | "2" |
4 | "1,3" | "1,2" | "1" | "2" | "3" |
Notice how a witness can bear several readings (e.g., P at VL 1).
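The character matrix above can likewise be sketched in base R. The splitting step is shown only to illustrate the cell format; it is not claimed to be how the package parses its input, and the object names are illustrative.

```r
# Character matrix: each cell holds one or several comma-separated readings
alt <- matrix(
  c("1",   "2",   "2",   "2", "1,2",
    "1",   "2",   "1,2", "2", "1",
    "1",   "1",   "1",   "1", "2",
    "1,3", "1,2", "1",   "2", "3"),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("A", "D", "F", "T", "P"))
)

# Split one cell into its individual readings (P at VL 1)
readings_P1 <- strsplit(alt["1", "P"], ",", fixed = TRUE)[[1]]

# Count the readings carried by each witness at each variant location
n_readings <- apply(alt, c(1, 2), function(cell) {
  length(strsplit(cell, ",", fixed = TRUE)[[1]])
})
```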
Data can be created inside R or imported. They can be imported by reading a csv file, for instance (e.g. with read.csv). They can also be imported from a TEI encoded apparatus in parallel-segmentation, either by using an XSL stylesheet, or the built-in function import.TEIApparatus.
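For the csv route, a minimal base R sketch could look like the following. The file layout (variant location labels in the first column, one column per witness, empty cells for missing information) is an assumption made for this example, and the file itself is written to a temporary location so that the code is self-contained.

```r
# Write a small example file (stands in for a CSV exported elsewhere)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ",A,B,C",
  "VL1,1,1,2",
  "VL2,0,1,1",
  "VL3,1,,2"
), csv_path)

# The first column holds the variant location labels;
# empty cells in numeric columns are read as NA
tradition <- as.matrix(read.csv(csv_path, row.names = 1))

tradition
```

The call to `as.matrix` matters here, since `read.csv` returns a data frame while the functions described in this manual expect a matrix.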
The function import.TEIApparatus imports a TEI P5 encoded apparatus into a stemmatological matrix usable with other functions. It has some parameters to refine the import (variant types, …), and can read either from disk or from a URL.
Functions are made available for the PCC method (see Camps and Cafiero 2015 or PCC for more details). The most important are:
PCC: global shell for the PCC functions
PCC.Exploratory: global function for exploratory methods of the PCC family
PCC.Stemma: building the stemma codicum.
The package also contains various other functions, particularly aimed at detecting contamination, for instance the function PCC.contam.
The package aims at making available various other stemmatological methods, including further functions for contamination detection, or for theoretical stemmatology.
Please report issues with this package to https://github.com/Jean-Baptiste-Camps/stemmatology.
Jean-Baptiste Camps (École nationale des chartes | Université PSL).
Florian Cafiero.
Maintainer: Jean-Baptiste Camps.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2), edited by Andrew U. Frank et al., 2018, pp. 65–74, https://halshs.archives-ouvertes.fr/hal-01695903v1.
Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, https://halshs.archives-ouvertes.fr/halshs-01435633, DOI: 10.1484/M.LECTIO-EB.5.102565.
PCC, PCC.Exploratory, PCC.Stemma
# Load data
data(fournival)
# or alternatively, import it
#fournival = import.TEIApparatus(file = "myFournival.xml",
#                    appTypes = c("substantive"))
## Not run:
# Interactive mode
# Analyse it with the PCC functions
PCC(fournival)
## End(Not run)
# Complete step-by-step non interactive use
data("fournival")
# look for conflicts
myConflicts = PCC.conflicts(fournival)
# remove conflicting VL
myConflicts = PCC.overconflicting(myConflicts, ask = FALSE, threshold = 0.06)
myNewData = PCC.elimination(myConflicts)
# look for competing genealogies
myConflicts = PCC.conflicts(myNewData)
myNewData = PCC.equipollent(myConflicts, ask = FALSE, scope = "W", wits = "D")
# build a stemma
PCC.Stemma(myNewData$databases[[1]], ask = FALSE)