Extract references in an element
- For consistency reasons, elements in XML (here DDI) hold references in only one direction, so a relation can be traversed only from the referencing element
- In the following example, it is easy to reach the referenced `Variable` from a given `Note`
- Getting the `Note`s of a given `Variable` is quite inefficient, because traversal is only possible from a `Note`
- Depending on the use case, this reverse query is exactly the one that is needed, and it must be easy and fast
- The same applies to `Variable` and `Question`.
<r:Note type="Other" id="836e67df-2122-487a-a3ed-ac37d86619b4">
  <r:Subject>org.gesis.cbe.variable.frequencies.table.content</r:Subject>
  <r:Relationship>
    <r:RelatedToReference>
      <r:ID>8cc185b5-dce3-48bc-887c-8b053b35d3c3</r:ID>
    </r:RelatedToReference>
  </r:Relationship>
  <r:Content><![CDATA["ZA3950, V1: GESIS Data Archive Study Number (N=52550)"; ...]]></r:Content>
</r:Note>
<l:Variable id="8cc185b5-dce3-48bc-887c-8b053b35d3c3">
  <l:VariableName>V1</l:VariableName>
  <r:Label translated="true">GESIS Data Archive Study Number</r:Label>
  <l:QuestionReference>
    <r:ID>fb65832c-238e-487e-895e-564cfa20b681</r:ID>
  </l:QuestionReference>
</l:Variable>
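The asymmetry can be illustrated with a minimal Python sketch. It assumes a shortened copy of the fragment above wrapped in a hypothetical `root` element, and the namespace URIs are placeholders rather than the real DDI ones. The forward lookup reads one value from the `Note`, while the reverse lookup has to inspect every `Note` in the document.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption); a real DDI instance declares its own.
NS = {"r": "urn:example:r", "l": "urn:example:l"}

# Shortened copy of the example, wrapped in a hypothetical <root> element.
XML = """
<root xmlns:r="urn:example:r" xmlns:l="urn:example:l">
  <r:Note type="Other" id="836e67df-2122-487a-a3ed-ac37d86619b4">
    <r:Relationship>
      <r:RelatedToReference>
        <r:ID>8cc185b5-dce3-48bc-887c-8b053b35d3c3</r:ID>
      </r:RelatedToReference>
    </r:Relationship>
  </r:Note>
  <l:Variable id="8cc185b5-dce3-48bc-887c-8b053b35d3c3">
    <l:VariableName>V1</l:VariableName>
  </l:Variable>
</root>
"""

REF_PATH = "r:Relationship/r:RelatedToReference/r:ID"


def referenced_variable_id(note):
    """Forward direction: the Note itself carries the Variable's identifier."""
    return note.findtext(REF_PATH, namespaces=NS)


def notes_of_variable(root, variable_id):
    """Reverse direction: every Note is inspected, because a Variable does not list its Notes."""
    return [
        note.get("id")
        for note in root.iterfind(".//r:Note", NS)
        if note.findtext(REF_PATH, namespaces=NS) == variable_id
    ]


root = ET.fromstring(XML)
note = root.find("r:Note", NS)
variable_id = referenced_variable_id(note)
note_ids = notes_of_variable(root, variable_id)
```

The cost of `notes_of_variable` grows with the number of `Note` elements in the document, which is why the reverse direction becomes a problem for large instances.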
Approach
- While the document is split, the relations between elements are extracted as well.
- The split configuration defines
  - which `elementType` references which related `elementType`
  - and the relative XPath for the identifier of the related element
- Extracting references is not DDI-specific
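A rough sketch of this approach in Python, under several assumptions: the XML is a simplified, namespace-free stand-in for a DDI document, the `RULE` dictionary is a hypothetical in-memory form of one split rule, and `extract_relations` and `index_by_target` are illustrative names rather than part of the actual splitter. The relations are collected once while walking the document and then indexed by their target, so the reverse query becomes a dictionary lookup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, namespace-free stand-in for a DDI document (assumption).
XML = """
<DDIInstance id="inst-1">
  <ResourcePackage id="pkg-1">
    <Note id="note-1">
      <Relationship><RelatedToReference><ID>var-1</ID></RelatedToReference></Relationship>
    </Note>
    <Note id="note-2">
      <Relationship><RelatedToReference><ID>var-1</ID></RelatedToReference></Relationship>
    </Note>
  </ResourcePackage>
</DDIInstance>
"""

# Hypothetical rule: which element type references which related element type,
# and the relative path of the related identifier. Paths are relative to the root element.
RULE = {
    "elementType": "ddiinstance.note",
    "path": "./ResourcePackage/Note",
    "reference": "ddiinstance.variablescheme.variable",
    "referenceIdentifierPath": "./Relationship/RelatedToReference/ID",
}


def extract_relations(root, rule):
    """Collect (source type, source id, target type, target id) tuples for one rule."""
    relations = []
    for element in root.iterfind(rule["path"]):
        for id_node in element.iterfind(rule["referenceIdentifierPath"]):
            target_id = (id_node.text or "").strip()
            relations.append(
                (rule["elementType"], element.get("id"), rule["reference"], target_id)
            )
    return relations


def index_by_target(relations):
    """Invert the relations so that the referencing elements can be looked up by target."""
    index = defaultdict(list)
    for source_type, source_id, target_type, target_id in relations:
        index[(target_type, target_id)].append((source_type, source_id))
    return dict(index)


root = ET.fromstring(XML)
relations = extract_relations(root, RULE)
index = index_by_target(relations)
notes_of_var_1 = index[("ddiinstance.variablescheme.variable", "var-1")]
```

Nothing in the sketch depends on DDI element names: only the rule mentions them, so the same functions would work for any XML vocabulary.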
# element type: ddiinstance.note
ddiinstance.note.path = /DDIInstance/ResourcePackage/Note
ddiinstance.note.identifierPath = ./@id
ddiinstance.note.parentIdentifierPath = ../@id
# NEW
ddiinstance.note.reference = ddiinstance.variablescheme.variable
ddiinstance.note.reference.identifierPath = ./Relationship/RelatedToReference/ID
# element type: ddiinstance.variablescheme.variable
ddiinstance.variablescheme.variable.path = /DDIInstance/ResourcePackage/VariableScheme/Variable
ddiinstance.variablescheme.variable.identifierPath = ./@id
ddiinstance.variablescheme.variable.parentIdentifierPath = ../@id
# NEW
ddiinstance.variablescheme.variable.reference = ddiinstance.questionscheme.item
ddiinstance.variablescheme.variable.reference.identifierPath = ./QuestionReference/ID
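One way such a configuration might be read is sketched below in Python. This is an assumption about how the keys could be grouped, not the splitter's actual parser, and `parse_rules` is a hypothetical helper. Because element type names contain dots themselves, the element type cannot be found by splitting on the first or last dot. The sketch matches a fixed list of known suffixes instead, longest first.

```python
from collections import defaultdict

# Excerpt of the configuration above.
PROPERTIES = """\
# element type: ddiinstance.note
ddiinstance.note.path = /DDIInstance/ResourcePackage/Note
ddiinstance.note.identifierPath = ./@id
ddiinstance.note.parentIdentifierPath = ../@id
ddiinstance.note.reference = ddiinstance.variablescheme.variable
ddiinstance.note.reference.identifierPath = ./Relationship/RelatedToReference/ID
"""

# Known key suffixes, longest first, so that "reference.identifierPath"
# is tried before the shorter "identifierPath".
SUFFIXES = sorted(
    ["path", "identifierPath", "parentIdentifierPath", "reference", "reference.identifierPath"],
    key=len,
    reverse=True,
)


def parse_rules(text):
    """Group flat key-value lines into one dictionary per element type."""
    rules = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        for suffix in SUFFIXES:
            if key.endswith("." + suffix):
                element_type = key[: -len(suffix) - 1]
                rules[element_type][suffix] = value
                break
    return dict(rules)


rules = parse_rules(PROPERTIES)
note_rule = rules["ddiinstance.note"]
```

Here `note_rule["reference"]` holds the related element type and `note_rule["reference.identifierPath"]` the relative path to its identifier.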
Open issues
- The sequence of related elements matters!
- Key-value pairs with semantically rich keys are very limited and hard to parse. We need a parsable structure or, better, a model!
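As a starting point for discussion, a model could look roughly like the following Python sketch. All class, field, and key names here (`Reference`, `ElementType`, `name`, `references`, and so on) are hypothetical and not taken from the existing code. The nested input stands in for whatever parsable structure (for example JSON or YAML) would replace the flat keys. An ordered list of references per element type is one thing the flat key-value form cannot express.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """One outgoing reference of an element type (hypothetical model class)."""
    target_type: str
    identifier_path: str


@dataclass
class ElementType:
    """Split rule for one element type, with an ordered list of references."""
    name: str
    path: str
    identifier_path: str
    parent_identifier_path: str
    references: List[Reference] = field(default_factory=list)


def element_type_from_dict(data):
    """Build the model from a nested structure, e.g. one loaded from JSON."""
    return ElementType(
        name=data["name"],
        path=data["path"],
        identifier_path=data["identifierPath"],
        parent_identifier_path=data["parentIdentifierPath"],
        references=[
            Reference(ref["elementType"], ref["identifierPath"])
            for ref in data.get("references", [])
        ],
    )


NOTE = element_type_from_dict(
    {
        "name": "ddiinstance.note",
        "path": "/DDIInstance/ResourcePackage/Note",
        "identifierPath": "./@id",
        "parentIdentifierPath": "../@id",
        "references": [
            {
                "elementType": "ddiinstance.variablescheme.variable",
                "identifierPath": "./Relationship/RelatedToReference/ID",
            }
        ],
    }
)
```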
Edited by Alexander Mühlbauer