
Extract references in an element

  • For consistency reasons, elements in XML (here: DDI) hold a reference in one direction only, so a relation can be traversed only one way.
  • In the following example, it is easy to reach the referenced Variable from a given Note.
  • Getting the Notes of a given Variable is quite inefficient, because the relation can only be traversed starting from a Note.
  • Depending on the use case, exactly this reverse query is the one that is needed, and easy and fast access is critical.
  • The same applies to Variable and Question.
<r:Note type="Other" id="836e67df-2122-487a-a3ed-ac37d86619b4">
	<r:Subject>org.gesis.cbe.variable.frequencies.table.content</r:Subject>
	<r:Relationship>
		<r:RelatedToReference>
			<r:ID>8cc185b5-dce3-48bc-887c-8b053b35d3c3</r:ID>
		</r:RelatedToReference>
	</r:Relationship>
	<r:Content><![CDATA["ZA3950, V1: GESIS Data Archive Study Number  (N=52550)"; ...]]></r:Content>
</r:Note>
<l:Variable id="8cc185b5-dce3-48bc-887c-8b053b35d3c3">
	<l:VariableName>V1</l:VariableName>
	<r:Label translated="true">GESIS Data Archive Study Number</r:Label>
	<l:QuestionReference>
		<r:ID>fb65832c-238e-487e-895e-564cfa20b681</r:ID>
	</l:QuestionReference>
</l:Variable>
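
One way to make the reverse query cheap is to walk all Notes once and build an inverted index from Variable id to Note ids. The following Python sketch illustrates the idea on a shortened copy of the example above. The wrapper element `root`, the namespace URIs, and the function name `notes_by_variable` are assumptions made for illustration; the fragment above declares no namespaces and this is not the project's actual implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the example. The <root> wrapper and both namespace URIs
# are placeholders (assumption); the original fragment does not declare them.
SAMPLE = """<root xmlns:r="urn:example:reusable" xmlns:l="urn:example:logicalproduct">
  <r:Note type="Other" id="836e67df-2122-487a-a3ed-ac37d86619b4">
    <r:Relationship>
      <r:RelatedToReference>
        <r:ID>8cc185b5-dce3-48bc-887c-8b053b35d3c3</r:ID>
      </r:RelatedToReference>
    </r:Relationship>
  </r:Note>
  <l:Variable id="8cc185b5-dce3-48bc-887c-8b053b35d3c3">
    <l:VariableName>V1</l:VariableName>
  </l:Variable>
</root>"""


def notes_by_variable(root):
    """Map each referenced Variable id to the ids of the Notes pointing at it."""
    index = defaultdict(list)
    # "{*}" matches a tag in any namespace (Python 3.8+), so the URIs do not matter.
    for note in root.findall(".//{*}Note"):
        for ref in note.findall("./{*}Relationship/{*}RelatedToReference/{*}ID"):
            index[(ref.text or "").strip()].append(note.get("id"))
    return dict(index)


index = notes_by_variable(ET.fromstring(SAMPLE))
print(index["8cc185b5-dce3-48bc-887c-8b053b35d3c3"])
```

After this single pass, looking up the Notes of a Variable is a dictionary access instead of a scan over all Notes.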

Approach

  • While the document is split, the relations between elements are extracted as well.
  • The split configuration defines
      • which elementType references which related elementType
      • and the relative XPath to the identifier of the related element
  • Extracting references is not DDI-specific.
# element type: ddiinstance.note
ddiinstance.note.path = /DDIInstance/ResourcePackage/Note
ddiinstance.note.identifierPath = ./@id
ddiinstance.note.parentIdentifierPath = ../@id
# NEW
ddiinstance.note.reference = ddiinstance.variablescheme.variable
ddiinstance.note.reference.identifierPath = ./Relationship/RelatedToReference/ID

# element type: ddiinstance.variablescheme.variable
ddiinstance.variablescheme.variable.path = /DDIInstance/ResourcePackage/VariableScheme/Variable
ddiinstance.variablescheme.variable.identifierPath = ./@id
ddiinstance.variablescheme.variable.parentIdentifierPath = ../@id
# NEW
ddiinstance.variablescheme.variable.reference = ddiinstance.questionscheme.item
ddiinstance.variablescheme.variable.reference.identifierPath = ./QuestionReference/ID
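
How such a configuration might drive the extraction can be sketched in Python as follows. This is a minimal sketch under several assumptions: the helper names (`load_properties`, `any_ns`, `select_elements`, `select_values`, `extract_references`) are hypothetical, the sample document and its namespace URIs are placeholders, and only the small path subset used in the configuration (`/A/B/C`, `./@attr`, `./A/B`) is handled, matching element names in any namespace. Parent paths such as `../@id` are left out.

```python
import xml.etree.ElementTree as ET

CONFIG = """
# element type: ddiinstance.note
ddiinstance.note.path = /DDIInstance/ResourcePackage/Note
ddiinstance.note.identifierPath = ./@id
ddiinstance.note.reference = ddiinstance.variablescheme.variable
ddiinstance.note.reference.identifierPath = ./Relationship/RelatedToReference/ID
"""

# Placeholder document; the namespace URIs are assumptions for illustration.
DOCUMENT = """<DDIInstance xmlns="urn:example:instance" xmlns:r="urn:example:reusable">
  <ResourcePackage id="rp1">
    <r:Note id="836e67df-2122-487a-a3ed-ac37d86619b4">
      <r:Relationship><r:RelatedToReference>
        <r:ID>8cc185b5-dce3-48bc-887c-8b053b35d3c3</r:ID>
      </r:RelatedToReference></r:Relationship>
    </r:Note>
  </ResourcePackage>
</DDIInstance>"""


def load_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def any_ns(path):
    """Prefix each name step with '{*}' so it matches in any namespace (Python 3.8+)."""
    return "/".join(s if s in (".", "..") else "{*}" + s for s in path.split("/"))


def select_elements(root, absolute_path):
    """Resolve an absolute path such as /DDIInstance/ResourcePackage/Note."""
    first, *rest = absolute_path.strip("/").split("/")
    if root.tag.rpartition("}")[2] != first:  # compare local names only
        return []
    return root.findall(any_ns("./" + "/".join(rest))) if rest else [root]


def select_values(element, relative_path):
    """Resolve './@attr' or a relative element path to a list of strings."""
    if relative_path.startswith("./@"):
        value = element.get(relative_path[3:])
        return [value] if value is not None else []
    return [(e.text or "").strip() for e in element.findall(any_ns(relative_path))]


def extract_references(root, props, element_type):
    """Return (source type, source id, target type, target id) tuples."""
    target_type = props[element_type + ".reference"]
    id_path = props[element_type + ".identifierPath"]
    ref_path = props[element_type + ".reference.identifierPath"]
    relations = []
    for element in select_elements(root, props[element_type + ".path"]):
        for source_id in select_values(element, id_path):
            for target_id in select_values(element, ref_path):
                relations.append((element_type, source_id, target_type, target_id))
    return relations


relations = extract_references(
    ET.fromstring(DOCUMENT), load_properties(CONFIG), "ddiinstance.note"
)
print(relations)
```

Nothing in the sketch mentions Note or Variable outside the configuration text, which is the sense in which the extraction is not DDI-specific.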

Open issues

  • The sequence of related elements matters!
  • Key-value pairs with semantically rich keys are very limited and really hard to parse! We need a parsable structure or, better, a model! 💎
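
To make the second point concrete, here is a rough Python sketch of what a small model could look like in place of the flat keys. The class names `Reference` and `ElementType` and all field names are hypothetical. They only mirror the keys of the split configuration and are not an agreed design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """One outgoing reference of an element type (hypothetical model class)."""
    target_type: str       # e.g. "ddiinstance.variablescheme.variable"
    identifier_path: str   # relative XPath to the id of the related element


@dataclass
class ElementType:
    """One element type of the split configuration (hypothetical model class)."""
    name: str
    path: str
    identifier_path: str
    parent_identifier_path: str
    # A list keeps the declared order and allows more than one reference.
    references: List[Reference] = field(default_factory=list)


note = ElementType(
    name="ddiinstance.note",
    path="/DDIInstance/ResourcePackage/Note",
    identifier_path="./@id",
    parent_identifier_path="../@id",
    references=[
        Reference(
            target_type="ddiinstance.variablescheme.variable",
            identifier_path="./Relationship/RelatedToReference/ID",
        )
    ],
)
print(note.references[0].target_type)
```

With explicit fields, nothing has to be recovered by splitting a key such as `ddiinstance.note.reference.identifierPath` on dots, which is ambiguous because element type names contain dots themselves.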
Edited by Alexander Mühlbauer