.. highlight:: xml

.. _conversion-mapping-xml:

Data conversion
===============

The data conversion XML document is used to convert the input data to match
the data model defined in the lexicon document (:ref:`lexicon-xml`). Examples
of the inlined and by-level input data formats are given below. For a
comprehensive description of all the possible content options for the
document, see the `schema document `_.

Data continuation and abbreviated content expressed as …

Inlined example
***************

The root level tag contains a reference to the schema document, which is used
to validate the content of the XML document.

Data level definitions are nested as in the lexicon. Each input data row
belongs to a certain data level, indicated by the value at a given position in
the row.

The input data has two data levels: ``comp_unit`` and ``stratum``.
``comp_unit`` is the top level, the simulation unit (e.g. stand or sample
plot), and ``stratum`` is the lower level (sub_levels). In the input data the
``comp_unit`` data is on the rows that have value 1 (rowtype_value) at the
position 1 (rowtype_rowpos), and the ``stratum`` data is on the rows where this
value is 2. The id for each object is at position 1 for ``comp_unit`` objects
and at position 2 for ``stratum`` objects. The position values begin from 0::

    …
    comp_unit 0 1 1 7
    stratum 2 1 2

Note that the rowtype_value tag can contain several values. If several rowtype
values are specified, the smallest is taken to indicate the actual data row;
the others are treated as containing extra information. For example, the date
information should be on the main data row.

Missing values in the input data are indicated either by no data between the
delimiters ('') or by the value -1 (none_value_indicator). In this case the
input data row is rejected during data import if the SIMO variable MAIN_GROUP
would get any of the values 4, 5, 6, 7 or 8 (object_rejection).
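
The three rules above (smallest rowtype value marks the main row, missing-value
indicators, and object rejection) can be sketched in Python. This is only an
illustration under stated assumptions: the function and constant names are
hypothetical and are not part of SIMO, and the fields are assumed to arrive as
strings already split on the delimiter.

```python
# Hypothetical constants mirroring the values quoted in the text above.
NONE_VALUE_INDICATORS = {"", "-1"}       # none_value_indicator: '' or -1
REJECTED_MAIN_GROUPS = {4, 5, 6, 7, 8}   # object_rejection values for MAIN_GROUP


def main_rowtype(rowtype_values):
    """Return the rowtype value that marks the actual data row (the smallest)."""
    return min(rowtype_values)


def is_missing(field):
    """Return True if a raw input field counts as a missing value."""
    return field.strip() in NONE_VALUE_INDICATORS


def is_rejected(main_group):
    """Return True if a row with this MAIN_GROUP value would be rejected."""
    return main_group in REJECTED_MAIN_GROUPS
```

For instance, with rowtype values 1 and 7 given for a level, ``main_rowtype([1, 7])``
picks 1 as the main data row, and rows of type 7 would carry the extra information.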
The rejection variable must be from the highest data level; i.e., in this
example, from the ``comp_unit`` level::

    …
    '' -1
    MAIN_GROUP 4 5 6 7 8

The variable "Pinta-ala" is converted into the SIMO variable "AREA" during
import. The "Pinta-ala" value in the from element is for documentation purposes
only. It has no effect on the import, because the imported value is determined
by the row_type and row_position element values. Here the row_type value refers
to the row types defined above in the data_levels definitions. A default value
could be given for this variable in case of a missing value (none_to_value). A
conversion factor is defined for numerical attributes (conversion_factor);
during data import the input data value is multiplied by it::

    …
    Pinta-ala AREA 1 8 double 1

For categorical values, an explicit mapping from input values to SIMO attribute
values is given (value_mapping). In this case, input data value 1 at position
14 for row type 1 is converted to PEAT attribute value 0, and values 2, 3, 4
and 5 are converted to 1::

    …
    Alaryhmä PEAT 1 14 int 1 0 2 3 4 5 1
    …

Besides *numerical* and *categorical* values, *dates* and *text* can be
imported as well. The *epoch_year* for a date is either *current* or
*gregorian*. With *current*, dates are treated in the simulation as the number
of days since the start of the current year; i.e., if the date being imported
is 30-11-2010 and the epoch_year is 'current', the imported value is 333, the
number of days from 1-1-2010 to 30-11-2010.
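
The conversions described above (multiplying by a conversion factor, mapping
categorical values, and counting days for the 'current' epoch year) can be
checked with a short Python sketch. It is not the SIMO importer: the function
names are hypothetical, and only the PEAT mapping and the 30-11-2010 example
are taken from the text.

```python
from datetime import date

# value_mapping for PEAT as described in the text: 1 -> 0, and 2..5 -> 1.
PEAT_VALUE_MAPPING = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1}


def convert_numeric(raw, conversion_factor=1.0):
    """Multiply the raw input value by the conversion factor."""
    return float(raw) * conversion_factor


def convert_categorical(raw, value_mapping):
    """Map a raw input category to the target attribute value."""
    return value_mapping[int(raw)]


def days_since_year_start(day, month, year):
    """Day count for epoch_year 'current': days elapsed since 1 January."""
    return (date(year, month, day) - date(year, 1, 1)).days


# The date example from the text: 30-11-2010 becomes 333.
assert days_since_year_start(30, 11, 2010) == 333
```

An input value outside the mapping raises ``KeyError`` in this sketch; how the
real import treats unmapped categories is not stated here.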
If the epoch_year is 'gregorian', the date is treated as a normal date::

    …
    inventory_date 1 15 date current
    estate_name 1 16 string

By-level example
****************

The conversion format for data stored in a separate file for each data level is
similar to that for inlined data, except for the data level definitions::

    comp_unit 1 1
    stratum 1 comp_unit 0 2

Here the linking between the files is defined in the link_id_rowpos element for
the ``stratum`` data: the first value (pos) in each row of the ``stratum`` data
file contains the id of the ``comp_unit`` (level) object the ``stratum``
belongs to. This id is identical to the value found in the second position
(id_rowpos is 1) in the rows of the ``comp_unit`` data file.
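
The linking rule can be illustrated with a small Python sketch that groups
``stratum`` rows under their ``comp_unit``. The positions follow the text
(``comp_unit`` id at position 1, link id at position 0 of each ``stratum``
row); the function name and the sample rows are invented for illustration, and
the rows are assumed to be already split on the delimiter.

```python
from collections import defaultdict

COMP_UNIT_ID_POS = 1   # id_rowpos of the comp_unit rows, as stated in the text
STRATUM_LINK_POS = 0   # position of the comp_unit id in each stratum row


def link_strata(comp_unit_rows, stratum_rows):
    """Return a dict mapping each comp_unit id to its stratum rows."""
    strata_by_unit = defaultdict(list)
    for row in stratum_rows:
        strata_by_unit[row[STRATUM_LINK_POS]].append(row)
    return {
        row[COMP_UNIT_ID_POS]: strata_by_unit.get(row[COMP_UNIT_ID_POS], [])
        for row in comp_unit_rows
    }


# Invented sample rows standing in for the two data files.
comp_units = [["1", "A100", "2.5"], ["1", "A200", "1.0"]]
strata = [["A100", "s1"], ["A100", "s2"], ["A200", "s1"]]
linked = link_strata(comp_units, strata)
```

Here ``linked["A100"]`` holds two stratum rows and ``linked["A200"]`` holds one.
Stratum rows whose link id matches no ``comp_unit`` are silently dropped in
this sketch, which is a simplification rather than documented behaviour.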