.. _usage:

Using SIMO
==========

Shortcut for the impatient
--------------------------

Your first SIMO run will consist of two commands. The first one will build
your SIMO simulator and optimizer, and the next command will run a data
import, simulation, optimization and reporting cycle using those and a demo
data set. Later, you'll only need to run the *builder* command if you have
changed the XML documents used to describe the simulator & optimizer in SIMO.

On Windows
++++++++++

First, open the Command Shell. From the Windows Start menu select *Run...*
(XP) or the search field (Vista) and type **cmd**. Change the working
directory to the SIMO installation directory using the *cd* command, and then
in that directory::

    bin\builder.exe build simulator\builder.ini
    bin\runner.exe simulator\runner.ini

If you wish to run SIMO as a server, you need to first run the following::

    bin\builder.exe build simulator\builder.ini

Then, all of the following need to be running at the same time::

    redis-server
    bin\server.exe simulator\server_runner.ini
    bin\worker.exe

After which you can run the following to start runs::

    bin\control.exe

If you're using the SIMO source distribution, you won't have the *bin*
executables and you have to run SIMO using the following commands::

    python src\builder.py build simulator\builder.ini
    python src\runner.py simulator\runner.ini

If you wish to run SIMO as a server, you need to first run the following::

    python src\builder.py build simulator\builder.ini

Then, all of the following need to be running at the same time::

    redis-server
    python src\simoserver\simoserver.py simulator\server_runner.ini
    python src\simoserver\run_celeryd.py

After which you can run the following to start runs::

    python src\simoserver\simoctrl.py

On Linux & OS X
+++++++++++++++

SIMO is run in the terminal from the installation directory::

    bin/builder build simulator/builder.ini
    bin/runner simulator/runner.ini

If you wish to run SIMO as a server, you need to first run the following::

    bin/builder build simulator/builder.ini

Then, all of the following need to be running at the same time::

    bin/redis-server
    bin/server simulator/server_runner.ini
    bin/server_celeryd

After which you can run the following to start runs::

    bin/server_control

Structure of your SIMO installation
-----------------------------------

To use SIMO, you have three executables in the *bin* directory in your SIMO
installation directory::

    /bin
        builder.exe or builder
        runner.exe or runner
        logger.exe or logger

If you're using the source distribution on Windows, there won't be a *bin*
directory. Instead you'll have these files in the *src* directory::

    /src
        builder.py
        runner.py
        logger.py

The three executable files (or Python files) are the modules used for running
SIMO. The default setup is to run the programs from the root SIMO directory.

In addition to those, the *simulator* directory in your SIMO installation
directory contains important bits and pieces used to construct the simulator
and optimizer you'll be using::

    /simulator
        /db
        /input
        /models
            /aggregation
            /cash_flow
            /geo_table
            /management
            /operation
            /prediction
        /output
        /xml
            /aggregation_models
            /cash_flows
            /conversions
            /geo_tables
            /management_models
            /model_chains
            /operation_models
            /parameter_tables
            /prediction_models
            /samples
            /schemas
            /translation
        builder.ini
        exporter.ini
        lister.ini
        runner.ini

The files *builder.ini* and *runner.ini* are parameter files for controlling
the execution of the *builder* and *runner* modules, respectively.

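As a rough orientation before the section-by-section walkthrough below, a
builder.ini is a plain INI file built from the sections documented in this
chapter. A skeleton, not a working file, might look like this (all keys
omitted; see the details below)::

    [program_execution]
    [logging]
    [simo_database]
    [typedef]
    [schema]
    [schema.modelbase]
    [xml]
    [xml.modelbase]
    [xml.modelchain]
    [executable.modelbase]
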
The name and the location of the ini-files can naturally be different from
the ones given above.

In the folder structure above, the *./simulator/xml* folder contains the XML
documents that the user can modify. The *./simulator/models* folder contains
the model libraries for the various model types (dll and py files). The
*./simulator/db* folder contains the internal SIMO databases, whereas the
*./simulator/input* and *./simulator/output* folders are reserved for the
input data and reports.

The above folder structure is given only as an illustration of a possible
configuration. The specific location of the different components is defined
in the ini-files.

Building simulators & optimizers with builder
---------------------------------------------

**Builder** parses and validates XML documents, constructs SIMO objects from
the XML contents and stores the objects into an object database. Builder has
three commands: build, list and export. It's run like this (build command)::

    bin/builder build simulator/builder.ini

For the source distribution on Windows, which lacks the *bin* executables,
the command is::

    python src/builder.py build simulator/builder.ini

You'll need to run the builder build command once at the beginning after a
fresh installation, and after that only if you make changes to the XML
documents in the simulator directory.

Content of builder.ini for building SIMO instances
++++++++++++++++++++++++++++++++++++++++++++++++++

The content of the builder.ini-file is broken down into separate sections for
which details are given below. The file paths in the ini-file can be given
either as absolute or relative paths. One can also add comments in the file.
The comments must begin with the # sign.

**Program execution**::

    [program_execution]

The folder which is used as the base folder for the relative path settings
used in the ini file::

    base_folder=C:\Users\Joe\SIMO\simulator

**Logging**::

    [logging]

Whether the log messages during program execution are written on the screen,
and if console output is on, what level messages are logged on the screen::

    console=TRUE
    console_level=info

If file is given, log messages are written to the file, and if file output is
on, what level messages are logged::

    file=buildlog.txt
    file_level=error

**Database**::

    [simo_database]

Path to the SIMO object database file::

    path=db/simo.db

Option *create_new* defines whether a new database should be created and an
existing database deleted. Option *create_zip* zips the object database file
and creates a .ver file to accompany it. The .ver file contains a timestamp
for the zipped object db. Normally you won't be using this, but it gives you
an option to build basic versioning for your SIMO dbs::

    create_new=True
    timestamp=False
    create_zip=False

**Typedef**::

    [typedef]

Path to the SIMO type definition schema document::

    path=xml/schemas/Typedefs_SIMO.xsd

**Schema**::

    [schema]

Schema section contains the paths to all required SIMO schema documents.
Schema documents are used for validating the XML document structure and
contents::

    lexicon=xml/schemas/lexicon.xsd
    message_translation=xml/schemas/message_translation_table.xsd
    lexicon_translation=xml/schemas/lexicon_translation_table.xsd
    text2data=xml/schemas/text2data.xsd
    operation2modelchains=xml/schemas/operation2modelchains.xsd
    text2operation=xml/schemas/text2operation.xsd
    modelchain=xml/schemas/model_chain.xsd
    simulation_control=xml/schemas/simulation.xsd
    problem_definition=xml/schemas/optimization_task.xsd
    output_constraint=xml/schemas/output_constraint.xsd
    random_values=xml/schemas/random_variables.xsd
    aggregation_definition=xml/schemas/report_definition.xsd

**Schema.modelbase**::

    [schema.modelbase]

SIMO model classes also require schema documents for validating the model
interfaces::

    aggregation=xml/schemas/aggregation_modelbase.xsd
    cash_flow=xml/schemas/cash_flow_modelbase.xsd
    cash_flow_table=xml/schemas/cash_flow_table.xsd
    geo_table=xml/schemas/geo_table.xsd
    management=xml/schemas/management_modelbase.xsd
    operation=xml/schemas/operation_modelbase.xsd
    parameter_table=xml/schemas/parameter_table.xsd
    prediction=xml/schemas/prediction_modelbase.xsd

**XML**::

    [xml]

The XML section controls which XML documents builder parses, validates and
stores into the SIMO object database::

    # for the xmls several files or folders may be given
    # separate the list items with ;
    # if you merge lexicon from multiple XML documents, the master lexicon
    # should be defined first, and the extra lexicons after that, separated by ;
    lexicon=xml/lexicon.xml
    message_translation=xml/translation/message_translation.xml
    lexicon_translation=xml/translation/lexicon_translation.xml
    text2data=xml/conversions/MELA2SIMO.xml
    text2operation=xml/conversions/SMU2SIMO.xml
    operation2modelchains=xml/operation_models/operation_mapping.xml
    problem_definition=xml/samples/optimization_task.xml
    output_constraint=xml/samples/result_variables.xml
    random_values=
    simulation_control=xml/samples/simulation.xml;xml/samples/simulation_price_model.xml
    aggregation_definition=xml/samples/aggregation_def.xml

**XML modelbase**::

    [xml.modelbase]

All models must have XML definitions, or interfaces, where the model inputs,
outputs etc. are defined. These include XML documents describing the
prediction models (:ref:`prediction-modelbase-xml`), operation models
(:ref:`operation-modelbase-xml`), cash flow tables
(:ref:`cash-flow-table-xml`), parameter tables (:ref:`parameter-table-xml`)
and location dependent tables (:ref:`geo-table-xml`)::

    aggregation=xml/aggregation_models/aggregation_models.xml
    cash_flow=xml/cash_flows/cash_flow_models.xml
    cash_flow_table=xml/cash_flows/cash_flow_table.xml
    geo_table=xml/geo_tables/geo_table.xml
    management=xml/management_models/management_models.xml
    operation=xml/operation_models/operation_model.xml
    parameter_table=xml/parameter_tables/parameter_table.xml
    prediction=xml/prediction_models/prediction_model.xml

**XML modelchain**::

    [xml.modelchain]

The XML.modelchain section controls the paths from where the XML model chain
documents (:ref:`model-chain-xml`) are searched and processed::

    # Note that these are directories, not individual files
    init=xml/model_chains/tree_simulator/init/
    simulation=xml/model_chains/tree_simulator/
    forced_operation=xml/model_chains/tree_simulator/forced_operation/
    operation=xml/model_chains/tree_simulator/operation/

**Executable modelbase**::

    [executable.modelbase]

Executable modelbase contains the paths to the directories where the model
implementations, or model libraries, are located during runtime.
These model implementations include prediction models
(:ref:`prediction-model-library`), operation models
(:ref:`operation-model-library`) and location dependent geo tables
(:ref:`geo-table`)::

    aggregation=models/aggregation
    cash_flow=models/cash_flow
    cash_flow_table=xml/cash_flows
    geo_table=models/geo_table
    management=models/management
    operation=models/operation
    parameter_table=xml/parameter_tables
    prediction=models/prediction

Content of lister.ini for listing object db content
+++++++++++++++++++++++++++++++++++++++++++++++++++

The beginning of the ini-file used to control the listing of the SIMO object
database content is the same as for the build command. Which objects are
listed is controlled by setting values for keys in three sections. If a key
does not have a value, its names won't be listed. The exact value is
irrelevant; e.g. key=1 to list or key= to not::

    [general]
    lexicon=1
    message_translation=1
    lexicon_translation=1
    text2data=1
    text2operation=1
    operation2modelchains=1
    simulation_control=1
    problem_definition=1
    output_constraint=1
    random_values=
    aggregation_definition=1

    [modelbase]
    # The names are listed by type, define which types you want below.
    aggregation=1
    cash_flow=1
    cash_flow_table=1
    geo_table=1
    management=1
    parameter_table=1
    prediction=1
    operation=1

    [modelchain]
    # Likewise modelchains are listed by type.
    init=1
    simulation=1
    operation=1
    forced_operation=1

Content of exporter.ini for object to XML conversion of SIMO instances
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You can run::

    ./bin/builder export ./simulator/exporter.ini

to export the object database content back to XML documents. Again, the
ini-file has the same content at the beginning as the ini for the build
command. As above, exporting is defined in three sections, in which key-value
pairs control whether exporting is done or not. The value for a key should be
the path to which the resulting files from that item are to be written. These
paths will be prepended with the *result_folder* value in the
*[program_execution]* section::

    [general_export]
    lexicon=lexicon
    message_translation=translation
    lexicon_translation=translation
    text2data=data_conversion
    text2operation=operation_conversion
    operation2modelchains=operation_conversion
    simulation_control=simulation
    problem_definition=optimization
    output_constraint=output_constraint
    random_values=
    aggregation_definition=aggregation_def

    [model_export]
    # The models are exported by type, define which types you want below.
    aggregation=models/aggregation
    cash_flow=models/cash_flow
    cash_flow_table=models/cash_flow_table
    geo_table=models/geo_table
    management=models/management
    operation=models/operation
    parameter_table=models/parameter_table
    prediction=models/prediction

    [modelchain_export]
    # Likewise modelchains are exported by type.
    init=modelchains/init
    simulation=modelchains/simulation
    forced_operation=modelchains/forced_operation
    operation=modelchains/operation

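The list command itself is not shown above; by analogy with the build and
export invocations it presumably takes the lister ini as its argument, along
these lines::

    bin/builder list simulator/lister.ini
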
Running simulations & optimizations with runner
-----------------------------------------------

The actual data import, simulation, optimization and reporting modules can be
run using the **runner** executable::

    bin/runner simulator/runner.ini

For the source distribution on Windows, which lacks the *bin* executables,
the command is::

    python src/runner.py simulator/runner.ini

Developer option for debugging
++++++++++++++++++++++++++++++

You can give the -d or --debug option to runner to make the program execution
drop into a debugger (pdb) in case there is an unexpected exception during
the program run::

    bin/runner -d simulator/runner.ini

Content of runner.ini
+++++++++++++++++++++

The content of the runner.ini-file is broken down into separate sections for
which details are given below. The file paths in the ini-file can be given
either as absolute or relative paths. One can also add comments in the file.
The comments must begin with the # sign.

**Program execution**::

    [program_execution]

The folder which is used as the base folder for the relative path settings
used in the ini file::

    base_folder=C:\Users\Joe\SIMO\simulator

Run ID is used for identifying separate program runs::

    run_id=demo

The following options define which of the four modules are run during the
program execution::

    import_data=TRUE
    simulate=TRUE
    optimize=TRUE
    output_data=FALSE

The parameters are boolean values, for which the allowed values are:

* 1, TRUE, True, true, Yes, yes, y, Y, on
* 0, FALSE, False, false, No, no, n, N, off

In some cases, you might want to retain the contents already in the database
you're dealing with, especially when dealing with concurrency, so the
following options allow you to prevent a wipe of the existing database::

    no_wipe_import=FALSE
    no_wipe_simulate=FALSE
    no_wipe_optimize=FALSE

A specialized case, which is mostly relevant for parallel execution, requires
returning the SQL to be run locally on the server instead of remotely from
the workers. These should usually be False (and altered at runtime), and they
will be harmful in a regular run, as the data is not stored anywhere::

    no_exec_import=FALSE
    no_exec_simulate=FALSE
    no_exec_optimize=FALSE

Code profiling (boolean value) is normally switched off. It's used in code
development for profiling program execution times::

    code_profiling=off

**Message logging**::

    [logging]

Boolean value indicating whether old log messages should be wiped out from
the log database::

    wipe_log=1

Whether the log messages during program execution are written on the screen.
If console output is on, what level messages are logged on the screen::

    console=1
    console_level=info

Possible values are: debug, info, warning, error, critical

Debug level contains all the messages possible, including info, warning,
error and critical messages. Similarly, error level contains only error and
critical messages.

**Message translation**::

    [translation]

Log message translation is defined with two XML documents: the
message_translation parameter refers to a file containing the message body
translations, and lexicon_translation contains the data level and attribute
name translations. The desired message language setting must match a language
code used in the XML documents.
The XML document names are given without the xml-extension or path::

    message_translation=message_translation
    lexicon_translation=lexicon_translation
    lang=fi

**Error handling**::

    [error_handling]

These settings affect when the simulation is terminated: if
max_error_count_total is exceeded during the simulation (including data
import), the whole simulation is terminated. By setting this value to a
negative value, the simulation run is not terminated because of data import
or simulation errors. If max_error_count_per_unit is exceeded for any given
simulation unit, the simulation for that unit is terminated and the
simulation proceeds with the next simulation unit::

    max_error_count_total=100
    max_error_count_per_unit=10

If the parameter suppress_location is set to true, the error message will not
contain a description identifying the location in a model chain where the
error originated::

    suppress_location=False

**Warning handling**::

    [warning_handling]

Regardless of the logging level setting, the warning messages can be
suppressed from the logs by setting output_warnings to false. The
suppress_location setting works similarly to the one for error handling::

    output_warnings=True
    suppress_location=False

**SIMO database**::

    [simo_db]

Path to the SIMO object database. The SIMO object database stores SIMO's
internal data structures constructed from the XML documents::

    db_path=db/simo.db

**DATA database**::

    [data_db]

The DATA database is used for storing simulation input and result data. The
name of the data level containing the simulation units (like stands or plots)
must be defined here::

    simulation_level=comp_unit

db_type can be either sqlite, postgresql or oracle::

    db_type=sqlite

A spatial database is needed if the spatial properties of the data should be
included in the simulation. If spatial is set to true, the spatial reference
system id for the data needs to be defined. It's the EPSG code for the
coordinate system (see: http://spatialreference.org/ref/epsg/). The diminfo
setting needs to be defined if the db_type is oracle. Diminfo is a list of
four values separated with ;. The values are the minimum x coordinate in the
data, the minimum y coordinate, and the maximum x and y coordinates (values
in the coordinate system defined with the srid setting)::

    spatial=false
    srid=3067
    # diminfo: minx;miny;maxx;maxy
    diminfo=

The threaded setting controls whether a threaded connection to the database
is used (currently Oracle only), and the unlogged setting whether the
database uses unlogged tables (currently PostgreSQL only). WARNING: when
using unlogged tables, the database can't be recovered after a crash or
unclean shutdown. This means that *all* data from the database will be lost;
i.e.
after opening it again after a db crash, it will be empty::

    threaded=false
    unlogged=false

SQLite specific settings: whether the database is kept in memory (nothing
saved to the hard drive, but faster), and the location of the on-disk storage
of the databases::

    [data_db.sqlite]
    in_memory_db=false
    db_dir=db
    log_db=log.db
    input_db=input.db
    operation_db=operation.db
    result_db=simulated.db
    optimized_db=optimal.db

PostgreSQL specific settings: table prefixes for each type of tables (to
separate the tables that are stored in the same schema) and server connection
parameters::

    [data_db.postgresql]
    log_prefix=log
    input_prefix=input
    operation_prefix=operation
    result_prefix=result
    optimized_prefix=optimized
    db_host=localhost
    db_port=5432
    db_user=test
    db_pw=test

If the database with the given name doesn't exist, it will be created; the
same applies for the schema within the database::

    db_name=simodb
    db_schema=test

Oracle specific settings: table prefixes for each type of tables (to separate
the tables that are stored in the same schema) and server connection
parameters::

    [data_db.oracle]
    log_prefix=log
    input_prefix=input
    operation_prefix=operation
    result_prefix=result
    optimized_prefix=optimized
    db_host=localhost:1521/orcl
    db_user=test
    db_pw=test

**Data import**::

    [import]
    format=inlined

Allowed values: *by_level, inlined*

*by_level* data has data for different data levels in different files,
whereas in *inlined* data the different data levels are in the same file.

The text to data conversion mapping file defines how the input data values
are imported into SIMO. The mapping definition XML document is given without
the xml-extension or path::

    mapping_definition=MELA2SIMO

Boolean value indicating whether date information should be imported from the
input data::

    import_date=True

Data date must be given regardless of import_date. The date is given as
d.m.yyyy. If import_date=TRUE, the data_date value is used as the data date
in those cases where the date is missing from the data. If import_date=FALSE,
the data_date value is always used::

    data_date=1.1.2009

Data delimiter in the input text file. Whitespace as a delimiter is given as
' '. The first line (i.e. header line) is skipped if skip_first_line is set
to true::

    data_delimiter=' '
    skip_first_line=False

The input data directory and file(s) are defined with the options *data_dir*
and *data_file*. Multiple data files are delimited with ;. If the format is
by_level there are several data files. The files are given in the top-down
order of data levels separated by a semicolon; e.g.,
stand.txt;stratum.txt;tree.txt::

    data_dir=input
    data_file=data.rsu

**Operation import**::

    [import.operations]

So-called *forced operations* can be imported if operation_import=TRUE.
Execution of forced operations is pre-defined in the operation input data::

    operation_import=TRUE

Forced operations can be imported from various formats. The allowed formats
are: xml, text, db. The operation input file(s) location is defined with the
*operation_dir* option::

    operation_format=text
    operation_dir=input

Operation conversion defines how the operations are imported into SIMO. Only
the *text* format requires an operation_conversion definition, as xml and db
are native SIMO operation import formats. More info on operation conversion
can be found in the operation mapping XML document
(:ref:`operation-conversion-xml`). Operation implementation logic is defined
in model chains and thus the *operation_modelchains* option is required.
The forced operations input file is defined with the option
*operation_file*::

    operation_conversion=SMU2SIMO
    operation_modelchains=operation_mapping
    operation_file=data.smu

**Simulation**::

    [simulation]

The simulation control XML document (:ref:`simulation-xml`) is the high-level
simulation control definition, containing the simulation time span
definitions, model chains, initial variable values and output constraints.
The simulation control XML document name is given without the xml-extension
or path::

    simulation_control=simulation

Simulation *task_type* must be either *simulation* (creates a result db) or
*data_processing* (modifies the input db)::

    task_type=simulation

In data processing the input data is modified; i.e. the computation will
modify the values in the input database. Data processing can contain init,
simulation and forced operation chains. In simulation the data from the input
database is taken as is, and a result database is generated::

    deterministic=True
    track_prices=True

The deterministic setting, if set to True, removes any stochasticity from the
results. If track_prices is set to true, the timber assortment unit prices
used for each main level object in operations are stored in the result
database.

The next four settings relate to forestry operation simulation: when set to
False, *do_forced_ops* causes the simulation of forced operations (i.e.
operations that are set to happen on a simulation unit on a given date) to be
skipped altogether.

When dealing with forced operations and Monte Carlo simulations (more than
one realization per unit), there are two options:

1. either each unit-iteration has an individual set of forced operations, or
2. each unit has a single set of forced operations which should be used for
   all realizations (iterations).

In the former case *copy_ops_to_all_iterations* should be FALSE, and in the
latter case the option should be TRUE and the forced ops will be copied from
iteration 0 to all other iterations.

Option *create_branch_desc* controls whether a description of each decision
tree branch will be generated for each simulation unit. The description
contains the name of each operation that has caused branching in the decision
tree.

*cache_dir* is the directory path to which the unit price and stem volume
caches will be written. The 'simulation' level variable *USE_CACHE* defined
in the simulation_control (set to 1 to use caching) controls whether the
caches will be used or not. See later for examples of simulation control::

    do_forced_ops=True
    copy_ops_to_all_iterations=True
    create_branch_desc=True
    cache_dir=cache

Option *use_data_date_as_init_date* sets whether the date stored in the input
data will be used as the starting date for the simulation. It's also possible
to use the date from the computer's clock as the initial date
(*use_today_as_init_date*). If both of these settings are set to false, the
date given in *fixed_init_date* is used as the starting date for the
simulation. The date is given in d.m.yyyy format.

If the task type is 'data_processing' there are two options for how dates are
handled during the computation: by setting *fix_data_processing_date_to_init*
to True, the date for the data after the data processing run is the same as
the date used as the starting date for the computation.
On the other hand, if the same option is set to False, the date for the data
after the computation will be whatever is dictated by the simulation control
used for the computation; i.e., if the simulation control contains a 10 year
simulation, the data date after the run will be the starting date + 10 years.

Note that for 'simulation' type tasks the results will be date stamped to the
end of each simulation period, but for 'data_processing' type tasks they will
be date stamped to the beginning of the next period (unless a fixed date is
used). For example, for a simulation control with a single one year period
and a starting date of 1.1.2010, the results from a 'simulation' type run
will have a date of 31.12.2010. For a 'data_processing' type run the result
date will be 1.1.2011::

    use_data_date_as_init_date=False
    use_today_as_init_date=False
    fixed_init_date=1.1.2008
    fix_data_processing_date_to_init=False

The following lines can be modified to improve execution times. Over- and
underestimates can, however, hurt execution time. Option
*object_count_multiplier* is a positive non-zero number,
*estimated_branch_count* is a positive non-zero integer, and *max_unit_count*
is a positive non-zero integer that sets the number of simulation units that
are calculated at a time. Option *max_task_count* is a positive non-zero
integer that sets the number of simulation units that are sent from the
server to a worker at once (and has no effect if the program is not in server
mode); e.g. max_unit_count=30 and max_task_count=300 will have a worker
processing each task batch in 10 groups of 30 units. Option
*max_result_unit_count* controls the maximum number of result units from
prediction models (e.g. new trees from distribution models)::

    object_count_multiplier=50
    estimated_branch_count=1
    max_unit_count=1
    max_task_count=300
    max_result_unit_count=60

The simulator can be forced to either always leave a single untouched branch
or never leave untouched (no-op) branches. This setting works "globally",
meaning that it will consider all branching groups simultaneously. Suppose
you don't want to leave a branch with no operations (*leave_no_op_branch* =
False), but you have, for example, separate branching groups for thinnings
and clearcuts, and the stand is at a development stage that is past the
thinning stage, so only clearcut branches will be simulated. In this case the
last clearcut will still leave the untouched branch, as the thinnings haven't
been simulated yet. The untouched branch would only 'disappear' if all the
possible thinning and clearcut branches were simulated. This can be avoided
with the *separate_branching_groups* option: when set to True, each branching
group is processed separately and you can have multiple branching groups
without the problem of "untouched" branches appearing. Option
*separate_branching_groups* is set to False by default::

    leave_no_op_branch=True
    separate_branching_groups=False

In cases where branching is very dense, the number of generated branches may
cause problems memory- and performance-wise. These problems can be avoided by
setting a limit for the maximum number of branches, using the option
*max_branch_limit*::

    max_branch_limit=100

If one wants to simulate only a subset of the input data, individual
simulation units can be listed in the *id_list* parameter by listing their
ids separated by a semicolon. Note that the parameter is commented out in the
example.

The index_sequence parameter can be used to direct a certain sequence of
simulation units to the simulation; in the example the first 10 units will be
simulated. With the parameter value 99- all the simulation units from the
100th onwards will be simulated::

    #id_list=567;773;5367
    index_sequence=0-9

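For instance, to simulate only three specific units, or alternatively every
unit from the 100th onwards, the two parameters might be set like this (the
ids are illustrative)::

    # three specific units only
    id_list=567;773;5367
    # or: all units from the 100th onwards
    #index_sequence=99-
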
**Optimization**::

    [optimization]

The optimization method should be one of: *hero*, *tabu_search*, *jlp*::

    method=hero

Jlp will use linear programming (LP) to find a global optimum. Tabu search
and HERO are heuristic optimization methods, which don't necessarily find the
global optimum but are more flexible for the optimization problem
formulation.

The optimization task is defined in a SIMO optimization task object,
constructed from an optimization_task XML document
(:ref:`optimization-task-xml`). What is given as the value of the
problem_definition setting will be used as the optimization run id in the
database as well::

    problem_definition=optimization_task
    keep_opt_data=False
    reuse_opt_data=False

The boolean setting *keep_opt_data* controls whether the optimization data
file will be left on disk after the optimization run or not. The data file
gets its name from the *problem_definition* setting. Connected to the
*keep_opt_data* setting, the boolean setting *reuse_opt_data* controls
whether an existing optimization data file is used or not. You can reuse the
data file if you've run the optimization once before and the expressions and
constraints in the optimization task have not changed; e.g. when you run the
optimization with a different heuristic method or change the parameters for a
method. The data file is the one called *opt_task.h5* in the base folder set
earlier.

The settings *limit2branches_include* and *limit2branches_exclude* can be
used to limit the optimization to just specific branches of the data, e.g.::

    limit2branches_include=Normal;5 yrs
    limit2branches_exclude=10 yrs

would only pick those alternative development scenarios that have the string
'Normal' or '5 yrs' in their branch names, and not the string '10 yrs'; i.e.
one has to list the strings that must appear in the branch name in
*limit2branches_include*, and the strings that must not appear in the branch
name in *limit2branches_exclude*. The strings are separated by a semicolon.
The match can be partial, i.e. the whole branch name does not have to be
given.

The branch names can be found in the simulation result database, in the
branch_desc column of the branch_desc table. They are generated from the
name attributes of the child elements of the modelchain XML document
(:ref:`model-chain-xml`) used to describe the alternative branch generation
logic in the simulation. Note that a single branch name will contain a
combination of the condition names if there are several branching events for
the particular branch over the whole simulation period.

Note that even if the used branches are limited, branch 0 is always included
in the optimization data, so you should probably set *leave_no_op_branch* to
true so that branch 0 does not contain any branching operations.

Normally though, these settings are left empty so that the optimization uses
all simulated alternatives as optimal solution candidates::

    limit2branches=

In the case of heuristic optimization, a couple of images illustrating the
optimization procedure are generated in the folder defined in the
*chart_path* parameter::

    chart_path=.\logs

For the LP optimization using J, the installation folder of J must be given,
as well as the prefix used for all the files that SIMO generates for J. The
value of max_objects should be increased if the optimization using J fails
due to memory related errors. Also, the timeout value is dependent on the
problem size. It determines the time in seconds that the optimization is
allowed to run with J. This is needed because in certain corner cases
(signifying a bug in J), J can drop to its internal prompt while executing
the optimization, leading to indefinite optimization execution without the
timeout. Use big values for the timeout with large problems, otherwise the
optimization will be terminated prematurely due to the timeout.

In case you want to split optimized operations according to the split unit
weights generated by J, set the *do_split* parameter to True. OBS! The split
weights DO NOT affect the simulated data values in any way, only the
operations! The optimizer will multiply all numerical op_res columns in the
optimized database with the given weights, and this will cause trouble if you
have categorical variables in your results, such as tree species or timber
assortment. This can be avoided by defining the columns you do not want to
split in the *skip_weight_splits* parameter::

    [optimization.jlp]
    j_folder=./J
    file_prefix=demo
    max_objects=2000
    timeout=30
    do_split=False
    skip_weight_splits=species;assortment

Because of the iterative nature of the optimum search in heuristic methods,
the maximum number of iterations must be given. For tabu search two other
parameters are also given: for how many iterations the current solution is
fixed as the optimum solution (entry_tenure) and for how many iterations a
certain solution is kept out of the optimum solution once it's been removed
from it (exit_tenure)::

    [optimization.tabusearch]
    maximum_iterations=100000
    entry_tenure=10
    exit_tenure=20

    [optimization.hero]
    maximum_iterations=100000

**Reporting**::

    [output]
    data_type=result
    result_type=optimized
    start_date=1.1.2000
    end_date=1.1.2010
    full_date=false
    #id_list=567;773;5367
    #index_sequence=10-24

The output can be used to output values from input or result data
(data_type). For the output format operation_result this setting has no
effect, as for that format the data is always taken from the result database.
Allowed values: input, result

If the data type is result, the result type can be either simulated or
optimized (result_type). Allowed values: simulated, optimized

It's also possible to limit the output to a certain time period, using
*start_date* and *end_date*. By leaving these options empty, all the dates
are reported. By leaving the end_date value empty, all periods beginning from
the start_date are reported. If *full_date* is true, the month and day are
printed in the output; otherwise only the year is printed.

Similarly to simulation, the reporting can be targeted to only specific
simulation units (see above for the definitions of id_list and
index_sequence).

It's possible to select several output formats at the same time by separating
the format names with a semicolon. There must be a matching number of output
filenames, again separated with a semicolon::

    output_directory=.
    output_format=aggregation;inlined;operation_result
    output_filename=aggr;stock;oper
    output_constraint_file=output_constraints
    default_decimal_places=1
    archiving=False

Allowed output formats:

* *inlined, by_level*: data from different levels and different simulation
  branches in the same output file
* *smt*: data for a single data level from the first data branch (usually the
  one generated by the optimization module) for the last period in the
  simulation. Optionally several smt files can be generated, each for a
  single simulation period.
* *aggregation*: for aggregating values over time and location
* *operation_result*: the forestry operation results
* *branching_graph*: outputs dot text files that can be converted into images
  with the Graphviz program. Dot files are generated per computation unit and
  they describe the branching of the simulation results as well as the
  forestry operation causing each branch generation.

output_constraint_file is given only for the formats inlined, by_level and
smt. It should refer to a SIMO object, constructed from an XML document
describing the attributes for the different data levels that should be in the
report (:ref:`output-constraint-xml`). The XML document name is given without
the xml-extension or path.

The text output files can be automatically compressed (archiving).

For the inlined format, result padding will align the column names in the
header with the data values::

    [output.inlined]
    result_padding=True

Aggregation rules and output options are defined in an XML document
(:ref:`report-definition-xml`). The XML document name is given in the
aggregation_definition_file option without the xml-extension or path::

    [output.aggregation]
    aggregation_definition_file=aggregation
    aggregation_charts_directory=.

For the operation_result format, the vars parameter defines which of the
operation model result variables should be included in the operation result
report. The result variables are described in the operation model XML
document (:ref:`operation-modelbase-xml`). In addition to those there is an
explicit result variable cash_flow. Operation results also contain
categorical variables such as tree species and assortment. Operation results
can be grouped by these categorical variables by defining the variable names
with the group_by parameter::

    [output.operation_result]
    vars=cash_flow;Volume
    group_by=SP;assortment

What happened in that run? Enter logger
---------------------------------------

The logger module **logger** is used for retrieving information about the
simulation and optimization from the log database. Info on logger usage can
be obtained with the command::

    bin/logger --help

For the source distribution on Windows, which lacks the *bin* executables,
the command is::

    python src/logger.py --help

Using the Postgres database provided by buildout
================================================

To initialize a Postgres database that stores its data in the
simulator/db/pg directory::

    ./parts/postgresql/bin/initdb -D simulator/db/pg

and then to start it daemonized::

    ./parts/postgresql/bin/pg_ctl -D simulator/db/pg -l simulator/log/pgsql.log start

Next, you'll need to create the Postgres user used in the simulation (the
default is test). You can do this with the command line SQL interface psql or
the createuser command (see ./parts/postgresql/bin/createuser --help)::

    ./parts/postgresql/bin/psql postgres

and then in psql::

    CREATE USER test WITH PASSWORD 'test' CREATEDB;

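Alternatively, the same user can presumably be created in one step with the
createuser command; the flags below are standard PostgreSQL ones (verify
against ./parts/postgresql/bin/createuser --help)::

    # prompts for the password and grants the CREATEDB right
    ./parts/postgresql/bin/createuser --pwprompt --createdb test
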
Now you're ready to run the test simulation with PostgreSQL as the database,
once you've changed the db_type ini-setting::

    [data_db]
    # db_type must be sqlite or postgresql.
    db_type=postgresql

Doing things in parallel
========================

First up, you'll need a few more things installed if you wish to use SIMO in
parallel: celery, postgresql (if you didn't install it already) and a redis
server.

In the source folder there are additional runners. For parallel use, you'll
want to use celery_runner to start up the runner host, which functions mostly
like the regular runner, except that instead of running the simulations
itself, it passes the jobs to celery workers via amqp. The workers are
located in the *dev/celery_example* folder.

Both the server and the workers, without any changes, expect all the pieces
to be running on localhost, with postgresql having a user test with password
test. If, for example, you wish to run postgresql offsite (running the server
and postgresql at the same place is highly recommended for speed), you'll
need to change this in runner.ini, celeryconfig.py (which *must* be on the
path) and celery_tasks.py (which *also* needs to be on the path).

The workers are started using celeryd, a program provided by the celery
installation. No parameters are necessary. The proper order of starting
things up is PostgreSQL, redis, celery_runner, celeryd. For rabbitmq
installation and startup instructions for different platforms, see the
RabbitMQ documentation.

So, in case you're on a unix system and have run buildout (and builder)
successfully, you'll need to ensure the rabbitmq and postgresql servers are
running, then execute::

    bin/redis-server
    bin/python src/celery_runner.py simulator/runner.ini

Then, wherever you're running the worker(s)::

    bin/celeryd

Do note that you'll need to have a replica of the simulation setup on the
worker host; i.e., a matching copy of the SIMO db. Also note that all the
parallel runs use the default celery queues in rabbitmq, so you should really
be running a single run per master server, as otherwise the simulation tasks
from different runs will happily mix in the queue and the workers will get
tasks from all of the runs. Therefore, if you abort a run, make sure to reset
the rabbitmq queues. This is accomplished with::

    bin/celeryd --purge

While the celery_runner and celery_tasks combo is functional, they are more
for demonstrative purposes; you should consider making your own server and
workers based on them and your needs, rather than relying exclusively on
them.

Customizing celeryconfig.py
---------------------------

It is quite likely that you'll need to alter *celeryconfig.py* to fit your
environment.

All settings starting with BROKER_ refer to the amqp server, and should match
its settings.

CELERY_RESULT_BACKEND chooses the method via which the workers return their
results to the server. We prefer using redis, as it seems to be the most
reliable option at this time. The REDIS_ settings refer to the redis server
settings.

CELERY_IMPORTS defines the file from which all the possible tasks are
imported, by default *celery_tasks.py*. This needs to be on the path.

CELERYD_CONCURRENCY allows you to define how many workers are on this server.
If not set, it defaults to the number of available cores.

CELERYD_PREFETCH_MULTIPLIER defines how many tasks a worker preloads for
itself. This should be set to 1 so that the tasks are executed as workers
become available.

CELERY_REDIRECT_STDOUTS allows you to get print statements into the
log/output. If it is set to False, all STDOUT output is lost.

CELERYD_LOG_LEVEL defines how detailed the output printed from the workers
is. Allowed values are DEBUG, INFO, WARNING and ERROR. STDOUT, if redirected,
will be on the WARNING level of output.

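Putting the settings above together, a minimal sketch of a celeryconfig.py
might look like the following. All values are illustrative assumptions to be
adapted to your environment (hosts, ports and credentials in particular are
not defaults that SIMO ships with)::

    # celeryconfig.py -- illustrative sketch only; the setting names are the
    # ones discussed above, the values are assumptions for a localhost setup.

    # amqp broker (rabbitmq) connection settings
    BROKER_HOST = "localhost"
    BROKER_PORT = 5672
    BROKER_USER = "guest"
    BROKER_PASSWORD = "guest"
    BROKER_VHOST = "/"

    # return worker results to the server via redis
    CELERY_RESULT_BACKEND = "redis"
    REDIS_HOST = "localhost"
    REDIS_PORT = 6379

    # module from which the tasks are imported; must be on the path
    CELERY_IMPORTS = ("celery_tasks",)

    # number of worker processes on this host (defaults to the core count)
    CELERYD_CONCURRENCY = 4
    # prefetch one task at a time so tasks go to workers as they free up
    CELERYD_PREFETCH_MULTIPLIER = 1

    # keep print statements in the log instead of losing them
    CELERY_REDIRECT_STDOUTS = True
    CELERYD_LOG_LEVEL = "INFO"
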
For further reading about these settings, see the Celery documentation.

Customizing celery_tasks.py
---------------------------

As mentioned above, this is the file pointed to by the CELERY_IMPORTS
setting. The key things to change are::

    base_dir -- the location of the simo libraries (usually ./src/simo)
    sim_dir  -- the local equivalent of the configuration variable
                program_execution.base_folder
    pg_host  -- address of the postgresql server
    pg_port  -- port of the postgresql server
    pg_user  -- username on the postgresql server
    pg_pw    -- password of the user on the postgresql server

The workers use the postgresql information to get the configuration stored
there, as opposed to using the local ini file, overriding some key values as
stated above.

When things go wrong
--------------------

If you need to abort a running parallel job for one reason or another, the
unfinished tasks are left in the rabbitmq queue. To clear the queue, you can
use celeryd --purge, but to do things directly on rabbitmq you can also::

    sudo rabbitmqctl stop_app
    sudo rabbitmqctl reset
    sudo rabbitmqctl start_app

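If stale results from an aborted run also linger in the redis result backend,
they can be cleared with redis-cli (assuming the buildout provides it next to
redis-server). Note that FLUSHDB removes *all* keys in the selected redis
database, so only do this if redis is dedicated to SIMO::

    bin/redis-cli flushdb
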