Using SIMO¶

Shortcut for the impatient¶

Your first SIMO run will consist of two commands. The first builds your SIMO simulator and optimizer, and the second runs a data import, simulation, optimization and reporting cycle using those and a demo data set. Later, you’ll only need to run the builder command if you have changed the XML documents used to describe the simulator & optimizer in SIMO.

On Windows¶

First, open the Command Shell. From the Windows Start-menu select Run... (XP) or the search field (Vista) and type cmd

Change the working directory to the SIMO installation directory using the cd command, and then run the following in that directory:

bin\builder.exe build simulator\builder.ini
bin\runner.exe simulator\runner.ini

If you wish to run SIMO as a server, you need to first run the following:

bin\builder.exe build simulator\builder.ini

Then, all of the following need to be running at the same time:

redis-server
bin\server.exe simulator\server_runner.ini
bin\worker.exe

After which you can run the following to start runs:

bin\control.exe

If you’re using the SIMO source distribution, you won’t have the executables in the bin directory, and you have to run SIMO using the following commands:

python src\builder.py build simulator\builder.ini
python src\runner.py simulator\runner.ini

If you wish to run SIMO as a server, you need to first run the following:

python src\builder.py build simulator\builder.ini

Then, all of the following need to be running at the same time:

redis-server
python src\simoserver\simoserver.py simulator\server_runner.ini
python src\simoserver\run_celeryd.py

After which you can run the following to start runs:

python src\simoserver\simoctrl.py

On Linux & OS X¶

SIMO is run in the terminal from the installation directory:

bin/builder build simulator/builder.ini
bin/runner simulator/runner.ini

If you wish to run SIMO as a server, you need to first run the following:

bin/builder build simulator/builder.ini

Then, all of the following need to be running at the same time:

bin/redis-server
bin/server simulator/server_runner.ini
bin/server_celeryd

After which you can run the following to start runs:

bin/server_control

Structure of your SIMO installation¶

To use SIMO, you have three executables in the bin directory in your SIMO installation directory:

/bin
   builder.exe or builder
   runner.exe or runner
   logger.exe or logger

If you’re using the source distribution on Windows, there won’t be the bin directory. Instead you’ll have these files in the src directory:

/src
   builder.py
   runner.py
   logger.py

The three executable files (or Python files) are the modules used for running SIMO. The default setup is to run the programs from the root SIMO directory.

In addition to those, the simulator directory in your SIMO installation directory contains important bits and pieces used to construct the simulator and optimizer you’ll be using:

/simulator
   /db
   /input
   /models
      /aggregation
      /cash_flow
      /geo_table
      /management
      /operation
      /prediction
   /output
   /xml
      /aggregation_models
      /cash_flows
      /conversions
      /geo_tables
      /management_models
      /model_chains
      /operation_models
      /parameter_tables
      /prediction_models
      /samples
      /schemas
      /translation
   builder.ini
   exporter.ini
   lister.ini
   runner.ini

The files builder.ini and runner.ini are parameter files for controlling the execution of the builder and runner modules, respectively. The name and the location of the ini-files can naturally be different to the one given above.

In the folder structure above the ./simulator/xml folder contains the XML documents that the user can modify. The ./simulator/models folder contains the model libraries for the various model types (dll and py files). The ./simulator/db folder contains the internal SIMO databases, whereas the ./simulator/input and ./simulator/output folders are reserved for the input data and reports.

The above folder structure is given only as an illustration of a possible configuration. The specific location of the different components is defined in the ini-files.

Building simulators & optimizers with builder¶

Builder parses and validates XML documents, constructs SIMO objects from the XML contents and stores the objects into an object database. Builder has three commands: build, list and export. It is run like this (build command):

bin/builder build simulator/builder.ini

For source distribution on Windows, which lacks the bin versions, the command is:

python src/builder.py build simulator/builder.ini

You’ll need to run the builder build command once at the beginning after a fresh installation and after that only if you make changes to the XML documents in the simulator directory.
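The rebuild rule above can be automated with a small check. The following is only a sketch: the helper name needs_rebuild is hypothetical, it is not part of SIMO, and it assumes that comparing file modification times is an acceptable way to detect changed XML documents.

```python
from pathlib import Path


def needs_rebuild(xml_dir: Path, db_file: Path) -> bool:
    """Return True if the object db is missing or older than any XML document.

    Hypothetical helper, not part of SIMO: it only compares modification times.
    """
    if not db_file.exists():
        # Fresh installation: the object database has not been built yet.
        return True
    db_mtime = db_file.stat().st_mtime
    # Any XML document modified after the database was written triggers a rebuild.
    return any(p.stat().st_mtime > db_mtime for p in xml_dir.rglob("*.xml"))
```

With the sample layout this could be called as needs_rebuild(Path("simulator/xml"), Path("simulator/db/simo.db")) before deciding whether to run the builder build command.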

Content of builder.ini for building SIMO instances¶

The content of the builder.ini-file is broken down into separate sections for which details are given below. The file paths in the ini-file can be given either as absolute or relative paths. One can also add comments in the file. The comments must begin with the # sign.

Program execution:

[program_execution]

The folder which is used as the base folder for the relative path settings in the ini file:

base_folder= C:\Users\Joe\SIMO\simulator
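To illustrate how a relative path setting relates to base_folder, here is a minimal sketch using Python's standard configparser. The resolve helper is hypothetical, and the rule it applies (absolute paths are kept, relative paths are joined onto base_folder) is an assumption based on the description above, not SIMO's actual code.

```python
import configparser
from pathlib import PureWindowsPath

# A two-section excerpt in the style of builder.ini.
SAMPLE_INI = r"""
[program_execution]
base_folder= C:\Users\Joe\SIMO\simulator

[simo_database]
# relative to base_folder
path=db/simo.db
"""


def resolve(base_folder: str, value: str) -> PureWindowsPath:
    """Keep absolute paths as they are, join relative ones onto base_folder."""
    path = PureWindowsPath(value)
    return path if path.is_absolute() else PureWindowsPath(base_folder) / path


config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE_INI)
base = config["program_execution"]["base_folder"]
db_path = resolve(base, config["simo_database"]["path"])
print(db_path)
```

PureWindowsPath is used so that the sketch treats the Windows-style base folder the same way on any platform.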

Logging:

[logging]

Whether log messages are written to the screen during program execution and, if console output is on, the minimum level of the messages shown on the screen:

console=TRUE
console_level=info

If file is given, log messages are written to that file; file_level sets the minimum level of the messages written to it:

file=buildlog.txt
file_level=error

Database:

[simo_database]

Path to SIMO object database file:

path=db/simo.db

Option create_new defines whether new database should be created and existing database deleted. Option create_zip zips the object database file and creates a .ver file to accompany it. The .ver file contains a timestamp for the zipped object db. Normally you won’t be using this, but it gives you an option to build basic versioning for your SIMO dbs:

create_new=True
timestamp=False
create_zip=False

Typedef:

[typedef]

Path to SIMO type definition schema document:

path=xml/schemas/Typedefs_SIMO.xsd

Schema:

[schema]

Schema section contains the paths to all required SIMO schema documents. Schema documents are used for validating the XML document structure and contents:

lexicon=xml/schemas/lexicon.xsd
message_translation=xml/schemas/message_translation_table.xsd
lexicon_translation=xml/schemas/lexicon_translation_table.xsd
text2data=xml/schemas/text2data.xsd
operation2modelchains=xml/schemas/operation2modelchains.xsd
text2operation=xml/schemas/text2operation.xsd
modelchain=xml/schemas/model_chain.xsd
simulation_control=xml/schemas/simulation.xsd
problem_definition=xml/schemas/optimization_task.xsd
output_constraint=xml/schemas/output_constraint.xsd
random_values=xml/schemas/random_variables.xsd
aggregation_definition=xml/schemas/report_definition.xsd

Schema.modelbase:

[schema.modelbase]

SIMO model classes also require schema documents for validating the model interfaces:

aggregation=xml/schemas/aggregation_modelbase.xsd
cash_flow=xml/schemas/cash_flow_modelbase.xsd
cash_flow_table=xml/schemas/cash_flow_table.xsd
geo_table=xml/schemas/geo_table.xsd
management=xml/schemas/management_modelbase.xsd
operation=xml/schemas/operation_modelbase.xsd
parameter_table=xml/schemas/parameter_table.xsd
prediction=xml/schemas/prediction_modelbase.xsd

XML:

[xml]

XML section controls which XML documents builder parses, validates and stores into SIMO object database:

# for the xmls several files or folders may be given
# separate the list items with ;
# if you merge lexicon from multiple XML documents, the master lexicon
# should be defined first, and the extra lexicons after that, separated by ;
lexicon=xml/lexicon.xml
message_translation=xml/translation/message_translation.xml
lexicon_translation=xml/translation/lexicon_translation.xml
text2data=xml/conversions/MELA2SIMO.xml
text2operation=xml/conversions/SMU2SIMO.xml
operation2modelchains=xml/operation_models/operation_mapping.xml
problem_definition=xml/samples/optimization_task.xml
output_constraint=xml/samples/result_variables.xml
random_values=
simulation_control=xml/samples/simulation.xml;xml/samples/simulation_price_model.xml
aggregation_definition=xml/samples/aggregation_def.xml

XML modelbase:

[xml.modelbase]

All models must have XML definitions, or interfaces, where model inputs, outputs etc. are defined. These include XML documents describing the prediction models (Prediction model), operation models (Operation model), cash flow tables (Cash flow), parameter tables (Parameter table) and location dependent tables (Geo table):

aggregation=xml/aggregation_models/aggregation_models.xml
cash_flow=xml/cash_flows/cash_flow_models.xml
cash_flow_table=xml/cash_flows/cash_flow_table.xml
geo_table=xml/geo_tables/geo_table.xml
management=xml/management_models/management_models.xml
operation=xml/operation_models/operation_model.xml
parameter_table=xml/parameter_tables/parameter_table.xml
prediction=xml/prediction_models/prediction_model.xml

XML modelchain:

[xml.modelchain]

XML.modelchain section controls the paths from where XML model chain documents (Model chain) are searched and processed:

# Note that these are directories, not individual files
init=xml/model_chains/tree_simulator/init/
simulation=xml/model_chains/tree_simulator/
forced_operation=xml/model_chains/tree_simulator/forced_operation/
operation=xml/model_chains/tree_simulator/operation/

Executable modelbase:

[executable.modelbase]

Executable modelbase contains the paths to directories where the model implementations, or model libraries, are located during runtime. These model implementations include prediction models (Prediction model library), operation models (Operation model library) and location dependent geo tables (Geotable):

aggregation=models/aggregation
cash_flow=models/cash_flow
cash_flow_table=xml/cash_flows
geo_table=models/geo_table
management=models/management
operation=models/operation
parameter_table=xml/parameter_tables
prediction=models/prediction

Content of lister.ini for listing object db content¶

The beginning of the ini-file used to control the listing of SIMO object database content is the same as for the build command.

Which objects are listed is controlled by setting values for keys in three sections. If a key does not have a value, its names won’t be listed. The exact value is irrelevant; e.g. key=1 to list or key= to not:

[general]
lexicon=1
message_translation=1
lexicon_translation=1
text2data=1
text2operation=1
operation2modelchains=1
simulation_control=1
problem_definition=1
output_constraint=1
random_values=
aggregation_definition=1

[modelbase]
# The names are listed by type, define which types you want below.
aggregation=1
cash_flow=1
cash_flow_table=1
geo_table=1
management=1
parameter_table=1
prediction=1
operation=1

[modelchain]
# Likewise modelchains are listed by type.
init=1
simulation=1
operation=1
forced_operation=1
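The "any non-empty value enables the key" rule can be sketched with Python's standard configparser. This is an illustration only: the enabled_keys helper is hypothetical and the ini excerpt is shortened from the example above.

```python
import configparser

# Shortened excerpt in the style of lister.ini.
LISTER_INI = """
[general]
lexicon=1
random_values=
simulation_control=yes

[modelchain]
init=1
operation=
"""


def enabled_keys(config: configparser.ConfigParser, section: str) -> list:
    """Return the keys of a section that have any non-empty value."""
    return [key for key, value in config[section].items() if value.strip()]


config = configparser.ConfigParser(interpolation=None)
config.read_string(LISTER_INI)
print(enabled_keys(config, "general"))
print(enabled_keys(config, "modelchain"))
```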

Content of exporter.ini for object to XML conversion of SIMO instances¶

You can run:

./bin/builder export ./simulator/exporter.ini

This exports the object database content back to XML documents. Again, the ini-file has the same content at the beginning as the ini-file for the build command.

As above, exporting is defined in three sections, in which key-value pairs control whether exporting is done or not. The value for a key should be the path to which the resulting files from that item are to be written. These paths will be prepended with the result_folder value in [program_execution] section:

[general_export]
lexicon=lexicon
message_translation=translation
lexicon_translation=translation
text2data=data_conversion
text2operation=operation_conversion
operation2modelchains=operation_conversion
simulation_control=simulation
problem_definition=optimization
output_constraint=output_constraint
random_values=
aggregation_definition=aggregation_def

[model_export]
# The models are exported by type, define which types you want below.
aggregation=models/aggregation
cash_flow=models/cash_flow
cash_flow_table=models/cash_flow_table
geo_table=models/geo_table
management=models/management
operation=models/operation
parameter_table=models/parameter_table
prediction=models/prediction

[modelchain_export]
# Likewise modelchains are exported by type.
init=modelchains/init
simulation=modelchains/simulation
forced_operation=modelchains/forced_operation
operation=modelchains/operation

Running simulations & optimizations with runner¶

The actual data import, simulation, optimisation and reporting modules can be run using the runner executable:

bin/runner simulator/runner.ini

For source distribution on Windows, which lacks the bin versions, the command is:

python src/runner.py simulator/runner.ini

Developer option for debugging¶

You can give the -d or --debug option to runner to make the program execution fall into a debugger (pdb) if there is an unexpected exception during the program run:

bin/runner -d simulator/runner.ini

Content of runner.ini¶

The content of the runner.ini-file is broken down into separate sections for which details are given below. The file paths in the ini-file can be given either as absolute or relative paths. One can also add comments in the file. The comments must begin with the # sign.

Program execution:

[program_execution]

The folder which is used as the base folder for the relative path settings in the ini file:

base_folder= C:\Users\Joe\SIMO\simulator

Run ID is used for identifying separate program runs:

run_id=demo

The following options define which of the four modules are run during the program execution:

import_data=TRUE
simulate=TRUE
optimize=TRUE
output_data=FALSE

The parameters are boolean values, for which the allowed values are:

1, TRUE, True, true, Yes, yes, y, Y, on
0, FALSE, False, false, No, no, n, N, off
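A strict parser for exactly these spellings could look like the sketch below. The function name parse_bool is hypothetical, and rejecting any other spelling is an assumption; SIMO's own parser may be more lenient.

```python
# The accepted spellings, copied from the two lists above.
TRUE_VALUES = {"1", "TRUE", "True", "true", "Yes", "yes", "y", "Y", "on"}
FALSE_VALUES = {"0", "FALSE", "False", "false", "No", "no", "n", "N", "off"}


def parse_bool(raw: str) -> bool:
    """Convert an ini value to bool, rejecting anything not listed above."""
    value = raw.strip()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised boolean value: {raw!r}")
```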

In some cases, you might want to retain the contents already in the database you’re dealing with, especially when dealing with concurrency, so the following options allow you to prevent a wipe of the existing database.

no_wipe_import=FALSE
no_wipe_simulate=FALSE
no_wipe_optimize=FALSE

A specialized case, which is mostly relevant for parallel execution, requires returning the SQL to be run locally on the server instead of remotely from the workers. These should usually be False (and altered at runtime) and will be harmful in a regular run, as the data is not stored anywhere.

no_exec_import=FALSE
no_exec_simulate=FALSE
no_exec_optimize=FALSE

Code profiling (boolean value) is normally switched off. It’s used in code development for profiling program execution times:

code_profiling=off

Message logging:

[logging]

Boolean value indicating whether old log messages should be wiped out from the log database:

wipe_log=1

Whether log messages are written to the screen during program execution and, if console output is on, the minimum level of the messages shown on the screen:

console=1
console_level=info

Possible values are: debug, info, warning, error, critical

Debug-level contains all the messages possible including info, warning, error and critical messages. Similarly error level contains only error and critical messages.
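The threshold behaviour of the levels can be shown with the constants of Python's standard logging module, which uses the same five level names. Whether SIMO uses this module internally is an assumption; the shown_levels helper is hypothetical.

```python
import logging

# The five level names from the list above, in increasing severity.
LEVELS = ["debug", "info", "warning", "error", "critical"]


def shown_levels(threshold: str) -> list[str]:
    """Return the level names that pass a given console_level or file_level."""
    limit = getattr(logging, threshold.upper())
    return [name for name in LEVELS if getattr(logging, name.upper()) >= limit]


print(shown_levels("error"))
print(shown_levels("info"))
```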

Message translation:

[translation]

Log message translation is defined with two XML documents: the message_translation parameter refers to a file containing the message body translations, and lexicon_translation contains the data level and attribute name translations. The desired message language setting must match a language code used in the XML documents. The XML document names are given without the xml-extension or path:

message_translation=message_translation
lexicon_translation=lexicon_translation
lang=fi

Error handling:

[error_handling]

These settings affect when the simulation is terminated: if max_error_count_total is exceeded during the simulation (including data import), the whole simulation is terminated. If this value is set to a negative value, the simulation run is not terminated because of data import or simulation errors. If max_error_count_per_unit is exceeded for any given simulation unit, the simulation for that unit is terminated and the simulation proceeds with the next simulation unit:

max_error_count_total=100
max_error_count_per_unit=10
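The interplay of the two limits can be sketched as follows. This is a simplified model and not SIMO's implementation: the run_units function is hypothetical, each unit is reduced to the number of errors its run would raise, and "exceeded" is taken to mean strictly greater than the limit.

```python
def run_units(units: dict, max_total: int = 100, max_per_unit: int = 10):
    """Return (completed, abandoned, aborted) for a mapping of unit id to error count."""
    total = 0
    completed, abandoned = [], []
    for unit_id, error_count in units.items():
        unit_errors = 0
        for _ in range(error_count):
            total += 1
            unit_errors += 1
            # A negative total limit disables termination of the whole run.
            if max_total >= 0 and total > max_total:
                return completed, abandoned, True
            if unit_errors > max_per_unit:
                # Give up on this unit only and continue with the next one.
                abandoned.append(unit_id)
                break
        else:
            completed.append(unit_id)
    return completed, abandoned, False
```

For example, run_units({"a": 0, "b": 3, "c": 1}, max_total=100, max_per_unit=2) completes units a and c and abandons unit b without aborting the run.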

If the parameter suppress_location is set to true, the error message will not contain description which identifies the location in a model chain the error originated from:

suppress_location=False

Warning handling:

[warning_handling]

Regardless of the logging level setting, the warning messages can be suppressed from the logs by setting output_warnings to false. The suppress_location setting works similarly to the one for error handling:

output_warnings=True
suppress_location=False

SIMO database:

[simo_db]

Path to SIMO object database. SIMO object database is for storing the SIMO’s internal data structures constructed from the XML documents:

db_path=db/simo.db

DATA database:

[data_db]

The DATA database is used for storing simulation input and result data. The name of the data level having the simulation units (like stands or plots) must be defined here:

simulation_level=comp_unit

db_type can be either sqlite, postgresql or oracle:

db_type=sqlite

A spatial database is needed if spatial properties of the data should be included in the simulation. If spatial is set to true, the spatial reference system id (srid) for the data needs to be defined. It’s the EPSG code for the coordinate system (see: http://spatialreference.org/ref/epsg/). The diminfo-setting needs to be defined if the db_type is oracle. Diminfo is a list of four values separated with ;. The values are the minimum x coordinate in the data, the minimum y coordinate, and the maximum x and y coordinates (values in the coordinate system defined with the srid setting):

spatial=false
srid=3067
# diminfo: minx;miny;maxx;maxy
diminfo=

Threaded-setting controls whether a threaded connection to the database is used (currently Oracle only), and unlogged-setting whether the database uses unlogged tables (currently PostgreSQL only). WARNING: when using unlogged tables, the database can’t be recovered after a crash or unclean shutdown. This means that _all_ data from the database will be lost; i.e. after opening it again after a db crash, it will be empty:

threaded=false
unlogged=false

[data_db.sqlite]

SQLite specific settings: database in memory (nothing saved to hard drive, but faster), and the location of the on-disk storage of the databases:

in_memory_db=false
db_dir=db
log_db=log.db
input_db=input.db
operation_db=operation.db
result_db=simulated.db
optimized_db=optimal.db

[data_db.postgresql]

PostgreSQL specific settings: table prefixes for each type of tables (to separate the tables that are stored in the same schema), server connection parameters:

log_prefix=log
input_prefix=input
operation_prefix=operation
result_prefix=result
optimized_prefix=optimized
db_host=localhost
db_port=5432
db_user=test
db_pw=test

If the database with the given name doesn’t exist, it will be created, the same applies for the schema within the database:

db_name=simodb
db_schema=test

[data_db.oracle]

Oracle specific settings: table prefixes for each type of tables (to separate the tables that are stored in the same schema), server connection parameters:

log_prefix=log
input_prefix=input
operation_prefix=operation
result_prefix=result
optimized_prefix=optimized
db_host=localhost:1521/orcl
db_user=test
db_pw=test

Data import:

[import]
format=inlined

Allowed values: by_level, inlined

by_level data has data for different data levels in different files, whereas in inlined data the different data levels are in the same file.

Text to data conversion mapping file defines how the input data values are imported into SIMO. Mapping definition XML document is given without the xml-extension or path:

mapping_definition=MELA2SIMO

Boolean value indicating whether date information should be imported from input data:

import_date=True

Data date must be given regardless of the import_date setting. The date is given as d.m.yyyy. If import_date=TRUE, the data_date value is used as the data date in those cases where the date is missing from the data. If import_date=FALSE, the data_date value is always used.

data_date=1.1.2009
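The fallback rule for data_date can be written out as a short sketch. The function names are hypothetical, and representing a missing date in the input data as None is an assumption made for the example.

```python
from datetime import date, datetime
from typing import Optional


def parse_simo_date(text: str) -> date:
    """Parse a date given in the d.m.yyyy form used in the ini file."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def effective_date(row_date: Optional[str], data_date: str, import_date: bool) -> date:
    """Pick the date for one input record according to the rule above."""
    if import_date and row_date:
        # The date is imported from the data whenever it is present.
        return parse_simo_date(row_date)
    # Either import_date is off or the record carries no date.
    return parse_simo_date(data_date)


print(effective_date("15.6.2007", "1.1.2009", import_date=True))
print(effective_date(None, "1.1.2009", import_date=True))
```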

Data delimiter in the input text file. Whitespace as a delimiter is given as ' '. The first line (i.e. header line) is skipped if skip_first_line=TRUE.

data_delimiter=' '
skip_first_line=False

Input data directory and file(s) are defined with options data_dir and data_file. Multiple data files are delimited with ;. If the format is by_level there are several data files. The files are given in the top-down order of data levels separated by a semicolon; e.g., stand.txt;stratum.txt;tree.txt:

data_dir=input
data_file=data.rsu
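As an illustration of the semicolon-separated file list, the sketch below expands data_dir and data_file into individual paths while preserving the top-down order. The input_files helper is hypothetical, and the paths are kept relative here (joining them onto base_folder is left out).

```python
from pathlib import PurePosixPath


def input_files(data_dir: str, data_file: str) -> list:
    """Split a ;-separated data_file value and prefix each name with data_dir."""
    names = [name.strip() for name in data_file.split(";")]
    return [PurePosixPath(data_dir) / name for name in names if name]


# by_level format: one file per data level, top level first.
for path in input_files("input", "stand.txt;stratum.txt;tree.txt"):
    print(path)
```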

Operation import:

[import.operations]

So-called forced operations can be imported if operation_import=TRUE. Execution of forced operations is pre-defined in the operation input data:

operation_import=TRUE

Forced operations can be imported from various formats. The allowed formats are: xml, text, db. Operation input file(s) are defined with operation_dir option:

operation_format=text
operation_dir=input

Operation conversion defines how the operations are imported into SIMO. Only the text format requires an operation_conversion definition, as xml and db are native SIMO operation import formats. More info on operation conversion can be found in the operation mapping XML document (Operation conversion). Operation implementation logic is defined in model chains and thus the operation_modelchains option is required. The forced operations input file is defined with the option operation_file:

operation_conversion=SMU2SIMO
operation_modelchains=operation_mapping
operation_file=data.smu

Simulation:

[simulation]

Simulation control XML document (Simulation) is the high-level simulation control definition, containing the simulation time span definitions, model chains, initial variable values and output constraints. Simulation control XML document name is given without the xml-extension or path:

simulation_control=simulation

Simulation task_type must be either simulation (creates a result db) or data_processing (modifies the input db):

task_type=simulation

In data processing the input data is modified; i.e. the computation will modify the values in the input database. Data processing can contain init, simulation and forced operation chains. In simulation the data from the input database is taken as is, and a result database is generated.

deterministic=True
track_prices=True

Deterministic-setting, if set to True, removes any stochasticity from the results. If track_prices is set to true, the timber assortment unit prices used for each main level object in operations are stored in the result database.

The next four settings relate to forestry operation simulation. When set to False, do_forced_ops causes the simulation of forced operations, i.e. operations that are set to happen on the simulation unit on a given date, to be skipped altogether. When dealing with forced operations and Monte Carlo simulations (more than one realization per unit), there are two options: 1. either each unit-iteration has an individual set of forced operations, or 2. each unit has a single set of forced operations which should be used for all realizations (iterations). In the former case copy_ops_to_all_iterations should be FALSE, and in the latter case the option should be TRUE and forced ops will be copied from iteration 0 to all other iterations.

Option create_branch_desc controls whether for each simulation unit, a description of each decision tree branch will be generated or not. The description contains the name of each operation that has caused branching in the decision tree. cache_dir is the directory path in which the unit price and stem volume caches will be written to. The ‘simulation’ level variable USE_CACHE defined in the simulation_control (set to 1 to use caching) will control whether the caches will be used or not. See later for examples of simulation control:

do_forced_ops=True
copy_ops_to_all_iterations=True
create_branch_desc=True
cache_dir = cache

Option use_data_date_as_init_date sets whether the date stored in input data will be used as the starting date for the simulation. It’s also possible to use the date from the computer’s clock as the initial date (use_today_as_init_date). If both of these settings are set to false, the date given in fixed_init_date is used as the starting date for the simulation. The date is given in d.m.yyyy format. If the task type is ‘data_processing’, there are two options for how dates are handled during computation. By setting fix_data_processing_date_to_init to True, the date for the data after the data processing run is the same as the starting date of the computation. If the same option is set to False, the date for the data after the computation will be whatever is dictated by the simulation control used for the computation; i.e., if the simulation control contains a 10 year simulation, the data date after the run will be the starting date + 10 years. Note that for ‘simulation’ type tasks the results will be date stamped to the end of each simulation period, but for ‘data processing’ type tasks they will be date stamped to the beginning of the next period (unless a fixed date is used). For example, for a simulation control with one 1 year period and a starting date of 1.1.2010, the results from a ‘simulation’ type run will have a date of 31.12.2010. For a ‘data processing’ type run the result date will be 1.1.2011:

use_data_date_as_init_date=False
use_today_as_init_date=False
fixed_init_date=1.1.2008
fix_data_processing_date_to_init=False
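The difference between the two date stamps can be reproduced with a few lines of date arithmetic. The period_stamps function is a hypothetical illustration of the rule described above, limited to periods of whole years, and it does not handle a 29 February starting date.

```python
from datetime import date, timedelta


def period_stamps(start: date, years: int):
    """Return (simulation stamp, data processing stamp) for one period."""
    next_start = start.replace(year=start.year + years)
    # 'simulation' results are stamped to the last day of the period,
    # 'data processing' results to the first day of the next period.
    return next_start - timedelta(days=1), next_start


simulation_stamp, processing_stamp = period_stamps(date(2010, 1, 1), 1)
print(simulation_stamp, processing_stamp)
```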

The following settings can be modified to improve execution times. Over- and underestimates can, however, hurt execution time. Option object_count_multiplier is a positive non-zero number and estimated_branch_count is a positive non-zero integer. Option max_unit_count is a positive non-zero integer that sets the number of simulation units that are calculated at a time. Option max_task_count is a positive non-zero integer that sets the number of simulation units that are sent from the server to a worker at once (it has no effect if the program is not in server mode). For example, with max_unit_count=30 and max_task_count=300 each worker receives 300 units and processes them in 10 groups of 30. Option max_result_unit_count controls the maximum number of result units from prediction models (e.g. new trees from distribution models):

object_count_multiplier=50
estimated_branch_count=1
max_unit_count=1
max_task_count=300
max_result_unit_count=60
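The relation between max_task_count and max_unit_count in the example with the values 300 and 30 can be sketched as simple chunking. How SIMO actually distributes units to workers is not described here, so the split_task helper is only an assumed illustration of the arithmetic.

```python
def split_task(unit_ids: list, max_unit_count: int) -> list:
    """Cut one task (a list of unit ids) into groups of at most max_unit_count."""
    return [
        unit_ids[i:i + max_unit_count]
        for i in range(0, len(unit_ids), max_unit_count)
    ]


task = list(range(300))        # one task of max_task_count=300 units
groups = split_task(task, 30)  # processed max_unit_count=30 at a time
print(len(groups))
```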

The simulator can be forced to either always leave a single untouched branch or never leave untouched (no-op) branches. This setting works “globally”, meaning that it considers all branching groups simultaneously. Suppose you DON’T want to leave a branch with no operations (leave_no_op_branch = False), but you have, for example, separate branching groups for thinnings and clearcuts, and the stand is at a development stage that is past the thinning stage, i.e. only clearcut branches will be simulated. In this case the last clearcut will still leave the untouched branch, as the thinnings haven’t yet been simulated. The untouched branch would only ‘disappear’ if all the possible thinning and clearcut branches were simulated. This can be avoided by using the separate_branching_groups option, which, when set to True, processes each branching group separately, so you can have multiple branching groups without the problem of “untouched” branches appearing. Option separate_branching_groups is set to False by default.

leave_no_op_branch=True
separate_branching_groups=False

In cases where branching is very dense, the number of generated branches may cause problems memory- and performance-wise. These problems can be avoided by setting a limit for maximum number of branches, using option max_branch_limit.

max_branch_limit=100

If one wants to simulate only a subset of the input data, individual simulation units can be listed in the id_list parameter by listing their ids separated by a semicolon. Note that the parameter is commented out in the example. The index_sequence parameter can be used to direct a certain sequence of simulation units to simulation; in the example the first 10 units will be simulated. If the parameter value is set to 99-, all the simulation units from the 100th onwards will be simulated:

#id_list=567;773;5367
index_sequence=0-9
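The two index_sequence forms mentioned above (a closed range such as 0-9 and an open range such as 99-) map naturally onto list slicing. The select_units helper below is a hypothetical sketch that handles only these two forms; other forms that SIMO might accept are not covered.

```python
def select_units(units: list, index_sequence: str) -> list:
    """Select units by a zero-based, inclusive 'start-end' or open 'start-' range."""
    start_text, separator, end_text = index_sequence.partition("-")
    if not separator:
        raise ValueError(f"expected 'start-end' or 'start-': {index_sequence!r}")
    start = int(start_text)
    # The end index is inclusive; an empty end means "to the last unit".
    end = int(end_text) + 1 if end_text else None
    return units[start:end]


units = list(range(200))
print(len(select_units(units, "0-9")))
print(select_units(units, "99-")[0])
```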

Optimization:

[optimization]

Optimisation method should be one of: hero, tabu_search, jlp:

method=hero

Jlp will use linear programming (LP) to find a global optimum. Tabu search and HERO are heuristic optimization methods, which don’t necessarily find the global optimum but are more flexible for the optimization problem formulation.

The optimization task is defined in a SIMO optimization task object, constructed from optimization_task XML document (Optimization). What is given as the value of the problem_definition setting, will be used as the optimization run id in the database as well:

problem_definition=optimization_task
keep_opt_data = False
reuse_opt_data = False

Boolean setting keep_opt_data controls whether the optimization data file will be left on disk after the optimization run or not. The data file gets its name from the problem_definition setting. Connected to the keep_opt_data setting, the boolean setting reuse_opt_data controls whether an existing optimization data file is used or not. You can reuse the data file if you’ve run the optimization once before and the expressions and constraints in the optimization task have not changed; e.g. when you run the optimization with a different heuristic method or change the parameters for a method. The data file is the one called opt_task.h5 in the base folder set earlier.

Settings limit2branches_include and limit2branches_exclude can be used to limit the optimization to just specific branches of the data, e.g.:

limit2branches_include=Normal;5 yrs
limit2branches_exclude=10 yrs

would only pick those alternative development scenarios that would have the string ‘Normal’ or ‘5 yrs’ in their branch names, and not the string ‘10 yrs’; i.e. one has to list the strings wanted to appear in the branch name in limit2branches_include, and the strings that must not appear in the branch name in the limit2branches_exclude. Strings are separated by semicolon. The match can be partial, i.e. the whole branch name does not have to be given. The branch names can be found in the simulation result database from the table branch_desc in the branch_desc column. They are generated from the name-attributes of the <condition> child elements of <branch_conditions> of the modelchain XML document (Model chain) used to describe the alternative branch generation logic in the simulation. Note that a single branch name will contain a combination of the condition names if there are several branching events for the particular branch over the whole simulation period.
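The include and exclude matching described above can be sketched as a substring filter. The function names and the sample branch names are hypothetical, and treating an empty include list as "accept everything" is an assumption derived from the remark that the settings are normally left empty.

```python
def parse_list(value: str) -> list:
    """Split a ;-separated setting into its non-empty strings."""
    return [item.strip() for item in value.split(";") if item.strip()]


def branch_selected(name: str, include: list, exclude: list) -> bool:
    """Partial-match filter: some include string present, no exclude string present."""
    if include and not any(text in name for text in include):
        return False
    return not any(text in name for text in exclude)


include = parse_list("Normal;5 yrs")
exclude = parse_list("10 yrs")
# Hypothetical branch names, for illustration only.
for name in ["Normal thinning", "Clearcut 5 yrs", "Normal, Clearcut 10 yrs", "Delayed"]:
    print(name, branch_selected(name, include, exclude))
```

Because the match is partial, a branch name containing "15 yrs" would also match the include string "5 yrs".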

Note that even if used branches are limited, the branch 0 is always included in the optimization data, so you should probably set leave_no_op_branch to true so that branch 0 does not contain any branching operations.

Normally though, these settings are left empty so that the optimization uses all simulated alternatives for optimal solution candidates:

limit2branches=

In the case of heuristic optimization, a couple of images illustrating the optimization procedure are generated in the folder defined in the chart_path parameter:

chart_path=.\logs

For LP optimization using J, the installation folder of J must be given, as well as the prefix used for all the files that SIMO generates for J. The value of max_objects should be increased if the optimization using J fails due to memory-related errors. The timeout value also depends on the problem size: it determines the time in seconds that the optimization is allowed to run with J. The timeout is needed because in certain corner cases (signifying a bug in J), J can drop to its internal prompt while executing the optimization, which would make the optimization run indefinitely. Use big timeout values for large problems; otherwise the optimization will be terminated prematurely.

If you want to split optimized operations according to the split unit weights generated by J, set the do_split parameter to True. Note that the split weights do not affect simulated data values in any way, only the operations. The optimizer will multiply all numerical op_res columns in the optimized database by the given weights, which causes trouble if your results contain categorical variables, such as tree species or timber assortment. This can be avoided by listing the columns you do not want to split in the skip_weight_splits parameter:

[optimization.jlp]
j_folder=./J
file_prefix=demo
max_objects=2000
timeout=30
do_split=False
skip_weight_splits=species;assortment
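To see why the categorical columns need to be skipped, here is a minimal Python sketch of weighting one result row. It is not SIMO's implementation: the function, the row layout and the values are hypothetical, and only the column names species and assortment come from the example above.

```python
def apply_split_weight(row, weight, skip_columns):
    """Scale the numeric values of one result row by a split weight.

    Columns named in skip_columns, and non-numeric values, are left untouched.
    """
    scaled = {}
    for column, value in row.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and column not in skip_columns:
            scaled[column] = value * weight
        else:
            scaled[column] = value
    return scaled


skip = set("species;assortment".split(";"))

# Hypothetical op_res row: species and assortment are numeric category codes,
# so multiplying them by a weight would turn them into meaningless values.
row = {"Volume": 120.0, "cash_flow": 3000.0, "species": 2, "assortment": 1}
weighted = apply_split_weight(row, 0.25, skip)
print(weighted)
```

Without the skip set, the species code 2 would become 0.5, which no longer identifies any species.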

Because of the iterative nature of the optimum search in heuristic methods, the maximum number of iterations is given. For tabu search two other parameters are also given: for how many iterations the current solution is fixed as the optimum solution (entry_tenure), and for how many iterations a certain solution is kept outside the optimum solution once it has been removed from it (exit_tenure):

[optimization.tabusearch]
maximum_iterations=100000
entry_tenure=10
exit_tenure=20

[optimization.hero]
maximum_iterations=100000
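The two tabu search tenures can be pictured as simple bookkeeping of iteration counters. The sketch below shows that generic idea only; the class and method names are hypothetical and it makes no claim about how SIMO's tabu search is actually implemented.

```python
class TenureBook:
    """Track how long a choice is locked into or out of the current solution."""

    def __init__(self, entry_tenure, exit_tenure):
        self.entry_tenure = entry_tenure
        self.exit_tenure = exit_tenure
        self._locked_in = {}   # choice -> first iteration at which it may leave
        self._locked_out = {}  # choice -> first iteration at which it may re-enter

    def record_entry(self, choice, iteration):
        """A choice entered the solution: keep it there for entry_tenure iterations."""
        self._locked_in[choice] = iteration + self.entry_tenure

    def record_exit(self, choice, iteration):
        """A choice left the solution: keep it out for exit_tenure iterations."""
        self._locked_out[choice] = iteration + self.exit_tenure

    def may_leave(self, choice, iteration):
        return iteration >= self._locked_in.get(choice, 0)

    def may_enter(self, choice, iteration):
        return iteration >= self._locked_out.get(choice, 0)


book = TenureBook(entry_tenure=10, exit_tenure=20)
book.record_entry("unit 1 / branch 3", iteration=5)
book.record_exit("unit 1 / branch 0", iteration=5)
print(book.may_leave("unit 1 / branch 3", 14), book.may_leave("unit 1 / branch 3", 15))
print(book.may_enter("unit 1 / branch 0", 24), book.may_enter("unit 1 / branch 0", 25))
```

Longer tenures make the search less likely to cycle back to recently visited solutions, at the cost of restricting the moves available in each iteration.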

Reporting:

[output]
data_type=result
result_type=optimized
start_date=1.1.2000
end_date=1.1.2010
full_date=false
#id_list=567;773;5367
#index_sequence=10-24

The output section can report values from either input or result data (data_type). For the output format operation_result this setting has no effect, because for that format the data is always taken from the result database.

Allowed values: input, result

If the data type is result, the result type can be either simulated or optimized (result_type).

Allowed values: simulated, optimized

It’s also possible to limit the output to a certain time period using start_date and end_date. If both options are left empty, all dates are reported. If only end_date is left empty, all periods beginning from start_date are reported.

If full_date is true, month and day are printed in the output. Otherwise only the year is printed.
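The date options can be read as a simple range filter plus a formatting switch. The following Python sketch illustrates that reading under stated assumptions: the dates are taken to be in day.month.year order, the end date is treated as inclusive, and all function names are hypothetical.

```python
from datetime import date, datetime


def parse_ini_date(value):
    """Parse a 'day.month.year' string; an empty value means 'no limit'."""
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, "%d.%m.%Y").date()


def in_report_range(day, start, end):
    """Return True if day falls inside the optional [start, end] range."""
    if start is not None and day < start:
        return False
    if end is not None and day > end:
        return False
    return True


def format_date(day, full_date):
    """Print day, month and year when full_date is set, otherwise only the year."""
    if full_date:
        return "%d.%d.%d" % (day.day, day.month, day.year)
    return str(day.year)


start = parse_ini_date("1.1.2000")
end = parse_ini_date("")  # empty end_date: report everything from start_date on

for day in (date(1999, 12, 31), date(2005, 1, 1), date(2030, 1, 1)):
    if in_report_range(day, start, end):
        print(format_date(day, full_date=False))
```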

Similarly to simulation, the reporting can be targeted to only specific simulation units (see above for definitions of id_list and index_sequence).

It’s possible to select several output formats at the same time by separating the format names with semicolons. There must be a matching number of output filenames, again separated with semicolons:

output_directory=.
output_format=aggregation;inlined;operation_result
output_filename=aggr;stock;oper
output_constraint_file=output_constraints
default_decimal_places=1
archiving=False

Allowed output formats:

inlined, by_level: data from different levels and different simulation branches in the same output file
smt: data for a single data level from the first data branch (usually the one generated by the optimization module) for the last period in the simulation. Optionally, several smt files can be generated, each for a single simulation period.
aggregation: for aggregating values over time and location
operation_result: the forestry operation results
branching_graph: dot text files that can be converted into images with the Graphviz program. Dot files are generated per computation unit; they describe the branching of the simulation results and the forestry operation that caused each branch.
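Since output_format and output_filename must line up one to one, it can help to check the pairing before starting a long run. The sketch below is a standalone check written for this guide, not part of SIMO; the function name is hypothetical and the set of allowed names is copied from the list above.

```python
ALLOWED_FORMATS = {
    "inlined", "by_level", "smt",
    "aggregation", "operation_result", "branching_graph",
}


def pair_outputs(output_format, output_filename):
    """Pair each output format with its filename, failing early on mismatches."""
    formats = [item.strip() for item in output_format.split(";")]
    names = [item.strip() for item in output_filename.split(";")]
    if len(formats) != len(names):
        raise ValueError(
            "%d formats but %d filenames" % (len(formats), len(names))
        )
    unknown = [item for item in formats if item not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError("unknown output format(s): %s" % ", ".join(unknown))
    return list(zip(formats, names))


pairs = pair_outputs("aggregation;inlined;operation_result", "aggr;stock;oper")
for fmt, name in pairs:
    print(fmt, "->", name)
```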

output_constraint_file is given only for the formats inlined, by_level and smt. It should refer to a SIMO object constructed from an XML document that describes which attributes of the different data levels should be in the report (Output constraint). The XML document name is given without the xml extension or path.

The text output files can be automatically compressed (archiving).

For the inlined format, result padding will align the column names in the header and the data values:

[output.inlined]
result_padding = True

Aggregation rules and output options are defined in an XML document (Aggregation definition). The XML document name is given in the aggregation_definition_file option without the xml extension or path:

[output.aggregation]
aggregation_definition_file=aggregation
aggregation_charts_directory=.

For the operation_result format, the vars parameter defines which of the operation model result variables are included in the operation result report. The result variables are described in the operation model XML document (Operation model). In addition to those, there is an explicit result variable cash_flow. Operation results also contain categorical variables such as tree species and assortment. Operation results can be grouped by these categorical variables by listing the variable names in the group_by parameter:

[output.operation_result]
vars=cash_flow;Volume
group_by=SP;assortment
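As a rough picture of what grouping by categorical variables means, the sketch below sums the selected variables per combination of group_by values. Summing is an assumption made for the illustration, and the function and the rows are hypothetical; only the names cash_flow, Volume, SP and assortment come from the example above.

```python
from collections import defaultdict


def group_results(rows, variables, group_by):
    """Sum the chosen result variables for each combination of group_by values."""
    totals = defaultdict(lambda: dict.fromkeys(variables, 0.0))
    for row in rows:
        key = tuple(row[name] for name in group_by)
        for variable in variables:
            totals[key][variable] += row[variable]
    return dict(totals)


variables = "cash_flow;Volume".split(";")
group_by = "SP;assortment".split(";")

# Hypothetical operation results, invented for this example only.
rows = [
    {"SP": 1, "assortment": 1, "cash_flow": 100.0, "Volume": 10.0},
    {"SP": 1, "assortment": 1, "cash_flow": 50.0, "Volume": 5.0},
    {"SP": 2, "assortment": 1, "cash_flow": 30.0, "Volume": 2.0},
]
grouped = group_results(rows, variables, group_by)
for key in sorted(grouped):
    print(key, grouped[key])
```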

What happened in that run? Enter logger¶

The logger module is used for retrieving information about simulation and optimization runs from the log database. Usage information for logger is available with the command:

bin/logger --help

For the source distribution on Windows, which lacks the bin executables, the command is:

python src/logger.py --help

Using the Postgres database provided by buildout¶

To initialize a Postgres database that stores its data in the simulator/db/pg directory:

./parts/postgresql/bin/initdb -D simulator/db/pg

and then to start it daemonized:

./parts/postgresql/bin/pg_ctl -D simulator/db/pg -l simulator/log/pgsql.log start

Next, you’ll need to create the Postgres user used in the simulation (default is test). You can do this with the command line SQL interface psql or with the createuser command (see ./parts/postgresql/bin/createuser --help):

./parts/postgresql/bin/psql postgres

and then in psql:

CREATE USER test WITH PASSWORD 'test' CREATEDB;

Now you’re ready to run the test simulation with Postgres as the database once you’ve changed the db_type ini setting:

[data_db]
# db_type must be sqlite or postgresql.
db_type=postgresql
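If you edit the ini file by hand, a quick check of the db_type value can catch typos before a run. The sketch below uses Python's standard configparser on an inline copy of the fragment above; it is a standalone check and does not reflect how SIMO itself parses its ini files. The function name is hypothetical.

```python
import configparser

INI_TEXT = """
[data_db]
# db_type must be sqlite or postgresql.
db_type=postgresql
"""


def read_db_type(ini_text):
    """Return the db_type from the [data_db] section, validating its value."""
    # Interpolation is disabled so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    db_type = parser.get("data_db", "db_type").strip().lower()
    if db_type not in ("sqlite", "postgresql"):
        raise ValueError("db_type must be sqlite or postgresql, got %r" % db_type)
    return db_type


db_type = read_db_type(INI_TEXT)
print(db_type)
```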

Doing things in parallel¶

You’ll need a few more things installed if you wish to use SIMO in parallel: celery, postgresql (if you didn’t install it already) and a redis server.

The source folder contains additional runners for this purpose. Use celery_runner to start the runner host; it works mostly like the regular runner, except that instead of running the simulations itself it passes the jobs to celery workers via amqp.

As for the workers, they are located in the dev/celery_example folder. Without any changes, both the server and the workers expect all the pieces to be running on localhost, with postgresql having a user test with password test. If you wish, for example, to run postgresql on another host (running the server and postgresql on the same host is highly recommended for speed), you’ll need to change the settings in runner.ini, celeryconfig.py (which must be on the path) and celery_tasks (which also needs to be on the path). The workers are started using celeryd, a program provided by the celery installation. No parameters are necessary.

The proper order of starting things up is PostgreSQL, redis, celery_runner, celeryd. For rabbitmq installation and startup instructions for different platforms, see these instructions

So, if you’re on a unix system and have run buildout (and builder) successfully, make sure that rabbitmq and the postgresql database are running, then execute:

bin/redis-server
bin/python src/celery_runner.py simulator/runner.ini

Then, wherever you’re running the worker(s):

bin/celeryd

Do note that you’ll need to have a replica of the simulation setup on the worker host; i.e., a matching copy of the SIMO db.

Also note that all the parallel runs use the default celery queues in rabbitmq, so you should really be running a single run per master server, as otherwise the different simulation tasks from different runs will happily mix in the queue and the workers will get tasks from all of the runs. Therefore, if you abort a run, make sure to reset the rabbitmq queues. This is accomplished with:

bin/celeryd --purge

While the celery_runner, celery_tasks combo is functional, these are more for demonstrative purposes; you should consider making your own server and workers based on them, and your needs, rather than exclusively relying on them.

Customizing celeryconfig.py¶

It is quite likely that you’ll need to alter celeryconfig.py to fit your environment.

All settings starting with BROKER_ refer to the amqp server, and should match its settings.

CELERY_RESULT_BACKEND chooses the method via which the workers return their results to the server. We prefer using redis, as it seems to be the most reliable option at this time. The REDIS_ settings refer to the redis server settings.

CELERY_IMPORTS defines the file from which all the possible tasks are imported, by default celery_tasks.py. This needs to be on the path.

CELERYD_CONCURRENCY allows you to define how many workers are on this server. If not set, it defaults to the number of available cores.

CELERYD_PREFETCH_MULTIPLIER defines how many tasks a worker preloads to itself. This should be set to 1 so that the tasks are executed as workers become available.

CELERY_REDIRECT_STDOUTS allows you to get print statements to the log/output. If it is set to False, all STDOUT output is lost.

CELERYD_LOG_LEVEL defines how detailed the output printed by the workers is. Allowed values are DEBUG, INFO, WARNING and ERROR. STDOUT, if redirected, is logged at the WARNING level.

For further reading about this, see the documentation
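Put together, a celeryconfig.py covering the settings discussed above might look roughly like the sketch below. All values are placeholders. The setting names that are spelled out in full above are used as given; the suffixes of the BROKER_ and REDIS_ settings are assumptions that follow older Celery naming, so check them against the documentation of the Celery version you have installed.

```python
# celeryconfig.py -- illustrative values only; adjust to your environment.

# Connection to the amqp server. The exact suffixes are assumptions based on
# older Celery releases; they must match the broker's own settings.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"

# Workers return their results to the server through redis.
CELERY_RESULT_BACKEND = "redis"
REDIS_HOST = "localhost"  # suffixes assumed, see note above
REDIS_PORT = 6379
REDIS_DB = 0

# Module that defines the tasks; it has to be importable from the path.
CELERY_IMPORTS = ("celery_tasks",)

# Number of workers on this server; leave unset to use all available cores.
CELERYD_CONCURRENCY = 4

# Fetch one task at a time so tasks go to whichever worker becomes free.
CELERYD_PREFETCH_MULTIPLIER = 1

# Keep print statements by redirecting STDOUT to the log.
CELERY_REDIRECT_STDOUTS = True

# One of DEBUG, INFO, WARNING, ERROR.
CELERYD_LOG_LEVEL = "INFO"
```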

Customizing celery_tasks.py¶

As mentioned above, this is the file pointed to by the CELERY_IMPORTS setting. The key things to change are:

base_dir -- the location of the simo libraries (usually ./src/simo)
sim_dir -- the local equivalent for the configuration variable
           program_execution.base_folder
pg_host -- address of the postgresql server
pg_port -- port of the postgresql server
pg_user -- username on the postgresql server
pg_pw -- password of the user on the postgresql server

The workers use the postgresql information to get the configuration stored there, as opposed to using the local ini file, overriding some key values as stated above.

When things go wrong¶

If you need to abort a running parallel job for one reason or another, the unfinished tasks are left in the rabbitmq queue. To clear the queue, you can use celeryd --purge, but you can also reset rabbitmq directly:

sudo rabbitmqctl stop_app
sudo rabbitmqctl reset
sudo rabbitmqctl start_app
