Notes

ZeroMQ-based communication

OCD uses ZeroMQ to listen for events emitted by TCS as well as for internal communication. ZeroMQ makes it very simple to set up inter- and intra-process communication independently of the transport protocol [1]. We use the PUB/SUB pattern. This pattern allows multiple clients to connect to one publisher and one client to connect to multiple publishers.

Various OCD subcommands emit TCS-like events via ZeroMQ sockets in PUB mode. The addresses are provided via the following configuration options in the [urls] section:

  • ocd_main_loop: events related to the OCD main loop execution, i.e. ocd run;
  • ocd_run_shot: events emitted by observations commanded by OCD; they all originate in ocd.run_shot;
  • ocd_allow_hetdex: events emitted by the ocd allow_hetdex command;
  • ocd_db_replay: events emitted by the ocd db_replay command.

Ideally we would use one address per service, but since each address can be bound by only one publisher at a time, it would be impossible to run multiple OCD subcommands or multiple instances of the same subcommand without breaking the communication.

Here is an example of why we might want to have multiple subcommands running at the same time:

We start ocd run and, as soon as the conditions are good for HETDEX, it begins to execute shots. In this mode, the command emits events on two channels, ocd_main_loop and ocd_run_shot. All goes fine until one shot starts failing. At that point the RA wants to explore what is wrong with the shot by hand and temporarily disables HETDEX shot execution via the ocd allow_hetdex stop command. Then she/he can try to run the shot by hand using the ocd run_shot command, enabling the -e/--emit-events option, so that it is possible to track the shot execution via OCD. However this fails, because the ocd_run_shot address is already bound by another process.

The solution is to provide multiple addresses for ocd_run_shot and to specify which one to use to emit signals in each of the OCD commands. The following example modifies only the relevant parts of the Master configuration file:

[urls]
ocd_main_loop = tcp://127.0.0.1:6600
ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc

[run]
n_ocd_main_loop = 0
n_ocd_run_shot = 0

[run_shot]
n_ocd_run_shot = 1

According to this configuration, ocd run emits events at the addresses tcp://127.0.0.1:6600 and tcp://127.0.0.1:6601 and listens to tcp://127.0.0.1:6601 and ipc://run_shot.ipc, while ocd run_shot emits events at the address ipc://run_shot.ipc. This makes it possible to handle cases like the one in the example above and keeps OCD ready for future services that will consume OCD events or produce events for it.

See ocd.utils.init_zmq_servers() for more information.
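
As an illustration of the address selection described above, here is a minimal sketch that resolves which address a subcommand emits on. It is not the code of ocd.utils.init_zmq_servers(): the helper names all_addresses and emit_address are hypothetical, and the sketch assumes that each n_<name> option is a zero-based index into the comma-separated list of addresses in the [urls] section.

```python
from configparser import ConfigParser

# Configuration fragment copied from the example above.
EXAMPLE = """
[urls]
ocd_main_loop = tcp://127.0.0.1:6600
ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc

[run]
n_ocd_main_loop = 0
n_ocd_run_shot = 0

[run_shot]
n_ocd_run_shot = 1
"""


def all_addresses(conf, name):
    """Return every address listed for ``name`` in the [urls] section."""
    urls = conf.get("urls", name).split(",")
    return [url.strip() for url in urls if url.strip()]


def emit_address(conf, subcommand, name):
    """Return the address on which ``subcommand`` publishes ``name`` events.

    Assumption: ``n_<name>`` is a zero-based index into the address list.
    """
    index = conf.getint(subcommand, "n_" + name)
    return all_addresses(conf, name)[index]


conf = ConfigParser()
conf.read_string(EXAMPLE)

print(emit_address(conf, "run", "ocd_main_loop"))
print(emit_address(conf, "run", "ocd_run_shot"))
print(emit_address(conf, "run_shot", "ocd_run_shot"))
# A listener interested in all shots would connect to every address:
print(all_addresses(conf, "ocd_run_shot"))
```

Under these assumptions the two subcommands publish on different addresses, so ocd run_shot can bind its own address while ocd run is still running.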

MySQL database

Before attempting to run a shot, OCD needs to interface with a MySQL database. The information necessary to access the database is stored in the configuration file [database] section:

[database]

# {mandatory} configuration for the mysql database containing the vl_obsnum table
mysql_host=127.0.0.1
mysql_port=3306
mysql_database=test_db
mysql_user=test_user
mysql_password=test
# {optional} if the following entry is false, do not insert in the mysql database
# the new observation number. This option should be set to false for
# testing and when running OCD in listening mode. Default: true
mysql_update_obsnum = false

The database is expected to contain one table called vl_obsnum with the following structure:

Field    Type                  Null  Key  Default            Extra
id       smallint(5) unsigned  NO    PRI  NULL               auto_increment
ts       timestamp             NO         CURRENT_TIMESTAMP
obsdate  date                  NO    MUL  NULL
inst     varchar(5)            NO         NULL
obsnum   mediumint(9)          NO         NULL

and an entry should look like:

id  ts                   obsdate     inst   obsnum
1   2017-11-24 14:04:26  2017-11-24  virus  10

When the next shot can be run, the highest obsnum for the current UTC date (obsdate) is retrieved from the database, increased by 1 and returned. If the mysql_update_obsnum configuration entry is set to true, the new value is also inserted in the database.
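
The lookup just described can be sketched as follows. This is only an illustration, not OCD's code: it uses an in-memory SQLite database from the Python standard library as a stand-in for MySQL (with simplified column types), the function name next_obsnum is hypothetical, and it assumes that numbering starts at 1 when no row exists yet for the given date.

```python
import sqlite3


def next_obsnum(conn, obsdate, update=False, inst="virus"):
    """Return the highest obsnum stored for ``obsdate`` plus one.

    Assumption: when no row exists for ``obsdate`` the count starts at 1.
    """
    cur = conn.execute(
        "SELECT MAX(obsnum) FROM vl_obsnum WHERE obsdate = ?", (obsdate,)
    )
    highest = cur.fetchone()[0]  # None when there is no row for obsdate
    new = (highest or 0) + 1
    if update:  # plays the role of mysql_update_obsnum = true
        conn.execute(
            "INSERT INTO vl_obsnum (obsdate, inst, obsnum) VALUES (?, ?, ?)",
            (obsdate, inst, new),
        )
        conn.commit()
    return new


# In-memory SQLite stand-in for the MySQL table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vl_obsnum ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, "
    "obsdate DATE NOT NULL, inst VARCHAR(5) NOT NULL, obsnum INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO vl_obsnum (ts, obsdate, inst, obsnum) "
    "VALUES ('2017-11-24 14:04:26', '2017-11-24', 'virus', 10)"
)

print(next_obsnum(conn, "2017-11-24"))               # 11, nothing inserted
print(next_obsnum(conn, "2017-11-24", update=True))  # 11, row inserted
print(next_obsnum(conn, "2017-11-24"))               # 12
```

With update left at False the same number is returned on every call, which is why disabling the insertion is only suitable for testing or listening mode.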

When using the MySQL image provided by the ocd docker_mysql command, the mysql_host configuration entry should be updated to the IP address provided by the up or info subcommands before running ocd run.

Mock times

As you might have noticed, testing OCD outside of HET requires a certain amount of work. Here is yet another problem: autoschedule_main returns shots only for the current night, so it is impossible to fully test OCD during engineering, i.e. with full moon. To do this we need to fake the time fed to autoschedule_main. One way would be to mock the time in the shells where the various OCD subcommands run. I found and tested libfaketime; unfortunately it doesn’t work for this purpose. I could successfully run:

faketime '2017-11-18 18:00:00' ocd run --config ocd.cfg

but when I tried to do something like:

faketime '2017-11-18 18:02:00' ocd allow_hetdex --config ocd.cfg start

I could not make the connection with ocd run. Without the faketime command, it works fine. This also means that ocd run could correctly run autoschedule_main and select a new shot, but the shot could not be run because of the connection failure.

To help with testing, issue #2242 was addressed and a way to mock times has been added to OCD. To use this functionality it is enough to uncomment the mock_time option of the [dates] section and give it a value accepted by astropy.time.Time:

[dates]

# {optional} if this value is provided, it must contain a UTC date/time that
# astropy.time.Time can parse (http://docs.astropy.org/en/stable/time/#id3).
# If the option is not used, the times used to run e.g. ``autoschedule_main``
# refer to the current UTC time.
# If the option is used, a mock object is initialized with the ``mock_time``,
# and calls to ocd.utils.get_utc and ocd.utils.get_jd return a new time ``n``
# seconds after ``mock_time``, where ``n`` is the time between initializing the
# mock object and the get_* function call.
# If the option is used the user is asked to proceed to avoid troubles during
# operation
mock_time = 2017-11-18T18:00:00

When running:

ocd run --config ocd.cfg

you will be asked if you really want to proceed with a mock time. If you type y or yes, the command will run as usual. The logs will show the correct time stamps (i.e. not the mocked ones). When the conditions are good enough to submit a new shot, the current Julian date is requested. Since we are mocking the time, we do not get back the current date, but the one corresponding to the value in mock_time plus the time passed from the start of ocd run. I.e. if the first shot happens one hour after starting OCD, we will get the JD corresponding to 2017-11-18T19:00:00 (2458076.291667) [2].
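
The behaviour of the mocked clock can be sketched with the standard library alone. This is a hedged illustration, not OCD's implementation: the MockClock class is hypothetical, its method names only echo ocd.utils.get_utc and ocd.utils.get_jd, and the Julian date is computed from the Unix timestamp (JD 2440587.5 corresponds to 1970-01-01T00:00:00 UTC) rather than with astropy. The injectable monotonic argument exists only so that the passage of one hour can be simulated.

```python
import time
from datetime import datetime, timedelta, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian date of 1970-01-01T00:00:00 UTC


class MockClock:
    """Clock that starts at ``mock_time`` and then advances in real time."""

    def __init__(self, mock_time, monotonic=time.monotonic):
        # The ISO string carries no time zone, so it is interpreted as UTC.
        start = datetime.fromisoformat(mock_time)
        self._start = start.replace(tzinfo=timezone.utc)
        self._monotonic = monotonic
        self._t0 = monotonic()  # moment the mock object is initialized

    def get_utc(self):
        """Return mock_time plus the seconds elapsed since initialization."""
        elapsed = self._monotonic() - self._t0
        return self._start + timedelta(seconds=elapsed)

    def get_jd(self):
        """Return the same instant expressed as a Julian date."""
        return self.get_utc().timestamp() / 86400.0 + UNIX_EPOCH_JD


# Simulate "one hour after starting OCD" with a fake monotonic counter:
# the first value is read at initialization, the second by get_jd().
ticks = iter([0.0, 3600.0])
clock = MockClock("2017-11-18T18:00:00", monotonic=lambda: next(ticks))
print(round(clock.get_jd(), 6))  # 2458076.291667
```

In normal use the default time.monotonic would be kept, so the mocked time advances at the same rate as the wall clock.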

Footnotes

[1] ZeroMQ transparently handles multiple transport protocols.
[2] For reference, the JD corresponding to 2017-11-18T18:00:00 is 2458076.25.