Notes
ZeroMQ-based communication
OCD uses ZeroMQ
to listen for events emitted by TCS as well as for internal
communication. ZeroMQ
makes it very simple to set up inter- and
intra-process communication independently of the transport protocol [1]. We use the
PUB/SUB pattern.
This pattern allows multiple clients to connect to one publisher and one client
to connect to multiple publishers.
Various OCD subcommands emit TCS-like events via ZeroMQ sockets in PUB
mode. The
addresses are provided via the following configuration options in the [urls]
section:
ocd_main_loop
: events related to the OCD main loop execution, i.e. ocd run;
ocd_run_shot
: events emitted by observations commanded by OCD; they all originate in ocd.run_shot;
ocd_allow_hetdex
: events emitted by the ocd allow_hetdex command;
ocd_db_replay
: events emitted by the ocd db_replay command.
Ideally we would use one address per service, but since each address can be bound by only one publisher, it would be impossible to run multiple OCD subcommands, or multiple instances of the same subcommand, without breaking the communication.
Here is an example of why we might want to have multiple subcommands running at the same time:
We start ocd run and, as soon as the conditions are good for HETDEX, it begins to execute shots. In this mode the command emits events on two channels, ocd_main_loop and ocd_run_shot. All goes fine until one shot starts failing. At that point the RA wants to explore what is wrong with the shot by hand and temporarily disables HETDEX shot execution via the ocd allow_hetdex stop command. They can then try to run the shot by hand using the ocd run_shot command, enabling the -e/--emit-events option so that the shot execution can be tracked via OCD. However, this fails because the ocd_run_shot address is already bound to another process.
The solution is to provide multiple addresses for ocd_run_shot
and to
specify which one to use to emit signals in each of the OCD commands. The
following example modifies only the relevant parts of the Master configuration file:
[urls]
ocd_main_loop = tcp://127.0.0.1:6600
ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc
[run]
n_ocd_main_loop = 0
n_ocd_run_shot = 0
[run_shot]
n_ocd_run_shot = 1
According to this configuration, ocd run
emits events at the addresses
tcp://127.0.0.1:6600
and tcp://127.0.0.1:6601
and listens to
tcp://127.0.0.1:6601
and ipc://run_shot.ipc
, while ocd run_shot
emits events at the address ipc://run_shot.ipc
. This makes it possible to handle cases
like the one in the example above and makes OCD future-proof against services
that will consume OCD events or produce events for it.
See ocd.utils.init_zmq_servers()
for more information.
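The address selection described above can be sketched with the standard library alone. This is only an illustration, not the code in ocd.utils.init_zmq_servers(): the helper names `all_addresses` and `emit_address` are hypothetical, and the sketch assumes that each `n_<name>` option is a zero-based index into the comma-separated list of addresses, which is what the example configuration suggests.

```python
import configparser

CONFIG_TEXT = """
[urls]
ocd_main_loop = tcp://127.0.0.1:6600
ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc

[run]
n_ocd_main_loop = 0
n_ocd_run_shot = 0

[run_shot]
n_ocd_run_shot = 1
"""


def all_addresses(config, name):
    """Return every address listed for ``name`` in the [urls] section."""
    return [url.strip() for url in config["urls"][name].split(",")]


def emit_address(config, section, name):
    """Return the address a subcommand would bind for the ``name`` channel.

    Assumption: the ``n_<name>`` option of the subcommand section is a
    zero-based index into the comma-separated list of addresses.
    """
    index = config.getint(section, "n_" + name)
    return all_addresses(config, name)[index]


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Addresses on which ``ocd run`` emits events
print(emit_address(config, "run", "ocd_main_loop"))
print(emit_address(config, "run", "ocd_run_shot"))
# Address on which ``ocd run_shot`` emits events
print(emit_address(config, "run_shot", "ocd_run_shot"))
# Addresses a listener of the run_shot events would subscribe to
print(all_addresses(config, "ocd_run_shot"))
```

Because each subcommand binds a different entry of the list, ocd run and a hand-started ocd run_shot never compete for the same address, while a subscriber connected to every entry still sees the events from both.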
MySQL database
Before attempting to run a shot, OCD needs to interface with a MySQL database. The information necessary to access the
database is stored in the configuration file [database]
section:
[database]
# {mandatory} configuration for the mysql database containing the vl_obsnum table
mysql_host=127.0.0.1
mysql_port=3306
mysql_database=test_db
mysql_user=test_user
mysql_password=test
# {optional} if the following entry is false, do not insert the new observation
# number in the mysql database. This option should be set to false for
# testing and when running OCD in listening mode. Default: true
mysql_update_obsnum = false
The database is expected to contain one table called vl_obsnum
with the
following structure:
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
ts | timestamp | NO | | CURRENT_TIMESTAMP | |
obsdate | date | NO | MUL | NULL | |
inst | varchar(5) | NO | | NULL | |
obsnum | mediumint(9) | NO | | NULL | |
and an entry should look like:
id | ts | obsdate | inst | obsnum |
---|---|---|---|---|
1 | 2017-11-24 14:04:26 | 2017-11-24 | virus | 10 |
When the next shot can be run, the highest obsnum
for the current UTC date
(obsdate
) is recovered from the database, increased by 1 and returned. If
the mysql_update_obsnum
configuration entry is set to true
, the new
value is inserted in the database.
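The observation number logic can be sketched as follows. This is a minimal illustration, not OCD's implementation: it uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server, the function name `next_obsnum` is hypothetical, and it assumes that numbering starts at 1 when the date has no entries yet.

```python
import sqlite3


def next_obsnum(conn, inst, obsdate, update=True):
    """Return the highest obsnum stored for ``obsdate`` plus one.

    If ``update`` is true the new value is also inserted, mirroring the
    ``mysql_update_obsnum`` configuration entry.
    """
    row = conn.execute(
        "SELECT MAX(obsnum) FROM vl_obsnum WHERE obsdate = ?", (obsdate,)
    ).fetchone()
    # Assumption: start from 1 when there is no entry for this date.
    new_obsnum = (row[0] or 0) + 1
    if update:
        conn.execute(
            "INSERT INTO vl_obsnum (obsdate, inst, obsnum) VALUES (?, ?, ?)",
            (obsdate, inst, new_obsnum),
        )
        conn.commit()
    return new_obsnum


# In-memory SQLite stand-in for the MySQL table described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vl_obsnum ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, "
    "obsdate DATE NOT NULL, "
    "inst VARCHAR(5) NOT NULL, "
    "obsnum INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO vl_obsnum (obsdate, inst, obsnum) VALUES (?, ?, ?)",
    ("2017-11-24", "virus", 10),
)

print(next_obsnum(conn, "virus", "2017-11-24", update=False))  # 11, not stored
print(next_obsnum(conn, "virus", "2017-11-24", update=True))   # 11, stored
print(next_obsnum(conn, "virus", "2017-11-24", update=True))   # 12, stored
```

With `update=False` the same number is returned on every call, which is why that setting suits testing and listening mode: the table is never modified.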
When using the MySQL image provided by the ocd docker_mysql command, the
mysql_host
configuration entry should be updated to the IP address provided
by the up
or info
subcommands before running ocd run.
Mock times
As you might have noticed, testing OCD outside of HET requires a certain amount
of work. Here is yet another problem: autoschedule_main
returns shots only
for the current night, so it is impossible to fully test OCD during
engineering, i.e. with full moon. To do this we need to fake the time fed to
autoschedule_main
. One way would be to mock the time in the shells where
the various OCD subcommands run. I found and tested
libfaketime: unfortunately it
doesn’t work. I could successfully run:
faketime '2017-11-18 18:00:00' ocd run --config ocd.cfg
but when I tried to do something like:
faketime '2017-11-18 18:02:00' ocd allow_hetdex --config ocd.cfg start
I could not make the connection with ocd run
. Without the faketime
command, it works fine. This also means that ocd run
could correctly
run autoschedule_main
and select a new shot, but the shot could not be run
because of the connection failure.
To help with testing, issue #2242 was addressed and a way to mock times was
added to OCD. To use this functionality it's enough to uncomment the
mock_time
option of the [dates]
section and give it a value accepted by
astropy Time:
[dates]
# {optional} if this value is provided, it must contain a UTC date/time that
# astropy.time.Time can parse (http://docs.astropy.org/en/stable/time/#id3).
# If the option is not used, the times used to run e.g. ``autoschedule_main``
# refer to the current UTC time.
# If the option is used, a mock object is initialized with the ``mock_time``,
# and calls to ocd.utils.get_utc and ocd.utils.get_jd return a new time ``n``
# seconds after ``mock_time``, where ``n`` is the time between initializing the
# mock object and the get_* function call.
# If the option is used the user is asked to proceed to avoid troubles during
# operation
mock_time = 2017-11-18T18:00:00
When running:
ocd run --config ocd.cfg
you will be asked if you really want to proceed with a mock time. If you type
y
or yes
, the command will run as usual. The logs will show the correct
time stamps (i.e. not the mocked ones). When the conditions are good enough to
submit a new shot, the current Julian date is requested. Since we are mocking
the time, we do not get back the current date, but the one corresponding to
the value in mock_time
plus the time passed from the start of ocd run
.
I.e. if the first shot happens one hour after starting OCD, we will get the JD
corresponding to 2017-11-18T19:00:00
(2458076.291667) [2].
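The behaviour of the mocked clock can be sketched with the standard library. This is an illustration under stated assumptions, not the code behind ocd.utils.get_utc and ocd.utils.get_jd: the class `MockClock` and its `monotonic` parameter are hypothetical, and the Julian date is computed directly from the Unix timestamp instead of using astropy.

```python
import time
from datetime import datetime, timedelta, timezone

# Julian date of the Unix epoch, 1970-01-01T00:00:00 UTC.
UNIX_EPOCH_JD = 2440587.5


class MockClock:
    """Hypothetical helper: a UTC clock that starts at ``mock_time`` and
    then advances at the normal rate."""

    def __init__(self, mock_time, monotonic=time.monotonic):
        self._start = datetime.fromisoformat(mock_time).replace(tzinfo=timezone.utc)
        self._monotonic = monotonic
        self._t0 = monotonic()  # moment the mock object is initialized

    def get_utc(self):
        """Return ``mock_time`` plus the seconds elapsed since initialization."""
        elapsed = self._monotonic() - self._t0
        return self._start + timedelta(seconds=elapsed)

    def get_jd(self):
        """Return the Julian date corresponding to ``get_utc()``."""
        return UNIX_EPOCH_JD + self.get_utc().timestamp() / 86400.0


# A fake monotonic counter simulates the passage of one hour without waiting.
fake_now = [0.0]
clock = MockClock("2017-11-18T18:00:00", monotonic=lambda: fake_now[0])
print(round(clock.get_jd(), 6))     # 2458076.25
fake_now[0] = 3600.0                # one hour after starting
print(clock.get_utc().isoformat())  # 2017-11-18T19:00:00+00:00
print(round(clock.get_jd(), 6))     # 2458076.291667
```

Measuring the elapsed time with a monotonic counter keeps the mocked clock advancing steadily even if the system clock is adjusted while the command runs.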
Footnotes
[1] | ZeroMQ transparently handles multiple transport protocols. |
[2] | For reference, the JD corresponding to 2017-11-18T18:00:00 is 2458076.25 |