Notes
=====

ZeroMQ-based communication
--------------------------

OCD uses ``ZeroMQ`` to listen for events emitted by TCS as well as for internal
communication. ``ZeroMQ`` makes it very simple to set up inter- and
intra-process communication independently of the transport protocol
[#fzmq_p]_.  We use the
`PUB/SUB pattern
<https://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/patterns/pubsub.html#publish-subscribe>`_.
This pattern allows multiple clients to connect to one publisher and one client
to connect to multiple publishers.
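
As a minimal sketch of the second case (one client connected to multiple
publishers), the following snippet uses ``pyzmq`` directly. It is not OCD code:
the function name and the ``inproc://`` addresses are made up for illustration,
and the publishers live in the same process only to keep the example
self-contained.

.. code-block:: python

    import zmq


    def collect_from_publishers(addresses, attempts=50):
        """Connect one SUB socket to several PUB sockets; return what it saw."""
        # inproc:// endpoints only work between sockets of the same context
        ctx = zmq.Context.instance()

        publishers = []
        for address in addresses:
            pub = ctx.socket(zmq.PUB)
            pub.bind(address)
            publishers.append(pub)

        # A single subscriber connects to every publisher and accepts all topics
        sub = ctx.socket(zmq.SUB)
        for address in addresses:
            sub.connect(address)
        sub.setsockopt_string(zmq.SUBSCRIBE, "")

        seen = set()
        try:
            for _ in range(attempts):
                # Each publisher sends its own address as the message
                for address, pub in zip(addresses, publishers):
                    pub.send_string(address)
                # Drain whatever arrived within the poll timeout (milliseconds)
                while sub.poll(timeout=50):
                    seen.add(sub.recv_string())
                if len(seen) == len(addresses):
                    break
        finally:
            sub.close(linger=0)
            for pub in publishers:
                pub.close(linger=0)
        return seen

A ``PUB`` socket silently drops messages until the subscription has reached it,
which is why the sketch resends in a loop instead of sending only once.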

Various OCD subcommands emit TCS-like events via ZeroMQ sockets in ``PUB`` mode. The
addresses are provided via the following configuration options in the ``[urls]``
section:

* ``ocd_main_loop``: events related to the OCD main loop execution, i.e.
  :ref:`ocd_run`;
* ``ocd_run_shot``: events emitted by observations commanded by OCD; they all
  originate in :mod:`ocd.run_shot`;
* ``ocd_allow_hetdex``: events emitted by the :ref:`ocd_allow_hetdex` command;
* ``ocd_db_replay``: events emitted by the :ref:`ocd_db_replay` command.

Ideally we would use one address per service, but since each address can be
bound by only one publisher, it would be impossible to run multiple OCD
subcommands or multiple instances of the same subcommand without breaking the
communication.

Here is an example of why we might want to have multiple subcommands running at
the same time:

    We start :ref:`ocd_run` and as soon as the conditions are good for HETDEX
    it begins to execute shots. In this mode, the command emits events from two
    channels, the ``ocd_main_loop`` and the ``ocd_run_shot``. All goes fine
    until one shot starts failing. At that point the RA wants to explore what
    is wrong with the shot by hand and temporarily disables HETDEX shot
    execution via the ``ocd allow_hetdex stop`` command. Then she/he can try to
    run the shot by hand using the :ref:`ocd_run_shot` command, enabling the
    ``-e/--emit-events`` option, so that it is possible to track the shot
    execution via OCD. However, this fails because the ``ocd_run_shot`` address
    is already bound by another process.
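
The failure in the last step can be reproduced with ``pyzmq`` alone. In this
sketch (not OCD code; the function name is hypothetical) two ``PUB`` sockets
try to bind the same TCP address, and the second attempt is expected to raise
``zmq.ZMQError``:

.. code-block:: python

    import zmq


    def second_bind_fails():
        """Return True if binding an already bound address raises ZMQError."""
        ctx = zmq.Context.instance()
        first = ctx.socket(zmq.PUB)
        second = ctx.socket(zmq.PUB)
        try:
            # Let ZeroMQ pick a free port so the example does not clash with
            # anything already running on the machine
            port = first.bind_to_random_port("tcp://127.0.0.1")
            try:
                second.bind("tcp://127.0.0.1:{}".format(port))
            except zmq.ZMQError:
                # The address is already taken by the first publisher
                return True
            return False
        finally:
            first.close(linger=0)
            second.close(linger=0)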

The solution is to provide multiple addresses for ``ocd_run_shot`` and to
specify which one to use to emit signals in each of the OCD commands. The
following example modifies only the relevant parts of the :ref:`master_conf`:

.. code-block:: cfg

    [urls]
    ocd_main_loop = tcp://127.0.0.1:6600
    ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc

    [run]
    n_ocd_main_loop = 0
    n_ocd_run_shot = 0

    [run_shot]
    n_ocd_run_shot = 1

According to this configuration, ``ocd run`` emits events at the addresses
``tcp://127.0.0.1:6600`` and ``tcp://127.0.0.1:6601`` and listens to
``tcp://127.0.0.1:6601`` and ``ipc://run_shot.ipc``, while ``ocd run_shot``
emits events at the address ``ipc://run_shot.ipc``. This makes it possible to
handle cases like the one in the example above and makes OCD future-proof with
respect to services that will consume OCD events or produce events for it.

See :func:`ocd.utils.init_zmq_servers` for more information.
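
To make the role of the ``n_*`` options concrete, here is a small sketch of how
an address could be picked from the configuration above. It relies only on the
standard library ``configparser``; the helper names ``addresses`` and
``publish_address`` are hypothetical and are not meant to describe how
:func:`ocd.utils.init_zmq_servers` is implemented.

.. code-block:: python

    import configparser

    EXAMPLE = """
    [urls]
    ocd_main_loop = tcp://127.0.0.1:6600
    ocd_run_shot = tcp://127.0.0.1:6601, ipc://run_shot.ipc

    [run]
    n_ocd_main_loop = 0
    n_ocd_run_shot = 0

    [run_shot]
    n_ocd_run_shot = 1
    """


    def addresses(conf, name):
        """Return all the addresses listed for ``name`` in ``[urls]``."""
        raw = conf.get("urls", name)
        return [url.strip() for url in raw.split(",") if url.strip()]


    def publish_address(conf, section, name):
        """Return the address that the command ``section`` uses to publish."""
        # n_<name> is the index into the comma separated list of addresses
        index = conf.getint(section, "n_" + name)
        return addresses(conf, name)[index]


    conf = configparser.ConfigParser()
    # Strip the indentation of the example before parsing it
    conf.read_string("\n".join(line.strip() for line in EXAMPLE.splitlines()))

    print(publish_address(conf, "run", "ocd_run_shot"))
    print(publish_address(conf, "run_shot", "ocd_run_shot"))

With this layout a subscriber that wants every ``ocd_run_shot`` event would
connect to all the entries returned by ``addresses(conf, "ocd_run_shot")``.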

.. _mysql_note:

MySQL database
--------------

Before attempting to run a shot, OCD needs to interface with a `MySQL
<https://www.mysql.com/>`_ database. The information necessary to access the
database is stored in the ``[database]`` section of the configuration file:

.. code-block:: cfg

    [database]

    # {mandatory} configuration for the mysql database containing the vl_obsnum table 
    mysql_host=127.0.0.1
    mysql_port=3306
    mysql_database=test_db
    mysql_user=test_user
    mysql_password=test
    # {optional} if the following entry is false, do not insert the new
    # observation number in the mysql database. This option should be set to
    # false for testing and when running OCD in listening mode. Default: true
    mysql_update_obsnum = false

The database is expected to contain one table called ``vl_obsnum`` with the
following structure:

+---------+----------------------+------+-----+-------------------+----------------+
| Field   | Type                 | Null | Key | Default           | Extra          |
+=========+======================+======+=====+===================+================+
| id      | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment |
+---------+----------------------+------+-----+-------------------+----------------+
| ts      | timestamp            | NO   |     | CURRENT_TIMESTAMP |                |
+---------+----------------------+------+-----+-------------------+----------------+
| obsdate | date                 | NO   | MUL | NULL              |                |
+---------+----------------------+------+-----+-------------------+----------------+
| inst    | varchar(5)           | NO   |     | NULL              |                |
+---------+----------------------+------+-----+-------------------+----------------+
| obsnum  | mediumint(9)         | NO   |     | NULL              |                |
+---------+----------------------+------+-----+-------------------+----------------+

and an entry should look like:

+----+---------------------+------------+-------+--------+
| id | ts                  | obsdate    | inst  | obsnum |
+====+=====================+============+=======+========+
|  1 | 2017-11-24 14:04:26 | 2017-11-24 | virus |     10 |
+----+---------------------+------------+-------+--------+

When the next shot can be run, the highest ``obsnum`` for the current UTC date
(``obsdate``) is recovered from the database, increased by 1 and returned. If
the ``mysql_update_obsnum`` configuration entry is set to ``true``, the new
value is inserted in the database.
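
The logic can be sketched as follows. To keep the example runnable without a
MySQL server it uses the standard library ``sqlite3`` module as a stand-in, so
the SQL dialect and the connection handling differ from what OCD does. The
function name ``next_obsnum`` is hypothetical, and starting from 1 when no
entry exists for the date is an assumption of this sketch.

.. code-block:: python

    import sqlite3


    def next_obsnum(conn, obsdate, inst, update=True):
        """Return the next observation number for ``obsdate``."""
        row = conn.execute(
            "SELECT MAX(obsnum) FROM vl_obsnum WHERE obsdate = ?", (obsdate,)
        ).fetchone()
        # MAX() returns NULL when there is no entry for the date (assumption:
        # in that case the numbering starts from 1)
        new_obsnum = (row[0] or 0) + 1
        if update:
            # Equivalent of ``mysql_update_obsnum = true``
            conn.execute(
                "INSERT INTO vl_obsnum (obsdate, inst, obsnum) VALUES (?, ?, ?)",
                (obsdate, inst, new_obsnum),
            )
            conn.commit()
        return new_obsnum


    # In-memory table mimicking the structure of ``vl_obsnum``
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vl_obsnum ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, "
        "obsdate DATE NOT NULL, "
        "inst VARCHAR(5) NOT NULL, "
        "obsnum INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO vl_obsnum (obsdate, inst, obsnum) "
        "VALUES ('2017-11-24', 'virus', 10)"
    )

With ``update=False`` repeated calls keep returning the same number, which is
why that setting is only suitable for testing and for listening mode.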

When using the MySQL image provided by the :ref:`ocd_docker_mysql` command, the
``mysql_host`` configuration entry should be updated to the IP address provided
by the ``up`` or ``info`` subcommands **before** running :ref:`ocd_run`.

.. _mock_times:

Mock times
----------

As you might have noticed, testing OCD outside of HET requires a certain amount
of work. Here is yet another problem: ``autoschedule_main`` returns shots only
for the current night, so it is impossible to fully test OCD during
engineering, i.e. with full moon. To do this we need to fake the time fed to
``autoschedule_main``. One way would be to mock the time in the shells where
the various OCD subcommands run. I found and tested
`libfaketime <https://github.com/wolfcw/libfaketime>`_, but unfortunately it
does not work for this purpose. I could successfully run::

    faketime '2017-11-18 18:00:00' ocd run --config ocd.cfg

but when I tried to do something like::

    faketime '2017-11-18 18:02:00' ocd allow_hetdex --config ocd.cfg start

I could not make the connection with ``ocd run``. Without the ``faketime``
command it works fine. This also means that ``ocd run`` could correctly
run ``autoschedule_main`` and select a new shot, but the shot could not be run
because of the connection failure.

To help with testing, :issue:`2242` was addressed and a way to mock times was
added to OCD. To use this functionality it is enough to uncomment the
``mock_time`` option of the ``[dates]`` section and give it a value accepted by
`astropy Time <http://docs.astropy.org/en/stable/time/#id3>`_:

.. code-block:: cfg

    [dates]

    # {optional} if this value is provided, it must contain a UTC date/time that
    # astropy.time.Time can parse (http://docs.astropy.org/en/stable/time/#id3).
    # If the option is not used, the times used to run e.g. ``autoschedule_main``
    # refer to the current UTC time.
    # If the option is used, a mock object is initialized with the ``mock_time``,
    # and calls to ocd.utils.get_utc and ocd.utils.get_jd return a new time ``n``
    # seconds after ``mock_time``, where ``n`` is the time between initializing the
    # mock object and the get_* function call.
    # If the option is used, the user is asked to confirm before proceeding, to
    # avoid trouble during operation
    mock_time = 2017-11-18T18:00:00

When running::

    ocd run --config ocd.cfg

you will be asked if you really want to proceed with a mock time. If you type
``y`` or ``yes``, the command will run as usual. The logs will show the real
time stamps (i.e. not the mocked ones). When the conditions are good enough to
submit a new shot, the current Julian date is requested. Since we are mocking
the time, we do not get back the current date, but the one corresponding to
the value in ``mock_time`` plus the time elapsed since the start of ``ocd run``.
For example, if the first shot happens one hour after starting OCD, we will get
the JD corresponding to ``2017-11-18T19:00:00`` (2458076.291667) [#jd_mock]_.
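
The arithmetic behind this can be sketched with the standard library alone.
The class below is not the OCD implementation (which, according to the
configuration comments, relies on ``astropy.time.Time``): ``MockClock`` and its
``timer`` argument are hypothetical, and the Julian date is derived from the
Unix time stamp using the fact that 1970-01-01T00:00:00 UTC is JD 2440587.5.

.. code-block:: python

    import time
    from datetime import datetime, timedelta, timezone

    # Julian date of the Unix epoch, 1970-01-01T00:00:00 UTC
    UNIX_EPOCH_JD = 2440587.5


    class MockClock:
        """Clock that starts at ``mock_time`` and then advances normally."""

        def __init__(self, mock_time, timer=time.monotonic):
            # ``mock_time`` is an ISO string interpreted as UTC
            start = datetime.fromisoformat(mock_time)
            self._start = start.replace(tzinfo=timezone.utc)
            self._timer = timer
            # Reference instant used to measure the elapsed seconds
            self._t0 = timer()

        def get_utc(self):
            """Return ``mock_time`` plus the seconds elapsed since creation."""
            elapsed = self._timer() - self._t0
            return self._start + timedelta(seconds=elapsed)

        def get_jd(self):
            """Return the mocked time as a Julian date."""
            return UNIX_EPOCH_JD + self.get_utc().timestamp() / 86400.0


    clock = MockClock("2017-11-18T18:00:00")
    print(clock.get_utc().isoformat(), clock.get_jd())

Right after creation the printed JD is very close to 2458076.25; one hour
later the same call would give a value close to 2458076.291667. The ``timer``
argument exists only so that the elapsed time can be replaced in tests.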

.. rubric:: Footnotes

.. [#fzmq_p] ZeroMQ transparently handles `multiple protocols
    <http://zeromq.org/docs:features>`_.

.. [#jd_mock] For reference, the JD corresponding to 2017-11-18T18:00:00 is
    2458076.25.