How OCD works

This chapter describes how OCD handles the information coming from TCS and decides when to run a shot.

Note

Throughout this chapter, unless otherwise specified, the terms section and option refer to sections and options of the OCD configuration file.

OCD start

To start OCD, execute the ocd run command. It performs all the setup OCD needs in order to run:

  • load the configuration;
  • start the TCS logger or, if not available, its mock version (section [urls], tcs_log and tcs_log_mock_path options);
  • connect to the TCS subsystems or, if not available, create their mock counterparts (section [urls], subsystem_names, tcs, virus, pfip, pas options);
  • initialize the ZeroMQ server necessary to send events from the OCD main loop;
  • set up the Orchestrator to listen for TCS events ([urls] section, event_urls option) and for events coming from other OCD subcommands ([urls] section, ocd_main_loop, ocd_run_shot, ocd_allow_hetdex, ocd_db_replay options). As described in the note in ocd run, event_urls and ocd_db_replay cannot be given values at the same time;
  • load the available list of shots (shot_file option in the [shots] section) into an internal database.

An event comes in

Once everything is set up, OCD begins listening for events. Each event is composed of two parts:

  • a topic, i.e. a header string, typically identifying the source of the event, e.g. pas.Guider1.metrology_data;
  • a payload: a dictionary containing the event information.

OCD listens for the following events:

  • pas.Guider1.metrology_data, pas.Guider2.metrology_data: reduced guide probe data;
  • tcs.root.ra_dec: primary pointing information;
  • tcs.receiver.heartbeat: heartbeat event from TCS;
  • ocd.run_shot.run, ocd.run_shot.setup_telescope, ocd.run_shot.exp_hetdex: track the execution state of shots submitted by OCD;
  • ocd.states.hetdex_allowed: enable or disable HETDEX shot execution via the ocd allow_hetdex command;
  • ocd.heartbeat.enquiry: check that the OCD main loop is up and waiting for connections (see ocd.heart_beat).
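Conceptually, the main loop maps each incoming topic to a handler. The sketch below illustrates topic-based dispatch with shell-style wildcards, so that a single pattern such as ocd.run_shot.* covers all shot-execution events. TopicDispatcher and its methods are hypothetical and are not the actual Orchestrator API.

```python
import fnmatch
from typing import Callable, Dict, List

# Hypothetical handler signature: (topic, payload) -> None
Handler = Callable[[str, dict], None]


class TopicDispatcher:
    """Route (topic, payload) events to handlers registered with glob patterns."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, pattern: str, handler: Handler) -> None:
        # Shell-style patterns, e.g. "ocd.run_shot.*" or "pas.Guider[12].metrology_data".
        self._handlers.setdefault(pattern, []).append(handler)

    def dispatch(self, topic: str, payload: dict) -> int:
        """Call every handler whose pattern matches the topic; return how many ran."""
        called = 0
        for pattern, handlers in self._handlers.items():
            if fnmatch.fnmatchcase(topic, pattern):
                for handler in handlers:
                    handler(topic, payload)
                    called += 1
        return called


if __name__ == "__main__":
    seen: List[str] = []
    dispatcher = TopicDispatcher()
    dispatcher.subscribe("pas.Guider[12].metrology_data", lambda t, p: seen.append(t))
    dispatcher.subscribe("ocd.run_shot.*", lambda t, p: seen.append(t))
    dispatcher.dispatch("ocd.run_shot.exp_hetdex", {"exposure": 1})
    dispatcher.dispatch("tcs.root.ra_dec", {"az": 120.0})  # no matching handler
    print(seen)  # ['ocd.run_shot.exp_hetdex']
```

Glob patterns keep the subscription table small; events with no matching pattern are simply ignored.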

The following subsections describe how these events are handled.

pas.Guider{1,2}.metrology_data

This event contains information about the FWHM, the sky magnitude and the transparency, as measured on a star observed through the guide probes. These data are stored in a MetrologyVault object.

  • The FWHM is computed from the 2D Gaussian fit variance values (fit.gauss_mag(3) and fit.gauss_mag(4)) and the plate scale (plate_scale.x and plate_scale.y); for more information see ContainerFWHM;
  • the sky magnitude is the value associated with the key name indicated by the photometry_skymag option of the [containers] section [1];
  • the transparency is computed as described in ContainerTransparency using the star magnitude value (whose key name comes from the photometry_trans option of the [containers] section [1]), the intrinsic object magnitude (from filter.magnitude) and the illumination correction (set to 1 [2]).
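For reference, a Gaussian profile with standard deviation σ has FWHM = 2√(2 ln 2) σ ≈ 2.3548 σ. The sketch below converts per-axis variances to an FWHM in arcseconds. It assumes that fit.gauss_mag(3) and fit.gauss_mag(4) are variances in pixel² and that the plate scales are in arcsec/pixel, and it combines the two axes with a simple mean. ContainerFWHM may do this differently, so treat fwhm_from_variances as hypothetical.

```python
import math

# Conversion factor between a Gaussian standard deviation and its FWHM.
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~2.3548


def fwhm_from_variances(var_x: float, var_y: float,
                        plate_scale_x: float, plate_scale_y: float) -> float:
    """Hypothetical FWHM estimate (arcsec) from 2D Gaussian fit variances.

    Assumes var_x/var_y in pixel**2 and plate scales in arcsec/pixel;
    the two axes are combined with a plain mean (an assumption).
    """
    fwhm_x = SIGMA_TO_FWHM * math.sqrt(var_x) * plate_scale_x
    fwhm_y = SIGMA_TO_FWHM * math.sqrt(var_y) * plate_scale_y
    return 0.5 * (fwhm_x + fwhm_y)


if __name__ == "__main__":
    # sigma = 2 px and 0.5 arcsec/px on both axes -> ~2.3548 arcsec
    print(fwhm_from_variances(4.0, 4.0, 0.5, 0.5))
```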

The [containers] section also contains the maxlen and delta_timestamp options, which limit, respectively, the number of stored values and their maximum age.

When a new event arrives the following happens for the FWHM, sky magnitude and transparency:

  1. the new value is stored;
  2. if maxlen is a positive number and the number of stored values exceeds maxlen, remove the oldest one;
  3. if delta_timestamp is given, any element older than delta_timestamp seconds with respect to the new one is removed;
  4. if any of the following event parameters is true, the value is masked: photometry.object_at_image_border, photometry.object_in_bad_image_region, photometry.star_ambiguous, photometry.star_not_found, photometry.unreliable_background; the transparency value is also masked if filter.magnitude is negative.
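The storage rules above can be sketched as a small bounded container. This is not the MetrologyVault implementation: ValueContainer and its methods are hypothetical, and the sketch assumes that events arrive in chronological order, so that the oldest entries are always at the front.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional


@dataclass
class Entry:
    timestamp: float  # seconds
    value: float
    masked: bool


class ValueContainer:
    """Hypothetical bounded store following steps 1-4 above."""

    def __init__(self, maxlen: Optional[int] = None,
                 delta_timestamp: Optional[float] = None) -> None:
        self.maxlen = maxlen
        self.delta_timestamp = delta_timestamp
        self._entries: Deque[Entry] = deque()

    def add(self, timestamp: float, value: float,
            flags: Iterable[bool] = (), extra_mask: bool = False) -> None:
        # Step 4 (computed up front): mask if any quality flag is true, or if an
        # extra condition applies (e.g. negative filter.magnitude for transparency).
        masked = any(flags) or extra_mask
        # Step 1: store the new value.
        self._entries.append(Entry(timestamp, value, masked))
        # Step 2: with a positive maxlen, drop the oldest values beyond the limit.
        if self.maxlen is not None and self.maxlen > 0:
            while len(self._entries) > self.maxlen:
                self._entries.popleft()
        # Step 3: drop values older than delta_timestamp seconds w.r.t. the new one.
        if self.delta_timestamp is not None:
            while (self._entries
                   and timestamp - self._entries[0].timestamp > self.delta_timestamp):
                self._entries.popleft()

    def unmasked(self) -> List[float]:
        """Values that are not masked, oldest first."""
        return [e.value for e in self._entries if not e.masked]


if __name__ == "__main__":
    c = ValueContainer(maxlen=3, delta_timestamp=10)
    c.add(0, 1.0)
    c.add(5, 2.0, flags=[False, True])  # masked
    c.add(8, 3.0)
    c.add(12, 4.0)  # maxlen drops the t=0 value
    print(c.unmasked())  # [3.0, 4.0]
```

A deque makes removing the oldest entries O(1), which suits the append-at-the-end, trim-at-the-front pattern.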

After the new values have been stored, OCD re-evaluates whether the metrology is within specifications. To do this:

  1. for each probe, evaluate the median of the unmasked FWHM, sky magnitude and transparency values;
  2. compare the medians with the reference ranges stored in the ref_fwhm, ref_skymag and ref_transparency options of the [containers] section;
  3. if all the medians for one probe are within the reference ranges, mark the probe as good;
  4. set the state of MetrologyState to good if the probes are within specifications, otherwise set it to bad. The both_gp_good option of the [containers] section controls whether one good probe is sufficient to mark the metrology as good (both_gp_good = false) or both probes must be within specifications (both_gp_good = true);
  5. log the state transition and emit a TCS-like event, as described in Transition event.
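Steps 1 to 4 can be sketched as two small functions. probe_is_good and metrology_state are hypothetical names; the sketch assumes that reference ranges are inclusive (low, high) pairs and that a probe without any unmasked value is not good, which the description above does not state explicitly.

```python
from statistics import median
from typing import Mapping, Sequence, Tuple

Range = Tuple[float, float]  # (low, high), assumed inclusive


def probe_is_good(values: Mapping[str, Sequence[float]],
                  refs: Mapping[str, Range]) -> bool:
    """True if the median of every quantity lies within its reference range."""
    for name, (low, high) in refs.items():
        unmasked = values.get(name, ())
        if not unmasked:
            return False  # assumption: no unmasked data means "not good"
        if not low <= median(unmasked) <= high:
            return False
    return True


def metrology_state(probes: Sequence[Mapping[str, Sequence[float]]],
                    refs: Mapping[str, Range], both_gp_good: bool) -> str:
    """Return "good" or "bad" following the both_gp_good rule."""
    good = [probe_is_good(p, refs) for p in probes]
    ok = all(good) if both_gp_good else any(good)
    return "good" if ok else "bad"


if __name__ == "__main__":
    # Illustrative ranges only, not OCD defaults.
    refs = {"fwhm": (0.5, 2.0), "skymag": (18.0, 22.0), "transparency": (0.8, 1.2)}
    gp1 = {"fwhm": [1.0, 1.2, 5.0], "skymag": [20.1], "transparency": [0.9, 1.0]}
    gp2 = {"fwhm": [3.0], "skymag": [20.0], "transparency": [0.95]}
    print(metrology_state([gp1, gp2], refs, both_gp_good=False))  # good
    print(metrology_state([gp1, gp2], refs, both_gp_good=True))   # bad
```

Using the median rather than the mean keeps a single outlier (the 5.0 FWHM above) from flipping the state.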

tcs.root.ra_dec

This event contains primary pointing information. From this event, OCD stores the azimuth of the telescope, contained in the az parameter. More precisely, the AzimuthVault object keeps two azimuth values: one received when the setup is done, i.e. when the telescope is settled and likely observing, and one received when the setup is not done, i.e. when the telescope is moving.
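A minimal sketch of such a two-slot store, including the fallback order used in step 3 of The shot runner (setup done, then setup not done, then 180). AzimuthStore is a hypothetical stand-in for AzimuthVault; how OCD determines whether the setup is done is not modelled, so the flag is passed in explicitly.

```python
from typing import Optional


class AzimuthStore:
    """Hypothetical stand-in for AzimuthVault: one slot per setup status."""

    DEFAULT_AZ = 180.0  # value used when no azimuth has been received

    def __init__(self) -> None:
        self.az_setup_done: Optional[float] = None
        self.az_setup_not_done: Optional[float] = None

    def update(self, az: float, setup_done: bool) -> None:
        # Keep only the latest azimuth for each setup status.
        if setup_done:
            self.az_setup_done = az
        else:
            self.az_setup_not_done = az

    def best_azimuth(self) -> float:
        # Prefer the azimuth received with the setup done, then the other one.
        if self.az_setup_done is not None:
            return self.az_setup_done
        if self.az_setup_not_done is not None:
            return self.az_setup_not_done
        return self.DEFAULT_AZ


if __name__ == "__main__":
    store = AzimuthStore()
    print(store.best_azimuth())  # 180.0
    store.update(95.0, setup_done=False)
    print(store.best_azimuth())  # 95.0
    store.update(102.5, setup_done=True)
    print(store.best_azimuth())  # 102.5
```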

tcs.receiver.heartbeat

TCS emits these events at regular intervals (typically every 5 seconds) so that its status can be monitored. OCD uses this event to trigger the emission of TCS-like events, documented in State event, that report the state of each of the state machines described on this page and in ocd.states.

ocd.run_shot.*

These events are emitted by the ocd run_shot command and allow OCD to track the shot execution steps. As these events come in, OCD updates the RunShotState state machine, emitting log messages and TCS-like events, as described in Transition event, to document the transitions.

The most relevant state for OCD is idle: when the machine is in this state, a new shot can be planned and run; moreover, when the machine returns to idle, the internal shot list is updated.

ocd.states.hetdex_allowed

The event is emitted when executing the ocd allow_hetdex command. It is used to change the state of the HetdexAllowedState machine. When the state is set to allowed, OCD can plan and execute shots.

State transitions trigger the emission of log messages and TCS-like events, as described in Transition event.

ocd.heartbeat.enquiry

This event is used to test the connection between the OCD commands and the main loop. Typically, OCD commands that send events first use the heartbeat functionality to make sure that the main loop is up and listening.

The MetaState

Every time one of pas.Guider{1,2}.metrology_data, ocd.run_shot.* or ocd.states.hetdex_allowed is received, a state machine is updated. The same events are then handed to a MetaState machine, which checks the MetrologyState, RunShotState and HetdexAllowedState machines: if their states are, respectively, good, idle and allowed, the meta-state is set to satisfied, otherwise it is set to not_satisfied.

As with the other machines, state transitions trigger the emission of log messages and TCS-like events, as described in Transition event.
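The MetaState rule reduces to a single comparison, with a log entry on every transition. MetaStateSketch is hypothetical: it takes the three states as plain strings, assumes not_satisfied as the initial state, and only logs transitions instead of also emitting the TCS-like event.

```python
from enum import Enum
from typing import Callable


class Meta(Enum):
    SATISFIED = "satisfied"
    NOT_SATISFIED = "not_satisfied"


# Required states of MetrologyState, RunShotState and HetdexAllowedState.
REQUIRED = ("good", "idle", "allowed")


class MetaStateSketch:
    """Hypothetical meta-state: satisfied only if all three machines agree."""

    def __init__(self, log: Callable[[str], None]) -> None:
        self.state = Meta.NOT_SATISFIED  # assumed initial state
        self._log = log

    def update(self, metrology: str, run_shot: str, hetdex_allowed: str) -> Meta:
        ok = (metrology, run_shot, hetdex_allowed) == REQUIRED
        new = Meta.SATISFIED if ok else Meta.NOT_SATISFIED
        if new is not self.state:
            # Only transitions are logged; OCD would also emit a TCS-like event.
            self._log(f"MetaState: {self.state.value} -> {new.value}")
            self.state = new
        return self.state


if __name__ == "__main__":
    meta = MetaStateSketch(print)
    meta.update("good", "idle", "allowed")     # logs the transition
    meta.update("good", "idle", "allowed")     # no transition, nothing logged
    meta.update("good", "running", "allowed")  # back to not_satisfied
```

The running state in the last call is only an illustrative non-idle value.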

The shot runner

The same events that update the MetaState are also handled by the ShotRunner, which decides the next shot and, if it is time, runs it. If the meta-state is satisfied, the following happens:

  1. check whether there are pending processes: a running process means that a shot is being executed, so no new shot is prepared or run; finished processes are removed;
  2. get the FWHM, sky magnitude and transparency: for each quantity, compute the median of the unmasked values of each probe and then take the mean of the two medians;
  3. get the azimuth: try to use the azimuth with the setup done; if not available, try to use the azimuth with the setup not done; if neither is available, use 180;
  4. create a shot file from the internal database; the name and directory of the file comes from the out_shot_file_template and out_shot_dir options of the [shot] section;
  5. get the current Julian Date (or a mocked version);
  6. run $CUREBIN/autoschedule_main with the shot file, JD, FWHM, sky magnitude, transparency and azimuth described above; the names of the executable and of some of the files needed to run it are stored in the [autoschedule] section;
  7. if autoschedule_main does not return any shot, do not proceed further;
  8. if at least one shot is available, get the first one;
  9. if the shot is too far in the future, do not proceed further; the skip_shot_delta_sec option of the [autoschedule] section defines “too far”;
  10. prepare the parameters necessary to run a shot: in the process, connect to a MySQL database to retrieve the observation number to use; if the mysql_update_obsnum option in the [database] section is true, write the new observation number back to the database; if the observation number exceeds max_obsnum, the shot submission is aborted; all the data necessary to connect to the database come from the mysql_* options of the [database] section;
  11. if the shot is scheduled to start more than wait_shot_delta_sec (from the [autoschedule] section) seconds in the future, flag it in the list of parameters just prepared: this way the shot is submitted right away but sleeps for the time necessary to start at the correct moment;
  12. if the skip_shot_submission option of the [autoschedule] section is true, the shot is not submitted: this option is useful to run OCD in read-only mode; when it is true, the mysql_update_obsnum option is automatically set to false;
  13. execute the ocd run_shot command in a subprocess and save the process (see the first point of this list).
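The timing decisions in steps 5, 7 to 9 and 11 can be sketched as a pure function. The Candidate and Decision types, the decide helper and the use of Unix seconds for the scheduled start are assumptions made for illustration; the actual ShotRunner, the autoschedule_main output format and the way OCD obtains the Julian Date may differ. The JD conversion uses the standard Julian Date of the Unix epoch (2440587.5).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

UNIX_EPOCH_JD = 2440587.5  # Julian Date of 1970-01-01T00:00:00 UTC


def unix_to_jd(unix_seconds: float) -> float:
    """Convert Unix time (UTC, ignoring leap seconds) to a Julian Date."""
    return UNIX_EPOCH_JD + unix_seconds / 86400.0


@dataclass
class Candidate:
    shotid: str
    start: float  # hypothetical: scheduled start as Unix time in seconds


@dataclass
class Decision:
    submit: bool
    wait_sec: float = 0.0
    shot: Optional[Candidate] = None


def decide(candidates: Sequence[Candidate], now: float,
           skip_shot_delta_sec: float, wait_shot_delta_sec: float) -> Decision:
    if not candidates:                 # step 7: no shot returned
        return Decision(submit=False)
    shot = candidates[0]               # step 8: take the first one
    delta = shot.start - now
    if delta > skip_shot_delta_sec:    # step 9: too far in the future
        return Decision(submit=False, shot=shot)
    # Step 11: far enough ahead -> submit now, but sleep until the start time.
    wait = delta if delta > wait_shot_delta_sec else 0.0
    return Decision(submit=True, wait_sec=wait, shot=shot)


if __name__ == "__main__":
    now = 0.0
    print(decide([Candidate("a", 300.0)], now, 600.0, 60.0))
```

Keeping the decision free of I/O makes it easy to test separately from the subprocess and database steps.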

When a shot runs …

Note

Throughout this section, unless otherwise specified, the term option refers to options of the [run_shot] section of the OCD configuration file.

As soon as the ocd run_shot command starts, it tries to establish a connection with the parent process to make sure that the execution can be properly tracked (see ocd.run_shot.*). Once the connection is in place, the following steps are performed:

  1. send an ocd.run_shot.run event with exec_status set to ocd.run_shot.EXEC_STATUS.START;

  2. if the shot runner flagged the shot to wait (step 11 of The shot runner), sleep until its scheduled start time;

  3. retrieve the configuration file for the shot created by hetdex shuffle; the template for the file name comes from the shuffle_conf_template option; see the inline documentation in Master configuration file for information about how to format the template;

  4. compare the ra, dec, azimuth and track values passed to ocd run_shot with the corresponding values in the [trajectory] section of the shuffle configuration file; if any value differs by more than its tolerance, the shot is aborted; the absolute tolerances come from the abs_tol_* options;

  5. copy the ACAM image from the shuffle directory (shuffle_conf option) to a target file (acam_dest_file option); the name of the source file comes from the acam_output option of the [image] section of the shuffle configuration file;

  6. send an ocd.run_shot.setup_telescope event with exec_status set to ocd.run_shot.EXEC_STATUS.START;

  7. load the trajectory with tcs.load_trajectory; get the equinox from the [trajectory] section of the shuffle configuration file and the ra, dec, az (azimuth) and dir (track) from the ocd run_shot input parameters;

  8. set guide and wfs probe stars ra, dec and equinox, whose values come from the corresponding options of the probe sections of the shuffle configuration file;

  9. go to the next trajectory (using the move_* options of the [go_next] section of the shuffle configuration file);

  10. set the ids of the guide and wfs probe stars; for the guide probes, filter magnitudes can be copied from the shuffle configuration file to the pas subsystem; the filter names in the shuffle configuration file come from the guider_shuffle_filters option, while the names to pass to pas.Guider{1,2}_SetObjectAndMagnitudes come from the guider_pas_filters option;

  11. setup the analysis region and the fiducial for the guide probes and the exposure time for WFS probes (see _set_probes_fiductial() for more details);

  12. set optional metadata; the metadata come from the input shot file, in particular the columns with the names listed in the METADATA_NAMES variable;

  13. play a sound, using the executable defined by play_exe option and the file in the setup_sound option;

  14. wait for the TO to mark the setup as done. If the wait_for_setup_timeout option is a positive number, wait at most that many seconds; if the timeout is reached and the continue_on_timeout option is false, the shot execution is aborted, while if it is true the shot execution continues anyway. A positive timeout combined with continue_on_timeout = true is useful if the system can be trusted to run in a fully automated way;

  15. send an ocd.run_shot.setup_telescope event with exec_status set to ocd.run_shot.EXEC_STATUS.FINISH;

  16. stop ACQ and start storing guide probe frames (first part of reset_probes());

  17. get the dither pattern: if the dither_with_probes option is true, dither by offsetting the stars in the guide probes, otherwise use the dither mechanism;

  18. for each exposure:

    1. send an ocd.run_shot.exp_hetdex event with exec_status set to ocd.run_shot.EXEC_STATUS.START and exposure set to the corresponding value;
    2. if the dithering mechanism is used, adjust the dither position;
    3. submit the exposure to the virus subsystem and wait for the shutter to close;
    4. if dithering with the guide probes is used and this is not the last exposure, offset the fiducial position of the guide stars in the probes;
    5. wait for the readout to finish, unless this is the last exposure and the wait_last_readout option is false;
    6. send an ocd.run_shot.exp_hetdex event with exec_status set to ocd.run_shot.EXEC_STATUS.FINISH and exposure set to the corresponding value;
    7. play a sound, using the executable defined by play_exe option and the file in the finish_exp_*_sound option;
  19. stop storing guide and WFS probe frames, reset the setup status and deploy the ACQ mirror (second part of reset_probes());

  20. clear the metadata and send an ocd.run_shot.run event with exec_status set to ocd.run_shot.EXEC_STATUS.FINISH; if an exception occurred, the following event keywords are set:

    • error: True,
    • exc_type: the name of the exception,
    • exc_value: the string representation of the error,
    • traceback: the full traceback;

    in case of an exception play a sound (using the executable defined by play_exe option and the file in the failure_sound option).
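Steps 4 and 14 are the two points where a shot can be aborted before any exposure starts. The sketch below illustrates both checks; ShotAborted, check_trajectory and wait_for_setup are hypothetical names, the tolerance keys stand in for the abs_tol_* options, and the real command may wait on TCS events rather than poll. Like the description of step 4, it uses a plain absolute difference.

```python
import time
from typing import Callable, Mapping


class ShotAborted(RuntimeError):
    """Hypothetical exception used in this sketch to abort a shot."""


def check_trajectory(requested: Mapping[str, float],
                     shuffle: Mapping[str, float],
                     abs_tol: Mapping[str, float]) -> None:
    """Step 4: abort if any value differs from shuffle's by more than its tolerance."""
    for name, tol in abs_tol.items():
        diff = abs(requested[name] - shuffle[name])
        if diff > tol:
            raise ShotAborted(f"{name}: difference {diff} exceeds tolerance {tol}")


def wait_for_setup(is_setup_done: Callable[[], bool], timeout: float,
                   continue_on_timeout: bool, poll_interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Step 14: return True once the setup is done, False on a tolerated timeout.

    A non-positive timeout means waiting indefinitely.
    """
    start = clock()
    while not is_setup_done():
        if timeout > 0 and clock() - start >= timeout:
            if continue_on_timeout:
                return False
            raise ShotAborted("timeout while waiting for the setup")
        sleep(poll_interval)
    return True
```

Injecting clock and sleep lets the timeout logic be exercised without real waiting.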

… and finishes

When the shot finishes or aborts, the ocd.run_shot.run event with exec_status set to ocd.run_shot.EXEC_STATUS.FINISH is emitted and the RunShotState machine is set to idle. At the same time, the shot information is used to update the internal shot database:

  1. get the database entry for shotid; if it is not in the database, log a warning (this might happen if ocd run_shot is executed by hand);
  2. decrease the number of observations yet to be done (ocd.shots_db.Shots.n_obs); if the value was already 0, the database entry is not updated and a warning is logged;
  3. if ocd.shots_db.Shots.forced_az is negative, set it to the value used to run the shot (a positive value between 0 and 360);
  4. if ocd.shots_db.Shots.track is 2, set it to the value used to run the shot (either 0 or 1).
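The four update rules can be sketched against an in-memory dictionary. ShotRow and update_after_shot are hypothetical and only mirror the n_obs, forced_az and track fields named above; the real ocd.shots_db layer is not reproduced.

```python
import logging
from dataclasses import dataclass
from typing import Dict

log = logging.getLogger("shots_db_sketch")


@dataclass
class ShotRow:
    n_obs: int        # observations still to be done
    forced_az: float  # negative means "not forced"
    track: int        # 0 or 1, or 2 meaning "either"


def update_after_shot(db: Dict[str, ShotRow], shotid: str,
                      az_used: float, track_used: int) -> bool:
    """Apply the post-shot update; return True if the entry was changed."""
    row = db.get(shotid)
    if row is None:  # e.g. ocd run_shot executed by hand
        log.warning("shot %s not found in the database", shotid)
        return False
    if row.n_obs <= 0:  # nothing left to observe: leave the entry untouched
        log.warning("shot %s has no observations left; not updated", shotid)
        return False
    row.n_obs -= 1
    if row.forced_az < 0:
        row.forced_az = az_used
    if row.track == 2:
        row.track = track_used
    return True


if __name__ == "__main__":
    db = {"s1": ShotRow(n_obs=2, forced_az=-1.0, track=2)}
    update_after_shot(db, "s1", az_used=123.4, track_used=1)
    print(db["s1"])  # ShotRow(n_obs=1, forced_az=123.4, track=1)
```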

Footnotes

[1] As of 21.12.2017 the following keys are available:

  • photometry.kron_skymag
  • photometry.fixed_skymag
  • photometry.kron_mag
  • photometry.fixed_mag
  • fit.moffat_mag
  • fit.gauss_mag
[2] It is also possible to override the default value of 1 with the illumination_correction option of the [containers] section. Note that this option will no longer be used once the illumination correction value is included in the events.