Handling flag days
Consequences of flag days for the notary
A flag day marks the point in time at which the network stops using one set of Network Parameters and begins using the new, previously proposed set. This is discussed in the “Network parameters update process” within the network-map documentation.
Once a flag day is issued, the next time a node polls the Network Map service it will receive the updated Network Parameters, which in turn causes the node to shut down due to a parameter mismatch. Because a Notary node (whether a basic Notary or a worker within an HA cluster) is built upon the same foundation as a standard node, it behaves in the same way and also shuts down when it next polls.
With a simple non-HA Notary service, a zero-downtime parameter update is not possible. After the flag day the service must be restarted, either manually, immediately after the flag day (if the network operator in control of the flag day is also in control of the Notary), or automatically once the Notary has shut down on its next poll of the Network Map service (for example, by using a daemon that restarts the service after any shutdown).
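The automatic restart option can be illustrated with a minimal supervisor loop. This is only a sketch: the function name `supervise`, its parameters, and the back-off value are hypothetical and not part of any Corda tooling, and in practice an existing process manager (such as a service manager with an automatic restart policy) would normally do this job.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def supervise(
    command: Sequence[str],
    max_starts: Optional[int] = None,
    backoff_seconds: float = 5.0,
    run: Callable[[Sequence[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start `command` and start it again every time it exits.

    Hypothetical helper for illustration only. `max_starts=None` means
    "keep restarting forever"; a number limits the total starts, which
    is useful for testing. Returns how many times the command was started.
    """
    starts = 0
    while max_starts is None or starts < max_starts:
        if starts > 0:
            # Wait briefly between restarts so a crash loop does not spin.
            sleep(backoff_seconds)
        starts += 1
        # Blocks until the process exits, for example when the Notary
        # shuts down after polling and detecting a parameter mismatch.
        exit_code = run(list(command))
        print(f"start #{starts} exited with code {exit_code}")
    return starts
```

Called as `supervise(["java", "-jar", "corda.jar"])`, assuming that is how the Notary is launched in a given deployment, the loop would bring the service back after each shutdown, including the one triggered by a flag day. The downtime between the shutdown and the completed restart remains, which is why this approach cannot give a zero-downtime update for a non-HA Notary.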
Although an immediate manual restart after a flag day is preferable, there is still a chance that a node that is not yet aware of the flag day sends a notarisation request during the downtime. If network participants cannot tolerate Notary downtime, an HA notary cluster should be run instead.
HA notary cluster
With an HA cluster of Notary worker nodes, a zero-downtime update is possible but depends on the workers' Network Map service polling schedules. Each Notary worker's polling schedule is determined by both the polling interval (specified by the Network Map service) and the worker's start time. Because a node shuts down when it next polls the Network Map service after a flag day, polling schedules that are in sync across worker nodes mean that, without manual intervention, the entire HA notary cluster shuts down at the same time. To avoid this situation, a Notary operator should start the worker nodes in a staggered manner so that their polling schedules are not in sync.
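The effect of staggering can be checked with a small calculation. The sketch below rests on simplifying assumptions that are not stated in the documentation: each worker polls exactly at its start time plus a whole multiple of the polling interval, every worker takes the same fixed time to restart, and the cluster counts as available while at least one worker is up. All names (`Worker`, `next_poll_after`, `cluster_stays_up`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Worker:
    name: str
    start_time: float  # seconds on some shared clock (illustrative)


def next_poll_after(worker: Worker, poll_interval: float, t: float) -> float:
    """First poll strictly after time t.

    Assumes polls happen at start_time + k * poll_interval for k = 1, 2, ...
    """
    if t < worker.start_time:
        raise ValueError("t must not be earlier than the worker's start time")
    completed_polls = (t - worker.start_time) // poll_interval
    return worker.start_time + (completed_polls + 1) * poll_interval


def cluster_stays_up(
    workers: Sequence[Worker],
    poll_interval: float,
    flag_day: float,
    restart_duration: float,
) -> bool:
    """True if at least one worker is up at every moment after the flag day.

    Each worker is down from its first poll after the flag day until
    restart_duration seconds later. The whole cluster is down only if
    all of those windows overlap at some instant.
    """
    down_from = [next_poll_after(w, poll_interval, flag_day) for w in workers]
    back_up = [t + restart_duration for t in down_from]
    # The windows share a common instant only if the last worker to go
    # down does so before the first worker to recover is back up.
    return max(down_from) >= min(back_up)


in_sync = [Worker("a", 0.0), Worker("b", 0.0), Worker("c", 0.0)]
staggered = [Worker("a", 0.0), Worker("b", 200.0), Worker("c", 400.0)]

print(cluster_stays_up(in_sync, 600.0, 1100.0, 60.0))    # False
print(cluster_stays_up(staggered, 600.0, 1100.0, 60.0))  # True
```

With a 600-second polling interval and a flag day at 1100 seconds, the three workers started together all poll at 1200 seconds and go down at once. The workers started 200 seconds apart poll at 1200, 1400 and 1600 seconds, so with a 60-second restart at most one of them is down at any moment.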
If a daemon or some other automated process is used to resurrect dead worker nodes, the Notary operator can rely on it to handle the flag day roll-over automatically. Provided the polling schedules are properly staggered, this should also result in a zero-downtime Notary cluster, although relying on automation alone is inherently riskier.