This is applicable when: BufferInstanceWrites == true.
I recently added some counters to monitor the number of times InstancePollSeconds gets exceeded during discovery. That number should normally be quite low, but I've seen that on a busy orchestrator server, especially one talking to an orchestrator backend in a different datacentre, it can jump significantly.
Consequently, better management and monitoring of this is needed.
Thoughts involve:
ensuring that the configuration parameters used are dynamically configurable via SIGHUP, and thus do not require orchestrator to be restarted. This affects two variables: InstanceFlushIntervalMilliseconds and InstanceWriteBufferSize. (See the first sketch after this list.)
adding extra monitoring of the time taken for flushInstanceWriteBuffer to run. A single metric every minute is useless on its own, so I need to collect samples and then provide aggregate data and percentile timings, similar to how the discovery timings are handled. (See the second sketch after this list.)
parallelising this function so that it runs against the backend orchestrator server over several concurrent writers. Completely serialising the writes, even though they are batched, is not fully efficient, but we should ensure that writes for the same instance are never done through different connections at the same time. (See the third sketch after this list.)
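On the first point, here is a minimal sketch of what SIGHUP-driven reconfiguration could look like in Go. The reloadConfiguration helper and the atomic package-level variables are hypothetical stand-ins for however orchestrator actually stores InstanceFlushIntervalMilliseconds and InstanceWriteBufferSize; this is not the existing reload path, just the shape of it:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// Hypothetical dynamically reloadable settings.
var (
	instanceFlushIntervalMilliseconds int64 = 100
	instanceWriteBufferSize           int64 = 100
)

// reloadConfiguration would re-read the config file and atomically swap in
// the new values, so running goroutines pick them up on their next read.
func reloadConfiguration() {
	// ... re-parse the config file here (omitted); values below are placeholders ...
	atomic.StoreInt64(&instanceFlushIntervalMilliseconds, 50)
	atomic.StoreInt64(&instanceWriteBufferSize, 200)
}

func main() {
	sighup := make(chan os.Signal, 1)
	signal.Notify(sighup, syscall.SIGHUP)
	go func() {
		for range sighup {
			log.Println("SIGHUP received: reloading configuration")
			reloadConfiguration()
		}
	}()
	// ... rest of the server would run here ...
	select {}
}
```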
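On the timing point, a rough sketch (not orchestrator's existing metrics code) of recording per-call durations for flushInstanceWriteBuffer and reporting a percentile over the collected window, in the spirit of how the discovery timings are aggregated. timingCollector and the flush stand-in are illustrative names only:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// timingCollector accumulates flush durations between reporting intervals.
type timingCollector struct {
	mu      sync.Mutex
	samples []time.Duration
}

func (c *timingCollector) Record(d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.samples = append(c.samples, d)
}

// Snapshot returns the p-th percentile (0-100) of the collected samples and
// resets the window, e.g. once per reporting interval.
func (c *timingCollector) Snapshot(p float64) time.Duration {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), c.samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(float64(len(sorted)-1) * p / 100.0)
	c.samples = c.samples[:0]
	return sorted[idx]
}

func main() {
	c := &timingCollector{}
	flush := func() { time.Sleep(5 * time.Millisecond) } // stand-in for flushInstanceWriteBuffer
	for i := 0; i < 10; i++ {
		start := time.Now()
		flush()
		c.Record(time.Since(start))
	}
	fmt.Println("p95 flush time:", c.Snapshot(95))
}
```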
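On parallelisation, one possible approach, sketched under the assumption that buffered writes can be routed by instance key: hash each instance key to a fixed worker, so writes for the same instance stay serialised on one connection while different instances flush in parallel. instanceWrite and writeToBackend are hypothetical names, not orchestrator's real identifiers:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type instanceWrite struct {
	instanceKey string // e.g. "host:3306"
	payload     string
}

// writeToBackend is a stand-in for the batched write to the backend database.
func writeToBackend(w instanceWrite) {
	fmt.Printf("writing %s for %s\n", w.payload, w.instanceKey)
}

func main() {
	const workers = 4
	queues := make([]chan instanceWrite, workers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan instanceWrite, 100)
		wg.Add(1)
		go func(q chan instanceWrite) {
			defer wg.Done()
			for w := range q {
				writeToBackend(w)
			}
		}(queues[i])
	}

	// dispatch routes a write to the worker owning its instance key.
	dispatch := func(w instanceWrite) {
		h := fnv.New32a()
		h.Write([]byte(w.instanceKey))
		queues[h.Sum32()%workers] <- w
	}

	dispatch(instanceWrite{"db1:3306", "last_seen update"})
	dispatch(instanceWrite{"db2:3306", "last_seen update"})
	dispatch(instanceWrite{"db1:3306", "replication status"}) // lands on the same worker as the first write

	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```

Hashing by key keeps per-instance ordering without any extra locking; the trade-off is that a single very hot instance can still only use one worker.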
With these changes it should be easier to see where the bottleneck is and to adjust the configuration "dynamically" so that the required performance is achieved.