Skip to content

Conversation

mssalvatore
Copy link
Collaborator

@mssalvatore mssalvatore commented Jun 28, 2021

What does this PR do?

Allows ransomware telemetry to be sent in batches to minimize network traffic but also provide the user with periodic feedback.

This PR makes some compromises. Issue #1268 was opened to document the reasons for some of this PR's shortcomings and suggest a path forward.

PR Checklist

  • Have you added an explanation of what your changes do and why you'd like to include them?
  • Is the TravisCI build passing? - No, but it's not related to these changes.
  • Was the CHANGELOG.md updated to reflect the changes?
  • Was the documentation framework updated to reflect the changes?

Testing Checklist

  • Added relevant unit tests?
  • Have you successfully tested your changes locally? Elaborate:

    Tested by running the monkey agent from source and inspecting the telemetry that was sent to the island.

  • If applicable, add screenshots or log transcripts of the feature working

Screenshots

image

IBatchableTelem adds two methods to the ITelem interface. These methods allow
a telemetry object to mange batches of telemetry entries, rather than
just one.
Adds an implementation as a mixin of the two methods specified by
IBatchableTelem.
This allows encryption attempt telmetries to be batched into one
telemetry object so they can be sent to the island in batches.
The term "wrapper" is sometimes used as synonym for the decorator
pattern, whereas this class is a textbook adapter. Use the term
"adapter" instead of "wrapper" and rename "TelemetryMessengerWrapper" to
"LegacyTelemetryMessengerAdapter", as this class servers as an adapter
between the new ITelemetryMessenger interface and the (soon to be) legacy way of
sending telemetry.
This telemetry messenger is a decorator that aggregates batchable
telemetries and sends them to the island periodically.
We don't want the ransomware payload to encrypt all files and then send
telemetry to the island. This could lead to a long period of time where
the user has no insight into what the monkey is doing on a node. We also
don't want to flood the island with telemetries. By using the
BatchingTelemetryMessenger, ransomware encryption telemetries are
batched together and periodically sent to the island.
My original plan was to start a thread in __init__() and stop the thread
when __del__() was called. Since the running thread (object) contains a
reference to the BatchingTelemetryMessenger object that launched it, the
destructor will not be called until the thread is stopped. Therefore, a
stop() was added to allow the BatchingTelemetryMessenger to be stopped.
Since it has an explicit stop, it should also have an explicit start,
rather than starting the thread in the constructor.
My original plan was to start a thread in __init__() and stop the thread
when __del__() was called. Since the running thread (object) contains a
reference to the BatchingTelemetryMessenger object that launched it, the
destructor will not be called until the thread is stopped. This
resulted in adding a stop() method (fadd978) followed by adding a
start() method (1d066c8).

By using an inner class to run the thread, we enable the class to be
used as originally intended, reducing the burden on the user of this
class. The thread is now started on construction and stopped on
destruction. The user can remain blissfully unaware that anything
resembling threading is going in, and can use the
BatchingTelemetryMessenger just like any other ITelemetryMessenger.
@codecov
Copy link

codecov bot commented Jun 28, 2021

Codecov Report

Merging #1272 (7e7d46d) into develop (2ec020f) will increase coverage by 0.51%.
The diff coverage is 92.72%.

❗ Current head 7e7d46d differs from pull request most recent head 8281a9d. Consider uploading reports for the commit 8281a9d to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1272      +/-   ##
===========================================
+ Coverage    30.13%   30.65%   +0.51%     
===========================================
  Files          444      449       +5     
  Lines        13323    13457     +134     
===========================================
+ Hits          4015     4125     +110     
- Misses        9308     9332      +24     
Impacted Files Coverage Δ
monkey/infection_monkey/monkey.py 0.00% <0.00%> (ø)
...ey/infection_monkey/telemetry/i_batchable_telem.py 77.77% <77.77%> (ø)
...lemetry/messengers/batching_telemetry_messenger.py 98.52% <98.52%> (ø)
...nfection_monkey/telemetry/batchable_telem_mixin.py 100.00% <100.00%> (ø)
...key/infection_monkey/telemetry/ransomware_telem.py 100.00% <100.00%> (ø)
...nkey/tests/unit_tests/infection_monkey/conftest.py 100.00% <100.00%> (ø)
...key/monkey_island/cc/services/post_breach_files.py 100.00% <0.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ec020f...8281a9d. Read the comment docs.

WAKES_PER_PERIOD = 4


class BatchingTelemetryMessenger(ITelemetryMessenger):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the responsibility of this class is not to send the telemetries but to batch them. I wonder if we should make a distinction in the name, because a decorator now has exatcly the same naming convention as a messanger even though the responsibility and creation of the object is different

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could name it BatchingTelemetryMessengerDecorator.

WAKES_PER_PERIOD = 4


class BatchingTelemetryMessenger(ITelemetryMessenger):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of implementing the ITelemetryMessenger interface? We already receive an ITelemetryMessenger object, so this class will know it's interface. Can't we do something like telemetry_messenger.send_telemetry = self.add_telemetry_to_queue in the constructor? This would also solve the issue that send_telemetry method is not sending anything, in fact, it's just adding the telemetry to a queue

Copy link
Collaborator Author

@mssalvatore mssalvatore Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of implementing the ITelemetryMessenger interface?

In implementing ITelemetryMessenger, we're stating explicitly that this object must conform to that interface.

We already receive an ITelemetryMessenger object, so this class will know it's interface.

Just because this class "has-a" ITelemetryMessenger doesn't mean it implements ITelemetryMessenger.

Can't we do something like telemetry_messenger.send_telemetry = self.add_telemetry_to_queue in the constructor?

I think this would make the code harder to read. It implements ITelemetryMessenger, but where is its send_telemetry() method? I have to read the constructor to find out how it implements ITelemetryMessenger.

This would also solve the issue that send_telemetry method is not sending anything, in fact, it's just adding the telemetry to a queue.

I'm not sure I see what problem needs to be solved. For anyone who uses the class, all they care about is that telemetry is sent. The purpose of ITelemetryMessenger is that they don't need to care how it is sent.

self._telemetry_messenger = telemetry_messenger
self._period = period

self._should_run_batch_thread = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this variable required? It's only used in _manage_telemetry_batches, and if False only if thread shoud stop. But if the thread should stop, it's joined and _manage_telemetry_batches no longer runs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joining a thread doesn't stop it. It waits for the thread to stop before execution continues. A mechanism is required to signal the thread to stop. In this case, it's the state of self._should_run_batch_thread.


def start(self):
self._should_run_batch_thread = True
self._manage_telemetry_batches_thread = threading.Thread(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the thread not defined in init?

Copy link
Collaborator Author

@mssalvatore mssalvatore Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threads can only be started once, so it needs to be initialized on start. I've added self._manage_telemetry_batches_thread = None to '__init__()'

def _manage_telemetry_batches(self):
self._reset()

while self._should_run_batch_thread or not self._queue.empty():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of self._should_run_batch_thread and or not self._queue.empty() seems to be sending the last batch of telemetries before stopping the tread so that none telemetries go missing. That's not readable. Is it possible to explicitly send all telemetries in the queue before stopping the thread?


while self._should_run_batch_thread or not self._queue.empty():
try:
telemetry = self._queue.get(block=True, timeout=self._period / WAKES_PER_PERIOD)
Copy link
Contributor

@VakarisZ VakarisZ Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should add a comment explaining why we chose these parameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the names of the parameters explain what they do?

try:
telemetry = self._queue.get(block=True, timeout=self._period / WAKES_PER_PERIOD)

if isinstance(telemetry, IBatchableTelem):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is the last loop and thread pauses here. Another thread in the meanwhile adds telems to the queue, which will never get sent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment explains it: #1272 (comment)

The loop condition ensures the loop will only exit if the queue is empty. Regardless, I've refactored the code to more clearly state its intent.

Copy link
Contributor

@VakarisZ VakarisZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss my comments via zoom.

Base automatically changed from ransomware-logging to develop June 29, 2021 10:19
We need to ensure when a BatchingTelemetryMessenger stops, all remaining
telemetries in its queue are sent. The existing logic does this, but
this commit improves the readability and intent of the code, as well as
adds a test for this condition.
@mssalvatore mssalvatore merged commit d9366a5 into develop Jun 29, 2021
@mssalvatore mssalvatore deleted the batchable-telemetry branch June 29, 2021 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants