GSoC Week 3....4 - Passing the exam
Here we are again. Notice anything unusual in the title? Yeah, you got me: I kept gathering result after result before posting anything, which means I'll try to condense the outcome of these 10 days here.
Benchmarks, benchmarks…
The last thing I was testing was my dumpwatcher addon for mitmproxy. It basically showed that:
- The Protobuf serializer class was working great for generating the dumps.
- Inserting each blob into the SQLite DB individually took an awful lot of time.
The idea - with hints from my mentors - is now to shift my testing from measuring time to measuring throughput. Let's be honest: trying to measure a single dump/write meaningfully is not going to take me anywhere - too many variables - and in any case writing each flow on its own is not the greatest strategy.
I decided to move to a different strategy, with a completely different addon.
This time, I take a single flow and generate a constant stream of duplicates from it, measuring how my system reacts to this - parameterized and tweakable, of course - stress test.
Ah, and before I forget: last time I promised I would take a look at the asyncio module, to align with mitmproxy's strategy (it recently moved to asyncio) and, to be honest, gather some more knowledge. As usual, you can check the code on my repo (linked at the end of the page), but I will summarize the hot code here:
async def stream(self):
    # Producer: serialize the template flow and enqueue it at a fixed rate.
    while True:
        await self.queue.put(protobuf.dumps(self.flow.get_state()))
        await asyncio.sleep(self._stream_period)

async def writer(self):
    # Consumer: wake up periodically and flush queued blobs in batches.
    while True:
        await asyncio.sleep(self._flush_period)
        count = 1
        b = await self.queue.get()
        self.hot_blobs.append(b)
        # Drain the queue, but never more than _flush_rate blobs per flush.
        while not self.queue.empty() and count < self._flush_rate:
            try:
                self.hot_blobs.append(self.queue.get_nowait())
                count += 1
            except asyncio.QueueEmpty:
                pass
        start = time.perf_counter()
        n = self._fflush()
        end = time.perf_counter()
        ctx.log(f"dumps/time ratio: {n} / {end - start} -> {n / (end - start)}")
        self.results.append(n / (end - start))
        self._fflushes += 1
This is a classic producer<->consumer design, exploiting an asyncio.Queue(). The stream method produces a protobuf blob, puts it on the queue, and sleeps for a given amount of time. Periodically - to improve performance - the writer method wakes up, checks whether the queue is non-empty, and if so takes up to self._flush_rate flows from it. This way, I ensure that flows are inserted into the SQLite DB in batches, avoiding the great waste of time that comes with many isolated transactions.
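The _fflush method is not shown above, but the batching idea is easy to sketch. Assuming the addon holds a plain sqlite3 connection in self.con and a flows table with a single blob column (both hypothetical names, not the actual schema), a batched flush could look roughly like this:

def _fflush(self):
    # Write every buffered blob in a single transaction; the earlier
    # benchmarks showed that one transaction per blob is what hurts.
    with self.con:  # the sqlite3 connection commits when the block exits
        self.con.executemany(
            "INSERT INTO flows (blob) VALUES (?)",
            [(blob,) for blob in self.hot_blobs],
        )
    n = len(self.hot_blobs)
    self.hot_blobs.clear()
    return n

The point is simply that a single commit amortizes the per-transaction overhead over the whole batch.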
Long story short, testing with this, however rudimentary it may be, showed that the results are acceptable. It is time to move on.
The View
The view addon is where I will focus the bulk of my work on mitmproxy during this GSoC.
Let’s examine and point out the major features of the View:
- It stores the flows received by mitmproxy in memory (self._store).
- It maintains an ordered list (with different possible order keys) of the flows which are currently displayed to the user (self._view).
- It keeps track of the currently focused flow.
- It maintains a Settings mapping, which the user can use to annotate flows, or add various fields, associated with flow ids.
But most importantly, the View exposes an API which can be accessed by the mitmproxy toolkit, providing various operations on order keys, flows, settings, filters, and a bit more. Being the point of contact between mitmproxy tools and flow storage, this is where session storage to the DB will be implemented.
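To make that list a bit more concrete, here is a heavily simplified sketch of the kind of container the View is. The names _store, _view, focus and settings come straight from the list above; everything else (the class name, the methods, the order key) is made up for illustration and is not the actual mitmproxy code:

import collections

class ToyView:
    def __init__(self, order_key):
        # order_key: function mapping a flow to a sortable value,
        # e.g. something based on the request's timestamp.
        self._store = {}             # every flow received, keyed by flow.id
        self._view = []              # flows currently displayed, kept sorted
        self._order_key = order_key  # current order key
        self.focus = None            # currently focused flow
        self.settings = collections.defaultdict(dict)  # flow.id -> annotations

    def add(self, flow):
        self._store[flow.id] = flow
        self._view.append(flow)
        self._view.sort(key=self._order_key)
        if self.focus is None:
            self.focus = flow

    def set_order(self, order_key):
        # Changing the order key just re-sorts the displayed flows.
        self._order_key = order_key
        self._view.sort(key=self._order_key)

The session storage work will sit behind this kind of interface: tools keep calling the same API, while the flows themselves can move from an in-memory store to the DB.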
So, before starting to work on my new Session addon, I have to clarify - to myself, and to mitmproxy - what we expect from the View, and how we expect it. That is, cleaning and extending the API.
And that is everything. What I'm doing right now is refactoring the API, and extending the current test suite to eradicate all the happy bugs I am injecting along the way. Then I'll move on to cleaning the codebase of all the hacky, non-modular accesses to the View.
You can check out the status of my work on the View API here.
The streamtester addon discussed at the beginning of the post is here.
Enjoy :)