GSoC Week 1 - Staging
Hello, folks. PyCharm is fueled, my brain is spinning, and I’m beginning to really set the stage for my coding period at mitmproxy. Over the past weeks, I’ve worked to refine my proposal, and now it’s time to put on a show :)
0x10-> A warm dive into Mitmproxy dumps
So, let’s start with the basics. My job at mitmproxy, for this summer, is to revamp and overhaul the serialization and indexing flow, following the tide of openness and modularity that mitmproxy is riding. Remember to check my GSoC proposal for more information about that…now, let’s hack into the io module!
0x11-> Basic units
" in the beginning was the Flow […] “
Quote from the docs:
"""
A Flow is a collection of objects representing a single transaction.
This class is usually subclassed for each protocol, e.g. HTTPFlow.
"""
What does this mean? Say I am connecting to lolcats. Dumping the value of the HTTPFlow object, on request and on response respectively, we get:
HTTPFlow on request:
<HTTPFlow
request = Request(GET lolcats.com:80/)
client_conn = <ClientConnection: 127.0.0.1:50841>
server_conn = <ServerConnection: <no address>>>
Same HTTPFlow, updated on response:
<HTTPFlow
request = Request(GET lolcats.com:80/)
response = Response(301 Moved Permanently, text/html, 184b)
client_conn = <ClientConnection: 127.0.0.1:50841>
server_conn = <ServerConnection: lolcats.com:80>>
Actually, there’s a lot more stuff in those objects, and you can find out by yourself with a little debugging effort! :)
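If you’d rather not fire up a debugger, a tiny addon script works too. Here’s a minimal sketch (the addon and its name are my own, purely for exploration) that logs the keys of flow.get_state() on every response:

from mitmproxy import ctx

class StatePeeker:
    def response(self, flow):
        # get_state() returns the full serializable dictionary of the flow
        ctx.log.info("HTTPFlow state keys: %s" % sorted(flow.get_state().keys()))

addons = [StatePeeker()]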
0x12-> tnetstring dumps
So, let’s say I want to replay some requests I forwarded to lolcats. Maybe I need to reproduce the exact sequence of requests, since all those cats look nice on my laptop, or whatever. mitmproxy currently employs a customized version of tnetstrings to save its captures. Dumps and loads are wrapped by the FlowReader and FlowWriter classes. The former, in particular, handles the conversion between different flow versions (coming from ancient captures) and the current blueprint.
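Reading a capture back looks roughly like this, a minimal sketch where lolcats.mitm is a hypothetical capture file:

from mitmproxy import io

with open("lolcats.mitm", "rb") as f:
    reader = io.FlowReader(f)
    for flow in reader.stream():
        # each iteration yields a fully reconstructed Flow object
        print(flow)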
A slice of our tnetstring looks like this:
4:host;11:lolcats.com,}6:marked;5:false!2:id;36:4515de36-9f3f-4d1e-8162-f73e879d639c;8:response;578:11:status_code;3:301#6:reason;17:Moved Permanently,
Got the structure, eh?
0x20-> Moving to protobufs (testing)
Well, that said, let’s build the skeleton of the task. Moving to protocol buffers requires some prep work, and the very first thing I need to ensure is that the change in performance is good enough to justify the shift.
The testing process will be conducted as follows: we will benchmark both the tnetstring and the Python protobuf serializers.
0x21-> HTTP Response format
To serve our testing purpose, I will use a simple message -> HTTPResponse.
As I am interested in testing performance, what can really weigh things down is the content. How would our protobuf serializer handle messages of 4, 10, or 50 MB? At this point, I just defined a simple schema:
message HTTPResponse {
    message HTTPRequest {
        enum HTTPMethod {
            GET = 0;
            PUT = 1;
            POST = 2;
        }
        required HTTPMethod method = 1;
        required string host = 2;
        optional int32 port = 3;
        optional string path = 4;
    }
    required HTTPRequest request = 1;
    required int32 status_code = 2;
    optional bytes content = 3;
}
Each HTTPResponse simply contains an HTTPRequest, a content field, and a few other fields. This is just to test protocol buffers, so I called this message “dummy_http”.
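To stress the content field, I can fill it with arbitrarily large payloads. A minimal sketch, assuming the Python module that protoc generates from the schema above (more on that in a second) is named dummy_http_pb2:

import os

import dummy_http_pb2

def make_dummy_response(megabytes):
    # build a dummy response carrying a random payload of the requested size
    resp = dummy_http_pb2.HTTPResponse()
    resp.request.method = dummy_http_pb2.HTTPResponse.HTTPRequest.GET
    resp.request.host = "lolcats.com"
    resp.request.port = 80
    resp.request.path = "/"
    resp.status_code = 200
    resp.content = os.urandom(megabytes * 1024 * 1024)
    return resp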
0x22-> Protobuf implementation
Next thing, I had to generate the Python code for my message with the protobuf compiler (e.g. protoc --python_out=. dummy_http.proto); the generated module is built through a metaclass mechanism, which is unique to the Python implementation. More at:
https://developers.google.com/protocol-buffers/docs/pythontutorial
So, at this point, to finish my dummy protobuf module, I only needed to define a dumps and a loads function, so as to start comparisons with the current tnetstring module. The code there is really simple; you can find out by yourself by looking at my repo (I’ll post the link at the end of the post).
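In spirit, those two functions are just thin wrappers around the generated class; a rough sketch, again assuming the generated dummy_http_pb2 module:

import dummy_http_pb2

def dumps(response):
    # serialize an HTTPResponse message to a binary blob
    return response.SerializeToString()

def loads(blob):
    # rebuild an HTTPResponse message from the blob
    response = dummy_http_pb2.HTTPResponse()
    response.ParseFromString(blob)
    return response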
0x30-> DumpWatcher Addon
Everything is set to begin the test: a message definition is there, together with some code to extract the flow.get_state() dictionary into a suitable structure.
The DumpWatcher addon I developed to test the code basically does this:
- When the proxy is running, it invokes the running() event on every addon; at that point, DumpWatcher applies a watcher decorator to the dumps and loads functions of every imported serialization module.
- When a response is received by the proxy, the response() event is invoked, passing the flow to the addon. At that point, if the user has turned the toggle option on, it runs dumps and loads on the flow.
- Every decorated function performs its deeds, and prints performance information (time, size of the blob) to the event log! A rough sketch of the whole mechanism follows.
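Here’s that sketch, with simplified names; the module wiring and the option name (dumpwatch) are hypothetical, the real addon lives in my repo:

import time

from mitmproxy import ctx

def watcher(func):
    # time the wrapped dumps/loads call and log duration and blob size
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        size = len(result) if isinstance(result, bytes) else -1
        ctx.log.info("%s.%s: %.6fs, %d bytes" % (func.__module__, func.__name__, elapsed, size))
        return result
    return wrapper

class DumpWatcher:
    def __init__(self, *modules):
        # serialization modules exposing dumps/loads (tnetstring, protobuf, ...)
        self.modules = modules

    def load(self, loader):
        loader.add_option("dumpwatch", bool, False, "Benchmark serializers on every response")

    def running(self):
        # decorate dumps/loads of every registered serialization module
        for mod in self.modules:
            mod.dumps = watcher(mod.dumps)
            mod.loads = watcher(mod.loads)

    def response(self, flow):
        if ctx.options.dumpwatch:
            for mod in self.modules:
                mod.loads(mod.dumps(flow.get_state()))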
This is obviously a beta implementation; it still needs many conveniences and features – and I still have to put SQLiteDB inside that mess – yet this is a good way to inspect how the protobuf Python implementation performs with our objects!
That’s all, folks! I’ll be posting more as soon as there’s more on my workspace :)