If you know what a “serializer” is, skip the introduction!
I’ve decided to also blog about technical issues my engineers and I run into during the development of Investamatic.com. It’s a good way to document what worked while sharing the learning experience with others. Today’s issue is serialization. “Serializers” are commonly needed when two programs must pass structured data (“objects”) to each other across a network or through storage. The sender’s in-memory data is formatted into a representation that can be transmitted over the internet irrespective of the operating systems, hardware, etc., at either end. Popular representations include JSON and XML, and there are many more, with different trade-offs in speed, size, human readability, compatibility with other programs, and so on. Here are the serializers I tested:
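To make the round trip concrete, here is a minimal sketch using Json.NET (one of the serializers in the list below). The `Quote` class and method names are mine for illustration, not from the benchmark project.

```csharp
using Newtonsoft.Json; // Json.NET (Newtonsoft.Json NuGet package)

// A hypothetical data type; any plain class with public properties will do.
public class Quote
{
    public string Symbol { get; set; }
    public decimal Price { get; set; }
}

public static class JsonDemo
{
    // Object -> JSON text -> clone of the object.
    public static Quote RoundTrip(Quote original, out string json)
    {
        // Serialize: in-memory object becomes portable text,
        // e.g. {"Symbol":"MSFT","Price":27.5}
        json = JsonConvert.SerializeObject(original);

        // Deserialize: the receiver rebuilds an equivalent object.
        return JsonConvert.DeserializeObject<Quote>(json);
    }
}
```

The text in between is all the receiver ever sees, which is why the size and speed of that representation matter.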
- Json.NET: A very popular JSON serializer. I believe Microsoft replaced their own JSON serializer with this one because theirs was too slow (I haven’t verified this).
- ServiceStack JSON: Another popular JSON serializer.
- ServiceStack JSV: Same as above but in the more efficient JSV format. I actually like it because its output is quite small, it’s very fast, and it’s still very human-readable when it comes to debugging.
- Microsoft Binary Formatter: A generic binary formatter. Its output was surprisingly large (=bad) and its performance was merely alright. It is very flexible and can work with more complex data types; Microsoft seems to have designed it for generality rather than speed.
- ProtoBuf .NET: Protocol Buffers is a very fast serialization scheme designed by Google for its own internal programs. It’s not as flexible as Microsoft’s Binary Formatter above, but it is the fastest and the smallest. Important if you’re running a Google-sized operation.
- Microsoft XML serializer: XML still hangs around from its old SOAP days. The spec is verbose, so the output is large (=bad) irrespective of the implementation, and it’s not that fast either. Included here just for reference.
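The flexibility difference between ProtoBuf .NET and Binary Formatter shows up right in the code: protobuf-net requires an explicit contract, with a stable numeric tag per field, whereas Binary Formatter will take almost any `[Serializable]` type as-is. A sketch with a hypothetical `Quote` class:

```csharp
using System.IO;
using ProtoBuf; // protobuf-net NuGet package

// protobuf-net trades flexibility for speed and size: each field
// needs a stable numeric tag so the binary format stays compact.
[ProtoContract]
public class Quote
{
    [ProtoMember(1)] public string Symbol { get; set; }
    [ProtoMember(2)] public decimal Price { get; set; }
}

public static class ProtoDemo
{
    // Serialize to a compact binary payload, then rebuild a clone.
    public static Quote RoundTrip(Quote original, out byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, original);
            payload = ms.ToArray();
        }
        using (var ms = new MemoryStream(payload))
        {
            return Serializer.Deserialize<Quote>(ms);
        }
    }
}
```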
I wrote a quick application that would:
- take a simple object
- run it through a serializer to get the representation
- run it back through the deserializer to get back a clone
- compare the original and cloned object
- examine the speed and size of the serialized output (what is sent over the network)
- do the above for every serializer in the mix
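The steps above can be sketched as a single measurement helper. This is illustrative only, not the actual API of the GitHub project:

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Serialize, deserialize back to a clone, verify the clone matches,
    // and report size and time. Returns the serialized size in bytes.
    public static int Measure<T>(string name, T original,
        Func<T, byte[]> serialize,
        Func<byte[], T> deserialize,
        Func<T, T, bool> areEqual)
    {
        var sw = Stopwatch.StartNew();
        byte[] payload = serialize(original);   // what goes over the network
        sw.Stop();

        T clone = deserialize(payload);
        if (!areEqual(original, clone))
            throw new InvalidOperationException(name + ": clone differs from original");

        Console.WriteLine("{0}: {1} bytes in {2} ticks",
            name, payload.Length, sw.ElapsedTicks);
        return payload.Length;
    }
}
```

Each serializer is then just a pair of `serialize`/`deserialize` delegates passed to the same harness, which keeps the comparison fair.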
The benchmark application I quickly wrote is released as an open-source project on GitHub. Download, compile, run.
Warm start vs. cold start
Almost all serializers have an upfront computational cost to initialize themselves. To warm start, I basically ran the same object through every serializer 5 times, then compared the time taken to serialize ONE object, dropping all cold-start times. This is in stark contrast to some (unrealistic) benchmarks which serialize in a tight loop about a million times and THEN compute the per-object serialization time. Almost no application I know of has a use case requiring a single thread to serialize even a thousand objects in a tight loop. So I believe examining warm-start times is the most realistic, “apples-to-apples” comparison, reflective of real-world serialization use.
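A minimal sketch of that warm-start measurement, where `serialize` stands in for whichever serializer is under test:

```csharp
using System;
using System.Diagnostics;

public static class WarmStart
{
    // Run the serializer a few times so its one-time initialization cost
    // (reflection, code generation, ...) is paid, then time a SINGLE
    // serialization, which is what a real request would actually see.
    public static TimeSpan TimeWarmRun<T>(T obj, Func<T, byte[]> serialize,
                                          int warmUpRuns = 5)
    {
        for (int i = 0; i < warmUpRuns; i++)
            serialize(obj);              // cold-start cost absorbed here

        var sw = Stopwatch.StartNew();
        serialize(obj);                  // warm, single-object time
        sw.Stop();
        return sw.Elapsed;
    }
}
```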
| Serializer | Format | Cold start | Warm start | Size | Speedup over cold start |
|---|---|---|---|---|---|
| .NET Binary Formatter | Binary | 13,565 | 94 | 512 | 144x |
| .NET XML Serializer | Text | 138,447 | 224 | 461 | 618x |
So which one should you use? It’s actually simple.
- If you want the absolute fastest and smallest: ProtoBuf is the winner
- If you want very fast, pretty small and very human readable: ServiceStack JSV is the winner. We’ll be using this in places where we won’t be talking to the user’s browser.
- If you want the fastest JSON (= wide compatibility, notably with browsers): ServiceStack JSON is the winner
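In practice the last two options are nearly interchangeable in code, since ServiceStack.Text exposes both formats as extension methods. A quick sketch (my hypothetical `Quote` class again):

```csharp
using ServiceStack.Text; // ServiceStack.Text NuGet package

public class Quote
{
    public string Symbol { get; set; }
    public decimal Price { get; set; }
}

public static class ServiceStackDemo
{
    public static Quote RoundTripJsv(Quote q, out string json, out string jsv)
    {
        json = q.ToJson();             // JSON for browser-facing endpoints
        jsv  = q.ToJsv();              // terser JSV for internal traffic
        return jsv.FromJsv<Quote>();   // and back to an object
    }
}
```

Swapping between the two is a one-line change, which makes it easy to use JSV internally and JSON at the browser boundary.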