Comparing different serializers

If you know what a “serializer” is, skip the introduction!

Introduction

I’ve decided to also blog about technical issues my engineers and I run into while developing Investamatic.com. It’s a good way to document what worked while sharing the learning experience with others. Today’s issue is serialization. “Serializers” are needed whenever two programs must pass structured data (“objects”) to each other across a network or through storage. The sender’s in-memory data is converted into a representation that can be transmitted over the internet regardless of the operating systems, hardware, etc. at either end, and the receiver converts it back (deserializes it). Some popular representations are JSON and XML, and there are many more, with different trade-offs in processing speed, human readability, compatibility with other programs, etc.
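Here is a minimal round trip to make the idea concrete. The post itself is about C#, so treat this Python code as a language-neutral sketch. The `Quote` class and its fields are hypothetical, and Python's standard `json` module stands in for a JSON serializer such as Json.NET.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Quote:
    # Hypothetical object; any structured in-memory data would do.
    symbol: str
    price: float
    volume: int


def serialize(quote: Quote) -> bytes:
    # In-memory object -> JSON text -> UTF-8 bytes that can cross a network
    # or be written to storage, independent of the OS or CPU at either end.
    return json.dumps(asdict(quote)).encode("utf-8")


def deserialize(data: bytes) -> Quote:
    # Bytes -> JSON text -> dict -> a brand-new Quote instance (a clone).
    return Quote(**json.loads(data.decode("utf-8")))


original = Quote(symbol="MSFT", price=31.25, volume=1200)
wire = serialize(original)
clone = deserialize(wire)
print(wire)               # b'{"symbol": "MSFT", "price": 31.25, "volume": 1200}'
print(clone == original)  # True
```

The clone is a separate object that compares equal to the original. That equality is exactly what a serializer has to guarantee.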

The contenders

The serializers I compared are all for C#:

Json.NET: A very popular JSON serializer. I believe Microsoft adopted it in place of its own JSON serializer because theirs was too slow (I haven’t verified this).
ServiceStack JSON: Another popular JSON serializer.
ServiceStack JSV: The same library, but using its more efficient JSV format. I actually like it because it’s quite small, very fast and still very human-readable when it comes to debugging.
Microsoft Binary Formatter: A general-purpose binary formatter. Its output was surprisingly large (=bad) and its performance was only all right. It is very flexible and can handle a wider range of data types. It seems Microsoft designed it for generality rather than speed.
ProtoBuf .NET: A .NET implementation of Protocol Buffers, a very fast serialization scheme Google designed for its own internal programs. It’s not as flexible as Microsoft’s Binary Formatter above, but it is the fastest and produces the smallest output. That matters if you’re running a Google-sized operation.
Microsoft XML serializer: XML is still hanging around from its SOAP days. The format itself is verbose, so the output is large (=bad) regardless of the implementation. Not that fast either. It’s here just for reference.
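Why is ProtoBuf so much smaller than JSON? On the wire, Protocol Buffers replaces field names with small numeric tags and encodes integers as variable-length "varints". The Python sketch below hand-encodes a two-field message according to the published wire-format rules. It illustrates the encoding only; it is not how protobuf-net is implemented. The message layout is a hypothetical example, with field 1 = `id` and field 2 = `name`.

```python
import json


def encode_varint(value: int) -> bytes:
    # Base-128 varint: 7 bits per byte, least-significant group first,
    # high bit set on every byte except the last. Non-negative ints only.
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    # Every field starts with a key: (field_number << 3) | wire_type.
    return encode_varint((field_number << 3) | wire_type)


def encode_message(msg_id: int, name: str) -> bytes:
    name_bytes = name.encode("utf-8")
    return (
        encode_key(1, 0) + encode_varint(msg_id)             # wire type 0: varint
        + encode_key(2, 2) + encode_varint(len(name_bytes))  # wire type 2: length-delimited
        + name_bytes
    )


proto = encode_message(150, "testing")
as_json = json.dumps({"id": 150, "name": "testing"}, separators=(",", ":")).encode()
print(proto.hex(" "))            # 08 96 01 12 07 74 65 73 74 69 6e 67
print(len(proto), len(as_json))  # 12 27
```

Even compact JSON spends bytes on quoted field names and decimal digits. The binary encoding spends one byte per field key and two bytes on the number 150.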

Benchmark application

I wrote a quick application that would

take a simple object
run it through a serializer to get the serialized representation
run that representation back through the deserializer to get a clone
compare the original and cloned objects
measure the speed and size of the serialized output (what is sent over the network)
do all of the above for every serializer in the mix
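The steps above can be sketched as a small harness. This is a Python illustration, not the actual C# benchmark. The standard-library `json` and `pickle` modules stand in for a text and a binary serializer, and the `sample` object is hypothetical. This sketch covers only the round-trip check and the payload size; it does not time anything.

```python
import json
import pickle
from typing import Any, Callable, NamedTuple


class Serializer(NamedTuple):
    name: str
    serialize: Callable[[Any], bytes]
    deserialize: Callable[[bytes], Any]


SERIALIZERS = [
    Serializer("json (text)",
               lambda o: json.dumps(o).encode("utf-8"),
               lambda b: json.loads(b.decode("utf-8"))),
    Serializer("pickle (binary)", pickle.dumps, pickle.loads),
]


def round_trip(s: Serializer, obj: Any) -> int:
    """Serialize obj, deserialize a clone, check it matches, return payload size."""
    payload = s.serialize(obj)      # what would be sent over the network
    clone = s.deserialize(payload)  # rebuild the object from the payload
    if clone != obj:
        raise AssertionError(f"{s.name}: clone differs from original")
    return len(payload)


sample = {"symbol": "MSFT", "price": 31.25, "volume": 1200, "tags": ["tech", "large-cap"]}
sizes = {s.name: round_trip(s, sample) for s in SERIALIZERS}
for name, size in sizes.items():
    print(f"{name:16} {size} bytes")
```

Because each serializer is just a name plus two functions, adding another contender is a one-line change to `SERIALIZERS`.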

I’ve released the benchmark application as an open-source project on GitHub. Download, compile, run.

Warm-start vs cold start

Almost all serializers have an upfront computational cost to initialize themselves. To warm them up, I ran the same object through every serializer five times. I then compared the time taken to serialize ONE object, excluding the cold-start runs. This is in stark contrast to some (unrealistic) benchmarks that serialize in a tight loop about a million times and THEN compute the per-object serialization time. Almost no application I know of has a use case requiring a single thread to serialize even a thousand objects in a tight loop. So I believe examining warm-start times gives the most realistic, “apples-to-apples” comparison, reflecting how serialization is actually used.
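Here is a minimal Python sketch of that measurement. It times the very first call (cold start), runs the same object through five more times to warm up, and then times a single call. `json.dumps` is only a stand-in, and its startup cost will differ from the .NET libraries'. The function name `cold_and_warm_us` is hypothetical.

```python
import json
import time
from typing import Any, Callable, Tuple


def cold_and_warm_us(serialize: Callable[[Any], Any], obj: Any,
                     warmup_runs: int = 5) -> Tuple[float, float]:
    """Return (cold-start, warm-start) time in microseconds for ONE serialization.

    For a true cold start, this must be the serializer's first use in the process.
    """
    start = time.perf_counter_ns()
    serialize(obj)                      # first call pays any one-time setup cost
    cold_us = (time.perf_counter_ns() - start) / 1_000

    for _ in range(warmup_runs):        # warm up with the same object
        serialize(obj)

    start = time.perf_counter_ns()
    serialize(obj)                      # time a single warm call, not a tight loop
    warm_us = (time.perf_counter_ns() - start) / 1_000
    return cold_us, warm_us


cold, warm = cold_and_warm_us(json.dumps, {"symbol": "MSFT", "price": 31.25})
print(f"cold {cold:.1f} us, warm {warm:.1f} us")
```

Timing a single call is noisier than averaging over a loop. That is the price of measuring the per-object cost a real application actually sees.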

Results

Serializer	Type	Cold-start time (µs)	Warm-start time (µs)	Size (bytes)	Warm-start speedup over cold start (×)
ProtoBuf .NET	binary	125,770	34	99	3754x
ServiceStack JSV	text	123,688	38	181	3229x
ServiceStack Json	text	130,128	49	205	2661x
.NET Binary Formatter	binary	13,565	94	512	144x
Json.NET	text	274,785	146	205	1881x
.NET Xml Serializer	text	138,447	224	461	618x
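The last column is simply cold-start time divided by warm-start time. Recomputing it from the rounded table values gives slightly different numbers for some rows, most visibly the first three. For example, 125,770 / 34 ≈ 3,699x rather than the 3,754x shown. The speedups were presumably computed from unrounded warm-start times. A short Python check using only the table's numbers:

```python
# Cold- and warm-start times (microseconds) copied from the results table.
RESULTS = {
    "ProtoBuf .NET":         (125_770, 34),
    "ServiceStack JSV":      (123_688, 38),
    "ServiceStack Json":     (130_128, 49),
    ".NET Binary Formatter": (13_565, 94),
    "Json.NET":              (274_785, 146),
    ".NET Xml Serializer":   (138_447, 224),
}


def speedup(cold_us: float, warm_us: float) -> float:
    # How many times faster a warm call is than the first (cold) call.
    return cold_us / warm_us


for name, (cold, warm) in RESULTS.items():
    print(f"{name:22} {speedup(cold, warm):7.0f}x")
```

The .NET Binary Formatter stands out. Its cold start is roughly ten times cheaper than the others', so its warm-start speedup is only about 144x.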

Conclusion

It’s actually simple.

If you want the absolute fastest and smallest: ProtoBuf is the winner.
If you want very fast, pretty small and very human-readable: ServiceStack JSV is the winner. We’ll be using it in places where we won’t be talking to the user’s browser.
If you want the fastest JSON (= wide compatibility, notably with browsers): ServiceStack JSON is the winner.
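The three rules above amount to filtering the results table and taking a minimum. Below is a small Python sketch that uses the warm-start times and sizes from the table. The format labels (binary, jsv, json, xml) are my own finer-grained grouping of the table's text/binary column.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    fmt: str         # "binary", "jsv", "json" or "xml"
    warm_us: int     # warm-start time from the results table
    size_bytes: int  # serialized size from the results table


RESULTS = [
    Result("ProtoBuf .NET", "binary", 34, 99),
    Result("ServiceStack JSV", "jsv", 38, 181),
    Result("ServiceStack Json", "json", 49, 205),
    Result(".NET Binary Formatter", "binary", 94, 512),
    Result("Json.NET", "json", 146, 205),
    Result(".NET Xml Serializer", "xml", 224, 461),
]
TEXT_FORMATS = {"jsv", "json", "xml"}  # human-readable formats

fastest = min(RESULTS, key=lambda r: r.warm_us)
smallest = min(RESULTS, key=lambda r: r.size_bytes)
fastest_text = min((r for r in RESULTS if r.fmt in TEXT_FORMATS), key=lambda r: r.warm_us)
fastest_json = min((r for r in RESULTS if r.fmt == "json"), key=lambda r: r.warm_us)

print("fastest:", fastest.name)
print("smallest:", smallest.name)
print("fastest human-readable:", fastest_text.name)
print("fastest JSON:", fastest_json.name)
```

ProtoBuf wins both of the first two criteria, which is why the first rule names a single winner.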