How to Improve Performance by Using Binary Serialization Instead of JSON or XML

XML and, later, JSON became de facto standards in software development, and many programmers do not realize that using them may be a poor choice for network communication or data persistence. People are used to text, but computers do not need it!

Have you ever considered how many cycles your CPU wastes on parsing text? Practically all web technology is text-based. Each time you load a page, a remote server has to parse your request headers to understand what you want. Then your browser parses the response headers, the HTML, and the CSS. Even JavaScript is interpreted from its text form. Believe me, this is a huge overhead!

Thankfully, the binary HTTP/2 protocol is available today, and WebAssembly should be ready to replace plain-text JavaScript in the near future. This trend shows that more and more developers are beginning to realize a simple truth: machines do not need text.

In addition, binary formats are far more efficient in terms of size and network traffic.

The more performance-critical your application, the more throughput you want to achieve. Using a binary format can drastically reduce the CPU time spent on parsing and the network bandwidth spent on transmitting redundant data, which may be the key to improving throughput. By the way, did you know that transferring data through TCP or UDP sockets also costs CPU time?

In short, binary formats are usually simpler, far smaller (3 to 10 times), and much faster to process (20 to 100 times) than XML or JSON.
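To get a feel for where the size difference comes from, here is a minimal sketch that encodes an integer using the Protocol Buffers "varint" scheme and compares it with the same number written out as decimal text (the `VarintDemo` and `WriteVarint` names are mine, for illustration only):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class VarintDemo
{
    // Minimal writer for the Protocol Buffers "varint" integer encoding:
    // each byte carries 7 bits of the value, and the high bit of a byte
    // signals that more bytes follow.
    public static byte[] WriteVarint(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80)); // low 7 bits + continuation flag
            value >>= 7;
        }
        bytes.Add((byte)value); // final byte, high bit clear
        return bytes.ToArray();
    }

    public static void Main()
    {
        ulong n = 1234567890;
        int textSize = Encoding.UTF8.GetByteCount(n.ToString()); // 10 bytes as decimal text
        int binarySize = WriteVarint(n).Length;                  // 5 bytes as a varint
        Console.WriteLine($"text: {textSize} bytes, varint: {binarySize} bytes");
    }
}
```

And that is only the payload itself; a text format also spends bytes on quotes, brackets, and field names, while a binary format typically replaces field names with small numeric tags.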

Google Protocol Buffers

One of the available solutions for binary serialization is Google Protocol Buffers: a language-neutral and platform-neutral mechanism for serializing structured data.

The format is not self-describing, which means you need the data structure definition to read it. Nevertheless, this should not make your application harder to debug. In fact, data transferred in a text format is already byte-encoded using UTF-8 or another string encoding, so you have to run the binary stream through some kind of decoder to get a human-readable form anyway. For Protocol Buffers, you just tell the decoder which structure to expect in the input stream, and you can then view the decoded structure directly in your favorite IDE.
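As a sketch of what such a structure definition looks like, here is a hypothetical `.proto` file (the message and field names are my own, for illustration):

```proto
// person.proto -- a hypothetical schema. The field numbers (1, 2, 3)
// are what identifies each field in the binary stream, replacing the
// textual keys a JSON document would carry.
syntax = "proto3";

message Person {
  string name  = 1;
  int32  id    = 2;
  string email = 3;
}
```

With a definition like this you can, for example, inspect a captured binary message using the stock `protoc` tool: `protoc --decode=Person person.proto < message.bin` prints the decoded fields in readable text form.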

Protocol Buffers for C# / .NET: protobuf-net

Protobuf-net is an implementation of Google Protocol Buffers for C#. Unfortunately, the Protocol Buffers format is not well suited for preserving not just data but a complex .NET object graph with hierarchy, references, dictionaries, and so on.

The protobuf-net serializer implements some extensions to the Protocol Buffers format to support essential features such as reference tracking.
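As a rough sketch of how this looks in practice, here is a hypothetical pair of objects serialized with protobuf-net; the `Employee` type and the `Demo` class are my own inventions, while `AsReference` is the protobuf-net attribute option for reference tracking:

```csharp
using System.IO;
using ProtoBuf; // protobuf-net NuGet package

[ProtoContract]
public class Employee
{
    [ProtoMember(1)]
    public string Name { get; set; }

    // AsReference tells protobuf-net to serialize this object once and
    // emit references to it afterwards instead of duplicating it.
    // This is a protobuf-net extension, not standard Protocol Buffers.
    [ProtoMember(2, AsReference = true)]
    public Employee Manager { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var boss = new Employee { Name = "Alice" };
        var dev  = new Employee { Name = "Bob", Manager = boss };

        using var ms = new MemoryStream();
        Serializer.Serialize(ms, dev);   // compact binary form
        ms.Position = 0;
        var copy = Serializer.Deserialize<Employee>(ms);
        // copy.Manager is restored as a tracked reference
    }
}
```

Note that in plain Protocol Buffers a shared object would simply be embedded twice, losing the fact that both fields pointed to the same instance.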

It is much faster and more compact than the built-in .NET serializers such as BinaryFormatter or the WCF DataContract serializers.

Being a big believer in binary serialization, I have been developing a fork of protobuf-net, AqlaSerializer, which supports even more .NET features, such as nested collections and multidimensional arrays. You can read more about AqlaSerializer here.

Conclusion

I hope I have shown how binary formats can make your life better and which tools you can use in .NET. You can read about Google Protocol Buffers implementations for non-CLR languages here.