
I worked on a product inside Google which used protos (v1) as the data format to a web front end, and in practice, that system was a failure, in part due to the decision to use protos. The deserialization cost of protocol buffers is too high if you're doing complex data throughput, and even though the data size is smaller, it's better to send larger gzipped JSON, which will be decompressed in native code and deserialized into JS (also via native code). We weren't using ProtoBuf.js, but our own internal JavaScript implementation of a similar library, and doing all of this in JS was too expensive. Granted, we were sending around protos that had multi-megabyte payloads at times.

We eventually rewrote our app to send protos in JSON format to the app, while letting our backends still pass around native protos. That worked a lot better.



Things have changed a lot since your experience, I think. For one, a different encoding called "JSPB" has become the de facto standard for doing Protocol Buffers in JavaScript, at least inside Google. JSPB is parseable with JSON.parse(), so it avoids the speed issues you experienced.

And looking forward, JavaScript parsing of protobuf binary format has gotten a lot faster, thanks in large part to newer JavaScript technologies like TypedArray. Ideally JSPB would be deprecated as a wire format in favor of fast JavaScript parsing of binary protobufs, but this would of course be contingent on the performance being acceptable.

Finally, JSON is becoming a first-class citizen in proto3, so protobuf vs. JSON will no longer be an either/or; it can be a both/and. https://developers.google.com/protocol-buffers/docs/proto3#j...


I'm not sure that TypedArray will help that much. For web apps, most of the data is strings and at some point you have to deserialize the strings so that regular JavaScript code can work with them (rather than asm.js code which would work with the bytes directly).

The proof of concept would be to send an array of strings as bytes in a TypedArray, deserialize it to an array of JavaScript strings using JavaScript (not native code), and show that this is about as fast as doing the same thing using JSON.parse(). It seems likely that JSON.parse() will have an easier time creating all those JavaScript strings and other objects at once from native code.


TextDecoder/TextEncoder is the emerging standard way to do this.

https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
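
For what it's worth, the proof of concept described above could be sketched roughly like this. The length-prefixed wire format and the function names (encodeStrings, decodeStrings) are made up for illustration, and note that TextDecoder does the UTF-8 decoding in native code, so only the framing loop runs in JavaScript. Timings will vary a lot by engine and payload, so treat it as a starting point rather than a benchmark.

```typescript
// Hypothetical wire format for this sketch: each string is stored as a
// 4-byte little-endian byte length followed by its UTF-8 bytes.
function encodeStrings(strings: string[]): Uint8Array {
  const encoder = new TextEncoder();
  const chunks = strings.map((s) => encoder.encode(s));
  const total = chunks.reduce((n, c) => n + 4 + c.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const chunk of chunks) {
    view.setUint32(offset, chunk.length, true); // length prefix
    out.set(chunk, offset + 4);                 // UTF-8 payload
    offset += 4 + chunk.length;
  }
  return out;
}

function decodeStrings(bytes: Uint8Array): string[] {
  const decoder = new TextDecoder("utf-8");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const result: string[] = [];
  let offset = 0;
  while (offset < bytes.byteLength) {
    const length = view.getUint32(offset, true);
    offset += 4;
    // subarray() creates a view without copying; decode() builds the JS string.
    result.push(decoder.decode(bytes.subarray(offset, offset + length)));
    offset += length;
  }
  return result;
}

// Same data in both formats.
const sample: string[] = [];
for (let i = 0; i < 100000; i++) {
  sample.push(`item-${i}-héllo`);
}
const binary = encodeStrings(sample);
const json = JSON.stringify(sample);

let start = Date.now();
const fromBinary = decodeStrings(binary);
const binaryMs = Date.now() - start;

start = Date.now();
const fromJson: string[] = JSON.parse(json);
const jsonMs = Date.now() - start;

console.log(`TypedArray + TextDecoder: ${binaryMs} ms, JSON.parse: ${jsonMs} ms`);
```

A fair comparison would also run each decoder many times and try string-heavy payloads of different sizes, since per-call overhead of decode() on many short strings may dominate.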


Interesting, how does that relate to this? https://encoding.spec.whatwg.org/


That's the same thing.


What benefits do I get from ProtoBuf, apart from the standard binary wire format?

JSON is just more popular as a serialization format. It doesn't matter what programming language or OS I am on, there is almost always a built-in library that de/serializes JSON at reasonable speed. To send the JSON objects around from one service to another, I can just gzip the string if it's big, or just send a plain UTF-8 string if it's not.

ProtoBuf has to provide more value for people like me to switch. I would rather try out Apache Avro first as a replacement for what I am doing right now.


In my opinion, the biggest benefit from using protobuf is that the schema exists in a .proto file. This can be used to provide all sorts of conveniences.

With a plain JSON-based API, you copy and paste field names out of sample code or the documentation. If you spell a field name wrong, there will be no error on the client. If you're lucky, the server might error out because it didn't recognize the property name, but it also might not. If you send an integer when the server was expecting a string, the server might automatically convert or it might not.

With protobuf, the schema is explicit in a .proto file. That means that the client library can tell you, at the precise moment that you say msg.misspledFieldName, that the field name doesn't exist. Or if you try to put an integer in there instead of a string, it can tell you about that too. Basically it makes for a tighter feedback loop, which is almost always better.

In statically-typed languages like C++ or Java, the schema can be used to generate static types too, so it's actually a compile-time error when you misspell a field name.
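
To make that concrete, here is a minimal sketch in TypeScript. The User message, the hand-written interface, and the checkUser helper are all hypothetical stand-ins for what a code generator might produce from a .proto file; they are not the output of any real protobuf tool.

```typescript
// Hypothetical schema, as it might appear in a .proto file:
//   message User {
//     string name = 1;
//     int32 age = 2;
//   }

// Hand-written stand-in for a generated static type.
interface User {
  name: string;
  age: number;
}

// With the type in place, a misspelled field is rejected by the compiler:
//   const bad: User = { nmae: "Ada", age: 36 };  // compile-time error
const ok: User = { name: "Ada", age: 36 };

// Stand-in for a generated field table, used for checks at runtime
// (for example on untyped data coming from plain JavaScript callers).
const userFields = { name: "string", age: "number" } as const;
type UserFieldName = keyof typeof userFields;

function isUserField(key: string): key is UserFieldName {
  return Object.prototype.hasOwnProperty.call(userFields, key);
}

// Returns a list of problems; an empty list means the object fits the schema.
function checkUser(candidate: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(candidate)) {
    if (!isUserField(key)) {
      problems.push(`unknown field: ${key}`);
    } else if (typeof candidate[key] !== userFields[key]) {
      problems.push(`field ${key} should be a ${userFields[key]}`);
    }
  }
  return problems;
}

console.log(checkUser({ nmae: "Ada", age: 36 }));   // reports the unknown field
console.log(checkUser({ name: "Ada", age: "36" })); // reports the wrong type
console.log(checkUser({ name: ok.name, age: ok.age })); // no problems
```

With plain JSON and no schema, both of the bad objects above would be serialized and sent without complaint.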

> It doesn't matter what programming language or OS I am on, there is almost always a built-in library that de/serializes JSON at reasonable speed.

Yep, that's one reason that proto3 will support JSON as a first-class citizen: https://developers.google.com/protocol-buffers/docs/proto3#j...


summary: xml annoyances for json

:-)


XML isn't painful because it has a schema; XML is painful because it wasn't really designed for RPC, so getting to feature parity with something like Protocol Buffers takes a whole stack of XML technologies and a huge mess of complexity.

Protocol Buffers were designed from the ground up for RPC, and as a result are far simpler and more convenient to use than XML. Seriously, nobody who uses Protocol Buffers compares them to XML, because it's not even a comparison.

https://developers.google.com/protocol-buffers/docs/overview...


The encoding for protobufs is significantly more compact than JSON. If you're logging or persistently storing your data on the server, this can cut down your storage and bandwidth costs significantly, particularly if you're operating at Google scale.
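
A small sketch of where the size difference comes from, for a single integer field. This hand-rolls the base-128 varint encoding instead of using a protobuf library, and the field name userId and field number 1 are made up for the example. Real payloads (and gzip) will change the ratio.

```typescript
// Encode a non-negative safe integer as a base-128 varint:
// 7 bits per byte, least significant group first, high bit set on
// every byte except the last.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let remaining = value;
  while (remaining >= 0x80) {
    bytes.push((remaining & 0x7f) | 0x80);
    remaining = Math.floor(remaining / 128);
  }
  bytes.push(remaining);
  return bytes;
}

// Encode one varint-typed field. The tag is (fieldNumber << 3) | wireType,
// and wire type 0 means "varint".
function encodeVarintField(fieldNumber: number, value: number): number[] {
  return encodeVarint(fieldNumber << 3).concat(encodeVarint(value));
}

// Hypothetical message with a single field: userId = 150, field number 1.
const protoBytes = encodeVarintField(1, 150); // bytes 0x08 0x96 0x01
const jsonBytes = new TextEncoder().encode(JSON.stringify({ userId: 150 }));

console.log(`protobuf: ${protoBytes.length} bytes, JSON: ${jsonBytes.length} bytes`);
```

The binary form carries only a field number and the value, while JSON repeats the field name as text in every message, which is where most of the savings come from.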

Haberman also mentioned the schema benefits.

All that said, I'm using JSON for my current startup. I view them as optimizing for different parts of the product's lifecycle: JSON lets you quickly adapt the protocol and switch out different languages for different services when you're figuring out what product to build, while Protobuf saves you money when you're trying to scale it. I'm also pretty intrigued by Cap'n Proto as a high-performance serialization format, since it fixes a lot of the problems we faced using protobufs at scale at Google, but its language support just isn't up to protobuf/JSON yet, and the protocol is quite complicated.


I use Protobufs for my startup and it has saved us an incredible amount of time building out iOS, Android, and Web clients. With a small team, any time we can shave by not having to re-write the modeling layer in all of these languages is a big win. As the writer of the APIs, I publish the new Protobuf models/services and then can switch over and instantly start working with real objects in Swift or Java.

Coming from a larger startup, I've also experienced the pains of trying to maintain JSON objects between different services. Protobufs have some quirks, but I think it's a great solution to get behind at any stage.



