I’m trying to connect to Google Chrome Sync (the service that synchronizes your Chrome settings and your currently open tabs).
For now I’m concentrating on the tab syncing. I connected to the Google Talk servers, and I’m receiving messages from the tango bot whenever I navigate to a new web page in Chrome.
But I’m having difficulty decoding those messages, because they are encoded in Google’s protobuf format. There are tons of different protobuf classes dedicated to Chrome Sync, and as far as I can tell there’s no way of figuring out the type of a binary protobuf message, is there?
A typical message looks like this (base64-encoded, with my email address X’d out):
CAAilQEKQAoGCgQIAxACEiUKBgoECAMQARISCZwF6dZYmkeFEXZLABNN3/yMGgcIhSwQAxgBINP80ri/JyoIMTgxOTgxMjYaUQpPCgwI7AcSB1NFU1NJT04QARiw64/I0se0AiIyVzpDaGZDeU9JWUZXdXFuUmRXaGtJWk94VkRSM1lmTGU1M0FoRGVxT2EwOHVQUHcyOD0wASoGCgQIAxACMAI4AUIrCG8SJxAEGAIiFGRlbHXXXXXXXXdAZ21haWwuY29tQgl0YW5nb19yYXdIAQ==
I tried decoding it with some of the protobuf classes (which I compiled for Java), but none of them gave me any useful data.
Does anyone have more information on this topic? Some insight into how to find the right protobuf class for decoding a given binary message would be great. Even being able to decode the exact example message above would help me to some extent.
There is very little public documentation, and the Chromium source code is really difficult to look through if you’re not a C++ guy…
(I’m developing in Java, if that matters)
Yes, that is broadly possible; however, it cannot be done with the data you have posted, because you corrupted it irretrievably when you removed your email address. Protobuf is very sensitive to that. I tried replacing the XXXXXXXX with the base-64 for a 6-letter email address, but the byte immediately before it is 199, and 199 cannot be legal there: the data immediately before string contents is the length of the string, encoded as a varint, and a varint can never end with the most significant bit of its last byte set, because the MSB is a continuation flag.
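To make the varint rule concrete, here is a minimal pure-JDK Java sketch of how a reader interprets wire-format bytes without a schema. The class, record and method names (RawProtoSketch, readVarint, walk) are hypothetical illustrations, not part of any protobuf library. Each tag is a varint holding (field number << 3) | wire type, and a length-delimited field (wire type 2) carries a varint length prefix; a lone 0xC7 (199) has its continuation bit set, so it can never be the last byte of that prefix. It only walks the top level and does not recurse into nested messages.

```java
import java.util.ArrayList;
import java.util.List;

public class RawProtoSketch {

    /** A decoded varint plus the offset just past its last byte. */
    record Varint(long value, int next) {}

    /** One top-level field; for wire type 2, value holds the payload length in bytes. */
    record Field(int number, int wireType, long value) {}

    /** Reads a base-128 varint: low 7 bits per byte, MSB (0x80) means "more bytes follow". */
    static Varint readVarint(byte[] data, int pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) {
                throw new IllegalArgumentException("truncated varint: last byte had its continuation bit set");
            }
            int b = data[pos++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new Varint(result, pos);
            }
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    /** Walks the top-level fields without knowing the schema (no recursion into nested messages). */
    static List<Field> walk(byte[] data) {
        List<Field> fields = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            Varint tag = readVarint(data, pos);
            pos = tag.next();
            int number = (int) (tag.value() >>> 3);
            int wireType = (int) (tag.value() & 0x7);
            long value;
            if (wireType == 0) {                          // varint
                Varint v = readVarint(data, pos);
                value = v.value();
                pos = v.next();
            } else if (wireType == 2) {                   // string, bytes or nested message
                Varint len = readVarint(data, pos);
                int remaining = data.length - len.next();
                if (len.value() < 0 || len.value() > remaining) {
                    throw new IllegalArgumentException("field " + number + " claims "
                            + len.value() + " bytes, but only " + remaining + " remain");
                }
                value = len.value();
                pos = len.next() + (int) len.value();
            } else if (wireType == 1 || wireType == 5) {  // fixed64 / fixed32, little-endian
                int size = (wireType == 1) ? 8 : 4;
                if (size > data.length - pos) {
                    throw new IllegalArgumentException("truncated fixed-width field " + number);
                }
                value = 0;
                for (int i = size - 1; i >= 0; i--) {
                    value = (value << 8) | (data[pos + i] & 0xFF);
                }
                pos += size;
            } else {                                      // groups (3/4) and invalid types
                throw new IllegalArgumentException("unsupported wire type " + wireType + " for field " + number);
            }
            fields.add(new Field(number, wireType, value));
        }
        return fields;
    }

    public static void main(String[] args) {
        // field 1 = varint 150 (08 96 01), field 2 = the 3-byte string "abc" (12 03 61 62 63)
        byte[] msg = {0x08, (byte) 0x96, 0x01, 0x12, 0x03, 'a', 'b', 'c'};
        for (Field f : walk(msg)) {
            System.out.println(f);
        }
        try {
            readVarint(new byte[] {(byte) 199}, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("199 as a final byte: " + e.getMessage());
        }
    }
}
```

As a cross-check against the question’s sample: its base64 prefix CAAilQEK decodes to the bytes 08 00 22 95 01 0A, which reads as field 1 = 0 followed by field 4, length-delimited, with a 149-byte payload. Past that point the sketch can only be as trustworthy as the length prefixes, which is exactly where the XXXX replacement does its damage. If protobuf-java is already on your classpath, UnknownFieldSet.parseFrom(bytes) should give a similar schema-less view, including nested fields.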
If you have the raw protobuf binary, you can try running it through protoc --decode_raw and see what it says; that may give you enough to start reconstructing the layout. Alternatively, you can parse it manually with your preferred implementation’s “reader” API (if it has one). For example, using protobuf-net and ProtoReader, I was able to piece together the following (the numbers in brackets are the offsets after reading each field header):

The problem is that, because of the corruption introduced by your replacement, it is impossible to say much beyond field 4; by that point everything could be complete gibberish, since the lengths are off. So I have very little confidence past that point. The main point of the above is simply to illustrate that yes, you can parse protobuf data without knowing the schema in advance, in order to reverse-engineer a schema, but it requires: