I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how this is a problem. I have been programming in Java for about 10 years and have never had issues with it. Occasionally, when converting bytes to ints, a mask with & 0xFF is needed, but I don’t consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
- When converting the numbers to a different bit width. Between 8, 16 and 32 bit integer types you can use bitmasks if needed.
- When converting numbers to decimal format, usually to Strings.
- Interoperating with non-Java systems through APIs or protocols. Again the data is just bits, so I don’t see the problem here.
- Using the numbers as memory or other offsets. With 32 bit ints this might be a problem for very large offsets.
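The first two cases in the list above can be sketched as follows. This is a minimal illustration, assuming Java 8 or later (which added unsigned helper methods to Integer); the class and method names are made up for the example.

```java
// Sketch: viewing a signed Java int as an unsigned 32-bit value (Java 8+).
final class UnsignedConversions {

    // Widen a 32-bit pattern to a long without sign extension.
    static long widen(int bits) {
        return Integer.toUnsignedLong(bits); // equivalent to bits & 0xFFFFFFFFL
    }

    // Render a 32-bit pattern as an unsigned decimal string.
    static String toDecimal(int bits) {
        return Integer.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        int bits = 0xFFFFFFFF;               // all 32 bits set
        System.out.println(bits);            // signed view: -1
        System.out.println(widen(bits));     // unsigned view: 4294967295
        System.out.println(toDecimal(bits)); // unsigned text: 4294967295
    }
}
```

The bits never change; only the widening and formatting steps need to know which interpretation is intended.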
Instead I find it easier that I don’t need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
You say of the & 0xFF mask, “I don’t consider that as a problem.” Why not? Is “applying a bitwise AND with 0xFF” actually part of what your code is trying to represent? If not, why should it have to be part of what you have to write? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this 🙁
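To make the masking point concrete, here is a rough sketch of assembling an int from four big-endian bytes, with and without the mask. The class and method names are hypothetical, chosen only for this illustration.

```java
// Sketch: why byte arithmetic in Java tends to need "& 0xFF".
final class ByteMasking {

    // Combine four big-endian bytes into an int, masking each byte so that
    // sign extension does not spread 1-bits over the higher bytes.
    static int readIntMasked(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    // The same expression without masks: each byte is sign-extended to an
    // int before shifting, so a byte >= 0x80 corrupts the upper bits.
    static int readIntUnmasked(byte[] b) {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x00, 0x01, (byte) 0x80}; // intended value: 0x180
        System.out.println(readIntMasked(data));   // 384
        System.out.println(readIntUnmasked(data)); // -128
    }
}
```

With an unsigned byte type the second version would simply be correct, and the masks would not need to appear in the code at all.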
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it’s a shame that unsigned types aren’t used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It’s very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim “the data is just bits, so I don’t see the problem here” – but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
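As a rough illustration of that mismatch, the sketch below writes and reads an unsigned 16-bit field using java.nio.ByteBuffer and Short.toUnsignedInt (the latter assumes Java 8 or later). The class name, method names and the big-endian two-byte layout are assumptions made for the example, not part of any particular protocol.

```java
import java.nio.ByteBuffer;

// Sketch: carrying the value 65535 through Java's signed short.
final class UnsignedShortInterop {

    // Encode a value in 0..65535 as two big-endian bytes.
    static byte[] encode(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        // The cast turns 65535 into the short -1; the bits are what matter.
        return ByteBuffer.allocate(2).putShort((short) value).array();
    }

    // Decode two big-endian bytes as an unsigned 16-bit value.
    static int decode(byte[] bytes) {
        return Short.toUnsignedInt(ByteBuffer.wrap(bytes).getShort());
    }

    public static void main(String[] args) {
        short raw = (short) 65535;
        System.out.println(raw);                   // -1
        System.out.println(decode(encode(65535))); // 65535
    }
}
```

The range check and the two conversions are exactly the kind of extra code that an unsigned 16-bit type would make unnecessary.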
The only time you would need to consider operations between signed and unsigned numbers is when you are naturally working with values of two different types, one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider a comparison between foo, a naturally signed value, and length, a naturally unsigned value stored in a signed type.
Suppose foo is 100 and length is -1. What’s the logical result? The value of length represents 65535, so logically foo is smaller than it. But you’d probably write a plain signed comparison and get the wrong result.
Of course they don’t even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn’t be a problem if you had unsigned types in the language.
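The comparison described above can be sketched as follows. This is not the original snippet, only a minimal reconstruction of the idea, assuming both values are stored as short and that Java 8's Short.toUnsignedInt is available; the class and method names are hypothetical.

```java
// Sketch: a signed comparison applied to a naturally unsigned value.
final class SignedComparisonPitfall {

    // Plain signed comparison: -1 is treated as smaller than 100.
    static boolean lessThanSigned(short foo, short length) {
        return foo < length;
    }

    // Reads length as an unsigned 16-bit value (0..65535) before comparing.
    static boolean lessThanWithUnsignedLength(short foo, short length) {
        return foo < Short.toUnsignedInt(length);
    }

    public static void main(String[] args) {
        short foo = 100;
        short length = -1; // the bit pattern of 65535
        System.out.println(lessThanSigned(foo, length));             // false
        System.out.println(lessThanWithUnsignedLength(foo, length)); // true
    }
}
```

Nothing in the first method hints that it is wrong; the compiler cannot help because the intended interpretation of length lives only in the programmer's head.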
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it’s gone from java.sun.com now), including: