New Languages, old worries…

With each crop of new languages comes a flurry of articles explaining how much better they are than the incumbent, C++. That the incumbent has not changed after all these years is telling in itself, but I find it more interesting that the main argument in favour of these languages is the absence of pointers and the errors that come with them.

This has been the main argument of every new language for as long as C++ has been king of the hill. It is technically correct, but it is also an issue that can be largely mitigated with modern C++ constructs like std::string and smart pointers.

Object references and arrays are only two of the many data structures used in code. What I would really like from new languages is a way to handle or prevent other common data bugs. Instead of asking whether a new language lets the user address memory directly, it would be nice to ask about other bug-prone operations:

  • Does the language allow the equality operator for floating point values?
  • Does the language allow substring operations on text strings?
  • What kind of string comparison does the language offer?

The first two operations are allowed in many languages (including C++), yet most of the time using them is a bug. I ranted about floating point equality before.

Cutting a substring out of a string is a dangerous operation because most languages have an implicit representation for strings: in Go it is UTF-8 (strings are just sequences of bytes), in Java it is UTF-16. UTF-8 contains multi-byte characters, so cutting a string at an arbitrary byte can produce invalid UTF-8 data. UTF-16 has characters made of two 16-bit words (surrogate pairs), so cutting at an arbitrary 16-bit word boundary can produce invalid UTF-16 data.

String comparison is similar in essence to floating point comparison: bit-for-bit equality, as implemented by most languages, is not what you want. For instance, "é" stored as a single precomposed character and "e" followed by a combining acute accent look identical, yet compare as different. You want to normalise the text representation in some way before comparing the strings.
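To make these pitfalls concrete, here is a minimal sketch in Java, chosen only because its strings are UTF-16 and its standard library ships java.text.Normalizer. The helper names nearlyEqual, safePrefix and sameText are hypothetical, the 1e-9 tolerance is an arbitrary assumption, and NFC is just one reasonable normalisation form; depending on the data you may also want NFKC or case folding. Note that safePrefix only avoids splitting a surrogate pair: it can still cut between a base letter and a combining accent, which yields valid UTF-16 but different text.

```java
import java.text.Normalizer;

// Hypothetical helpers illustrating the three pitfalls; not a library API.
class DataPitfalls {

    // Compare doubles with a relative tolerance instead of ==.
    // The tolerance is supplied by the caller; 1e-9 below is an arbitrary choice.
    static boolean nearlyEqual(double a, double b, double relTol) {
        double diff = Math.abs(a - b);
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return diff <= relTol * scale;
    }

    // Return at most maxChars UTF-16 code units of s (assumes maxChars >= 0),
    // backing off by one if the cut would split a surrogate pair.
    // Combining sequences (e.g. a letter plus an accent) are NOT protected.
    static String safePrefix(String s, int maxChars) {
        if (maxChars >= s.length()) {
            return s;
        }
        int end = maxChars;
        if (end > 0
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }

    // Compare text after normalising both sides to NFC (an assumed choice of form).
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // 0.1 + 0.2 is 0.30000000000000004 in binary floating point.
        System.out.println(0.1 + 0.2 == 0.3);                  // false
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9)); // true

        // "a" followed by U+1F600, stored as a surrogate pair: length() is 3.
        String s = "a\uD83D\uDE00";
        String naive = s.substring(0, 2); // keeps only the high surrogate
        System.out.println(Character.isHighSurrogate(naive.charAt(1))); // true
        System.out.println(safePrefix(s, 2));                           // a

        // U+00E9 versus "e" followed by U+0301 (combining acute accent).
        String composed = "\u00E9";
        String decomposed = "e\u0301";
        System.out.println(composed.equals(decomposed));    // false
        System.out.println(sameText(composed, decomposed)); // true
    }
}
```

Running it prints false, true, true, a, false, true: the raw operators give the surprising answers, the third line confirms that the naive substring ends in a lone high surrogate, and the helpers give the expected results.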

Now, you could argue that these problems should not be handled by the language, and that people should use a library that does the right thing. Guess what: this is how C++ solved its memory management problems…
