Bugs in software have many sources, but ones caused by erroneous assumptions are among the most difficult to find. The problem is that these assumptions are baked in the code, so the reader will tend to take them in at the same time as the code.

Consider the following code that creates some simple histogram of input values. It seems reasonable to assume that the resulting dictionary will have a limited size, with at most `steps`

entries. This assumption is incorrect.

def bucket(values, min, max, steps): r = max - min assert r > 0 steps = int(steps) assert steps > 0 d = float(steps) / r result = collections.defaultdict(int) for v in values: if v < min: v = min elif v > max: v = max k = math.floor((v - min) * d) / d + min result[k] += 1 return result

This code can return dictionaries which are much larger than `steps`

entries. The problem here lies in the clamping logic. The assumption here is that a value that is not smaller than `min`

and not larger than `max`

is a value in the range ⟦`min`

…`max`

⟧.

This assumption is incorrect in Python (and probably many other languages), because in floating point, there is `NaN`

. Now `NaN`

in Python has many properties that break the assumptions of the code above:

- Any floating point operation that has
`NaN`

as an argument returns`NaN`

. `float('NaN') <`

→`X``False`

∀`X``float('NaN') >`

→`X``False`

∀`X``float('NaN') == float('NaN')`

→`False`

The last property means that using floating point values as keys in dictionaries is a bad idea, as the equality operator is used to determine if two keys are the same. This means that you while you can add a value with key `NaN`

, you can never access it by key, because that key is not equal with itself. So you can also add multiple values with the same key, and they will each have a different entry in the dict:

d[float('NaN')] = 4 d[float('NaN')] = 5 d → {nan: 4, nan: 5}

Python will let you use anything as a dictionary key, even if the key is mutable, or does not implement the equality operator properly. So if you feed the following input to the `bucket`

function above, you will a return dictionary with 1000 entries, regardless of the value of the step parameter.

v = itertools.repeat(float('nan'), 1000)