# Assumptions Bugs in software have many sources, but ones caused by erroneous assumptions are among the most difficult to find. The problem is that these assumptions are baked in the code, so the reader will tend to take them in at the same time as the code.

Consider the following code that creates some simple histogram of input values. It seems reasonable to assume that the resulting dictionary will have a limited size, with at most `steps` entries. This assumption is incorrect.

```def bucket(values, min, max, steps):
r = max - min
assert r > 0
steps = int(steps)
assert steps > 0
d = float(steps) / r
result = collections.defaultdict(int)
for v in values:
if v < min:
v = min
elif v > max:
v = max
k =  math.floor((v - min) * d) / d + min
result[k] += 1
return result
```

This code can return dictionaries which are much larger than `steps` entries. The problem here lies in the clamping logic. The assumption here is that a value that is not smaller than `min` and not larger than `max` is a value in the range ⟦`min``max`⟧.

This assumption is incorrect in Python (and probably many other languages), because in floating point, there is `NaN`. Now `NaN` in Python has many properties that break the assumptions of the code above:

• Any floating point operation that has `NaN` as an argument returns `NaN`.
• `float('NaN') < X``False`X
• `float('NaN') > X``False`X
• `float('NaN') == float('NaN')``False`

The last property means that using floating point values as keys in dictionaries is a bad idea, as the equality operator is used to determine if two keys are the same. This means that you while you can add a value with key `NaN`, you can never access it by key, because that key is not equal with itself. So you can also add multiple values with the same key, and they will each have a different entry in the dict:

```d[float('NaN')] = 4
d[float('NaN')] = 5
d → {nan: 4, nan: 5}
```

Python will let you use anything as a dictionary key, even if the key is mutable, or does not implement the equality operator properly. So if you feed the following input to the `bucket` function above, you will a return dictionary with 1000 entries, regardless of the value of the step parameter.

`v = itertools.repeat(float('nan'), 1000)`

## One thought on “Assumptions”

1. Ça me rappelle le NULL en bases de données, avec WHERE monchamp = NULL qui renverra toujours FAUX, et WHERE monchamp NULL qui renverra FAUX aussi.

J’avais commis un billet sur les “assumptions” (je vois pas de mot en français : suppositions ? assumations ?), mais venant du monde réel, là :
http://www.coindeweb.net/blogsanssujetprecis/index.php?post/2008/04/07/466-les-pieges-dans-le-developpement

This site uses Akismet to reduce spam. Learn how your comment data is processed.