Python Woes – Stdin

wiesmann@wagamama~> ping 8.8.8.8 | big.py
PING 8.8.8.8 (8.8.8.8): 56 data bytes
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=54 time=23.645 ms
64 bytes from 8.8.8.8: icmp_seq=0 ttl=54 time=23.645 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=13.098 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=13.098 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=14.101 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=14.101 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=13.718 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=13.718 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=16.194 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=16.194 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=54 time=13.336 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=54 time=13.336 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=54 time=13.188 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=54 time=13.188 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=54 time=28.381 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=54 time=28.381 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=54 time=13.067 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=54 time=13.067 ms

One important aspect of a good programming language is that the effect of code should be predictable. I wanted to write a small utility program that takes its input from stdin and writes it to stdout in a bigger size. This is typically the kind of quick hack you would do in Python. So I wrote the following code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

def main(argv):
  for line in sys.stdin:
    line = line.rstrip()
    sys.stdout.write('\x1b#3')
    sys.stdout.write(line)
    sys.stdout.write('\n\x1b#4')
    sys.stdout.write(line)
    sys.stdout.write('\n')

if __name__ == "__main__":
    main(sys.argv[1:])

This worked fine with commands that terminate (and close the file descriptor), but failed with continuous commands like ping: the code blocks and seems to wait for the stdin descriptor to be closed. It does not look like a simple buffering issue, because ping would eventually fill up any buffer.

I tried to turn off Python's input buffering with the -u command-line flag, but nothing changed. Switching from the implicit iterator to readlines(80), i.e. passing an explicit size hint, did not solve the issue either. So I ended up rewriting the code the following way:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

def main(argv):
  while True:
    line = sys.stdin.readline()
    if (not line):
      break  # EOF: readline() returns an empty string
    line = line.rstrip()
    sys.stdout.write('\x1b#3')
    sys.stdout.write(line)
    sys.stdout.write('\n\x1b#4')
    sys.stdout.write(line)
    sys.stdout.write('\n')
    sys.stdout.flush()

if __name__ == "__main__":
    main(sys.argv[1:])

The fact that there is a functional difference between calling readlines() and repeatedly calling readline() until end of file is detected is really not intuitive, and is typically the kind of black magic that languages should avoid. If Python files were generators, this mess would not exist.
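One way to get generator-like behaviour on top of readline() is the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. The following is only a sketch: the names iter_lines, enlarge and main are hypothetical helpers, not part of the original script, and it assumes a text stream whose readline() returns an empty string at EOF. An in-memory stream stands in for stdin so the example terminates.

```python
import io
import sys


def iter_lines(stream):
    """Yield stripped lines one at a time by calling readline() until EOF."""
    # iter(callable, sentinel) stops once readline() returns '' (EOF).
    for line in iter(stream.readline, ''):
        yield line.rstrip()


def enlarge(line):
    """Wrap a line in the same two escape sequences as the script above."""
    return '\x1b#3' + line + '\n\x1b#4' + line + '\n'


def main(stream=sys.stdin, out=sys.stdout):
    for line in iter_lines(stream):
        out.write(enlarge(line))
        out.flush()  # push each line out immediately


# Demo on an in-memory stream (assumption: stands in for a pipe on stdin).
demo_in = io.StringIO(u"hello\nworld\n")
demo_out = io.StringIO()
main(demo_in, demo_out)
```

Calling main() with no arguments would read from the real stdin, line by line, in the same way as the explicit while loop.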

2 thoughts on “Python Woes – Stdin”

  1. As is almost always the case, I don’t really agree with your comments on whether Python is a good language or not. For me, it’s a matter of taste.

    However, regarding readline() and readlines(), there is a bit of a convention here: readline() returns one line at a time, while readlines() returns a _list_ of lines and tries to buffer it completely in memory. So repeatedly calling readline() doesn’t give you exactly the same result; you would have to collect the lines into a list first.

    The second important thing is that the file interface of Python waits for an EOF before processing the file itself. So, if the input is ping continuously piping lines into Python’s sys.stdin, you should not see anything before it reaches EOF.

    So, your example is the right thing to do in Python 2. I would have done something like this:

    """
    import sys

    def main(argv):
        while True:
            try:
                line = sys.stdin.readline()
            except KeyboardInterrupt:
                break

            if not line:
                break

            line = line.rstrip()
            sys.stdout.write('\x1b#3')
            sys.stdout.write(line)
            sys.stdout.write('\n\x1b#4')
            sys.stdout.write(line)
            sys.stdout.write('\n')
            sys.stdout.flush()

    if __name__ == "__main__":
        main(sys.argv[1:])

    """

    But please note that this is not how it happens in Python 3, where the annoying buffering you mentioned has been removed, and your original code sample works just fine… (Python 3 is a bit more modern, shall we say…)
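The difference between the two calls described in the comment above can be seen on an in-memory stream. This is a minimal sketch that uses io.StringIO as a stand-in for sys.stdin; the sample text is made up for illustration.

```python
import io

# Assumption: an in-memory text stream standing in for sys.stdin.
stream = io.StringIO(u"first\nsecond\nthird\n")

one = stream.readline()    # a single line, as a string
rest = stream.readlines()  # a list of all remaining lines, read until EOF
end = stream.readline()    # at EOF, readline() returns an empty string

print(repr(one))
print(rest)
print(repr(end))
```

Because readlines() has to reach EOF before it can return the full list, it cannot return on a stream that never ends, whereas readline() returns as soon as one line has been read.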
