I once worked in a company specializing in embedded electronics for industrial applications. In one particular project, the device communicated through a RS-422 cable to the computer and seemed to return weird data once in a while, causing unwanted behavior in the control computer whose programming did not provide for this unexpected data. So I took upon myself to test the communication channel as it seemed that the on-board software was operating properly and did not contain serious bugs. I added a check-sum to the data packet and it turned out that some packets came in indeed corrupted despite the supposedly superior electrical characteristics of the RS-422 link.
After a few days’ work, I implemented the communication protocol that could detect and repair certain errors while reverting to a request to retransmit if the data was too damaged. I then started gathering statistics on error rate, number of retransmit, etc, and the spurious behavior on the controller’s side went away. My (metaphorically) pointy-haired boss opposed the modification because “we didn’t have any of these damn transmission errors until you put your fancy code in there“. Of course, this was an epic facepalm moment. I tried to explain that the errors have always been there, except that now they’re caught and repaired. Needless to say, it ended badly.
Notwithstanding this absurd episode, I kept using check-sum to validate data whenever no other layer of the protocol took care of transmission errors. So, this week, let us discuss check-sums and other error detection algorithms.