DWITE • March 2010 • Kind of like OCR

Optical character recognition (OCR) is the process of extracting textual information from images. While the current technology is mostly software-based, rather than using optical devices, the term has stuck around.

For this problem, we’ll work with a very simple alphabet, with each letter strictly defined by a bitmap that is 2 rows tall and either 2 or 3 columns wide.

A: x.
   xx

B: xx
   xx
    
C: x.x
   xxx

D: xx
   .x

E: xxx
   .xx
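
As a rough illustration, the alphabet above can be written down as data, and a word can be rendered by placing its letters side by side. This is only a sketch: the names `GLYPHS` and `render` are made up for this example, and it assumes letters are joined with no separating column, which is what the sample input suggests.

```python
# Each glyph from the alphabet above, stored as (top row, bottom row).
GLYPHS = {
    "A": ("x.", "xx"),
    "B": ("xx", "xx"),
    "C": ("x.x", "xxx"),
    "D": ("xx", ".x"),
    "E": ("xxx", ".xx"),
}


def render(word):
    """Join the glyphs of `word` left to right, with no gap between letters."""
    top = "".join(GLYPHS[ch][0] for ch in word)
    bottom = "".join(GLYPHS[ch][1] for ch in word)
    return top, bottom
```

Rendering a word this way shows why the letter boundaries are not visible in the input: the rows simply run together.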

The input file DATA4.txt will contain 5 sets, each 2 lines long, with each line at least 2 and no more than 30 characters long. Each set spells out a word in the above alphabet. The word will always be valid and never ambiguous: exactly one word produces the given pattern.

The output file OUT4.txt will contain 5 lines of output – the recognized words.

Note: You need to consider a word as a whole to distinguish between some of the cases. For example, in the sample below, the first character could be read as C, but the rest of the word would then not be made of valid characters.
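
One possible way to handle this is backtracking: try every letter that fits at the current column, and give up on a choice if the remaining columns cannot be decoded. The sketch below is not an official solution. The names `GLYPHS` and `decode` are invented for illustration, and it assumes letters are joined with no gap between them.

```python
# Each glyph from the alphabet, stored as (top row, bottom row).
GLYPHS = {
    "A": ("x.", "xx"),
    "B": ("xx", "xx"),
    "C": ("x.x", "xxx"),
    "D": ("xx", ".x"),
    "E": ("xxx", ".xx"),
}


def decode(top, bottom, start=0):
    """Return a word matching the bitmap from column `start` on, or None."""
    if start == len(top):
        return ""  # every column consumed: the choices so far are valid
    for letter, (g_top, g_bottom) in GLYPHS.items():
        end = start + len(g_top)
        # A slice that runs past the end is shorter than the glyph, so it fails.
        if top[start:end] == g_top and bottom[start:end] == g_bottom:
            rest = decode(top, bottom, end)
            if rest is not None:
                return letter + rest
    return None  # no letter fits here: the caller must try another choice
```

On the pattern `x.x.x` over `xxxxx`, reading A and then A leaves a single unmatched column, so the search backs up and reads A followed by C instead. Since lines are at most 30 characters long and at most two letters can fit at any column, plain recursion should be fast enough without memoization.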

Sample Input (first two shown):

x.x.x
xxxxx
x.x.xx
xxxxxx

Sample Output (first two shown):

AC
AAB