DWITE • March 2010 • Summary Diff

When two files differ, we sometimes want a summary of how far apart they are. For well-formatted data, we might also want to know by how much the values differ. A practical example would be comparing a usage report against its expected values.

The input file DATA3.txt will contain 5 sets of input. Each set starts with a line containing an integer 0 ≤ N ≤ 5, followed by a line containing an integer 0 ≤ M ≤ 5. These are followed by N+M lines: the N lines of the first file, then the M lines of the second file. Each set ends with a break line containing three hyphens “---”.

Each line inside a “file” to be compared is a string-integer pair separated by a single space. The string is a 3-character word (lowercase alphabetic characters), and the integer is non-negative and less than 100.

Each “file” is sorted by its leading string. The string keys within each file are unique.
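
The input layout described above can be read with a short parser. This is only a sketch under stated assumptions: the function name parse_sets is hypothetical, the N lines of the first file are assumed to come before the M lines of the second, and blank lines are ignored.

```python
def parse_sets(text):
    """Parse the input text into a list of (first_file, second_file) dicts.

    Hypothetical helper: assumes the N lines of the first file come
    before the M lines of the second file within each set.
    """
    # Drop blank lines so stray empty lines do not disturb the counts.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    sets = []
    i = 0
    while i < len(lines):
        n = int(lines[i])       # number of lines in the first file
        m = int(lines[i + 1])   # number of lines in the second file
        i += 2
        files = []
        for count in (n, m):
            entries = {}
            for ln in lines[i:i + count]:
                key, value = ln.split(" ")
                entries[key] = int(value)
            files.append(entries)
            i += count
        # Skip the "---" break line that closes the set.
        if i < len(lines) and lines[i] == "---":
            i += 1
        sets.append((files[0], files[1]))
    return sets
```

Because the keys within a file are unique, a dictionary per file loses no information. The sorted order is not needed by this approach, although it would allow a two-pointer merge instead.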

The output file OUT3.txt will contain 5 lines of output, each containing a pair of integer sums separated by a space. The first integer is the total number of lines missing between the two files, that is, the number of keys that appear in one file but not in the other. The second integer is the sum of the absolute differences between the values on lines whose string keys match.
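
One way to compute the two sums is sketched below. It assumes each file has already been loaded into a dictionary mapping key to value; the function name summary_diff is hypothetical, and this is not necessarily the intended solution.

```python
def summary_diff(first, second):
    """Return (missing_lines, value_difference) for two key -> value dicts."""
    # Keys present in exactly one file: each one is a missing line.
    missing = len(first.keys() ^ second.keys())
    # Keys present in both files: sum the absolute value differences.
    value_diff = sum(abs(first[k] - second[k])
                     for k in first.keys() & second.keys())
    return missing, value_diff


# The two sample cases from this problem:
print(*summary_diff({"foo": 42}, {"foo": 41}))
print(*summary_diff({"bar": 1, "foo": 42}, {"bar": 2, "baz": 40}))
```

With at most 5 lines per file, set operations on the keys are more than fast enough. A merge over the two sorted files would give the same result.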

Notes on the sample below: In the first case both files have just a single line. The keys are the same, so zero lines are missing, but the values differ by one. In the second case the first file is missing a line with “baz” while the second file is missing a line with “foo”, so there is a total of two missing lines. The only matching key, “bar”, has values that differ by one.

Sample Input (first two shown):

1
1
foo 42
foo 41
---
2
2
bar 1
foo 42
bar 2
baz 40
---

Sample Output (first two shown):

0 1
2 1