DWITE Online Computer Programming Contest
January 2008
Problem 3
Don't follow my links

There's a lot of spam on the internet -- blog comments, forum posts, etc. -- all posted to plant enough links to convince search engines such as Google that a certain page is more important than it really is. One solution is to mark untrusted links with a rel="nofollow" attribute, which tells search engine spiders to ignore the link. A sample link might look like this:

<a href="http://compsci.ca/" title="Computer Science Canada" rel="nofollow">sample link</a>
		

The goal is to write a program that finds every link in a text file and inserts the nofollow value properly. If a link has no rel attribute, rel="nofollow" should be inserted as the last attribute of the link. If a rel attribute already exists, nofollow should be appended as the last value in it, unless nofollow is already present. A rel attribute may hold multiple space-separated values. Refer to the sample input and output for examples.
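
The rule for an existing rel value can be sketched in a few lines of Python. This is only an illustration, not an official solution: the function name with_nofollow is hypothetical, and it assumes values are compared case-sensitively, as in the samples.

```python
def with_nofollow(rel_value: str) -> str:
    """Return rel_value with 'nofollow' appended unless it is already present."""
    # A rel attribute holds space-separated values; an empty string gives no tokens.
    tokens = rel_value.split()
    # Assumption: matching is case-sensitive, as in the sample data.
    if "nofollow" not in tokens:
        tokens.append("nofollow")
    return " ".join(tokens)
```

For example, with_nofollow("external") gives "external nofollow", while with_nofollow("nofollow") returns its input unchanged.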

The input file DATA3.txt will contain five lines of text, each containing one link in the form <a*>*</a>. Links might be surrounded by filler text. Each line will be no more than 255 characters long.

The output file OUT3.txt will contain five lines -- just the processed links, without any surrounding filler text.

Sample Input:
This is a <a>sample link</a>.
<a rel="" href="http://dwite.ca/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external">more rels</a>
text <a href="http://compsci.ca/v3/viewforum.php?f=131" title="">link</a> more text
		        
Sample Output:
<a rel="nofollow">sample link</a>
<a rel="nofollow" href="http://dwite.ca/">link with rel</a>
<a href="http://compsci.ca/" rel="nofollow">link with no follow</a>
<a href="http://compsci.ca/blog" rel="external nofollow">more rels</a>
<a href="http://compsci.ca/v3/viewforum.php?f=131" title="" rel="nofollow">link</a>
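
One possible approach, sketched in Python with regular expressions, is shown below. It is a minimal sketch rather than a reference solution. The names add_nofollow and process_file are hypothetical, and the code assumes what the samples suggest: one link per line, lowercase tag and attribute names, double-quoted attribute values, and no ">" character inside an attribute value.

```python
import re

# Assumption: the opening tag is "<a>" or "<a ...>" and contains no ">" in its attributes.
LINK_RE = re.compile(r'<a(\s[^>]*)?>(.*?)</a>')
# Assumption: rel is written in lowercase with a double-quoted value.
REL_RE = re.compile(r'(\srel=")([^"]*)(")')


def add_nofollow(line):
    """Return the single link found in line, with nofollow added to its rel attribute."""
    link = LINK_RE.search(line)
    if link is None:
        raise ValueError("no link found in line")
    attrs = link.group(1) or ""
    body = link.group(2)

    rel = REL_RE.search(attrs)
    if rel:
        # rel exists: append nofollow as the last value unless it is already there.
        values = rel.group(2).split()
        if "nofollow" not in values:
            values.append("nofollow")
        attrs = attrs[:rel.start(2)] + " ".join(values) + attrs[rel.end(2):]
    else:
        # rel is missing: add it as the last attribute of the link.
        attrs = attrs.rstrip() + ' rel="nofollow"'

    # Surrounding filler text is dropped because only the link is rebuilt.
    return "<a" + attrs + ">" + body + "</a>"


def process_file(in_path, out_path):
    """Read links from in_path, one per line, and write the processed links to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(add_nofollow(line.rstrip("\n")) + "\n")
```

Calling process_file("DATA3.txt", "OUT3.txt") would apply this to the contest files. A regular expression is enough here only because the input format is so constrained; general HTML would call for a real parser.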