Tag Archives: HTMLCleaner

Parsing HTML with Groovy and HTMLCleaner

HTML found on the web is sometimes invalid and difficult to parse. Several HTML cleaning utilities convert such invalid HTML into well-formed XML, which is easier to work with. Two of these are Tag Soup and HTMLCleaner. Tag Soup has a much nicer syntax when used with Groovy, but I decided [...]
Posted in Groovy