Here’s an open source HTML parsing library called HTMLReader, from Nolan Waite, that does not use libxml2 and can handle broken markup just like a browser.
HTMLReader is WHATWG-compliant and supports CSS selectors, so if you are working with possibly malformed HTML and want a library that treats HTML the way a browser does, it looks like a good choice.