BlogForever, a collaborative project funded by the European Commission, developed a new system to harvest, preserve, manage and reuse blog content. The system performs intelligent harvesting: it retrieves and parses the hypertext of blog posts together with their associated content (images, linked files, etc.). Rather than relying on a blog's RSS feed alone, it also copies data from the original HTML pages. The parser then converts the captured content into structured data, expressed in XML, that conforms to the project's data model.
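The harvester itself is not described in detail here, so the following is only a minimal sketch of the general idea, written with the Python standard library: take post metadata from an RSS 2.0 feed, scan each post's original HTML for text, images and links, and emit one structured XML record. The function `harvest_blog`, the class `AssetCollector` and the element names (`blog`, `post`, `image`, `link`) are hypothetical and do not reproduce BlogForever's actual data model. Network fetching is left out; the feed and the pages are passed in as strings.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect visible text plus image and link URLs from a post's HTML."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_data(self, data):
        # Keep non-empty text fragments (a fuller parser would skip <script>/<style>).
        if data.strip():
            self.text_parts.append(data.strip())


def harvest_blog(rss_xml: str, html_pages: dict) -> ET.Element:
    """Merge RSS metadata with content parsed from each post's original HTML.

    html_pages maps a post URL (the RSS <link>) to that page's HTML source.
    The output element names are hypothetical, not the BlogForever data model.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    blog = ET.Element("blog", title=channel.findtext("title", default=""))
    for item in channel.findall("item"):
        url = item.findtext("link", default="")
        post = ET.SubElement(blog, "post", url=url)
        # Metadata comes from the feed...
        ET.SubElement(post, "title").text = item.findtext("title", default="")
        ET.SubElement(post, "published").text = item.findtext("pubDate", default="")
        # ...while content and assets come from the original HTML page.
        collector = AssetCollector()
        collector.feed(html_pages.get(url, ""))
        collector.close()
        ET.SubElement(post, "content").text = " ".join(collector.text_parts)
        for src in collector.images:
            ET.SubElement(post, "image", src=src)
        for href in collector.links:
            ET.SubElement(post, "link", href=href)
    return blog


if __name__ == "__main__":
    rss = """<rss version="2.0"><channel><title>Example Blog</title>
<item><title>Hello</title><link>http://example.org/hello</link>
<pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate></item>
</channel></rss>"""
    pages = {"http://example.org/hello":
             '<html><body><p>First post.</p><img src="cat.png">'
             '<a href="notes.pdf">notes</a></body></html>'}
    print(ET.tostring(harvest_blog(rss, pages), encoding="unicode"))
```

Combining the two sources reflects the point made above: a feed often carries only a summary and basic metadata, while the full text and embedded assets are usually found only in the page's HTML.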

BlogForever official webpage