The Hut Group
Harry.Collard@thehutgroup.com
Information Extraction from Semi-structured Web Pages
Although content on the web is written in a common structured markup language (HTML), its implementation varies widely between websites. Your task is to implement an information extraction system that can reliably extract product content and data from a variety of semi-structured e-commerce product pages. You should then build a web application around this system. Examples include an e-commerce aggregator, allowing shoppers to browse products from a broad swathe of online retailers, or a clothing comparison site that compares competitor products across brands by intelligently matching their metadata.
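As an illustration of the simplest baseline such a system might start from, the sketch below assumes that a product page embeds schema.org `Product` data as JSON-LD inside a `<script type="application/ld+json">` element, which many (but not all) retailers do. It uses only the Python standard library. The names `JsonLdCollector` and `extract_products`, and the flat `name`/`brand`/`price`/`currency` output, are hypothetical choices for this example. Pages without such markup would need layout-based extraction from the HTML itself, which is likely where most of the difficulty of this project lies.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so concatenate them.
        if self._in_jsonld:
            self.blocks[-1] += data


def _is_product(item):
    # "@type" may be a single string or a list of types.
    types = item.get("@type") or []
    if isinstance(types, str):
        types = [types]
    return "Product" in types


def extract_products(html):
    """Return flat dicts for schema.org Product objects found in JSON-LD (hypothetical output shape)."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    products = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common in the wild; skip it
        if isinstance(data, dict) and "@graph" in data:
            data = data["@graph"]
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or not _is_product(item):
                continue
            brand = item.get("brand")
            if isinstance(brand, dict):
                brand = brand.get("name")
            offers = item.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            products.append({
                "name": item.get("name"),
                "brand": brand,
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
            })
    return products
```

Because the output is normalised to the same few fields regardless of the source site, records from different retailers could then be compared or matched on those fields, as an aggregator or comparison site would need.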
Feedback:
Harry - can you give a little more information about what new business opportunities might become possible as a result of extracting this information? I'd like to identify some more specific technical challenges. At the moment, the intention seems to be simply to create another price comparison or retail aggregator site. This is a fairly crowded market, so students might not see much novelty in it.
Previous years: