The Hut Group

From Computer Laboratory Group Design Projects
Revision as of 13:38, 7 October 2016 by afb21 (talk | contribs)

Harry.Collard@thehutgroup.com


Information Extraction from Semi-structured Web Pages

Although content on the web is written in a common structured markup language, websites use that markup in widely varying ways. Your task is to implement an information extraction system that reliably extracts product content and data from a variety of semi-structured product pages on e-commerce websites. You should then build a web application around this algorithm. Examples include an e-commerce aggregator that lets shoppers browse products from a broad range of online retailers, or a clothing comparison site that compares competing products across brands by intelligently matching metadata.
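As a rough illustration of the simplest case, the sketch below pulls product fields out of a page that happens to annotate them with `itemprop` attributes (the schema.org microdata convention). This is only an assumption made for the example: the brief does not say that target sites use microdata, and many will not, which is where the real difficulty lies. The names `ItempropExtractor`, `extract_product` and `SAMPLE_PAGE` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ItempropExtractor(HTMLParser):
    """Collects {itemprop name: value} pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        # One entry per open element: [tag, itemprop name or None, text parts]
        self._stack = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        content = attrs.get("content")
        if prop and content is not None:
            # A machine-readable value in content="..." wins over visible text.
            self.fields.setdefault(prop, content.strip())
            prop = None
        if tag not in VOID_TAGS:
            self._stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries an itemprop.
        for entry in self._stack:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        # Find the nearest matching open tag; tolerate unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, prop, parts in reversed(self._stack[i:]):
                    if prop is not None:
                        text = " ".join("".join(parts).split())
                        self.fields.setdefault(prop, text)
                del self._stack[i:]
                return


def extract_product(html):
    """Return a dict of itemprop fields found in the given HTML string."""
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# A made-up product page fragment, used only to exercise the sketch.
SAMPLE_PAGE = """
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name"> Blue   Cotton Shirt </h1>
  <img src="shirt.jpg" alt="shirt">
  <span itemprop="price" content="19.99">&pound;19.99</span>
  <meta itemprop="priceCurrency" content="GBP">
  <p itemprop="description">A plain <b>blue</b> shirt.</p>
</div>
"""

if __name__ == "__main__":
    print(extract_product(SAMPLE_PAGE))
```

Calling `extract_product(SAMPLE_PAGE)` yields the four annotated fields (`name`, `price`, `priceCurrency`, `description`) with whitespace normalised. Pages without such annotations would need other strategies, for example per-site rules or learned extractors, and choosing between them is part of the project.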


Feedback:

Harry - can you give a little more information about what new business opportunities might be possible as a result of extracting the information? I'd like to identify some more specific technical challenges - at the moment, it seems like the intention would be just to create another price comparison or retail aggregator site. This is a fairly crowded market, so it's possible that students would not see much novelty.

Previous years:

2014: Purchase Abandonment Predictor

Buying Pattern Prediction