The Hut Group

From Computer Laboratory Group Design Projects

Revision as of 13:38, 7 October 2016

Harry.Collard@thehutgroup.com


Information Extraction from Semi-structured Web Pages

Although content on the web is written in a common structured markup language, its implementation varies widely between websites. Your task is to implement an information extraction system that reliably extracts product content and data from a variety of semi-structured product pages on e-commerce websites. You should then build a web application around this extraction system. Examples include an e-commerce aggregator that lets shoppers browse products from a broad range of online retailers, or a clothing comparison site that compares competitor products across brands by intelligently matching metadata.
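
As a possible starting point only (the brief does not prescribe a method), the sketch below reads schema.org microdata `itemprop` attributes from a product page using Python's standard `html.parser` module. It assumes the page exposes such attributes, which many sites do not, so a real system would need fallbacks for pages without them. The class name `ItempropExtractor`, the function `extract_product` and the sample markup are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect the first value seen for each itemprop attribute.

    Assumption: the page is annotated with schema.org microdata.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}    # itemprop name -> first value seen
        self._open = None   # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="priceCurrency" content="GBP">
            self.fields.setdefault(prop, attrs["content"])
        else:
            # The value is the next non-blank text node.
            self._open = prop

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            self.fields.setdefault(self._open, text)
            self._open = None


def extract_product(html):
    """Return a dict of itemprop name -> value found in the HTML string."""
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical product page fragment, invented for this example.
SAMPLE = """
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Example Whey Protein 1kg</h1>
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <meta itemprop="priceCurrency" content="GBP">
    <span itemprop="price">12.99</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(extract_product(SAMPLE))
```

Container properties such as `offers` carry no text of their own, so the sketch skips them and records only the leaf values nested inside. Matching these fields across retailers that use different markup, or none at all, is where the harder extraction work would lie.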


Feedback:

Harry - can you give a little more information about what new business opportunities might be possible as a result of extracting the information? I'd like to identify some more specific technical challenges - at the moment, it seems like the intention would be just to create another price comparison or retail aggregator site. This is a fairly crowded market, so it's possible that students would not see much novelty.

Previous years:

2014: Purchase Abandonment Predictor

Buying Pattern Prediction