The Hut Group
Latest revision as of 16:24, 15 October 2016
Harry.Collard@thehutgroup.com
Retail Startup Automator
Previous discussion:
Information Extraction from Semi-structured Web Pages
Although content on the web is written in a common structured markup language, its implementation varies widely between websites. Your task is to implement an information extraction system that reliably extracts product content and data from a variety of semi-structured product pages on e-commerce websites. You should then build a web application around this algorithm. Examples include an e-commerce aggregator that lets shoppers browse products from a broad swathe of online retailers, or a clothing comparison site that compares competitor products across brands by intelligently matching metadata.
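As a possible starting point, the sketch below shows one very simple extraction strategy. It assumes that product pages annotate their fields with schema.org microdata `itemprop` attributes, which many retailers do but which is by no means guaranteed. The class name `ProductParser`, the function `extract_product`, and the choice of fields are hypothetical and used only for illustration. A real system would need fallbacks for pages without such annotations.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect values of selected itemprop attributes from one product page.

    Assumption: the page uses schema.org microdata. A value is taken from the
    element's `content` attribute if present, otherwise from the first
    non-empty text found inside the element.
    """

    WANTED = {"name", "price", "priceCurrency", "brand"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        # Ignore unannotated tags and keep only the first value per field.
        if prop not in self.WANTED or prop in self.fields:
            return
        if attrs.get("content") is not None:
            self.fields[prop] = attrs["content"].strip()
        else:
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Stop waiting once any element closes, so stray text is not captured.
        self._current = None


def extract_product(html):
    """Return a dict of the annotated product fields found in `html`."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Two made-up pages with different layouts but the same annotations.
page_a = (
    '<div><h1 itemprop="name"> Whey Protein 1kg </h1>'
    '<meta itemprop="priceCurrency" content="GBP">'
    '<span itemprop="price">19.99</span></div>'
)
page_b = (
    '<section><span itemprop="brand"><b>Acme</b></span>'
    '<p itemprop="name">Running Shoe</p>'
    '<meta itemprop="price" content="54.00"/></section>'
)

for page in (page_a, page_b):
    print(extract_product(page))
```

Both pages yield a flat dictionary despite their different markup, which is the kind of normalised record an aggregator or comparison site could then match on. Pages without microdata are the harder and more interesting case, and are not handled here.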
Feedback:
Harry - can you give a little more information about what new business opportunities might be possible as a result of extracting the information? I'd like to identify some more specific technical challenges - at the moment, it seems like the intention would be just to create another price comparison or retail aggregator site. This is a fairly crowded market, so it's possible that students would not see much novelty.
Previous years:
2014: Purchase Abandonment Predictor
Buying Pattern Prediction