Announcement
Automated conflation of multi source point of interest (POIs) for creation of best of breed datasets

Student name: Ms Poonam Chandel
Guide: Dr Vinay Shankar Prasad Sinha
Year of completion: 2016
Host Organisation: Pitney Bowes Software India Pvt. Ltd.
Supervisor (Host Organisation): Dr Neena Priyanka
Abstract: This study attempts to create a best-of-breed dataset by automatically conflating points of interest (POIs) for the USA and the UK, in categories such as schools, hospitals, restaurants, banks, hotels and post offices, extracted from two different sources (Yellow Pages and Bing Maps). Conflation combines the best-quality elements of both datasets to create a composite dataset that is a better representation of each POI. The study is carried out in two stages for both countries. In the first stage, the data is scraped from the Yellow Pages and Bing Maps websites using a trial version of Yellabot and Local scraping software. A flow is then created in Spectrum Technology Platform that defines all the attributes on which the data is to be conflated, along with matching algorithms such as Soundex, Metaphone, Metaphone 3 and Double Metaphone. To test the accuracy of the process, false positive (wrongly conflated) and false negative (not conflated) cases are identified. In the USA dataset, there are no false positive cases in any category in either stage, whereas false negative cases in the first stage were highest in the banks category (0.59%), followed by schools (0.27%) and post offices (0.27%), and were almost negligible in the remaining categories. In the second stage, to further reduce these false negative cases, the data from both sources is standardized to remove noise and maintain uniformity within the dataset, and the matching algorithm in the Spectrum platform is replaced by edit distance, which increases the number of conflated results and reduces the number of false negative cases by almost 50%.

Keywords: POI, Conflation, Standardization, False Negative, Spectrum Technology Platform.
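
The second-stage idea described in the abstract (standardize both sources, then match on edit distance) can be illustrated with a minimal Python sketch. This is not the Spectrum Technology Platform flow used in the study, whose configuration is not given here. The abbreviation table, the field names `name` and `address`, the 0.85 threshold and all function names are assumptions chosen for illustration only.

```python
import re

# Hypothetical abbreviation table; the study's actual standardization
# rules are not described in the abstract.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "&": "and"}


def standardize(text):
    """Lowercase, replace punctuation with spaces, expand a few abbreviations."""
    tokens = re.sub(r"[^\w\s&]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def edit_distance(a, b):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale edit distance to a 0..1 score, where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def is_match(poi_a, poi_b, threshold=0.85):
    """Conflate two POIs only if both name and address are similar enough.

    The threshold is an assumed value, not one reported by the study.
    """
    return all(
        similarity(standardize(poi_a[k]), standardize(poi_b[k])) >= threshold
        for k in ("name", "address")
    )


def false_negative_rate(missed_pairs, total_true_pairs):
    """Percentage of true matches that were not conflated."""
    return 100.0 * missed_pairs / total_true_pairs


# Example records (invented for illustration, not taken from the study's data).
yellow_pages = {"name": "Joe's Pizza & Grill", "address": "12 Main St."}
bing_maps = {"name": "Joes Pizza and Grill", "address": "12 Main Street"}
print(is_match(yellow_pages, bing_maps))
```

Requiring every compared attribute to clear the threshold favours precision, which is in line with the absence of false positives reported above; lowering the threshold would trade some of that precision for fewer false negatives.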