Completeness Estimation of OpenStreetMap POI Data Using Machine Learning Approaches

Crisafulli, Marco and Monzón, Dominic (2021) Completeness Estimation of OpenStreetMap POI Data Using Machine Learning Approaches. Other thesis, OST Ostschweizer Fachhochschule.

Text: FS 2021-BA-EP-Crisafulli-Monzón-Completeness Estimation of OpenStreetMap POI Data Using Mach.pdf - Submitted Version (18MB)

Abstract

As OpenStreetMap (OSM) gains traction and is considered a viable alternative to service providers such as Google Maps, the quality of its data becomes increasingly important. A key factor in the quality of geographical data is completeness, i.e., whether entities are included in or omitted from a dataset. Currently, there is no general method for determining it. The vision of this project is to lay the groundwork for an open-source tool that the community and users can use to check areas of interest for completeness.

This work aims to estimate intrinsically, i.e., without comparison to a 'golden dataset', the number of Points of Interest (POIs) in a defined area. Comparing this estimate with the number of POIs already mapped gives an indicator of completeness. The nature of the problem and the amount of available data make it well suited to machine learning (ML) methods. An initial model trained on high-resolution imagery (orthophotos) showed that there are relationships that ML algorithms can detect. A second model was therefore trained using only intrinsic data provided by OSM. Under the assumption that the training and validation areas are completely mapped, the implemented model performs well enough to indicate where entities are missing.

The results are visualized as a color-coded grid showing which areas are predicted to be complete, improvable, or incomplete. Because the model is trained on data from Swiss cities, it works best for urban areas in Switzerland and neighboring countries, which are geographically and demographically similar. Other areas can be predicted after re-training the model. One drawback of the intrinsic approach is that a certain amount of existing data is needed to make a prediction. Furthermore, the quality of the prediction itself can only be measured under the assumption that the training and validation areas are well mapped. In conclusion, we provide a model that estimates the completeness of an area and indicates whether further investigation is needed.
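
To make the idea concrete, the comparison of predicted and existing POI counts and the three-way grid classification could be sketched roughly as follows. This is only an illustration, not the thesis implementation: the function names, the capped ratio, and both thresholds are assumptions, since the abstract does not state how the classes are delimited.

```python
from typing import List

# Hypothetical cut-offs: the abstract does not state the thresholds used.
COMPLETE_MIN = 0.9
IMPROVABLE_MIN = 0.5


def completeness_ratio(existing: int, predicted: float) -> float:
    """Share of the predicted POIs that are already mapped, capped at 1.0."""
    if predicted <= 0:
        # Nothing is expected in this cell, so nothing can be missing.
        return 1.0
    return min(existing / predicted, 1.0)


def classify_cell(existing: int, predicted: float) -> str:
    """Map one grid cell to 'complete', 'improvable', or 'incomplete'."""
    ratio = completeness_ratio(existing, predicted)
    if ratio >= COMPLETE_MIN:
        return "complete"
    if ratio >= IMPROVABLE_MIN:
        return "improvable"
    return "incomplete"


def classify_grid(existing: List[List[int]],
                  predicted: List[List[float]]) -> List[List[str]]:
    """Classify every cell of a grid, given matching count matrices."""
    return [
        [classify_cell(e, p) for e, p in zip(existing_row, predicted_row)]
        for existing_row, predicted_row in zip(existing, predicted)
    ]
```

In such a sketch, `predicted` would come from the trained model and `existing` from counting the POIs currently in OSM per grid cell; the resulting labels could then be mapped to colors for display.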

Item Type:	Thesis (Other)
Subjects:	Area of Application > GIS > OpenStreetMap; Area of Application > Data Mining; Metatags > IFS (Institute for Software)
Divisions:	Bachelor of Science FHO in Informatik > Bachelor Thesis
Depositing User:	Stud. I
Contributors:	Thesis advisor: Keller, Stefan; Reviewer: Jordan, Nicola
Date Deposited:	13 Sep 2021 08:27
Last Modified:	20 Sep 2021 07:53
URI:	https://eprints.ost.ch/id/eprint/940
