Technical report | Towards an Evaluation of Air Surveillance Track Clustering Algorithms via External Cluster Quality Measures
Abstract
Clustering is a data mining technique for analysing large data sets and finding groups of elements within the data set that are similar to each other. The use of clustering on archives of historical air surveillance track data would enable the discovery of flights that exhibited similar behaviour and followed similar flight paths. However, there are many different clustering algorithms available, so some method for selecting the best of the competing algorithms is required. Unfortunately, the academic literature has yet to provide a general, comprehensive, and robust methodology for this task. Further, the niche nature of the problem domain means the academic literature offers no reports of practical experience in the use of particular algorithms on air surveillance track data. This report aims to fill the gap by describing such a methodology for evaluating and choosing between competing clustering algorithms.
Executive Summary
Clustering is a data mining technique for analysing large data sets. The technique finds groups of elements within the data that are similar to each other, but different from data elements outside the group. The use of clustering on archives of historical air surveillance track data would enable the discovery of groups of flights that followed the same flight path. This could enable improved capability in a variety of fields including situational awareness, tactical air intelligence (automated behavioural prediction and anomaly detection, indicators and warnings, etc.), strategic air intelligence (historical analysis, capability assessment, etc.), and general efficiency dividends (higher performance of air surveillance and air intelligence operators, improved training and knowledge retention practices, etc.).
However, there are many different clustering algorithms available, so before clustering can be used a method for selecting the best of the competing algorithms is required. Unfortunately, the academic literature has yet to provide a general, comprehensive, and robust methodology for this task. Further, the niche nature of the problem domain means the academic literature offers no reports of practical experience in the use of particular algorithms on air surveillance track data.
This report aims to fill the gap by describing a methodology for evaluating and choosing between competing clustering algorithms. Note that this report does not describe the outcome of actually performing an exhaustive evaluation and selection process. Rather, it describes the methodology and the experience gained from a trial of the methodology on a test set of air surveillance track data. The experience was generally positive, in that the methodology achieved the desired outcome; however, it is concluded that improvements to the methodology can and should be sought.