Outlier Detection in Energy Datasets
In the past decade, numerous datasets have been released with the explicit goal of furthering non-intrusive load monitoring (NILM) research. NILM is an energy measurement strategy that seeks to disaggregate building-scale loads, that is, to separate the aggregate energy consumption of a building into the consumption of its constituent appliances. NILM algorithms require representative real-world measurements, which has led institutions to publish and share their own datasets. NILM algorithms are designed, trained, and tested using data from a small number of these datasets. Many of the datasets contain arbitrarily selected devices, and the buildings whose aggregate load they report are selected just as arbitrarily. This raises the question of how representative the datasets are, and by extension the algorithms built on them. One way to judge the representativeness of a NILM dataset is to look for outliers within it. This paper presents a novel method of identifying outlier devices in NILM datasets. With this identification process, it becomes possible to measure and mitigate the impact of outliers, an important consideration for the long-term deployment of NILM algorithms.