Kalman Filter
Using the Kalman Filter on IOT-sensor temperature data in Python
I started looking into the Kalman Filter for a separate project on cleaning sensor data and found that there were limited Python resources online, so I’d like to use this page to add to what exists and share my code in case someone out there finds it useful. I’m also happy to hear from others if the way I ended up using the filter was incorrect.
Since I am not well versed in the Kalman Filter, I will not go into the mathematical definition; there is a wealth of information on that online. I will be using the Kalman Filter in a one-dimensional setup (i.e. estimating only the temperature, without a velocity term) on temperature readings from an IOT sensor. The CSV file was downloaded from Kaggle and can be found here.
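To make the one-dimensional case concrete, here is a minimal sketch of a scalar Kalman filter for a random-walk model, where the state is the temperature itself and there is no velocity term. The function name `kalman_1d`, the synthetic data, and the default noise values are illustrative assumptions; this is not pykalman's implementation, only the textbook predict/update recursion written out by hand.

```python
import numpy as np

def kalman_1d(z, x0, p0=1.0, q=1.0, r=5.0):
    """Scalar Kalman filter for a random-walk state (hypothetical helper).

    z  : 1-D sequence of measurements
    x0 : initial state estimate
    p0 : initial state variance
    q  : process (transition) noise variance
    r  : measurement (observation) noise variance
    """
    z = np.asarray(z, dtype=float)
    x, p = float(x0), float(p0)
    means = np.empty(len(z))
    variances = np.empty(len(z))
    for t, zt in enumerate(z):
        # Predict: a random walk keeps the mean; uncertainty grows by q.
        # The initial estimate is used as the prior for the first reading.
        if t > 0:
            p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (zt - x)
        p = (1.0 - k) * p
        means[t] = x
        variances[t] = p
    return means, variances

# Synthetic example: a constant 30 degree signal with noisy readings
rng = np.random.default_rng(0)
true_temp = 30.0
z = true_temp + rng.normal(0.0, 2.0, size=200)
means, variances = kalman_1d(z, x0=z[0])
```

A larger `r` relative to `q` tells the filter to trust each new reading less, which gives a smoother but slower-reacting estimate.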
Importing libraries
import pandas as pd
import plotly as py
import plotly.express as px
import plotly.figure_factory as ff
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels
from itertools import compress
from math import *
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from pykalman import KalmanFilter
from scipy import interpolate
from sklearn import datasets, linear_model
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from plotly.offline import init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
Data description
The data came from both an inside and an outside sensor, so I decided to split the original dataset into two based on the out/in column.
Summary:
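Total data points = 97606; 8733 inside, 22605 outside (these two counts are presumably taken after averaging readings that share a timestamp, as the code below does)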
Inside mean = 30.67, outside mean = 39.51
Inside min = 21, outside min = 24
Inside max = 41, outside max = 51
df = pd.read_csv(r'C:\Users....')  # enter your file path

# Split by sensor location
df_out = df[df['out/in'] == 'Out']
df_in = df[df['out/in'] == 'In']

# Inside sensor data set: average readings that share a timestamp
temp_in = pd.DataFrame(df_in, columns=['noted_date', 'temp'])
temp_in = temp_in.groupby(['noted_date']).mean()
temp_in.reset_index(inplace=True)
temp_in['temp_inside'] = temp_in['temp']
del temp_in['temp']
print(temp_in.describe())

# Outside sensor data set: same steps
temp_out = pd.DataFrame(df_out, columns=['noted_date', 'temp'])
temp_out = temp_out.groupby(['noted_date']).mean()
temp_out.reset_index(inplace=True)
temp_out['temp_outside'] = temp_out['temp']
del temp_out['temp']
print(temp_out.describe())
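One thing to watch with the grouping above: if noted_date is left as text, the grouped rows are sorted alphabetically rather than chronologically. Here is a small sketch of the same split with the dates parsed first. The sample rows are made up, the helper name `sensor_series` is hypothetical, and the day-month-year format string is an assumption about the file that should be checked against the actual data.

```python
import pandas as pd

# Tiny made-up sample using the same column names as the code above
df = pd.DataFrame({
    'noted_date': ['08-12-2018 09:30', '08-12-2018 09:30',
                   '08-12-2018 09:29', '01-11-2018 10:00'],
    'temp': [29, 31, 41, 30],
    'out/in': ['In', 'In', 'Out', 'In'],
})

# Assumed format: day-month-year hour:minute
df['noted_date'] = pd.to_datetime(df['noted_date'], format='%d-%m-%Y %H:%M')

def sensor_series(frame, location, name):
    """Average readings per timestamp for one sensor location (hypothetical helper)."""
    subset = frame.loc[frame['out/in'] == location, ['noted_date', 'temp']]
    return (subset.groupby('noted_date', as_index=False)['temp']
                  .mean()
                  .rename(columns={'temp': name}))

temp_in = sensor_series(df, 'In', 'temp_inside')
temp_out = sensor_series(df, 'Out', 'temp_outside')
print(temp_in)
```

With real timestamps in the noted_date column, the rows come out in time order, which matters because the Kalman Filter processes readings sequentially.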
Kalman Filter Python code
Below is the Kalman Filter model I used on the inside temperature data from above. I did split this table into train and test sets, but will not be covering that portion of the code here.
# Kalman Filter model
# using the Kalman Filter to clean the Inside IOT sensor
measurements = np.asarray(temp_in)  # column 0: noted_date, column 1: temp_inside
temps = measurements[:, 1].astype(float)  # make sure the readings are numeric

# One-dimensional random-walk model: the state is the temperature itself
kf = KalmanFilter(transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=temps[0],
                  initial_state_covariance=1,
                  observation_covariance=5,
                  transition_covariance=1)
state_means, state_covariances = kf.filter(temps)
state_std = np.sqrt(state_covariances[:, 0])

# Plot raw vs filtered readings, skipping the first 10 points while the filter settles
plt.figure(figsize=(12, 8))
plt.plot(measurements[10:, 0], temps[10:], '-b', label='Data')
plt.plot(measurements[10:, 0], state_means[10:, 0], '-r', label='Kalman-filter 5')
plt.legend(loc='upper left')
plt.show()

# Attach the filtered values to the original table
kalmandf = pd.DataFrame(state_means, columns=['temp_inside_KF'], index=temp_in.index)
temp_kalman = pd.concat([temp_in, kalmandf], axis=1)
print(temp_kalman.head(10))
temp_kalman.drop(temp_kalman.head(10).index, inplace=True)

print('')
print('--Kalman Filter--' * 5)
# model_eval is my own evaluation helper and is not shown in this post
print(model_eval(temp_kalman['temp_inside'], temp_kalman['temp_inside_KF']))
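To get a feel for what observation_covariance=5 and transition_covariance=1 mean in practice: with constant settings like these, the Kalman gain of a one-dimensional random-walk model settles to a fixed value, so after the first few points the filter behaves much like exponential smoothing with that value as the weight on each new reading. The sketch below iterates the standard scalar variance recursion to find that value. The function name is hypothetical and the code does not call pykalman.

```python
def steady_state_gain(q=1.0, r=5.0, p0=1.0, n_iter=200):
    """Iterate the scalar variance recursion until the gain settles.

    q : transition (process) noise variance
    r : observation (measurement) noise variance
    """
    p = p0
    k = p / (p + r)
    for _ in range(n_iter):
        p_pred = p + q              # predict step for a random walk
        k = p_pred / (p_pred + r)   # Kalman gain
        p = (1.0 - k) * p_pred      # updated state variance
    return k, p

k, p = steady_state_gain()
print(round(k, 3), round(p, 3))  # 0.358 1.791
```

So with these settings roughly 36% of each new reading goes into the estimate. Raising the observation covariance lowers that share and smooths more; lowering it makes the filtered line follow the raw data more closely.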
Model Performance
Mean Absolute Error: 0.24
Mean Squared Error: 30.375
R2 Score: 0.953
Root Mean Squared Error: 0.504
Mean absolute percentage error: 0.806
Scaled Mean absolute percentage error: 0.802
Mean forecast error: 30.368
Normalised mean squared error: 0.047
Theil_u_statistic: 0.001
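The evaluation helper used to produce these numbers is not shown in this post. As a stand-in, here is a minimal sketch of how a few of the listed metrics are commonly defined, using only NumPy. The function name `basic_metrics` and the sample values are hypothetical. Under these definitions the root mean squared error is the square root of the mean squared error, and a mean squared error normalised by the variance of the actual values equals 1 minus R2, which makes a handy sanity check when reading a results table.

```python
import numpy as np

def basic_metrics(actual, predicted):
    """Common error metrics (hypothetical stand-in, not the helper used above)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        'mae': np.mean(np.abs(err)),
        'mse': mse,
        'rmse': np.sqrt(mse),
        'r2': 1.0 - ss_res / ss_tot,
        'nmse': mse / np.var(actual),  # equals 1 - r2
        'mape_pct': 100.0 * np.mean(np.abs(err / actual)),
    }

# Made-up values standing in for raw and filtered temperatures
scores = basic_metrics([30.0, 31.0, 29.0, 32.0], [30.5, 30.5, 29.5, 31.5])
for name, value in scores.items():
    print(f'{name}: {value:.3f}')
```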