Smoothing Data

The goal of smoothing is to remove high-frequency noise and highlight important features of the signal, such as trends, patterns, and anomalies.

Class attributes

| Name | Type | Range/values | Description |
|---|---|---|---|
| data | pandas.core.frame.DataFrame | - | DataFrame to work with. |
| x_axis | String | - | Column that will work as the X axis. |
| y_axis | String | - | Column that will work as the Y axis (only for plotting). |
| x_start | String | - | Initial point of the working range of X (only for plotting). |
| x_end | String | - | Final point of the working range of X (only for plotting). |
| window_size | int | 2 - 15 | Number of data points used to calculate the smoothed value for a given point. |
| grade | int | - | Order of the polynomial that is fit to the data within the window. |
| alpha | float | 0 - 1 | Smoothing factor that controls the weight given to past values in the calculation of a smoothed value. |

The indexes of the start and end of the range are obtained to define the working range:

    self.index_min = int(self.data.index[self.data[self.x] == self.x_start].to_list()[0])
    self.index_max = int(self.data.index[self.data[self.x] == self.x_end].to_list()[0])

After that, new X and Y series are created from the working range, with their indexes reset to start at 0. The algorithms and the plots operate on these new series:

    self.new_x_axis = self.data[self.x][self.index_min:self.index_max]
    self.new_y_axis = self.data[self.y][self.index_min:self.index_max]

    self.new_x_axis = self.new_x_axis.reset_index(drop=True)
    self.new_y_axis = self.new_y_axis.reset_index(drop=True)
    self.y_filtered = []
    self.y_filt_complete = []
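The range selection above can be sketched as a standalone function. This is a minimal illustration, not the class's own code: the name `select_range` and the sample column names are hypothetical, and the sketch assumes the DataFrame has the default integer index, so that row labels and positions coincide.

```python
import pandas as pd


def select_range(data, x, y, x_start, x_end):
    """Return the X and Y series between x_start and x_end, reindexed from 0."""
    # Label of the first row whose X value equals each boundary.
    index_min = int(data.index[data[x] == x_start][0])
    index_max = int(data.index[data[x] == x_end][0])

    # Positional slice: the row at index_max itself is excluded.
    new_x = data[x].iloc[index_min:index_max].reset_index(drop=True)
    new_y = data[y].iloc[index_min:index_max].reset_index(drop=True)
    return new_x, new_y


# Hypothetical sample data for illustration only.
sample = pd.DataFrame({"t": [0, 1, 2, 3, 4, 5], "v": [10, 11, 12, 13, 14, 15]})
new_x, new_y = select_range(sample, "t", "v", x_start=2, x_end=5)
```

If a boundary value does not occur in the X column, the `[0]` lookup raises an `IndexError`, so callers may want to validate `x_start` and `x_end` first.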

Moving Average Filter

The moving average filter works by taking the average of a set of data points over a certain window size and using this average as the estimate of the signal value at a given point. It is used in signal processing, finance, and engineering, where the goal is to remove high-frequency noise and obtain a clearer representation of the underlying signal.

Parameters

| Name | Type | Range/values | Description |
|---|---|---|---|
| self.new_y_axis | String | - | Declaration of the Y axis (data that will be analyzed). |
| self.window_size | int | 2 - 15 | Number of data points used to calculate the smoothed value for a given point. |

Description of the method

The method is defined, the loop counter is set to 0, and the list that will hold the smoothed values is initialized as empty.

    def mov_average_filter(self):

        i = 0
        moving_averages = []

A loop iterates as many times as the length of the data to be smoothed minus the window size. For example, 150 rows with a window size of 10 give 140 iterations.

    while i < len(self.new_y_axis) - self.window:

On each iteration, starting from position 0, the average of the next `window` values is calculated, rounded to two decimals, and appended to the new smoothed list.

        window_values = self.new_y_axis[i : i + self.window]           # values inside the current window
        window_average = round(sum(window_values) / self.window, 2)    # mean, rounded to 2 decimals
        moving_averages.append(window_average)
        i += 1

Because each average is computed from a window of neighboring samples, the smoothed list is shorter than the input and offset from it. To compensate, None placeholders are added before and after the smoothed values so that each average lines up approximately with the center of its window.

    for _ in range(round(self.window / 2)):
        moving_averages.insert(0, None)
    for _ in range(round(self.window / 2)):
        moving_averages.append(None)

The final smoothed values are returned.

    y_filtered = moving_averages                                                     
    return y_filtered
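For comparison, the whole procedure can be sketched as one standalone function. This is a variant under stated assumptions rather than the method above: the name `centered_moving_average` is hypothetical, the last full window is included, no rounding is applied, and the padding is computed from the lengths so that the output always has as many entries as the input, for odd and even window sizes alike.

```python
def centered_moving_average(values, window):
    """Moving average padded with None so the output matches the input length."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")

    # One average per position where a full window fits.
    averages = [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

    # Split the missing entries between the start and the end of the list.
    pad_before = (window - 1) // 2
    pad_after = len(values) - len(averages) - pad_before
    return [None] * pad_before + averages + [None] * pad_after


odd_result = centered_moving_average([1, 2, 3, 4, 5, 6], window=3)
even_result = centered_moving_average([1, 2, 3, 4, 5, 6], window=4)
```

With an even window there is no exact center sample, so this sketch places the extra None at the end; placing it at the start would be an equally valid choice.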

Savitzky-Golay Filter

The Savitzky-Golay smoothing filter works by fitting a polynomial of a certain order to a set of data points and using this polynomial to estimate the value of the signal at any given point. It is useful in situations where it is important to preserve features such as peaks and valleys in the data.

Parameters

| Name | Type | Range/values | Description |
|---|---|---|---|
| self.new_y_axis | String | - | Declaration of the Y axis (data that will be analyzed). |
| self.window_size | int | 2 - 15 | Number of data points used to calculate the smoothed value for a given point. |
| self.poly_degree | int | 1 - 5 | Order of the polynomial that is fit to the data within the window. |

Description of the method

The method calls the `savgol_filter` function provided by the scipy.signal library, passing the data, the window size, and the polynomial order.

    def savgol_filter(self):

        y_filtered = savgol_filter(self.new_y_axis, self.window, self.grade)
        return y_filtered
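A short standalone sketch of the same call may help show how the window size and the polynomial order interact. The wrapper name `smooth_savgol` and the sample data are hypothetical. SciPy requires the polynomial order to be smaller than the window length, and some SciPy releases also require an odd window length, so an odd value is used here.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_savgol(values, window, degree):
    """Smooth a 1-D sequence with scipy.signal.savgol_filter."""
    if degree >= window:
        raise ValueError("polynomial degree must be smaller than the window size")
    data = np.asarray(values, dtype=float)
    return savgol_filter(data, window_length=window, polyorder=degree)


# A noise-free quadratic: a degree-2 fit reproduces it, so the filter
# should leave it essentially unchanged.
x = np.arange(11, dtype=float)
quadratic = x**2 - 3 * x + 1
smoothed = smooth_savgol(quadratic, window=5, degree=2)
```

This illustrates why the filter preserves peaks and valleys better than a plain average: any feature that a polynomial of the chosen order can describe within the window passes through nearly untouched.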

Exponential Smoothing Filter

Exponential smoothing works by weighting the past data points in a signal exponentially, with more recent data points receiving higher weight than older ones. The smoothed value at any given point is a weighted average of the past data points, with the weights decaying exponentially over time. The simple form shown here suits signals without a strong trend or seasonality; extensions such as Holt's linear method and Holt-Winters add terms for those patterns.

Parameters

| Name | Type | Range/values | Description |
|---|---|---|---|
| self.new_y_axis | String | - | Declaration of the Y axis (data that will be analyzed). |
| self.alpha | float | 0 - 1 | Smoothing factor that controls the weight given to past values in the calculation of a smoothed value. |

Description of the method

The method is defined and the filtered array is initialized with the first value of the original data.

def exponential_filter(self):

    self.y_filtered = [self.new_y_axis[0]]

A loop iterates from index 1 to the end of the original data; index 0 is already covered by the initial value.

    for i in range(1, len(self.new_y_axis)):

On each iteration, the new smoothed value is computed as alpha times the current data point plus (1 - alpha) times the previous smoothed value. The result is appended to the smoothed array.

        smoothed_val = self.alpha * self.new_y_axis[i] + (1 - self.alpha) * self.y_filtered[i-1]
        self.y_filtered.append(smoothed_val)
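The recurrence can also be written as a small standalone function, which makes the effect of alpha easy to check by hand. This is a minimal sketch; the function name `exponential_smoothing` and the sample values are hypothetical.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing seeded with the first data point."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    if not values:
        return []

    smoothed = [values[0]]
    for current in values[1:]:
        # Blend the current sample with the previous smoothed value.
        smoothed.append(alpha * current + (1 - alpha) * smoothed[-1])
    return smoothed


half_weight = exponential_smoothing([10, 20, 30], alpha=0.5)
no_smoothing = exponential_smoothing([10, 20, 30], alpha=1)
```

An alpha close to 1 follows the input closely, while an alpha close to 0 reacts slowly and smooths more heavily.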

Create New DataFrame

Generates a new DataFrame with the smoothed values obtained by the selected method.

Parameters

| Name | Type | Range/values | Description |
|---|---|---|---|
| method_name | String | - | Method selected by the user. |
| df_to_change | dataframe | - | New dataframe that will be filled with the filtered values. |

Description of the method

After the parameters are adjusted according to the desired output, a for loop runs the filtering algorithm on every column of the dataframe except the X axis column.

The result is stored in the new dataframe called "df_to_change".

    df = self.data
    column_to_skip = self.x

    if method_name == 'method_name':
        for column in df.columns:
            if column != column_to_skip:
                # The filtering method will be performed here...
                df_to_change[column] = filtered_column
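The column loop can be sketched end to end as follows. To keep the example self-contained, pandas' built-in `rolling` and `ewm` methods stand in for the class's own filter methods, so this is an illustration of the loop and not the original implementation. The function name `build_filtered_frame` and the method names "moving_average" and "exponential" are hypothetical.

```python
import pandas as pd


def build_filtered_frame(data, x, method_name, window=3, alpha=0.5):
    """Apply one smoothing method to every column except the X axis column."""
    # Start the new frame with the untouched X column.
    df_to_change = pd.DataFrame({x: data[x]})

    for column in data.columns:
        if column == x:
            continue
        if method_name == "moving_average":
            # Centered rolling mean; edges without a full window become NaN.
            filtered = data[column].rolling(window, center=True).mean()
        elif method_name == "exponential":
            # adjust=False gives y[t] = alpha * x[t] + (1 - alpha) * y[t-1].
            filtered = data[column].ewm(alpha=alpha, adjust=False).mean()
        else:
            raise ValueError(f"unknown method: {method_name}")
        df_to_change[column] = filtered

    return df_to_change


# Hypothetical sample data for illustration only.
sample = pd.DataFrame({"t": [0, 1, 2], "a": [10.0, 20.0, 30.0], "b": [1.0, 1.0, 1.0]})
exp_frame = build_filtered_frame(sample, "t", "exponential", alpha=0.5)
avg_frame = build_filtered_frame(sample, "t", "moving_average", window=3)
```

Building the result in a fresh DataFrame leaves the original data untouched, which matches the intent of filling "df_to_change" instead of overwriting the input.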