Pre-processing
Class for performing preprocessing on the loaded data.
Preprocessor
dataclass
Class which implements generic functionality for pre-processing of sensor and meteorology information.
Attributes:
Name | Type | Description |
---|---|---|
time_bin_edges | DatetimeArray | edges of the time bins to be used for smoothing/interpolation. |
sensor_object | SensorGroup | sensor group object containing raw data. |
met_object | Meteorology | met object containing raw data. |
aggregate_function | str | function to be used for aggregation of data. Defaults to mean. |
sensor_fields | list | standard list of sensor attributes that we wish to regularize and/or filter. |
met_fields | list | standard list of meteorology attributes that we wish to regularize/filter. |
Source code in src/pyelq/preprocessing.py
__post_init__()
Initialise the class.
Attaches the sensor and meteorology objects as attributes, then runs the initial regularization and NaN filtering steps.
Before running the regularization & NaN filtering, the function ensures that u_component and v_component are present as fields on met_object. The post-smoothing wind speed and direction are then calculated from the smoothed u and v components, to eliminate the need to take means of directions when binning.
The sensor and meteorology group objects attached to the class will have identical numbers of data points per device, identical time stamps, and be free of NaNs.
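The reason for binning u and v rather than directions can be seen in a small, self-contained sketch. It does not use pyelq: the helper names `to_uv` and `to_speed_direction` are hypothetical, and the direction convention (degrees clockwise from north, the direction the wind blows from) is an assumption made for illustration and may differ from the one the library uses.

```python
import math
from statistics import mean


def to_uv(speed, direction_deg):
    """Convert speed and 'blowing from' direction (assumed convention) to (u, v)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def to_speed_direction(u, v):
    """Convert (u, v) back to speed and 'blowing from' direction in [0, 360)."""
    speed = math.hypot(u, v)
    direction = (270.0 - math.degrees(math.atan2(v, u))) % 360.0
    return speed, direction


# Two readings falling in the same time bin, both roughly northerly.
speeds = [1.0, 1.0]
directions = [350.0, 10.0]

# Averaging the angles directly gives 180 degrees (southerly), which is wrong.
naive_direction = mean(directions)

# Averaging the components first, then converting back, stays close to north.
components = [to_uv(s, d) for s, d in zip(speeds, directions)]
u_mean = mean(u for u, _ in components)
v_mean = mean(v for _, v in components)
binned_speed, binned_direction = to_speed_direction(u_mean, v_mean)

# Angular distance of the binned direction from north (0 degrees).
offset_from_north = min(binned_direction, 360.0 - binned_direction)
```

Note that the binned speed is slightly below 1.0, because the two vectors partially cancel; this is the expected behaviour of a vector mean.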
regularize_data()
Smoothing or interpolation of data onto a common set of time points.
Function which takes in sensor and meteorology objects containing raw data (on original time points), and smooths or interpolates these onto a common set of time points.
When a SensorGroup object is supplied, the function returns a SensorGroup object with the same number of sensors. When a MeteorologyGroup object is supplied, the function returns a MeteorologyGroup object with the same number of objects. When a single Meteorology object is supplied, the function returns a MeteorologyGroup object containing one Meteorology object per sensor in the SensorGroup object; these individual Meteorology objects are identical.
Assumes that sensor_object and met_object attributes contain the RAW data, on the original time stamps, as loaded from file/API using the relevant data access class.
After the function has been run, the sensor and meteorology group objects attached to the class as attributes will have identical time stamps, but may still contain NaNs.
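As an illustration of the smoothing step, the sketch below averages raw samples into bins defined by a set of time bin edges. It is a standard-library stand-in, not the library's implementation: the function name `bin_mean`, the half-open bin convention `[edge_i, edge_{i+1})`, and the use of the mean as the aggregate are assumptions. Empty bins are returned as NaN, which is consistent with the note above that the regularized data may still contain NaNs.

```python
import math
from bisect import bisect_right
from datetime import datetime, timedelta
from statistics import mean


def bin_mean(times, values, bin_edges):
    """Average the values falling in each bin; empty bins become NaN."""
    buckets = [[] for _ in range(len(bin_edges) - 1)]
    for t, v in zip(times, values):
        # Index of the bin whose left edge is the last one <= t.
        i = bisect_right(bin_edges, t) - 1
        if 0 <= i < len(buckets) and not math.isnan(v):
            buckets[i].append(v)
    return [mean(b) if b else math.nan for b in buckets]


start = datetime(2024, 1, 1)
# Four edges define three 10-minute bins.
edges = [start + timedelta(minutes=10 * k) for k in range(4)]

raw_times = [start + timedelta(minutes=m) for m in (1, 4, 12)]
raw_values = [2.0, 4.0, 6.0]

regular_values = bin_mean(raw_times, raw_values, edges)
# First bin averages 2.0 and 4.0, second bin holds 6.0, third bin is empty (NaN).
```

Applying the same edges to every sensor and meteorology field is what puts all fields on a common time grid.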
filter_nans()
Filter out data points where any of the specified sensor or meteorology fields has a NaN value.
Assumes that sensor_object and met_object attributes have first been passed through the regularize_data function, and thus have fields on aligned time grids.
Function first works through all sensor and meteorology fields and finds indices of all times where there is a NaN value in any field. Then, it uses the resulting index to filter all fields.
The result of this function is that the sensor_object and met_object attributes of the class are updated, any NaN values having been removed.
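The two-pass approach described above (build one combined index, then apply it to every field) can be sketched as follows. Plain dictionaries of lists stand in for the sensor and meteorology objects; the field names and values are illustrative assumptions only.

```python
import math

nan = math.nan

# Stand-ins for one device's sensor and meteorology data on an aligned time grid.
sensor = {"time": [0, 1, 2, 3], "concentration": [1.0, nan, 3.0, 4.0]}
met = {
    "time": [0, 1, 2, 3],
    "u_component": [0.5, 0.6, nan, 0.8],
    "v_component": [0.1, 0.2, 0.3, 0.4],
}

# Pass 1: a time point is kept only if no checked field is NaN there.
checked = [sensor["concentration"], met["u_component"], met["v_component"]]
keep = [not any(math.isnan(x) for x in row) for row in zip(*checked)]

# Pass 2: apply the same index to every field of both objects.
def apply_index(fields, index):
    return {name: [x for x, k in zip(vals, index) if k] for name, vals in fields.items()}

sensor_clean = apply_index(sensor, keep)
met_clean = apply_index(met, keep)
```

Because a single index is shared, the sensor and meteorology data remain aligned after filtering.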
filter_on_met(filter_variable, lower_limit=None, upper_limit=None)
Filter the supplied data on given properties of the meteorological data.
Assumes that the SensorGroup and MeteorologyGroup objects attached as attributes have corresponding values (one per sensor device), and have attributes that have been pre-smoothed/interpolated onto a common time grid per device.
The result of this function is that the sensor_object and met_object attributes are updated with the filtered versions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter_variable | list of str | list of meteorology variables that we wish to use for filtering. | required |
lower_limit | list of float | list of lower limits associated with the variables in filter_variable. Defaults to None. | None |
upper_limit | list of float | list of upper limits associated with the variables in filter_variable. Defaults to None. | None |
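A minimal sketch of limit-based filtering is shown below. It is not the library code: the helper name `met_filter_mask` is hypothetical, and two behaviours are assumptions made for illustration, namely that a limit of None means "unbounded" and that the limits are inclusive.

```python
import math


def met_filter_mask(met, filter_variable, lower_limit=None, upper_limit=None):
    """Return a keep-mask that is True where every variable lies within its limits."""
    # Assumption: missing limits are treated as unbounded.
    if lower_limit is None:
        lower_limit = [-math.inf] * len(filter_variable)
    if upper_limit is None:
        upper_limit = [math.inf] * len(filter_variable)

    keep = [True] * len(met[filter_variable[0]])
    for name, low, high in zip(filter_variable, lower_limit, upper_limit):
        # Assumption: limits are inclusive at both ends.
        keep = [k and low <= x <= high for k, x in zip(keep, met[name])]
    return keep


met = {"wind_speed": [0.2, 1.5, 3.0, 12.0], "wind_direction": [10.0, 90.0, 180.0, 270.0]}
concentration = [1.0, 2.0, 3.0, 4.0]

mask = met_filter_mask(met, ["wind_speed"], lower_limit=[0.5], upper_limit=[10.0])

# The same mask is applied to the sensor data so both stay aligned.
concentration_kept = [c for c, k in zip(concentration, mask) if k]
```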
block_data(time_edges, data_object)
Break the supplied data group objects into time-blocked chunks.
Returns a list of sensor or meteorology group objects, one per time chunk.
If there is no data for a given device in a particular period, then that device is simply dropped from the group object in that block.
Either a SensorGroup or a MeteorologyGroup object can be supplied, and the list of blocked objects returned will be of the same type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time_edges | DatetimeArray | [(n_period + 1) x 1] array of edges of the time bins to be used for dividing the data into blocks. | required |
data_object | SensorGroup or MeteorologyGroup | data object containing either sensor or meteorological data, to be divided into blocks. | required |
Returns:
Name | Type | Description |
---|---|---|
data_list | list | list of [n_period x 1] data objects, each list element being either a SensorGroup or MeteorologyGroup object (depending on the input) containing the data for the corresponding period. |
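The blocking behaviour, including the dropping of devices with no data in a period, can be sketched with plain dictionaries. Here a group is modelled as a mapping from device name to a `(times, values)` pair; that representation, the function name `block_group`, and the half-open interval convention `[start, end)` are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def block_group(time_edges, group):
    """Split a {device: (times, values)} mapping into one mapping per time block."""
    blocks = []
    for start, end in zip(time_edges[:-1], time_edges[1:]):
        block = {}
        for device, (times, values) in group.items():
            kept = [(t, v) for t, v in zip(times, values) if start <= t < end]
            if kept:  # devices with no data in this period are dropped
                block[device] = ([t for t, _ in kept], [v for _, v in kept])
        blocks.append(block)
    return blocks


t0 = datetime(2024, 1, 1)
edges = [t0, t0 + timedelta(hours=1), t0 + timedelta(hours=2)]  # two periods

group = {
    "sensor_a": ([t0 + timedelta(minutes=10), t0 + timedelta(minutes=70)], [1.0, 2.0]),
    "sensor_b": ([t0 + timedelta(minutes=20)], [5.0]),
}

blocks = block_group(edges, group)
# blocks[0] contains both devices; blocks[1] contains only "sensor_a".
```

The returned list always has one element per period, so `n_period + 1` edges give `n_period` blocks even when some blocks contain fewer devices.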
filter_object_fields(data_object, fields, index)
staticmethod
Apply a filter index to all the fields in a given data object.
Can be used for either a Sensor or Meteorology object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_object | Union[Sensor, Meteorology] | sensor or meteorology object (corresponding to a single device) for which fields are to be filtered. | required |
fields | list | list of field names to be filtered. | required |
index | ndarray | filter index. | required |
Returns:
Type | Description |
---|---|
Union[Sensor, Meteorology] | filtered data object. |
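The idea of applying one index to a named list of fields can be sketched with `getattr` and `setattr`. The `ToySensor` class below is a hypothetical stand-in for a single-device object, the index is assumed to be a boolean mask, and returning a filtered copy (rather than modifying the input) is a design choice of this sketch, not a statement about the library.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class ToySensor:
    """Hypothetical stand-in for a single-device data object."""

    time: list
    concentration: list
    label: str = "toy"


def filter_fields(data_object, fields, index):
    """Return a copy of data_object with each named field filtered by the mask."""
    out = deepcopy(data_object)
    for name in fields:
        values = getattr(out, name)
        setattr(out, name, [v for v, keep in zip(values, index) if keep])
    return out


original = ToySensor(time=[0, 1, 2], concentration=[1.0, 2.0, 3.0])
filtered = filter_fields(original, ["time", "concentration"], [True, False, True])
```

Fields that are not listed (here `label`) are left untouched, which is why the list of field names is passed explicitly.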
interpolate_single_met_object(met_in_object)
Interpolate a single Meteorology object onto the time grid of the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
met_in_object | Meteorology | Meteorology object to be interpolated onto the time grid of the class. | required |
Returns:
Name | Type | Description |
---|---|---|
met_out_object | Meteorology | interpolated Meteorology object. |