---
library_name: transformers
tags:
- biology
- electrocardiogram
language:
- en
pipeline_tag: feature-extraction
---

# 📚 What is RPeaks2HRV?

A pipeline to derive heart rate variability (HRV) features from R peaks extracted from an electrocardiogram (ECG) signal.

*Note*: Need to process raw ECG signals? Consider using [ECG2HRV](https://huggingface.co/hubii-world/ecg-to-hrv-pipeline) instead!

# Usage Instructions

## 0 (Optional but recommended) Set up Virtual Environment

## 1 Install Pipeline Dependencies

To use the pipeline, you first need to install the dependencies it relies on. Run the following command to install the dependencies defined in `requirements_rpeak2hrv_pipeline.txt`. You can get the file from this repository. The `%pip` magic works inside a Jupyter notebook; in a regular shell, run `pip install -r requirements_rpeak2hrv_pipeline.txt` instead.

```python
%pip install -r requirements_rpeak2hrv_pipeline.txt
```

## 2 Instantiate Pipeline

```python
from transformers import pipeline

# trust_remote_code=True is needed because the pipeline code is loaded from the model repository
rpeak2hrv_pipeline = pipeline(model="hubii-world/rpeaks-to-hrv-pipeline", trust_remote_code=True)
```

## 3 Pipeline Parameters & Supported File Formats

### Overview: Parameters

The pipeline provides a variety of parameters that adjust the preprocessing behavior. The following sections explain the individual parameters in detail and provide illustrative examples.

#### Mandatory Parameters

The pipeline relies on three main parameters for every pipeline execution. Only `inputs` has no default value and must always be provided:

| Parameter name | Type | Default value | Description |
|----------------|------|---------------|-------------|
| `inputs` | _str_ or _DataFrame_ | No default value | The input that should be processed by the pipeline. This can either be a path to a file containing the data to process or the data itself |
| `feature_domains` | _list[str]_ | ['time', 'freq', 'non_lin'] | The domains the pipeline should calculate features for |
| `sampling_rate` | _int_ | 1000 | The sampling rate (in Hz) of the continuous cardiac signal in which peaks occur |

#### Optional Parameters

Besides these parameters, the pipeline offers several optional parameters that may need to be set in order to compute correct HRV features:

| Parameter name | Type | Default value | Description |
|----------------|------|---------------|-------------|
| `time_header` | _str_ | 'SystemTime' | The name of the data column that contains the timestamp at which the values in the same row were recorded |
| `rri_header` | _str_ | 'interbeat_interval' | The name of the data column that contains the RR-intervals in milliseconds |
| `windowing_method` | _str_ | None | The method that should be applied to divide the raw data into windows. The default setting is None, so no windowing is applied |
| `window_size` | _str_ | '60s' | The size of a window, expressed as a time span. Only relevant if windowing is applied to the data |

### 3.1 `inputs`

The `inputs` parameter represents the data the pipeline should convert into HRV features. The pipeline supports values of type _str_ and _DataFrame_ as input. When `inputs` is a string, it has to be the path to a file containing the data to process. Supported file formats are .csv and .txt. Alternatively, you can provide the data directly to the pipeline in the form of a _DataFrame_.

#### Example: Provide input as file path

```python
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(inputs=file_path, sampling_rate=1000)
result.head()
```

### 3.2 `feature_domains`

The `feature_domains` parameter controls which feature domains the pipeline calculates. The domains are provided to the pipeline as a list of keys.
Supported keys are:

| Key | Description |
|-----|-------------|
| 'time' | The pipeline calculates time-domain HRV metrics |
| 'freq' | The pipeline calculates frequency-domain HRV metrics |
| 'non_lin' | The pipeline calculates non-linear HRV indices |

For additional information regarding the calculated features, consult the [NeuroKit2 documentation](https://neuropsychology.github.io/NeuroKit/functions/hrv.html#). By default, the pipeline calculates features for all three domains.

#### Example: Feature domains

In the following code, the pipeline only calculates time-domain and non-linear HRV indices for the provided data.

```python
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(
    inputs=file_path,
    feature_domains=['time', 'non_lin'],
    sampling_rate=1000,
)
result.head()
```

### 3.3 `sampling_rate`

The `sampling_rate` (in Hz) is the rate at which the sensor sampled data from the patient. It has to be provided as an integer. The example above shows a configuration where the `sampling_rate` is set to 1000. The default rate is 1000 Hz, meaning that the sensor sampled 1000 values per second.

### 3.4 `time_header` & `rri_header`

`time_header` and `rri_header` describe the structure of the data the pipeline has to process. In general, the pipeline supports two possible data formats:

- R Peak Flags
- RR-Intervals with timestamps

#### 3.4.1 R Peak Flags

The first format option is a _DataFrame_ with one column named `'ECG_R_Peaks'`. The column values are simple binary flags indicating whether an R peak occurred or not. This is the standard data format used by neurokit2 to represent R peaks. If you use this data format, you do not need to specify `time_header` and `rri_header`.

__Important__: Make sure that the column has the correct name and that you specify the correct `sampling_rate`, as this information is indispensable for computing correct HRV features.
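
To make the expected structure concrete, the following minimal sketch builds such a _DataFrame_ by hand with pandas and NumPy. The peak positions and the helper name `make_rpeak_flags` are made up for illustration and are not part of the pipeline. The sketch also converts the distances between flagged samples into RR-intervals in milliseconds, which illustrates why the `sampling_rate` has to be correct.

```python
import numpy as np
import pandas as pd


def make_rpeak_flags(peak_indices, n_samples):
    """Hypothetical helper: binary flag column with 1 at every R peak sample."""
    flags = np.zeros(n_samples, dtype=int)
    flags[np.asarray(peak_indices)] = 1
    return pd.DataFrame({"ECG_R_Peaks": flags})


sampling_rate = 1000  # Hz, assumed for this illustration
peak_indices = [200, 1000, 1850, 2650]  # made-up sample positions of R peaks
df = make_rpeak_flags(peak_indices, n_samples=3000)

# Distance between consecutive peaks in samples, converted to milliseconds.
peaks = np.flatnonzero(df["ECG_R_Peaks"].to_numpy())
rri_ms = np.diff(peaks) * 1000 / sampling_rate
print(rri_ms)
```

With the same flags but a wrongly stated `sampling_rate` of 500, every interval in this sketch would come out twice as long, and all derived HRV features would be distorted accordingly.
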
##### Example: R Peak Flags

The following code loads an example _DataFrame_ containing R Peak Flags.

```python
import pandas as pd

df = pd.read_csv("./Example_data/RPeaksDataExample.csv")
df.head()
```

You can process this data without setting `time_header` and `rri_header`.

```python
result = rpeak2hrv_pipeline(inputs=df, sampling_rate=1000)
result.head()
```

#### 3.4.2 RR-Intervals with timestamps

The second format option is a _DataFrame_ with two columns: one containing the RR-intervals in milliseconds and one containing the timestamps at which the sensor recorded them. Here, `time_header` specifies the name of the column containing the timestamps and `rri_header` specifies the name of the column containing the RR-intervals. The default column names are `'SystemTime'` and `'interbeat_interval'`.

##### Example: RR-Intervals with timestamps

The following code loads an example _DataFrame_ containing RR-intervals and their timestamps.

```python
import pandas as pd

df = pd.read_csv("./Example_data/RRIntervalExample.csv")
df.head()
```

Because the column names in this example match the default values of `time_header` and `rri_header`, you do not need to specify them to process the data.

```python
result = rpeak2hrv_pipeline(inputs=df, sampling_rate=1000)
result.head()
```

### 3.5 `windowing_method`

The `windowing_method` defines the method used to divide the raw data into windows. The supported settings are:

| Parameter value | Description |
|-----------------|-------------|
| 'rolling' | Creates a window rolling over the data. For more information see [pandas.DataFrame.rolling()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html) |
| 'first_interval' | Keeps the data values that are recorded within the __first__ timeframe defined by _window_size_ and omits the rest |
| 'last_interval' | Keeps the data values that are recorded within the __last__ timeframe defined by _window_size_ and omits the rest |

#### Example: 'first_interval'-windowing

The following code snippet shows an example of first_interval windowing. Only the values recorded within the first 5 minutes of the data collection are used to compute HRV features.

```python
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(
    inputs=file_path,
    windowing_method="first_interval",
    window_size="5m",
    sampling_rate=1000,
)
result.head()
```

### 3.6 `window_size`

The `window_size` defines the size of the windows the data should be divided into. The value follows the pattern '{any positive integer}{t}', where t is an element of {'d', 'h', 'm', 's'}. For example, the setting '20m' represents a window size of 20 minutes. The default setting is '60s', corresponding to a window size of one minute. Setting this parameter is only necessary if you want to apply windowing.

#### Example: Window size

In the following code, a rolling window of 5 minutes is applied to the data. For each window, the pipeline calculates the HRV features and creates a new row in the result _DataFrame_, so each row of the returned _DataFrame_ represents one window. The starting and ending timestamps of each window are included in the result.
```python
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(
    inputs=file_path,
    windowing_method="rolling",
    window_size="5m",
    sampling_rate=1000,
)
result.head()
```

### 3.7 Supported file formats

As mentioned in Section 3.1, the pipeline can process two file formats when a file path is provided: .csv and .txt. For a .csv file, the pipeline supports two column separators, ',' and ';', and recognizes the separator automatically. For a .txt file, the pipeline only supports the column separator '\t'. Make sure your data file matches this requirement before providing it to the pipeline.

#### Example: Provide .csv file to pipeline

The following example provides a .csv file to the pipeline and lets it calculate the HRV features on the first 10 minutes of the data.

```python
file_path = "./Example_data/RRIntervalExample.csv"
result = rpeak2hrv_pipeline(
    inputs=file_path,
    windowing_method="first_interval",
    window_size="10m",
    sampling_rate=1000,
)
result.head()
```

#### Example: Provide .txt file to pipeline

The same can be done using a .txt file.

```python
file_path = "./Example_data/RRIntervalExample.txt"
result = rpeak2hrv_pipeline(
    inputs=file_path,
    windowing_method="first_interval",
    window_size="10m",
    sampling_rate=1000,
)
result.head()
```
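
If your RR-interval data uses a different separator or only exists as a _DataFrame_, one way to meet the .txt requirement is to write it out tab-separated with pandas. This is only a sketch: the values and the output file name are invented for illustration, and the column names assume the default `time_header` and `rri_header` values listed in the parameter overview.

```python
import os
import tempfile

import pandas as pd

# Made-up RR-intervals (ms) with timestamps; the column names assume the
# default `time_header` and `rri_header` values from the parameter overview.
df = pd.DataFrame(
    {
        "SystemTime": [
            "2024-01-01 08:00:00",
            "2024-01-01 08:00:01",
            "2024-01-01 08:00:02",
            "2024-01-01 08:00:03",
        ],
        "interbeat_interval": [812, 790, 805, 798],
    }
)

# Write a tab-separated .txt file (hypothetical file name in a temp directory).
txt_path = os.path.join(tempfile.mkdtemp(), "rr_intervals.txt")
df.to_csv(txt_path, sep="\t", index=False)

# Read the file back to confirm the separator and the header row.
check = pd.read_csv(txt_path, sep="\t")
print(check.columns.tolist())
```

The resulting path can then be passed to the pipeline as `inputs`, in the same way as the .txt example above.
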