FPGA compression of ECG signals by using modified convolution scheme of the Discrete Wavelet Transform

This paper presents an FPGA design for ECG compression using the Discrete Wavelet Transform (DWT) and a lossless encoding method. Unlike classical works based on off-line processing, the current work processes the ECG signal in real time to reduce redundant information. A model is developed for a fixed-point convolution scheme that performs well in terms of throughput, latency, maximum operating frequency and quality of the compressed signal. The quantization of the filter coefficients and the selected fixed threshold give an error that is low enough for clinical applications.


INTRODUCTION
The Discrete Wavelet Transform (DWT) has been used in recent years in signal processing applications such as denoising, compression and coding. Methods for both off-line and on-line modes have been proposed. In the former, the information is processed frame-by-frame; in the latter, it is processed sample-by-sample.
In denoising and compression methods, the DWT is accompanied by a thresholding stage to reduce the redundant information [1]. The threshold controls the compression ratio (CR) of the system: the higher the threshold, the higher the compression ratio. The upper limit of the threshold is set by the desired Percent-Root-Mean-Square-Difference (PRD) of the compressed (or filtered) signal [2]. When the input is a biomedical signal such as the ECG, the PRD should be lower than 4% to guarantee that clinically useful information is kept [3]. The threshold can be selected by energy packing efficiency [4], as a fixed percentage [5], [6] or as a universal threshold [7].
In off-line mode, the algorithms can reach compression ratios of up to 20:1 with a PRD lower than 10% [8]-[10]. However, these methods are suitable neither for real-time implementation nor for stand-alone applications.
Because real-time processing is desirable in portable devices, new methods must be designed, or known methods adapted, so that the input signal can be transmitted or stored sample-by-sample. The problem is to design a real-time architecture for the compression of the ECG signal with low latency, low quantization error, low loss of information and a high compression ratio.
For the hardware realization of the DWT, the classical schemes are those based on convolution and on lifting [11]-[13]. The convolution scheme demands a large number of operations, which increases hardware consumption, and it is inefficient because half of the data is eliminated in the subsampling process. The lifting scheme reduces the operations to three steps: split, prediction and update. Its disadvantage is that the lifting coefficients are not integers; therefore, the scheme requires floating-point multipliers and floating-point adders [14]. Although some modifications have been proposed, floating-point modules are still necessary in most schemes based on the lifting design. To overcome this restriction, we propose a scheme that takes advantage of both the convolution and lifting schemes. The output of each filter is calculated by a convolution process, but a split step is added. In our scheme, the modules (adder, multiplier) work in integer format and the system calculates only the outputs that are not eliminated in the subsampling process. In summary, we propose an integer-to-integer wavelet transform scheme that uses fewer hardware resources than the convolution scheme and, unlike the lifting scheme, does not use floating-point modules.
Finally, a lossless encoding method is added to the ECG compression architecture. According to [15], Huffman encoding and run-length (RL) encoding provide similar CR and PRD results, but RL is more suitable for real-time applications. Because the encoding is lossless, the PRD is due only to the quantization error and the thresholding process. If an adequate threshold is selected and the quantization error is low, the compressed signal should be close to the original ECG.

ARCHITECTURE OF THE DISCRETE WAVELET TRANSFORM
The two classical schemes to perform the Discrete Wavelet Transform are the convolution (or filter bank) and lifting scheme.

Convolution scheme
It is based on two FIR filters and one subsampling process. The detail and coarse coefficients for one level of decomposition are obtained according to Figure 1. The symbol ↓2 means subsampling by 2, that is, dropping the samples with odd indexes [16]. In other words, after the convolution of x[n] with [h_0 h_1], the odd samples of the outputs are eliminated. In this scheme, half of all operations are wasted, because only half of the data is used.
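As a minimal sketch of the classical scheme, the following Python fragment filters a short sequence with both filters and then keeps only the even-indexed outputs. The function names, the test sequence and the unnormalized Haar pair are assumptions chosen for brevity; they are not the sym4 filters or the hardware described in this work.

```python
def convolve(x, h):
    """Full linear convolution of two integer sequences (pure Python)."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y


def dwt_level_classical(x, h0, h1):
    """One decomposition level: filter, then drop the odd-indexed outputs."""
    low = convolve(x, h0)    # every low-pass output is computed...
    high = convolve(x, h1)   # ...and every high-pass output as well
    return low[::2], high[::2]


x = [0, 1, 2, 3, 4, 5, 6, 7]        # hypothetical input samples
h0, h1 = [1, 1], [1, -1]            # unnormalized Haar pair, illustration only
c1, d1 = dwt_level_classical(x, h0, h1)

computed = 2 * (len(x) + len(h0) - 1)   # outputs evaluated by both filters
kept = len(c1) + len(d1)                # outputs that survive the subsampling
```

In this example 18 filter outputs are evaluated and only 10 are kept, which illustrates why roughly half of the operations are wasted.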

Lifting scheme
This scheme is based on three processes: splitting the input data, prediction and updating. The block diagram is presented in Figure 2. The input data are split into two parts, even and odd samples; the prediction step produces the detail coefficients and the update step generates the coarse representation of the input signal. This scheme has been used with biorthogonal filters such as the 9/7 DWT. In that case, six constants are included in the architecture: α, β, γ, δ, k, 1/k. The disadvantage is that the constants are not integers: they are represented by 18 bits in fixed-point format, 2 for the integer part and 16 for the fractional part [14], or by 10 bits in fixed-point format, 8 of them for the fractional part [17]. Consequently, the arithmetic operations cannot be carried out on integer data.
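The three lifting steps can be sketched in a few lines of Python. The example below uses the Haar wavelet instead of the 9/7 filters mentioned above, purely to keep the prediction and update rules short; the function names and the test data are assumptions.

```python
def lifting_forward(x):
    """Split / predict / update for the Haar wavelet (even-length input)."""
    even, odd = x[0::2], x[1::2]                          # split
    detail = [o - e for e, o in zip(even, odd)]           # predict odd from even
    coarse = [e + d / 2 for e, d in zip(even, detail)]    # update: pairwise mean
    return coarse, detail


def lifting_inverse(coarse, detail):
    """Undo the update and prediction steps, then merge the two halves."""
    even = [c - d / 2 for c, d in zip(coarse, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = []
    for e, o in zip(even, odd):
        x += [e, o]
    return x


x = [2, 4, 6, 8, 5, 3, 1, 7]        # hypothetical input samples
coarse, detail = lifting_forward(x)
restored = lifting_inverse(coarse, detail)
```

Even in this simple case the update step divides by 2, so the coarse values are not integers in general; with the 9/7 constants every step involves a fractional multiplier.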

Efficient convolution scheme
The aims of this scheme are to reduce the number of operations of the classical convolution scheme and to avoid the floating-point operations of the lifting scheme. Unlike the lifting scheme, which splits the input data, our scheme splits the clock signal, and the filtering is calculated in alternate clock cycles. The coarse (c_1) and detail (d_1) coefficients are calculated according to eqs. (1) and (2), for n = 0, 2, 4, … and n = 1, 3, 5, …, respectively, where M is the length of the FIR filters, [h_0 h_1] are the impulse responses of the low-pass and high-pass filters, respectively, and x[n] is the input signal. According to eqs. (1) and (2), the coarse coefficients are calculated in the even cycles of the clock signal and the detail coefficients in the odd cycles. Since the detail coefficients are obtained one cycle after the even positions, an additional delay must be included in their formula; the even values of the detail coefficients are then obtained by the delayed form of eq. (2), for n = 0, 2, 4, … With this approach, only half of the operations are calculated and the throughput of the system is double that of the classical convolution scheme. In addition, all of the hardware modules (adder, multiplier) can operate with integer data if [h_0 h_1] are encoded in an integer binary format.
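A software sketch of this idea follows. It assumes that the coefficients take the usual FIR form, c_1[n] = Σ_k h_0[k]·x[n−k] and d_1[n] = Σ_k h_1[k]·x[n−k] with k = 0, …, M−1, and that only even n is evaluated; the function names, the test data and the unnormalized Haar pair are illustrative assumptions, not the hardware description.

```python
def mac(x, h, n):
    """Multiply-accumulate: sum of h[k] * x[n - k], with x taken as zero outside its range."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))


def dwt_level_alternating(x, h0, h1):
    """Evaluate only the even-indexed outputs, alternating between the two filters."""
    c1, d1 = [], []
    total = len(x) + len(h0) - 1      # length of the full convolution
    for n in range(0, total, 2):
        c1.append(mac(x, h0, n))      # even clock cycle n: low-pass output
        d1.append(mac(x, h1, n))      # odd clock cycle n + 1: same window, one cycle later
    return c1, d1


def full_then_decimate(x, h):
    """Reference: classical scheme, every output computed, odd-indexed ones dropped."""
    total = len(x) + len(h) - 1
    return [mac(x, h, n) for n in range(total)][::2]


x = [0, 1, 2, 3, 4, 5, 6, 7]          # hypothetical input samples
h0, h1 = [1, 1], [1, -1]              # unnormalized Haar pair, illustration only
c1, d1 = dwt_level_alternating(x, h0, h1)
```

Both routes give the same coefficients, but the alternating version calls the multiply-accumulate once per clock cycle and never evaluates an output that would be discarded.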

HARDWARE IMPLEMENTATION
We implemented an 8-bit integer-to-integer efficient convolution scheme of the DWT, with the sym4 wavelet, on a Xilinx FPGA. The dwt block includes the following modules: div_2, bank of registers, coefficients and multiplier/adder. Additionally, the thresholding and encoding processes are added to the compression scheme of the ECG signal. The high-level description is written in VHDL and simulated in ModelSim. Finally, the code is synthesized for a Spartan3E-100 and validated with real ECG signals. The general architecture is illustrated in Figure 3.
Coefficients: according to the value of div_2, the coefficients of the low-pass filter or the high-pass filter are selected. If div_2='1', h_0 is selected; if div_2='0', h_1 is selected. The binary representation of h_0 is presented in Table 1. Similarly, the coefficients of the high-pass filter are encoded with 7 bits.
Multiplier/Adder: this block computes the convolution between x[n] and the impulse response of the FIR filters. If div_2='1', it works as a low-pass filter; if div_2='0', it works as a high-pass filter. Because [h_0 h_1] are stored as unsigned values, equations (1) and (2) are rewritten for the case of sym4.
Thresholding: in the thresholding function, y is the input (c_1 or d_1), th is the threshold and f(y) is the thresholded coefficient. The threshold used in this work is the one proposed in [15].
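The thresholding step can be sketched as follows, assuming a hard threshold that keeps a coefficient only when its magnitude exceeds th and sets it to zero otherwise. The exact rule of [15] is not reproduced here, and the function name and sample values are hypothetical.

```python
def hard_threshold(y, th):
    """Return y if its magnitude is above the threshold, otherwise 0 (assumed rule)."""
    return y if abs(y) > th else 0


coefficients = [3, -12, 0, 25, -7, 10, -11]   # hypothetical wavelet coefficients
th = 10                                        # hypothetical fixed threshold
thresholded = [hard_threshold(y, th) for y in coefficients]
zeros = thresholded.count(0)                   # values the encoder can later merge into runs
```

With these sample values four of the seven coefficients become zero, which is what makes the subsequent run-length stage effective.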
According to the thresholding-encoding scheme presented in [18], three flags are calculated: b1, b2 and b3. Their meaning is presented in Table 2.
Encoding: it is based on the run-length encoding method. Run-length encoding is a lossless method that takes advantage of consecutive repetitions of a specific number [19]. Because the thresholding step sets a large number of coefficients to zero, the run-length scheme represents such data by a zero followed by the number of repetitions. If the coefficient is different from zero, the encoded data is equal to the wavelet coefficient. In our architecture, the outputs of the system are data and row: data is the encoded wavelet coefficient and row is the position within the run-length code. The row is updated according to the value of the flags b1, b2 and b3 (Table 3). Every time that f(y)=0 and b3='1', the counter increases its value; that is, count holds the total number of consecutive zeros. When a new nonzero value appears, the last value of count is written to the run-length code, followed by the new data, f(y).
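A minimal software sketch of this zero-run encoding is given below. It does not model the flags b1, b2, b3 or the data/row outputs of the hardware, and it ignores any limit on the counter width; the function names and sample data are assumptions.

```python
def rle_encode(coeffs):
    """Replace each run of zeros by the pair (0, run length); copy nonzero values."""
    code, count = [], 0
    for c in coeffs:
        if c == 0:
            count += 1                 # still inside a run of zeros
        else:
            if count:
                code += [0, count]     # close the pending run before the new value
                count = 0
            code.append(c)
    if count:
        code += [0, count]             # flush a run that ends the sequence
    return code


def rle_decode(code):
    """Inverse mapping: expand every (0, run length) pair back into zeros."""
    out, i = [], 0
    while i < len(code):
        if code[i] == 0:
            out += [0] * code[i + 1]
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


coeffs = [5, 0, 0, 0, -3, 0, 7, 0, 0]  # hypothetical thresholded coefficients
code = rle_encode(coeffs)
```

Decoding the code returns the original sequence exactly, which is the lossless property used in the text; note that an isolated zero costs two entries, so the gain comes from long runs.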

RESULTS
In this section we present results on the performance of the proposed model. Both the hardware architecture and the compression algorithm are evaluated. First, the work is analyzed in terms of hardware metrics. Second, the CR and PRD are measured.
Performance of the Hardware Architecture: the Xilinx Spartan3E-100 FPGA (BASYS2 board) is programmed with the VHDL code. Additionally, A/D and D/A blocks are connected to the FPGA for the hardware validation of the compression scheme. Four hardware realizations of the DWT have been selected to compare the performance of the algorithm. Two of them correspond to the convolution scheme and the other two to the lifting scheme. The metrics are shown in Table 4.
In Table 4, Scheme indicates whether the design is based on convolution (conv), lifting (lif) or modified convolution (mc); Mode is off-line if the data is processed frame-by-frame or real-time if it is processed sample-by-sample; Base is biorthogonal (Bior) or orthogonal (Orth); Format of the quantized data is fixed-point (F-P) or integer (Int); Error of quantization is the error produced by the quantization of the FIR filter coefficients (measured for a constant input signal); Maximum Delay is the time that the DWT block takes to calculate the detail and approximation coefficients (obtained from the synthesis tool); and Latency is the number of clock cycles needed to obtain the output for a specific input (measured by simulation in ModelSim).
According to Table 4, our design has the lowest quantization error, which is desirable for obtaining a low PRD. In addition, the latency of our work means that the system responds faster than the other works.
Finally, because its maximum delay is lower, the proposed model can work with signals of higher bandwidth (such as speech signals) than the convolution-scheme designs can.
Unlike some works that completely eliminate the detail coefficients, our work keeps the coefficients higher than a fixed threshold. In Figures 6 and 7, we present an example in which an ECG signal from the Fluke PS420 Multiparameter Patient Simulator, configured at 60 beats per minute (bpm), is the input of the system, together with the calculated coarse and detail coefficients. According to Figure 6, the highest amplitude of the coarse coefficients is a quarter of the highest amplitude of the ECG signal, for two reasons: first, the filters [h_0 h_1] were multiplied by 125 but their outputs were divided by 256, which roughly halves the amplitude; second, while the input signal is 8 bits in unsigned format, the wavelet coefficients [c_1 d_1] are in 8-bit signed format (7 bits for the amplitude). Additionally, note that not all the detail coefficients are set to zero. The results are in agreement with the theoretical results. The data are obtained from the hardware results using the Fluke PS420 Multiparameter Patient Simulator with bpm = 60, 90, 120. The average is plotted in each case.

Performance of the Compression
The quality of the compressed signal is measured with the Percent-Root-Mean-Square-Difference (PRD), which is computed from x_i, the original signal from the ECG record, x̃_i, the compressed signal, and L, the length of the signals. In Figure 9, the performance of the compression model in terms of the PRD is presented.
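For reference, a commonly used form of the PRD is written below, with \tilde{x}_i denoting the compressed signal. This is an assumed standard definition; some authors subtract the signal mean or an offset in the denominator, and the normalization used in this work may differ.

```latex
\[
\mathrm{PRD} = 100 \times
\sqrt{\frac{\sum_{i=1}^{L} \left( x_i - \tilde{x}_i \right)^2}
           {\sum_{i=1}^{L} x_i^{2}}}
\]
```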
According to Figures 8 and 9, the compression ratio of the proposed system is up to 8 for a threshold of 10. It could be slightly better if the PRD were allowed to reach the limit of 4%. Nevertheless, it is evident that if the PRD increases, the CR increases too. Because the PRD is not in the same range in all works, a parameter that helps to compare the trade-off between the CR and the PRD is the Quality Score (QS) [25]. It is the ratio of the CR to the PRD, QS = CR/PRD. The higher the QS, the better the trade-off between the CR and the PRD. In Figure 10, the QS for the four levels of decomposition is presented.
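The two metrics can be sketched in Python as follows, assuming the PRD without mean removal and QS taken as CR divided by PRD. The sample signals and the CR and PRD values passed to quality_score are hypothetical and are not measurements from this work.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference, assumed form without mean removal."""
    num = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    den = sum(a ** 2 for a in original)
    return 100.0 * math.sqrt(num / den)


def quality_score(cr, prd_percent):
    """Quality Score taken as the ratio CR / PRD."""
    return cr / prd_percent


original = [100, 120, 140, 120, 100]        # hypothetical samples
reconstructed = [100, 118, 140, 121, 100]   # hypothetical reconstruction
p = prd(original, reconstructed)
qs = quality_score(8.0, 4.0)                # illustrative CR and PRD values only
```

Because QS divides the gain (CR) by the cost (PRD), two systems with different PRD ranges can be ranked on a single number.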
According to Table 5, our system has a better CR than [26], [28], but a lower CR than the others. Nevertheless, our proposal can work in real time without pre-processing or post-processing units. The works that use Huffman encoding are not suitable for sample-by-sample mode, because they need prior knowledge of the data. This is the main difference between our proposal and those in the literature.

CONCLUSIONS
This paper describes a modified convolution scheme that has the same throughput as the lifting scheme, because only the even wavelet coefficients are calculated. The maximum frequency of operation is higher than in the convolution scheme and similar to that of the lifting scheme. Because our architecture does not need external memories, the system works in sample-by-sample mode. The low quantization error helps to keep the quality of the signal, because the experimental values (wavelet coefficients) are close to the theoretical values.
Compared to other compression models, our proposal has similar results in terms of the compression ratio, but the QS could be better. Nevertheless, the PRD satisfies the requirements of clinical applications. This work may be improved with a variable quantization of the wavelet coefficients.

Figure 3. Architecture of the proposed scheme.

Since x[n] is encoded with 8 bits and [h_0 h_1] are encoded with 7 bits, the output of the convolution is represented by 16 bits. The coefficients c_1 and d_1 correspond to the 8 most significant bits of the output (the 8 LSBs are ignored), which means that the output is divided by 256. The circuit of this block is presented in Figure 5.
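The integer arithmetic described above can be sketched as follows. The scale factor 125 is taken from the Results section; the sym4 low-pass values are approximate figures from standard wavelet tables, not the entries of Table 1, and the sketch uses signed integers rather than the unsigned coefficients with separate sign handling used in the hardware. All names are hypothetical.

```python
SCALE = 125   # factor applied to the filter coefficients before rounding

# Approximate sym4 low-pass decomposition coefficients (assumed values)
SYM4_LOW = [-0.0758, -0.0296, 0.4976, 0.8037, 0.2979, -0.0992, -0.0126, 0.0322]


def quantize(coeffs, scale=SCALE):
    """Round the scaled coefficients to integers."""
    return [round(c * scale) for c in coeffs]


def mac_8msb(window, qcoeffs):
    """Integer multiply-accumulate, then drop the 8 LSBs (division by 256)."""
    acc = sum(q * s for q, s in zip(qcoeffs, window))
    return acc >> 8


h0_q = quantize(SYM4_LOW)
window = [100] * 8                  # hypothetical constant 8-bit input
c = mac_8msb(window, h0_q)
```

Every quantized magnitude stays below 128, so it fits in 7 bits, and the right shift by 8 plays the role of keeping only the most significant byte of the accumulator.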

Table 3. Output of the encoding block.

Table 4. Comparison to related works: hardware metrics of the DWT block.

Table 5. Comparison to related works: compression model.