Design of an image codec based on the Bandelet transform using a NIOS II processor

This paper presents the design and implementation of a compression system for grayscale images based on the Bandelet transform. The basis functions of the Bandelet transform are constructed as a set of vectors that indicate the directions in which the image has regular variations of gray levels. The compression system was designed as a SoPC composed of a NIOS II processor on a Cyclone II EP2C70 FPGA, a touch panel, and an SD card, using 13% of the logic elements and 27% of the memory bits of the FPGA. The Wavelet filters were accelerated in hardware with the NIOS II C2H Compiler, obtaining an execution time reduction of 8.8%. Experimental results show that Bandelet compression offers an improvement of up to 2 dB over Wavelet compression when the image has geometric components with high contrast.


INTRODUCTION
Basis functions used in JPEG and JPEG2000 (Cosine and Wavelet respectively) do not take advantage of the geometric regularities of images, i.e. image regions with regular variations of gray levels [1]. The integration of these regularities in the representation of two-dimensional signals has several advantages in image processing applications, especially those related to compression and denoising; this integration is achieved by the Bandelet transform [1][2][3][4]. Bandelet bases are defined using image segmentation and geometric-flow vectors in order to find the best geometric approximation model of the image. The purpose of this transform is to reduce the geometric redundancy of 2D Wavelet coefficients by applying a reordering, followed by a 1D Discrete Wavelet Transform (DWT) [3].
The Bandelet transform has been used in the improvement of the JPEG2000 standard [5], satellite image compression [6][7], and image denoising [8][9], among other applications. These works have focused on software platforms, and only one hardware implementation, for video scaling in television formats using FPGAs, is found in the literature [10]. However, no related information about its design is found in the academic databases. Given the potential of the Bandelet transform in image compression, more studies are needed that analyze the design and implementation in hardware platforms of image codecs (encoder-decoder) based on this transform, taking advantage of the parallel processing capabilities of FPGAs.
This document presents the design of an image compression codec in which the image frequency coefficients are obtained with the Bandelet transform, using a hardware/software implementation based on the Bandelet algorithm in [3]. Due to the high computational load involved in image processing, the design is aimed at a System-on-Programmable-Chip (SoPC) implementation, with an Altera NIOS II embedded configurable processor as its main functional block.
This paper is organized as follows. The next section introduces the Bandelet transform and the theoretical framework that allows its use in an image compression system; it also reviews some of the previous related works. The third section presents the design of the image codec based on the Bandelet transform using the Bandelet approximation algorithm. The fourth section focuses on the performance evaluation of the Bandelet codec against a codec based on the 2D Wavelet transform. Decompressed images are evaluated using the PSNR and the number of nonzero coefficients, among other metrics. Finally, the concluding remarks are presented in the Conclusions section.

Bandelet orthonormal bases
The representation of an image with Wavelet bases generates a redundancy of geometrical information around edges and irregular textures. This translates into the presence of high-magnitude coefficients at the singularities of the image, as shown in Figure 1 [11].
The objective of the Bandelet transform is to remove this redundancy by applying a 1D DWT in the local directions where 2D DWT coefficients have regular variations of gray levels. If the 1D DWT is applied as parallel as possible to the real geometry of an image section, then it is possible to perform a thresholding of the corresponding 1D DWT coefficients, reducing the number of nonzero coefficients without loss of image quality.
The set of Bandelet orthonormal bases is defined by segmenting the array of 2D DWT coefficients into squares of various sizes that are subsequently processed with the 1D DWT. The segmentation is performed in a dyadic fashion, successively dividing the array into four squares S of equal size L [11] (called sub-squares). Then, the direction d that best fits the local geometry of the sub-square is sought (there are up to 2L² possible directions in a square of size L). The criterion used to select the best direction d is the minimization of the Lagrangian. A complete explanation of the Bandelet approximation algorithm for the implementation of the Bandelet transform can be found in [3].
For every image, several sets of bases can be constructed using different segmentations and applying the 1D DWT in each square in different directions. The optimal set of Bandelet orthonormal bases will be the one that best represents the real geometry of the image.

Previous works
Previous works with the Bandelet transform have been primarily aimed at software platforms, without considering an efficient implementation of the Bandelet algorithm.
In [5], the Bandelet transform is incorporated in the JasPer software, which implements Part 1 of the JPEG2000 standard, in order to retain more detail information of images. However, the paper only focuses on the improvement of the quality of the image.
The performance of a Wavelet image compression system is improved in [6] by post-processing the Wavelet coefficients with the Bandelet transform. Due to the low computational complexity of the proposed system, the authors recommended it for satellite image compression applications, but the design and implementation of the system are left for future work.
In [7], the Bandelet transform with an adaptive quadtree partition is used for the compression of SAR (Synthetic Aperture Radar) images. In this work, the Bandelet coefficients are encoded using the EBCOT (Embedded Block Coding with Optimal Truncation) coding algorithm, obtaining a better quality than the one obtained with JPEG2000 or with the Bandelet transform alone.
Other authors have taken advantage of the geometrical properties of the Bandelet transform to improve the performance of image denoising systems [8][9]. These works report that the Bandelet transform outperforms the Wavelet and Contourlet transforms in denoising applications. However, the authors of these works only explain the denoising algorithm and not the implementation details.
Among the few hardware implementations of the Bandelet transform found in the literature is [10]. In this paper, the Bandelet bases are used for the upconversion of video in NTSC and PAL formats to HD formats. The Bandelet bases are adapted to the time geometry of the movements by following time displacements. This technique helps to calculate the missing pixels of HD frames without oscillatory artifacts. The design is implemented in an Altera Cyclone FPGA, but the document does not present major experimental results or implementation details.
From the analysis of the previous papers, it is clear that several researchers have seen the capabilities of the Bandelet transform in image processing, particularly in compression and denoising. However, there is not enough research on efficient implementations of this transform using reconfigurable hardware.

DESIGN OF THE CODEC BASED ON THE BANDELET TRANSFORM
The Bandelet codec for grayscale images was designed based on the Bandelet approximation algorithm [3]. This codec includes both the Bandelet image compressor and decompressor. It was written in C targeting a NIOS II processor; some of its software functions were later accelerated with the Altera NIOS II C2H Compiler.
The image encoder is composed of three main blocks: 2D Wavelet transform, image segmentation, and extraction of points (Figure 2). The system outputs are the Bandelet coefficients, which can later be encoded with any of the techniques defined by the JPEG or JPEG2000 standards (RLE, Huffman, arithmetic encoding, etc.).

Figure 2 (block labels): 2D Wavelet transform; image segmentation; for each square of the segmentation: projection of points.

2D Wavelet transform
The 2D Wavelet transform is the entry point of the encoder and generates the 2D Wavelet coefficients in the horizontal, vertical, and diagonal orientations for each scale of decomposition. The filters used in the 2D DWT (as well as in the 1D DWT of the extraction of points block) are the odd-length QMF (Quadrature Mirror Filter) Le Gall 5/3 filters; these filters are recommended by the JPEG2000 standard for lossless compression. The numbers indicate the size of the low-pass and high-pass filters (5 and 3 taps respectively) used in the decomposition of the signal [12].
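As an illustration of why the 5/3 pair suits lossless compression, the following is a minimal sketch of one decomposition level in the integer lifting form commonly associated with the reversible JPEG2000 path. It is not necessarily the form used in this codec, which is described in terms of filter routines; the function names and the restriction to even-length signals are assumptions made for the example.

```c
#include <assert.h>

/* Floor of a / b for b > 0 (C's '/' truncates toward zero instead). */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

/* One level of the integer 5/3 lifting transform on n samples (n even).
 * s receives n/2 low-pass samples, d receives n/2 high-pass samples. */
void lift53_forward(const int *x, int n, int *s, int *d)
{
    assert(n >= 2 && n % 2 == 0);
    int half = n / 2;
    /* Predict: each odd sample minus the floored mean of its even neighbours. */
    for (int i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2]; /* mirror at the end */
        d[i] = x[2 * i + 1] - floor_div(x[2 * i] + right, 2);
    }
    /* Update: each even sample plus a rounded quarter of the adjacent details. */
    for (int i = 0; i < half; i++) {
        int left = (i > 0) ? d[i - 1] : d[0]; /* mirror at the start */
        s[i] = x[2 * i] + floor_div(left + d[i] + 2, 4);
    }
}

/* Inverse of lift53_forward: undoes the update step, then the predict step. */
void lift53_inverse(const int *s, const int *d, int n, int *x)
{
    assert(n >= 2 && n % 2 == 0);
    int half = n / 2;
    for (int i = 0; i < half; i++) {
        int left = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(left + d[i] + 2, 4);
    }
    for (int i = 0; i < half; i++) {
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[n - 2];
        x[2 * i + 1] = d[i] + floor_div(x[2 * i] + right, 2);
    }
}
```

Because every lifting step is undone exactly by its counterpart, the reconstruction is bit-exact for integer inputs, which is the property that makes this filter pair attractive for lossless coding.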

Acceleration of filters with NIOS II C2H compiler
The software routines of the Wavelet transform filters were accelerated in hardware using the Altera NIOS II C2H Compiler [13]. They were selected because they are the most frequently called functions in the codec. The NIOS II C2H Compiler generates custom hardware accelerators directly from their software description in C, using the available resources of the FPGA.
To obtain the maximum acceleration, the software routines have to be rewritten to meet the software-hardware mapping requirements of the NIOS II C2H Compiler. This includes avoiding data dependencies, data cache coherency problems, and excessive pointer dereferences.

Processing of signal borders in the Wavelet transform
In order to achieve an exact reconstruction of the image and a non-expansive Wavelet transform, the rows and columns of the image (which are finite signals) were properly treated at the borders during the implementation of the 2D and 1D DWT [14]. The input signal of the filters (one row or one column) was extended symmetrically at both ends, without repeating the first and last samples, by an amount equal to the filter length minus one. The down-sampling of the two filter outputs was performed in opposite phases: for the low-pass filter output only the odd-numbered elements were kept, while for the high-pass filter output the even-numbered elements were kept.
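The border treatment and the opposite-phase down-sampling can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' routine: the tap values are a common normalization of the Le Gall 5/3 analysis pair, the extension is applied on the fly through a hypothetical index helper `mirror_index` instead of building an extended buffer, and "odd-numbered" is read as 1-based numbering (0-based even positions).

```c
#include <assert.h>

/* Le Gall 5/3 analysis taps (one common normalization, assumed here). */
static const double LOW53[5]  = { -0.125, 0.25, 0.75, 0.25, -0.125 };
static const double HIGH53[3] = { -0.5, 1.0, -0.5 };

/* Symmetric extension that does not repeat the first and last samples:
 * ... x2 x1 | x0 x1 ... x(n-1) | x(n-2) x(n-3) ... */
int mirror_index(int i, int n)
{
    if (n == 1) return 0;
    int period = 2 * (n - 1);
    i %= period;
    if (i < 0) i += period;
    return (i < n) ? i : period - i;
}

/* One analysis level on n samples (n even): n/2 low-pass and n/2 high-pass
 * outputs, so the transform is non-expansive. */
void dwt53_analysis(const double *x, int n, double *low, double *high)
{
    assert(n >= 2 && n % 2 == 0);
    for (int k = 0; k < n / 2; k++) {
        double lo = 0.0, hi = 0.0;
        /* Low-pass output centred on 0-based position 2k. */
        for (int j = -2; j <= 2; j++)
            lo += LOW53[j + 2] * x[mirror_index(2 * k + j, n)];
        /* High-pass output centred on 0-based position 2k + 1. */
        for (int j = -1; j <= 1; j++)
            hi += HIGH53[j + 1] * x[mirror_index(2 * k + 1 + j, n)];
        low[k] = lo;
        high[k] = hi;
    }
}
```

On a constant signal the high-pass outputs are zero and the low-pass outputs reproduce the constant, which is a quick sanity check for both the taps and the mirroring.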

Image segmentation
According to [3], the two-dimensional array of 2D Wavelet coefficients must be segmented using an optimal dyadic configuration. The number and size of the segmentation squares are selected with the evaluation of the Lagrangian. To simplify the design and execution of the encoder, the segmentation was performed with fixed-size squares. This size is denoted inside the algorithm as w and can be modified by the user at the beginning of the encoder execution.
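A fixed-size segmentation reduces to simple index arithmetic. The sketch below assumes a square, row-major coefficient array whose side is a multiple of w; the helper names `extract_square` and `count_squares` are hypothetical and only illustrate the idea.

```c
#include <assert.h>

/* Copy the w-by-w square whose top-left corner is (row0, col0) out of a
 * row-major array that has `stride` elements per row. */
void extract_square(const double *coef, int stride, int row0, int col0,
                    int w, double *square)
{
    assert(w > 0 && row0 >= 0 && col0 >= 0 && col0 + w <= stride);
    for (int r = 0; r < w; r++)
        for (int c = 0; c < w; c++)
            square[r * w + c] = coef[(row0 + r) * stride + (col0 + c)];
}

/* Number of w-by-w squares that tile an n-by-n array of coefficients. */
int count_squares(int n, int w)
{
    assert(w > 0 && n % w == 0);
    return (n / w) * (n / w);
}
```

An encoder loop would step row0 and col0 in increments of w and hand each extracted square to the extraction of points block.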

Extraction of points
Each square of w² elements is processed by the extraction of points block, which determines its best geometric direction d and returns the corresponding Bandelet coefficients. This block is composed of three sub-blocks: projection of points, 1D Wavelet transform, and Lagrangian calculation.
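The selection of the best direction can be sketched as an arg-min over candidate coefficient sets. The cost used below is an assumption made for illustration: it takes the form distortion plus lambda times T² times a rate term, with the rate approximated by the number of coefficients that survive the threshold. The algorithm in [3] should be consulted for the actual Lagrangian; `lambda`, `lagrangian_cost` and `best_direction` are hypothetical names.

```c
#include <assert.h>

/* Illustrative cost for one candidate direction:
 *   distortion = energy of the coefficients that thresholding would zero,
 *   rate proxy = number of coefficients that survive (an assumption). */
double lagrangian_cost(const double *c, int n, double T, double lambda)
{
    double distortion = 0.0;
    int kept = 0;
    for (int i = 0; i < n; i++) {
        double a = (c[i] < 0.0) ? -c[i] : c[i];
        if (a < T)
            distortion += c[i] * c[i];
        else
            kept++;
    }
    return distortion + lambda * T * T * (double)kept;
}

/* cands holds num_dirs arrays of n 1D DWT coefficients stored back to back,
 * one per candidate direction. Returns the index with the smallest cost. */
int best_direction(const double *cands, int num_dirs, int n,
                   double T, double lambda)
{
    assert(num_dirs > 0 && n > 0);
    int best = 0;
    double best_cost = lagrangian_cost(cands, n, T, lambda);
    for (int d = 1; d < num_dirs; d++) {
        double cost = lagrangian_cost(cands + (long)d * n, n, T, lambda);
        if (cost < best_cost) {
            best_cost = cost;
            best = d;
        }
    }
    return best;
}
```

With this kind of cost, a direction that concentrates the energy of the square in a few coefficients wins over one that spreads it across many, which is the behaviour the transform relies on.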
The projection of points sub-block projects the pixels of the square orthogonally onto each possible direction d and rearranges them in a one-dimensional array f_d. These rearrangements are obtained by rotating the vertical midline of the square and then orthogonally projecting the center of each pixel onto this line, following the method presented in [3] (Figure 3). Every signal f_d is processed with the 1D DWT and, for each output signal f_w, the Lagrangian is calculated. The coefficients of the signal f_w with the minimum Lagrangian are thresholded with the user-defined threshold T. The resulting coefficients are the Bandelet coefficients of the image.
Figure 3. Projection of points in a 4x4 square (-0.5258 rad). The resulting ordering is:

Bandelet decoder
The Bandelet decoding process is faster than the encoding process, as it does not need to calculate the Lagrangian for every possible direction of each square. In the decoding process, the set of Bandelet coefficients of each square passes through the 1D Inverse DWT, which reconstructs the pixels of the square arranged in a one-dimensional array.
The pixels are subsequently reordered in the two-dimensional space of the square according to the best direction d found in the encoding process.
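The reordering in the decoder is the inverse of the permutation applied in the encoder. A minimal sketch, assuming the ordering for the chosen direction is available as an index array `order` (a hypothetical representation in which `order[k]` is the row-major position of the k-th projected pixel):

```c
#include <assert.h>

/* Encoder side: read the square along the chosen direction. */
void gather_square(const double *square, const int *order, int w, double *f)
{
    assert(w > 0);
    for (int k = 0; k < w * w; k++)
        f[k] = square[order[k]];
}

/* Decoder side: put each reconstructed sample back at its 2D position. */
void scatter_square(const double *f, const int *order, int w, double *square)
{
    assert(w > 0);
    for (int k = 0; k < w * w; k++)
        square[order[k]] = f[k];
}
```

Since the decoder only needs the direction chosen by the encoder to rebuild `order`, no search over directions is required at this stage.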
The arrangement of the pixels of all squares of the image produces the reconstructed 2D Wavelet coefficients, which are processed with the 2D Inverse DWT to generate the decompressed image. Both the 1D and 2D Inverse DWT use the Le Gall 5/3 reconstruction filters and apply the same border treatment for finite signals.

Hardware platform
The platform used for the implementation of the Bandelet codec is presented in Figure 4. At the end of the process, the touch panel displays several images associated with the Bandelet algorithm, such as the original image, the 2D Wavelet coefficients, the Bandelet coefficients, the 2D Wavelet coefficients reconstructed from the Bandelet coefficients, and the image reconstructed with the Bandelet decoder.

PERFORMANCE EVALUATION OF THE CODEC
The performance of the Bandelet codec was evaluated for various compression ratios using grayscale images of 128x128 pixels (16,384 bytes). Images with various geometric components were selected so that the operation of the codec could be observed on different types of images.
Each image was processed with the Bandelet codec using a threshold T as input parameter. The image decoded with this codec was then compared with the image obtained from a Wavelet codec. In the latter case, the original image was passed through the 2D DWT and the resulting coefficients were thresholded according to the user's threshold, but without applying the Bandelet processing to them.
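The thresholding shared by both codecs can be sketched as a hard threshold that also reports the number of nonzero coefficients. Whether a coefficient whose magnitude equals T is kept is an assumption of this sketch, as is the function name `hard_threshold`.

```c
#include <assert.h>

/* Zero every coefficient whose magnitude is below T, in place.
 * Returns the number of coefficients that remain nonzero. */
int hard_threshold(double *c, int n, double T)
{
    assert(n >= 0 && T >= 0.0);
    int nonzero = 0;
    for (int i = 0; i < n; i++) {
        double a = (c[i] < 0.0) ? -c[i] : c[i];
        if (a < T)
            c[i] = 0.0;
        else if (c[i] != 0.0)
            nonzero++;
    }
    return nonzero;
}
```

Applying the same function to the 2D Wavelet coefficients and to the Bandelet coefficients makes the two codecs comparable at a given threshold T.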
The PSNR (Peak Signal-to-Noise Ratio) was used as the quality metric. The number of scales in the Wavelet transform was three, with a fixed segmentation into squares of size w = 8. This size was selected after performing several tests with images, as it offers a compromise between performance and algorithm execution time: with a 4x4 square it is not possible to clearly identify the local geometry of the image, and with a 16x16 square the algorithm takes too long to determine the best geometric direction d, because the maximum number of possible directions grows quadratically with the side length of the square.

Analysis of image compression
Figure 5 presents the compression of a section of Barb at 0.26 bpp. In the decompressed images from both codecs, Barb's face is blurred. However, her mouth can be better appreciated in the image reconstructed from the 2D Wavelet coefficients than in the Bandelet case.
As for Barb's garment, the line pattern is still distinguishable in the image from the Bandelet decoder, whereas in the Wavelet case this pattern has almost completely vanished. Instead, a pattern of diamonds has become dominant, making it no longer possible to determine the original direction of the garment lines.
Figure 6 shows the Bandelet and the 2D Wavelet coefficients of Figure 5. The 2D DWT coefficients reconstructed from the Bandelet coefficients have more details, and therefore more information, than the thresholded 2D Wavelet coefficients. This enables the Bandelet codec to produce a better decompressed image, which translates into a higher PSNR for the image compressed with Bandelets than for the one compressed with Wavelets.
Figure 7 presents the PSNR for the image of Figure 5 at various compression rates. For all the rates, Bandelet processing produces a higher PSNR than 2D Wavelet processing; the maximum difference is approximately 2 dB for the same compression ratio.
The superiority of Bandelet processing over 2D Wavelet processing is largely due to the presence of significant geometric components in the analyzed image (such as the garment that covers Barb's head). This allows the capabilities of the Bandelet transform to be fully exploited.
Figure 8 presents the compression of another section of Barb. Some of the geometric elements of this image are the straight lines of the books and the bookcase, and the pattern of lines in the tablecloth, which is similar to the garment on Barb's head. These elements do not have enough contrast to differentiate them from the surrounding objects. Therefore, their geometry is not completely distinguishable by the Bandelet transform (the bookcase, for example, has almost the same color as the floor).
Because of this, the images decoded at 0.11 bpp with the Bandelet transform and the 2D Wavelet transform are very similar. The same trend holds for other compression rates, as shown in Figure 9.
Similar compression results were obtained with Lena. Figure 10 shows a section of this image compressed at 0.19 bpp. In this case, the Wavelet decoder has a slight advantage over the Bandelet decoder (22.12 dB for the former compared to 21.89 dB for the latter).
This advantage is not significant when the graph in Figure 11, which shows the PSNR of this image for various compression rates, is analyzed.
Figure 12 presents another example of an image that is best compressed with the Bandelet transform; in this case, the area of interest is the center of a fingerprint [15]. The fingerprint compressed with Bandelets keeps more ridge characteristics than the one compressed with Wavelets, allowing for a better identification of the image in a security system. Figure 13 presents the calculated PSNR for this image for various compression rates, with a clear superiority of the Bandelet transform over the 2D Wavelet transform.

Analysis of the execution times of the encoder
The execution times of the Bandelet encoder were measured using the Altera timestamp timer for the image fingerprint.bmp of Figure 12. Two scenarios were considered for the analysis: the encoder implemented only in C routines, and the encoder with the Le Gall 5/3 filters accelerated in hardware with the NIOS II C2H Compiler. The complete timing analysis is presented in Table 1, where the extraction of points time corresponds only to the diagonal orientation of the upper scale and the Direct Bandelet Transform (DBT) time refers to the complete compression process.
The time reduction percentage in the DBT was calculated for various data and instruction cache sizes in the NIOS II processor. It was found that the maximum time reduction percentage (8.8%) is obtained with a 1 kB data cache and a 16 kB instruction cache, with the input and output data vectors of the DWT stored in the FPGA on-chip memory and the filter coefficients defined as constants in the code. However, the minimum DBT execution time (129.43 s) is obtained when a 32 kB instruction cache is used. The size of the data cache has a significant impact on the performance of the DBT, since every time the system calls the Wavelet filter accelerators, the processor flushes the data cache and writes it to memory to avoid data cache coherency problems.

CONCLUSIONS
This paper has presented a codec for grayscale images based on the Bandelet transform, using a fixed segmentation into squares of 8x8 pixels and the JPEG2000 Le Gall 5/3 filters. The system was implemented as a hardware/software co-design in an Altera Cyclone II FPGA using an Altera NIOS II processor. The execution time of the Bandelet codec was reduced by 8.8% through the hardware acceleration of its Wavelet filters using the NIOS II C2H Compiler. This reduction was also influenced by the size of the data and instruction cache memories of the NIOS II processor. It was shown that if the target image has highly distinguishable geometric components, the Bandelet codec can offer an improvement of up to 2 dB compared to a codec based solely on the 2D Wavelet transform for the same compression ratio. The performance of the Bandelet codec was mainly limited by the fixed segmentation.
Future work will aim to improve the performance of the codec by implementing an adaptive dyadic segmentation of the 2D Wavelet coefficients. An adaptive segmentation could take more advantage of the geometrical regularities of the image, obtaining a higher compression ratio. Since the best geometrical direction must be found independently for each square of the segmentation, the throughput of the codec could be improved if several groups of squares are processed in parallel. In this sense, future work will also aim to implement the Bandelet algorithm completely in hardware using hardware description languages. The objective is to have a Bandelet codec as a co-processor implemented in an FPGA that can be easily integrated into an image compression system.

Figure 1. Presence of high-magnitude coefficients in the 2D DWT of a section of Lena.

Figure 2. Block diagram of the image encoder based on the Bandelet transform.

Figure 7. PSNR vs. bpp for the section of Barb analyzed in Figure 5.

Figure 13. PSNR vs. bpp for fingerprint.bmp.

Table 1. Time reduction percentage introduced to the Bandelet encoder by the acceleration of the Wavelet filters (all units in seconds). Image: fingerprint.bmp, threshold = 0.20. Configuration: 1 kB data cache, data vectors in on-chip memory, filter coefficients as constants.