

Journal of Advances in Computer Research Quarterly pISSN: 2345-606x eISSN: 2345-6078 Sari Branch, Islamic Azad University, Sari, I.R.Iran (Vol. 8, No. 4, November 2017), Pages: 87-94 www.jacr.jausari.ac.ir



# Efficient Reverse Converter for Three Moduli Set ${2^n - 1, 2^{n+1} - 1, 2^n}$ in Multi-Part RNS

Shiva Taghipour Eivazi<sup>⊠</sup>

Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran

taghipour@iaut.ac.ir

Received: 2016/10/17; Accepted: 2017/01/07

## Abstract

Residue Number System is a numerical system which arithmetic operations are performed parallelly. One of the main factors that affect the system's performance is the complexity of reverse converter. It should be noted that the complexity of this part should not affect the earned speed of parallelly performed arithmetic unit. Therefore in this paper a high speed converter for moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  is proposed which is based on Two-Part RNS and Chinese Reminder Theorem. Using this method has increased the speed of reverse converter. To have an accurate comparison both unit gate model and synthesized silicon tools are used and their parameters are compared in terms of delay and area. Converters are implemented in hardware description language and correctness for various n values is verified by simulation and execution on Cadence. As the results show, the proposed circuit has lower delay by around 21% in comparison to previous presented converter.

Keywords: Chinese Remainder Theorem (CRT), Computer Arithmetic, Parallel Processing, Residue Number System (RNS), R/B Converter, VLSI Architectures

# 1. Introduction

Residue number system (RNS) is a numerical system which is able to perform the arithmetic operations parallelly on remainders of the selected moduli set. Because the residues are smaller than the original numbers and due to the carry free nature of RNS, arithmetic operations are performed fast [1], [2]; therefore this system is suitable for applications that perform arithmetic operations such as addition, subtraction and multiplication widely on a special boundary such as Digital Signal Processing (DSP) [3], image processing [4], [5] and digital filters [6], [7].

To use RNS, numbers should be converted into the residue representation following the arithmetic operations. Then the resulting numbers are converted to the desired system.

By the above mentioned description, the converter's operations are the most overloaded computation of RNS that will lead to increase of time complexity. Fortunately forward converter could be implemented simply by using modular adders [8]. But backward converter is the most complex step which burdens a high time complexity to the system. To decrease the overload of reverse converter using multipart RNS is suggested in this paper.

## 2. Multi-Part RNS

Consider number (A) in binary representation, be represented by  $\{a_{m-1}, a_{m-2}, ..., a_0\}$ .

To convert this number to multi-part RNS, the original number is partitioned into some parts. Each of which requires a corresponding and independent moduli sets. For example the number below is partitioned into two parts.

$$\{\underbrace{a_{m-1}, a_{m-2}, ..., a_n}_{part(a)m-nbit}, \underbrace{\{a_{n-1}, a_{n-2}, ..., a_0\}}_{part(b)nbit}\}$$

The following steps are performed for each part:

The forward converter is implemented parallelly.

The arithmetic operations are performed on each part's moduli set. Note that the output carry of each part should be carried out to the next.

The backward converter of each part is defined independently.

Due to the second part, it is sufficient to implement a two-part RNS, leading limited carry propagation among separated parts, as it is mentioned in [9].

Consider the moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  [10]. The dynamic range of this moduli set in usual RNS is calculated as follows:

$$M = \prod_{i=1}^{3} m_i = (2^n - 1) \times (2^{n+1} - 1) \times 2^n = 2^{3n+1} - 2^{2n+1} - 2^{2n} + 2^n$$
(1)

To implement the two-part RNS, it is adequate to keep the n lowest value bits in one part via moduli  $\{2^n\}$  and convert the other 2n+1 most value bits to RNS via moduli set  $\{2^n - 1, 2^{n+1} - 1\}$  for the next part. Let  $0 \le X < M$  be with binary representation. By applying two-part RNS for moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  we have:

$$X = \underbrace{\underbrace{X_{3(n)} X_{3(n-1)} \dots X_{n+1} X_n}_{part(a)} \underbrace{X_{n-1} X_{n-2} \dots X_1 X_0}_{part(b)}}_{\{2^n - 1, 2^{2n+1} - 1\}} \underbrace{X_{n-1} X_{n-2} \dots X_1 X_0}_{\{2^n\}}}_{\{2^n\}}$$

Then the numbers of modulis are reduced to two in part(a), which leads less complexity of converters [8].

Example 1:

Consider weighted number 620 that is  $(1001101100)_2$  in binary form and given moduli set is  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  where n=3, therefore modulis are  $\{7, 15\}$  for part(a) and  $\{8\}$  for

part(b). 620 is represented in two-part RNS as 
$$\left| \underbrace{1001101}_{part(a)=77} \underbrace{100}_{part(b)=4} \atop module \ set \{7,15\}} \underbrace{100}_{module \ set \{8\}} \right|_{2} = \{0_{mod 7}, 2_{mod 15}\} |\{4_{mod 8}\}.$$

Moreover, multi part system increases the dynamic range which can be calculated as follows:

$$M(a) = (2^{n} - 1) \times (2^{n+1} - 1) = 2^{2n+1} - 2^{n} + 1 = \underbrace{11...1}_{n-1} \underbrace{0100...0}_{n-1} 1$$
(2)

As mentioned before, this number creates the high order 2n+1 bit. The lowest value n bits comes from the second part. Therefore final dynamic range is calculated as follows:

$$M_{total} = \underbrace{(2^{2n+1} - 2^n + 1)}_{part(a)} \times 2^n + \underbrace{2^n}_{part(b)}$$
(3)

Thus it increases dynamic range by  $2^n$ .

## 3. Suggested Reverse Converter

For two-part RNS, two backward converters are required to form original number as follow:

A backward converter for moduli set  $\{2^n - 1, 2^{n+1} - 1\}$  which forms the most 2n+1 bits value.

A backward converter for the moduli set  $\{2^n\}$  to form the n low bits.

# 3.1 Reverse converter for part (a) with moduli set $\{2^n - 1, 2^{n+1} - 1\}$ :

The moduli in first part reduce to  $\{2^n - 1, 2^{n+1} - 1\}$  where  $\langle x_1, x_2 \rangle$  stand the reminder of each moduli respectively.

Both Chinese remainder theorem (CRT) [8] and mixed-radix conversion (MRC) [11] are algorithms that are used to implement a converter. In this paper CRT is used as (4).

$$\boldsymbol{X} = \left\langle \sum_{i=1}^{n} (\boldsymbol{X}_{i} \times \boldsymbol{N}_{i})_{m_{i}} \times \boldsymbol{M}_{i} \right\rangle_{M}$$
(4)

Where:

$$M = \frac{\prod_{i=1}^{n} m_i}{GCD(m_i, m_j), i \neq j}$$
(5)

$$M_i = \frac{M}{m_i} \tag{6}$$

$$\boldsymbol{N}_i = \langle \boldsymbol{M}_i^{-1} \rangle_{\boldsymbol{m}_i} \tag{7}$$

Due to the number of moduli in first part, a two channel CRT should be used [12].M is equal to dynamic range of part(a) that is calculated in (2). As (6),  $M_i$  for each modulus is equal to:

$$M_{1} = \frac{(2^{n} - 1) \times (2^{n+1} - 1)}{2^{n} - 1} = 2^{n+1} - 1$$
(8)

$$M_2 = \frac{(2^n - 1) \times (2^{n+1} - 1)}{2^{n+1} - 1} = 2^n - 1$$
(9)

The multiplicative inverse for each moduli is as (10) and (11).

$$N_1 = \langle M_1^{-1} \rangle_{2^{n+1}-1} = 1 \tag{10}$$

$$N_2 = \langle M_2^{-1} \rangle_{2^n - 1} = 2^{n+1} - 3 \tag{11}$$

Therefore X for moduli set  $\{2^n - 1, 2^{n+1} - 1\}$  is calculated as follow:

$$X(a) = \left\langle ((x_1 \times N_1)_{2^{n-1}} \times (2^{n+1} - 1)) + ((x_2 \times N_2)_{2^{n+1} - 1} \times (2^n - 1)) \right\rangle_{M(a)}$$
(12)

$$X(a) = \left\langle (x_1 \times (2^{n+1} - 1)) + ((x_2 \times 2^{n+1} - 3)_{2^{n+1} - 1} \times (2^n - 1)) \right\rangle_{M(a)}$$
(13)

Where  $\langle (x_2 \times 2^{n+1} - 3)_{2^{n+1}-1} \rangle_{M(a)}$  is equal to  $|(\overline{x_{2(n-1)}} \overline{x_{2(n-2)}} \dots \overline{x_{2(1)}} \overline{x_{2(n)}})|_{M(a)}$ . By replacing (13) we have:

$$X(a) = \left\langle (x_1 \times (2^{n+1} - 1)) + (\overline{x_{2(n-1)}} \overline{x_{2(n-2)}} \dots \overline{x_{2(1)}} \overline{x_{2(n)}}) \times (2^n - 1)) \right\rangle_{M(a)}$$
(14)

$$X(a) = \left\langle (x_1 \times 2^{n+1}) - x_1 + (\overline{x_{2(n-1)}} \overline{x_{2(n-2)}} \dots \overline{x_{2(1)}} \overline{x_{2(n)}} \times 2^n) - \overline{x_{2(n-1)}} \overline{x_{2(n-2)}} \dots \overline{x_{2(1)}} \overline{x_{2(n)}} \right\rangle_{M(a)} (15)$$

By using (15) implementation of reverse converter is performed as follow. Note that two's complement is applied to perform the subtraction.

$$A = \underbrace{X_{1(n-1)}X_{1(n-2)}...X_{1(0)}}_{n} \underbrace{X_{2(n-1)}X_{2(n-2)}...X_{2(0)}X_{2(n)}}_{n+1}$$
(16)

$$B = \underbrace{\overline{X_{2(n-1)}} \overline{X_{2(n-2)}} \dots \overline{X_{2(0)}} \overline{X_{2(n)}}}_{n+1} \underbrace{\overline{X_{1(n-1)}} \overline{X_{1(n-2)}} \dots \overline{X_{1(0)}}}_{n}$$
(17)

$$C = \underbrace{11...1}_{n-1} \underbrace{0100...0}_{n-2} 10 \tag{18}$$

As seen A, B, C are with 2n+1 bits. The calculations are as following:

$$X(a) = \left| A + B + C \right|_{M(a)} \tag{19}$$

The below steps demonstrate the new residue to binary converter algorithms.

Calculate X(a) = |A + B + C|

If  $(X(a) < M_a)$  stop. (The result obtained)

Otherwise  $(X(a) - M_a)$  is as output.

# **3.2** Reverse converter for part(b) with modulus set $\{2^n\}$ :

 $x_3$  is the reminder of moduli  $\{2^n\}$  in part(b). As known, to implement the reverse converter for moduli  $\{2^n\}$  no hardware implementation is required and choosing the n lowest order bits of  $x_3$  is sufficient [12].

Finally, the resulted X for two-part RNS via moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  can be obtained from (20).

$$X = \underbrace{X(a)}_{\substack{part(a)\\(2n+1)bit}} \times 2^{n} + \underbrace{X(b)}_{\substack{part(b)\\(n)bit}}$$
(20)

## 4. Hardware Implementation

Fig1 illustrates the hardware implementation of the proposed residue to binary converter in two-part RNS. Due to (16-18), A, B and C are generated. As the first part of above algorithm (A+B+C) should be calculated which is first done by the 2n+1 bit Carry Save Adder (1) then the resulted 2n+1 bit carry and sum, should be added by carry propagate adder (1).

To implement the modular adder in part(a) as third step of algorithm, the subtraction is implemented by 2's complement adder, therefore 2n+1 bit carry save adder (CSA 2) is applied to add the resulted carry1 and sum1 with two's complement of M(a) which is named as D in Fig1 and it is equal to 00...01011...1.

Then the resulted 2n+1 bit carry2 and sum2 should be added that is done by carry propagate adder (2). The output carry of CPA (2) determines which result is valid for the part (a).

Finally the n low bits from part (b), form the final result's n lowest bits is generated.



Figure 1. Proposed reverse converter for moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$ 

Example 2:

Consider n=3, given residues are  $\langle x_1, x_2 \rangle | x_3 = \langle 3_{mod7}, 4_{mod15} \rangle | 5_{mod8}$  respectively. As (16-18) we have:

A = 0111000, B = 0111100, C = 1101010

Due to Fig1 *carry*1=111000 and *sum*1=1101110. Therefore *R*1=1011110. By adding two's complement of M(a) R2 is generated that is  $R2 = \bigcup_{outputCarry} 1110110$ . Due to the output carry, R1 is chosen in this example. Therefore the final X is formed as X = (1110110101)

R1 is chosen in this example. Therefore the final X is formed as  $X = (1110110101)_2$ . Where  $(101)_2$  is  $x_3$ .

Example 3:

Again consider n = 3. The given residues  $are < x_1, x_2 > |x_3| = <1_{mod7}, 0_{mod15} > |2_{mod8}$ . Therefore:

A = 0010000, B = 1111110, C = 1101010

Due to Fig1  $carry_1 = 1110100$  and  $sum_1 = 0000100$ . We have  $R_1 = 1111000$ .

By adding two's complement of M(a) R2 is generated that is R2 = 1 0001111. Due

to the output carry value, R2 is chosen as output. Finally X is formed as  $X = (0001111010)_2$ . Where  $(010)_2$  is  $x_3$ .

## 5. Comparison

Recently presented reverse converters for moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$  have been introduced in [13-15]. Note that in this paper a FA with constant input ("0" or "1") is considered as HA. Due to table 1, in comparison to the listed reverse converters, the proposed circuit in this paper has lower delay.

| Converter | INV  | MUX  | ROM       | HA   | FA     | Delay                                                                       |
|-----------|------|------|-----------|------|--------|-----------------------------------------------------------------------------|
| Proposed  | 2n+1 | 2n+1 | -         | 4n+4 | 4n     | $t_{not}+t_{mux}+3_{tHA}+(2n)t_{FA}$                                        |
| [15]      | 4n+1 | -    | -         | 2n+1 | 4n+1   | (6n+5)t <sub>FA</sub>                                                       |
| [14]      | 3n+2 | n+1  | -         | 2n+1 | 5n+3   | $2t_{not}+t_{mux}+(n+1)t_{HA}+(3n+4)t_{FA}$                                 |
| [13]C-I   | 5n+4 | -    | -         | n    | 4n+3   | 4t <sub>not</sub> +nt <sub>HA</sub> +(5n+5)t <sub>FA</sub>                  |
| [13]C-II  | 5n+2 | 2n+1 | -         | 2n+3 | 14n+21 | t <sub>not</sub> +t <sub>mux</sub> +2t <sub>HA</sub> +(2n+4)t <sub>FA</sub> |
| [13]C-III | 5n+2 | 2n+1 | 10×(2n+1) | 2n+2 | 12n+19 | $t_{not}+t_{mux}+2t_{HA}+(2n+4)t_{FA}$                                      |

Table 1. Performance Comparison

To provide a fair comparison the proposed circuit is compared with the previous implementations by considering unit gate model which is used in [16, 17]. The results are reported in Table 2.

| Converters | Unit Gate Area | Unit Gate Delay | Time Complexity                  |
|------------|----------------|-----------------|----------------------------------|
| Proposed   | 52n+20         | 8n+9            | 416 n <sup>2</sup> +628 n+ 180   |
| [15]       | 40n+12         | 24n+10          | 960 n <sup>2</sup> +1488 n+ 360  |
| [14]       | 49n+30         | 14n+22          | 686 n <sup>2</sup> +1498 n+ 660  |
| [13]C-I    | 37n+25         | 22n+24          | 814 n <sup>2</sup> +1438 n+ 600  |
| [13]C-II   | 123n+167       | 8n+25           | 984 n <sup>2</sup> +4411 n+ 4175 |
| [13]C-III  | 103n+146       | 8n+23           | 824 n <sup>2</sup> +3204 n+ 3537 |

Table 2. Total Unit Gate Area and Delay of the Reverse Converters based on unit gate model

As it is shown in table2, the proposed converter in [14] is the best design among previous methods and the suggested two-part method in this paper is better than [14] by 1.64 times. Therefore the proposed design and the converter in [14] are implemented and the correctness of two converters is checked by running on the corresponding VHDL codes. Furthermore, we have used these codes to synthesize (Design vision) and then Cadence (SOC encounter) using a target library based on TSMC 0.18  $\mu$ m. The working frequency is 155 MHZ where voltage supply is considered 1.8V.

| Table 5. Synthesizes results |           |           |           |                        |  |  |
|------------------------------|-----------|-----------|-----------|------------------------|--|--|
| Moduli set                   | Converter | Area(µm2) | Delay(ns) | Min working period(ns) |  |  |
| {3,7,4}                      | Proposed  | 580*580   | 1.16      | 2                      |  |  |
|                              | [14]      | 550*520   | 1.32      | 5                      |  |  |
| {15,31,16}                   | Proposed  | 700*700   | 1.23      | 2.5                    |  |  |
|                              | [14]      | 640*610   | 1.40      | 6                      |  |  |
| {31,63,32}                   | Proposed  | 760*760   | 1.25      | 2.5                    |  |  |
|                              | [14]      | 670*670   | 1.41      | 6                      |  |  |
| {255,511,256}                | Proposed  | 940*940   | 1.51      | 2.5                    |  |  |
|                              | [14]      | 820*790   | 1.90      | 6.5                    |  |  |

| Table 3. | <b>Synthesizes</b> | results |
|----------|--------------------|---------|
|----------|--------------------|---------|

The performance evaluation has been done in terms of both area and delay (with Minimum working period). Table 3 has presented the earned results. As shown in the table the proposed design is the fastest design for any moduli set

The reason of this improvement is related to using multi-part method that facilitates the implementation of the proposed converter and also the simple architecture of proposed converter that is mainly consists of adders which offers lower Area and Delay metric in comparison previous reverse converters.

The final layout of proposed converter for n=8 is shown in Figure 2.



Figure 2. Chip layout of proposed backward converter for n=8.

## 6. Conclusion

In this paper a high speed low complexity reverse converter have been proposed for moduli set  $\{2^n - 1, 2^{n+1} - 1, 2^n\}$ . The novel converter has been designed in two-part structure and in parallel which results in increased conversion speed. To have an accurate comparison both unit gate model and simulation are used. As the results of unit gate illustrates the proposed method indicates an improvement of about 1.68 time complexity. Also, the novel design is compared accurately with the related state of the art design by running the corresponding VHDL codes with cadences. As the results indicate, for any n, the proposed converter is improved in delay point because of using multi-part approach.

#### References

- A. Hiasat, "A Reverse Converter and sign detectors for an Extended RNS Five-Moduli set," IEEE Transactions on Circuits and Systems I, Regular Papers, 2017.
- [2] Sh. Taghipour, M. Hosseinzadeh, and O. Mirmotahhari, "Fully Parallel Comparator for the moduli set {2<sup>n</sup>, 2<sup>n</sup> -1, 2<sup>n</sup> +1}," IEICE Electronic Express, 2011.
- [3] G. Cardarilli, A. Nannarelli, and M. Re, "Residue number system for low-power DSP applications," in Proc. 41st IEEE Asilomar Conference Signals, System, Computer, 2007.
- [4] W. Wei, M.N.s. Swamy, and M. O. Ahamd, "RNS application for digital image processing," in Proc. 4<sup>th</sup> IEEE International Workshop System on Chip Real Time Applications, 2004.
- [5] A. Ammar, A. AlKabbany, M. Youssef, and A. Emam, "A secure image coding using residue number systems," In Processing, 18<sup>th</sup> national Radio Science Conference, 2001.
- [6] R. Conway, and J. Nelson, "Improved RNS FIR filter architectures," IEEE Transactions on Circuits Systems II, Express Briefs, 2004.
- [7] V. Singh, "Improved state-space criterion for global asymptotic stability of fixed-point statespace digital filters with saturation arithmetic," The Arabian Journal for Science and Engineering, 2007.
- [8] K. Navi, A. Mollahosseini, and M. Esmaeildoust, "How to Teach Residue Number System to Computer Scientists and Engineers," IEEE Transaction on Education, 2011.
- [9] B. Kazemzadeh, and Sh. TaghipourEivazi, "High Speed Comparator by using two-Part RNS," International Journal of Computer Science and Information Security, 2016.
- [10] Sh. TaghipourEivazi, M. Hosseinzadeh, and A. HabibiZadnavin, "Efficient RNS Converter via Two-Part RNS," Journal of Circuits, Systems, and Computers, 2015.
- [11] B. Cao, C. Chang, and T. Srikanthan, "Adder based residue to binary converters for a new balanced 4-moduli set," Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003.
- [12] M. Jameii, Sh. Taghipoureivazi, and M. azad, "Using both Binary and Residue Representations for Achieving Fast Converters in RNS," Journal of Advances in Computer Research, 2011.
- [13] P.V.A Mohan, "RNS-to-binary converter for a new three-moduli set  $(2^{n+1}-1,2^n,2^n-1)$ ," IEEE Transactions on Circuits and Systems II, 2007.
- [14] S. LIN, M. SHEU, and C. Hsiang, "Efficient VLSI Design of Residue-to-Binary Converter for the Moduli set (2<sup>n</sup>, 2<sup>n+1</sup> - 1, 2<sup>n</sup> - 1)," IEICE Transactions on Information System, 2008.
- [15] Y. Chingkuo, M. Siao, Ch. Huang, and T. Chen, "New Reverse Converter Design of Moduli Set  $\{2^n, 2^{n+1}-1, 2^n-1\}$ ," Second International Conference on Innovations in Bio-inspired Computing and Applications, 2011.
- [16] M. R. Noorimehr, M. Hosseinzadeh, and R. Farshidi, "High Speed Residue to Binary Converter for the New Four-Moduli Set  $\{2^{2n}, 2^n + 1, 2^{n/2} + 1, 2^{n/2} 1\}$ ," Arabian Journal of Science and Engineering, 2014.
- [17] M. R. Noorimehr, M. Hosseinzadeh, and K. Navi, "Efficient reverse converters for 4-moduli sets  $\{2^{2n-1}-1,2^n,2^n+1,2^n-1\}$  and  $\{2^{2n-1}-1,2^{2n-1},2^n+1,2^n-1\}$  based on CRTs algorithm," Circuits Systems and Signal Processing, 2014.