Delay-power efficient VLSI architecture design for robust proportionate adaptive filter

Gangadharaiah Soralamavu Lakshmaiah¹, Chikkajala Krishnappa Narayanappa², Divya Muddenahalli Narasimhaiah³, Munivenkatappa Nagabhushanam⁴, Nuthan Prasad Venkatesh⁵, Bhanu Darshan Srinivas Shobhavathi⁶

¹VTU Research Centre, Department of Electronics and Communication, M. S. Ramaiah Institute of Technology, Bengaluru, India
²Department of Medical Electronics, M. S. Ramaiah Institute of Technology, Bengaluru, India
³VTU Research Centre, Department of Medical Electronics, M. S. Ramaiah Institute of Technology, Bengaluru, India
⁴Department of Electronics and Communication, M. S. Ramaiah Institute of Technology, Bengaluru, India
⁵Department of Medical Electronics, M. S. Ramaiah Institute of Technology, Bengaluru, India
⁶Department of Medical Electronics, M. S. Ramaiah Institute of Technology, Bengaluru, India

ABSTRACT
This paper proposes robust proportionate adaptive filtering algorithms and their efficient very large-scale integration (VLSI) architectures for sparse system identification under impulsive noise. Several types of algorithms are combined to obtain the best results. We present a comparative analysis of these algorithms and map them onto hardware to show that convergence rate and robustness improve with negligible hardware overhead in the VLSI architectures. Good performance and convergence rate are obtained by combining the delayed μ-law proportionate (DMP) and least mean logarithmic square (LMLS) algorithms, i.e., the delayed μ-law proportionate least mean logarithmic square (DMPLMLS) algorithm. The robust proportionate adaptive filter is coded in SystemVerilog and synthesized using the Cadence Genus compiler with a 90 nm technology library.

This is an open access article under the CC BY-SA license.

1. INTRODUCTION
The filtering problem and its solution were elaborated in [1] with the most popular adaptive algorithm, the least mean square (LMS) algorithm, which is extensively used because of its robustness and structural simplicity. Different variants of the LMS algorithm, addressing the various problems associated with it, have been proposed and thoroughly analyzed in the literature, with particular attention given to error non-linearities and to Gaussian and impulsive noise. We first consider the problems associated with error non-linearities. The successor of the LMS algorithm was the least mean fourth (LMF) algorithm, in which the error is raised to the fourth power in the cost function [2]. It performs very well compared with the LMS algorithm in non-Gaussian environments [3] with a low signal-to-noise ratio (SNR), but suffers from stability issues in Gaussian environments [4]. An approach combining the LMS and LMF algorithms, the least mean mixed norm (LMMN) algorithm, was therefore proposed in [5] to obtain the best of both, but it was later found to be difficult to implement in practical applications.

On the other hand, a new approach inspired by competitive methods and based on logarithmic cost functions was proposed in [6]: the least mean logarithmic square (LMLS) algorithm, which eliminates the problem faced by the LMMN algorithm (the combination of the LMS and LMF algorithms) by providing good convergence rate and stability, but lacks robustness in impulsive environments. Another algorithm proposed in [6], least logarithmic absolute difference (LLAD), shows good convergence and outperforms the sign-LMS algorithm (SA) [7] in impulsive environments, but has stability issues in Gaussian environments. The SA [7], in turn, provides good convergence in impulsive environments but has a poor steady-state response in Gaussian environments. A quantized approach from kernel adaptive filtering [8], quantized kernel LMS (QK-LMS), compresses the area by using redundant data to update the coefficient of the closest center; it provides an excellent improvement in convergence rate in impulsive environments as well as a reduction in area.

To narrow the gap between the problems associated with error non-linearities, a delayed approach was proposed in [9], in which the LMLS algorithm was deliberately delayed in line with the constraints of [10] to improve stability, giving the delayed LMLS (DLMLS) algorithm; a new variable “α” was introduced in [9] to obtain this improvement. The same approach was then applied to the LLAD algorithm to obtain the delayed LLAD (DLLAD) algorithm, which outperforms previous algorithms in impulsive noise environments and can be implemented using the retiming technique [10]. The algorithms discussed above provide a new way of implementing VLSI architectures for sparse system identification [11] in practical filter applications. We now discuss the literature on an alternative approach to sparse system identification [11] in communication applications such as network echo cancellation (NEC) [12], underwater acoustic communication (UAC) [13], and HD TV terrestrial transmission (HDTTT) [14].

With the advent of the above applications [12]-[15], proportionate-type algorithms [16] have become an active research topic. These algorithms [16] are based on the normalized LMS (NLMS) algorithm, which has a comparatively better convergence rate than the usual LMS algorithm and can be sped up by using a time-varying step size [17]. The proportionate-type LMS (PT-LMS) algorithms are more widely used than the previously proposed proportionate normalized LMS (PNLMS) algorithm because of their lower computational complexity [18]. The PNLMS algorithm [18] uses a proportionate gain matrix that updates each filter coefficient at every step in proportion to its magnitude; thus it outperforms both the LMS [1] and NLMS [17] algorithms in impulsive environments. However, it starts diverging when the impulse response is dispersive. The performance of both the PT-LMS [16] and PNLMS [18] algorithms was analyzed in terms of first- and second-order convergence in [19], which proposes the proportionate LMS (PLMS) algorithm, a variant of [18] with improved performance and convergence rate.

To overcome the drawback of PNLMS [18], several algorithms were introduced to make it more robust against time-varying sparsity [20], [21]. To achieve rapid overall convergence of the coefficients to their true values, a law-based approach was proposed, introducing μ-law PNLMS (MPNLMS) in [22] and its variant that uses a small proximity value “ε”, ε-law PNLMS (EPNLMS), in [23]. Both MPNLMS [22] and EPNLMS [23] reduce the gain update for larger coefficients, thus obtaining faster convergence. However, both algorithms start diverging for correlated inputs such as speech. Hence, wavelet-domain MPNLMS (WMPNLMS) is used for such inputs [24]. The delayed approach [9], with the constraints of [10] and the retiming approach [15], was applied to MPNLMS [22] and WMPNLMS [24] on the basis of the PLMS structure [19]. This provides a significant improvement in convergence for sparse systems and in hardware complexity, and yields the delayed MPLMS (DMPLMS) and delayed WMPNLMS (DWMPLMS) algorithms shown in [25]. An algorithm based on the maximum correntropy criterion (MCC) was also proposed in [26]; it was a good choice under impulsive environments in terms of cost, but it was outperformed by its successor, improved MCC (IP-MCC), in [27].

The next set of algorithms is derived from the above-mentioned algorithms, namely LLAD [6], QK-LMS [8], IP-MCC [27], and SLMS [7]. The delayed concept [9] with its constraints [10] and the retiming approach [15] are combined with the MPNLMS [22] algorithm to derive delayed μ-law proportionate LLAD (DMPLLAD), delayed μ-law proportionate QK-LMS (DMPQKLMS), delayed μ-law proportionate MCC (DMPMCC), and delayed μ-law proportionate sign-LMS (DMPSLMS) [28], which provide a good convergence rate and exhibit robustness against impulsive noise. The contributions of this paper are: (i) an efficient algorithm with good performance and convergence rate, obtained by combining the delayed μ-law proportionate (DMP) and least mean logarithmic square (LMLS) algorithms, i.e., delayed μ-law proportionate least mean logarithmic square (DMPLMLS); and (ii) very large-scale integration (VLSI) architectures of all the robust proportionate algorithms and their application-specific integrated circuit (ASIC) synthesis results in 90 nm complementary metal-oxide semiconductor (CMOS) technology, used to evaluate the area overhead of the robustness enhancement.

2. **RESEARCH METHOD**

Figure 1 shows the problem of identifying an unknown sparse system. The unknown system \(W_{opt}\) takes the input regressor \(u(n)\), with \(n\) as the time index, and produces the desired response \(d(n)\) in the presence of observation noise \(v(n)\).

![Figure 1. Adaptive filter](image)

\[d(n) = W_{opt}^T u(n) + v(n)\]

where \(u(n) = [u(n), u(n-1), \ldots, u(n-L+1)]^T\), \(v(n)\) is the noise with variance \(\sigma^2\), and \(L\) is the order of the filter. The adaptive filtering algorithm estimates the unknown system \(W_{opt}\) by updating the filter coefficients as

\[w(n) = [w_0, w_1, w_2, \ldots, w_{L-1}]^T, \quad w(n+1) = w(n) + \mu u(n) F(e(n))\]

where \(\mu\) is the adaptation step size, \(F(e(n))\) is the error cost function, \(e(n) = d(n) - y(n)\) is the error, and \(y(n) = w^T(n) u(n)\) is the filter output [29].
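The update above can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the authors' hardware implementation: the function name `identify`, the white Gaussian input, the noise level, and the default `F` (the identity, which reduces the update to plain LMS) are all assumptions made for the example.

```python
import random

def identify(w_opt, mu, n_iter, F=lambda e: e, noise_std=0.1, seed=0):
    """Estimate w_opt with w(n+1) = w(n) + mu * u(n) * F(e(n)).

    F is the error function; the identity default gives plain LMS.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    L = len(w_opt)
    w = [0.0] * L          # adaptive weights w(n), initialized to zero
    u = [0.0] * L          # regressor [u(n), u(n-1), ..., u(n-L+1)]
    for _ in range(n_iter):
        u = [rng.gauss(0.0, 1.0)] + u[:-1]       # shift in a new input sample
        v = rng.gauss(0.0, noise_std)            # observation noise v(n)
        d = sum(a * b for a, b in zip(w_opt, u)) + v   # desired response d(n)
        y = sum(a * b for a, b in zip(w, u))           # filter output y(n)
        e = d - y                                      # error e(n)
        w = [wi + mu * ui * F(e) for wi, ui in zip(w, u)]
    return w

if __name__ == "__main__":
    sparse_system = [0.0, 0.0, 1.0, 0.0]         # hypothetical sparse w_opt
    print(identify(sparse_system, mu=0.01, n_iter=5000))
```

Passing a different `F` changes only the error non-linearity, which is the part that distinguishes the algorithms discussed in this paper.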

The impulsive noise comprises two independent signals: ordinary noise with zero mean and small variance (0.01), and impulsive noise that is Gaussian with large variance (\(10^4\)), whose occurrence is controlled by a Bernoulli random process [24]. The probability of an impulse occurring at each iteration is set to 5 percent. To test the efficiency of adaptive filtering algorithms, the normalized mean square deviation (MSD), which measures how close the filter weights are to the corresponding optimal weights \(W_{opt}\) at time index \(n\), is commonly used as a metric.
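A noise source of this kind can be sketched as a Bernoulli-gated Gaussian mixture. The function name `impulsive_noise` and its parameter names are assumptions for illustration; the impulse variance is left as a required argument so that the caller supplies the value used in a given experiment.

```python
import random

def impulsive_noise(n, var_imp, p=0.05, var_bg=0.01, seed=0):
    """Return n samples of background noise plus Bernoulli-gated impulses.

    var_bg  : variance of the zero-mean background noise
    var_imp : variance of the zero-mean Gaussian impulses
    p       : probability that an impulse occurs at a given iteration
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = rng.gauss(0.0, var_bg ** 0.5)          # ordinary noise, always present
        if rng.random() < p:                       # Bernoulli trial for an impulse
            v += rng.gauss(0.0, var_imp ** 0.5)    # add the large-variance impulse
        samples.append(v)
    return samples
```

Because the two components are drawn independently, the background noise statistics are unchanged on iterations where no impulse occurs.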

The normalized MSD is given by

\[
\text{Normalized MSD}(n) = 10 \log_{10} \left( \frac{E\left(\|W_{opt} - w(n)\|_2^2\right)}{\|W_{opt}\|_2^2} \right)
\]
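As a worked illustration of the metric, the expectation can be approximated by averaging over independent trials. The sketch below assumes the caller has collected the weight vector at the same time index from several runs; the function name `normalized_msd_db` is hypothetical.

```python
import math

def normalized_msd_db(w_opt, w_runs):
    """Normalized MSD in dB at one time index.

    w_opt  : optimal weight vector
    w_runs : list of estimated weight vectors, one per independent trial
             (their average stands in for the expectation E(.))
    """
    sq_dev = [sum((a - b) ** 2 for a, b in zip(w_opt, w)) for w in w_runs]
    mean_sq_dev = sum(sq_dev) / len(sq_dev)        # estimate of E(||w_opt - w||^2)
    energy = sum(a * a for a in w_opt)             # ||w_opt||^2
    return 10.0 * math.log10(mean_sq_dev / energy)
```

For example, two trials that each miss a unit-norm system by 0.1 in a single coefficient give a mean squared deviation of 0.01, i.e. a normalized MSD of about -20 dB.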

3. **ARCHITECTURE**

3.1. **Proposed robust proportionate algorithms**

This class of algorithms is obtained by combining two or more algorithms; hence we consider different types of algorithms, each combined with the delayed \(\mu\)-law algorithm to obtain the best result: least logarithmic absolute difference (LLAD) [6], quantized kernel LMS (QKLMS) [8], sign LMS [7], and improved maximum correntropy criterion [27], all of which perform well in impulsive environments. Algorithm 1 is obtained by combining the error computing process of each algorithm with the delayed \(\mu\)-law algorithm; the error function for QKLMS [8] is shown in (1) [28].

Algorithm 1. DMPLMS algorithm with $F(e(n))$

Initialization: $w_i(0) = 0, \; 0 \leq i \leq L - 1$

Parameters: $\alpha, \mu, \sigma$

Update:

- $e(n) = d(n) - w^T(n)u(n)$
- $y_i(n) = \log_2 \left( 1 + \left| w_i(n) + \rho \right|^2 \right)$
- $g_i(n) = \frac{y_i(n)}{\sum_{j=0}^{L-1} y_j(n)}, \; 0 \leq i \leq L - 1$
- $G(n) = \text{diag}(g_0(n), g_1(n), \ldots, g_{L-1}(n))$
- $w(n+1) = w(n) + \mu G(n) u(n) F(e(n))$

$$F(e(n)) = \frac{ae^3(n)}{(1 + b|e^3(n)|)}$$

(1)

Equation (1) was obtained in [28] by incorporating QKLMS [8] into the delayed $\mu$-law algorithm [25]. Under impulsive noise, when $e(n)$ becomes large the error function approaches 0, which gives good robustness. When $e(n)$ is small the error function is approximately equal to $e(n)$, which gives a good convergence rate. The same applies to the other algorithms shown in [28].
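The gain computation and weight update of Algorithm 1 can be sketched as follows. This is a floating-point illustration only, without the adaptation delay or pipelining of the hardware design. The function names, the default value of `rho`, and the inclusion of the regressor `u` in the update (by analogy with the filter update in Section 2) are assumptions; the error function `F` is passed in by the caller.

```python
import math

def proportionate_gains(w, rho=0.01):
    """Per-coefficient gains g_i = y_i / sum_j(y_j), y_i = log2(1 + |w_i + rho|^2).

    rho is assumed to be a small positive constant; with rho = 0 and all-zero
    weights the normalizing sum would be zero.
    """
    y = [math.log2(1.0 + abs(wi + rho) ** 2) for wi in w]
    total = sum(y)
    return [yi / total for yi in y]

def proportionate_update(w, u, e, mu, F, rho=0.01):
    """One step of w(n+1) = w(n) + mu * G(n) * u(n) * F(e(n)) with diagonal G(n)."""
    g = proportionate_gains(w, rho)
    fe = F(e)                                   # scalar error function output
    return [wi + mu * gi * ui * fe for wi, gi, ui in zip(w, g, u)]
```

Because the gains are normalized to sum to one, coefficients with larger magnitude receive a larger share of each update, which is the intended behavior for sparse systems.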

Figure 2 shows that DMPQKLMS and DMPMCC are robust under impulsive environments, as reported in [28], where these two algorithms showed a 3 dB improvement. The algorithms were also evaluated here with parameters different from those in [28], giving an overall improvement of 12 dB with respect to DMPLLAD [28]. The results were computed with $L = 64$, sparsity $S_m = 0.9$, step size $\mu = 0.0078125$, and cost function parameters $[\alpha_{qklms}, \alpha_{mcc}] = [0.75, 1]$, which were adjusted to obtain the best results.

Figure 2. Performance of DMPSLMS/DMPLLAD/DMPMCC/DMPQKLMS

The previous results show that combining algorithms produces better steady-state and convergence behavior under impulsive environments. Hence, to obtain a further improvement together with a good convergence rate, this paper proposes DMPLMLS, the delayed $\mu$-law algorithm combined with least mean logarithmic square (LMLS) [6], which improves on the convergence rate and steady-state performance of DMPMCC and DMPQKLMS [28].

The delayed $\mu$-law proportionate least mean logarithmic square (DMPLMLS) algorithm is obtained by using the $F(e(n))$ shown in (2) in Algorithm 1.

$$F(e(n)) = \frac{ae^3(n)}{(1 + b|e^3(n)|)}$$

(2)

From (2) it can be observed that when impulsive noise occurs and $e(n)$ becomes large, $\alpha$ scales the value such that $F(e(n))$ is almost zero, providing robustness against impulsive noise. When $e(n)$ is small, the update proceeds normally, giving the good convergence rate and steady-state performance shown in Figure 3.

Figure 3 shows the performance of DMPMCC, DMPQKLMS, and DMPLMLS; DMPLMLS has almost the same convergence rate and becomes stable as the number of iterations increases. Here $L = 64$, sparsity $S_m = 0.9$, step size $\mu = 0.0078125$, the input is zero mean with unit variance (SNR = 30 dB), and the cost function parameters $[\alpha_{qklms}, \alpha_{mcc}, \alpha_{lmls}] = [0.75, 1, 16]$ are adjusted to give the same convergence. DMPLMLS has good steady-state performance and convergence rate; it shows an improvement of 8 dB and significantly outperforms the other two algorithms. Figure 4 shows the error computing block used in the LMLS algorithm, which is derived using the logarithmic number system.

4. RESULTS AND DISCUSSION

The basic design used to realize the derived algorithms is shown in Figure 5(a), and a detailed view of one tap coefficient block is shown in Figure 5(b) [29]. The architecture is based on the logarithmic number system for VLSI calculations; it is integrated into the F(e(n)) block of the LLAD, QKLMS, MCC, and LMLS architectures to realize them in hardware. All derived algorithms are implemented in SystemVerilog and synthesized using Cadence Genus with a 90 nm standard cell library. A 16-bit fixed-point representation is used for all designs. Filters of order 32 and 64 are considered. The reported metrics are: data arrival time (DAT), the timing of the critical path in the circuit; adaptation delay (AD), the number of delay cycles; area-delay product (ADP), the product of area and DAT; the total delay in the circuit (D); and energy per sample (EPS), the product of power and DAT. The synthesis results are listed in Table 1. The results obtained from simulations with white Gaussian input at full clock frequency show how stable the derived algorithms are in impulsive environments.

Table 1. Synthesis results of derived algorithms using 90 nm CMOS process

<table>
<thead>
<tr>
<th>DESIGN</th>
<th>FILTER LENGTH</th>
<th>DELAY (ns)</th>
<th>DAT (ns)</th>
<th>FREQ (MHz)</th>
<th>AD (cycles)</th>
<th>AREA (µm²)</th>
<th>ADP (µm² · ns)</th>
<th>POWER (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DLMS [9]</td>
<td>32</td>
<td>2.37</td>
<td>5.31</td>
<td>189</td>
<td>5</td>
<td>9010</td>
<td>21353.7</td>
<td>0.091</td>
</tr>
<tr>
<td></td>
<td>64</td>
<td>2.68</td>
<td>5.31</td>
<td>189</td>
<td>5</td>
<td>19918</td>
<td>53380.24</td>
<td>0.182</td>
</tr>
<tr>
<td>DMPLLAD</td>
<td>32</td>
<td>2.37</td>
<td>5.31</td>
<td>189</td>
<td>6</td>
<td>8920</td>
<td>21140.4</td>
<td>0.046</td>
</tr>
<tr>
<td>DMPQKLMS [28]</td>
<td>32</td>
<td>53.17</td>
<td>5.31</td>
<td>189</td>
<td>6</td>
<td>113641</td>
<td>6042291.9</td>
<td>3.41</td>
</tr>
<tr>
<td></td>
<td>64</td>
<td>45.03</td>
<td>5.31</td>
<td>189</td>
<td>6</td>
<td>128786</td>
<td>5799233.5</td>
<td>3.78</td>
</tr>
<tr>
<td>DMPMCC [28]</td>
<td>32</td>
<td>54.01</td>
<td>5.31</td>
<td>189</td>
<td>7</td>
<td>29347</td>
<td>1555031.4</td>
<td>1.13</td>
</tr>
<tr>
<td></td>
<td>64</td>
<td>130.85</td>
<td>5.31</td>
<td>189</td>
<td>7</td>
<td>93718</td>
<td>12263000.3</td>
<td>8.08</td>
</tr>
<tr>
<td>Proposed method [DMPLMLS]</td>
<td>32</td>
<td>26.82</td>
<td>5.31</td>
<td>189</td>
<td>7</td>
<td>42072</td>
<td>1128371.04</td>
<td>0.82</td>
</tr>
<tr>
<td></td>
<td>64</td>
<td>50.76</td>
<td>5.31</td>
<td>189</td>
<td>7</td>
<td>123218</td>
<td>6254545.68</td>
<td>3.44</td>
</tr>
</tbody>
</table>

DAT: Data arrival time, AD: Adaption delay
ADP: Area delay product, EPS: Energy per sample
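The arithmetic behind the comparisons can be checked with a short sketch. It rests on assumptions about the column layout: the values 123218, 128786, and 93718 are read as area in µm², and the last column is read as power in mW, as the discussion of the results treats it. For the rows used here, the tabulated ADP is reproduced by multiplying area by the DELAY column. The helper names are hypothetical.

```python
def area_delay_product(area_um2, delay_ns):
    """ADP as the product of area and delay."""
    return area_um2 * delay_ns

def relative_reduction(reference, value):
    """Fractional reduction of value with respect to reference."""
    return (reference - value) / reference

# 64-tap rows copied from Table 1 (column meaning assumed as described above).
dmplmls = {"delay": 50.76, "area": 123218, "power": 3.44}
dmpqklms = {"delay": 45.03, "area": 128786, "power": 3.78}
dmpmcc = {"delay": 130.85, "area": 93718, "power": 8.08}

if __name__ == "__main__":
    print(area_delay_product(dmplmls["area"], dmplmls["delay"]))
    print(relative_reduction(dmpqklms["power"], dmplmls["power"]))
    print(relative_reduction(dmpmcc["power"], dmplmls["power"]))
    print(relative_reduction(dmpmcc["delay"], dmplmls["delay"]))
```

With these inputs the power reductions round to 9% and 57%, and the delay reduction with respect to DMPMCC rounds to 61%.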

Figure 5. Basic architecture used to realize the derived algorithm: (a) DLMS architecture and (b) Tap architecture

Table 1 shows the performance parameters obtained from synthesis of the derived algorithms. The proposed DMPLMLS algorithm has approximately 3% less area than DMPQKLMS and 23% more area than DMPMCC.

A power reduction of 9% and 57% is obtained with respect to the DMPQKLMS and DMPMCC algorithms respectively, which is an improvement over current work on VLSI architectures. The adaptation delay (AD) of DMPMCC and DMPLMLS is one stage higher than that of DMPLLAD and DMPQKLMS, because extra pipeline stages are needed in the F(e(n)) block to meet the timing requirement. The derived algorithms are synthesized for filter lengths of 32 and 64. DMPLMLS has a delay improvement of 61% with respect to DMPMCC and a delay increase of 11% with respect to DMPQKLMS.

The area-delay product (ADP) of DMPLMLS is 49% lower than that of DMPMCC and 7% higher than that of DMPQKLMS, which is a low cost for the robustness enhancement achieved. From this we can conclude that DMPLMLS has an improved convergence rate of 1.25x when compared with DMPQKLMS and DMPMCC. From the simulation and synthesis results, it is evident that the DMPLMLS architecture is a better VLSI solution for sparse system identification under impulsive noise environments.

5. CONCLUSION

The results show that incorporating error non-linearity algorithms into the proportionate algorithms provides significantly improved performance in impulsive noise environments, with negligible area overhead. In this paper, we derived robust algorithms and presented a comparative analysis of them in terms of convergence, hardware overhead, and performance. We conclude that, to the best of our knowledge, the DMPLMLS algorithm is a better VLSI solution for sparse system identification under impulsive noise environments.

REFERENCES

BIOGRAPHIES OF AUTHORS

Gangadharaiyah Soralamavu Lakshmaiah obtained his M. Tech in Digital Electronics and Advanced Communication from KREC, Surathkal. Currently, he is pursuing Ph.D. in the area of VLSI signal processing. Presently he is working as Assistant Professor in the Department of Electronics & Communication Engineering, M. S. Ramaiah Institute of Technology, Bengaluru. His areas of interest are Analog VLSI, Digital VLSI, VLSI Signal Processing and Machine Learning. He can be contacted at email: gdhar@msrit.edu.

Chikkajala Krishnappa Narayanappa received his Ph.D from VTU, Belagavi. He is currently working as Associate Professor in the Department of Medical Electronics, M. S. Ramaiah Institute of Technology, Bengaluru. His research interests include signal and image processing and control systems. He is a member of ISTE, IETE and BMESI. He is also a fellow of The Institution of Engineers (India). He can be contacted at email: c_k_narayanappa@msrit.edu.

Divya Muddenahalli Narasimhaiah obtained M. Tech from VTU Belgaum in 2007. Presently she is working as Assistant Professor in the school of Electronics & Communication Engineering, Reva University Bengaluru. Her areas of interest are Aerospace Electronics, Signal Processing and Machine Learning. She can be contacted at email: draophd@gmail.com.

Munivenkatappa Nagabhushanam obtained his Ph.D from Anna University. He is presently working as Assistant Professor in the Dept of E&C, M. S. Ramaiah Institute of Technology. He is a recognized reviewer for a few reputed journals and has published 22 research papers in national and international journals. His areas of interest are mixed signal design and image processing. He can be contacted at email: nagabushanam1971@gmail.com.

Nuthan Prasad Venkatesh received his M. Tech in Digital Electronics and Communication from VTU, Belgaum. He is presently working as Assistant Professor in Department of Electronics and Communication Engineering, M. S. Ramaiah Institute of Technology, Bangalore. His research interests are Antenna Design, Wearable and Textile Antenna. He can be contacted at email: nutan2u@gmail.com.

Bhanu Darshan Srinivas Shobhavathi completed his M. Tech at Ramaiah Institute of Technology. He is currently working with Intel India Pvt Ltd as a digital design engineer in the memory unit on the core side. His areas of research are adaptive filtering, approximate adders and low power VLSI. He can be contacted at email: darshudarshan001@gmail.com.