# Design of Area Efficient Vedic Multiplier using Parallel Prefix Adder

Dr Vidyasaraswathi H N, Trisha Maddanna, Vaishnavi U, Vidhushi Agrawal and Saurabh Singh

Bangalore Institute of Technology, Bengaluru-560004

Abstract:- The main objective of this paper is to present the design and implementation of an area-efficient 16x16 Vedic multiplier. Vedic mathematics is a mathematical system rooted in ancient Indian techniques. It simplifies arithmetic operations and enables quicker and easier problem-solving. The goal here is to implement a Vedic multiplier based on the Urdhva Tiryagbhyam sutra and to maximize the speed of multiplication by using a Parallel Prefix Adder (PPA), namely the Ladner-Fischer adder. The PPA is one of the fast adder architectures that can be used in data-path applications to minimize the overall delay of addition. Integrating these algorithms produces a novel multiplier design that optimizes speed while minimizing area and power consumption. The design meets the demands of modern high-speed computing and digital signal processing applications. The performance metrics of the proposed Vedic multiplier are evaluated and compared with those of existing Vedic multipliers and traditional multipliers. The design was implemented in Verilog HDL and verified using Xilinx 14.7 and Cadence tools to ensure the accuracy of the results.

Keywords: Vedic Multiplier, PPA-Parallel Prefix Adder, HDL-Hardware Description Language.

## 1. Introduction

Multiplication is an important and fundamental operation in many arithmetic computations. Multiplication-based applications include Multiply and Accumulate (MAC) units, Complex Intensive Arithmetic Functions (CIAF) [1], and various Digital Signal Processing (DSP) tasks such as convolution, the Fast Fourier Transform (FFT), and filtering. They also include many functions in microprocessors and microcontrollers.
However, multiplier circuits consume considerable processor time and can degrade overall processor performance. Because multiplication plays a vital role in so many applications, it is essential to select a multiplier that performs well in terms of area, power, and delay. At the same time, high-speed multipliers are required to maintain high throughput in arithmetic computations. Vedic mathematics is an ancient system of Indian mathematics based on 16 formulae (sutras). These formulae help reduce carry propagation from the LSB to the MSB when the partial products and their sums are evaluated. In this work, a 16x16-bit Vedic multiplier is constructed using the Urdhva Tiryagbhyam sutra together with a parallel prefix adder. Its efficiency is analyzed by comparing the carry propagation of the individual adders. Parallel prefix adders are widely used today because of their fast computation. In particular, the Ladner-Fischer parallel prefix adder is more efficient than a standard adder [2]. In a conventional ripple-carry adder, each bit addition must wait for the carry from the previous bit. The Ladner-Fischer adder instead computes the carries of all bit positions in parallel through a tree of prefix operations. This gate-level restructuring shortens the carry-propagation path and improves overall performance. The use of portable devices in everyday life is growing rapidly, and these devices demand high performance, so the use of Very Large Scale Integration (VLSI) systems has increased. To meet these demands, researchers continue to improve VLSI systems in terms of area, power, and delay [3].

## VEDIC MATHEMATICS

Vedic mathematics is an ancient mathematical technique. The word "Vedic" is derived from "Veda", which means "storehouse of all knowledge". Vedic mathematics was reconstructed from the Vedas by Sri Bharati Krishna Tirthaji between 1911 and 1918.
This system is organized into sixteen distinct sutras, each corresponding to a different branch of mathematics [3]. The sutras are listed below alphabetically along with their brief meanings:

1. (Anurupye) Shunyamanyat: If one quantity is in a certain ratio, the other is zero.
2. Chalana-Kalanabyham: Differences and similarities.
3. Ekadhikina Purvena: By one more than the previous one.
4. Ekanyunena Purvena: By one less than the previous one.
5. Gunakasamuchyah: The factors of the sum equal the sum of the factors.
6. Gunitasamuchyah: The product of the sum equals the sum of the products.
7. Nikhilam Navatashcaramam Dashatah: All from 9 and the last from 10.
8. Paraavartya Yojayet: Transpose and adjust.
9. Puranapuranabyham: By completion or non-completion.
10. Sankalana-vyavakalanabhyam: By addition and by subtraction.
11. Shesanyankena Charamena: The remainders by the last digit.
12. Shunyam Saamyasamuccaye: When the sum is the same, that sum is zero.
13. Sopaantyadvayamantyam: The ultimate and twice the penultimate.
14. Urdhva-Tiryagbhyam: Vertically and crosswise.
15. Vyashtisamanstih: Part and whole.
16. Yaavadunam: Whatever the extent of its deficiency.

These sutras can be applied across various branches of mathematics, including algebra, trigonometry, and geometry. Because they were devised for mental calculation, they reduce the complexity of computation. When mapped to hardware, the same techniques can lower power consumption and chip area. In this paper, a 16x16 Vedic multiplier is implemented using the Urdhva Tiryagbhyam algorithm.

### ANCIENT VEDIC MATHEMATICAL ALGORITHM

By applying the sutras, Vedic mathematics reduces the complexity of calculations, so less computation time and less hardware are needed for implementation. The sutras were originally devised for decimal multiplication. Here they are applied to binary multiplication.
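The vertically-and-crosswise rule (sutra 14 above) does not depend on the radix. Column k of the product collects every digit product x_i·y_j with i + j = k, and the column sum plus the incoming carry produces one result digit and a carry for the next column. The Python sketch below illustrates this and is not part of the authors' Verilog design. It applies the same procedure to decimal and binary digit lists. The function names `urdhva_tiryagbhyam`, `to_digits` and `from_digits` are hypothetical.

```python
def urdhva_tiryagbhyam(x_digits, y_digits, base=10):
    """Multiply two digit lists (least significant digit first) column by column.

    Column k gathers the vertical and crosswise products x[i] * y[j] with
    i + j == k; the column sum plus the incoming carry yields one result digit
    and a carry for the next column.
    """
    n, m = len(x_digits), len(y_digits)
    result, carry = [], 0
    for k in range(n + m - 1):
        column = carry + sum(
            x_digits[i] * y_digits[k - i]
            for i in range(max(0, k - m + 1), min(k, n - 1) + 1)
        )
        result.append(column % base)
        carry = column // base
    while carry:  # flush the final carry into additional high-order digits
        result.append(carry % base)
        carry //= base
    return result


def to_digits(value, base=10):
    """Non-negative integer -> digit list, least significant digit first."""
    digits = [value % base]
    value //= base
    while value:
        digits.append(value % base)
        value //= base
    return digits


def from_digits(digits, base=10):
    """Digit list (least significant digit first) -> integer."""
    return sum(d * base ** i for i, d in enumerate(digits))
```

For example, `urdhva_tiryagbhyam([2, 1], [3, 1])` forms the columns 2·3, 2·1 + 1·3 and 1·1 and returns `[6, 5, 1]`, which is 12 × 13 = 156. With `base=2`, `urdhva_tiryagbhyam([1, 1], [1, 1], base=2)` returns `[1, 0, 0, 1]`, which is 3 × 3 = 9. This is the binary case that the hardware multiplier exploits.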
Urdhva-Tiryagbhyam sutra (Vertically and Crosswise): This paper demonstrates the implementation of the Vedic multiplication technique "Urdhva Tiryagbhyam", meaning "vertically and crosswise". The method is well regarded for its high-speed operation because it generates all partial products in parallel and adds them concurrently. It therefore meets the speed requirement without increasing power consumption. It is also less complex than the Booth multiplier and requires less hardware [4]. Consequently, the Vedic multiplier offers significant advantages in terms of area, power, delay, and complexity.

## ARCHITECTURE OF VEDIC MULTIPLIER

The Vedic multiplication technique can be applied to both decimal and binary numbers. This section explains the Vedic multiplication method for binary numbers and the implementation of a 2x2 Vedic multiplier architecture. The 2x2 Vedic multiplier block is the fundamental building block for larger architectures such as the 4x4, 8x8, and 16x16 Vedic multipliers.

Tuijin Jishu/Journal of Propulsion Technology ISSN: 1001-4055 Vol. 45 No. 4 (2024)

### 2x2 VEDIC MULTIPLIER

The core principle of Vedic multiplication is the Urdhva Tiryagbhyam sutra, one of the 16 foundational sutras. In the decimal number system, this sutra simplifies the multiplication of two n-digit numbers (n x n). The multiplication is based purely on vertical and crosswise multiplication. The algorithm for Urdhva Tiryagbhyam is as follows.

#### A. ALGORITHM

1) Take the two 2-bit numbers a and b to be multiplied.
2) Multiply the LSBs of the two numbers (vertical). This gives the LSB of the result.
3) Multiply the LSB of the top number (a0) by the MSB of the bottom number (b1), and the MSB of the top number (a1) by the LSB of the bottom number (b0) (crosswise), then add the two products.
4) Place the sum from step 3 one position to the left of the result of step 2. This gives the second bit of the result, and any carry moves to the next column.
5) Multiply the MSBs of the two numbers (vertical), shift the product one more place to the left, and add the carry from step 4. This gives the remaining bits of the multiplication result.
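The five steps above map directly onto four two-input AND gates and two half adders. The Python sketch below is a bit-level behavioral model written for illustration. The names `half_adder` and `vedic_2x2` are hypothetical and are not taken from the authors' Verilog source. Because a 2-bit operand has only four values, the model can be checked exhaustively against ordinary multiplication.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def vedic_2x2(a, b):
    """2x2 Urdhva Tiryagbhyam multiplier built from four AND gates and two half adders.

    a and b are 2-bit values (0..3); the 4-bit result is c2 s2 s1 s0.
    """
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                           # step 2: vertical product of the LSBs
    s1, c1 = half_adder(a0 & b1, a1 & b0)  # steps 3-4: sum of the crosswise products
    s2, c2 = half_adder(c1, a1 & b1)       # step 5: vertical product of the MSBs plus carry
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0
```

For example, `vedic_2x2(3, 3)` returns 9 (binary 1001): s0 = 1, the crosswise sum 1 + 1 gives s1 = 0 with c1 = 1, and c1 + a1b1 = 2 gives s2 = 0 and c2 = 1.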
Consider two 2-bit numbers A = a1a0 and B = b1b0, as shown in Figure 1. First, the least significant bits are multiplied (vertical), yielding the least significant bit (LSB) of the final product. In the second step, the products are formed crosswise. The LSB of operand A is multiplied by the next higher bit of operand B, the next higher bit of A is multiplied by the LSB of B, and the two products are added. The resulting sum provides the second bit of the final product. Any carry is added to the partial product obtained by multiplying the most significant bits, and this last addition generates a sum and a carry that form the third and fourth bits of the final product [3]. In equations (2) and (3), + denotes arithmetic addition, and each result is written as a carry c (weight 2) plus a sum bit s.

$$s_0 = a_0 b_0 \tag{1}$$

$$2c_1 + s_1 = a_1 b_0 + a_0 b_1 \tag{2}$$

$$2c_2 + s_2 = c_1 + a_1 b_1 \tag{3}$$

The final result is given as $c_2 s_2 s_1 s_0$. A 2x2 Vedic multiplier block is therefore implemented using two half adders and four two-input AND gates, as shown in Figure 1.

Figure 1 Block diagram of 2x2 Vedic multiplier

## PARALLEL PREFIX ADDER

Parallel prefix adders compute the carries of all bit positions in parallel, which increases the speed of summation. This technique is used to accelerate addition operations in DSP processors. The addition process in a parallel prefix adder [5] follows the three-stage structure illustrated in Figure 2.

Figure 2 General Representation of stages of parallel prefix adder.

### PRE-PROCESSING STAGE

In the pre-processing stage, the propagate and generate signals of each bit are calculated using Boolean equations (1) and (2), where + denotes logical OR and · denotes logical AND. These signals are the inputs to the prefix operations performed in the next stage.

$$Ps[i] = A[i] + B[i] \tag{1}$$

$$Gs[i] = A[i] \cdot B[i] \tag{2}$$

### A. CARRY GENERATION STAGE

This stage follows the pre-processing stage, in which the propagate and generate signals were determined.
In this stage, the group generate and group propagate signals are produced, so that each bit position obtains its carry through Boolean equations (3) and (4). Here i ≥ k > j, and k is the position at which the bit group [i:j] is split into the two subgroups [i:k] and [k-1:j]. As the number of bits increases, the number of prefix levels also grows (logarithmically for the Ladner-Fischer structure).

$$Gs[i:j] = Gs[i:k] + \left(Ps[i:k] \cdot Gs[k-1:j]\right) \tag{3}$$

$$Ps[i:j] = Ps[i:k] \cdot Ps[k-1:j] \tag{4}$$

### POST-PROCESSING STAGE

In the third stage of the Ladner-Fischer adder, each output sum bit is obtained by XORing the input bits with the carry generated for the preceding bit in the second stage [5], as expressed in Boolean equations (5) and (6).

$$S[i] = A[i] \oplus B[i] \oplus C[i-1] \tag{5}$$

$$C[i-1] = Gs[i-1:0] \tag{6}$$

The Ladner-Fischer adder has a tree structure, as illustrated in Figure 2. This adder uses a small number of logic gates, which speeds up addition and reduces the hardware requirements of the architecture.

### B. LADNER-FISCHER PARALLEL PREFIX ADDER

The Ladner-Fischer parallel prefix adder reduces the area needed for carry propagation and increases the speed of summation. As in any parallel prefix adder, the addition involves the three main stages described above. The Ladner-Fischer adder achieves minimal logic depth, although some of its operator nodes drive a high fanout. Its tree-like structure [6] requires fewer gates, which in turn reduces both latency and hardware usage in the proposed architecture.

Figure 3 16-bit Ladner-Fischer adder

## 5. PROPOSED SYSTEM

The proposed system integrates a Vedic multiplier with the Ladner-Fischer parallel prefix adder. In the baseline Verilog code, addition is performed using the ripple-carry adder method. However, the ripple-carry adder requires a higher number of logic gates, which increases area and power consumption.
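The difference between the ripple-carry adder and the three-stage prefix adder described in the previous section can be sketched at the bit level. The Python model below is an illustration under stated assumptions, not the authors' implementation. It assumes a carry-in of 0, a word length n that is a power of two, and the minimum-depth (Sklansky-style) member of the Ladner-Fischer family. In that form, level l combines each node whose bit l is set with the top node of the preceding 2^l-bit block. The exact tree in Figure 3 may differ in detail. The function names are hypothetical.

```python
def ripple_carry_add(a, b, n=16):
    """Reference n-bit ripple-carry adder: each carry waits for the bit below it."""
    carry, total = 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def ladner_fischer_add(a, b, n=16):
    """n-bit prefix adder sketch with carry-in 0. Returns (sum, carry_out)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]

    # Pre-processing, PPA equations (1)-(2): Ps[i] = A[i] OR B[i], Gs[i] = A[i] AND B[i]
    P = [x | y for x, y in zip(A, B)]
    G = [x & y for x, y in zip(A, B)]

    # Carry generation, PPA equations (3)-(4), in log2(n) levels. At level l,
    # node i (bit l set) absorbs the group ending at j, the top bit of the
    # preceding 2^l-bit block. Node j has bit l clear, so it is not modified
    # during this level and the in-place update is safe.
    for level in range(n.bit_length() - 1):
        for i in range(n):
            if (i >> level) & 1:
                j = ((i >> level) << level) - 1
                G[i] = G[i] | (P[i] & G[j])  # equation (3)
                P[i] = P[i] & P[j]           # equation (4)
    # Now G[i] holds Gs[i:0], the carry out of bit position i.

    # Post-processing, PPA equations (5)-(6): S[i] = A[i] ^ B[i] ^ C[i-1], C[i-1] = Gs[i-1:0]
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (A[i] ^ B[i] ^ carry_in) << i
    return total, G[n - 1]
```

With n = 16, the carry-generation loop runs log2 16 = 4 levels and updates exactly n/2 = 8 nodes per level. The ripple-carry loop, by contrast, has a dependency chain through all 16 bit positions. At level l, the node at position j drives all 2^l nodes in the upper half of its block, which is the high fanout attributed to the Ladner-Fischer adder.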
To increase speed, we make parallel prefix adders a key component of the module, since they perform their operations in parallel. We therefore replace the ripple-carry adders in the Vedic multiplier with parallel prefix (Ladner-Fischer) adders. Using a parallel prefix adder for addition yields an area-efficient, high-speed multiplier. The design of the Vedic multiplier using the Ladner-Fischer parallel prefix adder is presented below.

### A. 16X16 VEDIC MULTIPLIER USING LADNER-FISCHER PARALLEL PREFIX ADDER

The 16x16 block is an optimized configuration of 8x8 blocks, as depicted in Figure 4. The first step splits each 16-bit input into two 8-bit halves. The halves of the two inputs generate the vertical and crosswise product terms. Each pair of input bytes is processed by a separate 8x8 Vedic multiplier, so four 8x8 multipliers produce four 16-bit partial products. These partial products are then added using 16-bit Ladner-Fischer parallel prefix adders to produce the final product bits. The partial products represent the Urdhva vertical and crosswise product terms, and an OR gate is used in forming the final product.

Figure 4 Block Diagram of 16X16 bit Vedic Multiplier using Ladner-Fischer Parallel prefix adder

## 6. RESULTS AND DISCUSSIONS

### A. SIMULATION RESULTS

The 2x2-bit Vedic multiplier has been verified using Xilinx 14.7 and the Cadence Design Tool, with the output observed through simulation. The simulation waveform for the 2x2-bit Vedic multiplier is presented in Figure 5, and the netlist is illustrated in Figure 6. The 16x16-bit Vedic multiplier, which uses a 16-bit parallel prefix adder, has likewise been verified using Xilinx 14.7 and the Cadence Design Tool, with its output confirmed through simulation.
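The decomposition described in Section A can be checked with a behavioral model. The Python sketch below uses the hypothetical name `vedic_mult` and is written for illustration only. It recursively splits each operand into halves (16 to 8 to 4 to 2 to 1 bits), forms the two vertical and two crosswise sub-products, and recombines them with shifts. At width 2 it reduces to equations (1)-(3), and at width 1 a sub-product is a single AND gate. The additions use Python's integer `+` in place of the Ladner-Fischer adders of the hardware. The sketch therefore checks the arithmetic of the architecture, not its timing or area.

```python
def vedic_mult(a, b, width=16):
    """Behavioral Urdhva Tiryagbhyam multiplier of two `width`-bit operands.

    Each operand is split into high and low halves; four half-width
    multipliers produce the vertical (lo*lo, hi*hi) and crosswise
    (hi*lo, lo*hi) sub-products, which are shifted and added.
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width), "operand out of range"
    if width == 1:
        return a & b  # a 1x1 product is a single AND gate
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = vedic_mult(a_lo, b_lo, half)      # vertical: low halves
    cross_1 = vedic_mult(a_hi, b_lo, half)  # crosswise
    cross_2 = vedic_mult(a_lo, b_hi, half)  # crosswise
    high = vedic_mult(a_hi, b_hi, half)     # vertical: high halves
    # In hardware these additions are carried out by the adders;
    # here Python's integer + stands in for them.
    return low + ((cross_1 + cross_2) << half) + (high << width)
```

For the operand pairs visible in the Figure 9 waveform, `vedic_mult(200, 21, 16)` returns 4200 and `vedic_mult(36, 48, 16)` returns 1728.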
The simulation waveforms for the 16x16-bit Vedic multiplier and for the parallel prefix adder are shown in Figure 9 and Figure 7, respectively. The RTL schematics for the 16x16-bit Vedic multiplier and for the parallel prefix adder are shown in Figure 12 and Figure 10, respectively.

Figure 5 Simulation Waveform of 2x2 Vedic Multiplier

Figure 6 Netlist of 2x2 Vedic Multiplier

Figure 7 Simulation Waveform of 16-bit Ladner-Fischer parallel prefix adder

Figure 8 Block Diagram of 16-bit Ladner-Fischer parallel prefix adder

| Signal | Values | | | | | |
|---|---|---|---|---|---|---|
| /vedic16x16ppa_tb/a | 24 | 12 | | 24 | 200 | 36 |
| /vedic16x16ppa_tb/b | | 10 | 12 | 2 | 21 | 48 |
| /vedic16x16ppa_tb/result | 48 | 120 | 144 | 48 | 4200 | 1728 |
| /vedic16x16ppa_tb/c_out | St0 | | | | | |
| /vedic16x16ppa_tb/uut/cc0 | St0 | | | | | |
| /vedic16x16ppa_tb/uut/cc1 | St0 | | | | | |
| /vedic16x16ppa_tb/uut/cc2 | St0 | | | | | |
| /vedic16x16ppa_tb/uut/cc3 | St0 | | | | | |

Figure 9 Simulation output waveform of 16x16 Vedic Multiplier

Figure 10 RTL Schematic of 16-bit Ladner-Fischer PPA

Figure 11 Block Diagram of 16x16 Vedic Multiplier

Figure 12 RTL Schematic of 16x16 Vedic Multiplier

### B. SYNTHESIS RESULTS OF LADNER-FISCHER PPA

The 16-bit parallel prefix adder designed using the Ladner-Fischer structure has been verified through simulation, and its synthesis was carried out using Xilinx 14.7 and the Cadence Design Tool. The area report of the 16-bit parallel prefix adder is given in Table 1. Table 2 analyzes the delay and area of the adder for 4-, 8- and 16-bit widths. Table 3 compares the delay, power, and area of the PPA with those of other conventional adders.

| Device Utilization Summary (estimated values) | | | |
|---|---|---|---|
| Logic Utilization | Used | Available | Utilization |
| Number of Slices | 23 | 3584 | 0% |
| Number of 4 input LUTs | 43 | 7168 | 0% |
| Number of bonded IOBs | 50 | 141 | 35% |

Table 1 Area Report of 16-bit Ladner-Fischer parallel prefix adder

Table 2 Report Analysis for Ladner-Fischer Adder

| Report Analysis | 4BIT | 8BIT | 16BIT |
|---|---|---|---|
| Path Delay (ns) | 7.898 | 11.074 | 13.822 |
| No. of Slices | 4 | 10 | 23 |
| No. of LUTs | 8 | 18 | 43 |
| No. of Bonded IOBs | 14 | 26 | 50 |

Table 3 Comparison with Different 16-bit adders

| Type of parallel prefix adders (16 bit) | Delay (ns) | Power (W) | No. of logic LUTs |
|---|---|---|---|
| Kogge-Stone adder [7] | 26.895 | 1.076 | 37 |
| Brent-Kung adder [7] | 27.31 | 0.897 | 32 |
| Han-Carlson adder [7] | 27.09 | 1.001 | 20 |
| Ladner-Fischer adder [7] | 21.416 | 0.0136 | 24 |
| Other adders | | | |
| Ripple carry adder (16 bit) [8] | 27.331 | 0.125 | 45 |
| Carry save adder (16 bit) [8] | 24.667 | 0.126 | 63 |

### C. RESULTS OF VEDIC MULTIPLIER USING LADNER-FISCHER PARALLEL PREFIX ADDER

The 16x16-bit Vedic multiplier using the 16-bit parallel prefix adder is synthesized using the Cadence Design Tool. The synthesis schematic, area report, power report, and gates report of the 16x16 Vedic multiplier are shown in Figures 13, 14, 15 and 16, respectively.
Figure 13 Synthesis Schematic of 16x16 Vedic Multiplier using Ladner-Fischer adder

| Instance | Cell Count | | |
|---|---|---|---|
| vedic16x16ppa | 803 | 1108.974 | 1108.974 |

Figure 14 Area Report of 16x16 Vedic Multiplier using Ladner-Fischer adder

| Category | Leakage (W) | Internal (W) | Switching (W) | Total (W) | Row% |
|---|---|---|---|---|---|
| memory | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| register | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| latch | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| logic | 1.03797e-05 | 4.25568e-05 | 4.31917e-05 | 9.61282e-05 | 100.00% |
| bbox | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| clock | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| pad | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| pm | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00000e+00 | 0.00% |
| Subtotal | 1.03797e-05 | 4.25568e-05 | 4.31917e-05 | 9.61282e-05 | 100.00% |
| Percentage | 10.80% | 44.27% | 44.93% | 100.00% | 100.00% |

Figure 15 Power Report of 16x16 Vedic Multiplier using Ladner-Fischer adder

| Gate | Instances | Area | Library |
|---|---|---|---|
| ACHCONX2 | 32 | 0.000 | slow |
| ADDFXL | 97 | | slow |
| ADDHX1 | | 0.000 | |
| AND2X2 | | 0.000 | |
| AOI2BB1X1 | 16 | 0.000 | slow |
| CLKINVX1 | 6 | 0.000 | slow |
| CLKXOR2X2 | 16 | 0.000 | slow |
| INVXL | 32 | 0.000 | slow |
| NAND2BXL | 9 | 0.000 | slow |
| NAND3BXL | 48 | 0.000 | slow |
| NOR2BXL | 71 | 0.000 | slow |
| NOR2XL | 16 | 0.000 | slow |
| NOR3BXL | 16 | 0.000 | slow |
| OR2X2 | 5 | 0.000 | slow |
| XNOR2X2 | 26 | 0.000 | slow |
| XNOR3X1 | 61 | 0.000 | slow |
| total | 803 | 0.000 | |

| Type | Instances | Area | Area % |
|---|---|---|---|
| inverter | 38 | 0.000 | 0.0 |
| logic | | | |
| physical_cells | | 0.000 | 0.0 |
| total | 803 | | |

Figure 16 Gates Report of 16x16 Vedic Multiplier using Ladner-Fischer adder

### COMPARISON OF AREA AND POWER REPORTS WITH BASIC 16x16 VEDIC MULTIPLIER

Table 4 compares the area and power consumption of the 16x16-bit Vedic multiplier using a 16-bit parallel prefix adder with those of a standard 16x16-bit Vedic multiplier. The proposed Vedic multiplier consumes 0.096 mW instead of 0.3 mW and occupies 1108.974 µm² instead of 4646.037 µm². This is roughly 68% less power and 76% less area than the existing method.

Table 4 Comparison of Power and Area of Vedic based Multiplier and Vedic Multiplier using parallel prefix Adder

| Sl. no | Types of Multipliers (16-bit) | Power (mW) | Area (µm²) |
|---|---|---|---|
| 01 | Normal Vedic based multiplier | 0.3 | 4646.037 |
| 02 | Vedic multiplier using parallel prefix adder (Ladner-Fischer adder) | 0.096 | 1108.974 |

## 7. Conclusion

Using a parallel prefix adder for the additions reduces area and power while improving speed. The 16x16 Vedic multiplier using a PPA achieves high area efficiency with low power and high performance. It was described in Verilog and simulated and synthesized using Xilinx 14.7 and the Cadence Design Tool. A comparison with various existing multipliers and adders was carried out in terms of area, delay, and power consumption. This analysis shows that the 16x16 Vedic multiplier using a parallel prefix adder (PPA) has better power and area efficiency than the other conventional multipliers considered.

### A. FUTURE SCOPE

The proposed Vedic multiplier design can be used in high-speed applications and extended to FIR and IIR filters with higher-order taps.
The design can also be extended to real-time applications such as:

- Enhancing ECG signals for better analysis
- Image processing applications such as RGB to grayscale conversion
- Image segmentation and compression
- Image inversion
- Image watermarking
- Biomedical signal processing

## References

- [1] Vidyasaraswathi H. N., Akarsha S. N., "Area Efficient RNS-PPA Multiplier Design for High-Speed Application," International Journal of Research Publication and Reviews, vol. 3, no. 5, pp. 3165-3175, May 2022.
- [2] "A New Implementation of 16-bit Parallel Prefix Adder for High Speed and Low Area," Proceedings of the 2020 4th International Conference on Digital Signal Processing.
- [3] Durgadevi, M. Renugadevi, C. Sathyasree, R. Chitra, "Design of High Speed Vedic Multiplier Using Carry Select Adder," International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, vol. 5, no. 09, 2023.
- [4] G. R. Gokhale and S. R. Gokhale, "Design of Area and Delay Efficient Vedic Multiplier Using Carry Select Adder," IEEE International Conference on Information Processing, pp. 295-300, 2015.
- [5] Akarsha Yadav, Vijaya Prakash, and Vidya Saraswathi, "Design of Modified RNS-PPA Based FIR Filter for High-Speed Application," European Journal of Engineering and Technology Research, ISSN: 2736-576, 2022.
- [6] Q. Liao and S. Li, "A New Implementation of 16-bit Parallel Prefix Adder for High Speed and Low Area," in Proceedings of the 2020 4th International Conference on Digital Signal Processing, Jun. 19, 2020, pp. 284-288.
- [7] Neeraj Kumar Cheryala, Nikhil Madhavaneni, Rajesh Odela, "Performance Analysis of Parallel Prefix Adders Using Zynq-7000 SoC," 2016.
- [8] Cristin R., "Hybrid Variable Latency Carry Skip Adder in Multiplier Structures," ResearchGate, 2019.