32×32 (C) and 18×18 (Verilog) Implementation Examples
This page presents an integer squaring algorithm and its C implementation (sqr32),
designed for processors that provide only 16×16 multipliers.
By exploiting the symmetry of squaring and applying a Karatsuba-like decomposition, a single 32×32-bit multiplication is replaced with three 16×16-bit multiplications.
An FPGA-oriented Verilog example is also provided in the appendix,
demonstrating an 18-bit squarer (sqr18u) based on the same decomposition.
Note: On systems equipped with a native 64-bit multiplier, the ordinary expression a * a should be used.
On FPGA devices with dedicated 18×18 DSP multipliers, a direct implementation is typically more efficient; the decomposed form shown here is useful when DSP blocks are unavailable or need to be conserved.
a2 = aH2 B2 + 2 aH aL B + aL2
B = 216
Here, aH and aL denote the upper and lower 16-bit halves of the input value, respectively.
sqr32 — computes the square of a given 32-bit integer
and returns the result as a 64-bit unsigned integer.
/* sqr32 - calculate squre of given integer value
* Rev.1.0 (2025-12-22) (c) 2009-2025 Takayuki HOSODA
* SPDX-License-Identifier: BSD-3-Clause
*/
#include <inttypes.h>
uint64_t sqr32(int32_t a);
#ifdef HAVE_NO_MUL64
uint64_t sqr32(int32_t a) {
uint32_t ax, ah, al;
uint64_t bh, bm, bl;
// Karatsuba method: a * a = a_h^2 * B^2 + 2 * a_h * a_l * B + a_l^2
ax = (uint32_t)(a < 0 ? -a : a); // assume two's complement; INT32_MIN wraps
al = (ax ) & 0xffff; // lower 16-bit of input
ah = (ax >> 16) & 0xffff; // upper 16-bit of input
bh = (uint64_t)(ah * ah) << (2 * 16); // 16x16 multiplier and 64-bit shifter
bm = (uint64_t)(ah * al) << (1 + 16); // 16x16 multiplier and 64-bit shifter
bl = (uint64_t)(al * al); // 16x16 multiplier
return bl + bm + bh; // 64-bit adder
}
#else
uint64_t sqr32(int32_t x) {
int64_t eax = (int64_t)x;
return (uint64_t)(eax * eax);
}
#endif
Note: Two's complement representation is assumed;
INT32_MIN is treated as wrapping on negation.
This implementation avoids using a native 18×18 multiplier and instead decomposes the operation into 9×9 multiplications, allowing efficient mapping to LUT-based logic.
sqr18u — computes the square of a given 18-bit unsigned integer
and returns the result as a 36-bit unsigned integer.
/* sqr18u -- unsigned 18-bit squarer without hardware 18x18 multiplier
* Rev.1.0 (2025-12-22) (c) 2009-2025 Takayuki HOSODA
* SPDX-License-Identifier: BSD-3-Clause
*/
module sqr18u (input wire [17:0] a, output wire [35:0] y);
// split input into high / low 9-bit parts
wire [8:0] ah = a[17:9];
wire [8:0] al = a[ 8:0];
// 3x (9x9, LUT-based) multiplications instead of 18x18 multiplication.
wire [17:0] p_hh = ah * ah;
wire [17:0] p_ll = al * al;
wire [17:0] p_hl = ah * al;
// Square-specialized Karatsuba-like decomposition:
// a^2 = al^2 + 2*ah*al*2^9 + ah^2*2^18
assign y = (p_ll)
+ (p_hl << (1 + 9))
+ (p_hh << (9 + 9)); // 36-bit adder
endmodule



© 2000 Takayuki HOSODA.