Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
CIRCUIT, SYSTEM AND METHOD FOR COMPUTER DIVISION APPROXIMATION
Document Type and Number:
WIPO Patent Application WO/2023/200817
Kind Code:
A1
Abstract:
The present invention relates to a system and method for computer division approximation. In embodiments, a processor including circuitry may be configured to perform the method for computer division approximation. In embodiments, the circuitry may be configured to obtain a first divisor and a first dividend for division from a register file of a microprocessor, generate an initial approximation of 1 divided by the first divisor, generate a first result of the division operation, and transmit the initial approximation to the register file of the microprocessor.

Inventors:
BADIZADEGAN NIMA (US)
Application Number:
PCT/US2023/018220
Publication Date:
October 19, 2023
Filing Date:
April 11, 2023
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
BADIZADEGAN NIMA (US)
International Classes:
G06F7/523; G06F7/487
Foreign References:
US20170293471A12017-10-12
US20100125621A12010-05-20
US6223198B12001-04-24
Attorney, Agent or Firm:
BARKAUS, Keith, J. (US)
Download PDF:
Claims:
CLAIMS

What is claimed is:

1. A circuit compri sing : a) an input port, wherein the input port is configured to receive a first binary number; b) a first multiplier operationally connected to the input port, wherein the first multiplier is configured to:

1) receive as an input to the first multiplier the first binary number; and

2) provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; c) a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to:

1) receive as an input to the first bit shifter the first binary number; and

2) provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; d) a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to:

1) receive as an input to the first inverter the first shifted binary number; and

2) provide as an output of the first inverter an inverse of the first shifted binary number; e) a second inverter operationally connected to the input port wherein the second inverter is configured to:

1) receive as an input to the second inverter the first binary number; and

2) provide as an output of the second inverter an inverse of the first binary number; f) a first adder, operationally connected to the second inverter, wherein the first adder is configured to:

1) receive as an input to the first adder the inverse of the first binary number; and

2) provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; g) a second adder, operationally connected to the first multiplier and the first inverter, wherein the second adder is configured to: 1) receive as inputs to the second adder the inverse of the first shifted binary number and the first squared number; and

2) provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; h) a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to:

1) receive as an input to the second multiplier the first factor and the second factor; and

2) provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and i) a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to:

1) receive as an input to the second bit shifter the product; and

2) provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction. A circuit comprising: a) an input port, wherein the input port is configured to receive a first binary number; b) a first multiplier operationally connected to the input port, wherein the first multiplier is configured to:

1) receive as an input to the first multiplier the first binary number; and

2) provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; c) a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to:

1) receive as an input to the first bit shifter the first binary number; and

2) provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; d) a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to:

1) receive as an input to the first inverter the first shifted binary number; and 2) provide as an output of the first inverter an inverse of the first shifted binary number; e) a second inverter operationally connected to the input port wherein the second inverter is configured to:

1) receive as an input to the second inverter the first binary number; and

2) provide as an output of the second inverter an inverse of the first binary number; f) a first adder, operationally connected to the second inverter and a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to:

1) receive as inputs to the first adder the inverse of the first binary number and the first constant; and

2) provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; g) a second adder, operationally connected to the first multiplier, the first inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant source a second constant, wherein the second adder is configured to:

1) receive as inputs to the second adder the inverse of the first shifted binary number, the first squared number, and the second constant; and

2) provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; h) a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to:

1) receive as an input to the second multiplier the first factor and the second factor; and

2) provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and i) a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to:

1) receive as an input to the second bit shifter the product; and 2) provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction. The circuit of claim 1 or claim 2, wherein the second direction is to the left. The circuit of claim 3, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The circuit of claim 4, wherein the first direction is to the right, the first number of bits is one, and the second number of bits is two. The circuit of claim 4, wherein the circuit further comprises: j) a third adder, operationally connected to the second multiplier and the second bit shifter, wherein the third adder is configured to: a. receive as inputs to the third adder the product and the second shifted binary number; and b. provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation of one divided by the first binary number is the sum of the inputs to the third adder. The circuit of claim 6, wherein the circuit further comprises: k) a third bit shifter operationally connected to the input port, wherein the third bit shifter is configured to:

1) receive as an input to the third bit shifter the first binary number; and

2) provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the first binary number shifted by a third number of bits in a third direction; and l) a third inverter operationally connected to the third bit shifter, wherein the third inverter is configured to:

1) receive as an input to the third bit shifter the third shifted binary number; and

2) provide as an output of the third bit shifter an inverse of the third shifted binary number; wherein the second adder is further operationally connected to the third inverter and wherein the inputs to the second adder further include the third shifted binary number. The circuit of claim 7, wherein the first direction is to the right, the first number of bits is two, the second number of bits is two, the third direction is to the right, and the third number of bits is three. The circuit of claim 6, wherein the circuit further comprises: k) a third bit shifter operationally connected to the input port, wherein the third bit shifter is configured to:

1) receive as an input to the third bit shifter the first binary number; and

2) provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the first binary number shifted by a third number of bits in a third direction; and l) a fourth bit shifter operationally connected to the input port, wherein the fourth bit shifter is configured to:

1) receive as an input to the fourth bit shifter the first binary number; and

2) provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction; and wherein the second adder is further operationally connected to the third bit shifter and the fourth bit shifter, and wherein the inputs to the second adder further include the third shifted binary number and the fourth shifted binary number. The circuit of claim 9, wherein the first direction is to the left, the first number of bits is one, the second number of bits is two, the third direction is to the right, the third number of bits is two, the fourth direction is to the right and the fourth number of bits is four.] The circuit of claim 6, wherein the circuit further comprises: k) a third bit shifter operably connected to the second multiplier, wherein the third bit shifter is configured to:

1) receive as an input to the third bit shifter the product; and

2) provide as an output of the third bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction; and wherein the third adder is further operationally connected to the third bit shifter and the inputs to the third adder further include the third shifted binary number. The circuit of claim 11, wherein the first direction is to the right, the first number of bits is one, the second number of bits is one, the third direction is to the right, and the third number of bits is one. The circuit of claim 12, wherein the circuit further comprises a fourth bit shifter operationally connected to the input port, wherein the fourth bit shifter is configured to:

1) receive as an input to the fourth bit shifter the first binary number; and

2) provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction. The circuit of claim 13, wherein the second adder is further operationally connected to the fourth bit shifter and the inputs to the second adder further includes the fourth bit shifter. The circuit of claim 14 wherein the fourth direction is to the right and the fourth number of bits is four. The circuit of claim 13, wherein the circuit further comprises:

1) a third inverter operationally connected to the fourth bit shifter, wherein the third inverter is configured to:

1) receive as an input the fourth shifted binary number; and

2) provide as an output an inverse of the fourth shifted binary number; and wherein the second adder is further operationally connected to the third inverter and the input to the second adder further includes the fourth shifted binary number. The circuit of claim 16, wherein the fourth direction is to the right and the fourth number of bits is four. A circuit comprising: a) an input port, wherein the input port is configured to receive a first binary number; b) a first multiplier operationally connected to the input port, wherein the first multiplier is configured to:

1) receive as an input to the first multiplier the first binary number; and

2) provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; c) a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to:

1) receive as an input to the first bit shifter the first binary number; and 2) provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; d) a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to:

1) receive as an input to the second bit shifter the first binary number; and

2) provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; e) a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to:

1) receive as an input to the first inverter the first shifted binary number; and

2) provide as an output of the first inverter an inverse of the first shifted binary number; f) a second inverter operationally connected to the input port wherein the second inverter is configured to:

1) receive as an input to the second inverter the first binary number; and

2) provide as an output of the second inverter an inverse of the first binary number; g) a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to:

1) receive as an input to the third inverter the second shifted binary number; and

2) provide as an output of the third inverter an inverse of the second shifted binary number; and h) a first adder, operationally connected to the second inverter and the first inverter, wherein the first adder is configured to:

1) receive as inputs to the first adder the inverse of the first binary number and the inverse of the first shifted number; and

2) provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; i) a second adder, operationally connected to the first multiplier and the second inverter, wherein the second adder is configured to: 1) receive as inputs to the second adder the inverse of the third shifted binary number and the first squared number; and

2) provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; j) a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to:

1) receive as an input to the second multiplier the first factor and the second factor; and

2) provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and k) a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to:

1) receive as an input to the third bit shifter the product; and

2) provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. A circuit comprising: a) an input port, wherein the input port is configured to receive a first binary number; b) a first multiplier operationally connected to the input port, wherein the first multiplier is configured to:

1) receive as an input to the first multiplier the first binary number; and

2) provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; c) a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to:

1) receive as an input to the first bit shifter the first binary number; and

2) provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; d) a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to:

1) receive as an input to the second bit shifter the first binary number; and 2) provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; e) a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to:

1) receive as an input to the first inverter the first shifted binary number; and

2) provide as an output of the first inverter an inverse of the first shifted binary number; f) a second inverter operationally connected to the input port wherein the second inverter is configured to:

1) receive as an input to the second inverter the first binary number; and

2) provide as an output of the second inverter an inverse of the first binary number; g) a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to:

1) receive as an input to the third inverter the second shifted binary number; and

2) provide as an output of the third inverter an inverse of the second shifted binary number; and h) a first adder, operationally connected to the second inverter, the first inverter, a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to:

1) receive as inputs to the first adder the inverse of the first binary number, the inverse of the first shifted number, and the first constant; and

2) provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; i) a second adder, operationally connected to the first multiplier, the second inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant a second constant, wherein the second adder is configured to:

1) receive as inputs to the second adder the inverse of the third shifted binary number, the first squared number, and the second constant; and

2) provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder; j) a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to:

1) receive as an input to the second multiplier the first factor and the second factor; and

2) provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and j) a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to:

1) receive as an input to the third bit shifter the product; and

2) provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. The circuit of claim 18 or claim 19, wherein the third direction is to the left. The circuit of claim 3, wherein the circuit further comprises: k) a fourth bit shifter operationally connected to the input port, wherein the fourth bit shifter is configured to:

1) receive as an input to the fourth bit shifter the first binary number; and

2) provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction; l) a fifth bit shifter operationally connected to the input port, wherein the fifth bit shifter is configured to:

1) receive as an input to the fifth bit shifter the first binary number; and

2) provide as an output of the fifth bit shifter a fifth shifted binary number, wherein the fifth shifted binary number is the first binary number shifted by a fifth number of bits in a fifth direction. The circuit of claim 21, wherein the second adder is further operationally connected to the fourth bit shifter and the fifth bit shifter, and wherein the inputs to the second adder further include the fourth shifted binary number and the fifth shifted binary number. The circuit of claim 22, wherein the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is two, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is to the right, the fifth number of bits is five. The circuit of claim 23, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The circuit of claim 22, wherein the circuit further comprises: m) a third adder operationally connected to the third bit shifter and the second multiplier, wherein the third adder is configured to:

1) receive as inputs to the third adder the product and the third shifted binary number;

2) provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder. The circuit of claim 25, wherein the first direction is to the right, the second direction is to the left, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is the right, and the fifth number of bits is four. The circuit of claim 26, wherein the first number of bits is five, the second number of bits is one, and the third number of bits is two. The circuit of claim 27, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The circuit of claim 26, wherein the first number of bits is five, the second number of bits is two, and the third number of bits is two. The circuit of claim 29, wherein the second adder is further operationally connected to the first input and wherein the inputs to the second adder further include the first binary number. The circuit of claim 26, wherein the first number of bits is four, the second number of bits is one, and the third number of bits is one. The circuit of claim 31, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The circuit of claim 26, wherein the first number of bits is four, the second number of bits is one, and the third number of bits is two. The circuit of claim 33, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The circuit of claim 22, wherein the circuit further comprises: m) a sixth bit shifter operationally connected to the second multiplier, wherein the sixth bit shifter is configured to:

1) receive as an input to the sixth bit shifter the product; and

2) provide as an output of the sixth bit shifter a sixth shifted binary number, wherein the sixth shifted binary number is the product shifted by a sixth number of bits in a sixth direction; and n) a third adder operationally connected to the second bit shifter and the sixth bit shifter, wherein the third adder is configured to:

1) receive as inputs to the third adder the third shifted binary number and the sixth shifted binary number;

2) provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder. The circuit of claim 35, wherein the first direction is to the right, the first number of bits is five, the second direction is to the left, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is to the right, the fifth number of bits is four, the sixth direction is to the left and the sixth number of bits is one. The circuit of claim 36, wherein the second number of bits is one. The circuit of claim 37, wherein the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number. The method of claim 38, wherein the third adder is further operationally connected to the to the second multiplier and wherein the inputs to the third adder further include the product. The circuit of claim 36, wherein the second number of bits is two. The circuit of claim 40, wherein the second adder is further operationally connected to the first input and wherein the input to the second adder further includes the first binary number. The circuit of claim 21, wherein the circuit further comprises: m) a fourth inverter operationally connected to the third bit shifter, wherein the fourth inverter is configured to:

1) receive as an input to the fourth inverter the fourth shifted binary number; and

2) provide as an output of the fourth inverter an inverse of the fourth shifted binary number; and n) a fifth inverter operationally connected to the fourth bit shifter, wherein the fifth inverter is configured to:

1) receive as an input to the fifth inverter the fourth bit shifted binary number; and

2) provide as an output of the fifth inverter an inverse of the fifth shifted binary number; and wherein the second adder is further operationally connected to the fourth inverter and the fifth inverter, and wherein the inputs to the second adder further include the inverse of the fourth binary numbers and the fifth binary numbers. The circuit of claim 42, wherein the circuit further comprises: o) a third adder operationally connected to the third bit shifter and the second multiplier, wherein the third adder is configured to:

1) receive as inputs to the third adder the product and the third shifted binary number; and

2) provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder. The circuit of claim 43, wherein the first direction is to the right, the first number of bits is one, the second direction is to the left, the second number of bits is one, the second number of bits is two, the fourth direction is to the right, the fourth number of bits is two, and the fifth direction is to the right. The circuit of claim 44, wherein the fifth number of bits is four. The circuit of claim 44, wherein the fifth number of bits is three. The circuit of claim 43, wherein the circuit further comprises: p) a sixth bit shifter operationally connected to the input port, wherein the sixth bit shifter is configured to:

1) receive as an input to the sixth bit shifter the first binary number; and 2) provide as an output of the sixth bit shifter the sixth bit shifted binary number; and q) a sixth inverter operationally connected to the sixth bit shifter, wherein the sixth inverter is configured to:

1) receive as an input to the sixth inverter the sixth bit shifted binary number; and

2) provide as an output of the sixth inverter an inverse of the sixth bit shifted binary number; and wherein the second adder is further operationally connected to the sixth inverter; and wherein the inputs to the second adder further include the inverse of the sixth bit shifted binary number. The circuit of claim 47, wherein the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is one, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is one, the fifth direction is to the right, the fifth number of bits is three, the sixth direction is to the right, and the sixth number of bits is four. The circuit of claim 42, wherein the circuit further comprises: o) a sixth bit shifter operationally connected to the input port, wherein the sixth bit shifter is configured to:

1) receive as an input to the sixth bit shifter the first binary number; and

2) provide as an output of the sixth bit shifter the sixth bit shifted binary number; and p) a sixth inverter operationally connected to the sixth bit shifter, wherein the sixth inverter is configured to:

1) receive as an input to the sixth inverter the sixth bit shifted binary number; and

2) provide as an output of the sixth inverter an inverse of the sixth bit shifted binary number; q) a seventh bit shifter operationally connected to the second multiplier, wherein the seventh bit shifter is configured to:

1) receive as an input to the seventh bit shifter the product; and

2) provide as an output of the seventh bit shifter a seventh bit shifted binary number, wherein the seventh bit shifted binary number is the product shifted by a seventh number of bits in a seventh direction; r) a third adder operationally connected to the third bit shifter and the seventh bit shifter, wherein the third adder is configured to:

1) receive as inputs to the third adder the third shifted binary number and the seventh shifted binary number; and

2) provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder; and wherein the second adder is further operationally connected to the sixth inverter and wherein the inputs to the second adder further includes the sixth bit shifted binary number. The circuit of claim 49, wherein the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is one, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is 1, the fourth number of bits is three, the fifth direction is to the right, the fifth number of bits is three, the sixth direction is to the right, the sixth number of bits is four, the seventh direction is to the left, and the seventh number of bits is one. The circuit of claims 1-50, wherein the first constant is determined by a minimax algorithm. The circuit of claims 1-50, wherein the second constant is determined by a minimax algorithm. The circuit of claims 2-18 or claims 20-52, wherein the constant source is a register. The circuit of claims 2-18 or claims 20-52, wherein the constant source is circuitry. The circuit of claims 1-54, wherein the circuit is a component of a processor. The circuit of claims 1-54, wherein the circuit is a component of a field programmable gate array. The circuit of claims 1-54, wherein the circuit is a component of an application-specific integrated circuit. A processor comprising circuitry configured to perform a method for computer division approximation comprising: a) obtaining, by the circuitry, a first divisor and a first dividend for division from a register file of a microprocessor; b) generating, by the circuitry, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x2) (C2 - 2.6875x + x2) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the circuitry, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting, by the circuitry, the first result of the division operation to the register file of the microprocessor. The processor of claim 58, wherein the circuitry comprises a microcontroller. The processor of claim 58, wherein the circuitry comprises one or more hardware multipliers. The processor of claim 60, wherein each respective hardware multiplier of the one or more hardware multipliers comprises respective multiplication instructions. The processor of claim 58, wherein the circuity comprises one or more fused multiply add operators. The processor of claim 58, wherein the circuity comprises one or more bit shifters. The processor of claim 63, wherein each respective bit shifter of the one or more bit shifters is configured to shift a digit of a respective input in a direction. The processor of claim 64, wherein the direction is a left direction. The processor of claim 64, wherein the direction is a right direction. The processor of claim 58, wherein the circuity comprises one or more adders. The processor of claim 67, wherein each respective adder of the one or more adders comprises respective summation instructions. The processor of claim 58, wherein the step of obtaining the first divisor and the first dividend for division comprises receiving, by the circuitry, a request to perform a division operation including the first divisor and the first dividend. The processor of claim 69, wherein the initial approximation is provided in response to the request to perform the division operation. The processor of claim 69, wherein the method further comprises, prior to step c), generating, by the circuitry or the one or more processors, respectively, a second approximation of 1 divided by the divisor using an iterative refinement algorithm. The processor of claim 71, wherein the iterative refinement algorithm uses Newton’s method of iterative approximation. The processor of claim 71, wherein the iterative refinement algorithm uses Goldschmidt’s division algorithm. The processor of claim 58, wherein step c) is performed by a respective hardware multiplier of the one or more hardware multipliers. The processor of claim 58, wherein the first constant is determined by a minimax algorithm. The processor of claim 51, wherein the second constant is determined by a minimax algorithm. A computer system comprising one or more memories and one or more processors, the one or more memories comprising computer-readable instructions that when executed cause the one or more processors to perform a method for computer division approximation comprising: a) obtaining, by the one or more processors, a first divisor and a first dividend for division from the one or more memories; b) generating, by the one or more processors, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the processor to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x2) (C2 - 2.6875x + x2) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the one or more processors, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and transmitting, by the one or more processors, the first result of the division operation to the one or more memories. The computer system of claim 77, wherein the one or more processors of claim 77, respectively, comprises a microcontroller. The computer system of claim 77, wherein the one or more processors of claim 77 comprises one or more hardware multipliers. The computer system of claim 79, wherein each respective hardware multiplier of the one or more hardware multipliers comprises respective multiplication instructions. The computer system of claim 77, the one or more processors comprise one or more fused multiply add operators. The computer system of claim 77, wherein the one or more processors of claim 2, comprises one or more bit shifters. The computer system of claim 82, wherein each respective bit shifter of the one or more bit shifters is configured to shift a digit of a respective input in a direction. The computer system of claim 83, wherein the direction is a left direction. The computer system of claim 83, wherein the direction is a right direction. The computer system of claim 77, wherein the one or more processors comprises one or more adders. The computer system of claim 86, wherein each respective adder of the one or more adders comprises respective summation instructions. The computer system of claim 77, wherein the step of obtaining the first divisor and the first dividend for division comprises receiving, by one or more processors, a request to perform a division operation including the first divisor and the first dividend. The computer system of claim 88, wherein the initial approximation is provided in response to the request to perform the division operation. The computer system of claim 88, wherein the method further comprises, prior to step c), generating, by the circuitry or the one or more processors, respectively, a second approximation of 1 divided by the divisor using an iterative refinement algorithm. The computer system of claim 90, wherein the iterative refinement algorithm uses Newton’s method of iterative approximation. The computer system of claim 90, wherein the iterative refinement algorithm uses Goldschmidt’s division algorithm. The computer system of claim 77, wherein step c) is performed by a respective hardware multiplier of the one or more hardware multipliers. The computer system of claim 77, wherein the first constant is determined by a minimax algorithm. The computer system of claim 77, wherein the second constant is determined by a minimax algorithm. A method of performing computer division, the method comprising: a) obtaining a first divisor and a first dividend for division; b) generating an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x2) (C2 - 2.6875x + x2) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting the first result of the division operation. The method of claim 96, wherein the method is performed by a computer system comprising one or more memories operably connected to one or more processors. The method of claim 97, wherein the one or more processors includes one or more of reprogrammable hardware configured to perform the method, a circuit configured to perform the method, a processor configured to execute computer instructions which, upon execution, cause the processor to perform the method, a virtual machine which is configured to perform the method, and/or a network which is configured to perform the method.

Description:
CIRCUIT, SYSTEM AND METHOD FOR COMPUTER DIVISION APPROXIMATION

RELATED APPLICATION

[0001] This application claims priority to and the benefit of U.S. Provisional Application No. 63/329,662, filed April 11, 2022, and entitled SYSTEM AND METHOD FOR COMPUTER DIVISION APPROXIMATION, the content of which is hereby incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

[0002] The present invention relates to circuits, systems and methods for computer division approximation.

BACKGROUND

[0003] Division is one of the four main arithmetic operations performed by a computer, but takes the longest to compute. On a modem central processing unit (CPU), addition takes a single cycle, multiplication takes 3 cycles, and division takes 12-80 cycles depending on the width of the division. Typical approaches to computer division used in fast CPUs involve making an initial guess, often using a lookup table algorithm, and then refining that guess through iterative approximation. A lookup table, which is an array that holds pre-calculated values, may take up a significant amount of silicon space on a processor. For example, an AMD K7 microcontroller uses a lookup table that is 32Kbits in size to computer an initial guess.

[0004] Some low-power CPUs with a division instruction (e.g., the ARM Cortex -M23, at 0.0088mm 2 ) use a slower algorithm which is similar to long division. However, an accelerated long division algorithm, SRT division, also uses a lookup table that is about 2 kilobits in size. Several of the smallest CPUs (e.g., the ARM Cortex-M0+ at 0.0066mm 2 ) do not have a division instruction at all.

[0005] Along these lines, while many small microcontrollers today are equipped with single-cycle multipliers, they use division algorithms that compute only one bit at a time, resulting in division operations with 32 or more CPU cycles of latency, stalling the core in use during the computing. Area constraints discourage the use of large lookup tables which are needed to use conventional fast division algorithms (e.g., the Newton -Raphson and Goldschmidt algorithms) that provide a useful speedup compared to these slower algorithms.

[0006] It would therefore be beneficial to have a hardware unit to produce an accurate initial guess for computer division to allow a microprocessor to have a more-quickly executed division instruction with less additional silicon.

SUMMARY OF INVENTION

[0007] In view of the above, it is the object of embodiments of the present invention to provide method and apparatus to overcome the technological challenges faced in conventional computer division methods.

[0008] The present invention provides circuits, systems and methods for computer division approximation.

[0009] According to an exemplary embodiment of the present invention, a circuit includes: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a first adder, operationally connected to the second inverter, wherein the first adder is configured to: receive as an input to the first adder the inverse of the first binary number; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; a second adder, operationally connected to the first multiplier and the first inverter, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the first shifted binary number and the first squared number; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the product; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction.

[0010] According to another exemplary embodiment of the present invention, a circuit includes: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a first adder, operationally connected to the second inverter and a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number and the first constant; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; a second adder, operationally connected to the first multiplier, the first inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant source a second constant, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the first shifted binary number, the first squared number, and the second constant; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the product; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction.

[0011] According to another exemplary embodiment of the invention, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the first binary number; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to: receive an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to: receive as an input to the third inverter the second shifted binary number; and provide as an output of the third inverter an inverse of the second shifted binary number; and a first adder, operationally connected to the second inverter and the first inverter, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number and the inverse of the first shifted number; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; a second adder, operationally connected to the first multiplier and the second inverter, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the third shifted binary number and the first squared number; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the product; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. [0012] According to another exemplary embodiment of the invention, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the first binary number; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to: receive as an input to the third inverter the second shifted binary number; and provide as an output of the third inverter an inverse of the second shifted binary number; and a first adder, operationally connected to the second inverter, the first inverter, a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number, the inverse of the first shifted number, and the first constant; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; a second adder, operationally connected to the first multiplier, the second inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant a second constant, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the third shifted binary number, the first squared number, and the second constant; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the product; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. [0013] According to an exemplary embodiment of the invention, a processor may include circuitry configured to perform a method for computer division approximation including: a) obtaining, by the circuitry, a first divisor and a first dividend for division from a register file of a microprocessor; b) generating, by the circuitry, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula: A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the circuitry, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting, by the circuitry, the first result of the division operation to the register file of the microprocessor.

[0014] According to an exemplary embodiment of the invention, a computer system may include one or more memories and one or more processors, the one or more memories comprising computer-readable instructions that when executed cause the one or more processors to perform a method for computer division approximation including: a) obtaining, by the one or more processors, a first divisor and a first dividend for division from the one or more memories; b) generating, by the one or more processors, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the processor to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the one or more processors, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting, by the one or more processors, the first result of the division operation to the one or more memories.

[0015] According to exemplary embodiments, a method of performing computer division may include: a) obtaining a first divisor and a first dividend for division; b) generating an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(Cl - 1.03125x + x2) (C2 - 2.6875x + x2) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting the first result of the division operation. BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The above and related objects, features and advantages of embodiments of the present invention will be more fully understood by reference to the following detailed description of the preferred, albeit illustrative, embodiments of the present invention when taken in conjunction with the accompanying figures, wherein:

[0017] FIG. 1 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0018] FIG. 1 A depicts a dot diagram of summed bits in a Booth multiplier in accordance with conventional methods of computer multiplication;

[0019] FIG. IB depicts a mapping of a division approximation polynomial onto the multiplier partial products depicted in FIG. 1A in accordance with an embodiment of the present invention;

[0020] FIG. 2 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0021] FIG. 3 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0022] FIG. 4 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0023] FIG. 5 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0024] FIG. 6 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0025] FIG. 7 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0026] FIG. 8 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0027] FIG. 9 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0028] FIG. 10 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention; [0029] FIG. 11 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0030] FIG. 12 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0031] FIG. 13 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0032] FIG. 14 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0033] FIG. 15 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0034] FIG. 16 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0035] FIG. 17 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention;

[0036] FIG. 18 is a schematic diagram of a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention; and

[0037] FIG. 19 is a process flow for a processor configured to perform a method for computer division approximation in accordance with embodiments of the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

[0038] The present invention generally relates to a system and method for computer division approximation.

[0039] Embodiments of the present invention provide a useful alternative to using lookup tables for initial approximation step in division and provide circuits, systems and methods by which area-constrained microcontrollers can perform relatively quick division. Embodiments of the present invention overcome these technical challenges by augmenting a single-cycle multiplier to compute a heavily-quantized polynomial approximation of the inverse of a denominator “D” (1/D). In embodiments, the approximation may have an 8-bit precision, is capable of computing in a single cycle using some of the hardware from a multiple, and comes at cost of only an 8% increase in area. In embodiments, compared to similar area footprints used in restoring division, embodiments of the present invention may reduce cycles spent on division by a factor of 3. [0040] In embodiments, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a first adder, operationally connected to the second inverter, wherein the first adder is configured to: receive as an input to the first adder the inverse of the first binary number; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; a second adder, operationally connected to the first multiplier and the first inverter, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the first shifted binary number and the first squared number; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the product; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction.

[0041] In embodiments, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the first bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a first adder, operationally connected to the second inverter and a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number and the first constant; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; a second adder, operationally connected to the first multiplier, the first inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant source a second constant, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the first shifted binary number, the first squared number, and the second constant; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the product; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction.

[0042] In embodiments, the second direction may be to the left. [0043] In embodiments, the second adder is further operationally connected to the second inverter. In embodiments, the inputs to the second adder further include the inverse of the first binary number.

[0044] In embodiments, first direction is to the right, the first number of bits is one, and the second number of bits is two.

[0045] In embodiments, the circuit includes a third adder, operationally connected to the second multiplier and the second bit shifter, wherein the third adder is configured to: receive as inputs to the third adder the product and the second shifted binary number; and provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation of one divided by the first binary number is the sum of the inputs to the third adder.

[0046] In embodiments, the circuit further may include a third bit shifter operationally connected to the input port, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the first binary number; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the first binary number shifted by a third number of bits in a third direction; and a third inverter operationally connected to the third bit shifter, wherein the third inverter is configured to: receive as an input to the third bit shifter the third shifted binary number; and provide as an output of the third bit shifter an inverse of the third shifted binary number; wherein the second adder is further operationally connected to the third inverter and wherein the inputs to the second adder further include the third shifted binary number.

[0047] In embodiments, the first direction is to the right, the first number of bits is two, the second number of bits is two, the third direction is to the right, and the third number of bits is three.

[0048] In embodiments, the circuit may include a third bit shifter operationally connected to the input port, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the first binary number; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the first binary number shifted by a third number of bits in a third direction; and a fourth bit shifter operationally connected to the input port, wherein the fourth bit shifter is configured to: receive as an input to the fourth bit shifter the first binary number; and provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction. In embodiments, the second adder is further operationally connected to the third bit shifter and the fourth bit shifter, and wherein the inputs to the second adder further include the third shifted binary number and the fourth shifted binary number.

[0049] In embodiments, first direction is to the left, the first number of bits is one, the second number of bits is two, the third direction is to the right, the third number of bits is two, the fourth direction is to the right and the fourth number of bits is four.

[0050] In embodiments, the circuit may include a third bit shifter operably connected to the second multiplier, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the product; and provide as an output of the third bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction. In embodiment, the third adder may be further operationally connected to the third bit shifter and the inputs to the third adder further include the third shifted binary number.

[0051] In embodiments, the first direction may be to the right, the first number of bits is one, the second number of bits is one, the third direction is to the right, and the third number of bits is one.

[0052] In embodiments, a fourth bit shifter may be operationally connected to the input port, wherein the fourth bit shifter is configured to: receive as an input to the fourth bit shifter the first binary number; and provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction.

[0053] In embodiments, second adder is further operationally connected to the fourth bit shifter and the inputs to the second adder further includes the fourth bit shifter.

[0054] In embodiments, the circuit further may include: a third inverter operationally connected to the fourth bit shifter, wherein the third inverter is configured to: receive as an input the fourth shifted binary number; and provide as an output an inverse of the fourth shifted binary number. In embodiments, the second adder is further operationally connected to the third inverter and the input to the second adder further includes the fourth shifted binary number.

[0055] In embodiments, the fourth direction is to the right and the fourth number of bits is four.

[0056] According to another exemplary embodiment of the invention, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the first binary number; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to: receive an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to: receive as an input to the third inverter the second shifted binary number; and provide as an output of the third inverter an inverse of the second shifted binary number; and a first adder, operationally connected to the second inverter and the first inverter, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number and the inverse of the first shifted number; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; a second adder, operationally connected to the first multiplier and the second inverter, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the third shifted binary number and the first squared number; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the product; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. [0057] According to another exemplary embodiment of the invention, a circuit may include: an input port, wherein the input port is configured to receive a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier is configured to: receive as an input to the first multiplier the first binary number; and provide as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter is configured to: receive as an input to the first bit shifter the first binary number; and provide as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a second bit shifter operationally connected to the input port, wherein the second bit shifter is configured to: receive as an input to the second bit shifter the first binary number; and provide as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the second binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the second bit shifter, wherein the first inverter is configured to: receive as an input to the first inverter the first shifted binary number; and provide as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter is configured to: receive as an input to the second inverter the first binary number; and provide as an output of the second inverter an inverse of the first binary number; a third inverter operationally connected to the first bit shifter, wherein the third inverter is configured to: receive as an input to the third inverter the second shifted binary number; and provide as an output of the third inverter an inverse of the second shifted binary number; and a first adder, operationally connected to the second inverter, the first inverter, a first constant source, wherein the first constant source is configured to provide as an output of the first constant source a first constant, wherein the first adder is configured to: receive as inputs to the first adder the inverse of the first binary number, the inverse of the first shifted number, and the first constant; and provide as an output of the first adder a first factor, wherein the first factor is the sum of the inputs to the first adder; a second adder, operationally connected to the first multiplier, the second inverter, and a second constant source, wherein the second constant source is configured to provide as an output of the second constant a second constant, wherein the second adder is configured to: receive as inputs to the second adder the inverse of the third shifted binary number, the first squared number, and the second constant; and provide as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier is configured to: receive as an input to the second multiplier the first factor and the second factor; and provide as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a third bit shifter operationally connected to the second multiplier, wherein the third bit shifter is configured to: receive as an input to the third bit shifter the product; and provide as an output of the third bit shifter a third shifted binary number, wherein the third shifted binary number is the product shifted by a second number of bits in a second direction. [0058] In embodiments, the third direction is to the left.

[0059] In embodiments, the circuit further includes: a fourth bit shifter operationally connected to the input port, wherein the fourth bit shifter is configured to: receive as an input to the fourth bit shifter the first binary number; and provide as an output of the fourth bit shifter a fourth shifted binary number, wherein the fourth shifted binary number is the first binary number shifted by a fourth number of bits in a fourth direction; a fifth bit shifter operationally connected to the input port, wherein the fifth bit shifter is configured to: receive as an input to the fifth bit shifter the first binary number; and provide as an output of the fifth bit shifter a fifth shifted binary number, wherein the fifth shifted binary number is the first binary number shifted by a fifth number of bits in a fifth direction.

[0060] In embodiments, second adder is further operationally connected to the fourth bit shifter and the fifth bit shifter, and wherein the inputs to the second adder further include the fourth shifted binary number and the fifth shifted binary number.

[0061] In embodiments, the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is two, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is to the right, the fifth number of bits is five.

[0062] In embodiments, the second adder is further operationally connected to the second inverter and wherein the inputs to the second adder further include the inverse of the first binary number.

[0063] In embodiments, the circuit further includes :a third adder operationally connected to the third bit shifter and the second multiplier, wherein the third adder is configured to: receive as inputs to the third adder the product and the third shifted binary number; provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder.

[0064] In embodiments, the first direction is to the right, the second direction is to the left, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is the right, and the fifth number of bits is four.

[0065] In embodiments, the first number of bits is five, the second number of bits is one, and the third number of bits is two. In embodiments, the second adder is further operationally connected to the second inverter and the inputs to the second adder further include the inverse of the first binary number.

[0066] In embodiments, the first number of bits is five, the second number of bits is two, and the third number of bits is two. In embodiments, second adder is further operationally connected to the first input and the inputs to the second adder further include the first binary number.

[0067] In embodiments, the first number of bits is four, the second number of bits is one, and the third number of bits is one. In embodiments, the second adder is further operationally connected to the second inverter and the inputs to the second adder further include the inverse of the first binary number.

[0068] In embodiments, the first number of bits is four, the second number of bits is one, and the third number of bits is two. In embodiments, the second adder is further operationally connected to the second inverter and the inputs to the second adder further include the inverse of the first binary number.

[0069] In embodiments, the circuit includes a sixth bit shifter operationally connected to the second multiplier, wherein the sixth bit shifter is configured to: receive as an input to the sixth bit shifter the product; and provide as an output of the sixth bit shifter a sixth shifted binary number, wherein the sixth shifted binary number is the product shifted by a sixth number of bits in a sixth direction; and a third adder operationally connected to the second bit shifter and the sixth bit shifter, wherein the third adder is configured to: receive as inputs to the third adder the third shifted binary number and the sixth shifted binary number; provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder.

[0070] In embodiments, the first direction is to the right, the first number of bits is five, the second direction is to the left, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is two, the fifth direction is to the right, the fifth number of bits is four, the sixth direction is to the left and the sixth number of bits is one.

[0071] In embodiments, wherein the second number of bits is one.

[0072] In embodiments, the second adder is further operationally connected to the second inverter and the inputs to the second adder further include the inverse of the first binary number. [0073] In embodiments, the third adder is further operationally connected to the to the second multiplier and the inputs to the third adder further include the product.

[0074] In embodiments, the second number of bits is two.

[0075] In embodiments, the second adder is further operationally connected to the first input and wherein the input to the second adder further includes the first binary number.

[0076] In embodiments, the circuit further includes: a fourth inverter operationally connected to the third bit shifter, wherein the fourth inverter is configured to: receive as an input to the fourth inverter the fourth shifted binary number; and provide as an output of the fourth inverter an inverse of the fourth shifted binary number; and a fifth inverter operationally connected to the fourth bit shifter, wherein the fifth inverter is configured to: receive as an input to the fifth inverter the fourth bit shifted binary number; and provide as an output of the fifth inverter an inverse of the fifth shifted binary number. In embodiments, the second adder is further operationally connected to the fourth inverter and the fifth inverter. In embodiments, the inputs to the second adder further include the inverse of the fourth binary numbers and the fifth binary numbers.

[0077] In embodiments, the circuit includes a third adder operationally connected to the third bit shifter and the second multiplier, wherein the third adder is configured to: receive as inputs to the third adder the product and the third shifted binary number; and provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder.

[0078] In embodiments, the first direction is to the right, the first number of bits is one, the second direction is to the left, the second number of bits is one, the second number of bits is two, the fourth direction is to the right, the fourth number of bits is two, and the fifth direction is to the right.

[0079] In embodiments, the fifth number of bits is four.

[0080] In embodiments, the fifth number of bits is three.

[0081] In embodiments, the circuit includes a sixth bit shifter operationally connected to the input port, wherein the sixth bit shifter is configured to: receive as an input to the sixth bit shifter the first binary number; and provide as an output of the sixth bit shifter the sixth bit shifted binary number; and a sixth inverter operationally connected to the sixth bit shifter, wherein the sixth inverter is configured to: receive as an input to the sixth inverter the sixth bit shifted binary number; and provide as an output of the sixth inverter an inverse of the sixth bit shifted binary number. In embodiments, the second adder is further operationally connected to the sixth inverter. In embodiments, the inputs to the second adder further include the inverse of the sixth bit shifted binary number.

[0082] In embodiments, the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is one, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is one, the fifth direction is to the right, the fifth number of bits is three, the sixth direction is to the right, and the sixth number of bits is four.

[0083] In embodiments, the circuit includes: a sixth bit shifter operationally connected to the input port, wherein the sixth bit shifter is configured to receive as an input to the sixth bit shifter the first binary number; and provide as an output of the sixth bit shifter the sixth bit shifted binary number; and a sixth inverter operationally connected to the sixth bit shifter, wherein the sixth inverter is configured to: receive as an input to the sixth inverter the sixth bit shifted binary number; and provide as an output of the sixth inverter an inverse of the sixth bit shifted binary number; a seventh bit shifter operationally connected to the second multiplier, wherein the seventh bit shifter is configured to: receive as an input to the seventh bit shifter the product; and provide as an output of the seventh bit shifter a seventh bit shifted binary number, wherein the seventh bit shifted binary number is the product shifted by a seventh number of bits in a seventh direction; a third adder operationally connected to the third bit shifter and the seventh bit shifter, wherein the third adder is configured to: receive as inputs to the third adder the third shifted binary number and the seventh shifted binary number; and provide as an output of the third adder an approximation of one divided by the first binary number, wherein the approximation is the sum of the inputs to the third adder. In embodiments, the second adder is further operationally connected to the sixth inverter. In embodiments, the inputs to the second adder further includes the sixth bit shifted binary number.

[0084] In embodiments, the first direction is to the right, the first number of bits is five, the second direction is to the left, the second number of bits is one, the third number of bits is two, the fourth direction is to the right, the fourth number of bits is 1, the fourth number of bits is three, the fifth direction is to the right, the fifth number of bits is three, the sixth direction is to the right, the sixth number of bits is four, the seventh direction is to the left, and the seventh number of bits is one.

[0085] In embodiments, the circuit may include: an input port, wherein the input port receives a first binary number; a first multiplier operationally connected to the input port, wherein the first multiplier: receives as an input to the first multiplier the first binary number; andprovides as an output of the first multiplier a squared binary number, wherein the squared binary number is the first binary number squared; a first bit shifter operationally connected to the input port, wherein the first bit shifter: receives as an input to the first bit shifter the first binary number; and provides as an output of the first bit shifter a first shifted binary number, wherein the first shifted binary number is the first binary number shifted by a first number of bits in a first direction; a first inverter operationally connected to the first bit shifter, wherein the first inverter: receives as an input to the first inverter the first shifted binary number; and provides as an output of the first inverter an inverse of the first shifted binary number; a second inverter operationally connected to the input port wherein the second inverter: receives as an input to the second inverter the first binary number; and provides as an output of the second inverter an inverse of the first binary number; a first adder, operationally connected to the second inverter, wherein the first adder: receives as an input to the first adder the inverse of the first binary number; and provides as an output of the first adder a first factor, wherein the first factor is the sum of the input to the first adder and a first constant; a second adder, operationally connected to the first multiplier and the first inverter, wherein the second adder: receives as inputs to the second adder the inverse of the first shifted binary number and the first squared number; and provides as an output of the second adder a second factor, wherein the second factor is the sum of the inputs to the second adder and a second constant; a second multiplier operationally connected to the first adder and the second adder, wherein the second multiplier: receives as an input to the second multiplier the first factor and the second factor; and provides as an output of the second multiplier a product, wherein the product is the product of the second factor multiplied by the first factor; and a second bit shifter operationally connected to the second multiplier, wherein the second bit shifter: receives as an input to the second bit shifter the product; and provides as an output of the second bit shifter a second shifted binary number, wherein the second shifted binary number is the product shifted by a second number of bits in a second direction.

[0086] In embodiments, the first constant is determined by a minimax algorithm.

[0087] In embodiments, the second constant is determined by a minimax algorithm. [0088] In embodiments, the constant source is a register.

[0089] In embodiments, the constant source is circuitry.

[0090] In embodiments, the circuit is a component of a processor.

[0091] In embodiments, the circuit is a component of a field programmable gate array.

[0092] In embodiments, the circuit is a component of an application-specific integrated circuit.

[0093] In embodiments, a processor may include circuitry configured to perform a method for computer division approximation including: a) obtaining, by the circuitry, a first divisor and a first dividend for division from a register file of a microprocessor; b) generating, by the circuitry, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the circuitry, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting, by the circuitry, the first result of the division operation to the register file of the microprocessor.

[0094] In embodiments, a computer system may include one or more memories and one or more processors, the one or more memories comprising computer-readable instructions that when executed cause the one or more processors to perform a method for computer division approximation including: a) obtaining, by the one or more processors, a first divisor and a first dividend for division from the one or more memories; b) generating, by the one or more processors, an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the processor to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating, by the one or more processors, a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting, by the one or more processors, the first result of the division operation to the one or more memories.

[0095] In embodiments, a method of performing computer division may include: a) obtaining a first divisor and a first dividend for division; b) generating an initial approximation of 1 divided by the first divisor by processing the first divisor as an input using an approximation algorithm implemented by the circuitry to generate as an output the initial approximation using one or more constants, wherein the approximation algorithm is determined by the formula:

A(x) = 5(Cl - 1.03125x + x2) (C2 - 2.6875x + x2) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor; c) generating a first result of the division operation using the initial approximation of 1 divided by the first divisor and the first dividend; and d) transmitting the first result of the division operation.

[0096] In embodiments, the circuitry and computer system may include a microcontroller. [0097] In embodiments, the circuitry and computer system may include one or more hardware multipliers.

[0098] In embodiments, respective hardware multiplier of the one or more hardware multipliers may include respective multiplication instructions.

[0099] In embodiments, the circuitry and computer system may include one or more fused multiply add operators.

[0100] In embodiments, the circuitry and computer system may include one or more bit shift operators.

[0101] In embodiments, each respective bit shift operator of the one or more bit shift operators is configured to shift a digit of a respective input in a direction.

[0102] In embodiments, the direction is a left direction.

[0103] In embodiments, the direction is a right direction.

[0104] In embodiments, the circuitry and computer system may include one or more adders.

[0105] In embodiments, each respective adder of the one or more adders may include respective summation instructions.

[0106] In embodiments, the step of obtaining the first divisor and the first dividend for division may include receiving, by the circuitry, a request to perform a division operation including the first divisor and the first dividend. [0107] In embodiments, the initial approximation is provided in response to the request to perform the division operation.

[0108] In embodiments, the method may further include, prior to step c), generating, by the circuitry, a second approximation of 1 divided by the divisor using an iterative refinement algorithm.

[0109] In embodiments, the iterative refinement algorithm uses Newton’s method of iterative approximation.

[0110] In embodiments, the iterative refinement algorithm uses Goldschmidt’s division algorithm.

[OHl] In embodiments, the first constant is determined by a minimax algorithm.

[0112] In embodiments, the second constant is determined by a minimax algorithm.

[0113] In embodiments, step c) is performed by a respective hardware multiplier of the one or more hardware multipliers.

[0114] In embodiments, the method is performed by a computer system including one or more memories operably connected to one or more processors. In embodiments, the one or more processors includes one or more of reprogrammable hardware configured to perform the method, a circuit configured to perform the method, a processor configured to execute computer instructions which, upon execution, cause the processor to perform the method, a virtual machine which is configured to perform the method, and/or a network which is configured to perform the method.

[0115] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 6(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0116] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 7(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0117] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 4(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0118] In embodiments, the approximation algorithm is determined by the formula: A(x) = 3(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0119] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.75x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0120] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.625x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0121] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 5(C1 - x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0122] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 5(C1 -1.0625x + x 2 ) (C2 - 2.6875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0123] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 3.5(Cl — x) (C2 - 1.5x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0124] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 3.5(Cl — x) (C2 - 1.53125x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0125] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 3.5(Cl — x) (C2 - 1.46875x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0126] In embodiments, the approximation algorithm is determined by the formula:

A(x) = 4(C1 - x) (C2 - 1 ,5x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

[0127] In embodiments, the approximation algorithm is determined by the formula: A(x) = 5 (Cl - x) (C2 - 1 ,375x + x 2 ) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the first divisor.

DEFINITIONS

[0128] As used herein, the term adder refers to a circuit component that performs binary addition. In embodiments, the adder may be, for example, a ripple-carry adder, a carrylookahead adder, carry-bypass adders, parallel prefix adders, or carry-save adders, to name a few.

[0129] As used herein, the term multiplier refers to a circuit component that performs binary multiplication. In embodiments, the multiplier may, for example, multiply by summing partial products. In embodiments, the multiplier may be, for example, an array multiplier, a Bough-Wooley multiplier, a Wallace tree multiple, or Dadda multiplier, to name a few. In embodiments, the multiplier may use Booth encoding in the partial product generation stage to reduce the number of partial products needed to be summed. For example, in embodiments, the multiplier may be a Booth code Wallace tree multiplier.

[0130] As used herein, the term shifter (also referred to as a “bit shifter” or “bit shift operator”) refers to a circuit component that shifts a circuit bus in a given direction (E.g., left or right) by a certain number of bits. In embodiments, for example, a shifter may include wire connections that shift a bus to the left or to the right. For example, the shifter may shift an input to the right by one by connecting the most significant bit of the input to the next bit over. In embodiments, a shifter may include a barrel shifter and a funnel shifter, to give two examples. [0131] As used herein, in embodiments, the term inverter refers to a circuit component that performs a bit-wise inversion of a binary number. For example, in embodiments, the inverter, using physical electronics, may invert 11001100 as 00110011. In embodiments, the inverter may be a not gate.

[0132] In embodiments of the present invention, the various components, while described distinctly, may be combined or separated. In embodiments, for example, a fused multiply adder may be used instead of both an adder and a multiplier. Continuing the example, instead of first using a multiplier to take the product of the two numbers and second using an adder to output the sum of the of the product of the two numbers and a third number, a fused multiplier may output the product of the two numbers and the third number in a single step.

EXAMPLES

[0133] FIG. 1 is a schematic diagram of a processor 100 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, the processor 100 may include circuitry 108. In embodiments, the circuitry 108 may include one or more hardware multipliers 102, one or more bit shift operators (also referred to as “bit shifters”) 104, and one or more adders 106. For example, in embodiments, the circuitry 108 may include a first hardware multiplier 102-1 which may include square instructions, and a second hardware multiplier 102-2 which may include multiplication instructions. In embodiments, the first hardware multiplier 102-1 and the second hardware multiplier 102-2 may be embodied separate circuits. In embodiments, the first hardware multiplier 102-1 and the second hardware multiplier 102-2 may be embodied in the same circuit. In embodiments, the first hardware multiplier 102-1 and the second hardware multiplier 102-2 may be combined with existing multipliers in a microprocessor. In embodiments, the multiplier may be an 8-bit multiplier, a 16-bit multiplier, or a 32-bit multiplier, for example, to name a few. In embodiments, the circuitry 108 may include one or more fused multiply add operators. In embodiments, the one or more fused multiply add operators may include multiply add instructions. In embodiments, the fused multiply add instructions may include instructions for computing the product of two numbers and providing the result to an accumulator. In embodiments, the operations performed by the one or more multipliers 102 and the one or more adders 106 may be performed by the one or more fused multiply add operators.

[0134] In embodiments, the circuitry 108 may include a first bit shift operator 104-1, a second bit-shift operator 104-2, a third bit shift operator 104-3, a fourth bit shift operator 104- 4, and a fifth bit shift operator 104-5, to name a few. In embodiments, a bit shift operator 104 may be used to shift the digit of each bit of the input to the left or the right.

[0135] In embodiments, the circuitry 108 may include, for example, a first adder 106-1, a second adder 106-2, and a third adder 106-3. In embodiments, each adder 106 may include circuits for the summation of one or more terms as inputs. [0136] In embodiments, the circuitry 108 may be configured to perform a method for computer division approximation. In embodiments, the computer division approximation may be an approximation of (1/x) where x is a divisor. In embodiments, Chebyshev and Remez approximation algorithms may be used to compute a fourth order polynomial which approximates the function (1/x). In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8 bits using the following formula:

ApproxRecip x) = 5(c 1 — 2.6875% + % 2 )(c 2 — 1.03125% + % 2 ) (Eq. 1) where c « 1.97955322265625 and c 2 ® 0.71752926875, and are computed using a minimax algorithm. In embodiments, with the given coefficients, this will produce a result with at least 10.2 bits of precision. In embodiments, when x is not between 0.5 and 1, it can be prescaled to be put in that range. In embodiments, by virtue of using specific hardcoded coefficients, this function is relatively computationally easy to compute in silicon, taking only an ~8-bit multiplication, several additions, and a ~12-bit multiplication. In comparison, a Taylor or Chebyshev approximation would take significantly more silicon (at least 5 multiplications) in order to produce an approximation, and would operate much more slowly. In embodiments, the formula may be a fourth order polynomial equation.

[0137] In embodiments, the circuity required to implement Eq. 1 (and as will be appreciated, the other equations in this formula) may be reduced by using quantization. For example, in embodiments, with respect to Fig. 1, the quantization be such by: computing % 2 with a 13-bit input width, computing p = ( — 2.6875% + % 2 ) with 14-bit precision, computing p 2 = (c 2 — 1.03125% + % 2 ) with 12-bit precision, and multiplying p x and p 2 with full precision (e.g., 14 bits by 12 bits). In such an embodiment, the accuracy level is reduced to slightly more than 8 bits as quantized, as opposed to 10.2 bits as unquantized, but results in a small circuit, taking less than 10 narrow additions and 2 small multiplications to fully compute. Table 1 depicts exemplary constants used in an embodiment for precise estimation, over-estimation, and under-estimation, respectively, and related accuracy bounds:

Table 1 : Computed Constants for Equation in 1 in Decimal and Fixed Point Notation

Constants Accuracy Bounds c 2 min (A 1/D (%)%) max (A 1/D (%)%)

1.97968 0.71753 0.99814 1.00178

0x7eb3 0xb7b 0xff85dlcc 0xl0074aff6

1.97979 0.71777 1.00006 1.00369

0x7eb5 0xb7c 0xl0003dbb7 0xl00flb092 1.97925 0.71729 0.99664 0.99996

0x7eac 0xb7a 0xff2413c5 0xfffd77al

[0138] In embodiments, higher-order polynomials and higher-order terms may be used. For example, in embodiments, a sixth order polynomial as a product of second and third order polynomials may be used. In embodiments, with the given coefficients, Eq. 1 will produce a result with at least 9.2 bits of precision with Cl (also written herein as ci) and C2 (also written herein as C2) chosen to always overestimate or underestimate the division result. In embodiments, using such constants the approximation will lose one bit of precision.

[0139] In embodiments, referring to FIG. 1, the circuitry 108 may be used to implement the formula in Eq. 1 to generate an approximation of (1/x). In embodiments, the circuity 108 may be used to implement a quantization of the formula in Eq. 1. In embodiments, the processor 100 may be required to perform a computer division, where a dividend is to be divided by a divisor. Therefore, in embodiments, an approximation may be made for 1 divided by the divisor, x. In embodiments, the divisor x and a dividend may be obtained by the circuit 108 from a register file of a microprocessor. In embodiments, the register file may be a memory storage of the microprocessor. In embodiments, the step of obtaining the divisor and dividend for division may include the circuitry 108 receiving a request to perform a division operation including the divisor and the dividend. In embodiments, the request to perform the division operation may be an instruction by the microprocessor to divide the dividend by the divisor. In embodiments, the instruction may be a “DIV” instruction. In embodiments, a calculation of the square of the divisor may be performed by the first hardware multiplier 102-1. In embodiments, the first bit shift operator 104-1 may perform a bit shift of x by 5 digits left. In embodiments, the first adder 106-1 may obtain the squared divisor, x 2 , from the first hardware multiplier 102- 1, the divisor x, the 5-digit left bit-shifted divisor, and Cl. In embodiments, the instructions performed by the first adder 106-1 may be pre-programmed. In embodiments, the first adder 106-1 may determine the summation of the values in the first term of the fourth order polynomial.

[0140] In embodiments, the second bit shift operator 104-2 may perform a bit shift of the divisor by 1 digit right. In embodiments, the second bit shift operator 104-2 may perform a bit shift of the divisor by 2 digits left. In embodiments, the third bit shift operator 104-3 may perform a bit shift of the divisor by 4 digits left. In embodiments, the second adder 106-2 may obtain the squared divisor, x 2 from the first hardware multiplier 102-1, the divisor x, the 1 -digit right bit-shifted divisor, the 2-digit left bit-shifted divisor, the 4-digit right bit-shifted and C2. In embodiments, the instructions performed by the second adder 106-2 may be preprogrammed. In embodiments, the first adder 106-1 may determine the summation of the values in the second term of the fourth order polynomial.

[0141] In embodiments, the second hardware multiplier 102-2 may obtain the summation of the first term of the fourth order polynomial in Eq. 1 from the first adder 106-1, and the summation of the second term of the fourth order polynomial from the second adder 106-2. In embodiments, the second hardware multiplier 102-2 may perform a multiplication operation of the two terms. In embodiments, the multiplication operation may take 3 clock cycles to perform.

[0142] In embodiments, in order to multiply the result of the multiplication by the second hardware multiplier 102-2 by 5, the fifth bit shift operator 104-5 may perform a bit shift of the result by 2 digits right. In embodiments, the third adder 106-3 may perform the step of summing the 2-digit right bit-shifted result, to the result of the multiplication. In embodiments, this summation may be the approximation of (1/x).

[0143] In embodiments, the circuitry 108 may generate a result of the division operation using the initial approximation of (1/x) using the approximation and the dividend. In embodiments, the result of the division operation may be calculated by the circuitry 108 multiplying the approximation of (1/x) by the dividend. In embodiments, the step of generating the result of the division operation may be performed by a respective hardware multiplier of the one or more hardware multipliers 102.

[0144] In embodiments, prior to generating the result of the division operation, the circuitry 108 may further refine the initial approximation of (1/x). In embodiments, the circuitry 108 may generate a second approximation of (1/x) using an iterative refinement algorithm. In embodiments, the iterative refinement algorithm may use Newton’s method of approximation. In embodiments, the iterative refinement algorithm may use Goldschmidt’s division algorithm. In embodiments, the circuitry 108 may transmit the result of the division operation (either as the initial approximation or the second refine approximation) to the register file of the microprocessor. In embodiments, the initial approximation or the second approximation may be provided in response to the request to perform the division operation.

[0145] Compared to the traditional lookup table approaches, which take significant silicon area for a large lookup table, in embodiments, the silicon area required to produce a silicon implementation of this formula takes a similar amount of silicon to a 16x16 multiplier. In embodiments, the silicon of a 32x32 multiplier may be used to back the implementation of the method, meaning that the area overhead of adding hardware-accelerated division to a small microcontroller would be negligible. For example, a 32-bit Booth code multiplier may be used, which uses an area of adder trees to compute sums and products for a chosen polynomial. In embodiments, other multiplier topologies may be used, for example array multipliers.

[0146] FIG. 1A depicts a dot diagram of summed bits in a Booth code multiplier in accordance with conventional methods of computer multiplication. Referring back to the 32- Bit Booth multiplier, in embodiments, the Booth code multiplier is conventionally used for signed and unsigned multiplication and operates by computing the sum of 17 partial products, as laid out in bitwise columns according to figure 1. These are summed ‘vertically’ using trees of adder cells, before a final carry-propagate adder is used to produce the product.

[0147] FIG. IB depicts a mapping of a division approximation polynomial onto the multiplier partial products depicted in FIG. 1A in accordance with an embodiment of the present invention. In an embodiment of the present invention, the same array of adder trees may be used to compute the polynomials p and p 2 used for division approximation. FIG. IB depicts the mapping of the division approximation terms. In embodiments, the two multiplications, computing x 2 and 5p x p 2 may use Booth encoding circuits for the respective rows that contain their partial products, while the additions that compute p t and p 2 may bypass the booth encoders for those rows and multiplex directly into the columns of adder trees. In embodiments, the computation of p 2 may add 10 nonzero bits to the multiplier array (see the darkly colored dots), but all other computations fit inside the multiplier. In embodiments where an array multiplier is used, the entire computation may fit inside the multiplier. In embodiments, this may reduce the circuit space needed. In further embodiments, the adder (or summation) trees may include a mix of 3 :2 and 4:2 compressor which allow the trees to be split at the 7 th and 10 th partial products, allowing for distinct computations.

[0148] In embodiments, there may be applications to large microprocessors as well, since this approximation is likely faster to compute than the table lookup that is traditionally used.

[0149] In embodiments, other formulas for approximating computer division may be used which may vary in precision and speed. In embodiments, in addition to the formula used in (Eq. 1) the ranges of coefficients achieve at least 7 bits of precision are as follows: where a 6 [3,7], /3 E [2.625, 2.75], and y 6 [1, 1.0625], with constants c and c 2 computed by a minimax algorithm.

[0150] For example, FIG. 2 is a schematic diagram of a processor 200 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 2 is an alternative circuit construction for the processor 100 illustrated in FIG. 1. For example, in embodiments, the processor 200 may include circuitry 208. In embodiments, the circuitry 208 may include a first hardware multiplier 202-1, a second hardware multiplier 202-2, a first bit shift operator 204-1, a second bit shift operator 204-2, a third bit shift operator 204-3, a fourth bit shift operator 204-4, a fifth bit shift operator 204-5, a first adder 206-1, a second adder 206-2, and a third adder 206-3. In embodiments, the second bit shift operator 204-6 in FIG. 2 may be a different bit shift operator than the second bit shift operator 104-2 in FIG. 1 to implement the approximation algorithm in Eq. 1.

[0151] FIG. 3 is a schematic diagram of a processor 300 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 3 is an alternative circuit construction for the processor 100 illustrated in FIG. 1. For example, in embodiments, the processor 300 may include circuitry 308. In embodiments, the circuitry 308 may include a first hardware multiplier 302-1, a second hardware multiplier 302-2, a first bit shift operator 304-1, a second bit shift operator 304-2, a third bit shift operator 304-3, a fourth bit shift operator 304-4, a fifth bit shift operator 304-5, a sixth bit shift operator 304-6, a first adder 306-1, a second adder 306-2, and a third adder 306-3. In embodiments, the alternative circuit construction in FIG. 3 may be used to implement the approximation algorithm in Eq. 1.

[0152] FIG. 4 is a schematic diagram of a processor 400 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 4 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.9 bits using the following formula:

A(x) = 6(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 )

(Eq. 3) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.9 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0153] In embodiments, the processor 400 may include circuitry 408. In embodiments, the circuitry 308 may include a first hardware multiplier 402-1, a second hardware multiplier 402- 2, a first bit shift operator 404-1, a second bit shift operator 404-2, a third bit shift operator 404-3, a fourth bit shift operator 404-4, a fifth bit shift operator 404-5, a sixth bit shift operator 404-6, a first adder 406-1, a second adder 406-2, and a third adder 406-3. In embodiments, the alternative circuit construction in FIG. 4 may be used to implement the approximation algorithm in Eq. 3.

[0154] FIG. 5 is a schematic diagram of a processor 500 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, the circuit construction in FIG. 5 is an alternative circuit construction for the processor 400 illustrated in FIG. 4. In embodiments, the processor 500 may include circuitry 508. In embodiments, the circuitry 508 may include a first hardware multiplier 502-1, a second hardware multiplier 502-2, a first bit shift operator 504-1, a second bit shift operator 504-2, a third bit shift operator 504-3, a fourth bit shift operator 504-4, a fifth bit shift operator 504-5, a sixth bit shift operator 504-6, a first adder 506-1, a second adder 506-2, and a third adder 506-3. In embodiments, the alternative circuit construction in FIG. 5 may be used to implement the approximation algorithm in Eq. 3.

[0155] FIG. 6 is a schematic diagram of a processor 600 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, the circuit construction in FIG. 6 is an alternative circuit construction for the processor 400 illustrated in FIG. 4. In embodiments, the processor 600 may include circuitry 508. In embodiments, the circuitry 608 may include a first hardware multiplier 602-1, a second hardware multiplier 602-2, a first bit shift operator 604-1, a second bit shift operator 604-2, a third bit shift operator 604-3, a fourth bit shift operator 604-4, a fifth bit shift operator 604-5, a sixth bit shift operator 604-6, a seventh bit shift operator 604-7, a first adder 606-1, a second adder 606-2, and a third adder 606-3. In embodiments, the alternative circuit construction in FIG. 6 may be used to implement the approximation algorithm in Eq. 3.

[0156] FIG. 7 is a schematic diagram of a processor 700 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 7 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.0 bits using the following formula:

A(x) = 7(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 )

(Eq. 4) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.0 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0157] In embodiments, the processor 700 may include circuitry 708. In embodiments, the circuitry 708 may include a first hardware multiplier 702-1, a second hardware multiplier 702- 2, a first bit shift operator 704-1, a second bit shift operator 704-2, a third bit shift operator 404-3, a fourth bit shift operator 404-4, a fifth bit shift operator 404-5, a sixth bit shift operator 404-6, a first adder 706-1, a second adder 706-2, and a third adder 706-3. In embodiments, the alternative circuit construction in FIG. 7 may be used to implement the approximation algorithm in Eq. 4.

[0158] FIG. 8 is a schematic diagram of a processor 800 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 8 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.8 bits using the following formula:

A(x) = 4(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 )

(Eq. 5) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.8 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0159] In embodiments, the processor 800 may include circuitry 808. In embodiments, the circuitry 808 may include a first hardware multiplier 802-1, a second hardware multiplier 802- 2, a first bit shift operator 804-1, a second bit shift operator 804-2, a third bit shift operator 804-3, a fourth bit shift operator 804-4, a fifth bit shift operator 804-5, a first adder 806-1, and a second adder 806-2. In embodiments, the alternative circuit construction in FIG. 8 may be used to implement the approximation algorithm in Eq. 5.

[0160] FIG. 9 is a schematic diagram of a processor 900 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 9 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.6 bits using the following formula:

A(x) = 3(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 )

(Eq. 6) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.6 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0161] In embodiments, the processor 900 may include circuitry 908. In embodiments, the circuitry 908 may include a first hardware multiplier 902-1, a second hardware multiplier 902- 2, a first bit shift operator 904-1, a second bit shift operator 904-2, a third bit shift operator 904-3, a fourth bit shift operator 904-4, a fifth bit shift operator 904-5, a first adder 906-1, a second adder 906-2, and a third adder 906-3. In embodiments, the alternative circuit construction in FIG. 9 may be used to implement the approximation algorithm in Eq. 6.

[0162] FIG. 10 is a schematic diagram of a processor 1000 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 10 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.5 bits using the following formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.75x + x 2 )

(Eq. 7) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.5 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm. [0163] In embodiments, the processor 1000 may include circuitry 1008. In embodiments, the circuitry 1008 may include a first hardware multiplier 1002-1, a second hardware multiplier 1002-2, a first bit shift operator 1004-1, a second bit shift operator 1004-2, a third bit shift operator 1004-3, a fourth bit shift operator 1004-4, a fifth bit shift operator 1004-5, a first adder 1006-1, a second adder 1006-2, and a third adder 1006-3. In embodiments, the alternative circuit construction in FIG. 10 may be used to implement the approximation algorithm in Eq.

7.

[0164] FIG. 11 is a schematic diagram of a processor 1100 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 11 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.9 bits using the following formula:

A(x) = 5(C1 - 1.03125x + x 2 ) (C2 - 2.625x + x 2 )

(Eq. 8) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.9 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0165] In embodiments, the processor 1100 may include circuitry 1108. In embodiments, the circuitry 1108 may include a first hardware multiplier 1102-1, a second hardware multiplier 1102-2, a first bit shift operator 1104-1, a second bit shift operator 1104-2, a third bit shift operator 1104-3, a fourth bit shift operator 1104-4, a fifth bit shift operator 1104-5, a first adder 1106-1, a second adder 1106-2, and a third adder 1106-3. In embodiments, the alternative circuit construction in FIG. 11 may be used to implement the approximation algorithm in Eq.

8.

[0166] FIG. 12 is a schematic diagram of a processor 1200 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 12 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.2 bits using the following formula:

A(x) = 5(C1 - x + x 2 ) (C2 - 2.6875x + x 2 )

(Eq. 9) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.2 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0167] In embodiments, the processor 1200 may include circuitry 1208. In embodiments, the circuitry 1208 may include a first hardware multiplier 1202-1, a second hardware multiplier 1202-2, a first bit shift operator 1204-1, a second bit shift operator 1204-2, a third bit shift operator 1204-3, a fourth bit shift operator 1204-4, a fifth bit shift operator 1204-5, a first adder 1206-1, a second adder 1206-2, and a third adder 1206-3. In embodiments, the alternative circuit construction in FIG. 12 may be used to implement the approximation algorithm in Eq.

9.

[0168] FIG. 13 is a schematic diagram of a processor 1300 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 13 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 2. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.8 bits using the following formula:

A(x) = 5(Cl -1.0625x + x 2 ) (C2 - 2.6875x + x 2 ) (Eq. 10) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.8 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0169] In embodiments, the processor 1300 may include circuitry 1308. In embodiments, the circuitry 1308 may include a first hardware multiplier 1302-1, a second hardware multiplier 1302-2, a first bit shift operator 1304-1, a second bit shift operator 1304-2, a third bit shift operator 1304-3, a fourth bit shift operator 1304-4, a fifth bit shift operator 1304-5, a first adder 1306-1, a second adder 1306-2, and a third adder 1306-3. In embodiments, the alternative circuit construction in FIG. 13 may be used to implement the approximation algorithm in Eq.

10.

[0170] In embodiments, the following additional approximation functions also achieve at least 7 bits of precision (which are derived using the same methodology):

ApproxRecip(x) = a(c — x)(c 2 — /?x + x 2 ) (Eq. 11) where a E [3, 4], E [1.46875, 1.53125], with constants and c 2 computed by a minimax algorithm. For example, in embodiments, the approximation algorithm may be a third order polynomial.

[0171] For example, FIG. 14 is a schematic diagram of a processor 1400 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 14 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 11. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 8.9 bits using the following formula:

A(x) = 3.5(Cl - x) (C2 - 1.5x + x 2 ) (Eq. 12) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 7.9 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0172] In embodiments, the processor 1400 may include circuitry 1408. In embodiments, the circuitry 1408 may include a first hardware multiplier 1402-1, a second hardware multiplier 1402-2, a first bit shift operator 1404-1, a second bit shift operator 1404-2, a third bit shift operator 1404-3, a first adder 1406-1, a second adder 1406-2, and a third adder 1406-3. In embodiments, the alternative circuit construction in FIG. 14 may be used to implement the approximation algorithm in Eq. 12.

[0173] FIG. 15 is a schematic diagram of a processor 1500 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 15 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 11. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.8 bits using the following formula:

A(x) = 3.5(Cl - x) (C2 - 1.53125x + x 2 ) (Eq. 13) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.8 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0174] In embodiments, the processor 1500 may include circuitry 1508. In embodiments, the circuitry 1508 may include a first hardware multiplier 1502-1, a second hardware multiplier 1502-2, a first bit shift operator 1504-1, a second bit shift operator 1504-2, a third bit shift operator 1504-3, a fourth bit shift operator 1504-4, a first adder 1506-1, a second adder 1506- 2, and a third adder 1506-3. In embodiments, the alternative circuit construction in FIG. 15 may be used to implement the approximation algorithm in Eq. 13.

[0175] FIG. 16 is a schematic diagram of a processor 1600 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 16 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 11. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.4 bits using the following formula:

A(x) = 3.5(Cl - x) (C2 - 1.46875x + x 2 ) (Eq. 14) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.4 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0176] In embodiments, the processor 1600 may include circuitry 1608. In embodiments, the circuitry 1608 may include a first hardware multiplier 1602-1, a second hardware multiplier 1602-2, a first bit shift operator 1604-1, a second bit shift operator 1604-2, a third bit shift operator 1604-3, a fourth bit shift operator 1604-4, a first adder 1606-1, a second adder 1606- 2, and a third adder 1606-3. In embodiments, the alternative circuit construction in FIG. 16 may be used to implement the approximation algorithm in Eq. 14.

[0177] FIG. 17 is a schematic diagram of a processor 1700 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 17 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 11. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.1 bits using the following formula:

A(x) = 4(C1 - x) (C2 - 1.5x + x 2 ) (Eq. 15) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.1 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm. [0178] In embodiments, the processor 1700 may include circuitry 1708. In embodiments, the circuitry 1708 may include a first hardware multiplier 1702-1, a second hardware multiplier 1702-2, a first bit shift operator 1704-1, a second bit shift operator 1704-2, a first adder 1606- 1, and a second adder 1606-2. In embodiments, the alternative circuit construction in FIG. 17 may be used to implement the approximation algorithm in Eq. 15.

[0179] FIG. 18 is a schematic diagram of a processor 1800 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, FIG. 18 is a circuit construction for implementing a variation of the approximation algorithm of Eq. 11. In embodiments, the initial guess for (1/x) (when x is between 0.5 and 1) can be made within 7.2 bits using the following formula:

A(x) = 5(C1 - x) (C2 - 1.375x + x 2 ) (Eq. 16) where A(x) is the initial approximation of (1/x), Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the initial guess can be made within 6.2 bits with Cl and C2 chosen to always overestimate or underestimate. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0180] In embodiments, the processor 1800 may include circuitry 1808. In embodiments, the circuitry 1808 may include a first hardware multiplier 1802-1, a second hardware multiplier 1602-2, a first bit shift operator 1804-1, a second bit shift operator 1804-2, a third bit shift operator 1804-3, a first adder 1806-1, a second adder 1806-2, and a third adder 1806-3. In embodiments, the alternative circuit construction in FIG. 18 may be used to implement the approximation algorithm in Eq. 16.

[0181] FIG. 19 is a process flow for a processor 100 configured to perform a method for computer division approximation in accordance with embodiments of the present invention. In embodiments, referring to FIG. 19, a method for computer division approximation by the processor 100, which may include circuitry 108, may begin with step SI 902. At step SI 902, in embodiments, the circuitry 108 may obtain a divisor and a dividend for division from a register file of a microprocessor. In embodiments, the circuitry 108 may include a microcontroller. In embodiments, the circuitry 108 may include one or more hardware multipliers 102. In embodiments, each respective hardware multiplier of the one or more hardware multipliers 102 may include respective multiplication instructions. In embodiments, the circuitry 108 may include one or more fused multiply add operators. In embodiments, the circuitry may include one or more bit shift operators 104. In embodiments, each respective bit shift operator of the one or more bit shift operators 104 may be configured to shift a digit of a respective input in a direction. In embodiments, the direction may be a left direction. In embodiments, the direction may be a right direction. In embodiments, the circuitry 108 may include one or more adders 106. In embodiments, each respective adder of the one or more adders 106 may include respective summation instructions.

[0182] In embodiments, the step SI 902 of obtaining the divisor and dividend for division may include the circuitry 108 receiving a request to perform a division operation including the divisor and the dividend. In embodiments, the request to perform the division operation may be an instruction by the microprocessor to divide the dividend by the divisor. In embodiments, the instruction may be a “DIV” instruction.

[0183] In embodiments, the method may continue with step S1904. At step S1904, in embodiments, the circuitry 108 may generate an initial approximation of 1 divided by the first divisor. In embodiments, the generating step S1904A may include the circuitry 108 processing the divisor as an input using an approximation algorithm implemented by the circuitry 108 to generate as an output the initial approximation using one or more constants. In embodiments, the approximation algorithm may be a fourth order polynomial. In embodiments, the approximation algorithm may be a sixth order polynomial. In embodiments, the approximation algorithm may be a third order polynomial. In embodiments, the approximation algorithm may be determined by the formula:

A(x) = 5(Cl - 1.03125x + x 2 ) (C2 - 2.6875x + x 2 ) (Eq. 1) where A(x) is the initial approximation of 1 divided by the first divisor, Cl is a first constant, C2 is a second constant, and x is the divisor. In embodiments, the first constant may be determined by a minimax algorithm. In embodiments, the second constant may be determined by a minimax algorithm.

[0184] In embodiments, the method may continue with step SI 906. In embodiments, at step SI 906, the circuitry 108 may generate a result of the division operation using the initial approximation of 1 divided by the divisor and the dividend. In embodiments, the result of the division operation may be calculated by the circuitry 108 multiplying the approximation of (1/x) by the dividend. In embodiments, the step of generating the result of the division operation may be performed by a respective hardware multiplier of the one or more hardware multipliers 102.

[0185] In embodiments, prior to generating the result in step SI 906, the circuitry 108 may generate a second approximation of 1 divided by the divisor using an iterative refinement algorithm. In embodiments, the iterative refinement algorithm may use Newton’s method of approximation. In embodiments, the iterative refinement algorithm may use Goldschmidt’s division algorithm.

[0186] In embodiments, referring to FIG. 19, the method may continue with S1908. In embodiments, at step SI 908, the circuitry 108 may transmit the result to the register file of the microprocessor. In embodiments, the result of the division operation may be stored in the register file.

[0187] In embodiments, the method may be performed using a computer system including one or more memories operably connected to one or more processors. In embodiments, the one or more memories may store instructions that, when implemented by the one or more processors, cause the one or more processors to perform the method. In embodiments, the one or more processors may include circuity which performs the method as depicted in Fig. 19.

* * *

[0188] Now that embodiments of the present invention have been shown and described in detail, various modifications and improvements thereon can become readily apparent to those skilled in the art. Accordingly, the exemplary embodiments of the present invention, as set forth above, are intended to be illustrative, not limiting. The spirit and scope of the present invention is to be construed broadly.