Title:
RADAR HARDWARE ACCELERATOR
Document Type and Number:
WIPO Patent Application WO/2017/218917
Kind Code:
A1
Abstract:
In described examples, a radar hardware accelerator (HWA) (125) includes a fast Fourier transform (FFT) engine including a pre-processing block (211) for providing interference mitigation, finite impulse response (FIR) filtering, and/or multiplying a radar data sample stream received from ADC buffers (120) within a split accelerator local memory that also includes output buffers (130) by a pre-programmed complex scalar or a specified sample from an internal look-up table (LUT) to generate pre-processed samples. A windowing plus FFT block (windowed FFT block) (212) is for multiplying the pre-processed samples by a window vector and then processing by an FFT block for performing a FFT to generate Fourier transformed samples. A post-processing block (213) is for computing a magnitude of the Fourier transformed samples and performing a data compression operation for generating post-processed radar data. The pre-processing block (211), windowed FFT block (212) and post-processing block (213) are connected in one streaming series data path.

Inventors:
RAO SANDEEP (IN)
RAMASUBRAMANIAN KARTHIK (IN)
PRATHAPAN INDU (IN)
GANESAN RAGHU (IN)
GUPTA PANKAJ (IN)
Application Number:
PCT/US2017/037916
Publication Date:
December 21, 2017
Filing Date:
June 16, 2017
Assignee:
TEXAS INSTRUMENTS INC (US)
TEXAS INSTRUMENTS JAPAN LTD (JP)
International Classes:
G01S13/34; G06F17/14
Foreign References:
US20080019464A1 2008-01-24
US20160013743A1 2016-01-14
US20120235855A1 2012-09-20
US6380887B1 2002-04-30
US20080106460A1 2008-05-08
DE102005037628B3 2007-04-19
US20050285773A1 2005-12-29
Other References:
"Automotive 24Ghz radar development kit. Based on Infineon AURIX family and BGT24A MMIC family", INFINEON PRODUCT BRIEF, February 2016 (2016-02-01), XP055463859, Retrieved from the Internet
ME??HC??? B??A???, ?PA??OE PY?O?O?C??? NO PA6OTE C PIPO?PA???? SPECTRUMLAB B PIP??O?E??? ? PA??O?E?EOP??IOTA? ?A6???????? (OC?O????IOTA? ?AC?PO???, 13 September 2008 (2008-09-13), pages 5 - 6, XP055463876, Retrieved from the Internet
See also references of EP 3472640A4
Attorney, Agent or Firm:
NEERINGS, Ronald, O. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A radar hardware accelerator (HWA), comprising:

a fast Fourier transform (FFT) engine including:

a pre-processing block for providing at least one of interference mitigation, finite impulse response (FIR) filtering and multiplying a radar data sample stream received from ADC buffers within a split accelerator local memory that also includes output buffers by a pre-programmed complex scalar or a specified sample from an internal look-up table (LUT) to generate pre-processed samples;

a windowing plus FFT block (windowed FFT block) for multiplying the pre-processed samples by a window vector and then processing by an FFT block for performing a FFT to generate Fourier transformed samples; and

a post-processing block for computing a magnitude of the Fourier transformed samples;

wherein the pre-processing block, the windowed FFT block, and the post-processing block are connected in one streaming series data path.

2. The HWA of claim 1, further comprising a constant false alarm rate (CFAR) engine in a CFAR detection path parallel to the streaming series data path including a log-magnitude pre-processing block and a CFAR detector for detecting radar target returns against a background.

3. The HWA of claim 1, wherein the pre-processing block, the windowed FFT block and the post-processing block include independent enable (EN) circuits to provide independent muxing controls that enable/bypass for any combination of the pre-processing block, the windowed FFT block and the post-processing block.

4. The HWA of claim 1, further comprising a substrate that provides at least a semiconductor surface, wherein the HWA is formed in the semiconductor surface.

5. The HWA of claim 1, wherein the ADC buffers and the output buffers are both split memories.

6. The HWA of claim 2, wherein the CFAR detection path shares at least one of a shared memory and logic with the FFT engine.

7. A radar sub-system, comprising:

a split accelerator local memory including ADC input buffers (ADC buffers) for storing radar data sample streams, and output buffers,

a radar hardware accelerator (HWA) coupled to the ADC buffers for receiving the radar data sample streams and for processing the radar data sample streams, the HWA including:

a fast Fourier transform (FFT) engine including:

a pre-processing block for providing at least one of interference mitigation, finite impulse response (FIR) filtering and multiplying the radar data sample streams by a pre-programmed complex scalar or a specified sample from an internal look-up table (LUT) to generate pre-processed samples;

a windowing plus FFT block (windowed FFT block) for multiplying the pre-processed samples by a window vector and then processing by an FFT block for performing a FFT to generate Fourier transformed samples; and

a post-processing block for computing a magnitude of the Fourier transformed samples for generating post-processed radar data;

wherein the pre-processing block, the windowed FFT block, and the post-processing block are connected in one streaming series data path, and wherein an output of the post-processing block is coupled to an input of the output buffers for transferring the post-processed radar data to the output buffers; and

a parameter-set configuration memory coupled to a state machine both coupled by a bus to the FFT engine for sequencing parameters sets for execution of a chained sequence of operations and data transfers between the accelerator local memory and an external memory for controlling the pre-processing block, the windowed FFT block and the post-processing block.

8. The radar sub-system of claim 7, wherein the state machine is a parameter-set based state machine wherein the parameter-sets are programmable, the parameter-sets for configuring the HWA to perform a certain set of the operations, and wherein a sequence of executing the parameter-sets is defined.

9. The radar sub-system of claim 8, wherein the radar sub-system is configured to use 2D-memory indexing, wherein each of the parameter-sets performs a specific one of the operations on multiple ones of the radar data sample streams including a first radar data sample stream and a subsequent second radar data sample stream, and wherein a separation between subsequent samples of each of the radar data sample streams, a separation between an initial sample of the first radar data sample stream and an initial sample of the second radar data sample stream, and a number of the radar data sample streams for each of the operations is configurable via the parameter-sets.

10. The radar sub-system of claim 7, further comprising a constant false alarm rate (CFAR) engine in a CFAR detection path parallel to the streaming series data path including a log-magnitude pre-processing block and CFAR detector for detecting radar target returns against a background.

11. The radar sub-system of claim 7, wherein the ADC buffers and the output buffers are both split memories.

12. The radar sub-system of claim 7, further comprising a substrate that provides at least a semiconductor surface, wherein the HWA is formed in the semiconductor surface.

13. The radar sub-system of claim 7, wherein the pre-processing block, the windowed FFT block and the post-processing block include independent enable (EN) circuits to provide independent muxing controls that enable/bypass for any combination of the pre-processing block, the windowed FFT block and the post-processing block.

14. A method of (FMCW) radar signal processing using a radar hardware accelerator (HWA), comprising:

coupling radar data sample streams from ADC input buffers (ADC buffers) to the HWA comprising a fast Fourier transform (FFT) engine for receiving and processing the radar data sample streams including calculating including at least one of interference-thresholding, windowing and range FFT to generate post-processed radar data including range FFT data;

streaming the post-processed radar data to output buffers;

transferring the range FFT data from the output buffers to an external memory via direct memory accesses (DMA), the DMA being triggered automatically by the HWA;

repeating the coupling, the calculating, the streaming, and the transferring for the radar data sample streams received by multiple antennas and across multiple chirps;

wherein further processing is performed on the range FFT data originating from the multiple antennas and across the multiple chirps, comprising:

transferring from the external memory in blocks to the output buffers, each the block including data for a first range gate and at least a second range gate; performing multiple doppler FFT's using the HWA corresponding to the first range gate, wherein the doppler FFT's are computed for each of the multiple antennas in the first range gate;

performing an absolute value operation on result of the doppler FFT's for each of the multiple antennas corresponding to the first range gate, and summing results of the absolute value operation across the multiple antennas; and

repeating the further processing on subsequent data blocks from the data blocks corresponding to at least the second range gate.

15. The method of claim 14, wherein the transferring the range FFT data is performed in a transpose fashion.

16. The method of claim 14, further comprising detecting radar target returns against a background using a constant false alarm rate (CFAR) engine in a CFAR detection path parallel to a streaming series data path including a log-magnitude pre-processing block and a CFAR detector.

17. The method of claim 14, further comprising indexed memory addressing for data transfers to and from the HWA.

18. The method of claim 14, wherein the FFT engine comprises:

a pre-processing block for providing at least one of interference mitigation and multiplying the radar data sample streams within a split accelerator local memory including the output buffers by a pre-programmed complex scalar or a specified sample from an internal look-up table (LUT) to generate pre-processed samples;

a windowing plus FFT block (windowed FFT block) for multiplying the pre-processed samples by a window vector and then processing by an FFT block for performing a FFT to generate Fourier transformed samples; and

a post-processing block for computing a magnitude of the Fourier transformed samples for generating the post-processed radar data;

wherein the pre-processing block, the windowed FFT block, and the post-processing block are connected in one streaming series data path.

19. The method of claim 18, wherein the pre-processing block, the windowed FFT block and the post-processing block include independent enable (EN) circuits to provide independent muxing controls that enable/bypass for any combination of the pre-processing block, the windowed FFT block and the post-processing block.

20. The method of claim 14, further comprising 2D-memory indexing, wherein each of multiple parameter-sets performs a specific operation on multiple ones of the radar data sample streams including a first radar data sample stream and a subsequent second radar data sample stream, and wherein a separation between subsequent samples of each of the radar data sample streams, a separation between an initial sample of the first radar data sample stream and an initial sample of the second radar data sample stream, and a number of the radar data sample streams for each of the operations is configurable via the parameter-sets.

Description:
RADAR HARDWARE ACCELERATOR

[0001] This relates to hardware accelerators for radar systems.

BACKGROUND

[0002] Radar is used in many applications to detect target objects such as airplanes, military targets, vehicles, and pedestrians. Radar finds use in a number of applications associated with a motor vehicle such as for adaptive cruise control, collision warning, blind spot warning, lane change assist, parking assist and rear collision warning. Pulse radar or frequency modulated continuous wave (FMCW) radar are conventionally used in such applications.

[0003] In a radar system, a local oscillator (LO) generates a transmit signal. A voltage controlled oscillator (VCO) converts a voltage variation into a corresponding frequency variation. The transmit signal is amplified and transmitted by one or more transmit units. In FMCW radar, the frequency of the transmit signal is varied linearly with time. This transmit signal is referred to as a ramp signal or a chirp signal. One or more obstacles scatter (or reflect) the transmit signal, which is received by one or more receive units in the FMCW radar system.

[0004] A mixer mixes the transmitted LO signal with the received scattered signal to produce a baseband signal, which is termed an intermediate frequency (IF) signal. The IF signal is conditioned by a conditioning circuit which includes an amplifier and an anti-alias filter, is sampled by an analog to digital converter (ADC), and then is processed by a processor (e.g., microprocessor) to estimate a distance and a velocity of one or more nearby obstacles that provide scatter. Each peak in the fast Fourier transform (FFT) of the digitized IF signal corresponds to an object. The frequency of the IF signal is proportional to the range (distance) of the obstacle(s).

[0005] 77 GHz automotive radar is a fast-growing market segment, with a variety of existing and emerging applications. For example, the frequency of the transmitted chirp signal may be controlled to increase at a constant linear ramp rate from 77 GHz to 81 GHz in a period of about 100 microseconds. FMCW modulation is the preferred radar choice due to its various advantages including a large RF sweep bandwidth (enabling high range resolution), while keeping the IF/ADC bandwidth small, and lower peak power consumption needed as compared to pulsed radar.
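The relationship between IF frequency, sweep bandwidth and range can be made concrete with the standard FMCW relations. The sketch below is illustrative only: it uses the example sweep from the text (77 GHz to 81 GHz in about 100 microseconds), and the function names and the 10 MHz IF tone are assumptions that do not come from the source.

```python
# Sketch only: textbook FMCW relations, not taken from the described hardware.
C = 299_792_458.0  # speed of light in m/s


def chirp_slope(bandwidth_hz, duration_s):
    """Frequency ramp rate of the chirp in Hz per second."""
    return bandwidth_hz / duration_s


def range_from_if(f_if_hz, slope_hz_per_s):
    """Range of a reflector from its IF (beat) frequency: R = c * f_IF / (2 * S)."""
    return C * f_if_hz / (2.0 * slope_hz_per_s)


def range_resolution(bandwidth_hz):
    """Smallest separable range difference: c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)


bandwidth = 81e9 - 77e9                  # 4 GHz sweep from the text
slope = chirp_slope(bandwidth, 100e-6)   # about 100 microseconds per chirp
resolution_m = range_resolution(bandwidth)
example_range_m = range_from_if(10e6, slope)  # hypothetical 10 MHz IF tone
print(resolution_m, example_range_m)
```

With these numbers the range resolution is a few centimetres, which illustrates why a large RF sweep bandwidth is valuable even though the IF bandwidth stays in the MHz range.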

[0006] The signal processing for FMCW radar systems (such as for advanced driver assist systems (ADAS)) is typically performed using a radar micro controller unit (MCU). The radar MCU generally includes a FFT hardware accelerator and a lock-step safety central processing unit (CPU) for object detection and tracking.

[0007] FMCW radar signal processing involves generating what is termed three (3) dimensions including the computation of a first-dimension (range) FFT, second-dimension (Doppler) FFT and third-dimension angle-of-arrival estimation processing (beamforming). An advantage of using a fast (saw-tooth) FMCW radar waveform is that it can provide a two-dimensional range-velocity view of the objects illuminated by the radar, and additionally, the angle-of-arrival can be obtained through the use of multiple TX/RX antennas using digital beamforming.

SUMMARY

[0010] In described examples, a radar hardware accelerator (HWA) includes a fast Fourier transform (FFT) engine including a pre-processing block for providing at least one of interference mitigation, finite impulse response (FIR) filtering, and multiplying a radar data sample stream received from ADC buffers within a split accelerator local memory that also includes output buffers by a pre-programmed complex scalar or a specified sample from an internal look-up table (LUT) to generate pre-processed samples. A windowing plus FFT block (windowed FFT block) is for multiplying the pre-processed samples by a window vector and then processing by a FFT block for performing a FFT to generate Fourier transformed samples. A post-processing block is for computing a magnitude of the Fourier transformed samples and performing a data compression operation for generating post-processed radar data. The pre-processing block, windowed FFT block, and post-processing block are connected in one streaming series data path, which reduces latency.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 is a block diagram representation of a radar system portion that includes an HWA for radar signal processing, according to an example embodiment.

[0012] FIG. 2 shows a radar sub-system that includes an example HWA shown in one example HWA implementation interfaced by a bus to a processor shown as a microprocessor, according to an example embodiment.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0013] In the drawings, like reference numerals are used to designate similar or equivalent elements. Some illustrated acts or events may occur in different order and/or concurrently with other acts or events. Furthermore, some illustrated acts or events may not be required to implement a methodology in accordance with this description.

[0014] Also, the terms "coupled to" or "couples with" (and the like) as used herein without further qualification describe either an indirect or direct electrical connection. Thus, if a first device "couples" to a second device, that connection can be through a direct electrical connection where only parasitics are in the pathway, or through an indirect electrical connection via intervening items including other devices and connections. For indirect coupling, the intervening item generally does not modify the information of a signal, but may adjust its current level, voltage level and/or power level.

[0015] Conventional hardware accelerator (HWA) architectures for FMCW radar signal processing have several problems. Such problems include latency, lack of flexibility in sequencing data into and out of the HWA, and dependence of the radar signal processing performed by the HWA on processor intervention.

[0016] Described HWAs solve these problems by including several unique features that enable high-performance and flexibility for the customer, with support for off-loading of some frequently used radar signal processing computations from the processor to the HWA.

[0017] FIG. 1 is a block diagram representation of an example radar system portion 100 that includes an HWA 125 providing a FFT Engine for radar signal processing including performing a plurality of FFT calculations or computations. FIG. 2 shows an example radar sub-system 200 that includes an example HWA 125' shown in one example HWA implementation interfaced by a bus 145 to a processor shown as a microprocessor (μP) 135'.

[0018] These FFT computations comprise obtaining the above-described three (3) dimensions including the computation of a first-dimension (range) FFT, second-dimension (Doppler) FFT, and third-dimension angle-of-arrival estimation processing (beamforming). HWA 125 is shown in FIG. 1 and HWA 125' in FIG. 2 having a core computation unit including a pre-processing block 211, windowed FFT block 212 and a post-processing block 213 connected together in one streaming series data path. HWA 125 enables a compact (i.e., low area) implementation for frequently used signal processing computations in an FMCW radar receiver.

[0019] Each of the pre-processing block 211, windowed FFT block 212 and post-processing block 213 includes an independent enable (EN) circuit shown in FIG. 1 to provide independent muxing controls that enable/bypass (disable) any combination of the pre-processing block 211, windowed FFT block 212 and post-processing block 213. This provides greater flexibility while chaining multiple accelerator operations, enabling a customer to use the HWA 125 to implement a variety of processing steps. The complete configuration for each of the blocks (pre-processing block 211, windowed FFT block 212, post-processing block 213, and the optional Constant False Alarm Rate (CFAR) engine 220 and CFAR detector 222 in FIG. 2 described hereinbelow) including the enable/disable is programmed into the parameter-set configuration memory 235 shown in FIG. 2 by the μP 135', and the state machine 240 shown in FIG. 2 then configures each block as per the programmed contents of parameter-set configuration memory 235.

[0020] The signal processing that radar system portion 100 performs to obtain the three-dimensional image of objects involves computing a first-dimension (range) FFT on the data samples from ADC buffers 120 corresponding to each transmitted chirp using the windowed FFT block 212. This is followed by a second-dimension (Doppler) FFT, which is performed across chirps, where the range-FFT samples are fed into the windowed FFT block 212 in transpose order compared to the first-dimension FFT. The angle-of-arrival estimation also involves FFT computations by windowed FFT block 212 with yet another transpose at the input. The HWA 125 also has a post-processing block 213 for optionally computing a magnitude or log-magnitude of the radar image obtained from the FFT operations provided by the windowed FFT block 212. Further and optionally, the sum of the magnitude or log-magnitude of the radar image across a plurality of antennas can be obtained by passing the corresponding samples across antennas obtained at the output of the post-processing block 213 through the FFT engine 210 and retaining only the first sample of the FFT engine output. This exploits the fact that the first output of an FFT computation represents the sum of the samples. Thus for computing the sum across 4 antennas, a 4-point FFT would be computed for every 4 corresponding samples across antennas. Described post-processing is useful in preparing the FFT data for object detection. Object detection can be done by optionally employing a CFAR detection algorithm (see CFAR engine 220 in FIG. 2).
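The summing trick described above can be checked numerically. The following is a minimal sketch in which a direct DFT stands in for the FFT engine; the magnitude values and the helper name `dft` are hypothetical and chosen only for illustration.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform, used as a stand-in for the FFT block."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical magnitude values of one range-Doppler cell as seen by 4 antennas.
magnitudes = [3.0, 1.5, 2.25, 4.0]

# Bin 0 of a 4-point transform has all twiddle factors equal to 1,
# so it equals the plain sum of the 4 inputs.
first_bin = dft(magnitudes)[0]
print(first_bin.real, sum(magnitudes))
```

Keeping only the first output sample therefore reuses the FFT data path as an adder across antennas, without a dedicated summing block.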

[0021] The radar system portion 100 is usually on a single semiconductor (e.g., silicon) chip shown as a substrate 105 in FIG. 1 that provides at least a semiconductor surface. One example for substrate 105 is a bulk silicon substrate having an epitaxial silicon surface. Other substrates 105 may be used.

[0022] The radar system portion 100 includes an analog block 110 that represents the respective analog front end components (antenna(s), power amplifier, mixers, band pass filters, low noise amplifiers (LNA's) and analog-to-digital converters (ADCs)) that are coupled to a digital front end 115 that generally includes a decimator which downsamples and filters the samples output by the ADC before they are presented to an ADC input buffer pair (ADC buffers 120) which function to store the pre-processed radar data for the HWA 125. Although shown on chip, the antenna(s) may be off chip.

[0023] The ADC buffers 120 and the output buffers 130 together provide the local memories for the HWA 125 (together shown as accelerator local memories 217 in FIG. 2 described hereinbelow). The local memory 217 is a split memory that includes ADC buffers 120 for storing radar data samples received from a digital front end (e.g., digital front end 115 in FIG. 1) and output buffers 130 for receiving post-processed radar data from a post-processing block 213. Although ADC buffers 120 and output buffers 130 are designated as input buffers and output buffers respectively, each of these 4 buffers has more generic applicability. For example, during periods when the digital front end is not streaming data to the ADC buffers 120 (such as during inter-frame periods), the HWA 125 is free to use any of the buffers as input/output buffers and, in such periods, the ADC buffers 120 operate as generic buffers that are not limited to storing samples output by the digital front end 115.

[0024] The split aspect of the local memory 217 allows each of these 4 blocks of memory shown in FIG. 2 to be accessed independently. However, a non-split local memory is also possible. A split memory for local memory 217 is useful when performing data processing in a ping/pong fashion. For example, while data is being filled into a ping input-buffer (from an external source), data from the pong input-buffer can be streamed into the FFT engine 210. Likewise, while data is being streamed out to the ping output-buffer, the previous data from the pong output-buffer can be transported out to an external entity. The ping/pong-input memory and ping/pong-output memory can be assigned from the respective memory blocks of 217.
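The ping/pong access pattern can be sketched in software as two buffers that swap roles on every step. This is only a sequential model under stated assumptions: in hardware the fill and the processing happen concurrently, and the names `ping_pong`, `blocks` and `process` are hypothetical.

```python
def ping_pong(blocks, process):
    """Alternate two buffers: fill one while the previously filled one is processed."""
    buffers = [None, None]  # index 0 is "ping", index 1 is "pong"
    results = []
    for step in range(len(blocks) + 1):
        fill_idx = step % 2      # buffer being filled in this step
        work_idx = 1 - fill_idx  # buffer filled in the previous step
        # In hardware these two actions overlap in time; here they run back to back.
        if step >= 1:
            results.append(process(buffers[work_idx]))
        if step < len(blocks):
            buffers[fill_idx] = list(blocks[step])
    return results


# Three incoming data blocks, with "sum" standing in for the accelerator operation.
outputs = ping_pong([[1, 2], [3, 4], [5]], sum)
print(outputs)
```

Because a buffer is never filled and read in the same step, the producer and the consumer do not need to wait for each other as long as each finishes within one step.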

[0025] As shown in FIG. 2, μP 135' is coupled by a bus 145 to enable accessing the HWA internal memories (parameter-set configuration memory 235 and configuration registers 245, which are both shown in FIG. 2 described hereinbelow), window RAM 212a within the windowed FFT block 212, and the local memory 217 shown in FIG. 2 comprising ADC buffers 120 and output buffers 130. An external memory block 140 comprises a memory external to the HWA 125 for transferring data in chunks (blocks) between the ADC buffers 120 as well as the output buffers 130 and the external memory 140. The bus 145 connects to a high speed interface (HSI) 150 and a serial port 155. HSI 150 provides an interface between the radar system portion 100 and another signal processing unit (such as μP 135' shown in FIG. 2) that in an automotive application usually processes the processed radar data provided by the HWA frame-by-frame to determine the range, velocity and angle of any obstacle/vehicle in front of the vehicle's radar system.

[0026] An input formatter block 203 reads the input samples from the ADC buffers 120 and feeds them into the FFT engine 210 including the pre-processing block 211. The input formatter block 203 can be configured to perform a variety of tasks. For example, the input formatter block 203 can enable considerable flexibility in streaming data from the input memory (ADC buffers 120) into the HWA (using the 2D Memory indexing described hereinbelow), can be configured to conjugate and/or scale the input data, can be configured to multiply the incoming data with a Binary Phase Modulation (BPM) pattern (which is a sequence of 1's and -1's), and can allow for circular indexing of the input memory, which can be particularly useful if the HWA is being employed to do sub-band filtering using the FFT-IFFT approach.
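A rough software model of the input formatter's addressing follows. It assumes the 2D indexing is parameterized as in the claims (a separation between subsequent samples of a stream, a separation between the initial samples of successive streams, and a number of streams), and it models circular indexing as a wrap at the end of the memory. All function and parameter names are hypothetical.

```python
def read_2d(memory, start, n_samples, sample_step, n_streams, stream_step):
    """Read n_streams streams of n_samples each, wrapping circularly at the memory end."""
    streams = []
    for s in range(n_streams):
        base = start + s * stream_step
        streams.append(
            [memory[(base + i * sample_step) % len(memory)] for i in range(n_samples)]
        )
    return streams


def apply_bpm(samples, pattern):
    """Multiply incoming samples by a BPM pattern, a sequence of 1's and -1's."""
    return [s * p for s, p in zip(samples, pattern)]


# Memory holding 3 chirps of 4 range bins each, stored chirp after chirp.
memory = list(range(12))

# A sample step of 4 and a stream step of 1 read the data in transpose order:
# each stream collects one range bin across the 3 chirps.
transposed = read_2d(memory, 0, 3, 4, 4, 1)
wrapped = read_2d(memory, 10, 4, 1, 1, 0)
print(transposed, wrapped, apply_bpm([1, 2, 3, 4], [1, -1, 1, -1]))
```

The same two strides cover both the row-wise read used for a range FFT and the column-wise read used for a Doppler FFT, so no separate transpose pass over the memory is required in this model.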

[0027] The pre-processing block 211 is for providing at least one of interference mitigation (e.g., zeroing out of radar samples whose magnitude exceeds a programmable limit), finite impulse response (FIR) filtering and performing a complex multiply operation on the radar data sample stream received from the input formatter 203. The complex multiply operation can be configured to be in one of various modes. In the frequency shift mode, the complex multiplier frequency de-rotates the radar data sample stream by a certain programmable frequency. In a scalar multiplication mode the radar data sample stream is multiplied by a pre-programmed complex scalar using the complex multiplier block 211c shown. In the vector multiplication mode the complex multiplier block 211c performs an element wise multiplication of the radar data sample stream and a complex vector that has been stored in the internal look-up table (LUT) 211a shown as a sin, cos LUT that is coupled to the complex multiplier block 211c. The pre-processing block 211 is also shown including an interference mitigation block 211d between the output of the input formatter 203 and the complex multiplier block 211c. The interference mitigation block 211d can use threshold comparison for zeroing out/clamping samples determined to be interference samples.
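The pre-processing modes can be illustrated with a few small functions. This is a behavioural sketch, not the hardware design: the function names, the threshold value and the test tone are assumptions, and the LUT is modelled as an ordinary list of complex values.

```python
import cmath


def zero_out_interference(samples, limit):
    """Interference mitigation: zero any sample whose magnitude exceeds the limit."""
    return [0j if abs(s) > limit else s for s in samples]


def frequency_shift(samples, cycles_per_sample):
    """Frequency shift mode: de-rotate the stream by a programmable frequency."""
    return [
        s * cmath.exp(-2j * cmath.pi * cycles_per_sample * n)
        for n, s in enumerate(samples)
    ]


def scalar_multiply(samples, scalar):
    """Scalar mode: multiply every sample by one pre-programmed complex scalar."""
    return [s * scalar for s in samples]


def vector_multiply(samples, lut):
    """Vector mode: element-wise multiply by a complex vector held in a LUT."""
    return [s * w for s, w in zip(samples, lut)]


cleaned = zero_out_interference([1 + 1j, 10 + 0j, -0.5j], 5.0)

# A tone at 0.25 cycles per sample becomes a constant after de-rotation by 0.25.
tone = [cmath.exp(2j * cmath.pi * 0.25 * n) for n in range(8)]
derotated = frequency_shift(tone, 0.25)
print(cleaned, derotated[:2])
```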

[0028] Pre-processing block 211 enables operations such as frequency shifting and FFT stitching. Regarding FFT stitching, the FFT Engine can perform streaming FFT's of generally up to 1024 points. This capability suffices for most radar applications while still keeping the area of the HWA small. To perform FFT's which are larger than 1024-points, the HWA offers an "FFT Stitching capability" where multiple smaller size FFT's computed on multiple sub-sets of a given input stream can be used to compute a larger size FFT of the entire input stream. As an example, when a 4K size FFT is needed, it is achieved in two steps. In the first step, every 4th input sample is passed through a 1K size FFT - i.e., four 1K point FFTs are performed on decimated input samples. Then, the resulting 4x1024 FFT outputs are sent through 4-point "stitching" FFTs (1024 4-point FFTs), which additionally involves a pre-multiplication by the complex multiplier block. The pre-processing block 211 also includes a FIR filter 211b for FIR filtering.
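The two-step stitching procedure is the standard decimation-in-time decomposition of a large transform, and it can be verified on a small case. The sketch below uses a 16-point input split into four 4-point transforms instead of 4K and 1K, and a direct DFT in place of the FFT block; the names `dft` and `stitched_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform, used as a stand-in for the FFT block."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stitched_dft(x, parts=4):
    """Build an N-point transform from `parts` transforms of size N / parts."""
    n = len(x)
    m = n // parts
    # Step 1: one small transform per decimated stream x[r], x[r + parts], ...
    sub = [dft(x[r::parts]) for r in range(parts)]
    out = [0j] * n
    for k in range(m):
        # Step 2: pre-multiply by twiddle factors (the complex multiplier's role),
        # then run a small "stitching" transform across the `parts` partial results.
        column = [
            sub[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(parts)
        ]
        stitched = dft(column)
        for q in range(parts):
            out[k + m * q] = stitched[q]
    return out


x = [complex(i % 5, (i * 3) % 7) for i in range(16)]
max_error = max(abs(a - b) for a, b in zip(stitched_dft(x), dft(x)))
print(max_error)
```

Output `q` of the stitching transform for column `k` lands in bin `k + m * q` of the large transform, which is why the result has to be written back with a stride.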

[0029] The windowed FFT block 212 is for multiplying the pre-processed samples by a window vector from window-coefficients stored in the window RAM 212a and then processing by a FFT block 212b for performing a FFT to generate Fourier transformed samples. The post-processing block 213 is for computing a magnitude of the Fourier transformed samples and performing a data compression operation (e.g., log2 operation) for generating post-processed radar data. Data compression is optional and is configurable.
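A compact model of the windowed FFT and post-processing stages is sketched below. The Hann window is an assumption made for the example (the source only says that window coefficients are stored in the window RAM), a direct DFT stands in for the FFT block, and all names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform, used as a stand-in for the FFT block."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hann(n):
    """Periodic Hann window coefficients (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def windowed_fft(samples, window):
    """Multiply by the window vector, then transform."""
    return dft([s * w for s, w in zip(samples, window)])


def post_process(spectrum, log2_compress=False):
    """Magnitude of each bin, optionally followed by log2 compression."""
    mags = [abs(v) for v in spectrum]
    if log2_compress:
        return [math.log2(m) if m > 0 else float("-inf") for m in mags]
    return mags


n = 16
tone = [cmath.exp(2j * cmath.pi * 3 * i / n) for i in range(n)]  # tone in bin 3
mags = post_process(windowed_fft(tone, hann(n)))
peak_bin = mags.index(max(mags))
print(peak_bin, mags[peak_bin])
```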

[0030] An output of the post-processing block 213 is coupled by an output formatter block 216 to an input of the output buffers 130 for transferring the post-processed radar data to the output buffers 130. The output formatter block 216 is responsible for writing the streaming processed output samples from post-processing block 213 into the output buffers 130.

[0031] The output formatter block 216 can also be configured to perform a variety of tasks. For example, the output formatter block 216 can enable considerable flexibility in streaming data from the HWA into the output memory (using the 2D-Memory indexing described hereinbelow), can be configured to conjugate and/or scale the data before storing in the output buffers 130, and provides a 'destination skip sample' feature which allows a certain number of output samples (from the HWA) to be skipped (i.e., discarded) at the beginning. This feature, in conjunction with the parameter DST ACNT (described hereinbelow), allows only a specific contiguous sub-set of the output samples from the HWA to be stored in output memory. This can be useful, such as when only a specific sub-set of FFT-bins is needed.
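The skip-then-store behaviour can be modelled as a slice of the output stream. In this sketch the `count` argument plays the role that DST ACNT is assumed to play, namely the number of samples written to output memory; that reading is an assumption, since the parameter is not defined in this passage, and the function name is hypothetical.

```python
def format_output(samples, skip, count, conjugate=False, scale=1.0):
    """Discard the first `skip` samples, keep the next `count`, then conjugate/scale."""
    kept = samples[skip:skip + count]
    if conjugate:
        kept = [complex(s).conjugate() for s in kept]
    return [s * scale for s in kept]


# Keep only FFT bins 4, 5 and 6 of a 16-bin output (bin values are placeholders).
fft_bins = list(range(16))
selected = format_output(fft_bins, skip=4, count=3)
print(selected)
```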

[0032] The HWA 125' is shown including an optional CFAR engine 220 positioned in a CFAR detection path parallel to the streaming series data path. CFAR engine 220 includes a pre-processing block 221 and CFAR detector 222 for detecting radar target returns against background noise (e.g., clutter and interference). Because the FFT engine 210 and the CFAR engine 220 will generally not be operating simultaneously, and to reduce the area of the HWA, memory and logic can be shared between these two engines. FIG. 2 shows a shared memory 255 that is shared between the FFT engine 210 and the CFAR engine 220. Also, logic can be shared between the post-processing block 213 and the pre-processing block 221.
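The source does not state which CFAR variant the detector 222 implements. As an illustration only, the sketch below uses cell-averaging CFAR on linear power values, a common textbook variant; the guard and training cell counts, the threshold scale and all names are assumptions.

```python
def ca_cfar(power, guard, train, scale):
    """Cell-averaging CFAR: flag cells that exceed `scale` times the local noise mean."""
    detections = []
    n = len(power)
    for i in range(n):
        cells = []
        # Training cells sit beyond the guard cells on both sides of the cell under test.
        for off in range(guard + 1, guard + train + 1):
            if i - off >= 0:
                cells.append(power[i - off])
            if i + off < n:
                cells.append(power[i + off])
        if not cells:
            continue
        noise = sum(cells) / len(cells)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


# Flat noise floor with two hypothetical targets at cells 10 and 20.
power = [1.0] * 32
power[10] = 50.0
power[20] = 30.0
hits = ca_cfar(power, guard=1, train=4, scale=5.0)
print(hits)
```

Because the threshold follows the local average, the false alarm rate stays roughly constant when the background level changes. On log-magnitude input, as in the described CFAR path, the multiplicative scale would become an additive offset.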

[0033] A parameter-set configuration memory 235 (e.g., implemented as RAM) is also shown coupled to a state machine 240 both coupled by a bus 145 to the FFT engine 210. A state machine is any device that stores the status of something at a given time and can operate on input to change the status and/or cause an action or output to take place for any given change. The state machine 240 is responsible for controlling the operation of the HWA 125' including for sequencing parameter-sets for execution of a chained sequence of computations and data transfers between the accelerator local memory 217 and an external memory 140, for controlling the pre-processing block 211, windowed FFT block 212 and post-processing block 213. The state machine 240 can be configured to run through a sequence of parameter-sets starting and ending at specified indices (such as a start index and an end index). The state machine 240 can also be configured to loop through this sequence a specific number of times.

[0034] The parameter-set configuration memory 235 is used to pre-configure the sets of parameters for a chained sequence of HWA operations. This memory can comprise an accelerator register configuration for 16 different operations (each such configuration being referred to as a parameter-set). This allows the HWA to perform a pre-configured chained sequence of operations without frequent intervention from the μP 135'. Each parameter-set includes various configuration details for each component inside the accelerator engine. For example, these configuration parameters can include the number of radar samples to read, the starting memory address for the sample read operation, the memory base address, enable/disable for Core Computational engine operations (FFT, Magnitude, Phase, etc.), the number of samples to write, the starting memory address for the sample write operation, etc. This feature enables meaningful chaining or sequencing of various radar signal processing operations with minimal intervention from the μP 135' or other processor and, as a result, makes efficient use of the capabilities of the FFT engine 210, triggering, and DMA chaining options. The configuration registers 245 store common configuration information that is applicable to all parameter-sets.

[0035] The parameter-set also allows for autonomous interfacing of the HWA with DMAs for data transfer. Each parameter-set can be configured to require the HWA to trigger a DMA after it has completed the computations corresponding to the parameter-set. This allows the HWA to initiate a transfer of data out of its output buffers 130, or to initiate transfer of a new set of input data into its input buffer provided by ADC buffers 120. The execution of each parameter-set can also be made conditional on a trigger. Accordingly, the state machine delays the execution of a scheduled parameter-set until the configured trigger condition is true. Examples of triggers include: (1) an interrupt announcing the availability of data in the ADC buffer; (2) completion of a specific DMA transfer; and (3) a software trigger from the main processor, such as μP 135'.
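The sequencing described in the preceding paragraphs (start and end indices, a loop count, per-parameter-set triggers and DMA triggering on completion) can be sketched in software as follows. This is a behavioral sketch only: the class `ParameterSet`, its fields, and the helpers `run_sequence` and `make_trigger` are hypothetical names, and the busy-wait loop merely stands in for the state machine delaying execution until the trigger condition is true.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ParameterSet:
    name: str
    trigger: Optional[Callable[[], bool]] = None  # None: run without waiting
    dma_on_done: bool = False                     # trigger a DMA when finished


def run_sequence(param_sets: List[ParameterSet], start_idx: int, end_idx: int,
                 num_loops: int) -> List[str]:
    """Run parameter-sets start_idx..end_idx (inclusive), num_loops times."""
    log = []
    for _ in range(num_loops):
        for idx in range(start_idx, end_idx + 1):
            ps = param_sets[idx]
            # Delay execution until the configured trigger condition is true.
            while ps.trigger is not None and not ps.trigger():
                pass
            log.append(f"run {ps.name}")
            if ps.dma_on_done:
                log.append(f"dma after {ps.name}")
    return log


def make_trigger(polls_until_ready: int) -> Callable[[], bool]:
    """Toy trigger that becomes true after a few polls (e.g., ADC data ready)."""
    state = {"left": polls_until_ready}

    def trigger() -> bool:
        state["left"] -= 1
        return state["left"] <= 0

    return trigger


sets = [
    ParameterSet("range_fft", trigger=make_trigger(3), dma_on_done=True),
    ParameterSet("magnitude"),
]
log = run_sequence(sets, start_idx=0, end_idx=1, num_loops=2)
```

After the call, `log` lists each operation in order, with a DMA entry following every `range_fft` run, and no processor intervention is modeled between the chained operations.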

[0036] The overall operation of the radar sub-system 200 can be summarized as follows. The FFT engine 210 is configured by the μΡ 135' through the parameter configuration registers (or RAM) 245. Then, the state machine 240 kicks off and controls the overall operation of the HWA 125', which involves loading the parameters needed for current operation from the parameter-set configuration memory 235 into internal registers of the FFT engine 210 (or CFAR engine 220) and running the FFT engine 210 (or CFAR Engine 220) as per the programmed configuration. In one design, the FFT engine 210 and the associated memory (120, 130, 235, and 245) run on a 200 MHz clock.

[0037] In described examples, HWAs solve the above-described problems by using the FFT engine 210 that includes the pre-processing block 211, windowed FFT block 212 and post-processing block 213 in one streaming data path, along with a parameter-set configuration memory 235 and state machine 240, such that a flexible sequence of operations (e.g., multi-dimensional FFT pre-processing, windowed FFT, and post-processing) is performed back-to-back without frequent intervention by the main processor 135 (μP 135' in FIG. 2).

[0038] Described embodiments include a method of FMCW radar signal processing using a described HWA. The method can comprise:

1. Streaming pre-processed radar data from an input buffer (e.g., ADC buffers 120) to an HWA 125 comprising an FFT engine 210 coupled to the ADC buffers for receiving and processing the pre-processed radar data, where the FFT engine 210 performs calculating including interference-thresholding, windowing FFT and range FFT to generate post-processed radar data including range FFT data.

2. Streaming the post-processed radar data to an output buffer 130.

3. Transferring the range FFT data from the output buffer 130 to an external memory 140 in a transposed fashion. The transferring can comprise direct memory access (DMA), with the DMA being triggered automatically by the HWA 125.

4. Repeating the calculating, the streaming of pre-processed and post-processed radar data, and the transferring for pre-processed streaming radar data received at multiple antennas (or, more generically, multiple channels). The range FFT data for multiple antennas across multiple chirps in a frame are computed and transferred as in steps 1-3 described hereinabove.

The range FFT data originating from the multiple antennas across multiple chirps is then processed in further processing steps, comprising:

5. Transferring from the external memory 140 in blocks to an input memory (ADC buffers 120), with each block including data for one or more range gates across multiple chirps in a frame.

6. Performing multiple Doppler FFTs using the HWA, each Doppler FFT corresponding to a specific antenna of each of the one or more range gates corresponding to a block. Also, the absolute values of the Doppler FFT bins are computed and these are summed across multiple antennas. The summing of the absolute values across multiple antennas can be performed by running an FFT of appropriate length (e.g., a 4-point FFT for 4 antennas) in the HWA and then picking the first sample of the FFT output.

7. The Doppler FFTs computed in Step 6 and the sum of the absolute values of the Doppler FFT bins across antennas are both stored in external memory 140 via direct memory accesses (DMA), the DMA being triggered automatically by the HWA.

8. Repeating steps 5, 6, 7 across multiple blocks to cover all range gates corresponding to the range-FFT.
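Steps 1 to 8 can be sketched end to end in Python using a naive DFT in place of the hardware FFT. The array sizes, the synthetic single-target signal and all variable names are assumptions made for illustration. The sketch also shows why the first sample of an N-point FFT yields the sum across antennas: bin 0 of a DFT is the plain sum of its inputs.

```python
import cmath


def dft(x):
    """Naive DFT, standing in for the hardware FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


N_ANT, N_CHIRP, N_SAMP = 4, 4, 8

# Synthetic ADC data: one target at range bin 2 and Doppler bin 1 (assumed).
adc = [[[cmath.exp(2j * cmath.pi * (2 * s / N_SAMP + 1 * c / N_CHIRP))
         for s in range(N_SAMP)]
        for c in range(N_CHIRP)]
       for _ in range(N_ANT)]

# Steps 1-4: range FFT per chirp and antenna, stored transposed so that the
# chirps of one range gate are contiguous: ext_mem[antenna][range_gate][chirp].
ext_mem = [[[0j] * N_CHIRP for _ in range(N_SAMP)] for _ in range(N_ANT)]
for a in range(N_ANT):
    for c in range(N_CHIRP):
        for r, value in enumerate(dft(adc[a][c])):
            ext_mem[a][r][c] = value

# Steps 5-6: Doppler FFT across chirps for each antenna and range gate.
doppler = [[dft(ext_mem[a][r]) for r in range(N_SAMP)] for a in range(N_ANT)]

# Step 6 (continued): sum of magnitudes across antennas, obtained as the
# first sample (bin 0) of an N_ANT-point FFT over the per-antenna magnitudes.
abs_sum = [[dft([abs(doppler[a][r][d]) for a in range(N_ANT)])[0].real
            for d in range(N_CHIRP)]
           for r in range(N_SAMP)]

# Locate the strongest range/Doppler cell.
peak_value, peak_range, peak_doppler = max(
    (abs_sum[r][d], r, d) for r in range(N_SAMP) for d in range(N_CHIRP))
```

With this synthetic input, the strongest cell lands at the range and Doppler bins used to generate the signal, and its value equals the per-antenna peak magnitude summed over the four antennas.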

[0039] The method can further comprise detecting radar target returns against a background using a CFAR engine 220 in a CFAR detection path parallel to the streaming series data path, the CFAR engine including a pre-processing block 221 and a CFAR detector 222. Samples corresponding to the range FFT or Doppler FFT can be streamed through the CFAR detector 222 to detect peaks that exceed the surrounding samples by a programmed threshold.

[0040] The CFAR engine 220 can also be employed for interference detection using a method described herein. The digitized time domain samples from the digital front end (corresponding to a single chirp on a single channel) that are stored in the ADC buffers 120 are streamed into the CFAR engine 220 of the HWA. The pre-processing block 221 of the CFAR engine 220 can be used to compute the magnitude (or log magnitude) of the streamed samples. The output of the pre-processing block 221 is then streamed to the CFAR detector 222, which detects samples whose magnitude is significantly above the average magnitude of the surrounding blocks (these samples are considered to be corrupted by interference). The indices of the detected samples are stored in the output buffer. Subsequently, the μP can read the list of indices of the detected samples and run any suitable algorithm (such as 1-dimensional interpolation) for correcting the values of these samples.
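The detection-then-correction flow can be sketched as follows, assuming a simple cell-averaging detector with guard cells and linear interpolation as the correction. The window size `n_avg`, guard size `n_guard` and threshold factor `scale` are hypothetical values, and the functions are illustrative stand-ins rather than the CFAR detector 222 itself.

```python
def cfar_detect(mags, n_avg=4, n_guard=1, scale=4.0):
    """Return indices whose magnitude exceeds scale x the local average.

    The local average uses up to n_avg cells on each side, skipping n_guard
    guard cells next to the cell under test.
    """
    hits = []
    n = len(mags)
    for i in range(n):
        cells = []
        for off in range(n_guard + 1, n_guard + n_avg + 1):
            if i - off >= 0:
                cells.append(mags[i - off])
            if i + off < n:
                cells.append(mags[i + off])
        if cells and mags[i] > scale * sum(cells) / len(cells):
            hits.append(i)
    return hits


def interpolate(samples, hits):
    """Replace flagged samples by linear interpolation between good neighbors."""
    fixed = list(samples)
    bad = set(hits)
    n = len(samples)
    for i in hits:
        lo = i - 1
        while lo >= 0 and lo in bad:
            lo -= 1
        hi = i + 1
        while hi < n and hi in bad:
            hi += 1
        if lo >= 0 and hi < n:
            w = (i - lo) / (hi - lo)
            fixed[i] = samples[lo] * (1 - w) + samples[hi] * w
        elif lo >= 0:
            fixed[i] = samples[lo]
        elif hi < n:
            fixed[i] = samples[hi]
    return fixed


# One chirp of time-domain samples with a short interference burst.
samples = [1 + 0j] * 20
samples[7] = 30 + 0j
samples[8] = 25 + 0j

mags = [abs(s) for s in samples]      # role of the pre-processing block
hits = cfar_detect(mags)              # indices stored in the output buffer
fixed = interpolate(samples, hits)    # correction run on the processor
```

The guard cells keep a neighboring corrupted sample out of the local average, so both samples of the burst are flagged in this example.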

[0041] A variant of the method described hereinabove is as follows. In this method, the processing starts only when samples from the digital front end corresponding to multiple chirps have been stored in the ADC buffers 120. The samples in the ADC buffers 120 can be viewed as being stored as a matrix, with each row corresponding to samples from a specific chirp. In the first step, the samples from the ADC buffers 120 are sent row-wise into the CFAR engine to obtain a first series of lists, each list containing the indices of detected samples corresponding to each row.

[0042] In the second step, the samples from the ADC buffers 120 are sent column-wise (using 2D-Memory indexing) into the CFAR engine to obtain a second series of lists, each list containing the indices of detected samples corresponding to each column. In the third step, the first series of lists and the second series of lists (that are stored in the output buffers 130 of the HWA) are examined by the μP (or other processor) to obtain a final list of samples from the ADC buffers 120 that appear in both the first series of lists and the second series of lists. This final list of samples is identified as being corrupted by interference. The μP can then employ any suitable algorithm to correct these corrupted samples (such as 2-dimensional interpolation).
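The row-wise and column-wise variant reduces to intersecting two sets of detections. The sketch below assumes a deliberately simple detector (a sample is flagged when it exceeds a multiple of the mean of the other samples in the same row or column) as a stand-in for the CFAR detector; the function name `detect`, the factor `scale` and the test matrix are hypothetical.

```python
def detect(mags, scale=4.0):
    """Flag indices whose magnitude exceeds scale x the mean of the others."""
    hits = []
    for i, m in enumerate(mags):
        others = mags[:i] + mags[i + 1:]
        if others and m > scale * sum(others) / len(others):
            hits.append(i)
    return hits


# Rows are chirps, columns are sample indices within a chirp.
matrix = [[1.0] * 6 for _ in range(5)]
for row in matrix:
    row[5] = 10.0          # strong in every chirp: consistent, not interference
matrix[2][3] = 40.0        # isolated burst in one chirp: interference

# First step: row-wise pass.
row_hits = {(r, c) for r, row in enumerate(matrix) for c in detect(row)}

# Second step: column-wise pass.
columns = [[matrix[r][c] for r in range(len(matrix))]
           for c in range(len(matrix[0]))]
col_hits = {(r, c) for c, col in enumerate(columns) for r in detect(col)}

# Third step: keep only samples detected in both passes.
corrupted = sorted(row_hits & col_hits)
```

In this example the row-wise pass also flags the last column in several rows, but those samples are not outliers along their column, so only the isolated burst survives the intersection.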

[0043] Multi-dimensional computations and versatile access patterns from the input buffer into the HWA and from the HWA to the output buffer are enabled by a described 2D-Memory indexing scheme for memory access. Indexed memory addressing allows significant flexibility in (a) the way in which data stored in the ADC buffers 120 is streamed into the HWA 125, and (b) the way in which data that is streamed out of the HWA 125 is stored in the output buffers 130. Each parameter-set defines a specific configuration (or operation) of the HWA (and input formatter/output formatter), and this specific configuration can operate on multiple radar data sample streams (or simply sample streams). The number of such sample streams, the number of input samples for each sample stream and the input/output access pattern for each sample stream can also be programmed in the same parameter-set. For example, a single parameter-set can be configured to perform a 256-pt FFT on multiple sample streams (each sample stream corresponding, e.g., to data from a different antenna).

[0044] In the 2D-Memory indexing scheme, the streaming of data from the input memory (ADC buffers 120) to the HWA can be defined by the parameters SRC ADDR, SRC ACNT, SRC AIDX, BCNT, and SRC BIDX. For streaming into the HWA 125 from the ADC buffers 120 as input memory, a sample stream comprising SRC ACNT samples (starting from SRC ADDR) is streamed, each sample being separated in the ADC buffers 120 from the previous sample by SRC AIDX bytes (i.e., SRC AIDX specifies the address offset, in bytes, separating successive samples). BCNT specifies the number of iterations: BCNT such sample streams of SRC ACNT samples each are streamed in, the first sample of each sample stream being separated by SRC BIDX bytes from that of the preceding sample stream. Analogously, the streaming of data from the HWA to the output memory is defined by the parameters DST ADDR, DST ACNT, DST AIDX, BCNT, and DST BIDX. The 2D-Memory indexing scheme allows for different access patterns at the input and output. Thus, there are separate parameters SRC ADDR/DST ADDR, SRC AIDX/DST AIDX, etc. Only the number of iterations (BCNT) is generally consistent across both input and output streams.
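The address pattern produced by these parameters can be written out in a few lines of Python. The helper name `stream_addresses` and the example memory layout (four antennas interleaved in the input memory, 4 bytes per complex sample, output base address 0x100) are assumptions chosen for illustration.

```python
def stream_addresses(addr, acnt, aidx, bcnt, bidx):
    """Byte addresses visited by bcnt sample streams of acnt samples each.

    Successive samples in a stream are aidx bytes apart; the first samples of
    successive streams are bidx bytes apart.
    """
    return [[addr + b * bidx + a * aidx for a in range(acnt)]
            for b in range(bcnt)]


BYTES_PER_SAMPLE = 4   # assumed sample size
N_ANT = 4              # assumed number of interleaved antennas

# Input: samples interleaved by antenna, so consecutive samples of one antenna
# are N_ANT samples apart and consecutive antennas are one sample apart.
src = stream_addresses(addr=0, acnt=3, aidx=N_ANT * BYTES_PER_SAMPLE,
                       bcnt=N_ANT, bidx=BYTES_PER_SAMPLE)

# Output: each stream stored contiguously, streams placed back to back.
dst = stream_addresses(addr=0x100, acnt=3, aidx=BYTES_PER_SAMPLE,
                       bcnt=N_ANT, bidx=3 * BYTES_PER_SAMPLE)
```

Here `src[1]` is `[4, 20, 36]` (the second antenna's samples picked out of the interleaved input), while `dst[1]` is `[268, 272, 276]`. Swapping the roles of the two index values on the output side would instead write the results in a transposed layout.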

[0045] In some embodiments, the HWA might not include an FIR filter in the FFT engine 210. In such embodiments, filtering operations can be efficiently performed in the HWA as follows. In the first step, the incoming samples are streamed into the HWA to perform an FFT (using a first parameter-set). In the second step, the samples corresponding to the FFT are streamed into the HWA (using a second parameter-set) with the pre-processing block 211 and the FFT engine 210 both enabled. The complex multiplier is used to multiply the samples of the FFT with a complex vector that represents the frequency response of the desired filter, and the FFT engine 210 performs an I-FFT (inverse FFT) on the output of the pre-processing block. Thus, the entire filtering operation is performed efficiently with just two streamings of the data through the HWA.
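The two-pass filtering can be checked numerically with a short Python sketch that uses a naive DFT in place of the hardware FFT. The two-tap averaging filter and the input vector are hypothetical examples. Note that multiplying in the frequency domain yields a circular convolution, so the input is assumed to be zero-padded if a linear convolution is wanted.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT or inverse DFT, standing in for the hardware FFT/I-FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


N = 8
taps = [0.5, 0.5] + [0.0] * (N - 2)     # assumed filter: two-tap average
freq_response = dft(taps)               # complex vector for the multiplier

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # zero-padded input samples

# First pass: FFT of the incoming samples.
spectrum = dft(x)

# Second pass: complex multiply by the frequency response, then inverse FFT.
product = [s * h for s, h in zip(spectrum, freq_response)]
y = [v.real for v in dft(product, inverse=True)]

# Reference: direct circular convolution of x with the taps.
reference = [sum(taps[k] * x[(n - k) % N] for k in range(N)) for n in range(N)]
```

The filtered output `y` matches `reference` to within floating-point rounding, which is the two-streaming equivalence the paragraph describes.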

[0046] Modifications are possible in the described embodiments, and other embodiments are possible, within the scope of the claims.