Title:
AREA-EFFICIENT PARALLEL TEST DATA PATH FOR EMBEDDED MEMORIES
Document Type and Number:
WIPO Patent Application WO/2017/075622
Kind Code:
A1
Abstract:
In described examples, a BIST controller (40) generates test data patterns to be applied to embedded memories (45) through a BIST data path. Each embedded memory (45) is coupled to a dedicated local comparator (46) that compares data read from the memory (45) during test with an expected data response forwarded from the BIST controller (40). The local comparators (46) associated with a group (48) of the memories to be tested in parallel receive the expected data response in parallel from a local response delay generator (44) that is shared among the group (48).

Inventors:
MEHROTRA RAJAT (IN)
NARESH NIKITA (IN)
NARAYANAN PRAKASH (IN)
SARKAR VASKAR (IN)
Application Number:
PCT/US2016/059796
Publication Date:
May 04, 2017
Filing Date:
October 31, 2016
Assignee:
TEXAS INSTRUMENTS INC (US)
TEXAS INSTRUMENTS JAPAN (JP)
International Classes:
G11C29/12; G06F11/27
Foreign References:
US6934205B1 2005-08-23
US20080059850A1 2008-03-06
US20090172487A1 2009-07-02
US7324392B2 2008-01-29
US20140164856A1 2014-06-12
US7305597B1 2007-12-04
Other References:
See also references of EP 3369095A4
Attorney, Agent or Firm:
Michael A. Davis, Jr. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. Circuitry for performing parallel test of a plurality of memories in an integrated circuit, comprising:

a first controller for generating a test data pattern to be applied to one or more groups of the plurality of memories;

a first group of local comparators, each local comparator of the first group of local comparators being associated with a respective one of a first group of the plurality of memories, the first group of the plurality of memories including at least two memories; and

a first shared local delay response generator, coupled to receive the test data pattern from the first controller and coupled to each of the first group of local comparators, for simultaneously applying an expected data response corresponding to the test data pattern to the first group of local comparators after a first selected delay.

2. The circuitry of claim 1, further comprising: one or more pipeline delay stages coupled to an output of the first controller, for forwarding the test data pattern from the first controller to the first group of memories and to the first shared local delay response generator after one or more delay periods.

3. The circuitry of claim 2, further comprising:

a second group of local comparators, each local comparator of the second group of local comparators being associated with a respective one of a second group of the plurality of memories; and

a second shared local delay response generator, coupled to receive the test data pattern from one or more of the pipeline delay stages and coupled to each of the second group of local comparators, for simultaneously forwarding an expected data response corresponding to the test data pattern to the second group of local comparators after a second selected delay;

wherein at least one of the pipeline delay stages is for forwarding the test data pattern from the first controller to the second group of memories and to the second shared local delay response generator after one or more delay periods.

4. The circuitry of claim 3, further comprising: at least one additional pipeline delay stage coupled between the first controller and the second group of memories and the second shared local delay response generator.

5. The circuitry of claim 3, wherein: the first controller generates the test data pattern for memories of a first memory type; and each of the memories of the first and second groups of memories is of the first memory type.

6. The circuitry of claim 1, wherein the first shared local delay response generator comprises one or more delay stages coupled in series.

7. The circuitry of claim 6, wherein each of the delay stages comprises a clocked buffer.

8. The circuitry of claim 1, wherein: the test data pattern generated by the first controller has a selected data word width; and the selected data word width corresponds to the widest data word width of the memories in the first group of the plurality of memories.

9. The circuitry of claim 1, further comprising:

a second controller for generating a test data pattern to be applied to one or more groups of the plurality of memories;

a second group of local comparators, each local comparator of the second group of local comparators being associated with a respective one of a second group of the plurality of memories; and

a second shared local delay response generator, coupled to receive the test data pattern from the second controller and coupled to each of the second group of local comparators, for simultaneously applying an expected data response corresponding to the test data pattern to the first group of local comparators after a second selected delay.

10. The circuitry of claim 9, wherein: the first controller generates the test data pattern for memories of a first memory type; each of the memories of the first group of memories is of the first memory type; the second controller generates the test data pattern for memories of a second memory type; and the second group of memories is of the second memory type.

11. The circuitry of claim 10, wherein the first memory type corresponds to single-port memory and the second memory type corresponds to double-port memory.

12. The circuitry of claim 10, wherein the first memory type corresponds to random access memory and the second memory type corresponds to non-volatile memory.

13. The circuitry of claim 1, wherein the integrated circuit further comprises at least one processor coupled to at least one of the memories.

14. The circuitry of claim 13, wherein: the integrated circuit is of the system-on-a-chip (SoC) type; and the plurality of memories are embedded memories in the SoC integrated circuit.

15. The circuitry of claim 1, wherein the first selected delay corresponds to a memory access latency of the memories in the first group.

16. Circuitry for performing parallel test of a plurality of memories in an integrated circuit, comprising:

a first controller for generating a test data pattern to be applied to one or more groups of the plurality of memories;

one or more pipeline delay stages coupled to an output of the first controller, for forwarding the test data pattern from the first controller after one or more delay periods, an output of one of the pipeline delay stages coupled to each of a first group of memories;

a first group of local comparators, each local comparator of the first group of local comparators being associated with a respective one of the first group of the plurality of memories, the first group of the plurality of memories including at least two memories; and

a first shared local delay response generator, coupled to receive the test data pattern from the one or more pipeline delay stages, the first shared local delay response generator coupled directly to each of the first group of local comparators with no additional delay stage coupled therebetween, for applying an expected data response corresponding to the test data pattern to the first group of local comparators after a first selected delay.

17. The circuitry of claim 16, further comprising: a second group of local comparators, each local comparator of the second group of local comparators being associated with a respective one of a second group of the plurality of memories, each of the second group of memories coupled to an output of one of the pipeline delay stages; and a second shared local delay response generator, coupled to receive the test data pattern from one or more of the pipeline delay stages and coupled to each of the second group of local comparators with no additional delay stage coupled therebetween, for forwarding an expected data response corresponding to the test data pattern to the second group of local comparators after a second selected delay.

18. The circuitry of claim 16, wherein the integrated circuit further comprises at least one processor coupled to at least one of the memories.

19. The circuitry of claim 18, wherein: the integrated circuit is of the system-on-a-chip (SoC) type; and the plurality of memories are embedded memories in the SoC integrated circuit.

20. The circuitry of claim 16, wherein the first selected delay corresponds to a memory access latency of the memories in the first group.

Description:
AREA-EFFICIENT PARALLEL TEST DATA PATH FOR EMBEDDED MEMORIES

[0001] This relates generally to integrated circuit testing, and more particularly to testing of embedded memories in large-scale integrated circuits.

BACKGROUND

[0002] Many modern electronic integrated circuits integrate essentially all necessary functional components of a computer system, whether general purpose or arranged for a particular end application. Those large scale integrated circuits that include the computational capability for controlling and managing a wide range of functions and useful applications are often referred to as "system on a chip", or "SoC", devices. Typical modern SoC architectures include one or more processor "cores" that perform the digital computer functions of retrieving executable instructions from memory, performing arithmetic and logical operations on digital data retrieved from memory, and storing the results of those operations in memory. Other digital, analog, mixed-signal, or even RF functions may also be integrated into the SoC for acquiring and outputting the data processed by the processor cores. In any case, considering the large amount of digital data often involved in performing the complex functions of these modern devices, significant solid-state memory capacity is now commonly implemented in these SoC devices.

[0003] To optimize performance, memory resources are typically distributed throughout the typical modern SoC device. These memory resources can include both volatile and non-volatile memory. This distributed memory architecture results in memory resources being physically and electrically (or logically) proximate to the processing function that will be accessing them, but may be physically and logically remote from other similar memory of the same type. For example, the deployment of local memory resources will minimize the traffic over the system bus, which reduces the likelihood of bus contention and undesirable latency, and also reduces access time and memory management overhead. The number of memory arrays implemented throughout a modern large-scale SoC device can be quite large, numbering into the hundreds in some cases.

[0004] At the time of manufacture, a full test of the integrated circuits' functionality and performance is important, especially because memory resources can occupy much of the chip area of a typical modern SoC. Conventional memory test algorithms can be quite time-consuming, particularly those involving test patterns of order O(n^x) where x is greater than one, and as such the test time and test cost involved can be dominated by memory test. The distribution of embedded memory resources throughout typical SoC devices further complicates the task of memory test, as many memory arrays are not directly accessible to external automated test equipment yet must still be tested.

[0005] SoC devices typically include internal test circuitry ("built-in self-test," or "BIST," circuitry) that executes a self-test operation for the device upon power-up or reset. BIST may also be involved in the testing of memory, both at the time of manufacture and also on power-up. Conventional BIST memory test techniques can include the placement of hardwired logic in the SoC, for implementing memory test algorithms developed at the time of circuit design. However, at that early stage of the process, a determination of the particular tests to be performed may not be feasible. Another conventional BIST approach is to use the central processing unit of the SoC itself to perform the memory test. But this approach can be limited, because not all embedded memory arrays in the device may be visible to the CPU, and are thus not testable by the CPU. Direct memory access (DMA) techniques for providing external access to embedded memories are also known, but typically are unable to access the memory at its full operating speed.

[0006] Because of these limitations, programmable BIST ("pBIST") techniques have been developed to test embedded memories in the SoC context. U.S. Patent No. 7,324,392 and U.S. Patent Application Publication No. US 2014/0164856, both commonly assigned herewith and incorporated herein by reference, describe examples of these pBIST techniques for testing embedded memories in large-scale integrated circuits such as SoC devices. According to these approaches, the pBIST circuitry includes a general purpose test controller that is programmed by a set of instructions to produce test conditions for the various internal and embedded functions of the device, and to receive and log the responses of those functions to those test conditions. In the memory test context, these operations include the writing of the desired data pattern to an embedded memory, and then addressing the memory to retrieve and compare the stored data to the expected data. Typically, the BIST data path over which the data are communicated during memory test is a separate and independent data path from that by which the embedded memories are accessed in normal operation of the SoC.

[0007] Because of the high test time and test cost for testing the memory capacity of the SoC device, as discussed above, BIST techniques have been developed for the parallel testing of embedded memories, such that multiple memory arrays are simultaneously tested. According to one conventional approach, this parallel test is implemented by multiple BIST controllers that simultaneously execute a test of an associated embedded memory. The provision of multiple BIST controllers multiplies the chip area required for the BIST test logic and data paths, forcing a trade-off between chip area and test time.

[0008] Conventional pBIST architectures, such as described in the above-incorporated U.S. Patent No. 7,324,392, include a BIST controller that is shared by multiple memories of similar memory type (e.g., single-port, double-port). The shared BIST controller generates the test pattern to be written to the memories, and also the expected response from the memories when read. Each memory has a local comparator that compares the data read from its memory during the test with the expected data from the shared BIST controller, and forwards the results to the shared BIST controller. For the expected data from the shared BIST controller to align with the data read from the parallel embedded memories, this conventional arrangement includes a local response delay generator that aligns the expected data to account for access latency for that particular memory, and a local comparator that compares the delayed expected data with the data read from that particular memory and generates a pass/fail signature accordingly.

[0009] FIG 1 illustrates an example of the architecture of a BIST memory test data path in a conventional SoC, in which shared BIST controller 10 supports the parallel test of memories 15 in a manner such as described in the above-incorporated U.S. Patent No. 7,324,392. This test data path is separate and independent from the data path by which memories 15 are accessed in normal operation, which is not shown in FIG 1 for the sake of clarity. As shown in this example, BIST controller 10 communicates with each memory 15 through one or more pipeline delay stages 12, in combination with an instance of local response delay generator 14 that is dedicated to that embedded memory 15. BIST controller 10 may be one of multiple such BIST controllers within the SoC. In architectures such as this example, a given BIST controller 10 is typically associated with memories 15 that are of a common type (e.g., single-port, double-port), considering that BIST controller 10 generates the particular test data pattern to be applied to its associated memories 15; as such, if the SoC includes multiple memory types, multiple BIST controllers 10 and associated data paths may be present. The data pattern generated by BIST controller 10 is applied directly to memories 15, after passing through the pipeline delay stages 12, but these data are not delayed by local response delay generators 14.

[0010] In this arrangement, pipeline delays 12 and each local response delay generator 14 delay the expected data response communicated from BIST controller 10 before application to the instance of local comparator 16 with which that local response delay generator 14 is associated. Local comparator 16 compares that delayed expected data response with the data read from its associated memory 15 during the memory test, and generates a pass/fail signature based on the results of that comparison. In this example, the pass/fail signatures generated by comparators 16 are communicated back to BIST controller 10, such as by parallel test data comparator 17 function, which produces an overall pass/fail signature for those memories 15 that were tested in parallel.

[0011] In this conventional architecture, instances of pipeline delays 12 may be shared by embedded memories 15 that are generally in the vicinity of one another. For example, pipeline delay 12₀ is shared by all embedded memories 15 shown in FIG 1, while pipeline delay 12₁ is shared by embedded memories 15 of group 18₁ that are in the general vicinity of one another, and pipeline delay 12₂ is shared by embedded memories 15 in group 18₂ that are in the general vicinity of one another. Each of pipeline delays 12 essentially operates as one or more clocked buffer stages for the data communicated by BIST controller 10, such that a data word applied at the input of an instance of pipeline delay 12 will appear at its output after a delay of x clock cycles, where x is the number of buffer stages in that pipeline delay 12. Each local response delay generator 14 is similarly constructed, and operates to delay the expected data it receives by one or more clock cycles, in order to align it with the memory access latency of its associated embedded memory 15.

[0012] Although the pipeline architecture in this conventional arrangement is "physically aware" by sharing pipeline stages 12 based on the general physical proximity of embedded memories 15, dedicated local response delay generators 14 are still provided in this architecture. These dedicated local response delay generators 14 can each occupy significant chip area, especially in the case of very wide data words (e.g., up to 128 bits) that are now often required in many modern SoC devices. In some cases, particularly those in which the overall chip area of the SoC is constrained by packaging considerations and other constraints, the chip area consumed by these dedicated local response delay generators can be prohibitive, such that parallel memory test cannot be implemented.

SUMMARY

[0013] In described examples, a BIST controller generates test data patterns to be applied to embedded memories through a BIST data path. Each embedded memory is coupled to a dedicated local comparator that compares data read from the memory during test with an expected data response forwarded from the BIST controller. The local comparators associated with a group of the memories to be tested in parallel receive the expected data response in parallel from a local response delay generator that is shared among the group.

BRIEF DESCRIPTION OF THE DRAWING

[0014] FIG 1 is an electrical diagram, in block form, of a conventional built-in self-test (BIST) data path architecture for the parallel test of embedded memories in a system-on-a-chip (SoC).

[0015] FIG 2 is an electrical diagram, in block form, of the architecture of an SoC device constructed according to embodiments.

[0016] FIG 3 is an electrical diagram, in block form, of the architecture of BIST circuitry and a BIST data path for the parallel test of embedded memories in the SoC of FIG 2, according to embodiments.

[0017] FIG 4 is an electrical diagram, in block form, of the construction of a shared local response delay generator in the architecture of FIG 3, according to embodiments.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0018] Example embodiments provide a built-in self-test (BIST) architecture for the parallel test of distributed memories in a large-scale integrated circuit that can be implemented with reduced chip area. In at least one example, such embodiments provide such a BIST parallel memory test architecture that is suitable for implementation in system-on-a-chip (SoC) devices. Also, example embodiments provide such a BIST parallel memory test architecture that enables parallel test of embedded memories at full operating speed.

[0019] The embodiments described in this specification are implemented into a large-scale integrated circuit device including a number of computing and other operational functions, such as those integrated circuits commonly referred to as a "system-on-a-chip" or "SoC", because such implementation is particularly advantageous in that context. However, example embodiments may be beneficially applied in other applications, such as any type of integrated circuit in which a number of memory arrays are embedded at various locations within the device.

[0020] FIG 2 illustrates, in block diagram form, the generalized architecture of SoC 400 constructed according to these embodiments. In this example, programmable logic serving as the central processing unit (CPU) of SoC 400 is provided by CPU 430 (e.g., microprocessor), such as an OMAP processor available from Texas Instruments Incorporated. SoC 400 may be constructed to include multiple CPUs 430, which may be of the same type as one another or which may be processors of other types such as generic programmable processors, digital signal processors (DSPs) or other application-specific or customized logic, including fixed sequence generators, as appropriate for the particular function of SoC 400.

[0021] Memory resources in SoC 400 are provided by non-volatile flash memory 410, read-only memory (ROM) 411, and random access memory 412, a portion of each of which is accessible to CPU 430 through address bus MAB and data bus MDB. The flash memory 410, ROM 411 and RAM 412 are shown in FIG 2 as unitary blocks, but these memory resources may alternatively be implemented as multiple memory blocks or arrays. Particularly in the case of RAM 412, these memory instances may be implemented by any one or more of a number of memory cell types and arrangements, including static RAM (SRAM), dynamic RAM (DRAM), and ferroelectric memory (FRAM). Also, for the case of RAM 412, individual instances of memory resources may have any one of a number of access architectures, including single-port and double-port access types.

[0022] Various peripheral functions may be also coupled to buses MAB, MDB, in order to be accessible to CPU 430 and one another. In the architecture of FIG 2, these peripherals include various signal processing functions, such as analog-to-digital (ADC) and digital-to-analog (DAC) converters, communications ports, timers, a "brownout" protection function, and serial and other interface functions. These various peripheral functions may be within the address space of SoC 400, as suggested by their accessibility via buses MAB, MDB; alternatively, one or more of these or other functions may be accessible to CPU 430 directly or via other functional circuitry. Security features may also be implemented within SoC 400, such as by secure state machine 448 in combination with stored security parameters in secure flash memory 440 and secure tag hardware 446, in order to execute features such as preventing data reads or writes to areas of memory that are specified to be secure areas unless a secure mode is enabled. SoC 400 also includes other functions, such as its clock system, emulation system 420 and JTAG interface 421 for debug and emulation.

[0023] SoC 400 may include additional or alternative functions to those shown in FIG 2, or may have its functions arranged according to a different architecture from that shown in FIG 2.

[0024] In addition to the various memory resources 410, 411, 412 that are accessible via address bus MAB and data bus MDB, many of the circuit functions within SoC 400 may themselves include local memory resources that are not directly accessible to CPU 430. For example, digital functions, such as the various interfaces, state machines (e.g., SM 448) and timers, can include blocks of RAM for data storage, or even flash memory or ROM for storage of configuration data or program instructions. Especially for those functions that operate largely in the digital domain, these memory resources may collectively, if not individually, occupy significant chip area in SoC 400. As mentioned above, functional testing of these memory resources is important, even though they may not be directly accessible to CPU 430 via buses MAB, MDB or otherwise. CPU 430 itself may also include local memory resources, such as one or more levels of cache memory.

[0025] More generally, the various memory resources and other functions in the architecture of SoC 400 may not be physically implemented in the arrangement shown in FIG 2, but may instead be placed at various locations within the integrated circuit. In this regard, those memory resources and other functions may in fact be physically (and logically, for that matter) distributed as multiple instances within SoC 400.

[0026] According to these embodiments, SoC 400 includes built-in self-test (BIST) circuitry 450, which controls the execution of self-test program routines for SoC 400. BIST 450 may have an external interface to receive commands from automated test equipment (not shown), and to communicate test results in response. Additionally or alternatively, BIST 450 may perform a self-test function upon power-up of SoC 400. In any case, according to these embodiments, BIST 450 is coupled to memory resources 410, 411, 412, and other functions of SoC 400 that include local memory, including CPU 430 in this example. As shown in FIG 2, BIST 450 is coupled to these other memories and functions through BIST data path BIST DP. Data path BIST DP is separate and independent of buses MAB, MDB, and as such is able to directly access local memory arrays and other local functions that may not be accessible to CPU 430 over the data path of buses MAB, MDB, through which data are communicated during the normal operation of SoC 400. Also, as described in further detail below, BIST data path BIST DP includes: conductors arranged as data and control buses; and various circuit functions as will be used in the BIST of the memory resources of SoC 400.

[0027] Referring to FIG 3, the architecture of a portion of BIST circuitry 450 and its associated BIST data path BIST DP according to an embodiment will now be described. The portion of the BIST functionality of SoC 400 shown in FIG 3 performs the parallel test of several memories 45 of the same memory type, such as single-port SRAM. The BIST architecture involved with the testing of other memory types such as double-port memory and non-volatile memory may be similarly constructed. This constraining of parallel test to groups of embedded memories of the same type enables the test data patterns and address patterns for those memories to be simultaneously generated; the test patterns for different memory types will generally differ by type. In the architecture of FIG 3 according to this embodiment, BIST controller 40 is implemented within BIST 450, and is constructed of logic circuitry for generating the desired test data pattern and address sequences that are to be applied, in parallel, to its associated memories 45 in carrying out the test. Multiple BIST controllers 40 may be provided within BIST 450, to similarly control the parallel test of other groups of memories 45, such as those of different memory type or architecture. Alternatively or additionally, multiple BIST controllers 40 for the same memory type may be provided, if desired.

[0028] In this example, the test data pattern generated by BIST controller 40 is in the form of a sequence of parallel data words of a width corresponding to the widest data word width among the embedded memories 45 with which this BIST controller 40 is associated. As mentioned above, in modern SoC devices, memories can support a data word of up to 128 bits, if not wider. Bit-wise "masks" may be applied downstream in BIST data path BIST DP as appropriate for the testing of embedded memories 45 of smaller data word widths. This test data pattern is communicated by BIST controller 40 to its memories 45 through one or more pipeline delay stages 42 within BIST data path BIST DP. As in the conventional arrangement described above, each pipeline delay stage 42 according to this embodiment may be constructed as one or more clocked buffer stages, to delay the propagation of a test data word at its input for one or more system clock cycles (i.e., the number of buffer stages in that pipeline delay stage 42). The delay inserted by a pipeline delay stage 42 will usually correspond to the delay of one or more pipeline stages in the execution flow of SoC 400.

[0029] In this embodiment, similarly as in the conventional architecture of FIG 1, pipeline delay stages 42 may be shared by those embedded memories 45 that are in the general vicinity of one another and that are to be tested in parallel with one another. For the example of FIG 3, pipeline delay 42₀ is shared by all embedded memories 45, while pipeline delay 42₁ is shared by embedded memories 45 of group 48₁ that are in the general vicinity of one another, and pipeline delay 42₂ is shared by embedded memories 45 in group 48₂ that are in the general vicinity of one another. In this manner, the placement and use of pipeline delay stages 42 can be considered to be "physically aware". The total number of pipeline stages 42 that are in BIST data path BIST DP is selected so that the parallel memory test can be carried out at the full operating speed of embedded memories 45, in normal operation.
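The bit-wise masking mentioned in paragraph [0028] can be illustrated with a minimal Python sketch. The function names, the choice of a checkerboard pattern, and the assumption that a narrower memory uses the low-order bits of the controller-width word are all hypothetical; the specification does not say how the masks are constructed.

```python
CONTROLLER_WIDTH = 128  # bits; the text cites data words of up to 128 bits


def mask_for_width(width: int) -> int:
    """Bit mask that keeps the low `width` bits of a controller-width word.

    Assumption: narrower memories use the low-order bits of the word.
    """
    if not 0 < width <= CONTROLLER_WIDTH:
        raise ValueError("width must be between 1 and CONTROLLER_WIDTH")
    return (1 << width) - 1


def apply_mask(word: int, width: int) -> int:
    """Narrow a controller-width test word for a memory of `width` bits."""
    return word & mask_for_width(width)


# A 128-bit checkerboard pattern (alternating 1 and 0 bits), for illustration.
pattern = int("AA" * (CONTROLLER_WIDTH // 8), 16)

narrow = apply_mask(pattern, 32)
print(hex(narrow))  # 0xaaaaaaaa
```

Under these assumptions, a single 128-bit pattern from the controller can serve memories of several widths, with each memory seeing only the bits that fall within its own word.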

[0030] The output of the last of pipeline stages 42 in BIST data path BIST DP for a given group of embedded memories 45 is coupled to each of the memories in that group. For example, the output of pipeline delay stage 42₁ is coupled to embedded memories 45 in group 48₁.

[0031] According to this embodiment, a shared local delay response generator 44 is inserted into BIST data path BIST DP for each group 48 of embedded memories 45, at a point following the last of the pipeline delay stages 42 for that group. The output of this shared local delay response generator 44 is directly coupled to local comparators 46 associated with those embedded memories 45 in its group 48, with no additional dedicated delay response generators or other delay in BIST data path BIST DP between shared local delay response generator 44 and any of its associated local comparators 46. FIG 4 illustrates the functional arrangement of an example of shared local delay response generator 44ⱼ. In this generalized architecture, shared local delay response generator 44ⱼ includes m buffer stages B₀ through Bₘ₋₁ (m > 1), each clocked by a clock signal such as system clock SYS CLK. As a result, expected data word EDR in, of a width of n bits, is delayed by a time corresponding to m cycles of system clock SYS CLK before appearing as expected data word EDR out, for application to its associated local comparators 46.
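The behavior of the series-coupled buffer stages described in paragraph [0031] can be modeled in software as a simple shift register. The following Python sketch is a behavioral illustration only, not the circuit of FIG 4; the class name, method name, and the use of `None` to mark stages that hold no valid data yet are assumptions made for this example.

```python
from collections import deque
from typing import Optional


class SharedDelayGenerator:
    """Behavioral model of m clocked buffer stages, each n bits wide, in series."""

    def __init__(self, stages: int, width: int) -> None:
        if stages < 1:
            raise ValueError("need at least one buffer stage")
        self.width_mask = (1 << width) - 1
        # None marks a stage that has not yet been loaded with valid data.
        self.buffers: deque = deque([None] * stages, maxlen=stages)

    def clock(self, edr_in: int) -> Optional[int]:
        """Model one clock cycle.

        Returns the word visible at the output during this cycle (the
        contents of the last stage), then applies the clock edge that
        shifts edr_in into the first stage.
        """
        edr_out = self.buffers[-1]
        # appendleft on a full bounded deque discards the rightmost entry,
        # which models every stage passing its word to the next stage.
        self.buffers.appendleft(edr_in & self.width_mask)
        return edr_out


gen = SharedDelayGenerator(stages=3, width=8)
words = [0x11, 0x22, 0x33, 0x44, 0x55]
outputs = [gen.clock(w) for w in words]
print(outputs)  # [None, None, None, 17, 34]
```

With three stages, the word presented in cycle 0 becomes visible at the output in cycle 3, matching the m-cycle delay described in the text. Because the output of one such generator fans out to every local comparator in the group, the model needs only one instance per group.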

[0032] According to this embodiment, shared local delay response generator 44 provides the entirety of the local delay in the expected response data applied to local comparators 46 for the embedded memories 45 in the associated group. Conversely, this architecture has no additional local delay response generator that is dedicated to a particular embedded memory 45 within that group. As a result, shared local delay response generator 44 is shared by those embedded memories 45 in SoC 400 that have essentially the same latency as one another. This allows shared local delay response generator 44 to provide the same delay in the expected data response for each memory 45 in that group 48; accordingly, the expected data response is applied by shared local delay response generator 44 simultaneously to each of the local comparators 46 in that group 48. In a general sense, the latency time of these memories 45 includes such factors as the operational timing (e.g., read access time) of those memories 45, and other local delays such as those corresponding to the physical length of the data path to those memories 45. In operation, the delay in the expected data response inserted by shared local delay response generator 44 ensures that the expected data response is applied to local comparators 46 at the correct time for those local comparators 46 to compare the output of their associated embedded memories 45 with the corresponding expected response, in parallel with one another. Each local comparator 46 generates a pass/fail signature from the results of these comparisons in the memory test algorithm, and forwards that pass/fail signature to parallel test data comparator 47, which combines the results from local comparators 46 into data for return to BIST controller 40 for this memory type.
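
The parallel comparison described above can be sketched in software as follows. The description does not specify the format of the pass/fail signature, so this sketch assumes a single "sticky" fail flag per local comparator, and it represents the combined result as a list of per-memory flags. The names LocalComparator and run_group_test are hypothetical.

```python
class LocalComparator:
    """Model of one local comparator; the sticky fail flag is an assumption."""

    def __init__(self) -> None:
        self.failed = False

    def compare(self, read_data: int, expected: int) -> None:
        # Any mismatch latches the fail flag for the rest of the test.
        if read_data != expected:
            self.failed = True


def run_group_test(expected_words, reads_per_memory):
    """Apply each expected word to every comparator in the group at once.

    expected_words: delayed expected data responses, one per cycle.
    reads_per_memory: one list of read-back words per memory in the group.
    Returns one fail flag per memory, standing in for the combined result.
    """
    comparators = [LocalComparator() for _ in reads_per_memory]
    for cycle, expected in enumerate(expected_words):
        # The same expected word is shared by all comparators in this cycle.
        for comparator, reads in zip(comparators, reads_per_memory):
            comparator.compare(reads[cycle], expected)
    return [comparator.failed for comparator in comparators]


expected = [0xAA, 0x55, 0xFF]
reads = [
    [0xAA, 0x55, 0xFF],  # fault-free memory
    [0xAA, 0x54, 0xFF],  # one bit wrong in the second word
    [0xAA, 0x55, 0xFF],  # fault-free memory
]
fail_flags = run_group_test(expected, reads)
```

The point of the sketch is that a single stream of expected words serves every comparator in the group, which is only valid when the memories in the group return their data in the same cycle, as the paragraph above requires.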

[0033] As evident from FIG 3, multiple groups 481, 482 of embedded memories 45 that are associated with this instance of BIST controller 40 can be tested either sequentially relative to one another, or in parallel with one another. In this example, shared local delay response generator 442 for embedded memories 45 in group 482 can insert a different delay into the expected data response for its local comparators 46 than that inserted by shared local delay response generator 441 for group 481. In some cases, the delay inserted by these shared local delay response generators 44 may be the same, but the physical distance between groups 481, 482 may be such that a single shared local delay response generator 44 would not work.

[0034] The sharing of a single local delay response generator 44 among multiple embedded memories 45 according to this embodiment significantly reduces the chip area required to implement parallel memory test in this BIST context. As discussed above, the area required to implement dedicated local delay response generators for each embedded memory can be significant, especially for wide data words. However, in this embodiment, the chip area devoted to local delay response generators is reduced by a factor corresponding to the number of embedded memories 45 that can share an instance of shared local delay response generator 44. This area savings is especially significant in those devices in which the memory data path is extremely wide, such as on the order of 128 bits or more. For example, one modern SoC device includes as many as twelve instances of 128-bit-wide memory that can share a single local delay response generator, resulting in an area reduction of nearly 60% for the memory data path in that device as compared with the conventional BIST architecture with dedicated delay response generators for each memory, such as discussed above relative to FIG 1.
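
A rough count of buffer storage elements illustrates why sharing saves area. The sketch below counts only the one-bit storage elements in the delay response generators themselves, assuming one such element per bit per buffer stage. The stage count m = 2 is a hypothetical value chosen for illustration, since the description does not give one. The resulting ratio applies to the delay response generators alone and is not the same quantity as the reduction of nearly 60% stated above, which refers to the memory data path as a whole.

```python
def dedicated_storage_bits(num_memories: int, m: int, n_bits: int) -> int:
    """Storage bits when every memory has its own m-stage, n-bit generator."""
    return num_memories * m * n_bits


def shared_storage_bits(m: int, n_bits: int) -> int:
    """Storage bits when one m-stage, n-bit generator serves the whole group."""
    return m * n_bits


# Twelve 128-bit memories (from the text); m = 2 is an assumed stage count.
dedicated = dedicated_storage_bits(num_memories=12, m=2, n_bits=128)
shared = shared_storage_bits(m=2, n_bits=128)
saving_fraction = 1 - shared / dedicated
```

Under these assumptions the dedicated arrangement needs 3072 storage bits against 256 for the shared one, and the saving within the generators depends only on the number of memories sharing the generator, not on the assumed value of m.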

[0035] These embodiments thus enable the efficient implementation of parallel memory test within a BIST framework in very large-scale integrated circuits such as SoC devices, even where many embedded memory arrays and functions are widely distributed throughout the device. The reduction of chip area provided by these embodiments permits this parallel memory test even in devices for which the chip area is severely constrained, such as due to packaging requirements. Parallel testability, even of deeply embedded memories (i.e., not directly accessible to the CPU of the device), is thus provided in a cost-effective manner. Also, these embodiments permit the parallel memory test to be performed at the full operating speed of the memories, ensuring a thorough and complete test of embedded memories.

[0036] Modifications are possible in the described embodiments, and other embodiments are possible, within the scope of the claims.