
Title:
NAND IMPLEMENTATION FOR HIGH BANDWIDTH APPLICATIONS
Document Type and Number:
WIPO Patent Application WO/2009/079014
Kind Code:
A1
Abstract:
A flash memory system may include a NAND flash memory array, and x-address circuitry configured to decode and address one or more rows of data in the NAND array. The flash memory system may further include at least one shift register configured to process the data addressed by the x-address circuitry. The flash memory system may further include at least one external clock. In some embodiments, the shift register may be an asynchronous shift register. A flash translational layer may be provided that permits multiple simultaneous data transfers.

Inventors:
YANG WOODWARD (US)
HYUN JEA WOONG (US)
Application Number:
PCT/US2008/013908
Publication Date:
June 25, 2009
Filing Date:
December 18, 2008
Assignee:
HARVARD COLLEGE (US)
YANG WOODWARD (US)
HYUN JEA WOONG (US)
International Classes:
G11C16/26; G11C7/10
Foreign References:
US20070130488A1 2007-06-07
US20020001238A1 2002-01-03
US7088627B1 2006-08-08
EP0171720A2 1986-02-19
US5717351A 1998-02-10
US6016472A 2000-01-18
US20040186946A1 2004-09-23
US20070165457A1 2007-07-19
Attorney, Agent or Firm:
KIM, Elizabeth, E. et al. (28 State Street, Boston, MA, US)
Claims:

What is claimed is:

1. A flash memory system comprising: a NAND flash memory array; x-address circuitry configured to decode and address one or more rows of data in the NAND flash memory array; and at least one shift register configured to process the data addressed by the x-address circuitry in the NAND flash memory array.

2. The system of claim 1, further comprising at least one external clock.

3. The system of claim 1, where the shift register is an asynchronous shift register.

4. The system of claim 3, where the asynchronous shift register is configured to perform a parallel-in-serial-out (PISO) function.

5. The system of claim 3, where the asynchronous shift register is configured to perform a serial-in-parallel-out (SIPO) function.

6. The system of claim 1, wherein the at least one shift register has a bandwidth that is configured to match a read bandwidth of the NAND array.

7. The system of claim 1, where the at least one shift register is a bidirectional shift register.

8. The system of claim 7, wherein the bidirectional shift register comprises a plurality of NAND gates that are configured as OR gates to select data input from right or left bistables.

9. The system of claim 1, where the at least one shift register is a unidirectional shift register.

10. The system of claim 1, further comprising at least two independent data buffers, each one of the data buffers configured to be used concurrently for rapid data transmission from the NAND array to a host.

11. The system of claim 1, wherein the at least two independent data buffers are configured to enable the host to execute at least one of:

a simultaneous Read-While-Load operation; and a simultaneous Write-While-Program operation.

12. The system of claim 1, further comprising a flash translational layer configured to allow for one or more independent plane operations.

13. The system of claim 12, wherein the flash translational layer is further configured to allow for a plurality of concurrent operations.

14. The system of claim 13, wherein the plurality of concurrent operations comprise one or more of: a read-while-program operation; a read-while-erase operation; a program-while-erase operation; and a concurrent read, program and erase operation without address restriction.

15. A cascaded flash memory system comprising: a plurality of NAND arrays; a plurality of x-address circuits, each x-address circuit associated with one of the NAND arrays to decode and address one or more rows of data in the associated NAND array; a plurality of shift registers, each one of the shift registers configured to process the data addressed by one or more of the x-address circuits; and a flash translational layer configured to allow two or more concurrent operations on each of the NAND flash memory arrays.

16. The cascaded flash memory system of claim 15, wherein the flash translational layer is further configured to permit a plurality of simultaneous data transfers within one device or across a plurality of devices.

17. A method of increasing performance of a NAND flash memory system, the method comprising: replacing Y-address circuitry in the NAND flash memory system with at least one shift register; and using the at least one shift register to transfer data in and out of the NAND flash memory system.

18. The method of claim 17, wherein the at least one shift register comprises an asynchronous shift register.

19. The method of claim 17, further comprising the act of: providing a flash translational layer that permits a plurality of operations to be performed concurrently.

Description:

NAND IMPLEMENTATION FOR HIGH BANDWIDTH APPLICATIONS

[001] CROSS-REFERENCE TO RELATED APPLICATIONS

[002] This application is based upon, and claims the benefit of priority under 35 U.S.C. § 119(e) from U.S. Provisional Patent Application Serial No. 61/014,731 (the "'731 provisional application"), filed December 18, 2007, entitled "Implementation and Organization of NAND for High Bandwidth Applications." The contents of the '731 provisional application are incorporated herein by reference in their entirety as though fully set forth.

[003] BACKGROUND

[004] Conventional NAND flash memory may be designed and implemented with an ability to randomly access any row (X) address and any column (Y) address. The column (Y) address decoding and access may typically be implemented using combinational logic and an x8 data I/O bus to buffers. FIG. 1 shows the performance (MB/s) over the past 12 years of conventional (x8b) NAND flash interfaces. Read and programming performance are also shown for typical single-level cell (SLC) and multi-level cell (MLC) implementations. As can be seen in FIG. 1, only modest gains in I/O speed performance of NAND flash have been achieved over the past 12 years. In fact, increasing density and lower manufacturing cost have been the primary driving forces in NAND technology development over the past 12 years. However, the bottleneck in I/O speed performance of NAND may need to be resolved as NAND increases in density/size and expands into higher speed applications.

[005] Factors that determine NAND I/O speed performance may include without limitation cell read and program times, memory array organization, memory array access time, and time to transfer data to and from the data buffer in the NAND. In general, previous implementations of NAND flash memory may take significant area, consume significant power per x8 data I/O access, and take a significant amount of time (about 25 ns cycle per byte) and power to transfer data in/out of the data buffers. Limitations on read performance of NAND flash may result from the relatively long time to transfer data between the page buffer register and the host, typically 25 ns cycle per byte. As NAND flash memory sizes increase, cell read/program times increase, and the page buffer register size increases, the I/O bottleneck may get even larger. In addition, conventional NAND flash interfaces that include an 8-bit, bidirectional, multi-drop bus may suffer from speed degradation with more than 4 devices on the bus, and may require a chip enable (CE) signal for each device.

[006] Several technologies have been implemented or proposed to improve read performance, including without limitation: increasing the size of a page (from 2112 bytes to 4224 bytes, for example); increasing the I/O from x8 to x16; implementing a multi-plane operation; and new interface approaches such as Open NAND Flash Interface, "ONFi". These technologies, however, may fail in one way or another to provide a definitive solution to the read/write performance issues of NAND flash. For example, increase in page size beyond 4224 bytes may not be easily achieved, and may not address the relatively slow x8/25ns access bandwidth to/from the page buffer register. Furthermore, an increase in block size as a result of increased page size may reduce the endurance and lifetime of the system and increase wear-leveling complexity. Increasing the I/O may require adding additional pins for the additional I/O lines and increased internal area for more complex routing circuitry. ONFi may not deal with the fundamental performance issues in NAND flash and only provides specifications for high speed off chip I/O interface.
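The cost of the x8/25 ns interface mentioned above can be made concrete with a short calculation. The following is a minimal sketch, assuming only the figures stated in the text (a 25 ns cycle per byte and page sizes of 2112 and 4224 bytes); the function names are illustrative and not part of the disclosure.

```python
# Figures taken from the text: one byte crosses the x8 bus per 25 ns cycle.
CYCLE_NS_PER_BYTE = 25.0


def transfer_time_us(page_bytes, cycle_ns=CYCLE_NS_PER_BYTE):
    """Time to move one page between the page buffer register and the host, in microseconds."""
    return page_bytes * cycle_ns / 1000.0


def peak_bandwidth_mb_s(cycle_ns=CYCLE_NS_PER_BYTE):
    """Peak interface bandwidth in MB/s, taking 1 MB as 10**6 bytes."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    for page_bytes in (2112, 4224):
        print(page_bytes, "bytes:", transfer_time_us(page_bytes), "us")
    print("peak:", peak_bandwidth_mb_s(), "MB/s")
```

Under these figures, doubling the page size doubles the transfer time and leaves the peak interface bandwidth unchanged, which is consistent with the observation that larger pages alone may not address the access bottleneck.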

[007] For these reasons, systems and methods that allow NAND to be made with reduced cost, higher speed, and lower power consumption, are desirable.

[008] SUMMARY

[009] A flash memory system may include a NAND flash memory array, and x-address circuitry configured to decode and address one or more rows of data in the NAND array. The flash memory system may further include at least one shift register configured to process the data addressed by the x-address circuitry. The flash memory system may further include at least one external clock. In some embodiments, the shift register may be an asynchronous shift register.

[010] BRIEF DESCRIPTION OF DRAWINGS

[011] The drawing figures depict one or more implementations in accord with the present concepts, by way of example only, not by way of limitations. The drawings disclose illustrative embodiments. They do not set forth all embodiments. Other embodiments may be used in addition or instead. In the figures, like reference numerals refer to the same or similar elements.

[012] FIG. 1 illustrates the performance of conventional (x8b) NAND flash interfaces over the past 12 years.

[013] FIG. 2 illustrates a flash memory system in accordance with one embodiment of the present disclosure, in which the y-address circuitry in traditional NAND flash memory is replaced with a high speed shift register and an external clock.

[014] FIG. 3 illustrates a bi-directional shift register, used in one embodiment of the present disclosure.

[015] FIGS. 4A, 4B, and 4C illustrate an asynchronous shift register, used in one embodiment of the present disclosure.

[016] FIG. 5 illustrates a device/plane NAND architecture that includes a flash translational layer (FTL) that allows multiple simultaneous data transfers, in accordance with one embodiment of the present disclosure.

[017] FIG. 6 illustrates a dual buffering scheme in accordance with one embodiment of the present disclosure.

[018] DETAILED DESCRIPTION

[019] In the present disclosure, methods and systems are described for implementing NAND flash memory with reduced cost, increased speed, and lower power consumption. In some embodiments, the column (Y) address decoding and access circuitry of conventional NAND flash memory may be replaced with one or more high speed PISO/SIPO (parallel-in-serial-out and serial-in-parallel-out) shift registers and at least one external clock. The parallel interface may connect to the NAND memory array and the serial interface may connect to external I/O. In some embodiments, asynchronous shift registers may be used. In some embodiments, a NAND architecture may be implemented that allows faster transfer of data in and out of the data buffers, thereby increasing NAND flash performance. In some embodiments, high speed NAND flash memory may be implemented for use with solid-state drives, which are new devices that are meant to replace the magnetic hard disk drives (HDDs) used for data storage in computers. For example, by incorporating multiple devices of SLC NAND, the writing throughput may be accelerated significantly, for example up to 60 MB/s or more.

[020] FIG. 2 illustrates a NAND flash memory system 10 in accordance with one embodiment of the present disclosure, in which the y-address circuitry in traditional NAND flash memory is replaced with a high speed shift register and an external clock. As shown in FIG. 2, a conventional implementation of NAND flash memory may include x and y address decoding and access circuitry 14, 18, which permit random access to data at any row (x) and column (y), respectively.

[021] The size of NAND flash memory has now increased so much that NAND flash memory may be used to replace HDD memory. HDD memory typically does not have random access capability at the bit or byte level. It is only at the block level that HDD may have some random access capability.

[022] Historically, the development of NAND flash memories has included the ability for random byte (x8) access. This has typically been achieved by using a row (X) address and column (Y) address decoder to access the NAND memory array and page buffer. However, as NAND flash memories have increased in density, and in particular for applications that store large data files, typical memory data access has predominantly favored bytewise sequential page accesses. Therefore, it is proposed in the present disclosure that a shift register would have many advantages over the standard random column (Y) decoder that is currently used in NAND flash memory.

[023] In the embodiment illustrated in FIG. 2, the y address decoding and access circuitry 18 in the conventional NAND flash memory is replaced with one or more high speed shift registers 22 and an external clock 26. With the y-decoder circuitry 18 removed, flash memory information may be accessed in rows of data.

[024] Random read/write access is rarely utilized in the operation of modern NAND flash since the dominant application for NAND flash hardware and software drivers (flash translation layer or FTL) has been the direct and indirect replacement of HDDs. As described earlier, HDDs do not have the ability to access random bytes (read/write) of information. Instead, HDDs typically access information in much larger blocks as long sequences of bits to amortize the expensive hardware and long access times associated with moving the HDD head to the appropriate location on a spinning magnetic storage disk.

[025] Random access to each data element (byte) in a column is thus not required when NAND flash memory is implemented as a replacement HDD, i.e. random read/write capabilities are not required for HDD applications of NAND. Thus high speed shift registers may replace y address circuitry in NAND, even though high speed shift registers do not permit efficient access to any random read or write within a page (32Kb) of NAND flash memory. Such high speed shift registers could be clocked much faster than a y-decoder.

[026] The high speed shift registers 22 may be significantly smaller, consume less power, and operate at x20 to x100 higher speed compared to conventional NAND flash. By incorporating high-speed shift registers in NAND flash memory, NAND flash memory may be made with less cost, higher speed, and lower power consumption.

[027] Using the shift registers 22, data may be transferred in and out of the data buffers 30 at a faster rate, thus significantly increasing the read and write performance of the memory 10.

[028] In some embodiments, the internal shift register may be designed to match the read bandwidth, and the NAND array 34 to match the shift register 22 bandwidth. For example, the internal shift register 22 may run at a 1 GHz clock speed, while the time to transfer data between the array 34 and the page register (tR) is about 60 μs. Therefore, for a single-plane architecture, reading 4224 bytes of data may take about 93.8 μs ([2x1 ns] + [5x1 ns] + 60 μs + [4224x8x1 ns]), and for a two-plane architecture, reading 8448 bytes of data should take about 127.6 μs (60 μs + [4224x8x1 ns] + [4224x8x1 ns]).
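The read-time arithmetic in the preceding paragraph can be checked with a few lines of code. This is a sketch that simply re-evaluates the bracketed terms as given: a 1 ns bit period (1 GHz), tR of 60 μs, and 4224 bytes per plane. The text does not say what the [2x1 ns] and [5x1 ns] terms represent, so they are carried here as an unexplained fixed overhead; all names are illustrative.

```python
BIT_PERIOD_NS = 1.0    # 1 GHz internal shift register, one bit per cycle
T_R_US = 60.0          # array-to-page-register transfer time (tR) from the text
PAGE_BYTES = 4224      # bytes per plane from the text


def page_shift_time_us(page_bytes=PAGE_BYTES, bit_period_ns=BIT_PERIOD_NS):
    """Time to shift one page out serially, one bit per clock, in microseconds."""
    return page_bytes * 8 * bit_period_ns / 1000.0


def single_plane_read_us():
    """[2x1 ns] + [5x1 ns] + tR + [4224x8x1 ns], as written in the text."""
    fixed_overhead_us = (2 * BIT_PERIOD_NS + 5 * BIT_PERIOD_NS) / 1000.0
    return fixed_overhead_us + T_R_US + page_shift_time_us()


def two_plane_read_us():
    """tR + two back-to-back page shifts, as written in the text."""
    return T_R_US + 2 * page_shift_time_us()


if __name__ == "__main__":
    print(round(single_plane_read_us(), 1))  # 93.8
    print(round(two_plane_read_us(), 1))     # 127.6
```

Because tR is paid once in the two-plane expression while the amount of data doubles, the time per byte drops, which is the motivation given for a two-plane architecture in the next paragraph.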

[029] In some embodiments, it may be ideal to use a two plane architecture to maximize read performance since the data transfer from the array to the page buffer takes around 60 μs. Two different clock generators may also be used. For example, a fast clock generator may be used for shifting the first batch of data to the page buffer during a read, and a different clock may be used for a different application (i.e., Stand-alone NAND or SSD) for writes.

[030] In some embodiments of the present disclosure, the N-bit shift register 22 may be implemented using a 1-bit shift register followed by a serial-in-parallel-out (SIPO) operator to coincide with the data bus. Other embodiments may employ an x8 shift register in roughly the same pitch as an x1 shift register, for higher throughput.
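The pairing of a 1-bit shift register with a serial-in-parallel-out stage can be illustrated behaviorally. The sketch below is not the disclosed circuit; it assumes an 8-bit bus width and most-significant-bit-first ordering, neither of which is specified in the text, and the function names are hypothetical.

```python
def piso(word, width=8):
    """Parallel-in-serial-out: emit the bits of `word`, MSB first (assumed order)."""
    for position in reversed(range(width)):
        yield (word >> position) & 1


def sipo(bits, width=8):
    """Serial-in-parallel-out: collect a 1-bit stream into words of `width` bits."""
    mask = (1 << width) - 1
    register = 0
    count = 0
    for bit in bits:
        # Shift the register by one position and insert the new bit.
        register = ((register << 1) | (bit & 1)) & mask
        count += 1
        if count == width:
            yield register  # present the full word to the parallel bus
            register = 0
            count = 0


if __name__ == "__main__":
    stream = [bit for word in (0xA5, 0x3C) for bit in piso(word)]
    print([hex(word) for word in sipo(stream)])
```

Running a word through `piso` and then `sipo` returns the original word, which is the round trip a serial link between the array-side and bus-side interfaces would need to preserve.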

[031] In some embodiments, the high speed shift registers 22 may be bi-directional shift registers. In other embodiments, the high speed shift registers 22 may be uni-directional shift registers.

[032] FIG. 3 illustrates a bi-directional shift register used in one embodiment of the present disclosure. A bidirectional shift register, or reversible shift register, is one in which data can be shifted either left or right. Such a bidirectional shift register may be implemented as a set of NAND gates that are configured as OR gates to select data input from the right or left bistables, as selected by the LEFT/RIGHT control line as shown in FIG. 3. In the present disclosure, the term "bidirectional shift register" has the same meaning as "reversible shift register", and these terms are used interchangeably.
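The selection of the left or right neighbor through NAND gates can be sketched in software. The model below is an illustration only and does not reproduce FIG. 3: it assumes the common NAND-NAND arrangement in which two NAND gates qualify each neighbor with the direction control and a third NAND gate combines them (acting as an OR on the inverted inputs). The signal and function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def shift_once(state, shift_right, serial_in=0):
    """Advance the register by one clock.

    state: list of bistable outputs, index 0 being the leftmost stage.
    shift_right: 1 moves data toward higher indices, 0 toward lower indices.
    serial_in: bit fed into the stage that has no neighbor on the source side.
    """
    shift_left = 1 - shift_right
    last = len(state) - 1
    next_state = []
    for i in range(len(state)):
        from_left = state[i - 1] if i > 0 else serial_in
        from_right = state[i + 1] if i < last else serial_in
        # NAND(NAND(a, s), NAND(b, not s)) == (a AND s) OR (b AND not s)
        d_input = nand(nand(from_left, shift_right), nand(from_right, shift_left))
        next_state.append(d_input)
    return next_state


if __name__ == "__main__":
    state = [1, 0, 0, 0]
    state = shift_once(state, shift_right=1)
    print(state)  # [0, 1, 0, 0]
    state = shift_once(state, shift_right=0)
    print(state)  # [1, 0, 0, 0]
```

Only the data-selection logic is modeled; clocking and the bistable implementation are left out.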

[033] In some embodiments, an asynchronous shift register may be used to replace the y address circuitry 18. FIGS. 4A, 4B, and 4C illustrate an asynchronous shift register used in one embodiment of the present disclosure. FIG. 4A shows a conceptual model of the system as a delayed clock pulse. FIGS. 4B and 4C illustrate a circuit for implementing PISO (parallel-in-serial-out).

[034] As shown in FIGS. 4B and 4C, in its initial state, X and W are low and A is high. When X is set high, A turns low. When A is set low, W turns high. When W is set high, X turns low and W is X' for the next data element. Then X and W return to low once X' triggers this same cycle in the next register. Consequently, a pulse is transmitted across each of the data elements, serially transmitting each of the parallel data elements.
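The pulse hand-off described above can be followed step by step in a behavioral model. The sketch below is an assumption-laden illustration, not the circuit of FIGS. 4B and 4C: it models only the stated signal order (X high, then A low, then W high, with W serving as X' of the next stage), ignores gate delays, and assumes that each stage places its data bit on the serial output while it holds the pulse.

```python
def async_piso(parallel_data):
    """Pass a single pulse down the stages, emitting one stored bit per stage.

    Returns (serial_out, trace), where trace records the (stage, X, A, W)
    values at the moment each stage holds the pulse.
    """
    serial_out = []
    trace = []
    pulse = 1  # external start pulse, arriving as X of the first stage
    for stage, bit in enumerate(parallel_data):
        x, a, w = 0, 1, 0   # initial state from the text: X and W low, A high
        x = pulse           # X is set high
        a = 1 - x           # A turns low
        w = 1 - a           # W turns high
        serial_out.append(bit)  # assumed: the stage drives the serial line now
        trace.append((stage, x, a, w))
        pulse = w           # W is X' for the next data element
        # X and W of this stage then return low (not tracked further here).
    return serial_out, trace


if __name__ == "__main__":
    out, trace = async_piso([1, 0, 1, 1])
    print(out)
    print(trace)
```

The model shows why no separate holding state is needed per stage: the only state that moves is the pulse itself, and each stage is idle again once its neighbor has taken over.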

[035] An asynchronous shift register may seamlessly embed itself within the existing NAND flash technologies, since separate holding states are not required as would be the case if synchronous registers were used. An asynchronous shift register may have several advantages over a synchronous shift register. An asynchronous shift register may be implemented with approximately 50% less circuitry (and smaller area) for each stage since an internal holding state is not required. As soon as data from the current stage is shifted out, data from the next stage may be directly transferred. Also, the asynchronous shift register may be clocked at very high speed without regard for global clock skew and synchronization. The smaller shift register circuitry and reduced performance requirement on clock drivers may make the asynchronous shift register significantly lower power than synchronous shift registers (including associated clock drivers) and even standard column (Y) decoding circuitry and I/O buses.

[036] In some embodiments of the present disclosure, illustrated in FIG. 5, the NAND flash memory system 10 may include a flash translational layer (FTL) 150 that provides fully independent bank or plane operations. The flash translational layer 150 may allow for concurrent operations including without limitation: read while program, read while erase, program while erase, and concurrent read, program and erase without address restriction.

[037] This may contrast with multi-plane operations in conventional NAND, which are not truly independent operations, but are merely doubled page size operations. Because the FTL 150 provides for completely independent plane operations, during operation of one layer of the memory, another operation may be run on the second layer of the memory.

[038] In the embodiment illustrated in FIG. 5, throughput may be increased by cascading devices. For example, it may be possible to increase the writing throughput up to 60 MB/s (or more) with 3 devices of SLC NAND flash (six planes with 4 kB of a page) running an internal 1 bit shift register at 1 GHz.

[039] With the device/plane architecture illustrated in FIG. 5, multiple simultaneous data transfers within a device or across devices may be accomplished in a cascaded manner, where multiple devices reside on a bus. This makes it possible to rotate through devices, performing operations on them. When an operation is requested of the last device, the first device will already have finished its operation and will be ready to accept a new operation.
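The rotation through devices can be expressed as a simple timing model. The sketch below is an illustration under stated assumptions, not a reproduction of the figures in the text: it assumes one shared serial bus, a 4 kB page taken as 4096 bytes, a 1-bit shift register at 1 GHz, and an internal busy time per operation that the caller must supply. The 200 μs busy time used in the example is purely hypothetical, since the disclosure does not state a program time, and the model does not attempt to reproduce the 60 MB/s figure.

```python
import math


def bus_slot_us(page_bytes, clock_ghz=1.0):
    """Time one page occupies the serial bus, at one bit per clock, in microseconds."""
    return page_bytes * 8 / clock_ghz / 1000.0


def min_units_to_hide_busy(slot_us, busy_us):
    """Smallest number of planes/devices for which the first unit is idle
    again by the time the rotation returns to it."""
    return math.ceil(busy_us / slot_us) + 1


def sustained_mb_s(page_bytes, units, busy_us, clock_ghz=1.0):
    """Steady-state throughput of a round-robin rotation, in MB/s (bytes per microsecond)."""
    slot = bus_slot_us(page_bytes, clock_ghz)
    # Each unit needs slot + busy per page; the bus needs slot per page.
    period_per_page = max(slot, (slot + busy_us) / units)
    return page_bytes / period_per_page


if __name__ == "__main__":
    slot = bus_slot_us(4096)
    hypothetical_busy_us = 200.0  # assumption, not a value from the text
    print("bus slot (us):", slot)
    print("units needed:", min_units_to_hide_busy(slot, hypothetical_busy_us))
    print("MB/s with 6 units:", sustained_mb_s(4096, 6, hypothetical_busy_us))
```

In this model, adding units raises throughput until the bus itself becomes the limit, after which the first unit is always ready when its turn comes around.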

[040] In some embodiments of the present disclosure, a dual buffering system may be used to further increase the efficiency of the NAND flash memory system. FIG. 6 illustrates a dual buffering scheme in accordance with one embodiment of the present disclosure. In the illustrated example, the NAND flash memory system includes two independent data buffers, 180 and 182. These data buffers may be DataRAMs, for example. Each one of these two independent data buffers may be concurrently used for fast data transmission from the NAND array to the host. These dual buffers 180 and 182 may enable the host to execute simultaneous Read-While-Load and Write-While-Program operations.

[041] During a "Read" operation, 2kB of data may be read from the NAND into a data buffer (such as a DataRAM or page buffer), and may be transmitted to the host. The transmission may be made through a 16 bit bus. While the host is reading this information from the DataRAM or data buffer, another 2kB of data may be transferred from the NAND to another DataRAM, or a second data buffer RAM. Such a double buffering scheme, in conjunction with the large block architecture of NAND flash memory, may permit high speed performance reads. In one embodiment, read performance may reach up to about 68 MB/s. By using a shift register instead of a y-decoder, the data buffers 180 and 182 can be filled much more efficiently.

[042] During a "Program" operation, 2kB of data may be first loaded from the host to a first internal DataRAM, then moved to the page buffer of the NAND flash memory. Flash control logic may perform the actual program operation on the data loaded from the host to the DataRAM, which may also accelerate the next program operation. In one embodiment, program performance may reach about 9.3 MB/s. Additionally, according to an embodiment of the present disclosure, on-chip logic for generating error correction codes (ECC) is provided, which is faster than software implemented error correction at the host side.

[043] A Read-While-Load Operation may accelerate the read performance of the NAND device by enabling data to be read out by the host from one data buffer (or DataRAM) while the other data buffer (or DataRAM) is being loaded with data from the NAND flash array memory.
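The benefit of Read-While-Load can be seen in a small timeline model of the two buffers. The sketch below is illustrative only: the per-chunk load and host-read times are hypothetical parameters rather than values from the disclosure, and the model assumes strict alternation between the two buffers, with a buffer reloaded only after the host has drained it.

```python
def read_while_load_us(chunks, load_us, host_us):
    """Total time to deliver `chunks` chunks using two alternating buffers."""
    buffer_free_at = [0.0, 0.0]  # time at which each buffer has been drained
    nand_free_at = 0.0           # time at which the NAND array can start a new load
    host_free_at = 0.0           # time at which the host can start a new read
    for i in range(chunks):
        b = i % 2
        load_start = max(nand_free_at, buffer_free_at[b])
        load_done = load_start + load_us
        nand_free_at = load_done
        read_start = max(load_done, host_free_at)
        host_free_at = read_start + host_us
        buffer_free_at[b] = host_free_at
    return host_free_at


def single_buffer_us(chunks, load_us, host_us):
    """Total time when each load must wait for the previous host read to finish."""
    return chunks * (load_us + host_us)


if __name__ == "__main__":
    # Hypothetical timings in microseconds, chosen only for illustration.
    print(read_while_load_us(4, load_us=30.0, host_us=20.0))  # 140.0
    print(single_buffer_us(4, load_us=30.0, host_us=20.0))    # 200.0
```

With two buffers, the steady-state cost per chunk approaches the larger of the load time and the host read time instead of their sum.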

[044] In sum, methods and systems have been described for implementing a new NAND architecture that uses high speed shift registers to transfer data in and out, instead of using I/O buffers and a large Y-gating circuitry. In this way, NAND flash performance (for both read and write) may be dramatically increased. These methods and systems may allow NAND flash memory to be much less expensive, have a much higher speed, and consume much less power.

[045] It is contemplated that the subject matter described herein may be embodied in many forms. Accordingly, the embodiments described in detail above are illustrative embodiments, and are not to be considered limitations. Other embodiments may be used in addition or instead.

[046] The components, steps, features, objects, benefits and advantages that have been discussed are merely illustrative. None of them, nor the discussions relating to them, are intended to limit the scope of protection in any way. Numerous other embodiments are also contemplated, including embodiments that have fewer, additional, and/or different components, steps, features, objects, benefits and advantages. The components and steps may also be arranged and ordered differently.

[047] The phrase "means for" when used in a claim embraces the corresponding structures and materials that have been described and their equivalents. Similarly, the phrase "step for" when used in a claim embraces the corresponding acts that have been described and their equivalents. The absence of these phrases means that the claim is not limited to any of the corresponding structures, materials, or acts or to their equivalents.

[048] Nothing that has been stated or illustrated is intended to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is recited in the claims.

[049] In short, the scope of protection is limited solely by the claims that now follow. That scope is intended to be as broad as is reasonably consistent with the language that is used in the claims and to encompass all structural and functional equivalents.