

Title:
MEMORY CONTROL COMPONENT WITH SPLIT AUXILIARY CHANNEL
Document Type and Number:
WIPO Patent Application WO/2024/097057
Kind Code:
A1
Abstract:
A memory control component allocates a portion of an auxiliary signaling channel and corresponding memory storage, conventionally dedicated to error correction code (ECC) conveyance and storage, for conveyance of metadata and/or other types of component-level information, splitting the auxiliary channel between metadata and ECC conveyance/storage in proportions that obviate conventional metadata conveyance/storage via the primary data channel, thereby maintaining full primary-channel bandwidth and storage capacity for user data.

Inventors:
TRAN ANH (US)
BAPODRA DHAIRYA (US)
PATEL NIRAV (US)
Application Number:
PCT/US2023/035867
Publication Date:
May 10, 2024
Filing Date:
October 25, 2023
Assignee:
ASTERA LABS INC (US)
International Classes:
G06F11/10
Foreign References:
US20210141692A1 (2021-05-13)
US20190288710A1 (2019-09-19)
US20130117625A1 (2013-05-09)
Attorney, Agent or Firm:
SHEMWELL, Charles, E. (US)
Claims:
CLAIMS

What is claimed is:

1. An integrated circuit component comprising: a host interface to receive a host command and write data, the write data including first and second component data values; a memory interface; and control circuitry, responsive to the host command, to: generate one or more error correction codes based on the first and second component data values; output the first component data value via the memory interface for storage within a first subset of memory ICs within a memory subsystem; and output the second component data value together with the one or more error correction codes via the memory interface for storage within a second subset of the memory ICs within the memory subsystem.

2. The integrated circuit component of claim 1 wherein the control circuitry to generate the one or more error correction codes based on the first and second component data values comprises circuitry to generate at least two error correction codes, including a first error correction code based on a first portion of the first component data value and a first portion of the second component data value, and a second error correction code based on a second portion of the first component data value and a second portion of the second component data value.

3. The integrated circuit component of claim 1 wherein the control circuitry to generate the one or more error correction codes comprises circuitry to generate the one or more error correction codes with different bit depths in accordance with a configuration signal, including circuitry to generate, as at least one of the error correction codes, (i) a first error correction code constituted by a first number of bits if the configuration signal indicates a first bit depth and (ii) a second error correction code constituted by a second number of bits if the configuration signal indicates a second bit depth, the second number of bits exceeding the first number of bits.

4. The integrated circuit component of claim 1 wherein the control circuitry to generate the one or more error correction codes based on the first and second component data values comprises circuitry to generate a predetermined number of error correction codes in accordance with a configuration signal, including (i) a first number of error correction codes if the configuration signal indicates a first error correction code density, and (ii) a second number of error correction codes if the configuration signal indicates a second error correction code density, the first number being a nonzero integer and the second number exceeding the first number.

5. The integrated circuit component of claim 1 wherein the control circuitry to output the second component data value together with the one or more error correction codes via the memory interface comprises circuitry to output, via the memory interface, constituent portions of the second component data value sequentially over a respective plurality of signaling intervals.

6. The integrated circuit component of claim 5 wherein the circuitry to output constituent portions of the second component data value sequentially over the respective plurality of signaling intervals comprises circuitry to output, in at least one of the signaling intervals, one of the constituent portions of the second component value together with a first predetermined number of constituent bits of the one or more of the error correction codes.
7. The integrated circuit component of claim 6 wherein the circuitry to output, in at least one of the signaling intervals, one of the constituent portions of the second component value together with a first predetermined number of constituent bits of the one or more of the error correction codes further outputs, in at least one other of the signaling intervals, another one of the constituent portions of the second component value together with a second predetermined number of constituent bits of the one or more of the error correction codes, the second predetermined number of constituent bits of the one or more error correction codes being different from the first predetermined number of constituent bits of the one or more error correction codes.

8. The integrated circuit component of claim 1 wherein the host interface to receive the host command and write data comprises an interface compliant with a cache-coherent communications standard and wherein the second component data value comprises a metadata value required by the cache-coherent communications standard.

9. The integrated circuit component of claim 1 wherein: the memory interface comprises first and second sets of output drivers, the first set of output drivers constituted by at least 8 times as many output drivers as the second set; the control circuitry to output the first component value via the memory interface comprises circuitry to output a plurality of portions of the first component value via the first set of output drivers over N respective transmit intervals, N being an integer greater than one; and the control circuitry to output the second component value together with the one or more error correction codes via the memory interface comprises circuitry to output a plurality of portions of the second component value and the one or more error correction codes via the second set of output drivers over the N transmit intervals, respectively.
10. The integrated circuit component of claim 1 wherein the first component data value is constituted by 2^N bits and the second component data value is constituted by M bits, where N and M are integer values greater than one and log2(2^N + M) is a non-integer value.

11. The integrated circuit component of claim 1 wherein the first component data value is constituted by 2^N bits and the second component data value is constituted by M bits, where 2^N is at least 8 times greater than M.

12. A method of operation with an integrated-circuit (IC) component having a host interface and a memory interface, the method comprising: receiving a host command and write data via the host interface, the write data including first and second component data values; and in response to the host command: generating one or more error correction codes based on the first and second component data values; outputting the first component data value via the memory interface for storage within a first subset of memory ICs within a memory subsystem; and outputting the second component data value together with the one or more error correction codes via the memory interface for storage within a second subset of the memory ICs within the memory subsystem.

13. The method of claim 12 wherein generating the one or more error correction codes based on the first and second component data values comprises generating at least two error correction codes, including: generating a first error correction code based on a first portion of the first component data value and a first portion of the second component data value; and generating a second error correction code based on a second portion of the first component data value and a second portion of the second component data value.
14. The method of claim 12 wherein generating the one or more error correction codes comprises generating the one or more error correction codes with different bit depths in accordance with a configuration signal, including generating, as at least one of the error correction codes, a first error correction code constituted by a first number of bits if the configuration signal indicates a first bit depth, and generating, as the at least one of the error correction codes, a second error correction code constituted by a second number of bits if the configuration signal indicates a second bit depth, the second number of bits exceeding the first number of bits.

15. The method of claim 12 wherein generating the one or more error correction codes based on the first and second component data values comprises generating a predetermined number of error correction codes in accordance with a configuration signal, including generating a first number of error correction codes if the configuration signal indicates a first error correction code density and generating a second number of error correction codes if the configuration signal indicates a second error correction code density, the first number being a nonzero integer and the second number exceeding the first number.

16. The method of claim 12 wherein outputting the second component data value together with the one or more error correction codes via the memory interface comprises outputting, via the memory interface, constituent portions of the second component data value sequentially over a respective plurality of signaling intervals.

17. The method of claim 16 wherein outputting constituent portions of the second component data value sequentially over the respective plurality of signaling intervals comprises outputting, in at least one of the signaling intervals, one of the constituent portions of the second component value together with a first predetermined number of constituent bits of the one or more of the error correction codes.
18. The method of claim 17 wherein outputting constituent portions of the second component data value sequentially over the respective plurality of signaling intervals comprises outputting, in at least one other of the signaling intervals, another one of the constituent portions of the second component value together with a second predetermined number of constituent bits of the one or more of the error correction codes, the second predetermined number of constituent bits of the one or more error correction codes being different from the first predetermined number of constituent bits of the one or more error correction codes.

19. The method of claim 12 wherein receiving the host command and write data via the host interface comprises receiving the host command and write data via an interface compliant with a cache-coherent communications standard and wherein the second component data value comprises a metadata value required by the cache-coherent communications standard.

20. The method of claim 12 wherein: the memory interface comprises first and second sets of output drivers, the first set of output drivers constituted by at least 8 times as many output drivers as the second set; outputting the first component value via the memory interface comprises outputting a plurality of portions of the first component value via the first set of output drivers over N respective transmit intervals, N being an integer greater than one; and outputting the second component value together with the one or more error correction codes via the memory interface comprises outputting a plurality of portions of the second component value and the one or more error correction codes via the second set of output drivers over the N transmit intervals, respectively.
21. The method of claim 12 wherein the first component data value is constituted by 2^N bits and the second component data value is constituted by M bits, where N and M are integer values greater than one and log2(2^N + M) is a non-integer value.

22. The method of claim 12 wherein the first component data value is constituted by 2^N bits and the second component data value is constituted by M bits, where 2^N is at least 8 times greater than M.

23. An integrated circuit component comprising: a host interface to receive a host command and write data, the write data including first and second component data values; a memory interface; and means, responsive to the host command, for: generating one or more error correction codes based on the first and second component data values; outputting the first component data value via the memory interface for storage within a first subset of memory ICs within a memory subsystem; and outputting the second component data value together with the one or more error correction codes via the memory interface for storage within a second subset of the memory ICs within the memory subsystem.

Description:
Memory Control Component with Split Auxiliary Channel

DRAWINGS

[0001] The various embodiments disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

[0002] Figure 1 illustrates an embodiment of a memory system having a split-auxiliary-channel memory control component coupled via A and B memory channels to a memory subsystem constituted by one or more dual-channel DIMMs (dual inline memory modules);

[0003] Figure 2 illustrates exemplary serializer/deserializer and error correction code (ECC) encoder/decoder circuitry within a mode-configurable memory-side physical signaling interface (PHY);

[0004] Figure 3 presents a conceptual view of application, metadata and ECC storage according to the unified- and split-auxiliary operating modes shown in Figure 2, showing exclusive ECC storage within the auxiliary partition in unified-auxiliary mode, and an even split of that same storage in split-auxiliary mode, with half of the auxiliary storage allocated to metadata and the other half to ECC;

[0005] Figure 4 illustrates various configurable (or fixed-by-hardware) ECC-density options that may be implemented by the split-auxiliary control component of Figures 1 and 2;

[0006] Figure 5 illustrates applicability of the various ECC bit depths shown within the Figure-4 table across different multi-channel DIMM implementations;

[0007] Figure 6 illustrates exemplary application of a split-auxiliary channel in an alternative DIMM implementation having two 36-bit memory channels;

[0008] Figures 7 and 8 illustrate examples of ECC circuit parallelization for the two-unit-interval-per-ECC-codeword example shown in row two of the Figure-4 table, with parallel (multi-instance) ECC encoders/decoders clocked by half-rate and quarter-rate core clocks, respectively;

[0009] Figure 9 presents a tabulated view of various configuration operations that may be selected/enabled to yield half-rate core clock and quarter-rate core clock ECC encoding/decoding for selected ECC bit depths and across the exemplary module memory populations depicted in Figures 1, 5 and 6; and

[0010] Figure 10 illustrates an embodiment of a memory control component having a configuration register that may be run-time and/or production-time programmed to enable operation in either of the unified-auxiliary or split-auxiliary modes presented herein and, when split-aux mode is enabled, to operate with an ECC density that yields a desired metadata subchannel bandwidth/storage capacity within an otherwise ECC-dedicated auxiliary channel.

DETAILED DESCRIPTION

[0011] In various embodiments herein a memory control component allocates a portion of an auxiliary signaling channel and corresponding memory storage, conventionally dedicated to error correction code (ECC) conveyance and storage, for conveyance of metadata and/or other types of component-level information, splitting the auxiliary channel between metadata (or other information) and ECC conveyance/storage in proportions that obviate conventional metadata conveyance/storage via the primary data channel, thereby maintaining full primary-channel bandwidth and storage capacity for user data.

[0012] In a number of embodiments, ECC density (volume of ECC information per memory access transaction) is scaled as necessary to accommodate a desired metadata volume. For example, the non-linear relationship between ECC bit depth and the volume of ECC-protected data may be exploited to expand the protected data volume exponentially at the cost of one or more additional ECC bits, reducing the total ECC content conveyed per memory access transaction and thus freeing bandwidth within the otherwise ECC-dedicated auxiliary channel for conveyance of metadata. Where user data or other host-level processing data (“application data”) for a given memory access is conveyed between the control component and memory subsystem in serialized transfers (i.e., a data burst having a length according to the number of serialized transfers), ECC density scaling may be effected by extending a single ECC value to protect the data content of an extended number of serialized transfers (e.g., instead of one ECC value per individual transfer, an incrementally larger ECC value protecting the collective data within 2, 4, 8, 16, etc. transfers), with data serializing/deserializing circuitry within a physical signaling interface (PHY) configured as necessary to effect the extended ECC protection. Additionally, the metadata conveyed per memory access may be constrained to yield a collective data volume (i.e., total of the metadata and host-level application data) within the protection size of the scaled ECC value, thus enabling both the application data and metadata to be ECC-protected. In other embodiments, the partitioning of a given memory channel between auxiliary channel and application data channel (primary data channel) may vary, with ECC density scaling likewise adjusted to enable conveyance of the desired metadata volume. Also, in any of the embodiments, ECC encoding/decoding may be executed in a reduced-frequency clock domain (for example, a core clock domain of the memory control component that cycles at one-half, one-quarter, etc. of the frequency of the PHY clock(s) used to time data transmission/reception over controller-to-memory signaling links), with ECC encoding/decoding circuitry and serialization/deserialization circuitry configured accordingly (e.g., programmatically or by design). These and other features and embodiments are described in further detail below.
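The non-linear trade-off between ECC bit depth and protected data volume can be illustrated with the standard sizing rule for extended-Hamming SECDED codes: r Hamming bits correct any single error over k data bits when 2^r >= k + r + 1, and one more overall-parity bit adds double-error detection. The Python sketch below assumes that textbook construction (the description does not mandate a particular SECDED code) and uses hypothetical function names; it tallies the ECC bits needed per 512-bit transaction as the protected block grows from one 32-bit transfer to all 16 transfers.

```python
def secded_check_bits(k: int) -> int:
    """Check bits for an extended-Hamming SECDED code over k data bits."""
    r = 1
    while (1 << r) < k + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                  # +1 overall parity bit for double-error detection


def ecc_bits_per_transaction(data_bits: int, block_bits: int) -> int:
    """Total ECC bits when data_bits are protected in block_bits-sized blocks."""
    if data_bits % block_bits:
        raise ValueError("block size must divide the transaction size")
    return (data_bits // block_bits) * secded_check_bits(block_bits)


if __name__ == "__main__":
    # 512-bit application data per transaction, protected per 1, 2, 4, 8 or 16
    # serialized 32-bit transfers.
    for block in (32, 64, 128, 256, 512):
        print(f"{block:3d}-bit block: {secded_check_bits(block):2d} check bits, "
              f"{ecc_bits_per_transaction(512, block):3d} ECC bits per transaction")
```

Run as a script, the sketch reports 7, 8, 9, 10 and 11 check bits for 32- through 512-bit blocks, i.e., 112, 64, 36, 20 and 11 ECC bits per 512-bit transaction: each doubling of the protected block costs one check bit per codeword while halving the number of codewords.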

[0013] Figure 1 illustrates an embodiment of a memory system 100 having a split-auxiliary-channel memory control component 101 coupled via A and B memory channels to a memory subsystem constituted by one or more dual-channel DIMMs 103. As shown, each of the memory channels (105, 107) is partitioned into a primary “application data” channel and an auxiliary channel, with the latter split (in at least one operating mode of the control component) between ECC and metadata subchannels. This out-of-band approach enables metadata conveyance and storage (e.g., within an auxiliary partition of the memory subsystem) without sacrificing the primary-channel bandwidth and storage capacity consumed in conventional in-band systems (i.e., systems in which metadata is conveyed via the primary channel and stored within application data space, degrading both application data bandwidth and storage capacity).

[0014] In the depicted example, memory control component 101 (“control component,” “memory controller” or “controller”) includes physical signaling interfaces 115 and 117 (host-side PHY and memory-side PHY, respectively) to a host component (e.g., processor, application-specific circuit, neural engine, etc., not shown) and to a memory subsystem that includes one or more DIMMs 103. In a data-write transaction, the memory control component receives application data and associated metadata via host-side PHY 115 (together with a corresponding memory write instruction/command), queuing the write data (121) and optionally the metadata for eventual output to the memory subsystem via memory-side PHY 117. When operated in a split-auxiliary mode (i.e., splitting the auxiliary channel into metadata and ECC sub-channels), control component 101 generates a predetermined amount of ECC information (a predetermined number of ECC values with predetermined bit depth) per memory access transaction according to the volume of metadata to be conveyed/stored as part of that same transaction, thus freeing a portion of the auxiliary channel bandwidth and storage capacity for metadata. In the case of single-error-correction, double-error-detection (SECDED) ECC encoding/decoding, for example, the control component may apply a predetermined (predesigned, host-instructed, programmatically selected, etc.) SECDED encoder/decoder hardware set to generate an ECC volume sized in complement with the per-transaction metadata volume such that:

* the auxiliary channel bandwidth and storage capacity (within either of the A and B memory channels) is split according to the ECC/metadata ratio, enabling metadata conveyance and storage exclusively via the auxiliary channel partition (i.e., no primary channel bandwidth or storage capacity consumed); and

* the ECC codeword (ECC-protected data and ECC value, collectively) is sufficiently large to encompass both the application data and associated metadata, meaning that both application data and metadata are subject to ECC protection (i.e., errors detected and, if possible, corrected in both metadata and application data).
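The two sizing conditions above reduce to a small search: find the largest metadata field m such that m plus the SECDED check bits for (application bits + m) fits within the auxiliary lanes spanned by one codeword. The Python sketch below is one way to express that search, assuming extended-Hamming SECDED sizing and an x16 burst; the function names are hypothetical, and the 32+4-lane case stands in for the 36-bit channel variant mentioned for Figure 6.

```python
from typing import Optional, Tuple


def secded_check_bits(k: int) -> int:
    """Check bits for an extended-Hamming SECDED code over k data bits."""
    r = 1
    while (1 << r) < k + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                  # +1 overall parity bit for double-error detection


def max_metadata_per_codeword(data_lanes: int, aux_lanes: int,
                              uis_per_codeword: int) -> Optional[Tuple[int, int]]:
    """Largest metadata field m (bits) that fits in the auxiliary lanes spanned
    by one codeword while being ECC-protected together with the application data.

    Constraint: m + secded_check_bits(app_bits + m) <= aux_bits.
    Returns (m, ecc_bits), or None if even m = 0 does not fit.
    """
    app_bits = data_lanes * uis_per_codeword
    aux_bits = aux_lanes * uis_per_codeword
    best = None
    for m in range(aux_bits + 1):
        ecc = secded_check_bits(app_bits + m)
        if m + ecc > aux_bits:
            break                 # m + ecc only grows with m, so stop at the first miss
        best = (m, ecc)
    return best


if __name__ == "__main__":
    BURST = 16                    # x16 burst: 16 UIs per transaction
    for data_lanes, aux_lanes in ((32, 8), (32, 4)):   # Figure 1 channel; 36-bit channel
        for uis in (1, 2, 4, 8, 16):
            fit = max_metadata_per_codeword(data_lanes, aux_lanes, uis)
            if fit is None:
                print(f"{data_lanes}+{aux_lanes} lanes, {uis} UI/codeword: SECDED does not fit")
                continue
            m, ecc = fit
            print(f"{data_lanes}+{aux_lanes} lanes, {uis} UI/codeword: "
                  f"{m} metadata + {ecc} ECC bits per codeword, "
                  f"{m * (BURST // uis)} metadata bits per burst")
```

For the Figure 1 channel (32 data lanes, 8 auxiliary lanes) with a codeword spanning two UIs, the search returns 8 metadata bits and 8 ECC bits per codeword, matching the 80-bit SECDED(72, 8) codeword and 64 metadata bits per x16 burst used in split-auxiliary mode. With only 4 auxiliary lanes, a two-UI codeword leaves no metadata room, so a longer codeword span would be needed.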

[0015] Still referring to Figure 1, each of the A and B memory channels includes multiple sets of ‘N’ signaling links (N=8 in this example) coupled to respective DIMM-mounted memory components 130 (e.g., each set of N signaling links coupled directly or via a data buffer component (not shown) to a pair of integrated-circuit memory components 130 disposed on opposite faces of the DIMM), with a predominant share of the signaling links and corresponding memory components allocated to conveyance of application data (primary channel), and a relatively smaller share of the signaling links and memory components allocated to the auxiliary channel. When operated in an optional unified-auxiliary mode, memory controller 101 dedicates the auxiliary channel (i.e., the auxiliary set of signaling links and the memory component(s) 130 coupled thereto) exclusively to ECC conveyance and storage. In split-auxiliary mode, by contrast, the memory controller splits the auxiliary channel into ECC and metadata subchannels as discussed above, scaling the ECC density (amount of ECC information per memory transaction) as necessary to enable all metadata (and/or other component-level information) associated with per-transaction application data to be conveyed over the auxiliary channel, thus effecting metadata conveyance and storage without degrading application data signaling bandwidth or storage allocation.

[0016] Exemplary detail views 141 and 143 (Figure 1) contrast control-component data transmissions in unified-auxiliary and split-auxiliary operating modes. As shown, the auxiliary channel is dedicated to ECC transmission in unified-auxiliary mode, conveying a 7-bit SECDED ECC value with each 32-bit application data transfer: SECDED(k, n), where the ECC-protected data block size, ‘k’, is 32 bits, the ECC size, ‘n’, is 7 bits and the codeword size, k+n, is 39 bits. Accordingly, in this x16 burst example (i.e., information conveyed in 16 successive unit intervals, or bit times, per memory-channel link per memory transaction, with each unit interval (“ui” or “UI”) commencing at a respective timing signal edge of a data strobe (DQS) or clock), control component 101 transmits (or receives) 16 distinct 7-bit ECC values per 512 bits of application data, with each 7-bit ECC value providing error detection/correction for a respective 32-bit block of the application data to effect a 39-bit codeword (and with one signaling link of the auxiliary channel unused).

[0017] In the split-auxiliary operating mode, by contrast, ECC density is scaled from 7 ECC bits per 32-bit application data block to 8 ECC bits per 64-bit application data block (i.e., such that each ECC value corresponds to a pair of 32-bit application data transfers in successive unit intervals as shown at 147 and 149) through SECDED(72, 8) encoding/decoding, freeing half the auxiliary channel for conveyance/storage of 64 bits of metadata (or other component-level information) as shown at 150. Thus, the auxiliary channel is split evenly between ECC and metadata subchannels in this example, conveying 4 bits of metadata and 4 bits of ECC per unit interval (along with 32 bits of application data) and requiring two UIs to convey each 80-bit codeword (an 8-bit ECC value, 8 bits of metadata and 64 bits of application data) in successive 40-bit transfers, with the eight such codewords per x16 burst collectively carrying the 64 metadata bits.
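The per-transaction bookkeeping behind the two detail views can be tallied directly. The Python sketch below assumes the Figure 1 channel (32 data lanes, 8 auxiliary lanes) and an x16 burst; the class and function names are illustrative, not taken from the description.

```python
from dataclasses import dataclass

DATA_LANES, AUX_LANES, BURST = 32, 8, 16   # Figure 1 channel, x16 burst


@dataclass
class AuxBudget:
    codeword_bits: int   # application + metadata + ECC bits per codeword
    ecc_bits: int        # per transaction
    metadata_bits: int   # per transaction
    unused_bits: int     # per transaction


def aux_budget(uis_per_codeword: int, ecc_per_codeword: int,
               meta_per_codeword: int) -> AuxBudget:
    """Tally auxiliary-channel usage for one x16 transaction."""
    codewords = BURST // uis_per_codeword
    capacity = AUX_LANES * BURST             # 128 auxiliary bits per transaction
    ecc = codewords * ecc_per_codeword
    meta = codewords * meta_per_codeword
    if ecc + meta > capacity:
        raise ValueError("ECC plus metadata exceeds auxiliary-channel capacity")
    codeword = DATA_LANES * uis_per_codeword + meta_per_codeword + ecc_per_codeword
    return AuxBudget(codeword, ecc, meta, capacity - ecc - meta)


unified = aux_budget(uis_per_codeword=1, ecc_per_codeword=7, meta_per_codeword=0)
split = aux_budget(uis_per_codeword=2, ecc_per_codeword=8, meta_per_codeword=8)
print(unified)  # AuxBudget(codeword_bits=39, ecc_bits=112, metadata_bits=0, unused_bits=16)
print(split)    # AuxBudget(codeword_bits=80, ecc_bits=64, metadata_bits=64, unused_bits=0)
```

The tally shows why the unified mode leaves one auxiliary link idle (16 unused bits per burst) while the split mode fills all 128 auxiliary bits, half with ECC and half with metadata.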

[0018] In other embodiments or programmatic configurations, ECC density may be further reduced (scaled down) to support larger per-transaction metadata volumes (i.e., increasing the proportion of the auxiliary channel allocated to metadata conveyance/storage), and various capabilities and configurations of physical memory (DIMMs 103 in the Figure 1 example) may bear on ECC encoder/decoder implementation and split-auxiliary-channel options. For example, the relative and absolute widths of the application data channel and auxiliary channel may change according to the number of memory channels per memory subsystem (e.g., the two-channel subsystem shown may instead be single-channel or support more than two memory channels), and according to memory device widths and quantities of memory chips per memory rank (a memory rank being a set of memory ICs concurrently enabled to respond to the same memory access command by sampling or outputting data via respective subsets of signaling links within the memory channel) other than those shown in Figure 1. In those cases, ECC encoding/decoding hardware (SECDED or otherwise) may be revised/reconfigured according to the available auxiliary channel bandwidth and desired per-transaction metadata volume (e.g., different SECDED encoders/decoders implemented or enabled for a memory subsystem that implements two 36-bit memory channels, each constituted by a 4-bit auxiliary channel and a 32-bit data channel). Moreover, split-auxiliary operation need not be enabled for all memory channels within a given memory subsystem (e.g., one or more memory channels may be operated in split-auxiliary mode and one or more others in unified-auxiliary mode), nor must the proportionality of the auxiliary-channel split (i.e., between ECC and metadata) be uniform across all memory channels (e.g., different metadata densities may be applied within memory channels A and B), nor must any specific set of signaling links and/or memory components be devoted to the auxiliary channel.
For example, the memory control component may be configured programmatically (at production time or run time) or by design to implement the auxiliary channel via any subset of the memory-channel signaling links and corresponding memory components (i.e., not limited to the 8 edge-of-channel signaling links and edge-of-DIMM memory component shown in Figure 1), and that signaling-link/memory-component subset may include signaling links directed to two or more different memory components (i.e., metadata and/or ECC values conveyed over the auxiliary channel may be stored within respective portions of core memory within two or more memory ICs). Additionally, though shown in Figure 1 as being conveyed to and from the control component via host PHY 115, metadata may instead be sourced in whole or part from circuitry within control component 101 and/or one or more other signaling interfaces coupled to control component 101. More generally, while per-transaction metadata is described herein as consisting in whole or part of component-level information associated with a host-supplied/host-requested block of application data (e.g., a predetermined number of bits per application data block as needed to support Compute Express Link (CXL) or other cache-coherent interconnects, encryption or other security-related information, any other useful physical-layer, media-access-control-layer, transport-layer or application-layer data, and/or any other data associated with a given memory read/write transaction, such as the 512-bit write or read data block in the Figure 1 embodiment and others discussed below), metadata may additionally or alternatively include virtually any type of information useful within the memory system of Figure 1 (or the larger host system in which memory system 100 is deployed), including information lacking association with or correlation to the application data block.
Also, while DIMM-based memory subsystems are illustrated in Figure 1 and referenced in embodiments discussed below, memory subsystems having various other form factors (e.g., system on a chip, three-dimensional IC or the like) may be coupled to the split-auxiliary memory controller in any of the embodiments. Moreover, regardless of memory subsystem form factor, constituent semiconductor memory components (which may be single- or multi-chip packages) may be implemented with any practicable core storage technology including, for example and without limitation, dynamic random access memory (DRAM), static random access memory (SRAM), and/or various non-volatile semiconductor storage (e.g., NAND flash, NOR flash, magneto-resistive memory, phase-change memory, etc.). Though not specifically shown, various control signaling lines may extend between control component 101 and DIMMs 103 (or other memory subsystem components) to convey command/address values, chip-enable signals, power-mode control signals, clock-enable, bus inversion, etc. In a number of embodiments, DIMMs 103 (or other form-factor memory components) may include one or more additional integrated-circuit components (register clock driver (RCD), data buffer (DB), serial presence detect (SPD), etc.) coupled to control signal lines and/or data signal lines extending from control component 101.

[0019] Figure 2 illustrates exemplary serializer/deserializer and ECC encoder/decoder circuitry within a mode-configurable memory-side PHY. When operated in unified-auxiliary mode 161 (as shown at left in response to the “config” signal setting), a 512-bit (64B) incoming application data block is serialized into an x16 sequence of 32-bit application data words via serializer circuit 163 (i.e., converting the parallel 512-bit data block into a continuous sequence of sixteen 32-bit data words).
Each of the data words is supplied to a SECDED(32, 7) encoder which, in turn, generates a 7-bit ECC value, with the ECC value and corresponding write-data word supplied in bit-parallel format via a set of 40 output drivers (shown in this case as a clocked bit register 165 and signal amplifier 167): the 7-bit ECC value is transmitted over an 8-bit auxiliary signaling channel 170 (with one output driver/signaling line unused), and the 32-bit write data word is transmitted over a 32-bit data signaling channel 172 (the “application data” or “primary” channel). In embodiments herein the output drivers implement double-data-rate signaling, driving each of the sixteen 32-bit data words and 7-bit ECC values onto the memory channel (data channel + auxiliary channel) on rising and falling edges of a transmit clock signal (tclk) in respective unit intervals, and thus two successive bits per signaling link per transmit clock cycle (one bit per UI). More or fewer data transmissions per transmit clock cycle may be implemented in alternative embodiments (e.g., one bit per cycle, four bits per cycle, 8 bits per cycle, etc.), including higher data rates achieved in whole or part through transmission of multibit symbols per clock cycle (e.g., a symbol having one of four levels to convey two bits per clock edge instead of a single bit).

[0020] Unified-auxiliary-mode read-data reception is essentially the reverse of write-data transmission, with signal receivers (not specifically shown) sampling the data and auxiliary channels in response to edges of a receive clock (e.g., a clock operating at the same frequency as the transmit clock, but possibly at a different phase as necessary to align receive clock edges with the midpoints of incoming signal eyes) to generate a x16 burst of 40-bit samples (one 40-bit sample per unit interval/clock edge). The 7-bit ECC value and 32-bit data word within each 40-bit sample are jointly applied to SECDED(32, 7) decoder 175 which, in response, generates a 32-bit error-detected/corrected version (RdDat) of each sampled data word (RdDat’), and thus a sequence of 16 error-detected/corrected data words that are deserialized (parallelized) within 16:1 deserializer 177 to produce a 512-bit read data block to be transmitted to the host via the host PHY.
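The read path can be sketched the same way. The Python below decodes a 32-bit data word against its 7-bit ECC value, correcting any single-bit error and flagging double-bit errors as uncorrectable. It again assumes a textbook extended-Hamming SECDED code and LSB-first ordering; the function names and the three status strings are illustrative, not taken from the source.

```python
# SECDED read-path sketch: single-error correction, double-error detection.
# Assumption: textbook extended-Hamming layout, LSB-first bit order.

def check_bit_count(k: int) -> int:
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def _place(data: list[int], r: int) -> list[int]:
    """Scatter data bits into Hamming positions 1..n, leaving powers of two for check bits."""
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            code[pos] = next(bits)
    return code


def secded_ecc(data: list[int]) -> list[int]:
    r = check_bit_count(len(data)) - 1
    code = _place(data, r)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[q] for q in range(1, len(code)) if q & p) % 2
    return [code[1 << i] for i in range(r)] + [sum(code[1:]) % 2]


def secded_decode(data: list[int], ecc: list[int]) -> tuple[list[int], str]:
    """Return (corrected data, status), status in {"clean", "corrected", "uncorrectable"}."""
    r = len(ecc) - 1
    code = _place(data, r)
    n = len(code) - 1
    for i in range(r):
        code[1 << i] = ecc[i]
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    parity = (sum(code[1:]) + ecc[r]) % 2     # overall parity across the whole codeword
    if parity == 0:
        # even number of flipped bits: none, or two or more (detected, not correctable)
        status = "clean" if syndrome == 0 else "uncorrectable"
    elif syndrome <= n:
        if syndrome:                           # syndrome 0: only the overall parity bit flipped
            code[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"
    return [code[p] for p in range(1, n + 1) if p & (p - 1)], status
```

A 40-bit sample would be split into sample[:32] (data) and sample[32:39] (ECC), with the final, unused auxiliary bit ignored.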

[0021] In a number of embodiments, a multi-bit implementation of the configuration signal enables various split-auxiliary modes with differentiated ECC densities according to desired per-transaction metadata volume. In the exemplary split-auxiliary configuration shown at 183 in Figure 2 (at right), ECC density is scaled down from 7 ECC bits per 39-bit codeword to 8 ECC bits per 80-bit codeword, yielding a codeword that spans a successive pair of 40-bit transfers over the auxiliary and application data signaling channels (170, 172). Accordingly, the split-aux configuration setting revises the host-side serialization factor from 1:16 to 1:8 (as shown at 185), the memory-side serialization from none to 1:2 (187) and the ECC encoder from SECDED(32, 7) to SECDED(72, 8) (as shown at 189), an approach that frees auxiliary channel bandwidth (otherwise dedicated to ECC transmission) for metadata conveyance. In this case, half the aux-channel carrying capacity (i.e., 64 bits out of the 128 bits that may be conveyed over 8 auxiliary signaling links in a x16 burst) is freed for metadata transmission and storage. Moreover, after accounting for the 64-bit application data word and 8-bit ECC value, the 80-bit codeword includes headroom to error-protect an additional 8 bits per codeword, and thus the entirety of any content (metadata or otherwise) conveyed over the freed portion of auxiliary channel 170. Accordingly, in the depicted embodiment, 64 metadata bits are serialized into a sequence of eight 8-bit values via serializer 191, and each 8-bit metadata value is supplied, together with a corresponding 64-bit application data word, to SECDED(72,8)-configured ECC encoder 189. The resulting 72-bit data block and its 8-bit ECC value are transmitted via output drivers (165/167) over successive unit intervals: 80 bits per UI pair and, via 1:2 serializers 193, 195 and 187, 40 bits per UI (4 bits of metadata, 4 bits of ECC and 32 bits of application data per UI).
Data received from the memory subsystem is deserialized and decoded according to the same bit ratios: 2:1 deserializers (201, 203, 205) supply an 80-bit codeword to SECDED(72,8)-configured ECC decoder 207 every two UIs (i.e., with receiver circuits (not shown) generating a 40-bit sample every UI, including 4 bits of metadata, 4 bits of ECC and 32 bits of application data, and thus a complete 8-bit ECC value together with 8 bits of metadata and 64 bits of application data every two unit intervals). The SECDED decoder responsively delivers a sequence of eight 72-bit error-detected/corrected outputs (i.e., 8 bits of metadata and 64 bits of application data per decode) to 8:1 deserializers 209, 211 which, in turn, yield 64-bit metadata and 512-bit application data blocks, with at least the latter being returned to the host via the host-side PHY.
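A minimal sketch of the split-auxiliary write path follows, assuming the same textbook extended-Hamming construction (here sized to SECDED(72, 8)), LSB-first ordering, and a lane assignment in which the first UI of each pair carries the lower halves of the data, metadata and ECC fields. The description fixes only the per-UI bit counts (32 data, 4 metadata, 4 ECC), so the within-UI ordering and the names used here are assumptions.

```python
# Split-auxiliary write path sketch (Figure 2, right): 1:8 host-side serialization,
# SECDED(72,8) per 64-bit data word + 8-bit metadata value, 1:2 memory-side serialization.
# Assumptions: textbook extended-Hamming code, LSB-first order, lower halves in the first UI.

def check_bit_count(k: int) -> int:
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def secded_ecc(data: list[int]) -> list[int]:
    r = check_bit_count(len(data)) - 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # data positions are the non-powers of two
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p) % 2
    return [code[1 << i] for i in range(r)] + [sum(code[1:]) % 2]


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def split_aux_write(app_block: int, md_block: int):
    """Return 16 UIs of (32 data bits, 4 metadata bits, 4 ECC bits)."""
    uis = []
    for cw in range(8):                              # 1:8 host-side serialization
        app = to_bits(app_block >> (64 * cw), 64)
        md = to_bits(md_block >> (8 * cw), 8)
        ecc = secded_ecc(app + md)                   # SECDED(72,8) -> 8 ECC bits
        for half in range(2):                        # 1:2 memory-side serialization
            uis.append((app[32 * half:32 * half + 32],
                        md[4 * half:4 * half + 4],
                        ecc[4 * half:4 * half + 4]))
    return uis
```

Across one x16 burst this places 64 metadata bits and 64 ECC bits on the 8 auxiliary links, matching the even split described above.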

[0022] Figure 3 presents a logical view of application data, metadata and ECC storage according to the unified- and split-auxiliary operating modes shown in Figure 2, showing exclusively ECC storage within the auxiliary partition in unified-aux mode 181, and an even split of that same storage in split-aux mode 183: half of the auxiliary storage to metadata and the other half to ECC. Figure 4 illustrates various configurable (or fixed-by-hardware) ECC-density options that may be implemented by the split-auxiliary control component of Figures 1 and 2, showing an exemplary set of five SECDED encode/decode options within respective rows of table 220 and corresponding x16 burst transmissions over a given memory channel (e.g., memory channel A and/or memory channel B as shown in Figure 1) in transfer diagrams 221, 222, 223, 224, 225. In the highest ECC-density configuration, parameterized in the first row of table 220, the auxiliary channel bandwidth is split 7:1, with 7 ECC bits and 1 metadata bit transferred per unit interval (as shown in transfer diagram 221), thus enabling conveyance and storage of 16 metadata bits per memory access transaction. Accordingly, the ECC encoder/decoder set implements SECDED(33, 7), with each 40-bit codeword constituted by 32 application data bits (“32 dat”), one metadata bit (“1 md”), and seven ECC bits (“7 ECC”) as shown in the second column of table 220. The total auxiliary channel transfer over the 16-UI burst interval (8 transmit clock cycles in a double-data-rate output driver implementation) includes 16 metadata bits and 112 ECC bits (512 application data bits are transferred in all cases), with a single UI per codeword. In an embodiment capable of data-rate ECC encoding/decoding (i.e., executing an ECC encoding/decoding operation within a single unit interval), the serialization may be effected by a 1:16 host-side serializer as shown in the “serialization split” column of table 220.
For data reception, the deserialization placement and ratios remain the same: a SECDED(33, 7) decoder feeding a 16:1 host-side deserializer.
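Why both SECDED(32, 7) and SECDED(33, 7) need exactly seven check bits follows from the standard extended-Hamming bound (a textbook result, not stated in the description), where r is the Hamming check-bit count and e the total ECC depth for k protected bits:

```latex
r(k) = \min\{\, r \in \mathbb{N} : 2^{r} \ge k + r + 1 \,\}, \qquad e(k) = r(k) + 1

k = 32:\quad 2^{6} = 64 \ge 32 + 6 + 1 = 39
\;\Rightarrow\; e = 7,\quad k + e = 39 \text{ bits (one of 40 links unused)}

k = 33:\quad 2^{5} = 32 < 33 + 5 + 1 = 39, \qquad 2^{6} = 64 \ge 33 + 6 + 1 = 40
\;\Rightarrow\; e = 7,\quad k + e = 40 \text{ bits (one full UI)}

\text{per x16 burst:}\quad 16 \times 1 = 16 \text{ metadata bits}, \qquad 16 \times 7 = 112 \text{ ECC bits}
```

Protecting one more bit than SECDED(32, 7) therefore costs no additional ECC, which is why the otherwise unused eighth auxiliary link can carry a metadata bit in the first row of table 220.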

[0023] Continuing with Figure 4, the second row of table 220 corresponds to the SECDED(72,8) configuration shown in the split-aux mode of Figure 2, increasing the per-codeword ECC bit depth by one bit (from seven bits to eight). As shown in the exemplary data transfer diagram at 222, every codeword spans two UIs and includes 64 bits of application data (231), 8 bits of metadata (232) and 8 bits of ECC (233). The per-codeword ECC bit depth is incremented by another bit in the third row of table 220 (from 8 bits to 9), with total ECC density further reduced by extending the codeword over four unit intervals as shown in transfer diagram 223. Thus, per columns 1-3 of table 220, SECDED(151, 9) encoding/decoding is implemented to enable a codeword consisting of 128 application data bits (i.e., 32 bits per UI over 4 UIs), 23 metadata bits and 9 ECC bits, effecting a total auxiliary channel transfer/storage of 92 metadata bits and 36 ECC bits per memory access transaction. As can be seen in transfer diagram 223, the per-codeword metadata and ECC bit counts (23 and 9, respectively) are not evenly divisible over the four-UI codeword transfer, so the metadata bit count per UI (and ECC bit count per UI) is not uniform across the four-UI codeword transfer interval. While a specific bit pattern is shown (one additional ECC bit conveyed in the final UI of the 4-UI codeword transfer), ECC and metadata bits may be conveyed in any practicable distribution within the available auxiliary channel bandwidth (including transfers consisting exclusively of metadata or exclusively of ECC bits). This option applies in all ECC bit-depth configurations herein, regardless of whether the auxiliary bandwidth may be split uniformly across UIs or groups of UIs.
For example, in the SECDED(72,8) configuration, 8 bits of metadata may be transmitted via the auxiliary channel in one unit interval, preceded or followed by transmission of the corresponding 8 bits of ECC information in an adjacent unit interval, an arrangement that may provide serialization/deserialization benefits in some implementations.
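The uneven per-UI split described for transfer diagram 223 can be generated mechanically. The Python sketch below spreads a codeword's ECC bits as evenly as possible across its UIs, placing any remainder in the final UI(s), and fills the rest of each auxiliary-channel UI with metadata. This is one possible pattern under that assumption (the function name is hypothetical); any practicable distribution, including metadata-only and ECC-only UIs, may be used instead.

```python
def split_per_ui(ecc_bits: int, uis: int, aux_width: int = 8) -> list[tuple[int, int]]:
    """Return (metadata bits, ECC bits) carried on the auxiliary links in each UI."""
    base, extra = divmod(ecc_bits, uis)
    plan = []
    for ui in range(uis):
        ecc = base + (1 if ui >= uis - extra else 0)   # remainder lands in the final UI(s)
        if ecc > aux_width:
            raise ValueError("ECC bits per UI exceed the auxiliary channel width")
        plan.append((aux_width - ecc, ecc))
    return plan


print(split_per_ui(9, 4))   # SECDED(151, 9): [(6, 2), (6, 2), (6, 2), (5, 3)]
print(split_per_ui(8, 2))   # SECDED(72, 8):  [(4, 4), (4, 4)]
```

The first call reproduces the described pattern of one extra ECC bit in the final UI, giving 23 metadata bits and 9 ECC bits over four UIs.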

[0024] Still referring to Figure 4, the fourth and fifth rows of table 220 show progressively reduced ECC density, with the codeword extending to 320 bits (8 unit intervals, 10-bit ECC value) in the former instance (fourth row of table 220) and to the full 640 bits of the 16-UI transfer in the latter instance (11-bit ECC value). As in the SECDED(151, 9) embodiment, the per-codeword metadata/ECC bit counts do not permit a uniform metadata/ECC bit split across all UIs (or an exclusive information load within all UIs, i.e., where each UI conveys exclusively metadata or exclusively ECC), with exemplary metadata/ECC patterns shown in transfer diagrams 224, 225. Any other practicable distribution of ECC and metadata within the auxiliary channel and/or across UIs may be implemented.
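All five rows of table 220 follow from the 40-bit channel geometry. The Python sketch below derives, for codewords spanning 1, 2, 4, 8 and 16 UIs, the smallest SECDED depth that fits, the metadata bits left over per codeword, and the per-transaction metadata and ECC totals. It assumes the textbook extended-Hamming bound and a 32-bit data channel plus 8-bit auxiliary channel; the SECDED parameters for the fourth and fifth rows are derived here from the stated 320-bit/10-bit and 640-bit/11-bit figures rather than quoted from the table, and the function names are hypothetical.

```python
def check_bit_count(k: int) -> int:
    """SECDED check bits for k protected bits (extended-Hamming bound)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def split_aux_table(data_width: int = 32, aux_width: int = 8, burst_uis: int = 16):
    """Rows of (UIs per codeword, code, metadata/codeword, metadata/burst, ECC/burst)."""
    rows = []
    uis = 1
    while uis <= burst_uis:
        codeword = (data_width + aux_width) * uis
        # smallest ECC depth whose SECDED code covers the remaining codeword bits
        ecc = next(e for e in range(1, codeword) if check_bit_count(codeword - e) <= e)
        metadata = codeword - data_width * uis - ecc
        if metadata > 0:                          # keep only rows with spare aux bits
            per_burst = burst_uis // uis
            rows.append((uis, f"SECDED({codeword - ecc},{ecc})", metadata,
                         metadata * per_burst, ecc * per_burst))
        uis *= 2
    return rows


for row in split_aux_table():
    print(row)
```

The rows come out as SECDED(33,7), (72,8), (151,9), (310,10) and (629,11), with 16, 64, 92, 108 and 117 metadata bits per x16 burst respectively.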

[0025] Figure 5 illustrates the applicability of the various ECC bit depths shown within table 220 of Figure 4 (the first two columns of which are excerpted at 250) across different multichannel DIMM implementations, including a DIMM 251 containing ten x4 memory ICs per memory channel on each face (i.e., each memory IC having a four-bit data width to effect a 40-bit memory channel), as well as a DIMM 253 having five x8 memory ICs per memory channel on each face, also effecting a 40-bit memory channel. As discussed above, various memory form factors other than dual inline memory modules may be deployed in connection with a split-aux memory control component in alternative embodiments, and DIMMs having more or fewer than two memory channels may be used.

[0026] Figure 6 illustrates exemplary application of a split-auxiliary channel in an alternative DIMM implementation (261) having two 36-bit memory channels (e.g., nine x4 memory ICs per memory channel on each DIMM face, with each memory IC coupled to a respective set of four signaling links within the 36-link signaling path). In this example, the primary data channel width (application data channel width) is maintained at 32 bits (32 signaling links), while the auxiliary channel width is halved from eight links to four links (8 bits to 4 bits in the case of binary symbol conveyance). ECC bit depth may be scaled as shown in table 270 and corresponding transfer diagrams (271, 272, 273) to enable the desired per-transaction metadata volume. In the 9-bit ECC bit depth shown in the first row of table 270, for example, SECDED(135, 9) ECC encoder/decoder circuitry generates a 144-bit codeword (135 + 9) that is transmitted over the 36-bit memory channel in four successive unit intervals, with four such codewords transmitted per x16 burst. Each codeword conveys 7 bits of metadata (and 128 bits of application data), so that as many as 28 metadata bits may be conveyed to the memory subsystem (and stored therein) per memory transaction together with 512 bits of application data. As one codeword is generated per four unit intervals, 1:4 host-side serialization is implemented (e.g., as shown in the final column of table 270) to supply 135 bits (128 application data bits and 7 metadata bits) to a SECDED(135, 9) encoder, with the output of that encoder being further serialized 1:4 for transmission to the memory subsystem in four consecutive unit intervals (16 UIs used to convey all four codewords). The latter two rows of table 270 show additional ECC bit-depth configurations (10 bits and 11 bits, respectively) with corresponding codeword content and host-side/memory-side serialization ratios.
As in a number of ECC bit-depth configurations discussed above, the per-codeword metadata/ECC bit counts for the three ECC-density configurations shown in table 270 do not permit a uniform metadata/ECC bit split across all UIs, with exemplary metadata/ECC patterns shown in transfer diagrams 271, 272, 273. Any other practicable distribution of ECC and metadata within the 4-bit auxiliary channel and/or across UIs may be implemented in alternative embodiments.
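Table 270 can be derived in the same way by narrowing the auxiliary channel to four links. The sketch below (assuming the textbook extended-Hamming bound; function name hypothetical) also reports the host-side and memory-side serialization ratios, taken as 512 / (32 × UIs) and the UI count respectively, and skips codeword lengths that leave no room for metadata. The 10-bit and 11-bit rows are derived here from the 8-UI and 16-UI codeword lengths implied by the clocking discussion, not quoted from the figure.

```python
def check_bit_count(k: int) -> int:
    """SECDED check bits for k protected bits (extended-Hamming bound)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def table_270(data_width: int = 32, aux_width: int = 4, burst_uis: int = 16,
              block_bits: int = 512):
    """Rows of (ECC depth, code, metadata/codeword, metadata/burst, host ratio, memory ratio)."""
    rows = []
    uis = 1
    while uis <= burst_uis:
        codeword = (data_width + aux_width) * uis
        ecc = next(e for e in range(1, codeword) if check_bit_count(codeword - e) <= e)
        metadata = codeword - data_width * uis - ecc
        # 1- and 2-UI codewords leave no metadata bits on a 4-link auxiliary channel
        if metadata > 0:
            host = block_bits // (data_width * uis)
            rows.append((ecc, f"SECDED({codeword - ecc},{ecc})", metadata,
                         metadata * (burst_uis // uis), f"1:{host}", f"1:{uis}"))
        uis *= 2
    return rows


for row in table_270():
    print(row)
```

The first row reproduces the SECDED(135, 9) case described above (28 metadata bits per transaction, 1:4 host-side and 1:4 memory-side serialization).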

[0027] In embodiments described thus far, SECDED encoder/decoder circuits are clocked at a frequency equal to the data rate divided by the number of unit intervals per codeword: at the data rate in the highest-frequency case (first row of Figure-4 table 220); at half the data rate (the same frequency as the transmit/receive clock, collectively the “memory” clock, in the double-data-rate examples presented) in the SECDED(72, 8) example shown in the second row of table 220; and at one-half, one-quarter or one-eighth the memory clock rate, respectively, in the final three rows of the Figure 4 table (and the three rows of table 270 in Figure 6). In alternative embodiments, configuration-dependent ECC encode/decode clocking is avoided by clocking the SECDED encoder/decoder circuits with a fixed-rate controller core clock (or a frequency-divided instance of the core clock) in all ECC-density configurations, with (i) the core clock having a predetermined frequency relation to the memory clock and (ii) the ECC-density-dependent UI count per encode/decode accommodated within the encoder/decoder circuit implementations themselves. More specifically, where the configuration-required rate of ECC encode (or decode) exceeds the core clock rate (e.g., the configuration requires an ECC encode every two unit intervals (2UI) where the core clock cycle time is 4UI), multiple instances of the ECC encode/decode circuits may be operated in parallel to meet the encoding/decoding demand.
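The clock-rate relation at the start of this paragraph reduces to simple arithmetic. The Python sketch below expresses the per-configuration encode/decode rate as a multiple of the memory clock, assuming two UIs per memory clock cycle as in the double-data-rate examples; the function name is hypothetical.

```python
from fractions import Fraction


def ecc_clock_ratio(uis_per_codeword: int, uis_per_memory_clock: int = 2) -> Fraction:
    """ECC encode/decode rate as a multiple of the memory (transmit/receive) clock.

    uis_per_memory_clock=2 models double-data-rate signaling: one codeword every
    uis_per_codeword UIs is uis_per_memory_clock / uis_per_codeword codewords per clock.
    """
    return Fraction(uis_per_memory_clock, uis_per_codeword)


for uis in (1, 2, 4, 8, 16):            # codeword lengths of Figure-4 table 220
    print(f"{uis:2d} UI/codeword -> {ecc_clock_ratio(uis)} x memory clock")
```

This gives 2, 1, 1/2, 1/4 and 1/8 times the memory clock, matching the data-rate, memory-clock, and one-half/one-quarter/one-eighth cases listed above.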

[0028] Figures 7 and 8 illustrate examples of ECC circuit parallelization for the 2UI-per-codeword example shown in row two of Figure-4 table 220 (e.g., SECDED(72,8)), with the parallel (multi-instance) ECC encoders/decoders clocked by half-rate and quarter-rate core clocks, respectively. Taking the Figure 7 example first, the controller core clock (clk-c) cycles at half the rate of the memory clock (clk-m) and thus once every four unit intervals in a double-data-rate memory-side PHY (i.e., $t_{clk\text{-}c} = 2\,t_{clk\text{-}m} = 4\,\mathrm{UI}$). To meet the ECC encode/decode rate demanded by the SECDED(72, 8) configuration (one ECC encode per 2UI in the write data path, one ECC decode per 2UI in the read data path), two SECDED(72,8) ECC encoder instances (189) operate in parallel in the core clock domain to encode respective halves of the 128-bit application data blocks and 16-bit metadata chunks exiting 1:4 serializers 301 and 303, respectively. That is, each SECDED(72, 8) encoder 189 receives a respective half (64 bits) of each 128-bit application data block and a respective half (8 bits) of each 16-bit metadata chunk and produces a respective 8-bit ECC value, with the two 8-bit ECC values (16 bits of ECC information total) being supplied, together with the source 128-bit application data block and 16-bit metadata chunk, to 1:4 serializers 305, 307, 309 within the memory clock domain to produce the requisite 32-bit application data word, 4-bit metadata value and 4 bits of ECC information transmitted per unit interval.
The memory read path is the converse of the write path, with the outputs of 4:1 deserializers 321, 323, 325 (i.e., 128 bits of application data, 16 bits of metadata, 16 bits of ECC) split evenly between two core-clocked SECDED(72, 8) decoders 207, each decoder yielding an error-detected/corrected set of 8 metadata bits and 64 application data bits. Collectively, these 16 metadata bits and 128 application data bits are 4:1 deserialized within host-side deserializing circuits 331, 333 to yield a 64-bit metadata block and a 512-bit application data block as shown. Figure 8 illustrates a similar approach in the context of a quarter-rate core clock signal (i.e., $t_{clk\text{-}c} = 4\,t_{clk\text{-}m} = 8\,\mathrm{UI}$), with four SECDED encoder/decoder instances operated in parallel to encode and decode the requisite volumes of application data and metadata: 256 application data bits and 32 metadata bits supplied in respective 64-bit/8-bit subsets to each of four SECDED(72, 8) encoders and correspondingly generated by each of four SECDED(72, 8) decoders.
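The write-side split of Figure 7 can be sketched in Python as follows. The example assumes a textbook extended-Hamming SECDED(72, 8) code, LSB-first ordering, and that encoder instance 0 takes the lower 64 data bits and lower 8 metadata bits; with that assignment, each codeword occupies exactly two consecutive UIs after 1:4 serialization. The names are illustrative.

```python
# Half-rate core clock sketch (Figure 7): two SECDED(72,8) instances per core cycle.
# Assumptions: textbook extended-Hamming code, LSB-first order, instance 0 = lower halves.

def check_bit_count(k: int) -> int:
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r + 1


def secded_ecc(data: list[int]) -> list[int]:
    r = check_bit_count(len(data)) - 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # data positions are the non-powers of two
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p) % 2
    return [code[1 << i] for i in range(r)] + [sum(code[1:]) % 2]


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def core_cycle_write(app128: int, md16: int):
    """Encode one core cycle's 128 data + 16 metadata bits with two parallel
    SECDED(72,8) instances, then 1:4-serialize into four (32, 4, 4)-bit UIs."""
    app = to_bits(app128, 128)
    md = to_bits(md16, 16)
    ecc: list[int] = []
    for inst in range(2):                # instance `inst` gets the inst-th 64-bit/8-bit halves
        ecc += secded_ecc(app[64 * inst:64 * inst + 64] + md[8 * inst:8 * inst + 8])
    return [(app[32 * u:32 * u + 32], md[4 * u:4 * u + 4], ecc[4 * u:4 * u + 4])
            for u in range(4)]
```

The quarter-rate case of Figure 8 follows by widening the loop to four instances over 256 data bits and 32 metadata bits.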

[0029] Figure 9 presents a tabulated view of various configuration options that may be selected/enabled to yield half-rate core clock and quarter-rate core clock ECC encoding/decoding for selected ECC bit depths (ranging from 7 bits to 11 bits in this SECDED example) and across the two exemplary module memory populations presented above (i.e., 40-bit memory channel, 36-bit memory channel). Where a fractional number of encode/decode operations is required per core clock cycle (e.g., “1/2” or “1/4” as shown in column 7 of Figure 9), a single encoder may be operated in pipelined manner (or in response to frequency-divided core clock signals, etc.) to encode/decode a given SECDED codeword over multiple core clock cycles (e.g., over two or four core clock cycles where the encode/decode rate per core clock cycle is 1/2 or 1/4, respectively). Alternatively, a fractional encode/decode rate per core clock cycle may be accommodated by operating the encoder/decoder at the core clock rate and idling for one or more cycles after each encode/decode.
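The Figure 9 bookkeeping can be sketched as a small helper that, for a given codeword length and core-clock period (both in UIs, assumed to be powers of two), returns the encodes needed per core clock cycle and one way to provide them: parallel instances when the rate is 1 or more, and a single pipelined or idling instance otherwise. The helper and its field names are hypothetical.

```python
from fractions import Fraction


def core_clock_plan(uis_per_codeword: int, core_clock_uis: int) -> dict:
    """Encodes per core-clock cycle and one way to sustain that rate."""
    per_cycle = Fraction(core_clock_uis, uis_per_codeword)
    if per_cycle.numerator != 1 and per_cycle.denominator != 1:
        raise ValueError("sketch assumes power-of-two UI counts")
    if per_cycle >= 1:
        # whole codewords per core cycle: run that many encoder instances in parallel
        return {"per_cycle": per_cycle, "instances": int(per_cycle), "cycles_per_op": 1}
    # fractional rate: one instance spread (pipelined or idling) over several core cycles
    return {"per_cycle": per_cycle, "instances": 1, "cycles_per_op": per_cycle.denominator}


for core in (4, 8):                     # half-rate and quarter-rate core clocks (DDR)
    for uis in (1, 2, 4, 8, 16):        # codeword lengths of Figure-4 table 220
        print(core, uis, core_clock_plan(uis, core))
```

For SECDED(72, 8) this yields two instances with a half-rate core clock and four with a quarter-rate core clock, consistent with Figures 7 and 8.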

[0030] Figure 10 illustrates an embodiment of a memory control component having a configuration register 351 that may be run-time programmed (e.g., in response to a host instruction/operand) or production-time programmed to enable operation in either the unified-auxiliary or split-auxiliary mode presented herein (e.g., by setting the “sa” bit to ‘0’ or ‘1’) and, when split-aux mode is enabled, to operate with an ECC density that yields a desired metadata subchannel bandwidth/storage capacity within an otherwise ECC-dedicated auxiliary channel. In the conceptual example depicted, the ECC density setting and split-aux enable bit (‘sa’) are supplied to a configuration decoder 353 that responsively outputs configuration signals to set the host-side and memory-side serialization/deserialization ratios (HSSEL and MSSEL, respectively, supplied to host-side serializer 355 and memory-side serializer 357) and the ECC encoding density within variable-density encoder 359 in accordance with the programmed setting. Though not specifically shown, the same control signals are applied to the host-side and memory-side deserializers and to the variable-density ECC decoder in the read data path. In the Figure 10 embodiment, variable-density ECC encoder 359 (and its counterpart decoder) includes SECDED encode blocks corresponding to each of the options shown in Figure 9 for a 40-bit memory channel or a 36-bit memory channel, with the ECC-select signal enabling one of those SECDED encoders (any of which may include multiple encoder instances according to the core-clock/memory-clock ratio, per the discussion of Figures 7, 8 and 9) and disabling the others. Though not specifically shown, configuration decoder 353 may additionally output a configuration signal to I/O block 361 to disable unused output drivers/receivers in response to an ECC density setting corresponding to a 36-bit memory channel.
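A configuration decoder along the lines of decoder 353 might map the programmed settings as sketched below. The lookup from ECC depth to UIs per codeword is taken from the Figure 4 and Figure 6 discussion; the field encodings, the restriction of unified-aux mode to the 40-bit channel, the assumption of a 40-link I/O block, and all names other than HSSEL and MSSEL are assumptions made for illustration.

```python
from dataclasses import dataclass

# ECC depth -> UIs per codeword, per memory-channel width (Figures 4 and 6).
UIS_PER_CODEWORD = {
    40: {7: 1, 8: 2, 9: 4, 10: 8, 11: 16},
    36: {9: 4, 10: 8, 11: 16},
}
IO_LINKS = 40   # assumed physical link count of the I/O block


@dataclass(frozen=True)
class DecodedConfig:
    hssel: int            # host-side serialization 1:hssel
    mssel: int            # memory-side serialization 1:mssel
    ecc_select: str       # SECDED encoder/decoder to enable
    disabled_links: int   # unused drivers/receivers to power down


def decode_config(sa: int, ecc_depth: int, channel_width: int = 40) -> DecodedConfig:
    """Map the 'sa' bit and ECC density setting to serializer and encoder selects."""
    if not sa:
        # unified-aux mode (Figure 2, left); ecc_depth is ignored here
        if channel_width != 40:
            raise ValueError("unified-aux mode is only sketched for the 40-bit channel")
        return DecodedConfig(16, 1, "SECDED(32,7)", 0)
    uis = UIS_PER_CODEWORD.get(channel_width, {}).get(ecc_depth)
    if uis is None:
        raise ValueError(f"no {ecc_depth}-bit ECC option for a {channel_width}-bit channel")
    protected = channel_width * uis - ecc_depth   # application data + metadata bits
    return DecodedConfig(hssel=16 // uis, mssel=uis,
                         ecc_select=f"SECDED({protected},{ecc_depth})",
                         disabled_links=IO_LINKS - channel_width)
```

For instance, decode_config(1, 8) selects 1:8 host-side and 1:2 memory-side serialization with SECDED(72,8), as in the split-aux mode of Figure 2.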
In alternative embodiments, circuitry within any one of the constituent SECDED encoders within variable-density ECC encoder 359 may also constitute part or all of the circuitry within another of the constituent encoders (i.e., sharing circuitry so that implementations of the constituent encoders need not be discrete). The same applies to the SECDED decoder instances (not shown) in the read data path. More generally, various encoders other than (or in addition to) the specific SECDED encoders shown may also be selectively engaged within variable-density ECC encoder 359, including encoders corresponding to memory channel widths other than the 40-bit and 36-bit widths shown in Figure 9, block encoders other than or in addition to SECDED encoders, and so forth.

[0031] Referring to Figures 1-10 generally, while application data, metadata and ECC values are presented in various specific ratios herein (e.g., application data constituted by $2^N$ bits and metadata constituted by $M$ bits, where $2^N + M$ yields a non-power-of-two number of bits, and/or where $2^N > 8M$), such ratios are presented for purposes of explaining exemplary operation within selected embodiments. Various other ratios of application data, metadata and/or ECC information may apply in alternative embodiments. Also, the various memory control components presented herein may be implemented within a standalone integrated circuit component or IC package, or within one or more IC components (including packages having multiple IC dies) that combine the memory control function with one or more other functions (e.g., integrated-circuit processor, application-specific integrated circuit (ASIC), etc.). One or more programmed microcontrollers and/or dedicated hardware circuits (e.g., finite state machines, registered or combinational circuits, etc.) may implement and/or control all or part of the various architectural and functional elements within the memory controller architectures presented herein (e.g., to implement any one or more of the ECC decoders, encoders, configuration controller, etc.). Additionally, any or all of those architectural/functional elements may be described using computer-aided design tools and expressed (or represented), as data and/or instructions embodied in various computer-readable media, in terms of their behavioral, register transfer, logic component, transistor, layout geometries, and/or other characteristics. Formats of files and other objects in which such circuit expressions may be implemented include, but are not limited to, formats supporting behavioral languages such as C, Verilog, and VHDL, formats supporting register level description languages like RTL, and formats supporting geometry description languages such as GDSII, GDSIII, GDSIV, CIF, MEBES and any other suitable formats and languages. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, computer storage media in various forms (e.g., optical, magnetic or semiconductor storage media).

[0032] When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the above described circuits can be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, net-list generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits. Such representation or image can thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.

[0033] In the foregoing description and in the accompanying drawings, specific terminology and drawing symbols have been set forth to provide a thorough understanding of the disclosed embodiments. In some instances, the terminology and symbols may imply specific details not required to practice those embodiments. For example, the various memory channel widths, memory-IC data widths, per-DIMM device counts, signaling path widths, data block sizes, data word sizes, message lengths, encoding/decoding schemes and parameters, host interface types, data-unit or symbol sizes (e.g., nibbles, bytes), metadata sizes, error detection/correction capabilities, ECC density options, and so forth are provided for purposes of example only; any practicable alternatives may be implemented in all cases. Similarly, signaling link parameters, protocols, and configurations may be implemented in accordance with any practicable open or proprietary standard and any version of such standard. Links or other interconnection between integrated circuit devices or internal circuit elements or blocks may be shown as buses or as single signal lines. Each of the buses can alternatively be a single signal line (e.g., with digital or analog signals time-multiplexed thereon), and each of the single signal lines can alternatively be a bus. Signals and signaling links, however shown or described, can be single-ended or differential. Logic signals shown as having active-high assertion or “true” states may have opposite assertion states in alternative implementations. A signal driving circuit is said to “output” a signal to a signal receiving circuit when the signal driving circuit asserts (or de-asserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits. The term “coupled” is used herein to express a direct connection as well as a connection through one or more intervening circuits or structures.
Integrated circuit device or register “programming” can include, for example and without limitation, loading a control value into a configuration register or other storage circuit within the integrated circuit device in response to a host instruction (and thus controlling an operational aspect of the device and/or establishing a device configuration) or through a one-time programming operation (e.g., blowing fuses within a configuration circuit during device production), and/or connecting one or more selected pins or other contact structures of the device to reference voltage lines (also referred to as strapping) to establish a particular device configuration or operational aspect of the device. The terms “exemplary” and “embodiment” are used to express an example, not a preference or requirement. Also, the terms “may” and “can” are used interchangeably to denote optional (permissible) subject matter. The absence of either term should not be construed as meaning that a given feature or technique is required.

[0034] Various modifications and changes can be made to the embodiments presented herein without departing from the broader spirit and scope of the disclosure. For example, features or aspects of any of the embodiments can be applied in combination with any other of the embodiments or in place of counterpart features or aspects thereof. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.