DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status
	Claims 1-2, 10-11 and 19 have been amended. Claims 1-20 remain pending and are ready for examination.

Contingent Limitations
	The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met. For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A nor step B is required by the broadest reasonable interpretation of the claim. If the claimed invention requires the first condition to occur, then the broadest reasonable interpretation of the claim requires step A. If the claimed invention requires both the first and second conditions to occur, then the broadest reasonable interpretation of the claim requires both steps A and B. The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur.
 
Claim Interpretation
This application includes a claim limitation that uses a generic placeholder term (“module”) but is nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation recites sufficient structure, materials, or acts to entirely perform the recited function. That claim limitation is “memory module” in claim 1.
Because this claim limitation is not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation does not recite sufficient structure, materials, or acts to perform the claimed function.

Independent claim 1 recites the term “memory module,” which does not invoke 35 U.S.C. 112(f). First, the term “module” does not appear in the method claims, indicating that the module is tied to the device claims, where the Specification presents it as a component of the computing system (see the Specification's description of Figure 1: Figure 1 is a block diagram illustrating a computing system with a cache, according to an embodiment. In one embodiment, the computing system 100 includes processing device 110 and memory module 120. While only a single computing device is illustrated, the term “computing device” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein). Second, the Specification more explicitly describes the memory module as a physical entity (see Specification paragraph [0020], In one embodiment, memory modules 120 may be dual inline memory modules (DIMMs), which each comprise a series of DRAM integrated circuits mounted together on a printed circuit board. Each of memory modules 120 may be coupled to processing device 110 via an individual or shared processor bus 117 or other interconnect).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5-11, and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Loh et al. (US Publication No. 2013/0138892, “Loh”) in view of Bryant et al. (US Publication No. 2019/0294554, “Bryant”).

Regarding claim 1, Loh teaches A system comprising: a processing device comprising a cache controller; an interconnect coupled to the processing device; and (Loh paragraph [0029], Each of the cache memory subsystems 124a-124b and 128 may include a cache memory, or cache array, connected to a corresponding cache controller. The cache memory subsystems 124a-124b and 128 may be implemented as a hierarchy of caches. Caches located nearer the processor cores 122a-122b (within the hierarchy) may be integrated into the processor cores 122a-122b, if desired. This level of the caches may be a level-one (L1) of a multi-level hierarchy. In one embodiment, the cache memory subsystems 124a-124b each represent L2 cache structures, and the shared cache memory subsystem 128 represents an L3 cache structure. In another embodiment, cache memory subsystems 114 each represent L1 cache structures, and shared cache subsystem 118 represents an L2 cache structure. Other embodiments are possible and contemplated. The processing device therefore includes a cache and a corresponding cache controller; see also Loh, Regardless of a given type of processing unit used in the computing system 100, as software applications access more and more data, the memory subsystem is utilized more heavily. Latencies become more crucial. More on-chip memory storage may be used to reduce interconnect latencies. For example, each of the cache memory subsystems 124a-124b may reduce memory latencies for a respective one of the processor cores 122a-122b. In addition, the microprocessor 110 may include the shared cache memory subsystem 128 as a last-level cache (LLC) before accessing the off-chip DRAM 170 and/or the off-chip disk memory 162) a memory module coupled to the interconnect, wherein the memory module is separated from the processing device by the interconnect, (Loh paragraph [0043], Turning now to FIG. 3, a generalized block diagram of one embodiment of a computing system 300 utilizing a three-dimensional (3D) DRAM is shown.
Circuitry and logic described earlier are numbered identically. The computing system 300 may utilize three-dimensional (3D) packaging, such as a System in Package (SiP) as described earlier. The computing system 300 may include a SiP 310. In one embodiment, the SiP 310 may include the processing unit 220 described earlier and a 3D DRAM 330 that communicate through low-latency interconnect 340. The in-package low-latency interconnect 340 may be horizontal and/or vertical with shorter lengths than long off-chip interconnects when a SiP is not used. See Loh Figure 3, reference numerals 330, 340, and 220. Note that the processing unit 220 is separated from the 3D DRAM 330 (i.e., the memory module) by the interconnect 340 (in this case, a low-latency interconnect)) the memory module comprising: a main memory storing data; (Loh paragraph [0074], the memory request may be sent to main memory. The main memory may include an off-chip non-integrated DRAM and/or an off-chip disk memory. If the tag comparisons determine a tag hit occurs (conditional block 516), then in block 520, read or write operations are performed on a corresponding cache line in the row buffer. The main memory stores data, among other functions. Note that the memory module containing the main memory is coupled to the interconnect, see Loh paragraph [0005], The 3D packaging, known as System in Package (SiP) or Chip Stack multi-chip module (MCM), saves space by stacking separate chips in a single package. Components within these layers communicate using on-chip signaling, whether vertically or horizontally.
This signaling provides reduced interconnect signal delay over known two-dimensional planar layout circuits) a cache memory to store at least a portion of the data from the main memory; and (Loh paragraphs [0024-0025], A reduced miss rate achieved by the additional memory provided by the cache memory subsystems 124a-124b and 128 helps hide the latency gap between a given one of the processor cores 122a-122b and the off-chip memory. However, there is limited real estate to use for each of the cache memory subsystems 124a-124b and 128. Therefore, the respective sizes is limited for each of the cache memory subsystems 124a-124b and 128 and a significant number of accesses are still sent to the off-chip memory, such as the DRAM 170 and/or the disk memory 162. The cache memory thus stores at least a portion of the data held in the off-chip main memory, reducing the number of accesses that must be sent to that memory) a memory controller coupled to the cache memory, wherein the memory controller comprises a comparator and is configured to: (Loh Figure 3; Loh paragraph [0021], Referring to FIG. 1, a generalized block diagram of one embodiment of a computing system 100 is shown. As shown, microprocessor 110 may include one or more processor cores 122a-122b connected to corresponding one or more cache memory subsystems 124a-124b. The microprocessor may also include interface logic 140, a memory controller 130, system communication logic 126, and a shared cache memory subsystem 128. In one embodiment, the illustrated functionality of the microprocessor 110 is incorporated upon a single integrated circuit. In another embodiment, the illustrated functionality is incorporated in a chipset on a computer motherboard.
Note that although Loh does not describe a discrete physical comparator, the controller contains a comparator in that it performs the function of comparing a tag received with a request against stored tags, see Loh paragraph [0048], Similar to other DRAM topologies, the 3D DRAM 330 may include multiple memory array banks 332a-332b. Each one of the banks 332a-332b may include a respective one of the row buffers 334a-334b. Each one of the row buffers 334a-334b may store data in an accessed row of the multiple rows within the memory array banks 332a-332b. The accessed row may be identified by a DRAM address in the received memory request. The control logic 336 may perform tag comparisons between a cache tag in a received memory request and the one or more cache tags stored in the row buffer. In addition, the control logic may alter a column access of the row buffers by utilizing the cache tag comparison results rather than a bit field within the received DRAM address) receive a first read request from the cache controller over the interconnect, the first read request comprising first tag data identifying a first cache line in the cache memory; (Loh paragraphs [0011-0012], In one embodiment, a computing system includes a processing unit and an integrated dynamic random access memory (DRAM). Examples of the processing unit include a general-purpose microprocessor, a graphics processing unit (GPU), an accelerated processing unit (APU), and so forth. The integrated DRAM may be a three-dimensional (3D) DRAM and may be included in a System-in-Package (SiP) with the processing unit. The processing unit may utilize the 3D DRAM as a cache. [0012] In various embodiments, the 3D DRAM may store both a tag array and a data array. Each row of the multiple rows in the memory array banks of the 3D DRAM may store one or more cache tags and one or more corresponding cache lines indicated by the one or more cache tags.
In response to receiving a memory request from the processing unit, the 3D DRAM may perform a memory access according to the received memory request on a given cache line indicated by a cache tag within the received memory request. Performing the memory access may include a single read of a respective row of the multiple rows storing the given cache line. Rather than utilizing multiple DRAM transactions, a single, complex DRAM transaction may be used to reduce latency and power consumption. The memory request can be considered a read request carrying first tag data that identifies a specific portion of the cache, i.e., the “given cache line” stated in the reference) determine that the first read request comprises a tag read request; (Loh paragraph [0047], The processing unit 220 may include interface logic to I/O devices and other processing units. This interface logic is not shown for ease of illustration. The processing unit 220 may also include the interface logic 324 for communicating with the 3D DRAM 330. Protocols, address formats, and interface signals used in this communication may be similar to the protocols, address formats and interface signals used for off-package DRAM 170. However, when the 3D DRAM 330 is used as a last-level cache (LLC), adjustments may be made to this communication. For example, a memory request sent from the processing unit 220 to the 3D DRAM 330 may include a cache tag in addition to a DRAM address identifying a respective row within one of the memory array banks 332a-332b. The received cache tag may be used to compare to cache tags stored in the identified given row within the 3D DRAM 330.
As stated in Loh, each memory access request (i.e., read request) may be sent with a cache tag that identifies the location targeted by the read request) read second tag data corresponding to the tag read request from the cache memory; (Loh paragraph [0048], Similar to other DRAM topologies, the 3D DRAM 330 may include multiple memory array banks 332a-332b. Each one of the banks 332a-332b may include a respective one of the row buffers 334a-334b. Each one of the row buffers 334a-334b may store data in an accessed row of the multiple rows within the memory array banks 332a-332b. The accessed row may be identified by a DRAM address in the received memory request. The control logic 336 may perform tag comparisons between a cache tag in a received memory request and the one or more cache tags stored in the row buffer. In addition, the control logic may alter a column access of the row buffers by utilizing the cache tag comparison results rather than a bit field within the received DRAM address. Reading the row identified by the request yields the cache tag(s) stored in that row, so the memory system obtains two separate pieces of tag data: the cache tag received with the request and the tag(s) read from the cache memory) compare, using the comparator, the second tag data read from the cache memory to the first tag data received from the cache controller with the first read request; and if the second tag data matches the first tag data, initiate an action with respect to the first cache line in the cache memory (Loh paragraphs [0060-0061], A sequence of steps 1-7 is shown in FIG. 4 for accessing tags, status information and data corresponding to cache lines stored in a 3D DRAM. When the memory array bank 430 is used as a cache storing both a tag array and a data array within a same row, an access sequence different from a sequence utilizing steps 1-7 for a given row of the rows 432a-432k may have a large latency.
For example, a DRAM access typically includes an first activation or opening stage, a stage that copies the contents of an entire row into the row buffer, a tag read stage, a tag comparison stage, a data read or write access stage that includes a column access, a first precharge or closing stage, a second activation or opening stage, a stage that copies the contents of the entire row again into the row buffer, a tag read stage, a tag comparison stage, an update stage for status information corresponding to the matching tag, and a second precharge or closing stage. The two separate tags are compared to one another. If the two tags match, then the memory access request that was previously described may be executed, which can involve a plurality of actions as well as cache lines, see Loh paragraph [0061], Continuing with the access steps within the memory array bank 430, one or more additional precharge and activation stages may be included after each access of the row buffer if other data stored in other rows are accessed in the meantime. Rather than utilize multiple DRAM transactions for a single cache access, the sequence of steps 1-7, may be used to convert a cache access into a single DRAM transaction. Each of the different DRAM operations, such as activation/open, column access, read, write, and precharge/close, has a different respective latency).
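For illustration only, the claimed tag-read flow as mapped to Loh above (receive a request carrying first tag data, read the stored second tag data, compare, and act on a match) can be sketched in Python. All names (`RequestKind`, `ReadRequest`, `MemoryModuleController`) are hypothetical, a single stored tag per set is a simplifying assumption, and the sketch is not taken from the application, Loh, or Bryant.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class RequestKind(Enum):
    TAG_READ = auto()   # request asks the memory module to check a tag
    DATA_READ = auto()  # ordinary data read (not modeled in this sketch)


@dataclass(frozen=True)
class ReadRequest:
    kind: RequestKind
    set_index: int  # hypothetical: the set/row the request targets
    tag: int        # "first tag data" sent by the cache controller


class MemoryModuleController:
    """Hypothetical memory-side controller; one stored tag per set for brevity."""

    def __init__(self, stored_tags: dict[int, int]) -> None:
        self.stored_tags = stored_tags  # set_index -> tag held in the cache memory

    def handle(self, req: ReadRequest) -> str:
        # Determine that the request is a tag read request.
        if req.kind is not RequestKind.TAG_READ:
            raise ValueError("only tag read requests are modeled here")
        # Read the "second tag data" from the cache memory.
        second_tag = self.stored_tags.get(req.set_index)
        # Comparator: on a match, initiate an action on the cache line.
        if second_tag == req.tag:
            return "HIT"
        return "MISS"  # report a miss back to the cache controller


ctrl = MemoryModuleController({3: 0xAB})
print(ctrl.handle(ReadRequest(RequestKind.TAG_READ, set_index=3, tag=0xAB)))  # HIT
print(ctrl.handle(ReadRequest(RequestKind.TAG_READ, set_index=3, tag=0xCD)))  # MISS
```

Returning a miss on a mismatch corresponds to the limitation of claim 9 addressed below.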
Loh does not explicitly teach the cache memory comprising a plurality of sets of cache lines each comprising a plurality of cache storage locations, wherein a first location of a first set of the plurality of sets of cache lines comprises cache tag data for the plurality of sets of cache lines; and read second tag data corresponding to the tag read request from the first location of the first set of the plurality of sets of cache lines.
However, Bryant teaches the cache memory comprising a plurality of sets of cache lines each comprising a plurality of cache storage locations, wherein a first location of a first set of the plurality of sets of cache lines comprises cache tag data for the plurality of sets of cache lines; (Bryant paragraph [0062], For example, if one additional bit is provided in field 122, such that the virtual address index bits are formed by bits 12 to 6 of the virtual address in the illustrated example, this enables two aliased locations to be identified within a 32 Kbyte four-way cache, as illustrated schematically in FIG. 2A. In particular, the storage 150 within the level 1 cache 100 may comprise of a number of sets 160a to 160g, where each set includes a plurality of ways 170a to 170d. For ease of illustration, the storage structure 150 does not show separate tag RAMs and data RAMs, but typically there will be a separate tag RAM entry for each cache line in the data RAM, the tag RAM entry identifying physical address bits, valid bits, dirty bits, etc. and the corresponding cache line containing the data. The tag RAMs and data RAMs are accessed in the same way, and in particular the cache access circuitry 140 will use the virtual index derived from the virtual address in order to determine the appropriate set to access (each set containing a cache line in each way, and the corresponding tag RAM entries for those cache lines). In this example, because the virtual address index bits are bits 12 to 6, there are two potential aliased locations, as indicated by the shaded sets 160b and 160e. If instead the virtual index extends from bits 13 down to 6, then there would be four aliased locations in a 64 Kbyte four-way cache. The cache memory contains a plurality of cache lines that use cache tags to identify and label storage locations (both physical and virtual addresses). Each set contains a plurality of tag entries and a plurality of cache lines).
Also see Bryant paragraph [0006], In a further example configuration, there is provided a method of handling access requests in an apparatus comprising: employing processing circuitry to process a plurality of program threads to perform data processing operations on data, the operations identifying the data using virtual addresses, and the virtual addresses being mapped to physical addresses within a memory system; providing a cache storage having a plurality of cache entries to store data, an aliasing condition existing when multiple virtual addresses map to the same physical address, and allocation of data into the cache storage being constrained to prevent multiple cache entries of the cache storage simultaneously storing data for the same physical address; and Bryant paragraph [0019], Often, a cache storage arranged in this way is referred to as a virtually indexed physically tagged (VIPT) cache. One benefit of arranging a cache in that way is that a lookup can begin to be performed within the cache whilst the physical address is still being determined from the specified virtual address. This can provide some performance benefits) read second tag data corresponding to the tag read request from the first location of the first set of the plurality of sets of cache lines (See Bryant as described above. Also see Bryant paragraphs [0048-0049] and [0050], In particular, once a set has been identified using the cache index, then it is determined whether a hit is detected within one of the cache lines of that set by comparing a physical address portion stored in association with that cache line with a tag portion of the physical address produced by the TLB circuitry 35. In the presence of a hit, the access can proceed within the level 1 cache. However, in the event of a miss, the access request can be propagated onto the further levels of cache/main memory 30 in order to cause the required data to be accessed.
As part of this process, a linefill operation may occur within the level 1 cache 25 in order to store a cache line's worth of data containing the data being accessed by the access request, so that that data is then available in the level 1 cache for any subsequent access request that also seeks to access the data in that cache line. As shown in FIG. 1, in the example illustrated therein any accesses to the further levels of cache or main memory 30 are performed using the physical address determined by the TLB circuitry 35. Thus, the first set of cache lines is identified from the cache index, and the tag data stored in that set is read and compared in order to provide data access, including for exclusive operations).
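As a worked check of the arithmetic in Bryant paragraph [0062], the following Python sketch computes the set-index bit range and the number of aliased locations for a virtually indexed cache. The 64-byte line size and 4 KB page size are assumptions inferred from the bit positions Bryant gives (index bits starting at bit 6, and one index bit beyond the page offset in the 32 Kbyte case); the function names are hypothetical.

```python
from __future__ import annotations


def log2(n: int) -> int:
    """Exact log2 for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def vipt_geometry(cache_bytes: int, ways: int, line_bytes: int = 64,
                  page_bytes: int = 4096) -> tuple[int, int, int]:
    """Return (index_high_bit, index_low_bit, aliased_locations)."""
    sets = cache_bytes // (ways * line_bytes)
    low = log2(line_bytes)          # bits below `low` select a byte within a line
    high = low + log2(sets) - 1     # set-index bits are low..high
    # Index bits above the page offset come from the virtual page number,
    # so each such bit doubles the number of sets one physical line may occupy.
    bits_above_page = max(0, high + 1 - log2(page_bytes))
    return high, low, 2 ** bits_above_page


def set_index(vaddr: int, cache_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Set selected by a virtual address (virtually indexed)."""
    sets = cache_bytes // (ways * line_bytes)
    return (vaddr // line_bytes) % sets


print(vipt_geometry(32 * 1024, 4))  # (12, 6, 2)
print(vipt_geometry(64 * 1024, 4))  # (13, 6, 4)
# Same 4 KB page offset (0x040), different bit 12:
print(set_index(0x0040, 32 * 1024, 4), set_index(0x1040, 32 * 1024, 4))  # 1 65
```

Because bit 12 lies above the assumed 4 KB page offset, two virtual addresses with the same page offset can select different sets (1 and 65 above), which is the aliasing condition Bryant addresses.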

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Loh with those of Bryant. Bryant teaches a cache memory having a plurality of sets of cache lines that comprise cache tag data along with the stored data. This arrangement can improve both cache efficiency and cache reliability, for example by preventing multiple different cache entries from simultaneously storing data for the same physical address when several virtual addresses map to that physical address (Bryant paragraph [0017], The operations identify the data using virtual addresses, and the virtual addresses are mapped to physical addresses within a memory system. The apparatus also has a cache storage providing a plurality of cache entries for storing data. An aliasing condition exists when multiple virtual addresses map to the same physical address, and allocation of data into the cache storage is constrained to prevent multiple cache entries of the cache storage simultaneously storing data for the same physical address. In particular, since the cache index used to identify one or more entries within the cache storage that can be used to store the data is usually determined with reference to the virtual address, when two different virtual addresses are used for the same physical address, different entries in the cache storage would be identified dependent on which virtual address is used. The cache storage can be arranged to prevent different entries storing data for the same physical address at any point in time, for example by evicting the contents in one cache entry identified by a first virtual address, when an access is then attempted to another entry using a second virtual address that maps to the same physical address.
Once the first entry's contents have been evicted, the entry identified using the second virtual address can then be populated with the data. Additionally, efficiency can improve through faster cache lookups, since a lookup can begin while the physical address is still being determined (Bryant paragraph [0019], Often, a cache storage arranged in this way is referred to as a virtually indexed physically tagged (VIPT) cache. One benefit of arranging a cache in that way is that a lookup can begin to be performed within the cache whilst the physical address is still being determined from the specified virtual address. This can provide some performance benefits).
 
Claim 10 is the method claim corresponding to system claim 1. It is rejected on the same references and rationale.

Regarding claim 2, Loh in view of Bryant teaches The system of claim 1, wherein the cache memory comprises a set associative cache implemented on a dynamic random access memory (DRAM) device, (Loh paragraph [0008], Utilizing DRAM access mechanisms while storing and accessing the tags and data of the additional cache in the integrated DRAM dissipates a lot of power. In addition, these mechanisms consume a lot of bandwidth, especially for a highly associative on-package cache, and consume too much time as the tags and data are read out in a sequential manner. Therefore, the on-package DRAM provides a lot of extra data storage, but cache and DRAM access mechanisms are inefficient).

Claim 11 is the method claim corresponding to system claim 2. It is rejected on the same references and rationale.

Regarding claim 5, Loh in view of Bryant teaches The system of claim 1, wherein to initiate the action with respect to the first cache line, the memory controller is configured to: prepare a portion of the cache memory corresponding to the first cache line for a second read request to be subsequently received from the cache controller over the interconnect (Loh paragraph [0029], Each of the cache memory subsystems 124a-124b and 128 may include a cache memory, or cache array, connected to a corresponding cache controller. The cache memory subsystems 124a-124b and 128 may be implemented as a hierarchy of caches. Caches located nearer the processor cores 122a-122b (within the hierarchy) may be integrated into the processor cores 122a-122b, if desired. This level of the caches may be a level-one (L1) of a multi-level hierarchy. In one embodiment, the cache memory subsystems 124a-124b each represent L2 cache structures, and the shared cache memory subsystem 128 represents an L3 cache structure. In another embodiment, cache memory subsystems 114 each represent L1 cache structures, and shared cache subsystem 118 represents an L2 cache structure. Other embodiments are possible and contemplated. Multiple read requests can be sent from the cache controller, see Loh paragraph [0064], A cache tag may be used to determine which of the multiple cache lines are being accessed within a selected row. For example, in a 30-way set-associative cache organization, when the row 432a is selected, the cache tag values stored in the fields 434a-434d may be used to determine which one of the 30 cache lines stored in fields 438a-438d is being accessed. The cache tag stored in field 412 within the address 410 may be used in comparison logic to locate a corresponding cache line of the multiple cache lines stored in the row buffer 440).
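One way to picture the Loh row organization cited for claim 5 is a DRAM row that holds a set's tags alongside its cache lines; opening the row into the row buffer leaves the matching line ready for a later read. The Python sketch below is a simplified model under that assumption; `DramRow` and `RowBufferCache` are hypothetical names, and the three-way row is illustrative rather than Loh's 30-way example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DramRow:
    """Hypothetical layout: one DRAM row holds one cache set, tags beside data."""
    tags: list[int]     # cf. Loh's tag fields 434a-434d
    lines: list[bytes]  # cf. Loh's cache-line fields 438a-438d


class RowBufferCache:
    def __init__(self, rows: list[DramRow]) -> None:
        self.rows = rows
        self.row_buffer: Optional[DramRow] = None

    def activate(self, row_index: int) -> None:
        """Open a row: copy its tags and lines into the row buffer."""
        row = self.rows[row_index]
        self.row_buffer = DramRow(list(row.tags), list(row.lines))

    def locate(self, tag: int) -> Optional[int]:
        """Compare the request's cache tag with each tag in the open row."""
        if self.row_buffer is None:
            raise RuntimeError("no row is open")
        for way, stored in enumerate(self.row_buffer.tags):
            if stored == tag:
                return way  # column of the matching cache line
        return None


cache = RowBufferCache([DramRow(tags=[7, 9, 11], lines=[b"A", b"B", b"C"])])
cache.activate(0)  # the row stays open, so the matching line is ready for a later read
way = cache.locate(9)
print(way, cache.row_buffer.lines[way])  # 1 b'B'
```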

Claim 14 is the method claim corresponding to system claim 5. It is rejected on the same references and rationale.

Regarding claim 6, Loh in view of Bryant teaches The system of claim 1, wherein to initiate the action with respect to the first cache line, the memory controller is configured to: read the first cache line from the cache memory before a second read request is received from the cache controller over the interconnect (Loh paragraph [0062], During sequence 1, a memory request from a processing unit may be received by a 3D DRAM. The memory request may have traversed horizontal or vertical short low-latency interconnect routes available through a 3D integrated fabrication process. A portion of a complete address is shown as address 410. The fields 412 and 414 may store a cache tag and a page index, respectively. Other portions of the complete address may include one or more of a channel index, a bank index, a sub array index, and so forth to identify the memory array bank 430 within the 3D DRAM. During sequence 2, a given row of the rows 432a-432k may be selected from other rows by the page index 414. The first read request is sent and completed before the subsequent read requests are received and acted upon by the memory system).
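The address split described in Loh paragraph [0062] (a cache tag field 412 and a page index field 414 that selects the row) can be sketched as a bit-field decode. The field widths and the placement of the tag above the page index are hypothetical, since the cited passage does not give them, and the channel, bank, and sub-array indices are omitted.

```python
from typing import NamedTuple

# Hypothetical widths and ordering; not specified in the cited passage of Loh.
PAGE_INDEX_BITS = 12  # low-order bits: page index (cf. field 414), selects the DRAM row
TAG_BITS = 16         # next bits: cache tag (cf. field 412), compared against the row's tags


class DecodedAddress(NamedTuple):
    cache_tag: int
    page_index: int


def decode(address: int) -> DecodedAddress:
    """Split a request address into its cache tag and page index fields."""
    page_index = address & ((1 << PAGE_INDEX_BITS) - 1)
    cache_tag = (address >> PAGE_INDEX_BITS) & ((1 << TAG_BITS) - 1)
    return DecodedAddress(cache_tag=cache_tag, page_index=page_index)


d = decode(0xABCD123)
print(hex(d.cache_tag), hex(d.page_index))  # 0xabcd 0x123
```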

Claim 15 is the method claim corresponding to system claim 6. It is rejected on the same references and rationale.

Regarding claim 7, Loh in view of Bryant teaches The system of claim 6, wherein to initiate the action with respect to the first cache line, the memory controller is further configured to: send the first cache line to the cache controller without receiving the second read request from the cache controller over the interconnect (Loh claims 17-19: 17. The method as recited in claim 16, wherein performing the memory access with a single read of the respective row storing the given cache line includes updating the metadata based on the memory access. 18. The method as recited in claim 15, further comprising sending within the memory request the first cache tag in addition to a DRAM address identifying the respective row. 19. The method as recited in claim 15, wherein the DRAM is a three-dimensional (3D) integrated circuit (IC). The first cache line (i.e., the given cache line associated with the single read) is sent to the controller individually, without any further read requests. The interconnect is used to send the cache data).

Claim 16 is the method claim corresponding to system claim 7. It is rejected on the same references and rationale.

Regarding claim 8, Loh in view of Bryant teaches The system of claim 6, wherein to initiate the action with respect to the first cache line, the memory controller is further configured to: receive the second read request from the cache controller over the interconnect; and send the first cache line to the cache controller (Loh paragraph [0060], A sequence of steps 1-7 is shown in FIG. 4 for accessing tags, status information and data corresponding to cache lines stored in a 3D DRAM. When the memory array bank 430 is used as a cache storing both a tag array and a data array within a same row, an access sequence different from a sequence utilizing steps 1-7 for a given row of the rows 432a-432k may have a large latency. For example, a DRAM access typically includes an first activation or opening stage, a stage that copies the contents of an entire row into the row buffer, a tag read stage, a tag comparison stage, a data read or write access stage that includes a column access, a first precharge or closing stage, a second activation or opening stage, a stage that copies the contents of the entire row again into the row buffer, a tag read stage, a tag comparison stage, an update stage for status information corresponding to the matching tag, and a second precharge or closing stage. A second read request can be generated and transmitted to the memory controller, see also Loh, If a cache miss occurs, such as a requested block is not found in a respective one of the cache memory subsystems 124a-124b or in the shared cache memory subsystem 128, then a read request may be generated and transmitted to the memory controller 130. The memory controller 130 may translate an address corresponding to the requested block and send a read request to the off-chip DRAM 170 through the memory bus 150. The off-chip DRAM 170 may be filled with data from the off-chip disk memory 162 through the I/O controller and bus 160 and the memory bus 150).
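The difference between claims 7 and 8 (sending the read-ahead line without a second request versus waiting for the second request) can be illustrated with the Python sketch below, in which the line is read on a tag hit as recited in claim 6. `ReadAheadController`, its `push` flag, and the `send` callback standing in for the interconnect are hypothetical and do not describe Loh's implementation.

```python
from __future__ import annotations

from typing import Callable, Optional


class ReadAheadController:
    """Hypothetical memory-side controller that reads a hit line before a data request."""

    def __init__(self, tags: dict[int, int], lines: dict[int, bytes],
                 send: Callable[[int, bytes], None], push: bool) -> None:
        self.tags = tags                    # set index -> stored tag
        self.lines = lines                  # set index -> cached line
        self.send = send                    # models the interconnect to the cache controller
        self.push = push                    # True: claim 7 style; False: claim 8 style
        self.staged: dict[int, bytes] = {}  # lines read ahead, awaiting a second request

    def tag_read(self, set_index: int, tag: int) -> bool:
        if self.tags.get(set_index) != tag:
            return False                    # miss: nothing is read ahead
        line = self.lines[set_index]        # read the line before a second request (claim 6)
        if self.push:
            self.send(set_index, line)      # send without waiting for a second request
        else:
            self.staged[set_index] = line   # hold until the second request arrives
        return True

    def data_read(self, set_index: int) -> Optional[bytes]:
        """Second read request: send a staged line, if any, over the interconnect."""
        line = self.staged.pop(set_index, None)
        if line is not None:
            self.send(set_index, line)
        return line


sent = []
pull = ReadAheadController({0: 5}, {0: b"L0"}, lambda s, line: sent.append((s, line)), push=False)
pull.tag_read(0, 5)  # hit: line staged, nothing sent yet
print(sent)          # []
pull.data_read(0)    # second request: staged line is sent
print(sent)          # [(0, b'L0')]
```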

Claim 17 is the method claim corresponding to system claim 8. It is rejected on the same references and rationale.

Regarding claim 9, Loh in view of Bryant teaches The system of claim 1, wherein the memory controller is further configured to: if the second tag data does not match the first tag data, return an indication of a cache miss to the cache controller (Loh paragraphs [0072-0074], In block 504, the processing unit may determine a given memory request misses within a cache memory subsystem within the processing unit. In block 506, the processing unit may send an address corresponding to the given memory request to an in-package integrated DRAM cache, such as the 3D DRAM. The address may include a non-translated cache tag in addition to a DRAM address translated from a corresponding cache address used within the processing unit to access on-chip caches. In block 508, control logic within the 3D DRAM may identify a given row corresponding to the address within the memory array banks in the 3D DRAM. In block 510, control logic within the 3D DRAM may activate and open the given row. In block 512, the contents of the given row may be copied and stored in a row buffer. In block 514, the tag information in the row buffer may be compared with tag information in the address. The steps described in blocks 506-512 may correspond to the sequences 1-4 described earlier regarding FIG. 4. If the tag comparisons determine a tag hit does not occur (conditional block 516), then in block 518, the memory request may be sent to main memory. The main memory may include an off-chip non-integrated DRAM and/or an off-chip disk memory. If the tag comparisons determine a tag hit occurs (conditional block 516), then in block 520, read or write operations are performed on a corresponding cache line in the row buffer. When the compared tag data differ (i.e., do not match), a cache miss occurs, and the memory can return an indication of that miss).
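The block sequence quoted from Loh paragraphs [0072-0074] can be traced with the Python sketch below, which logs each block and forwards a tag miss to main memory. The dictionary-based row and main-memory models, and the use of dictionary membership in place of a hardware tag comparison, are simplifying assumptions; block 504 (the on-chip miss) is assumed to have already occurred, and the function name is hypothetical.

```python
from __future__ import annotations


def dram_cache_access(rows: dict[int, dict[int, bytes]],
                      main_memory: dict[tuple[int, int], bytes],
                      row: int, tag: int, log: list[str]) -> bytes:
    """Trace blocks 506-520 for one request; block 504 is assumed done."""
    log.append("506: send tag and DRAM address to the DRAM cache")
    log.append(f"508: identify row {row}")
    log.append("510: activate and open the row")
    row_buffer = dict(rows.get(row, {}))  # tag -> line, copied into the row buffer
    log.append("512: copy row into row buffer")
    log.append("514: compare request tag with row-buffer tags")
    if tag in row_buffer:                 # 516: tag hit?
        log.append("520: read/write the line in the row buffer")
        return row_buffer[tag]
    log.append("518: send request to main memory")
    return main_memory[(row, tag)]


rows = {2: {0x1F: b"cached"}}
main_memory = {(2, 0x2A): b"from main memory"}
steps = []
print(dram_cache_access(rows, main_memory, 2, 0x2A, steps))  # b'from main memory'
print(steps[-1])  # 518: send request to main memory
```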

Claim 18 is the corresponding method claim to the system claim 9. It is rejected with the same references and rationale.


Claims 3 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Loh in view of Bryant as applied to claims 1 and 10 above, and further in view of Nale (US Publication No. 2019/0102313 -- "Nale").

Regarding claim 3, Loh in view of Bryant and further in view of Nale teaches The system of claim 1, wherein to determine that the first read request comprises a tag read request, the memory controller is configured to: read an identifier in the first read request, the identifier indicating that the first read request is a tag read request (Nale paragraph [0013], Various embodiments described herein include a memory controller that can store a copy of a portion of a critical chunk in a spare lane such that the entire critical chunk can be provided to a CPU using one half of a cache line. In some embodiments, the memory controller may utilize the spare lane to store an entire critical chunk in each half of a cache line. For example, the critical chunk may include metadata (e.g., a cache tag or a Read ID) stored in both halves of a cache line. In such examples, the metadata that is normally in the first half of the cache line may be copied or mapped to spare lane bits associated with the second half of the cache line and metadata that is normally in the second half of the cache line may be copied or mapped to spare lane bits associated with the first half of the cache line. The read/access request includes metadata (i.e., an identifier) which indicates whether or not a tag is present for the read request target).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Loh and Bryant with those of Nale. Nale teaches using an identifier to indicate whether or not the first read request is a tag read request, which allows the system to more easily identify and classify when a tag read request occurs versus when a normal read occurs, resulting in improved memory performance (Nale paragraph [0013], Various embodiments described herein include a memory controller that can store a copy of a portion of a critical chunk in a spare lane such that the entire critical chunk can be provided to a CPU using one half of a cache line. In some embodiments, the memory controller may utilize the spare lane to store an entire critical chunk in each half of a cache line. For example, the critical chunk may include metadata (e.g., a cache tag or a Read ID) stored in both halves of a cache line. In such examples, the metadata that is normally in the first half of the cache line may be copied or mapped to spare lane bits associated with the second half of the cache line and metadata that is normally in the second half of the cache line may be copied or mapped to spare lane bits associated with the first half of the cache line. In embodiments, the memory controller may allow critical chunk operations to be used in 2LM and/or DDR-T2 environments when a spare lane is implemented by the memory controller and the DDR-T interface. In one or more embodiments, the memory controller may store the critical chunk in the same locations in the two halves of the cache line such that the memory controller does not have to multiplex (MUX) the data depending on which half comes first. In various embodiments, the memory controller may arrange the bits in a critical chunk separately, such as depending on which half of a cache line is requested. 
In these and other ways the memory controller may enable reliable and efficient critical chunk operation to achieve improved memory performance, such as by reducing the overall number of memory operations required to provide a critical chunk to a CPU, resulting in several technical effects and advantages).

Claim 12 is the corresponding method claim to the system claim 3. It is rejected with the same references and rationale.


Claims 4 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Loh in view of Bryant as applied to claims 1 and 10 above, and further in view of Kim et al. (US Publication No. 2017/0168931 -- "Kim").

Regarding claim 4, Loh in view of Bryant and further in view of Kim teaches The system of claim 1, wherein the memory controller is further configured to: send the second tag data read from the cache memory to the cache controller over the interconnect (Kim claim 11, The nonvolatile memory module of claim 9, wherein, when a read request is received, a second match signal indicating the cache hit corresponding to the read request is generated by reading a second tag from the tag array and comparing the read second tag with second tag information received with the second tag, and second cache data corresponding to the read request is read from the data array in response to the second match signal.  Second tag data is read from the cache memory and provided to the controller for every read operation; see Kim paragraph [0056], FIG. 3 is a block diagram for conceptually illustrating the tag DRAM 331 and the data DRAM 332 of FIG. 2. Referring to FIG. 3, the tag DRAM 331 and the data DRAM 332 may include the same elements, for example, memory cell arrays 331-1 and 332-1, tag comparison circuits 331-5 and 332-5, and multiplexers (Mux Circuit) 331-6 and 332-6. In some embodiments, each of the tag DRAM 331 and the data DRAM 332 may include a dual port DRAM. The dual port DRAM may include input/output ports respectively corresponding to different kinds of devices, for example, data buffer/nonvolatile memory controller. A data path of the dual port DRAM may be connected to a first external device, for example, a data buffer, or a second external device, for example, a nonvolatile memory controller, based on the selection of the multiplexer, that is, multiplexers, 331-6 or 332-6).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Loh and Bryant with those of Kim. Kim teaches sending second tag data from the cache memory to the cache controller, to later be used in a comparison, which allows the memory system to compare the two separate tags, improving memory reliability and consistency (Kim paragraphs [0049-0050], At least one DRAM 331 of the plurality of first DRAMs 330-1 and the plurality of second DRAMs 330-2 may store a tag corresponding to a cache line and compare stored tag information with input tag information. The remaining DRAMs may be implemented to store cache data corresponding to the tag. Hereinafter, a DRAM, which stores tags, may be referred to as "tag DRAM", and each of the remaining DRAMs may be referred to as "data DRAM". The at least one DRAM 331 may be a tag DRAM. DRAM 332 may be a data DRAM. [0050] In some embodiments, the tag DRAM 331 may store a 4-byte tag. In some embodiments, the tag DRAM 331 may store tags in a 2-way, 1:8 direct mapping scheme. The tag may include location information about cache data stored in the data DRAMs and dirty/clear information indicating validity of cache data. In some embodiments, the tag may include an error correction value for error correction. Thus, the tag DRAM 331 may further include an error correction circuit for correcting an error. The memory module control device 350 may provide tag information to the DRAM 330-2).

Claim 13 is the corresponding method claim to the system claim 4. It is rejected with the same references and rationale.


Claims 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Loh in view of Bryant and further in view of Jiang (US Publication No. 2018/0173640 -- "Jiang").

Regarding claim 19, Loh teaches A device comprising: a cache memory; and a memory controller coupled to the cache memory, wherein the memory controller is configured to: (Loh Figure 3; Loh paragraph [0021], Referring to FIG. 1, a generalized block diagram of one embodiment of a computing system 100 is shown. As shown, microprocessor 110 may include one or more processor cores 122a-122b connected to corresponding one or more cache memory subsystems 124a-124b. The microprocessor may also include interface logic 140, a memory controller 130, system communication logic 126, and a shared cache memory subsystem 128. In one embodiment, the illustrated functionality of the microprocessor 110 is incorporated upon a single integrated circuit. In another embodiment, the illustrated functionality is incorporated in a chipset on a computer motherboard. Note that while Loh does not describe a physical comparator entity, the controller effectively contains a comparator in that it performs the function of comparing a plurality of tags from requests; see Loh paragraph [0048], Similar to other DRAM topologies, the 3D DRAM 330 may include multiple memory array banks 332a-332b. Each one of the banks 332a-332b may include a respective one of the row buffers 334a-334b. Each one of the row buffers 334a-334b may store data in an accessed row of the multiple rows within the memory array banks 332a-332b. The accessed row may be identified by a DRAM address in the received memory request. The control logic 336 may perform tag comparisons between a cache tag in a received memory request and the one or more cache tags stored in the row buffer. 
In addition, the control logic may alter a column access of the row buffers by utilizing the cache tag comparison results rather than a bit field within the received DRAM address) receive a first write request from a cache controller over an interconnect, the first write request comprising first tag data identifying a first cache line in the cache memory; (Loh paragraphs [0011-0012], In one embodiment, a computing system includes a processing unit and an integrated dynamic random access memory (DRAM). Examples of the processing unit include a general-purpose microprocessor, a graphics processing unit (GPU), an accelerated processing unit (APU), and so forth. The integrated DRAM may be a three-dimensional (3D) DRAM and may be included in a System-in-Package (SiP) with the processing unit. The processing unit may utilize the 3D DRAM as a cache. [0012] In various embodiments, the 3D DRAM may store both a tag array and a data array. Each row of the multiple rows in the memory array banks of the 3D DRAM may store one or more cache tags and one or more corresponding cache lines indicated by the one or more cache tags. In response to receiving a memory request from the processing unit, the 3D DRAM may perform a memory access according to the received memory request on a given cache line indicated by a cache tag within the received memory request. Performing the memory access may include a single read of a respective row of the multiple rows storing the given cache line. Rather than utilizing multiple DRAM transactions, a single, complex DRAM transaction may be used to reduce latency and power consumption. The memory request can be considered a write request, which is associated with first tag data that is used to identify a specific portion of the cache, i.e., the "given cache line", as stated in the reference) wherein the cache memory is separated from the cache controller by the interconnect; (Loh paragraph [0043], Turning now to FIG. 
3, a generalized block diagram of one embodiment of a computing system 300 utilizing a three-dimensional (3D) DRAM is shown. Circuitry and logic described earlier are numbered identically. The computing system 300 may utilize three-dimensional (3D) packaging, such as a System in Package (SiP) as described earlier. The computing system 300 may include a SiP 310. In one embodiment, the SiP 310 may include the processing unit 220 described earlier and a 3D DRAM 330 that communicate through low-latency interconnect 340. The in-package low-latency interconnect 340 may be horizontal and/or vertical with shorter lengths than long off-chip interconnects when a SiP is not used. Loh Figure 3; Reference #330, 340, 220. Note that the processing unit 220 is separated from the 3D DRAM 330 (i.e., memory module) via the interconnect 340 (in this case, a low-latency interconnect)) compare the second tag data read from the cache memory to the first tag data received from the cache controller with the first read request; if the second tag data matches the first tag data and the first cache line is not already marked as dirty, modify a dirty status indicator for the first cache line before a second write request is received from the cache controller over the interconnect; (Loh paragraphs [0060-0061], A sequence of steps 1-7 is shown in FIG. 4 for accessing tags, status information and data corresponding to cache lines stored in a 3D DRAM. When the memory array bank 430 is used as a cache storing both a tag array and a data array within a same row, an access sequence different from a sequence utilizing steps 1-7 for a given row of the rows 432a-432k may have a large latency. 
For example, a DRAM access typically includes an first activation or opening stage, a stage that copies the contents of an entire row into the row buffer, a tag read stage, a tag comparison stage, a data read or write access stage that includes a column access, a first precharge or closing stage, a second activation or opening stage, a stage that copies the contents of the entire row again into the row buffer, a tag read stage, a tag comparison stage, an update stage for status information corresponding to the matching tag, and a second precharge or closing stage. The two separate tags are compared to one another. If the two tags match, then the memory access request that was previously described may be executed, which can involve a plurality of actions as well as cache lines, see Loh paragraph [0061], Continuing with the access steps within the memory array bank 430, one or more additional precharge and activation stages may be included after each access of the row buffer if other data stored in other rows are accessed in the meantime. Rather than utilize multiple DRAM transactions for a single cache access, the sequence of steps 1-7, may be used to convert a cache access into a single DRAM transaction. Each of the different DRAM operations, such as activation/open, column access, read, write, and precharge/close, has a different respective latency).
Loh does not teach a cache memory comprising a plurality of sets of cache lines each comprising a plurality of cache storage locations, wherein a first location of a first set of the plurality of sets of cache lines comprises cache tag data for the plurality of sets of cache lines; the first write request from the first location of the first set of the plurality of sets of cache line in the cache memory; determine that the first write request comprises a tag write request; read second tag data corresponding to the first write request from the cache memory; receive the second write request is received from the cache controller over the interconnect; perform a write operation on the first cache line.
However, Bryant teaches a cache memory comprising a plurality of sets of cache lines each comprising a plurality of cache storage locations, wherein a first location of a first set of the plurality of sets of cache lines comprises cache tag data for the plurality of sets of cache lines; (Bryant paragraph [0062], For example, if one additional bit is provided in field 122, such that the virtual address index bits are formed by bits 12 to 6 of the virtual address in the illustrated example, this enables two aliased locations to be identified within a 32 Kbyte four-way cache, as illustrated schematically in FIG. 2A. In particular, the storage 150 within the level 1 cache 100 may comprise of a number of sets 160a to 160g, where each set includes a plurality of ways 170a to 170d. For ease of illustration, the storage structure 150 does not show separate tag RAMs and data RAMs, but typically there will be a separate tag RAM entry for each cache line in the data RAM, the tag RAM entry identifying physical address bits, valid bits, dirty bits, etc. and the corresponding cache line containing the data. The tag RAMs and data RAMs are accessed in the same way, and in particular the cache access circuitry 140 will use the virtual index derived from the virtual address in order to determine the appropriate set to access (each set containing a cache line in each way, and the corresponding tag RAM entries for those cache lines). In this example, because the virtual address index bits are bits 12 to 6, there are two potential aliased locations, as indicated by the shaded sets 160b and 160e. If instead the virtual index extends from bits 13 down to 6, then there would be four aliased locations in a 64 Kbyte four-way cache. The cache memory contains a plurality of sets of cache lines that use cache tags to identify and label storage locations (both physical and virtual addresses). Each set contains a plurality of tags and a plurality of cache lines). 
Also see Bryant paragraph [0006], In a further example configuration, there is provided a method of handling access requests in an apparatus comprising: employing processing circuitry to process a plurality of program threads to perform data processing operations on data, the operations identifying the data using virtual addresses, and the virtual addresses being mapped to physical addresses within a memory system; providing a cache storage having a plurality of cache entries to store data, an aliasing condition existing when multiple virtual addresses map to the same physical address, and allocation of data into the cache storage being constrained to prevent multiple cache entries of the cache storage simultaneously storing data for the same physical address and Paragraph [0019], Often, a cache storage arranged in this way is referred to as a virtually indexed physically tagged (VIPT) cache. One benefit of arranging a cache in that way is that a lookup can begin to be performed within the cache whilst the physical address is still being determined from the specified virtual address. This can provide some performance benefits) the first write request from the first location of the first set of the plurality of sets of cache line in the cache memory; (See Bryant as described above. Also see Bryant paragraphs [0048-0049] and [0050], In particular, once a set has been identified using the cache index, then it is determined whether a hit is detected within one of the cache lines of that set by comparing a physical address portion stored in association with that cache line with a tag portion of the physical address produced by the TLB circuitry 35. In the presence of a hit, the access can proceed within the level 1 cache. However, in the event of a miss, the access request can be propagated onto the further levels of cache/main memory 30 in order to cause the required data to be accessed. 
As part of this process, a linefill operation may occur within the level 1 cache 25 in order to store a cache line's worth of data containing the data being accessed by the access request, so that that data is then available in the level 1 cache for any subsequent access request that also seeks to access the data in that cache line. As shown in FIG. 1, in the example illustrated therein any accesses to the further levels of cache or main memory 30 are performed using the physical address determined by the TLB circuitry 35. The first set of cache lines can be determined and used to provide data access of exclusive operations).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Loh with those of Bryant. Bryant teaches having a plurality of cache lines and sets of cache lines which comprise cache tag data and storage information. This can improve both cache efficiency and cache reliability, for example by preventing multiple different entries from storing data for the same physical address, and by handling attempts to access a cache entry using a virtual address that maps to the same physical address as other virtual addresses, among other issues (Bryant paragraph [0017], The operations identify the data using virtual addresses, and the virtual addresses are mapped to physical addresses within a memory system. The apparatus also has a cache storage providing a plurality of cache entries for storing data. An aliasing condition exists when multiple virtual addresses map to the same physical address, and allocation of data into the cache storage is constrained to prevent multiple cache entries of the cache storage simultaneously storing data for the same physical address. In particular, since the cache index used to identify one or more entries within the cache storage that can be used to store the data is usually determined with reference to the virtual address, when two different virtual addresses are used for the same physical address, different entries in the cache storage would be identified dependent on which virtual address is used. The cache storage can be arranged to prevent different entries storing data for the same physical address at any point in time, for example by evicting the contents in one cache entry identified by a first virtual address, when an access is then attempted to another entry using a second virtual address that maps to the same physical address. 
Once the first entry's contents have been evicted, the entry identified using the second virtual address can then be populated with the data). Additionally, efficiency can be improved through quicker cache lookups (Bryant paragraph [0019], Often, a cache storage arranged in this way is referred to as a virtually indexed physically tagged (VIPT) cache. One benefit of arranging a cache in that way is that a lookup can begin to be performed within the cache whilst the physical address is still being determined from the specified virtual address. This can provide some performance benefits).

Loh in view of Bryant does not teach: determine that the first write request comprises a tag write request; read second tag data corresponding to the first write request from the cache memory; receive the second write request is received from the cache controller over the interconnect; perform a write operation on the first cache line.
However, Jiang teaches determine that the first write request comprises a tag write request; read second tag data corresponding to the first write request from the cache memory; (Jiang paragraph [0004], Embodiments of the present disclosure also provide a method of operating a cache that comprises a tag array, a tag control buffer, a data array, and a write buffer. The method comprises: receiving a first write request including write data and a memory address; receiving a second data access request; determining a tag address based on the memory address; performing a first read operation to the tag array to determine if there is a cache-hit; and responsive to determining that there is a cache-hit: performing a write operation to the write buffer to store information related to the first write request, performing a write operation to the tag control buffer to update stored cache control information, performing second read operations to the tag array and to the data array for the second data access request, and performing a write operation to a first data entry of the data array based on the information related to the first write request stored in the write buffer. The write request can be a tag write request and can read the tag data) receive the second write request is received from the cache controller over the interconnect; perform a write operation on the first cache line (Jiang claim 12, The method of claim 11, wherein the second read operations to the tag array and to the data array for second request are performed before the write operation to the tag control buffer is completed.  The second write/read requests can be received to perform the operation on the initial storage/operation location. Also see paragraph [0005], Embodiments of the present disclosure also provide a computer system. The computer system comprises a hardware processor and a hierarchical memory system coupled with the hardware processor. 
The hierarchical memory system comprises a dynamic random access memory device and a cache. The cache comprises a tag array configured to store one or more tag addresses, a tag control buffer configured to store cache control information, a data array configured to store data acquired from the dynamic random access memory device, and a write buffer configured to store information related to a write request from the hardware processor. The tag array is configured to be accessed independently from the tag control buffer, and the data array is configured to be accessed independently from the write buffer).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Loh and Bryant with those of Jiang. Jiang teaches using a write request to determine whether a tag write request is associated with it, and using the tag data to perform write operations. This allows the system to more accurately monitor data stored in the data arrays and entries, and to perform operations more efficiently based on the information that the tags relay to the memory controller (Jiang paragraphs [0006-0007], Embodiments of the present disclosure also provide a cache. The cache comprises a tag array configured to store one or more tag addresses and cache control information, a data array configured to store data acquired from a memory device, and a write buffer configured to store information related to a write request. The data array is configured to be accessed independently from the write buffer. [0007] Embodiments of the present disclosure also provide a method of operating a cache that comprises a tag array, a data array, and a write buffer. The method comprises: receiving a write request including a first data and a memory address; determining a tag address based on the memory address; performing a read operation to the tag array to determine if there is a cache-hit; responsive to determining that there is a cache-hit, performing a write operation to the write buffer to store the first data, and performing a write operation to the tag array to update stored cache control information; and responsive to determining that preset condition is satisfied, performing a write operation to the data array based on the first data stored in the write buffer).

Regarding claim 20, Loh in view of Bryant and further in view of Jiang teaches The device of claim 19, wherein the memory controller is further configured to: if the second tag data does not match the first tag data, send at least one of the second tag data read from the cache memory or an indication of a cache miss to the cache controller over the interconnect (Loh paragraphs [0072-0074], In block 504, the processing unit may determine a given memory request misses within a cache memory subsystem within the processing unit. In block 506, the processing unit may send an address corresponding to the given memory request to an in-package integrated DRAM cache, such as the 3D DRAM. The address may include a non-translated cache tag in addition to a DRAM address translated from a corresponding cache address used within the processing unit to access on-chip caches. In block 508, control logic within the 3D DRAM may identify a given row corresponding to the address within the memory array banks in the 3D DRAM. In block 510, control logic within the 3D DRAM may activate and open the given row. In block 512, the contents of the given row may be copied and stored in a row buffer. In block 514, the tag information in the row buffer may be compared with tag information in the address. The steps described in blocks 506-512 may correspond to the sequences 1-4 described earlier regarding FIG. 4. If the tag comparisons determine a tag hit does not occur (conditional block 516), then in block 518, the memory request may be sent to main memory. The main memory may include an off-chip non-integrated DRAM and/or an off-chip disk memory. If the tag comparisons determine a tag hit occurs (conditional block 516), then in block 520, read or write operations are performed on a corresponding cache line in the row buffer. When the data being compared results in a difference (i.e., not matching), the memory can return a cache miss).

Claim Interpretation
Independent claim 10 is a method claim that features a contingent limitation. Specifically, the limitation “if the second tag data matches the first tag data, initiating, by the memory controller, an action with respect to the first cache line in the cache memory” is contingent. Accordingly, the prior art relied upon in the 35 U.S.C. 103 obviousness rejection need not teach this limitation, because the condition precedent need not occur. In other words, the examiner may interpret the claim as encompassing a situation in which the second tag data does not match the first tag data, in which case the action performed in response to the second tag data matching the first tag data is never required.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 10 and 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The rejections of independent claims 1 and 10 have been changed to reflect the amended claim limitations and now use the original primary reference Loh as well as a secondary reference, Bryant. The teachings of Bryant refer to cache tags reflecting the storage locations of a plurality of cache lines for a plurality of cache sets. The rejection of claim 19 still uses references Loh and Jiang, but now also uses Bryant.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONAH C KRIEGER whose telephone number is (571)272-3627.  The examiner can normally be reached on Monday - Friday 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones can be reached on (571)272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  

/J.C.K./ Examiner, Art Unit 2136
/CHARLES RONES/ Supervisory Patent Examiner, Art Unit 2136