DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event a determination of the status of the application as subject to AIA 35 U.S.C. 102, 103, and 112 (or as subject to pre-AIA 35 U.S.C. 102, 103, and 112) is incorrect, any correction of the statutory basis for a rejection will not be considered a new ground of rejection if the prior art relied upon and/or the rationale supporting the rejection, would be the same under either status.

Notice of Claim Interpretation
Claims in this application are not interpreted under 35 U.S.C. 112(f) unless otherwise noted in an office action.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 5 May 2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-9, 11-15, 17-19, 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Cheriton et al. (“Software-Controlled Caches in the VMP Multiprocessor”) in view of Zheng et al. (“Performance Evaluation of Exclusive Cache Hierarchies”).
In regards to claims 1, 11, 21, and 22, Cheriton teaches a method of operating a data processing system, the data processing system comprising:
a memory system (“VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory.”, abstract, paragraph 1);
a processor (“VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory.”, abstract, paragraph 1); and
a cache system configured to transfer data stored in the memory system to the processor for use by the processor when performing data processing operations and to transfer data from the processor to the memory system (See figure 1);
the cache system comprising a cache configured to receive data from the memory system and to provide data to the processor for use when performing processing operations and to receive data from the processor for sending to the memory system (“VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory.”, abstract, paragraph 1);
the data processing system further comprising a processing unit operable to read data from the cache (“VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory.”, abstract, paragraph 1);
the method comprising:
when the processing unit requires data from the cache, sending a read request for the data to the cache (“The processor is directly connected to a virtually addressed cache, as depicted in Figure 1. That is, the cache contents are addressed by virtual address, rather than by physical addresses. Thus, in the absence of a cache miss, the memory reference is satisfied at maximum speed because the processor is the single master of the cache and it executes synchronously with respect to the cache, i.e. no arbitration is required and there is no virtual-to-physical address translation as part of a cache reference.”, section 2, paragraph 1);
the cache system, in response to the read request, determining whether the requested data is present in the cache (“The processor is directly connected to a virtually addressed cache, as depicted in Figure 1. That is, the cache contents are addressed by virtual address, rather than by physical addresses. Thus, in the absence of a cache miss, the memory reference is satisfied at maximum speed because the processor is the single master of the cache and it executes synchronously with respect to the cache, i.e. no arbitration is required and there is no virtual-to-physical address translation as part of a cache reference.”, section 2, paragraph 1); and
when the requested data is present in the cache, returning the data from the cache to the processing unit (“The processor is directly connected to a virtually addressed cache, as depicted in Figure 1. That is, the cache contents are addressed by virtual address, rather than by physical addresses. Thus, in the absence of a cache miss, the memory reference is satisfied at maximum speed because the processor is the single master of the cache and it executes synchronously with respect to the cache, i.e. no arbitration is required and there is no virtual-to-physical address translation as part of a cache reference.”, section 2, paragraph 1); and
when it is determined that the requested data is not present in the cache, returning an indication of that to the processing unit, without the cache system causing any external memory transactions (“On cache miss, the cache controller signals a processor exception interrupt (bus error) and generates a suggested cache slot to use for the missing cache page.”, section 2, paragraph 3; “Assuming the virtual memory page is present in the main memory, the processor instructs the block copier to copy the required data from main memory into the cache, specifying the cache flags to be assigned to the cache slot if the copy succeeds. … If the required data is not in main memory, the operating system page fault handler is given control.”, section 2, paragraph 4; “It also offers the flexibility to experiment with different techniques of virtual-to-physical address translation and cache loading and replacement policies without hardware modification.”, section 2, paragraph 9).
Cheriton fails to teach that when the requested data is present in the cache, invalidating the entry for the data in the cache.  Zheng teaches that when the requested data is present in the cache, invalidating the entry for the data in the cache (Yes arrow of L2 hit? leads to invalidate the block, figure 3) which “allows a tradeoff between optimizing hit time and miss rate” (section 1, paragraph 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Cheriton with Zheng such that when the requested data is present in the cache, invalidating the entry for the data in the cache which “allows a tradeoff between optimizing hit time and miss rate” (id.).
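For illustration, the read behavior addressed in the rejection of claims 1, 11, 21, and 22 can be sketched as follows. This is a minimal model under stated assumptions, not an implementation drawn from Cheriton, Zheng, or the application: the class name ReadInvalidateCache, its methods, and the memory_transactions counter are hypothetical names introduced only for this sketch.

```python
from typing import Dict, Optional, Tuple


class ReadInvalidateCache:
    """Hypothetical model: a hit returns the data and invalidates the entry;
    a miss only reports the miss and never touches external memory."""

    def __init__(self) -> None:
        self.lines: Dict[int, bytes] = {}
        # Would be incremented if the cache itself ever issued a memory
        # transaction; in this model it never does.
        self.memory_transactions = 0

    def fill(self, address: int, data: bytes) -> None:
        """Place data into the cache (for example, at the processor's request)."""
        self.lines[address] = data

    def read(self, address: int) -> Tuple[bool, Optional[bytes]]:
        """Return (hit, data). On a hit the entry is removed (invalidated)."""
        if address in self.lines:
            return True, self.lines.pop(address)
        # Miss: return an indication only; the requester decides what to do next.
        return False, None
```

In this sketch a second read of the same address reports a miss, because the first hit invalidated the entry, and the decision whether to fetch from memory is left entirely to the requester.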
In regards to claims 2 and 12, Zheng further teaches that the cache is an L2 cache of the cache system (L2 hit?, figure 3).
In regards to claims 3 and 13, Zheng further teaches that the cache system includes multiple cache levels (“A two-level cache hierarchy allows a tradeoff between optimizing hit time and miss rate [9].”, section 1, paragraph 1), and the determining of whether the requested data is present in the cache comprises:
determining whether the requested data is present in the cache to which the read request is made (L1 miss, figure 3) or in a lower level cache of the cache system (L2 hit?, figure 3).
In regards to claims 4 and 14, Zheng further teaches that determining whether the data is present in the cache comprises:
determining whether the data is present in the cache level to which the read request is made, and in the event that the data is not present in that cache level, then determining whether the data is present in a lower level cache, and in the event that the data is present in a lower level cache, evicting data from the lower level cache to the cache to which the read request was made, such that the cache to which the read request was made can then return the requested data to the processing unit (“In two-level exclusive caching, when a load misses in the L1 and hits in the L2, the contents of L1 and L2 are swapped. That is, the victim block from the L1 cache is first transferred to the victim buffer; then the referenced data is transferred from the L2 cache into the L1 cache, after the request is serviced, the victim block is transferred to the second-level cache.”, section 3.2, paragraph 2; See also figure 3).
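The swap described in the quoted passage of Zheng can be sketched as follows. This is a simplified illustration under stated assumptions: the class name ExclusiveTwoLevelCache is hypothetical, the victim is chosen as the oldest inserted L1 block (Zheng's replacement policy is not modeled), and the victim buffer is represented by a local variable.

```python
from typing import Dict, Optional, Tuple


class ExclusiveTwoLevelCache:
    """Hypothetical two-level exclusive hierarchy: a block lives in L1 or L2, never both."""

    def __init__(self, l1_capacity: int) -> None:
        self.l1_capacity = l1_capacity
        self.l1: Dict[int, bytes] = {}
        self.l2: Dict[int, bytes] = {}

    def load(self, address: int) -> Optional[bytes]:
        if address in self.l1:
            return self.l1[address]  # L1 hit
        if address not in self.l2:
            return None  # miss in both levels; memory access is not modeled here
        # L1 miss, L2 hit: move the L1 victim (if any) to the victim buffer first.
        victim: Optional[Tuple[int, bytes]] = None
        if len(self.l1) >= self.l1_capacity:
            victim_address = next(iter(self.l1))  # assumed policy: oldest inserted
            victim = (victim_address, self.l1.pop(victim_address))
        # Transfer the referenced block from L2 into L1 (removing it from L2).
        data = self.l2.pop(address)
        self.l1[address] = data
        # After the request is serviced, the victim goes to the second-level cache.
        if victim is not None:
            self.l2[victim[0]] = victim[1]
        return data
```

Removing the block from L2 when it moves to L1 is what keeps the two levels exclusive in this model.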
In regards to claims 5 and 15, Cheriton further teaches that the processing unit is a processing unit that is associated with the cache of the cache system (“The processor is directly connected to a virtually addressed cache, as depicted in Figure 1.”, section 2, paragraph 1).
In regards to claims 7 and 17, Cheriton further teaches when the requested data is present in the cache, also returning to the processing unit an indication whether the returned data is dirty or not (“For each cache page, flags are maintained that indicate: valid, modified, exclusive-ownership, supervisor writable, user readable and user writable.”, section 4, paragraph 5; “On interrupt, the processor writes out the cache page (if dirty).”, section 3.3, paragraph 4).
In regards to claims 8 and 18, Cheriton further teaches when the cache system returns the requested data, the processing unit performing at least one of processing the returned data (“If not, a read cache miss to this area is handled by a read-private bus transaction, eliminating the need to later do an assert-ownership on the first write operation.”, section 5.4, paragraph 2); and
writing the returned data back to the memory system (“• write-back - to write the cache page back to main memory, releasing ownership of the page.”, section 3.1, paragraph 2, bullet 4).
In regards to claims 9 and 19, Cheriton further teaches when the cache system returns an indication that the requested data is not present in the cache, the processing unit determining whether to send a request to the memory system for the data (“On cache miss, the cache controller signals a processor exception interrupt (bus error) and generates a suggested cache slot to use for the missing cache page. On exception interrupt, … Assuming the virtual memory page is present in the main memory, the processor instructs the block copier to copy the required data from main memory into the cache, specifying the cache flags to be assigned to the cache slot if the copy succeeds. … If the required data is not in main memory, the operating system page fault handler is given control.”, section 2, paragraphs 3-4).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Cheriton et al. (“Software-Controlled Caches in the VMP Multiprocessor”) in view of Zheng et al. (“Performance Evaluation of Exclusive Cache Hierarchies”) and Abali et al. (“Performance of Hardware Compressed Main Memory”).
In regards to claims 6 and 16, Cheriton in view of Zheng teaches claims 1 and 11.  Cheriton in view of Zheng fails to teach that the processing unit is a data encoder associated with the cache and configured to:
when data is to be written from the cache to the memory system, encode uncompressed data from the cache for storing in the memory system in a compressed format and send the data in the compressed format to the memory system for storing; and
when data in a compressed format is to be read from the memory system into the cache, decode the compressed data from the memory system and store the data in the cache in an uncompressed format.
Abali teaches that the processing unit is a data encoder associated with the cache (“The L3 Cache/Compressed Memory Controller is central to the operation of the MXT system. The L3 cache appears as the main memory to the upper layers of the memory hierarchy and its operation is transparent to the rest of the hardware including the processors and I/O. The controller compresses 1 KB cache lines before writing them to the compressed memory and decompresses them after reading from the compressed memory.”, section 2.1, paragraph 1) and configured to:
when data is to be written from the cache to the memory system, encode uncompressed data from the cache for storing in the memory system in a compressed format and send the data in the compressed format to the memory system for storing (“The L3 cache contains uncompressed cached lines. … The controller compresses 1 KB cache lines before writing them to the compressed memory and decompresses them after reading from the compressed memory.”, section 2.1, paragraph 1); and
when data in a compressed format is to be read from the memory system into the cache, decode the compressed data from the memory system and store the data in the cache in an uncompressed format (“The L3 cache contains uncompressed cached lines. … The controller compresses 1 KB cache lines before writing them to the compressed memory and decompresses them after reading from the compressed memory.”, section 2.1, paragraph 1)
in order to achieve “savings in memory cost” (section 1, paragraph 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Cheriton with Zheng and Abali such that the processing unit is a data encoder associated with the cache and configured to:
when data is to be written from the cache to the memory system, encode uncompressed data from the cache for storing in the memory system in a compressed format and send the data in the compressed format to the memory system for storing; and
when data in a compressed format is to be read from the memory system into the cache, decode the compressed data from the memory system and store the data in the cache in an uncompressed format
in order to achieve “savings in memory cost” (id.).
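The encode-on-write and decode-on-read behavior attributed to Abali can be sketched as follows. This is an illustration only: zlib stands in for the hardware compressor, which is not specified here, and the class name CompressingController and its methods are hypothetical. The 1 KB line size is taken from the quoted passage.

```python
import zlib
from typing import Dict

LINE_SIZE = 1024  # 1 KB cache lines, per the quoted passage of Abali


class CompressingController:
    """Hypothetical controller: the cache holds uncompressed lines,
    the memory holds compressed lines."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}
        self.compressed_memory: Dict[int, bytes] = {}

    def write_back(self, address: int) -> None:
        """Encode an uncompressed cache line and store it in memory compressed."""
        line = self.cache[address]
        self.compressed_memory[address] = zlib.compress(line)

    def fill(self, address: int) -> bytes:
        """Decode a compressed line from memory and store it in the cache uncompressed."""
        line = zlib.decompress(self.compressed_memory[address])
        self.cache[address] = line
        return line
```

A line written back and later refilled returns to the cache byte-for-byte identical, while the copy held in memory is smaller whenever the data is compressible.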

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cheriton et al. (“Software-Controlled Caches in the VMP Multiprocessor”) in view of Zheng et al. (“Performance Evaluation of Exclusive Cache Hierarchies”) and Espasa et al. (“Tarantula: A Vector Extension to the Alpha Architecture”).
In regards to claims 10 and 20, Cheriton further teaches the processing unit processing the returned data (“If not, a read cache miss to this area is handled by a read-private bus transaction, eliminating the need to later do an assert-ownership on the first write operation.”, section 5.4, paragraph 2) and writing it back to the memory system (“• write-back - to write the cache page back to main memory, releasing ownership of the page.”, section 3.1, paragraph 2, bullet 4);
the processing unit determines whether to request data relating at least to the one or more of the read requests that returned an indication that the requested data was not stored in the cache from the memory system (“On cache miss, the cache controller signals a processor exception interrupt (bus error) and generates a suggested cache slot to use for the missing cache page.”, section 2, paragraph 3; “Assuming the virtual memory page is present in the main memory, the processor instructs the block copier to copy the required data from main memory into the cache, specifying the cache flags to be assigned to the cache slot if the copy succeeds. … If the required data is not in main memory, the operating system page fault handler is given control.”, section 2, paragraph 4); and
the processing unit processing the combined set of data (“If not, a read cache miss to this area is handled by a read-private bus transaction, eliminating the need to later do an assert-ownership on the first write operation.”, section 5.4, paragraph 2); and writing the processed combined set of data back to the memory system (“• write-back - to write the cache page back to main memory, releasing ownership of the page.”, section 3.1, paragraph 2, bullet 4).
Cheriton in view of Zheng fails to teach the processing unit sending a set of plural read requests to the cache together;
the method further comprising:
when the cache returns the requested data for all of the plurality of read requests:
the processing unit processing the returned data; and
when the cache returns the requested data for only some but not all of the plurality of read requests, and one or more of the read requests returns an indication that the requested data is not stored in the cache:
when it has the data from the memory system, combining the data returned from the cache for the read request(s) for which data was present in the cache, with data from the memory system for the one or more of the read requests that returned an indication that the requested data was not stored in the cache, to provide a combined set of data, and then processing the combined set of data.
Espasa teaches the processing unit sending a set of plural read requests to the cache together (“If properly aligned, the 128 quadwords requested by a stride-1 instruction are contained in exactly 16 cache lines (17 if the base address is not aligned to a cache line boundary).”, page 285, paragraph 6);
the method further comprising:
when the cache returns the requested data for all of the plurality of read requests (“Strides marked with the ‘pump’ bit read out 16 cache lines from the data array just like any other slice does. But, as opposed to normal slices, the sixteen full cache lines are latched into one of the four registers (16x512 bits each) of the PUMP structure.”, page 285, paragraph 7):
the processing unit processing the returned data (“From there, a sequencer in each bank reads two quadwords per cycle and sends them to the Vbox.”, page 285, paragraph 7); and
when the cache returns the requested data for only some but not all of the plurality of read requests, and one or more of the read requests returns an indication that the requested data is not stored in the cache (“The Vbox has sent a read slice to the L2 cache and several of its sixteen addresses miss in the lookup stage.”, page 286, paragraph 7):
when it has the data from the memory system, combining the data returned from the cache for the read request(s) for which data was present in the cache, with data from the memory system for the one or more of the read requests that returned an indication that the requested data was not stored in the cache, to provide a combined set of data, and then processing the combined set of data (“The slice is ‘put to sleep’ in the MAF and a ‘waiting’ bit is set for each of its sixteen addresses that missed. As each individual cache line arrives to the L2 from the system, it searches the MAF for matching addresses. For each matching address, its ‘waiting’ bit is cleared. When all ‘waiting’ bits are clear, the slice wakes up and goes to the Retry Queue (a structure within the L2 cache itself). From there, the slice will retry, walk down the L2 pipe again and lookup the tag array a second time.”, page 286, paragraph 8)
in order to obtain “good speedups over a conventional superscalar processor” (page 281, paragraph 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Cheriton with Zheng and Espasa to include the processing unit sending a set of plural read requests to the cache together;
the method further comprising:
when the cache returns the requested data for all of the plurality of read requests:
the processing unit processing the returned data; and
when the cache returns the requested data for only some but not all of the plurality of read requests, and one or more of the read requests returns an indication that the requested data is not stored in the cache:
when it has the data from the memory system, combining the data returned from the cache for the read request(s) for which data was present in the cache, with data from the memory system for the one or more of the read requests that returned an indication that the requested data was not stored in the cache, to provide a combined set of data, and then processing the combined set of data
in order to obtain “good speedups over a conventional superscalar processor” (id.).
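The combined limitation discussed above, in which plural read requests are sent together and cache hits are merged with memory data for the misses, can be sketched as follows. This is a minimal sketch that tracks the claim language rather than Espasa's MAF and Retry Queue mechanism; the function name read_batch and the use of dictionaries to stand in for the cache and the memory system are assumptions made for illustration.

```python
from typing import Dict, List


def read_batch(
    cache: Dict[int, bytes],
    memory: Dict[int, bytes],
    addresses: List[int],
) -> List[bytes]:
    """Send a set of reads together; combine cache hits with memory data for misses."""
    hits: Dict[int, bytes] = {}
    misses: List[int] = []
    for address in addresses:
        if address in cache:
            hits[address] = cache[address]
        else:
            misses.append(address)  # the cache only reports the miss

    # The requester, not the cache, fetches the missing data from memory.
    fetched = {address: memory[address] for address in misses}

    # Combined set of data, in the original request order.
    return [hits[a] if a in hits else fetched[a] for a in addresses]
```

When every request hits, the fetched set is empty and the returned data comes from the cache alone, which corresponds to the first branch of the limitation.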

Claims 23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Cheriton et al. (“Software-Controlled Caches in the VMP Multiprocessor”) in view of Zheng et al. (“Performance Evaluation of Exclusive Cache Hierarchies”) and Barroso et al. (“Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing”).
In regards to claims 23 and 25, Zheng further teaches that the cache system includes multiple cache levels (“Many desktop microprocessors use an on-chip two-level cache hierarchy.”, section 1, paragraph 1), and the determining of whether the requested data is present in the cache comprises: 
determining whether the requested data is present in the cache to which the read request is made (“In two-level exclusive caching, when a load misses in the L1 and hits in the L2, the contents of L1 and L2 are swapped.”, section 3.2, paragraph 2).
Cheriton in view of Zheng fails to teach, in the event that the data is not present in that cache level, then sending a read request to a lower level of the cache system for the requested data, but not sending a read request for the requested data to a higher level cache of the cache system.  Barroso teaches, in the event that the data is not present in that cache level, then sending a read request to a lower level of the cache system for the requested data, but not sending a read request for the requested data to a higher level cache of the cache system (“Depending on the state at the L2, the L2 can possibly (a) service the request directly, (b) forward the request to a local (owner) L1, (c) forward the request to one of the protocol engines, or (d) obtain the data from memory through the memory controller (only if the home is local).”, section 2.3, paragraph 5) in order to “avoid the use of snooping at L1 caches” (section 2.3, paragraph 2).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Cheriton with Zheng and Barroso such that, in the event that the data is not present in that cache level, then sending a read request to a lower level of the cache system for the requested data, but not sending a read request for the requested data to a higher level cache of the cache system in order to “avoid the use of snooping at L1 caches” (id.).
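The limitation at issue for claims 23 and 25, forwarding a missed read only toward lower cache levels, can be sketched as follows. This is an illustrative model under stated assumptions, not Barroso's protocol: the function name lookup_downward is hypothetical, and index 0 is assumed to be the highest level (closest to the processor) with larger indices denoting lower levels.

```python
from typing import Dict, List, Optional, Tuple


def lookup_downward(
    levels: List[Dict[int, bytes]],
    start: int,
    address: int,
) -> Tuple[Optional[bytes], List[int]]:
    """Probe the level that received the request, then only lower levels.

    Returns the data (or None on a miss everywhere probed) and the list of
    level indices that were actually probed.
    """
    probed: List[int] = []
    for index in range(start, len(levels)):
        probed.append(index)
        if address in levels[index]:
            return levels[index][address], probed
    return None, probed
```

Levels with an index below start are never probed in this sketch, even when they hold the requested address.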

Claims 24 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Cheriton et al. (“Software-Controlled Caches in the VMP Multiprocessor”) in view of Zheng et al. (“Performance Evaluation of Exclusive Cache Hierarchies”) and Stallings (Computer Organization and Architecture: Designing for Performance).
In regards to claims 24 and 26, Cheriton in view of Zheng teaches claims 4 and 14.  Cheriton in view of Zheng fails to teach that, in the event that the data is present in a lower level cache and evicted from the lower level cache to the cache to which the read request was made, invalidating the entry for the data in the lower level cache.  Stallings teaches that, in the event that the data is present in a lower level cache and evicted from the lower level cache to the cache to which the read request was made, invalidating the entry for the data in the lower level cache (“The other processor gains access to the bus, writes the modified cache line back to main memory, and transitions the state of the cache line to invalid (because the initiating processor is going to modify this line).”, page 645, paragraph 1).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Cheriton with Zheng and Stallings such that, in the event that the data is present in a lower level cache and evicted from the lower level cache to the cache to which the read request was made, invalidating the entry for the data in the lower level cache in order to avoid reading an old value.

Response to Arguments
Applicant's arguments filed 1 July 2022 have been fully considered but they are not persuasive.
The Examiner disagrees that Cheriton’s statement “each cache miss results in bus traffic” means that the cache system causes an external memory transaction.  First, the Examiner believes that the processor causes the bus traffic, not the cache system.  Cheriton is clear that, on a cache miss, the cache controller merely signals a processor exception interrupt and suggests a cache slot (section 2, paragraph 3).  From there, the processor is free to do whatever its software directs.  Second, it is unclear from Cheriton’s statement whether it accounts for cache misses that result in segmentation faults.  While it is clear from Cheriton that the common case is that the processor decides to copy data from main memory into the cache, it is unclear that the cache miss handler does this in all situations.  Third, it is unclear from Cheriton’s statement that the bus traffic is necessarily an external memory transaction.  While Cheriton does teach that the central memory is connected to the bus (section 4, paragraph 2, bullet 2), the bus is also used for I/O units (section 4, paragraph 2; section 3.1, paragraph 2).  For these reasons, the Examiner does not believe that Cheriton teaches that the cache system causes external memory transactions.
The Examiner notes that Zheng has only been used to teach invalidating the entry in the cache.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NATHAN SADLER whose telephone number is (571)270-7699. The examiner can normally be reached Monday - Friday 9am - 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached on (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Nathan Sadler/
Primary Examiner, Art Unit 2139
11 July 2022