DETAILED ACTION
The following office action is sent in response to a Request for Continued Examination (RCE) filed January 18, 2021 for application 15/797,527.  
Claims 1 and 11 have been amended. No claims have been added.  Claims 8 and 18 have been cancelled. Thus claims 1-7, 9-17, and 19 have been examined.
Acknowledgement is made of applicant’s claim for foreign priority based on an application filed in People’s Republic of China. Examiner notes the priority documents to CN201710796762.0 have been received by the office.
The objections and rejections from the prior correspondence that are not restated herein are withdrawn.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/18/2021 has been entered.


Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 9, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Biles (Biles et al., US 2009/0216958 A1) in view of Hady (Hady et al., US 2016/0188466 A1).

Regarding claim 1, Biles teaches A processing system (Biles Fig. 1 SOC 2 and supporting para [0030]), comprising a cache; (Biles Fig. 1 and [0030] disclose that the memory system 8 shown in Fig. 1 is a hierarchical memory system comprising cache memory and a main memory as well as off-chip memory/storage.) a host memory (Biles Fig. 1 Memory System 8 and supporting para [0030]. See also Biles Fig. 1 and [0030], [0040], which disclose that the system on the chip contains a general purpose processor 4, which is thus an example of a host, and that the memory system 8 is an example of host memory.); 
a CPU, (Biles Fig. 1 Core 4 and supporting para [0030]) bypassing an accelerator interface to directly access the cache and the host memory (Examiner notes that the instant application does not contain an explicit definition of an accelerator interface.   Consistent with paragraph [0016] of the instant application, the accelerator interface may be any component between the hardware accelerator and other modules in the processing system such as the cache and host memory.  Thus, the control 12 interface between the Core and HW Accelerator,  and generating at least one instruction, (Biles [0031] discloses control signals and task requests, where a control signal and a task request are each an example of an instruction.)
a hardware accelerator  (Biles Fig. 1, HW Accel 6 and supporting paras [0030]) receiving the instruction from the CPU through the accelerator interface (Biles [0031]-[0032] discloses requests, including control signals and process requests, sent between the processor and the accelerator through communication channel 12 and further passes it to the MMU through Multiplexer 18, via communication channel 14, thus the ); 
However, Biles does not explicitly disclose operating in a non-temporal access mode or a temporal access mode according to an access behavior of the instruction received from the CPU through the accelerator interface, wherein the hardware accelerator accesses the host memory through the accelerator interface when the hardware accelerator operates in the non-temporal access mode, wherein the hardware accelerator access the cache through the accelerator interface when the hardware accelerator operates in the temporal access mode, and a monitor unit, monitoring multiple addresses of the access behavior that the hardware accelerator executes according to the instruction to determine whether the hardware accelerator operates in the non-temporal access mode or the temporal access mode.
Hady, of a similar field of endeavor, further teaches operating in a non-temporal access mode ... wherein the hardware accelerator accesses the host memory through the accelerator interface when the hardware accelerator operates in the non-temporal access mode, (Hady [0002] discloses that the cores may bypass the cache (which is an example of non-temporal access mode) and go directly to the main memory; [0016] and [0026] teach that the cache bypassing requests may be handled by the NPU (the HW Accelerator of Biles). See also Hady [0016], which discloses that the NPU cores perform acceleration, and Hady [0003], which discloses that Hady is directed to network and graphics accelerators. See also Biles [0004], which discloses that Biles is directed to video/graphics accelerators. Thus, the NPU cores of Hady would be a component of HW Accel 6 of Biles.)
 or a temporal access mode ... wherein the hardware accelerator access the cache through the accelerator interface when the hardware accelerator operates in the temporal access mode, (Hady discloses accessing memory using the cache (an example of temporal access mode) in para  [0002], [0010], and [0015].  See also Hady [0027] that discloses this may be a mode controlled by the programmer. )
according to an access behavior of the instruction received from the CPU through the accelerator interface, (Hady [0026] discloses the cache bypass mode may be set based on the instruction type, or the memory type.  Hady [0027] discloses that the programmer may direct the system to route commands to the cache or not based on the temporal or spatial locality of the data.   Hady [0019] discloses the cache may be instructed based on an NPU command. Thus, Hady discloses setting the cache mode based on the command instruction type or memory type, which may include bypass directions from the application program as to how that command is to be handled (either cached, or directed to the memory itself).  A command or graphics/video request may be transferred from the CPU through the control 12 interface to the ; 
a monitor unit, (Hady [0023] discloses processor 12 may be a unit that implements cache coherent access to main memory and the cache between the processors (CPU core and NPU cores).  Hady [0019] discloses the NPU cores may set the caching mode via an NPU command, and also discloses the cache (Fig. 1C, 18) may also set the caching mode, all of which may be examples of the monitor unit.)
monitoring multiple addresses of the access behavior that the hardware accelerator executes according to the instruction to determine whether the hardware accelerator operates in the non-temporal access mode or the temporal access mode. (Hady [0019] discloses a plurality of request addresses are monitored to determine the memory type, which is an indication of the caching mode.  Hady [0026] discloses the cacheability of data involved may be based on a Memory Type Range Register (MTTR).  Thus, Hady monitors a range of addresses specified by the MTTR to determine if the command should be in temporal or non-temporal access mode.   Examiner notes that a range implies at least two addresses, thus represents a plurality of addresses.).
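For illustration only, the range-based determination discussed above can be sketched in Python. This is a minimal sketch under stated assumptions and is not taken from Hady, Biles, or the instant application: the names MemoryTypeRange and select_mode, the example range, and the default of cacheable for unmatched addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTypeRange:
    """Hypothetical stand-in for one memory-type range entry."""
    base: int        # first address covered by the range
    size: int        # number of addresses covered
    cacheable: bool  # True: route through the cache, False: bypass it

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def select_mode(address, ranges, default_cacheable=True):
    """Return the access mode implied by the range that holds the address.

    Assumption: addresses outside every range fall back to the default.
    """
    for r in ranges:
        if r.contains(address):
            return "temporal" if r.cacheable else "non-temporal"
    return "temporal" if default_cacheable else "non-temporal"


# Hypothetical configuration: one uncacheable 4 KiB region at 0x1000.
RANGES = [MemoryTypeRange(base=0x1000, size=0x1000, cacheable=False)]
```

Under these assumptions, every address inside the configured range maps to the bypass (non-temporal) mode, which is the sense in which a range covers a plurality of addresses.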
Biles and Hady are in a similar field of endeavor as both relate to caching memory, and more specifically to caching memory using a hardware accelerator.  Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the caching modes of Hady, including monitoring memory addresses and prefetching memory into the cache, into the solution of Biles.  One would be motivated to 
The reasons for obviousness to combine Hady into Biles for claims 2-7, and 9-10 are the same as those presented for claim 1. 

Regarding claim 2, the combination of Biles and Hady teaches all of the limitations of claim 1 above. 
Hady further teaches wherein the instruction comprises a section configured to inform the hardware accelerator to operate in the non-temporal access mode or the temporal access mode (Hady [0026] discloses the cacheability of data (the caching policy) of the instruction may be determined based on the instruction type (e.g. an instruction that specifies a non-cached transaction), thus the instruction type of Hady is an example of an instruction that comprises a section (the instruction type) configured to inform the hardware accelerator to operate in a given caching mode (non-temporal or temporal).   Hady [0019] discloses the NPU cores may set the caching mode via an NPU command, thus the section of the NPU command instructing the cache to operate in a specified cache mode/policy is an example of a section configured to inform the hardware accelerator.)

Regarding claim 3, the combination of Biles and Hady teaches all of the limitations of claim 1 above.
Hady further teaches wherein the hardware accelerator further comprises: a data access unit, accessing data through the accelerator interface according to the instruction and operating in the non-temporal access mode or the temporal access mode (See Hady [0026] and [0027] that discloses the instruction type, memory type, or programmer direction may determine the caching mode for a given request.   Hady [0019] discloses the NPU cores may set the caching mode via a NPU command.    Thus, the solution of Biles in view of Hady may offload an instruction from Core 4 of Biles to HW Accel 6 of Biles and that instruction may contain a caching policy direction via an instruction type, memory type, or programmer direction that is received through control interface 12 and/or the MMU 10 of Biles. The accelerator operates in the cache mode or bypass mode per the instruction.)
according to a mode signal (Examiner notes that the instant application does not contain an explicit definition of a mode signal.  Using broadest reasonable interpretation, any software field or hardware signal that indicates an appropriate cache mode for a request is an example of a mode signal.   See Hady [0026] and [0027] that discloses the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request, thus any of these three fields are examples of mode signal that directs the solution of Biles in view of Hady to process the request in the appropriate caching (temporal) or bypass (non-temporal) mode.     See also Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’    Thus, the NPU command contains a field to instruct the cache as to the appropriate cache mode and is also an example of a mode signal.)
and a monitor unit (Hady [0023] and [0019] that discloses the CPU, NPU, and/or cache 14 may monitor the request to determine the cache policy), monitoring the access behavior that the data access unit operates according to the instruction to generate the mode signal (See Hady [0026] and [0027] that discloses the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request, thus any of these three fields are examples of access behavior monitored that directs the solution of Biles in view of Hady to process the request in the appropriate caching (temporal) or bypass (non-temporal) mode, where the instruction for the caching mode is an example of a mode signal.  Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’  Thus, the NPU command instructing the cache on the appropriate cache mode is an example of a mode signal.).

Regarding claim 4, the combination of Biles and Hady teaches all of the limitations of claim 3 above.   
Hady further teaches wherein the hardware accelerator further comprises: a control unit (Hady [0020] ‘NPU cores 20a, 20b, ..., 20k’), receiving the instruction to generate a first control signal and a second control signal (Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’  Where the first control signal is to operate in either the temporal mode or non-temporal mode, and the second control signal is to operate in the alternate mode (non-temporal or temporal mode respectively depending on how the first control signal is defined)); and 
an execution unit, executing a logical calculation on the data according to a second control signal (Hady [0019] ‘FIGS. 1B-1C show further details of the bus-based cache architecture, according to exemplary embodiments. In FIG. 1B, each of the NPU cores 22 includes NPU core translation logic (NPU-TL) 30 and the CPU core 24 includes CPU core translation logic (CPU-TL) 32. The translation logic 30, 32 translates core-specific memory transactions (such as reads and writes) into core-independent memory transactions that will appear on the bus 26 and that are comprehended by the shared cache 18 without regard for the type of core that initiated them.’), 
wherein the data access unit accesses the data through the accelerator interface according to the first control signal (Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’).  


Regarding claim 9, the combination of Biles and Hady teaches all of the limitations of claim 1 above.  Biles further discloses wherein the monitor is placed inside of the accelerator interface (Biles [0030] and [0033] disclose that the shared memory management unit 10 serves to perform memory management operations upon memory accesses to the memory system 8, where the selection of a cache mode (to the cache or bypassing the cache) is an example of a memory management operation since it directs the memory how to process the request.   Thus, the solution of Biles in view of Hady may also monitor and manage the instruction from within the MMU, which is an example of an accelerator interface, which would provide efficiency since 


Regarding claim 11, Biles teaches An access method (Biles, claim 19, discloses a method of accessing data values of a memory system),
adapted in a hardware accelerator (Biles Fig. 1, HW Accel 6 and supporting para [0030]), 
wherein a CPU (Biles Fig. 1 Core 4 and supporting para [0030]) bypasses an accelerator interface to directly access a cache and a host memory (Biles Fig. 1, Core 4 and supporting paras [0030]-[0032], which has a direct connection to the Memory system via multiplexers 16 and 18.   Note that this bypasses the HW Accel 6, and thus bypasses both the hardware accelerator and the hardware accelerator interface.)  and generates at least one instruction, (Biles [0031] discloses control signals, where a control signal is an example of an instruction.)
comprising: receiving, by the hardware accelerator through the accelerator interface, the instruction from the CPU (Biles [0031] discloses control signals, where a control signal is an example of an instruction, sent between the processor (an example of a CPU) and the accelerator through communication channel 12, where the receiving component of HW Accel 6 is an example of an accelerator interface.)
the instruction received from the CPU through the accelerator interface (Biles [0031] discloses control signals, where a control signal is an example of an instruction, sent between ,
However, Biles does not explicitly disclose operating, by the hardware accelerator,  in a non-temporal access mode or a temporal access mode according to an access mode of the instruction when operating in the non-temporal access mode, accessing the host memory through the accelerator interface; and when operating in the temporal access mode, accessing the cache through the  accelerator interface, wherein the step of operating in the non-temporal access mode or the temporal access mode according to the access behavior of the instruction further comprises: monitoring multiple address of the access behavior that the hardware accelerator executes according to the instruction to determine whether the hardware accelerator operates in the non-temporal access mode or the temporal access mode;
Hady, of a similar field of endeavor, further teaches operating, by the hardware accelerator,  in a non-temporal access mode  (Hady [0026] ‘With this feature at least one of the heterogeneous processor cores, e.g., the NPU core, is capable of generating reads and writes to the main memory 14 that bypass the shared cache 16’.   See also Hady [0016], which discloses that the NPU cores perform acceleration, for example providing networking algorithms like Intrusion Detection, Firewalling, Secure Sockets Layer acceleration.    Thus, the NPU cores of Hady would be a component of HW Accel 6 of Biles, operating in a non-temporal access mode, which accesses the memory through the accelerator interface.) 
or a temporal access mode according to an access mode of the instruction (Hady [0002] ‘Caches capitalize on this locality by fetching data from main memory in larger chunks than requested (spatial locality) and holding onto the data for a period of time even after the processor has used that data (temporal locality).’ Thus, the NPU cores of Hady would be a component of HW Accel 6 of Biles and operating in a temporal access mode which is accessing the cache through the accelerator interface.   See also Hady [0027] that discloses this may be a mode controlled by the programmer, thus would be provided via an instruction from the programmer.)
when operating in the non-temporal access mode, accessing the host memory through the accelerator interface (Hady [0026] discloses a non-temporal access mode, the accelerator (including the accelerator interface) accessing the memory directly); and 
when operating in the temporal access mode, accessing the cache through the  accelerator interface (Hady [0002] discloses a temporal access mode, accessing the cache through the accelerator interface. ) 
wherein the step of operating in the non-temporal access mode or the temporal access mode according to the access behavior of the instruction (Hady [0026] ‘Cacheability of data involved in a memory transfer may be determined based on instruction type (e.g., an instruction that specifies a non-cached transaction) and/or based on memory type, e.g., as specified in a Memory Type Range Register (MTTR). With this feature at least one of the heterogeneous processor cores, e.g., the NPU core, is capable of generating reads and writes to the main memory 14 that bypass the shared cache 16 in the event of a cache miss.’ See also Hady [0027] ‘For example, some special purpose processors, such as network processors, the programmer knows to have very poor temporal and spatial locality. The same may be true for some accesses by the general-purpose processor.  Accesses that the programmer knows will not hit cache can be routed around that cache, increasing the cache hit rate for other accesses.’   Thus, Hady discloses setting the cache mode based on an NPU command, which may include caching instructions for a given read or write command as to how that command is to be handled (either cached, or directed to the memory itself), set by the programmer based on the temporal and spatial locality (a form of behavior).) 
further comprises: monitoring multiple address of the access behavior that the hardware accelerator executes according to the instruction to determine whether the hardware accelerator operates in the non-temporal access mode or the temporal access mode( Hady [0026] ‘Cacheability of data involved in a memory transfer may be determined based on memory type, e.g., as specified in a Memory Type Range Register (MTTR). With this feature at least one of the heterogeneous processor cores, e.g., the NPU core, is capable of generating reads and writes to the main memory 14 that bypass the shared cache 16 in the event of a cache miss.’  Thus, Hady monitors a range of addresses specified by the MTTR to determine if the command should be in temporal or non-temporal access mode.   Examiner notes that a range implies at least two addresses, thus represents a plurality of addresses.);
The reasons for obviousness regarding claims 12-17 and 19 are the same as those presented for claim 11. 

Regarding claim 12, the combination of Biles and Hady teaches all of the limitations of claim 11 above.
Hady further teaches wherein the instruction comprises a section, wherein the step of operating in the non-temporal access mode or the temporal access mode according to the access behavior of the instruction further comprises: (See Hady [0026] and [0027], which disclose that the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request.  See also Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’    Thus, the NPU command contains a field to instruct the cache as to the appropriate cache mode, which the cache retrieves when it receives the NPU command to set the caching policy.)
retrieving the section of the instruction to operate in the non-temporal access mode or the temporal access mode (See Hady [0026], [0027], and [0019], where an NPU may issue an NPU command to the cache instructing it on how to handle the request, and the cache retrieves the instruction from the NPU command (an example of a section of the instruction to operate in the non-temporal access mode or the temporal access mode) so that it may handle the request appropriately (to the cache, or bypassing the cache and accessing the memory directly).).

Regarding claim 13, the combination of Biles and Hady teaches all of the limitations of claim 11 above.
Hady further teaches wherein the step of operating in the non-temporal access mode or the temporal access mode according to the access behavior of the instruction further comprises: monitoring an access behavior of a data access unit to generate a mode signal; (See Hady [0026] and [0027], which disclose that the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request, thus any of these three fields are examples of access behavior monitored that directs the solution of Biles in view of Hady to process the request in the appropriate caching (temporal) or bypass (non-temporal) mode, where the instruction for the caching mode is an example of a mode signal.  Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’  Thus, the NPU command instructing the cache on the appropriate cache policy/mode is an example of a generated mode signal directing the cache to process the request in the host memory (non-temporal access mode) or the cache (temporal access mode).)
and operating the data access unit in the non-temporal access mode or the temporal access mode according to the mode signal. (Hady [0026], [0027], [0019] discloses an NPU generating the NPU command instructing the cache on the appropriate cache policy/mode and where the cache receives the NPU command instructing it to operate according to a given cache mode/policy and implements the policy according to the instruction). 

Regarding claim 14, the combination of Biles and Hady teaches all of the limitations of claim 13 above.
Hady further teaches wherein the step of operating in the non-temporal access mode or the temporal access mode according to the access behavior of the instruction further comprises: receiving the instruction to generate a first control signal and a second control signal (Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’  Where the first control signal is to operate in either the temporal mode or non-temporal mode, and the second control signal is to operate in the alternate mode (non-temporal or temporal mode respectively).  See also Hady [0026] ‘Cacheability of data involved in a memory transfer may be determined based on instruction type (e.g., an instruction that specifies a non-cached transaction) and/or based on memory type, e.g., as specified in a Memory Type Range Register (MTTR). With this feature at least one of the heterogeneous processor cores, e.g., the NPU core, is capable of generating reads and writes to the main memory 14 that bypass the shared cache 16 in the event of a cache miss.’); 
accessing, using the data access unit, data through the accelerator interface according to the first control signal (Hady [0019] demonstrates that an NPU instruction may be executed to determine whether a temporal or non-temporal instruction is to be executed (the mode selected according to the first control signal)); and
executing a logical calculation on the data according to the second control signal (Hady [0019] ‘FIGS. 1 B-1C show further details of the bus-based cache architecture, according to exemplary embodiments. In FIG. 1B, each of the NPU cores 22 includes NPU core translation logic (NPU-TL)30 and the CPU core 24 includes CPU core translation logic (CPU-TL) 32. The translation logic 30, 32 translates core-specific memory transactions (such as reads and writes) into core-independent memory transactions that will appear on the bus 26 and that are comprehended by the shared cache 18 without regard for the type of core that initiated them.’)


Claims 5-7 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Biles in view of Hady and further in view of Akin (Akin et al., US 2018/0285279 A1).

Regarding claim 5, the combination of Biles and Hady teaches all of the limitations of claim 3 above.    
Hady further teaches wherein the monitor unit records a previously-accessed address that the data access unit previously accessed according to a previous instruction and determines a difference value between a currently-accessed address that the data access unit is currently accessing according to a current instruction and the previously-accessed address (Hady [0002] discloses that caches capitalize on locality by fetching data from main memory in larger chunks than requests (spatial locality) and holding onto the data for a period of time even after the processor has used that data (temporal locality).   This prefetching of data, including data outside the range of the request, is maintained so that future requests may be serviced from this cached data if the future request is within the address range of the prefetched data.  The previously accessed address would be the address portion of the prefetched data that was previously requested and maintained after the data was used 
the monitor unit  (Hady [0023] and [0019]) generates the mode signal, so that the data access unit operates in the non-temporal access mode... the monitor unit generates the mode signal, so that the data access unit operates in the temporal access mode, (Hady [0026] and [0027] that discloses the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request, thus any of these three fields are examples of mode signal.   See also Hady [0019] that discloses a NPU command contains a field to instruct the cache as to the appropriate cache mode and is also an example of a mode signal.).
However, the combination of Biles and Hady does not explicitly teach wherein when the difference value exceeds the predetermined length ... data access unit operates in the non-temporal access mode,  wherein when the difference values does not exceed the predetermined length, the monitor unit .. operates in the temporal access mode.
Akin, of a similar field of endeavor, further teaches wherein when the difference value exceeds the predetermined length ... data access unit operates in the non-temporal access mode,  wherein when the difference values does not exceed the predetermined length, the monitor unit .. operates in the temporal access mode (Akin [0015] discloses that data that exhibits locality is written to the cache, and data that exhibits non-locality is written to main memory.  (Akin [0062] 
Biles, Hady, and Akin are in a similar field of endeavor as all relate to caching memory, and to using caching policies that direct a memory request to the cache or bypass the cache and go directly to main memory.  Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the spatial and temporal threshold limit for a plurality of sub-regions of a cache line into the solution of Biles and Hady that also implements spatial and temporal caching policies.   One would be motivated to do so in order to (Akin [0035] and [0062]) exploit locality while simplifying cache design.
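For illustration only, the difference-value limitation of claim 5 as characterized above can be sketched in Python. This is a minimal sketch under stated assumptions and does not reproduce any implementation from the instant application or the cited references: the class name DifferenceMonitor, the use of an absolute difference, and the choice of the temporal mode when no previous address exists are all hypothetical.

```python
class DifferenceMonitor:
    """Hypothetical monitor comparing consecutive accessed addresses."""

    def __init__(self, predetermined_length):
        self.predetermined_length = predetermined_length
        self.previous_address = None  # no access recorded yet

    def observe(self, current_address):
        """Record the access and return the mode it implies."""
        if self.previous_address is None:
            # Assumption: with no history, default to the temporal mode.
            mode = "temporal"
        else:
            # Assumption: the difference value is taken as an absolute value.
            difference = abs(current_address - self.previous_address)
            if difference > self.predetermined_length:
                mode = "non-temporal"  # far from the last access: bypass the cache
            else:
                mode = "temporal"      # near the last access: use the cache
        self.previous_address = current_address
        return mode
```

In this sketch the returned string plays the role of the mode signal; a difference exactly equal to the predetermined length does not exceed it and therefore selects the temporal mode.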

Regarding claim 6, the combination of Biles and Hady teaches all of the limitations of claim 3 above.    
Hady further teaches wherein the monitor unit records an initial address that the data access unit initially accessed according to an initial instruction and monitors a predetermined range from the initial address (Hady [0002] discloses that caches capitalize on locality by fetching data from main memory in larger chunks than requests (spatial locality) and holding onto the data for a period of time even after the processor has used that data (temporal locality).   This prefetching of data, including data outside the range of the request, is 
However, the combination of Biles and Hady does not explicitly teach wherein the monitor unit divides the predetermined range into a plurality of sub-regions and counts an access number of times of each sub-region, wherein when a currently-accessed address that the data access unit is currently accessing according to a current instruction exceeds a sum of the initial address and the predetermined range, the monitor unit counts a number of the sub-regions that the access number of times of each sub-region exceeds a first threshold, wherein when the number of the sub-regions that the access number of times of each sub- region exceeds the first threshold does not exceed a second threshold, the monitor unit generates the mode signal, so that the data access unit operates in the non-temporal access mode.
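For illustration only, the sub-region counting recited in this limitation can be sketched in Python. This is a minimal sketch under stated assumptions and is not drawn from the instant application, Akin, or any other reference of record: the class name SubRegionMonitor, equal-sized sub-regions, ignoring addresses below the initial address, and returning the temporal mode when the second threshold is exceeded are all hypothetical.

```python
class SubRegionMonitor:
    """Hypothetical monitor counting accesses per sub-region of a range."""

    def __init__(self, initial_address, predetermined_range, num_sub_regions,
                 first_threshold, second_threshold):
        if predetermined_range % num_sub_regions != 0:
            raise ValueError("range must divide evenly into sub-regions")
        self.initial_address = initial_address
        self.end = initial_address + predetermined_range
        self.sub_region_size = predetermined_range // num_sub_regions
        self.counts = [0] * num_sub_regions
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold

    def observe(self, address):
        """Count an in-range access, or decide the mode once the range is exceeded.

        Returns None while the access stays within the monitored range.
        """
        if address > self.end:
            # Number of sub-regions whose access count exceeds the first threshold.
            hot = sum(1 for c in self.counts if c > self.first_threshold)
            if hot <= self.second_threshold:
                return "non-temporal"
            # Assumption: the complementary case selects the temporal mode.
            return "temporal"
        if self.initial_address <= address < self.end:
            index = (address - self.initial_address) // self.sub_region_size
            self.counts[index] += 1
        return None
```

With a 64-byte range split into eight sub-regions, a first threshold of 1, and a second threshold of 2, two repeatedly accessed sub-regions leave the sketch in the non-temporal mode, while three such sub-regions move it to the temporal mode.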
Akin, of a similar field of endeavor, further teaches wherein the monitor unit divides the predetermined range into a plurality of sub-regions (Akin [0002] ‘newer memory interfaces provide fine-gran memory access capabilities, namely memory access less than a given memory line or cache line width’.  Akin [0036] ‘In embodiments, processor 202 includes a addresses of fine-grained access in this buffer’. Multiple sub-regions may be tracked in the SAB.  Akin [0064] ‘If a full address (e.g., cache line address and block offset) is provided to the memory controller, then the SAB can be used to track both spatial and temporal locality. For example, assuming 8 byte (B) tracking granularity, for each 64B cache line, a SAB entry may include 8 confidence counters (note that a confidence counter can be as low as 1’.  Thus, a 64B region is divided into 8 B regions, each which is tracked by a separate counter in the SAB.) and  
counts an access number of times of each sub-region (Akin [0064] ‘For each cache line, the sum of all counters may be combined to give the temporal locality confidence, and adjacent counters with non-zero confidence values may indicate spatial locality confidence for the bypassed addresses.’  Thus, Akin counts the accesses of each sub-region, as well as sums the counts of each sub-region to generate a region count.),
wherein when a currently accessed address that the data access unit is currently accessing according to a current instruction exceeds a sum of the initial address and the predetermined range (Akin [0036] ‘When a NL load reaches the memory controller, the memory controller looks up the buffer.  A hit to a corresponding entry in SAB 245 implies locality (such as where data previously bypassed caches, but the line is now re-referenced).’  Where the initial address is the address of the first sub-region in the SAB and the predetermined range is the address size described by the SAB (64B in the above example).), 
the monitor unit counts a number of the sub-regions that the access number of times of each sub-region exceeds a first threshold (Akin [0046] ‘For each cache line, the sum of all counters may be combined to give the temporal locality confidence, and adjacent counters with non-zero confidence values may indicate spatial locality confidence for the bypassed addresses. The threshold for promoting an access to a regular full memory access is determined using these temporal/spatial confidence counters. In the simplest version, tracking granularity is matched to the cache line size and a single one-bit counter is used.  Hence, in this simplest configuration, an access is promoted to a full access after the first hit in the SAB (either temporal hit to the same chunk or a spatial hit to another chunk in the same cache line)’.),
wherein when the number of the sub-regions that the access number of times of each sub-region exceeds the first threshold does not exceed a second threshold (Akin [0046] ‘Hence, in this simplest configuration, an access is promoted to a full access after the first hit in the SAB (either temporal hit to the same chunk or a spatial hit to another chunk in the same cache line)’.  In this simplest configuration, the first threshold is 0 since the system sends the message in non-locality mode, and the second threshold is 2 accesses.  Note the first access resulted in the creation of the SAB entry and the count is 1, but the system remains in non-locality mode, and the second access results in a hit and a counter of the present value being > 1, which is added to the next read for an access count of 2, resulting in the system moving to locality mode.),
the monitor unit generates the mode signal (Akin [0031] ‘Referring still to FIG. 2, processor 202 includes a decode unit or decoder 230.  Decode unit 230 may receive and decode no-locality hint memory access instructions 214.  Decode unit 230 may output one or more .. control signals, or other relatively lower-level instructions or control signals that reflect, represent, and/or are derived from the no-locality hint memory access instructions.‘, thus each  
so that the data access unit operates in the non-temporal access mode (Akin [0046] ‘In the simplest version, tracking granularity is matched to the cache line size and a single one bit counter is used. Hence, in this simplest configuration, an access is promoted to a full access after the first hit in the SAB (either temporal hit to the same chunk or a spatial hit to another chunk in the same cache line).’  In this simple example the first threshold is equal to the value 0 (the system sends the message in non-temporal mode) and the second threshold is 2 accesses since any single access to accesses.
The motivation to combine Akin into the combination of Biles and Hady is the same as set forth in claim 5 above.
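
For illustration only, the per-chunk tracking quoted from Akin [0064] (a 64B cache line tracked at 8B granularity, one confidence counter per chunk) can be sketched as below. This is a minimal sketch and not Akin's implementation: the class name SabEntry, its method names, and the way spatial confidence is computed (counting adjacent pairs of non-zero counters) are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64    # cache line size used in the Akin [0064] example
CHUNK_BYTES = 8    # tracking granularity used in the same example
CHUNKS = LINE_BYTES // CHUNK_BYTES  # 8 counters per cache line


@dataclass
class SabEntry:
    """Hypothetical model of one SAB entry: one counter per 8B chunk."""
    counters: list = field(default_factory=lambda: [0] * CHUNKS)

    def record(self, block_offset: int) -> None:
        """Count one access to the chunk containing block_offset."""
        if not 0 <= block_offset < LINE_BYTES:
            raise ValueError("block offset outside the cache line")
        self.counters[block_offset // CHUNK_BYTES] += 1

    def temporal_confidence(self) -> int:
        """Sum of all counters for the line, per the quoted passage."""
        return sum(self.counters)

    def spatial_confidence(self) -> int:
        """Assumed metric: number of adjacent chunk pairs that are both non-zero."""
        pairs = zip(self.counters, self.counters[1:])
        return sum(1 for a, b in pairs if a and b)


entry = SabEntry()
for offset in (0, 3, 8):   # two accesses to chunk 0, one to chunk 1
    entry.record(offset)
```

With the three accesses shown, chunk 0 is counted twice and chunk 1 once, so the summed (temporal) value is 3 and one adjacent pair of chunks is non-zero.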


Regarding claim 7, the combination of Biles, Hady, and Akin teaches all of the limitations of claim 6 above.
Akin further teaches wherein when the number of the sub-regions that the access number of times of each sub-region exceeds the first threshold exceeds the second threshold, the monitor unit generates the mode signal so that the data access unit operates in the temporal access mode (Akin [0046] ‘For each cache line, the sum of all counters may be combined to give the temporal locality confidence, and adjacent counters with non-zero confidence values may indicate spatial locality confidence for the bypassed addresses. The threshold for promoting an access to a regular full memory access is determined using these temporal/spatial confidence counters.’  Akin [0046] ‘Hence, in this simplest configuration, an access is promoted to a full access after the first hit in the SAB (either temporal hit to the same chunk or a spatial hit to another chunk in the same cache line)’.  Thus, the system of Akin operates in temporal mode upon the second reference to the same sub-region.).
The motivation to combine Akin into the existing combination is the same as set forth in claim 6 above.  
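
For illustration only, the two-threshold decision recited in claims 6 and 7 can be sketched as below. This is a minimal sketch of the claim language as recited above, not of any cited reference: the class name AccessMonitor, its parameter names, the equal-sized sub-regions, and the choice to ignore addresses below the initial address are assumptions made for the example.

```python
from typing import Optional


class AccessMonitor:
    """Hypothetical monitor unit following the claim 6 and claim 7 wording."""

    def __init__(self, initial_address: int, range_bytes: int,
                 sub_regions: int, first_threshold: int,
                 second_threshold: int) -> None:
        self.initial_address = initial_address
        self.range_bytes = range_bytes
        self.sub_size = range_bytes // sub_regions  # assumed equal-sized sub-regions
        self.counts = [0] * sub_regions             # access count per sub-region
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold

    def observe(self, address: int) -> Optional[str]:
        """Record one access; return a mode once the address leaves the range."""
        if address > self.initial_address + self.range_bytes:
            # Count the sub-regions whose access count exceeds the first threshold.
            hot = sum(1 for c in self.counts if c > self.first_threshold)
            # Claim 7: exceeds the second threshold -> temporal access mode.
            # Claim 6: does not exceed it -> non-temporal access mode.
            return "temporal" if hot > self.second_threshold else "non-temporal"
        offset = address - self.initial_address
        if 0 <= offset < self.range_bytes:
            self.counts[offset // self.sub_size] += 1
        return None  # still inside the monitored range, no mode signal yet


monitor = AccessMonitor(0x1000, 64, 8, first_threshold=1, second_threshold=2)
for addr in (0x1000, 0x1000, 0x1008, 0x1008, 0x1010):
    monitor.observe(addr)
mode = monitor.observe(0x1041)  # 0x1041 exceeds 0x1000 + 64
```

In this usage only two sub-regions are accessed more than once, which does not exceed the second threshold of 2, so the sketch selects the non-temporal access mode.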


Regarding claim 15, the combination of Biles and Hady teaches all of the limitations of claim 13 above.
Hady further teaches wherein the step of monitoring the access behavior of the data access unit to generate the mode signal (See Hady [0026] and [0027] that discloses the instruction type, memory type, or programmer direction may determine the caching mode/policy for a given request, thus any of these three fields are examples of access behavior monitored that directs the solution of Biles in view of Hady to process the request in the appropriate caching (temporal) or bypass (non-temporal) mode, where the instruction for the caching mode is an example of a mode signal.  Hady [0019] ‘For example, in CPU applications a cache typically uses addresses to determine memory type, but in NPU applications the cache may be instructed as to memory type by the NPU command.’  Thus the NPU command instructing the cache on the appropriate cache policy/mode is an example of a generated mode signal.).
The remainder of claim 15 recites limitations described in claim 5 above, and thus are rejected on the teachings and rationale as described in claim 5 above. 


Regarding claim 16, the combination of Biles and Hady teaches all of the limitations of claim 13 above.
Hady further teaches wherein the step of monitoring the access behavior of the data access unit to generate the mode signal further comprises: (See Hady [0026], [0027], and [0019] as detailed in claim 15 above).
The remainder of claim 16 recites limitations described in claim 6 above, and thus is rejected on the teachings and rationale as described in claim 6 above.

Regarding claim 17, the combination of Biles, Hady, and Akin teaches all of the limitations of claim 16 above.
Hady further teaches wherein the step of monitoring the access behavior of the data access unit to generate the mode signal further comprises: (See Hady [0026], [0027], and [0019] as detailed in claim 15 above).
The remainder of claim 17 recites limitations described in claim 7 above, and thus is rejected on the teachings and rationale as described in claim 7 above.



Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Biles in view of Hady and further in view of Gaur (Gaur et al., US 2016/0179387 A1).

Regarding claim 10, the combination of Biles and Hady teaches all of the limitations of claim 1 above.  However, the combination does not explicitly teach wherein when operating in the non-temporal access mode, the hardware accelerator directly writes writing data of the instruction to the host memory through the accelerator interface and invalidates data corresponding to the same address of the writing data in the cache.
Gaur, of a similar field of endeavor, further teaches wherein when operating in the non-temporal access mode, the hardware accelerator directly writes writing data of the instruction to the host memory through the accelerator interface and invalidates data corresponding to the same address of the writing data in the cache (Gaur [0174] ‘In yet another embodiment, if MMU 1814 determines that L4 Cache 1908 write bandwidth is limited (as shown at (C)), MMU 1814 may dynamically partition requests to distribute loads to or from system memory 1910 to take advantage of available bandwidth therein and lessen demands upon L4 Cache 1908 write bandwidth. For example, MMU 1814 may perform a write-bypass operation. Such an operation may be illustrated in FIG. 20B. The write to L4 Cache 1908 may be skipped or delayed and the data written directly to system memory 1910. If the write to L4 Cache 1908 is skipped, then MMU 1814 may invalidate the copy of the respective line in L4 Cache 1908 (if it is present) to maintain cache coherency and correctness.’).
Biles, Hady, and Gaur are in a similar field of endeavor as all three relate to caching memory, and more specifically to caching memory with a bypass mode.  Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the cache block invalidation as described by Gaur into the cache system of Biles and Hady.  One would be motivated to do so in order to (Gaur [0174]) ‘maintain cache coherency and correctness’ as otherwise the cached memory would be out of date when the data write bypasses the cache.

Regarding claim 19, the combination of Biles and Hady teaches all of the limitations of claim 11 above.  However, the combination does not explicitly teach wherein when operating in the non-temporal access mode, the hardware accelerator directly writes writing data of the instruction to the host memory through the accelerator interface and invalidates data corresponding to the same address of the writing data in the cache.
Gaur, of a similar field of endeavor, further teaches wherein when operating in the non-temporal access mode, the hardware accelerator directly writes writing data of the instruction to the host memory through the accelerator interface and invalidates data corresponding to the same address of the writing data in the cache (Gaur [0174] ‘In yet another embodiment, if MMU 1814 determines that L4 Cache 1908 write bandwidth is limited (as shown at (C)), MMU 1814 may dynamically partition requests to distribute loads to or from system memory 1910 to take advantage of available bandwidth therein and lessen demands upon L4 Cache 1908 write bandwidth. For example, MMU 1814 may perform a write-bypass operation. Such an operation may be illustrated in FIG. 20B. The write to L4 Cache 1908 may be skipped or delayed and the data written directly to system memory 1910. If the write to L4 Cache 1908 is skipped, then MMU 1814 may invalidate the copy of the respective line in L4 Cache 1908 (if it is present) to maintain cache coherency and correctness.’).
Biles, Hady, and Gaur are in a similar field of endeavor as all three relate to caching memory, and more specifically to caching memory with a bypass mode.  Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the cache block invalidation as described by Gaur into the cache system of Biles and Hady.  One would be motivated to do so in order to (Gaur [0174]) ‘maintain cache coherency and correctness’ as otherwise the cached memory would be out of date when the data write bypasses the cache.
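
For illustration only, the write-bypass behavior quoted from Gaur [0174] (write directly to system memory, then invalidate any cached copy of the same line) can be sketched as below. This is a minimal sketch under stated assumptions and not Gaur's implementation: the class name BypassingCache, the dictionary-based cache and memory, and the write-through policy used for the non-bypass path are hypothetical.

```python
class BypassingCache:
    """Hypothetical cache model with an optional write-bypass path."""

    def __init__(self) -> None:
        self.lines = {}    # address -> cached value
        self.memory = {}   # stand-in for system (host) memory

    def write(self, address: int, value: int, bypass: bool) -> None:
        if bypass:
            # Write-bypass: data goes directly to memory, skipping the cache.
            self.memory[address] = value
            # Invalidate the cached copy, if present, so no stale data remains.
            self.lines.pop(address, None)
        else:
            # Assumed write-through policy, chosen only to keep the sketch short.
            self.lines[address] = value
            self.memory[address] = value

    def read(self, address: int) -> int:
        if address in self.lines:
            return self.lines[address]      # cache hit
        value = self.memory[address]        # cache miss: fetch from memory
        self.lines[address] = value
        return value


cache = BypassingCache()
cache.write(0x10, 1, bypass=False)  # cached copy holds 1
cache.write(0x10, 2, bypass=True)   # memory holds 2, cached copy invalidated
latest = cache.read(0x10)           # misses in the cache and refetches from memory
```

Without the invalidation step, the read after the bypassed write would hit the stale cached value, which is the coherency concern identified in the quoted passage.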





Response to Remarks
Examiner thanks applicant for their remarks of December 3, 2020 and their Request for Continued Examination of January 18, 2021.  They have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of newly cited Biles US 2009/0216958 A1 to address the newly amended limitations.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANICE M. GIROUARD whose telephone number is (469)295-9131.  The examiner can normally be reached on M-F 9:30 - 7:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached on 571-270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/J.M.G./Examiner, Art Unit 2138
/William E. Baughman/Primary Examiner, Art Unit 2138