DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s response and amendment filed on 3/4/2021 have been fully considered. Claims 4 and 18 have been canceled. Claims 7 and 8 are allowable over the art of record. Dependent claim 20, as indicated in the previous action, includes allowable subject matter. Claims 21 and 22 are newly added and are grouped with the rejection of claims 12 and 13.
Applicant’s claim interpretation and citations of the specification are considered by examiner. Examiner appreciates the specific details of the specification provided by applicant, which may not necessarily be in the claim. "Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment." Superguide Corp. v. DirecTV Enterprises, Inc., 358 F.3d 870, 875, 69 USPQ2d 1865, 1868 (Fed. Cir. 2004). Applicant is also reminded that unclaimed features cannot be used to overcome the prior art (e.g. see In re Lundberg & Zuschlag, 113 USPQ 530, 534 (CCPA 1957)).
The rejection of claims 1-3, 5, 6, 14-17, and 19 under 35 U.S.C. 102(a)(1) as being anticipated by Kim et al. (20120137074) has been maintained. The rejection of claims 1, 2, 9, 10, 11, 12, 13, 14, 15, 16 on nonstatutory double patenting over the copending application 
This action addresses all claims, including the amended and new claims, and provides examiner’s response to applicant’s remarks.
In the remarks, applicant argued that:
a) Kim's stream buffer copy instruction does not specify whether to preload the level of the cache memory for a subsequent read or a subsequent write as recited.
b) Claims 9-13 depend from and further limit independent claim 1, and as explained above, cited portions of Kim do not teach all of the elements of amended claim 1. The additional cited references of Collard, Peir, and Varadarajan do not remedy these deficiencies and are not cited for this purpose.
Examiner’s Response
As to applicant’s remark a) above, applicant’s remark is referring to the newly amended feature of: 
“receiving an instruction that specifies a base address, a data size, a level of a cache memory to operate on, and whether to preload the level of the cache memory for a subsequent read or a subsequent write;”  (see amended claim 1)
Kim, as set forth in the Non-Final action, already teaches receiving an instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies a base address [SRC address/DST address], a data size [byte, 32 bytes, 64 bytes, 128 bytes, vector register width] (implicitly set, see para [0034]), and a level of a cache [LVL] memory to operate on (see fig. 5 for the general format of a stream buffer copy instruction that implements copying data among 
[0033] “The stream buffer copy instruction 510 reads data from the SRC memory address and write or copy the data to the DST memory address at the desired LVL in one embodiment of the invention.” (emphasis added)
[0036] “the desired LVL of the DST memory address can be, specified by a user, generated by a compiler, detected by hardware logic, and the like.” (emphasis added)
Therefore, Kim teaches the amended feature of whether to preload (e.g. the LVL being specified by the user in the stream buffer copy instruction [S_BUFF_CP SRC, DST[LVL]], as taught in [0036]) the level [LVL] of the cache memory [DST memory] for a subsequent read (see fig. 8, para [0048], [0050], which show the algorithm for the stream buffer copy instruction starting from receiving the instruction at step [810]; see the subsequent read of cached data in fig. 8, step [832]) or a subsequent write (see the subsequent write of the data to the destination memory at an equal or higher level at step [864]), as claimed.
As to applicant’s remark b) above, no specific argument regarding the teachings of Collard, Peir, and Varadarajan can be found in applicant’s response. The following rejections are maintained:
1) Claims 9,10 under 35 U.S.C. 103 as being unpatentable over Kim et al. (20120137074) in view of Collard et al. (7296136).
2) Claim 11 under 35 U.S.C. 103 as being unpatentable over Kim et al. (20120137074) in view of Peir et al. (20040268054).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-3, 5, 6, 14-17, 19 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim et al. (20120137074).
As to amended claim 1, Kim teaches a method comprising: 
receiving an instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies a base address [SRC address/DST address], a data size [byte, 32 bytes, 64 bytes, 128 bytes, vector register width] (implicitly set, see para [0034]), and a level of a cache [LVL] memory to operate on (see fig.5 for general format of a stream buffer copy instruction that implements copying data among caches. See the SRC memory address in [0034]. See LVL cache level, such as L1$ and L2$ in [0033]), and whether to preload (e.g. the LVL being specified by the user in the Stream buffer copy instruction: [S_BUFF_CP SRC, DST[LVL]]  as taught in [0036] ) the 
determining, based on the base address [SRC address/DST address] and the data size [byte, 32 bytes, 64 bytes, 128 bytes, vector register width] (implicitly set, see para [0034]), a set of addresses [0x100][0x10] associated with the instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]]; and 
issuing a set of cache preload operations (e.g. the load and store) to the cache memory that includes a cache preload operation (e.g. to copy to one level of cache using the stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]) for each address in the set of addresses [source address/dest address] (see micro-operation [520] to load from the SRC address and micro-operation [530] to store to the DST address in [0037], [0038]. See also one example of the stream buffer copy instruction specifying the L1 cache in fig. 6, and another example for the L2 cache in fig. 7, para [0042], [0043]. See that the copy instruction refers to the load instruction and store instruction in [0037]).
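For orientation only, the three steps recited in claim 1 (receiving the instruction, determining the set of addresses, and issuing one preload operation per address) can be sketched as follows. This is a minimal illustration and is not taken from Kim or from applicant's specification: the 64-byte line size, the names `PreloadInstruction`, `line_addresses`, and `issue_preloads`, and the tuple format of an issued operation are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


@dataclass(frozen=True)
class PreloadInstruction:
    base: int    # base address
    size: int    # data size in bytes
    level: int   # cache level to operate on, e.g. 2 for L2
    intent: str  # "read" or "write": the subsequent access being prepared


def line_addresses(base: int, size: int, line_size: int = LINE_SIZE) -> List[int]:
    """Return the line-aligned addresses covering [base, base + size)."""
    if size <= 0:
        return []
    first = base - (base % line_size)
    last = (base + size - 1) // line_size * line_size
    return list(range(first, last + line_size, line_size))


def issue_preloads(instr: PreloadInstruction) -> List[Tuple[int, int, str]]:
    """Model issuing one preload operation per address in the set."""
    return [(instr.level, addr, instr.intent)
            for addr in line_addresses(instr.base, instr.size)]


# A 130-byte region starting at 0x100 spans three assumed 64-byte lines.
ops = issue_preloads(PreloadInstruction(base=0x100, size=130, level=2, intent="read"))
```

In this sketch the read/write indication is simply carried with each issued operation; how a cache would act on it is outside what the sketch models.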
As to claim 2, Kim teaches wherein the cache memory includes a level 2 (L2) cache and a level 3 (L3) cache, and the instruction specifies whether to operate on the L2 cache or the L3 cache.(See para [0019], the stream buffer can be created in, but not limited to, the L1 data cache memory 250, the level two (L2) data cache memory, the level three (L3) data cache memory, the main memory or any memory module. See also para [0020], the execution logic 235 executes an instruction to copy data from a source memory address to a destination 
As to claim 3, Kim teaches wherein the cache memory includes a level 1 (L1) cache and a level 2 (L2) cache, and the set of cache preload operations are issued to the L2 cache via a data path that does not include the L1 cache (see the copy instruction applied between the L2 cache and the main memory, which does not include the L1 cache, in fig. 7, para [0042]).
As to amended claim 5, Kim teaches further comprising, when the instruction [stream buffer copy instruction 610] specifies to preload (e.g. read/load) the level of the cache memory for subsequent read (see para [0041], the execution logic reads the data 630 from the SRC memory address of 0x100 in one embodiment of the invention.  In one embodiment of the invention, the data 630 is accessed using a load instruction), the set of cache preload operations request to preload the cache memory in a shared access mode (e.g. shared by the cache line n-2 and cache line n-3, see para [0041], the execution logic writes the data 630 to the DST memory address 0x10 in the cache line n-2 of the L1 data cache memory 620. The data 630 was in the cache memory line n-3 as taught in para [0040]).
As to amended claim 6, Kim teaches when the instruction [stream buffer copy instruction 610] specifies to preload the level of the cache memory for subsequent  write (see para [0040], when the stream buffer copy instruction 610 is executed, the execution logic first checks if the SRC memory address of 0x100 is cached in the L1 data cache memory 620 or the level of cache memory that is closest to the processing core), the set of cache preload operations request to preload the cache memory in an exclusive access mode [L1 cache].(see 
As to amended claim 14, Kim teaches a system comprising (see fig.3): 
a processor [processing core 320/330]; 
a memory [level 2 cache 324/334] coupled to the processor [processing core 320/330] that includes a cache memory [level 2 cache 324/334] (fig.3); and 
a memory component [execution logics and associated interconnections] (see the execution logic executes the control program to receive the copy instruction and determines if data at the source memory address is cached in any of the cache memory lines in fig.8, para [0046]. Note: the execution logics should have interconnections, although not explicitly shown, with the processor core and the L2 cache memory for the purpose of receiving the copy instruction and determining the L2 cache data source) coupled to the processor [processing core 320/330]  and the memory [level 2 cache 324/334], wherein the memory component is operable to: 
receive an instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST [LVL]] from the processor [processing core 320/330] that specifies a level [LVL] of the cache memory to preload (see LVL cache level, such as L1$ and L2$ in [0033]; see the SRC memory address in [0034]. See also one example of the stream buffer copy instruction specifying L1 cache in fig.6, and another example for L2 cache in fig.7, para [0042][0043]), and whether to preload (e.g. the LVL being specified by the user in the Stream buffer copy instruction: [S_BUFF_CP SRC, DST[LVL]]  as taught in [0036] ) the level [LVL] of the cache memory [DST  memory] for a subsequent read (See fig.8,para [0048][0050], shows the algorithm for the stream buffer copy 
determine a set of addresses [0x100] [0x10] associated with the instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST [LVL]] (see para [0039] [0040], fig.6); and 
issue a set of cache preload operations (e.g. see micro-operation [520] to load from the SRC address and micro-operation [530] to store to the DST address in [0037], [0038]) to the cache memory [cache] that includes a cache preload operation [load/store] for each address in the set of addresses (see each of the micro-operations 520, 530 of the stream buffer copy instruction in [0037], [0038]. Alternatively, see that multiple SRC memory addresses can be specified for more than one copy operation in [0035]; each copy operation is a cache preload operation. See that the copy instruction refers to the load instruction and store instruction in [0037]).
As to claim 15, Kim teaches wherein: the instruction further specifies a base address [SRC address/DST address] and an amount of data [byte, 32 bytes, 64 bytes, 128 bytes, vector register width] (implicitly set, see para [0034]); and the memory component is operable to determine the set of addresses [0x100][0x10] based on the base address [SRC address/DST address] and the amount of data [byte, 32 bytes, 64 bytes, 128 bytes, vector register width].
As to claim 16, Kim teaches wherein the cache memory includes a level 2 (L2) cache [L2 cache] and a level 3 (L3) cache [L3 cache], and the instruction specifies whether to operate on the L2 cache or the L3 cache. (See para [0019], the stream buffer can be created in, but not limited to, the L1 data cache memory 250, the level two (L2) data cache memory, the level three (L3) data cache memory, the main memory or any memory module. See also para [0020], 
As to claim 17, Kim teaches wherein the memory component [execution logics and associated interconnections] (see the execution logic executes the control program to receive the copy instruction and determines if data at the source memory address is cached in any of the cache memory lines in fig.8, para [0046]. Note: the execution logics should have interconnections, although not explicitly shown, with the processor core and the L2 cache memory for the purpose of receiving the copy instruction and determining the L2 cache data source) is coupled to the L2 cache [L2 cache] and is operable to issue the set of cache preload operations [load/store] directly to the L2 cache.  (See fig.7 for the copy instruction in the L2 cache, para [0042][0043]).
As to claim 19, Kim teaches wherein the memory component is operable to: when the instruction [stream buffer copy instruction 610] specifies to preload (e.g. read/load) the level of the cache memory for the subsequent read (see para [0041], the execution logic reads the data 630 from the SRC memory address of 0x100 in one embodiment of the invention. In one embodiment of the invention, the data 630 is accessed using a load instruction), issue the set of cache preload operations to preload (e.g. to load) the cache memory [L1 cache] in a shared access mode (e.g. shared by the cache line n-2 and cache line n-3, see para [0041], the execution logic writes the data 630 to the DST memory address 0x10 in the cache line n-2 of the L1 data cache memory 620. The data 630 was in the cache memory line n-3 as taught in para [0040]); and 
(see para [0041], see the execution logic writes the data 630 to the DST memory address 0x10 in the cache line n-2 of the L1 data cache memory 620).
Claims 9, 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (20120137074) in view of Collard et al. (7296136). 
As to claim 9, Kim does not but Collard teaches:
asserting an indicator (e.g. cannot determine load aliases the store) of the set of cache preload operations during the issuing of the set of cache preload operations [load, store] to the cache memory [cache memory 302]; and in response to the indicator (e.g. cannot determine load aliases the store), causing a processor wait (i.e. delay) to execute a second instruction [store instruction]. (See Collard, col.5, lines 26-32, For cases where the program compiler cannot determine whether the second load in the indirect load sequence aliases the store instruction, it may be necessary to delay execution of the store instruction).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to assert an indicator of the set of cache preload operations during the issuing of the set of cache preload operations to the cache memory, in response to the indicator causing a processor wait to execute a second instruction, as claimed , because one 
As to claim 10, Kim does not but Collard teaches wherein:
 the set of addresses includes a virtual address [virtual addresses] (see virtual addresses in TLB in col.3, lines 23-25; see also fig.3 shows a second TLB 310); 
the method further comprises translating the virtual address to a physical address [c bar] (see the TLB is used for translating virtual address into physical address in col.3, lines 23-25; see also the second TLB 310 that translates the virtual address c into physical address c bar in col.4, lines 48-53; the memory location c is a virtual address as taught in col.3, lines 54-55); and 
the issuing of the cache preload operation [load request] associated with the virtual address [c] uses the physical address [c bar] (see the output c bar of the second TLB 310, which is the translated physical address/location supplied to both the cache memory and main memory in fig. 3).
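The claim 10 limitations mapped above (a virtual address in the set, translation to a physical address, and issuing with the physical address) amount to the flow sketched below. The sketch is illustrative only: the 4 KiB page size, the dictionary standing in for a TLB, and the names `translate` and `issue_with_translation` are assumptions, not structures disclosed by Kim or Collard.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


def translate(tlb, vaddr):
    """Map a virtual address to a physical address via a page-number lookup."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return tlb[vpn] * PAGE_SIZE + offset  # a KeyError models a TLB miss


def issue_with_translation(tlb, vaddrs):
    """Issue each preload using the translated physical address."""
    return [("preload", translate(tlb, v)) for v in vaddrs]


tlb = {0x10: 0x2A}  # virtual page 0x10 maps to physical frame 0x2A
ops = issue_with_translation(tlb, [0x10040, 0x10080])
```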
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a virtual address in the set of addresses, translate the virtual address to a physical address, and issue the cache preload operation associated with the virtual address using the physical address, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the TLB for translating the virtual addresses to physical addresses as taught by Collard, to a 
Claim 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (20120137074) in view of Peir et al. (20040268054). 
As to claim 11, Kim does not but Peir teaches:
determining that an address [entry] of the set of addresses is invalid; and notifying a processor (e.g. via the T bit to indicate to the processor) that the address is invalid. (See fig.3 shows an invalidation type bit T shown at 308 is incorporated into each IHT such as 303 to indicate whether the corresponding IHT entry is a clean-invalidate or dirty-invalidate IHT entry, para [0028]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine that an address of the set of addresses is invalid and to notify a processor that the address is invalid, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the invalidation type bit T to indicate whether the corresponding IHT entry is a clean-invalidate entry as taught by Peir, to a known device/method, such as Kim’s cache preload instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies the base address and the data size, in order to indicate the invalid entry (see Peir, para [0028]; MPEP 2143, KSR Example D).
Claims 12, 13, 21(new), 22(new) is/are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (20120137074) in view of Varadarajan et al. (20070043975).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine that an address of the set of addresses causes a page fault, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the determination of the address of a faulty page as taught by Varadarajan, to a known device/method, such as Kim’s cache preload instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies the base address and the data size, in order to determine the address of the faulty page (see Varadarajan, para [0047]; MPEP 2143, KSR Example D).
As to claim 13, Kim does not but Varadarajan teaches:
saving a state [check point] of the set of cache preload operations prior to the page fault (see para [0044], a process is saved at a series of check points so that, in the event of a system failure, the process can be restarted from the last saved check point. See also a check point may comprises a back-up of data pages associated with the process for the introductory teaching in [0020]); 
moving a new page of data into the cache memory (see the data page from the previous check point is copied to a new location, fig. 5 step s8; para [0047]); and 
 the processor faults occur on the processor's cache lines in para [0030]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to save a state of the set of cache preload operations prior to the page fault, move a new page of data into the cache memory, and resume the set of cache preload operations using the saved state, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as resuming the process (e.g. read/write) at the last saved check point as taught by Varadarajan, to a known device/method, such as Kim’s cache preload instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies the base address and the data size, in order to restart the process from the last saved check point (see Varadarajan, para [0047]; MPEP 2143, KSR Example D).
As to newly added claim 21, Kim does not but Varadarajan teaches detect that a first address of the set of addresses causes a page fault (see fig.5 determination of address of page fault [s1]); 
in response to the page fault [s1] (see the system determines the address of the faulty page at step s1 in para [0047]),
saving a current state [check point] of the set of cache preload operations (see para [0044], a process is saved at a series of check points so that, in the event of a system failure, the process can be restarted from the last saved check point. See also a check point may 
resuming (e.g. restarting) the set of cache preload operations (e.g. read-call) using the current state [check point] after the page fault is resolved (see the process is restarted from the last saved check point in para [0047], fig.5 step s5. See also the processor faults occur on the processor's cache lines in para [0030]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to detect that a first address of the set of addresses is associated with a page fault, in response to the page fault, save a current state of the set of cache preload operations, and resume the set of cache preload operations using the current state after the page fault is resolved, as claimed (see details of claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the determination of the address of a faulty page, saving the check point, and restarting the preload, as taught by Varadarajan, to a known device/method, such as Kim’s cache preload instruction [Stream buffer copy instruction S_BUFF_CP SRC, DST[LVL]] that specifies the base address and the data size, in order to determine the address of the faulty page and restart the process from the last saved check point (see Varadarajan, para [0047]; MPEP 2143, KSR Example D).
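The sequence recited in claim 21 (detect a page fault on an address, save the current state of the preload operations, and resume from that state once the fault is resolved) can be illustrated with the sketch below. It is a simplified model and not a description of Varadarajan's check pointing or of applicant's implementation: the saved state is reduced to an index into the address list, and `PageFault`, `run_preloads`, `preload`, and `resolve_fault` are hypothetical names.

```python
class PageFault(Exception):
    """Raised by the hypothetical preload callback for a non-resident page."""


def run_preloads(addresses, preload, resolve_fault):
    """Issue preloads in order; on a fault, save the index, resolve, and resume."""
    issued = []
    i = 0
    while i < len(addresses):
        try:
            preload(addresses[i])
        except PageFault:
            saved_state = i              # state saved in response to the fault
            resolve_fault(addresses[i])  # e.g. the missing page is brought in
            i = saved_state              # resume from the saved state
            continue
        issued.append(addresses[i])
        i += 1
    return issued


resident = {0x1000}  # pages assumed to be resident at the start
faults = []


def preload(addr):
    if (addr & ~0xFFF) not in resident:
        raise PageFault(addr)


def resolve_fault(addr):
    faults.append(addr)
    resident.add(addr & ~0xFFF)


issued = run_preloads([0x1000, 0x1040, 0x2000, 0x2040], preload, resolve_fault)
```

Here the third address faults once, the fault is recorded and resolved, and the remaining preloads proceed from the saved position.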
As to newly added claim 22, Kim does not but Varadarajan teaches in response to the page fault (see fig. 5, determination of address of page fault [s1]), notifying a processor of the page fault [s1] (see the system determines the address of the faulty page at step s1 in para [0047]).
Allowable Subject Matter
Claim 20 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, pending resolution of all applicable double patenting rejections in this action. None of the prior art of record teaches:
a)  The determination of whether each cache preload operation of the set of cache preload operations is directed to cacheable memory; and when a first cache preload operation of the set of cache preload operations is directed to non-cacheable memory, drop the first cache preload operation while proceeding with a remainder of the set of cache preload operations. (Claim 20).
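The behavior identified above as allowable (dropping a preload operation directed to non-cacheable memory while proceeding with the remainder of the set) can be pictured with the short sketch below. It is offered only as an illustration of the claim language: the function name `filter_cacheable` and the memory map in which addresses at or above 0x8000 are non-cacheable are assumptions.

```python
def filter_cacheable(addresses, is_cacheable):
    """Split addresses into preloads to issue and preloads to drop."""
    issued, dropped = [], []
    for addr in addresses:
        (issued if is_cacheable(addr) else dropped).append(addr)
    return issued, dropped


# Hypothetical memory map: addresses at or above 0x8000 are non-cacheable.
issued, dropped = filter_cacheable([0x7F80, 0x7FC0, 0x8000], lambda a: a < 0x8000)
```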
Claims 7, 8 are allowable over the art of record. None of the prior art of record additionally teaches:
a) for each cache preload operation in the set of cache preload operations, determining whether the privilege level permits preload of the respective address of the cache preload operation; and when the privilege level does not permit preload of a first cache preload operation of the set of cache preload operations, dropping the first cache preload operation while proceeding with a remainder of the set of cache preload operations. (Claim 7)
b) The determination of whether each cache preload operation of the set of cache preload operations is directed to cacheable memory; and when a first cache preload operation of the set of cache preload operations is directed to non-cacheable memory, drop the first 
The prior art made of record in the previous action and not relied upon is considered pertinent to applicant's disclosure.  
a) Barreh et al. (7185178) is cited for the teaching of translating the virtual address to physical address at the page level (see col.8, lines 59-67, col.9, lines 1-9).
b)  Diefendorff et al. (7480769) is cited for the teaching of cache level indicator [802] that specifies the cache levels (col.12, lines 60-67, col.13, lines 1-6).
All the references cited in this action were already cited in the previous Non-Final Action.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL H PAN whose telephone number is (571)272-4172. The examiner can normally be reached on M-F 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at 571-272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


DANIEL H. PAN
Examiner
Art Unit 2182