DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes amendments to claims 1, 2, 4, 8, 9, 11, and 14-16 filed on 04/07/2021. Claims 1-2, 4-9, 11-16, and 18-20 are now pending.

Objection to the Specification
Examiner notes Applicant’s amendments to specification. The amendments to the specification do not add any new matter and as such are accepted by the Examiner; therefore, all previous objections pertaining to the specification are withdrawn.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Eichenberger (US 20120210073 A1) in view of Turner (US 20180336133 A1) and further in view of Mirhosseininiri (US 20190243654 A1).

Regarding Claim 1, Eichenberger teaches a method for maintaining cache coherency, the method comprising: storing a set of data blocks in the cache line of the main cache memory ([Eichenberger Fig. 3A-C, 0026, 0031] describes shared cache memory device (i.e. main cache memory) 315; depicts each cache line having a plurality of blocks a1-a3), the set of data blocks associated with a main process ([Eichenberger Fig. 1, 0027-0028] describes software program region with no data dependency that can be run in parallel (the program region running corresponds to the main process); also describes cache line blocks are updated as a result of program region running (i.e. data blocks are associated with the main process)); storing a first local copy of the set of data blocks in a first local cache memory of a first processor ([Eichenberger Fig. 3A, 0031] blocks a0-a3 are stored in cache line 330 of processor core 0), of the set of two or more processors ([Eichenberger Fig. 3A] depicts two processor cores: 0 and 1), wherein the first processor is configured to modify data within a first data block of the first local copy without modifying data in other data blocks of the set of data blocks of the first local copy ([Eichenberger Fig. 3A, 0031] processor core 0 updates block a0 (355) in the cache line 330 without updating data blocks a1-a3); storing a second local copy of the set of data blocks in a second local cache memory of a second processor ([Eichenberger Fig. 3B, 0031] processor core 1 has data blocks a0-a3 in corresponding cache line 330), of the set of two or more processors ([Eichenberger Fig. 3A] depicts two processor cores: 0 and 1); executing, on the first processor, a first child process of the main process ([Eichenberger 0027] a subset of the software program is run in each processor core, in parallel) to generate first output data ([Eichenberger 0027] each processor can make a change in its local cache memory device while running the software region in parallel); writing the first output data to the first data block of the first local copy as a write through; writing the first output data to the first data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] a0 is written through to both local cache 305 and shared cache 315); and marking the second local copy of the set of data blocks as delayed ([Eichenberger Fig. 3A, 0029] false sharing bit 200 in other processors is set to delay coherency operation (e.g. invalidating the cache block)).
Eichenberger does not teach executing, by a set of two or more processors, a diverge instruction to place a local cache associated with each of the set of two or more processors in a write through mode and to configure the set of two or more processors to permit writes by the set of two or more processors to data blocks assigned to respective processors of the set of two or more processors, wherein a cache line of a main cache memory includes the data blocks and transmitting an invalidate request to the second local cache memory nor transmitting an acknowledgment to the invalidate request.
Turner teaches transmitting an invalidate request to the second local cache memory ([Turner Fig. 7, 0094] block 704 describes sending a page table cache invalidate signal to various processing devices associated with caches storing shared page data) and transmitting an acknowledgment to the invalidate request ([Turner Fig. 7, 0098] block 712 describes receiving an acknowledgement of the page table cache invalidate signal (i.e. receivers of invalidate request transmitted acknowledgement)).

Eichenberger and Turner do not teach executing, by a set of two or more processors, a diverge instruction to place a local cache associated with each of the set of two or more processors in a write through mode and to configure the set of two or more processors to permit writes by the set of two or more processors to data blocks assigned to respective processors of the set of two or more processors, wherein a cache line of a main cache memory includes the data blocks.
Mirhosseininiri teaches executing, by a set of two or more processors, a diverge instruction to place a local cache ([Mirhosseininiri Fig. 1] depicts further cache memory 9 of mode-selectable processor 6) associated with each of the set of two or more processors ([Mirhosseininiri Fig. 1, 0029] depicts mode-selectable processor 6, where it is described that there may be one or more mode-selectable processors) in a write through mode and to configure the set of two or more processors to permit writes by the set of two or more processors to data blocks assigned to respective processors of the set of two or more processors, wherein a cache line of a main cache memory includes the data blocks ([Mirhosseininiri 0032] describes a second mode where memory accesses are made via a further cache memory (i.e. operating in write though mode)).
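Eichenberger, Turner, and Mirhosseininiri are analogous art because they are from the same field of endeavor in cache memory management. Before the filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of a write-through cache system running in parallel with each other before him or her to modify the cache system of Eichenberger and Turner to include the different modes as taught by Mirhosseininiri. The suggestion and/or motivation for doing so would be increasing efficiency of the memory system by reducing latency as suggested by Mirhosseininiri. Therefore, it would have been obvious to combine Eichenberger, Turner, and Mirhosseininiri to obtain the invention as specified in the instant application claims.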

Regarding Claim 2, Eichenberger, Turner, and Mirhosseininiri teach the method of Claim 1. 
Further, Eichenberger teaches the method of Claim 1 further comprising: executing, on the second processor, a second child process of the main process to generate second output data; writing the second output data to a second data block of the second local copy as a write through; writing the second output data to the second data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] processor core 1 writes through data block a1 to local cache 310 and shared cache 315); and invalidating the second local copy of the set of data blocks ([Eichenberger 0033] describes upon reaching a barrier (i.e. the end of parallel processing operation), the cache line 330 in local cache 310 of processor core 1 is invalidated).

Regarding Claim 4, Eichenberger, Turner, and Mirhosseininiri teach the method of Claim 1. 
Further, Eichenberger teaches the method of Claim 1 further comprising: executing, on the set of two or more processors, a converge instruction; and invalidating the data blocks marked as delayed based on the converge instruction ([Eichenberger 0033, 0041] a converge instruction is not explicitly stated, however the reference describes another flag bit being set which indicates that a barrier is reached (i.e. a convergence point) and upon reaching a barrier, the local cache line with false sharing indicator is invalidated).
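The mechanism mapped above (write through to a per-processor data block, marking other local copies as delayed, and invalidating the marked copies at a barrier) can be pictured with a minimal sketch. The sketch is a toy model under stated assumptions and is not taken from Eichenberger or from the instant application: the names System, LocalLine, write_through, and barrier are hypothetical, one four-block cache line stands in for blocks a0-a3, and a boolean field stands in for the false sharing bit.

```python
from dataclasses import dataclass, field

BLOCKS_PER_LINE = 4  # assumed: one line holding blocks a0-a3


@dataclass
class LocalLine:
    """One cache line in a core's local cache (hypothetical model)."""
    blocks: list
    valid: bool = True
    delayed: bool = False  # stands in for the false sharing bit


@dataclass
class System:
    """A shared cache line plus one local copy per core (hypothetical)."""
    shared: list = field(default_factory=lambda: [0] * BLOCKS_PER_LINE)
    local: dict = field(default_factory=dict)

    def load(self, core):
        # Copy the whole shared line into the core's local cache.
        self.local[core] = LocalLine(blocks=list(self.shared))

    def write_through(self, core, block, value):
        # Update the writer's local block and the shared block together.
        self.local[core].blocks[block] = value
        self.shared[block] = value
        # Other local copies are only marked; invalidation is deferred.
        for other, line in self.local.items():
            if other != core:
                line.delayed = True

    def barrier(self):
        # At the barrier, invalidate every local line marked as delayed.
        for line in self.local.values():
            if line.delayed:
                line.valid = False
                line.delayed = False


def demo():
    system = System()
    system.load(0)
    system.load(1)
    system.write_through(0, 0, "x")  # core 0 writes block a0
    system.write_through(1, 1, "y")  # core 1 writes block a1
    before = (system.local[0].valid, system.local[1].valid)
    system.barrier()
    after = (system.local[0].valid, system.local[1].valid)
    return system.shared, before, after
```

In this model both local copies remain valid while the cores write to their own blocks, and only the barrier call invalidates them, at which point the shared line holds every written block.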

Regarding Claim 5, Eichenberger, Turner, and Mirhosseininiri teach the method of Claim 4. 
Further, Eichenberger teaches the method of Claim 4 further comprising invalidating the first local copy based on the converge instruction ([Eichenberger Fig. 3A, 0033, 0041] describes upon reaching a barrier, local cache lines with false sharing indicator set (i.e. local cache line 330 of processor 0) are invalidated).

Regarding Claim 6, Eichenberger, Turner, and Mirhosseininiri teach the method of Claim 4.
Further, Eichenberger teaches the method of Claim 4 further comprising: receiving, by the main process, an indication by each child process that the child process has executed the converge instruction ([Eichenberger 0041] the compiler sets a second flag bit indicating parallel processing of the software region has completed (i.e. each processor core has executed convergence instruction)); and performing one or more operations on the main cache memory based on output data from each child process ([Eichenberger 0033] the shared cache now has all valid data blocks in the cache line and can be fetched to the local cache lines as needed).

Regarding Claim 7, Eichenberger, Turner, and Mirhosseininiri teach the method of Claim 1.
Further, Eichenberger teaches wherein the main cache memory comprises a shared cache memory of a memory controller ([Eichenberger Fig. 7, 0026] describes shared cache memory 720 that includes hardware cache controller 722).


Regarding Claim 8, Eichenberger teaches a processing system comprising: a main cache memory storing a set of data blocks in a cache line ([Eichenberger Fig. 3A-C, 0026, 0031] describes shared cache memory device (i.e. main cache memory) 315; depicts each cache line having a plurality of blocks a1-a3), the set of data blocks associated with a main process ([Eichenberger Fig. 1, 0027-0028] describes software program region with no data dependency that can be run in parallel (the program region running corresponds to the main process); also describes cache line blocks are updated as a result of program region running (i.e. data blocks are associated with the main process)); a first processor of two or more processors ([Eichenberger Fig. 3A] depicts two processor cores: 0 and 1) is configured to: store a first local copy of the set of data blocks in a first local cache memory of the first processor ([Eichenberger Fig. 3A, 0031] blocks a0-a3 are stored in cache line 330 of processor core 0), modify data within a first data block of the first local copy without modifying data in other data blocks of the set of data blocks of the first local copy ([Eichenberger Fig. 3A, 0031] processor core 0 updates block a0 (355) in the cache line 330 without updating data blocks a1-a3); execute a first child process of the main process ([Eichenberger 0027] a subset of the software program is run in each processor core, in parallel) to generate first output data ([Eichenberger 0027] each processor can make a change in its local cache memory device while running the software region in parallel); write the first output data to the first data block of the first local copy as a write through; and write the first output data to the first data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] a0 is written through to both local cache 305 and shared cache 315); and a second processor of the two or more processors is configured to: store a second local copy of the set of data blocks in the second local cache memory of the second processor ([Eichenberger Fig. 3B, 0031] processor core 1 has data blocks a0-a3 in corresponding cache line 330); and mark the second local copy of the set of data blocks as delayed ([Eichenberger Fig. 3A, 0029] false sharing bit 200 in other processors is set to delay coherency operation (e.g. invalidating the cache block)).
Eichenberger does not teach a memory controller configured to transmit an invalidate request to a second local cache memory; transmitting an acknowledgment to the invalidate request; and wherein the two or more processors are configured to execute a diverge instruction to place a local cache associated with each of the two or more processors in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks.
Turner teaches a memory controller configured to ([Turner 0092] describes various implementations on transmitting invalidate requests and acknowledgments (part of automatic cache coherency method), one implementation including memory controllers) transmit an invalidate request to a second local cache memory ([Turner Fig. 7, 0094] block 704 describes sending a page table cache invalidate signal to various processing devices associated with caches storing shared page data) and transmitting an acknowledgment to the invalidate request ([Turner Fig. 7, 0098] block 712 describes receiving an acknowledgement of the page table cache invalidate signal (i.e. receivers of invalidate request transmitted acknowledgement)).
Eichenberger and Turner are analogous art because they are from the same field of endeavor in cache memory management. Before the filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of a write-through cache system running in parallel with each other before him or her to modify the cache system of Eichenberger to include the invalidate request and acknowledgements of Turner. The suggestion and/or motivation for doing so would be obtaining the advantage of improving the cache coherency and efficiency of a given cache system as suggested by Turner. Therefore, it would have been obvious to combine Eichenberger with Turner to obtain the invention as specified in the instant application claims.
Eichenberger and Turner do not teach wherein the two or more processors are configured to execute a diverge instruction to place a local cache associated with each of the two or more processors in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks.
Mirhosseininiri teaches wherein the two or more processors are configured to execute a diverge instruction to place a local cache ([Mirhosseininiri Fig. 1] depicts further cache memory 9 of mode-selectable processor 6) associated with each of the set of two or more processors ([Mirhosseininiri Fig. 1, 0029] depicts mode-selectable processor 6, where it is described that there may be one or more mode-selectable processors) in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks ([Mirhosseininiri 0032] describes a second mode where memory accesses are made via a further cache memory (i.e. operating in write through mode)).
Eichenberger, Turner, and Mirhosseininiri are analogous art because they are from the same field of endeavor in cache memory management. Before the filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of a write-through cache system running in parallel with each other before him or her to modify the cache system of Eichenberger and Turner to include the different modes as taught by Mirhosseininiri. The suggestion and/or motivation for doing so would be increasing efficiency of the memory system by reducing latency as suggested by Mirhosseininiri. Therefore, it would have been obvious to combine Eichenberger, Turner, and Mirhosseininiri to obtain the invention as specified in the instant application claims.

Regarding Claim 9, Eichenberger, Turner, and Mirhosseininiri teach the processing system of Claim 8.
Further, Eichenberger teaches wherein the second processor is further configured to: execute a second child process of the main process to generate second output data; write the second output data to a second data block of the second local copy as a write through; write the second output data to the second data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] processor core 1 writes through data block a1 to local cache 310 and shared cache 315); and invalidate the second local copy of the set of data blocks ([Eichenberger 0033] describes upon reaching a barrier (i.e. the end of parallel processing operation), the cache line 330 in local cache 310 of processor core 1 is invalidated).

Regarding Claim 11, Eichenberger, Turner, and Mirhosseininiri teach the processing system of Claim 8. 
Further, Eichenberger teaches wherein the two or more processors are further configured to: execute a converge instruction; and invalidate data blocks of the set of data blocks marked as delayed based on the converge instruction ([Eichenberger 0033, 0041] a converge instruction is not explicitly stated, however the reference describes another flag bit being set which indicates that a barrier is reached (i.e. a convergence point) and upon reaching a barrier, the local cache line with false sharing indicator is invalidated).

Regarding Claim 12, Eichenberger, Turner, and Mirhosseininiri teach the processing system of Claim 11. 
Further, Eichenberger teaches wherein the two or more processors are further configured to invalidate the first local copy based on the converge instruction ([Eichenberger Fig. 3A, 0033, 0041] describes upon reaching a barrier, local cache lines with false sharing indicator set (i.e. local cache line 330 of processor 0) are invalidated).

Regarding Claim 13, Eichenberger, Turner, and Mirhosseininiri teach the processing system of Claim 11.
Further, Eichenberger teaches wherein the processing system further comprises: another processor configured to: execute the main process; receive an indication by each child process that the child process has executed the converge instruction ([Eichenberger 0041] the compiler sets a second flag bit indicating parallel processing of the software region has completed (i.e. each processor core has executed convergence instruction)); and perform one or more operations on the main cache memory based on output data from each child process ([Eichenberger 0033] the shared cache now has all valid data blocks in the cache line and can be fetched to the local cache lines as needed).

Regarding Claim 14, Eichenberger, Turner, and Mirhosseininiri teach the processing system of Claim 8. 
Further, Eichenberger teaches wherein the main cache memory comprises a shared cache memory of the memory controller ([Eichenberger Fig. 7, 0026] describes shared cache memory 720 that includes hardware cache controller 722).

Regarding Claim 15, Eichenberger teaches a non-transitory program storage device comprising instructions stored thereon to cause: a third processor associated with a main process to store a set of data blocks in a cache line of a main cache memory ([Eichenberger Fig. 7, 0026] describes shared cache memory 720 (i.e. main cache memory) that includes hardware cache controller 722 (i.e. a processor to store a set of data blocks in main cache memory)), the set of data blocks associated with the main process ([Eichenberger Fig. 1, 0027-0028] describes software program region with no data dependency that can be run in parallel (the program region running corresponds to the main process); also describes cache line blocks are updated as a result of program region running (i.e. data blocks are associated with the main process)); a first processor ([Eichenberger Fig. 3A] depicts processor 0), of a set of two or more processors ([Eichenberger Fig. 3A] depicts two processor cores: 0 and 1), to: store a first local copy of the set of data blocks in a first local cache memory of the first processor ([Eichenberger Fig. 3A, 0031] blocks a0-a3 are stored in cache line 330 of processor core 0); modify data within a first data block of the first local copy without modifying data in other data blocks of the set of data blocks of the first local copy ([Eichenberger Fig. 3A, 0031] processor core 0 updates block a0 (355) in the cache line 330 without updating data blocks a1-a3); execute a first child process of the main process to generate first output data ([Eichenberger 0027] a subset of the software program is run in each processor core, in parallel); write the first output data to the first data block of the first local copy as a write through; and write the first output data to the first data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] a0 is written through to both local cache 305 and shared cache 315); a second processor of the two or more processors to: store a second local copy of the set of data blocks in the second local cache memory of the second processor ([Eichenberger Fig. 3B, 0031] processor core 1 has data blocks a0-a3 in corresponding cache line 330); and mark the second local copy of the set of data blocks as delayed ([Eichenberger Fig. 3A, 0029] false sharing bit 200 in other processors is set to delay coherency operation (e.g. invalidating the cache block)).
Eichenberger does not teach a memory controller to transmit an invalidate request to a second local cache memory nor transmitting an acknowledgment to the invalidate request and wherein the two or more processors are configured to execute a diverge instruction to place a local cache associated with each of the two or more processors in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks.
Turner teaches a memory controller ([Turner 0092] describes various implementations on transmitting invalidate requests and acknowledgments (part of automatic cache coherency method), one implementation including memory controllers) to transmit an invalidate request to a second local cache memory ([Turner Fig. 7, 0094] block 704 describes sending a page table cache invalidate signal to various processing devices associated with caches storing shared page data) and transmitting an acknowledgment to the invalidate request ([Turner Fig. 7, 0098] block 712 describes receiving an acknowledgement of the page table cache invalidate signal (i.e. receivers of invalidate request transmitted acknowledgement)).
Eichenberger and Turner are analogous art because they are from the same field of endeavor in cache memory management. Before the filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of a write-through cache system running in parallel with each other before him or her to modify the cache system of Eichenberger to include the invalidate request and acknowledgements of Turner. The suggestion and/or motivation for doing so would be obtaining the advantage of improving the cache coherency and efficiency of a given cache system as suggested by Turner. Therefore, it would have been obvious to combine Eichenberger with Turner to obtain the invention as specified in the instant application claims.
Eichenberger and Turner do not teach wherein the two or more processors are configured to execute a diverge instruction to place a local cache associated with each of the two or more processors in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks.
Mirhosseininiri teaches wherein the two or more processors are configured to execute a diverge instruction to place a local cache ([Mirhosseininiri Fig. 1] depicts further cache memory 9 of mode-selectable processor 6) associated with each of the set of two or more processors ([Mirhosseininiri Fig. 1, 0029] depicts mode-selectable processor 6, where it is described that there may be one or more mode-selectable processors) in a write through mode and to configure each of the two or more processors to permit writes by each of the two or more processors to respective data blocks of the set of data blocks ([Mirhosseininiri 0032] describes a second mode where memory accesses are made via a further cache memory (i.e. operating in write though mode)).
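Eichenberger, Turner, and Mirhosseininiri are analogous art because they are from the same field of endeavor in cache memory management. Before the filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of a write-through cache system running in parallel with each other before him or her to modify the cache system of Eichenberger and Turner to include the different modes as taught by Mirhosseininiri. The suggestion and/or motivation for doing so would be increasing efficiency of the memory system by reducing latency as suggested by Mirhosseininiri. Therefore, it would have been obvious to combine Eichenberger, Turner, and Mirhosseininiri to obtain the invention as specified in the instant application claims.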

Regarding Claim 16, Eichenberger, Turner, and Mirhosseininiri teach the non-transitory program storage device of Claim 15. 
Further, Eichenberger teaches wherein the stored instructions further cause the second processor to: execute a second child process of the main process to generate second output data; write the second output data to a second data block of the second local copy as a write through; write the second output data to the second data block of the main cache memory as a part of the write through ([Eichenberger Fig. 3A, 0031] processor core 1 writes through data block a1 to local cache 310 and shared cache 315); and invalidate the second local copy of the set of data blocks ([Eichenberger 0033] describes upon reaching a barrier (i.e. the end of parallel processing operation), the cache line 330 in local cache 310 of processor core 1 is invalidated).
Regarding Claim 18, Eichenberger, Turner, and Mirhosseininiri teach the non-transitory program storage device of Claim 15. 
Further, Eichenberger teaches wherein the stored instructions further cause the two or more processors to: execute a converge instruction; and invalidate data blocks marked as delayed based on the converge instruction ([Eichenberger 0033, 0041] a converge instruction is not explicitly stated, however the reference describes another flag bit being set which indicates that a barrier is reached (i.e. a convergence point) and upon reaching a barrier, the local cache line with false sharing indicator is invalidated).

Regarding Claim 19, Eichenberger, Turner, and Mirhosseininiri teach the non-transitory program storage device of Claim 18. 
Further, Eichenberger teaches wherein the stored instructions further cause the two or more processors to invalidate the first local copy based on the converge instruction ([Eichenberger Fig. 3A, 0033, 0041] describes upon reaching a barrier, local cache lines with false sharing indicator set (i.e. local cache line 330 of processor 0) are invalidated).

Regarding Claim 20, Eichenberger, Turner, and Mirhosseininiri teach the non-transitory program storage device of Claim 18. 
Further, Eichenberger teaches wherein the third processor is separate from the two or more processors and wherein the stored instructions further cause the third processor to: execute the main process; receive an indication by each child process that the child process has executed the converge instruction ([Eichenberger 0041] the compiler sets a second flag bit indicating parallel processing of the software region has completed (i.e. each processor core has executed convergence instruction)); and perform one or more operations on the main cache memory based on output data from each child process ([Eichenberger 0033] the shared cache now has all valid data blocks in the cache line and can be fetched to the local cache lines as needed).




Response to Arguments
Applicant's arguments with respect to Claims 1-2, 4-9, 11-16, and 18-20, filed on 04/07/2021, have been considered but are either deemed not persuasive or are rendered moot in view of new grounds for rejection.
Applicant argues that the previously cited references fail to teach the newly added limitation of “executing, by a set of two or more processors, a diverge instruction to place a local cache associated with each of the set of two or more processors in a write through mode and to configure the set of two or more processors to permit writes by the set of two or more processors to data blocks assigned to respective processors of the set of two or more processors, wherein a cache line of a main cache memory includes the data blocks” in Claim 1. New reference, Mirhosseininiri, teaches this limitation as a PHOSITA can see that switching between the different modes of the processor may encapsulate the limitation of sending a “diverge instruction”. Claims 8 and 15 are analogous to Claim 1, and are therefore rejected under the same reasoning.
Applicant argues that claims 2, 4-7, 9, 11-14, 16, and 18-20 are in condition for allowance due to their dependencies; however, because independent Claims 1, 8, and 15 remain rejected, this no longer holds true.
All arguments by the applicant are believed to be covered in the body of this office action; thus, this action constitutes a complete response to the issues raised in the remarks dated 04/07/2021.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
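A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.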
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRUNG-HAO J NGUYEN whose telephone number is (571)272-3517.  The examiner can normally be reached on Monday - Friday, 8:00 - 5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571)270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TRUNG-HAO JOSEPH NGUYEN/Examiner, Art Unit 2132                                                                                                                                                                                         
/DAVID YI/Supervisory Patent Examiner, Art Unit 2132