Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection mailed on 12/09/2021.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/04/2022 has been entered. 	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fleming, JR et al. (USPGPUB No. 2019/0303297 A1, hereinafter referred to as Fleming) in view of Sankaran et al. (USPGPUB No. 2019/0347125 A1, hereinafter referred to as Sankaran), further in view of Wang et al. (USPGPUB No. 2019/0102346 A1, hereinafter referred to as Wang), and further in view of Chetlur et al. (USPGPUB No. 2021/0133583 A1, hereinafter referred to as Chetlur).
As per claim 1, Fleming discloses an enhanced direct memory access (DMA) engine comprising {“Processor Engines of a CSA”, see Fig. 9, [0172]; said CSA implements DMA, see Figs. 1 and 86, [0605]}:
an arithmetic logic unit (ALU) {“ALU 918”, see Fig. 9, [0176]}; and control logic configured to {“scheduler 914”, see Fig. 9, [0176]}:
detect a command targeting the enhanced DMA engine {“when input data and control input arrives”, see Fig. 6, [0176]}; perform a DMA transfer operation responsive {“steer the various multiplexors within the PE”, see Fig. 9, [0172]} to determining that an operation flag {“microcoding these configuration bits”, see Fig. 9, [0172]} in the command has a first value {“type of memory access”, see Fig. 13, [0205]};
Fleming does not appear to explicitly disclose the following limitations: and, responsive to determining that the operation flag in the command has a second value: perform one or more read operations to load first data into the enhanced DMA engine;
cause the ALU to perform one or more operations on the first data to generate second data; and perform one or more write operations to store the second data.
However, Sankaran discloses: and, responsive to determining that the operation flag in the command has a second value {“value in the burst descriptor”, see Fig. 90, [0984]}: perform one or more read operations to load first data {first data “have been captured in the vector value buffer”, see Fig. 89, [0985]} comprising multiple data inputs into the enhanced DMA engine {“required dot-product(s)”, see Fig. 89, [0985]};
Fleming and Sankaran are analogous because they are from the same field of endeavor, loading and recognizing ALU operation(s). 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Fleming and Sankaran before him or her, to modify Fleming’s “enhanced DMA engine” by incorporating Sankaran’s “dot matrix” operations (see Fig. 89, [0983]).
The suggestion/motivation for doing so would have been to implement a heterogeneous scheduler to map each phase of the workload to the most suitable type of processing element. Ideally, this mitigates the need to build hardware for legacy features and avoids exposing details of the microarchitecture (Sankaran [0234]).
Therefore, it would have been obvious to combine Sankaran with Fleming to obtain the invention as specified in the instant claim(s).
Furthermore, Wang discloses: perform a DMA transfer operation {“DMA engine 1452”, see Fig. 14, [0086]}, comprising at least a read operation to a memory to load data {“atomicity of the read/write operations”, see Figs. 3 and 4, [0042]} into the enhanced DMA engine {“”}.
Fleming/Sankaran and Wang are analogous because they are from the same field of endeavor, loading and recognizing ALU operation(s). 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Fleming, Sankaran, and Wang before him or her, to modify Fleming/Sankaran’s device by incorporating Wang’s “lookup engine” (see Fig. 4).
The suggestion/motivation for doing so would have been to implement LLC cache slices and lookup engines that reduce the time from when a request for data is received to the time at which a response is provided, which is a performance criterion in data lookup system design (Wang [0002]).
Therefore, it would have been obvious to combine Wang with Fleming/Sankaran to obtain the invention as specified in the instant claim(s).
Furthermore, Chetlur discloses: wherein the command identifies a reduction operation {“all-reduce operation”, [0055]}, comprising at least a read operation to a memory to load data {“to be loaded to configure, logic, including integer and/or floating point”, see Fig. 9a, [0095]} into the enhanced DMA engine {“”}.
cause the ALU to combine the multiple data inputs into a single data output {“one or more arithmetic logic unit (s)”, [0097]} and convey the single data output {“matrix-based mathematics performed by ALU(s) 910”, [0097]}.
Fleming/Sankaran/Wang and Chetlur are analogous because they are from the same field of endeavor, loading and recognizing ALU operation(s). 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Fleming, Sankaran, Wang, and Chetlur before him or her, to modify Fleming/Sankaran/Wang’s device by incorporating Chetlur’s allReduce operation and corresponding command ALU (see Fig. 9, [0095]).
The suggestion/motivation for doing so would have been to implement backward propagation passes (Chetlur [0053]) in a machine learning module (Chetlur [0054]) so that workers use the reduced result to redundantly update their own copies of the weights, ensuring that workers maintain identical copies of updated weights going into the next iteration (Chetlur [0055]).
Therefore, it would have been obvious to combine Chetlur with Fleming/Sankaran/Wang to obtain the invention as specified in the instant claim(s).
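
For reference, the control flow recited in claim 1 can be sketched as follows. This is a minimal illustrative sketch only; it is not drawn from the application or from any cited reference, and every name in it (FLAG_TRANSFER, FLAG_COMPUTE, Command, execute, and the dictionary used as memory) is a hypothetical assumption chosen for readability.

```python
from dataclasses import dataclass
from functools import reduce
import operator

# Hypothetical flag encodings; the claim only requires two distinguishable values.
FLAG_TRANSFER = 0  # "first value": plain DMA transfer
FLAG_COMPUTE = 1   # "second value": read, ALU operation, write


@dataclass
class Command:
    op_flag: int      # operation flag carried in the command
    src_addrs: list   # one or more source addresses (assumed layout)
    dst_addr: int     # destination address (assumed layout)


def execute(cmd, memory, alu_op=operator.add):
    """Dispatch on the operation flag, mirroring the claimed control logic."""
    if cmd.op_flag == FLAG_TRANSFER:
        # Plain DMA transfer: copy one word from source to destination.
        memory[cmd.dst_addr] = memory[cmd.src_addrs[0]]
    elif cmd.op_flag == FLAG_COMPUTE:
        # Read operations load the first data (multiple inputs) into the engine.
        first_data = [memory[addr] for addr in cmd.src_addrs]
        # The ALU combines the multiple inputs into a single output (a reduction).
        second_data = reduce(alu_op, first_data)
        # A write operation stores the second data.
        memory[cmd.dst_addr] = second_data
    else:
        raise ValueError(f"unknown operation flag: {cmd.op_flag}")
    return memory[cmd.dst_addr]
```

Under these assumptions, a transfer command copies a word unchanged, while a compute command over source words 3, 4, and 5 with the default addition operator stores 12 at the destination.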

As per claim 2, the rejection of claim 1 is incorporated and Sankaran discloses wherein when the operation flag has the second value, the command includes a field specifying {“required elements of the vector”, see Fig. 89, [0985]} the one or more operations to be performed by the ALU {one or more operations “required dot-product(s)”, see Fig. 89, [0985]}.

As per claim 3, the rejection of claim 1 is incorporated and Fleming discloses wherein the control logic operates in a first mode when the operation flag has the first value {first mode “during a special mode controlled”, see Fig. 9, [0307]}, and wherein the control logic operates in a second mode {“an OS managed mode”, see Fig. 53, [0355]} when the operation flag has the second value {“configuration cache”, see Fig. 53, [0355]}.

As per claim 4, the rejection of claim 3 is incorporated and Chetlur discloses wherein the command corresponds to a kernel converted by a processor {“may be a thread, a process, a processor”, [0053]} into one or more enhanced DMA commands {“VA core may include a processor subsystem, DMA engine(s)”, [0154]}.

As per claim 5, the rejection of claim 1 is incorporated and Chetlur discloses wherein the enhanced DMA engine is configured to broadcast the single data output {“a reduce-scatter operation partitions a weight update operation into a number of approximately equal portions which are distributed to workers”, see Fig. 5, [0076]}.
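
The partitioning and distribution described in the quoted passage, and the broadcast recited in claim 5, can be illustrated with the short sketch below. It is an assumption-laden illustration, not Chetlur's implementation or the applicant's: the function names partition and broadcast, and the use of Python lists to stand in for workers and receivers, are hypothetical.

```python
def partition(items, num_workers):
    """Split items into contiguous portions whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_workers)
    portions, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers each take one additional element.
        size = base + (1 if worker < extra else 0)
        portions.append(items[start:start + size])
        start += size
    return portions


def broadcast(value, num_receivers):
    """Deliver the same single data output to every receiver."""
    return [value] * num_receivers
```

With seven elements and three workers, this sketch yields portions of sizes 3, 2, and 2, which is one way to read “approximately equal portions”.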

As per claim 6, the rejection of claim 1 is incorporated and Fleming discloses wherein the command includes a data-type flag {“dataflow operator”, [0134]} which specifies a data format of operands being operated on by the ALU {“incoming operands become available”, [0134]}.

As per claim 7, the rejection of claim 6 is incorporated and Fleming discloses wherein the data-type flag specifies one or more of bitfield {Examiner interpretation: the limitation “one or more of” in conjunction with “and/or combinations” is treated as a Markush grouping; thus, a reference reciting at least one member of the group addresses the claim.}, signed integer, unsigned integer, characters, standard floating point, custom floating point, fixed-point fractions, bit-width, and/or combinations of multiple values {“for example, one or more of floating point addition and multiplication” or “integer addition”, [0134]}.
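
The role of a data-type flag as recited in claims 6 and 7 can be illustrated as follows. This is a hypothetical sketch, not taken from Fleming or from the application: the flag names in FORMATS and the function alu_add are assumptions, and only three of the recited formats are shown. It uses Python's standard struct module to interpret the same raw operand bytes according to the flag.

```python
import struct

# Hypothetical mapping from a data-type flag to a 32-bit little-endian format.
FORMATS = {
    "signed_int": "<i",
    "unsigned_int": "<I",
    "float": "<f",
}


def alu_add(raw_a, raw_b, dtype_flag):
    """Decode two 4-byte operands per the data-type flag, then add them."""
    fmt = FORMATS[dtype_flag]
    (a,) = struct.unpack(fmt, raw_a)
    (b,) = struct.unpack(fmt, raw_b)
    # Python integers do not wrap, so no 32-bit overflow is modeled here.
    return a + b
```

The same four bytes 0xFFFFFFFF decode to -1 under the signed flag and to 4294967295 under the unsigned flag, which is why the operand format must be specified before the ALU operates.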

Claims 8-14 are method claims reciting functionality corresponding to the apparatus claims 1-7, respectively, and are therefore rejected under the same rationale as claims 1-7 above.

Claims 15-20 are system claims reciting functionality corresponding to the apparatus claims 1-7, respectively, and are therefore rejected under the same rationale as claims 1-7 above.
Response to Arguments  
Applicant’s arguments, filed on 05/04/2022, have been considered but are moot in view of the new ground(s) of rejection.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references are pertinent as 103 art teaching the reduce operation as recited in claim 1: US 20210092069 A1, US 20200042362 A1, and US 20090327464 A1.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER A. BARTELS, whose telephone number is (571)270-3182. The examiner can normally be reached Monday-Friday, 9:00am-5:30pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dr. Henry Tsai, can be reached at 571-272-4176. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/C.A.B./
Examiner
Art Unit 2184




/HENRY TSAI/
Supervisory Patent Examiner, Art Unit 2184