16662981
The present application, filed on or after 16 March 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

This office action is in response to Applicant’s submission filed on 24 October 2019. THIS ACTION IS NON-FINAL.

Status of Claims

Claims 1-21 are pending.
Claim 7 includes limitations interpreted under 35 U.S.C. 112(f), because it uses a generic placeholder coupled with functional language without reciting sufficient structure to achieve the function.
Claims 1-21 are rejected under 35 U.S.C. 103 as unpatentable.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


In claim 7, the claim limitations "a plurality of heterogeneous computation units (HCUs) … being configured to …" (the HCUs) have been interpreted under 35 U.S.C. 112(f), because they use a generic placeholder coupled with functional language without reciting sufficient structure to achieve the function. A review of the specification shows that Item 110 of FIG.1A appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitation.
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
If applicant does not intend to have the claim(s) limitations treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim recites/recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).



Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Matthews, et al., US-PATENT NO.11328222B1 [hereafter Matthews] in view of Kim et al., US-PGPUB NO.20210374503A1 [hereafter Kim].

With regards to claim 7, Matthews teaches 
“A neural network processing system, comprising: a network connection (Matthews, FIG.2); and a plurality of heterogeneous computation units (HCUs) communicatively coupled to the network connection (Matthews, FIG.1-6, FIG.22), the plurality of HCUs being configured to: compute a first plurality of gradients from a first plurality of samples, wherein the first plurality of gradients are aggregated to generate an aggregated gradient (Matthews, FIG.3, FIG.21, C5L17-19, ‘When writing a gradient, a compute logic first aggregates the gradient with the working result …’) ….”.
Matthews does not explicitly detail “compute a second plurality of gradients from a second plurality of samples; aggregate, at each of the plurality of HCUs, the aggregated gradient with a corresponding gradient of the second plurality of gradients to generate a local gradient update; and update, at each of the plurality of HCUs, a local copy of a neural network with the local gradient update”.
However, Kim teaches “compute a second plurality of gradients from a second plurality of samples; aggregate, at each of the plurality of HCUs, the aggregated gradient with a corresponding gradient of the second plurality of gradients to generate a local gradient update (Kim, FIG.2); and update, at each of the plurality of HCUs, a local copy of a neural network with the local gradient update (Kim, FIG.3A&B)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).
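
For illustration only, the sequence of steps recited in the quoted limitations of claim 7 can be sketched as follows. This sketch is not taken from Matthews, Kim, or the application. It assumes that the "neural network" is reduced to a single scalar weight per HCU, that aggregation is a plain sum, and that the gradient comes from a squared-error loss; every function and parameter name (compute_gradient, aggregate, training_step, lr) is hypothetical.

```python
# Illustrative sketch only. All names are hypothetical and appear in neither
# the claims nor the cited references.

def compute_gradient(weight, sample):
    # Gradient of 0.5 * (weight * x - y) ** 2 with respect to weight.
    x, y = sample
    return (weight * x - y) * x

def aggregate(gradients):
    # Aggregation is assumed here to be a plain sum.
    return sum(gradients)

def training_step(local_weights, first_samples, second_samples, lr=0.1):
    # One gradient per HCU from the first plurality of samples.
    first_grads = [compute_gradient(w, s)
                   for w, s in zip(local_weights, first_samples)]
    # The first plurality of gradients are aggregated into one aggregated gradient.
    aggregated = aggregate(first_grads)
    # One gradient per HCU from the second plurality of samples.
    second_grads = [compute_gradient(w, s)
                    for w, s in zip(local_weights, second_samples)]
    # Each HCU combines the aggregated gradient with its own second gradient
    # to form a local gradient update.
    local_updates = [aggregated + g for g in second_grads]
    # Each HCU applies its local gradient update to its local copy.
    return [w - lr * u for w, u in zip(local_weights, local_updates)]
```

In this reading, each HCU ends the step with its own local copy, because the local gradient update differs from HCU to HCU by the second-gradient term.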

With regards to claim 8, Matthews in view of Kim teaches 
“The neural network processing system of claim 7”
Matthews does not explicitly detail “wherein the plurality of HCUs are further configured to: aggregate, at each of the plurality of HCUs, a plurality of local gradients of that HCU to generate a local aggregated gradient”.
However, Kim teaches “wherein the plurality of HCUs are further configured to: aggregate, at each of the plurality of HCUs, a plurality of local gradients of that HCU to generate a local aggregated gradient (Kim, FIG.2, [0041], ‘In these algorithms, worker and aggregator nodes that compute the local gradient .. and the non-leaf nodes are the aggregator nodes that collect the calculated local gradients …’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).

With regards to claim 9, Matthews in view of Kim teaches 
“The neural network processing system of claim 8”
Matthews does not explicitly detail “wherein the plurality of HCUs are further configured to: aggregate the local aggregated gradients from the plurality of HCUs”.
However, Kim teaches “wherein the plurality of HCUs are further configured to: aggregate the local aggregated gradients from the plurality of HCUs (Kim, FIG.2)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).

With regards to claim 10, Matthews in view of Kim teaches 
“The neural network processing system of claim 8”
Matthews does not explicitly detail “wherein the plurality of HCUs are in different groups, and wherein the plurality of HCUs are further configured to: aggregate, for each group of HCUs, the local aggregated gradients from HCUs of that group to generate an intra-group aggregated gradient; and aggregate the intra-group aggregated gradients of two or more groups”.
However, Kim teaches “wherein the plurality of HCUs are in different groups, and wherein the plurality of HCUs are further configured to: aggregate, for each group of HCUs, the local aggregated gradients from HCUs of that group to generate an intra-group aggregated gradient; and aggregate the intra-group aggregated gradients of two or more groups (Kim, FIG.1A-C, FIG.3A-B, [0033], ‘The aggregation operator (typically a sum operation) is associative and thus, the gradients can be aggregated gradually by a group of worker nodes’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).
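
For illustration only, the grouped aggregation recited in claim 10 can be sketched as three nested sums. The sketch relies on the point quoted from Kim, [0033], that the aggregation operator is typically a sum and is associative, so aggregating per HCU, then per group, then across groups yields the same total as a single flat sum. It is not taken from Matthews, Kim, or the application; the nested-list layout and the names aggregate and hierarchical_aggregate are hypothetical, and integer gradients are used so that the equality is exact.

```python
# Illustrative sketch only. The data layout and all names are hypothetical.

def aggregate(gradients):
    # Aggregation is assumed here to be a plain sum.
    return sum(gradients)

def hierarchical_aggregate(groups):
    # groups: list of groups; each group is a list of HCUs;
    # each HCU is a list of its local gradients.
    # Step 1: each HCU aggregates its own local gradients.
    local_aggregated = [[aggregate(hcu) for hcu in group] for group in groups]
    # Step 2: each group aggregates the local aggregated gradients of its HCUs.
    intra_group = [aggregate(group) for group in local_aggregated]
    # Step 3: the intra-group aggregated gradients are aggregated across groups.
    inter_group = aggregate(intra_group)
    return local_aggregated, intra_group, inter_group

# Two groups of two HCUs each.
groups = [[[1, 2], [3]], [[4], [5, 6]]]
local_aggregated, intra_group, inter_group = hierarchical_aggregate(groups)
```

With this input, the result of the third step equals the sum of all six local gradients taken at once.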

With regards to claim 11, Matthews in view of Kim teaches 
“The neural network processing system of claim 10”
Matthews does not explicitly detail “wherein the plurality of HCUs are further configured to: compute a third plurality of gradients from a third plurality of samples; aggregate, at each of the plurality of HCUs, the intra-group aggregated gradient with corresponding gradient of the third plurality of gradients to generate a second local gradient update; and updating, at each of the plurality of HCUs, a local copy of a neural network with the second local gradient update”.
However, Kim teaches “wherein the plurality of HCUs are further configured to: compute a third plurality of gradients from a third plurality of samples; aggregate, at each of the plurality of HCUs, the intra-group aggregated gradient with corresponding gradient of the third plurality of gradients to generate a second local gradient update; and updating, at each of the plurality of HCUs, a local copy of a neural network with the second local gradient update (Kim, FIG.1A-C, FIG.3A-B, [0033], ‘The aggregation operator (typically a sum operation) is associative and thus, the gradients can be aggregated gradually by a group of worker nodes’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).

With regards to claim 12, Matthews in view of Kim teaches 
“The neural network processing system of claim 7, wherein the HCU comprises: a memory for storing data; a computing unit communicatively coupled to the memory, the computing unit being configured to perform a computation or aggregation of gradients; an interconnect unit configured to send and receive data and instructions; and a controller configured to control operations of the computing unit and the interconnect unit (Matthews, FIG.3, C4L35-36, ‘to aggregate a gradient with other gradients, a compute node would conventionally send the gradient out …’, and FIG.22)”.

With regards to claim 13, Matthews in view of Kim teaches 
“The neural network processing system of claim 7”
Matthews does not explicitly detail “wherein the HCU comprises: a plurality of buffers configured to buffer a gradient and a weight; a plurality of multiplexers communicatively coupled to the plurality of buffers and configured to multiplex the gradient and weight in the plurality of buffers”.
However, Kim teaches “wherein the HCU comprises: a plurality of buffers configured to buffer a gradient and a weight; a plurality of multiplexers communicatively coupled to the plurality of buffers and configured to multiplex the gradient and weight in the plurality of buffers (Kim, FIG.7)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).

With regards to claim 14, Matthews in view of Kim teaches 
“The neural network processing system of claim 7”
Matthews does not explicitly detail “wherein the aggregation of the first plurality of gradients to generate an aggregated gradient is performed in parallel with computation of a second plurality of gradients from a second plurality of samples”.
However, Kim teaches “wherein the aggregation of the first plurality of gradients to generate an aggregated gradient is performed in parallel with computation of a second plurality of gradients from a second plurality of samples (Kim, FIG.1A-C, FIG.3A-B, [0047], ‘In disclosed embodiments, the three groups of aggregation (from the first, second, and third groups of computing devices) can run in parallel, as illustrated in FIG.1C, for example’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Matthews and Kim before him or her, to modify the distributed neural network training method and system of Matthews to include distributed gradient aggregation as shown in Kim.
The motivation for doing so would have been for distributed training (Kim, Background).

Claims 1-6 and 15-21 are substantially similar to claims 7-14. The arguments given above for claims 7-14 apply, mutatis mutandis, to claims 1-6 and 15-21; the rejection of claims 7-14 is therefore applied to those claims accordingly.



Additional Relevant Art
The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111 (c) to consider these references fully when responding to this action, with particular attention paid to:
Zinkevich et al., “Parallelized Stochastic Gradient Descent”, NIPS 2010 [hereafter Zinkevich] shows parallel gradient calculation.


Examiner's Note

The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention.  It is noted that REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned.  They are part of the literature of the art, relevant for all they contain.”  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).  The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSU-CHANG LEE whose telephone number is 571-272-3567.  The fax number is 571-273-3567.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589.
 Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TSU-CHANG LEE/
Primary Examiner, Art Unit 2128