DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. 

Claims 1-31 are pending in this office action and presented for examination.

Specification
The disclosure is objected to because of the following informalities. Appropriate correction is required.
In [0004], “are not well suited” should be “is not well suited”. 
In [0004], “because of it” should be “because it”. 
In [0018], line 6, “steaming” should be “streaming”. 
In [0021], line 2, “core engine 130” should be “core 130”. 
In [0021], second-to-last line, “architecture 100” should be “architecture 101”. 
In [0024], line 2, “core engine 130” should be “core 130”. 
In [0026], line 8, “ML instruction RAM 230” should be “ML command RAM 230”.
In [0026], line 6, “ML instruction RAM 230” should be “ML command RAM 230”.
In [0027], line 5, “breakdown” should be “break down”.
In [0028], line 4, “steaming” should be “streaming”.
In [0034], line 2, “instruction engine 150” should be “instruction-streaming engine 150”.
In [0034], line 4, “instruction streamer 150” should be “instruction-streaming engine 150”.
In [0037], line 2, “calculated” should be “calculates”.
In [0038], line 7, “row” should be “rows”. 
In [0051], line 4, “translocation engine 150” should be “instruction-streaming engine 150”.

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Drawings
The drawings are objected to because:
The view numbers must be larger than the numbers used for reference characters.
MPEP 608.02, section V, states that “[l]ead lines are required for each reference character except for those which indicate the surface or cross section on which they are placed. Such a reference character must be underlined to make it clear that a lead line has not been left out by mistake.” However, Figures 1A (210, 190), 1B (210, 190), 3A (110), and 5A (130) each contain reference characters that are neither underlined nor associated with lead lines.
For Figures 1A, 1B, 3A, 4, 5A, 5B, 6, and 7, the drawing sheet numbers and the “REPLACEMENT SHEET” label should be further apart from the actual figures. 
All drawings must be made by a process which will give them satisfactory reproduction characteristics. Every line, number, and letter must be durable, clean, black (except for color drawings), sufficiently dense and dark, and uniformly thick and well-defined. The weight of all lines and letters must be heavy enough to permit adequate reproduction. This requirement applies to all lines however fine, to shading, and to lines representing cut surfaces in sectional views. However, FIG. 5A and 5B (for example, see various lines and reference characters 130 and 160) do not meet this requirement.
In FIG. 5A, it is unclear as to whether the core on the left is part of that which is associated with reference character 160.
In FIG. 5B, it is unclear as to whether the core on the left is part of that which is associated with reference character 160.
Figure 6 associates reference character 601 with RELU. However, paragraph [0049] associates both reference character 601 and reference character 610 with RELU. In addition, reference character 610 does not appear to be in the Figures. 
In Figure 7, “DATA , ADDRESS” should be “DATA, ADDRESS”. 
Numbers, letters, and reference characters should not cross or mingle with the lines. However, various “5”s and “7”s in FIG. 8 above the POD blocks mingle with a line. In addition, various characters mingle with a line in FIG. 5A and FIG. 5B (e.g., “C”s in “Compute”, “d”s in “Sigmoid”).
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing 

Claim Objections
Claims 1-31 are objected to because of the following informalities.  Appropriate correction is required.
In claim 1, line 8, an “and” appears to be missing at the end of the line.
Claims 2-12 are objected to for failing to alleviate the objection of claim 1 above.

Claim 3 recites the limitation “the inference engine” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the array-based inference engine”.
Claim 3 recites the limitation “the inference engine” in line 3. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the array-based inference engine”.

Claim 4 recites the limitation “the inference engine” in line 3. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the array-based inference engine”.
Claims 5-6 are objected to for failing to alleviate the objection of claim 4 above.

Claim 11 recites the limitation “one or more post matrix multiplication operation” in lines 2-3. However, this limitation should presumably be “one or more post matrix multiplication operations”.
Claim 12 is objected to for failing to alleviate the objection of claim 11 above.

Claim 12 recites the limitation “the inference engine” in line 2. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the array-based inference engine”.
Claim 12 recites the limitation “these post matrix multiplication operations” in lines 7-8. However, this limitation has insufficient antecedent basis in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the one or more post matrix multiplication operations”. 

In claim 13, line 11, an “and” appears to be missing following the penultimate step.
Claims 14-22 are objected to for failing to alleviate the objection of claim 13 above.

Claim 21 recites the limitation “one or more post matrix multiplication operation” in line 2. However, this limitation should presumably be “one or more post matrix multiplication operations”.
Claim 22 is objected to for failing to alleviate the objection of claim 21 above.

Claim 22 recites the limitation “these post matrix multiplication operations” in lines 6-7. However, this limitation has insufficient antecedent basis in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the one or more post matrix multiplication operations”.

In claim 23, line 4, an “and” appears to be missing following the penultimate element.
In claim 23, line 10, an “and” appears to be missing at the end of the line.
In claim 23, line 15, an “and” appears to be missing at the end of the line.
Claims 24-26 are objected to for failing to alleviate the objections of claim 23 above.

Claim 25 recites the limitation “the second plurality sub-tasks” in line 2. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the second plurality of sub-tasks”. 

In claim 27, line 9, an “and” appears to be missing at the end of the line.
In claim 27, line 11, an “and” appears to be missing at the end of the line.

In claim 28, line 3, an “and” appears to be missing at the end of the line.
In claim 28, line 10, an “and” appears to be missing at the end of the line.
In claim 28, line 12, an “and” appears to be missing at the end of the line.
In claim 28, line 15, an “and” appears to be missing at the end of the line.
Claims 29-30 are objected to for failing to alleviate the objections of claim 28 above. 

In claim 31, line 3, an “and” appears to be missing at the end of the line.
In claim 31, line 9, an “and” appears to be missing at the end of the line.
In claim 31, line 12, an “and” appears to be missing at the end of the line.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-22 and 31 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 4-12, and 14-19 of U.S. Patent No. 10824433. Although the claims at issue are not identical, they are not patentably distinct from each other because all the limitations of each of the aforementioned instant claims are taught by a corresponding claim of the ‘433 patent. As an exemplary case, see the table below, wherein standard-format limitations in the left column correlate to italicized limitations in the right column.

Claim 1 of Instant Application: 16948867
Claim 1 of Patent: 10824433

1. An array-based inference engine configured to perform a machine learning (ML) operation on an input data stream, comprising: 
Instant application: a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns, wherein each processing tile of the plurality of processing tiles comprises at least one or more of 
‘433 patent: a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns, wherein each processing tile of the plurality of processing tiles comprises at least one or more of 

Instant application: an on-chip memory (OCM) configured to receive and maintain data from the input data stream for local access by components in the each processing tile; 
‘433 patent: an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the each processing tile; 

Instant application: maintain and output result of the ML operation performed by the each processing tile as an output data stream; 
‘433 patent: maintain and output result of the ML operation performed by the each processing tile as an output data stream; 

Instant application: a first processing unit configured to perform a first type of computation task of the ML operation on the data in the OCM; and 
‘433 patent: a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM; and 

Instant application: a second processing unit configured to perform a second type of computation task of 
‘433 patent: a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD, the plurality of processing tiles are organized into a plurality of processing blocks, and wherein the OCMs of the plurality of processing tiles in the same processing block are configured to support aligned-reads, wherein data allocated and maintained in the OCMs are retrieved directly by the corresponding PODs and/or PEs in the processing tiles via at least one read port in each of the OCMs.


All the limitations of instant claim 2 are taught by claim 1 of the ‘433 patent. (Note that the recited OCM, to receive data and output data, necessarily entails a read port and a write port.)
All the limitations of instant claim 3 are taught by claim 2 of the ‘433 patent.
All the limitations of instant claim 4 are taught by claim 4 of the ‘433 patent.
All the limitations of instant claim 5 are taught by claim 5 of the ‘433 patent.
All the limitations of instant claim 6 are taught by claim 6 of the ‘433 patent.
All the limitations of instant claim 7 are taught by claim 7 of the ‘433 patent.
All the limitations of instant claim 8 are taught by claim 1 of the ‘433 patent.
All the limitations of instant claim 9 are taught by claim 1 of the ‘433 patent.
All the limitations of instant claim 10 are taught by claim 8 of the ‘433 patent.
All the limitations of instant claim 11 are taught by claim 9 of the ‘433 patent.
All the limitations of instant claim 12 are taught by claim 10 of the ‘433 patent.
All the limitations of instant claim 13 are taught by claim 11 of the ‘433 patent.
All the limitations of instant claim 14 are taught by claim 11 of the ‘433 patent. (Note that the recited OCM, to receive data and output data, necessarily entails a read port and a write port.)
All the limitations of instant claim 15 are taught by claim 12 of the ‘433 patent.
All the limitations of instant claim 16 are taught by claim 4 (or 14) of the ‘433 patent.
All the limitations of instant claim 17 are taught by claim 5 (or 14) of the ‘433 patent.
All the limitations of instant claim 18 are taught by claim 6 (or 15) of the ‘433 patent.
All the limitations of instant claim 19 are taught by claim 16 of the ‘433 patent.
All the limitations of instant claim 20 are taught by claim 17 of the ‘433 patent.
All the limitations of instant claim 21 are taught by claim 18 of the ‘433 patent.
All the limitations of instant claim 22 are taught by claim 19 of the ‘433 patent.
All the limitations of instant claim 31 are taught by claim 15 of the ‘433 patent.

Claims 23-25 and 28-30 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 4, 11, 14, and 15 of U.S. Patent No. 10824433 in view of Achilles et al. (Achilles) (US 20110307890 A1).
Regarding the additional limitation that claim 23 recites but is not taught by claim 11 of the ‘433 patent, Achilles is relied upon to render obvious this additional limitation in an analogous manner as Achilles was relied upon in the rejection of claim 23 under 35 USC 103 below; see the citations in Achilles and corresponding rationale for obviousness in the rejection of claim 23 under 35 USC 103 below.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 24 are taught by claim 4 (or 14) of the ‘433 patent.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 25 are taught by claim 11 of the ‘433 patent.

Regarding the additional limitation that claim 28 recites but is not taught by claim 11 of the ‘433 patent, Achilles is relied upon to render obvious this additional limitation in an analogous manner as Achilles was relied upon in the rejection of claim 23 under 35 USC 103 below; see the citations in Achilles and corresponding rationale for obviousness in the rejection of claim 23 under 35 USC 103 below.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 29 are taught by claim 4 (or 14) of the ‘433 patent.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 30 are taught by claim 15 of the ‘433 patent.

Claim 26 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 15 (which is indirectly dependent on claim 11) of U.S. Patent No. 10824433 and Achilles et al. (Achilles) (US 20110307890 A1) as applied to claim 23 above, and further in view of Anderson et al. (Anderson) (US 20150019836).
Regarding the additional limitation that claim 26 recites but is not taught by claim 15 of the ‘433 patent and Achilles, Anderson is relied upon to render obvious this additional limitation in an analogous manner as Anderson was relied upon in the rejection of claim 26 under 35 USC .

Claim 27 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 15 of U.S. Patent No. 10824433 in view of Anderson et al. (Anderson) (US 20150019836).
Regarding the additional limitation that claim 27 recites but is not taught by claim 15 of the ‘433 patent, Anderson is relied upon to render obvious this additional limitation in an analogous manner as Anderson was relied upon in the rejection of claim 27 under 35 USC 103 below; see the citations in Anderson and corresponding rationale for obviousness in the rejection of claim 27 under 35 USC 103 below.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-31 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “maintain and output result” in line 9. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claim 1 recites the limitation “the ML operation performed by the each processing tile” in lines 9-10. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and each processing tile has been recited, the claims did not previously recite an ML operation that was performed by the each processing tile. 
Claim 1 recites the limitation “the data in the OCM and/or from the first processing unit” in lines 14-15. However, this limitation has insufficient antecedent basis in the claims, because data from the first processing unit has not been previously recited.
Claims 2-12 are rejected for failing to alleviate the rejections of claim 1 above. 

Claim 2 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 2 recites the limitation “the data maintained in the OCM in the each processing tile is retrieved from and/or written to directly by the first processing unit” in lines 2-3. However, it is indefinite as to what is being conveyed. For example, it is indefinite as to what it means for data maintained in the OCM to be “retrieved from … the first processing unit” — is the data in 

Claim 3 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.

Claim 4 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claims 5-6 are rejected for failing to alleviate the rejection of claim 4 above.

Claim 5 recites the limitation “the operations” in line 5. However, there is insufficient antecedent basis for this limitation in the claims. Note that this limitation is also recited in claim 5, line 6.
Claim 6 is rejected for failing to alleviate the rejection of claim 5 above. 

Claim 6 recites the limitation “each processing tile is programmed to load, process, and output the input data stream and/or the output data stream via one streaming instruction, wherein 

Claim 7 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 7 recites the limitation “the processing blocks” in line 3. However, there is insufficient antecedent basis for this limitation in the claims. To any extent to which this limitation is intended to be “the one or more processing blocks”, it is further indefinite as to what it means for (just) one processing block to be coupled to one another. 

Claim 8 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claims 10-12 are rejected for failing to alleviate the rejection of claim 8 above.

Claim 9 recites the limitation “The system of claim 1” in line 1. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “The array-based inference engine of claim 1” or as being dependent on a claim directed to a system. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.

Claim 12 recites the limitation “one or more post matrix multiplication operations” in lines 2-3. However, it is indefinite as to whether these “one or more post matrix multiplication operations” are the same as or different from “one or more post matrix multiplication operation” in claim 11, lines 2-3. If the same, antecedent basis language should be used.
Claim 12 recites the limitation “the one or more post matrix multiplication operations” in lines 4-5. However, it is indefinite as to whether this limitation has antecedent basis back to “one or more post matrix multiplication operation” in claim 11, lines 2-3, or “one or more post matrix multiplication operations” in claim 12, lines 2-3.
Claim 12 recites the limitation “the matrix multiplication” in lines 5-6. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “a matrix multiplication” or as “the matrix multiplication operation”. For the purposes of prior art examination, Examiner is taking the latter possibility to be the case.

Claim 13 recites the limitation “A method to support perform” in line 1. However, it is indefinite as to what is being conveyed. For example, it is indefinite as to whether the method is 
Claim 13 recites the limitation “receiving and maintaining data from the input data stream for local access by an on-chip memory (OCM)” in lines 3-4. However, it is indefinite as to whether the on-chip memory is that which locally accesses the received and maintained data, or whether the on-chip memory is that which receives and maintains the data.
Claim 13 recites the limitation “the data in the OCM and/or from the first processing unit” in lines 9-10. However, this limitation has insufficient antecedent basis in the claims, because data from the first processing unit has not been previously recited.
Claim 13 recites the limitation “the ML operation performed by the processing tile” in lines 12-13. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and a processing tile has been recited, the claims did not previously recite an ML operation that was performed by the processing tile. 
Claim 13 recites the limitation “maintaining and outputting result” in line 12. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claims 14-22 are rejected for failing to alleviate the rejections of claim 13 above.

Claim 14 recites the limitation “the OCM in the each processing tile” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims.
Claim 14 recites the limitation “the each processing tile” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims. Note that this limitation is also recited in lines 3-4.
Claim 14 recites the limitation “the first processing unit and/or second processing unit in the each processing tile” in lines 3-4. However, there is insufficient antecedent basis for this limitation in the claims. 

Claim 17 recites the limitation “the operations” in lines 4-5. However, there is insufficient antecedent basis for this limitation in the claims. Note that this limitation is also recited in claim 17, line 6.
Claim 18 is rejected for failing to alleviate the rejection of claim 17 above.

Claim 18 recites the limitation “programming each processing tile to load, process, and output the input data stream and/or the output data stream via one streaming instruction, wherein the input data stream and/or the output data stream each comprises a plurality of data” in lines 1-4. However, it is indefinite as to what is being conveyed. For example, it is indefinite as to whether the claim is conveying the scenario wherein an output data stream is loaded and an input data stream is output.

Claim 19 recites the limitation “the processing blocks” in lines 4-5. However, there is insufficient antecedent basis for this limitation in the claims. To any extent to which this limitation is intended to be “the one or more processing blocks”, it is further indefinite as to what it means for (just) one processing block to be coupled to one another.

Claim 22 recites the limitation “one or more post matrix multiplication operations” in line 2. However, it is indefinite as to whether these “one or more post matrix multiplication 
Claim 22 recites the limitation “the one or more post matrix multiplication operations” in lines 3-4. However, it is indefinite as to whether this limitation has antecedent basis back to “one or more post matrix multiplication operation” in claim 21, line 2, or “one or more post matrix multiplication operations” in claim 22, line 2.
Claim 22 recites the limitation “the matrix multiplication” in line 5. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “a matrix multiplication” or as “the matrix multiplication operation”. For the purposes of prior art examination, Examiner is taking the latter possibility to be the case.
Claim 22 recites the limitation “the first processing unit in each processing tile” in line 3. However, there is insufficient antecedent basis for this limitation in the claims. In addition, the metes and bounds of the claim are indefinite in view of the recitation of “each processing tile”, given that the surrounding context of this limitation appears to be within the scope of “the” (i.e. a particular) processing tile. 

Claim 23 recites the limitation “maintain and output result” in line 11. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claim 23 recites the limitation “the ML operation performed by the one or more processing units in the each processing tile” in lines 11-12. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been 
Claims 24-26 are rejected for failing to alleviate the rejections of claim 23 above.

Claim 27 recites the limitation “maintain and output result” in line 10. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claim 27 recites the limitation “the ML operation performed by the one or more processing units in the each processing tile” in lines 10-11. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in each processing tile has been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the each processing tile.
Claim 27 recites “A system … comprising: a streaming engine configured to transmit a stream of data to an inference engine … said inference engine comprising …” in lines 1-4. However, it is indefinite as to whether the metes and bounds of the claim are such that the recited system only comprises one element (a streaming engine), despite a system being a set of things working together, or whether the system comprises the recited inference engine as well.

Claim 28 recites the limitation “maintain and output result” in line 11. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claim 28 recites the limitation “the ML operation performed by the one or more processing units in the each processing tile” in lines 11-12. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in each processing tile has been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the each processing tile.
Claims 29-30 are rejected for failing to alleviate the rejections of claim 28 above.

Claim 31 recites the limitation “an inference engine” in line 4. However, it is indefinite as to whether this inference engine is the same as or different from “an inference engine” as recited in claim 31, line 2. If the same, antecedent basis language should be used for clarity.
Claim 31 recites the limitation “maintain and output result” in line 10. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claim 31 recites the limitation “the ML operation performed by the one or more processing units in the each processing tile” in lines 10-11. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in each processing tile have been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the each processing tile.
Claim 31 recites the limitation “the second plurality of sub-tasks” in lines 12-13. However, there is insufficient antecedent basis for this limitation in the claims. In addition, it is 

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 7-9, 13-15, 19, and 28-29 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lie et al. (Lie) (US 20180314941 A1).
Consider claim 1, Lie discloses an array-based inference engine configured to perform a machine learning (ML) operation on an input data stream ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns (FIG. 4, which shows processing elements 499 in rows and columns), wherein each processing tile of the plurality of processing tiles comprises at least one or more of an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) configured to receive and maintain data from the input data stream for local access by components in the each processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 

Consider claim 2, Lie discloses the data maintained in the OCM in the each processing tile is retrieved from and/or written to directly by the first processing unit and/or second processing unit in the each processing tile via at least one read port and/or at least one write port of the OCM, respectively ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); note that memory has one or more ports from which data is received or sent).

Consider claim 3, Lie discloses the input data stream includes data to be analyzed and inferred by the inference engine and/or training data used to train the inference engine for the ML operation ([0488], lines 2-3, training data is applied to the PEs; [0460], lines 1-2, neural network training and inference).

Consider claim 7, Lie discloses one or more processing blocks each including a set of the processing tiles coupled to one another via a routing element, wherein the processing blocks are coupled to one another via one or more routing elements (Figure 5, router 510; [0495], lines 11-12, square-organized section or a rectangular-organized section of PEs).

Consider claim 8, Lie discloses the first type of computation task of the ML operation is a dense and/or regular computation task ([0553], line 2, dense wavelet).

Consider claim 9, Lie discloses the second type of computation task of the ML operation is a sparse and/or irregular computation task ([0548], line 2, sparse wavelet).

Consider claim 13, Lie discloses a method to support perform a machine learning (ML) operation on an input data stream via an inference engine ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: receiving and maintaining data from the input data stream for local access ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-

Consider claim 14, Lie discloses retrieving from and/or writing the data directly to the OCM in the each processing tile by the first processing unit and/or second processing unit in the each processing tile via at least one read port and/or at least one write port of the OCM, respectively ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) 

Consider claim 15, Lie discloses including in the input data stream data to be analyzed and inferred by the inference engine and/or training data used to train the inference engine for the ML operation ([0488], lines 2-3, training data is applied to the PEs; [0460], lines 1-2, neural network training and inference).

Consider claim 19, Lie discloses organizing one or more of the plurality of processing tiles into one of one or more processing blocks, wherein the one or more of the plurality of processing tiles in the processing block are coupled to one another via a routing element, wherein the processing blocks are coupled to one another via one or more routing elements (Figure 5, router 510; [0495], lines 11-12, square-organized section or a rectangular-organized section of PEs).

Consider claim 28, Lie discloses a method to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: dividing the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks via a core ([0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information 

Consider claim 29, Lie discloses programming the core and/or the inference engine via a set of programming instructions ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); [0467], lines 4-5, the placement programs are stored in CRM 152 and executed by CPUs 151).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6, 16-18, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Lie (in the case of claims 4-6, as applied to claim 1; in the case of claims 16-18, as applied to claim 13; in the case of claim 30, as applied to claim 28 above), and further in view of Nemirovsky et al. (Nemirovsky) (US 20080040577 A1).
Consider claim 4, Lie discloses each processing tile is programmable by a set of programming instructions in the inference engine ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)).
Moreover, to any extent to which Lie does not disclose the aforementioned instructions are streamed to the aforementioned inference engine, Nemirovsky explicitly discloses load/store streaming ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).

Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (load/store streaming) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing load/store streaming), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of load/store streaming, when applied to the invention of Lie which entails each processing tile is programmable by a set of programming instructions in the inference engine, results in the overall claimed limitation.

Consider claim 5, the overall combination entails the set of programming instructions is configured to program the first processing unit and/or the second processing unit in the processing tile to perform one or more of: loading the data into the first processing unit and/or the second processing unit, performing the operations on the data by the first processing unit and/or the second processing unit, and writing output of the operations into the associated OCM of the processing tile (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).

Consider claim 6, the overall combination entails each processing tile is programmed to load, process, and output the input data stream and/or the output data stream via one streaming instruction, wherein the input data stream and/or the output data stream each comprises a plurality of data (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); Nemirovsky, [0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).

Consider claim 16, Lie discloses programming each processing tile by a set of programming instructions in the inference engine ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling 
Moreover, to any extent to which Lie does not disclose the aforementioned instructions are streamed to the aforementioned inference engine, Nemirovsky explicitly discloses load/store streaming ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (load/store streaming) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing load/store streaming), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of load/store streaming, when 

Consider claim 17, the overall combination entails programming the first processing unit and/or the second processing unit in the processing tile via a set of programming instructions to perform one or more of: loading the data into the first processing unit and/or the second processing unit, performing the operations on the data by the first processing unit and/or the second processing unit, and writing output of the operations into the OCM of the processing tile (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).

Consider claim 18, the overall combination entails programming each processing tile to load, process, and output the input data stream and/or the output data stream via one streaming instruction, wherein the input data stream and/or the output data stream each comprises a plurality of data (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 

Consider claim 30, Lie discloses transmitting the stream of data to the inference engine ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference).
However, Lie does not disclose the aforementioned transmitting being performed via a single load instruction.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, further entailing a single load instruction to perform the 

Consider claim 31, Lie discloses a method to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: transmitting a stream of data to an inference engine for the ML operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference); performing the ML operation via an inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein each processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive and maintain a stream of data for local access by the one or more processing units in the each processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to 
However, Lie does not disclose the aforementioned transmitting being performed via a single load instruction.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, further entailing a single load instruction to perform the .

Claims 10-12 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Lie as applied to claims 8 and 13 above, and further in view of Jang et al. (Jang) (US 5481487).
Consider claim 10, Lie does not disclose the first processing unit in each processing tile is configured to perform a matrix multiplication operation on the data in the OCM of the processing tile.
On the other hand, Jang discloses a circuit that performs a matrix multiplication operation on data (FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).
Jang’s teaching increases functionality by supporting matrix multiplication operations.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jang with the invention of Lie in order to support matrix multiplication operations. Alternatively, this modification merely entails applying a known technique (a circuit to perform matrix multiplication operation on data) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing a circuit to perform matrix multiplication operation on data), which is a rationale that may support a conclusion of obviousness as per 

Consider claim 11, the combination thus far entails the second processing unit in each processing tile is configured to perform one or more post matrix multiplication operation on output from the matrix multiplication operation by the first processing unit in the same processing tile (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).

Consider claim 12, the combination thus far entails the inference engine is configured to integrate one or more post matrix multiplication operations with the matrix multiplication operation by the first processing unit in each processing tile so that the one or more post matrix multiplication operations are performed immediately on the output from the matrix multiplication by the first processing unit without having to transmit and save the output to the OCM first and to read from the OCM again for these post matrix multiplication operations (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and 

Consider claim 20, Lie does not disclose performing a matrix multiplication operation on the data in the OCM of the processing tile via the first processing unit in the processing tile.
On the other hand, Jang discloses a circuit that performs a matrix multiplication operation on data (FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).
Jang’s teaching increases functionality by supporting matrix multiplication operations.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jang with the invention of Lie in order to support matrix multiplication operations. Alternatively, this modification merely entails applying a known technique (a circuit to perform matrix multiplication operation on data) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing a circuit to perform matrix multiplication operation on data), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Jang’s teaching of a circuit that performs a matrix multiplication operation on data, when applied to the invention of Lie which entails a processing unit in each processing tile processing data in the OCM of the processing tile, results in the overall claimed limitation. 

Consider claim 21, the combination thus far entails performing one or more post matrix multiplication operation on output from the matrix multiplication operation by the first processing unit in the same processing tile (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).

Consider claim 22, the combination thus far entails integrating one or more post matrix multiplication operations with the matrix multiplication operation by the first processing unit in each processing tile so that the one or more post matrix multiplication operations are performed immediately on the output from the matrix multiplication by the first processing unit without having to transmit and save the output to the OCM first and to read from the OCM again for these post matrix multiplication operations (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data; note that block 40 is performed immediately on the output of block 35).

Claims 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Lie et al. (Lie) (US 20180314941 A1) in view of Achilles et al. (Achilles) (US 20110307890 A1).
Consider claim 23, Lie discloses a system configured to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), 
However, Lie does not disclose the first plurality of sub-tasks are executed by the core.
On the other hand, Achilles discloses executing by a core in tandem with executing by an accelerator ([0032], lines 1-4, Special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0037], line 4, blend hardware acceleration with software).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of Lie in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails executing by a core in tandem with executing by an 

Consider claim 24, the overall combination entails the core and/or the inference engine are programmable via a set of programming instructions (Lie, [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); [0467], lines 4-5, the placement programs are stored in CRM 152 and executed by CPUs 151).

Consider claim 25, the overall combination entails the second plurality sub-tasks includes one or more of a dense and/or regular computation task (Lie, [0553], line 2, dense wavelet) and a sparse and/or irregular computation task of the ML operation (Lie, [0548], line 2, sparse wavelet).

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Lie and Achilles as applied to claim 23 above, and further in view of Anderson et al. (Anderson) (US 20150019836) and Nemirovsky et al. (Nemirovsky) (US 20080040577 A1).
Consider claim 26, the combination thus far discloses receiving the second plurality of sub-tasks from the core (Lie, [0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122; [0531], line 2, processing a wavelet for task initiation), and transmitting the second plurality of sub-tasks and the stream of data to the inference engine (Lie, [0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122; [0531], line 2, processing a wavelet for task initiation; [0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data).
However, the combination thus far does not entail a streaming engine to perform the aforementioned receiving and transmitting, and a single load instruction to perform the aforementioned transmitting.
On the other hand, Anderson discloses a streaming engine ([0027], line 1, streaming engines).
Anderson’s teaching frees memory fetch tasks from the corresponding CPU, enabling other processing functions (Anderson, [0026], lines 19-21).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Anderson with the combination of Lie and Achilles in order to free memory fetch tasks from the corresponding CPU, enabling other processing functions.

On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the combination of Lie, Achilles, and Anderson in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the combination of Lie, Achilles, and Anderson which entails transmitting to the inference engine by a streaming engine) to yield predictable results (the combination of Lie, Achilles, and Anderson, entailing a single load instruction to perform the transmitting to the inference engine by a streaming engine), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of a single load instruction to perform transmitting, when applied to the combination of Lie, Achilles, and Anderson which entails transmitting to the inference engine by a streaming engine, results in the overall claimed limitation.

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Lie in view of Anderson et al. (Anderson) (US 20150019836) in view of Nemirovsky et al. (Nemirovsky) (US 20080040577 A1).
Consider claim 27, Lie discloses a system configured to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: transmitting a stream of data to an inference engine for the ML operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference); said inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein each processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive and maintain the stream of data for local access by the one or more processing units in the each processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); maintain and output result of the ML operation performed by the one or more processing units in the each processing tile as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).
However, Lie does not entail a streaming engine to perform the aforementioned transmitting, and a single load instruction to perform the aforementioned transmitting.
On the other hand, Anderson discloses a streaming engine ([0027], line 1, streaming engines).
Anderson’s teaching frees memory fetch tasks from the corresponding CPU enabling other processing functions (Anderson, [0026], lines 19-21).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Anderson with the invention of Lie in order to free memory fetch tasks from the corresponding CPU enabling other processing functions.
However, the combination thus far does not entail a single load instruction to perform the aforementioned transmitting.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
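Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the combination of Lie and Anderson in order to improve speed and efficiency.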

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Goyal et al. (2017/0316312) is cited for the teaching of a first processing unit (see one of the plurality of multipliers 408) configured to perform a dense computation task of the ML operation on the data in the OCM (See each TE 104 includes a plurality of multipliers 408 for performing dense matrix multiplication operations in [0025]) and a second processing unit (see another of the plurality of multipliers 408 as a PE) to perform a sparse computation task of the ML operation on the data in the OCM and/or from the POD. (See each TE 104 includes a plurality of multipliers 408 for performing a plurality 
Xu (8,117,137) is cited for the teaching of array-based processing tiles for machine learning (see fig.3 PEO-PEn; col.3, lines 52-64, col.10, lines 62-64), which is relevant to the claimed array-based inference engine configured to perform a machine learning operation, comprising a plurality of processing tiles.
Bruestle et al. (2017/0323224) is cited for an array of processing elements for machine learning (see para [0021], fig.3 [execution units 320]), which is relevant to the claimed array-based inference engine configured to perform a machine learning operation.
Hawkins et al. (8,175,981) is cited for an array of processing elements of rows and columns (see fig.3C, col.23, lines 29-39), which is relevant to the claimed two-dimensional array of a plurality of rows and a plurality of columns.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEITH E VICARY/Primary Examiner, Art Unit 2182