DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-31 are pending in this office action and presented for examination. Claims 1-14, 16-20, 22-23, 25, 27-28, and 31 are newly amended by the response received May 27, 2022. 

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. Examiner submits that processing different computational tasks was well known before the effective filing date of the claimed invention. 

Drawings
The drawings are objected to because:
In amended FIG. 1A, it is unclear as to whether PCIE 210 is pointing to a logical boundary (separating the left from the right) (in which case it is unclear as to how PCIE is a logical boundary), or whether the dashed line is itself a PCIE bus oriented north-south (in which case it is unclear as to whether this bus is connected to any other elements of the figure), or whether the figure is trying to convey something different.
In amended FIG. 1A, reference characters 190 and 120 appear to be directed to the same block, as are reference characters 190 and 140. To the extent that reference character 190 is not directed to either block, reference character 190 would then not be associated with a lead line or underlining. 
In amended FIG. 1B, it is unclear as to whether PCIE 210 is pointing to a logical boundary (separating the left from the right) (in which case it is unclear as to how PCIE is a logical boundary), or whether the dashed line is itself a PCIE bus oriented north-south (in which case it is unclear as to whether this bus is connected to any other elements of the figure), or whether the figure is trying to convey something different.
In amended FIG. 1B, reference characters 190 and 120 appear to be directed to the same block, as are reference characters 190 and 140. To the extent that reference character 190 is not directed to either block, reference character 190 would then not be associated with a lead line or underlining. 
In amended FIG. 3A, it is unclear as to whether HOST 110 is connected to PCI Controller 210.
In amended FIG. 3A, it is unclear as to the intent of the “TO” text above block 110.
MPEP 608.02, section V, states that “[l]ead lines are required for each reference character except for those which indicate the surface or cross section on which they are placed. Such a reference character must be underlined to make it clear that a lead line has not been left out by mistake.” However, in amended FIG. 3A, reference character 110 is neither underlined nor associated with a lead line.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claims 11-12, 21-22, 25, and 27-31 are objected to because of the following informalities.  Appropriate correction is required.
Claim 11 recites the limitation “one or more post matrix multiplication operation” in line 4. However, this limitation should presumably be “one or more post matrix multiplication operations”.
Claim 12 is objected to for failing to alleviate the objection of claim 11 above.

Claim 21 recites the limitation “one or more post matrix multiplication operation” in line 2. However, this limitation should presumably be “one or more post matrix multiplication operations”.
Claim 22 is objected to for failing to alleviate the objection of claim 21 above.

Claim 25 recites the limitation “the second plurality sub-tasks” in line 2. However, there is insufficient antecedent basis for this limitation in the claims. For the purposes of prior art examination, Examiner is interpreting this limitation as “the second plurality of sub-tasks”. 

In claim 27, line 18, an “and” appears to be missing at the end of the line (before the last wherein clause).

In claim 28, line 16, an “and” appears to be missing at the end of the line (before the last wherein clause).
Claims 29-30 are objected to for failing to alleviate the objection of claim 28 above. 

In claim 31, line 18, an “and” appears to be missing at the end of the line (before the last wherein clause).

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 4-12, and 14-19 of U.S. Patent No. 10824433. Although the claims at issue are not identical, they are not patentably distinct from each other because all the limitations of each of the aforementioned instant claims are taught by a corresponding claim of the ‘433 patent. As an exemplary case, see the table below, wherein standard-format limitations in the left column correlate to italicized limitations in the right column.

Claim 1 of Instant Application: 16948867 | Claim 1 of Patent: 10824433
1. An array-based inference engine configured to perform a machine learning (ML) operation on an input data stream, comprising: | 1. An array-based inference engine configured to perform a machine learning (ML) operation on an input data stream, comprising:
a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns, wherein at least one processing tile of the plurality of processing tiles comprises at least one or more of | a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns, wherein each processing tile of the plurality of processing tiles comprises at least one or more of
an on-chip memory (OCM) configured to receive and maintain data from the input data stream for local access by components in the at least one processing tile; | an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the each processing tile;
maintain result of one ML operation performed by the at least one processing tile; and output the result of the one ML operation performed by the at least one processing tile as an output data stream; | maintain and output result of the ML operation performed by the each processing tile as an output data stream;
a first processing unit configured to perform a first type of computation task of the one ML operation on the data in the OCM; and | a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM; and
a second processing unit configured to perform a second type of computation task of the one ML operation on the data in the OCM and/or data from the first processing unit. | a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD, the plurality of processing tiles are organized into a plurality of processing blocks, and wherein the OCMs of the plurality of processing tiles in the same processing block are configured to support aligned-reads, wherein data allocated and maintained in the OCMs are retrieved directly by the corresponding PODs and/or PEs in the processing tiles via at least one read port in each of the OCMs.


All the limitations of instant claim 2 are taught by claim 1 of the ‘433 patent. (Note that the recited OCM, to receive data and output data, necessarily entails a read port and a write port.)
All the limitations of instant claim 3 are taught by claim 2 of the ‘433 patent.
All the limitations of instant claim 4 are taught by claim 4 of the ‘433 patent.
All the limitations of instant claim 5 are taught by claim 5 of the ‘433 patent.
All the limitations of instant claim 6 are taught by claim 6 of the ‘433 patent.
All the limitations of instant claim 7 are taught by claim 7 of the ‘433 patent.
All the limitations of instant claim 8 are taught by claim 1 of the ‘433 patent.
All the limitations of instant claim 9 are taught by claim 1 of the ‘433 patent.
All the limitations of instant claim 10 are taught by claim 8 of the ‘433 patent.
All the limitations of instant claim 11 are taught by claim 9 of the ‘433 patent.
All the limitations of instant claim 12 are taught by claim 10 of the ‘433 patent.
All the limitations of instant claim 13 are taught by claim 11 of the ‘433 patent.
All the limitations of instant claim 14 are taught by claim 11 of the ‘433 patent. (Note that the recited OCM, to receive data and output data, necessarily entails a read port and a write port.)
All the limitations of instant claim 15 are taught by claim 12 of the ‘433 patent.
All the limitations of instant claim 16 are taught by claim 4 (or 14) of the ‘433 patent.
All the limitations of instant claim 17 are taught by claim 5 (or 14) of the ‘433 patent.
All the limitations of instant claim 18 are taught by claim 6 (or 15) of the ‘433 patent.
All the limitations of instant claim 19 are taught by claim 16 of the ‘433 patent.
All the limitations of instant claim 20 are taught by claim 17 of the ‘433 patent.
All the limitations of instant claim 21 are taught by claim 18 of the ‘433 patent.
All the limitations of instant claim 22 are taught by claim 19 of the ‘433 patent.

Claims 23-25 and 28-31 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 4, 11, 14, and 15 of U.S. Patent No. 10824433 in view of Achilles et al. (Achilles) (US 20110307890 A1).
Regarding the additional limitation that claim 23 recites but is not taught by claim 11 of the ‘433 patent, Achilles is relied upon to render obvious this additional limitation in an analogous manner as Achilles was relied upon in the rejection of claim 23 under 35 USC 103 below; see the citations in Achilles and corresponding rationale for obviousness in the rejection of claim 23 under 35 USC 103 below.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 24 are taught by claim 4 (or 14) of the ‘433 patent.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 25 are taught by claim 11 of the ‘433 patent.

Regarding the additional limitation that claim 28 recites but is not taught by claim 11 of the ‘433 patent, Achilles is relied upon to render obvious this additional limitation in an analogous manner as Achilles was relied upon in the rejection of claim 23 under 35 USC 103 below; see the citations in Achilles and corresponding rationale for obviousness in the rejection of claim 23 under 35 USC 103 below.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 29 are taught by claim 4 (or 14) of the ‘433 patent.
Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 30 are taught by claim 15 of the ‘433 patent.

Except for the limitation that Achilles is relied upon to render obvious, all the limitations of instant claim 31 are taught by claim 15 of the ‘433 patent.

Claim 26 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 15 (which is indirectly dependent on claim 11) of U.S. Patent No. 10824433 and Achilles et al. (Achilles) (US 20110307890 A1) as applied to claim 23 above, and further in view of Anderson et al. (Anderson) (US 20150019836).
Regarding the additional limitation that claim 26 recites but is not taught by claim 15 of the ‘433 patent and Achilles, Anderson is relied upon to render obvious this additional limitation in an analogous manner as Anderson was relied upon in the rejection of claim 26 under 35 USC 103 below; see the citations in Anderson and corresponding rationale for obviousness in the rejection of claim 26 under 35 USC 103 below.

Claim 27 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 15 of U.S. Patent No. 10824433 in view of Anderson et al. (Anderson) (US 20150019836) and further in view of Achilles et al. (Achilles) (US 20110307890 A1).
Regarding the additional limitations that claim 27 recites but that are not taught by claim 15 of the ‘433 patent, Anderson and Achilles are relied upon to render obvious these additional limitations in an analogous manner as Anderson and Achilles were relied upon in the rejection of claim 27 under 35 USC 103 below; see the citations in Anderson and Achilles and corresponding rationale for obviousness in the rejection of claim 27 under 35 USC 103 below.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-31 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites the limitation “at least one processing tile of the plurality of processing tiles comprises at least one or more of …” in lines 4-5. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles comprising that which is recited in claim 1, lines 6-17, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 2-12 are rejected for failing to alleviate the rejection of claim 1 above.

Claim 4 recites the limitation “the at least one processing tile is programmable by a set of programming instructions streamed to the array-based inference engine” in lines 3-4. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles being programmable by a set of programming instructions streamed to the array-based inference engine, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 5-6 are rejected for failing to alleviate the rejection of claim 4 above.

Claim 6 recites the limitation “the at least one processing tile is programmed to load and process the input data stream and/or the output data stream via one streaming instruction” in lines 3-5. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for the at least one processing tile being programmed to load and process the output data stream. 

Claim 13 recites the limitation “receiving and maintaining data from the input data stream by an on-chip memory (OCM) for local access by at least one processing tile of a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns in the inference engine” in lines 3-6. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for receiving and maintaining data from the input data stream by an on-chip memory (OCM) for local access by more than one processing tile of the plurality of processing tiles, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claim 13 recites the limitation “performing a first type of computation task of the ML operation on the data in the OCM via a first processing unit in the at least one processing tile” in lines 7-8. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for performing a first type of computation task of the ML operation on the data in the OCM via a first processing unit in more than one processing tile of the plurality of processing tiles, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claim 13 recites the limitation “performing a second type of computation task of the ML operation on the data in the OCM and/or from data processed by the first processing unit via a second processing unit in the at least one processing tile” in lines 9-11. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for performing a second type of computation task of the ML operation on the data in the OCM and/or from data processed by the first processing unit via a second processing unit in more than one processing tile of the plurality of processing tiles, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claim 13 recites the limitation “maintaining result of the ML operation performed by the at least one processing tile in the OCM” in lines 12-14. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for maintaining result of the ML operation performed by more than one processing tile in the [same] OCM, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 14-22 are rejected for failing to alleviate the rejections of claim 13 above.

Claim 16 recites the limitation “programming the at least one processing tile by a set of programming instructions streamed to the inference engine” in lines 2-3. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for programming just one processing tile by a set of programming instructions streamed to the inference engine, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 17-18 are rejected for failing to alleviate the rejection of claim 16 above.

Claim 18 recites the limitation “programming the at least one processing tile to load and process the input data stream and/or the output data stream via one streaming instruction” in lines 2-4. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for the at least one processing tile being programmed to load and process the output data stream. 

Claim 23 recites the limitation “at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) and one or more processing units, wherein the OCM is configured to  …” in lines 7-9. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles including that which is recited in claim 23, line 8, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 24-26 are rejected for failing to alleviate the rejection of claim 23 above.

Claim 27 recites the limitation “at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) and one or more processing units, wherein the OCM is configured to  …” in lines 10-12. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles including that which is recited in claim 27, line 11, which is a scenario encompassed by the claim in view of the recited “at least one” language.

Claim 28 recites the limitation “at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) and one or more processing units, wherein the OCM is configured to  …” in lines 7-10. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles including that which is recited in claim 28, lines 8-9, which is a scenario encompassed by the claim in view of the recited “at least one” language.
Claims 29-30 are rejected for failing to alleviate the rejection of claim 28 above.

Claim 31 recites the limitation “at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) and one or more processing units, wherein the OCM is configured to  …” in lines 9-12. However, the original disclosure does not appear to provide support for this limitation. For example, the original disclosure does not appear to provide support for just one processing tile of the plurality of processing tiles including that which is recited in claim 31, lines 10-11, which is a scenario encompassed by the claim in view of the recited “at least one” language.

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-31 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “maintain result” in line 9. However, it is indefinite as to whether “a result” or “results” are being maintained. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case. 
Claims 2-12 are rejected for failing to alleviate the rejection of claim 1 above. 

Claim 5 recites the limitation “performing the one ML operation on the data by the first processing unit and/or the second processing unit” in lines 6-7. Claim 1, upon which claim 5 is indirectly dependent, recites the limitation “a first processing unit configured to perform a first type of computation task of the one ML operation on the data in the OCM; and a second processing unit configured to perform a second type of computation task of the one ML operation on the data in the OCM and/or data from the first processing unit” in lines 13-17. Claim 5 appears to encompass the scenario wherein the one ML operation is performed by just the first processing unit or just the second processing unit, in view of the “and/or” language. However, claim 1 appears to convey that the one ML operation is performed by both the first processing unit and the second processing unit. Therefore, the metes and bounds of claim 5 are indefinite, with respect to whether the claim encompasses the one ML operation being performed by just the first processing unit or just the second processing unit.
Claim 6 is rejected for failing to alleviate the rejection of claim 5 above.

Claim 7 recites the limitation “the one or more processing blocks are coupled to one another via one or more routing elements” in lines 4-5. However, in the scenario in which there is just one processing block (which is encompassed by the claim in view of the “one or more” language) it is indefinite as to what it means for just one processing block to be coupled to one another. 

Claim 12 recites the limitation “the one or more post matrix multiplication operation” in lines 3-4 (with a previously recited “operations” newly amended to read “operation”). It is indefinite as to whether the limitation is to be interpreted as “the post matrix multiplication operation” or as “the one or more post matrix multiplication operations”. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 12 recites the limitation “the one or more post matrix multiplication operation is” in line 6 (with a previously recited “operations are” newly amended to read “operation is”). It is indefinite as to whether the limitation is to be interpreted as “the post matrix multiplication operation is” or as “the one or more post matrix multiplication operations are”. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 12 recites the limitation “the matrix multiplication” in line 7. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “a matrix multiplication” or as “the matrix multiplication operation”. For the purposes of prior art examination, Examiner is taking the latter possibility to be the case.
Claim 12 recites the limitation “the one or more post matrix multiplication operation” in lines 9-10 (with a previously recited “operations” newly amended to read “operation”). It is indefinite as to whether the limitation is to be interpreted as “the post matrix multiplication operation” or as “the one or more post matrix multiplication operations”. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.

Claim 13 recites the limitation “maintaining result” in line 12. However, it is indefinite as to whether “a result” or “results” are being maintained and output. For the purposes of prior art examination, Examiner is interpreting the former possibility to be the case.
Claim 13 recites the limitation “the ML operation performed by the at least one processing tile” in lines 12-13. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and at least one processing tile has been recited, the claims did not previously recite an ML operation that was performed by the processing tile. 
Claim 13 recites the limitation “the at least one processing tile in the OCM” in lines 12-13. However, there is insufficient antecedent basis for this limitation in the claims.
Claims 14-22 are rejected for failing to alleviate the rejections of claim 13 above.

Claim 14 recites the limitation “the OCM in the at least one processing tile” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims.

Claim 17 recites the limitation “the first processing unit and/or the second processing unit in the processing tile” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims.
Claim 17 recites the limitation “the processing tile” in line 3. However, there is insufficient antecedent basis for this limitation in the claims.
Claim 17 recites the limitation “performing the ML operation on the data by the first processing unit and/or the second processing unit” in lines 5-6. Claim 13, upon which claim 17 is indirectly dependent, recites the limitation “performing a first type of computation task of the ML operation on the data in the OCM via a first processing unit in the at least one processing tile; performing a second type of computation task of the ML operation on the data in the OCM and/or from data processed by the first processing unit via a second processing unit in the at least one processing tile” in lines 7-11. Claim 17 appears to encompass the scenario wherein the ML operation is performed by just the first processing unit or just the second processing unit, in view of the “and/or” language. However, claim 13 appears to convey that the ML operation is performed by both the first processing unit and the second processing unit. Therefore, the metes and bounds of claim 17 are indefinite, with respect to whether the claim encompasses the ML operation being performed by just the first processing unit or just the second processing unit.
Claim 18 is rejected for failing to alleviate the rejections of claim 17 above.

Claim 19 recites the limitation “the one or more processing blocks are coupled to one another via one or more routing elements” in lines 5-6. However, in the scenario in which there is just one processing block (which is encompassed by the claim in view of the “one or more” language) it is indefinite as to what it means for just one processing block to be coupled to one another. 

Claim 20 recites the limitation “the OCM of the at least one processing tile” in lines 2-3. However, there is insufficient antecedent basis for this limitation in the claims.
Claims 21-22 are rejected for failing to alleviate the rejection of claim 20 above.

Claim 22 recites the limitation “the one or more post matrix multiplication operation” in lines 2-3 (with a previously recited “operations” newly amended to read “operation”). It is indefinite as to whether the limitation is to be interpreted as “the post matrix multiplication operation” or as “the one or more post matrix multiplication operations”. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 22 recites the limitation “the one or more post matrix multiplication operation is” in lines 4-5 (with a previously recited “operations are” newly amended to read “operation is”). It is indefinite as to whether the limitation is to be interpreted as “the post matrix multiplication operation is” or as “the one or more post matrix multiplication operations are”. For the purposes of prior art examination, Examiner is taking the former possibility to be the case.
Claim 22 recites the limitation “the matrix multiplication” in line 6. However, there is insufficient antecedent basis for this limitation in the claims, and it is further indefinite as to whether this limitation is to be interpreted as “a matrix multiplication” or as “the matrix multiplication operation”. For the purposes of prior art examination, Examiner is taking the latter possibility to be the case.
Claim 22 recites the limitation “the post matrix multiplication operation” in line 8. However, there is insufficient antecedent basis for this limitation in the claims. 

Claim 23 recites the limitation “the ML operation performed by the one or more processing units in the at least one processing tile” in lines 13-14. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in at least one processing tile have been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the at least one processing tile.
Claims 24-26 are rejected for failing to alleviate the rejection of claim 23 above.

Claim 27 recites “A system … comprising: a core … a streaming engine configured to transmit a stream of data associated with the second plurality of sub-tasks to an inference engine … and said inference engine comprising …” in lines 1-9. However, it is indefinite as to whether or not the system comprises the inference engine.
Claim 27 recites the limitation “the ML operation performed by the one or more processing units in the at least one processing tile” in lines 16-17. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in the at least one processing tile have been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the at least one processing tile.

Claim 28 recites the limitation “the ML operation performed by the one or more processing units in the at least one processing tile” in lines 14-15. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in at least one processing tile have been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the at least one processing tile.
Claims 29-30 are rejected for failing to alleviate the rejection of claim 28 above.

Claim 31 recites the limitation “an inference engine” in line 8. However, it is indefinite as to whether this inference engine is the same as or different from “an inference engine” as recited in claim 31, line 6. If the same, antecedent basis language should be used for clarity.
Claim 31 recites the limitation “the ML operation performed by the one or more processing units in the at least one processing tile” in lines 16-17. However, there is insufficient antecedent basis for this limitation in the claims. Note that while an ML operation has been previously recited, and one or more processing units in the at least one processing tile have been recited, the claims did not previously recite an ML operation that was performed by the one or more processing units in the at least one processing tile.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 7-9, 13-15, 19, and 28-29 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lie et al. (Lie) (US 20180314941 A1).
Consider claim 1, Lie discloses an array-based inference engine configured to perform a machine learning (ML) operation on an input data stream ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns (FIG. 4, which shows processing elements 499 in rows and columns), wherein at least one processing tile of the plurality of processing tiles comprises at least one or more of an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) configured to receive and maintain data from the input data stream for local access by components in the at least one processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); maintain result of one ML operation performed by the at least one processing tile; and output the result of the one ML operation performed by the at least one processing tile as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); a first processing unit configured to perform a first type of computation task of the one ML operation on the data in the OCM ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)); and a second processing unit configured to perform a second type of computation task of the one ML operation on the data in the OCM and/or data from the first processing unit ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)).

Consider claim 2, Lie discloses the OCM includes a read port and a write port, wherein the first processing unit or the second processing unit retrieves data from the OCM via the read port and wherein the first processing unit or the second processing unit writes data to the OCM via the write port ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); note that memory has one or more ports from which data is received or sent).

Consider claim 3, Lie discloses the input data stream includes data to be analyzed and inferred by the array-based inference engine and/or training data used to train the array-based inference engine for the one ML operation ([0488], lines 2-3, training data is applied to the PEs; [0460], lines 1-2, neural network training and inference).

Consider claim 7, Lie discloses one or more processing blocks each including a set of the plurality of processing tiles coupled to one another via a routing element, wherein the one or more processing blocks are coupled to one another via one or more routing elements (Figure 5, router 510; [0495], lines 11-12, square-organized section or a rectangular-organized section of PEs).

Consider claim 8, Lie discloses the first type of computation task of the one ML operation is a dense and/or regular computation task ([0553], line 2, dense wavelet).

Consider claim 9, Lie discloses the second type of computation task of the one ML operation is a sparse and/or irregular computation task ([0548], line 2, sparse wavelet).

Consider claim 13, Lie discloses a method to perform a machine learning (ML) operation on an input data stream via an inference engine ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: receiving and maintaining data from the input data stream ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)) by an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) for local access by at least one processing tile of a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns in the inference engine (FIG. 4, which shows processing elements 499 in rows and columns); performing a first type of computation task of the ML operation on the data in the OCM via a first processing unit in the at least one processing tile ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)); performing a second type of computation task of the ML operation on the data in the OCM and/or from data processed by the first processing unit via a second processing unit in the at least one processing tile ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)); and maintaining result of the ML operation performed by the at least one processing tile in the OCM and further outputting the result as an output data stream from the OCM ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations))).

Consider claim 14, Lie discloses retrieving from and/or writing the data directly to the OCM in the at least one processing tile by the first processing unit and/or second processing unit in the at least one processing tile via at least one read port and/or at least one write port of the OCM, respectively ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); note that memory has one or more ports from which data is received or sent).

Consider claim 15, Lie discloses including in the input data stream data to be analyzed and inferred by the inference engine and/or training data used to train the inference engine for the ML operation ([0488], lines 2-3, training data is applied to the PEs; [0460], lines 1-2, neural network training and inference).

Consider claim 19, Lie discloses organizing one or more of the plurality of processing tiles into one of one or more processing blocks, wherein the one or more of the plurality of processing tiles in the one or more processing blocks are coupled to one another via a routing element, wherein the one or more processing blocks are coupled to one another via one or more routing elements (Figure 5, router 510; [0495], lines 11-12, square-organized section or a rectangular-organized section of PEs).

Consider claim 28, Lie discloses a method to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: dividing the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks via a core ([0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)); and performing the first plurality of sub-tasks and/or a second plurality of sub-tasks of the ML operation via an inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive a stream of data and maintain the stream of data for local access by the one or more processing units in the at least one processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); and maintain a result of the ML operation performed by the one or more processing units in the at least one processing tile and further output the result of the ML operation as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)), wherein the one or more processing units are configured to receive the second plurality of sub-tasks of the ML operation from the core ([0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122; [0531], line 2, processing a wavelet for task initiation); and perform the second plurality of sub-tasks of the ML operation on the stream of data maintained in the OCM ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).

Consider claim 29, Lie discloses programming the core and/or the inference engine via a set of programming instructions ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); [0467], lines 4-5, the placement programs are stored in CRM 152 and executed by CPUs 151).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6, 16-18, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Lie (in the case of claims 4-6, as applied to claim 1; in the case of claims 16-18, as applied to claim 13; in the case of claim 30, as applied to claim 28 above), and further in view of Nemirovsky et al. (Nemirovsky) (US 20080040577 A1).
Consider claim 4, Lie discloses the at least one processing tile is programmable by a set of programming instructions in the array-based inference engine ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)).
Moreover, to any extent to which Lie does not disclose that the aforementioned instructions are streamed to the aforementioned array-based inference engine, Nemirovsky explicitly discloses load/store streaming ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (load/store streaming) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing load/store streaming), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of load/store streaming, when applied to the invention of Lie, which entails at least one processing tile being programmable by a set of programming instructions in the array-based inference engine, results in the overall claimed limitation.

Consider claim 5, the overall combination entails the set of programming instructions is configured to program the first processing unit and/or the second processing unit in the at least one processing tile to perform one or more of: loading the data into the first processing unit and/or the second processing unit, performing the one ML operation on the data by the first processing unit and/or the second processing unit, and writing output of the one ML operation into the associated OCM of the at least one processing tile (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).

Consider claim 6, the overall combination entails the at least one processing tile is programmed to load and process the input data stream and/or the output data stream via one streaming instruction, wherein the input data stream and/or the output data stream each comprises a plurality of data (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); Nemirovsky, [0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).

Consider claim 16, Lie discloses programming the at least one processing tile by a set of programming instructions in the inference engine ([0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)).
Moreover, to any extent to which Lie does not disclose that the aforementioned instructions are streamed to the aforementioned inference engine, Nemirovsky explicitly discloses load/store streaming ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (load/store streaming) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing load/store streaming), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of load/store streaming, when applied to the invention of Lie which entails programming at least one processing tile by a set of programming instructions in the inference engine, results in the overall claimed limitation.

Consider claim 17, the overall combination entails programming the first processing unit and/or the second processing unit in the processing tile via a set of programming instructions to perform one or more of: loading the data into the first processing unit and/or the second processing unit, performing the ML operation on the data by the first processing unit and/or the second processing unit, and writing output of the ML operation into the OCM of the at least one processing tile (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).

Consider claim 18, the overall combination entails programming the at least one processing tile to load and process the input data stream and/or the output data stream via one streaming instruction, wherein the input data stream and/or the output data stream each comprises a plurality of data (Lie, [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations); [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); Nemirovsky, [0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).

Consider claim 30, Lie discloses transmitting the stream of data to the inference engine ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference).
However, Lie does not disclose the aforementioned transmitting being performed via a single load instruction.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, further entailing a single load instruction to perform the transmitting to the inference engine), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of a single load instruction to perform transmitting, when applied to the invention of Lie which entails transmitting to the inference engine, results in the overall claimed limitation.

Consider claim 31, Lie discloses a method to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: dividing the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks via a core ([0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)); transmitting a stream of data associated with the second plurality of sub-tasks to an inference engine for the ML operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference); and performing the ML operation via an inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive a stream of data and maintain the stream of data for local access by the one or more processing units in the at least one processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); and maintain a result of the ML operation performed by the one or more processing units in the at least one processing tile and further to output the result of the ML operation as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)), wherein the one or more processing units are configured to perform the second plurality of sub-tasks of the ML operation on the stream of data maintained in the OCM ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).
However, Lie does not disclose the aforementioned transmitting being performed via a single load instruction.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the invention of Lie in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, further entailing a single load instruction to perform the transmitting to the inference engine), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of a single load instruction to perform transmitting, when applied to the invention of Lie which entails transmitting to the inference engine, results in the overall claimed limitation.

Claims 10-12 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Lie as applied to claims 8 and 13 above, and further in view of Jang et al. (Jang) (US 5481487).
Consider claim 10, Lie does not disclose the first processing unit in the at least one processing tile is configured to perform a matrix multiplication operation on the data in the OCM of the at least one processing tile.
On the other hand, Jang discloses a circuit that performs a matrix multiplication operation on data (FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).
Jang’s teaching increases functionality by supporting matrix multiplication operations.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jang with the invention of Lie in order to support matrix multiplication operations. Alternatively, this modification merely entails applying a known technique (a circuit to perform matrix multiplication operation on data) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing a circuit to perform matrix multiplication operation on data), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Jang’s teaching of a circuit that performs a matrix multiplication operation on data, when applied to the invention of Lie which entails a processing unit in at least one processing tile processing data in the OCM of the at least one processing tile, results in the overall claimed limitation. 

Consider claim 11, the combination thus far entails the second processing unit in the at least one processing tile is configured to perform one or more post matrix multiplication operation on output from the matrix multiplication operation by the first processing unit (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).

Consider claim 12, the combination thus far entails the array-based inference engine is configured to integrate the one or more post matrix multiplication operation with the matrix multiplication operation by the first processing unit in the at least one processing tile so that the one or more post matrix multiplication operation is performed immediately on the output from the matrix multiplication by the first processing unit without having to transmit and save the output to the OCM first and to read from the OCM again for the one or more post matrix multiplication operation (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data; note that block 40 is performed immediately on the output of block 35).

Consider claim 20, Lie does not disclose performing a matrix multiplication operation on the data in the OCM of the at least one processing tile via the first processing unit in the at least one processing tile.
On the other hand, Jang discloses a circuit that performs a matrix multiplication operation on data (FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).
Jang’s teaching increases functionality by supporting matrix multiplication operations.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jang with the invention of Lie in order to support matrix multiplication operations. Alternatively, this modification merely entails applying a known technique (a circuit to perform matrix multiplication operation on data) to a known device (method, or product) ready for improvement (the invention of Lie) to yield predictable results (the invention of Lie, entailing a circuit to perform matrix multiplication operation on data), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Jang’s teaching of a circuit that performs a matrix multiplication operation on data, when applied to the invention of Lie which entails a processing unit in at least one processing tile processing data in the OCM of the at least one processing tile, results in the overall claimed limitation. 

Consider claim 21, the combination thus far entails performing one or more post matrix multiplication operation on output from the matrix multiplication operation by the first processing unit in the same processing tile (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data).

Consider claim 22, the combination thus far entails integrating the one or more post matrix multiplication operation with the matrix multiplication operation by the first processing unit in the at least one processing tile so that the one or more post matrix multiplication operation is performed immediately on the output from the matrix multiplication by the first processing unit without having to transmit and save the output to the OCM first and to read from the OCM again for the post matrix multiplication operation (Jang, FIG. 3, col. 5, lines 57-62, as shown, the 1-D DCT circuit 20 includes a first circuit containing pre-registers and an ALU 30 for preprocessing the data, multiplier and accumulators 35 for performing row-column matrix multiplication and a second circuit containing post-registers and an ALU 40 for post-processing the data; note that block 40 is performed immediately on the output of block 35).

Claims 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Lie et al. (Lie) (US 20180314941 A1) in view of Achilles et al. (Achilles) (US 20110307890 A1).
Consider claim 23, Lie discloses a system configured to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: a core configured to divide the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks ([0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)); and an inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive a stream of data and maintain the stream of data for local access by the one or more processing units in the at least one processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); and maintain a result of the ML operation performed by the one or more processing units in the at least one processing tile and further to output the result of the ML operation as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)), wherein the one or more processing units are configured to receive the second plurality of sub-tasks of the ML operation from the core ([0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122, [0531], line 2, processing a wavelet for task initiation); and perform the second plurality of sub-tasks of the ML operation on the stream of data that is maintained in the OCM ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).
However, Lie does not disclose the first plurality of sub-tasks are executed by the core.
On the other hand, Achilles discloses executing by a core in tandem with executing by an accelerator ([0032], lines 1-4, Special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0037], line 4, blend hardware acceleration with software).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the invention of Lie in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails executing by a core in tandem with executing by an accelerator, when applied to the invention of Lie which entails sub-tasks associated with ML operation, results in the overall claim limitation.

Consider claim 24, the overall combination entails the core and/or the inference engine are programmable via a set of programming instructions (Lie, [0076], lines 3-6, the wavelets correspond to dataflow and/or instruction flow in accordance with communication and/or processing enabling computations performed for training of and/or inference using a neural network; [0483], lines 3-11, all or any portions of Task SW on PEs 260 and/or a representation thereof is stored in non-volatile memory comprised in PEs 122 and/or accessible to Connection Server(s) 160. In various embodiments and/or usage scenarios, Task SW on PEs 260 enables performing processing of training data such as to determine weights of a neural network (e.g., via forward, delta, and chain passes); [0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122); [0467], lines 4-5, the placement programs are stored in CRM 152 and executed by CPUs 151).

Consider claim 25, the overall combination entails the second plurality sub-tasks of the ML operation includes one or more of a dense and/or regular computation task (Lie, [0553], line 2, dense wavelet) and a sparse and/or irregular computation task of the ML operation (Lie, [0548], line 2, sparse wavelet).

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Lie and Achilles as applied to claim 23 above, and further in view of Anderson et al. (Anderson) (US 20150019836) in view of Nemirovsky et al. (Nemirovsky) (US 20080040577 A1).
Consider claim 26, the combination thus far discloses receiving the second plurality of sub-tasks from the core (Lie, [0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122, [0531], line 2, processing a wavelet for task initiation), and transmitting the second plurality of sub-tasks and the stream of data to the inference engine ([0483], lines 2-3, Task SW on PEs 260 conceptually represents distributed SW executed as tasks on various PEs of PEs 122, [0531], line 2, processing a wavelet for task initiation; [0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data).
However, the combination thus far does not entail a streaming engine to perform the aforementioned receiving and transmitting, and a single load instruction to perform the aforementioned transmitting.
On the other hand, Anderson discloses a streaming engine ([0027], line 1, streaming engines).
Anderson’s teaching frees the corresponding CPU from memory fetch tasks, enabling other processing functions (Anderson, [0026], lines 19-21).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Anderson with the combination of Lie and Achilles in order to free the corresponding CPU from memory fetch tasks, enabling other processing functions.
However, the combination thus far does not entail a single load instruction to perform the aforementioned transmitting.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the combination of Lie, Achilles, and Anderson in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the combination of Lie, Achilles, and Anderson which entails transmitting to the inference engine by a streaming engine) to yield predictable results (the combination of Lie, Achilles, and Anderson, entailing a single load instruction to perform the transmitting to the inference engine by a streaming engine), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of a single load instruction to perform transmitting, when applied to the combination of Lie, Achilles, and Anderson which entails transmitting to the inference engine by a streaming engine, results in the overall claimed limitation.

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Lie in view of Anderson et al. (Anderson) (US 20150019836) in view of Nemirovsky et al. (Nemirovsky) (US 20080040577 A1) in view of Achilles et al. (Achilles) (US 20110307890 A1).
Consider claim 27, Lie discloses a system configured to perform a machine learning (ML) operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference), comprising: a core configured to divide the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks ([0468], lines 1-9, Connection Server(s) 160 is enabled to communicate with FPGAs 121 and indirectly with PEs 122 via FPGAs 121/Coupling 123, via NICs 164 and programmed control thereof via driver programs. In various embodiments and/or usage scenarios, the communication comprises placement information (e.g., from Placement Server(s) 150), training information (e.g., from sources not illustrated but accessible via Internet 180) and/or results of training (e.g., weights from PEs 122)); transmitting a stream of data associated with the second plurality of sub-tasks to an inference engine for the ML operation ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference); and said inference engine comprising a plurality of processing tiles ([0062], lines 7-8, an array of processing elements performs flow-based computations on wavelets of data; [0460], lines 1-2, neural network training and inference; FIG. 4, which shows processing elements 499 in rows and columns), wherein at least one processing tile of the plurality of processing tiles includes an on-chip memory (OCM) ([0527], lines 2-3, Memory 854, RF 842, Qs 897, and D-Store 848) and one or more processing units ([0527], lines 8-11, Data Path 852 comprises execution resources (e.g., ALUs) enabled to perform operations (e.g., specified by an opcode decoded and/or provided by Dec 840, according to embodiment)), wherein the OCM is configured to receive the stream of data and maintain the stream of data for local access by the one or more processing units in the at least one processing tile ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)); and maintain a result of the ML operation performed by the one or more processing units in the at least one processing tile and further to output the result of the ML operation as an output data stream ([0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)), wherein the one or more processing units are configured to perform one or more types of computation tasks of the second plurality of sub-tasks of the ML operation on the stream of data maintained in the OCM ([0516], lines 1-3, picker 830 receives the selected wavelet from one of Qs 897 and is enabled to send one or more of data and index from the selected wavelet to RF 842; [0527], lines 2-6, any one or more of Memory 854, RF 842, Qs 897, and D-Store 848 are enabled to provide data to Data Path 852 (e.g., in response to a request from D-Seq 844) and to receive data from Data Path 852 (e.g., results of operations)).
However, Lie does not entail a streaming engine to perform the aforementioned transmitting, and a single load instruction to perform the aforementioned transmitting. Lie also does not disclose the first plurality of sub-tasks are executed by the core.
On the other hand, Anderson discloses a streaming engine ([0027], line 1, streaming engines).
Anderson’s teaching frees the corresponding CPU from memory fetch tasks, enabling other processing functions (Anderson, [0026], lines 19-21).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Anderson with the invention of Lie in order to free the corresponding CPU from memory fetch tasks, enabling other processing functions.
However, the combination thus far does not entail a single load instruction to perform the aforementioned transmitting. The combination thus far also does not disclose the first plurality of sub-tasks are executed by the core.
On the other hand, Nemirovsky discloses a single load instruction to perform transmitting ([0033], lines 2-3, Stream Load instruction; [0036], line 2, Stream Store instruction).
Nemirovsky’s teaching improves speed and efficiency (Nemirovsky, [0004], lines 7-13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nemirovsky with the combination of Lie and Anderson in order to improve speed and efficiency. Alternatively, this modification merely entails applying a known technique (a single load instruction to perform transmitting) to a known device (method, or product) ready for improvement (the combination of Lie and Anderson which entails transmitting to the inference engine by a streaming engine) to yield predictable results (the combination of Lie and Anderson, entailing a single load instruction to perform the transmitting to the inference engine by a streaming engine), which is a rationale that may support a conclusion of obviousness as per MPEP 2143. Note that Nemirovsky’s teaching of a single load instruction to perform transmitting, when applied to the combination of Lie and Anderson which entails transmitting to the inference engine by a streaming engine, results in the overall claimed limitation.
However, the combination thus far does not disclose the first plurality of sub-tasks are executed by the core.
On the other hand, Achilles discloses executing by a core in tandem with executing by an accelerator ([0032], lines 1-4, Special-purpose accelerators are well-known devices used to provide an efficient method of offloading computationally intensive tasks from the general-purpose processor (e.g., CPU or microprocessor); [0037], line 4, blend hardware acceleration with software).
Achilles’s teaching results in improvements in throughput, latency, and quality (Achilles, [0037], lines 1-3). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Achilles with the combination of Lie, Anderson, and Nemirovsky in order to result in improvements in throughput, latency, and quality. Note that the teaching of Achilles which entails executing by a core in tandem with executing by an accelerator, when applied to the combination of Lie, Anderson, and Nemirovsky which entails sub-tasks associated with ML operation, results in the overall claim limitation.

Response to Arguments
Applicant on page 20 argues: “The specification is objected to for minor informalities. Applicant has amended the title as well as paragraphs 4, 18, 21, 24, 26-28, 34, 37-38, and 51 and withdrawal of the objection is earnestly solicited.”
In view of the aforementioned amendments, all objections to the specification have been withdrawn except for the objection directed to the title. Examiner submits that processing different computational tasks was well known before the effective filing date of the claimed invention.

Applicant on page 20 argues: “The drawings are objected to for various reasons. Replacement sheets are submitted herein and withdrawal of the objection is respectfully solicited.”
Various previously pending objections to the drawings are withdrawn in view of the amendments to the drawings. However, other previously pending objections to the drawings remain applicable, and in various cases the amendments to the drawings introduce additional objectionable issues — see the drawings section above.

Applicant on page 20 argues: ‘In other words, at least one processing tile comprises one or more of an OCM, a first processing unit, and a second processing unit. The OCM is configured to receive and maintain data from the input data stream and output result as an output data stream. As such, an "and" where appropriate was included. As such, withdrawal of the objection with respect to Claim 1 and its dependent claims is respectfully solicited.’
Examiner generally notes that the requested location of “and” was not preceding “a first processing unit” but instead preceded the last configured functionality of the OCM. Nevertheless, in view of the newly added “and” in claim 1, line 10, the objection is withdrawn.

Applicant on page 21 argues: “Claims 3-4 are objected to for allegedly having insufficient antecedent basis. Without acquiescing to the rationale of the rejection and merely to expedite the prosecution of the instant application, Claim 3-4 have been amended. As such, withdrawal of the objection is respectfully solicited.”
In view of the aforementioned amendments, the associated previously presented objections are withdrawn.

Applicant on page 21 argues: ‘Claim 11 is objected to for being allegedly reciting a singular operation rather than a plural operations. Applicant respectfully submits that since the claim recites "one or more" then either a singular "operation" or a plural "operations" can be used. As such, withdrawal of the objection is respectfully solicited.’
However, for grammatical clarity and definiteness, Examiner submits that “operations” rather than “operation” should follow “one or more”. Nevertheless, Examiner is very willing to consider any citations to supplemental resources regarding the use of a singular noun following “one or more”.

Applicant on page 21 argues: “Claim 12 is objected to for allegedly failing to alleviate the objection of Claim 11. Applicant respectfully request withdrawal of the objection for similar reasons as Claim 11.”
Examiner’s response to argument above with respect to claim 11 is likewise applicable to the aforementioned argument directed to claim 12.

Applicant on page 21 argues: “Moreover, Claim 12 is objected to for allegedly having insufficient antecedent basis. Without acquiescing to the rationale of the objection and merely to expedite the prosecution of the instant application, Claim 12 has been amended and withdrawal of the objection is respectfully solicited.”
In view of the aforementioned amendments, the associated previously presented objections are withdrawn.

Applicant on page 21 argues: ‘Claim 13 is objected to for missing an "and". Claim 13 has been amended and withdrawal of the objection is respectfully solicited. Claims 14-22 are objected to for allegedly failing to alleviate the objection of Claim 13. Withdrawal of the objection to Claims 14-22 is respectfully requested in light of amendment to Claim 13.’
In view of the aforementioned amendment, the associated previously presented objection is withdrawn.

Applicant on page 21 argues: “Claim 21 is objected to for similar reasons that Claim 11 is objected to. Applicant respectfully request withdrawal of the objections for similar reasons to that of Claim 11.”
Examiner’s response to argument above with respect to claim 11 is likewise applicable to the aforementioned argument directed to claim 21.

Applicant across pages 21-22 argues: “Claim 22 is objected to for failing to alleviate the objection of Claim 21. Applicant respectfully request withdrawal of the objection in light of the reasoning provided for Claim 21.”
Examiner’s response to argument above with respect to claim 21 is likewise applicable to the aforementioned argument directed to claim 22.

Applicant on page 22 argues: “Claim 22 is further objected to for allegedly insufficient antecedent basis. Without acquiescing to the rationale of the rejection and merely to expedite the prosecution of the instant application, Claim 22 has been amended. Withdrawal of the objection is respectfully requested.”
In view of the aforementioned amendment, the associated previously presented objection is withdrawn.

Applicant on page 22 argues: “Claim 23 is objected to for missing an "and". Without acquiescing to the rationale of the rejection and merely to expedite the prosecution of the instant application, Claim 23 has been amended. Withdrawal of the objection is respectfully requested.”
In view of the aforementioned amendments, the associated previously presented objections are withdrawn.

Applicant on page 22 argues: “Claims 24-26 are objected to for allegedly failing to alleviate the objections of Claim 23. Withdrawal of the objection is respectfully requested in light of the amendment to Claim 23.”
Examiner’s response to argument above with respect to claim 23 is likewise applicable to the aforementioned arguments directed to claims 24-26.

Applicant on page 22 argues: “Claim 25 is objected for allegedly having insufficient antecedent basis. Claim 25 has been amended without acquiescing to the rationale of the rejection and withdrawal of the objection is respectfully solicited.”
However, the relevant portion of claim 25 does not appear to be amended, and consequently the objection to claim 25 is maintained. Examiner recommends inserting “of” prior to “sub-tasks”. 

Applicant on page 22 argues: “Claims 27-28, and 31 are objected to for missing an "and". Claims 27-28, and 31 have been amended and withdrawal of the objection is respectfully solicited.”
However, the objections to the aforementioned claims do not appear to be wholly addressed via amendment; see the remaining objections to claims 27-28 and 31 set forth in the claim objections section above.

Applicant on page 22 argues: “Claims 29-30 are objected to for allegedly failing to alleviate the objections of Claim 28. Claim 28 is amended and as such, withdrawal of the objection with respect to Claims 29-30 is respectfully solicited.”
Examiner’s response to argument above with respect to claim 28 is likewise applicable to the aforementioned arguments directed to claims 29-30.

Applicant across pages 22-23 argues: “Applicant respectfully submits that a Terminal Disclaimer will be submitted once there are no other rejections on the merit.”
Examiner acknowledges Applicant’s intent.

Applicant on page 23 argues: “Claims 1-31 are rejected, under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being allegedly indefinite. Applicant disagrees but have amended the relevant claims to further clarify the claimed features. As such, withdrawal of the rejection is respectfully requested.”
Various previously pending rejections of the claims under 35 U.S.C. §112(b) are withdrawn in view of the amendments to the claims. However, other previously presented rejections under 35 U.S.C. §112(b) remain applicable, and in various cases the amendments to the claims introduce additional indefinite subject matter; see the Claim Rejections - 35 USC § 112 section above.

Applicant across pages 24-25 argues: “Applicant respectfully submits that in this instance, a mere disclosure of ALUs that perform operations does not teach or teach the first and the second processing units configured to perform a first and a second type of computation tasks respectively in the claimed fashion because each ALU may in fact perform either a first or the second type of computations based on the opcode. Accordingly, the disclosure by Lie provides nothing more than probabilities or possibilities, which are insufficient to establish inherency. In other words, each ALU, as disclosed by Lie, is not specifically designed to perform a particular type of operation but rather can perform any type of operation depending on the received opcode. In other words, a first processing unit configured to perform a first type of computation task ... and a second processing unit configured to perform a second type of computation task of the one ML operation, as claimed, does not necessarily flow from the disclosure by Lie. As such, Lie fails to anticipate independent Claim 1. Independent Claim 13 also recites features similar to that of Claim 1 and is patentable over Lie for similar reasons that Claim 1 is patentable. Dependent claims are patentable by virtue of their dependency.”
However, Examiner submits that each ALU being able to perform any type of operation depending on the received opcode still meets the claimed limitations. Examiner submits that the ALUs are specifically designed to perform the operations which the ALUs support.

Applicant on page 25 argues: “In contrast, Lie discloses that the connection server 160 communicate with the FPGAs 121 and indirectly with PEs 122 via coupling 123, NICs 164, and programmed control thereof via driver programs (see Lie, paragraph 468). It appears that the rejection is relying on inherency and that a mere presence of communication between a connection server 160 and the FPGAs 121 and indirectly with PEs 122, as disclosed by Lie, necessarily teaches dividing an ML operation into various sub-tasks in the claimed fashion. Applicant respectfully disagrees because the connection server 160 and the FPGAs 121 and indirectly PEs 122 may communicate the ML operation without dividing it, e.g., one may process one operation to generated an updated operation that is passed on to another component. Accordingly, the disclosure by Lie provides nothing more than probabilities or possibilities, which are insufficient to establish inherency. In other words, dividing the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks via a core, as claimed, does not necessarily flow from the disclosure by Lie. As such, Lie fails to anticipate independent Claim 28. Dependent claims are patentable by virtue of their dependency.”
However, Examiner submits that even in the hypothetical scenario given by Applicant, the overall ML operation is still being logically divided into pluralities of sub-tasks under the control of the combined server 110. In addition, Examiner’s citation of Lie included the placement information subject matter, which reflects the recited division.

Applicant on page 26 argues: “Independent Claim 31 recites dividing the ML operation into a first plurality of sub-tasks and a second plurality of sub-tasks via a core; transmitting a stream of data associated with the second plurality of sub-tasks to an inference engine for the ML operation via a single load instruction, as claimed. As discussed above, Lie does not teach or suggest dividing the ML operation into sub-tasks, as claimed. Nemirovsky fails to remedy the failures of Lie and as such, Claim 31 is patentable.”
Examiner’s response to arguments above is likewise applicable to the argument directed to claim 31. 

Applicant on page 26 argues: “Moreover, Claim 31 recites transmitting a stream of data associated with the second plurality of sub-tasks to an inference engine for the ML operation via a single load instruction, as claimed. Nowhere does Nemirovsky teach or suggest that the data being streamed via a single instruction is associated with the second plurality of sub-tasks in the claimed fashion that will be executed by one or more processing units in the claimed fashion. As such, Lie alone or in combination with Nemirovsky fails to teach or suggest the recited features of Claim 31 and its allowance is therefore requested.”
However, Examiner does not rely upon Nemirovsky alone to teach the entirety of the aforementioned subject matter. Rather, Examiner relies upon a combination of Lie and Nemirovsky.

Applicant on page 26 argues: “Claims 10-12 and 20-22 are rejected, under 35 U.S.C. 103, as being allegedly unpatentable over Lie as applied to claims 8 and 13 above, and further in view of Jang et al. (Jang) (US 5481487). Jang fails to remedy the failures of Lie with respect to independent Claims 1 and 13. Claims 10-12 and 20-22 depend from independent Claims 1 and 13 and are patentable by virtue of their dependency in addition to their own patentable features. As such, allowance of Claims 10-12 and 20-22 is earnestly solicited.”
Examiner’s response to arguments with respect to claims 1 and 13 is likewise applicable to the arguments directed to the aforementioned further claims.

Applicant across pages 26-27 argues: “Claims 23-25 are rejected, under 35 U.S.C. 103, as being allegedly unpatentable over Lie in view of Achilles et al. (Achilles) (US 20110307890 A1). Claim 23 is patentable over the cited combination for similar reasons that Claims 28 and 31 are patentable because Achilles fails to remedy the failures of Lie with respect to Claims 28 and 31, as discussed above. Claims 24-25 depend from Claim 23 and are patentable by virtue of their dependency in addition to their own patentable features. As such, allowance of Claims 23-25 is earnestly solicited.”
Examiner’s response to arguments with respect to claims 28 and 31 is likewise applicable to the arguments directed to the aforementioned further claims.

Applicant on page 27 argues: “Claim 26 is rejected, under 35 U.S.C. 103, as being allegedly unpatentable over Lie and Achilles and yet in further view of Anderson et al. (Anderson) (US 20150019836) in view of Nemirovsky. Anderson and Nemirovsky fail to remedy the failures of Lie and Achilles with respect to independent Claim 23. Claim 26 depends from Claim 23 and is patentable over the cited combination by virtue of its dependency in addition to its own patentable features. As such, allowance of Claim 26 is earnestly solicited.”
Examiner’s response to arguments with respect to claim 23 is likewise applicable to the arguments directed to the aforementioned further claim.

Applicant on page 27 argues: “Claim 27 is rejected, under 35 U.S.C. 103, as being allegedly unpatentable over Lie in view of Anderson and yet in further view of Nemirovsky. Claim 27 recites features similar to that of Claims 23 and 31 and is patentable over the relied upon combination for similar reasons that Claims 23 and 31 are patentable. As such, allowance of Claim 27 is earnestly solicited.”
Examiner’s response to arguments with respect to claims 23 and 31 is likewise applicable to the arguments directed to the aforementioned further claim.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571)270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEITH E VICARY/Primary Examiner, Art Unit 2182