Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial Office action issued in response to patent application 16/237,617, filed on 12/31/2018. Claims 1-8, as originally filed, are currently pending and have been considered below. Claims 1 and 6 are independent claims.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 08/01/2018. It is noted, however, that applicant has not filed a certified copy of the CN201810862229.4 application as required by 37 CFR 1.55.

Claim Objections
Claims 1-8 are objected to because of the following informalities: 
In claim 1, lines 2-3, “a temporal engine; The frontal engine is” should read “a temporal engine; the frontal engine is”
In claim 1, lines 12-13, “the flow sensor processor further A neuron station is provided” should read “the flow sensor processor further a neuron station is provided”
In claim 2, lines 3-4, “channel Dimensions C, K, where C represents the input feature map, K represents the output feature map” should read “channel dimensions C, K, where C represents the input feature map, K represents the output feature map”
In claim 4, lines 3-4, “all streams The perceptron processor shares an L2 cache and an export block” should read “all streams the perceptron processor shares an L2 cache and an export block”
In claim 6, lines 3-4, “a channel dimension C, K, where C represents an input feature map, and K represents Output feature map” should read “a channel dimension C, K, where C represents an input feature map, and K represents output feature map”
In claim 6, lines 5-6, “divide each tile into several wave blocks, and then each wave The block is divided into waves” should read “divide each tile into several wave blocks, and then each wave the block is divided into waves”
In claim 6, line 9, “Step 1. The block tile schedular” should read “Step 1: The block tile scheduler”
In claim 6, line 13, “Step 2, the tile dispatcher” should read “Step 2: The tile dispatcher”
In claim 6, line 22, “Step 5, the neuron station” should read “Step 5: The neuron station”
In claim 6, line 24, “In step 6, there is a” should read “Step 6: There is a”
In claim 8, line 1, “A flexible data stream processing method for an artificial intelligence device” should read “The flexible data stream processing method for an artificial intelligence device”
In claim 8, lines 2-3, “wherein the size of the tiles, tiles, blocks and waves is programmable” should read “wherein the size of the tiles, tile blocks and waves is programmable”

Each dependent claim is objected to based on the same rationale as the claim from which it depends. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
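The three-prong test above reads as a conjunction: a limitation is interpreted under 35 U.S.C. 112(f) only when prongs (A), (B), and (C) are all met. A minimal Python sketch of that logic follows. The class `Limitation`, its field names, and the function `interpreted_under_112f` are hypothetical names chosen for illustration and do not appear in the MPEP; the sketch does not model the rebuttable presumptions.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of the findings made for one claim limitation."""
    uses_means_or_generic_placeholder: bool  # prong (A)
    modified_by_functional_language: bool    # prong (B)
    recites_sufficient_structure: bool       # prong (C) is met when this is False


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True only when all three prongs of the test are met."""
    return (
        lim.uses_means_or_generic_placeholder
        and lim.modified_by_functional_language
        and not lim.recites_sufficient_structure
    )


# Assumed example: a generic placeholder coupled with a function, no structure recited.
example = Limitation(True, True, False)
print(interpreted_under_112f(example))
```

Reciting sufficient structure in the limitation flips the third field and takes the limitation out of the 112(f) interpretation under this sketch.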
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are (the generic placeholders are the frontal engine, the occipital engine, and the temporal engine):
Claim 1:
The frontal engine is provided with a tile block scheduler, the frontal engine receives the tensor information (Specification [0036] reiterates the function, but does not provide description of the structure)
the occipital engine receives and organizes the rendered partial tensor and outputs it (Specification [0039] reiterates the function, but does not provide description of the structure)
the temporal engine receives the tensor information output by the occipital engine, performs post processing and writes the final tensor into the memory (Specification [0040] reiterates the function, but does not provide description of the structure)

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-5 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Each of the limitations in claims 1-5 that contains the following generic placeholders:
frontal engine
occipital engine
temporal engine

invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 6 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, claims 1-5 are rejected under 35 U.S.C. 112(a) for lack of written description. See MPEP 2181 (IV) (“the means- (or step-) plus- function claim must still be analyzed to determine whether there exists corresponding adequate support for such claim limitation under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph. In considering whether there is 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph support for the claim limitation, the examiner must consider whether the specification describes the claimed invention in sufficient detail to establish that the inventor or joint inventor(s) had possession of the claimed invention as of the application’s filing date”).

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-8 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation “the frontal engine receives the tensor information” in lines 3-4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “the frontal engine receives a tensor information”.
Claim 1 recites the limitation “the tile scheduler divides” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “the tile block scheduler divides”.
Claim 1 recites the limitation “divides the tensor into a plurality of tile blocks” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “divides a tensor into a plurality of tile blocks”.
Claim 1 recites the limitation “organizes the rendered partial tensor” in line 15. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “organizes a rendered partial tensor”
Claim 1 recites the limitation “writes the final tensor” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “writes a final tensor”
Claim 1 recites the limitation “into the memory” in line 17. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “into a memory”
Claim 2 recites the limitation “C represents the input feature map” in lines 3-4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “C represents an input feature map”
Claim 2 recites the limitation “K represents the output feature map” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “K represents an output feature map”
Claim 2 recites the limitation “N represents the batch dimension” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “N represents a batch dimension”
Claim 3 recites the limitation “the rendering feature is sent back” in line 3. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a rendering feature is sent back”
Claim 3 recites the limitation “the results are sent” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “results are sent”
Claim 5 recites the limitation “with the same characteristics” in lines 3-4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “with same characteristics”
Claim 6 recites the limitation “N represents the batch dimension” in line 4. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “N represents a batch dimension”
Claim 6 recites the limitation “with the same rendered features” in lines 6-7. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “with same rendered features”
Claim 6 recites the limitation “in the same neuron block” in line 7. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a same neuron block”
Claim 6 recites the limitation “Step 1. The block tile scheduler” in line 9. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “Step 1. A block tile scheduler”
Claim 6 recites the limitation “in the frontal engine receives” in line 9. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a frontal engine receives”
Claim 6 recites the limitation “from an application” in line 10. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “from an application”
Claim 6 recites the limitation “to the requirements of the application” in line 10. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “to a requirements of an application”
Claim 6 recites the limitation “the tile scheduler divides” in line 11. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a tile scheduler divides”
Claim 6 recites the limitation “the scheduling mode is assigned” in line 12. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a scheduling mode is assigned”
Claim 6 recites the limitation “to the parietal engine group” in line 12. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “to a parietal engine group”
Claim 6 recites the limitation “the tile dispatcher” in line 13. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a tile dispatcher”
Claim 6 recites the limitation “in the parietal engine” in line 13. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a parietal engine”
Claim 6 recites the limitation “tile block of the a dimension” in line 14. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “tile block of an a dimension”
Claim 6 recites the limitation “wherein the a dimension is an” in line 14. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “wherein an a dimension is”
Claim 6 recites the limitation “The block wave scheduler” in line 16. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “A block wave scheduler”
Claim 6 recites the limitation “in the parietal engine” in line 16. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a parietal engine”
Claim 6 recites the limitation “to the flow sensor processor” in lines 17-18. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “to a flow sensor processor”
Claim 6 recites the limitation “in the parietal engine” in line 18. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a parietal engine”
Claim 6 recites the limitation “The block wave dispatcher” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “A block wave dispatcher”
Claim 6 recites the limitation “in the flow sensor processor” in line 19. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a flow sensor processor”
Claim 6 recites the limitation “the neuron station” in line 22. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a neuron station”
Claim 6 recites the limitation “in the flow sensor processor” in line 22. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a flow sensor processor”
Claim 6 recites the limitation “loads the activation and weight” in line 22. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “loads an activation and weight”
Claim 6 recites the limitation “in the neuron block” in line 24. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a neuron block”
Claim 6 recites the limitation “the neuron station” in line 24. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “a neuron station”
Claim 6 recites the limitation “having the same beta dimension” in line 25. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “having a same beta dimension”
Claim 7 recites the limitation “from the parietal engine” in line 3. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “from a parietal engine”
Claim 7 recites the limitation “in the parietal engine group” in line 3. There is insufficient antecedent basis for this limitation in the claim. The limitation is interpreted as “in a parietal engine group”

The term “flexible” in claims 1-8 is a relative term which renders the claims indefinite. The term “flexible” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, “flexible” is interpreted as any level of flexibility.

Claim 1 contains a period in line 8 and line, “… obtains the tile block and divides the tile into a plurality of tiles. The wave block scheduler acquires…”, thus rendering the claim to lack clarity because it is unclear if the claim language after the period in line 8 is considered part of the claim. For examination purposes, the period in line 8 is considered a comma.
Claim 3 contains a period in line 3, “… sent back to the parietal engine. After the parietal engine finishes…”, and line 4, thus rendering the claim to lack clarity because it is unclear if the claim language after the period in line 3 is considered part of the claim. For examination purposes, the period in line 3 is considered a comma.
Claim 6 contains a period in line 10, “… application through the driver. According to the requirements…”, line 12, “…polled. The scheduling mode…” and line 25, thus rendering the claim to lack clarity because it is unclear if the claim language after the period in line 10 and line 12 is considered part of the claim. For examination purposes, the period in line 10 and line 12 is considered a comma.
Claim 7 contains a period in line 3, “… the parietal engine group. The number of engines…” and line 4, thus rendering the claim to lack clarity because it is unclear if the claim language after the period in line 3 is considered part of the claim. For examination purposes, the period in line 3 is considered a comma.

Each of the limitations in claims 1-5 that contains the following generic placeholders:
frontal engine
occipital engine
temporal engine

invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 6 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

For examination purposes, frontal engine, parietal engine group, parietal engine, occipital engine, and temporal engine are interpreted as software running on a generic processor.

In addition to the grounds stated above, each dependent claim is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to an apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites an apparatus for designing flexible dataflow processor for artificial intelligent devices. Each of the following limitation(s):
… divides the tensor into a plurality of tile blocks, and… allocates the tile block to… ; 
… divides the tile into a plurality of tiles… acquires the tile and divides it into several wave blocks; 
… divide the wave block into several waves, and… and the waves are characterized in the neuron block; 
… and organizes the rendered partial tensor and outputs it; 
… performs post processing 

as drafted, claim 1 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) but for the generic computer components language. For example, but for generic component language, the above limitation in the context of this claim encompasses… divides the tensor into a plurality of tile blocks, and… allocates the tile block to … (corresponds to evaluation and judgement). Further, the claim encompasses… divides the tile into a plurality of tiles… acquires the tile and divides it into several wave blocks (corresponds to evaluation). Further, the claim encompasses… divide the wave block into several waves, and… and the waves are characterized in the neuron block (corresponds to evaluation and judgement). Further, the claim encompasses… and organizes the rendered partial tensor and outputs it (corresponds to evaluation and judgement). Further, the claim encompasses… performs post processing… (corresponds to evaluation).
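For orientation only, the hierarchy recited in the limitations quoted above (tensor, tile blocks, tiles, wave blocks, waves) can be pictured as nested partitioning. The following is a minimal Python sketch under stated assumptions: a one-dimensional index range stands in for the tensor, each level uses a fixed chunk size, and the names `split` and `partition` are hypothetical. The claims as quoted do not specify the dimension along which division occurs or the sizes used, so this is not a representation of the applicant's implementation.

```python
from typing import Iterator, List, Tuple

Span = Tuple[int, int]  # half-open index range [start, stop)


def split(span: Span, size: int) -> List[Span]:
    """Cut a span into consecutive pieces of at most `size` elements."""
    start, stop = span
    return [(s, min(s + size, stop)) for s in range(start, stop, size)]


def partition(total: int, tile_block: int, tile: int,
              wave_block: int, wave: int) -> Iterator[Tuple[Span, Span, Span, Span]]:
    """Yield (tile block, tile, wave block, wave) for every wave, in order."""
    for tb in split((0, total), tile_block):      # tensor -> tile blocks
        for t in split(tb, tile):                 # tile block -> tiles
            for wb in split(t, wave_block):       # tile -> wave blocks
                for w in split(wb, wave):         # wave block -> waves
                    yield tb, t, wb, w


# Assumed sizes: 64 elements, halved at each level.
waves = list(partition(total=64, tile_block=32, tile=16, wave_block=8, wave=4))
print(len(waves), waves[0], waves[-1])
```

Each chunk size is a plain parameter here, which loosely mirrors the claim 8 language that the sizes of the tiles, tile blocks and waves are programmable.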
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “the frontal engine is provided with a tile block scheduler”, “the tile scheduler”, “the parietal engine group includes a plurality of”, “a tile dispatcher and a wave block scheduler are disposed in”, “The wave block scheduler”, “the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can”, “the flow sensor processor”, “the occipital engine”, “the temporal engine”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. Further, the limitations of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Further, the limitation of “writes the final tensor into the memory” can be considered mere data gathering. See MPEP 2106.05(g)(3). Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine” is considered well-understood, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Further, the insignificant extra-solution activity of “writes the final tensor into the memory” is considered well-understood, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”.
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to an apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites an apparatus for designing flexible dataflow processor for artificial intelligent devices. Each of the following limitation(s):
one tensor of said tensor information has 5 dimensions, including feature map dimensions: X, Y; channel Dimensions C, K, where C represents the input feature map, K represents the output feature map; N represents the batch dimension

as drafted, claim 2 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) but for the generic computer components language. For example, but for generic component language, the above limitation in the context of this claim encompasses one tensor of said tensor information has 5 dimensions, including feature map dimensions: X, Y; channel Dimensions C, K, where C represents the input feature map, K represents the output feature map; N represents the batch dimension (corresponds to evaluation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “the frontal engine is provided with a tile block schedule”, “the tile scheduler”, “the parietal engine group includes a plurality of”, “a tile dispatcher and a wave block scheduler are disposed in”, “The wave block scheduler”, “the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can”, “the flow sensor processor”, “the occipital engine”, and “the temporal engine”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Further, the limitation of “writes the final tensor into the memory” can be considered as mere data gathering. See MPEP 2106.05(g)(3).
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Further, the insignificant extra-solution activity of “writes the final tensor into the memory” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”.
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to an apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Please see analysis of claim 2 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “the frontal engine is provided with a tile block schedule”, “the tile scheduler”, “the parietal engine group includes a plurality of”, “a tile dispatcher and a wave block scheduler are disposed in”, “The wave block scheduler”, “the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can”, “the flow sensor processor”, “the occipital engine”, “the temporal engine”, and “wherein the occipital engine is constructed in a unified rendering architecture, and specifically includes”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, “receives the tensor information output by the occipital engine”, and “the rendering feature is sent back to the parietal engine. After the parietal engine finishes rendering, the results are sent back to the occipital engine”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Further, the limitation of “writes the final tensor into the memory” can be considered as mere data gathering. See MPEP 2106.05(g)(3).
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, “receives the tensor information output by the occipital engine”, and “the rendering feature is sent back to the parietal engine. After the parietal engine finishes rendering, the results are sent back to the occipital engine” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Further, the insignificant extra-solution activity of “writes the final tensor into the memory” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”.
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to an apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Please see analysis of claim 1 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “the frontal engine is provided with a tile block schedule”, “the tile scheduler”, “the parietal engine group includes a plurality of”, “a tile dispatcher and a wave block scheduler are disposed in”, “The wave block scheduler”, “the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can”, “the flow sensor processor”, “the occipital engine”, “the temporal engine”, and “The perceptron processor shares an L2 cache and an export block”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, “receives the tensor information output by the occipital engine”, and “wherein said frontal engine sends a group tensor to a parietal engine in a polling schedule, all streams”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Further, the limitation of “writes the final tensor into the memory” can be considered as mere data gathering. See MPEP 2106.05(g)(3).
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, “receives the tensor information output by the occipital engine”, and “wherein said frontal engine sends a group tensor to a parietal engine in a polling schedule, all streams” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Further, the insignificant extra-solution activity of “writes the final tensor into the memory” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”.
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to an apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites an apparatus for designing flexible dataflow processor for artificial intelligent devices. Each of the following limitation(s):
each multiply accumulator group Information with the same characteristics can be processed 

as drafted, claim 5 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) but for the generic computer components language. For example, but for generic component language, the above limitation in the context of this claim encompasses… each multiply accumulator group Information with the same characteristics can be processed (corresponds to evaluation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “the frontal engine is provided with a tile block schedule”, “the tile scheduler”, “the parietal engine group includes a plurality of”, “a tile dispatcher and a wave block scheduler are disposed in”, “The wave block scheduler”, “the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can”, “the flow sensor processor”, “the occipital engine”, “the temporal engine”, and “in said flow sensor processor has a multiply accumulator group”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Further, the limitation of “writes the final tensor into the memory” can be considered as mere data gathering. See MPEP 2106.05(g)(3).
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” and “wherein said neuron block” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “the frontal engine receives the tensor information”, “the tile dispatcher obtains the tile block”, “the occipital engine receives”, and “receives the tensor information output by the occipital engine” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Further, the insignificant extra-solution activity of “writes the final tensor into the memory” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93”.
Moreover, the additional element(s) of “further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks” and “wherein said neuron block” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for designing flexible dataflow processor for artificial intelligent devices. Each of the following limitation(s): 
divide the tensor into several tile blocks, divide each tile block into several tiles, divide each tile into several wave blocks, and then each wave The block is divided into waves and the waves with the same rendered features are processed in the same neuron block; The specific steps are as follows: 
Step 1… According to the requirements of the application, the tile scheduler divides the tensor into a plurality of tile blocks, and the tile blocks are polled. The scheduling mode is assigned to the parietal engine group; 
Step 2… acquires the tile block and divides the tile block of the a dimension to form a plurality of tiles… 
Step 3… acquires the tile and divides the X and Y dimensions to form a plurality of wave blocks…
Step 4… acquires the wave block and divides it into a plurality of waves based on the p dimension… 
Step 5… loads the activation and weight, and performs neuron processing; 
In step 6… and each multiply accumulator set processes waves having the same beta dimension 

as drafted, claim 6 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) but for the generic computer components language. For example, but for generic component language, the above limitation in the context of this claim encompasses divide the tensor into several tile blocks, divide each tile block into several tiles, divide each tile into several wave blocks, and then each wave The block is divided into waves and the waves with the same rendered features are processed in the same neuron block; The specific steps are as follows (corresponds to evaluation and judgement). Further, the claim encompasses… According to the requirements of the application, the tile scheduler divides the tensor into a plurality of tile blocks, and the tile blocks are polled. The scheduling mode is assigned to the parietal engine group (corresponds to evaluation and judgement). Further, the claim encompasses… acquires the tile block and divides the tile block of the a dimension to form a plurality of tiles… (corresponds to evaluation and judgement). Further, the claim encompasses… acquires the tile and divides the X and Y dimensions to form a plurality of wave blocks… (corresponds to evaluation and judgement). Further, the claim encompasses… acquires the wave block and divides it into a plurality of waves based on the p dimension… (corresponds to evaluation and judgement). Further, the claim encompasses… loads the activation and weight, and performs neuron processing (corresponds to evaluation). Further, the claim encompasses… and each multiply accumulator set processes waves having the same beta dimension (corresponds to evaluation and judgement).
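For illustration only, the hierarchical division recited in steps 1 through 4 of claim 6 can be sketched as below. This is a minimal sketch and is not the applicant's implementation or part of the record. Several details are assumptions that the claim language quoted above does not supply: the helper names (split, round_robin, partition), the choice of N as the split axis in step 1, the choices a = C and p = K, the fixed count of two pieces per split, and the reading of "polled" as round-robin assignment.

```python
from typing import Dict, List, Tuple

# A region maps each dimension name to a half-open index range (start, stop).
Region = Dict[str, Tuple[int, int]]


def split(region: Region, dim: str, parts: int) -> List[Region]:
    """Split a region into at most `parts` contiguous pieces along one dimension."""
    start, stop = region[dim]
    size = stop - start
    parts = max(1, min(parts, size))
    step = -(-size // parts)  # ceiling division
    pieces = []
    for lo in range(start, stop, step):
        piece = dict(region)
        piece[dim] = (lo, min(lo + step, stop))
        pieces.append(piece)
    return pieces


def round_robin(items: List[Region], num_workers: int) -> List[List[Region]]:
    """Assign items to workers in turn (assumed meaning of 'polled')."""
    buckets: List[List[Region]] = [[] for _ in range(num_workers)]
    for i, item in enumerate(items):
        buckets[i % num_workers].append(item)
    return buckets


def partition(shape: Dict[str, int], num_engines: int = 2,
              a_dim: str = "C", p_dim: str = "K",
              parts: int = 2) -> List[List[Region]]:
    """Return, per engine, the list of waves produced by steps 1 to 4."""
    tensor = {d: (0, n) for d, n in shape.items()}
    # Step 1: tensor -> tile blocks, handed out round-robin (split axis N is assumed).
    per_engine = round_robin(split(tensor, "N", num_engines), num_engines)
    result = []
    for blocks in per_engine:
        waves: List[Region] = []
        for block in blocks:
            for tile in split(block, a_dim, parts):             # step 2: along the a dimension
                for x_part in split(tile, "X", parts):          # step 3: along X ...
                    for wave_block in split(x_part, "Y", parts):  # ... and Y
                        waves.extend(split(wave_block, p_dim, parts))  # step 4: along p
        result.append(waves)
    return result


shape = {"N": 2, "C": 4, "K": 4, "X": 8, "Y": 8}
waves_per_engine = partition(shape)
```

The sketch only tracks index ranges; the loading of activations and weights and the multiply accumulator processing of steps 5 and 6 are not modeled.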
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension”, “The block tile scheduler in the frontal engine”, “wherein the a dimension is an N or C or K dimension”, “the tile dispatcher in the parietal engine”, “wherein the a dimension is an N or C or K dimension”, “The block wave scheduler in the parietal engine”, “the parietal engine”, “The block wave dispatcher in the flow sensor processor”, “there is a multiply accumulator set in the neuron block in the neuron station”, and “wherein the p dimension is an N or C or K dimension”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Moreover, the additional element(s) of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in” is considered well known, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Moreover, the additional element(s) of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for designing flexible dataflow processor for artificial intelligent devices. Each of the following limitation(s): 
wherein in step 1, the tile scheduler divides the number of tile blocks separated by the tensor from...

as drafted, claim 7 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) but for the generic computer components language. For example, but for generic component language, the above limitation in the context of this claim encompasses wherein in step 1, the tile scheduler divides the number of tile blocks separated by the tensor from... (corresponds to evaluation and judgement).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension”, “The block tile scheduler in the frontal engine”, “wherein the a dimension is an N or C or K dimension”, “the tile dispatcher in the parietal engine”, “wherein the a dimension is an N or C or K dimension”, “The block wave scheduler in the parietal engine”, “the parietal engine”, “The block wave dispatcher in the flow sensor processor”, “there is a multiply accumulator set in the neuron block in the neuron station”, “wherein the p dimension is an N or C or K dimension”, and “the parietal engine in the parietal engine group. The number of engines is the same”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Moreover, the additional element(s) of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in” is considered well-understood, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Moreover, the additional element of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Please see analysis of claim 6 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension”, “The block tile scheduler in the frontal engine”, “wherein the a dimension is an N or C or K dimension”, “the tile dispatcher in the parietal engine”, “wherein the a dimension is an N or C or K dimension”, “The block wave scheduler in the parietal engine”, “the parietal engine”, “The block wave dispatcher in the flow sensor processor”, “there is a multiply accumulator set in the neuron block in the neuron station”, “wherein the p dimension is an N or C or K dimension”, “the parietal engine in the parietal engine group. The number of engines is the same”, and “wherein the size of the tiles, tiles, blocks and waves is programmable”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Further, the limitations of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in”, as drafted, recite insignificant extra-solution activity because they relate to mere data gathering. The insignificant extra-solution activity is recited at a high level of generality such that it amounts to no more than mere data gathering under MPEP 2106.05(g). Moreover, the additional element of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”, namely a neural network environment. See MPEP 2106.05(h). 
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The insignificant extra-solution activity of “receives the tensor information from the application through the driver” and “and the wave block is sent to the flow sensor processor in” is considered well-understood, routine, and conventional because of what is recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Moreover, the additional element of “the neuron station in the flow sensor processor” can be considered as “generally linking the use of judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h). Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. (“TensorFlow: A System for Large-Scale Machine Learning”; hereinafter “Abadi et al. 1”) in view of Graham et al. (“DECAF - A Flexible Multi Agent System Architecture”) in view of Du et al. (“ShiDianNao: Shifting Vision Processing Closer to the Sensor”) in view of Abadi et al. (“TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems”; hereinafter “Abadi et al. 2”).
Regarding Claim 1,
Abadi et al. 1 teaches a flexible data stream processor for an artificial intelligence device, comprising: a frontal engine, a parietal engine group, an occipital engine, and a temporal engine (Abadi et al. 1, Section 2.2 Pg. 267, “We designed TensorFlow to be much more flexible than DistBelief, while retaining its ability to satisfy the demands of Google’s production machine learning workloads. TensorFlow provides a simple dataflow-based programming abstraction that allows users to deploy applications on distributed clusters, local workstations, mobile devices, and custom-designed accelerators” teaches the TensorFlow system (corresponds to the flexible data stream processor) for an artificial intelligence device. Section 5 Pg. 275, “The distributed master translates user requests into execution across a set of tasks” teaches a distributed master (corresponds to the frontal engine) within the layered TensorFlow architecture. Fig. 2 and Section 4.3 Pg. 274, “Save writes one or more tensors to a checkpoint file, and Restore reads one or more tensors from a checkpoint file” teaches the one or more tensors (corresponds to the tile block) being allocated to a checkpoint file (corresponds to the parietal engine group). Section 3.1 Pg. 270, “In TensorFlow, we model all data as tensors (n-dimensional arrays) with the elements having one of a small number of primitive types, such as int32, float32, or string (where string can represent arbitrary binary data). Tensors naturally represent the inputs to and results of the common mathematical operations in many machine learning algorithms: for example, a matrix multiplication takes two 2-D tensors and produces a 2-D tensor; and a batch 2-D convolution takes two 4-D tensors and produces another 4-D tensor” teaches mathematical operations (corresponds to the occipital engine) such as matrix multiplication (corresponds to the temporal engine)).
The frontal engine is provided with a tile block scheduler, the frontal engine receives the tensor information, the tile scheduler divides the tensor into a plurality of tile blocks, and the frontal engine allocates the tile block to the parietal engine group (Abadi et al. 1, Section 5 Pg. 275, “The distributed master translates user requests into execution across a set of tasks” teaches a distributed master (corresponds to the frontal engine) within the layered TensorFlow architecture. Section 5 Pg. 275, “The dataflow executor in each task handles requests from the master, and schedules the execution of the kernels that comprise a local subgraph” teaches a dataflow executor (corresponds to the tile block scheduler), that handles the request from the distributed master, within the layered TensorFlow architecture that schedules execution. Section 3.1 Pg. 270, “In TensorFlow, we model all data as tensors (n-dimensional arrays) with the elements having one of a small number of primitive types, such as int32, float32, or string (where string can represent arbitrary binary data). Tensors naturally represent the inputs to and results of the common mathematical operations in many machine learning algorithms: for example, a matrix multiplication takes two 2-D tensors and produces a 2-D tensor; and a batch 2-D convolution takes two 4-D tensors and produces another 4-D tensor” teaches receiving the tensor information as input. Section 4.2 Pg. 273, “The dynamic partition (Part) operation divides the incoming indices into variable-sized tensors that contain the indices destined for each shard” teaches dividing the received tensor into a plurality of tensor shards (corresponds to the tile blocks). Fig. 2 and Section 4.3 Pg. 274, “Save writes one or more tensors to a checkpoint file, and Restore reads one or more tensors from a checkpoint file” teaches the one or more tensors (corresponds to the tile block) being allocated to a checkpoint file (corresponds to the parietal engine group)).
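By way of illustration only, the partitioning behavior relied upon in Abadi et al. 1, Section 4.2 can be sketched in plain Python. This is a simplified analogue and not TensorFlow's implementation; the function name `dynamic_partition_sketch` and the sample values are assumptions introduced here to show how one incoming sequence is divided into variable-sized groups, one per shard.

```python
from typing import List, Sequence


def dynamic_partition_sketch(data: Sequence[int],
                             partitions: Sequence[int],
                             num_partitions: int) -> List[List[int]]:
    """Route each element of `data` to the shard named in `partitions`.

    Hypothetical helper: a simplified analogue of a dynamic partition
    operation, written only to illustrate the quoted passage.
    """
    if len(data) != len(partitions):
        raise ValueError("data and partitions must have the same length")
    shards: List[List[int]] = [[] for _ in range(num_partitions)]
    for value, shard_id in zip(data, partitions):
        if not 0 <= shard_id < num_partitions:
            raise ValueError(f"shard id {shard_id} out of range")
        shards[shard_id].append(value)
    return shards


# Illustrative values only: five indices routed to three shards.
indices = [10, 11, 12, 13, 14]
shard_of = [0, 1, 0, 2, 1]
shards = dynamic_partition_sketch(indices, shard_of, 3)
```

The resulting groups have different lengths, which is the "variable-sized" property the quoted passage describes.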
… the occipital engine receives and organizes the rendered partial tensor and outputs it (Abadi et al. 1, Section 3.2 Pg. 270, “TensorFlow uses a dataflow graph to represent all possible computations in a particular application. The API for executing a graph allows the client to specify declaratively the subgraph that should be executed. The client selects zero or more edges to feed input tensors into the dataflow, and one or more edges to fetch output tensors from the dataflow; the runtime then prunes the graph to contain the necessary set of operations. Each invocation of the API is called a step, and TensorFlow supports multiple concurrent steps on the same graph. Stateful operations allow steps to share data and synchronize when necessary” teaches the dataflow graph (corresponds to the occipital engine) receiving input tensors that go through steps and outputting the tensors).
Abadi et al. 1 does not appear to explicitly teach the parietal engine group includes a plurality of parietal engines, and a tile dispatcher and a wave block scheduler are disposed in the parietal engine, and the tile dispatcher obtains the tile block and divides the tile into a plurality of tiles. The wave block scheduler acquires the tile and divides it into several wave blocks; the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can divide the wave block into several waves.
However, Graham et al. teaches the parietal engine group includes a plurality of parietal engines, and a tile dispatcher and a wave block scheduler are disposed in the parietal engine, and the tile dispatcher obtains the tile block and divides the tile into a plurality of tiles. The wave block scheduler acquires the tile and divides it into several wave blocks (Graham et al., Fig. 2 and Section 6 Pg. 15, “Figure 2 represents the high level structure of the DECAF architecture. Structures inside the heavy black line are internal to the architecture and the items outside the line are user-written or provided from some other outside source (such as incoming KQML messages). There are five internal execution modules (square boxes) in the current DECAF implementation” teaches a DECAF architecture (corresponds to the parietal engine group) that includes a plurality of internal execution modules (corresponds to the plurality of parietal engines) with a dispatcher (corresponds to the tile dispatcher) and scheduler (corresponds to the wave block scheduler). Section 6.2 Pg. 16, “Agent initialization is done once and then control is passed to the Dispatcher which waits for incoming KQML messages… The message is attempting to communicate as part of an ongoing conversation. The Dispatcher makes this distinction mostly by recognizing the KQML:in-reply-to field designator, which indicates the message is part of an existing conversation” teaches the dispatcher receiving incoming KQML messages (corresponds to the tile block) and the incoming message being divided into parts (corresponds to the plurality of tiles). Section 6.4 Pg. 17, “The scheduling functions are actually divided into two separate modules; the Task Scheduler and the Agenda Manager. The purpose of the Task Scheduler is to evaluate the HTN task structure to determine a set of actions which will ‘‘best’’ suit the users goals. The input is a task HTN will all possible actions” teaches the scheduler (corresponds to the wave block scheduler) acquiring the task HTN input (corresponds to the tile) and dividing the scheduling into two modules (corresponds to the several wave blocks)).
the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can divide the wave block into several waves, and (Graham et al., Section 7.1.2 Pg. 18 and 20, “One premise of DECAF is that the architecture provides increased reliability by using unused CPU cycles to maximize throughput” teaches the DECAF architecture utilizing unused CPU cycles (corresponds to the flow sensor processors). Fig. 2 and Section 6 Pg. 15, “There are five internal execution modules (square boxes) in the current DECAF implementation, and seven associated data structure queues (rounded boxes)” teaches the implementation of DECAF with internal execution modules (corresponds to the parietal engine). Section 6.2 Pg. 16, “If so a new objective is created (equivalent to the BDI ‘‘desires’’ concept [37]) and placed on the Objectives Queue for the Planner. The dispatcher assign a unique identifier to this message which is used to distinguish all messages that are part of the new conversation” teaches the Objectives Queue connected to the dispatcher (corresponds to the wave block dispatcher) that distinguishes the message parts (corresponds to the several waves divided)).
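For clarity of the record, the successive division recited in claim 1 (tensor, then tile blocks, then tiles, then wave blocks, then waves) can be sketched as a repeated range split along a single dimension. This is a minimal illustration under stated assumptions: the helper names `split_range` and `divide_hierarchy`, the extent of 64, and the split factor of 2 at each level are hypothetical and appear in neither the claims nor the cited references.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, stop) along one dimension


def split_range(r: Range, parts: int) -> List[Range]:
    """Split one range into `parts` contiguous, nearly equal sub-ranges."""
    start, stop = r
    length = stop - start
    parts = max(1, min(parts, length))  # never create empty sub-ranges
    base, extra = divmod(length, parts)
    out: List[Range] = []
    cursor = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        out.append((cursor, cursor + size))
        cursor += size
    return out


def divide_hierarchy(extent: int, factors: List[int]) -> List[List[Range]]:
    """Apply one split per level; level 0 is the undivided extent."""
    levels: List[List[Range]] = [[(0, extent)]]
    for factor in factors:
        next_level: List[Range] = []
        for r in levels[-1]:
            next_level.extend(split_range(r, factor))
        levels.append(next_level)
    return levels


# Hypothetical sizes: levels map to tensor, tile blocks, tiles,
# wave blocks, and waves, in that order.
levels = divide_hierarchy(64, [2, 2, 2, 2])
tensor, tile_blocks, tiles, wave_blocks, waves = levels
```

Each level partitions the level above it without gaps or overlap, so the waves at the last level cover the same extent as the original tensor.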
Abadi et al. 1 in view of Graham et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1 with Graham et al., with the motivation that the parietal engine group includes a plurality of parietal engines, and a tile dispatcher and a wave block scheduler are disposed in the parietal engine, and the tile dispatcher obtains the tile block and divides the tile into a plurality of tiles. The wave block scheduler acquires the tile and divides it into several wave blocks; the parietal engine is further provided with a plurality of flow sensor processors, and the flow sensor processor is provided with a wave block dispatcher, and the wave block dispatcher can divide the wave block into several waves. “DECAF (Distributed, Environment Centered Agent Framework) is a software toolkit for the rapid design, development, and execution of ‘‘intelligent’’ agents to achieve solutions in complex software systems. DECAF is based on the premise that execution of the actions required to accomplish a task specified by an agent program is similar to a traditional operating system executing a sequence of user requests. In the same fashion that an operating system provides an environment for the execution of a user request, an agent framework provides the needed environment for the execution of agent actions. The agent environment includes the ability to communicate with other agents, efficiently maintain the current state of an executing agent, and select an execution path from a set of possible execution paths so as to support persistent, flexible, and robust actions” (Graham et al., Abstract). The proposed teaching is beneficial in that it achieves solutions in complex software systems.
Abadi et al. 1 in view of Graham et al. does not appear to explicitly teach the flow sensor processor further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks, and the waves are characterized in the neuron block.
However, Du et al. teaches the flow sensor processor further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks, and the waves are characterized in the neuron block (Du et al., Fig. 1 and Section 2 Pg. 93, “Figure 1 shows a typical integration solution for cheap cameras (closely resembling an STM chipset [55, 56]): An image processing chip is connected to cameras (in typical smartphones, two) streaming their data through standard Camera Serial Interfaces (CSIs). Video processing pipelines, controlled by a microcontroller unit, implement a number of essential functions such as Bayer reconstruction, white balance and barrel correction, noise and defect filtering, autofocus control, video stabilization, and image compression. More advanced processors already implement rudimentary object detection and tracking functions, such as face recognition” teaches a possible system integration of an accelerator in a commercial image processing chip that includes an image processor (corresponds to the flow sensor processor). Fig. 14 and Section 8.2 Pg. 99, “pooling layer, pooling windows of adjacent output neurons are adjacent but non-overlapping, i.e., the step size of window sliding equals to the window size. We present in Figure 14 the execution flow of one such pooling layer” teaches a pooling layer (corresponds to a neuron station) with a plurality of neurons (corresponds to the neuron blocks). The neurons consist of a pooling window and step size (corresponds to the waves)).
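The pooling arrangement quoted from Du et al., Section 8.2, in which the step size of the sliding window equals the window size so that adjacent windows do not overlap, can be sketched as follows. This is an illustrative sketch only: the choice of max pooling, the function name `max_pool_non_overlapping`, and the 4×4 sample input are assumptions made here and are not taken from the reference.

```python
from typing import List


def max_pool_non_overlapping(image: List[List[int]], window: int) -> List[List[int]]:
    """Max pooling where the stride equals the window size.

    Each output neuron reads its own window; no input element is shared
    between adjacent windows. Trailing rows or columns that do not fill
    a whole window are ignored in this sketch.
    """
    rows, cols = len(image), len(image[0])
    pooled: List[List[int]] = []
    for r in range(0, rows - window + 1, window):      # step == window size
        pooled_row: List[int] = []
        for c in range(0, cols - window + 1, window):  # step == window size
            pooled_row.append(max(image[r + dr][c + dc]
                                  for dr in range(window)
                                  for dc in range(window)))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 input pooled with a 2x2 window.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pooled = max_pool_non_overlapping(image, 2)
```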
Abadi et al. 1 in view of Graham et al. in view of Du et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1 and Graham et al. with Du et al., with the motivation that the flow sensor processor further A neuron station is provided, and the neuron station is composed of a plurality of neuron blocks, and the waves are characterized in the neuron block. “In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm2 and consuming only 320 mW, but still about 30× faster than high-end GPUs” (Du et al., Abstract). The proposed teaching is beneficial in that it reduces the neural network memory footprint and is more energy efficient.
Abadi et al. 1 in view of Graham et al. in view of Du et al. does not appear to explicitly teach the temporal engine receives the tensor information output by the occipital engine, performs post processing and writes the final tensor into the memory.
However, Abadi et al. 2 teaches the temporal engine receives the tensor information output by the occipital engine, performs post processing and writes the final tensor into the memory (Abadi et al. 2, Fig. 4, “When we insert Send and Receive nodes, we canonicalize all users of a particular tensor on a particular device to use a single Receive node, rather than one Receive node per downstream user on a particular device. This ensures that the data for the needed tensor is only transmitted once between a source device → destination device pair, and that memory for the tensor on the destination device is only allocated once, rather than multiple times” teaches Device B (corresponds to the temporal engine) receiving nodes (corresponds to the tensor information within the nodes) from Device A (corresponds to the occipital engine) for processing, with the tensor then being stored in memory of the destination device).
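The canonicalization quoted from Abadi et al. 2, in which all users of a tensor on a given device share a single Receive node, can be illustrated with a short sketch. The data layout, the function name `canonicalize_receives`, and the sample edges are assumptions made here for illustration; they do not reproduce TensorFlow's graph partitioning code.

```python
from typing import Dict, List, Tuple

# (tensor name, source device, destination device, consuming operation)
Edge = Tuple[str, str, str, str]


def canonicalize_receives(edges: List[Edge]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group cross-device edges so each (tensor, source, destination)
    triple gets exactly one Receive node shared by all of its consumers."""
    receives: Dict[Tuple[str, str, str], List[str]] = {}
    for tensor, src, dst, consumer in edges:
        if src == dst:
            continue  # same-device edges need no Send/Receive pair
        receives.setdefault((tensor, src, dst), []).append(consumer)
    return receives


# Hypothetical graph: tensor "x" on device A feeds two ops on device B.
edges = [("x", "A", "B", "op_b"),
         ("x", "A", "B", "op_c"),
         ("y", "A", "A", "op_d")]
receives = canonicalize_receives(edges)
```

In this sketch "x" crosses from device A to device B once, and the single Receive entry serves both consumers, which matches the quoted statement that the tensor is transmitted and allocated only once per device pair.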
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Graham et al., and Du et al. with Abadi et al. 2, with the motivation that the temporal engine receives the tensor information output by the occipital engine, performs post processing and writes the final tensor into the memory. “The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery” (Abadi et al. 2, Abstract). The proposed teaching is beneficial in that it is flexible and can be used to express a wide variety of algorithms.
Regarding Claim 5,
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 teaches the flexible data stream processor for an artificial intelligence device according to claim 1, 
Du et al. further teaches wherein said neuron block in said flow sensor processor has a multiply accumulator group, each multiply accumulator group Information with the same characteristics can be processed (Du et al., Fig. 1 and Section 2 Pg. 93, “Figure 1 shows a typical integration solution for cheap cameras (closely resembling an STM chipset [55, 56]): An image processing chip is connected to cameras (in typical smartphones, two) streaming their data through standard Camera Serial Interfaces (CSIs). Video processing pipelines, controlled by a microcontroller unit, implement a number of essential functions such as Bayer reconstruction, white balance and barrel correction, noise and defect filtering, autofocus control, video stabilization, and image compression. More advanced processors already implement rudimentary object detection and tracking functions, such as face recognition” teaches a possible system integration of an accelerator in a commercial image processing chip that includes an image processor (corresponds to the flow sensor processor). Fig. 14 and Section 8.2 Pg. 99, “pooling layer, pooling windows of adjacent output neurons are adjacent but non-overlapping, i.e., the step size of window sliding equals to the window size. We present in Figure 14 the execution flow of one such pooling layer” teaches a pooling layer (corresponds to a neuron station) with a plurality of neurons (corresponds to the neuron blocks). Section 4 Pg. 94, “a purely spatial hardware implementation of a neural network would devote a separate accumulation unit for each neuron and a separate multiplier for each synapse” teaches a multiplier and accumulation unit for the neuron (corresponds to the neuron block) which is separately devoted to each neuron or synapse (corresponds to grouped information with the same characteristic)).
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Graham et al., and Abadi et al. 2 with Du et al., with the motivation that said neuron block in said flow sensor processor has a multiply accumulator group, each multiply accumulator group Information with the same characteristics can be processed. “In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm2 and consuming only 320 mW, but still about 30× faster than high-end GPUs” (Du et al., Abstract). The proposed teaching is beneficial in that it reduces the neural network memory footprint and is more energy efficient.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in further view of Azarkhish et al. (“Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes”).
Regarding Claim 2,
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 teaches the flexible data stream processor for an artificial intelligence device according to claim 1, wherein: 
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 does not appear to explicitly teach one tensor of said tensor information has 5 dimensions, including feature map dimensions: X, Y; channel Dimensions C, K, where C represents the input feature map, K represents the output feature map; N represents the batch dimension.
However, Azarkhish et al. teaches one tensor of said tensor information has 5 dimensions, including feature map dimensions: X, Y; channel Dimensions C, K, where C represents the input feature map, K represents the output feature map; N represents the batch dimension (Azarkhish et al., Section 2.3 Pg. 423, “In this paper, we follow a different approach based on many scalar coprocessors working in parallel on a shared memory. This is described in Section 3. On the other hand, Google's TensorFlow platform [42] maps large-scale ML problems to several machines and computation devices, including multi-core CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs)” teaches a TensorFlow platform with Tensor Processing Units (corresponds to tensor information). Fig 3 and Section 4.1 Pg. 426, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches a ConvNet with width (X) and height (Y) feature map dimensions along with the output dimension (corresponds to the output feature map, K) and the input filter dimension (corresponds to the input feature map, C). Section 5.1 Pg. 427, “When the number of total jobs (TXo×TYo×TCo) is more than NNST, the jobs will be broken into several batches (NUM_BATCHES)” teaches the jobs being broken into several batches (corresponds to the batch dimension, N) from a convolution kernel performed on a 4D-tile). 
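To make the dimensional mapping above concrete, the five recited dimensions and the two quantities mentioned in the quoted passages (output extent of a tile and number of job batches) can be sketched as follows. The class name `TensorDims`, both helper functions, and all numeric values are hypothetical. The output-extent expression is the standard convolution size formula, and the ceiling division used for the batch count is an assumption, since the quoted passage states only that jobs exceeding NNST are broken into several batches.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorDims:
    """Hypothetical container for the five dimensions recited in claim 2."""
    x: int  # feature map width
    y: int  # feature map height
    c: int  # input feature maps (input channels)
    k: int  # output feature maps (output channels)
    n: int  # batch dimension


def conv_output_extent(in_extent: int, filter_extent: int,
                       stride: int = 1, padding: int = 0) -> int:
    """Standard convolution output size along one spatial dimension."""
    return (in_extent + 2 * padding - filter_extent) // stride + 1


def num_batches(total_jobs: int, n_nst: int) -> int:
    """Assumed ceiling division: jobs beyond n_nst spill into more batches."""
    return math.ceil(total_jobs / n_nst)


# Illustrative values only.
dims = TensorDims(x=32, y=32, c=3, k=16, n=8)
out_x = conv_output_extent(dims.x, filter_extent=3, stride=1, padding=1)
out_y = conv_output_extent(dims.y, filter_extent=3, stride=1, padding=1)
total_jobs = out_x * out_y * dims.k
batches = num_batches(total_jobs, n_nst=1000)
```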
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Azarkhish et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Graham et al., Du et al., and Abadi et al. 2 with Azarkhish et al., with the motivation that one tensor of said tensor information has 5 dimensions, including feature map dimensions: X, Y; channel Dimensions C, K, where C represents the input feature map, K represents the output feature map; N represents the batch dimension. “In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our co-design approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for Convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8 percent of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs” (Azarkhish et al., Abstract). The proposed teaching is beneficial in that it is a cost-effective and energy efficient solution.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Azarkhish et al. in further view of Steinkraus et al. (“Using GPUs for machine learning algorithms”)
Regarding Claim 3,
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Azarkhish et al. teaches the flexible data stream processor for an artificial intelligence device according to claim 2, 
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Azarkhish et al. does not appear to explicitly teach wherein the occipital engine is constructed in a unified rendering architecture, and specifically includes: the rendering feature is sent back to the parietal engine. After the parietal engine finishes rendering, the results are sent back to the occipital engine
However, Steinkraus et al., teaches wherein the occipital engine is constructed in a unified rendering architecture, and specifically includes: the rendering feature is sent back to the parietal engine. After the parietal engine finishes rendering, the results are sent back to the occipital engine (Steinkraus et al., Section 1 Pg. 1, “They are programmable through languages such as DirectX or OpenGL. The graphics primitives still use triangles, but the hardware also allows the instructions to render each pixel to be specified by a program, which can be loaded before the triangle(s) is (are) rendered. These programmable triangle renderers are called "pixel shaders". The instructions of the program in the shaders are close to assembly language, since each has a direct hardware implementation. The new flexibility introduced by pixel shaders allows not only naturalistic rendering of surfaces, but also brings the GPU closer to a general purpose parallel processor” teaches the pixel shaders (corresponds to the parietal engine) rendering triangles. Section 3 Pg. 3, “Both types of shader are concerned with the rendering of triangles (the building blocks of graphics objects) to an output device” teaches the results of the rendered triangle in the shaders are sent to an output device (corresponds to the occipital engine)).
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Azarkhish et al. in view of Steinkraus et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Graham et al., Du et al., Abadi et al. 2, and Azarkhish et al. with Steinkraus et al., with motivation wherein the occipital engine is constructed in a unified rendering architecture, and specifically includes: the rendering feature is sent back to the parietal engine. After the parietal engine finishes rendering, the results are sent back to the occipital engine. “We propose a generic 2-layer fully connected neural network GPU implementation which yields over 3× speedup for both training and testing with respect to a 3 GHz P4 CPU” (Steinkraus et al., Abstract). The proposed teaching is beneficial in that it yields a higher speedup.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in further view of Nowatzki et al. (“Stream-Dataflow Acceleration”)
Regarding Claim 4,
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 teaches the flexible data stream processor for an artificial intelligence device according to claim 1, 
Abadi et al. 1 further teaches wherein said frontal engine sends a group tensor to a parietal engine in a polling schedule, all streams (Abadi et al. 1, Section 5 Pg. 275, “The distributed master translates user requests into execution across a set of tasks” teaches a distributed master (corresponds to the frontal engine) within the layered Tensorflow architecture. Section 5 Pg. 275, “The dataflow executor in each task handles requests from the master, and schedules the execution of the kernels that comprise a local subgraph” teaches a dataflow executor that handles the request from the distributed master, within the layered Tensorflow architecture that schedules execution (corresponds to the polling schedule). Fig. 2 and Section 4.3 Pg. 274, “Save writes one or more tensors to a checkpoint file, and Restore reads one or more tensors from a checkpoint file” teaches the one or more tensors (corresponds to the tile block) being allocated to a checkpoint file (corresponds to the parietal engine group)). 
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 does not appear to explicitly teach The perceptron processor shares an L2 cache and an export block
However, Nowatzki et al., teaches The perceptron processor shares an L2 cache and an export block (Nowatzki et al., Section 4.5 Pg. 424, “Because Softbrain's memory interface directly accesses the L2 cache, there is the possibility of incoherence between the control core's L1 and the L2. To avoid incoherent reads from the L2 to the stream engines, the control core's L1 is write-through. To avoid incoherent reads on the L1, Softbrain sends L1 tag invalidations as it processes the stream” teaches Softbrain’s memory interface (corresponds to the perceptron processor) with an L2 cache and Softbrain sending L1 tag invalidations (corresponds to the export block)).
Abadi et al. 1 in view of Graham et al. in view of Du et al. in view of Abadi et al. 2 in view of Nowatzki et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Graham et al., Du et al., and Abadi et al. 2, with Nowatzki et al., with motivation of The perceptron processor shares an L2 cache and an export block. “We define a general architecture (a hardware-software interface) which can more efficiently express programs with these properties called stream-dataflow. The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead. This paper explores the hardware and software implications, describes its detailed microarchitecture, and evaluates an implementation. Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average” (Nowatzki et al., Abstract). The proposed teaching is beneficial in that it enables high concurrency and the stream component enables communication and coordination at very-low power and area overhead.
Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. in view of Du et al.
Regarding Claim 6,
Abadi et al. 1 teaches a flexible data stream processing method for an artificial intelligence device (Abadi et al. 1, Section 2.2 Pg. 267, “We designed TensorFlow to be much more flexible than DistBelief, while retaining its ability to satisfy the demands of Google’s production machine learning workloads. TensorFlow provides a simple dataflow-based programming abstraction that allows users to deploy applications on distributed clusters, local workstations, mobile devices, and custom-designed accelerators” teaches a designed Tensorflow (corresponds to the flexible data stream processing method) for artificial intelligence device).
… divide the tensor into several tile blocks (Abadi et al. 1, Section 4.2 Pg. 273, “The dynamic partition (Part) operation divides the incoming indices into variable-sized tensors that contain the indices destined for each shard” teaches dividing the received tensor into a plurality of tensor shards (corresponds to the tile blocks)).
… Step 1. The block tile scheduler in the frontal engine receives the tensor information from the application through the driver. According to the requirements of the application, the tile scheduler divides the tensor into a plurality of tile blocks, and the tile blocks are polled. The scheduling mode is assigned to the parietal engine group (Abadi et al. 1, Abstract Pg. 265, “It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs)” teaches computational devices (corresponds to computer devices that contain drivers) of Tensorflow. Section 5 Pg. 275, “The distributed master translates user requests into execution across a set of tasks” teaches a distributed master (corresponds to the frontal engine) within the layered Tensorflow architecture. Section 5 Pg. 275, “The dataflow executor in each task handles requests from the master, and schedules the execution of the kernels that comprise a local subgraph” teaches a dataflow executor (corresponds to the tile block scheduler), that handles the request from the distributed master, within the layered Tensorflow architecture that schedules execution (corresponds to the polling schedule mode). Section 3.1 Pg. 270, “In TensorFlow, we model all data as tensors (n-dimensional arrays) with the elements having one of a small number of primitive types, such as int32, float32, or string (where string can represent arbitrary binary data). Tensors naturally represent the inputs to and results of the common mathematical operations in many machine learning algorithms: for example, a matrix multiplication takes two 2-D tensors and produces a 2-D tensor; and a batch 2-D convolution takes two 4-D tensors and produces another 4-D tensor” teaches receiving the tensor information as input. Section 4.2 Pg. 273, “The dynamic partition (Part) operation divides the incoming indices into variable-sized tensors that contain the indices destined for each shard” teaches dividing the received tensor into a plurality of tensor shards (corresponds to the tile blocks). Fig. 2 and Section 4.3 Pg. 274, “Save writes one or more tensors to a checkpoint file, and Restore reads one or more tensors from a checkpoint file” teaches the one or more tensors (corresponds to the tile block) being allocated to a checkpoint file (corresponds to the parietal engine group)).
Abadi et al. 1 does not appear to explicitly teach characterized in that a tensor has five dimensions, including a feature map dimension: X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension; wherein the a dimension is an N or C or K dimension; divides the X and Y dimensions to form a plurality of wave blocks; wherein the p dimension is an N or C or K dimension; Step 5, the neuron station in the flow sensor processor loads the activation and weight, and performs neuron processing 
However, Azarkhish et al., teaches characterized in that a tensor has five dimensions, including a feature map dimension: X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension (Azarkhish et al., Section 2.3 Pg. 423, “In this paper, we follow a different approach based on many scalar coprocessors working in parallel on a shared memory. This is described in Section 3. On the other hand, Google's TensorFlow platform [42] maps large-scale ML problems to several machines and computation devices, including multi-core CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs)” teaches a Tensorflow platform with Tensor processing units (corresponds to tensor information). Fig 3 and Section 4.1 Pg. 426, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches ConvNet with width (X) and height (Y) feature map dimensions along with the output dimension (corresponds to the output feature map, K) and the input filter dimension (corresponds to the input feature map, C). Section 5.1 Pg. 427, “When the number of total jobs (TXo×TYo×TCo) is more than NNST, the jobs will be broken into several batches (NUM_BATCHES)” teaches the jobs being broken into several batches (corresponds to the batch dimension, N) from a convolution kernel performed on a 4D-tile).
… wherein the a dimension is an N or C or K dimension (Azarkhish et al., Fig 3 and Section 4.1 Pg. 426, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches the channel dimensions along with the output dimension (corresponds to the output feature map, K) and the input filter dimension (corresponds to the input feature map, C). Section 5.1 Pg. 427, “When the number of total jobs (TXo×TYo×TCo) is more than NNST, the jobs will be broken into several batches (NUM_BATCHES)” teaches the jobs being broken into several batches (corresponds to the batch dimension, N) from a convolution kernel performed on a 4D-tile).
… divides the X and Y dimensions to form a plurality of wave blocks (Azarkhish et al., Fig 3 and Section 4.1 Pg. 426, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches ConvNet with width (X) and height (Y) feature map dimensions along with the output dimension and the input filter dimension. The ConvNet that contains Raw-Tiles, augmented tile, and the 4D-Tiles (corresponds to the plurality of wave blocks)).
… wherein the p dimension is an N or C or K dimension (Azarkhish et al., Fig 3 and Section 4.1 Pg. 426, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches the tuple (corresponds to the p dimension) used to identify the output dimension (corresponds to the output feature map, K) and the input filter dimension (corresponds to the input feature map, C). Section 5.1 Pg. 427, “When the number of total jobs (TXo×TYo×TCo) is more than NNST, the jobs will be broken into several batches (NUM_BATCHES)” teaches the jobs being broken into several batches (corresponds to the batch dimension, N) from a convolution kernel performed on a 4D-tile).
Step 5, the neuron station in the flow sensor processor loads the activation and weight, and performs neuron processing (Azarkhish et al., Section 3.2 Pg. 425, “NSTs follow a nonblocking dataflow computation paradigm, and information flows in them as tokens… NST supports strided convolution, max-pooling, ReLU-activation, along with some basic utilities for backpropagation and training. Apart from these tasks, it can also be used for generic computations such as dot product, matrix multiplication, linear transformations, and weighted sum/average” teaches a NeuroStream (corresponds to a neuron station that performs neuron processing) in a flow computation paradigm (corresponds to the flow sensor processor) that consists of ReLU-activation and weight).
Abadi et al. 1 in view of Azarkhish et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1 with Azarkhish et al., with motivation of characterized in that a tensor has five dimensions, including a feature map dimension: X, Y; a channel dimension C, K, where C represents an input feature map, and K represents Output feature map; N represents the batch dimension; wherein the a dimension is an N or C or K dimension; divides the X and Y dimensions to form a plurality of wave blocks; wherein the p dimension is an N or C or K dimension; Step 5, the neuron station in the flow sensor processor loads the activation and weight, and performs neuron processing. “In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our co-design approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for Convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8 percent of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs” (Azarkhish et al., Abstract). The proposed teaching is beneficial in that it is a cost-effective and energy efficient solution.
Abadi et al. 1 in view of Azarkhish et al. does not appear to explicitly teach divide each tile block into several tiles, divide each tile into several wave blocks and then each wave The block is divided into waves and the waves with the same rendered features are processed in the same neuron block; The specific steps are as follows: Step 2, the tile dispatcher in the parietal engine acquires the tile block and divides the tile block of the a dimension to form a plurality of tiles; Step 3: The block wave scheduler in the parietal engine acquires the tile; and the wave block is sent to the flow sensor processor in the parietal engine; Step 4: The block wave dispatcher in the flow sensor processor acquires the wave block and divides it into a plurality of waves based on the p dimension
However, Graham et al., teaches divide each tile block into several tiles, divide each tile into several wave blocks (Graham et al., Section 6.2 Pg. 16, “Agent initialization is done once and then control is passed to the Dispatcher which waits for incoming KQML messages… The message is attempting to communicate as part of an ongoing conversation. The Dispatcher makes this distinction mostly by recognizing the KQML:in-reply-to field designator, which indicates the message is part of an existing conversation” teaches the dispatcher receives incoming KQML messages (corresponds to the tile block) and the incoming message divided into parts (corresponds to the plurality of tiles). Section 6.4 Pg. 17, “The scheduling functions are actually divided into two separate modules; the Task Scheduler and the Agenda Manager. The purpose of the Task Scheduler is to evaluate the HTN task structure to determine a set of actions which will ‘‘best’’ suit the users goals. The input is a task HTN will all possible actions” teaches the scheduler (corresponds to the wave block scheduler) acquiring the task HTN input (corresponds to the tile) and dividing the scheduling into two modules (corresponds to the several wave blocks)).
and then each wave The block is divided into waves and the waves with the same rendered features are processed in the same neuron block; The specific steps are as follows (Graham et al., Section 6.2 Pg. 16, “If so a new objective is created (equivalent to the BDI ‘‘desires’’ concept [37]) and placed on the Objectives Queue for the Planner. The dispatcher assign a unique identifier to this message which is used to distinguish all messages that are part of the new conversation” teaches the Objective Queue connected to the dispatcher (corresponds to the wave block dispatcher) that distinguishes the messages parts (corresponds to the several waves divided) to distinguish the messages (corresponds to the same rendered features)).
… Step 2, the tile dispatcher in the parietal engine acquires the tile block and divides the tile block of the a dimension to form a plurality of tiles (Graham et al., Fig. 2 and Section 6 Pg. 15, “Figure 2 represents the high level structure of the DECAF architecture. Structures inside the heavy black line are internal to the architecture and the items outside the line are user-written or provided from some other outside source (such as incoming KQML messages). There are five internal execution modules (square boxes) in the current DECAF implementation” teaches a DECAF architecture (corresponds to the parietal engine group) that includes a plurality of internal execution modules (corresponds to the plurality of parietal engines) with a dispatcher (corresponds to the tile dispatcher) and scheduler. Section 6.2 Pg. 16, “Agent initialization is done once and then control is passed to the Dispatcher which waits for incoming KQML messages… The message is attempting to communicate as part of an ongoing conversation. The Dispatcher makes this distinction mostly by recognizing the KQML:in-reply-to field designator, which indicates the message is part of an existing conversation” teaches the dispatcher receives incoming KQML messages (corresponds to the tile block) and the incoming message divided into parts (corresponds to the plurality of tiles)).
… Step 3: The block wave scheduler in the parietal engine acquires the tile (Graham et al., Fig. 2 and Section 6 Pg. 15, “Figure 2 represents the high level structure of the DECAF architecture. Structures inside the heavy black line are internal to the architecture and the items outside the line are user-written or provided from some other outside source (such as incoming KQML messages). There are five internal execution modules (square boxes) in the current DECAF implementation” teaches a DECAF architecture (corresponds to the parietal engine group) that includes a plurality of internal execution modules (corresponds to the plurality of parietal engines) with a dispatcher and scheduler (corresponds to the wave block scheduler)).
… and the wave block is sent to the flow sensor processor in the parietal engine (Graham et al., Section 7.1.2 Pg. 18 and 20, “One premise of DECAF is that the architecture provides increased reliability by using unused CPU cycles to maximize throughput” teaches DECAF architecture utilizing unused CPU cycles (corresponds to the flow sensor processors). Fig. 2 and Section 6 Pg. 15, “There are five internal execution modules (square boxes) in the current DECAF implementation, and seven associated data structure queues (rounded boxes)” teaches internal execution modules (corresponds to the parietal engine). Section 6.2 Pg. 16, “If so a new objective is created (equivalent to the BDI ‘‘desires’’ concept [37]) and placed on the Objectives Queue for the Planner. The dispatcher assign a unique identifier to this message which is used to distinguish all messages that are part of the new conversation” teaches the Objective Queue connected to the dispatcher that distinguishes the messages parts (corresponds to the several waves divided)).
Step 4: The block wave dispatcher in the flow sensor processor acquires the wave block and divides it into a plurality of waves based on the p dimension (Graham et al., Section 7.1.2 Pg. 18 and 20, “One premise of DECAF is that the architecture provides increased reliability by using unused CPU cycles to maximize throughput” teaches DECAF architecture utilizing unused CPU cycles (corresponds to the flow sensor processors). Fig. 2 and Section 6 Pg. 15, “There are five internal execution modules (square boxes) in the current DECAF implementation, and seven associated data structure queues (rounded boxes)” teaches internal execution modules (corresponds to the parietal engine). Section 6.2 Pg. 16, “If so a new objective is created (equivalent to the BDI ‘‘desires’’ concept [37]) and placed on the Objectives Queue for the Planner. The dispatcher assign a unique identifier to this message which is used to distinguish all messages that are part of the new conversation” teaches the Objective Queue connected to the dispatcher (corresponds to the wave block dispatcher) that distinguishes the messages parts (corresponds to the several waves divided) based on a unique identifier (corresponds to the p dimension)).
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1 and Azarkhish et al. with Graham et al., with motivation to divide each tile block into several tiles, divide each tile into several wave blocks and then each wave The block is divided into waves and the waves with the same rendered features are processed in the same neuron block; The specific steps are as follows: Step 2, the tile dispatcher in the parietal engine acquires the tile block and divides the tile block of the a dimension to form a plurality of tiles; Step 3: The block wave scheduler in the parietal engine acquires the tile; and the wave block is sent to the flow sensor processor in the parietal engine; Step 4: The block wave dispatcher in the flow sensor processor acquires the wave block and divides it into a plurality of waves based on the p dimension. “DECAF (Distributed, Environment Centered Agent Framework) is a software toolkit for the rapid design, development, and execution of ‘‘intelligent’’ agents to achieve solutions in complex software systems. DECAF is based on the premise that execution of the actions required to accomplish a task specified by an agent program is similar to a traditional operating system executing a sequence of user requests. In the same fashion that an operating system provides an environment for the execution of a user request, an agent framework provides the needed environment for the execution of agent actions. The agent environment includes the ability to communicate with other agents, efficiently maintain the current state of an executing agent, and select an execution path from a set of possible execution paths so as to support persistent, flexible, and robust actions” (Graham et al., Abstract). The proposed teaching is beneficial in that it achieves solutions in complex software systems.
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. does not appear to explicitly teach In step 6, there is a multiply accumulator set in the neuron block in the neuron station each multiply accumulator set processes waves having the same beta dimension
However, Du et al., teaches In step 6, there is a multiply accumulator set in the neuron block in the neuron station each multiply accumulator set processes waves having the same beta dimension (Du et al., Section 4 Pg. 94, “a purely spatial hardware implementation of a neural network would devote a separate accumulation unit for each neuron and a separate multiplier for each synapse” teaches a multiplier and accumulation unit for the neuron (corresponds to the neuron block) in the CNN layers (corresponds to the neuron station). Fig. 14 and Section 8.2 Pg. 99, “pooling layer, pooling windows of adjacent output neurons are adjacent but non-overlapping, i.e., the step size of window sliding equals to the window size. We present in Figure 14 the execution flow of one such pooling layer” teaches the neurons consist of pooling window and step size (corresponds to wave) that are adjacent and equal (corresponds to the same beta dimension)).
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. in view of Du et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Azarkhish et al., and Graham et al. with Du et al., with motivation of In step 6, there is a multiply accumulator set in the neuron block in the neuron station each multiply accumulator set processes waves having the same beta dimension. “In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses combined with a careful exploitation of the specific data access patterns within CNNs allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm2 and consuming only 320 mW, but still about 30× faster than high-end GPUs” (Du et al., Abstract). The proposed teaching is beneficial in that it reduces the neural network memory footprint and is more energy efficient.
Regarding Claim 7,
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. in view of Du et al. teaches the flexible data stream processing method for an artificial intelligence device according to claim 6, 
Abadi et al. 1 further teaches wherein in step 1, the tile scheduler divides the number of tile blocks separated by the tensor from the parietal engine in the parietal engine group. The number of engines is the same (Abadi et al. 1, Section 5 Pg. 275, “The dataflow executor in each task handles requests from the master, and schedules the execution of the kernels that comprise a local subgraph” teaches a dataflow executor (corresponds to the tile scheduler), that handles the request from the distributed master, within the layered TensorFlow architecture that schedules execution. Section 3.1 Pg. 270, “In TensorFlow, we model all data as tensors (n-dimensional arrays) with the elements having one of a small number of primitive types, such as int32, float32, or string (where string can represent arbitrary binary data). Tensors naturally represent the inputs to and results of the common mathematical operations in many machine learning algorithms: for example, a matrix multiplication takes two 2-D tensors and produces a 2-D tensor; and a batch 2-D convolution takes two 4-D tensors and produces another 4-D tensor” teaches receiving the tensor information as input. Section 4.2 Pg. 273, “The dynamic partition (Part) operation divides the incoming indices into variable-sized tensors that contain the indices destined for each shard” teaches dividing the received tensor into a plurality of tensor shards (corresponds to the tile blocks). Fig. 2 and Section 4.3 Pg. 274, “Save writes one or more tensors to a checkpoint file, and Restore reads one or more tensors from a checkpoint file” teaches the one or more tensors (corresponds to the tile block) being allocated to a checkpoint file (corresponds to the parietal engine group with parietal engine)).
Regarding Claim 8,
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. in view of Du et al. teaches the flexible data stream processing method for an artificial intelligence device according to claim 6, 
Azarkhish et al. further teaches wherein the size of the tiles, tiles, blocks and waves is programmable (Azarkhish et al., Section 2.3, “A glance at the SoA highlights two main directions: (I) Application-specific architectures based on ASIC/FPGAs [13], [15], [16], [17], [18], [34], [38]; (II) Software implementations on programmable general-purpose platforms such as CPUs and GPUs [13], [30], [39], [40]. ASIC ConvNet implementations achieve impressive energy efficiency and performance” teaches the software implementation of the ConvNet being programmable. Fig. 3 and Section 4.1, “A 4D-tile (illustrated in Figs. 3a and 3b) is a subset of the input volume (called Input-tile) and output volume (Output-tile) of a convolutional layer (l) identified by the (T(l)Xi, T(l)Yi, T(l)Ci, T(l)Co) tuple. T(l)Xi and T(l)Yi are the tile width and height of the input volume of layer l, and T(l)Ci and T(l)Co are the numbers of input and output channels to the tile. The output dimensions of each tile are calculated directly from input width and height, filter dimensions, striding, and zero-padding parameters” teaches a ConvNet that contains Raw-Tiles (corresponds to the tile blocks), augmented tile (corresponds to the tiles), and the 4D-Tiles (corresponds to the waves) with their size).
Abadi et al. 1 in view of Azarkhish et al. in view of Graham et al. in view of Du et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “the design and implementation of dataflow”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Abadi et al. 1, Azarkhish et al., and Graham et al. with Du et al., with the motivation that, in step 6, there is a multiply accumulator set in the neuron block in the neuron station and each multiply accumulator set processes waves having the same beta dimension. “In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our co-design approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for Convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8 percent of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs” (Azarkhish et al., Abstract). The proposed teaching is beneficial in that it is a cost-effective and energy efficient solution.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571) 272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/
Examiner, Art Unit 2125

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125