DETAILED ACTION
This is the first office action regarding application number 16/093,956, filed October 15, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Drawings
The drawings are objected to because of the following informalities:
Figure 1B: A typographical error in block labeled “Computation Module[[s]] 110”.  Appropriate correction is required.
Figures 1B, 1C: In both figures, there are two sets of bi-directional arrows originating from the Direct Memory Access Unit 102 and terminating at the same Computation Module[[s]] 110. It is not clear why there need to be two identical paths between the same pair of elements. Appropriate correction is required.
Figure 1C: There is a bi-directional dotted line originating from the Data Converter 105 and terminating at the Computation Module 110. It is not clear from the figure why this needs to be a dotted line rather than a solid line. Furthermore, it is also not clear why this dotted line needs to be bi-directional, given that paragraph [0038] in the specification describes a data converter as only transmitting converted data to a 
Figure 5: Operations 511A…511N are represented as dotted blocks. It is not clear from the figure why these operation elements are represented as dotted blocks. Appropriate correction is required.
Figure 7: It is unclear what “---” within the Slave Computation Module 114N dotted block is supposed to represent. Appropriate correction is required.
Figure 7: There are no reference characters for blocks “Adder”, “Bias operation”, “Activation function”, and “Output Vector” within Master Computation Module 112 (corresponding to the same operations described in paragraph [0083]). Appropriate correction is required. 
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
Paragraph [0002]: The Specification amendment received on 10/19/2018 to revise paragraph [0002] is inconsistent with the Application Data Sheet (ADS) filed on 10/19/2018. Both the ADS filed on 10/19/2018 and paragraph [0002] in the original specification filed on 10/15/2018 indicate the same PCT application number 079431, which in turn is consistent with the submitted WIPO publication WO2017/177442 submitted on 10/15/2018. However, the specification amendment received on 10/19/2018 indicates a different PCT application number PCT/CN2016/079443. Appropriate correction is required.
Paragraphs [0028], [0029], [0034], and [0047]: In paragraphs [0028] and [0029], both “computing process 100” and “neural network processor 100” are used to describe the same reference character 100. Paragraphs [0034] and [0047] use “neural network processor 100”, but paragraph [0047] also uses “MNN acceleration processor 100”. The same reference character must be described using the same descriptive label. Appropriate correction is required.
Paragraph [0038]: A typographical error in line 3: “data convert[er] 105”. Appropriate correction is required.
Paragraph [0048], line 2: It is unclear what the term “the process” in the phrase “by the master computation 112 in the process” is referencing; it is either a typographical error (e.g., it is the “neural network processor 100”), or this sentence is referencing a process/step/method (i.e., a hardware or software process). Appropriate correction is required.
Paragraph [0051], Table 1: There are several issues with this table: 1) there is no descriptive text in the table caption describing this table (i.e., the table caption only says “Table 1”); 2) there are no row/column headers to describe what the numbers in each row/column represent; and 3) there is no description in the surrounding paragraph on how to interpret and derive the numbers within each table row/column entry. Appropriate correction is required.
Paragraph [0057]: This paragraph refers to a “slave computation module 114”, but Figure 4 shows a “slave computation module 114N
Paragraphs [0060] and [0062]-[0064]: These paragraphs reference a “slave computation module 114”. However, per paragraph [0034], “114” is meant to reference the one or more slave computation modules (114A-114N). Appropriate correction is required.
Paragraph [0069]: A typographical error in line 12 “continuous data process[or] 504”. Appropriate correction is required.
Paragraph [0069]: This appears to be an incomplete paragraph, as the last sentence does not end with a punctuation mark and does not complete the statement it is trying to convey (“That is, the received MNN data may be further transmitted to continuous data process 504 configured to process”). This paragraph should either be removed or be edited such that it does not contain new matter. Appropriate correction is required.
Paragraph [0079], Table 2: There are several issues with this table: 1) there is no descriptive text in the table caption describing this table (i.e., the table caption only says “Table 2”); 2) there are no row/column headers to describe what the numbers in each row/column represent; and 3) there is no description in the surrounding paragraph on how to interpret and derive the numbers within each table row/column entry. Appropriate correction is required.

Claim Objections




Claim 5 is objected to because of the following informalities: A typographical error in the following limitation: “wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules, and exchange[s] data between the master computation module and the one or more slave computation modules”. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are:
Claim 1: “the master computation module configured to: receive … , and transmit …” and 
“the master computation module is further configured to: calculate … , and generate …”
Claim 1: “the one or more slave computation modules configured to receive … , and calculate …,”
Claim 1: “a controller unit configured to transmit …”
Claim 2: “the interconnection unit is configured to combine …”
Claim 3: “the one or more slave computation modules are configured to parallel calculate …”
Claim 4: “the master computation module is configured to perform one operation selected from the group consisting of: …”
Claim 6: “a master computation unit configured to perform one of one or more operations …”
Claim 7: “a master data dependency relationship determination unit configured to prevent an instruction from being executed …”
Claim 8: “a slave data dependency relationship determination unit configured to perform data exchange operations … ;”
Claim 9: “the slave data dependency relationship determination unit is configured to: determine whether there is a dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed;”
Claims 10, 11: “an operation determiner configured to determine an operation to be performed … ”
Claims 10, 11: “a hybrid data processor configured to perform the determined operation.”
Claim 12: “a data type determiner configured to determine the data type of the input data;”
Claim 12: “the discrete data processor is configured to process the input data based on the determination that the input data is stored as discrete values,”
Claim 12: “the continuous data processor is configured to process the input data based on a determination that the input data is stored as continuous values.”
Claim 13: “a data converter configured to: receive …, convert …, and transmit …”
Claim 14: “a preprocessing unit configured to clip a portion of the input data … ;”
Claim 14: “a distance calculator configured to calculate multiple distance values …;”
Claim 14: “a comparer configured to compare  …”
Claim 15: “the data converter is configured to receive … ”
Claim 16: “the data converter configured to: receive  … , convert … , and transmit … ”
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-16, 20-21, 23-29, and 31 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 1, 
The term “the computation module” in the limitation “a controller unit configured to transmit one or more instructions to the computation module” is indefinite, since it is unclear exactly which computation module mentioned earlier in Claim 1 is being referenced in this limitation: “a computation module that includes a master computation module and one or more slave computation modules”, “a master computation module”, “one or more slave computation modules”, or all modules that are considered as computation modules. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Claims 2-16 are dependent claims tracing back to independent parent Claim 1, and as such inherit the same indefiniteness established in Claim 1. Hence, Claims 2-16 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 4,
Claim 4 recites the limitation "wherein the master computation module is configured to perform one operation selected from the group consisting of: ..." in line 2. There is insufficient antecedent basis for this limitation in the claim, since there is no earlier reference to a Markush grouping “a group consisting of: …” that contains the list of subsequent operations. For the purposes of examination, this claim limitation will be interpreted as “wherein the master computation module is configured to perform one operation selected from [a] group consisting of: …”.
Claim 4 further recites the limitation “wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax;” beginning at line 4. There is insufficient antecedent basis for this limitation in the claim, since there is no earlier reference to a Markush grouping “a group consisting of …” that contains the list of subsequent activation functions. For the purposes of examination, this claim limitation will be interpreted as “wherein the activation function is a function selected from [a] group consisting of non-linear sigmoid, tanh, relu, and softmax;”.
Regarding Claim 5,
The term “wherein the interconnection unit … exchange[s] data” in the limitation “wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules, and exchange[s] data between the master computation module and the one or more slave computation modules” is indefinite, since it is unclear which types of data from Claim 1 are being referenced in this limitation: “one or more groups of MNN data”, “data type of each of the one or more groups of MNN data”, “one or more groups of slave output values”, “a merged intermediate vector”, “an output vector”, or all of the above. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Regarding Claim 6,
The term “one of one or more operations” in the limitation “a master computation unit configured to perform one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data” is indefinite, since it is unclear which additional operations are being referenced in the limitation. Independent parent Claim 1 only explicitly recites one operation associated with the master computation module that is associated with a data type: “wherein the master computation module is further configured to: … calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data”. The other limitations recited for the master computation module do not explicitly indicate an association based on a data type for each of the one or more groups of MNN data. Given that it is unclear which additional operations are included in the term “one of one or more operations”, this lack of clarity renders this claim indefinite. For the purposes of examination, the term “one of one or more operations” will be associated with the one operation “wherein the master computation module is further configured to: … calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data”.
Claims 10 and 12 are dependent claims tracing back to parent Claim 6, and as such inherit the same indefiniteness established in Claim 6. Hence, Claims 10 and 12 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 8,
The term “data exchange operations” in the limitation “a slave data dependency relationship determination unit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations” is indefinite, since it is unclear what data exchange operations are being referenced in this limitation. Independent parent Claim 1 recites that the one or more slave computation modules “receive one or more groups of MNN data”, and “calculate one or more groups of slave output values based on a data type of each of the one or more groups of MNN data”, but neither of these limitations is identified as a data exchange operation. Given that it is unclear whether the term “data exchange operations” encompasses these operations performed within the slave computation modules, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Claims 9 and 11 are dependent claims tracing back to parent Claim 8, and as such inherit the same indefiniteness established in Claim 8. Hence, Claims 9 and 11 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 9,
Claim 9 recites the terms “the micro-instruction which …” and “that micro-instruction which …” in several places within the limitation: “if there is no dependent relationship, allow the micro-instruction which has not been executed to be executed immediately, otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” The presence of these terms renders the claim indefinite, since it is unclear whether the two terms are referencing “a first micro-instruction” or “a second micro-instruction” recited earlier in the same claim (i.e., “determine whether there is a dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed”). It is recommended that the applicant use the terms “the first micro-instruction” and “the second micro-instruction” in place of “the micro-instruction which …” and “that micro-instruction which …” in the appropriate claim limitations to improve the clarity of the claim.
Claim 9 further recites the term “all the micro-instructions” in the limitation “otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” There is insufficient antecedent basis for this limitation in the claim, since Claim 9 only recites “a first micro-instruction” and “a second micro-instruction” and does not reference a plurality of micro-instructions, either as a separate group, or associated with either the first micro-instruction or the second micro-instruction. Furthermore, neither of parent Claims 1 and 8 recites any plurality of micro-instructions. It is recommended that the applicant correct this phrasing within the claim limitation to improve the clarity of the claim.
Claim 9 further recites the word “depend”, which renders the following claim limitation incoherent: “otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” The placement of the word “depend” renders the claim indefinite, as it is unclear which parts of the 
Regarding Claim 10,
The term “a hybrid data processor” in the limitation “a hybrid data processor configured to perform the determined operation” is indefinite, since it is unclear what the word “hybrid” is modifying: the processor (i.e., a hybrid processor that processes data) or the data (i.e., a processor that processes hybrid data). Parent Claims 1 and 6 do not recite any additional limitations including other data types that would clarify the term “hybrid data”, and these parent claims also do not recite any additional limitations including other types of processor that would clarify the term “hybrid processor”. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Regarding Claim 11,
The term “a hybrid data processor” in the limitation “a hybrid data processor configured to perform the determined operation” is indefinite, since it is unclear what the word “hybrid” is modifying: the processor (i.e., a hybrid processor that processes data) or the data (i.e., a processor that processes hybrid data). Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Regarding Claim 13, 
The term “the computation module” in the limitation “transmit the discrete data to the computation module” is indefinite, since it is unclear exactly which computation module mentioned earlier in independent parent Claim 1 is being referenced in this limitation: “a computation module that includes a master computation module and one or more slave computation modules”, “a master computation module”, “one or more slave computation modules”, or all modules that are considered as computation modules. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Claims 14 and 15 are dependent claims tracing back to parent Claim 13, and as such inherit the same indefiniteness established in Claim 13. Hence, Claims 14 and 15 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 20,
Claim 20 recites the limitation “performing, by the master computation module, one operation selected from the group consisting of: ...” in line 2. There is insufficient antecedent basis for this limitation in the claim, since there is no earlier reference to a Markush grouping “a group consisting of: …” that contains the list of subsequent operations. For the purposes of examination, this claim limitation will be interpreted as “performing, by the master computation module, one operation selected from [a] group consisting of: …”.
Claim 20 further recites the limitation “wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax;” in lines 4-6. There is insufficient antecedent basis for this limitation in the claim, since there is no earlier reference to a Markush grouping “a group consisting of …” that contains the list of subsequent activation functions. For the purposes of examination, this claim limitation will be interpreted as “wherein the activation function is a function selected from [a] group consisting of non-linear sigmoid, tanh, relu, and softmax;”.
Regarding Claim 21,
The term “one of one or more operations” in the limitation “performing, by a master computation unit of the master computation module, one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data” is indefinite, since it is unclear which additional operations are being referenced in the limitation. Independent parent Claim 17 only explicitly recites one operation associated with the master computation module that is associated with a data type: “calculating, by the master computation module, a merged intermediate vector based on the data type of each of the one or more groups of MNN data”. The other limitations recited for the master computation module do not explicitly indicate an association based on a data type for each of the one or more groups of MNN data. Given that it is unclear which additional operations are included in the term “one of one or more operations”, this lack of clarity renders this claim indefinite. For the purposes of examination, the term “one of one or more operations” will be associated with the one operation “calculating, by the master computation module, a merged intermediate vector based on the data type of each of the one or more groups of MNN data”.
Claims 25 and 27 are dependent claims tracing back to parent Claim 21, and as such inherit the same indefiniteness established in Claim 21. Hence, Claims 25 and 27 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 23,
The term “data exchange operations” in the limitation “performing, by a slave data dependency relationship determination unit of each of the slave computation modules, data exchange operations based on a determination that no conflict exists between the data exchange operations” is indefinite, since it is unclear what data exchange operations are being referenced in this limitation. Independent parent Claim 17 recites that the one or more slave computation modules perform “calculating, by one or more slave computations modules of the computation module, one or more groups of slave output values based on a data type of each of the one or more groups of MNN data”, but this limitation is not identified as a data exchange operation. Given that it is unclear whether the term “data exchange operations” encompasses these operations performed within the slave computation modules, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Claims 24 and 26 are dependent claims tracing back to parent Claim 23, and as such inherit the same indefiniteness established in Claim 23. Hence, Claims 24 and 26 are also rejected as being indefinite by virtue of dependency.
Regarding Claim 24,
Claim 24 recites the terms “the micro-instruction which …” and “that micro-instruction which …” in several places within the limitation “if there is no dependent relationship, allowing the micro-instruction which has not been executed to be executed immediately, otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” The presence of these terms renders the claim indefinite, since it is unclear whether the two terms are referencing “a first micro-instruction” or “a second micro-instruction” recited earlier in the same claim (i.e., “determining, by the slave dependency relationship determination unit, whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed”). It is recommended that the applicant use the terms “the first micro-instruction” and “the second micro-instruction” in place of “the micro-instruction which …” and “that micro-instruction which …” in the appropriate claim limitations to improve the clarity of the claim.
Claim 24 further recites the term “all the micro-instructions” in the limitation “otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” There is insufficient antecedent basis for this limitation in the claim, since Claim 24 only recites “a first micro-instruction” and “a second micro-instruction” and does not reference a plurality of micro-instructions, either as a separate group, or associated with either the first micro-instruction or the second micro-instruction. Furthermore, neither of parent Claims 17 and 23 recites any plurality of micro-instructions. It is recommended that the applicant correct this phrasing within the claim limitation to improve the clarity of the claim.
Claim 24 further recites the word “depend”, which renders the following claim limitation incoherent: “otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.” The placement of the word “depend” renders the claim indefinite, as it is unclear which part of the limitation is being modified by the word “depend”. For example, it is not clear whether this limitation is indicating that a) all micro-instructions having a dependency associated with the first micro-instruction (that has not been executed) must be completed before executing the first micro-instruction; b) all micro-instructions having a dependency associated with the second micro-instruction (in the process of being executed) must be completed before executing the first micro-instruction; or c) a micro-instruction in the set of all micro-instructions (where this micro-instruction has not been executed but has a dependency associated with 
Regarding Claim 25,
The term “a hybrid data processor” in the limitation “performing, by a hybrid data processor of the master computation unit, the determined operation” is indefinite, since it is unclear what the word “hybrid” modifies: the processor (i.e., a hybrid processor that processes data) or the data (i.e., a processor that processes hybrid data). Parent Claims 17 and 21 do not recite any additional limitations, such as other data types, that would qualify the term “hybrid data”, nor do they recite any additional limitations, such as other types of processors, that would qualify the term “hybrid processor”. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Regarding Claim 26,
The term “a hybrid data processor” in the limitation “performing, by a hybrid data processor of the slave computation unit, the determined operation” is indefinite, since it is unclear what the word “hybrid” modifies: the processor (i.e., a hybrid processor that processes data) or the data (i.e., a processor that processes hybrid data). Parent Claims 17 and 23 do not recite any additional limitations, such as other data types, that would qualify the term “hybrid data”, nor do they recite any additional limitations, such as other types of processors, that would qualify the term “hybrid processor”. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Regarding Claim 28, 
The term “the computation module” in the limitation “transmitting, by the data converter, the discrete data to the computation module” is indefinite, since it is unclear which computation module recited earlier in independent parent Claim 17 is being referenced in this limitation: “a computation module”, “a master computation module of a computation module”, “one or more slave computation modules of the computation module”, or all modules that are considered computation modules. Hence, this lack of clarity renders this claim indefinite. For the purposes of examination, this claim limitation will be addressed accordingly in the context of the prior art.
Claims 29 and 31 depend from parent Claim 28 and therefore inherit the indefiniteness established for Claim 28. Hence, Claims 29 and 31 are also rejected as being indefinite by virtue of dependency.

Double Patenting










A rejection based on double patenting of the “same invention” type finds its support in the language of 35 U.S.C. 101 which states that “whoever invents or discovers any new and useful process... may obtain a patent therefor...” (Emphasis added). Thus, the term “same invention,” in this context, means an invention drawn to identical subject matter. See Miller v. Eagle Mfg. Co., 151 U.S. 186 (1894); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Ockert, 245 F.2d 467, 114 USPQ 330 (CCPA 1957).
A statutory type (35 U.S.C. 101) double patenting rejection can be overcome by canceling or amending the claims that are directed to the same invention so they are no longer coextensive in scope. The filing of a terminal disclaimer cannot overcome a double patenting rejection based upon 35 U.S.C. 101.
Claims 1-31 are provisionally rejected under 35 U.S.C. 101 as claiming the same invention as that of claims 1-31 of copending Application No. 16/182,420 (reference application). This is a provisional statutory double patenting rejection since the claims directed 
Instant Application (application 16/093,956 - CIP)
Copending Application (#16/182,420 - CIP)
Applicant: Cambricon Technologies Corporation Limited
Applicant: Cambricon Technologies Corporation Limited
Inventors: Shaoli Liu, Yong Yu, Yunji Chen, Tianshi Chen
Inventors: Shaoli Liu, Yong Yu, Yunji Chen, Tianshi Chen
Filed: 10/15/2018
Filed: 11/06/2018


Claim 1
Claim 1
An apparatus for forward propagation of a multilayer neural network (MNN), comprising: 
a computation module that includes a master computation module and one or more slave computation modules, wherein the master computation module configured to: 
receive one or more groups of MNN data, wherein the one or more groups of MNN data include input data and one or more weight values and wherein at least a portion of the input data and the weight values are stored as discrete values, and transmit the MNN data to an interconnection unit; and 
wherein the one or more slave computation modules configured to receive the one or more groups of MNN data, and calculate one or more groups of slave output values based on a data type of each of the one or more groups of MNN data, 
wherein the master computation module is further configured to: calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data, and generate an output vector based on the merged intermediate vector; and 
a controller unit configured to transmit one or more instructions to the computation module.
An apparatus for forward propagation of a multilayer neural network (MNN), comprising: 
a computation module that includes a master computation module and one or more slave computation modules, wherein the master computation module configured to: 
receive one or more groups of MNN data, wherein the one or more groups of MNN data include input data and one or more weight values and wherein at least a portion of the input data and the weight values are stored as discrete values, and transmit the MNN data to an interconnection unit; and 
wherein the one or more slave computation modules configured to receive the one or more groups of MNN data, and calculate one or more groups of slave output values based on a data type of each of the one or more groups of MNN data, 
wherein the master computation module is further configured to: calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data, and generate an output vector based on the merged intermediate vector; and 
a controller unit configured to transmit one or more instructions to the computation module.


Claim 2
Claim 2
The apparatus of claim 1, 
wherein the interconnection unit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors.
The apparatus of claim 1, 
wherein the interconnection unit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors.


Claim 3
Claim 3
The apparatus of claim 1, 
wherein the one or more slave computation modules are configured to parallelly calculate the one or more groups of slave output values based on the input data and the weight values.
The apparatus of claim 1, 
wherein the one or more slave computation modules are configured to parallelly calculate the one or more groups of slave output values based on the input data and the weight values.


Claim 4
Claim 4
The apparatus of claim 1, 
wherein the master computation module is configured to perform one operation selected from the group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function, wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax; 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.
The apparatus of claim 1, 
wherein the master computation module is configured to perform one operation selected from the group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function, wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax; 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.


Claim 5
Claim 5
The apparatus of claim 1, 
wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules and exchange data between the master computation module and the one or more slave computation modules.
The apparatus of claim 1, 
wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules and exchange data between the master computation module and the one or more slave computation modules.  


Claim 6
Claim 6
The apparatus of claim 1, 
wherein the master computation module includes: 
a master neuron caching unit configured to temporarily store the input data and the output vector; and 
a master computation unit configured to perform one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data.
The apparatus of claim 1, 
wherein the master computation module includes: 
a master neuron caching unit configured to temporarily store the input data and the output vector; and 
a master computation unit configured to perform one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data.


Claim 7
Claim 7
The apparatus of claim 1, 
wherein the master computation module includes a master data dependency relationship determination unit configured to prevent an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions.
The apparatus of claim 1, 
wherein the master computation module includes a master data dependency relationship determination unit configured to prevent an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions.


Claim 8
Claim 8
The apparatus of claim 1, 
wherein each of the slave computation modules includes a slave computation unit configured to receive one or more groups of micro-instructions from the controller unit and to perform arithmetic logical operations that respectively correspond to the data type of the MNN data; 
a slave data dependency relationship determination unit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations; 
a slave neuron caching unit configured to temporarily store the input data and the slave output values; and 
a weight value caching unit configured to temporarily store the weight values.
The apparatus of claim 1, 
wherein each of the slave computation modules includes a slave computation unit configured to receive one or more groups of micro-instructions from the controller unit and to perform arithmetic logical operations that respectively correspond to the data type of the MNN data; 
a slave data dependency relationship determination unit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations; 
a slave neuron caching unit configured to temporarily store the input data and the slave output values; and 
a weight value caching unit configured to temporarily store the weight values.


Claim 9
Claim 9
The apparatus of claim 8, 
wherein the slave data dependency relationship determination unit is configured to: 
determine whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and 
if there is no dependent relationship, allow the micro-instruction which has not been executed to be executed immediately, 
otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.
The apparatus of claim 8, 
wherein the slave data dependency relationship determination unit is configured to: 
determine whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and 
if there is no dependent relationship, allow the micro-instruction which has not been executed to be executed immediately, 
otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.


Claim 10
Claim 10
The apparatus of claim 6, 
wherein the master computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data; and 
a hybrid data processor configured to perform the determined operation.
The apparatus of claim 6, 
wherein the master computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data; and 
a hybrid data processor configured to perform the determined operation.


Claim 11
Claim 11
The apparatus of claim 8, 
wherein the slave computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data; and 
a hybrid data processor configured to perform the determined operation.
The apparatus of claim 8, 
wherein the slave computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data; and 
a hybrid data processor configured to perform the determined operation.


Claim 12
Claim 12
The apparatus of claim 10, 
wherein the master computation unit further includes a data type determiner configured to determine the data type of the input data; and 
at least one of a discrete data processor or a continuous data processor, 
wherein the discrete data processor is configured to process the input data based on a determination that the input data is stored as discrete values, and 
wherein the continuous data processor is configured to process the input data based on a determination that the input data is stored as continuous values.
The apparatus of claim 10, 
wherein the master computation unit further includes a data type determiner configured to determine the data type of the input data; and 
at least one of a discrete data processor or a continuous data processor, 
wherein the discrete data processor is configured to process the input data based on a determination that the input data is stored as discrete values, and 
wherein the continuous data processor is configured to process the input data based on a determination that the input data is stored as continuous values.  


Claim 13
Claim 13
The apparatus of claim 1, further comprising 
a data converter configured to: 
receive continuous data, 
convert the continuous data to discrete data, and 
transmit the discrete data to the computation module.
The apparatus of claim 1, further comprising 
a data converter configured to: 
receive continuous data, 
convert the continuous data to discrete data, and 
transmit the discrete data to the computation module.


Claim 14
Claim 14
The apparatus of claim 13, 
wherein the data converter includes 
a preprocessing unit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data; 
a distance calculator configured to calculate multiple distance values between the preprocessed data and multiple discrete values; and 
a comparer configured to compare the multiple distance values to output one or more of the multiple discrete values.
The apparatus of claim 13, 
wherein the data converter includes 
a preprocessing unit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data; 
a distance calculator configured to calculate multiple distance values between the preprocessed data and multiple discrete values; and 
a comparer configured to compare the multiple distance values to output one or more of the multiple discrete values.


Claim 15
Claim 15
The apparatus of claim 13, 
wherein the data converter is configured to receive continuous data from an external storage device.
The apparatus of claim 13, 
wherein the data converter is configured to receive continuous data from an external storage device.


Claim 16
Claim 16
The apparatus of claim 1, further comprising a data converter configured to: 
receive continuous data from an external storage device, 
convert the continuous data to discrete data, and 
transmit the discrete data to the external storage device.
The apparatus of claim 1, further comprising 
a data converter configured to: 
receive continuous data from an external storage device, 
convert the continuous data to discrete data, and 
transmit the discrete data to the external storage device.


Claim 17
Claim 17
A method for forward propagation of a multilayer neural network (MNN), comprising: 
receiving, by a master computation module of a computation module, one or more groups of MNN data from a direct memory access unit, wherein the one or more groups of MNN data include input data and one or more weight values and wherein at least a portion of the input data and the weight values are stored as discrete values; 
calculating, by one or more slave computation modules of the computation module, one or more groups of slave output values based on a data type of each of the one or more groups of MNN data; 
calculating, by the master computation module, a merged intermediate vector based on the data type of each of the one or more groups of MNN data; and 
generating, by the master computation module, an output vector based on the merged intermediate vector.
A method for forward propagation of a multilayer neural network (MNN), comprising: 
receiving, by a master computation module of a computation module, one or more groups of MNN data from a direct memory access unit, wherein the one or more groups of MNN data include input data and one or more weight values and wherein at least a portion of the input data and the weight values are stored as discrete values; 
calculating, by one or more slave computation modules of the computation module, one or more groups of slave output values based on a data type of each of the one or more groups of MNN data; 
calculating, by the master computation module, a merged intermediate vector based on the data type of each of the one or more groups of MNN data; and 
generating, by the master computation module, an output vector based on the merged intermediate vector.


Claim 18
Claim 18
The method of claim 17, further comprising: 
combining, by an interconnection unit, the one or more groups of slave output values to generate one or more intermediate result vectors.
The method of claim 17, further comprising: 
combining, by an interconnection unit, the one or more groups of slave output values to generate one or more intermediate result vectors.


Claim 19
Claim 19
The method of claim 17, further comprising: 
parallelly calculating, by the one or more slave computation modules, the one or more groups of slave output values based on the input data and the weight values.
The method of claim 17, further comprising: 
parallelly calculating, by the one or more slave computation modules, the one or more groups of slave output values based on the input data and the weight values.


Claim 20
Claim 20
The method of claim 17, further comprising 
performing, by the master computation module, one operation selected from the group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function, wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax; 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.
The method of claim 17, further comprising 
performing, by the master computation module, one operation selected from the group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function, wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax; 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.


Claim 21
Claim 21
The method of claim 17, further comprising: 
temporarily storing, by a master neuron caching unit of the master computation module, the input data and the output vector; and 
performing, by a master computation unit of the master computation module, one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data.
The method of claim 17, further comprising: 
temporarily storing, by a master neuron caching unit of the master computation module, the input data and the output vector; and 
performing, by a master computation unit of the master computation module, one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data.


Claim 22
Claim 22
The method of claim 17, further comprising 
preventing, by a master data dependency relationship determination unit of the master computation module, an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions.
The method of claim 17, further comprising 
preventing, by a master data dependency relationship determination unit of the master computation module, an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions.


Claim 23
Claim 23
The method of claim 17, further comprising 
receiving, by a slave computation unit of each of the slave computation modules, one or more groups of micro-instructions from a controller unit;  
performing, by the slave computation unit, arithmetic logical operations that respectively correspond to the data type of the MNN data; 
performing, by a slave data dependency relationship determination unit of each of the slave computation modules, data exchange operations based on a determination that no conflict exists between the data exchange operations; 
temporarily storing, by a slave neuron caching unit of each of the slave computation modules, the input data and the slave output values; and 
temporarily storing, by a weight value caching unit of each of the slave computation modules, the weight values.
The method of claim 17, further comprising 
receiving, by a slave computation unit of each of the slave computation modules, one or more groups of micro-instructions from a controller unit;  
performing, by the slave computation unit, arithmetic logical operations that respectively correspond to the data type of the MNN data; 
performing, by a slave data dependency relationship determination unit of each of the slave computation modules, data exchange operations based on a determination that no conflict exists between the data exchange operations; 
temporarily storing, by a slave neuron caching unit of each of the slave computation modules, the input data and the slave output values; and 
temporarily storing, by a weight value caching unit of each of the slave computation modules, the weight values.


Claim 24
Claim 24
The method of claim 23, further comprising: 
determining, by the slave data dependency relationship determination unit, whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and 
if there is no dependent relationship, allowing, by the slave data dependency relationship determination unit, the micro-instruction which has not been executed to be executed immediately, 
otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.
The method of claim 23, further comprising: 
determining, by the slave data dependency relationship determination unit, whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and 
if there is no dependent relationship, allowing, by the slave data dependency relationship determination unit, the micro-instruction which has not been executed to be executed immediately, 
otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.


Claim 25
Claim 25
The method of claim 21, further comprising: 
determining, by an operation determiner of the master computation unit, an operation to be performed based on the data type of the input data; and 
performing, by a hybrid data processor of the master computation unit, the determined operation.
The method of claim 21, further comprising: 
determining, by an operation determiner of the master computation unit, an operation to be performed based on the data type of the input data; and 
performing, by a hybrid data processor of the master computation unit, the determined operation.


Claim 26
Claim 26
The method of claim 23, further comprising: 
determining, by an operation determiner of the slave computation unit, an operation to be performed based on the data type of the input data; and 
performing, by a hybrid data processor of the slave computation unit, the determined operation.
The method of claim 23, further comprising: 
determining, by an operation determiner of the slave computation unit, an operation to be performed based on the data type of the input data; and 
performing, by a hybrid data processor of the slave computation unit, the determined operation.


Claim 27
Claim 27
The method of claim 21, further comprising: 
determining, by a data type determiner of the master computation unit, the data type of the input data; and 
processing, by a discrete data processor of the master computation unit, the input data based on a determination that the input data is stored as discrete values; and 
processing, by a continuous data processor of the master computation unit, the input data based on a determination that the input data is stored as continuous values.
The method of claim 21, further comprising: 
determining, by a data type determiner of the master computation unit, the data type of the input data; and 
processing, by a discrete data processor of the master computation unit, the input data based on a determination that the input data is stored as discrete values; and 
processing, by a continuous data processor of the master computation unit, the input data based on a determination that the input data is stored as continuous values.


Claim 28
Claim 28
The method of claim 17, further comprising 
receiving, by a data converter, continuous data; 
converting, by the data converter, the continuous data to discrete data; and 
transmitting, by the data converter, the discrete data to the computation module.
The method of claim 17, further comprising 
receiving, by a data converter, continuous data; 
converting, by the data converter, the continuous data to discrete data; and 
transmitting, by the data converter, the discrete data to the computation module.


Claim 29
Claim 29
The method of claim 28, further comprising 
receiving, by the data converter, continuous data from an external storage device.
The method of claim 28, further comprising 
receiving, by the data converter, continuous data from an external storage device.


Claim 30
Claim 30
The method of claim 17, further comprising 
receiving, by a data converter, continuous data from an external storage device; 
converting, by the data converter, the continuous data to discrete data; and 
transmitting, by the data converter, the discrete data to the external storage device.
The method of claim 17, further comprising 
receiving, by a data converter, continuous data from an external storage device; 
converting, by the data converter, the continuous data to discrete data; and 
transmitting, by the data converter, the discrete data to the external storage device.


Claim 31
Claim 31
The method of claim 28, further comprising: 
clipping, by a preprocessing unit of the data converter, a portion of the input data that is within a predetermined range to generate preprocessed data; 
calculating, by a distance calculator of the data converter, multiple distance values between the preprocessed data and multiple discrete values; and 
comparing, by a comparer of the data converter, the multiple distance values to output one or more of the multiple discrete values.
The method of claim 28, further comprising: 
clipping, by a preprocessing unit of the data converter, a portion of the input data that is within a predetermined range to generate preprocessed data; 
calculating, by a distance calculator of the data converter, multiple distance values between the preprocessed data and multiple discrete values; and 
comparing, by a comparer of the data converter, the multiple distance values to output one or more of the multiple discrete values.


Claim Rejections - 35 USC § 103


The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred to as Hamalainen] in view of Henry et al., U.S. .
Regarding Claim 1, Hamalainen teaches
An apparatus for forward propagation of a multilayer neural network (MNN), comprising:
a computation module that includes a master computation module and one or more slave computation modules (Hamalainen p.448 Figure 1: examiner’s note: A tree-shaped parallel computer architecture consisting of a root/master processing element (PE) (corresponding to “a master computation module”) and leaves/slave PEs (corresponding to “one or more slave computation modules”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element (corresponding to “a computation module”) (Hamalainen p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “The tree shape parallel computer architecture is depicted in Figure 1. The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. In general, this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).), 
wherein the master computation module configured to: 
receive one or more groups of MNN data (Hamalainen p.449 Figure 2; p.450 Figure 3; p.451 Figure 4: examiner’s note: The parallel computer architecture supporting two parallelism modes (node parallelism and weight parallelism), with both providing input and weight data from the master PE through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the processing elements (weight parallelism), where in both modes the master computation module receives collectively the input vector and weight inputs  (Hamalainen p.449 col.1 3rd paragraph - col.2 2nd paragraph: “In the node parallel mapping, … the input vector is the same for all neurons … The communication network broadcasts the input vector from the root to the PEs … In the weight parallelism the calculation of a single neuron output is distributed to all elements in the tree. Each PE is assigned only one input (and weight) of the neuron. Due to this, the input vector is now delivered element by element to PEs, such that the leftmost PE gets the first element and so on. This cannot be done by broadcasting, so the root has to write elements individually. The task of the PE is simply to multiply the input element with a weight assigned to that neuron input.”).), 
wherein the one or more groups of MNN data include input data and one or more weight values (Hamalainen p.449 col.2 2nd paragraph – p.450 col.1 2nd paragraph: examiner’s note: The input vector data broadcasted by the master PE to the slave PEs determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector becomes a representation of the weight data, thus corresponding to “one or more groups of MNN data include input data and one or more weight values” (“The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. Now the communication network sums these weighted inputs, as illustrated in Figure 4. … We have found it useful to let the communication network be an adder tree or a broadcasting medium. Different requirements are given with Kohonen's Self-Organizing Feature Map (SOFM) algorithms. … The idea is to let the weights of the neurons self-organize according to the topology of the input data. Practically this is done by first seeking the closest weight with respect to the input vector and then adjusting that weight towards the input vector. … Each PE is mapped to one neuron (or in fact weight), and the communication network is used to broadcast the input vector to all PEs.”).) and 
…
transmit the MNN data to an interconnection unit (Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and Figures 9; collectively the CUs and the communication network correspond to “an interconnection unit”.); and 
wherein the one or more slave computation modules configured to 
receive the one or more groups of MNN data (Hamalainen p.449 Figure 2; p.450 Figure 3; p.451 Figure 4: examiner’s note: The parallel computer architecture supporting two parallelism modes (node parallelism and weight parallelism), with both providing input and weight data from the master PE through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the processing elements (weight parallelism), where in both modes the one or more slave computation modules receive collectively the input vector and weight inputs as “one or more groups of MNN data” (Hamalainen p.449 col.1 3rd paragraph - col.2 2nd paragraph: “In the node parallel mapping, … the input vector is the same for all neurons … The communication network broadcasts the input vector from the root to the PEs … In the weight parallelism the calculation of a single neuron output is distributed to all elements in the tree. Each PE is assigned only one input (and weight) of the neuron. Due to this, the input vector is now delivered element by element to PEs, such that the leftmost PE gets the first element and so on. This cannot be done by broadcasting, so the root has to write elements individually. The task of the PE is simply to multiply the input element with a weight assigned to that neuron input.”).), and 
calculate one or more groups of slave output values (Hamalainen p.449 Figure 2 Broadcast operation, p.450 Figure 3 Node parallel computation, p.451 Figure 4 Weight parallel computation: examiner’s note: For both parallelism modes, the receiving PEs (corresponding to “one or more slave computation modules”) perform a neural network computation involving a calculation of an intermediate output $y_i$ at each neural network layer (where intermediate output $y_i$ corresponds to “one or more groups of slave output values”) (Hamalainen p.449 col.1 2nd paragraph: “… consider a mapping of the layered perceptron neural network using node and weight parallel mapping styles. The task of the perception is to calculate a thresholded output from the sum of weighted inputs $y_i = f(\sum_i w_{ji} x_i)$, where $y_i$ is the output of the ith neuron in a layer, $w_{ji}$ denotes a weight and $x_i$ is an item of the input vector.”).) … , 
wherein the master computation module is further configured to: 
calculate a merged intermediate vector (Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE) receives the outputs produced by the slave PEs for each network layer i (corresponding to “calculating a merged intermediate vector”) and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).) … , and 
generate an output vector based on the merged intermediate vector (Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE) receives the outputs produced by the slave PEs for each network layer i and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (corresponding to “generate an output vector based on the merged intermediate vector”) (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).); and 
a controller unit configured to transmit one or more instructions to the computation module ([Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and Figures 9; collectively the CUs and the communication network correspond to “an interconnection unit”.] [Hamalainen p.455 Figure 11: examiner’s note: A control unit (CRU) within a PU connecting the C-, A-, and D-buses to other processing elements within the PU via internal data and address buses, where this communication network effectively transmits data and instructions between all PUs (i.e., to/from the DSP to perform arithmetic logical operations, and to the master PE, thus corresponding to “a controller unit configured to transmit one or more instructions to the computation module”) (Hamalainen p.455 col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).]).  
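For illustration only, the node parallel and weight parallel mappings quoted above from Hamalainen may be sketched as follows. The function names, the choice of a sigmoid as the thresholding function f, and the list-based data layout are assumptions made for this sketch and do not appear in Hamalainen; the sketch is not a characterization of the TUTNC implementation.

```python
import math


def sigmoid(s):
    # One possible nonlinear thresholding function f (assumed for this sketch).
    return 1.0 / (1.0 + math.exp(-s))


def node_parallel_layer(weights, x, f=sigmoid):
    # Node parallelism: the root broadcasts the whole input vector x, and each
    # slave PE j holds one weight row and computes one neuron output.
    # The returned list models the output vector the root collects for the layer.
    return [f(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def weight_parallel_neuron(row, x, f=sigmoid):
    # Weight parallelism: PE i receives only x[i] and holds only row[i].
    products = [w * xi for w, xi in zip(row, x)]
    # The communication network sums the products; the root applies f.
    return f(sum(products))


def forward(layers, x, f=sigmoid):
    # The output vector of one layer is broadcast as the input of the next layer.
    for weights in layers:
        x = node_parallel_layer(weights, x, f)
    return x
```

Under these assumptions both mappings yield the same neuron output for the same weight row and input vector; they differ only in how the work is distributed across the PEs.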
While Hamalainen teaches a DSP unit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform fixed point arithmetic (Hamalainen p.455 col.2 Processing Unit, 1st paragraph – p.456 col.1 1st paragraph), Hamalainen does not explicitly teach
wherein the master computation module configured to: …
… wherein at least a portion of the input data and the weight values are stored as discrete values, and 
wherein the one or more slave computation modules configured to …
calculate … based on a data type of each of the one or more groups of MNN data, …
wherein the master computation module is further configured to: …
calculate … based on the data type of each of the one or more groups of MNN data …
Henry teaches
wherein the master computation module configured to: …
… wherein at least a portion of the input data and the weight values are stored as discrete values (Henry Figure 1, elements 121, 126, 124, 122: examiner’s note: A NNU processing module (corresponding to a “computation module”) containing a plurality of NPUs (with the plurality of NPUs corresponding to “… that includes a master computation module and one or more slave computation modules”, and a single NPU corresponding to “a master computation module”) containing data and weight RAM to store input data and weight data (Henry paragraph [0051]: “The NNU 121 includes a weight random access memory (RAM) 124, a data RAM 122, N neural processing units (NPUs) 126, … a sequencer 128 … The NPUs 126 function conceptually as neurons in a neural network. The weight RAM 124, data RAM 122 and program memory 129 are all writable and readable … The weight RAM 124 is arranged as W rows of N weight words, and the data RAM 122 is arranged as D rows of N data words. Each data word and each weight word is a plurality of bits, preferably 8 bits, 9 bits, 12 bits or 16 bits.”), where each row of weight or data RAM is assigned to one of the NPUs within the NNU (Henry paragraph [0054]: “The sequencer 128 also generates a memory address 125 and a read command for provision to the weight RAM 124 to select one of the W rows of N weight words for provision to the N NPUs 126. 
… The sequencer 128 also generates a memory address 123 and a write command for provision to the data RAM 122 to select one of the D rows of N data words for writing from the N NPUs 126.”), and where each NPU performs arithmetic operations (add, multiply, accumulate; Henry paragraphs [0059]-[0060]) on the singular value weight word or data word stored in the weight or data RAM (corresponding to “wherein at least a portion of the input data and the weight values are stored as discrete values”, where under its broadest reasonable interpretation in light of the applicant’s specification paragraph [0007], an integer or fixed-point value represents a discrete data value) (Henry Figure 2, elements 126-J, 203, 206, 207, 209; paragraph [0062]: “Preferably, although the weight word 203 and the data word 209 are the same size (in bits), they may have different binary point locations, … Preferably, the multiplier 242 and adder 244 are integer multipliers and adders, as described in more detail below, to advantageously accomplish less complex, smaller, faster and lower power consuming ALUs 204 than floating-point counterparts. However, it should be understood that in other embodiments the ALU 204 performs floating-point operations.”).), and 
wherein the one or more slave computation modules configured to …
calculate … based on a data type of each of the one or more groups of MNN data (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU (corresponding to “one or more slave computation modules”) is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221], where under its broadest reasonable interpretation in light of the applicant’s specification paragraphs [0005]-[0006], streaming data that involves high-precision values (such as floating-point values) represent continuous data values), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “based on a data type of each of the one or more groups of MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. 
That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).), …
wherein the master computation module is further configured to: …
calculate … based on the data type of each of the one or more groups of MNN data (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU (corresponding to “the master computation module”) is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e.,  discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221], where under its broadest reasonable interpretation in light of the applicant’s specification paragraphs [0005]-[0006], streaming data that involves high-precision values (such as floating-point values) represent continuous data values), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “based on a data type of each of the one or more groups of MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. 
That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).) …
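For illustration only, the per-clock-cycle multiply-accumulate operation quoted above from Henry, performed on either discrete (fixed-point) or continuous (floating-point) operands, may be sketched as follows. The function name and the `frac_bits` parameter are hypothetical and stand in for the data type selection only; they do not reproduce the format of the control register described in Henry.

```python
def multiply_accumulate(inputs, weights, frac_bits=None):
    # frac_bits=None: operands are treated as floating-point (continuous) values.
    # frac_bits=k:    operands are integers holding fixed-point (discrete) values
    #                 with k fractional bits each (assumed encoding).
    acc = 0.0 if frac_bits is None else 0
    # Each loop iteration models one clock cycle: one connection input is
    # multiplied by its weight and added to the running accumulator.
    for x, w in zip(inputs, weights):
        acc += x * w
    if frac_bits is not None:
        # A product of two k-fractional-bit values carries 2k fractional bits;
        # shift right by k to return to the k-fractional-bit format.
        acc >>= frac_bits
    return acc
```

For example, with four fractional bits the values 1.5, 2.0 and 0.5 are encoded as 24, 32 and 8, and the accumulated result 64 decodes to 4.0, matching the floating-point path.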
Both Hamalainen and Henry are analogous art since they both teach performing neural network computations using neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art to substitute the DSP circuitry (found within each of the master and slave processing elements) taught in Hamalainen with DSP circuitry that performs the specific arithmetic logic unit functions taught in Henry, in order to provide the integer, fixed-point, and floating-point operations required for performing neural network data computations while producing the same predictable computation results.
Regarding Claim 2, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the interconnection unit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors ([Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and Figures 9; collectively the CUs and the communication network correspond to “an interconnection unit”. Each CU has a reduced set of arithmetic and logical functions to perform comparison, subtraction, summation operations (Hamalainen p.451 col.2 2nd paragraph).] [Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE) receives the outputs produced by the slave PEs for each network layer i (corresponding to “calculating a merged intermediate vector”) and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (corresponding to “the interconnection unit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors”) (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. 
The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).]).  
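For illustration only, the summation of slave outputs by the communication units, which Hamalainen describes as an adder tree, may be sketched as follows. The pairwise (binary) grouping and the function name are assumptions made for this sketch; Hamalainen's actual tree topology is not reproduced here.

```python
def adder_tree_sum(values):
    # Each pass of the loop models one level of communication units: every unit
    # adds the outputs of its two children and forwards the partial sum
    # toward the root.
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired value is passed up to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

The value returned at the top of the tree is the sum of weighted inputs that the root then thresholds and stores.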
Regarding Claim 3, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the one or more slave computation modules are configured to parallelly calculate the one or more groups of slave output values based on the input data and the weight values (Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE) receives the outputs produced by the slave PEs for each network layer i (corresponding to “calculating a merged intermediate vector”) and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (corresponding to “wherein the one or more slave computation modules are configured to parallelly calculate the one or more groups of slave output values based on the input data and the weight values”) (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).).  
Regarding Claim 4, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the master computation module is configured to perform one operation selected from the group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function ([Hamalainen p.449 col.1 2nd paragraph: “ … Thresholding is usually performed with a nonlinear function, e.g., sigmoid or hyperbolic tangent.”] [Henry paragraph [0064]: “Generally speaking, the activation function in a neuron of an intermediate layer of an artificial neural network may serve to normalize the accumulated sum of products, preferably in a non-linear fashion. … The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify).”]), 
wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax (Henry paragraph [0064]: “Generally speaking, the activation function in a neuron of an intermediate layer of an artificial neural network may serve to normalize the accumulated sum of products, preferably in a non-linear fashion. … The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify).”); 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.  
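For illustration only, the activation functions named in the quoted passage of Henry paragraph [0064] may be sketched as follows, applied element by element to a merged intermediate vector. The function names, the step threshold at zero, and the `activate` helper are assumptions made for this sketch and are not drawn from either reference.

```python
import math


def step(s):
    # Step function with an assumed threshold at zero.
    return 1.0 if s >= 0 else 0.0


def rectify(s):
    return max(0.0, s)


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def tanh(s):
    return math.tanh(s)


def softplus(s):
    # Also referred to as smooth rectify: log(1 + e^s).
    return math.log1p(math.exp(s))


def activate(vector, fn):
    # Apply the selected activation function to every element of the vector.
    return [fn(s) for s in vector]
```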
Regarding Claim 5, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules (Hamalainen p.448 Figure 1: examiner’s note: A tree-shaped parallel computer architecture consisting of a root/master processing element (PE) (corresponding to “a master computation module”) and leaves/slave PEs, interconnected by a series of interconnecting nodes and communication network (corresponding to “wherein the interconnection unit is connected to the master computation module and the one or more slave computation modules”; Hamalainen p.452 Figure 6; p.454 Figure 9), such that the collective architecture is considered as one large processing element (Hamalainen p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “The tree shape parallel computer architecture is depicted in Figure 1. The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. In general, this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).) and 
exchange[s] data between the master computation module and the one or more slave computation modules ([Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and Figures 9; collectively the CUs and the communication network correspond to “an interconnection unit”.] [Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE) receives the outputs produced by the slave PEs for each network layer i (corresponding to “calculating a merged intermediate vector”) and generates and stores this Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. 
The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).]).  
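Examiner's illustrative note: the two mappings quoted above from Hamalainen (node parallelism and weight parallelism) can be sketched as follows. This is a minimal sketch only and is not taken from either reference; the function names, the step-function thresholding, and the example values are assumptions made for illustration.

```python
# Hypothetical sketch of the two mappings described in the Hamalainen quote.
# All names and the step-function threshold are illustrative assumptions.

def threshold(total, limit=0.0):
    """Assumed step function standing in for the thresholding step."""
    return 1 if total > limit else 0

def node_parallel_layer(inputs, layer_weights):
    """Node parallelism: each slave PE computes one whole neuron output."""
    outputs = []
    for neuron_weights in layer_weights:  # one slave PE per neuron
        total = sum(x * w for x, w in zip(inputs, neuron_weights))
        outputs.append(threshold(total))
    return outputs  # the root collects the outputs into the layer output vector

def weight_parallel_layer(inputs, layer_weights):
    """Weight parallelism: each slave PE multiplies one input by one weight."""
    outputs = []
    for neuron_weights in layer_weights:  # weights are changed per neuron
        products = [x * w for x, w in zip(inputs, neuron_weights)]  # one product per PE
        total = sum(products)             # summation on the way to the root
        outputs.append(threshold(total))  # the root performs thresholding
    return outputs

layer = [[0.5, 0.5], [-1.0, 0.25]]
print(node_parallel_layer([1.0, 2.0], layer))
print(weight_parallel_layer([1.0, 2.0], layer))
```

In both sketches the root ends up holding the output vector for the layer, which is the behavior the mapping above relies on.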
Regarding Claim 6, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the master computation module includes: 
a master neuron caching unit configured to temporarily store the input data and the output vector (Henry Figure 1, elements 114, 121: examiner’s note: Performing data transfers between the NNU 121 (which comprises a plurality of NPUs 126) and the memory subsystem (which includes a memory management unit and associated cache memory, where the memory management unit performs logical separation and management of the cache memory hierarchy to different NPUs for use in storing input data, weight data, and merged intermediate and output vectors, thus corresponding to an assignment of “a master neuron caching unit configured to temporarily store the input data and the output vector”) (Henry paragraph [0051]: “ … Preferably, the memory subsystem 114 includes a memory management unit (not shown), which may include … a level-1 data cache (and the instruction cache 102), a level-2 unified cache, and a bus interface unit that interfaces the processor 100 to system memory. In one embodiment, the processor 100 of FIG. 1 is representative of a processing core that is one of multiple processing cores in a multi-core processor that share a last-level cache memory. …” and Henry paragraph [0057]: “ … Furthermore, the large memory hierarchy of the memory subsystem 114, including the cache memories, provides very high data bandwidth for the transfers between the system memory and the NNU 121. Still further, preferably, the memory subsystem 114 includes hardware data prefetchers that track memory access patterns, such as loads of neural data and weights from system memory, and perform data prefetches into the cache hierarchy to facilitate high bandwidth and low latency transfers to the weight RAM 124 and data RAM 122.”).); and
a master computation unit configured to perform one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data ([Hamalainen p.455 Figure 11: examiner’s note: A control unit (CRU) within a PU connecting the C-, A-, and D-buses to other processing elements within the PU via internal data and address buses, where this communication network effectively transmits data and instructions between all PUs (i.e., to/from the DSP to perform arithmetic logical operations such as calculating a merged intermediate vector, refer to Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4, and Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph, thus corresponding to “a master computation unit configured to perform one of one or more operations”) (Hamalainen p.455 col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).] 
[Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221], where under its broadest reasonable interpretation in light of the applicant’s specification paragraphs [0005]-[0006], streaming data that requires high precision represent continuous data values), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “based on a data type of each of the one or more groups of MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. 
That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).]).  
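Examiner's illustrative note: the four steps recited in the Henry paragraph [0059] quotation, with one multiply-accumulate per clock cycle, can be sketched as follows. This is a minimal sketch under stated assumptions, not Henry's implementation; the function name `npu_neuron`, the default tanh activation, and the use of one loop iteration to stand in for one clock cycle are all hypothetical.

```python
import math

def npu_neuron(inputs, weights, activation=math.tanh):
    """Hypothetical model of one neuron: accumulate one product per cycle."""
    accumulator = 0.0
    # Each loop iteration stands in for one clock cycle (an assumption):
    # multiply one input by its weight and add the product to the running sum.
    for input_value, weight_value in zip(inputs, weights):
        accumulator += input_value * weight_value
    # After all connection inputs are processed, apply the activation function.
    return activation(accumulator)

# Identity activation makes the accumulated sum directly visible.
print(npu_neuron([1.0, 2.0], [0.5, 0.25], activation=lambda s: s))
```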
Regarding Claim 7, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein the master computation module includes a master data dependency relationship determination unit configured to prevent an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions (Henry paragraphs [0083], [0099]: examiner’s note: A stalling mechanism within each NPU (corresponding to “the master computation module includes a master data dependency relationship determination unit”), where the stalling mechanism prevents the NPU from reading the weight RAM (corresponding to “preventing an instruction from being executed”) in order to enable a buffer to properly write into the weight RAM (where the stalling mechanism that preserves and allows the sequential ordering of reading and writing operations corresponds to “based on a determination that a conflict exists between the instruction and other instructions”) (Henry paragraph [0083]: “ … multiple clock cycles are required to read the data words and weight words from the data RAM 122 and weight RAM 124 to perform the multiply-accumulate instruction at address 1 of FIG. 4; however, the data RAM 122 and weight RAM 124 and NPUs 126 are pipelined such that once the first multiply-accumulate operation is begun (e.g., as shown during clock 1 of FIG. 5), the subsequent multiply accumulate operations (e.g., as shown during clocks 2-512) are begun in successive clock cycles. Preferably, the NPUs 126 may briefly stall in response to an access of the data RAM 122 and/or weight RAM 124 by an architectural instruction, e.g., MTNN or MFNN instruction (described below with respect to FIGS. 14 and 15) or a microinstruction into which the architectural instructions are translated.” and Henry paragraph [0099]: “ … assuming an embodiment that includes a write and read buffer such as the buffer 1704 of FIG. 17, concurrently with the NPU 126 reads, the processor 100 writes the weight RAM 124 such that the buffer 1704 performs one write to the weight RAM 124 approximately every 16 clock cycles to write the weight words. Thus, in a single-ported embodiment of the weight RAM 124 (such as described with respect to FIG. 17), approximately every 16 clock cycles, the NPUs 126 must be stalled from reading the weight RAM 124 to enable the buffer 1704 to write the weight RAM 124.”).).
Regarding Claim 8, Hamalainen in view of Henry teaches
The apparatus of claim 1, 
wherein each of the slave computation modules includes a slave computation unit configured to 
receive one or more groups of micro-instructions from the controller unit ([Henry Figure 4: examiner’s note: A NNU receiving program instructions, including instructions to multiply-accumulate, rotate (Henry Figure 4, address 2), where each NPU within the NNU (corresponding to “each of the slave computation modules”) performs the instruction in parallel (Henry paragraphs [0070]-[0077]: “For each instruction of the program, all of the NPUs 126 perform the instruction in parallel. … The third row, at address 2, specifies a multiply-accumulate rotate instruction with a count of 511, which instructs each of the 512 NPUs 126 to perform 511 multiply-accumulate operations.”), and where each instruction for each NPU is processed into a set of microinstructions (corresponding to “receive one or more groups of micro-instructions”) (Henry paragraph [0045]: “The architectural instructions 103 include a move to neural network (MTNN) instruction and a move from neural network (MFNN) instruction, which are described in more detail below. In one embodiment, the architectural instructions 103 are instructions of the x86 instruction set architecture (ISA), with the addition of the MTNN and MFNN instructions.”).] [Hamalainen p.455 Figure 11: examiner’s note: A control unit (CRU) within a PU connecting the C-, A-, and Hamalainen p.455 col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).]) 
and to perform arithmetic logical operations that respectively correspond to the data type of the MNN data (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are  configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221]), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “that respectively correspond to a data type of the MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. 
However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).); 
a slave data dependency relationship determination unit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations (Henry paragraphs [0083], [0099]: examiner’s note: A stalling mechanism within each NPU (corresponding to “a slave data dependency relationship determination unit”), where the stalling mechanism prevents the NPU from reading the weight RAM (corresponding to “perform data exchange operations”) in order to enable a buffer to write into the weight RAM (where the stalling mechanism that preserves and allows the sequential ordering of reading and writing operations corresponds to “based on a determination that no conflict exists between the data exchange operations”) (Henry paragraph [0083]: “ … multiple clock cycles are required to read the data words and weight words from the data RAM 122 and weight RAM 124 to perform the multiply-accumulate instruction at address 1 of FIG. 4; however, the data RAM 122 and weight RAM 124 and NPUs 126 are pipelined such that once the first multiply-accumulate operation is begun (e.g., as shown during clock 1 of FIG. 5), the subsequent multiply accumulate operations (e.g., as shown during clocks 2-512) are begun in successive clock cycles. Preferably, the NPUs 126 may briefly stall in response to an access of the data RAM 122 and/or weight RAM 124 by an architectural instruction, e.g., MTNN or MFNN instruction (described below with respect to FIGS. 14 and 15) or a microinstruction into which the architectural instructions are translated.” and Henry paragraph [0099]: “ … assuming an embodiment that includes a write and read buffer such as the buffer 1704 of FIG. 17, concurrently with the NPU 126 reads, the processor 100 writes the weight RAM 124 such that the buffer 1704 performs one write to the weight RAM 124 approximately every 16 clock cycles to write the weight words. 
Thus, in a single-ported embodiment of the weight RAM 124 (such as described with respect to FIG. 17), approximately every 16 clock cycles, the NPUs 126 must be stalled from reading the weight RAM 124 to enable the buffer 1704 to write the weight RAM 124.”).); 
a slave neuron caching unit configured to temporarily store the input data and the slave output values (Henry Figure 1, elements 114, 121: examiner’s note: Performing data transfers between the NNU 121 (which comprises a plurality of NPUs 126) and the memory subsystem (which includes a memory management unit and associated cache memory, where the memory management unit performs logical separation and management of the cache memory hierarchy to different NPUs for use in storing input data, weight data, and merged intermediate and output vectors, thus corresponding to an assignment of “a slave neuron caching unit configured to temporarily store the input data and the slave output values”) (Henry paragraph [0051]: “ … Preferably, the memory subsystem 114 includes a memory management unit (not shown), which may include … a level-1 data cache (and the instruction cache 102), a level-2 unified cache, and a bus interface unit that interfaces the processor 100 to system memory. In one embodiment, the processor 100 of FIG. 1 is representative of a processing core that is one of multiple processing cores in a multi-core processor that share a last-level cache memory. …” and Henry paragraph [0057]: “ … Furthermore, the large memory hierarchy of the memory subsystem 114, including the cache memories, provides very high data bandwidth for the transfers between the system memory and the NNU 121. Still further, preferably, the memory subsystem 114 includes hardware data prefetchers that track memory access patterns, such as loads of neural data and weights from system memory, and perform data prefetches into the cache hierarchy to facilitate high bandwidth and low latency transfers to the weight RAM 124 and data RAM 122.”).); and
a weight value caching unit configured to temporarily store the weight values (Henry Figure 1, elements 114, 121: examiner’s note: Performing data transfers between the NNU 121 (which comprises a plurality of NPUs 126) and the memory subsystem (which includes a memory management unit and associated cache memory, where the memory management unit can perform logical separation and management of the cache memory hierarchy to different NPUs for use in storing input data, weight data, and merged intermediate and output vectors, thus corresponding to an assignment of “a weight value caching unit configured to temporarily store the weight values”) (Henry paragraph [0051]: “ … Preferably, the memory subsystem 114 includes a memory management unit (not shown), which may include … a level-1 data cache (and the instruction cache 102), a level-2 unified cache, and a bus interface unit that interfaces the processor 100 to system memory. In one embodiment, the processor 100 of FIG. 1 is representative of a processing core that is one of multiple processing cores in a multi-core processor that share a last-level cache memory. …” and Henry paragraph [0057]: “ … Furthermore, the large memory hierarchy of the memory subsystem 114, including the cache memories, provides very high data bandwidth for the transfers between the system memory and the NNU 121. Still further, preferably, the memory subsystem 114 includes hardware data prefetchers that track memory access patterns, such as loads of neural data and weights from system memory, and perform data prefetches into the cache hierarchy to facilitate high bandwidth and low latency transfers to the weight RAM 124 and data RAM 122.”).).
Regarding Claim 9, Hamalainen in view of Henry teaches
The apparatus of claim 8, 
wherein the slave data dependency relationship determination unit is configured to: 
determine whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed (Henry Figure 1, elements 103, 104, 105, 106, 108: examiner’s note: A stalling mechanism involving the rename unit and reservation stations and the NPU pipeline (Henry Figure 34; paragraph [0273]) (corresponding to “the slave data dependency relationship determination unit”), where an instruction translator first translates architectural program instructions into microinstructions (Henry paragraph [0045]), and the microcode unit within the translator sends the microinstructions to a selector to provision a rename unit (Henry paragraph [0046]). The rename unit checks if general-purpose and media registers are free in the physical register file (corresponding to “determine whether there is a dependent relationship between a first micro-instruction and a second micro-instruction being executed”), and either stalls the NPU pipeline if no registers are free in the physical register file, or allows the microinstructions that are held in the reservation stations to be issued (Henry paragraphs [0049]-[0050]: “ … the processor 100 includes a physical register file that includes more physical registers than the number of architectural registers, but does not include an architectural register file, and the reorder buffer entries do not include result storage. … The processor 100 also includes a pointer table with an associated pointer for each architectural register. For the operand of a microinstruction 105 that specifies an architectural register, the rename unit populates the destination operand field in the microinstruction 105 with a pointer to a free register in the physical register file. If no registers are free in the physical register file, the rename unit 106 stalls the pipeline.
… The reservation stations 108 hold microinstructions 105 until they are ready to be issued to an execution unit 112/121 for execution. A microinstruction 105 is ready to be issued when all of its source operands are available and an execution unit 112/121 is available to execute it. The execution units 112/121 receive register source operands from the reorder buffer or the architectural register file in the first embodiment or from the physical register file in the second embodiment described above. … the MTNN and MFNN architectural instructions 103 include an immediate operand that specifies a function to be performed by the NNU 121 that is provided in one of the one or more microinstructions 105 into which the MTNN and MFNN architectural instructions 103 are translated.”).); and 
if there is no dependent relationship, allow the micro-instruction which has not been executed to be executed immediately, otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed (Henry Figure 1, elements 103, 104, 105, 106, 108: examiner’s note: A stalling mechanism involving the rename unit and reservation stations and the NPU pipeline (Henry Figure 34; paragraph [0273]), where an instruction translator first translates architectural program instructions into microinstructions (Henry paragraph [0045]), and the microcode unit within the translator sends the microinstructions to a selector to provision a rename unit (Henry paragraph [0046]). The rename unit checks if general-purpose and media registers are free in the physical register file, and either stalls the NPU pipeline if no registers are free (corresponding to “all the micro-instructions upon which that micro-instruction which has not been executed depend is completed”) in the physical register file (corresponding to the dependent relationship indicated in “otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed.”), or allows the microinstructions that are held in the reservation stations to be issued (corresponding to “if there is no dependent relationship, allow the micro-instruction which has not been executed to be executed immediately”) (Henry paragraphs [0049]-[0050]: “ … the processor 100 includes a physical register file that includes more physical registers than the number of architectural registers, but does not include an architectural register file, and the reorder buffer entries do not include result storage. … The processor 100 also includes a pointer table with an associated pointer for each architectural register.
For the operand of a microinstruction 105 that specifies an architectural register, the rename unit populates the destination operand field in the microinstruction 105 with a pointer to a free register in the physical register file. If no registers are free in the physical register file, the rename unit 106 stalls the pipeline. … The reservation stations 108 hold microinstructions 105 until they are ready to be issued to an execution unit 112/121 for execution. A microinstruction 105 is ready to be issued when all of its source operands are available and an execution unit 112/121 is available to execute it. The execution units 112/121 receive register source operands from the reorder buffer or the architectural register file in the first embodiment or from the physical register file in the second embodiment described above. … the MTNN and MFNN architectural instructions 103 include an immediate operand that specifies a function to be performed by the NNU 121 that is provided in one of the one or more microinstructions 105 into which the MTNN and MFNN architectural instructions 103 are translated.”).).  
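Examiner's illustrative note: the issue condition quoted above from Henry paragraph [0050] (a microinstruction is ready to be issued when all of its source operands are available and an execution unit is available) can be sketched as follows. This is a minimal sketch only; the class `MicroInstruction`, the helper names, the register names, and the step-by-step scheduling loop are hypothetical and are not drawn from Henry or from the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroInstruction:
    """Hypothetical microinstruction: named source operands and one result."""
    name: str
    sources: tuple
    dest: str

def ready_to_issue(instr, available, unit_free=True):
    """Ready only if an execution unit is free and every source operand exists."""
    return unit_free and all(src in available for src in instr.sources)

def issue_schedule(pending, available):
    """Hold instructions until their operands are produced; return issue steps."""
    available = set(available)
    waiting = list(pending)
    schedule = []
    while waiting:
        ready = [i for i in waiting if ready_to_issue(i, available)]
        if not ready:
            break  # nothing can issue: the remaining instructions stay held
        schedule.append([i.name for i in ready])
        for i in ready:
            available.add(i.dest)  # completed results become available operands
        waiting = [i for i in waiting if i not in ready]
    return schedule

mul = MicroInstruction("mul", ("r1", "r2"), "r3")
add = MicroInstruction("add", ("r3",), "r4")   # depends on the result of mul
load = MicroInstruction("load", ("r1",), "r5")
print(issue_schedule([mul, add, load], {"r1", "r2"}))
```

In this sketch, "add" is held until "mul" completes, while "mul" and "load" have no dependent relationship and issue in the first step.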
Regarding Claim 10, Hamalainen in view of Henry teaches
The apparatus of claim 6, 
wherein the master computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data ([Henry Figure 29A, element 2926; Figure 2, elements 204, 202, 203, 209: examiner’s note: A control register containing an ALU function that determines the type of ALU operation to be performed within the NPU (where the control register corresponds to “a master computation unit includes an operation determiner configured to determine an operation to be performed”) (Henry paragraph [0227]: “… The ALU function 2926 specifies the function performed by the ALU 204 of the NPU 126. As described above, the ALU functions 2926 may include, but are not limited to: multiply data word 209 and weight word 203 and accumulate product with accumulator 202; sum accumulator 202 and weight word 203; sum accumulator 202 and the data word 209; maximum of accumulator 202 and data word 209; maximum of accumulator 202 and weight word 203; output accumulator 202; pass through data word 209; pass through weight word 203; output zero. …”).] [Henry Figure 1, element 127; Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions (Henry paragraph [0059]) on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by the control register through program instructions sent to each NPU, sequenced by the sequencer (Henry paragraph [0055]) and media registers (corresponding to “based on the data type of the input data”) (Henry paragraph [0223]: “The control register 127 includes the following fields, as shown: configuration 2902, signed data 2912, signed weight 2914, data binary point 2922, weight binary point 2924, ALU function 2926, round control 2932, activation function 2934, reciprocal 2942, shift amount 2944, output RAM 2952, output binary point 2954, and output command 2956. … The control register 127 values may be written by both an MTNN instruction 1400 and an instruction of an NNU program, such as an initiate instruction.” and Henry paragraph [0235]: “ … many of the fields are included in the NNU instructions themselves and decoded by the sequencer 128 to generate to a micro-operation 3416 (of FIG. 34) that controls the ALUs 204 and/or AFUs 212. Additionally, the fields may be included in a micro-operation 3414 (of FIG. 34) stored in a media register 118 that controls the ALUs 204 and/or AFUs 212.”).]); and
a hybrid data processor configured to perform the determined operation (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU instructed via translated micro-instructions to perform neural network calculations involving integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations) on an ALU unit (controlled according to configuration provided by a control register; Henry paragraphs [0223]-[0231]), where the ALU unit with the capability to handle integer, fixed-point, and floating-point operations with fixed-point hardware-assist logic is considered to be “a hybrid data processor” (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).).  
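Examiner's illustrative note: the relationship relied upon above, in which a control register field selects which ALU function is performed on the accumulator, data word, and weight word, can be sketched as follows. This is a minimal sketch only; the dictionary keys and the function name `alu_step` are hypothetical labels for the functions listed in the Henry paragraph [0227] quotation, and the sketch does not model Henry's hardware or its numeric formats.

```python
# Hypothetical dispatch table: each key stands for one ALU function named in
# the Henry [0227] quotation; the key spellings are illustrative assumptions.
ALU_FUNCTIONS = {
    "mult_accum":   lambda acc, data, weight: acc + data * weight,
    "sum_weight":   lambda acc, data, weight: acc + weight,
    "sum_data":     lambda acc, data, weight: acc + data,
    "max_data":     lambda acc, data, weight: max(acc, data),
    "max_weight":   lambda acc, data, weight: max(acc, weight),
    "output_acc":   lambda acc, data, weight: acc,
    "pass_data":    lambda acc, data, weight: data,
    "pass_weight":  lambda acc, data, weight: weight,
    "zero":         lambda acc, data, weight: 0,
}

def alu_step(alu_function, acc, data, weight):
    """Select the operation from the (assumed) control register field and run it."""
    try:
        operation = ALU_FUNCTIONS[alu_function]
    except KeyError:
        raise ValueError(f"unknown ALU function: {alu_function!r}")
    return operation(acc, data, weight)

print(alu_step("mult_accum", acc=1.0, data=2.0, weight=3.0))
print(alu_step("max_data", acc=1.0, data=2.0, weight=3.0))
```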
Regarding Claim 11, Hamalainen in view of Henry teaches
The apparatus of claim 8, 
wherein the slave computation unit includes an operation determiner configured to determine an operation to be performed based on the data type of the input data ([Henry Figure 29A, element 2926; Figure 2, elements 204, 202, 203, 209: examiner’s note: A control register containing an ALU function that determines the type of ALU operation to be performed within the NPU (where the control register corresponds to “the slave computation unit includes an operation determiner configured to determine an operation to be performed”) (Henry paragraph [0227]: “… The ALU function 2926 specifies the function performed by the ALU 204 of the NPU 126. As described above, the ALU functions 2926 may include, but are not limited to: multiply data word 209 and weight word 203 and accumulate product with accumulator 202; sum accumulator 202 and weight word 203; sum accumulator 202 and the data word 209; maximum of accumulator 202 and data word 209; maximum of accumulator 202 and weight word 203; output accumulator 202; pass through data word 209; pass through weight word 203; output zero. …”).] 
[Henry Figure 1, element 127; Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: The NPU performs neural network calculations such as multiply-accumulate, add, and activation functions (Henry paragraph [0059]) on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate and add operations within the ALU are configured by the control register through program instructions sequenced by the sequencer (Henry paragraph [0055]) and media registers (corresponding to “based on the data type of the input data”) (Henry paragraph [0223]: “The control register 127 includes the following fields, as shown: configuration 2902, signed data 2912, signed weight 2914, data binary point 2922, weight binary point 2924, ALU function 2926, round control 2932, activation function 2934, reciprocal 2942, shift amount 2944, output RAM 2952, output binary point 2954, and output command 2956. … The control register 127 values may be written by both an MTNN instruction 1400 and an instruction of an NNU program, such as an initiate instruction.” and Henry paragraph [0235]: “… many of the fields are included in the NNU instructions themselves and decoded by the sequencer 128 to generate to a micro-operation 3416 (of FIG. 34) that controls the ALUs 204 and/or AFUs 212. Additionally, the fields may be included in a micro-operation 3414 (of FIG. 34) stored in a media register 118 that controls the ALUs 204 and/or AFUs 212.”).]); and 
a hybrid data processor configured to perform the determined operation (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU instructed via translated micro-instructions to perform neural network calculations involving integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations) on an ALU unit (controlled according to configuration provided by a control register; Henry paragraphs [0223]-[0231]), where the ALU unit with the capability to handle integer, fixed-point, and floating-point operations with fixed-point hardware-assist logic is considered to be “a hybrid data processor” (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).).  
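For illustration only, and not as part of the cited disclosure: the ALU functions quoted above from Henry paragraph [0227], together with the per-clock-cycle accumulation quoted from Henry paragraph [0059], can be sketched in Python as follows. The names AluFunction, alu_step, and neuron_sum are hypothetical, and plain Python integers stand in for the data word 209, weight word 203, and accumulator 202; this is a minimal sketch under those assumptions, not Henry's hardware.

```python
from enum import Enum, auto


class AluFunction(Enum):
    """Hypothetical labels for the ALU functions listed in Henry paragraph [0227]."""
    MULT_ACCUM = auto()   # multiply data word and weight word, accumulate product
    SUM_WEIGHT = auto()   # sum accumulator and weight word
    SUM_DATA = auto()     # sum accumulator and data word
    MAX_DATA = auto()     # maximum of accumulator and data word
    MAX_WEIGHT = auto()   # maximum of accumulator and weight word
    OUTPUT_ACC = auto()   # output accumulator
    PASS_DATA = auto()    # pass through data word
    PASS_WEIGHT = auto()  # pass through weight word
    ZERO = auto()         # output zero


def alu_step(function, accumulator, data_word, weight_word):
    """Return the ALU result for one step, selected by the control field."""
    if function is AluFunction.MULT_ACCUM:
        return accumulator + data_word * weight_word
    if function is AluFunction.SUM_WEIGHT:
        return accumulator + weight_word
    if function is AluFunction.SUM_DATA:
        return accumulator + data_word
    if function is AluFunction.MAX_DATA:
        return max(accumulator, data_word)
    if function is AluFunction.MAX_WEIGHT:
        return max(accumulator, weight_word)
    if function is AluFunction.OUTPUT_ACC:
        return accumulator
    if function is AluFunction.PASS_DATA:
        return data_word
    if function is AluFunction.PASS_WEIGHT:
        return weight_word
    if function is AluFunction.ZERO:
        return 0
    raise ValueError(f"unsupported ALU function: {function!r}")


def neuron_sum(data_words, weight_words):
    """Accumulate one data*weight product per simulated clock cycle."""
    accumulator = 0
    for data_word, weight_word in zip(data_words, weight_words):
        accumulator = alu_step(AluFunction.MULT_ACCUM, accumulator,
                               data_word, weight_word)
    return accumulator
```

In this sketch the selected function plays the role that the examiner's note assigns to the ALU function field of the control register: the same step routine performs a different operation depending on the field value.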
Regarding Claim 12, Hamalainen in view of Henry teaches
The apparatus of claim 10, 
wherein the master computation unit further includes a data type determiner configured to determine the data type of the input data (Henry Figure 1, element 127; Figure 29A, element 127: examiner’s note: A control register for the NNU containing hardware configuration (Henry Figure 29A) that informs each NPU within the NNU of the configuration for supporting arithmetic operations involving signed values, decimal point values, rounding, and bit shifting (Henry paragraphs [0223]-[0235]), all of which can be used to support integer and fixed-point values representing both discrete and continuous data for both input data and weight data (corresponding to “data type of the input data”). The control signals for the control register are generated by the sequencer on the NNU (Henry paragraphs [0055], [0235]) or through NPU micro-instructions stored in media registers (Henry paragraph [0235]). Collectively, the media registers, the control register, and the sequencer logically act as a “data type determiner” for the master computation unit.); and 
at least one of a discrete data processor or a continuous data processor, 
wherein the discrete data processor is configured to process the input data based on a determination that the input data is stored as discrete values (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU instructed via translated micro-instructions to perform neural network calculations involving integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations) on an ALU unit (controlled according to configuration provided by a control register; Henry paragraphs [0223]-[0231]), where the ALU unit with the capability to handle integer and fixed-point operations is considered to be “a discrete data processor” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).), and 
wherein the continuous data processor is configured to process the input data based on a determination that the input data is stored as continuous values (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU instructed via translated micro-instructions to perform neural network calculations involving integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations) on an ALU unit (controlled according to configuration provided by a control register; Henry paragraphs [0223]-[0231]), where the ALU unit with the capability to handle floating-point operations with fixed-point hardware-assist logic is considered to be “a continuous data processor” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).).  
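For illustration only: the control register fields named in Henry paragraph [0223] (“signed data 2912”, “data binary point 2922”) can be pictured as selecting how a raw data word is interpreted before it is processed. The Python sketch below is a hypothetical model, not Henry's implementation. The names ControlFields and interpret_word are invented for this example, and it assumes that the binary point field gives the number of fractional bits and that signed words use two's complement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFields:
    """Hypothetical subset of the control register fields."""
    signed_data: bool        # assumed: True means two's-complement data words
    data_binary_point: int   # assumed: number of fractional bits in a data word


def interpret_word(raw, width, fields):
    """Interpret a raw unsigned bit pattern according to the control fields."""
    if not 0 <= raw < (1 << width):
        raise ValueError("raw value does not fit in the given width")
    value = raw
    # Reinterpret the top bit as a sign bit when the word is flagged as signed.
    if fields.signed_data and raw >= (1 << (width - 1)):
        value = raw - (1 << width)
    # Scale by the assumed number of fractional bits.
    return value / (1 << fields.data_binary_point)
```

Under these assumptions the same 8-bit pattern 0xFF reads as 255.0 when unsigned with no fractional bits and as -1.0 when signed, which is the sense in which configuration fields, rather than the stored bits alone, determine how the input data is treated.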
Claims 13 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred to as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred to as Henry], in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, arXiv:1603.01025v2, March 17 2016, 10 pages [hereafter referred to as Miyashita].
Regarding Claim 13, Hamalainen in view of Henry teaches
The apparatus of claim 1, further comprising a data converter configured to: 
receive continuous data (Hamalainen p.448 Figure 1: examiner’s note: A tree-shaped parallel computer architecture consisting of a root/master processing element (PE) (corresponding to “a master computation module”) and leaves/slave PEs, interconnected by a series of interconnecting nodes and communication network (corresponding to “wherein the Hamalainen p.452 Figure 6; p.454 Figure 9), where the collective architecture is considered as one large processing element, and where data is continuously streamed into the parallel computer architecture in real-time (corresponding to the streaming aspect of “receive continuous data”) (Hamalainen p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “The tree shape parallel computer architecture is depicted in Figure 1. The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. In general, this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).), …
transmit the discrete data to the computation module (Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and 9; collectively the CUs and the communication network correspond to an interconnection unit.) (Hamalainen p.455 Figure 11: examiner’s note: A control unit (CRU) within a PU connecting the C-, A-, and D-buses to other processing elements within the PU via internal data and address buses, where this communication network effectively transmits data and instructions between all PUs (i.e., to/from the DSP to perform arithmetic logical operations, and to the master PE, thus corresponding to “transmit the discrete data to the computation module”) (Hamalainen p.455 col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).).  
While Hamalainen in view of Henry teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision aspect of “continuous data”), Hamalainen in view of Henry does not explicitly teach
convert the continuous data to discrete data, …
Miyashita teaches
convert the continuous data to discrete data (Miyashita p.3 Figure 1(b); p.2 col.2 Section 3. Concept and Motivation 1st paragraph – p.3 col.1 2nd paragraph (Section 3.1. Proposed Method 1.): examiner’s note: Applying a quantization method to perform conversion from continuous data to discrete data, where input x contains data from the set of real numbers, and the Quantize() function corresponds to a conversion from the floating/fixed-point data (continuous data) to logarithm representation (discrete data) (“…The first proposed method as shown in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. This is simply <refer to p.3 col.1 equation 1> where x̃_i = Quantize(log2(x_i)), Quantize (∙) quantizes ∙ to an integer, and Bitshift (a,b) is the function that bit-shifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operation is simply an addition of b with the exponent part of a.”…), and where the quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input, or rounding to the nearest integer, where these arithmetic operations represent “a data converter” component either implemented via hardware or software (“… In order to quantize, we propose two hardware-friendly flavors. The first option is to simply floor the input. This method computes ⌊log2(w)⌋ by returning the position of the first 1 bit seen from the most significant bit (MSB). The second option is to round to the nearest integer, which is more precise than the first option. ...”).), …
Both Hamalainen in view of Henry and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete data values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thereby also improving the performance and efficiency of the neural network hardware architecture by allowing these computations to be performed on less complex hardware without taking up additional computational resources ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”]).
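For illustration only: the logarithmic quantization quoted above from Miyashita Section 3.1, in which an operand is replaced by Quantize(log2(x)) and the multiplication becomes a bit shift, can be sketched in Python as follows. This is a minimal sketch rather than Miyashita's implementation. The function names quantize_log2, bitshift, and approx_dot are hypothetical, the weights are assumed to be integers in fixed-point form, and the activations are assumed to be strictly positive.

```python
import math


def quantize_log2(x, mode="round"):
    """Quantize log2(x) to an integer by flooring or by rounding to nearest."""
    if x <= 0:
        raise ValueError("log2 quantization assumes a strictly positive input")
    exponent = math.log2(x)
    if mode == "floor":
        return math.floor(exponent)
    if mode == "round":
        # floor(e + 0.5) rounds to nearest and avoids Python's round-half-to-even.
        return math.floor(exponent + 0.5)
    raise ValueError(f"unknown mode: {mode!r}")


def bitshift(a, b):
    """Shift integer a left by b bits, or right by -b bits when b is negative."""
    # Note: Python's right shift floors toward negative infinity for negative a.
    return a << b if b >= 0 else a >> -b


def approx_dot(weights, activations, mode="round"):
    """Approximate sum(w * x) by shifting each integer weight by Quantize(log2(x))."""
    return sum(bitshift(w, quantize_log2(x, mode))
               for w, x in zip(weights, activations))
```

With activations that are exact powers of two the shift-based result equals the exact dot product; for other activations the two quantization flavors (floor and round) give different approximations, which reflects the precision trade-off described in the quoted passage.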
Regarding Claim 15, Hamalainen in view of Henry, in further view of Miyashita teaches
The apparatus of claim 13, 
wherein the data converter is configured to receive continuous data from an external storage device ([Hamalainen p.452 Figure 6: examiner’s note: Transmitting data between a PU containing a DSP and an external host computer (which contains separate memory from the PUs, thus corresponding to “an external storage device”), where the host computer can also transmit data to other PUs (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph: “The basic modes of communication are data transfer operations. The host can write data to a particular PU, broadcast data to all PUs or read data from the addressed PU.”) (Hamalainen p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph: “For synchronization purposes, there is a set of registers in the register block of the CRU. The register block consists of data registers, a signature register, an address register and a status register. … The host computer can write data to the data register, where the DSP can read it. The DSP can also supply data to the data register, which can be read by the host. A handshaking procedure is employed to control data transfers. The host can first write data and an interrupt request to the DSP, which, in turn, reads data in an interrupt service routine. The DSP can supply the data to the register and write a message to the status register, which is read by the host.”).] [Miyashita p.3 Figure 1(b); p.2 col.2 Section 3. Concept and Motivation 1st paragraph – p.3 col.1 2nd paragraph (Section 3.1. Proposed Method 1.): examiner’s note: Applying a quantization method to perform conversion from continuous data to discrete data, where input x contains data from set of real numbers, and the Quantize() function corresponds to a conversion (“…The first proposed method as shown in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. 
This is simply <refer to p.3 col.1 equation 1> where x̃_i = Quantize(log2(x_i)), Quantize (∙) quantizes ∙ to an integer, and Bitshift (a,b) is the function that bit-shifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operation is simply an addition of b with the exponent part of a.”…), and where the quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input, or rounding to the nearest integer, where these arithmetic operations represent “a data converter” component either implemented via hardware or software, configured to “receive continuous data from an external storage device” (“… In order to quantize, we propose two hardware-friendly flavors. The first option is to simply floor the input. This method computes ⌊log2(w)⌋ by returning the position of the first 1 bit seen from the most significant bit (MSB). The second option is to round to the nearest integer, which is more precise than the first option. ...”).]).  
Regarding Claim 16, Hamalainen in view of Henry teaches
The apparatus of claim 1, further comprising a data converter configured to: 
receive continuous data from an external storage device ([Hamalainen p.448 Figure 1: examiner’s note: A tree-shaped parallel computer architecture consisting of a root/master processing element (PE) (corresponding to “a master computation module”) and leaves/slave PEs, interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element, and where data is continuously streamed into the parallel computer architecture in real-time (corresponding to the streaming aspect of “receive continuous data”) (Hamalainen p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “The tree shape parallel computer architecture is depicted in Figure 1. The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. In general, this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).] [Hamalainen p.452 Figure 6: examiner’s note: Transmitting data between a PU containing a DSP and an external host computer (which contains separate memory from the PUs, thus corresponding to “an external storage device”), where the host computer can also transmit data to other PUs (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph: “The basic modes of communication are data transfer operations. The host can write data to a particular PU, broadcast data to all PUs or read data from the addressed PU.”) (Hamalainen p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph: “For synchronization purposes, there is a set of registers in the register block of the CRU. 
The register block consists of data registers, a signature register, an address register and a status register. … The host computer can write data to the data register, where the DSP can read it. The DSP can also supply data to the data register, which can be read by the host. A handshaking procedure is employed to control data transfers. The host can first write data and an interrupt request to the DSP, which, in turn, reads data in an interrupt service routine. The DSP can supply the data to the register and write a message to the status register, which is read by the host.”).]), …
transmit the discrete data to the external storage device ([Hamalainen p.452 Figure 6; p.454 Figure 9: examiner’s note: The communication units in the TUTNC parallel computer architecture (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”) and the bus structure between the CUs and PUs (Hamalainen p.454 Figure 9; p.453 col.2 2nd paragraph – p.454 cols.1 and 2) form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture shown in Figures 6 and Figures 9; collectively the CUs and the communication network correspond to an interconnection unit.) (Hamalainen p.455 Figure 11: examiner’s note: A control unit (CRU) within a PU connecting the C-, A-, and D-buses to other processing elements within the PU via internal data and address buses, where this communication network effectively transmits data and instructions between all PUs (i.e., to/from the DSP to perform arithmetic logical operations, and to the master PE, thus corresponding to “transmit the discrete data to the computation module”) (Hamalainen p.455 col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).] 
[Hamalainen p.452 Figure 6: examiner’s note: Transmitting data between a PU containing a DSP and an external host computer (which contains separate memory from the PUs, thus corresponding to “an external storage device”), where the host computer can also transmit data to other PUs (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph: “The basic modes of communication are data transfer operations. The host can write data to a particular PU, broadcast data to all PUs or read data from the addressed PU.”) (Hamalainen p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph: “For synchronization purposes, there is a set of registers in the register block of the CRU. The register block consists of data registers, a signature register, an address register and a status register. … The host computer can write data to the data register, where the DSP can read it. The DSP can also supply data to the data register, which can be read by the host. A handshaking procedure is employed to control data transfers. The host can first write data and an interrupt request to the DSP, which, in turn, reads data in an interrupt service routine. The DSP can supply the data to the register and write a message to the status register, which is read by the host.”).]).  
While Hamalainen in view of Henry teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision aspect of “continuous data”), Hamalainen in view of Henry does not explicitly teach
convert the continuous data to discrete data, …
Miyashita teaches
convert the continuous data to discrete data (Miyashita p.3 Figure 1(b); p.2 col.2 Section 3. Concept and Motivation 1st paragraph – p.3 col.1 2nd paragraph (Section 3.1. Proposed Method 1.): examiner’s note: Applying a quantization method to perform conversion from continuous data to discrete data, where input x contains data from the set of real numbers, and the Quantize() function corresponds to a conversion from the floating/fixed-point data (continuous data) to logarithm representation (discrete data) (“…The first proposed method as shown in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. This is simply <refer to p.3 col.1 equation 1> where x̃_i = Quantize(log2(x_i)), Quantize (∙) quantizes ∙ to an integer, and Bitshift (a,b) is the function that bit-shifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operation is simply an addition of b with the exponent part of a.”…), and where the quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input, or rounding to the nearest integer, where these arithmetic operations correspond to “a data converter” component either implemented via hardware or software (“… In order to quantize, we propose two hardware-friendly flavors. The first option is to simply floor the input. This method computes ⌊log2(w)⌋ by returning the position of the first 1 bit seen from the most significant bit (MSB). The second option is to round to the nearest integer, which is more precise than the first option. ...”).), …
Both Hamalainen in view of Henry and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete data values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thereby also improving the performance and efficiency of the neural network hardware architecture by allowing these computations to be performed on less complex hardware without taking up additional computational resources ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”]).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, arXiv:1603.01025v2, March 17 2016, 10 pages [hereafter referred as Miyashita], in even further view of Hassner et al., U.S. Patent 5,638,065, issued 6/10/1997 [hereafter referred as Hassner].
Regarding Claim 14, Hamalainen in view of Henry, in further view of Miyashita teaches
The apparatus of claim 13, 
wherein the data converter includes 
a preprocessing unit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data ([Miyashita p.3 Figure 1(b); p.2 col.2 Section 3. Concept and Motivation 1st paragraph – p.3 col.1 2nd paragraph (Section 3.1. Proposed Method 1.): examiner’s note: Applying a quantization method to perform conversion from continuous data to discrete data, where input x contains data from set of real numbers, and the Quantize() function corresponds to a conversion from the floating/fixed-point data (continuous data) to logarithm representation (discrete data) (“…The first proposed method as shown in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. This is simply <refer to p.3 col.1 equation 1> where $\tilde{x}_i = \mathrm{Quantize}(\log_2(x_i))$, Quantize (∙) quantizes ∙ to an integer, and Bitshift (a,b) is the function that bit-shifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operation is simply an addition of b with the exponent part of a.”…), and where the quantization method can be instantiated through hardware arithmetic operations such as bit- (“… In order to quantize, we propose two hardware-friendly flavors. The first option is to simply floor the input. This method computes ⌊log2(w)⌋ by returning the position of the first 1 bit seen from the most significant bit (MSB). The second option is to round to the nearest integer, which is more precise than the first option. ...”), with additional examples representing a clipping function shown in Miyashita p.4, equations 5, 6, and 7 (where a predetermined range is determined by either a floor function returning the position of the first 1 bit seen from the most significant bit, or a rounding function to the nearest integer, and where the instantiation of these functions in hardware corresponds to a “preprocessing unit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data”).); …
However, Hamalainen in view of Henry, in further view of Miyashita does not teach
a distance calculator configured to calculate multiple distance values between the preprocessed data and multiple discrete values; and 
a comparer configured to compare the multiple distance values to output one or more of the multiple discrete values.  
Hassner teaches
a distance calculator configured to calculate multiple distance values between the preprocessed data and multiple discrete values (Hassner Figure 6, elements 30a, 30b: examiner’s note: Taking vectors W and W’ containing discrete values (corresponding to preprocessed data and multiple discrete values; Hassner col.5 lines 5-23) and inputting them into filter units implementing three linear functions F1, F2, F3, and performing calculations to maximize the Euclidean distance between those vectors (where maximizing the Euclidean distance is interpreted as performing an absolute value of the Euclidean distance between each of the discrete values within the two vectors, thus corresponding to a “distance calculator”) (Hassner col.5 lines 42-56: “Referring now to FIG. 6, the $W'$ symbols $[W_5, W_6, W_7, W_8]$ are routed in parallel via a delay unit 26 and an intersymbol interference subtraction unit 27 to a set 28 of analog matched filter units 30a-30d. Delay unit 26 delays the symbols $[W_5, W_6, W_7, W_8]$ for one symbol period of four clock cycles duration. After this delay and the subtraction of intersymbol interference from the lookahead symbols $[W_5, W_6, W_7, W_8]$ by unit 27 in the manner presently to be described, the lookahead symbols are transformed into W symbols $[W_1, W_2, W_3, W_4]$ for an updated current state of the channel. … The vectors $W'$ and W are fed in parallel to the four analog matched filter units 30a-30d. These units 30a-30d calculate values of linear functions chosen to maximize Euclidean distances between vectors whose values are ambiguous in order better to distinguish between them.”).); and 
a comparer configured to compare the multiple distance values to output one or more of the multiple discrete values (Hassner Figure 6, elements 28, 30a-30f, 20, 32; Figures 7A-7D, elements 36a-36f: examiner’s note: Filter units containing comparator elements to perform comparisons against the vector values and the three linear functions in order to produce binary outputs 40a-40f, which are fed into a finite state machine to output a decoded symbol pattern (representing one or more of the multiple discrete values) (Hassner col.6 lines 1-16: “FIGS. 7A-7D shows in detail the binary decision outputs generated from the linear functions by the filter units 30a-30d, respectively. More specifically, units 30a-30d comprise filters 34a-34d, respectively, for implementing the linear functions F1, F2, F3. The linear functions F1, F2, F3 output from each filter 34 (e.g., 34a as shown in FIG. 7A) six comparators 36a-36f which compare the respective function values with respective identical preselected threshold values in each of the four units 30a-30d to generate three respective outputs which are ANDed at 38a-38f to provide binary outputs 40a-40f, respectively, to digital sequential finite-state machine 32. More specifically and, for example, the six outputs 40a-40f of matched filter unit 30a for states A-F, respectively, constitute the finite-state machine inputs for bit 1 of the four-bit pattern ....”).).  
Both Hamalainen in view of Henry, in further view of Miyashita and Hassner are analogous art since they both teach processing of discrete data in hardware.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the output from the preprocessing unit of Hamalainen in view of Henry, in further view of Miyashita and couple it as an input into the filter units of Hassner as a way to perform maximum-likelihood decisions (i.e., performing distance calculations between two vectors, generating one or more discrete values that most likely represent the preprocessed data) in hardware. The motivation to combine is taught in Hassner, since implementing this logic in hardware frees up the computational resources within each individual neural network processor unit to perform other complex neural network computations, thus improving the overall efficiency of the system (Hassner col.1 lines 10-16: “This invention relates to an apparatus and method for processing analog signals … and more particularly to an apparatus and method for detecting multiple-bit symbols by (i) converting the analog signals into analog vectors using a linear Walsh transform, and making maximum-likelihood decisions using vector metric calculations which are determined by the selected run-length-limited modulation code and equalized linear channel response signal shape and are implemented by analog matched filters, analog comparators, and digital sequential finite-state machines matched to RLL-coded symbols.”).
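For illustration only, the three elements recited in claim 14 (a preprocessing unit that clips, a distance calculator, and a comparer) can be sketched in software as follows. This is a minimal sketch of the claim language as read above, not the applicant's disclosed circuit and not Hassner's analog matched-filter implementation. The function names, the use of absolute difference as the distance, and the treatment of clipping as limiting a value to the interval [low, high] are all assumptions made for the example.

```python
def clip(value, low, high):
    """Preprocessing step (assumed): limit value to the range [low, high]."""
    return max(low, min(high, value))


def distances(value, discrete_values):
    """Distance calculator (assumed metric): absolute difference to each candidate."""
    return [abs(value - d) for d in discrete_values]


def nearest_discrete(value, discrete_values, low, high):
    """Comparer: output the discrete value whose distance is smallest.

    Ties resolve to the candidate that appears first in discrete_values.
    """
    preprocessed = clip(value, low, high)
    dists = distances(preprocessed, discrete_values)
    best_index = min(range(len(dists)), key=dists.__getitem__)
    return discrete_values[best_index]
```

For example, with discrete values [-1, 0, 1] and range [-1, 1], an input of 0.3 maps to 0, and an input of 5.0 is first clipped to 1 and then maps to 1.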
Claims 17-27 are rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, .
Regarding Claim 17, Hamalainen teaches
A method for forward propagation of a multilayer neural network (MNN), comprising: 
receiving, by a master computation module of a computation module, one or more groups of MNN data ([Hamalainen p.448 Figure 1: examiner’s note: A tree-shaped parallel computer architecture consisting of a root/master processing element (PE) (corresponding to “a master computation module”) and leaves/slave PEs (corresponding to “one or more slave computation modules”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element (corresponding to “a computation module”) (Hamalainen p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “The tree shape parallel computer architecture is depicted in Figure 1. The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. In general, this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).] 
[Hamalainen p.449 Figure 2; p.450 Figure 3; p.451 Figure 4: examiner’s note: The parallel computer architecture supporting two parallelism modes (node parallelism and weight parallelism), with both providing input and weight data from the master PE through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the processing elements (weight parallelism), where in both modes the master computation module of a computation module receives collectively the input vector and weight inputs as “one or more groups of MNN data” (Hamalainen p.449 col.1 3rd paragraph - col.2 2nd paragraph: “In the node parallel mapping, … the input vector is the same for all neurons … The communication network broadcasts the input vector from the root to the PEs … In the weight parallelism the calculation of a single neuron output is distributed to all elements in the tree. Each PE is assigned only one input (and weight) of the neuron. Due to this, the input vector is now delivered element by element to PEs, such that the leftmost PE gets the first element and so on. This cannot be done by broadcasting, so the root has to write elements individually. The task of the PE is simply to multiply the input element with a weight assigned to that neuron input.”).]) … ,
wherein the one or more groups of MNN data include input data and one or more weight values (Hamalainen p.449 col.2 2nd paragraph – p.450 col.1 2nd paragraph: examiner’s note: The input vector data broadcasted by the master PE to the slave PEs determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector becomes a representation of the weight data, thus corresponding to “one or more groups of MNN data include input data and one or more weight values” (“The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. Now the communication network sums these weighted inputs, as illustrated in Figure 4. … We have found it useful to let the communication network be an adder tree or a broadcasting medium. Different requirements are given with Kohonen's Self-Organizing Feature Map (SOFM) algorithms. … The idea is to let the weights of the neurons self-organize according to the topology of the input data. Practically this is done by first seeking the closest weight with respect to the input vector and then adjusting that weight towards the input vector. … Each PE is mapped to one neuron (or in fact weight), and the communication network is used to broadcast the input vector to all PEs.”).) and …
calculating, by one or more slave computation modules of the computation module, one or more groups of slave output values (Hamalainen p.449 Figure 2 Broadcast operation, p.450 Figure 3 Node parallel computation, p.451 Figure 4 Weight parallel computation: examiner’s note: For both parallelism modes, the receiving PEs (corresponding to “one or more slave computation modules of the computation module”) perform a neural network computation involving a calculation of an intermediate output $y_i$ at each neural network layer (where intermediate output $y_i$ corresponds to “calculating … one or more groups of slave output values”) (Hamalainen p.449 col.1 2nd paragraph: “… consider a mapping of the layered perceptron neural network using node and weight parallel mapping styles. The task of the perceptron is to calculate a thresholded output from the sum of weighted inputs $y_i = f\left(\sum_i w_{ji} x_i\right)$, where $y_i$ is the output of the ith neuron in a layer, $w_{ji}$ denotes a weight and $x_i$ is an item of the input vector.”).) … ; 
calculating, by the master computation module, a merged intermediate vector (Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE, corresponding to “the master computation module”) receives the outputs produced by the slave PEs for each network layer i (corresponding to “calculating … a merged intermediate vector”) and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).) … ; and 
generating, by the master computation module, an output vector based on the merged intermediate vector (Hamalainen p.449 Figure 2, p.450 Figure 3, and p.451 Figure 4: examiner’s note: For both parallelism modes, the root (master PE, corresponding to “the master computation module”) receives the outputs produced by the slave PEs for each network layer i and generates and stores this intermediate vector for use by the PEs to calculate the output for the next layer i+1, until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (corresponding to “generating … an output vector based on the merged intermediate vector”) (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph: “In the node parallel mapping … When a PE has finished the calculation of the neuron output, the root collects all outputs and forms an output vector for that particular layer or neurons. The next neuron layer is processed by first broadcasting the output vector of the previous layer and then repeating the other steps. … In the weight parallelism the calculation of a single neuron output is distributed to all element in the tree. … The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. … After summation, the root performs thresholding and stores the neuron output to a vector for further usage. The output of the next neuron in a layer can be calculated by changing the weights in each PE …”).).  
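For illustration only, the node parallel and weight parallel mappings quoted from Hamalainen above can be sketched as follows. This is a minimal software sketch under stated assumptions, not the TUTNC implementation: the names step, node_parallel_layer, weight_parallel_layer, and forward are hypothetical, the threshold function is assumed to be a unit step at zero, and the slave PEs are modeled as iterations of a loop rather than as parallel hardware.

```python
def step(s):
    """Assumed thresholding function f: unit step at zero."""
    return 1 if s >= 0 else 0


def node_parallel_layer(weight_rows, x):
    """Node parallelism: the root broadcasts x; each slave PE owns one neuron
    (one row of weights), computes that neuron's thresholded output, and the
    root collects the outputs into the layer's output vector."""
    return [step(sum(w * xi for w, xi in zip(row, x))) for row in weight_rows]


def weight_parallel_layer(weight_rows, x):
    """Weight parallelism: for each neuron in turn, every slave PE multiplies
    one input element by one weight, the communication network sums the
    products, and the root thresholds and stores the result."""
    outputs = []
    for row in weight_rows:
        products = [w * xi for w, xi in zip(row, x)]  # one product per slave PE
        outputs.append(step(sum(products)))           # adder tree, then root
    return outputs


def forward(layers, x, layer_fn=node_parallel_layer):
    """Feed each layer's output vector back in as the next layer's input."""
    for weight_rows in layers:
        x = layer_fn(weight_rows, x)
    return x
```

Both mappings produce the same layer output for the same weights and inputs; they differ only in how the work is divided among the slave PEs and in what the root must collect.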
While Hamalainen teaches a DSP unit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform fixed-point arithmetic (Hamalainen p.455 col.2 Processing Unit, 1st paragraph – p.456 col.1 1st paragraph), Hamalainen does not explicitly teach
… wherein at least a portion of the input data and the weight values are stored as discrete values; …
calculating, by one or more slave computation modules of the computation module, … based on a data type of each of the one or more groups of MNN data;
calculating, by the master computation module, … based on the data type of each of the one or more groups of MNN data;
Henry teaches
… wherein at least a portion of the input data and the weight values are stored as discrete values (Henry Figure 1, elements 121, 126, 124, 122: examiner’s note: A NNU processing module (corresponding to a “computation module”) containing a plurality of NPUs (with the plurality of NPUs corresponding to “… that includes a master computation module and one or more slave computation modules”, and a single NPU corresponding to “a master computation module”) containing data and weight RAM to store input data and weight data (Henry paragraph [0051]: “The NNU 121 includes a weight random access memory (RAM) 124, a data RAM 122, N neural processing units (NPUs) 126, … a sequencer 128 … The NPUs 126 function conceptually as neurons in a neural network. The weight RAM 124, data RAM 122 and program memory 129 are all writable and readable … The weight RAM 124 is arranged as W rows of N weight words, and the data RAM 122 is arranged as D rows of N data words. Each data word and each weight word is a plurality of bits, preferably 8 bits, 9 bits, 12 bits or 16 bits.”), where each row of weight or data RAM is assigned to one of the NPUs within the NNU (Henry paragraph [0054]: “The sequencer 128 also generates a memory address 125 and a read command for provision to the weight RAM 124 to select one of the W rows of N weight words for provision to the N NPUs 126. 
… The sequencer 128 also generates a memory address 123 and a write command for provision to the data RAM 122 to select one of the D rows of N data words for writing from the N NPUs 126.”), and where each NPU performs arithmetic operations (add, multiply, accumulate; Henry paragraphs [0059]-[0060]) on the singular value weight word or data word stored in the weight or data RAM (corresponding to “wherein at least a portion of the input data and the weight values are stored as discrete values”, where under its broadest reasonable interpretation in light of the applicant’s specification paragraph [0007], an integer or fixed-point value represents a discrete data value) (Henry Figure 2, elements 126-J, 203, 206, 207, 209; paragraph [0062]: “Preferably, although the weight word 203 and the data word 209 are the same size (in bits), they may have different binary point locations, … Preferably, the multiplier 242 and adder 244 are integer multipliers and adders, as described in more detail below, to advantageously accomplish less complex, smaller, faster and lower power consuming ALUs 204 than floating-point counterparts. However, it should be understood that in other embodiments the ALU 204 performs floating-point operations.”).); …
calculating, by one or more slave computation modules of the computation module, … based on a data type of each of the one or more groups of MNN data (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU (corresponding to “one or more slave computation modules”) is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221], where under its broadest reasonable interpretation in light of the applicant’s specification paragraphs [0005]-[0006], streaming data that involves high-precision values (such as floating-point values) represent continuous data values), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “based on a data type of each of the one or more groups of MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. 
That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).);
calculating, by the master computation module, … based on the data type of each of the one or more groups of MNN data (Henry Figure 2, elements 126-J, 205, 206, 207, 208: examiner’s note: An NPU (corresponding to “the master computation module”) is instructed via translated micro-instructions to perform neural network calculations such as multiply-accumulate, add, and activation functions on received input values and weight data from the weight RAM and data RAM (Henry paragraph [0060]), where the multiply-accumulate, add operations within the ALU are configured by a control register (Henry paragraphs [0223]-[0227]) to perform integer or fixed-point operations (i.e., discrete data type operations) and floating-point operations with fixed-point hardware-assist logic (i.e., continuous data type operations; Henry paragraphs [0219]-[0221], where under its broadest reasonable interpretation in light of the applicant’s specification paragraphs [0005]-[0006], streaming data that involves high-precision values (such as floating-point values) represent continuous data values), and where the activation function unit (AFU) performs normalization operations to generate an output within a specified range of values based on a non-linear function (also controlled by the same control register; Henry paragraphs [0228]-[0231]); collectively these operations correspond to performing calculations “based on a data type of each of the one or more groups of MNN data” (Henry paragraph [0064]) (Henry paragraph [0059]: “The NPU 126 operates to perform many functions, or operations. In particular, advantageously the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function, or operation. 
That is, generally speaking, the NPU 126 (neuron) is configured to: (1) receive an input value from each neuron having a connection to it, typically but not necessarily from the immediately previous layer of the artificial neural network; (2) multiply each input value by a corresponding weight value associated with the connection to generate a product; (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron. However, rather than performing all the multiplies associated with all the connection inputs and then adding all the products together as in a conventional manner, advantageously each neuron is configured to perform, in a given clock cycle, the weight multiply operation associated with one of the connection inputs and then add (accumulate) the product with the accumulated value of the products associated with connection inputs processed in previous clock cycles up to that point.”).); …
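For illustration only, the per-cycle multiply-accumulate behavior described in the quoted Henry paragraph [0059] can be sketched as follows. This is a minimal software model under stated assumptions, not Henry's hardware: the function names, the use of Python floats, and the choice of a sigmoid as the default activation are all hypothetical and are not drawn from the reference.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Example non-linear activation (an assumption; any activation could be used)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    activation: Callable[[float], float] = sigmoid,
) -> float:
    """Model one neuron: accumulate one weighted input per step, then activate.

    Each loop iteration stands in for one clock cycle in which a single
    input is multiplied by its weight and added to the running sum.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")

    accumulator = 0.0
    for x, w in zip(inputs, weights):
        # One multiply and one accumulate per modeled clock cycle.
        accumulator += x * w

    # The activation function is applied once, after all products are summed.
    return activation(accumulator)
```

Calling `neuron_output([1, 2, 3], [4, 5, 6], activation=lambda s: s)` returns the plain weighted sum, which makes the accumulate step easy to check in isolation from the activation step.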
Both Hamalainen and Henry are analogous art since they both teach performing neural network computations using neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art to substitute the DSP circuitry (found within each master and slave processing elements) taught in Hamalainen 
While Hamalainen in view of Henry teaches a DSP unit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform block moves of data (Hamalainen p.456 col.2 Processing Unit, 1st paragraph), Hamalainen in view of Henry does not explicitly teach
receiving, by the master computation module of the computation module, … from a direct memory access unit, …
Gilbert teaches
receiving, by the master computation module of the computation module, … from a direct memory access unit (Gilbert Figure 1, elements 16, 12, 14; Figure 2, elements 50a, 50b, 76: examiner’s note: A master processor and an array of slave processor elements (where the master processor and slave processor elements are identical to each other; Gilbert col. 4 lines 36-45), where each processor element contains a DMA controller (Gilbert Figure 2, element 76; col.5 lines 33-39) connected to two SRAMs that store data and processing instructions, where the DMA controller manages data transfers into each processing element (corresponding to “receiving, by the master computation module of the computation module, … from a direct memory access unit”) (Gilbert Figure 2, elements 50a, 50b; col.5 lines 12-15: “FIG.2 shows the internal and operational hardware of a single processor element 14. A large on-chip memory 50 includes two separate SRAMs 50a, 50b, which store data and processing instructions.” and Gilbert col.6 line 58 – col.7 line 8: “There are two general types of data transfers: input and output between a processor element and the host 20; and interprocessor communication between processor elements in the array 12. These two different data transfers share a common DMA mechanism in the hardware of each processor element and are specified by TCB data structures. In each case, one channel of the DMA hardware serves to move data into the processor element, while another channel moves data out. The transfers are carried out independently of, and non-interfering with, the computational core processor section 52 of the processor element due to its dual-ported memory. Data transfers are also "through-routed", and are stored in memory only after arriving at the destination processing element. Such transfers thus bypass the internal memory of intervening slaves when there is communication between non-adjacent slaves. 
In this manner, the processor element enhances throughput via concurrent computation and communication.”).), …
Both Hamalainen in view of Henry and Gilbert are analogous art since they both teach neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the step involving the block moves of data in the DSP circuitry of Hamalainen in view of Henry with the DMA controller of Gilbert in order to support the same predictable results of performing block moves of data within the DSP circuitry. The motivation to combine is also further taught in Gilbert, since a DMA controller performs these block moves of data independently of the processing element, thereby providing concurrent data throughput and communication and hence improving the overall computational efficiency of the system (Gilbert col.6 line 63 – col.7 line 8: “In each case, one channel of the DMA hardware serves to move data into the processor element, while another channel moves data out. The transfers are carried out independently of, and non-interfering with, the computational core processor section 52 of the processor element due to its dual-ported memory. Data transfers are also "through-routed", and are stored in memory only after arriving at the destination processing element. Such transfers thus bypass the internal memory of intervening slaves when there is communication between non-adjacent slaves. In this manner, the processor element enhances throughput via concurrent computation and communication.”).
Regarding Claim 18, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising: 
combining, by an interconnection unit, the one or more groups of slave output values to generate one or more intermediate result vectors (This claim limitation is similar in scope to a corresponding limitation in Claim 2, and hence is rejected under similar rationale.).  
Regarding Claim 19, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising: 
parallelly calculating, by the one or more slave computation modules, the one or more groups of slave output values based on the input data and the weight values (This claim limitation is similar in scope to a corresponding limitation in Claim 3, and hence is rejected under similar rationale.).  
Regarding Claim 20, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising performing, by the master computation module, one operation selected from: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function, wherein the activation function is a function selected from non-linear sigmoid, tanh, relu, and softmax (This claim limitation is similar in scope to a corresponding limitation in Claim 4, and hence is rejected under similar rationale.); 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
pooling the merged intermediate vector.  
Regarding Claim 21, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising: 
temporarily storing, by a master neuron caching unit of the master computation module, the input data and the output vector (This claim limitation is similar in scope to a corresponding limitation in Claim 6, and hence is rejected under similar rationale.); and 
performing, by a master computation unit of the master computation module, one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data (This claim limitation is similar in scope to a corresponding limitation in Claim 6, and hence is rejected under similar rationale.).  
Regarding Claim 22, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising 
preventing, by a master data dependency relationship determination unit of the master computation module, an instruction from being executed based on a determination that a conflict exists between the instruction and other instructions (This claim limitation is similar in scope to a corresponding limitation in Claim 7, and hence is rejected under similar rationale.).  
Regarding Claim 23, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising 
receiving, by a slave computation unit of each of the slave computation modules, one or more groups of micro-instructions from a controller unit (This claim limitation is similar in scope to a corresponding limitation in Claim 8, and hence is rejected under similar rationale.);  
performing, by the slave computation unit, arithmetic logical operations that respectively correspond to the data type of the MNN data (This claim limitation is similar in scope to a corresponding limitation in Claim 8, and hence is rejected under similar rationale.); 
performing, by a slave data dependency relationship determination unit of each of the slave computation modules, data exchange operations based on a determination that no conflict exists between the data exchange operations (This claim limitation is similar in scope to a corresponding limitation in Claim 8, and hence is rejected under similar rationale.); 
temporarily storing, by a slave neuron caching unit of each of the slave computation modules, the input data and the slave output values (This claim limitation is similar in scope to a corresponding limitation in Claim 8, and hence is rejected under similar rationale.); and 
temporarily storing, by a weight value caching unit of each of the slave computation modules, the weight values (This claim limitation is similar in scope to a corresponding limitation in Claim 8, and hence is rejected under similar rationale.).  
Regarding Claim 24, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 23, further comprising: 
determining, by the slave data dependency relationship determination unit, whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed (This claim limitation is similar in scope to a corresponding limitation in Claim 9, and hence is rejected under similar rationale.); and 
if there is no dependent relationship, allowing, by the slave data dependency relationship determination unit, the micro-instruction which has not been executed to be executed immediately (Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause that effectively renders the subsequent claim limitation and claim body to not be performed because the condition precedent (“if there is no dependent relationship, allowing, … otherwise, …”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively recite the condition as being fulfilled, since no patentable weight is given for the subsequent claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for purposes of examination, this “if” clause will be treated as if the condition were fulfilled, thus allowing the subsequent claim limitation and claim body for further examination.) (This claim limitation is similar in scope to a corresponding limitation in Claim 9, and hence is rejected under similar rationale.), 
otherwise, the micro-instruction which has not been executed will not be allowed to execute until the execution of all the micro-instructions upon which that micro-instruction which has not been executed depend is completed (This claim limitation is similar in scope to a corresponding limitation in Claim 9, and hence is rejected under similar rationale.).  
Regarding Claim 25, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 21, further comprising: 
determining, by an operation determiner of the master computation unit, an operation to be performed based on the data type of the input data (This claim limitation is similar in scope to a corresponding limitation in Claim 10, and hence is rejected under similar rationale.); and 
performing, by a hybrid data processor of the master computation unit, the determined operation (This claim limitation is similar in scope to a corresponding limitation in Claim 10, and hence is rejected under similar rationale.).  
Regarding Claim 26, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 23, further comprising: 
determining, by an operation determiner of the slave computation unit, an operation to be performed based on the data type of the input data (This claim limitation is similar in scope to a corresponding limitation in Claim 11, and hence is rejected under similar rationale.); and 
performing, by a hybrid data processor of the slave computation unit, the determined operation (This claim limitation is similar in scope to a corresponding limitation in Claim 11, and hence is rejected under similar rationale.).  
Regarding Claim 27, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 21, further comprising: 
determining, by a data type determiner of the master computation unit, the data type of the input data (This claim limitation is similar in scope to a corresponding limitation in Claim 12, and hence is rejected under similar rationale.); and 
processing, by a discrete data processor of the master computation unit, the input data based on a determination that the input data is stored as discrete values (This claim limitation is similar in scope to a corresponding limitation in Claim 12, and hence is rejected under similar rationale.); and 
processing, by a continuous data processor of the master computation unit, the input data based on a determination that the input data is stored as continuous values (This claim limitation is similar in scope to a corresponding limitation in Claim 12, and hence is rejected under similar rationale.).  
Claims 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert], in even further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, arXiv:1603.01025v2, March 17 2016, 10 pages [hereafter referred as Miyashita].
Regarding Claim 28, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising 
receiving, by a data converter, continuous data (This claim limitation is similar in scope to a corresponding limitation in Claim 13, and hence is rejected under similar rationale.); …
transmitting, by the data converter, the discrete data to the computation module (This claim limitation is similar in scope to a corresponding limitation in Claim 13, and hence is rejected under similar rationale.).  
While Hamalainen in view of Henry, in further view of Gilbert teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision aspect of “continuous data”), Hamalainen in view of Henry, in further view of Gilbert does not explicitly teach
converting, by the data converter, the continuous data to discrete data; …
Miyashita teaches
converting, by the data converter, the continuous data to discrete data (This claim limitation is similar in scope to a corresponding limitation in Claim 13, and hence is rejected under similar rationale.); …
Both Hamalainen in view of Henry, in further view of Gilbert and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry, in further view of Gilbert and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete data values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thereby also improving the performance and efficiency of the neural network hardware architecture by allowing these computations to be performed on less complex hardware without taking up additional computational resources ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”]).
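For illustration only, converting high-precision ("continuous") values to a small set of discrete values by quantization can be sketched as below. This is a generic power-of-two (log-domain) quantizer written under stated assumptions; it is not asserted to reproduce the exact formulation in Miyashita. The function name `log2_quantize`, the default 3-bit width, and the `max_exp` clipping window are hypothetical parameters chosen for the example.

```python
import math


def log2_quantize(x: float, bits: int = 3, max_exp: int = 0) -> float:
    """Map x to a signed power of two whose exponent fits in `bits` bits.

    The exponent is rounded in the log2 domain and clipped to the window
    [max_exp - (2**bits - 1), max_exp]; zero is passed through unchanged.
    All parameter choices here are assumptions for illustration.
    """
    if x == 0.0:
        return 0.0

    min_exp = max_exp - (2 ** bits - 1)

    # Round the magnitude to the nearest integer exponent in the log domain.
    exponent = round(math.log2(abs(x)))

    # Clip the exponent so that only 2**bits distinct magnitudes are possible.
    exponent = max(min_exp, min(max_exp, exponent))

    # Restore the sign of the original value.
    return math.copysign(2.0 ** exponent, x)
```

With the default window, an input of 0.3 maps to 0.25 and an input of 5.0 saturates at 1.0, so every output belongs to a set of eight magnitudes (plus zero) that a 3-bit exponent can encode.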
Regarding Claim 29, Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita teaches
The method of claim 28, further comprising 
receiving, by the data converter, continuous data from an external storage device (This claim limitation is similar in scope to a corresponding limitation in Claim 15, and hence is rejected under similar rationale.).  
Regarding Claim 30, Hamalainen in view of Henry, in further view of Gilbert teaches
The method of claim 17, further comprising 
receiving, by a data converter, continuous data from an external storage device (This claim limitation is similar in scope to a corresponding limitation in Claim 16, and hence is rejected under similar rationale.); …
transmitting, by the data converter, the discrete data to the external storage device (This claim limitation is similar in scope to a corresponding limitation in Claim 16, and hence is rejected under similar rationale.).  
While Hamalainen in view of Henry, in further view of Gilbert teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision aspect of “continuous data”), Hamalainen in view of Henry, in further view of Gilbert does not explicitly teach
converting, by the data converter, the continuous data to discrete data; … 
Miyashita teaches
converting, by the data converter, the continuous data to discrete data (This claim limitation is similar in scope to a corresponding limitation in Claim 16, and hence is rejected under similar rationale.); … 
Both Hamalainen in view of Henry, in further view of Gilbert and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry, in further view of Gilbert and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete data values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thereby also improving the performance and efficiency of the neural network hardware architecture by allowing these computations to be performed on less complex hardware without taking up additional computational resources ([Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”] [Miyashita p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”]).
Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, Microprocessors and Microsystems Vol. 19 Number 8, Elsevier Science B.V., October 1995, pp.447-465 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert], in even further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, arXiv:1603.01025v2, March 17 2016, 10 pages [hereafter referred as Miyashita], in even further view of Hassner et al., U.S. Patent 5,638,065, issued 6/10/1997 [hereafter referred as Hassner].
Regarding Claim 31, Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita teaches
The method of claim 28, further comprising: 
clipping, by a preprocessing unit of the data converter, a portion of the input data that is within a predetermined range to generate preprocessed data (This claim limitation is similar in scope to a corresponding limitation in Claim 14, and hence is rejected under similar rationale.); …
However, Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita does not teach
calculating, by a distance calculator of the data converter, multiple distance values between the preprocessed data and multiple discrete values; and 
comparing, by a comparer of the data converter, the multiple distance values to output one or more of the multiple discrete values.  
Hassner teaches
calculating, by a distance calculator of the data converter, multiple distance values between the preprocessed data and multiple discrete values (This claim limitation is similar in scope to a corresponding limitation in Claim 14, and hence is rejected under similar rationale.); and 
comparing, by a comparer of the data converter, the multiple distance values to output one or more of the multiple discrete values (This claim limitation is similar in scope to a corresponding limitation in Claim 14, and hence is rejected under similar rationale.).  
Both Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita and Hassner are analogous art since they both teach processing of discrete data in hardware.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the output from the preprocessing unit of Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita and couple it as an input into the filter units of Hassner as a way to perform maximum-likelihood decisions (i.e., performing distance calculations between two vectors, generating one or more discrete values that most likely represent the preprocessed data) in hardware. The motivation to combine is taught in Hassner, since implementing this logic in hardware frees up the computational resources within each individual neural network processor unit to perform other complex neural network computations, thus improving the overall efficiency of the system (Hassner col.1 lines 10-16: “This invention relates to an apparatus and method for processing analog signals … and more particularly to an Apparatus and method for detecting multiple-bit symbols by (i) converting the analog signals into analog vectors using a linear Walsh transform, and making maximum-likelihood decisions using vector metric calculations which are determined by the selected run-length-limited modulation code and equalized linear channel response signal shape and are implemented by analog matched filters, analog comparators, and digital sequential finite-state machines matched to RLL-coded symbols.”).
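For illustration only, the clip, distance-calculation, and comparison steps recited in Claim 31 can be sketched in software as a nearest-value decision. This is a minimal sketch under stated assumptions and does not represent Hassner's analog matched-filter circuitry: the function names, the absolute-difference distance metric, the interpretation of "clipping" as clamping to a range, and the default range of [-1.0, 1.0] are all hypothetical.

```python
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Preprocessing step: clamp x into the range [lo, hi] (an assumed reading of 'clipping')."""
    return max(lo, min(hi, x))


def nearest_discrete(
    x: float,
    levels: Sequence[float],
    lo: float = -1.0,
    hi: float = 1.0,
) -> float:
    """Return the discrete level closest to the clipped input value."""
    if not levels:
        raise ValueError("levels must contain at least one discrete value")

    preprocessed = clip(x, lo, hi)

    # Distance calculator: one distance per candidate discrete value.
    distances = [abs(preprocessed - level) for level in levels]

    # Comparer: select the index of the smallest distance (first one on ties).
    best_index = min(range(len(levels)), key=distances.__getitem__)
    return levels[best_index]
```

For example, with `levels = [-1.0, -0.5, 0.0, 0.5, 1.0]`, an input of 0.3 is closest to 0.5, and an out-of-range input of 7.0 is first clamped to 1.0 and then mapped to the level 1.0.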

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121