DETAILED ACTION
This is the response to applicant’s amendment action regarding application number 16/182,420, filed November 6, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed April 20, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/182,420, which include: Amendments to the Specification (marked-up and clean copy), Amendments to the Claims, Amendments to the Drawings and Appendix (10 pages), and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Claims 1-31 have been amended. Claims 1-31 remain pending in the application. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges the claim objection identified in Claim 5 is now resolved, and therefore this respective claim objection previously set forth in the Non-Final Office Action mailed October 20, 2021 is withdrawn.
Regarding Applicant’s Remarks and Amendments to the Specification, Examiner acknowledges Applicant’s amendments (identified in the Marked-Up Specification) have resolved the corresponding specification objections previously set forth in the Non-Final Office Action mailed October 20, 2021, and therefore those respective specification objections are withdrawn. However, Examiner notes that one of the resulting corrections has generated a new specification objection, which will be identified in the relevant section indicated below.
Regarding Applicant’s Remarks and Amendments to the Drawings, Examiner acknowledges Applicant’s amendments have resolved certain drawing objections (typographical error in Figure 1B; bi-directional lines in Figure 1C; dotted blocks in Figure 5; missing reference characters in Figure 7) previously set forth in the Non-Final Office Action mailed October 20, 2021, and therefore those respective objections are withdrawn. However, Examiner notes that the other identified drawing objections have not been resolved, and hence those respective objections will be maintained, with those objections indicated in the relevant sections indicated below.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges the Amendments to the Claims have removed the claim language that invoked the identified 112(f) claim interpretations, and as such, those respective 112(f) claim interpretation invocations previously set forth in the Non-Final Office Action mailed October 20, 2021 for Claims 1-4 and 6-16 are withdrawn.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant’s amendments have resolved the indefiniteness and lack of antecedent basis issues identified in Claims 1, 4-6, 8-11, 20-21, 23-26, and 28 (and inherited in their respective dependent claims), and therefore those respective §112(b) rejections previously set forth in the Non-Final Office Action mailed October 20, 2021 are withdrawn. Examiner further notes that Applicant has introduced new matter into amended Claim 5 such that a new 112(a) rejection is now identified, with this new rejection indicated in the relevant section below.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/182,420, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant’s Remarks concerning the Information Disclosure Statement and associated NPL documents submitted on 10/30/2019, Examiner acknowledges that Applicant has provided an updated Information Disclosure Statement, submitted on 5/6/2022, that includes the English translation of the International Search Report for PCT/CN2016/079431 and indicates the correct mail date of January 16, 2017; therefore, this reference will be considered.
However, Applicant’s submission for the other two NPL references (now combined into one NPL reference with the title “Parallel BP Neural Network Training Algorithm based on Data”) still does not meet the standard for NPL reference submissions described in MPEP 609.04(a)(I), which specifies the content requirements for an Information Disclosure Statement and its corresponding documents under 37 CFR 1.98, including: “Each publication must be identified by publisher, author (if any), title, relevant pages of the publication, and date and place of publication. The date of publication supplied must include at least the month and year of publication, except that the year of publication (without the month) will be accepted if the applicant points out in the information disclosure statement that the year of publication is sufficiently earlier than the effective U.S. filing date and any foreign priority date so that the particular month of publication is not in issue. The place of publication refers to the name of the journal, magazine, or other publication in which the information being submitted was published. See MPEP § 707.05(e), for more information on data that should be used when citing publications and electronic documents.” Examiner points out that the combined NPL reference still does not bear the author’s name as indicated in the 5/6/2022 IDS (Xuan Zhang), and still does not bear the same date as indicated in the IDS (May 10, 2010). Furthermore, the two different references in the combined NPL reference do not share any common identifier (such as a title or author’s name) from which a person would realize that these sections originate from the same master’s thesis. Examiner notes that the English translations for both NPL references were obtained from a Google Translate web page, which means that the source for the original master’s thesis is accessible from the web and has a corresponding web page URL.
MPEP 609.04(a)(I) provides additional guidelines for publications obtained from the Internet: “For publications obtained from the Internet, the uniform resource locator (URL) of the Web page that is the source of the publication must be provided for the place of publication (e.g., "www.uspto.gov"). Further, for an Internet publication obtained from a website that archives Web pages, both the URL of the archived Web page submitted for consideration and the URL of the website from which the archived copy of the Web page was obtained should be provided on the document listing (e.g., "Hand Tools," Web page <http://www.farmshopstore.com/handtools.html>, 1 page, August 18, 2009, retrieved from Internet Archive Wayback Machine <http://web.archive.org/web/20090818144217/ http://www.farmshopstore.com/handtools.html> on December 20, 2012).”. Examiner suggests that if the English translation of the complete master’s thesis is not available for consideration, Applicant should include at least the English-translated cover page from the master’s thesis that explicitly lists the author’s name and proper publication date, along with the corresponding URL of the web page (if available), and include this cover page as part of the combined NPL reference to establish that the combined references are originating from the same URL web page source from the same author and published on the indicated date.
Regarding Applicant’s Remarks regarding the 35 U.S.C. 101 double patenting rejection, Examiner acknowledges Applicant’s abandonment of co-pending application 16/093,956, and therefore the provisional statutory double patenting rejection previously set forth in the Non-Final Office Action mailed October 20, 2021 is withdrawn.
Regarding Applicant's Remarks for Claims 1-12 under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry]; for Claims 13 and 15-16 under 35 U.S.C. 103 as being unpatentable over Hamalainen in view of Henry, in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, March 17 2016 [hereafter referred as Miyashita]; for Claim 14 under 35 U.S.C. 103 as being unpatentable over Hamalainen in view of Henry, in further view of Miyashita, in even further view of Hassner et al., U.S. Patent 5,638,065, issued 6/10/1997 [hereafter referred as Hassner]; for Claims 17-27 under 35 U.S.C. 103 as being unpatentable over Hamalainen in view of Henry, in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert]; for Claims 28-30 under 35 U.S.C. 103 as being unpatentable over Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita; and for Claim 31 under 35 U.S.C. 103 as being unpatentable over Hamalainen in view of Henry, in further view of Gilbert, in even further view of Miyashita, in even further view of Hassner, Examiner has considered Applicant’s arguments and finds them not persuasive. Examiner notes that the majority of Applicant’s prior art arguments are directed to newly added claim limitations that were not previously presented, and these new claim limitations necessitate further examination and re-evaluation of the amended and original claims. The updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
However, Examiner will address the interpretation of the amended term provided by the Applicant in the following paragraphs.
Regarding Applicant’s Remarks:
“The Office Action admits that Hamalainen fails to teach "at least a portion of the input data and the weight values are stored as discrete values" but alleges that Henry cures the deficiency. Applicant respectfully disagrees and submits that at least Henry fails to disclose or suggest the amended feature as shown in current claim 1.
More specifically, the Office Action alleges that Henry discloses the singular value weight word or data word stored in the weight or data RAM and explains that "under its broadest reasonable interpretation in light of the applicant's specification paragraph [0007], an integer or fixed-point value represents a discrete data value." Applicant respectfully submits that the difference between the claimed subject matter and the disclosure of Henry is not what represents a discrete data value. Rather, it is that the discrete values are predetermined and represented by some of the input data or weight values. The Specification provides an example, in which a 2-bit discrete data represents four predetermined discrete values (e.g., 00, 01, 10, 11 respectively represents -1, -0.5, 0.125, 2), rather than 0, 1, 2, 3. In other words, the true discrete values that the 2-b[i]t discrete data represents can be any numbers as long as the numbers are predetermined.
Thus, it is respectfully submitted that Henry fails to compensate for the deficiency of Hamalainen. Since other references are not shown to cure the deficiency of Hamalainen, Applicant respectfully submits that claim 1 is now allowable over the combination of the cited references. Claim 17 and other dependent claims are also allowable for at least the same reason.
Therefore, the withdrawal of the rejections under 35 USC 103 is respectfully requested.”
Examiner has considered this argument, and finds the argument to be not persuasive. Regarding the first part of Applicant’s argument that suggests that the Henry reference does not properly teach the original limitation as indicated in the Non-Final Office Action mailed October 20, 2021: “… wherein at least a portion of the input data and the weight values are stored as discrete values …”, Examiner finds this sub-argument to be not persuasive. MPEP 2111 requires that during patent examination, the pending claims must be given their broadest reasonable interpretation consistent with the specification, and an Examiner must construe claim terms in the broadest reasonable manner during prosecution as is reasonably allowed in an effort to establish a clear record of what applicant intends to claim. As indicated in the Non-Final Office Action mailed October 20, 2021, Applicant’s independent claims contain the limitation: “wherein at least a portion of the input data and the weight values are stored as discrete values”. Applicant’s original specification paragraph [0007] states: “Discrete data representation may refer to designating one or more numbers to represent one or more discrete values. For example, typically, binary numbers, 00, 01, 10, and 11, represent continuous values, 0, 1, 2, and 3. In some examples of discrete data representation, the four binary numbers (00, 01, 10, and 11) may be designated to respectively represent discrete values, e.g., -1, -1/8, 1/8, and 1.”, where the term “discrete values” is used in an example to describe “… discrete values (e.g., -1, -1/8, 1/8, and 1)”. A person having ordinary skill in the art would understand that this set of exemplary values recited in Applicant’s original specification paragraph [0007] broadly recite instances of integer values (e.g., -1, 1, which are integers) and fixed-point values (e.g., -1/8, 1/8, which can also be expressed as -0.125 and 0.125 in fixed-point notation, respectively). 
As indicated in the same Non-Final Office Action, the Henry reference teaches a weight RAM arranged in W rows of N weight words, and data RAM arranged in D rows of N data words, where each data word and weight word is a plurality of bits (preferably 8, 9, 12 or 16 bits). Henry further teaches each NPU performs neural network multiply-accumulate operations using the data stored in the weight RAM and data RAM, where data is read from the weight RAM into a NPU register, thus indicating that a weight value is stored in the weight RAM (Henry [0059]-[0060]: “… the NPU 126 is configured to operate as a neuron, or node, in an artificial neural network to perform a classic multiply-accumulate function … the NPU 126 … is configured to (1) receive an input value from each neuron having a connection to it … (2) multiply each input value by a corresponding weight value associated with the connection … (3) add all the products to generate a sum; and (4) perform an activation function on the sum … The NPU 126 includes a register 205, a 2-input multiplexed register (mux-reg) 208, an arithmetic logic unit (ALU) 204 … The register 205 receives a weight word 206 from the weight RAM 124 …”). Henry [0062] additionally teaches that the ALU within each NPU performs multiply-accumulate operations on the weight and data word values, with the multiplier and adder within the ALU performing integer multiplies and adds. The fact that integer multiply and add operations are performed on the weight and data word values indicates that the weight words and data words stored in their respective RAMs are stored as integers (and hence are stored as discrete values): “… the ALU 204 includes a multiplier 242 that multiplies the weight word 203 and the data word of the mux-reg 208 output 209 to generate a product 246. … The ALU 204 also includes an adder 244 that adds the product 246 to the accumulator 202 output 217 … Preferably, although the weight word 203 and the data word 209 are the same size (in bits), they may have different binary point locations, as described in more detail below. Preferably, the multiplier 242 and adder 244 are integer multipliers and adders, as described in more detail below, to advantageously accomplish less complex, smaller, faster and lower power consuming ALUs 204 than floating-point counterparts …”. Hence, given the above evidence, Applicant’s sub-argument is not persuasive, and the prior art rejection is maintained.
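For illustration only, the multiply-accumulate behavior quoted above from Henry [0059]-[0062] can be sketched as follows. This is a minimal sketch under stated assumptions and is not Henry's implementation: the function names (mac, to_real, relu), the choice of ReLU as the activation function, and the binary point positions (3 fractional bits for both word types) are hypothetical and are introduced here solely to show integer words being multiplied and accumulated with an implied binary point.

```python
# Illustrative sketch only; names and parameters are assumptions,
# not taken from Henry or from the application under examination.

def mac(data_words, weight_words):
    """Integer multiply-accumulate: sum of data_word * weight_word."""
    if len(data_words) != len(weight_words):
        raise ValueError("data and weight vectors must have equal length")
    acc = 0  # accumulator holds a plain integer
    for d, w in zip(data_words, weight_words):
        acc += d * w  # integer multiply, integer add
    return acc

def to_real(word, frac_bits):
    """Interpret an integer word as fixed-point with frac_bits fractional bits."""
    return word / (1 << frac_bits)

def relu(acc):
    """Assumed activation function applied to the accumulated sum."""
    return max(acc, 0)

# Data words 1.0 and 0.125, weight words -1.0 and 1.0, each stored as an
# integer with 3 fractional bits (assumed binary point locations).
DATA_FRAC, WEIGHT_FRAC = 3, 3
data_words = [8, 1]
weight_words = [-8, 8]

acc = mac(data_words, weight_words)
# The product of two fixed-point words carries the sum of their fractional bits.
real_sum = to_real(acc, DATA_FRAC + WEIGHT_FRAC)
activated = relu(acc)
```

In this sketch the stored words are ordinary integers, and the fractional interpretation is applied only when reading out the result.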
	Regarding the second part of Applicant’s argument that suggests that the Applicant’s amended independent claims now recite representations of predetermined discrete values, Examiner also finds this sub-argument to be not persuasive. MPEP 2145 (VI) indicates that “Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.”. Examiner also cites the guidelines in MPEP 2111.01(II) which caution against importing written description into a claim limitation that is broader than the cited embodiment: "Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment.". Applicant’s amended independent claim limitation now recites: “… wherein at least a portion of the input data and the weight values represents one or more predetermined discrete values …”. Under its broadest reasonable interpretation, this limitation broadly recites that the input data and the weight values are instances of “predetermined discrete values”, where the term “predetermined” broadly recites an association of those discrete values as being input data and weight values for training a neural network. Applicant cites original specification paragraph [0051] to suggest that the input data and weights are first mapped into binary numbers (2-bit discrete data), where these binary numbers are generated as corresponding representations of predetermined discrete values (“… For example, a 2-bit discrete data may represent four discrete values (e.g., 00, 01, 10, and 11 respectively represents  -1, -0.5, 0.125, and 2).”). 
Contrary to Applicant’s assertion, this amended limitation does not recite generating a set of binary numbers and mapping them to a set of discrete values. Examiner notes that there are two distinct terms provided in the example cited in paragraph [0051]: “discrete data” and “discrete values”, with “discrete data” being associated with the binary numbers 00, 01, 10, and 11, and “discrete values” associated with the integer and fixed-point values -1, -0.5, 0.125, and 2, respectively. Furthermore, Examiner finds that the usage of the term “discrete values” in Applicant’s original specification paragraph [0051] is consistent with the earlier established usage of the same term found in Applicant’s original specification paragraph [0007]: “discrete values (e.g., -1, -1/8, 1/8, and 1)”, where in both examples, the discrete values are instances of integer and fixed-point values. According to the amended independent claim limitation, Applicant uses the term “one or more predetermined discrete values” to describe the “at least a portion of the input data and the weight values”, and therefore the amended claim limitation broadly recites that the input data and weight values contain instances of values such as integer and fixed-point values, such that these integer and fixed-point values are predetermined based on a source location (based on a storage location or an assignment). Hence, given the above evidence, Applicant’s sub-argument is not persuasive, and the broadest reasonable interpretation explained above (where the term “predetermined” broadly recites an association of those discrete values as being input data and weight values for training a neural network) will be used for analyzing the amended independent claim limitation in the relevant sections indicated below.
As indicated earlier, Examiner notes that the remainder of Applicant’s prior art arguments are directed to the newly added claim limitations recited in the respective independent claims, where the new claim limitations necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
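For illustration only, the distinction drawn above between “discrete data” (the binary numbers) and “discrete values” (the numbers those binary codes are designated to represent) can be sketched as a simple lookup. This is a minimal sketch assuming a table-based mapping; the names CODEBOOK, decode, and conventional are hypothetical and do not appear in the specification or in the cited references. The four values are those given in the example of specification paragraph [0051].

```python
# Illustrative sketch only; the lookup-table approach and all names are
# assumptions, not a description of the claimed circuitry.

# 2-bit "discrete data" -> predetermined "discrete values" (from [0051]).
CODEBOOK = {0b00: -1.0, 0b01: -0.5, 0b10: 0.125, 0b11: 2.0}

def decode(code):
    """Return the predetermined discrete value designated for a 2-bit code."""
    return CODEBOOK[code]

def conventional(code):
    """Conventional reading of the same bits as an unsigned integer (0..3)."""
    return code

# Same bit pattern, two different readings.
decoded = [decode(c) for c in range(4)]
plain = [conventional(c) for c in range(4)]
```

Under this sketch, the bit pattern 10 reads as 2 conventionally but as 0.125 under the designated mapping, which is the mapping that Applicant's argument relies upon and that the amended limitation, as explained above, does not recite.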

Information Disclosure Statement
The non-patent literature documents in the following information disclosure statements were not considered due to the following reasons:
IDS 5/6/2022: Xuan Zhang, “Parallel BP Neural Network Training Algorithm based on Data”, China Excellence Masters’ Theses, Huazhong University of Science and Technology, May 10, 2010, 7 pages. Examiner notes that this NPL reference submission combines the references from two earlier references from IDS 10/30/2019, with the combined NPL reference being titled as “Parallel BP Neural Network Training Algorithm based on Data”. However, Examiner points out this combined NPL reference still does not have the author’s name as indicated in the 5/6/2022 IDS (Xuan Zhang), and this combined NPL reference still does not have the same date as indicated in the IDS (May 10, 2010). Furthermore, the two different references in the combined NPL reference do not share any common identifier (such as a title or author’s name) such that a person would realize the sections are originating from the same master’s thesis. Examiner notes that the English translations for both NPL references are obtained from a Google Translate web page, which means that the source for the original master’s thesis is accessible from the web and has a corresponding web page URL. Examiner refers to MPEP 609.04(a)(I) for the guidelines for content requirements for an Information Disclosure Statement and their corresponding documents under 37 CFR 1.98, as well as guidelines for publications obtained from the Internet. As indicated earlier, Examiner suggests that if the English translation of the complete master’s thesis document is not available for submission, Applicant should include at least an English-translated cover page from the master’s thesis that lists the author’s name and proper publication date, along with the corresponding URL of the web page (if available), and re-submit this information as part of the combined NPL reference to establish that the combined references are originating from the same source from the same author and published on the indicated date and URL web page.
If applicant wishes to have the above non-patent literature references considered, applicant must list them in a new Information Disclosure Statement and submit copies of the references with the required information. See 37 CFR 1.98.

Drawings
The drawings are objected to because of the following informality: Figure 7: It is not clear the meaning or usage of the notation “---” underneath the “$W_{j2}$” block within Slave Computation Module 114N. Appropriate correction is required.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
Examiner notes that both the marked-up and clean copy of the amended specification filed on April 20, 2022 are based on a version of the specification that is different from the original specification filed on November 6, 2018, since the “Cross-Reference to Related Applications” section and its corresponding paragraph (indicating that this application is a continuation-in-part of U.S. Patent Application No. 16/093,956, filed on October 15, 2018, and a 35 U.S.C. 371 National Stage of PCT/CN2016/079431 filed on April 15, 2016) are not present in either the marked-up or clean copy of the amended specification. Examiner points out that the records for this application 16/182,420 still identify it as a continuation-in-part of now-abandoned application 16/093,956 filed on October 15, 2018, and a 35 U.S.C. 371 National Stage of PCT/CN2016/079431 filed on April 15, 2016. Examiner further points out that Applicant did not indicate in the Remarks section that this particular section/paragraph is to be removed from the amended specification, and hence it is not clear whether this section/paragraph was removed intentionally or deleted by mistake. Applicant is asked to clarify the status of this missing section/paragraph from both the marked-up and clean copy of the amended specification, and to restore this section in the next set of amendments (if applicable).
The disclosure is further objected to because of the following informality: Paragraph [0026]: A missing space between “150” and the word “at” in the following sentence: “Fig. 1A is a block diagram illustrating an example computing process 150 at a neural network processor for neural networks.”. Appropriate correction is required.

Claim Objections
Claims 4, 9, and 24 are objected to because of the following informalities:
Claim 4: The strikethrough line needs to be removed from the beginning of the following limitation: “pooling the merged intermediate vector”. Appropriate correction is required.
Claim 9: The phrase “… upon which the first micro-instruction depend …” appears to be grammatically incorrect in the context of the amended limitations: “determine whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and if there is no dependent relationship, allow the first micro-instruction to be executed immediately, otherwise, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions upon which the first micro-instruction depend is completed”. Examiner notes that the identified phrase in the amended limitation appears to be used to emphasize the term “otherwise” to indicate there is a dependent relationship between the first micro-instruction and “all micro-instructions”. Examiner suggests the following correction to improve the clarity of the limitation: “determine whether there is a dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and if there is no dependent relationship, allow the first micro-instruction to be executed immediately[[,]]; otherwise, if the dependent relationship exists, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions is completed”. Appropriate correction is required.
Claim 24: The phrase “… upon which the first micro-instruction depend …” appears to be grammatically incorrect in the context of the amended limitations: “determining, by the slave data dependency relationship determination circuit, whether there is dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and if there is no dependent relationship, allowing, by the slave data dependency relationship determination circuit, the first micro-instruction to be executed immediately, otherwise, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions upon which the first micro-instruction depend is completed”. Examiner notes that the identified phrase in the amended limitation appears to be used to emphasize the term “otherwise” to indicate there is a dependent relationship between the first micro-instruction and “all micro-instructions”. Examiner suggests the following correction to improve the clarity of the limitation: “determining, by the slave data dependency relationship determination circuit, whether there is a dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed; and if there is no dependent relationship, allowing, by the slave data dependency relationship determination circuit, the first micro-instruction to be executed immediately[[,]]; otherwise, if the dependent relationship exists, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions is completed”. Appropriate correction is required.
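For illustration only, the dependency check recited in the amended limitations of Claims 9 and 24 quoted above can be sketched as follows. This is a minimal sketch under stated assumptions: the claims do not specify how a dependent relationship is detected, so the sketch assumes that a dependency exists when the address sets read or written by two micro-instructions overlap. The names MicroInstruction, depends_on, and may_issue are hypothetical and do not appear in the claims or the specification.

```python
# Illustrative sketch only; the overlap-based dependency test and all
# names are assumptions, not the claimed circuit's actual logic.
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroInstruction:
    name: str
    reads: frozenset   # addresses the micro-instruction reads
    writes: frozenset  # addresses the micro-instruction writes

def depends_on(first, second):
    """Assumed test: any read/write or write/write address overlap."""
    return bool(
        (first.reads & second.writes)
        or (first.writes & second.reads)
        or (first.writes & second.writes)
    )

def may_issue(first, in_flight):
    """True if 'first' may execute immediately, i.e. it has no dependent
    relationship with any micro-instruction still being executed."""
    return not any(depends_on(first, second) for second in in_flight)

# 'load' is being executed and writes address 1; 'add' reads address 1.
load = MicroInstruction("load", frozenset(), frozenset({1}))
add = MicroInstruction("add", frozenset({1}), frozenset({2}))
other = MicroInstruction("other", frozenset({7}), frozenset({8}))

blocked = not may_issue(add, [load])   # add must wait for load
free = may_issue(other, [load])        # no overlap, may execute immediately
after = may_issue(add, [])             # load completed, add may execute
```

In this sketch the first micro-instruction is held only while a micro-instruction on which it depends remains in flight, which follows the wording of the amended limitations as filed.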

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 5 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding amended Claim 5,
This claim recites the following amended limitation: “… exchanges data between the master computation circuit and the one or more slave computation circuits, wherein the data includes the one or more groups of MNN data, the data type of each of the one or more groups of MNN data, and the one or more groups of slave output values”, where this limitation broadly recites some form of data communication that includes the input vector and weight data and the calculated slave output values between the master computation circuit and the one or more slave computation circuits, where each of these respective groups of data corresponds to respective data being received in the corresponding master computation circuit and one or more slave computation circuits as recited in independent Claim 1. However, Applicant has now included the phrase “the data type of each of the one or more groups of MNN data”, which constitutes new matter, as the original specification does not indicate that the data type of each of the one or more groups of MNN data is exchanged between the master computation circuit and the one or more slave computation circuits. Rather, the original specification explicitly indicates that the data type of the one or more groups of MNN data is either determined by using a data type determiner to analyze the received one or more groups of MNN data at each of the master computation circuit and slave computation circuits (Figure 5, paragraph [0068]: “Referring to Fig. 5, a block diagram illustrates an example master computation unit 302 or an example slave computation unit 402 by which a forward propagation computation of artificial neural networks may be implemented … As depicted, the example master computation unit 302 or the example slave computation unit 402 may include a data type determiner 502 that may be configured to determine the data type of the received MNN data (i.e., discrete data or continuous data).”), or pre-selected by a system administrator (paragraph [0032]: “… Further, with respect to each layer, a data type (i.e., discrete or continuous data) of the input neuron data or the weight values at the layer may be selected by a system administrator prior to the forward propagation process.”). Determining a data type based on received MNN data (i.e., input data and weight values) through a data type determiner at each master or slave computation circuit is not equivalent to exchanging data between the master computation circuit and one or more slave computation circuits, where the term “exchange” in the context of the above limitation indicates some form of data communication between the master computation circuit and the one or more slave computation circuits, such that the data being communicated includes an explicitly specified data type. Examiner further points out that Applicant’s original set of claims filed November 6, 2018 also does not mention any data exchange involving “the data type of each of the one or more groups of MNN data”. The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, as well as determining whether the claimed invention meets all the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163.
Given that there is no support for this amended limitation in the specification, this amended limitation also fails to comply with the written description requirement. For the purposes of examination, this limitation will be examined without giving any patentable weight to the phrase identified as being new matter, such that the limitation will be interpreted as: “… exchanges data between the master computation circuit and the one or more slave computation circuits, wherein the data includes the one or more groups of MNN data, .”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred to as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred to as Henry].
Regarding amended Claim 1, 
Hamalainen teaches
(Currently Amended) An apparatus for forward propagation of a multilayer neural network (MNN), comprising:
a computation circuit that includes a master computation circuit and one or more slave computation circuits (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites physical or logical structures (“circuits”) performing the recited functions. Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (corresponding to “a master computation circuit”) and leaves/slave PEs (corresponding to “one or more slave computation circuits”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element (corresponding to “a computation circuit”). Hamalainen further teaches these elements in the TUTNC architecture are implemented as processing units (PUs) and communication units (CUs), where each of these elements are implemented with FPGA chips (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st paragraph – p.449 col.1 1st paragraph: “… The top of the tree is formed by a horizontal line of PEs, the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system … this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”; and p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 2nd paragraph: TUTNC Implementation).), 
wherein the master computation circuit configured to: 
receive one or more groups of MNN data (Examiner’s note: Hamalainen teaches the TUTNC architecture supporting two parallelism modes (node parallelism and weight parallelism), with both providing input and weight data through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the remaining PEs (weight parallelism), where in both modes the master PE receives collectively the input vector and weight inputs as “one or more groups of MNN data” (Hamalainen p.449 Figure 2 and p.449 col.1 3rd paragraph - col.2 2nd paragraph: “In the node parallel mapping … the input vector is the same for all neurons … The communication network broadcasts the input vector from the root to the PEs … In the weight parallelism the calculation of a single neuron output is distributed to all elements in the tree. Each PE is assigned only one input (and weight) of the neuron. Due to this, the input vector is now delivered element by element to PEs, such that the leftmost PE gets the first element and so on. This cannot be done by broadcasting, so the root has to write elements individually. The task of the PE is simply to multiply the input element with a weight assigned to that neuron input.”; p.450 Figure 3; and p.451 Figure 4).) …
… wherein the one or more groups of MNN data include input data and one or more weight values (Examiner’s note: As indicated earlier, Hamalainen teaches the TUTNC architecture receiving input vector and weight data, where the input vector data broadcasted by the master PE to the slave PEs determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector x traversing through each PE represents both the input data and the weight data assigned to each PE, thus corresponding to “one or more groups of MNN data include input data and one or more weight values” (Hamalainen p.451 Figure 4 and p.449 col.2 2nd paragraph – p.450 col.1 2nd paragraph: “… The task of the PE is simply to multiply the input element with a weight assigned to that neuron input. Now the communication network sums these weighted inputs, as illustrated in Figure 4. … We have found it useful to let the communication network be an adder tree or a broadcasting medium. … The idea is to let the weights of the neurons self-organize according to the topology of the input data. Practically this is done by first seeking the closest weight with respect to the input vector and then adjusting that weight towards the input vector. … Each PE is mapped to one neuron (or in fact weight), and the communication network is used to broadcast the input vector to all PEs.”).) …
… transmit the MNN data to an interconnection circuit (Examiner’s note: As indicated earlier, Hamalainen teaches communication units (CUs) as the interconnecting nodes in the TUTNC architecture. Hamalainen further teaches the bus structure between the CUs and PUs form a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture, and where collectively the CUs and the bus/communication network represent an interconnection circuit (Hamalainen p.452 Figure 6 and p.451 col.1 3rd paragraph – col.2 3rd paragraph: “… the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”; and p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2).) …
… wherein the one or more slave computation circuits configured to …
… receive the one or more groups of MNN data (Examiner’s note: As indicated earlier, Hamalainen teaches the TUTNC architecture supporting two parallelism modes (node parallelism and weight parallelism), where in both modes the receiving PEs (“one or more slave computation circuits”) receive data corresponding to the input vector and weight inputs (“one or more groups of MNN data”) (Hamalainen p.449 col.1 3rd paragraph - col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).) …
… calculate one or more groups of slave output values (Examiner’s note: Hamalainen teaches for both parallelism modes, the receiving PEs (“one or more slave computation circuits”) perform a neural network computation involving a calculation of an intermediate output $y_i$ at each neural network layer, where the intermediate output $y_i$ corresponds to “one or more groups of slave output values” (Hamalainen p.449 col.1 2nd paragraph: “… consider a mapping of the layered perceptron neural network using node and weight parallel mapping styles. The task of the perceptron is to calculate a thresholded output from the sum of weighted inputs $y_i = f(\sum_i w_{ji} x_i)$, where $y_i$ is the output of the ith neuron in a layer, $w_{ji}$ denotes a weight and $x_i$ is an item of the input vector.”; p.449 Figure 2, p.450 Figure 3; and p.451 Figure 4).) … 
… wherein the master computation circuit is further configured to: …
… calculate a merged intermediate vector (Examiner’s note: Hamalainen teaches for both parallelism modes, the root (represented by a master PE corresponding to a “master computation circuit”) receives the outputs produced by the slave PEs for each network layer i and generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, where the calculation of this intermediate vector at each network layer i corresponds to “calculating a merged intermediate vector”. Hamalainen further teaches this calculation and generation of intermediate vectors repeats until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).) … 
… generate an output vector based on the merged intermediate vector (Examiner’s note: As indicated earlier, Hamalainen teaches for both parallelism modes, the master PE at the root receives the outputs produced by the slave PEs for each network layer i and generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, where the generation of this intermediate vector at each network layer i corresponds to “calculating a merged intermediate vector”. Hamalainen further teaches this calculation and generation of intermediate vectors repeats until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network, where the calculation of the output vector at the final layer corresponds to “generate an output vector based on the merged intermediate vector” (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).) …
… a controller circuit configured to transmit one or more micro-instructions to the master computation circuit and the one or more slave computation circuits (Examiner’s note: As indicated earlier, Hamalainen teaches the CUs and the C, A, and D-bus/communication network collectively representing an interconnection circuit (Hamalainen p.452 Figure 6 and p.451 col.1 3rd paragraph – col.2 3rd paragraph; and p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2). Hamalainen further teaches a control unit (CRU) within a PU connecting the C, A, and D-buses to other PUs and CUs via the internal data and address buses, where this communication network effectively transmits data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), including the master PE at the root. A person having ordinary skill in the art would understand that the commands transmitted to a DSP on this bus/communication network represent micro-instructions, and hence this CRU corresponds to “a controller circuit configured to transmit one or more micro-instructions to the master computation circuit and the one or more slave computation circuits” (Hamalainen p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. The CRU is also connected to the C-, A, and D-buses. … The main function of the CRU is to decode commands from the C-bus, arbitrate memory accesses, and control the operation of the DSP.”).).  
While Hamalainen teaches a DSP circuit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform fixed point arithmetic (Hamalainen p.455 col.2 Processing Unit, 1st paragraph – p.456 col.1 1st paragraph), Hamalainen does not explicitly teach
… wherein at least a portion of the input data and the weight values represents one or more predetermined discrete values …
… calculate one or more groups of slave output values based on a data type of each of the one or more groups of MNN data …
… calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data …
Henry teaches
… wherein at least a portion of the input data and the weight values represents one or more predetermined discrete values (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0007], the term “discrete values” is used in an example to describe “… discrete values (e.g., -1, -1/8, 1/8, and 1)”, where these values of -1, -1/8, 1/8, and 1 broadly recite integer or fixed-point values, and hence in the context of this limitation, this limitation broadly recites that the input data and the weight values are instances of “predetermined discrete values”, where the term “predetermined” broadly recites an association of those discrete values as being input data and weight values for training a neural network. Henry teaches a processing core within a multi-core processor, where the processing core contains a neural network unit (NNU) that includes a plurality of neural processing units (NPUs), where the NNU (corresponding to a “computation circuit”) contains data and weight RAM that stores input data and weight data (where a sequencer assigns and selects each row of weight or data RAM to each NPU within the NNU, and as such this assignment of data in the respective RAMS to respective NPUs corresponds to the aspect of “predetermining” the input data and weight values). 
Henry further teaches each NPU (representing neurons in a neural network performing arithmetic operations, respectively corresponding to a “master computation circuit” and “one or more slave computation circuits”) performing arithmetic operations such as add, multiply, accumulate operations on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU), where the ALU supports arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations). Hence, the operations performed within each NPU on dedicated input data and weight values in corresponding RAM locations in the NNU correspond to “wherein at least a portion of the input data and the weight values represents predetermined discrete values” (Henry Figure 1 and [0051]: “The NNU 121 includes a weight random access memory (RAM) 124, a data RAM 122, N neural processing units (NPUs) 126 … a sequencer 128 … The weight RAM 124 is arranged as W rows of N weight words, and the data RAM 122 is arranged as D rows of N data words. ”; [0054]: “The sequencer 128 also generates a memory address 125 and a read command for provision to the weight RAM 124 to select one of the W rows of N weight words for provision to the N NPUs 126. 
… The sequencer 128 also generates a memory address 123 and a write command for provision to the data RAM 122 to select one of the D rows of N data words for writing from the N NPUs 126.”; Figure 2 and [0059]-[0060]: “… the NPU 126 … is configured to (1) receive an input value from each neuron having a connection to it … (2) multiply each input value by a corresponding weight value associated with the connection … (3) add all the products to generate a sum; and (4) perform an activation function on the sum to generate the output of the neuron … The NPU 126 includes a register 205, a 2-input multiplexed register (mux-reg) 208, an arithmetic logic unit (ALU) 204 … The register 205 receives a weight word 206 from the weight RAM 124 …”, [0062]: “… although the weight word 203 and the data word 209 are the same size (in bits), they may have different binary point locations … Preferably, the multiplier 242 and adder 244 are integer multipliers and adders … However, it should be understood that in other embodiments the ALU 204 performs floating-point operations.”, and [0064]).) …
… calculate one or more groups of slave output values based on a data type of each of the one or more groups of MNN data (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites calculating one or more groups of output values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU), where the ALU is controlled by a control register that supports arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, such that each of these multiply-accumulate operations of input data and weight values represent one or more groups of slave output values based on integer, fixed-point, or floating-point operations (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).) …
… calculate a merged intermediate vector based on the data type of each of the one or more groups of MNN data (Examiner’s note: As indicated earlier, Henry teaches each NPU performing arithmetic operations on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU), where the ALU is controlled by a control register that supports arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, such that each of these sums of multiply-accumulate operations of input data and weight values at each neural network layer represents a merged intermediate vector based on integer, fixed-point, or floating-point operations (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).) …
Hamalainen and Henry are analogous art since both teach performing neural network computations using neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art to substitute the DSP circuitry (found within each of the master and slave PEs) taught in Hamalainen with the NPU/ALU circuitry that performs arithmetic logic unit functions taught in Henry, since the NPU/ALU circuitry taught in Henry provides the same integer, fixed-point, and floating-point operations required for performing neural network data computations and would produce the same predictable computation results.
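For illustration only, the data-type-dependent computation discussed in the Claim 1 analysis above can be sketched in Python: a processing element performs a multiply-accumulate over its inputs and weights, using integer arithmetic on scaled values when the data is discrete (fixed-point) and floating-point arithmetic when the data is continuous. The names `FRAC_BITS`, `to_fixed`, `mac_fixed`, `mac_float`, and `slave_output`, as well as the choice of three fractional bits, are hypothetical and appear in neither Hamalainen nor Henry; this is a sketch under those assumptions and not a description of either reference's circuitry.

```python
from typing import Sequence

# Hypothetical choice: 3 fractional bits, so values such as 1/8 are exact.
FRAC_BITS = 3


def to_fixed(value: float) -> int:
    """Scale a real value to an integer with FRAC_BITS fractional bits."""
    return round(value * (1 << FRAC_BITS))


def mac_fixed(inputs: Sequence[int], weights: Sequence[int]) -> float:
    """Multiply-accumulate on fixed-point integers, then rescale to a real value."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w  # each product carries 2 * FRAC_BITS fractional bits
    return acc / (1 << (2 * FRAC_BITS))


def mac_float(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Multiply-accumulate using ordinary floating-point arithmetic."""
    return sum(x * w for x, w in zip(inputs, weights))


def slave_output(inputs: Sequence[float], weights: Sequence[float], data_type: str) -> float:
    """Select the arithmetic path from the (assumed) data type label."""
    if data_type == "discrete":
        return mac_fixed([to_fixed(x) for x in inputs], [to_fixed(w) for w in weights])
    if data_type == "continuous":
        return mac_float(inputs, weights)
    raise ValueError(f"unknown data type: {data_type}")
```

With inputs and weights drawn from values such as -1, -1/8, 1/8, and 1, both paths in this sketch return the same sum; the paths differ only in the arithmetic used to reach it.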
Regarding amended Claim 2, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, 
wherein the interconnection circuit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors (Examiner’s note: As indicated earlier, Hamalainen teaches a communication network, where the C-bus, A-bus, and D-bus represent command, address, and data buses that transmit data and commands between PUs and CUs within the tree shape architecture, and where collectively the CUs and the bus/communication network represent an interconnection circuit (Hamalainen p.451 col.1 3rd paragraph – col.2 3rd paragraph; p.452 Figure 6; and p.454 Figure 9, p.453 col.2 2nd paragraph – p.454 cols.1 and 2). Hamalainen additionally teaches each CU has a reduced set of arithmetic and logical functions to perform comparison, subtraction, summation operations (Hamalainen p.451 col.2 2nd paragraph). As indicated earlier, Hamalainen teaches for both parallelism modes, the root (represented by a master PE corresponding to a “master computation circuit”) receives the outputs produced by the slave PEs for each network layer i and generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, where the calculation of this intermediate vector at each network layer i corresponds to “calculating a merged intermediate vector”. Hamalainen further teaches this calculation and generation of intermediate vectors repeats until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network, such that this process involving the slave PEs and the CUs in the TUTNC architecture to calculate and generate these intermediate vectors corresponds to a process where the “interconnection circuit is configured to combine the one or more groups of slave output values to generate one or more intermediate result vectors” (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).).  
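For illustration only, the weight-parallel combination described in the note above (each PE multiplies one input element by its assigned weight, and the communication network sums the weighted inputs as an adder tree) can be sketched in Python. The function names `weighted_inputs` and `adder_tree_sum` and the pairwise, level-by-level reduction order are assumptions made for this sketch; the reference's actual CU implementation is not reproduced here.

```python
from typing import List, Sequence


def weighted_inputs(inputs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Each leaf PE multiplies its single input element by its assigned weight."""
    return [x * w for x, w in zip(inputs, weights)]


def adder_tree_sum(values: Sequence[float]) -> float:
    """Sum values pairwise, level by level, as nodes of a binary adder tree would."""
    if not values:
        return 0.0
    level = list(values)
    while len(level) > 1:
        # Each interconnecting node adds the outputs of its two children.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An unpaired value is passed up to the next level unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

In this sketch the root receives a single combined value per neuron; repeating the reduction for each neuron of a layer would yield one intermediate result vector for that layer.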
Regarding amended Claim 3, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, 
wherein the one or more slave computation circuits are configured to parallelly calculate the one or more groups of slave output values based on the input data and the weight values (Examiner’s note: As indicated earlier, Hamalainen teaches the TUTNC architecture supporting two parallelism modes, with both providing input and weight data through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the PEs (weight parallelism), where in both modes the master PE receives collectively the input vector and weight inputs as “one or more groups of MNN data”, and where the receiving PEs multiply the input with a weight in a parallel fashion, such that the output of this multiply operation represents an output value. As indicated earlier, Hamalainen teaches for both parallelism modes, the receiving PEs (“one or more slave computation circuits”) perform a neural network computation involving this multiplication calculation of an intermediate output $y_i$ at each neural network layer, and as such, this intermediate output $y_i$ corresponds to “one or more groups of slave output values based on the input data and the weight values” (Hamalainen p.449 Figure 2; p.449 col.1 2nd paragraph; p.449 col.1 3rd paragraph - col.2 2nd paragraph; p.450 Figure 3; and p.451 Figure 4).).  
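For illustration only, the node-parallel calculation described in the note above (the same input vector is broadcast to every PE, and each PE computes one neuron's thresholded sum of weighted inputs) can be sketched in Python. The sketch writes the output as y_j = f(Σ_i w_ji x_i), with j indexing the neuron, which is an assumption about the intended indexing of the quoted formula. The names `neuron_output` and `layer_forward` and the use of a thread pool to stand in for parallel PEs are likewise hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def sigmoid(s: float) -> float:
    """Sigmoid thresholding function."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(weights_row: Sequence[float], x: Sequence[float],
                  f: Callable[[float], float] = sigmoid) -> float:
    """One PE: threshold the sum of weighted inputs for a single neuron."""
    return f(sum(w * xi for w, xi in zip(weights_row, x)))


def layer_forward(weight_rows: Sequence[Sequence[float]], x: Sequence[float],
                  f: Callable[[float], float] = sigmoid) -> List[float]:
    """Broadcast x to one worker per neuron and collect the outputs in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(weight_rows))) as pool:
        return list(pool.map(lambda row: neuron_output(row, x, f), weight_rows))
```

The list returned by `layer_forward` plays the role of the per-layer outputs collected at the root, which could then be supplied as the input vector for the next layer.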
Regarding amended Claim 4, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, 
wherein the master computation circuit is configured to perform one operation selected from a group consisting of: 
adding a bias value to the merged intermediate vector; 
activating the merged intermediate vector with an activation function (Examiner’s note: Hamalainen teaches performing thresholding of the output from the PEs using a non-linear function (Hamalainen p.449 col.1 2nd paragraph: “ … Thresholding is usually performed with a nonlinear function, e.g., sigmoid or hyperbolic tangent.”). Henry further teaches that these non-linear functions are used as activation functions to normalize the accumulated sum of products at each neural network layer, where this accumulated sum of products at each neural network layer represents the merged intermediate vector that is calculated by the neurons at each network layer (Henry [0064]: “Generally speaking, the activation function in a neuron of an intermediate layer of an artificial neural network may serve to normalize the accumulated sum of products, preferably in a non-linear fashion. … The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify).”).) … 
… wherein the activation function is a function selected from the group consisting of non-linear sigmoid, tanh, relu, and softmax (Examiner’s note: As indicated earlier, Hamalainen teaches non-linear functions (sigmoid, hyperbolic tangent) performing activation thresholding. Henry additionally teaches additional activation functions (Henry [0064]: “Generally speaking, the activation function in a neuron of an intermediate layer of an artificial neural network may serve to normalize the accumulated sum of products, preferably in a non-linear fashion. … The activation functions may include, but are not limited to, a step function, a rectify function, a sigmoid function, a hyperbolic tangent (tanh) function and a softplus function (also referred to as smooth rectify).”).) … 
outputting a predetermined value based on a comparison between the merged intermediate vector and a random number; and 
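For illustration only, the activation functions named in the Claim 4 analysis above (step, rectify, sigmoid, hyperbolic tangent, and softplus) can be sketched in Python as functions applied element-wise to an accumulated vector. The names `ACTIVATIONS` and `activate`, and the convention that the step function returns 1 at zero, are assumptions made for this sketch and are not taken from either reference.

```python
import math
from typing import Callable, Dict, List, Sequence


def step(s: float) -> float:
    """Step function; returning 1.0 at s == 0 is an assumed convention."""
    return 1.0 if s >= 0 else 0.0


def rectify(s: float) -> float:
    """Rectify function: negative values are clamped to zero."""
    return max(0.0, s)


def sigmoid(s: float) -> float:
    """Sigmoid function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def softplus(s: float) -> float:
    """Softplus (smooth rectify): log(1 + e^s). Not guarded against overflow."""
    return math.log1p(math.exp(s))


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "step": step,
    "rectify": rectify,
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "softplus": softplus,
}


def activate(vector: Sequence[float], name: str) -> List[float]:
    """Apply the named activation function to every element of the vector."""
    f = ACTIVATIONS[name]
    return [f(v) for v in vector]
```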
Regarding amended Claim 5, 
 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, 
wherein the interconnection circuit is connected to the master computation circuit and the one or more slave computation circuits (Examiner’s note: As indicated earlier, Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (“a master computation circuit”) and leaves/slave PEs (“one or more slave computation circuits”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element (corresponding to “a computation circuit”). Hamalainen further teaches these elements in the TUTNC architecture are implemented as processing units (PUs) and communication units (CUs), where each of these elements are implemented with FPGA chips (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st paragraph – p.449 col.1 1st paragraph; and p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 2nd paragraph: TUTNC Implementation). As indicated earlier, Hamalainen teaches a bus/communication network representing an interconnection circuit connecting the PUs, and hence corresponding to “the interconnection circuit is connected to the master computation circuit and the one or more slave computation circuits” (Hamalainen p.452 Figure 6, p.451 col.1 3rd paragraph – col.2 3rd paragraph: “…the interconnecting nodes in the trunk of the tree are called communication units (CUs) … The CUs are routing switch elements with a reduced set of arithmetic and logical functions. …”; p.452 Figure 6; and p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2).) …
… exchanges data between the master computation circuit and the one or more slave computation circuits, wherein the data includes the one or more groups of MNN data, (Examiner’s note: As indicated earlier, this limitation exhibits a 112(a) lack of written description issue, and hence, for purposes of examination, the identified new matter (“… the data type of each of the one or more groups of MNN data …”) will not be given any patentable weight. Under its broadest reasonable interpretation, this limitation broadly recites some form of data communication that includes the input vector and weight data and the calculated slave output values between the master computation circuit and the one or more slave computation circuits. As indicated earlier, Hamalainen teaches the TUTNC architecture receiving input vector and weight data, where the input vector data broadcasted by the master PE to the slave PEs determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector x traversing through each PE represents both the input data and the weight data assigned to each PE, where the broadcast corresponds to data communication that “… exchanges data between the master computation circuit and the one or more slave computation circuits, wherein the data include one or more groups of MNN data”. As indicated earlier, Hamalainen teaches for both parallelism modes, the receiving PEs (“one or more slave computation circuits”) perform a neural network computation involving a calculation of an intermediate output y_i at each neural network layer, where the intermediate output y_i corresponds to “one or more groups of slave output values”, and these intermediate output values produced by the slave PEs are received by the root (represented by a master PE corresponding to a “master computation circuit”), and for each network layer i the root/master PE generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, such that the calculation of this intermediate vector at each network layer i requires the input vector, weight data, and the intermediate output values being exchanged between the master and slave PEs (Hamalainen p.449 Figure 2; p.449 col.1 2nd paragraph; p.449 col.1 3rd paragraph - col.2 2nd paragraph; p.449 col.2 2nd paragraph – p.450 col.1 2nd paragraph; p.450 Figure 3; and p.451 Figure 4).).
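For illustration only, the master/slave data exchange described in the note above can be sketched as follows. This sketch is not drawn from Hamalainen or Henry; the function names (slave_output, master_gather, forward) and the use of a plain weighted sum for each slave are assumptions made solely to show the broadcast-and-gather pattern.

```python
def slave_output(weights, x):
    """One slave PE (hypothetical model): weighted sum y_i of the broadcast input vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def master_gather(weight_rows, x):
    """Master PE (hypothetical model): broadcast x to every slave and gather each y_i.

    The gathered list plays the role of the intermediate vector that is
    stored for use as the input to the next layer.
    """
    return [slave_output(row, x) for row in weight_rows]


def forward(layers, x):
    """Repeat the broadcast/gather exchange once per layer (layer i feeds layer i+1)."""
    for weight_rows in layers:
        x = master_gather(weight_rows, x)
    return x
```

In this sketch each layer requires the input vector, the weight data, and the slave output values to pass between the master and the slaves, which is the exchange the note maps onto the claimed limitation.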
Regarding amended Claim 6, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, wherein the master computation circuit includes: 
a master neuron caching circuit configured to temporarily store the input data and the output vector (Examiner’s note: Henry teaches performing data transfers between the NNU and a memory subsystem that includes a memory management unit and associated cache memory, where the memory management unit performs logical separation and management of the cache memory hierarchy to different NPUs within the NNU to store input data, weight data, and merged intermediate and output vectors. Hence the combination of Hamalainen and Henry teaches this cache memory hierarchy (where this cache memory hierarchy is associated with each NPU performing the arithmetic processing functionality of master and slave PEs) represents circuitry corresponding to “a master neuron caching circuit configured to temporarily store the input data and the output vector” (Henry Figure 1 and [0051]: “ … the memory subsystem 114 includes a memory management unit (not shown), which may include … a level-I data cache (and the instruction cache 102), a level-2 unified cache, and a bus interface unit that interfaces the processor 100 to system memory … the processor 100 of FIG. 1 is representative of a processing core that is one of multiple processing cores in a multi-core processor that share a last-level cache memory. …”; and [0057]: “ … the large memory hierarchy of the memory subsystem 114, including the cache memories, provides very high data bandwidth for the transfers between the system memory and the NNU 121. … the memory subsystem 114 includes hardware data prefetchers that track memory access patterns, such as loads of neural data and weights from system memory, and perform data prefetches into the cache hierarchy to facilitate high bandwidth and low latency transfers to the weight RAM 124 and data RAM 122.”).); and 
a master computation circuit configured to perform one of one or more operations that corresponds to the data type of each of the one or more groups of MNN data (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites performing one or more operations corresponding to values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Hamalainen teaches a communication network transmitting data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), and the master PE at the root (“master computation circuit”), where the DSP present at each PU (where these PUs represent the master and slave PEs) performing arithmetic logical operations corresponds to “a master computation circuit configured to perform one or more operations …” (Hamalainen p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph). Henry teaches a control register for the NNU containing hardware configuration that informs each NPU within the NNU the configuration for supporting arithmetic operations involving signed values, decimal point values, rounding, and bit shifting, all of which can be used to support integer, fixed-point, and floating-point values representing both discrete and continuous data for both input data and weight data (corresponding to a “data type for each of the one or more groups of MNN data”). The control signals for the control register are generated by the sequencer on the NNU or through NPU micro-instructions stored in media registers. Collectively, the media registers, the control register, and the sequencer determine the data type for the received data at the master computation circuit (Henry Figure 1 and Figure 29A; [0223]-[0235]; [0055]; [0235]). As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence, the combination of Hamalainen and Henry, where these arithmetic operations are performed in an ALU to generate sums of multiply-accumulate operations of input data and weight values at each neural network layer (where the ALU has the capability to handle integer, fixed-point, and floating-point operations), represents circuitry performing one or more operations that corresponds to the data type of each of the one or more groups of MNN data (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).),
wherein the one or more operations at least include:
arithmetic operations for the discrete values (Examiner’s note: As indicated earlier, Henry teaches each NPU performing arithmetic operations on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU), where the ALU is controlled by a control register that supports arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence the arithmetic operations in the ALU for integer and fixed-point values correspond to arithmetic operations for the discrete values (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).), and
bit manipulation operations for hybrid data that include both the discrete values and continuous values (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites bit manipulation operations performed for both discrete and continuous values. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Henry further teaches the ALU control register contains a shift amount field that controls the number of right shifts in the accumulator register to divide by a power of two, where this shift amount field corresponds to bit manipulation operations performed in the ALU that supports integer and fixed-point (“discrete data”), and floating-point operations (“continuous data”) (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]: “… The shift amount 2944 specifies a number of bits that a shifter of the AFU 212 shifts the accumulator 202 value 217 right to accomplish a divide by a power of two …”).).  
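For illustration only, the divide-by-a-power-of-two effect of a right shift, as referenced in the quoted passage of Henry, can be sketched as follows. This sketch does not reproduce Henry's hardware; the helper names (shift_right_divide, to_fixed, from_fixed) and the 4-bit fractional format are assumptions chosen only to show how one shift serves both integer and fixed-point values.

```python
def shift_right_divide(accumulator: int, shift_amount: int) -> int:
    """Divide an integer accumulator value by 2**shift_amount using a right shift."""
    if shift_amount < 0:
        raise ValueError("shift_amount must be non-negative")
    # Python's >> is an arithmetic shift, so negative values round toward -infinity.
    return accumulator >> shift_amount


def to_fixed(value: float, frac_bits: int) -> int:
    """Encode a real value as a fixed-point integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw: int, frac_bits: int) -> float:
    """Decode a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)
```

For example, shifting the fixed-point encoding of 2.5 (with 4 fractional bits) right by one bit yields the encoding of 1.25, the same operation that halves a plain integer.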
Regarding amended Claim 7, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, wherein the master computation circuit includes 
a master data dependency relationship determination circuit configured to prevent a micro-instruction from being executed based on a determination that a conflict exists between the micro-instruction and other micro-instructions (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that logically manages the flow of micro-instructions within a processing element. As indicated earlier, Henry teaches a stalling mechanism within each NPU, where the stalling mechanism prevents the NPU from reading the weight RAM in order to enable a buffer to properly write into the weight RAM. Henry further teaches this stalling mechanism is implemented using a rename unit, associated reservation stations, and the NPU pipeline (Henry Figure 1 and [0046]-[0050]; Figure 34 and [0273]) such that collectively the above recited circuitry corresponds to a “master data dependency relationship determination circuit”. Hence the combination of Hamalainen and Henry teaches this stalling mechanism (where this stalling mechanism is present in each NPU performing the arithmetic processing functionality of each master and slave PE, and where this stalling mechanism preserves and allows the sequential ordering of reading and writing operations) represents circuitry corresponding to a “master data dependency relationship determination circuit configured to prevent a micro-instruction from being executed based on a determination that a conflict exists between the micro-instruction and other micro-instructions” (Henry [0083]: “ … multiple clock cycles are required to read the data words and weight words from the data RAM 122 and weight RAM 124 to perform the multiply-accumulate instruction at address 1 of FIG. 4; however, the data RAM 122 and weight RAM 124 and NPUs 126 are pipelined such that once the first multiply-accumulate operation is begun (e.g., as shown during clock 1 of FIG. 5), the subsequent multiply accumulate operations (e.g., as shown during clocks 2-512) are begun in successive clock cycles. Preferably, the NPUs 126 may briefly stall in response to an access of the data RAM 122 and/or weight RAM 124 by an architectural instruction, e.g., MTNN or MFNN instruction (described below with respect to FIGS. 14 and 15) or a microinstruction into which the architectural instructions are translated.”; and [0099]: “ … assuming an embodiment that includes a write and read buffer such as the buffer 1704 of FIG. 17, concurrently with the NPU 126 reads, the processor 100 writes the weight RAM 124 such that the buffer 1704 performs one write to the weight RAM 124 approximately every 16 clock cycles to write the weight words. Thus, in a single-ported embodiment of the weight RAM 124 (such as described with respect to FIG. 17), approximately every 16 clock cycles, the NPUs 126 must be stalled from reading the weight RAM 124 to enable the buffer 1704 to write the weight RAM 124.”).).
Regarding amended Claim 8, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 1, wherein each of the slave computation circuit includes a slave computation circuit configured to 
receive one or more groups of micro-instructions from the controller circuit (Examiner’s note: As indicated earlier, Hamalainen teaches the TUTNC architecture receiving input vector and weight data, where the input vector data broadcasted by the master PE (“a master computation circuit”) to the slave PEs (“one or more slave computation circuits”) determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector x traversing through each PE represents both the input data and the weight data assigned to each PE. As indicated earlier, Hamalainen teaches the master and slave PEs are interconnected by a series of interconnecting nodes and a communication network transmitting data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), including the master PE at the root. A person having ordinary skill in the art would understand that the commands transmitted to and from the DSP on this bus/communication network represent micro-instructions, and hence the commands transmitted on this bus/communication network to the slave PEs/PU elements correspond to “receive one or more groups of micro-instructions from the controller circuit” (Hamalainen p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph). Henry additionally teaches translated architectural instructions (parallel move instructions or multiply-accumulate rotate) that represent micro-instruction commands that are sent to each NPU to perform arithmetic operations or transfer data between NPUs. Hence the combination of Hamalainen and Henry, which teaches these additional micro-instructions (where these micro-instructions are sent to each NPU circuitry representing the arithmetic processing functionality of each master and slave PE taught in Hamalainen), further teaches the slave computation circuits receiving one or more groups of micro-instructions from the controller circuit (Henry Figure 4, [0045]; [0070]-[0077]).) …
… to perform arithmetic logical operations that respectively correspond to the data type of the MNN data (Examiner’s note: As indicated earlier, Henry teaches a control register within each NPU that controls arithmetic operations involving signed values, decimal point values, rounding, and bit shifting, all of which can be used to support integer, fixed-point, and floating-point values representing both discrete and continuous data for both input data and weight data (corresponding to a “data type of the MNN data”). The control signals for the control register are generated by the sequencer on the NNU or through NPU micro-instructions stored in media registers. Collectively, the media registers, the control register, and the sequencer determine the data type for the received data at each slave computation circuit (Henry Figure 1 and Figure 29A; [0223]-[0235]; [0055]; [0235]). As indicated earlier, Henry teaches each NPU performs arithmetic operations (add, multiply, accumulate operations) on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU). Hence, these arithmetic operations performed in an ALU to generate sums of multiply-accumulate operations of input data and weight values at each neural network layer (where the ALU has the capability to handle integer, fixed-point, and floating-point operations) represent arithmetic logical operations that respectively correspond to the data type of the MNN data (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).) …
… a slave data dependency relationship determination circuit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that logically manages the flow of micro-instructions within a processing element. As indicated earlier, Henry teaches a stalling mechanism within each NPU, where the stalling mechanism prevents the NPU from reading the weight RAM in order to enable a buffer to properly write into the weight RAM. Henry further teaches this stalling mechanism is implemented using a rename unit, associated reservation stations, and the NPU pipeline (Henry Figure 1 and [0046]-[0050]; Figure 34 and [0273]) such that collectively the above recited circuitry corresponds to a “slave data dependency relationship determination circuit”. Hence the combination of Hamalainen and Henry teaches this stalling mechanism (where this stalling mechanism is present in each NPU performing the arithmetic processing functionality of each master and slave PE, and where this stalling mechanism preserves and allows the sequential ordering of reading and writing operations) represents circuitry corresponding to a “slave data dependency relationship determination circuit configured to perform data exchange operations based on a determination that no conflict exists between the data exchange operations” (Henry [0083]; and [0099]).) …
… wherein the data exchange operations include reception operations from the master computation circuit and transmission operations to the master computation circuit (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites transmitting and receiving commands to and from the master computation circuit. As indicated earlier, Hamalainen teaches the TUTNC architecture receiving input vector and weight data, where the input vector data is broadcast by the master PE (“a master computation circuit”) to the slave PEs (“one or more slave computation circuits”). Hamalainen further teaches these elements are implemented as processing units (PUs) and communication units (CUs), with a communication network interconnecting the PUs and CUs transmitting data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), including the master PE at the root. A person having ordinary skill in the art would understand that the commands transmitted to and from the DSP on this bus/communication network represent micro-instructions, and hence the commands transmitted on this bus/communication network correspond to “data exchange operations include reception operations from the master computation circuit and transmission operations to the master computation circuit” (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st paragraph – p.449 col.1 1st paragraph; p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 3rd paragraph: TUTNC Implementation; p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2; and p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph).) …
… a slave neuron caching circuit configured to temporarily store the input data and the slave output values (Examiner’s note: As indicated earlier, Henry teaches performing data transfers between the NNU and a memory subsystem that includes a memory management unit and associated cache memory, where the memory management unit performs logical separation and management of the cache memory hierarchy to different NPUs within the NNU to store input data, weight data, and merged intermediate and output vectors. Hence the combination of Hamalainen and Henry that teaches this cache memory hierarchy (where this cache memory hierarchy is associated with each NPU performing the arithmetic processing functionality of each master and slave PE) represents circuitry corresponding to “a slave neuron caching circuit configured to temporarily store the input data and the slave output values” (Henry Figure 1 and [0051]; and [0057]).); and 
a weight value caching circuit configured to temporarily store the weight values (Examiner’s note: As indicated earlier, Henry teaches performing data transfers between the NNU and a memory subsystem that includes a memory management unit and associated cache memory, where the memory management unit performs logical separation and management of the cache memory hierarchy to different NPUs within the NNU to store input data, weight data, and merged intermediate and output vectors. Hence the combination of Hamalainen and Henry that teaches this cache memory hierarchy (where this cache memory hierarchy is associated with each NPU performing the arithmetic processing functionality of each master and slave PE) represents circuitry corresponding to “a weight value caching circuit configured to temporarily store the weight values” (Henry Figure 1 and [0051]; and [0057]).).
Regarding amended Claim 9, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 8, wherein the slave data dependency relationship determination circuit is configured to: 
determine whether there is a dependent relationship between a first micro-instruction which has not been executed and a second micro-instruction which is being executed (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites determining a dependent relationship between a micro-instruction that has not been executed and another micro-instruction currently being executed. As indicated earlier, Henry teaches a stalling mechanism within each NPU, where the stalling mechanism prevents the NPU from reading the weight RAM in order to enable a buffer to properly write into the weight RAM, such that this stalling mechanism that preserves and allows the sequential ordering of reading and writing operations represents circuitry corresponding to a “slave data dependency relationship determination circuit configured to prevent a micro-instruction from being executed based on a determination that a conflict exists between the micro-instruction and other micro-instructions”, where the conflict between a micro-instruction and other micro-instructions represents a dependent relationship (Henry [0083]; and [0099]). As indicated earlier, Henry further teaches this stalling mechanism is implemented using a rename unit, associated reservation stations, and the NPU pipeline (Henry Figure 1 and [0046]-[0050]; Figure 34 and [0273]) such that collectively the above recited circuitry corresponds to a “slave data dependency relationship determination circuit”. 
Henry additionally teaches the microcode unit sending the translated microinstructions to a selector to provision the rename unit, which checks if general-purpose and media registers are free in the physical register file, and either stalls the NPU pipeline if no registers are free in the physical register file, or allows the microinstructions that are held in the reservation stations to be issued, where the rename unit that checks the status of the general purpose and media registers represents circuitry actions that correspond to “determine whether there is a dependent relationship between a first micro-instruction and a second micro-instruction being executed” (Henry [0049]-[0050]: “… the processor 100 includes a physical register file that includes more physical registers than the number of architectural registers, but does not include an architectural register file, and the reorder buffer entries do not include result storage. …The processor 100 also includes a pointer table with an associated pointer for each architectural register. For the operand of a microinstruction 105 that specifies an architectural register, the rename unit populates the destination operand field in the microinstruction 105 with a pointer to a free register in the physical register file. If no registers are free in the physical register file, the rename unit 106 stalls the pipeline. … The reservation stations 108 hold microinstructions 105 until they are ready to be issued to an execution unit 112/121 for execution. A microinstruction 105 is ready to be issued when all of its source operands are available and an execution unit 112/121 is available to execute it. The execution units 112/121 receive register source operands from the reorder buffer or the architectural register file in the first embodiment or from the physical register file in the second embodiment described above. …”).); and
if there is no dependent relationship, allow the first micro-instruction to be executed immediately[[,]]; otherwise, if the dependent relationship exists, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions (Examiner’s note: Henry additionally teaches the microcode unit sending the translated microinstructions to a selector to provision the rename unit, which checks if general-purpose and media registers are free in the physical register file, and either stalls the NPU pipeline if no registers are free in the physical register file, or allows the microinstructions that are held in the reservation stations to be issued, where the actions involving allowing the held micro-instructions in the reservation stations to be issued represent circuitry actions that correspond to “if there is no dependent relationship, allow the first micro-instruction to be executed immediately”, and the actions involving stalling the NPU pipeline represent circuitry actions that correspond to “otherwise, the first micro-instruction will not be allowed to execute until the execution of all micro-instructions is completed” (Henry [0049]-[0050]).).
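For illustration only, the issue-or-stall decision recited in this limitation can be sketched as follows. This sketch is not Henry's rename unit or reservation station logic; the MicroInstruction type, the register names, and the rule that a dependency exists whenever a pending micro-instruction touches a register written by one still executing are all assumptions used to show the general pattern.

```python
from dataclasses import dataclass


@dataclass
class MicroInstruction:
    name: str
    sources: tuple  # registers read (hypothetical names)
    dest: str       # register written


def has_dependency(pending: MicroInstruction, in_flight: list) -> bool:
    """True if pending reads or writes a register that an executing micro-instruction writes."""
    busy = {mi.dest for mi in in_flight}
    return pending.dest in busy or any(src in busy for src in pending.sources)


def try_issue(pending: MicroInstruction, in_flight: list) -> str:
    """Issue immediately when no dependency exists; otherwise hold (stall) the micro-instruction."""
    if has_dependency(pending, in_flight):
        return "stall"
    in_flight.append(pending)
    return "issue"
```

In this sketch a stalled micro-instruction would be retried after the conflicting entries leave in_flight, which mirrors the claimed behavior of waiting until the earlier micro-instructions complete.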
Regarding amended Claim 10, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 6, wherein the master computation circuit includes 
an operation determiner circuit configured to determine an operation to be performed based on the data type of the input data (Examiner’s note: As indicated earlier, Henry teaches the media registers, the control register, and the sequencer determine the data type for the received data at the NPU as well as determining the operations and functions for the input data, and hence the combination of Hamalainen and Henry teaching the media register, control register, and sequencer circuitry (where this circuitry is associated with each NPU that performs arithmetic processing functionality of each master and slave PE) represents circuitry corresponding to the “master computation circuit includes an operation determiner circuit configured to determine an operation to be performed based on the data type of the input data” (Henry Figure 1 and [0051], [0054]-[0055]: “… The sequencer 128 also generates control signals to the NPUs 126 to instruct them to perform various operations or functions, such as initialization, arithmetic/logical operations, rotate and shift operations, activation functions and write back operations, examples of which are described in more detail below (see, for example, micro-operations 3418 of FIG. 34).”; Figure 2 and [0059]-[0060], [0062], and [0064]; Figure 29A and [0223]-[0235]).); and
a hybrid data circuit configured to process the hybrid data and perform the determined operation (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that performs one or more operations corresponding to values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence the ALU performing arithmetic operations relating to integer and fixed-point values, and floating-point values using fixed point hardware-assist logic, represents circuitry that corresponds to “a hybrid data circuit configured to process the hybrid data and perform the determined operation” (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).).
Regarding amended Claim 11, 
Hamalainen in view of Henry teaches
The apparatus of claim 8, wherein the slave computation circuit includes 
an operation determiner circuit configured to determine an operation to be performed based on the data type of the input data (Examiner’s note: As indicated earlier, Henry teaches the media registers, the control register, and the sequencer determine the data type for the received data at the NPU as well as determining the operations and functions for the input data, and hence the combination of Hamalainen and Henry teaching the media register, control register, and sequencer circuitry (where this circuitry is associated with each NPU that performs arithmetic processing functionality of each master and slave PE) represents circuitry corresponding to the “slave computation circuit includes an operation determiner circuit configured to determine an operation to be performed based on the data type of the input data” (Henry Figure 1 and [0051], [0054]-[0055]: “… The sequencer 128 also generates control signals to the NPUs 126 to instruct them to perform various operations or functions, such as initialization, arithmetic/logical operations, rotate and shift operations, activation functions and write back operations, examples of which are described in more detail below (see, for example, micro-operations 3418 of FIG. 34).”; Figure 2 and [0059]-[0060], [0062], and [0064]; Figure 29A and [0223]-[0235]).); and
a hybrid data circuit configured to process hybrid data that include both discrete values and continuous values and perform the determined operation (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that performs one or more operations corresponding to values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence the ALU performing arithmetic operations relating to integer and fixed-point values, and floating-point values using fixed point hardware-assist logic, represents circuitry that corresponds to “a hybrid data circuit configured to process the hybrid data and perform the determined operation” (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).).
Regarding amended Claim 12, 
Hamalainen in view of Henry teaches
(Currently Amended) The apparatus of claim 10, wherein the master computation circuit further includes 
a data type determiner circuit configured to determine the data type of the input data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that logically manages and determines the data type of the input data. Henry teaches a control register for the NNU containing hardware configuration that informs each NPU within the NNU of the configuration for supporting arithmetic operations involving signed values, decimal point values, rounding, and bit shifting, all of which can be used to support integer, fixed-point, and floating-point values representing both discrete and continuous data for both input data and weight data (corresponding to a “data type of the input data”). The control signals for the control register are generated by the sequencer on the NNU or through NPU micro-instructions stored in media registers. Collectively, the media registers, the control register, and the sequencer logically act as a “data type determiner” for the master computation circuit (Henry Figure 1 and Figure 29A; [0223]-[0235]; [0055]).); and 
at least one of a discrete data circuit or a continuous data circuit, 
wherein the discrete data circuit is configured to process the input data based on a determination that the input data is stored as discrete values (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites calculating one or more groups of output values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic. Hence, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, and as such the ALU with the capability to handle integer and fixed-point operations corresponds to a “discrete data circuit” (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).), and 
wherein the continuous data circuit is configured to process the input data based on a determination that the input data is stored as continuous values (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites calculating one or more groups of output values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed-point hardware-assist logic. Hence, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, and as such the ALU with the capability to handle floating-point operations with fixed-point hardware-assist logic corresponds to a “continuous data circuit” (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).).  
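For illustration only, the data-type-based routing recited in Claim 12 can be sketched in software. This is a minimal Python sketch under stated assumptions; it does not represent Henry's hardware or Applicant's circuit. The function names (determine_data_type, discrete_mac, continuous_mac, process) are hypothetical, Python int stands in for discrete (integer/fixed-point) values, and Python float stands in for continuous (floating-point) values.

```python
import math


def determine_data_type(values):
    """Hypothetical 'data type determiner': 'discrete' only if every value is an int."""
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
    return "discrete" if all_ints else "continuous"


def discrete_mac(inputs, weights):
    """Multiply-accumulate using integer arithmetic only."""
    return sum(i * w for i, w in zip(inputs, weights))


def continuous_mac(inputs, weights):
    """Multiply-accumulate using floating-point arithmetic."""
    return math.fsum(float(i) * float(w) for i, w in zip(inputs, weights))


def process(inputs, weights):
    """Route the operands to the discrete or continuous path based on their type."""
    data_type = determine_data_type(list(inputs) + list(weights))
    if data_type == "discrete":
        return data_type, discrete_mac(inputs, weights)
    return data_type, continuous_mac(inputs, weights)
```

In this sketch a single non-integer operand sends the whole computation down the continuous path, which is one possible design choice and is assumed here purely for simplicity.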
Claims 13 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred to as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred to as Henry] as applied to Claim 1; in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, March 17 2016 [hereafter referred to as Miyashita].
Regarding amended Claim 13, 
Hamalainen in view of Henry as applied to Claim 1 teaches
(Currently Amended) The apparatus of claim 1, further comprising a data converter circuit configured to: 
… receive continuous data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that logically receives continuous data, which is interpreted as a continuous stream of data, where the data includes floating-point, fixed-point, and integer values. As indicated earlier, Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (“a master computation circuit”) and leaves/slave PEs (“one or more slave computation circuits”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large PE (“a computation circuit”), where data is continuously streamed into the parallel computer architecture in real-time (corresponding to the streaming aspect of “receive continuous data”), and where the collective architecture logically represents a single logical PE (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph: “… the trunk is composed of interconnecting nodes and the root may be a PE or an interface to another system. … this architecture can be referred to as a master-slave configuration, where the root acts as a master and the top PEs as slaves. The root may be only an initiator for PEs in the beginning of execution, or it may feed PEs continuously in real-time. … the communication network as a whole becomes an active 'PE'.”).) …
… transmit the discrete data to the master computation circuit and the one or more slave computation circuits (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites transmission of data representing either floating-point (“continuous”) or integer and fixed-point (“discrete”) values. As indicated earlier, Hamalainen teaches the PEs in the TUTNC architecture are implemented as processing units (PUs) and communication units (CUs), where each of these elements are implemented with FPGA chips (Hamalainen p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 2nd paragraph: TUTNC Implementation). As indicated earlier, Hamalainen teaches a communication network transmitting data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), and the master PE at the root. Hamalainen additionally teaches the DSP supports 16 bit, fixed point arithmetic operations, and as such, the transmission of data through the bus/communication network connecting the master PE and slave PEs corresponds to the transmission of data (containing discrete values) to the master computation circuit and the one or more slave computation circuits (Hamalainen p.452 Figure 6 and p.451 col.1 3rd paragraph – col.2 3rd paragraph; p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2;  p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph: “The block diagram of the PU is shown in Figure 11. The three basic parts are the control unit (CRU), the digital signal processor (DSP) and the random-access memory (RAM). They are connected together with the address bus, data bus, control signals and status signals. 
… The DSP is Texas Instruments’ TMS320C25 … Its peak multiply-accumulate performance is 20 MIPS in 16 bit, fixed point arithmetic at 40MHz operation frequency …”).).  
While Hamalainen in view of Henry teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision, floating-point aspect of “continuous data”), Hamalainen in view of Henry does not explicitly teach
… convert the continuous data to discrete data …
Miyashita teaches
… convert the continuous data to discrete data (Examiner’s note: Miyashita teaches applying a quantization method to convert input data/activation values and weight values (both from the set of real numbers and represented by floating and fixed point representations corresponding to “continuous data”), where the Quantize function performs this conversion from the floating/fixed-point representations to a logarithm representation (where this logarithm representation based on flooring the input or rounding to the nearest integer represents integer and fixed-point representations corresponding to “discrete data”). Miyashita further teaches this quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input, or rounding to the nearest integer, and as such, this quantization method using hardware arithmetic operations logically represents data converter circuitry (Miyashita p.1 Abstract; p.2 Section 3: “Each convolutional and fully-connected layer of a network performs matrix operations that distill down to dot products $y = w^{T}x$, where $x \in \mathbb{R}^{n}$ is the input, $w \in \mathbb{R}^{n}$ the weights, and y the activations before being transformed by the non-linearity (e.g., ReLU). Using conventional digital hardware, this operation is performed using n multiply-and-add operations using floating or fixed point representations as shown in Figure 1(a). …”; and p.3 Figure 1(b), pp.2-3 Section 3.1 1st-2nd paragraphs: “The first proposed method as shown in Figure 1(b) is to transform one operand to its log representation, convert the resulting transformation back to the linear domain, and multiply this by the other operand. This is simply <refer to p.3 col.1 equation (1)> where $\tilde{x}_{i} = \mathrm{Quantize}(\log_{2}(x_{i}))$, Quantize (∙) quantizes ∙ to an integer, and Bitshift (a,b) is the function that bit-shifts a value a by an integer b in fixed-point arithmetic. In floating-point, this operation is simply an addition of b with the exponent part of a. … In order to quantize, we propose two hardware-friendly flavors. The first option is to simply floor the input. This method computes ⌊log2(w)⌋ by returning the position of the first 1 bit seen from the most significant bit (MSB). The second option is to round to the nearest integer, which is more precise than the first option ...”).) …
Both Hamalainen in view of Henry and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as quantization introduces a form of compression of the data set without significant deterioration in performance of the neural network, thereby also improving the performance and efficiency of the neural network hardware architecture by allowing these computations to be performed on less complex hardware without taking up additional computational resources (Miyashita p.1 col.1 Abstract: “Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance.”; and p.1 col.2 Section 1. Introduction: “In order for these large networks to run in real-time applications such as for mobile or embedded platforms, it is often necessary to use low-precision arithmetic and apply compression techniques.”).
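For illustration only, the log-domain quantization quoted from Miyashita in the Claim 13 mapping above can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Miyashita's implementation: the function names (quantize_log2, bitshift, log_domain_dot) are hypothetical, inputs are assumed to be positive, weights are assumed to be integers standing in for fixed-point values, and the dot product is assumed to be approximated by bit-shifting each weight by the quantized exponent of the corresponding input.

```python
import math


def quantize_log2(x, mode="floor"):
    """Quantize log2(x) to an integer for x > 0, by flooring or by rounding to nearest."""
    if x <= 0:
        raise ValueError("this sketch handles positive inputs only")
    exact = math.log2(x)
    if mode == "floor":
        return math.floor(exact)
    if mode == "round":
        # floor(v + 0.5) rounds halves upward, unlike Python's built-in round().
        return math.floor(exact + 0.5)
    raise ValueError("mode must be 'floor' or 'round'")


def bitshift(a, b):
    """Shift integer a left by b bits when b >= 0, otherwise right by -b bits."""
    return a << b if b >= 0 else a >> -b


def log_domain_dot(weights, inputs, mode="floor"):
    """Approximate sum(w_i * x_i) as sum(bitshift(w_i, quantize_log2(x_i)))."""
    return sum(bitshift(w, quantize_log2(x, mode)) for w, x in zip(weights, inputs))
```

With these assumptions, an input of 6.0 quantizes to an exponent of 2 under flooring and 3 under rounding, which shows how the two quantization options differ in precision.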
Regarding amended Claim 15, 
Hamalainen in view of Henry, in further view of Miyashita teaches
(Currently Amended) The apparatus of claim 13, wherein the data converter circuit is configured to receive continuous data from an external storage device (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites an external device with separate storage transmitting continuous data into a system containing the data converter circuitry. As indicated earlier, Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (“a master computation circuit”) and leaves/slave PEs (“one or more slave computation circuits”), interconnected by a series of interconnecting nodes and a communication network (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph). Hamalainen further teaches an external host computer performing data writes to a particular PU and performing broadcasting of input data to all PUs. A person having ordinary skill in the art would understand an external host computer will have separate memory from the TUTNC architecture, and hence this external host computer transmitting data to the PUs in the TUTNC architecture corresponds to an external storage device transmitting continuous data to the TUTNC architecture. The combination of Hamalainen in view of Henry and Miyashita references as recited in dependent Claim 13 further teaches that this TUTNC architecture contains data converter circuitry to perform the conversion of continuous data to discrete data (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph: “… The host can write data to a particular PU, broadcast data to all PUs or read data from the addressed PU.”; p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph: “… The host computer can write data to the data register, where the DSP can read it. The DSP can also supply data to the data register, which can be read by the host. 
… The host can first write data and an interrupt request to the DSP, which, in turn, reads data in an interrupt service routine. The DSP can supply the data to the register and write a message to the status register, which is read by the host.”; p.452 Figure 6 and p.451 col.1 3rd paragraph – col.2 3rd paragraph; p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2; p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph; and Miyashita p.1 Abstract; p.2 Section 3; p.3 Figure 1(b) and pp.2-3 Section 3.1 1st-2nd paragraphs).).  
Regarding amended Claim 16, 
Hamalainen in view of Henry as applied to Claim 1 teaches
(Currently Amended) The apparatus of claim 1, further comprising a data converter circuit configured to: 
receive continuous data from an external storage device (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites circuitry that logically receives continuous data. As indicated earlier, Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (“a master computation circuit”) and leaves/slave PEs (“one or more slave computation circuits”), interconnected by a series of interconnecting nodes and a communication network, where the collective architecture is considered as one large PE (“a computation circuit”), where data is continuously streamed into the parallel computer architecture in real-time (corresponding to the streaming aspect of “receive continuous data”), and where the collective architecture logically represents a single logical PE (corresponding to “a data converter circuit”) (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph). As indicated earlier, Hamalainen further teaches an external host computer performing data writes to a particular PU and performing broadcasting of input data to all PUs. A person having ordinary skill in the art would understand an external host computer will have separate memory from the TUTNC architecture, and hence this external host computer transmitting data to the PUs in the TUTNC architecture corresponds to an external storage device transmitting continuous data to the TUTNC architecture (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph; and p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph).) …
… transmit the discrete data to the external storage device (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites transmission of data representing either floating-point (“continuous”) or integer and fixed-point (“discrete”) values. As indicated earlier, Hamalainen teaches the PEs in the TUTNC architecture are implemented as processing units (PUs) and communication units (CUs), where each of these elements are implemented with FPGA chips (Hamalainen p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 2nd paragraph: TUTNC Implementation). As indicated earlier, Hamalainen teaches a communication network transmitting data and commands between all PUs and CUs (i.e., to/from a DSP within a PU to perform arithmetic logical operations), and the master PE at the root. Hamalainen additionally teaches the DSP supports 16 bit, fixed point arithmetic operations, and as such, the transmission of data through the bus/communication network connecting the master PE and slave PEs corresponds to the transmission of data (containing discrete values) to the master computation circuit and the one or more slave computation circuits (Hamalainen p.452 Figure 6 and p.451 col.1 3rd paragraph – col.2 3rd paragraph; p.454 Figure 9 and p.453 col.2 2nd paragraph – p.454 cols.1 and 2;  p.455 Figure 11 and col.2 Processing Unit 1st paragraph – p.456 col.1 2nd paragraph). As indicated earlier, Hamalainen further teaches an external host computer performing data writes to a particular PU and performing broadcasting of input data to all PUs. 
A person having ordinary skill in the art would understand an external host computer will have separate memory from the TUTNC architecture, and hence this external host computer transmitting data to the PUs in the TUTNC architecture corresponds to an external storage device transmitting continuous data (containing discrete values) to the TUTNC architecture (Hamalainen p.453 col.1 Modes of Communication, 1st paragraph – p.453 col.2 1st paragraph; and p.456 col.1 3rd paragraph – p.456 col.2 1st paragraph)).  
While Hamalainen in view of Henry teaches supporting floating-point operations using fixed-point hardware assist (corresponding to the high-precision, floating-point aspect of “continuous data”), Hamalainen in view of Henry does not explicitly teach
… convert the continuous data to discrete data …
Miyashita teaches
… convert the continuous data to discrete data (Examiner’s note: As indicated earlier, Miyashita teaches applying a quantization method to convert input data/activation values and weight values (both from the set of real numbers and represented by floating and fixed point representations corresponding to “continuous data”), where the Quantize function performs this conversion from the floating/fixed-point representations to a logarithm representation (where this logarithm representation based on flooring the input or rounding to the nearest integer represents integer and fixed-point representations corresponding to “discrete data”). Miyashita further teaches this quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input, or rounding to the nearest integer, and as such, this quantization method using hardware arithmetic operations logically represents data converter circuitry (Miyashita p.1 Abstract; p.2 Section 3; and p.3 Figure 1(b), pp.2-3 Section 3.1 1st-2nd paragraphs).) …
Both Hamalainen in view of Henry and Miyashita are analogous art since they both teach performing data computations in neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the real-time input data containing high-precision values of Hamalainen in view of Henry and perform the quantization method of Miyashita as a way to convert the streaming real-time data into discrete data values for further computational processing in the neural network hardware architecture. The motivation to combine is taught in Miyashita, as provided in the prior art claim mapping of Claim 13 recited above.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over 
Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred to as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred to as Henry], in further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, March 17 2016 [hereafter referred to as Miyashita] as applied to Claim 13; in even further view of Hassner et al., U.S. Patent 5,638,065, issued 6/10/1997 [hereafter referred to as Hassner].
Regarding amended Claim 14, 
Hamalainen in view of Henry, in further view of Miyashita as applied to Claim 13 teaches
The apparatus of claim 13, wherein the data converter circuit includes 
a preprocessing circuit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data (Examiner’s note: As indicated earlier, Miyashita teaches applying a quantization method to convert input data/activation values and weight values (both from the set of real numbers and represented by floating and fixed point representations corresponding to “continuous data”), where the Quantize function performs this conversion from the floating/fixed-point representations to a logarithm representation (where this logarithm representation based on flooring the input or rounding to the nearest integer represents integer and fixed-point representations corresponding to “discrete data”). Miyashita further teaches this quantization method can be instantiated through hardware arithmetic operations such as bit-shifting and taking the floor of an input value, or rounding to the nearest integer, and as such, this quantization method using hardware arithmetic operations logically represents data converter circuitry, where taking the floor of an input value represents preprocessing done on an input to clip a portion of the input data within a predetermined range (Miyashita p.1 Abstract; p.2 Section 3; and p.3 Figure 1(b), pp.2-3 Section 3.1 1st-2nd paragraphs). Miyashita further teaches additional examples representing clipping functions, such that the instantiation of these functions in hardware corresponds to a “preprocessing circuit configured to clip a portion of the input data that is within a predetermined range to generate preprocessed data” (Miyashita p.4 Section 4.1, equations (5), (6), and (7)).) …
However, Hamalainen in view of Henry, in further view of Miyashita does not teach
a distance calculator circuit configured to calculate multiple distance values between the preprocessed data and multiple discrete values; and 
a comparer circuit configured to compare the multiple distance values to output one or more of the multiple discrete values.  
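For illustration only, the three limitations of Claim 14 recited above (clipping, distance calculation, and comparison) can be sketched as a software pipeline. This is a minimal Python sketch under stated assumptions and does not represent Applicant's circuit or any cited reference: the function names (clip, distances, nearest_level) are hypothetical, the distance is assumed to be the absolute difference, and the comparison is assumed to output the single discrete value with the smallest distance.

```python
def clip(value, low, high):
    """Limit value to the predetermined range [low, high]."""
    return max(low, min(high, value))


def distances(value, levels):
    """Absolute distance from the preprocessed value to each candidate discrete value."""
    return [abs(value - level) for level in levels]


def nearest_level(value, levels, low, high):
    """Clip the input, compute all distances, and output the closest discrete value."""
    preprocessed = clip(value, low, high)
    dists = distances(preprocessed, levels)
    # On a tie, min() keeps the first candidate in list order.
    best_index = min(range(len(levels)), key=dists.__getitem__)
    return levels[best_index]
```

For example, with candidate discrete values -1, 0, and 1 and a range of [-1, 1], an input of 5.0 is first clipped to 1 and then mapped to the discrete value 1.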
Hassner teaches
a distance calculator circuit configured to calculate multiple distance values between the preprocessed data and multiple discrete values (Examiner’s note: Hassner teaches taking vectors W and W’ containing discrete values (corresponding to preprocessed data and multiple discrete values; Hassner col.5 lines 5-23) and inputting them into filter units implementing three linear functions F1, F2, F3, and performing calculations to maximize the Euclidean distance between those vectors (where maximizing the Euclidean distance is interpreted as performing an absolute value of the Euclidean distance between each of the discrete values within the two vectors), such that these filter units that maximize the Euclidean distance between discrete values represent circuitry that corresponds to a “distance calculator circuit configured to calculate multiple distance values between the preprocessed data and multiple discrete values” (Hassner Figure 6, elements 30a, 30b; and col.5 lines 42-56: “Referring now to FIG. 6, the $W'$ symbols [$W_5$, $W_6$, $W_7$, $W_8$] are routed in parallel via a delay unit 26 and an intersymbol interference subtraction unit 27 to a set 28 of analog matched filter units 30a-30d. Delay unit 26 delays the symbols [$W_5$, $W_6$, $W_7$, $W_8$] for one symbol period of four clock cycles duration. After this delay and the subtraction of intersymbol interference from the $W'$ lookahead symbols [$W_5$, $W_6$, $W_7$, $W_8$] by unit 27 in the manner presently to be described, the lookahead symbols are transformed into W symbols [$W_1$, $W_2$
                                
                            
                        
                    ,                         
                            
                                
                                    W
                                
                                
                                    3
                                
                            
                        
                    ,                         
                            
                                
                                    W
                                
                                
                                    4
                                
                            
                        
                    ] for an updated current state of the channel. … The vectors                         
                            
                                
                                    W
                                
                                
                                    '
                                
                            
                        
                     and W are fed in parallel to the four analog matched filter units 30a-30d. These units 30a-30d calculate values of linear functions chosen to maximize Euclidean distances between vectors whose values are ambiguous in 55 order better to distinguish between them.”).); and 
a comparer circuit configured to compare the multiple distance values to output one or more of the multiple discrete values (Examiner’s note: Hassner teaches filter units containing comparator elements to perform comparisons against the vector values and the three linear functions in order to produce binary outputs 40a-40f, which are fed into a finite state machine to output a decoded symbol pattern (representing one or more of the multiple discrete values), such that these filter units containing comparator elements represent circuitry that corresponds to “a comparer circuit configured to compare the multiple distance values to output one or more of the multiple discrete values” (Hassner Figure 6, elements 28, 30a-30f, 20, 32; Figures 7A-7D, elements 36a-36f; and col.6 lines 1-16: “FIGS. 7A-7D shows in detail the binary decision outputs generated from the linear functions by the filter units 30a-30d, respectively. More specifically, units 30a-30d comprise filters 34a-34d, respectively, for implementing the linear functions F1, F2, F3. The linear functions F1, F2, F3 output from each filter 34 (e.g., 34a as shown in FIG. 7A) six comparators 36a-36f which compare the respective function values with respective identical preselected threshold values in each of the four units 30a-30d to generate three respective outputs which are ANDed at 38a-38f to provide binary outputs 40a-40f, respectively, to digital sequential finite-state machine 32. More specifically and, for example, the six outputs 40a-40f of matched filter unit 30a for states A-F, respectively, constitute the finite-state machine inputs for bit 1 of the four-bit pattern ....”).).  
Hamalainen in view of Henry, in further view of Miyashita, and Hassner are analogous art since both teach processing of discrete data in hardware.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the output from the clipping functions implemented in hardware (representing a preprocessing circuit) of Hamalainen in view of Henry, in further view of Miyashita and couple it as an input into the filter units of Hassner as a way to perform maximum-likelihood decisions (i.e., performing distance calculations between two vectors, generating one or more discrete values that most likely represents the preprocessed data) in hardware. The motivation to combine is taught in Hassner, since implementing this logic in hardware frees up the computational resources within each individual neural network processor unit to perform other complex neural network computations, thus improving the overall efficiency of the system (Hassner col.1 lines 10-16: “This invention relates to an apparatus and method for processing analog signals … and more particularly to an Apparatus and method for detecting multiple-bit symbols by (i) converting the analog signals into analog vectors using a linear Walsh transform, and making maximum-likelihood decisions using vector metric calculations which are determined by the selected run-length-limited modulation code and equalized linear channel response signal shape and are implemented by analog matched filters, analog comparators, and digital sequential finite-state machines matched to RLL-coded symbols.”).
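For illustration of the comparator arrangement quoted from Hassner col.6 lines 1-16, the decision logic can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Hassner: the function names, the coefficient rows, the threshold values, and the direction of each comparison (function value at or above threshold) are hypothetical, and each state is simplified to three comparisons whose results are ANDed.

```python
def linear_functions(w, coeffs):
    """Evaluate F_k(w) = sum_j coeffs[k][j] * w[j] for each coefficient row.

    The coefficient rows are hypothetical; Hassner selects its linear
    functions to separate otherwise ambiguous vectors.
    """
    return [sum(c * wj for c, wj in zip(row, w)) for row in coeffs]


def state_decisions(f_values, thresholds):
    """Return one binary decision per state.

    Each state has one threshold per linear function. The comparison
    direction (>=) is an assumption made only for this sketch. The
    per-function comparison results are ANDed, mirroring the AND gates
    that feed the finite-state machine.
    """
    return {
        state: all(f >= t for f, t in zip(f_values, ts))
        for state, ts in thresholds.items()
    }


# Hypothetical example: a four-component vector and three linear functions.
w = [1, 2, 3, 4]
coeffs = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
thresholds = {"A": [0, 0, 5], "B": [2, 0, 5]}
decisions = state_decisions(linear_functions(w, coeffs), thresholds)
```

In this sketch the binary decisions play the role of outputs 40a-40f, which a separate finite-state machine (not modeled here) would consume to select the decoded symbol pattern.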
Claims 17-27 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert].
Regarding amended Claim 17, 
Hamalainen teaches
(Currently Amended) A method for forward propagation of a multilayer neural network (MNN), comprising: 
receiving, by a master computation circuit of a computation circuit, one or more groups of MNN data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites physical or logical structures (“circuits”) performing the recited functions. As indicated earlier, Hamalainen teaches a tree-shaped parallel computer architecture (TUTNC) consisting of a root/master processing element (PE) (corresponding to “a master computation circuit”) and leaves/slave PEs (corresponding to “one or more slave computation circuits”), interconnected by a series of interconnecting nodes and communication network, where the collective architecture is considered as one large processing element (corresponding to “a computation circuit”). As indicated earlier, Hamalainen further teaches these elements in the TUTNC architecture are implemented as processing units (PUs) and communication units (CUs), where each of these elements are implemented with FPGA chips (Hamalainen p.448 Figure 1 and p.449 col.2 Tree Shape Architecture for Neural Computations, 1st – p.449 col.1 1st paragraph; and p.452 Figure 6 and p.451 col.1 1st paragraph – col.2 2nd paragraph: TUTNC Implementation). Hamalainen further teaches the TUTNC architecture supporting two parallelism modes (node parallelism and weight parallelism), with both providing input and weight data from the master PE through broadcasting (node parallelism) or through a distributed assignment of inputs and weights to the processing elements (weight parallelism), where in both modes the master computation circuit receives collectively the input vector and weight inputs as “one or more groups of MNN data” (Hamalainen p.449 Figure 2 and p.449 col.1 3rd paragraph - col.2 2nd paragraph; p.450 Figure 3; and p.451 Figure 4).) … 
… wherein the one or more groups of MNN data include input data and one or more weight values (Examiner’s note: As indicated earlier, Hamalainen teaches the TUTNC architecture receiving input vector and weight data, where the input vector data broadcasted by the master PE to the slave PEs determines the selection and adjustment of the weights assigned to each PE during neural network computations, such that the input vector x traversing through each PE represents both the input data and the weight data assigned to each PE, thus corresponding to “one or more groups of MNN data include input data and one or more weight values” (Hamalainen p.451 Figure 4 and p.449 col.2 2nd paragraph – p.450 col.1 2nd paragraph).) …
… calculating, by one or more slave computation circuits of the computation circuits, one or more groups of slave output values (Examiner’s note: Hamalainen teaches for both parallelism modes, the receiving PEs (“one or more slave computation circuits”) perform a neural network computation involving a calculation of an intermediate output $y_i$ at each neural network layer, where the intermediate output $y_i$ corresponds to “one or more groups of slave output values” (Hamalainen p.449 col.1 2nd paragraph; p.449 Figure 2, p.450 Figure 3; and p.451 Figure 4).)  …  
… calculating, by the master computation circuit, a merged intermediate vector (Examiner’s note: As indicated earlier, Hamalainen teaches for both parallelism modes, the root (represented by a master PE corresponding to a “master computation circuit”) receives the outputs produced by the slave PEs for each network layer i and generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, where the calculation of this intermediate vector at each network layer i corresponds to “calculating a merged intermediate vector”. Hamalainen further teaches this calculation and generation of intermediate vectors repeats until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).) … 
… generating, by the master computation circuit, an output vector based on the merged intermediate vector (Examiner’s note: As indicated earlier, Hamalainen teaches for both parallelism modes, the master PE at the root receives the outputs produced by the slave PEs for each network layer i and generates and stores an intermediate vector for use by the PEs to calculate the output for the next layer i+1, where the generation of this intermediate vector at each network layer i corresponds to “calculating a merged intermediate vector”. Hamalainen further teaches this calculation and generation of intermediate vectors repeats until the calculation of the merged intermediate vector at the final layer corresponds to an output vector for the entire neural network, where the calculation of the output vector at the final layer corresponds to “generating, by the master computation circuit, an output vector based on the merged intermediate vector” (Hamalainen p.449 col.1 3rd paragraph – col.2 2nd paragraph; p.449 Figure 2; p.450 Figure 3; and p.451 Figure 4).).  
While Hamalainen teaches a DSP circuit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform fixed-point arithmetic (Hamalainen p.455 col.2 Processing Unit, 1st paragraph – p.456 col.1 1st paragraph), Hamalainen does not explicitly teach
… wherein at least a portion of the input data and the weight values represents one or more predetermined discrete values …
… calculating … one or more groups of slave output values based on a data type of each of the one or more groups of MNN data …
… calculating … a merged intermediate vector based on the data type of each of the one or more groups of MNN data …
Henry teaches
… wherein at least a portion of the input data and the weight values represents one or more predetermined discrete values (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0007], the term “discrete values” is used in an example to describe “… discrete values (e.g., -1, -1/8, 1/8, and 1)”, where these values of -1, -1/8, 1/8, and 1 broadly recite integer or fixed-point values, and hence in the context of this limitation, this limitation broadly recites that the input data and the weight values are instances of “predetermined discrete values”, where the term “predetermined” broadly recites the source of the input data and the weight values (i.e., the storage location or assignment of these input data and weight values). Henry teaches a processing core within a multi-core processor, where the processing core contains a neural network unit (NNU) that includes a plurality of neural processing units (NPUs), where the NNU (corresponding to a “computation circuit”) contains data and weight RAM that stores input data and weight data (where a sequencer assigns and selects each row of weight or data RAM to each NPU within the NNU, thus corresponding to the aspect of “predetermining” the input data and weight values). 
Henry further teaches each NPU (representing neurons in a neural network) performing arithmetic operations (add, multiply, accumulate operations) on the singular value weight word or data word stored in the weight or data RAM using an arithmetic logic unit (ALU), where the ALU supports arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), where the floating-point operations are performed using fixed point hardware-assist logic, and as such, the operations performed within each NPU on dedicated input data and weight values in corresponding RAM locations in the NNU correspond to “wherein at least a portion of the input data and the weight values represents predetermined discrete values” (Henry Figure 1 and [0051]; [0054]; Figure 2 and [0059]-[0060], [0062], and [0064]).) …
… calculating … one or more groups of slave output values based on a data type of each of the one or more groups of MNN data (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites calculating one or more groups of output values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), and as such, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, such that each of these multiply-accumulate operations of input data and weight values represent one or more groups of slave output values based on integer, fixed-point, or floating-point operations (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).) …
… calculating … a merged intermediate vector based on the data type of each of the one or more groups of MNN data (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s original specification paragraph [0005]-[0008], continuous data is described as data involving more computational resources (such as 32-bit floating-point numbers) than discrete data, and hence in the context of this limitation, this limitation broadly recites calculating one or more groups of output values using either floating-point (“continuous”) or integer and fixed-point (“discrete”) operations. As indicated earlier, Henry teaches each NPU performing arithmetic operations involving integer operations, arithmetic operations involving same size (in bits) (interpreted as fixed-point operations), and arithmetic operations involving different binary point locations (corresponding to floating-point operations), and as such, these arithmetic operations are based on data types of each of the one or more groups of MNN data, where these arithmetic operations are performed in the context of generating sums of multiply-accumulate operations of input data and weight values at each neural network layer, such that each of these sums of multiply-accumulate operations of input data and weight values at each neural network layer represent merged intermediate vectors based on integer, fixed-point, or floating-point operations (Henry [0045]-[0046]; Figure 2 and [0059]-[0060], [0062], [0064]; [0219]-[0221]; [0223]-[0227]; and [0228]-[0231]).) …
Both Hamalainen and Henry are analogous art since they both teach performing neural network computations using neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art to substitute the DSP circuitry (found within each master and slave processing elements) taught in Hamalainen with an appropriate DSP circuitry that performs the specific arithmetic logic unit functions taught in Henry in order to provide the integer, fixed-point, and floating-point operations required for performing neural network data computations to produce the same predictable computation results for the invention.
While Hamalainen in view of Henry teaches a DSP circuit within each processing unit (Hamalainen p.455 Figure 11) with the capability to perform block moves of data (Hamalainen p.456 col.2 Processing Unit, 1st paragraph), Hamalainen in view of Henry does not explicitly teach
… receiving … from a direct memory access circuit …
Gilbert teaches
… receiving … from a direct memory access circuit (Examiner’s note: Gilbert teaches a master processor and an array of slave processor elements (where the master processor and slave processor elements are identical to each other; Gilbert col. 4 lines 36-45), where each processor element contains a DMA controller (Gilbert Figure 2, element 76; col.5 lines 33-39) connected to two SRAMs that store data and processing instructions, with the DMA controller managing data transfers into each processing element, and as such this DMA controller corresponds to circuitry performing operations including “receiving … from a direct memory access circuit” (Gilbert Figure 1, elements 16, 12, 14; Figure 2, elements 50a, 50b, 76; col.5 lines 12-15: “FIG.2 shows the internal and operational hardware of a single processor element 14. A large on-chip memory 50 includes two separate SRAMs 50a, 50b, which store data and processing instructions.” and Gilbert col.6 line 58 – col.7 line 8: “There are two general types of data transfers: input and output between a processor element and the host 20; and interprocessor communication between processor elements in the array 12. These two different data transfers share a common DMA mechanism in the hardware of each processor element and are specified by TCB data structures. In each case, one channel of the DMA hardware serves to move data into the processor element, while another channel moves data out. The transfers are carried out independently of, and non-interfering with, the computational core processor section 52 of the processor element due to its dual-ported memory. Data transfers are also "through-routed", and are stored in memory only after arriving at the destination processing element. Such transfers thus bypass the internal memory of intervening slaves when there is communication between non-adjacent slaves. In this manner, the processor element enhances throughput via concurrent computation and communication.”).), …
Both Hamalainen in view of Henry and Gilbert are analogous art since they both teach neural network hardware architectures.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the step involving the block moves of data in the DSP circuitry of Hamalainen in view of Henry with the DMA controller of Gilbert in order to support the same predictable results of performing block moves of data within the DSP circuitry. The motivation to combine is also further taught in Gilbert, since a DMA controller performs these block moves of data independently of the processing element, thereby providing concurrent data throughput and communication and hence improving the overall computational efficiency of the system (Gilbert col.6 line 63 – col.7 line 8: “In each case, one channel of the DMA hardware serves to move data into the processor element, while another channel moves data out. The transfers are carried out independently of, and non-interfering with, the computational core processor section 52 of the processor element due to its dual-ported memory. Data transfers are also "through-routed", and are stored in memory only after arriving at the destination processing element. Such transfers thus bypass the internal memory of intervening slaves when there is communication between non-adjacent slaves. In this manner, the processor element enhances throughput via concurrent computation and communication.”).
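For illustration of the master/slave data flow mapped to Hamalainen in the rejection of Claim 17 above (slave PEs computing per-layer outputs, the master PE merging them into an intermediate vector that feeds the next layer, and the final merged vector serving as the output vector), a minimal sketch follows. It is not code from Hamalainen, Henry, or Gilbert: the function names, the contiguous split of neurons across slaves, and the default tanh activation are hypothetical assumptions, and each layer is assumed to have at least one neuron.

```python
import math


def slave_compute(weight_rows, x):
    """One slave PE: weighted sums for the neurons (weight rows) assigned to it."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


def master_merge(slave_outputs):
    """Master PE: concatenate the slave outputs into one intermediate vector."""
    merged = []
    for part in slave_outputs:
        merged.extend(part)
    return merged


def forward(layers, x, num_slaves=2, activation=math.tanh):
    """Forward propagation; each layer is a non-empty list of weight rows.

    The neurons of each layer are split into contiguous chunks, one chunk
    per slave (a hypothetical assignment chosen only for this sketch).
    """
    for weights in layers:
        size = -(-len(weights) // num_slaves)  # ceiling division
        chunks = [weights[i:i + size] for i in range(0, len(weights), size)]
        partial = [slave_compute(chunk, x) for chunk in chunks]
        merged = master_merge(partial)           # merged intermediate vector
        x = [activation(v) for v in merged]      # input to the next layer
    return x                                     # output vector of the last layer


# Hypothetical two-layer network with an identity activation for readability.
layers = [[[1, 0], [0, 1], [1, 1]], [[1, 1, 1]]]
output = forward(layers, [1.0, 2.0], activation=lambda v: v)
```

The sketch models only the ordering of computation between master and slaves; data delivery by a direct memory access circuit, as addressed with respect to Gilbert, is not modeled.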
Regarding amended Claim 18, 
Claim 18 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 2, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 2, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 19, 
Claim 19 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 3, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 3, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 20, 
Claim 20 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 4, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 4, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 21, 
Claim 21 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 6, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 6, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 22, 
Claim 22 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 7, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 7, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 23, 
Claim 23 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 8, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 8, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 24, 
Claim 24 recites the method of claim 23, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 9, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 9, in view of the additional rejections provided by Gilbert from Claim 23.
Regarding amended Claim 25, 
Claim 25 recites the method of claim 21, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 10, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 10, in view of the additional rejections provided by Gilbert from Claim 21.
Regarding amended Claim 26, 
Claim 26 recites the method of claim 23, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 11, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 11, in view of the additional rejections provided by Gilbert from Claim 23.
Regarding amended Claim 27, 
Claim 27 recites the method of claim 21, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 12, and hence is rejected under similar rationale provided by Hamalainen and Henry as indicated in Claim 12, in view of the additional rejections provided by Gilbert from Claim 21.
Claims 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert] as applied to Claim 17; in even further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, March 17 2016 [hereafter referred as Miyashita].
Regarding amended Claim 28, 
Claim 28 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 13, and hence is rejected under similar rationale and motivations provided by Hamalainen, Henry, and Miyashita as indicated in Claim 13, in view of the additional rejections provided by Gilbert from Claim 17.
Regarding amended Claim 29, 
Claim 29 recites the method of claim 28, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 15, and hence is rejected under similar rationale provided by Hamalainen, Henry, and Miyashita as indicated in Claim 15, in view of the additional rejections provided by Gilbert from Claim 28.
Regarding amended Claim 30, 
Claim 30 recites the method of claim 17, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 16, and hence is rejected under similar rationale and motivations provided by Hamalainen, Henry, and Miyashita as indicated in Claim 16, in view of the additional rejections provided by Gilbert from Claim 17.
Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over 
Hamalainen et al., TUTNC: A General Purpose Parallel Computer for Neural Network Computations, October 1995 [hereafter referred as Hamalainen] in view of Henry et al., U.S. PGPUB 2017/0103306, filed 4/5/2016, provisional applications 62/239,254 filed 10/8/2015, 62/262,104 filed 12/2/2015, 62/299,191 filed 2/24/2016 [hereafter referred as Henry], in further view of Gilbert, Ira H., U.S. Patent 5,752,068, issued 5/12/1998 [hereafter referred as Gilbert], in even further view of Miyashita et al., Convolutional Neural Networks using Logarithmic Data Representation, March 17 2016 [hereafter referred as Miyashita] as applied to Claim 28; in even further view of Hassner et al., U.S. Patent 5,638,065, issued 6/10/1997 [hereafter referred as Hassner].
Regarding amended Claim 31, 
Claim 31 recites the method of claim 28, further comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 14, and hence is rejected under similar rationale and motivations provided by Hamalainen, Henry, Miyashita, and Hassner as indicated in Claim 14, in view of the additional rejections provided by Gilbert from Claim 28.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121