Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination (RCE) under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission (an after-final amendment filed on July 11, 2022) has been entered in accordance with the RCE filed on August 1, 2022.

Remarks
	This Office Action is in response to applicant’s RCE filed on August 1, 2022, under which claims 1, 4-9, 12-17, 19-22, 24-28, and 31-36 are pending and under consideration.

Response to Arguments
	Applicant’s amendments have overcome the previous claim objections. Therefore, the previous claim objections have been withdrawn.
	Applicant’s arguments directed to the § 102  and § 103 rejections have been fully considered, but are not deemed to be persuasive in distinguishing over the cited references as currently applied to the amended claims. The claims remain rejected over the previously applied references, although the rejections have been updated to account for the amended claim language.
With regard to independent claims 1, 9, 17, and 22, applicant asserts that Chang does not teach the amended claim limitation of “wherein the first and second programmable processing components are configured to compute respective first and second nonlinear functions of respective portions of the input based at least in part on respective control signals of the plurality of control signals received at the first and second programmable processing components.” In particular, applicant argues that “Chang does not anticipate or teach or suggest that the Gate sigmoid and the Gate Tanh are configured to compute $i_t$ and $\tilde{c}_t$ respectively of respective portions of the vector $x_t$ based at least in part on respective outputs of MAC units received at the Gate sigmoid and the Gate tanh…In fact, nowhere does Chang disclose computation of $i_t$ and $\tilde{c}_t$ for respective portions of the vector $x_t$, and reception of respective outputs of MAC units at the Gate sigmoid and the Gate tanh” (applicant’s remarks, page 13).  The Examiner respectfully disagrees for the following reasons.
With regard to applicant’s observation that “nowhere does Chang disclose computation of $i_t$ and $\tilde{c}_t$ for respective portions of the vector $x_t$,” the current rejection does not rely on different sub-portions of the vector $x_t$ to satisfy the limitation of “respective portions of the input.” Instead, the instance of vector $x_t$ received by the Gate sigmoid and the instance of the vector $x_t$ received by the Gate tanh collectively correspond to the “input,” and these respective instances further correspond to the “respective portions” of such input. Chang, FIG. 5 teaches a “Gate sigmoid” block and a “Gate tanh” block, each of which has the general structure shown in FIG. 3. FIG. 3 shows that each gate block receives the vector $x_t$ as input. Therefore, across two gate blocks, there are two instances of the vector $x_t$, and each instance of vector $x_t$ corresponds to one of the “respective portions of the input.” Therefore, applicant’s observations on the vector $x_t$ do not distinguish over Chang as currently applied.
The current claim language of “respective portions of the input” does not require any specific content for the respective portions, nor does it require any specific methodology for obtaining the respective portions. For example, the current claim language does not require the respective portions of the input to have different content, and does not require the respective portions to be different sub-parts of something that was originally a particular vector. Therefore, the current claim language does not distinguish over Chang.
With regard to applicant’s observation that Chang does not teach computation of $i_t$ and $\tilde{c}_t$ based on “respective outputs of MAC units,” and does not teach “reception of respective outputs of MAC units at the Gate sigmoid and the Gate tanh,” the Examiner respectfully disagrees. Referring to the claim language of “to compute…based at least in part on respective control signals of the plurality of control signals received at the first and second programmable processing components,” the element of “respective” control signals is met because the structure shown in FIG. 3 of Chang applies to both the Gate sigmoid and the Gate tanh. Since each gate module has the structure shown in FIG. 3, each gate module has its own set of MAC outputs. Therefore, Chang meets the instant claim limitations. Furthermore, claim 1 does not require the control signals to be received from a source external to the first and second programmable processing components, but only requires that the control signals be received at some part of the first and second programmable processing components. Therefore, the outputs of the MAC units in Chang constitute control signals received at the gate modules.
	Furthermore, with regard to the independent claims, on page 13 of applicant's response, applicant also argues that Chang fails to teach “simultaneous outputs.” The Examiner respectfully disagrees. As addressed in the previous advisory action and further discussed in the rejections below, the $i_t$ and $\tilde{c}_t$ values that respectively use the “sigmoid” and “tanh” functions are computed simultaneously because they are computed in parallel during the same stage of LSTM computation (see Chang 4th page, first two paragraphs). Thus, the computed $i_t$ and $\tilde{c}_t$ are considered to be “simultaneous outputs.” Furthermore, Chang teaches an “inference” during use of the trained model, and this inference is based on the $i_t$ and $\tilde{c}_t$ values that were computed simultaneously.
Therefore, applicant's arguments directed to the independent claims are not persuasive.
With regard to dependent claims 8, 16, and 28, on page 18 of applicant's response, applicant argues that Brothers (specifically, paragraphs 89 and 92 thereof) does not teach the feature that bit D1 is set to compute a first/second nonlinear function based on a control signal received by the AFU. Applicant also argues that the computation of the first/second nonlinear function does not use the configuration setting of either bit D0 or bit D1. These arguments are not persuasive. As addressed in the previous advisory action and further discussed in the rejections below, paragraph 89 of Brothers teaches that bit D1 causes the selection of a particular activation function (analogous to the first/second nonlinear function). This selection causes the selected activation function to be computed in the operations of AFU 514. Thus, the bit D1 is used for the computation of different activation functions, analogous to the first and second nonlinear functions. Note that the specific features of the first and second nonlinear functions are already taught by Chang's activation functions, and that Brothers was cited for the concept of switching between different activation functions. Thus, the combination of Chang and Brothers teaches each and every limitation of claim 8, and similar dependent claims.
Applicant’s remarks on the remaining dependent claims rely on the remarks for the independent claims. Therefore, these remarks do not overcome the rejections of those remaining dependent claims.  

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 4-9, 12-17, 19-22, 24-28, and 31-36 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	Claims 1, 9, 17, and 22, which initially recite “…to compute a first nonlinear function…to compute a second nonlinear function, the first nonlinear function being different than the second nonlinear function” (see first paragraph of claim 1), recite the following limitations that are indefinite:  
“the first and second programmable processing components are configured to compute respective first and second nonlinear functions” (see second paragraph of claim 1), and 
“based at least in part on simultaneous outputs of the first programmable processing component that computes the first nonlinear function, and the second programmable processing component that computes the second nonlinear function” (see third paragraph of claim 1). 
These limitations are indefinite because it is unclear whether the “respective first and second nonlinear functions” are the same as the previously recited “a first nonlinear function” and “a second nonlinear function,” due to the lack of the term “the” in front of “respective.” Thus, it is also unclear whether the expressions “the first nonlinear function” and “the second nonlinear function” refer to both of the above or just one of the above.
	This part of the rejection can be overcome by amending “respective first and second nonlinear functions” to “the respective first and second nonlinear functions.” For purposes of examination, the claim has been interpreted to have the meaning of the suggested revision.
In claims 8, 16, and 28, the phrase “setting at least one switch of at least one of the first programmable processing component to compute the first nonlinear function or the second programmable processing component to compute the second nonlinear function based at least in part on the received plurality of control signals” is indefinite because of a grammatical ambiguity. It is unclear whether “to compute the first nonlinear function” and “to compute the second nonlinear function” are merely modifiers of the nouns “the first programmable processing component” and “second programmable processing component” or instead specify the result of the act of “setting.” In other words, it is unclear whether the meaning of the claim is:
(1)	“setting at least one switch of at least one of the first programmable processing component configured to compute the first nonlinear function or the second programmable processing component configured to compute the second nonlinear function, wherein the at least one switch is set based at least in part on the received plurality of control signals” (i.e., “to compute the first/second nonlinear function” merely modifies the nouns “the first/second programmable processing component” in the manner of an adjective phrase); or
(2)	“setting at least one switch of at least one of the first programmable processing component or the second programmable processing component, wherein the at least one switch is set so that the first programmable processing component computes the first nonlinear function and/or so that the second programmable processing component computes the second nonlinear function, and wherein the at least one switch is set based at least in part on the received plurality of control signals” (i.e., “to compute the first/second nonlinear function” specifies the result of the action of “setting”).
Since the current claim language is unclear as to which of the two meanings described above is appropriate, claims 8, 16, and 28 are therefore indefinite.
Based on the specification’s disclosure, the Examiner believes that the second meaning described above could have been intended. However, the current claim language does not express the second meaning in a clear form. Here, “at least one switch of at least one of the first programmable processing component…or the second programmable processing component…” is a single noun phrase that serves as the object of the gerund “setting”; thus, placing the phrases “to compute the first nonlinear function” and “to compute the second nonlinear function” within that noun phrase results in a confusing grammatical form. From another perspective, the “first programmable processing component” and “second programmable processing component” form an alternate expression that defines “switch,” but “to compute the first nonlinear function” and “to compute the second nonlinear function” form a separate alternate expression that defines the result of “setting.” Thus, interweaving the two expressions results in a confusing grammatical form. Therefore, if (2) is the intended meaning, the claim language should be revised to clarify the intended meaning.
This part of the rejection can be overcome by amending the claim language, such as in the manner described in item (2) listed above, if such is the intended meaning. For purposes of examination, the claim language has been interpreted to have meaning (2) described above. 
In claim 31, the phrase “wherein the first programmable processing component to compute the first nonlinear function and the second programmable processing component to compute the second nonlinear function is based on the plurality of control signals received by the first processing component” appears to be missing a word, and is therefore indefinite. Here, the singular form “is” appears to imply that the intended subject is not “the first programmable processing component…and the second programmable processing component” (since this phrase would instead use the plural form “are”) but is instead a word that is missing from the current claim language. In view of the contexts in which the above features appear in other claims, the Examiner believes that claim 31 may have been based on claim 6. If so, then claim 31 appears to be missing the word “configuring” after “wherein.” For purposes of examination, the claim has been interpreted to cover the meaning of “wherein configuring the first programmable processing component to…” This part of the rejection can be overcome by amending the claim in this manner.
Claims dependent from one or more of the above discussed claims are also rejected for the same reasons, since these dependent claims incorporate the indefinite recitations of their parent claims without curing the deficiencies thereof. Therefore, the above reasons for rejection of the independent claims also apply to the remaining dependent claims. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 4-5, 7, 9, 12-13, 15, 17, 19-21, 22, 24-25, and 27 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chang et al., “Recurrent Neural Networks Hardware Implementation on FPGA,” arXiv:1511.05552v4 [cs.NE] 4 Mar 2016 (“Chang”).
As to claim 1, Chang teaches a method of a device operating a computational network, comprising: 
receiving an input and a plurality of control signals at a first processing component, [FIG. 5, which teaches three gate modules of the structure shown in FIG. 3. Specifically, the combination of a “Gate sigmoid” (which computes $i_t$, in accordance with equation (1) in § II, paragraph 3) and a “Gate tanh” (which computes $\tilde{c}_t$, in accordance with equation (3) in § II, paragraph 3) constitutes a “first processing component.” Furthermore, as shown in FIG. 3, each gate module receives the vector $x_t$. Thus, the instance of vector $x_t$ inputted into the “Gate sigmoid” and the instance of vector $x_t$ inputted into the “Gate tanh” collectively read on the “input” of the current claim limitation. Furthermore, each gate module receives an output from “Sync” and “MAC” units. The output from the MAC units constitutes a “control signal” (and thus a “plurality of control signals” across two gate modules) as further discussed below.] the first processing component comprising at least a first programmable processing component and a second programmable processing component, [The “Gate sigmoid” that computes $i_t$ and the “Gate tanh” that computes $\tilde{c}_t$ identified above respectively correspond to “a first programmable processing component and a second programmable processing component” of the instant claim. These gates are “programmable” because the entire system is implemented by an FPGA (see § I, paragraph 4: “a LSTM hardware module implemented on the Zynq 7020 FPGA”), and the gates can be configured to perform either the Tanh or Sigmoid functions, as disclosed in FIG. 3, caption: “The non-linear module can be configured to be a tanh or logistic sigmoid.”] wherein a control signal of the plurality of control signals is generated by a second processing component configured to compute one or more linear functions; [As noted above, the MAC units in the gate module, as shown in FIG. 3, generate an input signal that feeds into the “Tanh/Sigmoid” block. This input signal constitutes a “control signal” because it causes the “Tanh/Sigmoid” block to compute the non-linear function, as disclosed in § IV.A, paragraph 1: “The results from the MAC units are added together. The adder’s output goes to an element wise non-linear function.” The MAC units of any one of the gate modules constitute a “second processing component” that computes a linear function, as shown in Equations (1) and (3) in § II, paragraph 3 (for example, $W_{xi} x_t$ is a linear function). The Examiner notes that the instant claim does not require the “second processing component” to be external to the “first processing component.”]
configuring the first programmable processing component to compute a first nonlinear function and the second programmable processing component to compute a second nonlinear function, the first nonlinear function being different than the second nonlinear function, [FIG. 3, caption: “The non-linear module can be configured to be a tanh or logistic sigmoid.” As shown in FIG. 5, two different gate modules are configured to implement two different non-linear activation functions, namely sigmoid and tanh. See also § IV.A: “The non-linear function is segmented…The values of a, b and x range are stored in configuration registers during the configuration stage” (describing the configuration of the registers to enable the computation of the non-linear functions); and § IV.B: “The software populates the main memory with weight values and input vectors, and it controls the hardware module with a set of configuration registers.”] and the first nonlinear function being computed simultaneously with the second nonlinear function, [4th page, first two paragraphs: “…the LSTM computation was separated into three sequential stages: 1) Compute $i_t$ and $\tilde{c}_t$. 2) Compute $f_t$ and $o_t$. 3) Compute $c_t$ and $h_t$. In the first and second stage, two gate modules (4 MAC units) are running in parallel to generate two internal vectors ($i_t$, $\tilde{c}_t$, $f_t$ and $o_t$).” That is, $i_t$ and $\tilde{c}_t$, which use the non-linear functions of sigmoid and tanh, respectively, are computed simultaneously because they take place in parallel during the same stage.], wherein the first and second programmable processing components are configured to compute respective first and second nonlinear functions of respective portions of the input [The “Gate sigmoid” that computes $i_t$ and the “Gate tanh” that computes $\tilde{c}_t$ as described above respectively compute the sigmoid and tanh activation functions, and do so based on their respective instance of vector $x_t$, as described in equations (1) and (3) on the second page of the reference. That is, the instance of vector $x_t$ inputted into the “Gate sigmoid” and the instance of vector $x_t$ inputted into the “Gate tanh” respectively correspond to “respective portions of the input.” The instant claim does not require a limitation as to the specific content for the “portions of the input” that distinguishes over Chang.] based at least in part on respective control signals of the plurality of control signals received at the first and second programmable processing components [§ IV.A, paragraph 1: “The results from the MAC units are added together. The adder’s output goes to an element wise non-linear function.” Since the output of the MAC units causes the computation of the non-linear function, it is considered to “control” the gate module to compute the non-linear function. The computation of the non-linear function in any single gate module (as shown in FIG. 3) as a result of the MAC unit output constitutes the computation based on a respective control signal. Therefore, the output of the MAC units of the “Gate sigmoid” may be regarded as the control signal of the first programmable processing component, and the output of the MAC units of the “Gate tanh” may be regarded as the control signal of the second programmable processing component.] and
operating the computational network to generate an inference [In general, § II, paragraph 5 teaches computing an output (i.e., inference) based on training: “One needs to train the model to get the parameters that will give the desired output. In simple terms, training is an iterating process in which training data is fed in and the output is compared with a target.” For specific examples, see § IV, paragraphs 2-3: “The Torch7 code implements a character level language model, which predicts the next character given a previous character…The predicted character from last layer is fed back to input $x_t$ of first layer for following time step.”] based at least in part on simultaneous outputs of the first programmable processing component that computes the first nonlinear function, and the second programmable processing component that computes the second nonlinear function. [The $i_t$ and $\tilde{c}_t$ values that respectively use the “sigmoid” and “tanh” functions are computed simultaneously because they are computed in parallel during the same stage of LSTM computation (see Chang 4th page, first two paragraphs, as discussed above). Thus, the $i_t$ and $\tilde{c}_t$ values read on the limitation of “simultaneous outputs.” Furthermore, the “inference” in Chang as noted above is “based on” the $i_t$ and $\tilde{c}_t$ values.]
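
For illustration of the mapping applied to claim 1, the first-stage computation attributed to Chang can be sketched in Python as follows. This sketch is not taken from Chang. The function names, the dictionary keys, and the inclusion of a recurrent term and a bias in each gate are assumptions based on the common LSTM formulation. The two gate calls run one after the other in software, whereas the cited passage describes the two gate modules as running in parallel in hardware.

```python
import math


def mac(weights, vector):
    """Multiply-accumulate: one dot product per weight row (a linear function)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate(w_x, w_h, bias, x_t, h_prev, nonlinearity):
    """One gate module: MAC results are added, and the sum feeds the nonlinearity."""
    from_x = mac(w_x, x_t)        # MAC output for the input vector
    from_h = mac(w_h, h_prev)     # MAC output for the previous hidden state (assumed)
    summed = [a + b + c for a, b, c in zip(from_x, from_h, bias)]
    return [nonlinearity(s) for s in summed]


def first_stage(params, x_t, h_prev):
    """Stage 1: each gate receives its own instance (copy) of x_t."""
    i_t = gate(params["W_xi"], params["W_hi"], params["b_i"],
               list(x_t), h_prev, sigmoid)
    c_tilde_t = gate(params["W_xc"], params["W_hc"], params["b_c"],
                     list(x_t), h_prev, math.tanh)
    return i_t, c_tilde_t
```

In this sketch the value passed to each nonlinearity is the per-gate sum of MAC outputs, which corresponds to the signal that the rejection reads on the “respective control signals.”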

As to claim 4, Chang teaches the method of claim 1, wherein the first nonlinear function and the second nonlinear function comprise activation functions. [§ II, paragraph 3, teaches the sigmoid (σ) and tanh nonlinear functions, which are activation functions in the context of the LSTM neural network model shown in equations (1) through (6).]   

As to claim 5, Chang teaches the method of claim 4, wherein at least one of the first nonlinear function and the second nonlinear function is an approximated function. [§ IV.A, paragraph 2: “The non-linear function is segmented into lines y = ax+ b, with x limited to a particular range. The values of a, b and x range are stored in configuration registers during the configuration stage. Each line segment is implemented with a MAC unit and a comparator. The MAC multiplies a and x and accumulates with b.” Note that the above description, particularly the underlined part, refers to a linear interpolation approximation of the sigmoid and tanh functions.]
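
For illustration, the segmented approximation quoted above can be sketched in Python as follows. The quoted passage states only that the function is segmented into lines y = ax + b with stored a, b, and x range values. The chord-based choice of a and b, the clamping of out-of-range inputs, and all function names below are assumptions made for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_segments(func, lo, hi, count):
    """Return (x_start, x_end, a, b) per segment, using the chord of func (assumed fit)."""
    width = (hi - lo) / count
    segments = []
    for k in range(count):
        x0 = lo + k * width
        x1 = lo + (k + 1) * width
        a = (func(x1) - func(x0)) / (x1 - x0)   # slope of the line segment
        b = func(x0) - a * x0                   # intercept of the line segment
        segments.append((x0, x1, a, b))
    return segments


def evaluate(segments, x):
    """A comparison selects the segment; a multiply-accumulate computes a * x + b."""
    x = min(max(x, segments[0][0]), segments[-1][1])  # clamp to the stored x range
    for x0, x1, a, b in segments:
        if x0 <= x <= x1:
            return a * x + b
    raise ValueError("x is outside every segment")
```

Because each output is read from a straight line rather than from the function itself, the result is an approximated function in the sense discussed for claim 5.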

As to claim 7, Chang teaches the method of claim 1, wherein the first processing component includes n programmable processing components and further comprising configuring each of the n programmable processing components to compute a nonlinear function. [As noted in the rejection of claim 1, Chang teaches at least a first and a second programmable processing component, and the operation of configuring these components. Therefore, Chang teaches the instant limitation with respect to n being at least 2.]

As to claims 9, 12-13, and 15, these claims are directed to an apparatus for performing operations that are the same or substantially the same as those recited in claims 1, 4-5, and 7, respectively. Therefore, the rejections made to claims 1, 4-5, and 7 are applied to claims 9, 12-13, and 15, respectively.
Additionally, Chang teaches “an apparatus for operating a computational network, comprising: a memory; and at least one processor coupled to the memory, the at least one processor being configured to” [§ I, paragraph 4: “a LSTM hardware module implemented on the Zynq 7020 FPGA.” § V.B, paragraph 1: “The control and testing software was implemented with C code. The software populates the main memory with weight values and input vectors, and it controls the hardware module with a set of configuration registers.” That is, the device is operated by a computer, and it is implicitly disclosed that such a computer includes a processor and a memory.]

As to claims 17 and 19-21, these claims are directed to an apparatus for performing operations that are the same or substantially the same as those recited in claims 1, 4-5, and 7, respectively. Therefore, the rejections made to claims 1, 4-5, and 7 are applied to claims 17 and 19-21, respectively.
Additionally, Chang teaches “an apparatus for operating a computational network, comprising: circuitry configured for” [§ I, paragraph 4: “a LSTM hardware module implemented on the Zynq 7020 FPGA.” § V.B, paragraph 1: “The control and testing software was implemented with C code. The software populates the main memory with weight values and input vectors, and it controls the hardware module with a set of configuration registers.” That is, the device is operated by a computer, and it is implicitly disclosed that such a computer includes circuitry.]

As to claims 22, 24-25, and 27, these claims are directed to a computer readable medium for performing operations that are the same or substantially the same as those recited in claims 1, 4-5, and 7, respectively. Therefore, the rejections made to claims 1, 4-5, and 7 are applied to claims 22, 24-25, and 27, respectively. 
Additionally, Chang teaches “a non-transitory computer readable medium having executable code for operating a computational network, the code comprising…” [§ I, paragraph 4: “a LSTM hardware module implemented on the Zynq 7020 FPGA.” § V.B, paragraph 1: “The control and testing software was implemented with C code. The software populates the main memory with weight values and input vectors, and it controls the hardware module with a set of configuration registers.” That is, the device is operated by a computer, and it is implicitly disclosed that such a computer has a non-transitory computer readable medium to store the operational code.]

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 6, 8, 14, 16, 26, 28, 31, and 33-36 are rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Brothers et al. (US 2017/0011288 A1) (“Brothers”).
As to claim 6, Chang teaches the method of claim 1, but does not teach the further limitations of “wherein the configuring is based at least in part on the control signal received by the first processing component.”  
Brothers, in an analogous art, teaches the above limitations. Brothers “relates to a neural network processor for executing a neural network” ([0002]), and is therefore in the same field of endeavor as the claimed invention, namely machine learning, and also pertains specifically to physical implementation of neural network devices. In general, FIG. 5 of Brothers teaches details of a device that is “capable of performing operations supporting LSTM cell evaluation” ([0083]) and includes an AFU (activation function unit) 514.
In particular, Brothers teaches wherein the configuring [[0089]: “…AFU 514 may be controlled by bits D0 and D1 of the macro instruction stored in instruction register 502…. When bit D0 is set to 1, the signal received from mask circuit 512 is run through, e.g., processed by, LUT 522 and/or LUT 524. When bit D1 is set to 0, LUT 524 is selected for use to process the signal received from mask circuit 512. When bit D1 is set to 1, LUT 522 is selected for use to process the signal received from mask circuit 512.” [0097]: “A fourth macro instruction from control unit 102 causes processor 500 to apply the activation function (e.g., the Sigmoid operation of input gate 605) to the summed vector products…the particular activation function to be applied is specified by the macro instruction that is received. For example, depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ.”] is based at least in part on the plurality of control signals received by the first processing component. [[0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500… the macro instruction stored in instruction register 502 may be specified by control unit 102. Control unit 102 executes a single macro instruction that generates control signals to processing unit 500 thereby storing the instruction within instruction register 502.”]
The teachings for the control signal in Brothers are compatible with the limitations of parent claim 1. For example, Brothers also teaches “one or more linear functions” (see [0039]: “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations.”). Thus, the combination of the AAC arrays 106 and the control unit 102 may be regarded as a “second processing component.” 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chang with the teachings of Brothers by modifying the device of Chang such that “the configuring is based at least in part on the control signal received by the first processing component.” The motivation would have been to implement a processing unit that can be switched to perform different operations (Brothers [0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500.”), including the computation of different activation functions (Brothers [0097]: “depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ”).

As to claim 8, Chang teaches the method of claim 1, but does not explicitly teach the further limitations of “wherein the configuring comprises setting at least one switch of at least one of the first programmable processing component to compute the first nonlinear function or the second programmable processing component to compute the second nonlinear function based at least in part on the received control signal.”
Brothers, in an analogous art, teaches the above limitations. Brothers “relates to a neural network processor for executing a neural network” ([0002]), and is therefore in the same field of endeavor as the claimed invention, namely machine learning, and also pertains specifically to physical implementation of neural network devices. In general, FIG. 5 of Brothers details a device that is “capable of performing operations supporting LSTM cell evaluation” ([0083]) and includes an AFU (activation function unit) 514.
In particular, Brothers teaches wherein the configuring comprises setting at least one switch [[0089]: “As pictured, AFU 514 includes an interpolation circuit 520, a lookup table (LUT) 522, a LUT 524, and a multiplexer 526.” As shown in FIG. 5, the multiplexer acts as a switch to select either LUT 522 or LUT 524, which are the look-up tables for implementing two different types of activation functions, as described below.] of at least one of the first programmable processing component to compute the first nonlinear function or the second programmable processing component to compute the second nonlinear function [[0089]: “…AFU 514 may be controlled by bits D0 and D1 of the macro instruction stored in instruction register 502…. When bit D0 is set to 1, the signal received from mask circuit 512 is run through, e.g., processed by, LUT 522 and/or LUT 524. When bit D1 is set to 0, LUT 524 is selected for use to process the signal received from mask circuit 512. When bit D1 is set to 1, LUT 522 is selected for use to process the signal received from mask circuit 512.” [0097]: “A fourth macro instruction from control unit 102 causes processor 500 to apply the activation function (e.g., the Sigmoid operation of input gate 605) to the summed vector products…the particular activation function to be applied is specified by the macro instruction that is received. For example, depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ.” That is, the activation functions in Brothers are analogous to the first/second nonlinear function of Chang, and paragraph 89 of Brothers teaches that bit D1 causes the selection of a particular activation function. This selection causes the selected activation function to be computed in the operations of AFU 514. Thus, the bit D1 is used for the computation of different activation functions analogous to the first and second nonlinear functions. 
Note that the specific limitations of the “first nonlinear function” and the “second nonlinear function” are already taught by Chang’s activation functions. Thus, Brothers has been cited to teach the recited features that are applied to the “first nonlinear function” and the “second nonlinear function.”] based at least in part on the received plurality of control signals. [[0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500… the macro instruction stored in instruction register 502 may be specified by control unit 102. Control unit 102 executes a single macro instruction that generates control signals to processing unit 500 thereby storing the instruction within instruction register 502.”]
The teachings for the control signal in Brothers are compatible with the limitations of parent claim 1. For example, Brothers also teaches “one or more linear functions” (see [0039]: “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations.”). Thus, the combination of the AAC arrays 106 and the control unit 102 may be regarded as a “second processing component.” 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chang with the teachings of Brothers by modifying the device of Chang such that “the configuring comprises setting at least one switch of at least one of the first programmable processing component to compute the first nonlinear function or the second programmable processing component to compute the second nonlinear function based at least in part on the received control signal.” The motivation would have been to implement a processing unit that can be switched to perform different operations (Brothers [0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500.”), including the computation of different activation functions (Brothers [0097]: “depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ”).
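For illustration only, the selection behavior quoted from Brothers [0089] can be sketched in software as follows. This is a minimal sketch under stated assumptions and is not Brothers’ implementation: the function names (afu_select, lut_522, lut_524) are hypothetical, closed-form math functions stand in for the look-up tables and interpolation circuit, the assignment of sigmoid to LUT 522 and tanh to LUT 524 is assumed (Brothers states only that the activation function differs depending on the enabled LUT), and the pass-through behavior when bit D0 is not 1 is likewise assumed.

```python
import math


def lut_522(x: float) -> float:
    # Assumed stand-in for LUT 522: a sigmoid activation.
    return 1.0 / (1.0 + math.exp(-x))


def lut_524(x: float) -> float:
    # Assumed stand-in for LUT 524: a tanh activation.
    return math.tanh(x)


def afu_select(x: float, d0: int, d1: int) -> float:
    """Model the multiplexer-style selection controlled by bits D0 and D1.

    d0 == 1: the signal is processed by one of the LUTs.
    d1 == 1: LUT 522 is selected; d1 == 0: LUT 524 is selected.
    d0 != 1: assumed bypass (the signal passes through unchanged).
    """
    if d0 != 1:
        return x  # assumption: no activation function applied
    return lut_522(x) if d1 == 1 else lut_524(x)


if __name__ == "__main__":
    for d1 in (0, 1):
        print(d1, afu_select(0.5, 1, d1))
```

In this sketch, the single control bit d1 plays the role of the switch position: the same input yields a different nonlinear function output depending only on that bit.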

	As to claims 14 and 26, the further limitations of these claims are the same or substantially the same as those recited in claim 6. Therefore, the rejection made to claim 6 is applied to claims 14 and 26.

As to claims 16 and 28, the further limitations of these claims are the same or substantially the same as those recited in claim 8. Therefore, the rejection made to claim 8 is applied to claims 16 and 28.

As to claim 31, Chang teaches the apparatus of claim 17, but does not teach the further limitations of “wherein the first programmable processing component to compute the first nonlinear function and the second programmable processing component to compute the second nonlinear function is based on the plurality of control signals received by the first processing component.”  
Brothers, in an analogous art, teaches the above limitations. Brothers “relates to a neural network processor for executing a neural network” ([0002]), and is therefore in the same field of endeavor as the claimed invention, namely machine learning, and also pertains specifically to physical implementation of neural network devices. In general, FIG. 5 of Brothers details a device that is “capable of performing operations supporting LSTM cell evaluation” ([0083]) and includes an AFU (activation function unit) 514.
In particular, Brothers teaches “wherein the first programmable processing component to compute the first nonlinear function and the second programmable processing component to compute the second nonlinear function” [[0089]: “…AFU 514 may be controlled by bits D0 and D1 of the macro instruction stored in instruction register 502…. When bit D0 is set to 1, the signal received from mask circuit 512 is run through, e.g., processed by, LUT 522 and/or LUT 524. When bit D1 is set to 0, LUT 524 is selected for use to process the signal received from mask circuit 512. When bit D1 is set to 1, LUT 522 is selected for use to process the signal received from mask circuit 512.” [0097]: “A fourth macro instruction from control unit 102 causes processor 500 to apply the activation function (e.g., the Sigmoid operation of input gate 605) to the summed vector products…the particular activation function to be applied is specified by the macro instruction that is received. For example, depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ.”] is based on the plurality of control signals received by the first processing component. [[0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500… the macro instruction stored in instruction register 502 may be specified by control unit 102. Control unit 102 executes a single macro instruction that generates control signals to processing unit 500 thereby storing the instruction within instruction register 502.”]
The teachings for the control signal in Brothers are compatible with the limitation of “generated by a second processing component” in the parent claim. For example, Brothers also teaches “one or more linear functions” (see [0039]: “AAC arrays 106 are configured to perform multiply accumulate (MAC) operations.”). Thus, the combination of the AAC arrays 106 and the control unit 102 may be regarded as a “second processing component.” Furthermore, note that the limitations of “the first programmable processing component” and “the second programmable processing component” are already taught by primary reference Chang. Brothers’ teachings are applicable to each of the two programmable processing components, because AFU 514 in Brothers is analogous to each individual programmable processing component.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chang with the teachings of Brothers by modifying the device of Chang such that “the first programmable processing component to compute the first nonlinear function and the second programmable processing component to compute the second nonlinear function is based on the plurality of control signals received by the first processing component.” The motivation would have been to implement a processing unit that can be switched to perform different operations (Brothers [0092]: “setting the appropriate values within instruction register 502 initiates a plurality of different operations throughout processing unit 500.”), including the computation of different activation functions (Brothers [0097]: “depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ”).

As to claim 33, the combination of Chang and Brothers teaches the method of claim 8, wherein the first nonlinear function is computed based at least in part on the at least one switch being at a first position, and the second nonlinear function is computed based at least in part on the at least one switch being at a second position. [As described in the rejection of claim 8 above and as shown in FIG. 5 of Brothers, the multiplexer is a switch that selects either LUT 522 or LUT 524, which are the look-up tables for implementing two different types of activation functions. This switching process is described in Brothers, [0089]: “When bit D1 is set to 0, LUT 524 is selected for use to process the signal received from mask circuit 512. When bit D1 is set to 1, LUT 522 is selected for use to process the signal received from mask circuit 512.” That is, the multiplexer is at respective positions that respectively select LUT 522 or LUT 524 and their respective activation functions, as described in [0097]: “…depending upon the particular LUT that is enabled in AFU 514, the activation function that is applied will differ.”]

As to claims 34-36, the further limitations of these claims are the same or substantially the same as those recited in claim 33. Therefore, the rejection made to claim 33 is applied to claims 34-36. It is noted that claim 35 recites features corresponding to those of claim 8, in addition to features corresponding to those of claim 33. However, claim 33 depends from claim 8, and therefore also includes the features of claim 8. Accordingly, the rejection made to claim 33 is applied to claim 35.
 
2.	Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Chang in view of Wang et al., “A parallel-fusion RNN-LSTM architecture for image caption generation,” 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 4448-4452 (“Wang”).
As to claim 32, Chang teaches the method of claim 1, but does not teach the remaining limitation of “wherein the inference comprises an estimate of speech inferred from audio captured by the device or an estimate of characters inferred from one or more images captured by the device, or both.” [The Examiner notes that Chang teaches an “estimate of characters,” as described in the rejection of claim 1, but does not teach this entire alternate limitation, which requires that the characters be “inferred from one or more images captured by the device.”]
Wang, in an analogous art, teaches the above limitations. Wang teaches “a parallel-fusion RNN-LSTM architecture” (title). Therefore, Wang is in the same field of endeavor as the claimed invention, namely machine learning, and also pertains specifically to physical implementation of neural network devices.
In particular, Wang teaches “wherein the inference comprises… an estimate of characters inferred from one or more images captured by the device…” (note that the instant claim recites an alternate expression, and other alternate limitations are indicated by ellipses) [Page 4449, § 2: “The part of image representation is based on CNN while the part of caption generation is based on RNN structures. We apply them to extract image features and align visual and language data respectively. The proposed parallel-fusion model is showed in Fig.1.” See also page 4448, FIG. 1, which shows that images are input and captions (i.e., “estimate of characters inferred”) are output.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chang with the teachings of Wang by modifying the generation of the inference such that the inference comprises “an estimate of characters inferred from one or more images captured by the device.” The motivation would have been to perform image captioning, a task that can utilize an RNN, as suggested by Wang (§ I, paragraphs 1-2: “Image caption generation is a fundamental problem in artificial Intelligence… with the application of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), recent works have made momentous progress, and present a unified method which dominates in image caption generation.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Li et al., US20190147323A1 teaches the selection of an activation function from multiple possible functions (see [0015]: “The lookup table may comprise, and may be operable to switch between, two sets of lookup data and, on the activation module performing a series of activation functions”).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124