DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed May 4, 2021 (herein “Amendment”), with respect to the rejection of claims 20-25 under 35 U.S.C. §101  have been fully considered and are persuasive.  The rejection of claims 20-25 under 35 U.S.C. 101 has been withdrawn. 
Applicant’s arguments and amendments in the Amendment with respect to the rejections of claims 1-25 under 35 U.S.C. §112 have been fully considered and are persuasive.  The rejections of claims 1-25 under 35 U.S.C. §112 have been withdrawn. 
Applicant's arguments and amendments in the Amendment with respect to the rejections of claims 1, 14 and 20, and various claims depending therefrom under 35 U.S.C. §103 have been fully considered but they are not persuasive.
Applicant first argues on page 12 of the Amendment that Yao fails to discuss applying a static neural network and a dynamic neural network to the same input as claimed. Applicant contends that the sparse DNN model of Yao (which the Non-Final Action mapped to the claimed dynamic neural network) is formed in a training phase using training data and implemented in an implementation phase using pre-processed input data. The claim limitation at issue recites “apply the dynamic neural network to the input speech to generate a second output.” Accordingly, the sparse DNN model is applied. During this applying, the sparse DNN model of Yao is applied to the input data (notwithstanding that it is pre-processed, it is still the input data), and therefore to the same data (pre-processed data 212) to which the reference DNN (corresponding to the claimed static neural network) is applied (see Figs. 6 and 3 of Yao, illustrating that the pre-processed data 212 is input to the Sparse DNN model 116 (Fig. 6), and that the same pre-processed data 212 is also input to the Reference DNN model 113 (Fig. 3)).
Applicant next argues on page 12 of the Amendment that secondary reference Inazumi fails to disclose or render obvious applying a static neural network and a dynamic neural network to the same input. However, Inazumi is not relied upon to teach applying the static and dynamic neural networks to the input speech; rather, Yao is, and Yao does provide these teachings as discussed above.  Applicant continues in their arguments stating “Importantly, the claimed subject matter applies the static neural network and the dynamic neural network (based on training the dynamic neural network mask) to the same input and combines the outputs to generate a final output.” Here, it is noted that Inazumi was relied upon for teaching “the final output is based on both the first and second outputs.” Applicant does not argue against the teachings of the cited portions of Inazumi relied upon for the claimed “the final output is based on both the first and second outputs,” and the Examiner maintains that such reliance upon Inazumi was proper given that Inazumi teaches in col. 6, lines 7-36, output of adaptation judgement data from at least two different neural networks, the outputs from each respective network being used to select which neural See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). Applicant’s arguments regarding Yao’s or Inazumi’s individual shortcomings, then, do not show nonobviousness where, as here, the rejection is based on the cited references’ collective teachings.  See In re Merck & Co., Inc., 800 F.2d 1091 (Fed. Cir. 1986).
Third reference Chen NPL is not relied upon for the limitations at issue and Applicant does not argue the teachings of Chen NPL for which it was relied upon.
Therefore, in view of the above, while all of Applicant’s arguments regarding independent claims 1, 14 and 20 have been fully considered, they are not persuasive and the rejection is maintained in this Final Action.
Applicant’s arguments on pages 13-15 of the Amendment are directed towards the other cited art of record (Wierzynski, Diehl NPL, Chen (PGPub), Kharaghani and Robertson) also not teaching the limitations of the independent claims at issue, and Applicant’s arguments do not argue against the specific teachings for which the other cited art was relied upon. However, as noted above, Chen NPL and Inazumi in obvious


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 13, 14, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yao et al., (WO 2018/058509 A1, herein “Yao”) in view of Inazumi (US 5,751,904, herein “Inazumi”), further in view of Chen et al., "Spoken Language Understanding without Speech Recognition," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 2018, pp. 6189-6193, doi: 10.1109/ICASSP.2018.8461718 (herein “Chen NPL”).
Regarding claim 1, Yao teaches a device comprising (Yao page 33, lines 30-32, fig. 11, device 1100 of a small form factor implementing system 1000): 
a memory to store a static neural network and a dynamic neural network mask (Yao page 5, line 20 – page 6, line 27, DNN pre-training module generating a reference DNN (static neural network) and compression module generating a sparse DNN model including a data structure representative of weights for connections not pruned (dynamic neural network mask), where page 23, lines 16-19 teach that the memory 903 stores DNN model data, reference DNN model weights and sparse DNN model weights); and one or more processors coupled to the memory, the one or more processors to (Yao page 23, line 30 – page 24, line 8, fig. 9, processor 902 obtains DNN data from memory 903): 
apply the static neural network to an input to generate a first output by application of a plurality of pre-trained weights of the static neural network trained using pre-deployment training data (Yao fig. 3, page 9, lines 13-23, reference DNN model which is a pre-trained (pre-trained weights) implementation of a DNN model/structure having inputs receiving pre-processed data (pre-deployment training data), and generating classification score outputs); 
train a dynamic neural network mask based on training data attained at the device (Yao figs. 1 and 2, page 10, line 24 – page 11, line 11, another training set is received by the compression module for processing by the parameters updating module 103 to update the weights and connected/disconnected connections of the Sparse DNN model), the dynamic neural network mask comprising a plurality of mask indicators to set corresponding pre-trained weights to zero (Yao page 12, lines 1-19, sparse DNN model is represented by connection matrices including a binary matrix TK with entries indicating the states of DNN connections, where binary values are either 1 or 0, therefore this binary matrix providing mask indicators of 1 or 0, which correspond to the weights WK); 
apply the dynamic neural network mask to the static neural network to generate a dynamic neural network (Yao fig. 1, page 6, lines 23-29, page 10, line 24 – page 11, line 32, and page 12, lines 11-32,  the Reference DNN model is received by a compression module for iterative processing including pruning of the Reference DNN model, where a connection matrix TK is generated and iterated upon, the connection matrix being optimized to update the network structure of the DNN model (thus applied), and the sparse DNN model results (dynamic neural network)), wherein the dynamic neural network comprises a plurality of zero weights corresponding to the plurality of mask indicators and a plurality of weights matching the pre-trained weights (Yao page 12, line 11 – page 13, line 4, page 16, line 19 – page 17, line 6, the representation of the sparse DNN model includes the connection matrix characterized as a mask matrix based on weights of reference DNN model, the connection matrix being a binary matrix (values of 0 or 1) and where the connection matrix indicates a connection not being connected, it has a value of 0 (zero weights)); 
apply the dynamic neural network to the input to generate a second output (Yao fig. 6, page 19, lines 16 – 20, sparse DNN model (dynamic neural network) receives the pre-processed data, where page 6, lines 9-12 teaches the training data includes voice recordings for utterances, thus the intent of the neural network to process utterances (input), and page 7, lines 13-18 teaching that the pre-processed data being of audio data from a microphone, and the sparse DNN outputs classification scores 213 (second output)); and 
determine a final output associated with the input (Yao page 27, lines 11-29, sparse neural network operates on input data such as speech and provides classification scores to recognized series of textual elements as output) based on the second outputs (Yao page 27, lines 11-29, the classification score output is from the sparse neural network).
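For illustration only, the relationship between the claimed static neural network, the mask indicators, and the resulting dynamic neural network, as mapped above, can be sketched as follows. This is a minimal sketch and is not taken from Yao or from the application: the single linear layer, the function names `apply_mask` and `forward`, and all numeric values are hypothetical and are chosen only to show a binary mask zeroing selected pre-trained weights and both networks being applied to the same input.

```python
from typing import List

Matrix = List[List[float]]


def apply_mask(weights: Matrix, mask: List[List[int]]) -> Matrix:
    """Element-wise product: a mask indicator of 0 zeroes the weight, 1 keeps it."""
    return [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


def forward(weights: Matrix, x: List[float]) -> List[float]:
    """Hypothetical single linear layer: one score per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical pre-trained weights of the static network.
static_weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
# Hypothetical binary mask indicators (0 = set the weight to zero).
mask = [[1, 0, 1], [0, 1, 1]]

# The dynamic network keeps unmasked pre-trained weights and zeroes the rest.
dynamic_weights = apply_mask(static_weights, mask)

# The same input is applied to both networks.
x = [1.0, 2.0, 3.0]
first_output = forward(static_weights, x)
second_output = forward(dynamic_weights, x)
```

In this sketch the unmasked entries of `dynamic_weights` match the corresponding entries of `static_weights`, which mirrors the "plurality of weights matching the pre-trained weights" language of the claim.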
While Yao teaches the sparse neural network is trained using training data, Yao does not explicitly teach that it is trained using “post-deployment” training data.
Further, Yao does not explicitly teach that the final output is based on both the first and second outputs.
Still further while Yao teaches input speech, Yao does not explicitly teach that it is a query.
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31, and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Inazumi further teaches the final output is based on both the first and second outputs (Inazumi figs. 3 and 4, col. 6, lines 7-36, adaptation judgement data is output from each neural network (at least first and second outputs), where the adaptation judgement data is used to select the neural network having the highest adaptation in recognition, and then selects and outputs the recognition data from this neural network as a recognition result data (final output)).
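The selection described above, in which the recognition data of the network with the highest adaptation is output as the recognition result, can be sketched as follows. This is an illustrative sketch only and is not Inazumi's circuitry: the function name `select_final_output`, the pairing of a score with a recognition string, and the example values are all hypothetical.

```python
from typing import List, Tuple


def select_final_output(candidates: List[Tuple[float, str]]) -> str:
    """Return the recognition data paired with the highest adaptation score.

    Each candidate is (adaptation_score, recognition_data) from one network.
    """
    if not candidates:
        raise ValueError("at least one network output is required")
    _, best_result = max(candidates, key=lambda c: c[0])
    return best_result


# Hypothetical outputs of two networks for the same input.
outputs = [(0.62, "play music"), (0.81, "call home")]
final_output = select_final_output(outputs)
```

Under this sketch the final output depends on both network outputs, since both adaptation scores take part in the comparison even though only one recognition result is emitted.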
Chen NPL teaches input query (Chen NPL section 3, page 6190, task focused on in paper is to determine intent given an utterance X, where the utterances are a customer’s response in a customer care call).
Therefore, taking the teachings of Yao and Inazumi together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the recognition result being based on the outputs of two neural networks, and using data after deployment of the trained neural networks as disclosed in Inazumi at least because doing so would provide for a neural network that can process time series data and have its circuitry reduced in whole scale (Inazumi col. 7, lines 21-26).
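Further, taking the teachings of Yao and Chen NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the input being a customer utterance for a customer care call as disclosed in Chen NPL at least because speech-driven conversational systems for allowing users to accomplish their desired tasks by interacting with virtual agents have seen a resurgence due to the increasing availability of robust speech recognition on popular consumer devices (as discussed in Chen NPL section I), and as such, would be use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).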
Regarding claim 2, Yao teaches wherein the static neural network comprises a static neural network language classifier, the dynamic neural network mask comprises a dynamic neural network language classifier mask (Yao page 27, lines 11-29, the sparse neural network, comprised of the binary matrix/network mask, which is derived from the reference NN are all models to provide classification scores (thus are classifiers)), the input comprises an input speech (Yao page 27, lines 22-29, neural network models receive input data for an application of voice identification or speech to text recognizing or the like, thus a speech input). 
Yao does not explicitly teach query. Yao further does not explicitly teach the first and second outputs comprise first and second intent classifications, and the final output comprises a user intent associated with the input speech query.
Chen NPL teaches query (Chen NPL section 3, page 6190, task focused on in paper is to determine intent given an utterance X, where the utterances are a customer’s response in a customer care call).
Chen NPL further teaches the first and second outputs comprise first and second intent classifications, and the final output comprises a user intent associated with the input speech query (Chen NPL fig. 1, pages 6191-6192, sections 4 and 4.1, the CNN text classifier outputs an intent, and a fine-tuned single model comprised of an acoustic model and a text classifier outputs an intent, the final output being an intent of a customer (user intent) from their input speech during a customer care call).
Therefore, taking the teachings of Yao and Chen NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the intent classification as disclosed in Chen NPL at least because doing so would very effectively reduce intent classification error rate (Chen NPL section 5, page 6192).
Regarding claim 13, Yao teaches wherein the static neural network comprises one of a multilayer perceptron neural network, a feed forward neural network, a recurrent neuronal network, a convolutional neuronal network, or a radial based feed forward neural network (Yao page 4, reference DNN is any suitable DNN such as a deep fully connected neural network, a deep convolutional neural network, a deep recurrent neural network, or the like).
Yao does not explicitly teach the input query comprises a bag of words vector representative of received audio data, and the final output comprises an intended operation of the device.
Chen NPL teaches the input query comprises a bag of words vector representative of received audio data (Chen NPL section 3, ASR system determines the most likely word sequence W from an input utterance X (received audio data), and n-gram features are extracted (bag of words vector)), and the final output comprises an intended operation of the device (Chen NPL section 3, intent determination made by computing according to equation 3).
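As an illustration of how n-gram features extracted from a recognized word sequence can form a bag of words vector, consider the following minimal sketch. It is not Chen NPL's feature extractor: the function names, the fixed vocabulary, the choice of unigrams and bigrams, and the example word sequence are assumptions made only for this example.

```python
from collections import Counter
from typing import List


def ngram_counts(words: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in the word sequence."""
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def bag_of_words_vector(words: List[str], vocabulary: List[str], max_n: int = 2) -> List[int]:
    """Map a word sequence to counts over a fixed (hypothetical) n-gram vocabulary."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngram_counts(words, n))
    return [counts[term] for term in vocabulary]


# Hypothetical word sequence standing in for an ASR hypothesis.
words = "pay my bill pay".split()
vocabulary = ["pay", "bill", "pay my", "my bill", "refund"]
vector = bag_of_words_vector(words, vocabulary)
```

Word order beyond the n-gram window is discarded in this representation, which is what makes it a bag of words rather than a sequence encoding.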

Regarding claim 14, Yao teaches a processor-implemented method comprising (Yao page 35, lines 3-18, embodiments of the disclosure implemented using hardware elements including microprocessors): 
applying a static neural network to an input to generate a first output by application of a plurality of pre-trained weights of the static neural network trained using pre-deployment training data (Yao fig. 3, page 9, lines 13-23, reference DNN model which is a pre-trained (pre-trained weights) implementation of a DNN model/structure having inputs receiving pre-processed data (pre-deployment training data), and generating classification score outputs); 
training a dynamic neural network mask based on training data attained at the device (Yao figs. 1 and 2, page 10, line 24 – page 11, line 11, another training set is received by the compression module for processing by the parameters updating module 103 to update the weights and connected/disconnected connections of the Sparse DNN model), the dynamic neural network mask comprising a plurality of mask indicators to set corresponding pre-trained weights to zero (Yao page 12, lines 1-19, sparse DNN model is represented by connection matrices including a binary matrix TK with entries indicating the states of DNN connections, where binary values are either 1 or 0, therefore this binary matrix providing mask indicators of 1 or 0, which correspond to the weights WK); 
applying the dynamic neural network mask to the static neural network to generate a dynamic neural network (Yao fig. 1, page 6, lines 23-29, page 10, line 24 – page 11, line 32, and page 12, lines 11-32,  the Reference DNN model is received by a compression module for iterative processing including pruning of the Reference DNN model, where a connection matrix TK is generated and iterated upon, the connection matrix being optimized to update the network structure of the DNN model (thus applied), and the sparse DNN model results (dynamic neural network)), wherein the dynamic neural network comprises a plurality of zero weights corresponding to the plurality of mask indicators and a plurality of weights matching the pre-trained weights (Yao page 12, line 11 – page 13, line 4, page 16, line 19 – page 17, line 6, the representation of the sparse DNN model includes the connection matrix characterized as a mask matrix based on weights of reference DNN model, the connection matrix being a binary matrix (values of 0 or 1) and where the connection matrix indicates a connection not being connected, it has a value of 0 (zero weights)); 
applying the dynamic neural network to the input to generate a second output (Yao fig. 6, page 19, lines 16 – 20, sparse DNN model (dynamic neural network) receives the pre-processed data, where page 6, lines 9-12 teaches the training data includes voice recordings for utterances, thus the intent of the neural network to process utterances (input), and page 7, lines 13-18 teaching that the pre-processed data being of audio data from a microphone, and the sparse DNN outputs classification scores 213 (second output)); and 
determining a final output associated with the input (Yao page 27, lines 11-29, sparse neural network operates on input data such as speech and provides classification scores to recognized series of textual elements as output) based on the second outputs (Yao page 27, lines 11-29, the classification score output is from the sparse neural network).
While Yao teaches the sparse neural network is trained using training data, Yao does not explicitly teach that it is trained using “post-deployment” training data.
Further, Yao does not explicitly teach that the final output is based on both the first and second outputs.
Still further while Yao teaches input speech, Yao does not explicitly teach that it is a query.
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31, and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Inazumi further teaches the final output is based on both the first and second outputs (Inazumi figs. 3 and 4, col. 6, lines 7-36, adaptation judgement data is output from each neural network (at least first and second outputs), where the adaptation judgement data is used to select the neural network having the highest adaptation in recognition, and then selects and outputs the recognition data from this neural network as a recognition result data (final output)).
Chen NPL teaches input query (Chen NPL section 3, page 6190, task focused on in paper is to determine intent given an utterance X, where the utterances are a customer’s response in a customer care call).
Therefore, taking the teachings of Yao and Inazumi together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the recognition result being based on the outputs of two neural networks, and using data after deployment of the trained neural networks as disclosed in Inazumi at least because doing so would provide for a neural network that can process time series data and have its circuitry reduced in whole scale (Inazumi col. 7, lines 21-26).
Further, taking the teachings of Yao and Chen NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the input being a customer utterance for a customer care call as disclosed in Chen NPL at least because speech-driven conversational systems for allowing users to accomplish their desired tasks by interacting with virtual agents have seen a resurgence due to the increasing availability of robust speech recognition on popular consumer devices (as discussed in Chen NPL section I), and as such, would be use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Regarding claim 20, Yao teaches at least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a Yao page 44, lines 18-25, machine readable medium where page 8, lines 8-15 teach that the sparse DNN model implemented by the machine readable medium’s instructions processes input data 211 (evaluate an input)): 
applying a static neural network to the input to generate a first output by application of a plurality of pre-trained weights of the static neural network trained using pre-deployment training data (Yao fig. 3, page 9, lines 13-23, reference DNN model which is a pre-trained (pre-trained weights) implementation of a DNN model/structure having inputs receiving pre-processed data (pre-deployment training data), and generating classification score outputs); 
training a dynamic neural network mask based on training data attained at the device (Yao figs. 1 and 2, page 10, line 24 – page 11, line 11, another training set is received by the compression module for processing by the parameters updating module 103 to update the weights and connected/disconnected connections of the Sparse DNN model), the dynamic neural network mask comprising a plurality of mask indicators to set corresponding pre-trained weights to zero (Yao page 12, lines 1-19, sparse DNN model is represented by connection matrices including a binary matrix TK with entries indicating the states of DNN connections, where binary values are either 1 or 0, therefore this binary matrix providing mask indicators of 1 or 0, which correspond to the weights WK); 
applying the dynamic neural network mask to the static neural network to generate a dynamic neural network (Yao fig. 1, page 6, lines 23-29, page 10, line 24 – page 11, line 32, and page 12, lines 11-32,  the Reference DNN model is received by a compression module for iterative processing including pruning of the Reference DNN model, where a connection matrix TK is generated and iterated upon, the connection matrix being optimized to update the network structure of the DNN model (thus applied), and the sparse DNN model results (dynamic neural network)), wherein the dynamic neural network comprises a plurality of zero weights corresponding to the plurality of mask indicators and a plurality of weights matching the pre-trained weights (Yao page 12, line 11 – page 13, line 4, page 16, line 19 – page 17, line 6, the representation of the sparse DNN model includes the connection matrix characterized as a mask matrix based on weights of reference DNN model, the connection matrix being a binary matrix (values of 0 or 1) and where the connection matrix indicates a connection not being connected, it has a value of 0 (zero weights)); 
applying the dynamic neural network to the input to generate a second output (Yao fig. 6, page 19, lines 16 – 20, sparse DNN model (dynamic neural network) receives the pre-processed data, where page 6, lines 9-12 teaches the training data includes voice recordings for utterances, thus the intent of the neural network to process utterances (input), and page 7, lines 13-18 teaching that the pre-processed data being of audio data from a microphone, and the sparse DNN outputs classification scores 213 (second output)); and 
determining a final output associated with the input (Yao page 27, lines 11-29, sparse neural network operates on input data such as speech and provides classification scores to recognized series of textual elements as output) based on the second outputs (Yao page 27, lines 11-29, the classification score output is from the sparse neural network).
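While Yao teaches the sparse neural network is trained using training data, Yao does not explicitly teach that it is trained using “post-deployment” training data.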
Further, Yao does not explicitly teach that the final output is based on both the first and second outputs.
Still further while Yao teaches input speech, Yao does not explicitly teach that it is a query.
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31, and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Inazumi further teaches the final output is based on both the first and second outputs (Inazumi figs. 3 and 4, col. 6, lines 7-36, adaptation judgement data is output from each neural network (at least first and second outputs), where the adaptation judgement data is used to select the neural network having the highest adaptation in recognition, and then selects and outputs the recognition data from this neural network as a recognition result data (final output)).
Chen NPL teaches input query (Chen NPL section 3, page 6190, task focused on in paper is to determine intent given an utterance X, where the utterances are a customer’s response in a customer care call).
Therefore, taking the teachings of Yao and Inazumi together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the recognition result being based on the outputs of two neural networks, and using data after deployment of the trained neural networks as disclosed in Inazumi at least because doing so would provide for a neural network that can process time series data and have its circuitry reduced in whole scale (Inazumi col. 7, lines 21-26).
Further, taking the teachings of Yao and Chen NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the input being a customer utterance for a customer care call as disclosed in Chen NPL at least because speech-driven conversational systems for allowing users to accomplish their desired tasks by interacting with virtual agents have seen a resurgence due to the increasing availability of robust speech recognition on popular consumer devices (as discussed in Chen NPL section I), and as such, would be use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Inazumi in view of Chen NPL, as set forth above regarding claim 1 from which claim 3 depends, further in view of Wierzynski, (US 10,878,320 B2, herein “Wierzynski”).
Regarding claim 3, Yao teaches wherein a pre-training data set used to pre-train the static neural network (Yao page 6, lines 8-22, reference DNN model generated using a training set 112) and the training data comprises a second input and attained at the device (Yao figs. 1 and 2, page 10, line 24 – page 11, line 11, another training set is received by (attained at the device) the compression module for processing by the parameters updating module 103 to update the weights and connected/disconnected connections of the Sparse DNN model), where the training data consists of ).
Yao does not explicitly teach that the pre-training data set “is unavailable to the device.”
Yao further does not explicitly teach that the post-deployment training data “comprises a query and a corresponding known user intent both.”
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31, and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Chen NPL teaches comprises a query and a corresponding known user intent both (Chen NPL pages 6191-6192, sections 4, 4.1, 4.2 and 5, test set used to test the trained models coming from an A2I HAU dataset which is an acoustic input data of responses from a user during a customer care call set labeled with intent labels (known user intent)).
Wierzynski teaches is unavailable to the device (Wierzynski col. 11, lines 25-35, and col. 12, lines 34-40, neural networks are trained using a different data set when an original training set used for pre-training is unavailable).
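Therefore, taking the teachings of Yao and Inazumi together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the recognition result being based on the outputs of two neural networks, and using data after deployment of the trained neural networks as disclosed in Inazumi at least because doing so would provide for a neural network that can process time series data and have its circuitry reduced in whole scale (Inazumi col. 7, lines 21-26).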
Further, taking the teachings of Yao and Chen NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the intent classification as disclosed in Chen NPL at least because doing so would very effectively reduce intent classification error rate (Chen NPL section 5, page 6192).
Still further, taking the teachings of Yao and Wierzynski together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the unavailable training set as disclosed in Wierzynski at least because doing so would comply with licensing restrictions (see Wierzynski col. 11, lines 9-13). Such a modification would therefore be a case in which known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art, where market forces would require complying with licensing restrictions and the variation (i.e., training with other data instead) is a predictable variation to one of ordinary skill. See MPEP 2143(I)(F).
Claims 4, 6, 15-16 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Inazumi in view of Chen NPL, as set forth above regarding claim 1 from which claim 4 depends, further in view of Diehl et al., “Conversion of Artificial Recurrent Neural Networks to Spiking Neural Networks for Low-power Neuromorphic Hardware,” January 2016, arXiv:1601.04187v1 [cs.NE] (herein “Diehl NPL”).
Regarding claim 4, Yao teaches wherein the one or more processors to train the dynamic neural network mask comprises the one or more processors to (Yao page 23, line 30 – page 24, line 8, fig. 9, processor 902 obtains DNN data from memory 903): 
initialize values of the dynamic neural network mask (Yao page 17, in the Pseudocode A, the values of TK are initialized to 1); 
adapt the initialized values of the dynamic neural network mask to trained values based on the training data (Yao page 17, in the Pseudocode A, TK is updated (adapt) based on a mini-batch of network input from X (training data)); and 
threshold the trained values to generate the mask indicators (Yao page 15, lines 8-20, thresholds are used to evaluate a selected weight where the thresholds are used to set a connect/disconnect/no change indicator for each weight).
Yao does not explicitly teach floating point values.
Yao further does not explicitly teach post-deployment training data.
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31 and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Diehl NPL teaches floating point values (Diehl section 3, the training of the original ReLU NN is with floating point values).

Further, taking the teachings of Yao and Diehl NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the floating point value as disclosed in Diehl NPL at least because doing so would yield higher classification accuracies (Diehl NPL section 3).
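For illustration only, the initialize, adapt and threshold operations discussed above regarding claim 4 can be sketched as follows. The sketch is not taken from Yao, Inazumi, Diehl NPL or the application; the function name train_mask, the learning rate, the per-batch gradient inputs and the threshold value of 0.5 are hypothetical and chosen only to make the three operations concrete. Initializing every mask value to 1 follows the reading of Yao's Pseudocode A given above.

```python
def train_mask(weights, batch_grads, lr=0.1, threshold=0.5):
    """Hypothetical sketch: initialize, adapt, then threshold a per-weight mask."""
    # Initialize: one floating point mask value per pre-trained weight, set to 1.
    mask_values = [1.0 for _ in weights]

    # Adapt: apply one gradient-style update per batch of training data.
    # The gradients are supplied by the caller; computing them is out of scope.
    for grads in batch_grads:
        mask_values = [m - lr * g for m, g in zip(mask_values, grads)]

    # Threshold: indicator 0 (disconnect) when the trained value is below the
    # threshold, indicator 1 (keep the pre-trained weight) otherwise.
    indicators = [0 if m < threshold else 1 for m in mask_values]

    # Apply the indicators to the pre-trained weights.
    masked_weights = [w * i for w, i in zip(weights, indicators)]
    return mask_values, indicators, masked_weights


if __name__ == "__main__":
    values, indicators, masked = train_mask(
        [0.5, -0.3, 0.8],
        [[1.0, 0.0, 8.0], [2.0, 0.0, 0.0]],
    )
    print(indicators, masked)
```

In this sketch the pre-trained weights themselves are never modified; only the mask values are adapted, and the thresholded indicators decide which weights remain active.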
Regarding claims 6, 16 and 22, Yao teaches wherein [the one or more processors to threshold – claim 6 / said thresholding – claims 16 and 22] (Yao page 15, lines 8-20, thresholds are used to evaluate a selected weight where the thresholds are used to set a connect/disconnect/no change indicator for each weight).
Yao does not explicitly teach further generates second mask indicators, third mask indicators, and fourth mask indicators to set, in the dynamic neural network, corresponding pre-trained weights to a first fraction of the weight, a second fraction, greater than the first fraction, of the weight, and to the weight itself, respectively.
Diehl NPL teaches further generates second mask indicators, third mask indicators, and fourth mask indicators to set, in the dynamic neural network, corresponding pre-trained weights to a first fraction of the weight, a second fraction, greater than the first fraction, of the weight, and to the weight itself, respectively (Diehl sections 2.4.1 and 2.4.3, the weights are discretized to 4-bit accuracy, where each input can be represented in 4-bit resolution (thus up to 16 values, as 4 bits are able to represent 16 values); accordingly, the 4-bit values provide for 16 fractions of the weight in 1/16th (fraction) units, thus at least a first, second, third and fourth fraction, where each of the 16 fractions increases in value, and where the weights are bounded to (-1,1), therefore the highest 4-bit value represents 1, or the weight itself).
Therefore, taking the teachings of Yao and Diehl NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the weight 4-bit resolution as disclosed in Diehl NPL at least because doing so would yield higher classification accuracies (Diehl NPL section 3).
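As an illustrative sketch of the arithmetic relied upon above, 4 bits give 2^4 = 16 distinct levels, which can be read as fractions k/16 of a weight for k = 1 to 16, the highest level being 16/16 = 1, i.e., the weight itself. The helper names fraction_levels and apply_indicator are hypothetical, and the k/16 mapping reflects the characterization in this action, not necessarily the exact discretization scheme of Diehl NPL.

```python
def fraction_levels(bits=4):
    """Return the 2**bits increasing fractions k / 2**bits for k = 1..2**bits."""
    n = 2 ** bits
    return [k / n for k in range(1, n + 1)]


def apply_indicator(weight, indicator, bits=4):
    """Scale a pre-trained weight by the fraction selected by a mask indicator.

    `indicator` is an index from 0 to 2**bits - 1 (an assumption of this sketch).
    """
    return weight * fraction_levels(bits)[indicator]


if __name__ == "__main__":
    levels = fraction_levels()
    print(len(levels), levels[0], levels[-1])
    print(apply_indicator(0.8, 3), apply_indicator(0.8, 15))
```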
Regarding claims 15 and 21, Yao teaches wherein training the dynamic neural network mask comprises (Yao page 23, line 30 – page 24, line 8, fig. 9, processor 902 obtains DNN data from memory 903): 
initializing values of the dynamic neural network mask (Yao page 17, in the Pseudocode A, the values of TK are initialized to 1); 
adapting the initialized values of the dynamic neural network mask to trained values based on the training data (Yao page 17, in the Pseudocode A, TK is updated (adapt) based on a mini-batch of network input from X (training data)); and 
thresholding the trained values to generate the mask indicators (Yao page 15, lines 8-20, thresholds are used to evaluate a selected weight where the thresholds are used to set a connect/disconnect/no change indicator for each weight).

Yao further does not explicitly teach post-deployment training data.
Inazumi teaches post-deployment training data (Inazumi col. 5, lines 20-31 and lines 57-67, each neural network has its own voice data (pre-trained/pre-deployment), and judges whether or not voice data received by the neural network (post-deployment training data) as a feature vector coincides with its own voice data, and where the neural network adapts to the voice data and learns from it (thus the input voice data being training data)).
Diehl NPL teaches floating point values (Diehl section 3, the training of the original ReLU NN is with floating point values).
Therefore, taking the teachings of Yao and Inazumi together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the recognition result being based on the outputs of two neural networks, and using data after deployment of the trained neural networks as disclosed in Inazumi at least because doing so would provide for a neural network that can process time series data and have its circuitry reduced in whole scale (Inazumi col. 7, lines 21-26).
Further, taking the teachings of Yao and Diehl NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the floating point value as disclosed in Diehl NPL at least because doing so would yield higher classification accuracies (Diehl NPL section 3).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Inazumi in view of Chen NPL, in view of Diehl NPL, as set forth above regarding claim 4 from which claim 5 depends, further in view of Chen et al., (US 2019/0318246 A1, herein “Chen”).
Regarding claim 5, Yao teaches wherein the one or more processors to threshold comprises the one or more processors to assign the mask indicators when a corresponding trained value is less than a threshold (Yao page 15, lines 8-20, thresholds are used to evaluate a selected weight where the thresholds are used to set a connect/disconnect/no change indicator for each weight).
Yao does not explicitly teach floating point.
Yao further does not explicitly teach the threshold being not greater than 0.4.
Diehl NPL teaches floating point values (Diehl section 3, the training of the original ReLU NN is with floating point values).
Chen teaches the threshold being not greater than 0.4 (Chen paras. [0036]-[0037], the connection value in a neural network associated with a weight is set based on a second threshold value predetermined to be 0.4).
Therefore, taking the teachings of Yao and Diehl NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the floating point value as disclosed in Diehl NPL at least because doing so would yield higher classification accuracies (Diehl NPL section 3).
Further, taking the teachings of Yao and Chen together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the .
Claims 7, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Inazumi in view of Chen NPL, in view of Diehl NPL, as set forth above regarding claim 4 from which claim 7 depends, and as set forth above regarding claim 14 from which claim 17 depends, and as set forth above regarding claim 21 from which claim 23 depends, further in view of Kharaghani et al., (US 2018/0300629 A1, herein “Kharaghani”).
Regarding claims 7, 17 and 23, Yao teaches wherein [the one or more processors to initialize the values of the dynamic neural network mask comprises the one or more processors – claim 7 / initializing the values of the dynamic neural network mask comprises – claims 17 and 23] (Yao page 17, pseudocode A with an initialization step for the TK binary mask values) but does not explicitly teach the remainder of the limitations of claims 7, 17 and 23.
Diehl NPL teaches to the floating point values (Diehl section 3, the training of the original ReLU NN is with floating point values).
Kharaghani teaches randomly assign[ing] (Kharaghani paras. [0074]-[0075], the probability of retention of a connection is randomly initialized).
Kharaghani further teaches in a [pre-defined – claim 7 only] range of 0.15 to 0.85, inclusive (Kharaghani paras. [0047]-[0049], the mask matrix is defined by using values from the P and R matrices, the P and R matrices being generated via random number generation from between 0 and 1 (pre-defined range), where, as shown in equations 3 and 4, this features values within the range of 0.15 to 0.85).
Therefore, taking the teachings of Yao and Diehl NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the floating point value as disclosed in Diehl NPL at least because doing so would yield higher classification accuracies (Diehl NPL section 3).
Further, taking the teachings of Yao and Kharaghani together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Yao with the random values as disclosed in Kharaghani at least because doing so would ensure that for any given iteration t, inclusion or exclusion of connections is affected by previous iterations of the training process (Kharaghani para. [0050]).
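For illustration, random initialization of floating point mask values within the range recited in claims 7, 17 and 23 can be sketched as below. The function name init_mask_values and the seed parameter are hypothetical, and the sketch is not drawn from Kharaghani, which generates its P and R matrices from values between 0 and 1.

```python
import random


def init_mask_values(count, low=0.15, high=0.85, seed=None):
    """Return `count` floating point mask values drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible initialization
    return [rng.uniform(low, high) for _ in range(count)]


if __name__ == "__main__":
    print(init_mask_values(5, seed=1))
```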
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Inazumi in view of Chen NPL, as set forth above regarding claim 1 from which claims 11 and 12 depend, further in view of Robertson et al., (US 2020/0104039 A1, herein “Robertson”).
Regarding claim 11, Yao teaches further comprising the one or more processors to: store the static neural network and the dynamic neural network mask to the memory (Yao page 22, lines 24-26 and page 23, lines 10-18, the sparsely connected DNN model (including the binary mask) is stored to memory for implementation, and memory 903 stores the DNN model data, reference DNN model, sparse DNN model weights and parameters and all other data discussed therein), Yao page 23, lines 10-18, given that memory 903 stores all of the disclosed neural network models and data, and classification scores as well, it will be bigger than just the reference DNN and the sparse DNN model).
Yao does not explicitly teach but less than 1.2 times the first memory storage size.
Robertson teaches but less than 1.2 times the first memory storage size (Robertson para. [0117], the size of the compiled neural network is 444 bytes, thus the system requires only a few hundred bytes of memory to store the neural network).
Thus, taking the teachings of Yao and Robertson together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the memory of Yao to be of a minimal size not much larger than a compiled neural network as disclosed in Robertson, at least because doing so would allow for application of the neural network to run on a low power computational system (Robertson para. [0057]).
Moreover, while Robertson’s “few hundred bytes” suggests a memory size that is less than 1.2 times the compiled neural network of 444 bytes, Robertson does not explicitly teach this. However, given that “few” fits within a set range, and thus defines a finite number of predictable solutions with a reasonable expectation of success, it would be obvious to modify Yao by the teachings of Robertson to have a less than 1.2 times amount of memory to store a compiled (static) neural network at least as doing so would 
Regarding claim 12, Yao teaches further comprising the one or more processors to: generate a dynamic neural network mask training (Yao page 17, pseudocode A, generating mask TK).
Yao does not explicitly teach trigger based on one or more of a duration since a last training, a number of false input query classifications since the last training, or a number of post-deployment training data instances attained since the last training, wherein the processor to train the dynamic neural network mask is in response to the trigger.
Robertson teaches trigger based on one or more of a duration since a last training, a number of false input query classifications since the last training, or a number of post-deployment training data instances attained since the last training, wherein the processor to train the dynamic neural network mask is in response to the trigger (Robertson para. [0097], when additional crowdsourced training data is available (a number of post-deployment training data instances attained since the last training), retraining is done on the neural network model).
Thus, taking the teachings of Yao and Robertson together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the training of a neural network disclosed in Yao to be re-trained when additional training data is available as disclosed in Robertson, at least because doing so would allow for application of the neural network to run on a low power computational system (Robertson para. [0057]).
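The three alternative trigger conditions recited in claim 12 can be sketched, for illustration only, as a simple predicate. The class name RetrainPolicy, the function name should_retrain and all default limits are hypothetical; Robertson, as read above, discloses only retraining when additional crowdsourced training data is available.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    # Hypothetical limits, chosen only for the example.
    max_seconds_since_training: float = 7 * 24 * 3600
    max_false_classifications: int = 50
    min_new_instances: int = 1000


def should_retrain(policy, seconds_since_training, false_classifications, new_instances):
    """Return True when any one of the three conditions is met."""
    return (
        seconds_since_training >= policy.max_seconds_since_training
        or false_classifications >= policy.max_false_classifications
        or new_instances >= policy.min_new_instances
    )


if __name__ == "__main__":
    policy = RetrainPolicy()
    # Only the count of new post-deployment training instances meets its limit.
    print(should_retrain(policy, 3600, 2, 1500))
```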

Allowable Subject Matter
Claims 8-10, 18-19 and 24-25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The closest cited art of record is Yao, which discloses a convergence condition in the neural network pruning algorithm (see page 17). Yao teaches a convergence condition based on a loss function gradient. Yao, however, does not teach or suggest, alone or in a combination with the other cited art of record that would have been obvious to one of ordinary skill in the art, the generating of a first set of mask indicators by thresholding trained values which were adapted from initialized values of the dynamic neural network mask, where the count of the first set of mask indicators is compared to a range of a count of mask indicators, the comparison forming the basis for the convergence condition of the initialize, adapt and threshold operations that determine the plurality of mask indicators.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Lee et al., US 2019/0164054, directed towards pre-training a neural network including randomly initializing weights and applying a drop out by setting a certain percentage of hidden units to zero.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on Monday-Friday, 9:30a-6:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656