DETAILED ACTION
1.	This office action is in response to Application No. 16272880 filed on 02/09/2018. Claims 1-19 have been cancelled; claims 20-39 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.
Allowable Subject Matter
2.	Claims 23, 30 and 37 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and amended to overcome the 35 USC 112(b) rejection.

Response to Arguments
3.	Applicant’s arguments are moot in view of the new grounds of rejection. The examiner is withdrawing the rejections in the previous office action dated 10/18/2021 because Applicant’s amendments necessitated the new grounds of rejection presented in this office action. Accordingly, this action is made final.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


4.	Claims 20-39 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	Claims 20, 27 and 34 recite “than the memory” (claim 20, lines 20-21; claim 27, lines 15-16; claim 34, lines 15-16), which lacks antecedent basis. The claims recite a plurality of memory blocks, a highest ordered memory block, and a lowest ordered memory block. It is not clear whether “the memory” refers to one of the memory blocks or to something other than the memory blocks.
	Claims that are not specifically mentioned are rejected due to dependency.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 20-22, 24, 25, 27-29, 31, 34-36, 38 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al (US20190171913 filed on 12/04/2017) in view of Sak et al (US20150161991).

	Regarding claim 20, Cheng teaches a computer-implemented method, (This specification describes systems implemented by one or more computers executing one or more computer programs that can classify an input text block according to a taxonomic hierarchy using neural networks (e.g., one or more recurrent neural networks (RNNs), LSTM neural networks, and/or GRU neural networks) [0004])
	the method comprising: initializing, for each of a plurality of memory blocks, data stored in the memory block, (the encoder recurrent neural network 42 and the decoder recurrent neural network 44 are each implemented by a respective GRU neural network. In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells [0028]; a sequence 40 of inputs (e.g., X1, X2, . . . , XM) into a sequence 48 of outputs (e.g., Y1, Y2, . . . , YN) corresponding to a structured classification path of nodes in a taxonomic hierarchy (e.g., taxonomic hierarchy 10) [0034]; “Examiner note: GRU memory/block/cells are the memory blocks.  Each layer includes memory blocks and they would have hierarchy of layers and therefore order of blocks”)
	the memory blocks being ordered according to associated memory block order; (The hierarchical classification system 30 includes an encoder recurrent neural network 42 and a decoder recurrent neural network 44 [0026]; In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, each of which includes an input gate, a forget gate, and an output gate that enable the cell to store previous activations of the cell … The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order or reverse input order) and, in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; In this example, the encoder recurrent neural network 42 includes two hidden neural network layers 54 (which comprise memory blocks), and the decoder recurrent neural network 44 includes hidden neural network layers 58 (which comprise memory blocks) [0034], Fig. 4)
	at each of a plurality of time steps: obtaining input data for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032])
	for each memory block starting from a highest ordered memory block until a lowest ordered memory block according to the memory block order: (The encoder recurrent neural network 42 transforms each input in the input sequence 40 into a respective encoder hidden state until an end-of-sequence symbol (e.g., <eos>) is reached, Fig. 4; “Examiner note: This shows sequential processing of input sequence 40 from highest to lowest, i.e., first to last”)
	combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data, and storing the updated data in the memory block, (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4)
	wherein the data passed to the memory block is (i) for the highest ordered memory block, the input data for the time step (Thus, in accordance with its training, the hierarchical classification system 30 is operable to receive a sequence 40 of natural language text inputs and produce, at each time step [0035]; “Examiner note: input data X1 is obtained by the first memory block as highest memory block (in neural network layer 54)” Fig. 4;) and 
	(ii) for each memory block other than the highest ordered memory block, the updated data stored in a memory block that is one memory block higher in the memory block order than the memory; (Thus, for every input word in the text block, the encoder recurrent neural network 42 outputs a respective word vector and a respective hidden state 46. The encoder recurrent neural network 42 uses the hidden state 46 for processing the next input word. The decoder recurrent neural network 44 processes the final hidden state of the encoder recurrent neural network to produce the sequence 48 of outputs. The hierarchical classification system 30 converts the sequence of outputs 48 into an output classification 34 by replacing one or more of the output word embeddings in the sequence of outputs 48 with their corresponding natural language words in the output classification 34 based on the mappings between the word vectors and the node class labels that are stored in the hierarchy structure dictionary 38 [0029]) and
	processing the updated data stored in the plurality of memory blocks to generate an output for the time step (The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034])
	Cheng does not explicitly teach processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cheng to incorporate the teachings of Sak for the benefit of training an acoustic modeling system easily by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009])
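For illustration only, the data flow recited in claim 20 can be sketched as follows. This is a minimal sketch and is not taken from the application, Cheng, or Sak. The names `combine`, `output_layer`, and `run_time_step`, the use of scalar data, the equal-weight averaging rule, and the summing output stage are all assumptions chosen to keep the example small; claim 20 itself does not specify how the data is combined.

```python
def combine(stored, passed):
    # Hypothetical combining rule (equal-weight average); an assumption only.
    return 0.5 * stored + 0.5 * passed


def output_layer(blocks):
    # Hypothetical stand-in for the "one or more output neural network layers".
    return sum(blocks)


def run_time_step(blocks, input_data):
    """Update every block in place, highest ordered block (index 0) first."""
    passed = input_data  # the highest ordered block receives the input data
    for i in range(len(blocks)):
        blocks[i] = combine(blocks[i], passed)  # combine, then store
        passed = blocks[i]  # updated data is passed to the next lower block
    return output_layer(blocks)


blocks = [0.0, 0.0, 0.0]  # initialized data for three memory blocks
outputs = [run_time_step(blocks, x) for x in (8.0, 0.0)]
```

In this sketch each lower block receives only the freshly updated data of the block one position higher, so an input reaches the lowest block in attenuated form within the same time step.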

	Regarding claim 21, Modified Cheng teaches the method of claim 20, Cheng teaches wherein combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4) comprises: 
	Sak teaches computing a weighted combination of the data currently stored in the memory block as of the time step and the data passed to the memory block at the time step (Once the output m_t has been computed, the recurrent projection layer 114 computes a recurrent projected output r_t for the current time step using the output m_t. In particular, the recurrent projected output r_t satisfies:
		r_t = W_{rm} m_t
where W_{rm} is a matrix of current values of weights for the recurrent projection layer 114. The recurrent projected output r_t can then be provided to the output layer 122 for use in computing a phoneme representation or to the next LSTM layer in the sequence and fed back to the memory cell for use in computing the output m_{t+1} at the next time step in the acoustic sequence [0028])
	The same motivation to combine independent claim 20 applies here.
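For illustration only, the relation quoted from Sak [0028], in which the recurrent projected output is the weight matrix of the projection layer applied to the LSTM output, can be sketched as a plain matrix-vector product. The function name `recurrent_projection`, the 2x3 weight matrix, and the numeric values are hypothetical and are not taken from Sak.

```python
def recurrent_projection(W_rm, m_t):
    """Return r_t = W_rm m_t as a plain matrix-vector product."""
    return [sum(w * m for w, m in zip(row, m_t)) for row in W_rm]


W_rm = [[1.0, 0.0, 1.0],
        [0.0, 2.0, 0.0]]  # hypothetical 2x3 projection weights
m_t = [1.0, 2.0, 3.0]     # hypothetical LSTM output for the current time step
r_t = recurrent_projection(W_rm, m_t)
```

Because the hypothetical matrix has fewer rows than columns, the projected output has fewer dimensions than the LSTM output, which matches the dimensionality reduction cited from Sak [0009].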

	Regarding claim 22, Modified Cheng teaches the method of claim 21, Cheng teaches wherein respective weights for the data currently stored in the memory block as of the time step and the data passed to the memory block are determined based on a position of the memory block in the memory block order (for each time step, the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state [0047])
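For illustration only, the weighted average described in the passage quoted from Cheng [0047] can be sketched as follows. The quoted passage states only that the weights are derived from the encoder hidden states and the current decoder hidden state; the dot-product scoring and softmax normalization used here are assumptions, as are the names `attention_weights` and `attentional_vector` and the example values.

```python
import math


def attention_weights(encoder_states, decoder_state):
    # Assumed scoring rule: dot product of each encoder state with the decoder state.
    scores = [sum(e * d for e, d in zip(h, decoder_state)) for h in encoder_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]  # softmax: weights sum to one


def attentional_vector(encoder_states, decoder_state):
    """Weighted average of the encoder hidden states."""
    weights = attention_weights(encoder_states, decoder_state)
    size = len(encoder_states[0])
    return [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(size)]


encoder_states = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical encoder hidden states
decoder_state = [0.0, 0.0]                 # hypothetical decoder hidden state
context = attentional_vector(encoder_states, decoder_state)
```

With a zero decoder state every score is equal, so the sketch reduces to an unweighted average of the encoder states.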

	Regarding claim 24, Modified Cheng teaches the method of claim 20, Cheng teaches for each memory block, processing the updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and 	
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	Sak teaches wherein processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step comprises: (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]) 
	The same motivation to combine independent claim 20 applies here.

	Regarding claim 25, Cheng teaches the method of claim 20, wherein obtaining input data for the time step comprises: receiving an observation for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]) and 
	Sak teaches processing the observation for the time step using one or more input neural network layers to generate the input data for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]) 
	The same motivation to combine independent claim 20 applies here.

	Regarding claim 27, Cheng teaches a system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations (one or more sets of computer instructions encoded on one or more tangible non-transitory carrier media (e.g., a machine-readable storage device, substrate, or sequential access memory device) for execution by data processing apparatus [0049]) comprising:
	 initializing, for each of a plurality of memory blocks, data stored in the memory block, (the encoder recurrent neural network 42 and the decoder recurrent neural network 44 are each implemented by a respective GRU neural network. In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells [0028]; a sequence 40 of inputs (e.g., X1, X2, . . . , XM) into a sequence 48 of outputs (e.g., Y1, Y2, . . . , YN) corresponding to a structured classification path of nodes in a taxonomic hierarchy (e.g., taxonomic hierarchy 10) [0034]; “Examiner note: GRU memory/block/cells are the memory blocks.  Each layer includes memory blocks and they would have hierarchy of layers and therefore order of blocks”)
	the memory blocks being ordered according to associated memory block order; (The hierarchical classification system 30 includes an encoder recurrent neural network 42 and a decoder recurrent neural network 44 [0026]; In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, each of which includes an input gate, a forget gate, and an output gate that enable the cell to store previous activations of the cell … The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order or reverse input order) and, in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; In this example, the encoder recurrent neural network 42 includes two hidden neural network layers 54 (which comprises memory blocks), and the decoder recurrent neural network 44 includes hidden neural network layers 58 (which comprises memory blocks) [0034], Fig. 4)
	at each of a plurality of time steps: obtaining input data for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032])
	for each memory block starting from a highest ordered memory block until a lowest ordered memory block according to the memory block order: (The encoder recurrent neural network 42 transforms each input in the input sequence 40 into a respective encoder hidden state until an end-of-sequence symbol (e.g., <eos>) is reached, Fig. 4; “Examiner note: This shows sequential processing of input sequence 40 from highest to lowest, i.e., first to last”)
	 combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data, and storing the updated data in the memory block, (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4)
	wherein the data passed to the memory block is (i) for the highest ordered memory block, the input data for the time step (Thus, in accordance with its training, the hierarchical classification system 30 is operable to receive a sequence 40 of natural language text inputs and produce, at each time step [0035]; “Examiner note: input data X1 is obtained by the first memory block as highest memory block (in neural network layer 54)” Fig. 4;) and
	(ii) for each memory block other than the highest ordered memory block, the updated data stored in a memory block that is one memory block higher in the memory block order than the memory; (Thus, for every input word in the text block, the encoder recurrent neural network 42 outputs a respective word vector and a respective hidden state 46. The encoder recurrent neural network 42 uses the hidden state 46 for processing the next input word. The decoder recurrent neural network 44 processes the final hidden state of the encoder recurrent neural network to produce the sequence 48 of outputs. The hierarchical classification system 30 converts the sequence of outputs 48 into an output classification 34 by replacing one or more of the output word embeddings in the sequence of outputs 48 with their corresponding natural language words in the output classification 34 based on the mappings between the word vectors and the node class labels that are stored in the hierarchy structure dictionary 38 [0029]) and 
	processing the updated data stored in the plurality of memory blocks to generate an output for the time step (The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034])
	Cheng does not explicitly teach processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cheng to incorporate the teachings of Sak for the benefit of training an acoustic modeling system easily by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009])

	Regarding claim 28, Modified Cheng teaches the system of claim 27, Cheng teaches wherein combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4) comprises: 
	Sak teaches computing a weighted combination of the data currently stored in the memory block as of the time step and the data passed to the memory block at the time step (Once the output m_t has been computed, the recurrent projection layer 114 computes a recurrent projected output r_t for the current time step using the output m_t. In particular, the recurrent projected output r_t satisfies:
		r_t = W_{rm} m_t
where W_{rm} is a matrix of current values of weights for the recurrent projection layer 114. The recurrent projected output r_t can then be provided to the output layer 122 for use in computing a phoneme representation or to the next LSTM layer in the sequence and fed back to the memory cell for use in computing the output m_{t+1} at the next time step in the acoustic sequence [0028])
	The same motivation to combine independent claim 27 applies here.

	Regarding claim 29, Modified Cheng teaches the system of claim 28, Cheng teaches wherein respective weights for the data currently stored in the memory block as of the time step and the data passed to the memory block are determined based on a position of the memory block in the memory block order (for each time step, the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state [0047])

	Regarding claim 31, Modified Cheng teaches the system of claim 27, Cheng teaches for each memory block, processing the updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and 	
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	 Sak teaches wherein processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]) 
	The same motivation to combine independent claim 27 applies here.

	Regarding claim 32, Modified Cheng teaches the system of claim 27, Cheng teaches wherein obtaining input data for the time step comprises: receiving an observation for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]) and 
	Sak teaches processing the observation for the time step using one or more input neural network layers to generate the input data for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]) 
	The same motivation to combine independent claim 27 applies here.

	Regarding claim 34, Cheng teaches one or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations (one or more sets of computer instructions encoded on one or more tangible non-transitory carrier media (e.g., a machine-readable storage device, substrate, or sequential access memory device) for execution by data processing apparatus [0049]) comprising:
	initializing, for each of a plurality of memory blocks, data stored in the memory block, (the encoder recurrent neural network 42 and the decoder recurrent neural network 44 are each implemented by a respective GRU neural network. In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells [0028]; a sequence 40 of inputs (e.g., X1, X2, . . . , XM) into a sequence 48 of outputs (e.g., Y1, Y2, . . . , YN) corresponding to a structured classification path of nodes in a taxonomic hierarchy (e.g., taxonomic hierarchy 10) [0034]; “Examiner note: GRU memory/block/cells are the memory blocks.  Each layer includes memory blocks and they would have hierarchy of layers and therefore order of blocks”)
	 the memory blocks being ordered according to associated memory block order; (The hierarchical classification system 30 includes an encoder recurrent neural network 42 and a decoder recurrent neural network 44 [0026]; In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, each of which includes an input gate, a forget gate, and an output gate that enable the cell to store previous activations of the cell … The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order or reverse input order) and, in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; In this example, the encoder recurrent neural network 42 includes two hidden neural network layers 54 (which comprises memory blocks), and the decoder recurrent neural network 44 includes hidden neural network layers 58 (which comprises memory blocks) [0034], Fig. 4)
	at each of a plurality of time steps: obtaining input data for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032])
	for each memory block starting from a highest ordered memory block until a lowest ordered memory block according to the memory block order: (The encoder recurrent neural network 42 transforms each input in the input sequence 40 into a respective encoder hidden state until an end-of-sequence symbol (e.g., <eos>) is reached, Fig. 4; “Examiner note: This shows sequential processing of input sequence 40 from highest to lowest, i.e., first to last”)
	 combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data, and storing the updated data in the memory block, (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4)
	 wherein the data passed to the memory block is (i) for the highest ordered memory block, the input data for the time step (Thus, in accordance with its training, the hierarchical classification system 30 is operable to receive a sequence 40 of natural language text inputs and produce, at each time step [0035]; “Examiner note: input data X1 is obtained by the first memory block as highest memory block (in neural network layer 54)” Fig. 4;) and 
	(ii) for each memory block other than the highest ordered memory block, the updated data stored in a memory block that is one memory block higher in the memory block order than the memory; (Thus, for every input word in the text block, the encoder recurrent neural network 42 outputs a respective word vector and a respective hidden state 46. The encoder recurrent neural network 42 uses the hidden state 46 for processing the next input word. The decoder recurrent neural network 44 processes the final hidden state of the encoder recurrent neural network to produce the sequence 48 of outputs. The hierarchical classification system 30 converts the sequence of outputs 48 into an output classification 34 by replacing one or more of the output word embeddings in the sequence of outputs 48 with their corresponding natural language words in the output classification 34 based on the mappings between the word vectors and the node class labels that are stored in the hierarchy structure dictionary 38 [0029]) and
	processing the updated data stored in the plurality of memory blocks to generate an output for the time step (The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034])
	Cheng does not explicitly teach processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cheng to incorporate the teachings of Sak for the benefit of more easily training an acoustic modeling system by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009]).
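	For illustration only, the data flow recited in the limitations above can be sketched in Python. This is a minimal sketch under stated assumptions and is not taken from Cheng, Sak, or the application: the names combine, step, and output_layer, the fixed blending factor keep, and the linear output layer are all hypothetical, as is the choice to pass the data updated within the same time step to the next lower memory block.

```python
from typing import List

Vector = List[float]


def combine(stored: Vector, passed: Vector, keep: float = 0.5) -> Vector:
    """Hypothetical combiner: blend stored data with data passed to the block."""
    return [keep * s + (1.0 - keep) * p for s, p in zip(stored, passed)]


def step(blocks: List[Vector], input_data: Vector, keep: float = 0.5) -> List[Vector]:
    """Run one time step; blocks[0] is assumed to be the highest ordered block."""
    updated: List[Vector] = []
    passed = input_data  # (i) the highest ordered block receives the input data
    for stored in blocks:
        new_data = combine(stored, passed, keep)
        updated.append(new_data)
        # (ii) the next lower block receives the updated data of the block
        # one position higher (assumed: the value updated in this time step)
        passed = new_data
    return updated


def output_layer(blocks: List[Vector], weights: Vector) -> float:
    """Hypothetical linear output layer over the contents of all blocks."""
    flat = [x for block in blocks for x in block]
    return sum(w * x for w, x in zip(weights, flat))


blocks = step([[0.0], [0.0]], [1.0])
score = output_layer(blocks, [1.0, 1.0])
```

	In this sketch each memory block holds a one-element vector; the input reaches the lower block only after being blended into the higher block, which is the ordering the claim language describes.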

	Regarding claim 35, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches wherein combining the data currently stored in the memory block as of the time step with data passed to the memory block at the time step to generate updated data (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (as data passed to the memory block) is combined with the data previously stored in memory (as data currently stored in the memory block as of the time step) and an update gate that controls the amount of the previous memory that is stored by the cell [0028], Fig. 4) comprises: 
	Sak teaches computing a weighted combination of the data currently stored in the memory block as of the time step and the data passed to the memory block at the time step (Once the output m_t has been computed, the recurrent projection layer 114 computes a recurrent projected output r_t for the current time step using the output m_t. In particular, the recurrent projected output r_t satisfies:
		r_t = W_{rm} m_t
where W_{rm} is a matrix of current values of weights for the recurrent projection layer 114. The recurrent projected output r_t can then be provided to the output layer 122 for use in computing a phoneme representation or to the next LSTM layer in the sequence and fed back to the memory cell for use in computing the output m_{t+1} at the next time step in the acoustic sequence [0028]).
	The same motivation to combine as set forth for independent claim 34 applies here.
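	For illustration only, the recurrent projection quoted from Sak [0028], in which the projected output is the product of the weight matrix W_rm and the output m_t, can be sketched as a plain matrix-vector product. The function name project and the numeric values below are hypothetical and do not come from Sak; the sketch only shows how a projection matrix with fewer rows than columns reduces the dimensionality of the fed-back data.

```python
from typing import List


def project(W_rm: List[List[float]], m_t: List[float]) -> List[float]:
    """Compute r_t = W_rm * m_t as a matrix-vector product."""
    return [sum(w * m for w, m in zip(row, m_t)) for row in W_rm]


# Hypothetical 2x3 projection matrix: maps a 3-element output to 2 elements.
W_rm = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 1.0]]
m_t = [2.0, 3.0, 4.0]
r_t = project(W_rm, m_t)
```

	Each element of r_t is a weighted sum of the elements of m_t, with the weights given by one row of W_rm.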

	Regarding claim 36, Modified Cheng teaches the non-transitory computer-readable storage media of claim 35. Cheng teaches wherein respective weights for the data currently stored in the memory block as of the time step and the data passed to the memory block are determined based on a position of the memory block in the memory block order (for each time step, the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state [0047]).
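	For illustration only, a weighted average over encoder hidden states of the kind described in the quoted passage of Cheng [0047] can be sketched as follows. Cheng's passage does not specify how the weights are derived; the dot-product scoring and softmax normalization used here are assumptions, and the names attention_weights and attention_vector are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attention_weights(encoder_states: List[Vector], decoder_state: Vector) -> List[float]:
    """Assumed scoring: dot product with the decoder state, then softmax."""
    scores = [sum(h * d for h, d in zip(state, decoder_state))
              for state in encoder_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_vector(encoder_states: List[Vector], decoder_state: Vector) -> Vector:
    """Weighted average of the encoder hidden states."""
    weights = attention_weights(encoder_states, decoder_state)
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states))
            for i in range(dim)]


states = [[1.0, 0.0], [0.0, 1.0]]
context = attention_vector(states, [0.0, 0.0])
```

	With a zero decoder state every score is equal, so the weights are uniform and the result is the plain average of the encoder states.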

	Regarding claim 38, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches, for each memory block, processing the updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	Sak teaches wherein processing the updated data stored in the plurality of memory blocks using one or more output neural network layers to generate an output for the time step comprises: (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	The same motivation to combine as set forth for independent claim 34 applies here.

	Regarding claim 39, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches wherein obtaining input data for the time step comprises: receiving an observation for the time step; (In this regard, the hierarchical classification system 30 processes the sequence 40 of inputs using the encoder recurrent neural network 42 to generate a respective encoder hidden state 46 for each input in the sequence of inputs 40, where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]) and 
	Sak teaches processing the observation for the time step using one or more input neural network layers to generate the input data for the time step (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024]).
	The same motivation to combine as set forth for independent claim 34 applies here.

6.	Claims 26 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al. (US20190171913 filed on 12/04/2017) in view of Sak et al. (US20150161991) and further in view of Allred et al. ("Convolving over time via recurrent connections for sequential weight sharing in neural networks." 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017).

	Regarding claim 26, Modified Cheng teaches the method of claim 25. Modified Cheng does not explicitly teach wherein the one or more input neural network layers comprise one or more convolutional neural network layers.
	Allred teaches wherein the one or more input neural network layers comprise one or more convolutional neural network layers (Whether convolving over time or space, the same number of convolutions occur and the given layer still produces n values for each w windows. If simply stored until the other time steps complete, these values would have a storage requirement of O(n·w). However, because of the recurrent connections discussed next, the outgoing signals undergo state compression as they are retained over the remaining time steps in a recurrent layer; Fig. 3, pg. 4447, left col., 3) Outgoing signals, “Examiner note: convolutional layer serves as input to the recurrent layer”) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Cheng to incorporate the teachings of Allred for the benefit of reducing area and eliminating data redirection on the outgoing side of the convolutional layer (Allred, pg. 4447, left col., 3) Outgoing signals)

	Regarding claim 33, Modified Cheng teaches the system of claim 32. Modified Cheng does not explicitly teach wherein the one or more input neural network layers comprise one or more convolutional neural network layers.
	Allred teaches wherein the one or more input neural network layers comprise one or more convolutional neural network layers (Whether convolving over time or space, the same number of convolutions occur and the given layer still produces n values for each w windows. If simply stored until the other time steps complete, these values would have a storage requirement of O(n·w). However, because of the recurrent connections discussed next, the outgoing signals undergo state compression as they are retained over the remaining time steps in a recurrent layer; Fig. 3, pg. 4447, left col., 3) Outgoing signals, “Examiner note: convolutional layer serves as input to the recurrent layer”) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Modified Cheng to incorporate the teachings of Allred for the benefit of reducing area and eliminating data redirection on the outgoing side of the convolutional layer (Allred, pg. 4447, left col., 3) Outgoing signals).

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121