DETAILED ACTION
1.	This office action is in response to the Application No. 16272880 filed on 09/02/2022. Claims 21, 28 and 35 have been cancelled. Claims 20, 22-27, 29-34 and 36-39 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

3. 	A request for continued examination under 37 CFR 1.114, including the fee set
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this
application is eligible for continued examination under 37 CFR 1.114, and the fee set
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on
10/20/2022 has been entered.

Allowable Subject Matter
4. 	Claims 23, 30 and 37 are objected to as being dependent upon a rejected base
claim, but would be allowable if rewritten in independent form including all of the
limitations of the base claim and any intervening claims, and if amended to overcome the rejection under 35 U.S.C. 112(b).

Response to Arguments

5.	On page 2 of the Remarks, the Applicant argued that “This portion of Chen does not teach or suggest that a "weighted combination" of (i) the data currently stored in the memory block as of the time step with (ii) respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block" is combined by either the encoder recurrent neural network 42 or the decoder recurrent neural network 44”. 
	The Office respectfully disagrees with the argument above. Cheng teaches that the alignment vector a_t(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector c_t(s) [0045]; the hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047]. Cheng also teaches that each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input (interpreted as the input data for the time step) is combined with the data previously stored in memory (interpreted as the data stored in the highest ordered memory block, encoder 42) [0028]; where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032].
	On page 2 of the Remarks, the Applicant argued that “the cited references do not teach that "respective updated data" stored in each of multiple memory blocks is processed by one or more neural network layers to "generate an output for the time step."”
	The Office respectfully disagrees with the argument above. Cheng teaches that, in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; … the amount of the previous memory that is stored by the cell, where the stored memory can be used in generating a current activation or used by other elements of the GRU neural network [0028]. Sak also teaches that the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024].
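	For illustration only, the weighted average relied upon from Cheng [0045] can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Cheng: the dot-product scoring, the softmax normalization, and the name attention_context are hypothetical choices made here, since the cited passage states only that the alignment scores are applied to obtain a weighted average over the encoder hidden states.

```python
import math

def attention_context(decoder_state, encoder_states):
    """Return (alignment, context) for one decoder time step.

    decoder_state: list of floats, the current decoder hidden state.
    encoder_states: list of equal-length lists, one per encoder time step.
    Dot-product scoring and softmax normalization are assumptions.
    """
    # One score per encoder time step (assumed: dot product).
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alignment = [e / total for e in exps]
    # Context vector: weighted average of the encoder hidden states.
    dim = len(encoder_states[0])
    context = [sum(a * h[i] for a, h in zip(alignment, encoder_states))
               for i in range(dim)]
    return alignment, context
```

	In this sketch the alignment vector has one entry per encoder time step, and the context vector is computed across encoder time steps rather than across differently ordered memory blocks.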

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 20, 22, 24, 25, 27, 29, 31, 32, 34, 36, 38 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al. (US20190171913, filed on 12/04/2017) in view of Sak et al. (US20150161991).

	Regarding claim 20, Cheng teaches a computer-implemented method, the method (This specification describes systems implemented by one or more computers executing one or more computer programs that can classify an input text block according to a taxonomic hierarchy using neural networks (e.g., one or more recurrent neural networks (RNNs), LSTM neural networks, and/or GRU neural networks) [0004]) comprising: 
	initializing, (the hierarchical classification system initializes the current hidden state of the decoder recurrent neural network 82 for the first output position with the final hidden state of the encoder recurrent neural network 42 [0043])
	for each of a plurality of memory blocks, (the encoder recurrent neural network 42 including LSTM memory block and decoder recurrent neural network 82 including LSTM memory block [0027], [0041], Fig. 7) 
	data stored in the memory block, (In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, … enable the cell to store previous activations of the cell, which can be used in generating a current activation or used by other elements of the LSTM neural network [0027]) 
	the memory blocks being ordered according to a memory block order; (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	at each of a plurality of time steps: (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector a_t(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state h_t with the encoder hidden state h_s [0045])
	obtaining input data for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027])
	for a highest ordered memory block (the encoder neural network 42 including LSTM memory block [0026]. The Examiner interprets the encoder as the highest ordered memory block) 
	according to the memory block order, (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	generating respective updated data for the highest ordered memory block for the time step by computing a weighted combination of (The alignment vector a_t(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector c_t(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047])
	(i) the data stored in the highest ordered memory block as of the time step with (ii) the input data for the time step and storing the respective updated data for the highest ordered memory for the time step in the highest ordered memory block; and (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]; where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]. The Examiner interprets the current input as the input data for the time step, which is combined with the data previously stored in memory, interpreted as the data stored in the highest ordered memory block (encoder 42))
	for each memory block (decoder neural network 82 including LSTM memory block, Fig. 7) 
	after the highest ordered memory block (encoder neural network 42 including LSTM memory block)
	according to the memory block order: (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	generating respective updated memory data for the memory block for the time step by computing a weighted combination of (the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state, and the decoder neural network 82 [0047])
	(i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) with
	(ii) respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042]) and 
	storing the respective updated data for the memory block for the time step in the memory block (in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; and an update gate that controls the amount of the previous memory that is stored by the cell, where the stored memory can be used in generating a current activation or used by other elements of the GRU neural network [0028]) and
	at each of the plurality of time steps, (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector a_t(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state h_t with the encoder hidden state h_s [0045])
	Cheng does not explicitly teach processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step. (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cheng to incorporate the teachings of Sak for the benefit of training an acoustic modeling system easily by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009]).
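	For reference, the sequence of operations recited in claim 20 can be sketched as follows. This is a minimal sketch under stated assumptions, not the Applicant's implementation and not code from Cheng or Sak: the names update_memory_blocks and output_layer are hypothetical, the weighted combination is assumed to be a convex combination with one scalar weight per block, all blocks are assumed to have the same width as the input data, and the output layers are reduced to a single linear layer over the concatenated block contents.

```python
def update_memory_blocks(blocks, input_data, keep_weights):
    """Perform one time step of the cascaded memory update.

    blocks: list of equal-length lists of floats, highest ordered block first.
    input_data: list of floats, the input data for the time step.
    keep_weights: one assumed weight in [0, 1] per block, applied to the
        data currently stored in that block.
    Returns the updated blocks in the same order.
    """
    updated = []
    source = input_data  # the highest ordered block combines with the input
    for stored, w in zip(blocks, keep_weights):
        new = [w * s + (1.0 - w) * x for s, x in zip(stored, source)]
        updated.append(new)
        source = new  # the next block combines with this block's updated data
    return updated

def output_layer(blocks, out_weights):
    """Assumed single linear output layer over all updated block contents."""
    flat = [v for block in blocks for v in block]
    return sum(w * v for w, v in zip(out_weights, flat))

# Two blocks initialized to zero, one time step.
blocks = [[0.0, 0.0], [0.0, 0.0]]
blocks = update_memory_blocks(blocks, [1.0, 2.0], [0.5, 0.5])
step_output = output_layer(blocks, [1.0, 1.0, 1.0, 1.0])
```

	In this sketch each block after the highest ordered block reads the updated data of the block immediately above it at the same time step, and the output for the time step is computed from the updated data of every block.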

	Regarding claim 22, modified Cheng teaches the method of claim 20. Cheng further teaches wherein, for each memory block (decoder neural network 82 including LSTM memory block, Fig. 7) 
	after the highest ordered memory block, (encoder neural network 42 including LSTM memory block)
	respective weights in the weighted combination for (The alignment vector a_t(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector c_t(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047])
	(i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) and 
	(ii) respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block are determined based on a position of the memory block in the memory block order. (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042])

	Regarding claim 24, modified Cheng teaches the method of claim 20. Cheng further teaches, for each memory block, processing the respective updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step. (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	Sak teaches wherein processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step comprises: (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	The same motivation to combine as set forth for independent claim 20 applies here.

	Regarding claim 25, modified Cheng teaches the method of claim 20. Cheng further teaches wherein obtaining input data for the time step comprises: receiving an observation for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027]; a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]) 
	and processing the observation for the time step using one or more input neural network layers to generate the input data for the time step. (The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58 [0034])

	Regarding claim 27, Cheng teaches a system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations (one or more sets of computer instructions encoded on one or more tangible non-transitory carrier media (e.g., a machine-readable storage device, substrate, or sequential access memory device) for execution by data processing apparatus [0049]) comprising:
	initializing, (the hierarchical classification system initializes the current hidden state of the decoder recurrent neural network 82 for the first output position with the final hidden state of the encoder recurrent neural network 42 [0043])
	for each of a plurality of memory blocks, (the encoder recurrent neural network 42 including LSTM memory block and decoder recurrent neural network 82 including LSTM memory block [0027], [0041], Fig. 7) 
	data stored in the memory block, (In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, … enable the cell to store previous activations of the cell, which can be used in generating a current activation or used by other elements of the LSTM neural network [0027]) 
	the memory blocks being ordered according to a memory block order; (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	at each of a plurality of time steps: (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector a_t(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state h_t with the encoder hidden state h_s [0045])
	obtaining input data for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027])
	for a highest ordered memory block (the encoder neural network 42 including LSTM memory block [0026]. The Examiner interprets the encoder as the highest ordered memory block) 
	according to the memory block order, (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	generating respective updated data for the highest ordered memory block for the time step by computing a weighted combination of (The alignment vector a_t(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector c_t(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047])
	(i) the data stored in the highest ordered memory block as of the time step with (ii) the input data for the time step and storing the respective updated data for the highest ordered memory for the time step in the highest ordered memory block; (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]; where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]. The Examiner interprets the current input as the input data for the time step, which is combined with the data previously stored in memory, interpreted as the data stored in the highest ordered memory block (encoder 42))
	for each memory block (decoder neural network 82 including LSTM memory block, Fig. 7) 
	after the highest ordered memory block (encoder neural network 42 including LSTM memory block)
	according to the memory block order: (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	generating respective updated memory data for the memory block for the time step by computing a weighted combination of (the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state, and the decoder neural network 82 [0047])
	 (i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) with 
	(ii) respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042]) and
	storing the respective updated data for the memory block for the time step in the memory block (in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; and an update gate that controls the amount of the previous memory that is stored by the cell, where the stored memory can be used in generating a current activation or used by other elements of the GRU neural network [0028]) 
	at each of the plurality of time steps, (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector a_t(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state h_t with the encoder hidden state h_s [0045])
	Cheng does not explicitly teach processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step. (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cheng to incorporate the teachings of Sak for the benefit of training an acoustic modeling system easily by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009]).

	Regarding claim 29, modified Cheng teaches the system of claim 27. Cheng further teaches wherein, for each memory block after the highest ordered memory block, (the encoder neural network 42 including LSTM memory block [0026]. The Examiner interprets the encoder as the highest ordered memory block) 
	respective weights in the weighted combination for (The alignment vector a_t(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector c_t(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047]) 
	(i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) and 
	(ii) the respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block are determined based on a position of the memory block in the memory block order. (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042])

	Regarding claim 31, modified Cheng teaches the system of claim 27. Cheng further teaches, for each memory block, processing the updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step. (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	Sak teaches wherein processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step comprises: (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	The same motivation to combine as set forth for independent claim 27 applies here.

	Regarding claim 32, modified Cheng teaches the system of claim 27. Cheng further teaches wherein obtaining input data for the time step comprises: receiving an observation for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027]; a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]) and
	processing the observation for the time step using one or more input neural network layers to generate the input data for the time step. (The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58 [0034])

	Regarding claim 34, Cheng teaches one or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations (one or more sets of computer instructions encoded on one or more tangible non-transitory carrier media (e.g., a machine-readable storage device, substrate, or sequential access memory device) for execution by data processing apparatus [0049]) comprising:
	initializing, (the hierarchical classification system initializes the current hidden state of the decoder recurrent neural network 82 for the first output position with the final hidden state of the encoder recurrent neural network 42 [0043])
	for each of a plurality of memory blocks, (the encoder recurrent neural network 42 including LSTM memory block and decoder recurrent neural network 82 including LSTM memory block [0027], [0041], Fig. 7) 
	data stored in the memory block, (In this example, each of the encoder and decoder LSTM neural networks includes one or more LSTM neural network layers, each of which includes one or more LSTM memory blocks of one or more memory cells, … enable the cell to store previous activations of the cell, which can be used in generating a current activation or used by other elements of the LSTM neural network [0027]) 
	the memory blocks being ordered according to a memory block order; (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	at each of a plurality of time steps: (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector a_t(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state h_t with the encoder hidden state h_s [0045])
	obtaining input data for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027])
	for a highest ordered memory block according to the memory block order, (the encoder neural network 42 including LSTM memory block [0026]. The Examiner notes that the encoder as the highest ordered memory block) 
	generating respective updated data for the highest ordered memory block for the time step by computing a weighted combination of (The alignment vector at(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector ct(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047]) 
	(i) the data stored in the highest ordered memory block as of the time step with (ii) the input data for the time step and storing the respective updated data for the highest ordered memory for the time step in the highest ordered memory block; (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]; where the hierarchical classification system 30 updates a current hidden state of the encoder recurrent neural network 42 at each time step [0032]. The Examiner notes that the current input as input data for the time step which is combined with the data previously stored in memory as data stored in the highest ordered memory block encoder 42) 	 
	for each memory block (decoder neural network 82 including LSTM memory block, Fig. 7)
	after the highest ordered memory block (encoder neural network 42 including LSTM memory block)
	according to the memory block order (encoder neural network 42 including LSTM memory block -> decoder neural network 82 including LSTM memory block, [0026], [0041] Fig. 7)
	generating respective updated memory data for the memory block for the time step by computing a weighted combination of (the decoder neural network 82 generates an attentional vector from a weighted average over the final hidden states of the encoder recurrent neural network 42, where the weights are derived from the final hidden states of the encoder recurrent neural network 42 and the current decoder hidden state, and the decoder neural network 82 [0047])
 	(i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) with 
	(ii) respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042]) and
	storing the respective updated data for the memory block for the time step in the memory block (in accordance with its training, the encoder LSTM neural network updates the current hidden state 46 of the encoder LSTM neural network based on results of processing the current input in the sequence 40 [0027]; and an update gate that controls the amount of the previous memory that is stored by the cell, where the stored memory can be used in generating a current activation or used by other elements of the GRU neural network [0028]) and
	at each of the plurality of time steps, (a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]; a general form of the attention model is a variable length alignment vector at(s) that has a length equal to the number of time steps on the encoder side and is derived by comparing the current decoder hidden state ht with the encoder hidden state hs [0045])
	Cheng does not explicitly teach processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step.
	Sak teaches processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step. (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cheng to incorporate the teachings of Sak for the benefit of training an acoustic modeling system easily by reducing the dimensionality of the data that is fed back to an LSTM memory block (Sak, [0009])

	Regarding claim 36, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches wherein, for each memory block after the highest ordered memory block, (the encoder neural network 42 including LSTM memory block [0026]. The Examiner interprets the encoder as the highest ordered memory block)
	respective weights in the weighted combination for (The alignment vector at(s) consists of scores that are respectively applied to obtain the weighted average over all the encoder hidden states to generate a global encoder side context vector ct(s) [0045]; The hierarchical classification system passes the set of word embeddings, one at a time, into the encoder recurrent network 42 to obtain a final encoder hidden state for the inputs in the source sequence 40 [0047]) 
 	(i) the data currently stored in the memory block as of the time step (In this example, each of the encoder and decoder GRU neural networks includes one or more GRU neural network layers, each of which includes one or more GRU blocks of one or more cells, each of which includes a reset gate that controls how the current input is combined with the data previously stored in memory [0028]) and 
	(ii) the respective updated data for the time step for a memory block that is one memory block higher in the memory block order than the memory block are determined based on a position of the memory block in the memory block order. (In accordance with this method, a set of attention scores are generated for the position in the output order being predicted from the updated decoder recurrent neural network hidden state for the position in the output order being predicted and the encoder recurrent neural network hidden states for the inputs in the source sequence [0042])

	Regarding claim 38, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches, for each memory block, processing the updated data stored in the memory block using one or more respective viewport neural network layers that correspond to the memory block to generate respective viewport layer data for the memory block; (the encoder recurrent network 42 outputs the encoder hidden states 46 to the decoder recurrent neural network 44. The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58. The decoder recurrent neural network 44 includes a softmax layer 60 that uses the encoder hidden states 46 to calculate scores for all the outputs (e.g., class labels) in the hierarchy structure dictionary 38 at each time step [0034], Fig. 4) and
	processing the respective viewport layer data for the memory blocks using one or more summarizer neural network layers to generate the output for the time step. (For each position in the output sequence 48, the attention module 84 configures the decoder recurrent neural network 82 to generate an attention vector (or attention layer) over the encoder hidden states 46 based on the current output (i.e., the output predicted in the preceding time step) and the encoder hidden states [0043]; “Examiner note: the attention module is interpreted as the summarizer”)  
	Sak teaches wherein processing the respective updated data stored in each of the plurality of memory blocks at the time step using one or more output neural network layers to generate an output for the time step comprises: (the output layer 122 receives the layer output from the highest LSTM layer in the sequence of LSTM layers and generates the set of scores for the current time step in accordance with current values of a set of parameters of the output layer [0024])
	The same motivation to combine as set forth for independent claim 34 applies here.

	Regarding claim 39, Modified Cheng teaches the non-transitory computer-readable storage media of claim 34. Cheng teaches wherein obtaining input data for the time step comprises: receiving an observation for the time step; (The encoder LSTM neural network processes the inputs in the sequence 40 in a particular order (e.g., in input order) [0027]; a source sequence of inputs corresponding to the input text block is processed, one at a time per time step [0005]) and
	processing the observation for the time step using one or more input neural network layers to generate the input data for the time step (The decoder recurrent neural network 44 processes the encoder hidden states 46 through the hidden decoder neural network layers 56, 58 [0034])

7.	Claims 26 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al (US20190171913 filed on 12/04/2017) in view of Sak et al (US20150161991) and further in view of Allred et al. ("Convolving over time via recurrent connections for sequential weight sharing in neural networks." 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017)

	Regarding claim 26, Modified Cheng teaches the method of claim 25. However, Modified Cheng does not explicitly teach wherein the one or more input neural network layers comprise one or more convolutional neural network layers.
	Allred teaches wherein the one or more input neural network layers comprise one or more convolutional neural network layers (Whether convolving over time or space, the same number of convolutions occur and the given layer still produces n values for each w windows. If simply stored until the other time steps complete, these values would have a storage requirement of O(n·w). However, because of the recurrent connections discussed next, the outgoing signals undergo state compression as they are retained over the remaining time steps in a recurrent layer; Fig. 3, pg. 4447, left col., 3) Outgoing signals, “Examiner note: convolutional layer serves as input to the recurrent layer”) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Cheng to incorporate the teachings of Allred for the benefit of reducing area and eliminating data redirection on the outgoing side of the convolutional layer (Allred, pg. 4447, left col., 3) Outgoing signals)

	Regarding claim 33, Modified Cheng teaches the system of claim 32. However, Modified Cheng does not explicitly teach wherein the one or more input neural network layers comprise one or more convolutional neural network layers.
	Allred teaches wherein the one or more input neural network layers comprise one or more convolutional neural network layers (Whether convolving over time or space, the same number of convolutions occur and the given layer still produces n values for each w windows. If simply stored until the other time steps complete, these values would have a storage requirement of O(n·w). However, because of the recurrent connections discussed next, the outgoing signals undergo state compression as they are retained over the remaining time steps in a recurrent layer; Fig. 3, pg. 4447, left col., 3) Outgoing signals, “Examiner note: convolutional layer serves as input to the recurrent layer”)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Modified Cheng to incorporate the teachings of Allred for the benefit of reducing area and eliminating data redirection on the outgoing side of the convolutional layer (Allred, pg. 4447, left col., 3) Outgoing signals)

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.G./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121