DETAILED ACTION
This communication is in response to the application filed on 03 April 2020.  Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 17 April 2020 and 28 June 2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS are being considered by the Examiner.

Drawing Objections
FIG. 3A is objected to because of the following informalities:
Paragraph 44 of the disclosure references “a CTC model 375,” a “three-dimensional output 380,” and “an RNN-T model 385” for FIG. 3A; however, none of these reference numbers (375, 380, 385) appears in the figure.

Claim Objections
Claim 1 is objected to because of the following informalities: 
“wherein encoder pretraining pretrains an encoder of the RNN-T with CE criterion, and wherein whole-network pretraining pretrains the RNN-T as whole using CE criterion, wherein the CE criterion represent a divergence” should be changed to “wherein the encoder pretraining pretrains an encoder of the RNN-T with a cross-entropy (CE) criterion, and wherein the whole-network pretraining pretrains the RNN-T as whole using the CE criterion, wherein the CE criterion represents a divergence,” in order to establish proper antecedent basis and because “criterion” is the singular form of the noun, so its verb must agree in number.
“processing the first textual output in with an application on the data processing system” should be changed to “processing a first textual output with an application of the data processing system,” in order to establish proper antecedent basis and to clarify the wording of the claim.

Claim 2 is objected to because of the following informalities:
“wherein for encoder pretraining” should be changed to “wherein for the encoder pretraining,” in order to establish proper antecedent basis. 
“with frame boundaries of the plurality of audio frames” should be changed to “with frame boundaries of the plurality of frames of audio data,” in order to establish proper antecedent basis.

Claim 3 is objected to because of the following informalities:
The claim recites the element “whole network training.” However, the claims on which claim 3 depends do not contain this element.  Also, the as-written claim lacks a necessary verb and is therefore unclear.  Examiner suggests that “wherein for whole network training, a prediction network of the RNN-T based on the CE criterion obtained from the neural network-based token classification model” be changed to “wherein the whole network pretraining comprises a prediction network of the RNN-T based on the CE criterion obtained from the neural network-based token classification model.”

Claims 4 and 13 are objected to because of the following informalities:
“boundaries of the plurality of audio frames” should read “boundaries of the plurality of frames of audio data,” in order to establish proper antecedent basis.
“obtaining first cross-entropy (CE) criterion” should read “obtaining a first cross-entropy (CE) criterion,” in order to establish proper antecedent basis.
“wherein the CE criterion represent a divergence” should be changed to “wherein the CE criterion represents a divergence,” because the word “criterion” is the singular form of the noun, therefore the proper verb form should be used.

Claim 10 is objected to because of the following informalities:
The claim recites the limitation "the prediction network," in the limitation “wherein to pretrain the prediction network.”  The claim element “prediction network” is not found in independent claim 4 on which claim 10 depends.  Claim 10 should be changed to depend on claim 9 to overcome this objection.

Claim 20 is objected to because of the following informalities:
The claim recites “the three-dimensional label matrix” in the limitation “wherein determining the three-dimensional label matrix further comprises.” The claim element “three-dimensional label matrix” is not found in claim 18, on which claim 20 depends. Claim 20 should be changed to depend on claim 19 to overcome this objection.
The claim recites “the location” in the limitation “in the alignment path at the location of the space.” The claim element “location” is not found in claim 18, on which claim 20 depends, nor in any preceding dependent claim. “The location” is also not found in claim 19, on which claim 20 should depend per the preceding objection. Claim 20 should be changed to recite “a location” or reworded in such a way as to provide a proper antecedent basis for the limitation.

Appropriate correction for all preceding objections is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 11 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claim recites “the location” in the limitation “in the alignment path at the location of the space.” There is insufficient antecedent basis for this limitation in the claim. The claim element “location” is not found in claim 10, on which claim 11 depends, nor in any preceding dependent claim. In order to overcome this rejection, claim 11 should be changed to recite “a location” or reworded in such a way as to provide a proper antecedent basis for the limitation.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 3 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Peyser et al. (US 2021/0233512; hereafter Peyser).
All citations referencing Peyser use a two-number format (xx/yy). The first number (xx) refers to the paragraph number as found in the printed US PG Publication. The second number (yy) refers to the paragraph number as found in the provisional application that corresponds to the same content as the US PG Publication “xx” number.
Regarding claim 1,
		Peyser teaches:
a processor; and a computer-readable medium storing executable instructions for causing the processor to perform operations of (see Peyser ¶ 46 in PGPub/38 in provisional app, 52/44: computing device includes a processor (e.g., data processing hardware); computer readable medium used to provide machine instructions and/or data to a programmable processor):
receiving an audio input comprising spoken content (see Peyser ¶ 29/21: audio capture device captures voice inputs and communicates them in digital form (e.g., acoustic frames) to the speech recognizer);
analyzing the audio input using a Recurrent Neural Network-Transducer (RNN-T) to obtain textual content representing the spoken content (see Peyser ¶ 29/21, 30/22: speech recognizer transcribes the acoustic frames into text; speech recognizer includes an RNN-T decoder), 
the RNN-T being pretrained using encoder pretraining or whole network pretraining (see Peyser ¶ 37/29: neural network is trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function) [Examiner interprets initial iterations of backpropagation to encompass pretraining and final iteration to encompass training]),
wherein encoder pretraining pretrains an encoder of the RNN-T with CE criterion, and wherein whole-network pretraining pretrains the RNN-T as whole using CE criterion, wherein the CE criterion represent a divergence between expected outputs and reference outputs of a model (see Peyser ¶ 37/29: during the first stage the training procedure trains the encoder and the RNN-T decoder (e.g., using a CE loss function), loss function defined as a difference between the actual outputs of the network and the desired outputs of network); and
processing the first textual output in with an application on the data processing system (see Peyser ¶ 28/20: user communicat[es] with an automated assistant application, user asks automated assistant [a question], question/spoken utterance captured by the audio capture device and processed by user device, speech recognizer of user device receives the audio input and transcribes [it] (e.g., a text representation [of the question]), automated assistant of the application may respond to the question posed by user using natural language processing).
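For illustration only, and not as a characterization of Peyser or of the application, the CE criterion discussed in the mapping above can be sketched as a scalar that grows as a model's output distribution diverges from the reference output. The three-token vocabulary, the probability values, and the function name `cross_entropy` are hypothetical.

```python
import math

def cross_entropy(reference, predicted, eps=1e-12):
    """Return -sum(reference[i] * log(predicted[i])) for two distributions."""
    if len(reference) != len(predicted):
        raise ValueError("distributions must have the same length")
    # eps guards against log(0) when the model assigns zero probability.
    return -sum(r * math.log(max(p, eps)) for r, p in zip(reference, predicted))

# Hypothetical three-token vocabulary; the reference label is token 1.
reference = [0.0, 1.0, 0.0]
close_output = [0.05, 0.90, 0.05]  # model output near the reference
far_output = [0.60, 0.20, 0.20]    # model output far from the reference

loss_close = cross_entropy(reference, close_output)
loss_far = cross_entropy(reference, far_output)
```

Under these assumptions the loss for the output near the reference is smaller than the loss for the output far from it, which is the sense in which the criterion measures a divergence.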

Regarding claim 3,
	Peyser teaches:
wherein for whole network training, a prediction network of the RNN-T based on the CE criterion (see Peyser ¶ 35/27, 37/29: RNN-T decoder includes a prediction network; training procedure trains the encoder and RNN-T decoder using a CE loss function) 
obtained from the neural network-based token classification model (see Peyser ¶ 37/29, 38/30: neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function); modified loss function for proper noun loss augmentation is represented by the following equations: [see equation] for some constant λ>1, λ [is] a parameter [that] balance[s] the effectiveness of proper noun recognition [a type of classification] with respect to performance of the speech recognizer).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Peyser et al. (US 2021/0233512; hereafter Peyser) in view of Sundaram (US 9972339) in view of Chen et al. (US 2021/0280170; hereafter Chen).
All citations referencing Chen use a two-number format (xx/yy). The first number (xx) refers to the paragraph number as found in the printed US PG Publication. The second number (yy) refers to the paragraph number as found in the provisional application that corresponds to the same content as the US PG Publication “xx” number.
Regarding claim 2, Peyser teaches all the limitations of claim 1.
Peyser does not teach:
wherein for encoder pretraining, the encoder is trained using the CE criterion from a neural network-based token classification model.
Sundaram discloses:
wherein for encoder pretraining, the encoder is trained using the CE criterion from a neural network-based token classification model (see Sundaram col 3:11-13, col 19:61-64: neural network classifier trained to perform speech processing such as identifying speech units [token classification] corresponding to input audio data; encoder RNN E and classifier H may be trained jointly with cross-entropy objective function and the backpropagation-through-time (BTT) algorithm).
Peyser and Sundaram are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser to incorporate the disclosure of Sundaram in order to train a neural network to detect voice activity and the position from which the voice activity originates so that the audio with the best representation of the spoken utterance is processed in order to increase the efficacy of the speech-recognition engine (see Sundaram col 16:51-54, col 17:43-54: higher SNR of the audio signal may increase the efficacy of the speech-recognition engine when performing speech recognition; for speech processing it is desirable to select and process the beam with the best representation of the spoken utterance, this may be done by selecting the beam with the highest signal quality metric (SNR), system trains a neural network classifier to determine whether voice activity is detected and what beam or geometric position the voice activity originates from).

Furthermore, regarding claim 2, 
Peyser does not teach:
and the neural network-based token classification model is trained using first token-aligned training data representing a plurality of utterances.
Chen discloses:
the neural network-based token classification model is trained using first token-aligned training data (see Chen ¶ 53 PGPub / ¶ 41 in provisional app: FIG. 5 shows a consistent loss term based on a [Kullback-Leibler divergence] between non-synthetic and synthetic speech recognition results generated by a frame alignment-based transducer model (e.g., RNN-T), non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token] and the synthetic speech recognition result at each output step is aligned with a corresponding input frame of the synthetic speech representation [a token], non-synthetic speech recognition result 311 is associated with a non-synthetic speech representation 304 of a training utterance; RNN-T model 200a emits output labels for the characters “W”, “O”, “R”, and “D” [token classification])
representing a plurality of utterances (see Chen ¶ 47/35: training process employ[s] a consistency loss term module [that] receive[s] the corresponding non-synthetic speech and synthetic speech recognition results output by the ASR model and determine the consistency loss term for the corresponding training utterance pair at the output step),
wherein each utterance is associated with a plurality of frames of audio data (see Chen ¶ 29/17: convert the utterance into a corresponding digital format associated with input acoustic frames capable of being processed by the ASR system), 
and wherein tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames (see Chen ¶ 53/41: non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token]).
 Peyser and Chen are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser to incorporate the disclosure of Chen in order to reduce the time required to train an ASR model (see Chen ¶ 30/18: ASR model include[s] an end-to-end (E2E) sequence-to-sequence model such as a frame alignment-based transducer model [that] provide[s] E2E speech recognition by integrating acoustic, pronunciation, and language models into a single neural network, and does not require a lexicon or a separate text normalization component, structures and optimization mechanisms provide increased accuracy and reduced model training time).
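For illustration only, the claimed relationship between tokens and frame boundaries can be sketched as expanding token segments into one label per audio frame. This is a minimal sketch and is not taken from Chen or from the application: the segment boundaries, the seven-frame utterance, the `<blank>` filler label, and the function name `frame_labels` are hypothetical; only the characters “W”, “O”, “R”, and “D” echo the example cited above.

```python
def frame_labels(segments, num_frames, blank="<blank>"):
    """Expand (token, start_frame, end_frame) segments into one label per frame.

    start_frame is inclusive and end_frame is exclusive, so each token
    begins and ends exactly on a frame boundary.
    """
    labels = [blank] * num_frames
    for token, start, end in segments:
        if not (0 <= start < end <= num_frames):
            raise ValueError(f"segment for {token!r} falls outside the utterance")
        for t in range(start, end):
            labels[t] = token
    return labels

# Hypothetical alignment of one utterance over seven frames of audio data.
segments = [("W", 0, 2), ("O", 2, 3), ("R", 3, 5), ("D", 5, 6)]
labels = frame_labels(segments, num_frames=7)
```

Under these assumptions every frame carries exactly one label, which is the form of training data a frame-wise CE criterion would consume.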

Claims 4, 9, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Peyser et al. (US 2021/0233512; hereafter Peyser1) in view of Chen et al. (US 2021/0280170; hereafter Chen) in view of Wang et al. (S. Wang, P. Zhou, W. Chen, J. Jia and L. Xie, "Exploring RNN-Transducer for Chinese speech recognition," 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 1364-1369; hereafter Wang) in view of Peyser et al. (US 2020/0349922; hereafter Peyser2).
All citations referencing Peyser1 and Chen use a two-number format (xx/yy). The first number (xx) refers to the paragraph number as found in the printed US PG Publication. The second number (yy) refers to the paragraph number as found in the provisional application that corresponds to the same content as the US PG Publication “xx” number.
Regarding claim 4, Peyser1 teaches:
A data processing system comprising: a processor; and a computer-readable medium storing executable instructions for causing the processor to perform operations of (see Peyser1 ¶ 46 in PGPub/38 in provisional app, 52/44: computing device includes a processor (e.g., data processing hardware); computer readable medium used to provide machine instructions and/or data to a programmable processor):
obtaining first cross-entropy (CE) criterion from the neural network-based token classification model (see Peyser1 ¶ 37/29, 38/30: training procedure trains the encoder and RNN-T decoder using a CE loss function, neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function); modified loss function for proper noun loss augmentation is represented by the following equations: [see equation] for some constant λ>1, λ [is] a parameter [that] balance[s] the effectiveness of proper noun recognition [a type of classification] with respect to performance of the speech recognizer), 
wherein the CE criterion represent a divergence between expected outputs and reference outputs of the model (see Peyser1 ¶ 37/29: neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function), loss function defined as a difference between the actual outputs of the network and the desired outputs of network);
Peyser1 does not teach:
training a neural network-based token classification model using first token- aligned training data representing a plurality of utterances, wherein each utterance is associated with a plurality of frames of audio data, and wherein tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames.
Chen discloses:
training a neural network-based token classification model using first token- aligned training data (see Chen ¶ 53 PGPub / ¶ 41 in provisional app: FIG. 5 shows a consistent loss term based on a [Kullback-Leibler divergence] between non-synthetic and synthetic speech recognition results generated by a frame alignment-based transducer model (e.g., RNN-T), non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token] and the synthetic speech recognition result at each output step is aligned with a corresponding input frame of the synthetic speech representation [a token], non-synthetic speech recognition result 311 is associated with a non-synthetic speech representation 304 of a training utterance; RNN-T model 200a emits output labels for the characters “W”, “O”, “R”, and “D” [token classification])
representing a plurality of utterances (see Chen ¶ 47/35: training process employ[s] a consistency loss term module [that] receive[s] the corresponding non-synthetic speech and synthetic speech recognition results output by the ASR model and determine the consistency loss term for the corresponding training utterance pair at the output step),
wherein each utterance is associated with a plurality of frames of audio data (see Chen ¶ 29/17: convert the utterance into a corresponding digital format associated with input acoustic frames capable of being processed by the ASR system), 
and wherein tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames (see Chen ¶ 53/41: non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token]).
Peyser1 and Chen are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Chen in order to reduce the time required to train an ASR model (see Chen ¶ 30/18: ASR model include[s] an end-to-end (E2E) sequence-to-sequence model such as a frame alignment-based transducer model [that] provide[s] E2E speech recognition by integrating acoustic, pronunciation, and language models into a single neural network, and does not require a lexicon or a separate text normalization component, structures and optimization mechanisms provide increased accuracy and reduced model training time).
Furthermore, regarding claim 4, Peyser1 does not teach:
pretraining an encoder of an RNN-Transducer (RNN-T) based on the first CE criterion obtained from the neural network-based token classification model.
Wang discloses:
pretraining an encoder of an RNN-Transducer (RNN-T) based on the first CE criterion obtained from the neural network-based token classification model (see Wang p. 1365 ¶ 2: for good performance in RNN-T training [we] use a pre-trained CTC [connectionist temporal classification] model to initialize the encoder of the RNN-T model, for a pre-trained CTC model we use frame-wise cross entropy (CE) training to obtain a CE model as the start point of CTC training);
Peyser1 and Wang are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Wang in order for the RNN-T model to perform well during training (see Wang p. 1365 ¶ 2: for good performance in RNN-T training [we] use a pre-trained CTC [connectionist temporal classification] model to initialize the encoder of the RNN-T model, for a pre-trained CTC model we use frame-wise cross entropy (CE) training to obtain a CE model as the start point of CTC training).
Furthermore, regarding claim 4, Peyser1 does not teach:
training the RNN-T with second training data after pretraining the encoder of the RNN-T.
Peyser2 discloses:
training the RNN-T with second training data after pretraining the encoder of the RNN-T (see Peyser2 ¶ 35, 51, 57: decoder 200 includes an E2E RNN-T model 200, RNN-T model 200 includes an encoder network; FIG. 4 shows an example process for obtaining additional training data [2nd training data] for training the neural decoder/encoder 200 to learn to represent various categories of numeric sequences in the written domain; the RNN-T E2E decoder 200 is trained on a spoken domain version of the training set [pre-training the encoder]).
Peyser1 and Peyser2 are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Peyser2 in order to train the ASR system to identify categories of numeric sequences that are frequently seen in logs of digital assistant requests or of search engines, thereby improving end-to-end (E2E) model performance on utterances with numeric sequences (see Peyser2 ¶ 5, 50: to improve end-to-end (E2E) model performance on utterances with numeric sequences, the enhanced ASR system described by this specification uses additional numeric training data that is generated by a text-to-speech (TTS) system; additional training data that represents challenging yet realistic numeric sequences can be generated, to this end categories of numeric sequences that are frequently seen in logs (e.g., of digital assistant requests or of search engines) can be identified).
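For illustration only, the order of operations recited in claim 4 (train a token classification model with a CE criterion, use it to pretrain the RNN-T encoder, then train the RNN-T on second training data) can be sketched at the encoder-initialization step. This is a minimal sketch and does not reproduce the method of Wang, Peyser2, or the application: the dictionary-based parameter containers, the parameter values, and the function name `init_rnnt_from_ce_model` are hypothetical.

```python
import copy

def init_rnnt_from_ce_model(ce_model, rnnt_model):
    """Return a copy of rnnt_model whose encoder parameters come from ce_model.

    Only the encoder is transferred; the prediction and joint networks keep
    their own initial values, and the classifier head is not carried over.
    """
    initialized = copy.deepcopy(rnnt_model)
    initialized["encoder"] = copy.deepcopy(ce_model["encoder"])
    return initialized

# Hypothetical parameters of a token classification model after CE training.
ce_model = {"encoder": {"w": [0.3, -0.1]}, "classifier": {"w": [0.7]}}

# Hypothetical untrained RNN-T with encoder, prediction, and joint networks.
rnnt_model = {
    "encoder": {"w": [0.0, 0.0]},
    "prediction": {"w": [0.0]},
    "joint": {"w": [0.0]},
}

rnnt_init = init_rnnt_from_ce_model(ce_model, rnnt_model)
# Training of rnnt_init on the second training data would follow this step.
```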

Regarding claim 9, Peyser1 in view of Chen in view of Wang in view of Peyser2 teaches all the limitations of claim 4.
Peyser1 further teaches:
pretraining a prediction network of the RNN-T based on the CE criterion (see Peyser1 ¶ 35/27, 37/29: RNN-T decoder includes a prediction network; neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function)) 
obtained from the neural network-based token classification model (see Peyser1 ¶ 37/29, 38/30: neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function); modified loss function for proper noun loss augmentation is represented by the following equations: [see equation] for some constant λ>1, λ [is] a parameter [that] balance[s] the effectiveness of proper noun recognition [a type of classification] with respect to performance of the speech recognizer).

Regarding claim 13, Peyser1 teaches:
obtaining first cross-entropy (CE) criterion from the neural network-based token classification model (see Peyser1 ¶ 37/29, 38/30: training procedure trains the encoder and RNN-T decoder using a CE loss function, neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function); modified loss function for proper noun loss augmentation is represented by the following equations: [see equation] for some constant λ>1, λ [is] a parameter [that] balance[s] the effectiveness of proper noun recognition [a type of classification] with respect to performance of the speech recognizer), 
wherein the CE criterion represent a divergence between expected outputs and reference outputs of the model (see Peyser1 ¶ 37/29: neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function), loss function defined as a difference between the actual outputs of the network and the desired outputs of network);
Peyser1 does not teach:
training a neural network-based token classification model using first token- aligned training data representing a plurality of utterances, wherein each utterance is associated with a plurality of frames of audio data, and wherein tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames.
Chen discloses:
training a neural network-based token classification model using first token- aligned training data (see Chen ¶ 53 PGPub / ¶ 41 in provisional app: FIG. 5 shows a consistent loss term based on a [Kullback-Leibler divergence] between non-synthetic and synthetic speech recognition results generated by a frame alignment-based transducer model (e.g., RNN-T), non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token] and the synthetic speech recognition result at each output step is aligned with a corresponding input frame of the synthetic speech representation [a token], non-synthetic speech recognition result 311 is associated with a non-synthetic speech representation 304 of a training utterance; RNN-T model 200a emits output labels for the characters “W”, “O”, “R”, and “D” [token classification])
representing a plurality of utterances (see Chen ¶ 47/35: training process employ[s] a consistency loss term module [that] receive[s] the corresponding non-synthetic speech and synthetic speech recognition results output by the ASR model and determine the consistency loss term for the corresponding training utterance pair at the output step),
wherein each utterance is associated with a plurality of frames of audio data (see Chen ¶ 29/17: convert the utterance into a corresponding digital format associated with input acoustic frames capable of being processed by the ASR system), 
and wherein tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames (see Chen ¶ 53/41: non-synthetic speech recognition result [a token] at each output step is aligned with a corresponding input frame of the non-synthetic speech representation [a token]).
Peyser1 and Chen are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Chen in order to reduce the time required to train an ASR model (see Chen ¶ 30/18: ASR model include[s] an end-to-end (E2E) sequence-to-sequence model such as a frame alignment-based transducer model [that] provide[s] E2E speech recognition by integrating acoustic, pronunciation, and language models into a single neural network, and does not require a lexicon or a separate text normalization component, structures and optimization mechanisms provide increased accuracy and reduced model training time).
Furthermore, regarding claim 13, Peyser1 does not teach:
pretraining an encoder of an RNN-Transducer (RNN-T) based on the first CE criterion obtained from the neural network-based token classification model.
Wang discloses:
pretraining an encoder of an RNN-Transducer (RNN-T) based on the first CE criterion obtained from the neural network-based token classification model (see Wang p. 1365 ¶ 2: for good performance in RNN-T training [we] use a pre-trained CTC [connectionist temporal classification] model to initialize the encoder of the RNN-T model, for a pre-trained CTC model we use frame-wise cross entropy (CE) training to obtain a CE model as the start point of CTC training);
Peyser1 and Wang are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Wang in order for the RNN-T model to perform well during training (see Wang p. 1365 ¶ 2: for good performance in RNN-T training [we] use a pre-trained CTC [connectionist temporal classification] model to initialize the encoder of the RNN-T model, for a pre-trained CTC model we use frame-wise cross entropy (CE) training to obtain a CE model as the start point of CTC training).
Furthermore, regarding claim 13, Peyser1 does not teach:
training the RNN-T with second training data after pretraining the encoder of the RNN-T.
Peyser2 discloses:
training the RNN-T with second training data after pretraining the encoder of the RNN-T (see Peyser2 ¶ 35, 51, 57: decoder 200 includes an E2E RNN-T model 200, RNN-T model 200 includes an encoder network; FIG. 4 shows an example process for obtaining additional training data [2nd training data] for training the neural decoder/encoder 200 to learn to represent various categories of numeric sequences in the written domain; the RNN-T E2E decoder 200 is trained on a spoken domain version of the training set [pre-training the encoder]).
Peyser1 and Peyser2 are considered to be analogous because they are both from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 to incorporate the disclosure of Peyser2 in order to train the ASR system to identify categories of numeric sequences that are frequently seen in logs of digital assistant requests or of search engines, thereby improving end-to-end (E2E) model performance on utterances with numeric sequences (see Peyser2 ¶ 5, 50: to improve end-to-end (E2E) model performance on utterances with numeric sequences, the enhanced ASR system described by this specification uses additional numeric training data that is generated by a text-to-speech (TTS) system; additional training data that represents challenging yet realistic numeric sequences can be generated, to this end categories of numeric sequences that are frequently seen in logs (e.g., of digital assistant requests or of search engines) can be identified).
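For illustration only, the frame-wise cross-entropy (CE) training cited from Wang above can be sketched as an average negative log-probability of the aligned label at each frame. This sketch is not taken from Wang or from the application; the function name `frame_wise_cross_entropy`, the list-of-lists layout, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def frame_wise_cross_entropy(
    posteriors: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Average CE over frames, given per-frame label posteriors.

    posteriors[t][k] is the (assumed) model probability of label k at frame t;
    labels[t] is the (assumed) aligned target label index for frame t.
    """
    if not labels or len(posteriors) != len(labels):
        raise ValueError("need one target label per frame")
    total = 0.0
    for frame_probs, label in zip(posteriors, labels):
        # CE against a one-hot target reduces to -log of the target's probability.
        total -= math.log(frame_probs[label])
    return total / len(labels)


# Two frames, two labels; values are made up for the example.
loss = frame_wise_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

A lower value indicates that the per-frame posteriors place more probability on the aligned labels.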

Regarding claim 18, Peyser1 in view of Chen in view of Wang in view of Peyser2 teach all the limitations of claim 13.
Peyser1 further teaches:
pretraining a prediction network of the RNN-T based on the CE criterion (see Peyser1 ¶ 35/27, 37/29: RNN-T decoder includes a prediction network; neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function))
obtained from the neural network-based token classification model (see Peyser1 ¶ 37/29, 38/30: neural network trained by backpropagation that defines a loss function (e.g., a cross-entropy loss function); modified loss function for proper noun loss augmentation is represented by the following equations: [see equation] for some constant λ>1, λ [is] a parameter [that] balance[s] the effectiveness of proper noun recognition [a type of classification] with respect to performance of the speech recognizer).

Claims 5-6 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Peyser et al. (US 2021/0233512; hereafter Peyser1) in view of Chen et al. (US 2021/0280170; hereafter Chen) in view of Wang et al. (S. Wang, P. Zhou, W. Chen, J. Jia and L. Xie, "Exploring RNN-Transducer for Chinese speech recognition," 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 1364-1369; hereafter Wang) in view of Peyser et al. (US 2020/0349922; hereafter Peyser2), as applied to claim 4 above, further in view of Chelba et al. (US 8494850; hereafter Chelba).
All citations referencing Chen use a two-number format (xx/yy). The first number (xx) refers to the paragraph number as found in the printed US PG Publication. The second number (yy) refers to the paragraph number as found in the provisional application that corresponds to the same content as the US PG Publication “xx” number.
Regarding claim 5, Peyser1 in view of Chen in view of Wang in view of Peyser2 teach all the limitations of claim 4.
Peyser1 in view of Chen in view of Wang in view of Peyser2 do not expressly teach accessing a source training data element representing an utterance of a word; dividing the word into a plurality of tokens; determining a number of frames of audio data comprising the utterance of the word.
Chen discloses: 
accessing a source training data element representing an utterance of a word; dividing the word into a plurality of tokens (see Chen ¶ 53 PGPub / ¶ 41 in provisional app: non-synthetic speech recognition result 311 is associated with a non-synthetic speech representation 304 of a training utterance of the English word “WORD;” RNN-T model 200a emits output labels [tokens] for the characters “W”, “O”, “R”, and “D”);
determining a number of frames of audio data comprising the utterance of the word (see Chen 55/43: see equation 5 where U is the number of output labels in the utterance and T is the number of input frames in the utterance).
Peyser1 in view of Chen in view of Wang in view of Peyser2 and Chen are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 to incorporate the disclosure of Chen in order to reduce the time required to train an ASR model (see Chen ¶ 30/18: structures and optimization mechanisms provide increased accuracy and reduced model training time).
Furthermore, regarding claim 5, Peyser1 in view of Chen in view of Wang in view of Peyser2 do not teach adding the source training data element to the token-aligned training data responsive to the number of frames comprising the utterance exceeding or matching a number of tokens comprising the plurality of tokens.
Chelba discloses:
adding the source training data element to the token-aligned training data responsive to the number of frames comprising the utterance exceeding or matching a number of tokens comprising the plurality of tokens (see Chelba col 13:33-39, col 14:27-30, col 14:34-38: data is received that represents instances of the same phonetic context occurring in different speech samples, reducer receives key-value pairs that identify the same training sequence [source training data], each key-value pair instance of the training sequence corresponds to different speech frames; reducer determines the number of sets of speech frames that have been accumulated for a training sequence, if the number of frames is greater than or equal to the minimum threshold, the reducer produces a GMM for the training sequence and outputs the M-phone; GMM [is] sized as a function of the number of frames, number of Gaussian components [tokens] is determined based on the number of speech frames aligned against a particular HMM state of a phone during training).
Peyser1 in view of Chen in view of Wang in view of Peyser2 and Chelba are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 to incorporate the disclosure of Chelba to enable efficient training of an assigned portion of an acoustic model in a speech recognition system (see Chelba col 1:15-16, col 1:29-33: Speech recognition systems use acoustic models to transcribe utterances; to facilitate training of the acoustic model, data for sequences that include a common sequence of phonetic elements can be transmitted to the same processing module, permitting the processing module to efficiently train the assigned portion of the acoustic model).

Regarding claim 6, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba teach all the limitations of claim 5.
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach discarding the source training data element responsive to the number of tokens exceeding the number of frames comprising the utterance.
Chelba discloses: 
discarding the source training data element responsive to the number of tokens exceeding the number of frames comprising the utterance (see Chelba col 13:33-39, col 14:27-32, col 14:34-38: data is received that represents instances of the same phonetic context occurring in different speech samples, reducer receives key-value pairs that identify the same training sequence [source training data], each key-value pair instance of the training sequence corresponds to different speech frames; reducer determines the number of sets of speech frames that have been accumulated for a training sequence, if the number of frames is less than the minimum threshold, the training sequence is omitted from the model; GMM [is] sized as a function of the number of frames, number of Gaussian components [tokens] is determined based on the number of speech frames aligned against a particular HMM state of a phone during training).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Chelba are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba to incorporate the disclosure of Chelba so that an acoustic model is not trained on training sequences that occur infrequently in a given training set, thereby enabling efficient training of an assigned portion of an acoustic model in a speech recognition system (see Chelba col 1:15-16, col 1:29-33, col 14:32-34: speech recognition systems use acoustic models to transcribe utterances; to facilitate training of the acoustic model, data for sequences that include a common sequence of phonetic elements can be transmitted to the same processing module, permitting the processing module to efficiently train the assigned portion of the acoustic model; in this manner, training of the acoustic model is avoided for training sequences that occur infrequently in the input data).
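For illustration only, the frame-count versus token-count condition recited in claims 5 and 6 can be sketched as follows. This sketch is not taken from the application or from any cited reference; the function name `filter_token_aligned`, the tuple layout of a source training data element, and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout of a source training data element:
# (identifier, tokens of the word, number of audio frames in the utterance)
Element = Tuple[str, Sequence[str], int]


def filter_token_aligned(
    elements: Sequence[Element],
) -> Tuple[List[Element], List[Element]]:
    """Split elements into (kept, discarded) by comparing frames to tokens."""
    kept: List[Element] = []
    discarded: List[Element] = []
    for element in elements:
        _, tokens, num_frames = element
        if num_frames >= len(tokens):
            # Frames exceed or match tokens: add to token-aligned training data.
            kept.append(element)
        else:
            # Tokens exceed frames: discard the element.
            discarded.append(element)
    return kept, discarded


samples: List[Element] = [
    ("utt1", ["W", "O", "R", "D"], 12),  # more frames than tokens
    ("utt2", ["W", "O", "R", "D"], 3),   # fewer frames than tokens
    ("utt3", ["W", "O", "R", "D"], 4),   # frames match tokens
]
kept, discarded = filter_token_aligned(samples)
```

Under these assumed inputs, the first and third elements are kept and the second is discarded, since each frame can carry at most one token in a frame-level alignment.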

Regarding claim 14, Peyser1 in view of Chen in view of Wang in view of Peyser2 teach all the limitations of claim 13.
Peyser1 in view of Chen in view of Wang in view of Peyser2 do not expressly teach accessing a source training data element representing an utterance of a word; dividing the word into a plurality of tokens; determining a number of frames of audio data comprising the utterance of the word.
Chen discloses: 
accessing a source training data element representing an utterance of a word; dividing the word into a plurality of tokens (see Chen ¶ 53 PGPub / ¶ 41 in provisional app: non-synthetic speech recognition result 311 is associated with a non-synthetic speech representation 304 of a training utterance of the English word “WORD;” RNN-T model 200a emits output labels [tokens] for the characters “W”, “O”, “R”, and “D”);
determining a number of frames of audio data comprising the utterance of the word (see Chen 55/43: see equation 5 where U is the number of output labels in the utterance and T is the number of input frames in the utterance).
Peyser1 in view of Chen in view of Wang in view of Peyser2 and Chen are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 to incorporate the disclosure of Chen in order to reduce the time required to train an ASR model (see Chen ¶ 30/18: structures and optimization mechanisms provide increased accuracy and reduced model training time).
Furthermore, regarding claim 14, Peyser1 in view of Chen in view of Wang in view of Peyser2 do not teach adding the source training data element to the token-aligned training data responsive to the number of frames comprising the utterance exceeding or matching a number of tokens comprising the plurality of tokens.
Chelba discloses:
adding the source training data element to the token-aligned training data responsive to the number of frames comprising the utterance exceeding or matching a number of tokens comprising the plurality of tokens (see Chelba col 13:33-39, col 14:27-30, col 14:34-38: data is received that represents instances of the same phonetic context occurring in different speech samples, reducer receives key-value pairs that identify the same training sequence [source training data], each key-value pair instance of the training sequence corresponds to different speech frames; reducer determines the number of sets of speech frames that have been accumulated for a training sequence, if the number of frames is greater than or equal to the minimum threshold, the reducer produces a GMM for the training sequence and outputs the M-phone; GMM [is] sized as a function of the number of frames, number of Gaussian components [tokens] is determined based on the number of speech frames aligned against a particular HMM state of a phone during training).
Peyser1 in view of Chen in view of Wang in view of Peyser2 and Chelba are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 to incorporate the disclosure of Chelba to enable efficient training of an assigned portion of an acoustic model in a speech recognition system (see Chelba col 1:15-16, col 1:29-33: Speech recognition systems use acoustic models to transcribe utterances; to facilitate training of the acoustic model, data for sequences that include a common sequence of phonetic elements can be transmitted to the same processing module, permitting the processing module to efficiently train the assigned portion of the acoustic model).

Regarding claim 15, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba teach all the limitations of claim 14.
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach discarding the source training data element responsive to the number of tokens exceeding the number of frames comprising the utterance.
Chelba discloses: 
discarding the source training data element responsive to the number of tokens exceeding the number of frames comprising the utterance (see Chelba col 13:33-39, col 14:27-32, col 14:34-38: data is received that represents instances of the same phonetic context occurring in different speech samples, reducer receives key-value pairs that identify the same training sequence [source training data], each key-value pair instance of the training sequence corresponds to different speech frames; reducer determines the number of sets of speech frames that have been accumulated for a training sequence, if the number of frames is less than the minimum threshold, the training sequence is omitted from the model; GMM [is] sized as a function of the number of frames, number of Gaussian components [tokens] is determined based on the number of speech frames aligned against a particular HMM state of a phone during training).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Chelba are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba to incorporate the disclosure of Chelba so that an acoustic model is not trained on training sequences that occur infrequently in a given training set, thereby enabling efficient training of an assigned portion of an acoustic model in a speech recognition system (see Chelba col 1:15-16, col 1:29-33, col 14:32-34: speech recognition systems use acoustic models to transcribe utterances; to facilitate training of the acoustic model, data for sequences that include a common sequence of phonetic elements can be transmitted to the same processing module, permitting the processing module to efficiently train the assigned portion of the acoustic model; in this manner, training of the acoustic model is avoided for training sequences that occur infrequently in the input data).

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Peyser et al. (US 2021/0233512; hereafter Peyser1) in view of Chen et al. (US 2021/0280170; hereafter Chen) in view of Wang et al. (S. Wang, P. Zhou, W. Chen, J. Jia and L. Xie, "Exploring RNN-Transducer for Chinese speech recognition," 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 1364-1369; hereafter Wang) in view of Peyser et al. (US 2020/0349922; hereafter Peyser2), in view of Chelba et al. (US 8494850; hereafter Chelba), as applied to claim 5 above, further in view of Yi et al. (Yi, Cheng, Feng Wang, and Bo Xu. "Ectc-Docd: An End-to-End Structure with CTC Encoder and OCD Decoder for Speech Recognition." INTERSPEECH. 2019; hereafter Yi).
All citations referencing Chen use a two-number format (xx/yy). The first number (xx) refers to the paragraph number as found in the printed US PG Publication. The second number (yy) refers to the paragraph number as found in the provisional application that corresponds to the same content as the US PG Publication “xx” number.
Regarding claim 7, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba teach all the limitations of claim 5.
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach processing an output of the encoder of the RNN-T to add alignment information to a high level representation of each frame of acoustic data.
Chen discloses:
processing an output of the encoder of the RNN-T to add alignment information to a high level representation of each frame of acoustic data (see Chen ¶ 31 PGPub / ¶ 19 in provisional app, 34/22, 58/46: RNN-T model includes an encoder network which is roughly analogous to an acoustic model (AM) in a traditional ASR system [that] includes a recurrent network of stacked Long Short-Term Memory (LSTM) layers, encoder reads a sequence of d-dimensional feature vectors (e.g., acoustic frames) and produces at each time step a higher-order feature representation; softmax layer [additional fully connected layer] composed of a unified word piece or grapheme set that is generated using all unique word pieces or graphemes in a plurality of training data sets; for calculating consistent loss between two output sequences predicted by a frame alignment-based transducer model (e.g., RNN-T), Equations (4)-(9) are applied to determine a probability of having a sequence of output labels predicted/output by the RNN-T model that align with a sequence of output labels representing the ground-truth transcription).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Chen are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba to incorporate the disclosure of Chen so that an ASR engine does not need to wait for the entirety of an audio input before beginning speech recognition computation, thus enabling speech recognition in a streaming fashion (see Chen ¶ 53/41: This misalignment is due to the blank symbols that the RNN-T model emits at output steps in the time-domain to enable speech recognition in the streaming fashion, e.g., alleviating the need to wait for all of the audio before beginning the computation).

Furthermore, regarding claim 7, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach with an additional fully-connected layer.
Yi discloses:
with an additional fully-connected layer (see Yi Fig. 1: [in diagram, fully connected layer processes the output of the encoder]).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Yi are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have combined the prior art element of recognizing non-synthetic speech in a training utterance as taught by Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba according to the known method of processing an encoder output with a fully-connected layer as disclosed in Yi to yield the predictable result of allowing all connections of the output of an encoder to be connected to the inputs of the subsequent layer in order to establish dependency of the different layers (see KSR v. Teleflex).
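For illustration only, processing each frame of an encoder output with a fully-connected layer can be sketched as a per-frame linear projection onto a label set. This sketch is not taken from Yi, Chen, or the application; the softmax normalization, the function names, and the weight values are assumptions made for the example.

```python
import math
from typing import List, Sequence


def fully_connected(
    frame: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """One logit per label: every input unit connects to every output unit."""
    return [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize logits into probabilities (assumed here, not stated in the record)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_posteriors(
    encoder_output: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[List[float]]:
    """Apply the same fully-connected layer to every frame of the encoder output."""
    return [softmax(fully_connected(frame, weights, bias)) for frame in encoder_output]


# Two frames of a 2-dimensional encoder output projected onto 3 hypothetical labels.
encoder_output = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
posteriors = label_posteriors(encoder_output, weights, bias)
```

Each row of `posteriors` is a distribution over the label set for one frame, which is the form a frame-level criterion such as cross entropy would consume.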

Regarding claim 16, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba teach all the limitations of claim 14.
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach processing an output of the encoder of the RNN-T to add alignment information to a high level representation of each frame of acoustic data.
Chen discloses:
processing an output of the encoder of the RNN-T to add alignment information to a high level representation of each frame of acoustic data (see Chen ¶ 31 PGPub / ¶ 19 in provisional app, 34/22, 58/46: RNN-T model includes an encoder network which is roughly analogous to an acoustic model (AM) in a traditional ASR system [that] includes a recurrent network of stacked Long Short-Term Memory (LSTM) layers, encoder reads a sequence of d-dimensional feature vectors (e.g., acoustic frames) and produces at each time step a higher-order feature representation; softmax layer [additional fully connected layer] composed of a unified word piece or grapheme set that is generated using all unique word pieces or graphemes in a plurality of training data sets; for calculating consistent loss between two output sequences predicted by a frame alignment-based transducer model (e.g., RNN-T), Equations (4)-(9) are applied to determine a probability of having a sequence of output labels predicted/output by the RNN-T model that align with a sequence of output labels representing the ground-truth transcription).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Chen are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba to incorporate the disclosure of Chen so that an ASR engine does not need to wait for the entirety of an audio input before beginning speech recognition computation, thus enabling speech recognition in a streaming fashion (see Chen ¶ 53/41: This misalignment is due to the blank symbols that the RNN-T model emits at output steps in the time-domain to enable speech recognition in the streaming fashion, e.g., alleviating the need to wait for all of the audio before beginning the computation).

Furthermore, regarding claim 16, Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba do not expressly teach with an additional fully-connected layer.
Yi discloses:
with an additional fully-connected layer (see Yi Fig. 1: [in diagram, fully connected layer processes the output of the encoder]).
Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba and Yi are considered to be analogous because they are from the field of automatic speech recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have combined the prior art element of recognizing non-synthetic speech in a training utterance as taught by Peyser1 in view of Chen in view of Wang in view of Peyser2 in view of Chelba according to the known method of processing an encoder output with a fully-connected layer as disclosed in Yi to yield the predictable result of allowing all connections of the output of an encoder to be connected to the inputs of the subsequent layer in order to establish dependency of the different layers (see KSR v. Teleflex).

Allowable Subject Matter
Claims 8, 10-12, 17, and 19-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
The closest prior art of record, Zhu et al. (“Multilingual Speech Recognition with Self-Attention Structured Parameterization”), is cited to disclose a method of speech recognition that employs a softmax equation for formulating a final attenuation value based on input features, positional embedding, projection matrices, and acoustic features.  
The closest prior art of record, Korzeniowski et al. (“A FULLY CONVOLUTIONAL DEEP AUDITORY MODEL FOR MUSICAL CHORD RECOGNITION”), is cited to disclose a model for musical chord recognition that employs equations for loss terms based on a number of frames in the training data, a regularization factor, network parameters, a target, and a network output.  
The closest prior art of record, Afouras et al. (“Deep Audio-visual Speech Recognition”), is cited to disclose a model for speech recognition in which a neural network is trained with a cross-entropy loss to recognize words from a dataset and to process 3D image sequences by changing convolutional and pooling blocks from two dimensions to three dimensions.
The closest prior art of record, Qian et al. (“EXPLORING ASR-FREE END-TO-END MODELING TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING IN A CLOUD-BASED DIALOG SYSTEM”), is cited to disclose end-to-end speech modeling using an RNN acoustic auto-encoder to learn compact lower-dimensional feature representations of higher-dimensional input feature vector sequences.

However, none of these cited references either alone or in combination thereof teaches or makes obvious the combination of limitations as recited in the dependent claims; specifically, the limitation(s) of: 
“the output of the fully-connected layer is calculated by:

    [Equation image: media_image1.png]

where ffc represents a fully-connected layer, k is a label index and K denotes the largest dimension of an output of a joint network of the RNN-T, and Yt is a word piece label for each input frame Xt,” as recited in claims 8 and 17;
“determining a three-dimensional label matrix from a two-dimensional label matrix used to train the encoder of the RNN-T for each utterance, wherein the three-dimensional label matrix includes an alignment path that represents a series of tokens of an utterance over a series of frames of audio data representing the utterance,” as recited in claims 10 and 19;
“identifying a space in the alignment path representing a pause in the utterance; and setting a blank token in the alignment path at the location of the space,” as recited in claims 11 and 20, due to these claims’ dependence on claims 10 and 19 respectively; and
“optimizing a CE loss between the three-dimensional label matrix and an output of the model,” as recited in claims 12 and 20, due to these claims’ dependence on claim 10 (by way of claim 11) and claim 19, respectively.

Conclusion
Any inquiry concerning this communication or earlier communications from Examiner should be directed to AARON G. ZELLER whose telephone number is (571) 272-5765.  Examiner can normally be reached Monday - Thursday 10 AM - 7:30 PM and every other Friday 10:00 AM - 6:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach Examiner by telephone are unsuccessful, Examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON G ZELLER/Examiner, Art Unit 2659
15 June 2022

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659