Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The following claim(s) is/are pending in this Office action: 1-9 and 11-21
The following claim(s) is/are amended: 1, 2, 14, 15, and 18
The following claim(s) is/are new: 21
The following claim(s) is/are cancelled: 10
Claim(s) rejected: 1-9, 11-21

Previous Objections Withdrawn
The objections to the specification are withdrawn based on the amendments to the abstract.

Previous Rejections Withdrawn
The rejections of claims 2 and 15 under 35 U.S.C. 112(b) are withdrawn based on the amendments.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 8-9, and 14-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Senior et al. (US 2017/0011738 A1, hereinafter "Senior").

Regarding claim 1, Senior teaches a computer-implemented method for generating soft labels for training (Para 0012: “A system of one or more computers can be so configured by virtue of software, firmware, hardware, or a combination of them installed on the system that in operation cause the system to perform the actions.” Para 0009: “Additional types of training under changes of output-symbol inventories can be performed. For example, alignments from a context-independent (CI or phone) model can be used, e.g., as "soft targets", optionally with additional softening, to train a model with context-dependent states.” A computer system and program products are used to implement the invention, which comprises generating soft labels for training.), the method comprising: preparing a teacher model having a teacher side class set (Para 0008: “For example, a first neural network may be trained to generate outputs indicating likelihoods for a first set of phonetic units, e.g., phones, context-independent (CI) states, or context-dependent (CD) states.” The teacher model is prepared by training the model using a set of phonetic units and their labels and states.);
obtaining a collection of class pairs for respective data units, each class pair including classes labeled to a corresponding data unit from among the teacher side class set and from among a student side class set different from the teacher side class set (Para 0037: “The first neural network is trained using output targets that indicate specific single labels, during the training process overall the first neural network learns relationships among various phones… The output of the first neural network 120 for a particular frame may distribute the probability for an output among multiple different output labels … The first network, through the confusion or degree of uncertainty between the different labels, is encoding additional information about the similarity of the classes. This represents implicit knowledge about relationships between different output classes.” Para 0008: “A second neural network may be trained based on the outputs of the first neural network to generate outputs indicating likelihoods for a second set of phonetic units that is different from the first set used by the first neural network.” In the first model (teacher model), a collection of class pairs is obtained by establishing the relationships among different phones during the training process. Similarly, a second model (student model) is trained to generate outputs indicating data units of classes that are different from those of the first neural network (teacher model).);
(Para: “The first neural network can be comprehensively trained using a large network, without size or processing constraints, to achieve very high accuracy. As an example, the first neural network 120 may represent a collection of many neural networks each trained somewhat differently. The collection can be used as an ensemble of classifiers, e.g., 50 different neural networks, and the average of the output distributions of the networks can be used to represent the output of the first neural network.” The first neural network (teacher model) is trained using a large network, and the output is obtained from the model.);
calculating a set of soft labels for the student side class set from the set of the outputs by using, for each member of the student side class set, at least an output obtained for a class within a subset of the teacher side class set having relevance to the member of the student side class set, based at least in part on observations in the collection of the class pairs (Para 0005: “The process of transferring information from a first neural network to a second neural network can involve training the second neural network based on a distribution of outputs from the first neural network. For example, rather than training the second neural network to produce a specific labelled target output, the second neural network can be trained to produce a distribution that matches or approximates a distribution produced by the first neural network.” The output label of the second NN (student model) data units is obtained by matching it with the output label of the class or data units of the first NN (teacher model). The closer match is selected as an output label of the student model class of data units.).
(Para 0049: “Instead, for each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance” Para 0078: “Output distributions are obtained from the first neural network for an utterance (204). The output distributions can include scores indicating likelihoods corresponding to different phonetic units.” Para 0079: “A second neural network can be trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network (206).” Acoustic models are used in the first (teacher) and second (student) neural networks, where the output distributions are generated based on phonetic units. Spec para 0026 defines phonetic units as: “In the exemplary embodiment, the phonetic units are the context-dependent phoneme.” In other words, in Senior, the output distribution will also be based on phonemes. Since the second model (student model) uses the same target output distribution as the first model (teacher model), the second (student) model also has the same phonetic units (or phoneme and sub-state).)
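The distribution matching described in the cited Senior passages (Paras 0005, 0049, and 0053) can be illustrated with a minimal sketch. It assumes a single frame, three output labels shared by both networks, and made-up scores; the function names `softmax` and `cross_entropy` and the temperature value are illustrative assumptions, not taken from Senior or from the application.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature > 1 softens the result."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a predicted distribution against a soft target."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Hypothetical raw scores for one frame (illustrative values only).
teacher_scores = [4.0, 1.0, 0.5]
student_scores = [2.0, 1.5, 0.2]

hard = softmax(teacher_scores)                   # unsoftened teacher output
soft = softmax(teacher_scores, temperature=2.0)  # softened target distribution
loss = cross_entropy(soft, softmax(student_scores))
```

Minimizing `loss` over many frames would pull the student distribution toward the teacher distribution. Note that this sketch uses the same label set on both sides, which is the situation the quoted Para 0049 describes.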

Regarding claim 2, Senior teaches the method of claim 1.
Senior also teaches wherein calculating the set of soft labels for the student side class set comprises selecting, for each member of the student side class set, a class most frequently observed in the collection together with the member of the student side class set from among the subset (Para 0038: “For example, the first neural network 120 has been trained to provide output values indicative of the likelihoods that different phones have been observed in input data. The output of the first neural network 120 may be a vector of output values, where each output value is a probability score indicating the likelihood for a different output label.” Para 0049: “For each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance.” Output values of the first NN (teacher model) are assigned to the class labels that have most likely or most frequently been observed for that output. The class/frame label of the second NN (student network) is matched with that of the teacher model).

Regarding claim 3, Senior teaches the method of claim 2.
Senior also teaches wherein a class of the subset most frequently observed in the collection together with the member is selected and mapped to the member of the student side class set, the output for the most frequently observed class being used to calculate a soft label corresponding to the member by using softmax function (Para 0038: “For example, the first neural network 120 has been trained to provide output values indicative of the likelihoods that different phones have been observed in input data. The output of the first neural network 120 may be a vector of output values, where each output value is a probability score indicating the likelihood for a different output label.” Para 0049: “For each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance.” Output values of the first (teacher) model are assigned to the class labels that have most likely or most frequently been observed for that output. The class/frame label of the second (student) model is matched with that of the teacher model. Para 0053: “In some implementations, this process is performed in conjunction with softmax outputs of the first neural network 120. Softmax output nodes may include a linear combination as well as exponentiation and/or normalization, and so the softening of the output distribution may be integrated in the softmax output calculations.” The above process can also be performed using a softmax function).
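For orientation, the claimed limitation (select the teacher side class most frequently observed with each student side class, then apply a softmax to the selected teacher outputs) can be sketched as follows. This is only one possible reading of the claim language; the class names, the scores, and the helper names `most_frequent_mapping` and `soft_labels` are hypothetical and appear in neither the application nor Senior.

```python
import math
from collections import Counter

def most_frequent_mapping(class_pairs):
    """Map each student class to the teacher class it co-occurs with most often."""
    counts = {}
    for teacher_cls, student_cls in class_pairs:
        counts.setdefault(student_cls, Counter())[teacher_cls] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def soft_labels(teacher_outputs, mapping):
    """Softmax over the teacher outputs selected for each student class."""
    selected = {s: teacher_outputs[t] for s, t in mapping.items()}
    peak = max(selected.values())  # subtract the maximum for numerical stability
    exps = {s: math.exp(v - peak) for s, v in selected.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}

# Hypothetical (teacher class, student class) pairs, one pair per data unit.
pairs = [("t1", "s1"), ("t1", "s1"), ("t2", "s1"), ("t3", "s2")]
mapping = most_frequent_mapping(pairs)

# Hypothetical pre-softmax teacher outputs for a single data unit.
outputs = {"t1": 2.0, "t2": 0.5, "t3": 1.0}
labels = soft_labels(outputs, mapping)
```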

Regarding claim 4, Senior teaches the method of claim 1.
Senior also teaches wherein the method further comprises: creating a data structure summarizing, for each member of the student side class set, a distribution of observations in the collection over at least classes of the subset of the teacher side class set observed together with the member of the student side class set, the data structure being used in calculating the set of the soft labels (Para 0038: “The output of the first neural network 120 may be a vector of output values, where each output value is a probability score indicating the likelihood for a different output label. The output vector has a dimension that equals the total number of output labels that can be predicted. This output vector encodes a probability distribution, indicating the allocation of probability among the various output labels.” Para 0047: “The computing system 110 then trains the second neural network using the output distributions 122 from the first neural network 120 as the target outputs of the second neural network 130. For this training iteration, the output distributions 122 and the input audio features 146 correspond to the same training utterance, although the input audio features 146 reflect added noise that was not used to generate the output distributions. The second neural network 130 is effectively trained with the goal of matching the outputs of the first neural network 120 for the same utterance. Because the audio features 146 to the second neural network 130 include additional noise, the second neural network 130 can learn to produce the appropriate output distributions even when noise characteristics are reflected in the input audio features 146. The use of input data to the first neural network 120 based on clean audio data allows the second neural network 130 to learn accurate output distributions 122.” Probability distributions over different utterances or classes are used to train the second (student) model, and the second model is trained to produce the same or appropriate target output distribution for a given utterance. This distribution reflects the probability scores for the different output labels, which provides the basis for finding the soft labels).
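The data structure recited in claim 4 (for each student side class, a distribution of observations over the teacher side classes observed together with it) can be pictured as a table of co-occurrence counts. The sketch below is a hedged illustration only; the pair data and the names `build_confusion` and `row_distribution` are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_confusion(class_pairs):
    """Count, per student class, how often each teacher class is observed with it."""
    table = defaultdict(Counter)
    for teacher_cls, student_cls in class_pairs:
        table[student_cls][teacher_cls] += 1
    return table

def row_distribution(table, student_cls):
    """Normalize one row of counts into a distribution over teacher classes."""
    row = table[student_cls]
    total = sum(row.values())
    return {t: n / total for t, n in row.items()}

# Hypothetical (teacher class, student class) pairs, one pair per data unit.
pairs = [("t1", "s1"), ("t1", "s1"), ("t2", "s1"), ("t3", "s2")]
confusion = build_confusion(pairs)
s1_dist = row_distribution(confusion, "s1")
```

Each row of `confusion` summarizes the observations for one student side class, so a later step can read off which teacher side classes are relevant to that member.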

Regarding claim 5, Senior teaches the method of claim 1.
Senior also teaches wherein obtaining the collection of the class pairs for the respective data units comprises: preparing a trained model having a class set same as the student side class set (Para 0049: “Instead, for each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance. For example, cross-entropy training can be used to align the outputs of the second neural network 130 with the output distributions 122 of the first neural network.” Para 0010: “training a first neural network using the output of a second neural network as targets for the output of the first neural network. The first neural network may be trained with a CTC algorithm, and the output of the first neural network trained with the CTC algorithm may be used as targets for the output of the second neural network that is in training.” The first NN (teacher), having the same frames and utterances (which form classes), is trained using the output of the second NN (student) as targets);
aligning a class to each data unit from among the student side class set by using the trained model (Para 0033: “Unlike many DNN, HMM, and GMM acoustic models, CTC models learn how to align phones and with audio data and are not limited to a specific forced alignment.” Para 0039: “In some implementations, the second neural network 130 is trained to produce CTC-type outputs, e.g., output vectors indicating probability distributions for a set of output labels including a "blank" symbol, and where phone indications are in sequence but not strictly time-aligned with input data.” CTC models, which align the classes to input data, can be used to train the second model (student model). After training, the second model (student model) is a trained model that can align the classes to input data to mimic CTC-type models);
aligning a class to each data unit from among the teacher side class set by using the teacher model or other model having a class set same as the teacher side class set (Para 0038: “The first neural network 120 can be trained, using the CTC algorithm, to indicate CI phone labels.” Para 0007: “The CTC model is required to indicate the presence of each phonetic unit of an utterance, in the proper sequence, but the output is not necessarily aligned in time with the corresponding input data. CTC models generally learn alignments during training using the forward-backward algorithm. However, the techniques of the present application can use fixed, stored alignments from a previous model, or alignments computed on-the-fly by a trained model, to give the targets for a new CTC model that is being trained.” Para 0033: “Unlike many DNN, HMM, and GMM acoustic models, CTC models learn how to align phones and with audio data and are not limited to a specific forced alignment.” A pretrained CTC model that has the previously stored alignments of phonetic units (classes) can be used to train the first NN (teacher) having the same phonetic units).

Regarding claim 8, Senior teaches the method of claim 1.
Senior also teaches wherein the teacher side class set is a class set of phonetic units having N (N is a positive integer) classes, the student side class set is a class set of phonetic units having M (M is a positive integer) classes (Para 0037: “The first network, through the confusion or degree of uncertainty between the different labels, is encoding additional information about the similarity of the classes.” Para 0039: “In some implementations, the second neural network 130 may have the same or similar structure to the first neural network 120.” From the above paragraphs, it can be inferred that the first (teacher) model and the second (student) model both use a positive number of classes in their networks);
the data unit represents a frame in a speech data (Para 0034: “Through the recurrent properties of the neural network, the neural network may accumulate and use information about future context to classify an acoustic frame. The neural network is generally permitted to accumulate a variable amount of future context before indicating the phone that a frame represents. Typically, when connectionist temporal classification (CTC) is used, the neural network can use an arbitrarily large future context to make a classification decision.” Para 0049: “Instead, for each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance.” The data unit in both the first (teacher) model and the second (student) model is a frame of an utterance (word of speech));
the teacher model includes an acoustic model and the student model is a neural network for an acoustic model (Para 0004: “For example, a first neural network can be trained as an acoustic model. Then, a "distillation" technique can be used to transfer the training state or "knowledge" obtained through training of the first neural network to a second neural network.” The first (teacher) model is an acoustic model that is used to transfer knowledge to the second (student) model, which is a neural network).

Regarding claim 9, Senior teaches the method of claim 8.
Senior also teaches wherein the subset of the teacher side class set for each member of the student side class set includes one or more classes having a center phoneme same as the member of the student side class set (Para 0049: “Instead, for each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance.” Para 0039: “In some implementations, the second neural network 130 may have the same or similar structure to the first neural network 120.” The models are trained to produce the same output for the same frame (which is a subset of a class). This indicates that both models, having a similar structure, share a common goal. The first (teacher) model would contain the same phoneme for a given member of the second (student) model).


Regarding claim 14, it is substantially similar to claim 1 and is rejected in the same manner, the same art and reasoning applying.

Regarding claim 15, Senior teaches the method of claim 14.
Senior also teaches wherein the processing circuitry is further configured to: select, for each member of the student side class set, a class most frequently observed in the collection together with the member of the student side class set from among the subset to calculate the set of soft labels for the student side class set (Para 0038: “For example, the first neural network 120 has been trained to provide output values indicative of the likelihoods that different phones have been observed in input data. The output of the first neural network 120 may be a vector of output values, where each output value is a probability score indicating the likelihood for a different output label.” Para 0049: “For each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance.” Output values of the first (teacher) model are assigned to the class labels that have most likely or most frequently been observed for that output. The class/frame label of the second (student) model is matched with that of the teacher model to produce the same or a matching output label).

Regarding claim 16, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying.

Regarding claim 17, Senior teaches the method of claim 14.
Senior also teaches wherein the processing circuitry is further configured to: prepare a trained model having a class set same as the student side class set (Para 0049: “Instead, for each frame of an utterance, the second neural network 130 can be trained with the goal of matching the output distribution 122 that the first neural network 120 produced for the same frame of the same utterance. For example, cross-entropy training can be used to align the outputs of the second neural network 130 with the output distributions 122 of the first neural network.” Para 0010: “training a first neural network using the output of a second neural network as targets for the output of the first neural network. The first neural network may be trained with a CTC algorithm, and the output of the first neural network trained with the CTC algorithm may be used as targets for the output of the second neural network that is in training.” The first NN (teacher), having the same frames and utterances (which form classes), is trained using the output of the second NN (student) as targets);
align a class to each data unit from among the student side class set by using the trained model as one for each class pair (Para 0033: “Unlike many DNN, HMM, and GMM acoustic models, CTC models learn how to align phones and with audio data and are not limited to a specific forced alignment.” Para 0039: “In some implementations, the second neural network 130 is trained to produce CTC-type outputs, e.g., output vectors indicating probability distributions for a set of output labels including a "blank" symbol, and where phone indications are in sequence but not strictly time-aligned with input data.” CTC models, which align the classes to input data, can be used to train the second model (student model). After training, the second model (student model) is a trained model that can align the classes to input data to mimic CTC-type models);
align a class to each data unit from among the teacher side class set by using the teacher model or other model having a class set same as the teacher side class set as other for each class pair (Para 0038: “The first neural network 120 can be trained, using the CTC algorithm, to indicate CI phone labels.” Para 0007: “The CTC model is required to indicate the presence of each phonetic unit of an utterance, in the proper sequence, but the output is not necessarily aligned in time with the corresponding input data. CTC models generally learn alignments during training using the forward-backward algorithm. However, the techniques of the present application can use fixed, stored alignments from a previous model, or alignments computed on-the-fly by a trained model, to give the targets for a new CTC model that is being trained.” Para 0033: “Unlike many DNN, HMM, and GMM acoustic models, CTC models learn how to align phones and with audio data and are not limited to a specific forced alignment.” A pretrained CTC model that has the previously stored alignments of phonetic units (classes) can be used to train the first NN (teacher) having the same phonetic units).

Regarding claims 18, 19, and 20, they are substantially similar to claims 1, 4 and 5 respectively, and are rejected in the same manner, the same art and reasoning applying.

Regarding claim 21, Senior teaches the method of claim 1.
Senior also teaches further comprising: mapping pairs between the student side class and the teacher side class, the mapping pairs being determined by identifying a highest count (A confusion matrix is defined in spec para 0034 as “The confusion matrix 106 is a data structure that summarizes, for each member of the student side class set, a distribution of observations over classes of the teacher side class set that are observed together with the corresponding member of the student side class set.” Based on this definition of a confusion matrix, Senior teaches in Para 0037: “The output of the first neural network 120 for a particular frame may distribute the probability for an output among multiple different output labels. For example, while the first neural network 120 may assign a highest likelihood to the "a" phone label that is the correct phone... The first network, through the confusion or degree of uncertainty between the different labels, is encoding additional information about the similarity of the classes… Through the training process described herein, this implicit knowledge and the general proficiency of the first neural network 120 can be transferred efficiently to the second neural network 130.” Para 0071: “In other words, where a Viterbi target would treat one label as correct and all others as incorrect, the output distribution of a trained network encodes the confusability between classes.” The label assigned to an input is based on likelihood or probability (which corresponds to the highest count), and this forms input-output mapping pairs in the teacher model. Since the student model receives the same knowledge from the teacher model during training, it will generate similar mapping pairs.).


Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 7, 11, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Senior et al. (US 2017/0011738A1, hereinafter “Senior”) in view of Shi et al. (A scalable convolutional neural network for task-specified scenarios via knowledge distillation; hereinafter “Shi”).

Regarding claim 6, Senior teaches the method of claim 1. But Senior does not explicitly teach wherein the training input is fed into the teacher model for each training data in a pool and the set of the soft labels for the student side class set is calculated for each training data in the pool.
Shi, however, does teach wherein the training input is fed into the teacher model for each training data in a pool and the set of the soft labels for the student side class set is calculated for each training data in the pool (Page 2469 para 3 and Algorithm 1: “Set the dataset to train teacher model as D and the dataset to train student model as D(θ) named as the transfer set. Use T to capture soft targets $P_{T}^{T}(\theta)$ from each sample in D(θ).” Training data D is used to train the teacher model. Training data D(θ), which belongs to the same dataset as that used for the teacher, is used to train the student model. Soft labels or targets are captured for each sample in the training dataset).


Regarding claim 7, Senior and Shi teach the method of claim 6. 
Shi also teaches wherein the method further comprises: training a student model having the student side class set by using at least a part of the soft labels calculated for each training input (Page 2469 Algorithm 1: “Train the student model with soft targets $P_{T}^{T}(\theta)$ and D(θ), iteratively until the accuracy converges.” The student model is trained using the training dataset (which includes distinct classes) and the soft labels/targets for each sample in the dataset).
Same motivation to combine the teachings of Senior and Shi as claim 6.
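The per-sample capture of soft targets quoted from Shi's Algorithm 1 can be sketched roughly as below. This is not Shi's implementation: `toy_teacher` is a hypothetical stand-in for a trained teacher model, the transfer set is made-up numeric data, and the temperature value is an assumption.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature > 1 softens the result."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def capture_soft_targets(teacher, transfer_set, temperature=2.0):
    """Feed every sample in the pool to the teacher and keep its soft target."""
    return [softmax(teacher(sample), temperature) for sample in transfer_set]

def toy_teacher(sample):
    """Hypothetical teacher: returns three raw class scores for a numeric sample."""
    return [sample, 2.0 * sample, -sample]

transfer_set = [0.5, 1.0, 1.5]  # hypothetical pool of training data
soft_targets = capture_soft_targets(toy_teacher, transfer_set)
```

The resulting list holds one soft target per sample in the pool, which a student model could then be trained against.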

Regarding claim 11, Senior teaches the method of claim 8.
Senior does not explicitly teach wherein the M classes in the student side class set belong to a phoneme system of a language same as the N classes in the teacher side class set.
Shi, however, teaches wherein the M classes in the student side class set belong to a phoneme system of a language same as the N classes in the teacher side class set (Section 4 para 2: “In the experiments, 10 classes of datasets will be divided into subsets to demonstrate the performance of task-specified student model. We set our transfer sets according to the following strategy. Take MNIST for example. The training dataset D for the teacher model covers the whole training samples, say all ten classes.” Page 2469 para 4: “Set the dataset to train teacher model as D and the dataset to train student model as D(θ) named as the transfer set, which has less task complexity than former, and typically is a sub set of D, containing only task of interested targets.” Dataset D contains the 10 classes of the teacher dataset. The classes in the student model belong to the same classes as those of the teacher dataset).
Same motivation to combine the teachings of Senior and Shi as claim 6.

Regarding claim 12, Senior teaches the method of claim 1.
Senior does not explicitly teach wherein the teacher side class set is an image class set having N (N is a positive integer) image classes, the student side class set is an image class set having M (M is a positive integer) image classes, the data unit represents an image data, and the teacher model includes an image recognition model.
Shi, however, teaches wherein the teacher side class set is an image class set having N (N is a positive integer) image classes, the student side class set is an image class set having M (M is a positive integer) image classes, the data unit represents an image data, and the teacher model includes an image recognition model (Page 2468 Section 2 Para 1: “A typical CNN in vision applications takes an image as input and processes it into a feature vector for image classification, scene classification, object detection, and object tracking, etc.” Page 2468 Section 2 Para 2: “As shown in Fig.1, the input of a convolutional layer is a set of feature maps. The size of input of a convolutional layer is $C_i * I_s$, where $C_i$ is the number of input channels, and $I_s$ specifies the size of input feature maps.” Fig. 1 in Shi shows that both the teacher and student models use convolutional layers whose input is a feature map derived from an image. The output of a CNN is image classification (in other words, image recognition)).
Same motivation to combine the teachings of Senior and Shi as claim 6.

Regarding claim 13, Senior and Shi teach the method of claim 12.
Shi also teaches wherein the subset of the teacher side class set for each member of the student side class set includes one or more classes belonging to a superclass related to the member of the student side class set (Page 2469 Para 4: “Set the dataset to train teacher model as D and the dataset to train student model as D(), named as the transfer set, which has less task complexity than former, and typically is a sub set of D, containing only task of interested targets.” The student model uses a subset of the teacher dataset, which belongs to the same parent set, or superset).
Same motivation to combine the teachings of Senior and Shi as claim 6.


Response to Arguments
Applicant's arguments filed on 03/29/2021 with respect to the 35 U.S.C. 102 and 103 rejections have been fully considered. Applicant has amended claims 1, 2, 14, 15, and 18 to address the 35 U.S.C. 102 and 103 rejections in the previous Office Action. Applicant also cancelled claim 10 and added claim 21. All new claims and amendments are addressed in the 103 section. Applicant's arguments regarding the claim rejections are addressed below.

Applicant Argument 1: With regard to claims 1, 14, and 18, Applicant argues, “It is respectfully asserted that Senior fails to discuss "calculating a set of soft labels for the student side class ... by using ... for each member of the student side class, ... based at least in part on observations in the collection of the class pairs" [emphasis added], as essentially recited in independent claims 1, 14, and 18, but rather is silent regarding the same. Indeed, Senior fails to discuss class pairs anywhere in the Senior specification, and thus, clearly does not, and cannot teach or suggest at least the above-mentioned claim features at least due to the above reasons.”
Response to Argument 1: Senior uses classification algorithms to generate classes and class boundaries. Once the classes are generated, class pairs are formed as input-output pairs. Senior states in Para 0073: “In exemplary embodiments of the present disclosure, a neural network may be trained by classifying frames, for example, through forced alignment using an optimal boundary of distinct phones in an inputted sequence of phones to generate respective labels of the phones.” And in Para 0037: “The first network, through the confusion or degree of uncertainty between the different labels, is encoding additional information about the similarity of the classes… the general proficiency of the first neural network 120 can be transferred efficiently to the second neural network.” It is evident from the above citations that the data is divided into various classes, which form a set of class pairs.
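For illustration only, the disputed limitation (soft labels for the student side class set calculated from a per-member subset of the teacher side class set, based on observed class pairs) may be sketched as follows. This is one possible reading of the claim language, not the method of the applicant, Senior, or Shi. The function names, the class names T1 to T3 and S1 to S2, and the sum-then-normalize rule are all assumptions introduced for this sketch.

```python
from collections import defaultdict


def build_subsets(class_pairs):
    """Map each student class to the teacher classes observed together with it.

    class_pairs is an iterable of (teacher_class, student_class) tuples;
    the pairing scheme itself is a hypothetical assumption.
    """
    subsets = defaultdict(set)
    for teacher_cls, student_cls in class_pairs:
        subsets[student_cls].add(teacher_cls)
    return dict(subsets)


def soft_labels(teacher_probs, subsets):
    """Sum teacher probabilities over each student class's subset, then normalize.

    Summing and normalizing is an assumed aggregation rule, chosen only
    to make the sketch concrete.
    """
    raw = {
        student_cls: sum(teacher_probs.get(t, 0.0) for t in teacher_classes)
        for student_cls, teacher_classes in subsets.items()
    }
    total = sum(raw.values())
    if total == 0.0:
        # No probability mass fell inside any subset; return zeros unchanged.
        return raw
    return {student_cls: value / total for student_cls, value in raw.items()}


# Hypothetical observations and teacher outputs for a single data unit.
pairs = [("T1", "S1"), ("T2", "S1"), ("T3", "S2"), ("T1", "S1")]
teacher_probs = {"T1": 0.5, "T2": 0.25, "T3": 0.25}

subsets = build_subsets(pairs)
labels = soft_labels(teacher_probs, subsets)
```

Under these assumptions, student class S1 collects the teacher outputs for T1 and T2, and S2 collects the output for T3.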

Applicant Argument 2: Accordingly, Applicant respectfully submits that Senior fails to disclose or suggest at least "wherein the subset of the teacher side class set for each member of the student side class set comprises classes having a same center phoneme and a same sub-
Response to Argument 2: In Senior, acoustic models are used in the first (teacher) and second (student) neural networks, where output distributions are generated based on phonetic units (spec para 0026 defines phonetic units as: “In the exemplary embodiment, the phonetic units are the context-dependent phoneme.” In other words, the output distribution in Senior will also be based on phonemes). Since the second (student) model uses the same target output distribution as the first (teacher) model, the second (student) model also has the same phonetic units (or phoneme and sub-state).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
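For illustration only, the period arithmetic described above may be sketched as follows. The function names are hypothetical, the dates are placeholders rather than the actual mailing date of this action, and the sketch assumes month-based periods end on the same day of the month (clamped to the last day of shorter months). It does not account for weekends, federal holidays, or extension fees.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` later on the same day, clamped to month end."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(mailing, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period under the rule stated above."""
    two = add_months(mailing, 2)
    three = add_months(mailing, 3)
    six = add_months(mailing, 6)
    replied_early = first_reply is not None and first_reply <= two
    advisory_late = advisory_mailed is not None and advisory_mailed > three
    if replied_early and advisory_late:
        # Period runs to the advisory action date, capped at six months.
        return min(advisory_mailed, six)
    return three


# Placeholder dates, assumed for the example only.
mailing = date(2021, 4, 15)
default_expiry = shortened_period_expiry(mailing)
extended_expiry = shortened_period_expiry(
    mailing, first_reply=date(2021, 6, 1), advisory_mailed=date(2021, 7, 20)
)
```

With these placeholder dates, the default expiry is three months after mailing, and the second call shows the case in which an early first reply and a late advisory action move the expiry to the advisory action date.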

An inquiry concerning this communication or earlier communications from the examiner should be directed to QAMAR IQBAL whose telephone number is (571)272-2563. The examiner can normally be reached on M-F 10-6pm (EST). 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Q.I/ 
Examiner 
Art unit 2123
04/12/2021

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123