DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This communication is in response to Applicant’s “Reply to Action of August 5, 2020” filed 03 December 2020 [hereinafter Response], where:
Claim 16 has been amended.
Claims 1-16 and 18-25 are pending.
Claims 1-16 and 18-25 are rejected.
Claim Objections
3.	The objection to claim 16 is withdrawn in view of Applicant’s amendment to the claim.
Claim rejections - 35 USC § 102
4.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
5.	Claims 1, 2, 4-7, 9-12, 14, and 15 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by Miyato et al., “Distributional Smoothing with Virtual Adversarial Training,” JMLR (2015) [hereinafter Miyato].
Miyato teaches that overfitting is an unavoidable challenge in the supervised and semi-supervised training of classification and regression functions. To counter overfitting, Miyato teaches decreasing generalization error through smoothness regularization that rewards local distributional smoothness (LDS). Miyato teaches use of the Kullback-Leibler (KL) divergence as a measure of a classification model’s robustness against perturbation.
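As a reading aid, the KL-based smoothness measure and the regularized objective described above can be summarized as follows. This is a sketch that assumes Miyato’s § 2.1 notation (training pairs (x^(n), y^(n)) for n = 1, ..., N, model parameters Θ, perturbation r, norm bound ϵ, and weight λ); the label J(Θ) for the objective is an assumed name, and the block is not a reproduction of the annotated equation images cited below.

```latex
\begin{aligned}
\Delta_{\mathrm{KL}}(r, x^{(n)}, \Theta) &\equiv \mathrm{KL}\big[\, p(y \mid x^{(n)}, \Theta) \,\big\|\, p(y \mid x^{(n)} + r, \Theta) \,\big] \\
r^{(n)}_{\text{v-adv}} &\equiv \arg\max_{r} \big\{ \Delta_{\mathrm{KL}}(r, x^{(n)}, \Theta) \,:\, \lVert r \rVert_2 \le \epsilon \big\} \\
\mathrm{LDS}(x^{(n)}, \Theta) &\equiv -\Delta_{\mathrm{KL}}\big(r^{(n)}_{\text{v-adv}}, x^{(n)}, \Theta\big) \\
J(\Theta) &= \frac{1}{N} \sum_{n=1}^{N} \log p\big(y^{(n)} \mid x^{(n)}, \Theta\big) + \lambda \, \frac{1}{N} \sum_{n=1}^{N} \mathrm{LDS}\big(x^{(n)}, \Theta\big)
\end{aligned}
```

Under this reading, a smaller ΔKL at rv-adv gives an LDS closer to zero (a smoother model distribution around x^(n)), and λ sets the weight of the smoothness term relative to the log-likelihood term.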
Regarding claim 1, Miyato teaches [a] method of training a neural network (Miyato at p. 6, “3.1 Supervised Learning for the Binary Classification of Synthetic Dataset,” second paragraph, teaches [o]ur classifier was a neural network (NN) with one hidden layer consisting of 500 hidden units), wherein the neural network is configured to receive an input data item and to process the input data item to generate a respective score for each label in a predetermined set of multiple labels (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Datasets”, second paragraph, teaches [o]ur classifier was a neural network (NN) with one hidden layer consisting of 500 hidden units. We . . . used softmax activation function for all the output units (that is, to process the input data item to generate a respective score for each label in a predetermined set of multiple labels)), the method comprising:
obtaining a plurality of training items, wherein each training item is associated with an initial target label distribution that assigns a respective target score to each label in the predetermined set of multiple labels (Miyato at p. 2, “2.1 Formalization of Local Distributional Smoothness”, first paragraph, teaches [w]e begin with the formal definition of the local distributional smoothness (that is, smoothness is a score, such that each training item is associated with an initial target label distribution that assigns a respective target score). Let us suppose the input space ℝ^I, the output space Q, and a training samples (obtaining a plurality of training items); Miyato at p. 10, “3.2. Supervised classification for the MNIST dataset”, first paragraph, teaches test[ing] the performance of our regularization method on the MNIST dataset, which consists of . . . handwritten digits and their corresponding labels (that is, the values of the MNIST dataset include an initial target label distribution that assigns a respective target score to each label in the predetermined set of multiple labels); see also Miyato at p. 6, “3.1 Supervised learning for the binary classification of synthetic dataset,” last partial paragraph, teaches [t]he average LDS is nearly zero at the beginning, because the models are initially close to uniform distribution (that is, as a further example, each item of the MNIST training set is associated with an initial target label distribution)); 
for each training item, modifying the initial target label distribution associated with the training item by combining the initial target label distribution with a smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, teaches [f]ormulating this goal [to improve the smoothness of the model in the neighborhood of all the observed inputs] based on [Local Distributional Smoothness (LDS)], we obtain the following objective function [eq. (4):
[Image: Miyato eq. (4), with Examiner’s annotations]
(note, Examiner’s notations added in text boxes), which is] the virtual adversarial training (VAT); as explained, Examiner notes that Miyato’s eq. (4) modifies the initial target label distribution associated with the training item by combining the initial target label distribution with a smoothing label distribution), wherein the smoothing label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels and is independent of the training item (Miyato at p. 2, “1. Introduction”, third full paragraph, teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of adversarial training. At each step in the training, we identify for each observed input the perturbation (that is, the perturbation being a measure for a respective smoothing score for each label) to which the model distribution itself is most sensitive in the sense of Kullback-Leibler divergence (KL divergence). This perturbation is determined without the label information of the observed input (that is, is independent of the training item). At each input, we can therefore define the robustness of the model distribution against the perturbation in a local and ‘virtual’ adversarial direction. The local robustness defined this way serves as a measure of the local smoothness of the distribution, which we call the local distributional smoothness (LDS) (that is, the smoothing label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels and is independent of the training item). We propose virtual adversarial training (VAT), a simple regularization technique that rewards the average of the LDS over all the training samples. Because the LDS is independent of the label informations [sic] (that is, independence from the label information means independence from the training item), VAT is applicable to the semi-supervised learning; Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness,” first paragraph, teaches “local distributional smoothness (LDS),” in which the smaller the value of ΔKL, the smoother the model distribution (that is, a smoothness score)); and
training the neural network using the plurality of training items and the respective modified initial target distributions for the plurality of training items (Miyato at p. 6, “3. Experiments”, first paragraph, teaches [t]o evaluate the efficacy of VAT, we . . . applied various regularization techniques including [virtual adversarial training (VAT)] (that is, training) to the binary classification of synthetic datasets (that is, training the neural network using the plurality of training items and the respective modified initial target distribution for the plurality of training items); Miyato at p. 5, “2.2.1. Evaluation of rv-adv,” first partial paragraph, teaches [that] the value $\nabla_{r}\,\Delta_{\mathrm{KL}}(r + \zeta d, x, \Theta)\big|_{r=0}$ can be computed easily by the back propagation method in the case of neural networks (that is, training the neural network)).
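For illustration only, the following minimal sketch shows how the quoted gradient can yield a virtual adversarial perturbation and an LDS value. It assumes a hypothetical linear softmax classifier standing in for the neural network, a single power-iteration step, and central finite differences in place of the back propagation that Miyato describes (so that it runs with NumPy alone). All function names and default values (`eps`, `zeta`, `h`) are illustrative assumptions, not taken from Miyato.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict(W, b, x):
    # Hypothetical linear softmax classifier p(y | x, Theta), with Theta = (W, b).
    return softmax(x @ W + b)


def delta_kl(r, x, W, b):
    # Delta_KL(r, x, Theta) = KL[p(y | x, Theta) || p(y | x + r, Theta)].
    p = predict(W, b, x)
    q = predict(W, b, x + r)
    return float(np.sum(p * (np.log(p) - np.log(q))))


def grad_wrt_r(r, x, W, b, h=1e-5):
    # Central finite differences stand in for back propagation in this sketch.
    g = np.zeros_like(r)
    for i in range(r.size):
        e = np.zeros_like(r)
        e[i] = h
        g[i] = (delta_kl(r + e, x, W, b) - delta_kl(r - e, x, W, b)) / (2 * h)
    return g


def virtual_adversarial_perturbation(x, W, b, eps=0.1, zeta=1e-2, rng=None):
    # One power-iteration step: evaluate the gradient of Delta_KL at r = zeta * d
    # (equivalently, the gradient of Delta_KL(r + zeta * d) at r = 0) for a random
    # unit vector d, then rescale the result to length eps.
    rng = np.random.default_rng(0) if rng is None else rng
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    g = grad_wrt_r(zeta * d, x, W, b)
    return eps * g / np.linalg.norm(g)


def lds(x, W, b, eps=0.1):
    # LDS(x, Theta) = -Delta_KL(r_v-adv, x, Theta); values closer to 0 mean smoother.
    r_adv = virtual_adversarial_perturbation(x, W, b, eps)
    return -delta_kl(r_adv, x, W, b)
```

A practical implementation would obtain the gradient by automatic differentiation rather than by finite differences; the finite-difference loop is used here only to keep the sketch dependency-light.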
Examiner notes that the preamble of Applicant’s claim is not afforded patentable weight because the preamble is not “necessary to give life, meaning, and vitality” to the claim.
Examiner also points out that phrases such as “based on” or “associated with” are loose generalities that function to expand the broadest reasonable interpretation (BRI) of the claim limitations, because they do not connote a causal or resultant effect between the terms but instead refer to an “incidental” or “passing” relation.
Regarding claim 2, Miyato teaches all of the limitations of claim 1, as described above.
Miyato teaches -
wherein combining the initial target label distribution with the smoothing label distribution includes:
calculating a weighted sum of the initial target label distribution and the smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, first full paragraph, teaches the objective function,
[Image: Miyato objective function]
in which the VAT is parameterized (that is, weighted) by the hyperparameters λ > 0 and ϵ > 0 (that is, calculating a weighted sum of the initial target label distribution and the smoothing label distribution); Examiner points out that the [Local Distributional Smoothness] term enters the sum weighted by the hyperparameter λ).
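The weighted sum recited in claim 2 can be made concrete with the following minimal sketch, offered for illustration only. It assumes a single mixing weight `alpha` applied as (1 - alpha) * target + alpha * smoothing, which is one common form of such a weighted sum; the function name and `alpha` are hypothetical and are not drawn from Miyato or from the claims.

```python
def combine_label_distributions(target, smoothing, alpha):
    """Return the element-wise weighted sum (1 - alpha) * target + alpha * smoothing.

    `target` and `smoothing` are per-label score sequences over the same label set,
    each summing to 1; for 0 <= alpha <= 1 the result is again a distribution.
    """
    if len(target) != len(smoothing):
        raise ValueError("distributions must cover the same label set")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(target, smoothing)]
```

For example, combining a one-hot target [0, 1, 0, 0] with a uniform smoothing distribution [0.25, 0.25, 0.25, 0.25] at alpha = 0.1 gives approximately [0.025, 0.925, 0.025, 0.025], which still sums to 1.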
Regarding claim 4, Miyato teaches all of the limitations of claim 1, as described above.
Miyato teaches -
wherein each smoothing score has the same predetermined value (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Dataset”, second paragraph, teaches random perturbation training is a modified version of the VAT in which we replaced rv-adv with an ϵ sized unit vector uniformly sampled (that is, replacing the smoothing score with one having a uniformly sampled unit vector) from the I dimensional unit sphere (that is, with an ϵ sized unit vector being uniformly sampled such that each smoothing score has the same predetermined value)).
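The random perturbation baseline quoted above replaces rv-adv with an ϵ-sized vector drawn uniformly from the I-dimensional unit sphere. A minimal sketch follows; it assumes the common construction of normalizing an isotropic Gaussian draw (a construction not stated in the quoted passage), and the function name is hypothetical.

```python
import math
import random


def random_sphere_perturbation(dim, eps, rng=None):
    # Normalizing an isotropic Gaussian vector gives a direction distributed
    # uniformly on the unit sphere; scaling by eps gives an eps-sized perturbation.
    rng = rng if rng is not None else random.Random()
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [eps * c / norm for c in v]
```

Every draw has length exactly ϵ; only its direction is random.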
Regarding claim 5, Miyato teaches all of the limitations of claim 1, as described above.
Miyato teaches -
wherein the smoothing scores are non-uniform (Miyato at p. 15, “A.3. Semi-Supervised Classification for the MNIST Dataset”, first paragraph, teaches [w]eights were initialized in the same way as in the supervised classification for MNIST. We searched for the best hyperparameter ϵ’ from {0.0075, 0.01, 0.0125, 0.015, 0.0175} (that is, resultantly, the smoothing scores are non-uniform)).
Regarding claim 6, Miyato teaches [a] system comprising:
one or more data processing apparatus; and one or more memory devices storing instructions that when executed by the one or more data processing apparatus cause the one or more data processing apparatus to perform operations for training a neural network (Miyato at p. 6, “3. Experiments”, second paragraph, teaches that [a]ll the computations were conducted with Theano . . . on a GPU environment (that is, one or more data processing apparatus); a GPU environment includes one or more memory devices storing instructions that, when executed by the one or more data processing apparatus, cause the one or more data processing apparatus to perform operations; still further, Theano is a Python library executable on an OS of the GPU environment), wherein the neural network is configured to receive an input data item (Miyato at p. 11, “4. Discussion and Related Works”, second partial paragraph, teaches [a]dversarial training and VAT are similar in that they both use the local input-output relationship (that is, an input-output relationship is configured to receive an input data item) to smooth the model distribution in the corresponding neighborhood (that is, the neural network is configured to receive an input data item)) and to process the input data item to generate a respective score for each label in a predetermined set of multiple labels (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Datasets”, second paragraph, teaches [o]ur classifier was a neural network (NN) with one hidden layer consisting of 500 hidden units. We . . . used softmax activation function for all the output units (that is, the neural network receives an input and, via its softmax output units, processes the input data item to generate a respective score for each label in a predetermined set of multiple labels)), the operations including:
obtaining a plurality of training items, wherein each training item is associated with an initial target label distribution that assigns a respective target score to each label in the predetermined set of labels (Miyato at p. 2, “2.1 Formalization of Local Distributional Smoothness”, first paragraph, teaches [w]e begin with the formal definition of the local distributional smoothness (that is, each training item is associated with an initial target label distribution that assigns a respective target score). Let us suppose the input space ℝ^I, the output space Q, and a training samples (obtaining a plurality of training items); Miyato at p. 10, “3.2. Supervised classification for the MNIST dataset”, first paragraph, teaches test[ing] the performance of our regularization method on the MNIST dataset, which consists of . . . handwritten digits and their corresponding labels (that is, the initial target label distribution assigns a respective target score to each label in the predetermined set of labels); see also Miyato at p. 6, “3.1 Supervised learning for the binary classification of synthetic dataset,” last partial paragraph, teaches [t]he average LDS is nearly zero at the beginning, because the models are initially close to uniform distribution (that is, as a further example, each item of the MNIST training set is associated with an initial target label distribution));
for each training item, modifying the initial target label distribution by combining the initial target label distribution with a smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, teaches [f]ormulating this goal [to improve the smoothness of the model in the neighborhood of all the observed inputs] based on [Local Distributional Smoothness (LDS)], we obtain the following objective function [eq. (4):
[Image: Miyato eq. (4), with Examiner’s annotations]
(note, Examiner’s notations added in text boxes), which is] the virtual adversarial training (VAT); as explained, Examiner notes that Miyato’s eq. (4) modifies the initial target label distribution associated with the training item by combining the initial target label distribution with a smoothing label distribution), wherein the smoothing label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels and is independent of the training item (Miyato at p. 2, “1. Introduction”, third full paragraph, teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of adversarial training. At each step in the training, we identify for each observed input the perturbation (that is, the perturbation being a measure for a respective smoothing score for each label) to which the model distribution itself is most sensitive in the sense of Kullback-Leibler divergence (KL divergence). This perturbation is determined without the label information of the observed input (that is, is independent of the training item). At each input, we can therefore define the robustness of the model distribution against the perturbation in a local and ‘virtual’ adversarial direction (that is, a smoothing score). The local robustness defined this way serves as a measure of the local smoothness of the distribution, which we call the local distributional smoothness (LDS). We propose virtual adversarial training (VAT), a simple regularization technique that rewards the average of the LDS over all the training samples. Because the LDS is independent of the label informations [sic], VAT is applicable to the semi-supervised learning; Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness,” first paragraph, teaches “local distributional smoothness (LDS),” in which the smaller the value of ΔKL, the smoother the model distribution (that is, a smoothness score)); and
training the neural network using the plurality of training items and the respective modified initial target distributions for the plurality of training items (Miyato at p. 6, “3. Experiments”, first paragraph, teaches [t]o evaluate the efficacy of VAT, we . . . applied various regularization techniques including [virtual adversarial training (VAT)] to the binary classification of synthetic datasets (that is, training the neural network using the plurality of training items and the respective modified initial target distribution for the plurality of training items); Miyato at p. 5, “2.2.1. Evaluation of rv-adv,” first partial paragraph, teaches [that] the value $\nabla_{r}\,\Delta_{\mathrm{KL}}(r + \zeta d, x, \Theta)\big|_{r=0}$ can be computed easily by the back propagation method in the case of neural networks (that is, training the neural network)).

Examiner also notes that the preamble of Applicant’s claim is not afforded patentable weight because the preamble is not “necessary to give life, meaning, and vitality” to the claim.
Examiner further points out that phrases such as “based on” or “associated with” are loose generalities that function to expand the broadest reasonable interpretation (BRI) of the claim limitations, because they do not connote a causal or resultant effect between the terms but instead refer to an “incidental” or “passing” relation.
Regarding claim 7, Miyato teaches all of the limitations of claim 6, as described above.
Miyato teaches -
wherein combining the initial target label distribution with the smoothing label distribution includes:
calculating a weighted sum of the initial target label distribution and the smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, first full paragraph, teaches the objective function,
[Image: Miyato objective function]
in which the VAT is parameterized (that is, weighted) by the hyperparameters λ > 0 and ϵ > 0 (that is, calculating a weighted sum of the initial target label distribution and the smoothing label distribution); Examiner points out that the [Local Distributional Smoothness] term enters the sum weighted by the hyperparameter λ).
Regarding claim 9, Miyato teaches all of the limitations of claim 6, as described above.
Miyato further teaches wherein each smoothing score has the same predetermined value (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Dataset”, second paragraph, teaches random perturbation training is a modified version of the VAT in which we replaced rv-adv with an ϵ sized unit vector uniformly sampled (that is, replacing the smoothing score with one having a uniformly sampled unit vector) from the I dimensional unit sphere (that is, with an ϵ sized unit vector being uniformly sampled such that each smoothing score has the same predetermined value)).
Regarding claim 10, Miyato teaches all of the limitations of claim 6, as described above.
Miyato further teaches wherein the smoothing scores are non-uniform (Miyato at p. 15, “A.3. Semi-Supervised Classification for the MNIST Dataset”, first paragraph, teaches [w]eights were initialized in the same way as in the supervised classification for MNIST. We searched for the best hyperparameter ϵ’ from {0.0075, 0.01, 0.0125, 0.015, 0.0175} (that is, resultantly, the smoothing scores are non-uniform)).
Regarding claim 11, Miyato teaches [a] non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon execution, cause the one or more computers to perform operations for training a neural network (Miyato at p. 6, “3. Experiments”, second paragraph, teaches that [a]ll the computations were conducted with Theano . . . on a GPU environment (that is, one or more computers); a GPU environment includes a non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon execution, cause the one or more computers to perform operations; still further, Theano is a Python library executable on an OS of the GPU environment), wherein the neural network is configured to receive an input data item and to process the input data item to generate a respective score for each label in a predetermined set of multiple labels (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Datasets”, second paragraph, teaches [o]ur classifier was a neural network (NN) with one hidden layer consisting of 500 hidden units. We . . . used softmax activation function for all the output units (that is, to process the input data item to generate a respective score for each label in a predetermined set of multiple labels)), the operations comprising:
obtaining a plurality of training items, wherein each training item is associated with an initial target label distribution that assigns a respective target score to each label in the predetermined set of labels (Miyato at p. 2, “2.1 Formalization of Local Distributional Smoothness”, first paragraph, teaches [w]e begin with the formal definition of the local distributional smoothness (that is, each training item is associated with an initial target label distribution that assigns a respective target score). Let us suppose the input space ℝ^I, the output space Q, and a training samples (obtaining a plurality of training items); Miyato at p. 10, “3.2. Supervised classification for the MNIST dataset”, first paragraph, teaches test[ing] the performance of our regularization method on the MNIST dataset, which consists of . . . handwritten digits and their corresponding labels (that is, the initial target label distribution assigns a respective target score to each label in the predetermined set of labels); see also Miyato at p. 6, “3.1 Supervised learning for the binary classification of synthetic dataset,” last partial paragraph, teaches [t]he average LDS is nearly zero at the beginning, because the models are initially close to uniform distribution (that is, as a further example, each item of the MNIST training set is associated with an initial target label distribution));
for each training item, modifying the initial target label distribution by combining the initial target label distribution with a smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, teaches [f]ormulating this goal [to improve the smoothness of the model in the neighborhood of all the observed inputs] based on [Local Distributional Smoothness (LDS)], we obtain the following objective function [eq. (4):
[Image: Miyato eq. (4), with Examiner’s annotations]
(note, Examiner’s notations added in text boxes), which is] the virtual adversarial training (VAT); as explained, Examiner notes that Miyato’s eq. (4) modifies the initial target label distribution associated with the training item by combining the initial target label distribution with a smoothing label distribution), wherein the smoothing label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels and is independent of the training item (Miyato at p. 2, “1. Introduction”, third full paragraph, teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of adversarial training. At each step in the training, we identify for each observed input the perturbation (that is, the perturbation being a measure for a respective smoothing score for each label) to which the model distribution itself is most sensitive in the sense of Kullback-Leibler divergence (KL divergence). This perturbation is determined without the label information of the observed input (that is, is independent of the training item). At each input, we can therefore define the robustness of the model distribution against the perturbation in a local and ‘virtual’ adversarial direction (that is, a smoothing score). The local robustness defined this way serves as a measure of the local smoothness of the distribution, which we call the local distributional smoothness (LDS). We propose virtual adversarial training (VAT), a simple regularization technique that rewards the average of the LDS over all the training samples. Because the LDS is independent of the label informations [sic], VAT is applicable to the semi-supervised learning; Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness,” first paragraph, teaches “local distributional smoothness (LDS),” in which the smaller the value of ΔKL, the smoother the model distribution (that is, a smoothness score)); and
training the neural network using the plurality of training items and the respective modified initial target distributions for the plurality of training items (Miyato at p. 6, “3. Experiments”, first paragraph, teaches [t]o evaluate the efficacy of VAT, we . . . applied various regularization techniques including [virtual adversarial training (VAT)] to the binary classification of synthetic datasets (that is, training the neural network using the plurality of training items and the respective modified initial target distribution for the plurality of training items); Miyato at p. 5, “2.2.1. Evaluation of rv-adv,” first partial paragraph, teaches [that] the value $\nabla_{r}\,\Delta_{\mathrm{KL}}(r + \zeta d, x, \Theta)\big|_{r=0}$ can be computed easily by the back propagation method in the case of neural networks (that is, training the neural network)).
Examiner notes that the features of “non-transitory computer-readable medium” and “one or more computers” recited in Applicant’s claims are interpreted to be well-known hardware structures.
Also, when reading the preamble in the context of the entire claim, the recitations of a “non-transitory computer-readable medium” and “one or more computers” are not limiting because the body of the claim describes a complete invention and the language recited solely in the preamble does not provide any distinct definition of any of the claimed invention’s limitations. Thus, the preamble of the claims is not considered a limitation and is of no significance to claim construction. See Pitney Bowes, Inc. v. Hewlett-Packard Co., 182 F.3d 1298, 1305, 51 USPQ2d 1161, 1165 (Fed. Cir. 1999). See MPEP § 2111.02.
Examiner further points out that phrases such as “based on” or “associated with” are loose generalities that function to expand the broadest reasonable interpretation (BRI) of the claim limitations, because they do not connote a causal or resultant effect between the terms but instead refer to an “incidental” or “passing” relation.
Regarding claim 12, Miyato teaches all of the limitations of claim 11, as described above.
Miyato further teaches -
wherein combining the initial target label distribution with the smoothing label distribution includes:
calculating a weighted sum of the initial target label distribution and the smoothing label distribution (Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, first full paragraph, teaches the objective function,
[Image: Miyato objective function]
in which the VAT is parameterized (that is, weighted) by the hyperparameters λ > 0 and ϵ > 0 (that is, calculating a weighted sum of the initial target label distribution and the smoothing label distribution); Examiner points out that the [Local Distributional Smoothness] term enters the sum weighted by the hyperparameter λ).
Regarding claim 14, Miyato teaches all of the limitations of claim 11, as described above.
Miyato teaches -
wherein each smoothing score has the same predetermined value (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Dataset”, second paragraph, teaches random perturbation training is a modified version of the VAT in which we replaced rv-adv with an ϵ sized unit vector uniformly sampled (that is, replacing the smoothing score with one having a uniformly sampled unit vector) from the I dimensional unit sphere (that is, with an ϵ sized unit vector being uniformly sampled such that each smoothing score has the same predetermined value)).
Regarding claim 15, Miyato teaches all of the limitations of claim 11, as described above.
Miyato further teaches wherein the smoothing scores are non-uniform (Miyato at p. 15, “A.3. Semi-Supervised Classification for the MNIST Dataset”, first paragraph, teaches [w]eights were initialized in the same way as in the supervised classification for MNIST. We searched for the best hyperparameter ϵ’ from {0.0075, 0.01, 0.0125, 0.015, 0.0175} (that is, resultantly, the smoothing scores are non-uniform)).
Claim Rejections - 35 USC § 103
6.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. 	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
8.	Claims 23-25 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 8,620,838 to Moore [hereinafter Moore] in view of Miyato et al., “Distributional Smoothing with Virtual Adversarial Training,” JMLR (2015) [hereinafter Miyato].
Regarding claim 23, Moore teaches [a] method of training a neural network, wherein the neural network is configured to receive an input data item and to process the input data item to generate a respective score for each label in a predetermined set of multiple labels, the method comprising:
receiving a request to train the neural network to optimize a loss function comprising a first error term (Moore col. 3, ll. 28-30, teaches to load-balance the processing of requests to the computing device (receiving a request). The group of computing devices 17 can alternatively be a cloud computing service); Moore col. 4, ll. 44-47 teaches that [t]he technique 50 can be referred to as Simultaneous MERT (SiMERT). SiMERT is in part related to Minimum Error Rate Training (MERT), a process for optimizing low-dimensional classifier models (optimize a loss function comprising a first error term); Moore col. 4, ll. 55-58, teaches that technique 50 can be utilized with non-convex loss functions (teaching that technique 50 includes loss functions (loss function comprising a first error term)); and
training the neural network to optimize a regularized loss function, the regularized loss function comprising the first error term and a regularizing error term that penalizes the neural network based on the error between a predicted distribution and a smoothing distribution (Moore col. 5, ll. 43-45 teaches that [l]oss functions can include, for example, L1-regularized hinge loss, L2-regularized hinge loss, or a simple error count (error between a predicted distribution and a smoothing distribution) with respect to a training set; Moore col. 7, ll. 61-67, also teaches [t]he L1-
* * *
Though Moore teaches, in a regularization context, the feature of minimizing the output of a loss function of a machine learning process to update (or modify) a feature weight to improve the accuracy of a classifier model, Moore does not explicitly teach -
* * *
. . . , wherein: 
the predicted distribution is generated by the neural network upon processing a particular data item and includes a respective predicted score for each label in the predetermined set of labels; and
the smoothing distribution includes a respective smoothing score for each label in the predetermined set of multiple labels.
But Miyato teaches -
* * *
. . . , wherein: 
the predicted distribution is generated by the neural network upon processing a particular data item and includes a respective predicted score for each label in the predetermined set of labels (Miyato at p. 6, “3.1. Supervised Learning for the Binary Classification of Synthetic Datasets”, second paragraph, teaches [o]ur classifier was a neural network (NN) with one hidden layer consisting of 500 hidden units. We . . . used softmax activation function for all the output units (that is, generated by a neural network); Miyato at p. 2, “1. Introduction”, third full paragraph, teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of the adversarial training. At each step in the training, we identify for each observed input the perturbation to which the model distribution itself is most sensitive in the sense of Kullback-Leibler divergence (KL divergence). This perturbation (that is, respective predicted score) is determined without the label information of the observed input (that is, the predicted distribution is generated by the neural network upon processing a particular data item and includes a respective predicted score for each label in the predetermined set of labels)); and
the smoothing distribution includes a respective smoothing score for each label in the predetermined set of multiple labels (Miyato at p. 2, “2.1 Formalization of Local Distributional Smoothness”, first paragraph, teaches [w]e begin with the formal definition of the local distributional smoothness (that is, the smoothing distribution includes a respective smoothing score for each label in the predetermined set of multiple labels). Let us suppose the input space ℝ^I, the output space Q, and a training samples (that is, training samples being each label in the predetermined set of labels); Miyato at p. 10, “3.2. Supervised classification for the MNIST dataset”, first paragraph, teaches test[ing] the performance of our regularization method on the MNIST dataset, which consists of . . . handwritten digits and their corresponding labels (that is, the initial target label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels); see also, Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, first paragraph, which teaches “local distributional smoothness (LDS),” in which the smaller the value of ΔKL the smoother the model distribution (that is, the value of ΔKL is a smoothness score)).
Moore and Miyato are from the same or similar field of endeavor. Moore teaches the process of regularizing the loss function of a training set to produce a classifier model that accurately identifies the labels of tokens in data other than the training set. Miyato teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of the adversarial training. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to combine the teachings of Moore pertaining to loss function regularizing with the smoothness regularization building on adversarial training of Miyato.
The motivation for doing so is to provide training with local distributional smoothness (LDS) regularization, namely virtual adversarial training (VAT), which determines the adversarial direction from the model distribution alone and performs well even in comparison to the state of the art methods. (Miyato, Abstract).
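For illustration only, and without characterizing either Moore or Miyato, the structure of the regularized loss function recited in claim 23 (a first error term plus a regularizing error term that penalizes the error between a predicted distribution and a smoothing distribution) can be sketched in Python as below. The use of cross-entropy for both terms, the weight `beta`, the function names, and the numeric values are hypothetical assumptions, not limitations drawn from the claims or the cited references.

```python
import math


def cross_entropy(target, predicted):
    """H(target, predicted) = -sum_k target[k] * log(predicted[k]); labels with
    zero target mass contribute nothing and are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def regularized_loss(target, predicted, smoothing, beta):
    """First error term plus a weighted regularizing term that penalizes the
    error between the predicted distribution and the smoothing distribution."""
    first_error = cross_entropy(target, predicted)
    regularizing_error = cross_entropy(smoothing, predicted)
    return first_error + beta * regularizing_error


predicted = [0.7, 0.2, 0.1]        # hypothetical softmax scores for three labels
target = [1.0, 0.0, 0.0]           # hypothetical target distribution
smoothing = [1 / 3, 1 / 3, 1 / 3]  # hypothetical uniform smoothing distribution
loss = regularized_loss(target, predicted, smoothing, beta=0.1)
print(round(loss, 4))  # 0.499
```

Setting `beta` to 0 reduces the sketch to the first error term alone.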
Regarding claim 24, the combination of Moore and Miyato teaches all of the limitations of claim 23, as described above.
Moore teaches wherein the smoothing distribution is a uniform distribution (Moore FIG. 2 (stage 62) teaches to update vector w using step size α (smoothing label distribution) and vector u (initial target label distribution); the step size α being uniform for the smoothing label distribution). Step size α taught by Moore corresponds to modifying an initial target label distribution to generate regularizing training data.
Moore teaches the claimed smoothing label generation, based on the BRI of the term “smoothing” in view of Applicant’s spec. para. [0051] - [0052], where it states: “a label modification process referred to as label smoothing may be employed. Assume, for example, a distribution over labels u(k), independent of the training example x, and a smoothing parameter ϵ. . . . [T]he new label distribution q’ [(that is, modifying the initial target label distribution)] is a mixture [(that is, combining)] of the original ground-truth distribution q(k|x) [(initial target label distribution)] and fixed distribution u(k), with weights 1- ϵ and ϵ [(smoothing parameter ϵ corresponding to a smoothing label distribution)], respectively.” (emphasis added). Accordingly, Moore teaches the Applicant’s smoothing label distribution for modifying the initial target label distribution to generate a modified target label distribution by combining the initial target label distribution with a smoothing label distribution.
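The mixture described in Applicant’s spec. para. [0051] - [0052], q’(k|x) = (1 - ϵ) q(k|x) + ϵ u(k), can be illustrated with the following minimal Python sketch. The function name `smooth_labels`, the four-label example, the uniform choice of u(k), and ϵ = 0.1 are hypothetical and are used only to show the weighted combination of the two distributions.

```python
def smooth_labels(q, u, epsilon):
    """Return q'(k) = (1 - epsilon) * q(k) + epsilon * u(k) for every label k."""
    if len(q) != len(u):
        raise ValueError("q and u must cover the same label set")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return [(1.0 - epsilon) * qk + epsilon * uk for qk, uk in zip(q, u)]


K = 4
q = [0.0, 1.0, 0.0, 0.0]  # hypothetical ground-truth distribution q(k|x)
u = [1.0 / K] * K         # hypothetical fixed distribution u(k), here uniform
q_prime = smooth_labels(q, u, epsilon=0.1)
# q_prime is approximately [0.025, 0.925, 0.025, 0.025]
```

Because both inputs sum to 1 and the weights 1 - ϵ and ϵ sum to 1, the result is again a probability distribution over the same labels.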
Regarding claim 25, the combination of Moore and Miyato teaches all of the limitations of claim 23, as described above.
Moore further teaches wherein the smoothing distribution is a distribution that was used prior to the predicted distribution (Moore FIG. 2 at stage 62 teaches to update vector w using (used prior to the predicted distribution) step size α (smoothing label distribution) and vector u (initial target label distribution); see also Moore, col. 15, ll. 1-7 & FIG. 6, which teaches [t]he scheme 100 includes a trainer 102 that is configured to accept input 104. Input 104 includes a training set T, feature universe Ƒ, label universe L and a loss function l. Input 104 is used by trainer 102 to generate a .
9.	Claims 3, 8, and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over Miyato et al., “Distributional Smoothing with Virtual Adversarial Training,” JMLR (2015) [hereinafter Miyato] in view of US Patent 8,620,838 to Moore [hereinafter Moore] and further in view of US Published Application 2009/0240637 to Drissi et al. [hereinafter Drissi].
Regarding claims 3, 8, and 13, Miyato teaches all of the limitations of claims 1, 6, and 11, respectively, as described above.
However, Miyato fails to explicitly teach -
wherein, for each training item:
the target score for a known label for the training item is assigned a predetermined positive value in the initial target label distribution for the training item, and
the target score for each label other than the known label is set to 0 in the initial target label distribution.
But Moore teaches -
wherein, for each training item:
the target score for a known label for the training item is assigned a predetermined positive value in the initial target label distribution for the training item (Moore, col. 4, ll. 61-64, teaches the process of taking a training set T that includes annotations of “correct” labels (known label) for each token (training item) and w; further, Moore col. 5, ll. 6-7, teaches [v]ector w can be initialized to some non-zero values (that is, target score for a known label [(correct label)], where the non-zero values include a predetermined positive value), and that vector w can include (assigned) all zero feature weights (predetermined positive value in the initial target distribution for the training item); 
Examiner points out that “correct” corresponds to “known” because the term “known” pertains to being “recognized, familiar, or within the scope of knowledge,” and hence, connotes a correct label; see also, e.g., Specification at p. 9, ¶ 0044, discloses “the value assigned to the correct label [(that is, known label)] is a positive value such as ‘1’ and all other labels are assigned a value such as ‘0.’”)), and
the target score for each label other than the known label is set to 0 in the initial target label distribution (Moore col. 5, ll. 6-7, teaches [v]ector w can be initialized to some non-zero values (that is, target score for a known label [(correct label)], where the non-zero values include a predetermined positive value), and that vector w can include (assigned) all zero feature weights (predetermined positive value in the initial target distribution for the training item)).
Moore and Miyato are from the same or similar field of endeavor. Moore teaches the process of regularizing the loss function of a training set to produce a classifier model that accurately identifies the labels of tokens in data other than the training set. Miyato teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of the adversarial training. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to combine the teachings of Moore pertaining to loss function regularizing with the smoothness regularization of Miyato.
The motivation for doing so is to provide training with local distributional smoothness (LDS) regularization, namely virtual adversarial training (VAT), which determines the adversarial direction from the model distribution alone and performs well even in comparison to the state of the art methods. (Miyato, Abstract).
Though each of Moore and Miyato teaches features of preventing model overfitting through regularization for a corpus of items with labels, the combination of Moore and Miyato does not explicitly teach a label other than the known label.
But Drissi, ¶ 0031, teaches a data set (including known label) to which noise or disturbance is added so as to generate a second data set (label other than the known label).
Miyato, Moore, and Drissi are from the same or similar field of endeavor. Miyato teaches a novel parameterization invariant smoothness regularization that builds on the philosophy of the adversarial training. Moore teaches the process of regularizing the loss function of a training set to produce a classifier model that accurately identifies the labels of tokens in data other than the training set. Drissi teaches generating a data set from disturbed data relating to a label other than a known label. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to combine the teachings of Drissi relating to training data sets generated from disturbed data based on label data other than the known label with the teachings of Miyato and Moore.
The motivation for doing so is to make rule-based systems more resilient to small disturbances in the data sources, by having a learner learn to classify disturbed inputs the same way it classifies “expected” input. (Drissi ¶ 0028).
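As an illustration of the initial target label distribution recited in claims 3, 8, and 13, and consistent with Specification ¶ 0044 (a positive value such as ‘1’ for the correct label and a value such as ‘0’ for all other labels), a minimal Python sketch follows. The function name, the dictionary representation, and the example labels are hypothetical.

```python
def initial_target_distribution(known_label, labels, positive_value=1.0):
    """Return a target distribution assigning positive_value to the known
    label and 0.0 to every other label in the predetermined set."""
    if known_label not in labels:
        raise ValueError("known_label must be in the predetermined label set")
    return {label: (positive_value if label == known_label else 0.0)
            for label in labels}


labels = ["cat", "dog", "bird"]  # hypothetical predetermined label set
target = initial_target_distribution("dog", labels)
print(target)  # {'cat': 0.0, 'dog': 1.0, 'bird': 0.0}
```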
10.	Claims 16 and 18-22 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 8,620,838 to Moore [hereinafter Moore] in view of US Published Application 2009/0240637 to Drissi et al. [hereinafter Drissi].
Regarding claim 16, Moore teaches [a] method of training a neural network, wherein the neural network is configured to receive an input data item and to process the input data item to generate a respective score for each label in a predetermined set of multiple labels, the method comprising:
obtaining a set of training data that includes a plurality of training items (Moore, col. 4, l. 62, teaches taking (obtaining) a training set T (set of training data that includes a plurality of training items)), wherein each training item is associated with a respective label from the predetermined set of multiple labels (Moore, col. 4, ll. 62-64, teaches the process of taking a training set T that includes annotations of “correct” labels (assigns a respective target score) for each token (each training item)); 
modifying the set of training data (Moore col. 8, ll. 19-20, teaches that a regularization value prevents the model w from overfitting the training set T (regularizing training data)), comprising, for each training item: 
* * *
training the neural network on the modified training data (Moore col. 15, ll. 1-7 & FIG. 6, teaches that [t]he [training and classification] scheme 100 includes a trainer 102 that is configured to accept input 104. Input 104 includes a training set T, feature universe Ƒ, label universe L and a loss function l. Input 104 is used by trainer 102 (training) to generate a classification model w 106 (regularizing training data). The generation can be performed, for example, using technique 50. The generated classification model w is output to classifier 108 (training the neural network on the regularizing training data)).
Though Moore teaches, in a regularization context, the feature of minimizing the output of a loss function of a machine learning process to update (or modify) a feature weight to improve the accuracy of a classifier model, Moore does not explicitly teach -
* * *
. . . for each training item: 
determining whether to modify the label associated with the training item; and
in response to determining to modify the label associated with the training item, changing the label associated with the training item that correctly describes the training item to a different label from the predetermined set of labels that incorrectly describes the training item; 
* * *
But Drissi teaches for each training item, determining whether to modify the label associated with the training item (Drissi ¶ 0031 teaches [v]arious algorithms (determining to modify the label) in creating and adding the noise or disturbance); and
in response to determining to modify the label associated with the training item, changing the label associated with the training item that correctly describes the training item to a different label from the predetermined set of labels that incorrectly describes the training item (Drissi ¶ 0031 teaches in the case where the data is text data, a word and/or term may be replaced with its/their synonym(s) (changing the label associated with the training item that correctly describes the training item to a different label from the predetermined set of labels that incorrectly describes the training item)); . . . .
Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to incorporate the teachings of Drissi relating to replacing a label with another label with the multi-label classification teachings of Moore.
The motivation for doing so is to [give] correct results (e.g., classifications) with regard to input data that even is changed slightly (Drissi ¶ 0021).
Regarding claim 18, the combination of Moore and Drissi teaches all of the limitations in claim 16, as described above. 
Drissi further teaches wherein the different label is randomly selected from the predetermined set of labels (Drissi ¶ 0031 teaches that [o]ther noise or disturbance under aspects of the invention include misspelling word(s); abbreviation of word(s); translation of sentences; and/or the like (different label is randomly selected from the predetermined set of labels)).
Regarding claim 19, the combination of Moore and Drissi teaches all of the limitations in claim 16, as described above. 
Drissi further teaches wherein the label includes a training label distribution that includes a score for the training item for each label in a predetermined set of labels associated with a set of training images (Drissi ¶ 0039 teaches the data may comprise multimedia data, video, images, text data, including email and/or the like (predetermined set of labels associated with a set of training images)).
Regarding claim 20, the combination of Moore and Drissi teaches all of the limitations in claim 19, as described above. 
Moore further teaches wherein changing the label associated with the training item to a different label from the predetermined set of labels includes changing the distribution of scores in a training data item's training label distribution from a distribution of scores representing a correct label to a distribution of scores representing an incorrect label (Moore col. 6, ll. 56-61 & FIG. 2, teaches the loss reduction on the training set between iterations can be measured and monitored. If the loss reduction falls below a threshold for a certain number of iterations, such as 1-5 iterations, then the stoppage of the training process can be triggered (iterations correspond to loss reduction on the training set between iterations, such that changing the label associated with the training item to a different label from the predetermined set of labels includes changing the distribution of scores representing a correct label to a distribution of scores representing an incorrect label)).
Regarding claim 21, the combination of Moore and Drissi teaches all of the limitations in claim 16, as described above. 
Moore further teaches wherein determining whether to modify the label associated with the training item is based on a predetermined probability (Moore col. 5, ll. 23-31, teaches selecting (determining whether to modify) the initial update set U includes using a log-likelihood-ratio (LLR) association metric (based on a predetermined probability). In this technique, an LLR score is calculated for each feature-value pair (f,l) found in the training set T. The feature value pairs are ordered by their scores, and a certain number are selected to include their associated feature weights in update set U in order from the highest score. The certain number can be pre-determined or be otherwise chosen or determined by technique 50).
Regarding claim 22, the combination of Moore and Drissi teaches all of the limitations in claim 21, as described above. Further, the predetermined probability of Moore can be any value, including a predetermined probability of 10% (see, e.g., Moore col. 5, ll. 23-31).
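For illustration only, the label modification recited in claims 16, 18, 21, and 22 (determining, for each training item, whether to modify its label based on a predetermined probability, and if so replacing it with a different, randomly selected label from the predetermined set) can be sketched in Python as follows. The function name, the (item, label) pair representation, and the default probability of 0.1 are hypothetical assumptions; the sketch is not drawn from Moore, Drissi, or the Specification.

```python
import random


def modify_labels(items, labels, probability=0.1, rng=None):
    """For each (item, label) pair, decide with the given probability whether
    to modify the label; if so, replace it with a different label chosen
    uniformly at random from the rest of the predetermined label set."""
    rng = rng or random.Random()
    modified = []
    for item, label in items:
        if rng.random() < probability:  # determine whether to modify
            others = [other for other in labels if other != label]
            label = rng.choice(others)  # randomly selected different label
        modified.append((item, label))
    return modified


labels = ["cat", "dog", "bird"]  # hypothetical predetermined label set
training_items = [("item_%d" % i, "cat") for i in range(1000)]  # hypothetical data
noisy = modify_labels(training_items, labels, probability=0.1, rng=random.Random(0))
changed = sum(1 for old, new in zip(training_items, noisy) if old[1] != new[1])
# roughly 10% of the 1000 labels are changed; the exact count depends on the seed
```

Passing a seeded `random.Random` instance makes the modification reproducible across runs.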
Response to Arguments
11.	Examiner has fully considered Applicant’s arguments and responds below.
12.	With regard to the rejections under Section 102, Applicant argues “that Miyato does not the disclose or suggest the feature of ‘for each training item, modifying the initial target label distribution associated with the training item by combining the initial target label distribution with a smoothing label distribution, wherein the smoothing label distribution includes a respective smoothing score for each label in the predetermined set of multiple labels,’ as recited in claim 1.” (Response at p. 11 (emphasis in original)).
Examiner respectfully disagrees because the Applicant’s argument appears to rely on features not set out in Applicant’s claims.
For example, Applicant submits as the basis of the argument that Miyato does not teach the features of the Applicant’s claims because “Miyato does not disclose modifying the labels for those training items during training.” (Response at p. 12 (emphasis in original)). But Applicant’s instant claim limitations speak to the feature of “modifying the initial target label distribution,” which is not “modifying the labels.” (see, e.g., claim 1, ll. 8-11).
Examiner notes that Applicant’s Specification does recite “[t]he action of modifying may include, for each training item, determining whether or not to modify the label associated with the training item, and in response to determining to modify the label associated with the training item, changing the label associated with the training item to a different label from the predetermined set of labels, and training the neural network on the regularizing data.” (PGPUB1 ¶ 0006). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 20 USPQ2d 1057 (Fed. Cir. 1993); see also MPEP § 2145.VI. Accordingly, Examiner interprets the language of the claim as being directed to, inter alia, the feature of “modifying the . . . target label distribution.”
With regard to “modifying the . . . target label distribution,” Applicant argues that “neither the LDS regularization term nor the objective function of equation (4) ‘modif[ies] the initial target label distribution’ for each training item. Moreover, as described above, the ‘perturbation’ that is determined and that is used in determining the LDS term, does not ‘modify[] the initial target label distribution,’ much less doing so by ‘combining the initial target label distribution with a smoothing label distribution,’ which includes ‘a respective smoothing score for each label in the predetermined set of multiple labels,’ as recited in claim 1.” (Response at p. 12).
Examiner respectfully disagrees. Miyato teaches virtual adversarial training (VAT), [which is] a simple regularization technique that rewards the average of the Local Distributional Smoothness (LDS) over all the training samples (that is, the Local Distributional Smoothness applied to the training samples is modifying the . . . target label distribution), in which the distribution is achieved by combining the initial target label distribution with a smoothing label distribution. The goal expressed in Miyato, Equation 4, which is a combinational term, teaches 

[Equation image not reproduced: Miyato, Equation 4]

which is “to improve the smoothness of the model in the neighborhood of all the observed inputs . . . . The training based on [equation (4) is] the virtual adversarial training (VAT),” (Miyato, page 3, “2.1 Formalization of Local Distributional initial target label distribution.”
Miyato teaches the feature of smoothness regularization to decrease generalization error (Miyato, Abstract), in which a target label distribution associated with a training item is modified by combining the initial target label distribution with a smoothing label distribution, such as the Local Distributional Smoothness of Miyato, which is a general notion that can be applied to any machine learner (Miyato, page 3, “2.1 Formalization of Local Distributional Smoothness,” last paragraph). 
13.	With respect to the rejections under Section 103 to claims 16 and 18-22, Applicant argues “Moore and Drissi does not disclose ‘changing the label associated with the training item that correctly describes the training item to a different label from the predetermined set of labels that incorrectly describes the training item,’ as recited in claim 1. As described above, Drissi discloses adding noise and/or disturbance to the input data (thus changing the purported training item) -- and not changing the classification for that training item (i.e., the purported label), as required by claim 16.” (Response at p. 14 (emphasis in original)).
Examiner respectfully disagrees because Moore teaches the feature of minimizing the output of a loss function of a machine learning process to provide an update (or modify) a feature weight to improve the accuracy of a classifier model. Drissi is directed to a resilient classifier to give correct classification results with regard to input data that may be changed slightly.
Claim 16 recites, in part, “changing the label that correctly describes the training item to a different label from the predetermined set of labels that incorrectly describes the training item . . . .” (See, e.g., claim 16, ll. 10-12). 
But neither the Specification nor the claims, in a classification context, define the terms “correctly” or “incorrectly”. Without such guidance, under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. (MPEP § 2111.01.I). 
“Correct” corresponds to “known” because the term “known” pertains to being “recognized, familiar, or within the scope of knowledge,” and hence, connotes a correct label; see also, e.g., Specification at p. 9, ¶ 0044, discloses “the value assigned to the correct label [(that is, known label)] is a positive value such as ‘1’ and all other labels are assigned a value such as ‘0.’”
The BRI of “incorrectly describes the training item” includes purposely disrupting the classification of an item, including training items, to reduce overfitting and to improve the performance of the machine learning system (that is, the resiliency of the machine learning system). (See PGPUB ¶ 0041). In view of the BRI for Applicant’s claims, Drissi teaches the feature of “incorrectly describes the training item” by generating disturbed training data directed to improve classification resiliency (that is, give correct classification results) of a machine learning system. (See Drissi ¶ 0032).
Drissi teaches the features of generating two training data sets, one generated from input data and the other generated from disturbed data; a system for merging the two training data sets; and a system for training a data classifier with the merged training data sets. (Drissi, Abstract).

14.	With respect to the rejections under Section 103 to claims 23-25, Applicant argues “Moore and Miyato does not disclose, suggest, or otherwise render obvious the feature of ‘the smoothing distribution includes a respective smoothing score for each label in the predetermined set of multiple labels.’” (Response at p. 15).
Examiner respectfully disagrees. Moore teaches the feature of regularized loss function, in which the regularization value prevents the model w from overfitting the training set T (that is, regularization value is a smoothing distribution) (see Moore 8:14-20). Moore also teaches the feature in which a regularization constant controls a balance between regularization and unregularized loss reduction (Moore 8:32-33 & Eq. (7)).
Applicant argues “Moore's hinge loss function is based on the scores generated by the model -- and is not based on another distribution, such as the ‘smoothing distribution,’ as recited in claim 23.” (Response at pp. 14-16). 
Examiner respectfully disagrees because the claim does not recite “scores generated by the model and is not based on another distribution,” but instead recites “training the neural network to optimize a regularized loss function . . . comprising the first error term and a regularizing error term that penalizes the neural network based on the error between a predicted distribution and a smoothing distribution; . . . the smoothing distribution includes a respective smoothing score for each label in the predetermined set of multiple labels.” Moore teaches a regularized loss function that includes a first term that is the unregularized loss value and a second term that is the regularization value. (See Moore 8:14-20).
With respect to a “smoothing score”, Examiner cites to Miyato as teaching this feature, as set out above in detail. 
Examiner also points out that the use of phrases such as “based on” or “associated with” as used in the claims are loose generalities that function to expand the BRI of the claim limitations because they do not connote a causal or resultant effect between the terms but are instead an incidental or “passing” relation. 
Moreover, the rejections hereinabove clearly set forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
15.	Further, with respect to the rejection under Section 103, Applicant argues “the Action concedes that Moore does not disclose a ‘smoothing distribution’ and thus, it follows that Moore also does not disclose a ‘regularizing error term’ that is ‘based on the error between a predicted distribution and a smoothing distribution.’ Further still, and as described above, Miyato also does not disclose a ‘smoothing distribution’ and thus, like Moore, Miyato does not disclose a ‘regularizing error term’ that is ‘based on the error between a predicted distribution and a smoothing distribution.’ As such, Miyao does not cure Moore's deficiencies.” (Response at p. 16 (emphasis in original)).
Examiner respectfully disagrees. Miyato teaches “local distributional smoothness (LDS),” in which the smaller the value of ΔKL the smoother the model distribution (that is, a smoothness score) (See Miyato at p. 3, “2.1 Formalization of Local Distributional Smoothness”, first paragraph). 
Moreover, the rejections hereinabove clearly sets forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings, and Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
Conclusion
16.	THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
17.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US Published Application 20130254153 to Marcheret teaches optimizing a classifier in which a score is generated for each item of labeled training data.
18.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH, whose telephone number is (571) 272-5964. The Examiner is normally available Monday through Thursday, 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business 
/K.L.S./
Examiner, Art Unit 2122
/BABOUCARR FAAL/Primary Examiner, Art Unit 2184

1 For clarity, Examiner points out that the term “PGPUB” refers to the published application for the instant case, which is US Published Application 20170132512 to Sergey Ioffe.