DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-04-22 has been entered.  The status of the claims is as follows:
Claims 1-9 and 12-19 remain pending in the application.
Claims 1, 16, and 17 are amended.
Claims 10-11 and 20 are cancelled.
Response to Arguments
Applicant's arguments with respect to the rejections under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the arguments.
Remarks
Applicant states on Remarks Page 9:  “Support for the amendments to the specification and claims can be found throughout the application as filed, for example, at paragraph 33 of the specification, and at page 1 of Appendix A of U.S. Provisional Application Serial No. 62/379,705, filed on August 25, 2016, which is considered part of and is incorporated by reference in the disclosure of the current application.”
Regarding the newly added limitation that the task reward function is non-differentiable, Examiner notes that the best (and only) support for this is Appendix A Page 1, which states:  “Not surprisingly, almost all task reward metrics are not differentiable, hence hard to optimize.”
Regarding the newly added limitation that the task reward function is different from the objective function optimized during the training of the neural network, Examiner notes that the best support for this lies in Appendix A Page 3: “However, since empirical reward is not amenable to numerical optimization, one often considers optimizing alternative differentiable objectives”.  Here, it appears that the “alternative…objective” is “alternative” to, and thus “different” from, the task reward function.  
For the reasons above, Examiner did not see fit to issue any objections under 35 U.S.C. 112(a).
However, Examiner notes that it also appears from Instant Spec that the objective function actually comprises the task reward function, as the objective function in [0041] comprises q(y | y*, τ), which is subsequently shown to comprise the reward function r(y, y*) in [0054].  Thus, Applicant’s reward function is actually part of the objective function, and it may be possible for one to call the “objective function” itself also a “reward function”.  To advance prosecution, Examiner suggests that Applicant provide more details on the interaction of the objective and reward functions, beyond their being “different”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 12-13, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gaidon (US 2017/0286774 A1) in view of Volkovs et al. (“Loss-sensitive Training of Probabilistic Conditional Random Fields”; hereinafter Volkovs) and Gao et al. (US 2017/0353480 A1).
As per Claim 1, Gaidon teaches a computer-implemented method comprising: obtaining data identifying a neural network to be trained to perform a machine learning task, the neural network being configured to receive an input example and to process the input example in accordance with current values of a plurality of model parameters to generate a model output for the input example (Gaidon, Para [0010], discloses:  “One embodiment of the present disclosure is directed to a system for applying video data to a neural network (NN) for online multi-class multi-object tracking.”  Gaidon, Para [0056], discloses:  “The parameters of the disclosed neural network can be learned from scratch on training videos labeled with ground truth tracks using standard stochastic gradient descent with momentum and the hyper-parameters.”  Here, Gaidon discloses applying data to a neural network, which requires that one obtain data identifying the neural network in order to use it.  The neural network is to perform a machine learning task (“multi-class multi-object tracking” in video data).  The “video data” is an input example, and the neural network generates a model output (object tracking).  The model achieves this in accordance with current values of a plurality of model parameters, as Gaidon discloses “parameters of the disclosed neural network can be learned”.)
obtaining initial training data for training the neural network, the initial training data comprising a plurality of training examples and, for each training example, a ground truth output; (Gaidon, Para [0056], discloses:  “The parameters of the disclosed neural network can be learned from scratch on training videos labeled with ground truth tracks using standard stochastic gradient descent with momentum and the hyper-parameters.”  Here, Gaidon discloses initial training data comprising a plurality of training examples (“learned from scratch on training videos”), as well as a ground truth output for each training example (“labeled with ground truth tracks”)).
generating modified training data from the initial training data (Gaidon, Para [0053], discloses:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals). These data augmentation strategies are contemplated to boost recognition performance further, or at least contribute to or prevent over-fitting.”  Here, Gaidon discloses generating modified training data from the initial training data (“replacing ground truth annotations by strongly-overlapping detections (or even object proposals)”)).
and training the neural network on the modified training data to determine trained values of the plurality of model parameters that optimize an objective function  (Gaidon Para [0053] discloses that this data with replaced ground truth annotations will be used for training, as Gaidon recites in the last sentence: “These data augmentation strategies are contemplated to boost recognition performance further, or at least contribute to or prevent over-fitting”, wherein overfitting is a known challenge with training.  Gaidon, Para [0048], discloses:  “The present disclosure provides a pairwise cost function that is parametrized as a deep convolutional network (“neural network”) having an architecture depicted in FIGS. 4A-4D”.  Here, Gaidon discloses optimizing parameters of an objective function (“cost function that is parametrized”)).
Gaidon further teaches that generating modified training data from the initial training data comprises:
calculating a [stationary] score distribution over a plurality of candidate auxiliary outputs from a set of possible outputs that can be generated by the neural network by processing the training example, comprising: determining, for each of the plurality of candidate auxiliary outputs, a respective measure of the similarity of the candidate auxiliary output to the ground truth output for the training example according to a task reward function for the machine learning task; and determining the [stationary] score distribution from the respective measures for the plurality of candidate auxiliary outputs  (Gaidon, Para [0053], discloses:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals). These data augmentation strategies are contemplated to boost recognition performance further, or at least contribute to or prevent over-fitting.”  Here, Gaidon discloses candidate auxiliary outputs (“strongly overlapping detections”).  Gaidon also discloses a respective measure of similarity of the candidate auxiliary output to the ground truth output.  
Here it helps to know that Gaidon is in the field of image object recognition, as stated in the Introduction:  “A system for applying video data to a neural network (NN) for online multi-class multi-object tracking includes a computer programed to perform an image classification method including the operations of receiving a video sequence; detecting candidate objects in each of a previous and a current video frame.”  In a video frame, there may be a plurality of candidates, and so a “strongly-overlapping detection” may be one of a plurality of nearby candidates, and “strongly-overlapping” is a measure of the similarity of the candidate auxiliary output to the ground truth output, as an annotation indicating this object would be much more similar to the ground truth than an annotation indicating a completely foreign object not in the frame at all.
Gaidon above also discloses calculating/determining a score distribution from the respective measures, as Gaidon above discloses “strongly overlapping” detections, which implies some quantitative measure for each candidate auxiliary output.  Indeed, in order for an output to be considered “strongly overlapping”, a probability is calculated.  This is stated in Gaidon [0037].  Gaidon, Para [0037], discloses the concept of a score distribution over candidates:  “In other words, the neural network 56 takes the candidate objects—which can be associated with a target object being tracked in a given previous frame and a new or existing object in a current frame—and creates a data association matrix—a number of targets by a number of detections. The neural network 56 generates a probability (“association score”) in each cell of the matrix corresponding that the detected object in the current frame matches the target object in the previous frame”.  This score is based on a measure of similarity between the ground truth (“target object”) and the candidate object.  The score is described as a probability, and is done for all candidate objects.  A collection of probabilities comprises a distribution.  Thus, in order to determine “strongly overlapping detections”, a collection of probabilities is calculated, which is what a probability distribution is.
This determination of “strongly overlapping detections” is made according to a task reward function.  Gaidon, Para [0048], discloses:  “The present disclosure provides a pairwise cost function that is parametrized as a deep convolutional network (“neural network”) having an architecture depicted in FIGS. 4A-4D”.  Here, Gaidon discloses a “cost function”.  One of ordinary skill in the art will appreciate that a “cost function” is just the negative of a reward function.  Thus, if the detections are determined by utilizing a “cost function”, the measure of similarity of the candidate auxiliary outputs can be said to be “according to” a reward function, as the measure of similarity has a mathematical relationship with the reward function that is equivalent to the cost function.)
generating an auxiliary output for the training example from the ground truth output for the training example by sampling a candidate auxiliary output from the plurality of candidate auxiliary outputs in accordance with the calculated [stationary] score distribution over the plurality of candidate auxiliary outputs (Gaidon, Para [0053], discloses:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals). These data augmentation strategies are contemplated to boost recognition performance further, or at least contribute to or prevent over-fitting.”  As shown above, Gaidon’s “strongly overlapping detections” are candidate auxiliary outputs and are based upon a score distribution.  Gaidon chooses one or more of these “strongly overlapping detections” based on the fact that they are, indeed, strongly overlapping, which is based on a collection of probabilities (as shown above in Gaidon [0037]).  When Gaidon “augments” the training data with some of these detections, he is, in effect, “sampling” the candidate auxiliary outputs from the distribution of probabilities, wherein the highest probabilities are the strongest overlapping detections.)
replacing the ground truth output for the training example with the auxiliary output for the training example (Gaidon, Para [0053], discloses:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals).” Here, Gaidon discloses “replacing ground truth annotations by strongly-overlapping detections”).
However, Gaidon does not teach calculating/determining a stationary score distribution from the respective measures for the plurality of candidate auxiliary outputs using a hyper-parameter that controls a concentration of the stationary score distribution; sampling a candidate auxiliary output from the plurality of candidate auxiliary outputs in accordance with the calculated stationary score distribution over the plurality of candidate auxiliary outputs
Volkovs teaches calculating/determining a stationary score distribution from the respective measures for the plurality of candidate auxiliary outputs using a hyper-parameter that controls a concentration of the stationary score distribution; sampling a candidate auxiliary output from the plurality of candidate auxiliary outputs in accordance with the calculated stationary score distribution over the plurality of candidate auxiliary outputs (Recall above that Gaidon discloses candidate auxiliary outputs.  Volkovs also suggests them at Pg. 7 Section 4 “Learning with Multiple Ground Truths”:  “In certain applications, for some given input xt, there is not only a single target yt that is correct (see Section 6 for the case of ranking). This information can easily be encoded within the loss function, by setting lt(y) = 0 for all such valid predictions”.
Volkovs, Pg. 5 Section 3.2, discloses:  “If the required expectations cannot be computed tractably, MCMC sampling can be used to approximate them.”  Here, Volkovs discloses “MCMC sampling”.  One of ordinary skill in the art will appreciate that Markov Chain Monte Carlo (MCMC) sampling comprises sampling from a stationary distribution (for support for this statement, see Kroc (“Introduction to Markov Chain Monte Carlo”), Pg. 8: “Markov Chain Monte Carlo basic idea: Given a prob. distribution π on a set Ω, the problem is to generate random elements of Ω with distribution π. MCMC does that by constructing a Markov Chain with stationary distribution π and simulating the chain.”).
Volkovs also discloses using a hyper-parameter that controls a concentration of the stationary score distribution on Page 5 Section 3.3:  “There are several ways of defining the target distribution q(y|t). In this work, we define it as follows:”

    [Equation image media_image1.png: Volkovs's definition of the target distribution q(y|t)]

where the temperature parameter T controls how peaked this distribution is around yt.”  Here, Volkovs discloses a temperature parameter T that controls the concentration of the distribution, which Volkovs above disclosed can be approximated by a stationary distribution.)
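For illustration only, the role of a temperature parameter in concentrating a distribution over candidates can be sketched in Python. This is a minimal sketch and not the implementation of Volkovs or of Applicant: it assumes the distribution is proportional to exp(-loss / T), consistent with the description of a loss scaled by T inside an exponential, and the candidate names, loss values, and function names are hypothetical.

```python
import math
import random

def target_distribution(losses, temperature):
    """Return q(y) proportional to exp(-loss(y) / temperature) over candidates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the minimum loss before exponentiating for numerical stability.
    min_loss = min(losses)
    weights = [math.exp(-(loss - min_loss) / temperature) for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]

def sample_candidate(candidates, losses, temperature, rng):
    """Draw one candidate according to the temperature-scaled distribution."""
    probs = target_distribution(losses, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical candidates and losses; a loss of 0 marks the ground truth.
candidates = ["y_t", "near_miss", "far_miss"]
losses = [0.0, 1.0, 4.0]

peaked = target_distribution(losses, temperature=0.1)   # mass concentrates on y_t
flat = target_distribution(losses, temperature=100.0)   # mass spreads nearly evenly
draw = sample_candidate(candidates, losses, 1.0, random.Random(0))
```

Under these assumptions, a small temperature places nearly all probability on the zero-loss candidate, while a large temperature approaches a uniform distribution over the candidates.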
Gaidon and Volkovs are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the candidate auxiliary output sampling of Gaidon with the MCMC sampling of Volkovs.  One of ordinary skill in the art would be motivated to do so in order to efficiently execute sampling when calculating the true distribution would be an intractable problem (Volkovs Pg. 5 Section 3.2:  “If the required expectations cannot be computed tractably, MCMC sampling can be used to approximate them”).
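For illustration only, the notion that MCMC sampling simulates a chain whose stationary distribution is the target can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure of Volkovs or Kroc: it uses a Metropolis rule with a uniform proposal, assumes a target proportional to exp(-loss / T), and uses hypothetical loss values and function names.

```python
import math
import random

def metropolis_sample(losses, temperature, steps, rng):
    """Simulate a Metropolis chain whose stationary distribution is
    proportional to exp(-loss / temperature) over candidate indices.
    Returns the fraction of steps spent at each candidate."""
    n = len(losses)
    state = rng.randrange(n)
    visits = [0] * n
    for _ in range(steps):
        # Symmetric proposal: pick any candidate index uniformly at random.
        proposal = rng.randrange(n)
        # Log acceptance ratio pi(proposal) / pi(state); no normalizer is needed.
        log_ratio = -(losses[proposal] - losses[state]) / temperature
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state = proposal
        visits[state] += 1
    return [v / steps for v in visits]

# Hypothetical losses for three candidates.
losses = [0.0, 1.0, 2.0]
empirical = metropolis_sample(losses, temperature=1.0, steps=50000,
                              rng=random.Random(0))

# Exact target, computable here only because the candidate set is tiny.
weights = [math.exp(-loss / 1.0) for loss in losses]
exact = [w / sum(weights) for w in weights]
```

The chain never computes the normalizing constant, which is the property that makes this kind of sampling attractive when the exact distribution is intractable; the visit frequencies are only an approximation of the target.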
It also would have been obvious to incorporate the temperature parameter of Volkovs into this combination.  One of ordinary skill in the art would be motivated to do so in order to reduce bias for more accurate sampling.  See de Freitas (“Adaptive Parallel Tempering MCMC”) Pg. 2 Lines 63-66 and Pgs. 2-3 Lines 107-111, Section 1.1 “Parallel Tempered MCMC”:  “However, it is also often the case that the chain will be initialized in a location of extremely low probability. The first stretch of the chain will then make the rest of the chain a biased approximation for all but very large numbers of samples…PTMCMC is a method for generating candidate samples from all over a distribution, overcoming low probability regions between areas of importance. The inspiration for the parallel tempering MCMC algorithm comes from the idea that a temperature parameter could be used to flatten out the target distribution, see Figure 1(b). As the temperature of the distribution is raised the distribution flattens out, making the random walk chain for that temp more likely to mix quickly”.  Volkovs also hints at the idea to “flatten out the target distribution” at Pg. 7 Section 4 “Learning with Multiple Ground Truths”:  “On the other hand, the maximum likelihood and loss upper bound objectives add the requirement that the mass be equally distributed amongst those configurations.”
However, the combination of Gaidon and Volkovs does not explicitly teach wherein the task reward function that is used to generate the respective measures of similarity for the candidate auxiliary outputs is non-differentiable and is different from the objective function optimized during the training of the neural network.
Gao teaches wherein the task reward function that is used [to generate the respective measures of similarity for the candidate auxiliary outputs] is non-differentiable and is different from the objective function optimized during the training of the neural network. (Recall above that Gaidon established measures of similarity for candidate auxiliary outputs.  Gao, Para [0029], concludes:  “As the Hamming distance function is non-differentiable, exemplary embodiments of the present invention may introduce a continuous loss function to approximate the Hamming distance with performance guaranteed. An effective algorithm with good convergence property may be provided via Stochastic Gradient Descent technique.”  Here, Gao discloses a “non-differentiable” function (“Hamming distance”).  This may be part of a reward function, as Instant Spec [0058] discloses a reward function comprising the Hamming distance:  “a distance metric can be Hamming distance or edit distance. In such cases, the task reward function…can be a negative of the selected distance”.  Gao also discloses a “continuous loss function” to “approximate” the Hamming distance.  This objective function (“loss function”) is different from the reward function, because it is an “approximation” of the reward function.  It is also optimized during training of the neural network, as it is optimized with “Stochastic Gradient Descent”.  Examiner notes that there is also a direct relationship between the objective function and reward function in the Instant Specification, as the objective function in [0041] comprises q(y | y*, τ), which is subsequently shown to comprise the reward function r(y, y*) in [0054].  Thus, Applicant’s reward function is actually part of the objective function.)
Gao and the combination of Gaidon and Volkovs are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the differentiable alternative to a non-differentiable function of Gao with the machine learning system of the combination of Gaidon and Volkovs.  One of ordinary skill in the art would be motivated to do so in order to more easily perform optimization of the machine learning parameters, for example using Stochastic Gradient Descent, which, as one of ordinary skill in the art will appreciate, requires a differentiable objective function (Gao [0029]:  “As the Hamming distance function is non-differentiable, exemplary embodiments of the present invention may introduce a continuous loss function to approximate the Hamming distance with performance guaranteed. An effective algorithm with good convergence property may be provided via Stochastic Gradient Descent technique.”  Examiner notes that Gao echoes Applicant’s Appendix A of the Provisional Application, which states on Page 1:  “Not surprisingly, almost all task reward metrics are not differentiable, hence hard to optimize”, and on Page 3:  “However, since empirical reward is not amenable to numerical optimization, one often considers optimizing alternative differentiable objectives.”  Gao is also “optimizing alternative differentiable objectives”.)
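For illustration only, the distinction between a non-differentiable Hamming-based reward and a continuous stand-in objective can be sketched in Python. This is a minimal sketch under stated assumptions: the squared-error surrogate, the 0.5 threshold, and all names and values below are hypothetical and are not taken from Gao or from the Instant Specification.

```python
def hamming_distance(a, b):
    """Count positions where two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)

def hamming_reward(prediction_bits, target_bits):
    """Task reward as the negative Hamming distance (higher is better)."""
    return -hamming_distance(prediction_bits, target_bits)

def squared_error_surrogate(scores, target_bits):
    """Continuous stand-in: squared error between real-valued scores in
    [0, 1] and the target bits. Assumed here for illustration only."""
    return sum((s - t) ** 2 for s, t in zip(scores, target_bits))

def threshold(scores):
    """Turn real-valued scores into bits; this step makes the Hamming-based
    reward piecewise constant in the scores."""
    return [1 if s >= 0.5 else 0 for s in scores]

target = [1, 0, 1, 1]
scores_a = [0.90, 0.40, 0.60, 0.20]
scores_b = [0.91, 0.40, 0.60, 0.20]  # tiny change in one score

# The reward does not move under the small change, so it gives no gradient signal.
reward_a = hamming_reward(threshold(scores_a), target)
reward_b = hamming_reward(threshold(scores_b), target)

# The continuous surrogate does respond to the same small change.
loss_a = squared_error_surrogate(scores_a, target)
loss_b = squared_error_surrogate(scores_b, target)
```

In this sketch the surrogate coincides with the Hamming distance whenever the scores are exactly 0 or 1, which is one sense in which a continuous function may approximate the discrete one while remaining a different function.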

As per Claim 2, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1.  Gaidon teaches wherein the machine learning task is a structured output prediction task.  (Gaidon, Para [0010], discloses “One embodiment of the present disclosure is directed to a system for applying video data to a neural network (NN) for online multi-class multi-object tracking”. Object tracking in videos is a structured output prediction task.  For support for this statement, see Liu et al. (“Learning to Track Multiple Targets”) Pg. 1062:  “Since the output labels are interdependent, this is a structured prediction problem”).

	As per Claim 3, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1.  Gaidon teaches wherein training the neural network on the modified training data comprises training the neural network to generate model outputs for the training examples that match the auxiliary outputs for the training examples using a gradient descent training technique (Gaidon, Para [0053], discloses training the neural network to match auxiliary outputs rather than the ground truth training examples:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals). These data augmentation strategies are contemplated to boost recognition performance further, or at least contribute to or prevent over-fitting.”  Gaidon, Para [0056], also discloses training with gradient descent technique:  “The parameters of the disclosed neural network can be learned from scratch on training videos labeled with ground truth tracks using standard stochastic gradient descent with momentum and the hyper-parameters”.  Examiner’s Note:  Though Gaidon recites “learned from scratch on training videos labeled with ground truth tracks”, the mere suggestion that the training videos are labeled does not imply that the labels may not be replaced as in Para [0053], nor that some alternative to gradient descent is used if the ground truth labels are replaced.)

As per Claim 4, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 3.  Volkovs teaches wherein training the neural network on the modified training data comprises training the neural network using maximum likelihood training. (Volkovs, Section 6 Last Paragraph, discloses training using maximum likelihood:  “We trained CRFs according to maximum likelihood as well as the different loss-sensitive objectives described in Section 3”)
This teaching of Volkovs and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the modified training data of the combination of Gaidon, Volkovs, and Gao with the maximum likelihood training of Volkovs. The modification would have been obvious because one of ordinary skill in the art would be motivated to achieve consistent performance and retain efficiency as the dataset grows. (Volkovs, Section 3 Line 2:  “In the well-specified case and for large datasets, this would probably not be a problem because of the asymptotic consistency and efficiency properties of maximum likelihood”)

As per Claim 5, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1.  Gaidon teaches wherein the measure of the similarity of the candidate auxiliary output to the ground truth output is a value of the task reward function for the machine learning task for the candidate auxiliary output.  (Gaidon, Para [0053], discloses:  “First, the system learns from pairs of bounding boxes. The KITTI training ground truth tracks—without any data augmentation—provides approximately 100K training samples when down-sampling the negative sample pairs to yield the same number as all possible positive ones. This can be further increased by using either jittering, allowing for time-skips, or replacing ground truth annotations by strongly-overlapping detections (or even object proposals)”.  This determination of “strongly overlapping detections” is based on a measure of similarity to the ground truth output, as Gaidon, Para [0048], discloses:  “The present disclosure provides a pairwise cost function that is parametrized as a deep convolutional network (“neural network”) having an architecture depicted in FIGS. 4A-4D”.  Here, Gaidon discloses a “cost function”.  One of ordinary skill in the art will appreciate that a “cost function” is just the negative of a reward function.  Thus, if the detections are determined by utilizing a “cost function”, the measure of similarity of the candidate auxiliary outputs can be said to be “a value of” a reward function, as the measure of similarity has a mathematical relationship with the reward function that is equivalent to the cost function.)

As per Claim 12, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1.  Volkovs teaches wherein the hyper-parameter is a temperature hyper-parameter that controls the concentration of the stationary score distribution.  (Volkovs, Page 5 Section 3.3, discloses:  “There are several ways of defining the target distribution q(y|t). In this work, we define it as follows:”

    [Equation image media_image1.png: Volkovs's definition of the target distribution q(y|t)]

where the temperature parameter T controls how peaked this distribution is around yt.”  Here, Volkovs discloses a temperature parameter T that controls the concentration of the distribution, which Volkovs discloses can be approximated by a stationary distribution, as stated in Volkovs, Pg. 5 Section 3.2:  “If the required expectations cannot be computed tractably, MCMC sampling can be used to approximate them.”)
	It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Volkovs with Gaidon and Gao, for at least the reasons recited in Claim 1.

As per Claim 13, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 12.  Volkovs teaches wherein the score for each of the candidate auxiliary outputs is proportional to the scaled measure of the similarity exponentiated.  (Volkovs, Page 5 Section 3.3, discloses:  “There are several ways of defining the target distribution q(y|t). In this work, we define it as follows:”

    [Equation image media_image1.png: Volkovs's definition of the target distribution q(y|t)]

where the temperature parameter T controls how peaked this distribution is around yt.”  Here, Volkovs discloses a score (a value of the “target distribution q(y | t)”) proportional to a scaled measure of the similarity (“lt(y)” is a loss function, and thus a measure of similarity, and is scaled by the “temperature parameter T” in the denominator in Eq. 9), which is exponentiated, as also seen in Eq. 9 as this scaled value is within the “exp” term.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Volkovs with Gaidon and Gao, for at least the reasons recited in Claim 1.

As per Claim 16, Claim 16 is a system claim corresponding to method Claim 1.  The difference is that Claim 16 recites one or more computers and one or more storage devices.  (Gaidon, Para [0010], discloses one or more computers: “The system comprises a computer programed to perform a method” and in [0011] one or more storage devices:  “Another embodiment of the present disclosure is directed to a non-transitory storage medium storing instructions readable and executable by a computer”).  Claim 16 is rejected for the same reasons as Claim 1.

As per Claim 17, Claim 17 is a non-transitory computer storage medium claim corresponding to method Claim 1.  The difference is that Claim 17 recites one or more computers and a non-transitory computer storage medium.  (Gaidon, Para [0010], discloses one or more computers: “The system comprises a computer programed to perform a method” and in [0011] a non-transitory computer storage medium:  “Another embodiment of the present disclosure is directed to a non-transitory storage medium storing instructions readable and executable by a computer”).  Claim 17 is rejected for the same reasons as Claim 1.

As per Claim 18, Claim 18 is a system claim corresponding to method Claim 3.  Claim 18 is rejected for the same reasons as Claim 3.

As per Claim 19, Claim 19 is a system claim corresponding to method Claim 4.  Claim 19 is rejected for the same reasons as Claim 4.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao, further in view of Przybocki et al. (“Edit Distance: A Metric for Machine Translation Evaluation”; hereinafter Przybocki).
As per Claim 6, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1 as well as neural network, task reward function, ground truth output, and candidate auxiliary outputs (see Rejection to Claim 1).  However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein the machine learning task is a task in which the neural network generates an output that is a sequence of tokens, and wherein the task reward function is a negative edit distance between the ground truth output and the candidate auxiliary output.
Przybocki teaches wherein the machine learning task is a task in which the neural network generates an output that is a sequence of tokens, and wherein the task reward function is a negative edit distance between the ground truth output and the candidate auxiliary output. (Recall above that Gaidon discloses a neural network and a “cost function” which is just a negative reward function, as well as the candidate auxiliary output. Przybocki, Pg. 2042, discloses “Our POC exercises provided us with insight into what we might expect in a formal evaluation of machine translation when using edit distance as the metric.”  Here, Przybocki discloses a machine learning task where the output is a sequence of tokens (“machine translation”), and wherein the result is evaluated using “edit distance”.  Thus, if using a reward function, this would be based on the negative of the cost function, which is the negative edit distance.)
	Przybocki and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the edit distance of Przybocki.  One of ordinary skill in the art would be motivated to do so in order to maximize the accuracy of the machine translation by using the most accurate cost/reward function (Przybocki Pg. 2042:  “As can be seen by the data for our POC-2 exercise, edit distance correlated (across editors) with the human judgments more strongly (0.831) than did BLEU (0.764) or METEOR (0.789).”)
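For illustration only, the relationship between an edit-distance cost and a negative-edit-distance reward discussed above can be sketched as follows.  This is a minimal sketch assuming the standard Levenshtein definition of edit distance over token sequences; the function names `edit_distance` and `negative_edit_distance_reward` are hypothetical and are not taken from Przybocki, Gaidon, or the application.

```python
def edit_distance(reference, candidate):
    """Levenshtein distance between two token sequences
    (insertions, deletions, and substitutions each cost 1)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the candidate.
    prev = list(range(len(candidate) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, cand_tok in enumerate(candidate, start=1):
            substitution = 0 if ref_tok == cand_tok else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a candidate token
                            prev[j - 1] + substitution))
        prev = curr
    return prev[-1]


def negative_edit_distance_reward(ground_truth, candidate):
    """Hypothetical reward: the negative of the edit-distance cost."""
    return -edit_distance(ground_truth, candidate)


ground_truth = "the cat sat on the mat".split()
candidate = "the cat sat on mat".split()
print(negative_edit_distance_reward(ground_truth, candidate))
```

Under this sketch, a candidate identical to the ground truth receives the maximum reward of 0, and each additional edit lowers the reward by 1.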

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao, further in view of Papineni et al. (“BLEU: a Method for Automatic Evaluation of Machine Translation”; hereinafter Papineni).
As per Claim 7, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1 as well as task reward function and candidate auxiliary outputs (see Rejection to Claim 1).  However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein the machine learning task is a machine translation task, and wherein the task reward function is a BLEU score for the candidate auxiliary output. 
Papineni teaches wherein the machine learning task is a machine translation task, and wherein the task reward function is a BLEU score for the candidate auxiliary output. (Recall above that Gaidon discloses a neural network and a “cost function” which is just a negative reward function, as well as the candidate auxiliary output. Papineni, Pg. 8, discloses “We believe that BLEU will accelerate the MT R&D cycle by allowing researchers to rapidly home in on effective modeling ideas. Our belief is reinforced by a recent statistical analysis of BLEU’s correlation with human judgment for translation into English from four quite different languages (Arabic, Chinese, French, Spanish) representing 3 different language families (Papineni et al., 2002)!”  Here, Papineni discloses using BLEU score for machine translation.)
	Papineni and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the BLEU score of Papineni.  One of ordinary skill in the art would be motivated to do so in order to save time and money when performing machine translation (Papineni Pg. 1:  “Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run”).
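For illustration only, a BLEU-style score of the kind referenced above can be sketched as follows.  This is a simplified, single-reference, sentence-level sketch assuming clipped n-gram precisions with uniform weights and a brevity penalty; it applies no smoothing, and the function names `ngrams` and `bleu` are hypothetical rather than taken from Papineni or the application.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU (assumption: uniform weights, no smoothing)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))
```

Because the score depends on discrete n-gram counts, small changes in model parameters generally leave it unchanged or change it in jumps.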

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao, further in view of Graves et al. (“Towards End-to-End Speech Recognition with Recurrent Neural Networks”; hereinafter Graves).
As per Claim 8, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1 as well as task reward function (see Rejection to Claim 1).  However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein the machine learning task is a speech recognition task, and wherein the task reward function is a negative word error rate for the candidate auxiliary output.
Graves teaches wherein the machine learning task is a speech recognition task, and wherein the task reward function is a negative word error rate for the candidate auxiliary output. (Recall above that Gaidon discloses a neural network and a “cost function” which is just a negative reward function, as well as the candidate auxiliary output.  Graves, Abstract, discloses a speech recognition task: “This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation”.  Graves, Section 1 Page 2 Bottom of Left column, discloses:  “The basic system is enhanced by a new objective function that trains the network to directly optimize the word error rate.”  Graves, Section 4 Page 5 bottom of left column, describes word error rate as a “loss function”:  “However, for many loss functions (including word error rate) this could be optimized by only recalculating that part of the loss corresponding to the alignment change. For our experiments, five samples per sequence gave sufficiently low variance gradient estimates for effective training.”  Thus, if using a reward function, this would be based on the negative of the cost function, which is the negative word error rate.)
Graves and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the WER of Graves.  One of ordinary skill in the art would be motivated to do so in order to benefit from the reliability of a well-established practice in the art (Graves Pg. 4:  “In speech recognition, for example, the standard measure is the word error rate (WER), defined as the edit distance between the true word sequence and the most probable word sequence emitted by the transcriber.”), and achieve state-of-the-art accuracy (Graves Pg. 8, “We have also introduced a novel objective function that allows the network to be directly optimised for word error rate, and shown how to integrate the network outputs with a language model during decoding. Finally, by combining the new model with a baseline, we have achieved state-of-the-art accuracy on the Wall Street Journal corpus for speaker independent recognition.”)

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao, further in view of Sahba et al. (“A reinforcement agent for object segmentation in ultrasound images”; hereinafter Sahba).
As per Claim 9, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1 as well as ground truth and candidate auxiliary output (see Rejection to Claim 1).  However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein the machine learning task is an image masking task, and wherein the task reward function is based on (i) a union of pixels that are masked in the candidate auxiliary output and pixels that are masked in the ground truth output and (ii) an intersection of pixels that are masked in the candidate auxiliary output and pixels that are masked in the ground truth output. 
Sahba teaches wherein the machine learning task is an image masking task, and wherein the task reward function is based on (i) a union of pixels that are masked in the candidate auxiliary output and pixels that are masked in the ground truth output and (ii) an intersection of pixels that are masked in the candidate auxiliary output and pixels that are masked in the ground truth output.  (Sahba, Pg. 772 Abstract Lines 3-5, discloses:  “The agent uses some images and their ground-truth (manually segmented) version to learn from. A reward function is employed to measure the similarities between the output and the manually segmented images, and to provide feedback to the agent.”  Here, Sahba discloses “segmented images”.  “Image segmentation” is another term for “image masking” (see extrinsic evidence https://www.tensorflow.org/tutorials/images/segmentation : “Thus, the task of image segmentation is to train a neural network to output a pixel-wise mask of the image”).  Sahba also discloses a reward function for the image masking task.  Sahba, Pg. 777 Section 3.4, further elaborates:  “The rewards and punishments can be defined based on a quality criterion representing how well the object has been segmented in each sub-image. Several criteria can be used for this purpose. A straightforward method is to compare the results with the ground-truth image after each action.  To measure this value for each sub-image, we note that how much the quality has changed after the action. In each sub-image, to improve the quality of the segmented object the agent receives rewards; otherwise it will be punished. A general form for the reward function can be represented as follows [Eq 10] where D is a measure indicating the difference between the quality after and before taking the action. It can be calculated using the normalized number of misclassified pixels in the segmented sub-images.”  Here, Sahba discloses the reward function as being based on the “normalized number of misclassified pixels in the segmented sub-images”.  The “number of misclassified pixels” is based on the union of pixels that are masked in the candidate auxiliary output and the ground truth output (in order to identify misclassified pixels, one must know all pixels that have been masked in both images, which is the union), and an intersection of pixels that are masked in the candidate auxiliary output and in the ground truth output (these are correctly classified pixels).  The difference between the union of masked pixels and the intersection of masked pixels results in the number of misclassified pixels, which is the value of the reward function.)
Sahba and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
	It would have been obvious before the effective filing date of the claimed invention to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the reward function based on intersection of pixels of Sahba.  One of ordinary skill in the art would be motivated to do so in order to benefit from the reliability of a well-established practice in the art (Sahba Pg. 777:  “A straightforward method is to compare the results with the ground-truth image after each action”) which yields accurate results (Sahba Pg. 779: “Considering the results in terms of visual appearance and accuracy, they can be used as a very suitable coarse level estimation to serve a fine-tuning segmentation algorithm.”)
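For illustration only, the relationship among the union of masked pixels, the intersection of masked pixels, and the number of misclassified pixels can be sketched as follows.  This is a minimal sketch assuming binary masks given as nested lists of equal shape; the function name `mask_overlap` is hypothetical, and the intersection-over-union ratio it also returns is a common convention that is assumed here, not a formula quoted from Sahba.

```python
def mask_overlap(candidate_mask, ground_truth_mask):
    """Compare two binary masks (nested lists of 0/1 with the same shape).

    Returns (union, intersection, misclassified, iou), where misclassified
    is the number of pixels masked in exactly one of the two masks.
    """
    cand = {(r, c) for r, row in enumerate(candidate_mask)
            for c, v in enumerate(row) if v}
    truth = {(r, c) for r, row in enumerate(ground_truth_mask)
             for c, v in enumerate(row) if v}
    union = cand | truth          # masked in either output
    intersection = cand & truth   # masked in both outputs (correctly masked)
    misclassified = len(union) - len(intersection)
    # Assumption: two empty masks are treated as a perfect match.
    iou = len(intersection) / len(union) if union else 1.0
    return len(union), len(intersection), misclassified, iou


candidate = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 0]]
print(mask_overlap(candidate, ground_truth))
```

In this example four pixels are masked in at least one output and two are masked in both, so two pixels are misclassified.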

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao further in view of Modarresi et al. (US 2017/0116530 A1; hereinafter Modarresi).
As per Claim 14, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1. However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein sampling the candidate output comprises: sampling the candidate output using stratified sampling.
Modarresi teaches wherein sampling the candidate output comprises: sampling the candidate output using stratified sampling.  (Modarresi, Para [0047], discloses:  “Sampling of any form may be used, such as random sampling or stratified sampling.”)
Modarresi and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the stratified sampling of Modarresi.  The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of the structured prediction model (Modarresi [0047]: “With stratified sampling, a population (e.g., set of potential values for a parameter) is divided into different subgroups or strata and, thereafter, samples are randomly selected proportionally from the different strata in accordance with respective probabilities. As such, stratified sampling can be used to increase the likelihood of parameter values being selected that are more likely to result in an accurate prediction.”)
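For illustration only, the stratified sampling described in the quoted passage (dividing a population into strata and drawing random samples proportionally from each stratum) can be sketched as follows.  This is a minimal sketch; the function name `stratified_sample`, the `stratum_of` callback, and the rounding rule for per-stratum quotas are assumptions and are not taken from Modarresi.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, k, seed=None):
    """Draw about k items, allocating draws to each stratum in proportion
    to that stratum's share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        # Assumption: quotas are rounded, so the total may differ slightly from k.
        quota = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


population = list(range(100))
picked = stratified_sample(population, lambda x: "low" if x < 80 else "high", k=10, seed=0)
print(len(picked), sum(1 for x in picked if x < 80))
```

With 80 items in one stratum and 20 in the other, a request for 10 samples draws 8 from the first stratum and 2 from the second.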

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gaidon, Volkovs, and Gao further in view of Vasseur et al. (US 2017/0279834 A1; hereinafter Vasseur).
As per Claim 15, the combination of Gaidon, Volkovs, and Gao teaches the method of claim 1.  However, the combination of Gaidon, Volkovs, and Gao does not explicitly teach wherein sampling the candidate auxiliary output comprises: sampling the candidate output using importance sampling.
Vasseur teaches wherein sampling the candidate auxiliary output comprises: sampling the candidate output using importance sampling. (Vasseur, Para [0132], discloses:  “For each managed UPC 516, perform an importance sampling of the anomaly, as described below”)
Vasseur and the combination of Gaidon, Volkovs, and Gao are analogous art because they are in the field of endeavor of machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the structured prediction task with sampling of candidate outputs of Gaidon, Volkovs, and Gao, with the importance sampling of Vasseur.  The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve the accuracy of training by avoiding bias from the training set (Vasseur [0132]: “Recall that importance sampling is a statistical technique that, amongst other things, allows for the training of unbiased classifiers from biased training sets. The technique generally entails designing an “importance function” that, for each sample, gives the probability of using this sample for training or not, and the samples are randomly chosen according to these probabilities. In the current case, the importance function may be a function of the classification output and the confidence of the classifier on this output. The former is directly the output of the classifier itself, while the latter is a measure of the estimated variance of the classifier output. Then, the importance sampling process is done as follows: If the output is “irrelevant” with high confidence, mark the anomaly to be forwarded with a very low probability (i.e., not interesting, but from time to time some anomalies of this type are forwarded for validating the classifier validity); If the output is “relevant” with high confidence, mark the anomaly to be forwarded (i.e., interesting anomaly, therefore always forwarded);If the output has a low confidence, mark the anomaly to be forwarded with a very high probability (i.e., the classifier does not know how to classify it, therefore usually forwarded for collecting data that will improve the classifier in a future training”)
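For illustration only, the selection rule quoted from Vasseur (an “importance function” that assigns each sample a probability of being used, with samples then chosen randomly according to those probabilities) can be sketched as follows.  This is a minimal sketch; the function names, the confidence threshold of 0.8, and the probabilities 0.05 and 0.9 are hypothetical values chosen for the example and do not appear in Vasseur.

```python
import random


def importance(label, confidence, threshold=0.8):
    """Hypothetical importance function: probability of forwarding a sample."""
    if confidence < threshold:
        return 0.9    # low confidence: forwarded with very high probability
    if label == "relevant":
        return 1.0    # relevant with high confidence: always forwarded
    return 0.05       # irrelevant with high confidence: rarely forwarded


def select_for_training(samples, seed=None):
    """Keep each sample with the probability given by the importance function."""
    rng = random.Random(seed)
    return [s for s in samples
            if rng.random() < importance(s["label"], s["confidence"])]


samples = [
    {"id": 1, "label": "relevant", "confidence": 0.95},
    {"id": 2, "label": "irrelevant", "confidence": 0.97},
    {"id": 3, "label": "irrelevant", "confidence": 0.40},
]
print([s["id"] for s in select_for_training(samples, seed=0)])
```

Which samples are kept depends on the random draws, except that a sample assigned probability 1.0 is always kept.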
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ranzato et al. (“Sequence Level Training with Recurrent Neural Networks”) also discloses a task reward function, and notes on Page 1 that “Training these models to directly optimize metrics like BLEU is hard because a) these are not differentiable”, at the bottom of Page 5:  “Smoothing the input this way makes the whole process differentiable and trainable using standard back-propagation” and at the bottom of Page 9:  “Training at the sequence level and directly optimizing for testing score yields better generations than turning a sequence of discrete decisions into a differentiable process amenable to standard back-propagation of the error.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
/L.A.S./Examiner, Art Unit 2126     
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126