DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Preliminary Amendment
This action is responsive to the preliminary amendments and remarks received 17 April 2020. Claims 1 - 20 are currently pending.

Claim Objections
Claim 8 is objected to because of the following informalities: Lines 3 - 4 of claim 8 recite, in part, “cause clusters learnt by the clustering neural network correspond to the parts” which appears to contain a grammatical error and/or inconsistent claim terminology. The Examiner suggests amending the claim to --cause clusters learnt by the clustering neural network to correspond to the parts of the objects-- in order to maintain consistency with line 4 of claim 1 and to improve the clarity and precision of the claim. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7 and 10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 recites the limitation "the layers" in line 2 and line 3. There is insufficient antecedent basis for this limitation in the claim. The Examiner notes that line 7 of claim 1 recites “pooled layers”; however, it is unclear whether “the layers” recited on line 2 and line 3 of claim 7 refer to the previously recited “pooled layers”. Clarification and appropriate correction are required.
Claim 10 recites the limitation "the groups learned by the clustering neural network" in lines 1 - 2. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4 - 9, 11, 16, 17 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Matei et al. U.S. Publication No. 2019/0073560 A1.

-	With regards to claim 1, Matei et al. disclose a method, (Matei et al., Abstract, Figs. 4 - 6, Pg. 1 ¶ 0005, Pg. 2 ¶ 0012, Pg. 5 ¶ 0041 - Pg. 6 ¶ 0045, Pg. 11 ¶ 0079 and 0083 - 0084, Pg. 12 ¶ 0089 - 0092 and 0095 - 0096) comprising: at an electronic device with one or more processors: (Matei et al., Figs. 1 - 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 5 ¶ 0040, Pg. 6 ¶ 0046, Pg. 8 ¶ 0060 - 0063 and 0065, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) obtaining a training set of training inputs and corresponding training labels, (Matei et al., Figs. 1, 2 & 5, Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) the training labels identifying known locations of parts of objects in the training inputs; (Matei et al., Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) inputting the training inputs into a main task neural network to produce output labels predicting locations of the parts of the objects in the training inputs; (Matei et al., Figs. 1 - 7O, Pg. 2 ¶ 0014, Pg. 5 ¶ 0036 - 0038, Pg. 5 ¶ 0042 - Pg. 6 ¶ 0044, Pg. 6 ¶ 0047 - 0048, Pg. 7 ¶ 0051 - 0053, Pg. 7 ¶ 0057 - Pg. 8 ¶ 0058, Pg. 9 ¶ 0071 - Pg. 10 ¶ 0073, Pg. 10 ¶ 0076) inputting data from pooled layers of the main task neural network into a clustering neural network; (Matei et al., Abstract, Fig. 4, Pg. 1 ¶ 0008, Pg. 6 ¶ 0043, Pg. 6 ¶ 0049 - Pg. 7 ¶ 0054, Pg. 8 ¶ 0063, Pg. 9 ¶ 0067 - 0068 and 0071, Pg. 10 ¶ 0074 - 0075 [“machine learning system 102 applies first set of filters 120A to image 112 to generate an intermediate representation of image 112 suitable as an input to both second set of filters 120B and third set of filters 120C. Machine learning system 102 applies second set of filters 120B to the intermediate representation of image 112 to generate part localization data 116 for the object. In some examples, part localization data 116 comprises data identifying one or more sub-parts of the object and one or more regions of image 112 in which the one or more sub-parts of the object are located” and “CNN 210 comprises a plurality of convolutional filters. Each filter comprises a vector of weights and a bias. As described herein, the terms ‘filter’ and ‘layer’ of CNN 210 may be used interchangeably. CNN 210 receives image 112 as an input, applies a convolution operation of a first filter of the plurality of filters to image 112, and passes the output of the first filter to the next filter of the plurality of filters. Thus, CNN 210 applies each filter of the plurality of filters to an output of a previous filter of the plurality of filters. Further, an output of each filter may ‘map’ to an input of a subsequent filter to form the neural network relationships of CNN 210”]) and training the main task neural network and the clustering neural network based on a main task loss from the main task neural network and a clustering loss from the clustering neural network. (Matei et al., Pg. 7 ¶ 0055, Pg. 9 ¶ 0069 - 0072, Pg. 10 ¶ 0075 - 0078, Pg. 11 ¶ 0081 [“CNN 210 combines part localization loss for the object and fine-grained classification loss for the object. By combining the loss, CNN 210 may enable end-to-end, multi-task, data-driven training of all network parameters of convolutional neural network model 106”])
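For illustration only, the final limitation of claim 1 (training based on both a main task loss and a clustering loss) can be sketched as a single combined objective. The sketch below is an assumption-laden toy and is not taken from Matei et al. or from the application: the squared-error main task loss, the nearest-centroid clustering loss, the weighted sum, and the names main_task_loss, clustering_loss, combined_loss and weight are all hypothetical choices made for the example.

```python
def main_task_loss(predicted, labeled):
    """Mean squared distance between predicted and labeled part locations (assumed form)."""
    total = sum((px - lx) ** 2 + (py - ly) ** 2
                for (px, py), (lx, ly) in zip(predicted, labeled))
    return total / len(labeled)


def clustering_loss(features, centroids):
    """Mean squared distance from each pooled feature vector to its nearest centroid (assumed form)."""
    total = 0.0
    for feature in features:
        total += min(sum((a - b) ** 2 for a, b in zip(feature, centroid))
                     for centroid in centroids)
    return total / len(features)


def combined_loss(predicted, labeled, features, centroids, weight=0.5):
    """Weighted sum of the two losses; the weight is a hypothetical hyperparameter."""
    return main_task_loss(predicted, labeled) + weight * clustering_loss(features, centroids)


# Two parts with known (x, y) locations, and two pooled feature vectors.
predicted = [(1.0, 2.0), (3.0, 4.0)]
labeled = [(1.0, 2.0), (3.0, 5.0)]
features = [(0.0, 0.0), (1.0, 1.0)]
centroids = [(0.0, 0.0), (1.0, 2.0)]
loss = combined_loss(predicted, labeled, features, centroids)
```

In a real system both terms would be differentiated with respect to shared network parameters so that one optimizer step updates both networks; the toy above only shows how the two loss values could be combined into one number.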

-	With regards to claim 4, Matei et al. disclose the method of claim 1 further comprising determining the main task loss using learned quality assurance metrics. (Matei et al., Pg. 5 ¶ 0036 - 0037, Pg. 6 ¶ 0044, Pg. 7 ¶ 0055, Pg. 8 ¶ 0064, Pg. 9 ¶ 0069 and 0072, Pg. 10 ¶ 0076 - 0078, Pg. 11 ¶ 0080 - 0082) 

-	With regards to claim 5, Matei et al. disclose the method of claim 1 wherein the clustering loss is configured to cause the clustering neural network to learn to label the parts of the objects individually. (Matei et al., Abstract, Pg. 4 ¶ 0032 - 0034, Pg. 5 ¶ 0038, Pg. 6 ¶ 0043 and 0047 - 0048, Pg. 7 ¶ 0053, Pg. 8 ¶ 0058 and 0063, Pg. 9 ¶ 0071 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080, Pg. 12 ¶ 0090 and 0094) 

-	With regards to claim 6, Matei et al. disclose the method of claim 1, wherein the clustering loss is configured to cause the clustering neural network to learn groups corresponding to the parts of the objects. (Matei et al., Pg. 3 ¶ 0025, Pg. 4 ¶ 0033 - 0034, Pg. 5 ¶ 0036 - 0039, Pg. 5 ¶ 0042 - Pg. 6 ¶ 0043, Pg. 6 ¶ 0049, Pg. 9 ¶ 0068 - Pg. 10 ¶ 0073, Pg. 10 ¶ 0078, Pg. 12 ¶ 0090 and 0092 - 0094) 

-	With regards to claim 7, Matei et al. disclose the method of claim 1, wherein the clustering neural network is trained to identify a first group of the layers corresponding to a first pattern and a second group of the layers corresponding to a second pattern. (Matei et al., Figs. 4 - 6, Pg. 1 ¶ 0008, Pg. 2 ¶ 0014, Pg. 4 ¶ 0035 - Pg. 5 ¶ 0036, Pg. 6 ¶ 0043 and 0047 - 0049, Pg. 7 ¶ 0053, Pg. 8 ¶ 0058 and 0063, Pg. 9 ¶ 0067 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080, Pg. 12 ¶ 0090 and 0094) 

-	With regards to claim 8, Matei et al. disclose the method of claim 1, wherein the main task neural network and the clustering neural network are trained together using the main task loss and the clustering loss (Matei et al., Pg. 7 ¶ 0055, Pg. 9 ¶ 0069 - 0072, Pg. 10 ¶ 0075 - 0078, Pg. 11 ¶ 0081) to cause clusters learnt by the clustering neural network correspond to the parts. (Matei et al., Figs. 4 - 6, Pg. 1 ¶ 0008, Pg. 2 ¶ 0014, Pg. 4 ¶ 0035 - Pg. 5 ¶ 0036, Pg. 6 ¶ 0043 and 0047 - 0049, Pg. 7 ¶ 0053, Pg. 8 ¶ 0058 and 0063, Pg. 9 ¶ 0067 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080, Pg. 12 ¶ 0090 and 0094) 

-	With regards to claim 9, Matei et al. disclose the method of claim 1, wherein the main task neural network and the clustering neural network are trained together using the main task loss and the clustering loss (Matei et al., Pg. 7 ¶ 0055, Pg. 9 ¶ 0069 - 0072, Pg. 10 ¶ 0075 - 0078, Pg. 11 ¶ 0081) to cause similarity between sub-parts of feature maps across multiple images. (Matei et al., Pg. 3 ¶ 0024 - 0025, Pg. 5 ¶ 0036 and 0038 - 0039, Pg. 5 ¶ 0042 - Pg. 6 ¶ 0045, Pg. 6 ¶ 0049, Pg. 7 ¶ 0051 - 0053, Pg. 9 ¶ 0067 - Pg. 10 ¶ 0074, Pg. 10 ¶ 0077 - 0078, Pg. 12 ¶ 0094) 

-	With regards to claim 11, Matei et al. disclose the method of claim 1, wherein a number of groups learned by the clustering neural network corresponds to a number of the parts of the objects. (Matei et al., Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0034, Pg. 5 ¶ 0036, Pg. 6 ¶ 0047 - 0049, Pg. 9 ¶ 0068 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080, Pg. 12 ¶ 0090 and 0094) 

-	With regards to claim 16, Matei et al. disclose the method of claim 1 further comprising integrating the main task neural network into an application stored on a non-transitory computer-readable medium. (Matei et al., Figs. 1 - 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0026 - Pg. 4 ¶ 0030, Pg. 5 ¶ 0040 - 0042, Pg. 6 ¶ 0044 - 0047, Pg. 7 ¶ 0055, Pg. 8 ¶ 0058 - 0063, Pg. 9 ¶ 0070 - 0071, Pg. 11 ¶ 0080 and 0083, Pg. 12 ¶ 0090 - 0092, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) 

-	With regards to claim 17, Matei et al. disclose a system (Matei et al., Abstract, Figs. 1 - 3, Pg. 1 ¶ 0005, Pg. 1 ¶ 0008 - Pg. 2 ¶ 0009, Pg. 2 ¶ 0013, Pg. 3 ¶ 0026 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0060 - 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) comprising: a non-transitory computer-readable storage medium; (Matei et al., Fig. 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) and one or more processors coupled to the non-transitory computer-readable storage medium, (Matei et al., Fig. 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations (Matei et al., Fig. 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) comprising: obtaining a training set of training inputs and corresponding training labels, (Matei et al., Figs. 1, 2 & 5, Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) the training labels identifying known locations of parts of objects in the training inputs; (Matei et al., Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) inputting the training inputs into a main task neural network to produce output labels predicting locations of the parts of the objects in the training inputs; (Matei et al., Figs. 1 - 7O, Pg. 2 ¶ 0014, Pg. 5 ¶ 0036 - 0038, Pg. 5 ¶ 0042 - Pg. 6 ¶ 0044, Pg. 6 ¶ 0047 - 0048, Pg. 7 ¶ 0051 - 0053, Pg. 7 ¶ 0057 - Pg. 8 ¶ 0058, Pg. 9 ¶ 0071 - Pg. 10 ¶ 0073, Pg. 10 ¶ 0076) inputting data from pooled layers of the main task neural network into a clustering neural network; (Matei et al., Abstract, Fig. 4, Pg. 1 ¶ 0008, Pg. 6 ¶ 0043, Pg. 6 ¶ 0049 - Pg. 7 ¶ 0054, Pg. 8 ¶ 0063, Pg. 9 ¶ 0067 - 0068 and 0071, Pg. 10 ¶ 0074 - 0075 [“machine learning system 102 applies first set of filters 120A to image 112 to generate an intermediate representation of image 112 suitable as an input to both second set of filters 120B and third set of filters 120C. Machine learning system 102 applies second set of filters 120B to the intermediate representation of image 112 to generate part localization data 116 for the object. In some examples, part localization data 116 comprises data identifying one or more sub-parts of the object and one or more regions of image 112 in which the one or more sub-parts of the object are located” and “CNN 210 comprises a plurality of convolutional filters. Each filter comprises a vector of weights and a bias. As described herein, the terms ‘filter’ and ‘layer’ of CNN 210 may be used interchangeably. CNN 210 receives image 112 as an input, applies a convolution operation of a first filter of the plurality of filters to image 112, and passes the output of the first filter to the next filter of the plurality of filters. Thus, CNN 210 applies each filter of the plurality of filters to an output of a previous filter of the plurality of filters. Further, an output of each filter may ‘map’ to an input of a subsequent filter to form the neural network relationships of CNN 210”]) and training the main task neural network and the clustering neural network based on a main task loss from the main task neural network and a clustering loss from the clustering neural network. (Matei et al., Pg. 7 ¶ 0055, Pg. 9 ¶ 0069 - 0072, Pg. 10 ¶ 0075 - 0078, Pg. 11 ¶ 0081 [“CNN 210 combines part localization loss for the object and fine-grained classification loss for the object. By combining the loss, CNN 210 may enable end-to-end, multi-task, data-driven training of all network parameters of convolutional neural network model 106”])

-	With regards to claim 20, Matei et al. disclose a non-transitory computer-readable storage medium, (Matei et al., Fig. 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) storing program instructions computer-executable on a computer to perform operations (Matei et al., Fig. 3, Pg. 2 ¶ 0013, Pg. 3 ¶ 0027 - Pg. 4 ¶ 0030, Pg. 8 ¶ 0062, Pg. 12 ¶ 0095 - Pg. 13 ¶ 0097) comprising: obtaining a training set of training inputs and corresponding training labels, (Matei et al., Figs. 1, 2 & 5, Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) the training labels identifying known locations of parts of objects in the training inputs; (Matei et al., Pg. 2 ¶ 0014, Pg. 4 ¶ 0031 - 0033, Pg. 6 ¶ 0047 - 0048) inputting the training inputs into a main task neural network to produce output labels predicting locations of the parts of the objects in the training inputs; (Matei et al., Figs. 1 - 7O, Pg. 2 ¶ 0014, Pg. 5 ¶ 0036 - 0038, Pg. 5 ¶ 0042 - Pg. 6 ¶ 0044, Pg. 6 ¶ 0047 - 0048, Pg. 7 ¶ 0051 - 0053, Pg. 7 ¶ 0057 - Pg. 8 ¶ 0058, Pg. 9 ¶ 0071 - Pg. 10 ¶ 0073, Pg. 10 ¶ 0076) inputting data from pooled layers of the main task neural network into a clustering neural network; (Matei et al., Abstract, Fig. 4, Pg. 1 ¶ 0008, Pg. 6 ¶ 0043, Pg. 6 ¶ 0049 - Pg. 7 ¶ 0054, Pg. 8 ¶ 0063, Pg. 9 ¶ 0067 - 0068 and 0071, Pg. 10 ¶ 0074 - 0075 [“machine learning system 102 applies first set of filters 120A to image 112 to generate an intermediate representation of image 112 suitable as an input to both second set of filters 120B and third set of filters 120C. Machine learning system 102 applies second set of filters 120B to the intermediate representation of image 112 to generate part localization data 116 for the object. In some examples, part localization data 116 comprises data identifying one or more sub-parts of the object and one or more regions of image 112 in which the one or more sub-parts of the object are located” and “CNN 210 comprises a plurality of convolutional filters. Each filter comprises a vector of weights and a bias. As described herein, the terms ‘filter’ and ‘layer’ of CNN 210 may be used interchangeably. CNN 210 receives image 112 as an input, applies a convolution operation of a first filter of the plurality of filters to image 112, and passes the output of the first filter to the next filter of the plurality of filters. Thus, CNN 210 applies each filter of the plurality of filters to an output of a previous filter of the plurality of filters. Further, an output of each filter may ‘map’ to an input of a subsequent filter to form the neural network relationships of CNN 210”]) and training the main task neural network and the clustering neural network based on a main task loss from the main task neural network and a clustering loss from the clustering neural network. (Matei et al., Pg. 7 ¶ 0055, Pg. 9 ¶ 0069 - 0072, Pg. 10 ¶ 0075 - 0078, Pg. 11 ¶ 0081 [“CNN 210 combines part localization loss for the object and fine-grained classification loss for the object. By combining the loss, CNN 210 may enable end-to-end, multi-task, data-driven training of all network parameters of convolutional neural network model 106”])

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Matei et al. U.S. Publication No. 2019/0073560 A1 as applied to claims 1 and 17 above, and further in view of Guttmann U.S. Publication No. 2018/0336509 A1.

-	With regards to claims 2 and 18, Matei et al. disclose the method and system of claims 1 and 17, respectively, wherein the operations further comprise: inputting additional inputs into the main task neural network to produce additional output labels and corresponding confidence values; (Matei et al., Figs. 1 - 4 and 6, Pg. 5 ¶ 0037 - 0038 and 0040 - 0042, Pg. 7 ¶ 0053 - 0055, Pg. 9 ¶ 0071 - Pg. 10 ¶ 0073, Pg. 12 ¶ 0090 - 0094) and further training the main task neural network and the clustering neural network. (Matei et al., Pg. 5 ¶ 0036 - 0037, Pg. 6 ¶ 0043 - 0044 and 0049, Pg. 7 ¶ 0051 - 0052, Pg. 8 ¶ 0064, Pg. 10 ¶ 0078, Pg. 11 ¶ 0080 - 0081, Pg. 11 ¶ 0084 - Pg. 12 ¶ 0088) Matei et al. fail to disclose explicitly selecting, based on the confidence values, an automatically-labeled training set of data comprising a subset of the additional inputs and a corresponding subset of the additional output labels; and further training the main task neural network and the clustering neural network using the automatically-labeled training set of data. Pertaining to analogous art, Guttmann discloses inputting additional inputs into the neural network to produce additional output labels and corresponding confidence values; (Guttmann, Fig. 14, Pg. 32 ¶ 0221 - Pg. 33 ¶ 0222, Pg. 33 ¶ 0224 - Pg. 34 ¶ 0226) selecting, based on the confidence values, an automatically-labeled training set of data comprising a subset of the additional inputs and a corresponding subset of the additional output labels; (Guttmann, Fig. 14, Pg. 32 ¶ 0221, Pg. 34 ¶ 0226 - 0228) and further training the neural network using the automatically-labeled training set of data. (Guttmann, Fig. 14, Pg. 32 ¶ 0221, Pg. 33 ¶ 0223 and 0225, Pg. 34 ¶ 0232 [“generating a second inference model (Step 1460) may comprise generating a second inference model using at least part of the group of labeled examples obtained by Step 1410 and/or the subset of the group of unlabeled examples selected by Step 1450 and/or the labels assigned by Step 1430 to the examples in the selected subset of the group of unlabeled examples” and “In some examples, the inference model generated by Step 1420 may be updated according to the subset of the group of unlabeled examples selected by Step 1450 (and possibly the labels assigned by Step 1430 to the examples in the selected subset), for example using an online and/or incremental machine learning algorithm, by changing the lost function of the machine learning algorithm according to the new training examples and using the inference model and/or an intermediate state from Step 1420 in the initialization of the machine learning algorithm, by changing the batches of examples to include the new examples in a batch based machine learning algorithm, and so forth”]) Matei et al. and Guttmann are combinable because they are both directed towards training neural networks to detect objects in images. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Guttmann. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Guttmann applied to a similar device. Training the neural network(s) based on a selected subset of additional inputs labeled by the neural network(s) with a high level of confidence, as taught by Guttmann, would enhance the base device of Matei et al. by improving its ability to accurately and reliably perform classification and localization of images in situations wherein only a limited amount of labeled training data is initially available for training the neural network(s) of the base device. Furthermore, this modification would have been prompted by the teachings and suggestions of Matei et al. that machine learning systems may require a large amount of training data to build an accurate model, that they test their system to validate that their trained convolutional neural network model accurately recognizes classification data, that they may iteratively repeat learning to refine the accuracy of their classification data in view of an error rate and that other supervised, unsupervised, semi-supervised, or reinforcement learning algorithms may be utilized to train their models, see at least page 1 paragraph 0004, page 5 paragraph 0037, page 6 paragraph 0044 and page 8 paragraph 0064 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the neural network(s) of the base device would be further trained on a selected subset of additional inputs that have been automatically labeled by the neural network(s) with a high level of confidence so as to improve the ability of the base device to accurately and reliably perform classification and localization of images in situations wherein only a small amount of labeled training data is initially available for training the neural network(s) of the base device. Therefore, it would have been obvious to combine Matei et al. with Guttmann to obtain the invention as specified in claims 2 and 18.
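For illustration only, the confidence-based selection step recited in claims 2 and 18 can be sketched as a simple filter over automatically labeled examples. This is a minimal sketch under stated assumptions and is not drawn from Guttmann or Matei et al.: the function name select_confident, the fixed threshold of 0.9, and the sample data are hypothetical.

```python
def select_confident(inputs, labels, confidences, threshold=0.9):
    """Keep only (input, label) pairs whose confidence meets the assumed threshold."""
    return [(x, y) for x, y, c in zip(inputs, labels, confidences) if c >= threshold]


# Additional inputs, the labels the network assigned to them, and its confidence in each.
additional_inputs = ["img_a", "img_b", "img_c"]
output_labels = [0, 1, 2]
confidence_values = [0.95, 0.50, 0.90]

auto_labeled_set = select_confident(additional_inputs, output_labels, confidence_values)
```

The resulting auto_labeled_set would then be appended to the original labeled data for a further round of training; how the threshold is chosen is left open here.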

Claims 3 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Matei et al. U.S. Publication No. 2019/0073560 A1 as applied to claims 1 and 17 above, and further in view of Hwang et al. U.S. Publication No. 2018/0060722 A1.

-	With regards to claims 3 and 19, Matei et al. disclose the method and system of claims 1 and 17, respectively. Matei et al. fail to disclose expressly wherein the operations further comprise determining the main task loss by comparing the output labels and the training labels. Pertaining to analogous art, Hwang et al. disclose wherein the operations further comprise determining the main task loss by comparing the output labels and the training labels. (Hwang et al., Figs. 8 & 13, Pg. 3 ¶ 0056, Pg. 4 ¶ 0066 - 0072, Pg. 4 ¶ 0083 - Pg. 5 ¶ 0085, Pg. 5 ¶ 0097 - 0098, Pg. 6 ¶ 0126, Pg. 7 ¶ 0134, Pg. 8 ¶ 0164, Pg. 10 ¶ 0187 - 0192) Matei et al. and Hwang et al. are combinable because they are both directed towards training convolutional neural networks to simultaneously perform object classification and localization on image data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Hwang et al. This modification would have been prompted in order to substitute the main task loss of Hwang et al. for the classification loss of Matei et al. The main task loss of Hwang et al. could be substituted in place of the classification loss of Matei et al. utilizing well-known techniques in the art and would likely yield predictable results, in that in the combination the main task loss of Hwang et al. that is based on a comparison of a predicted output label with a given training label would be utilized as the classification (main task) loss in the base device of Matei et al. Furthermore, this modification would have been prompted by the teachings and suggestions of Matei et al. that they test their system to validate that their trained convolutional neural network model accurately recognizes classification data, that they may iteratively repeat learning to refine the accuracy of their classification data in view of an error rate, that supervised learning can be utilized to train their models and that they utilize a multinomial logistic loss as a classification loss, see at least page 5 paragraph 0037, page 6 paragraph 0044, page 8 paragraph 0064 and page 10 paragraph 0076 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the main task loss of Hwang et al. that is based on a comparison of a predicted output label with a given training label would be utilized as the classification (main task) loss in the base device of Matei et al. Therefore, it would have been obvious to combine Matei et al. with Hwang et al. to obtain the invention as specified in claims 3 and 19.
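For illustration only, a loss determined "by comparing the output labels and the training labels" can be sketched with a multinomial logistic (softmax cross-entropy) loss, the general type of classification loss the rejection above attributes to Matei et al. The sketch is an assumption-based toy rather than either reference's implementation: the function name multinomial_logistic_loss and the example scores are hypothetical.

```python
import math


def multinomial_logistic_loss(logits, label):
    """Negative log-probability that softmax(logits) assigns to the training label."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Raw network scores for three classes, compared against training label 0.
good_prediction = multinomial_logistic_loss([4.0, 0.0, 0.0], label=0)
bad_prediction = multinomial_logistic_loss([0.0, 4.0, 0.0], label=0)
uniform_prediction = multinomial_logistic_loss([0.0, 0.0], label=0)
```

The loss is small when the highest score falls on the training label and grows as the prediction moves away from it, which is the comparison the claim language describes.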

Claims 10, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Matei et al. U.S. Publication No. 2019/0073560 A1 as applied to claim 1 above, and further in view of Black et al. U.S. Patent No. 10,679,046.

-	With regards to claim 10, Matei et al. disclose the method of claim 1, wherein the groups learned by the clustering neural network correspond to body parts. (Matei et al., Pg. 1 ¶ 0006, Pg. 2 ¶ 0014, Pg. 4 ¶ 0033, Pg. 8 ¶ 0058, Pg. 11 ¶ 0080, Pg. 12 ¶ 0093 - 0094) Matei et al. fail to disclose explicitly wherein the groups correspond to human body parts. Pertaining to analogous art, Black et al. disclose wherein the groups learned by the clustering neural network correspond to human body parts. (Black et al., Figs. 1A & 1B, Col. 3 Line 65 - Col. 4 Line 14, Col. 4 Line 37 - Col. 5 Line 3, Col. 9 Lines 59 - 64, Col. 10 Lines 6 - 38, Col. 12 Lines 16 - 25, Col. 19 Line 65 - Col. 20 Line 58, Col. 21 Lines 3 - 17) Matei et al. and Black et al. are combinable because they are both directed towards training a convolutional neural network to predict locations of body parts in images. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Black et al. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Black et al. applied to a comparable device. Learning groups to localize human body parts, as taught by Black et al., would enhance the base device of Matei et al. by allowing for it to be utilized in a wide variety of additional and/or alternative applications, such as for gesture recognition, subject tracking, identification of users or pose estimation, wherein a person is the object of interest and localizing one or more human body parts of the person within an image would be beneficial in the additional and/or alternative applications thereby increasing the appeal of the base device to potential end users. Furthermore, this modification would have been prompted by the teachings and suggestions of Matei et al. that key-point and part localization has been widely studied for the purpose of pose estimation, that their techniques can allow parts of an object to be pinpointed across a range of pose and aspect variations for objects such as people and that their techniques may be applied to numerous types of datasets and functions, see at least page 9 paragraphs 0070 - 0072 and page 11 paragraphs 0080 and 0083 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the base device of Matei et al. would be utilized to learn groups corresponding to human body parts in order to enable human body parts to be localized in images so as to expand the number of potential applications in which the base device may be utilized, such as in gesture recognition, subject tracking, user identification or pose estimation applications, and thereby enhance its overall appeal and usefulness to potential end users. Therefore, it would have been obvious to combine Matei et al. with Black et al. to obtain the invention as specified in claim 10.

-	With regards to claim 12, Matei et al. disclose the method of claim 1, wherein the parts of the objects correspond to parts of a body. (Matei et al., Pg. 1 ¶ 0006, Pg. 2 ¶ 0014, Pg. 4 ¶ 0033, Pg. 8 ¶ 0058, Pg. 11 ¶ 0080, Pg. 12 ¶ 0093 - 0094) Matei et al. fail to disclose explicitly wherein the main task neural network is trained for human pose estimation, wherein the parts of the objects correspond to parts of a skeleton representing human pose. Pertaining to analogous art, Black et al. disclose wherein the main task neural network is trained for human pose estimation, (Black et al., Fig. 1B, Col. 2 Lines 11 - 34, Col. 4 Lines 17 - 33, Col. 11 Lines 12 - 24, Col. 12 Lines 13 - 25, Col. 19 Line 65 - Col. 20 Line 30, Col. 20 Line 41 - Col. 21 Line 67) wherein the parts of the objects correspond to parts of a skeleton representing human pose. (Black et al., Figs. 1A & 1B, Col. 3 Line 65 - Col. 4 Line 14, Col. 4 Line 37 - Col. 5 Line 3, Col. 6 Line 62 - Col. 7 Line 3, Col. 7 Lines 46 - 54, Col. 10 Lines 22 - 38, Col. 20 Lines 3 - 26 and 41 - 58) Matei et al. and Black et al. are combinable because they are both directed towards training a convolutional neural network to predict locations of body parts in images. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Black et al. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Black et al. applied to a comparable device. Training the neural network(s) for human pose estimation using parts of a skeleton representing human pose, as taught by Black et al., would enhance the base device of Matei et al. by allowing for it to be utilized in additional and/or alternative applications that would benefit from the part localization features of the base device of Matei et al., such as in gesture recognition, subject tracking and/or human pose estimation applications, thereby increasing the appeal and usefulness of the base device to potential end users. Furthermore, this modification would have been prompted by the teachings and suggestions of Matei et al. that key-point and part localization has been widely studied for the purpose of pose estimation, that their techniques can allow parts of an object to be pinpointed across a range of pose and aspect variations for objects such as people and that their techniques may be applied to numerous types of datasets and functions, see at least page 9 paragraphs 0070 - 0072 and page 11 paragraphs 0080 and 0083 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the neural network(s) of the base device of Matei et al. would be trained for human pose estimation using parts of a skeleton representing human pose so as to enhance the overall appeal and usefulness of the base device to potential end users by allowing for it to be utilized in a variety of additional and/or alternative applications that would benefit from its part localization features, such as in gesture recognition, subject tracking and/or human pose estimation applications. Therefore, it would have been obvious to combine Matei et al. with Black et al. to obtain the invention as specified in claim 12.

-	With regard to claim 13, Matei et al. disclose the method of claim 1. Matei et al. fail to disclose explicitly wherein the main task neural network is trained for hand tracking, body tracking, or gaze tracking. Pertaining to analogous art, Black et al. disclose wherein the main task neural network is trained for hand tracking, body tracking, or gaze tracking. (Black et al., Figs. 1A & 1B, Col. 2 Lines 11 - 34, Col. 4 Lines 3 - 33, Col. 4 Line 61 - Col. 5 Line 3, Col. 10 Lines 22 - 45, Col. 18 Line 65 - Col. 19 Line 6, Col. 19 Line 65 - Col. 20 Line 32, Col. 20 Lines 41 - 58, Col. 21 Lines 3 - 17) Matei et al. and Black et al. are combinable because they are both directed towards training a convolutional neural network to predict locations of body parts in images. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Black et al. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Black et al. applied to a comparable device. Training the neural network(s) for hand tracking, body tracking, or gaze tracking, as taught by Black et al., would enhance the base device of Matei et al. by allowing for it to be utilized in additional and/or alternative applications that would benefit from its part localization features thereby increasing its overall appeal and usefulness to potential end users. Furthermore, this modification would have been prompted by the teachings and suggestions of Matei et al. that a system capable of recognizing vehicles may improve tracking of vehicles across camera views, that key-point and part localization has been widely studied for the purpose of pose estimation, that their techniques can allow parts of an object to be pinpointed across a range of pose and aspect variations for objects such as people and that their techniques may be applied to numerous types of datasets and functions, see at least page 3 paragraph 0024, page 9 paragraphs 0070 - 0072 and page 11 paragraphs 0080 and 0083 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the neural network(s) of the base device of Matei et al. would be trained for hand tracking, body tracking, or gaze tracking so as to enhance its overall appeal and usefulness to potential end users by allowing for it to be utilized in a variety of additional and/or alternative applications that would benefit from its part localization and classification features. Therefore, it would have been obvious to combine Matei et al. with Black et al. to obtain the invention as specified in claim 13.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Matei et al. U.S. Publication No. 2019/0073560 A1 as applied to claim 1 above, and further in view of Jansen et al. U.S. Publication No. 2020/0349921 A1.

-	With regard to claim 14, Matei et al. disclose the method of claim 1, wherein the main task neural network is trained for semantic segmentation of image data. (Matei et al., Figs. 4 & 6, Pg. 7 ¶ 0053, Pg. 9 ¶ 0070 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080) Matei et al. fail to disclose explicitly wherein the neural network is trained for semantic segmentation of audio. Pertaining to analogous art, Jansen et al. disclose wherein the neural network is trained for semantic segmentation of audio. (Jansen et al., Abstract, Pg. 1 ¶ 0002 - 0003, Pg. 2 ¶ 0018 - Pg. 3 ¶ 0020, Pg. 3 ¶ 0024 - Pg. 4 ¶ 0029, Pg. 4 ¶ 0031 - Pg. 5 ¶ 0034) Matei et al. and Jansen et al. are combinable because they are both directed towards training convolutional neural networks to semantically segment and classify input data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Jansen et al. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Jansen et al. applied to a similar device. Training the neural network(s) for semantic segmentation of audio, as taught by Jansen et al., would enhance the base device of Matei et al. by allowing for it to be utilized in a wide variety of additional and/or alternative applications that would benefit from its part localization and classification features thereby increasing its overall appeal and usefulness to potential end users. Furthermore, this modification would enhance the base device of Matei et al. by allowing for it to additionally and/or alternatively process, semantically segment and classify audio data thereby facilitating its use in a plethora of potential audio processing applications. Moreover, this modification would have been prompted by the teachings and suggestions of Matei et al. that their part localization outputs a mask comprising a labeled image similar to techniques that use semantic segmentation and that their techniques may be applied to numerous types of datasets and functions, see at least page 9 paragraph 0071 and page 11 paragraph 0083 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the neural network(s) of the base device of Matei et al. would be trained for semantic segmentation of audio so as to enhance its overall appeal and usefulness to potential end users by enabling it to be utilized in a wide variety of additional and/or alternative audio processing applications that would benefit from its part localization and classification features. Therefore, it would have been obvious to combine Matei et al. with Jansen et al. to obtain the invention as specified in claim 14.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Matei et al. U.S. Publication No. 2019/0073560 A1 as applied to claim 1 above, and further in view of Buhrmann et al. U.S. Publication No. 2018/0365229 A1.

-	With regard to claim 15, Matei et al. disclose the method of claim 1, wherein the main task neural network is trained for semantic segmentation of image data. (Matei et al., Figs. 4 & 6, Pg. 7 ¶ 0053, Pg. 9 ¶ 0070 - Pg. 10 ¶ 0073, Pg. 11 ¶ 0080) Matei et al. fail to disclose explicitly wherein the neural network is trained for semantic segmentation of text. Pertaining to analogous art, Buhrmann et al. disclose wherein the neural network is trained for semantic segmentation of text. (Buhrmann et al., Abstract, Figs. 1, 2 & 5, Pg. 1 ¶ 0007 - 0008, Pg. 2 ¶ 0017 and 0020 - 0024, Pg. 3 ¶ 0026, 0028 and 0030, Pg. 4 ¶ 0033 - Pg. 5 ¶ 0034) Matei et al. and Buhrmann et al. are combinable because they are both directed towards training convolutional neural networks to semantically segment and classify input data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Matei et al. with the teachings of Buhrmann et al. This modification would have been prompted in order to enhance the base device of Matei et al. with the well-known and applicable technique Buhrmann et al. applied to a similar device. Training the neural network(s) for semantic segmentation of text, as taught by Buhrmann et al., would enhance the base device of Matei et al. by allowing for it to be utilized in a wide variety of additional and/or alternative applications that would benefit from its part localization and classification features thereby increasing its overall appeal and usefulness to potential end users. Furthermore, this modification would enhance the base device of Matei et al. by allowing for it to additionally and/or alternatively process, semantically segment and classify text data thereby facilitating its use in a plethora of potential text and document processing applications. Moreover, this modification would have been prompted by the teachings and suggestions of Matei et al. that their part localization outputs a mask comprising a labeled image similar to techniques that use semantic segmentation and that their techniques may be applied to numerous types of datasets and functions, see at least page 9 paragraph 0071 and page 11 paragraph 0083 of Matei et al. This combination could be completed according to well-known techniques in the art and would likely yield predictable results, in that the neural network(s) of the base device of Matei et al. would be trained for semantic segmentation of text so as to enhance its overall appeal and usefulness to potential end users by enabling it to be utilized in a wide variety of additional and/or alternative text and document processing applications that would benefit from its part localization and classification features. Therefore, it would have been obvious to combine Matei et al. with Buhrmann et al. to obtain the invention as specified in claim 15.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hickson et al. U.S. Publication No. 2020/0027002 A1; which is directed towards methods and systems for training a neural network to classify an image and cluster the image into a plurality of semantic categories, wherein the neural network is trained based on a classification loss and a clustering loss.
Molchanov et al. U.S. Publication No. 2018/0365532 A1; which is directed towards a method and system for training a multi-task neural network to classify an image and localize landmarks in the image, wherein the multi-task neural network is trained based on semi-supervised learning.
Yoo et al. U.S. Publication No. 2016/0148080 A1; which is directed towards a method and apparatus for training a convolutional neural network to perform multiple tasks, wherein the multiple tasks include extracting a plurality of parts of an object from an image and classifying the object in the image.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC RUSH whose telephone number is (571) 270-3017. The examiner can normally be reached 9am - 5pm Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ERIC RUSH/Primary Examiner, Art Unit 2667