Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Non-Final Office action is in reply to the request for continued examination filed 1/11/2021.
Claims 2, 5-9, 12, 13, 15, 16 and 18 were previously cancelled.
Claims 1, 3, 4, 10, 11, 14, 17 and 19-34 are pending.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 1/11/2021 has been entered.

Information Disclosure Statement
The Information Disclosure Statements (IDS) submitted on 1/14/2021 and 3/17/2021 have been considered and initialed by the examiner.


Response to arguments/amendments
Applicant’s arguments regarding the 35 USC 103 rejection and the pending claims have been considered; however, Gausebeck is no longer relied upon to teach the claim limitations, and applicant’s arguments regarding Gausebeck are therefore moot. Applicant also argues the amended claims; the Examiner has modified the rejection and addresses each of applicant’s claims in this Non-Final Office action.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 4, 10, 17, 21, 29 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Leao et al. (WO2019164484A1), in view of Gao et al., US Patent Application Publication No. US2019/0114511A1, and in further view of Watson et al., US Patent Application Publication No. US2019/0354850A1.
With respect to Claims 1 and 21,
Leao discloses,
at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: (¶11: “…computing system 100 includes a processor 101, machine-readable storage medium 102,…”)
obtaining a plurality of images, the plurality of images including a first image of a first room inside a home and a second image of a second room inside of the home; (¶16: “…first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image. For example, the computing system 100 may receive an image or may include a camera to capture an image…”;¶25: “…The image may be any suitable image…”;¶37: “a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time…”)
determining a type of the first room by processing the first image of the first room with a first neural network model, the first neural network model having a first plurality of layers comprising at least a convolutional layer, a pooling layer, a fully connected layer, or a softmax layer, (Figs 2-5, ¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition…the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and confidence level associated with the context The confidence level may be used…”)
determining a type of the second room by processing the second image of the second room with the first neural network model; (¶17: “…The first model 108 may output information about context and confidence level associated with the context The confidence level may be used to select a second model or to determine whether to use the output from the first model 108…”;¶24: “…the first model is a machine-learning model trained on a set of images. The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type. …”;¶32-¶37;¶33: “…Location recognition model 301 the image may be of a room. In one implementation, multiple images are captured to be analyzed…”)
identifying at least one first feature in the first image of the first room by processing the first image with a second neural network model different from the first neural network model and trained using a first plurality of training images of rooms of a same type as the first room (¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”;¶27:“…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”)
identifying at least one second feature in the second image of the second room by processing the second image with a third neural network model different from the first neural network model and second neural network model, (¶17: “…The second or third model selection instructions 104 may include instructions to select at least one of the second and third model based on the determined context. As an example, the second model 109 may be a model to determine information about a home location, and the third model 110 may be a model to determine information about an office location, if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image. The second model 109 and the third model 110 may be convolutional neural network models trained to recognize objects of a particular type such that the second model 109 is related to a first object type and third model 110 is related to a second object type….”;¶18: “…the second model 109 may be a model to determine information about a home location, and the third model 110 may be a model to determine information about an office location, if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image. The second model 109 and the third model 110 may be convolutional neural network models trained to recognize objects of a particular type such that the second model 109 is related to a first object type and third model 110 is related to a second object type…”;¶28:“…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy…”)
at least one first feature and the at least one second feature as input to a machine learning model different from the first neural network model, the second neural network model, and the third neural network model (¶16-¶19;¶19: “…The selected model application to image instructions 105 may include instructions to apply the selected model to the image. For example, if the second model 109 is selected, the processor 101 may apply the second model 109 to the image. The second model 109 may be applied to the entire image or a segment of the image tailored to the second model 109. The models may have any suitable level of hierarchy. For example, the output of the second model 109 may be used to select a fourth or fifth model to apply...”)
the third neural network model trained using a second plurality of training images of rooms of a same type as the second room,(¶19, ¶22, ¶28: “…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy…”;¶33, ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time…”)
determining a value of the home (¶30: “…the processor creates an environmental description representation based on the output of the second model. The processor may create the environmental description representation based on the output of models in addition to the second model, such as models above and below the second model in a hierarchy, in one implementation, the environmental description representation is created with different levels or types of details on the same object…”;¶38: “…, the processor determines environmental context associated with the image based on the application of hierarchical models. For example, the context may include a location type, people or objects present, or an occurrence of an event. The context information may be any suitable information used to provide a layer of context to a request. The context information may be stored to be accessed when a query and/or command is received. For example, the context information may be indexed such that it may be searched when a query and/or command is received…”;¶41)
Leao discloses all of the above limitations. Leao does not distinctly describe the following limitations; however, Gao, as shown, discloses,
the first plurality of training images including training images augmented by one or more transformations (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
the second plurality of training images including training images augmented by one or more transformations, (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
the second neural network model having a second plurality of layers comprising at least first deep neural network layers, a reduction layer, second deep neural network layers, an average pooling layer, a fully connected layer, a dropout layer, or a softmax layer (¶145: “…the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers…the convolutional neural network is a deep network with more layers…”;¶235: “…two separate deep convolutional neural network models…”)
Leao teaches a method/system for applying hierarchical neural network models to an image of an environment to determine a context of the environment. Gao discloses an augmenter for augmenting training images based on one or more transformations via neural network technology. Leao and Gao are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao with the training techniques as taught by Gao since it allows for making extracted features or feature maps robust against noise and distortion.
Leao and Gao disclose all of the above limitations. The combination of Leao and Gao does not distinctly describe the following limitations; however, Watson, as shown, discloses,
the third neural network model having a third plurality of layers comprising at least first deep neural network layers, a reduction layer, second deep neural network layers, an average pooling layer, a fully connected layer, a dropout layer, or a softmax layer, (Fig 1, 2, ¶13: “…FIG. 6 illustrates a diagram of an example, non-limiting graph that can represent a visualization that can be generated by a system to facilitate the selection of one or more pre-trained neural network models for transfer learning that can enhance the performance of one or more machine learning tasks in accordance with one or more embodiments described herein…”; ¶30: “…the term "neural network model" can refer to a computer model that can be used to facilitate one or more machine learning tasks, wherein the computer model can simulate a number of interconnected processing units that can resemble abstract versions of neurons. For example, the processing units can be arranged in a plurality of layers (e.g., one or more input layers, one or more hidden layers, and/or one or more output layers) connected with by varying connection strengths (e.g., which can be commonly referred to within the art as "weights")…neural network models can include, but are not limited deep convolutional network ("DCN"), convolutional neural network ("CNN")…”;¶67: “…a first fully connect layer…”)
the first plurality of layers including at least one million parameters; the second plurality of layers including at least one million parameters; the third plurality of layers including at least one million parameters; (Fig 4A, 4B, ¶68-72; ¶68: “…the system 100 was utilized to analyze vision-based neural network models and/or source data sets, such as the database ImageNet22k, which contains 14 million images spread over 1481 categories … Each of these data sets was further split into 4 parts: a first part was used to train the source model, a second part was used for validating the source model, a third part as used to create a transfer learning target workload, and a fourth part was used for validating the transfer learning training. For example, the person hierarchy has greater than 1 million images …”)
Application No.: 16/739,286; Docket No.: N0629.70000US01; Reply to Office Action of March 26, 2020
Watson teaches a method/system for identifying and using pre-trained neural network models associated with a source data set to perform a target machine learning task. Watson further teaches that as a neural network model trains (e.g., utilizes more training data), the computer model can become increasingly accurate; thus, trained neural network models can accurately analyze data with unknown outcomes, based on lessons learned from training data, to facilitate one or more machine learning tasks (¶30). Examiner contends that it is old and well known that deep convolutional neural networks are among the most widely used machine learning algorithms in computer vision. A person of ordinary skill in the art would have been motivated to apply the vision-based neural network models of Watson to achieve the claimed invention (first, second and third plurality of layers each including at least one million parameters), and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). Moreover, the claimed invention would have been obvious since the vision-based neural network models of Watson could have prompted one of ordinary skill in the art to vary the prior art in a predictable manner to result in the claimed invention (each of the first, second and third plurality of layers including at least one million parameters).
Leao, Gao and Watson are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao with the training techniques of Gao and the method/system for selecting pre-trained neural network models as taught by Watson since it allows for enhancing the performance/accuracy of one or more target machine learning tasks (¶68-¶74).

With respect to Claims 3 and 29, 
Leao, Gao and Watson disclose all of the above limitations; Gao further discloses,
wherein the first neural network comprises two neural network sub-models including a first sub-model having an average pooling layer and a second sub-model having a max pooling layer instead of the average pooling layer (¶133: “…sub-sampling layers employ two types of pooling operations, average pooling and max pooling. The pooling operations divide the input into non-overlapping two-dimensional spaces. For average pooling, the average of the four values in the region is calculated. For max pooling, the maximum value of the four values is selected….”)
Leao, Gao and Watson are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao with the method/system for selecting pre-trained neural network models of Watson and the sub-sampling layer techniques as taught by Gao since it allows for making the extracted features or feature maps robust against noise and distortion.
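The two pooling operations described in the passage quoted from Gao ¶133 can be illustrated with a short sketch. It is only an illustration of average pooling and max pooling over non-overlapping 2x2 regions; the function names and the sample feature map are hypothetical and do not come from Gao, Leao, Watson, or the application.

```python
def pool2x2(grid, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 region of a 2-D list."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            # The four values in the current 2x2 region.
            region = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            pooled_row.append(reduce_fn(region))
        pooled.append(pooled_row)
    return pooled


def average(values):
    """Arithmetic mean of the values in one region."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map, used only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [8, 2, 1, 1],
    [0, 6, 3, 3],
]

avg_pooled = pool2x2(feature_map, average)  # average of the four values
max_pooled = pool2x2(feature_map, max)      # maximum of the four values
```

Both calls reduce the 4x4 input to a 2x2 output and differ only in the reduction applied to each region, which is the distinction the claim language draws between the two sub-models.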

With respect to claims 4 and 30,
Leao, Gao and Watson disclose all of the above limitations; Leao further discloses,
wherein processing the at least one image of the first room with the first neural network model comprises: processing the at least one image using the first sub-model to obtain first results; processing the at least one image using the second sub-model to obtain second results; and combining the first and second results to obtain an output result for the first neural network model (¶14: “…The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy…”;¶23: “…subsequent model may be selected in a hierarchical manner based on the output from a previously applied model. The models may be machine learning models that receive an image as input and output information about an environmental description associated with the image…”; Fig 4, ¶35: “…The model may be a machine learning model used to classify an image, in one implementation, the model classifies different areas of the image that are then segmented for further analysis. For example, segmented image 401 may include multiple segments associated with different object types. Segmented image 401 may include segment 1 image 402 and segment 2 image 403. A different sub-model may be applied to each segment. For example, a first sub-model may be applied to segment 1 image 402, and a second sub-model may be applied to segment 2 image 403. The output from the first sub-model and the second sub-model may be used to form context information 404… additional information from the first model is also used to determine the environmental context. Context information 404 may be stored and used to parse queries received in a location where the input image 400 was received…”)


With respect to claims 10 and 17,
Leao, Gao and Watson disclose all of the above limitations; Leao further discloses,
wherein the first space is a kitchen, and wherein identifying the at least one feature comprises identifying a type of material of a countertop in the kitchen and/or identifying a finish of an appliance in the kitchen (¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”; ¶37: “…. The image may be associated with a location of a device for receiving a query and/or command or may be in a separate location associated with the query and/or command request, such as where a user in a living room requests information about an item in the kitchen...”) A person of ordinary skill in the art would have been motivated to apply the hierarchical modeling techniques for determining information/attributes of objects in an image as taught by Leao to achieve the claimed invention (identifying a type of material of a countertop in the kitchen and/or identifying a finish of an appliance), and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006).

Claims 11, 14 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Leao, Gao and Watson, in further view of Bilandi et al., US Patent Application Publication No. US2019/0012768A1.
With respect to Claim 11,
Leao, Gao and Watson disclose all of the above limitations. The combination of Leao, Gao and Watson does not distinctly describe the following limitations; however, Bilandi, as shown, discloses,
wherein the processor-executable instructions further cause the at least one computer hardware processor to perform: processing multiple images using the first neural network model to identify images for which the first neural network model output differs from labels produced by manual classification; obtain new labels for at least some of the multiple images; and update one or more parameters of the first neural network model by using the at least some of the multiple images with the new labels. (¶63: “…The training images are generally real images of fragmented materials that have been evaluated to identify fragmented material portions. In some embodiments, the evaluation is manual in that an operator will evaluate an image and label fragmented material portions within the image. The image is then saved along with information identifying each pixel as being at an edge of a fragmented material portion, inward from the edge on the fragmented material portion, or between fragmented material portions. A plurality of training images may be assembled to make up a training set…”)
Bilandi discloses a method/system for the identification of fragmented material portions within an image. Leao, Gao, Watson and Bilandi are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao, the method/system for selecting pre-trained neural network models of Watson and the sub-sampling layer techniques of Gao with the method/system for processing an image as taught by Bilandi since it allows for manually evaluating and labeling images for validation purposes and for optimally configuring the network (Fig 3, ¶61-¶64).

With respect to Claim 14,
Leao, Gao and Watson disclose all of the above limitations. The combination of Leao, Gao and Watson does not distinctly describe the following limitations; however, Bilandi, as shown, discloses,
wherein the second neural network model uses a bank of convolution kernels having different resolutions (Fig 16, ¶107: “…A convolution kernel 1604 having 16 kernels is used to produce a first convolution 1606 of having 254.times.254.times.16 neurons (i.e. 16 channels of 254.times.254 pixels)… a second convolution is performed using a kernel 1610 following the pooling layer 1608, resulting in 125.times.125.times.32 neurons in the convolution layer 1612…”)
Bilandi discloses a method/system for the identification of fragmented material portions within an image. Leao, Gao, Watson and Bilandi are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao, the method/system for selecting pre-trained neural network models of Watson and the sub-sampling layer techniques of Gao with the method/system for processing an image as taught by Bilandi since it allows for efficiently detecting edge features in input pixel data for training neural networks (¶105-¶107).
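The layer sizes quoted from Bilandi ¶107 (254 x 254 after the first convolution, 125 x 125 after the second) are consistent with the standard output-size arithmetic for unpadded convolution and pooling. The sketch below is a check of that arithmetic only. The 256 x 256 input, the 3 x 3 kernels, and the 2 x 2 pooling window are assumptions chosen because they reproduce the quoted figures; they are not stated in the quoted passage, and the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window, stride=None):
    """Spatial size after pooling; stride defaults to the window size."""
    stride = stride or window
    return (size - window) // stride + 1


# Assumed values (not stated in the quoted text): 256-pixel input,
# 3x3 kernels with stride 1 and no padding, 2x2 pooling window.
input_size = 256
first_conv = conv_output_size(input_size, kernel=3)   # first convolution
pooled = pool_output_size(first_conv, window=2)       # pooling layer
second_conv = conv_output_size(pooled, kernel=3)      # second convolution
```

Under these assumptions the spatial resolution falls from 254 to 127 to 125 across the three layers, so each successive kernel bank operates on a feature map of a different size.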

With respect to Claim 28,
Leao, Gao and Watson disclose all of the above limitations; Leao further discloses,
wherein the first image of the first room has a first resolution, and wherein processing the first image of the first room with the first neural network model comprises: generating, from the first image, a second image of the first room; and processing the second image of the first room with the first neural network model (¶8: “…determine environmental context based on an image of the environment. For example, the image may be captured by a camera associated with the electronic assistant or associated with the user's environment, such as a camera associated with a room in which the electronic assistant is located. The query and/or command may be parsed based on cascading hierarchical models. For example, the processor may apply a first model and select a second model based on the output of the first model…”;¶16: “…The first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image. For example, the computing system 100 may receive an image or may include a camera to capture an image. In one implementation, the computing system 100 is associated with an electronic assistant and the electronic assistant captures an image of its environment. The image may be captured when a communication, such as a query and/or command, is initiated, when the location of a camera or other device is established, or at regular intervals. For example, the environment of an electronic assistant may change because the electronic assistant is moved to a different room or because objects in the same room change over time. ….”;¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition. For example, the first model 108 may be trained on a set of input images associated with different context types…”;¶27: “…the processor applies the selected second model to the image…The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”)
Leao, Gao and Watson disclose all of the above limitations, the combination of Leao, Gao and Watson does not distinctly describe the following limitations but Bilandi however as shown discloses,
wherein the first image of the first room has a first resolution, a second image of the first room having a second resolution lower than the first resolution (Fig 16, ¶107: “……A convolution kernel 1604 having 16 kernels is used to produce a first convolution 1606 of having 254.times.254.times.16 neurons (i.e. 16 channels of 254.times.254 pixels)… a second convolution is performed using a kernel 1610 following the pooling layer 1608, resulting in 125.times.125.times.32 neurons in the convolution layer 1612…”;¶99: “…Once the first convolutional neural network 1204 has been adequately trained, the labeled images are processed through the first network to produce sets of labeled training outputs at 1212. The labeled training outputs at 1212 are then used in a second training exercise to train the second convolutional neural network 1206 to produce desired outputs 1224…”;¶100;¶110: “…The image sensor 102 and processor circuit 200, when configured to implement the neural network 600 may be used to perform a fragmentation analysis of materials for many different purposes… operable to produce results for a fragmentation analysis of submitted image data… capture an image of fragmented material being conveyed by a ground engaging tool of heavy equipment… capture images of fragmented material in the bucket when the material is in view of the image sensor)
Bilandi discloses a method/system for the identification of fragmented material portions within an image. Leao, Gao, Watson and Bilandi are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the method/system for selecting pre-trained neural network models of Watson and the subsampling layer techniques of Gao with the method/system for processing an image as taught by Bilandi since it allows for an enhanced analysis of images and a more efficient input for training the neural networks (¶104-¶110).
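
For illustration only, the layer sizes quoted from Bilandi ¶107 are consistent with a simple output-size calculation. The sketch below assumes a 256×256 input, 3×3 kernels with stride 1 and no padding, and a 2×2 pooling window. None of these values appear in the quoted passage; they are hypothetical and chosen only because they reproduce the quoted 254×254 and 125×125 sizes.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Spatial output size of non-overlapping pooling along one dimension."""
    return size // window


def trace_sizes(input_size=256, kernel=3, pool=2):
    """Follow one spatial dimension through conv -> pool -> conv.

    input_size, kernel and pool are assumed values, not taken from Bilandi.
    """
    first_conv = conv_out(input_size, kernel)   # after the first convolution
    pooled = pool_out(first_conv, pool)         # after the pooling layer
    second_conv = conv_out(pooled, kernel)      # after the second convolution
    return first_conv, pooled, second_conv


if __name__ == "__main__":
    print(trace_sizes())
```

Under these assumptions each stage yields a smaller spatial size than the stage before it, which is the sense in which the cited passage is read on a second image of lower resolution than the first.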

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Leao, Gao, Watson in further view of Hellman et al., US Patent Application Publication No US2019/0259293A1.

With respect to Claim 19, 
Leao, Gao and Watson disclose all of the above limitations. The combination of Leao, Gao and Watson does not distinctly describe the following limitations; however, Hellman, as shown, discloses,
wherein the machine learning model is a random forest model (¶301: “…exemplary models can include, for example, a logistic regression model, a random forest model, a decision tree model, a probabilistic model, deep learning model, a neural network, a Bayesian network, or the like. In some embodiments, for example, a random forest model, and/or a logistic regression model may be the easy to fully train, whereas, a deep learning model, a neural network, and/or Bayesian network may better address issues of high complexity but may also be more difficult to train…”)
Hellman discloses a system for customizing an evaluation model to an evaluation style. Leao, Gao, Watson and Hellman are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the method/system for selecting pre-trained neural network models of Watson and the subsampling layer techniques of Gao with the method/system for interface-based machine learning model output customization as taught by Hellman since it allows for selecting/determining model types for training data (Abstract, ¶301).

Claims 22, 23 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Leao, Gao, Watson in further view of Ho et al., US Patent Application Publication No US20190236440A1.
With respect to Claims 22 and 34,
Leao, Gao and Watson disclose all of the above limitations. The combination of Leao, Gao and Watson does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the second plurality of layers comprises first deep neural network layers, a reduction layer, second deep neural network layers, a fully connected layer, a dropout layer, and a softmax layer (¶33: “…CNN usually consists of several cascaded convolutional layers, comprising fully-connected artificial neurons. In some cases, it can also include pooling layers (average pooling or max pooling). In some cases, it can also include activation layers. In some cases, a final layer can be a softmax layer for classification and/or detection tasks. The convolutional layers are generally utilized to learn the spatial local-connectivity of input data for feature extraction. The pooling layer is generally for reduction of receptive field and hence to protect against overfitting. Activations, for example nonlinear activations, are generally used for boosting of learned features. Various variants to the standard CNN architecture can use deeper (more layers) and wider (larger layer size) architectures. To avoid overfitting for deep neural networks, some regularization methods can be used, such as dropout or dropconnect; which turn off neurons learned with a certain probability in training and prevent the co-adaptation of neurons during the training phase…”)
Ho teaches a method/system for building a deep convolutional neural network architecture. Leao, Gao, Watson and Ho are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the subsampling layer techniques of Gao and the method/system for selecting pre-trained neural network models of Watson with the artificial convolutional neural network technique as taught by Ho since it allows for an improved performance in achieving object recognition (¶82).

With respect to Claim 23, 
Leao, Gao and Watson disclose all of the above limitations. The combination of Leao, Gao and Watson does not distinctly describe the following limitations; however, Ho, as shown, discloses,
processing the first image with the first deep neural network layers to obtain first results; providing the first results as input to the reduction layer to obtain second results; providing the second results as input to the second deep neural network layers to obtain third results; providing the third results as input to the average pooling layer to obtain fourth results; providing the fourth results as input to the fully connected layer to obtain fifth results; providing the fifth results as input to the dropout layer to obtain sixth results; and providing the sixth results as input to the softmax layer to obtain an output result for the second neural network model. (Figs 3, 4A, 4B; ¶42: “…the CNN module 122 is able to build and use an embodiment of a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer. FIG. 4A illustrates an embodiment of the GC-Net CNN architecture where inputs (X) 402 are fed into plurality of pooled convolutional layers connected sequentially. Each pooled convolutional layer includes a hidden block and a pooling layer…In addition to this cascading structure, this embodiment of the GC-Net CNN architecture also includes connecting the output of each hidden block 404 to a respective global average pooling (GAP) layer, which, for example, takes an average of each feature map from the last convolutional layer. Each GAP layer is then fed to the final hidden block 408. A softmax classifier 412 can then be used, the output of which can form the output (Y) 414 of the CNN…”; ¶44: “…As shown in FIG. 4A, the GC-Net architecture consists of n blocks 404 in total, a fully-connected final hidden layer 408 and a softmax classifier 412. each block 404 can have several convolutional layers, each followed by normalization layers and activation layers. The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Ho teaches a method/system for building a deep convolutional neural network architecture. Leao, Gao, Watson and Ho are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the subsampling layer techniques of Gao and the method/system for selecting pre-trained neural network models of Watson with the artificial convolutional neural network technique as taught by Ho since it allows for an improved performance in achieving object recognition (¶44, ¶45, ¶82).
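
For illustration only, the ordering of operations recited in Claim 23 can be sketched as a chain of functions, each consuming the previous result. This is a minimal sketch and not the architecture of Ho or of the application: every function name, the toy one-dimensional input, and the weight values are hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
import random


def deep_layers(x, scale):
    # Hypothetical stand-in for a stack of deep neural network layers:
    # a scaling followed by a ReLU nonlinearity.
    return [max(0.0, scale * v) for v in x]


def reduction(x):
    # Hypothetical reduction layer: keep the larger of each adjacent pair,
    # halving the number of values.
    return [max(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]


def average_pool(x, window=2):
    # Average over non-overlapping windows.
    return [sum(x[i:i + window]) / window
            for i in range(0, len(x) - window + 1, window)]


def fully_connected(x, weights):
    # One output per weight row: dot product of the row with the input.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def dropout(x, rate=0.5, training=False):
    # Identity at inference; during training, zero values at random and
    # rescale the survivors (inverted dropout).
    if not training:
        return list(x)
    return [0.0 if random.random() < rate else v / (1.0 - rate) for v in x]


def softmax(x):
    # Numerically stable softmax.
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def second_model_forward(features, weights):
    first = deep_layers(features, 1.0)        # first results
    second = reduction(first)                 # second results
    third = deep_layers(second, 0.5)          # third results
    fourth = average_pool(third)              # fourth results
    fifth = fully_connected(fourth, weights)  # fifth results
    sixth = dropout(fifth)                    # sixth results
    return softmax(sixth)                     # output result


if __name__ == "__main__":
    toy_features = [1.0, -2.0, 3.0, 4.0, 0.0, 5.0, -1.0, 2.0]
    toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(second_model_forward(toy_features, toy_weights))
```

The sketch shows only the sequence of stages named in the claim; it makes no assertion about which stages of the cited GC-Net correspond to which claimed layer.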

Claims 20, 27 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Leao, Bilandi, in further view of Watson.
With respect to claim 20,
Leao discloses,
at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform:(¶11: “…computing system 100 includes a processor 101, 
obtaining a plurality of images, the plurality of images including a first image of a first room inside a home; (¶16: “…first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image. For example, the computing system 100 may receive an image or may include a camera to capture an image…”; ¶24: “…the first model is a machine-learning model trained on a set of images. The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶32-¶37; ¶33: “…Location recognition model 301 is the first model in the hierarchy…”; ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed…”)
determining a type of the first room by: generating, from the first image, a second image (¶24: “…The output of the first model may be related to a 
determining a value of the home at least in part by using the at least one first feature as input to a machine learning model different from the first neural network model and the second neural network model. (¶16-¶19; ¶19:“…The selected model application to image instructions 105 may include instructions to apply the selected model to the image. For example, if the second model 109 is selected, the processor 101 may apply the second model 109 to the image. The second model 109 may be applied to the entire image or a segment of the image tailored to the second model 109. The models may have any suitable level of hierarchy. For example, the output of the second model 109 may be used to select a fourth or fifth model to apply...”;¶30:“…the processor creates an environmental description representation based on the output of the second model. The processor may create the environmental description representation based on the output of models in addition to the second model, such as models above and below the second model in a hierarchy, in one implementation, the environmental description representation is created with different levels or types of details on the same object…”;¶38: “…, the processor determines environmental context associated with the image based on the application 
identifying at least one first feature in the first image of the first room by processing the first image with a second neural network model different from the first neural network model and trained using images of rooms of a same type as the first room, (¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”; ¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”)
Leao discloses all of the above limitations. Leao does not distinctly describe the following limitations; however, Bilandi, as shown, discloses,
the first image having a first resolution, a second image of the first room having a second resolution lower than the first resolution (Fig 16, ¶107: “…A convolution kernel 1604 having 16 kernels is used to produce a first convolution 1606 of having 254×254×16 neurons (i.e. 16 channels of 254×254 pixels)… a second convolution is performed using a kernel 1610 following the pooling layer 1608, resulting in 125×125×32 neurons in the convolution layer 1612…”)
Bilandi discloses a method/system for the identification of fragmented material portions within an image. Leao and Bilandi are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao with the method/system for processing an image as taught by Bilandi since it allows for efficiently detecting edge features in input pixel data when training neural networks on images (¶105-¶107).
Leao and Bilandi disclose all of the above limitations. The combination of Leao and Bilandi does not distinctly describe the following limitations; however, Watson, as shown, discloses,
processing the second image of the first  room with a first neural network model comprising: a first neural network sub-model comprising a first plurality of layers comprising at least one million parameters, (Fig 4A, 4B, ¶68-72; ¶68: “…the system 100 was utilized to analyze vision-based neural network models and/or source data sets, such as the database ImageNet22k, which contains 14 million images spread over 1481 categories … Each of these data sets was further split into 4 parts: a first part was used to train the source model, a second part was used for validating the source model, a third part as used to create a transfer learning target workload, and a fourth part was used for validating the transfer learning training. For example, the person hierarchy has greater than 1 million images …”)
the first plurality of layers comprising at least deep neural network layers, an average pooling layer, a fully connected layer, or a softmax layer; (Fig 1, 2, ¶13: “…FIG. 6 illustrates a diagram of an example, non-limiting graph that can represent a visualization that can be generated by a system to facilitate the selection of one or more pre-trained neural network models for transfer learning that can enhance the performance of one or more machine learning tasks in accordance with one or more embodiments described herein…”; ¶30: “…the term "neural network model" can refer to a computer model that can be used to facilitate one or more machine learning tasks, wherein the computer model can simulate a number of interconnected processing units that can resemble abstract versions of neurons. For example, the processing units can be arranged in a plurality of layers (e.g., one or more input layers, one or more hidden layers, and/or one or more output layers) connected with by varying connection strengths (e.g., which can be commonly referred to within the art as "weights").… neural network models can include, but are not limited to:… deep convolutional network ("DCN"), convolutional neural network ("CNN")…”;¶67: “…a first fully connect layer…”)
a second neural network sub-model comprising a second plurality of layers comprising at least one million parameters, (Fig 4A, 4B, ¶68-72; ¶68: “…the system 100 was utilized to analyze vision-based neural network models and/or source data sets, such as the database ImageNet22k, which contains 14 million images spread over 1481 categories … Each of these data sets was further split into 4 parts: a first part was used to train the source model, a second part was used for validating the source model, a third part as used to create a transfer learning target workload, and a fourth part was used for validating the transfer learning training. For example, the person hierarchy has greater than 1 million images …”)
the second plurality of layers comprising at least deep neural network layers, a max pooling layer, a fully connected layer, or a softmax layer; (Fig 1, 2, ¶13: “…FIG. 6 illustrates a diagram of an example, non-limiting graph that can represent a visualization that can be generated by a system to facilitate the selection of one or more pre-trained neural network models for transfer learning that can enhance the performance of one or more machine learning tasks in accordance with one or more embodiments described herein…”; ¶30: “…the term "neural network model" can refer to a computer model that can be used to facilitate one or more machine learning tasks, wherein the computer model can simulate a number of interconnected processing units that can resemble abstract versions of neurons. For example, the processing units can be arranged in a plurality of layers (e.g., one or more input layers, one or more hidden layers, and/or one or more output layers) connected with by varying connection strengths (e.g., which can be commonly referred to within the art as "weights") … neural network models can include, but are not limited to:… deep convolutional network ("DCN"), convolutional neural network ("CNN")…”;¶67: “…a first fully connect layer …simple max-pool…”)
the second neural network model further having a third plurality of layers comprising at least a convolutional layer, a pooling layer, a fully connected layer, or a softmax layer, (Fig 1, 2, ¶13: “…FIG. 6 illustrates a diagram of an example, non-limiting graph that can represent a visualization that can be generated by a system to facilitate the selection of one or more pre-trained neural network models for transfer learning that can enhance the performance of one or more machine learning tasks in accordance with one or more embodiments described herein…”; ¶30: “…the term "neural network model" can refer to a computer model that can be used to facilitate one or more machine learning tasks, wherein the computer model can simulate a number of interconnected processing units that can resemble abstract versions of neurons. For example, the processing units can be arranged in a plurality of layers (e.g., one or more input layers, one or more hidden layers, and/or one or more output layers) connected with by varying connection strengths (e.g., which can be commonly referred to within the art as "weights"). … neural network models can include, but are not limited to:… deep convolutional network ("DCN"), convolutional neural network ("CNN")…”; ¶67: “…a first fully connect layer …simple max-pool…”)
the third plurality of layers including at least one million parameters; (Fig 4A, 4B, ¶68-72; ¶68: “…the system 100 was utilized to analyze vision-based neural network models and/or source data sets, such as the database ImageNet22k, which contains 14 million images spread over 1481 categories … Each of these data sets was further split into 4 parts: a first part was used to train the source model, a second part was used for validating the source model, a third part as used to create a transfer learning target workload, and a fourth part was used for validating the transfer learning training. For example, the person hierarchy has greater than 1 million images …”; ¶69: “…training of the source and target models was performed on caffee using a rEsNet27 neural network model…”)
Watson teaches a method/system for identifying and using pre-trained neural network models associated with a source data set to perform a target machine learning task. Watson further teaches that as a neural network model trains (e.g., utilizes more training data), the computer model can become increasingly accurate; thus, trained neural network models can accurately analyze data with each of the first, second and third pluralities of layers including one million parameters, and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). Moreover, the claimed invention would have been obvious since the vision-based neural network models of Watson could have prompted one of ordinary skill in the art to vary the prior art in a predictable manner to result in the claimed invention (each of the first, second and third pluralities of layers including one million parameters).
Leao, Bilandi and Watson are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao with the method/system for processing an image of Bilandi and the method/system for selecting pre-trained neural network models as taught by Watson since it allows for the identification and/or selection of one 

With respect to Claim 27,
Leao, Bilandi and Watson disclose all of the above limitations. Bilandi further discloses,
wherein the first resolution is 600x600 pixels and the second resolution is 300x300 pixels (Fig 16, ¶107: “…A convolution kernel 1604 having 16 kernels is used to produce a first convolution 1606 of having 254×254×16 neurons (i.e. 16 channels of 254×254 pixels)… a second convolution is performed using a kernel 1610 following the pooling layer 1608, resulting in 125×125×32 neurons in the convolution layer 1612…”; ¶99: “…Once the first convolutional neural network 1204 has been adequately trained, the labeled images are processed through the first network to produce sets of labeled training outputs at 1212. The labeled training outputs at 1212 are then used in a second training exercise to train the second convolutional neural network 1206 to produce desired outputs 1224…”; ¶100; ¶110: “…The image sensor 102 and processor circuit 200, when configured to implement the neural network 600 may be used to perform a fragmentation analysis of materials for many different purposes… operable to produce results for a fragmentation analysis of submitted image data… capture an image of fragmented material being conveyed by a ground engaging tool of heavy equipment… capture images of fragmented material in the bucket when the material is in view of the image sensor…”)
Bilandi teaches that a convolutional neural network may involve processing the pixel data using a first convolutional neural network and using the pixel classification output as an input for a second convolutional neural network operable to produce a refined pixel classification output. A person of ordinary skill in the art would have been motivated to combine the neural network techniques of Bilandi to achieve the claimed invention (wherein the first resolution is 600x600 pixels and the second resolution is 300x300 pixels) and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). Moreover, the claimed invention would have been obvious since the convolutional neural network of Bilandi could have prompted one of ordinary skill in the art to vary the prior art in a predictable manner to result in the claimed invention (wherein the first resolution is 600x600 pixels and the second resolution is 300x300 pixels).
Leao, Gao, Watson and Bilandi are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the method/system for selecting pre-trained neural network models of Watson and the subsampling layer techniques of Gao with the method/system for processing an image as taught by Bilandi since it allows 

With respect to Claim 33,
Leao, Bilandi and Watson disclose all of the above limitations. Watson further discloses,
wherein the first plurality of training images and the second plurality of training images each comprise at least 10,000 training images (Fig 4A, 4B, ¶68-72; ¶68: “…the system 100 was utilized to analyze vision-based neural network models and/or source data sets, such as the database ImageNet22k, which contains 14 million images spread over 1481 categories … Each of these data sets was further split into 4 parts: a first part was used to train the source model, a second part was used for validating the source model, a third part as used to create a transfer learning target workload, and a fourth part was used for validating the transfer learning training. For example, the person hierarchy has greater than 1 million images …”)
Leao, Bilandi and Watson are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao with the method/system for processing an image of Bilandi and the method/system for selecting pre-trained neural network models .

Claims 24-26 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Leao, Bilandi, Watson in further view of Ho.
With respect to Claim 24,
Leao, Bilandi and Watson disclose all of the above limitations. The combination of Leao, Bilandi and Watson does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the first plurality of layers of the first neural network sub-model comprises deep neural network layers, an average pooling layer, a fully connected layer, and a softmax layer; and the second plurality of layers of the second neural network sub-model comprises deep neural network layers, a max pooling layer, a fully connected layer, and a softmax layer. (¶42: “…a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer…”; ¶44: “…The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Ho teaches a method/system for building a deep convolutional neural network architecture. Leao, Bilandi, Watson and Ho are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural networking models of Leao, the method/system for processing an image of Bilandi and the method/system for selecting pre-trained neural network models of Watson with the artificial convolutional neural network technique as taught by Ho since it allows for an improved performance in achieving object recognition (¶44, ¶45, ¶82).

With respect to Claim 25,
Leao, Bilandi, Watson and Ho disclose all of the above limitations. Leao further discloses,
wherein processing the second image of the first room with the first neural network model comprises: processing the second image using the first neural network sub-model to obtain first output results; (¶14: “…The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy…”; ¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition. For example, the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and confidence level associated with the context…”; ¶18: “…the second model 109 may be a model to determine information about a home location… if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image…”; ¶24: “…a processor applies a first model to an image of an environment to select second model… The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶26)
processing the second image using the second neural network sub-model to obtain second output results; and combining the first output results and second output results to obtain an output result for the first neural network model (Fig 2, ¶23: “…subsequent model may be selected in a hierarchical manner based on the output from a previously applied model. The models may be machine learning models that receive an image as input and output information about an environmental description associated with the image…”; ¶27: “…the processor applies the selected second model to the image… The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”)

With respect to Claim 26, 
Leao, Bilandi, Watson and Ho disclose all of the above limitations. Ho further discloses,
processing the second image with the deep neural network layers to obtain first results; providing the first results as input to the average pooling layer to obtain second results; providing the second results as input to the fully connected layer to obtain third results; and providing the third results as input to the softmax layer to obtain the first output results. (Figs 3, 4A, 4B; ¶42: “…the CNN module 122 is able to build and use an embodiment of a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer. FIG. 4A illustrates an embodiment of the GC-Net CNN architecture where inputs (X) 402 are fed into plurality of pooled convolutional layers connected sequentially. Each pooled convolutional layer includes a hidden block and a pooling layer…In addition to this cascading structure, this embodiment of the GC-Net CNN architecture also includes connecting the output of each hidden block 404 to a respective global average pooling (GAP) layer, which, for example, takes an average of each feature map from the last convolutional layer. Each GAP layer is then fed to the final hidden block 408. A softmax classifier 412 can then be used, the output of which can form the output (Y) 414 of the CNN…”; ¶44: “…As shown in FIG. 4A, the GC-Net architecture consists of n blocks 404 in total, a fully-connected final hidden layer 408 and a softmax classifier 412. each block 404 can have several convolutional layers, each followed by normalization layers and activation layers. The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Ho teaches a method/system for building a deep convolutional neural network architecture. Leao, Bilandi, Watson and Ho are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the 

With respect to Claim 32,
Leao, Bilandi and Watson disclose all of the above limitations. The combination of Leao, Bilandi and Watson does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the second plurality of layers comprises first deep neural network layers, a reduction layer, second deep neural network layers, a fully connected layer, a dropout layer, and a softmax layer (¶33: “…CNN usually consists of several cascaded convolutional layers, comprising fully-connected artificial neurons. In some cases, it can also include pooling layers (average pooling or max pooling). In some cases, it can also include activation layers. In some cases, a final layer can be a softmax layer for classification and/or detection tasks. The convolutional layers are generally utilized to learn the spatial local-connectivity of input data for feature extraction. The pooling layer is generally for reduction of receptive field and hence to protect against overfitting. Activations, for example nonlinear activations, are generally used for boosting of learned features. Various variants to the standard CNN architecture can use deeper (more layers) and dropout or dropconnect; which turn off neurons learned with a certain probability in training and prevent the co-adaptation of neurons during the training phase.”)
Ho teaches a method/system for building a deep convolutional neural network architecture. Leao, Bilandi, Watson and Ho are directed to the same field of endeavor since they are related to processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical neural network models of Leao, the method/system for processing an image of Bilandi and the method/system for selecting pre-trained neural network models of Watson with the artificial convolutional neural network technique as taught by Ho, since it allows for improved performance in achieving object recognition (¶44, ¶45, ¶82).
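
For illustration only, the layer ordering recited in Claim 32 (first deep neural network layers, a reduction layer, second deep neural network layers, a fully connected layer, a dropout layer, and a softmax layer) can be sketched as a single forward pass. The sketch below is not taken from Ho or from the application; all function names, the ReLU stand-in for the deep layers, the 2x2 average pooling used as the reduction layer, and the weights are hypothetical assumptions chosen to keep the example small.

```python
import math
import random


def relu_layer(fmap):
    """Hypothetical stand-in for a block of deep layers: elementwise ReLU."""
    return [[max(0.0, v) for v in row] for row in fmap]


def avg_pool_2x2(fmap):
    """Reduction layer: non-overlapping 2x2 average pooling (even dims assumed)."""
    return [
        [(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def fully_connected(x, weights, bias):
    """Dense layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` during training only."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, weights, bias, rate=0.5, training=False, rng=None):
    """Apply the layers in the recited order and return class probabilities."""
    rng = rng or random.Random(0)
    h = relu_layer(image)                      # first deep neural network layers (stub)
    h = avg_pool_2x2(h)                        # reduction layer
    h = relu_layer(h)                          # second deep neural network layers (stub)
    flat = [v for row in h for v in row]       # flatten for the dense layer
    z = fully_connected(flat, weights, bias)   # fully connected layer
    z = dropout(z, rate, training, rng)        # dropout layer (identity at inference)
    return softmax(z)                          # softmax layer


if __name__ == "__main__":
    image = [[1.0, -1.0, 2.0, 0.0],
             [0.5, 0.5, -2.0, 1.0],
             [3.0, 1.0, 0.0, 0.0],
             [-1.0, 0.0, 1.0, 1.0]]
    weights = [[0.1, 0.2, 0.3, 0.4],
               [0.0, 0.0, 0.0, 0.0],
               [-0.1, -0.2, -0.3, -0.4]]
    print(forward(image, weights, [0.0, 0.0, 0.0]))
```

The dropout step follows the fully connected layer only because that is the order in which the claim lists the layers; in this sketch it is active during training and passes values through unchanged at inference.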

Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Leao, Bilandi, Watson in further view of Gao.
With respect to Claim 31,
Leao, Bilandi and Watson disclose all of the above limitations; Leao further discloses,
wherein: the second neural network model was trained using a plurality of training images of rooms of a same type as the first room(¶17: “…the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and 
Leao, Bilandi and Watson disclose all of the above limitations. The combination of Leao, Bilandi and Watson does not distinctly describe the following limitation; however, Gao, as shown, discloses,
the plurality of training images including training images augmented by one or more transformations (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
Leao teaches a method/system for applying hierarchical neural network models to an image of an environment to determine a context of the environment. Gao discloses an augmenter for augmenting training images based on one or more transformations via neural network technology. Leao, Bilandi, Watson and Gao 

Conclusion
References cited but not used:
Li, Yawei et al. “Learning Filter Basis for Convolutional Neural Network Compression.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 5622-5631; relating to convolutional neural network based solutions for computer vision tasks including classification and super-resolution of images.
Krizhevsky, A. et al. “ImageNet classification with deep convolutional neural networks.” Communications of the ACM 60 (2012): 84 – 90; relating to training deep convolutional neural networks to classify high-resolution images.
Hou et al., US Patent Application Publication No US20090324010A1, “Neural network-controlled automatic tracking and recognizing system and 
Desai et al., US Patent Application Publication No US 2018/0114334A1, “Edge-based adaptive machine learning for object recognition”, relating to adaptive object recognition for a target visual domain given a generic machine learning model.

Any inquiry of a general nature or relating to the status of this application or concerning this communication or earlier communications from the Examiner should be directed to Kimberly L. Evans whose telephone number is 571.270.3929.  The Examiner can normally be reached on Monday-Friday, 9:30am-5:00pm.  If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s supervisor, Lynda Jasmin can be reached at 571.272.6782.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal/pair <http://pair-direct.uspto.gov>. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866.217.9197 (toll-free). Any response to this action should be mailed to: Commissioner of Patents and Trademarks, P.O. Box 1450, Alexandria, VA 22313-1450 or faxed to 571-273-8300. Hand-delivered responses should be brought to the United States Patent and Trademark Office Customer Service Window: Randolph Building, 401 Dulany Street, Alexandria, VA 22314.

/KIMBERLY L EVANS/Examiner, Art Unit 3629
/SANGEETA BAHL/Primary Examiner, Art Unit 3629