Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Non-Final Office action is in reply to the request for continued examination filed 6/29/2022.
Claims 1, 3, 4, 11, 17, 21, 29 and 30 are amended.
Claims 2, 5-9, 12, 13, 15, 16 and 18 were previously cancelled.
Claim 28 is cancelled.
Claims 1, 3, 4, 10, 11, 14, 17, 19-27 and 29-34 are pending.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 6/29/2022 has been entered.

Information Disclosure Statement
The Information Disclosure Statements (IDS) submitted on 7/1/2022 and 9/22/2022 have been considered and initialed by the examiner.

Response to arguments/amendments
With respect to the 35 U.S.C. 103 rejection, applicant’s arguments have been reconsidered. In light of applicant’s amendments and arguments, the Bilandi reference has been removed. The Examiner has modified the rejection and addressed each of applicant’s claims in this Non-Final rejection as noted below.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 3, 4, 10, 11, 14, 17, 19-27 and 29-34 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 4, 10, 11, 14, 17, 19-27 and 29-34 of copending Application No. 17/510,248. Although the claims at issue are not identical, they are not patentably distinct from each other because the following claims are anticipated by copending Application No. 17/510,248; see the mapping of claims noted below:
Claims from Co-Pending Application 17/510,248 | Not patentably distinct from Instant Application 16/739,286
1, 2, 16, 21, 24, 25, 29, 30 | 1, 3, 4, 20, 21, 29, 30
11 | 10
12 | 11
15 | 14
18, 19 | 17
20 | 19


The instant application (16/739,286) is obvious over copending application (17/510,248) since both the instant application and co-pending application teach a method/system and/or non-transitory computer-readable storage medium for obtaining a plurality of images of a home, determining a type(s) of room utilizing/training different neural network models, identifying features of the room(s) and determining a value of the home based on first and second features as input to a machine learning model. The independent claims of the instant invention further recite in part that the first image of the first room has a first resolution and generating, from the first image, a second image of the first room having a second resolution lower than the first resolution; and processing the second image of the first room with the first neural network model; whereas the co-pending application merely recites a second neural network model having different resolutions. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 4, 10, 17, 20, 21 and 29-31 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas et al., International Publication No. WO 2019/164484 A1, in view of Gao et al., US Patent Application Publication No. US 2019/0114511 A1, in further view of Ranzato, US Patent No. US 9,129,190 B1, and in further view of Kwak et al., US Patent No. US 11,392,998 B1.
With respect to Claims 1 and 21,
Thomas discloses,
at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform (¶11: “…computing system 100 includes a processor 101, machine-readable storage medium 102, and storage 107…”; ¶12: “…The processor 101 may be a central processing unit (CPU), a semiconductor-based microprocessor, or any other device suitable for retrieval and execution of instructions…”)
obtaining a plurality of images, the plurality of images including a first image of a first room inside a home and a second image of a second room inside of the home; (¶16: “…first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image. For example, the computing system 100 may receive an image or may include a camera to capture an image…”; ¶25: “…The image may be any suitable image…”; ¶37: “a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time…”)
determining a type of the first room by processing the first image of the first room with a first neural network model (Figs 2-5; ¶14: “...storage 107 may store first model 108, second model 109, and third model 110. The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy. In one implementation, the third model 110 is a sub-model of the second model 109. The models may have a hierarchical relationship such that the output of a first model is used to select a second model to apply...”; ¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition…the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and confidence level associated with the context. The confidence level may be used to select a second model or to determine whether to use the output from the first model 108…”; ¶24: “…The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶32-¶37; ¶33: “…Location recognition model 301 is the first model in the hierarchy…”; ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed…”)
determining a type of the second room by processing the second image of the second room with the first neural network model; (¶17: “…The first model 108 may output information about context and confidence level associated with the context. The confidence level may be used to select a second model or to determine whether to use the output from the first model 108…”; ¶24: “…the first model is a machine-learning model trained on a set of images. The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶32-¶37; ¶33: “…Location recognition model 301 is the first model in the hierarchy…”; ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed…”)
identifying at least one first feature in the first image of the first room by processing the first image with a second neural network model different from the first neural network model and trained using a first plurality of training images of rooms of a same type as the first room (¶16-¶19; ¶16: “...The first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image...”; ¶19: “…The selected model application to image instructions 105 may include instructions to apply the selected model to the image. For example, if the second model 109 is selected, the processor 101 may apply the second model 109 to the image. The second model 109 may be applied to the entire image or a segment of the image tailored to the second model 109. The models may have any suitable level of hierarchy. For example, the output of the second model 109 may be used to select a fourth or fifth model to apply...”;¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”;¶27:“…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. 
For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand. In one implementation, the output of the second model is related to attributes of objects in the image...”; ¶28: “…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy…”)
identifying at least one second feature in the second image of the second room by processing the second image of the second room with a third neural network model different from the first neural network model and second neural network model (¶17: “…The second or third model selection instructions 104 may include instructions to select at least one of the second and third model based on the determined context. As an example, the second model 109 may be a model to determine information about a home location, and the third model 110 may be a model to determine information about an office location, if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image. The second model 109 and the third model 110 may be convolutional neural network models trained to recognize objects of a particular type such that the second model 109 is related to a first object type and third model 110 is related to a second object type….”;¶18: “…the second model 109 may be a model to determine information about a home location, and the third model 110 may be a model to determine information about an office location, if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image. The second model 109 and the third model 110 may be convolutional neural network models trained to recognize objects of a particular type such that the second model 109 is related to a first object type and third model 110 is related to a second object type…”; ¶25: “...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles. The environment may be an area around a user or electronic device...”; ¶26: “... a third model is associated with outdoors. 
The first model and second model may be directed to different types of analysis...”; ¶37: “...the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time. In one implementation, the images are captured to be used to determine context information that is stored...”; ¶38: “... the processor determines environmental context associated with the image based on the application of hierarchical models...”)
the third neural network model trained using a second plurality of training images of rooms of a same type as the second room, (¶19, ¶22, ¶28: “…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy…”; ¶33, ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time…”; ¶130: “...The convolution layers of the convolutional neural network serve as feature extractors. Convolution layers act as adaptive feature extractors capable of learning and decomposing the input data into hierarchical features. In one implementation, the convolution layers take two images as input and produce a third image as output...”).
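For illustration only, the hierarchical selection described in the cited passages of Thomas (a first model determines the room type, and that output selects a room-specific model to apply to the same image) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Thomas or of the instant application: the names first_model, kitchen_model, living_room_model, SUB_MODELS and analyze are hypothetical, and simple stub functions stand in for trained neural network models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an image: a dict carrying precomputed hints in
# place of pixel data, so the sketch runs without a trained network.
Image = Dict[str, object]


def first_model(image: Image) -> Tuple[str, float]:
    """Stub scene-recognition model: returns (room type, confidence)."""
    return str(image["scene_hint"]), 0.9


def kitchen_model(image: Image) -> List[str]:
    """Stub feature model assumed to be trained only on kitchen images."""
    known = {"oven", "sink", "refrigerator"}
    return [obj for obj in image["objects"] if obj in known]


def living_room_model(image: Image) -> List[str]:
    """Stub feature model assumed to be trained only on living-room images."""
    known = {"couch", "fireplace"}
    return [obj for obj in image["objects"] if obj in known]


# Hypothetical hierarchy: the first model's output selects the next model.
SUB_MODELS: Dict[str, Callable[[Image], List[str]]] = {
    "kitchen": kitchen_model,
    "living_room": living_room_model,
}


def analyze(image: Image, min_confidence: float = 0.5) -> Tuple[str, List[str]]:
    """Classify the room, then apply the room-specific model if one exists."""
    room_type, confidence = first_model(image)
    sub_model = SUB_MODELS.get(room_type)
    if sub_model is None or confidence < min_confidence:
        # No room-specific model, or the room type is too uncertain to use.
        return room_type, []
    return room_type, sub_model(image)
```

In this sketch the confidence threshold plays the role the quoted ¶17 assigns to the confidence level, namely deciding whether a second model is selected at all.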


Thomas discloses all of the above limitations. Thomas does not distinctly describe the following limitations; however, Gao, as shown, discloses,
the first neural network model having a first plurality of layers comprising at least a convolutional layer, a pooling layer, a fully connected layer, or a softmax layer (¶53: “...FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶106: “...a first convolution layer can learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layers, and so on. This allows convolutional neural networks to efficiently learn increasingly complex and abstract visual concepts...”; ¶107: “...A convolutional neural network learns highly non-linear mappings by interconnecting layers of artificial neurons arranged in many different layers with activation functions that make the layers dependent. It includes one or more convolutional layers, interspersed with one or more sub-sampling layers and non-linear layers, which are typically followed by one or more fully connected layers…”; ¶130: “...The convolution layers of the convolutional neural network serve as feature extractors. Convolution layers act as adaptive feature extractors capable of learning and decomposing the input data into hierarchical features. In one implementation, the convolution layers take two images as input and produce a third image as output...”)
the first plurality of training images including training images augmented by one or more transformations (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
the second plurality of training images including training images augmented by one or more transformations, (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
the second neural network model having a second plurality of layers comprising at least first deep neural network layers, a reduction layer, second deep neural network layers, an average pooling layer, a fully connected layer, a dropout layer, or a softmax layer (¶53: “...  FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶145: “…the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers…the convolutional neural network is a deep network with more layers…”;¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶182: “...deep neural networks are a type of artificial neural networks that use multiple nonlinear and complex transforming layers to successively model high-level features...”; ¶183: “...Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are components of deep neural networks. Convolutional neural networks have succeeded particularly in image recognition with an architecture that comprises convolution layers, nonlinear layers, and pooling layers...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”; ¶235: “…two separate deep convolutional neural network models…”)
the third neural network model having a third plurality of layers comprising at least first deep neural network layers, a reduction layer, second deep neural network layers, an average pooling layer, a fully connected layer, a dropout layer, or a softmax layer (¶53: “...  FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶58: “...FIG. 4 is one implementation of sub-sampling layers (average/max pooling) in accordance with one implementation of the technology disclosed...”; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶184: “...The goal of training deep neural networks is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data...”;  ¶196: “..our deep learning network learns to extract features directly from the primary sequence. To incorporate information about protein structure, we trained separate networks to predict the secondary structure and solvent accessibility from the sequence alone, and then included these as subnetworks in the full model (FIG. 19 and FIG. 20) ...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500....”)
the first plurality of layers including at least one million parameters; the second plurality of layers including at least one million parameters; the third plurality of layers including at least one million parameters (¶53: “...The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶132: “...The convolutional neural network uses a various number of convolution layers, each with different convolving parameters such as kernel size, strides, padding, number of feature maps and weights...”; ¶145: “...the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers...”; see at least Figs 4-7 showing usage of a plurality of convolution layers; Fig 43 illustrating a semi-supervised learner including an ensemble of deep convolutional neural networks being iteratively trained; Figs 43-48 showing various cycles of semi-supervised learning with over one million parameters with a plurality of convolutional neural networks; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶184: “...The goal of training deep neural networks is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500....”)
Thomas teaches techniques for applying hierarchical cascading models to an image of an environment to determine a context and/or description of the environment. The context information may provide environmental intelligence related to the location type or people or objects in the environment depicted in the image. Gao discloses image classification models which may be machine learning models that receive an image as input and output information about an environmental description associated with the image. Gao further discloses various neural network architectures with multiple layers and various techniques for training deep convolutional neural networks including subsampling layers (e.g., pooling) and fully-connected layers. Gao also teaches an augmenter for augmenting training images based on one or more transformations via neural network technology. Thomas and Gao are directed to the same field of endeavor since both relate to analyzing images utilizing neural network technology. One of ordinary skill in the art would have been motivated to combine the known method/system for training various architectures of deep neural networks with subsampling layers as taught by Gao with the techniques for determining a context of an environment based on hierarchical cascading models of Thomas to achieve the claimed invention, and there would have been a reasonable expectation of success in doing so. See DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). The level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such training features and architectures of neural networks into similar systems, resulting in improved accuracy for image classification and object detection.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas with the method/system for training deep neural networks as taught by Gao since it allows for processing and training of different combinations of neural network models comprising various layers which can be easily trained to improve accuracy for image classification and object detection (¶53, ¶105, ¶107, ¶132, ¶144, ¶163, ¶183, ¶184).
Thomas and Gao disclose all of the above limitations. The combination of Thomas and Gao does not distinctly describe the following limitations; however, Ranzato, as shown, discloses,
wherein the first image of the first room has a first resolution, wherein processing the first image of the first room with the first neural network model comprises: generating, from the first image, a second image of the first room having a second resolution lower than the first resolution; (col 1, lines 23-27: “...one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an input image having a first resolution; down-sampling the input image to generate a second image having a second, lower resolution...”; col 6, lines 26-29: “...image classification system 100 includes an image down-sampler 106 that down-samples input images to generate low-resolution images, e.g., a low-resolution image 108 generated by down-sampling the input image 104...”)
processing the second image of the first room with the first neural network model (col 2, lines 58-67: “...down-sampling the first training image to generate a low-resolution first training image; processing the low-resolution first training image using the first neural network to generate a plurality of features of the low-resolution first training image ...”; col 3 lines 52-61: “...processing the low-resolution second training image using the first neural network to generate first scores for the low-resolution second training image in accordance with the updated values of the parameters of the first neural network...”; col 7, lines 2-5: The image classification system 100 generates category scores for the initial patch using a patch neural network 120...  the patch neural network 120 can also receive the patch at a lower resolution, i.e., a down-sampled version of the high-resolution patch for use in generating the patch scores...”; claim 4: “... obtaining a second training image; and performing another iteration of the stochastic gradient descent training procedure on the loss function using the second training image, comprising: down-sampling the second training image to generate a low-resolution second training image; processing the low-resolution second training image using the first neural network...”)
Thomas teaches techniques for applying hierarchical cascading models to an image of an environment to determine a context and/or description of the environment. The context information may provide environmental intelligence related to the location type or people or objects in the environment depicted in the image. Gao discloses image classification models which may be machine learning models that receive an image as input and output information about an environmental description associated with the image. Gao further discloses various neural network architectures with multiple layers and various techniques for training deep convolutional neural networks including subsampling layers (e.g., pooling) and fully-connected layers. Gao also teaches that sub-sampling layers reduce the resolution of features extracted by the convolution layers. Ranzato teaches various methods/systems for receiving and identifying objects in images, and classifying images utilizing an image classification system and neural network technology. Thomas, Gao and Ranzato are directed to the same field of endeavor since all three relate to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas with the method/system for training deep neural networks as taught by Gao and the image classification techniques as taught by Ranzato since it allows for identifying objects in images, and classifying an input image via a trained image classification system (claim 1, claim 4, col 3 lines 52-61, col 6, lines 26-29).
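For illustration only, the down-sampling step recited in the limitation (generating, from an image having a first resolution, a second image having a lower second resolution before it is processed by a neural network) can be sketched as follows. This is a minimal sketch under stated assumptions: the quoted passages of Ranzato do not specify a down-sampling method, so block averaging is assumed here, and the function name downsample and the grayscale list-of-lists image representation are hypothetical.

```python
from typing import List

# Hypothetical image representation: rows of grayscale pixel intensities.
Gray = List[List[float]]


def downsample(image: Gray, factor: int = 2) -> Gray:
    """Reduce resolution by averaging each factor x factor block of pixels.

    Block averaging is an assumed method chosen for simplicity; other
    resampling filters would serve the same illustrative purpose.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    output: Gray = []
    for top in range(0, height, factor):
        row: List[float] = []
        for left in range(0, width, factor):
            block = [
                image[top + dr][left + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        output.append(row)
    return output
```

A 4x4 input processed with the default factor yields a 2x2 output, which would then be the lower-resolution image supplied to the classification model.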


Thomas, Gao and Ranzato disclose all of the above limitations. The combination of Thomas, Gao and Ranzato does not distinctly describe the following limitations; however, Kwak, as shown, discloses,
determining a value of the home at least in part by using the at least one first feature and the at least one second feature as input to a machine learning model different from the first neural network model, the second neural network model, and the third neural network (col 5, lines 5-10: “...The new information could further be analyzed and used to provide predictive information such as estimated property values, estimated utility costs, estimated property taxes, estimated cost of ownership, estimated property insurance payments as well as other kinds of predictive information...”; col 11, line 60- col 12 line 3: “...analyze image information and identify any appliances or other kinds of property structures... the image information may be processed by one or more machine learning and/or machine vision algorithms to detect and classify the state of one or more physical structures... remote device 404 captures images in a closet 902 of a home. This image is fed into a machine learning module configured to detect and classify property structures. After analyzing the image (in step 804), the system identifies water heater 906...”; col 12 lines 37-57: “...techniques include various kinds of deep neural networks. In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...Structure information about one or more property structures can be passed from remote to device 404 to server 403 during step 814. This information can then be used to predicting various outputs. By identifying appliances in a property and details of the current state of the appliances, a property information system can make more accurate estimates of a property's value and/or the cost of ownership of the property... the process described in FIG. 
8 can apply to any kind of property structure, including fixed or built-in structures like walls, roofs, floors or other structures that a system can collect information about in order to provide an estimated property value...”; col 14, lines 3-39: “...a property information system may include a prediction system 1100 that can output an estimated property value 1102 for a property... The estimated property value 1102 can be determined according to various inputs. These inputs may include user collected images 1110 (for example, images of rooms, built-in structures and appliances) ...inputs may also include property structure data... regional property data ...”; col 14, line 58- col 15 line 3: “... To detect and classify property structures, and/or to predict estimated property values and/or cost of ownership, the embodiments may utilize a machine learning system. As used herein, the term “machine learning system” refers to any collection of one or more machine learning algorithms. Some machine learning systems may incorporate various different kinds of algorithms, as different tasks may require different types of machine learning algorithms. Generally, a machine learning system will take input data and output one or more kinds of predicted values. The input data could take any form including image data, text data, audio data or various other kinds of data...can be used for training, testing and deployment...  techniques include various kinds of deep neural networks. In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...”; col 15, lines 26-30: “...Embodiments may also use known techniques in deep learning to help process and classify objects within image data. These techniques include various kinds of deep neural networks. 
In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...”)
Kwak discloses a method/system for determining an estimated property value utilizing one or more kinds of convolutional deep neural networks for image recognition to estimate a property’s value. Kwak further discloses receiving/collecting image data as input to a machine learning system which incorporates various algorithms and outputs one or more kinds of predicted values to facilitate predicting property values. Thomas, Gao, Ranzato and Kwak are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. A person of ordinary skill in the art would have been motivated to combine the known machine learning system for predicting property values as taught by Kwak with the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao and the image classification techniques as taught by Ranzato to achieve the claimed invention, and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). The level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such machine learning features into similar systems, thereby improving the efficiency of the assessment process for predicting estimated property values.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao with the image classification techniques of Ranzato and the machine learning system for predicting property values as taught by Kwak since it allows for analyzing image information for detecting and classifying property structures to determine an estimated property value (col 5, lines 5-10; col 11, line 60-col 12, line 57; col 14, line 3-col 15, line 3).

With respect to claims 3 and 29,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. Gao further discloses,
wherein the first neural network model comprises two neural network sub-models including a first sub-model having an average pooling layer and a second sub-model having a max pooling layer instead of the average pooling layer (¶133: “…Sub-sampling layers reduce the resolution of the features extracted by the convolution layers to make the extracted features or feature maps-robust against noise and distortion... sub-sampling layers employ two types of pooling operations, average pooling and max pooling. The pooling operations divide the input into non-overlapping two-dimensional spaces. For average pooling, the average of the four values in the region is calculated. For max pooling, the maximum value of the four values is selected….”)
Thomas, Gao, Ranzato and Kwak are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the method/system for training deep neural networks as taught by Gao since it allows for the implementation of sub-sampling techniques that make the extracted features or feature maps robust against noise and distortion.
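For illustration only, and not as a characterization of Gao's actual implementation, the two pooling operations quoted from ¶133 (average pooling and max pooling over non-overlapping regions of four values) may be sketched as follows. The function name pool_2x2 and the sample feature map are assumptions introduced here for the example.

```python
def pool_2x2(feature_map, mode="avg"):
    """Pool non-overlapping 2x2 regions of a 2-D list.

    Assumes the height and width of feature_map are both even.
    """
    if mode not in ("avg", "max"):
        raise ValueError("mode must be 'avg' or 'max'")
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # The four values in the current 2x2 region.
            region = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(region) if mode == "max" else sum(region) / 4)
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map; each output is 2x2 (half the resolution).
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]
avg_pooled = pool_2x2(feature_map, "avg")  # [[4.0, 2.0], [3.0, 4.0]]
max_pooled = pool_2x2(feature_map, "max")  # [[7, 4], [6, 9]]
```

Both operations halve the resolution of the feature map; they differ only in whether each region is summarized by its mean or by its largest value.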

With respect to claims 4 and 30,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. Thomas further discloses,
wherein processing the first image of the first room with the first neural network model comprises: processing the at least one image using the first sub-model to obtain first results; processing the first image using the second sub-model to obtain second results; and combining the first and second results to obtain an output result for the first neural network model(¶14: “…The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy…”; ¶23: “…subsequent model may be selected in a hierarchical manner based on the output from a previously applied model. The models may be machine learning models that receive an image as input and output information about an environmental description associated with the image…”; Fig 1, ¶24: “…a processor applies a first model to an image of an environment to select second model… The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶26: “...The processor may select the second model in any suitable manner. For example, there may be a model associated with each output type from the first model, such as where the first model outputs the probability that an image is indoors and outdoors and where a second model is associated with indoors and a third model is associated with outdoors...”; ¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. 
The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”; ¶30: “...The processor may create the environmental description representation based on the output of models in addition to the second model, such as models above and below the second model in a hierarchy, in one implementation, the environmental description representation is created with different levels or types of details on the same object or person where the different details are provided from different models. The objects recognized in the image may be stored to create searchable environmental description information. The output from a model may include sets of data including object type, object position, and confidence level for each identified object in the image, and the environmental description representation may include objects or people recognized in the image from multiple models.…”)
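For illustration only, the hierarchical selection quoted from Thomas (¶14, ¶24, ¶26, ¶27), in which the output of a first model is used to select a second model that is then applied to the same image, may be sketched as follows. All model functions, the dictionary-based image, and the confidence threshold are hypothetical stand-ins introduced for this example; they are not taken from Thomas.

```python
def room_type_model(image):
    # Hypothetical first model: returns (location type, confidence level).
    return image["room_hint"], 0.9


def kitchen_model(image):
    # Hypothetical second model associated with the "kitchen" output.
    known = {"countertop", "oven", "refrigerator"}
    return [obj for obj in image["objects"] if obj in known]


def living_room_model(image):
    # Hypothetical second model associated with the "living room" output.
    known = {"couch", "television"}
    return [obj for obj in image["objects"] if obj in known]


# One second-level model per output type of the first model.
SECOND_MODELS = {"kitchen": kitchen_model, "living room": living_room_model}


def apply_cascade(image, min_confidence=0.5):
    """Apply the first model, then the second model it selects."""
    room_type, confidence = room_type_model(image)
    second_model = SECOND_MODELS.get(room_type)
    if confidence < min_confidence or second_model is None:
        return {"room_type": room_type, "objects": []}
    return {"room_type": room_type, "objects": second_model(image)}


image = {"room_hint": "kitchen", "objects": ["countertop", "couch", "oven"]}
description = apply_cascade(image)
```

In this sketch the resulting description combines outputs from both levels of the hierarchy: the room type from the first model and the recognized objects from the selected second model.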

With respect to claims 10 and 17,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. Thomas further discloses,
wherein the first space is a kitchen, and wherein identifying the at least one feature comprises identifying a type of material of a countertop in the kitchen and/or identifying a finish of an appliance in the kitchen (¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”; ¶37: “…The image may be associated with a location of a device for receiving a query and/or command or may be in a separate location associated with the query and/or command request, such as where a user in a living room requests information about an item in the kitchen...”) A person of ordinary skill in the art would have been motivated to apply the hierarchical modeling techniques for determining information/attributes of objects in an image as taught by Thomas to achieve the claimed invention (identifying a type of material of a countertop in the kitchen and/or identifying a finish of an appliance), and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006).

With respect to claim 20,
Thomas discloses,
at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform:(¶11: “…computing system 100 includes a processor 101, machine-readable storage medium 102, and storage 107…”; ¶12: “…The processor 101 may be a central processing unit (CPU), a semiconductor- based microprocessor, or any other device suitable for retrieval and execution of instructions…”)
obtaining a plurality of images, the plurality of images including a first image of a first room inside a home; (¶16: “…first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image. For example, the computing system 100 may receive an image or may include a camera to capture an image…”; ¶25: “…The image may be any suitable image…”; ¶37: “a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time…”)
determining a type of the first room by: generating, from the first image, a second image (Figs 2-5, ¶14: “...storage 107 may store first model 108, second model 109, and third model 110. The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy. In one implementation, the third model 110 is a submodel of the second model 109. The models may have a hierarchical relationship such that the output of a first model is used to select a second model to apply...”; ¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition…the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and confidence level associated with the context. The confidence level may be used to select a second model or to determine whether to use the output from the first model 108…”; ¶24: “…The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶32-¶37; ¶33: “…Location recognition model 301 is the first model in the hierarchy…”; ¶37: “…a processor captures image of environment. For example, the image may be of a room. In one implementation, multiple images are captured to be analyzed…”)
identifying at least one first feature in the first image of the first room by processing the first image with a second neural network model different from the first neural network model and trained using images of rooms of a same type as the first room, (¶16-¶19; ¶16: “...The first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image...”; ¶19: “…The selected model application to image instructions 105 may include instructions to apply the selected model to the image. For example, if the second model 109 is selected, the processor 101 may apply the second model 109 to the image. The second model 109 may be applied to the entire image or a segment of the image tailored to the second model 109. The models may have any suitable level of hierarchy. For example, the output of the second model 109 may be used to select a fourth or fifth model to apply...”; ¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”; ¶26: “... a third model is associated with outdoors. The first model and second model may be directed to different types of analysis...”; ¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type.
For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image...”; ¶28: “…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy.…”; ¶37: “...the image may be of a room. In one implementation, multiple images are captured to be analyzed. For example, the images maybe be images of different areas of the location or the images may be of the same location at different times. The images may be captured at any suitable time. In one implementation, the images are captured to be used to determine context information that is stored...”; ¶38: “... the processor determines environmental context associated with the image based on the application of hierarchical models...”)
Thomas discloses all of the above limitations. Thomas does not distinctly describe the following limitations; however, Gao, as shown, discloses,
processing the second image of the first room with a first neural network model comprising: a first neural network sub-model comprising a first plurality of layers comprising at least one million parameters, (¶24: “…the first model is a machine-learning model trained on a set of images. The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images…”; ¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”; ¶133: “…Sub-sampling layers reduce the resolution of the features extracted by the convolution layers to make the extracted features or feature maps-robust against noise and distortion... sub-sampling layers employ two types of pooling operations, average pooling and max pooling. The pooling operations divide the input into non-overlapping two-dimensional spaces. For average pooling, the average of the four values in the region is calculated. For max pooling, the maximum value of the four values is selected….”; ¶145: “... the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers...”; see at least Figs 4-7 showing usage of a plurality of convolution layers; Fig 43 illustrating a semi-supervised learner including an ensemble of deep convolutional neural networks being iteratively trained; Figs 43-48 showing various cycles of semi-supervised learning utilizing over one million parameters with a plurality of convolutional neural networks)
the first plurality of layers comprising at least deep neural network layers, an average pooling layer, a fully connected layer, or a softmax layer; (¶53: “...  FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶145: “…the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers…the convolutional neural network is a deep network with more layers…”;¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶182: “...deep neural networks are a type of artificial neural networks that use multiple nonlinear and complex transforming layers to successively model high-level features...”; ¶183: “...Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are components of deep neural networks. Convolutional neural networks have succeeded particularly in image recognition with an architecture that comprises convolution layers, nonlinear layers, and pooling layers...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”; ¶235: “…two separate deep convolutional neural network models…”)
a second neural network sub-model comprising a second plurality of layers comprising at least one million parameters, (¶53: “...The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶132: “...The convolutional neural network uses a various number of convolution layers, each with different convolving parameters such as kernel size, strides, padding, number of feature maps and weights...”; ¶145: “... the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers...”; see at least Figs 4-7 showing usage of a plurality of convolution layers; Fig 43 illustrating a semi-supervised learner including an ensemble of deep convolutional neural networks being iteratively trained; Figs 43-48 showing various cycles of semi-supervised learning with over one million parameters with a plurality of convolutional neural networks; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶184: “...The goal of training deep neural networks is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”)
the second plurality of layers comprising at least deep neural network layers, a max pooling layer, a fully connected layer, or a softmax layer; (¶53: “... FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶145: “…the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers…the convolutional neural network is a deep network with more layers…”; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶182: “...deep neural networks are a type of artificial neural networks that use multiple nonlinear and complex transforming layers to successively model high-level features...”; ¶183: “...Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are components of deep neural networks. Convolutional neural networks have succeeded particularly in image recognition with an architecture that comprises convolution layers, nonlinear layers, and pooling layers...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”; ¶235: “…two separate deep convolutional neural network models…”)
the second neural network model further having a third plurality of layers comprising at least a convolutional layer, a pooling layer, a fully connected layer, or a softmax layer, (¶53: “... FIG. 1 shows one implementation of a fully connected neural network with multiple layers... The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶58: “...FIG. 4 is one implementation of sub-sampling layers (average/max pooling) in accordance with one implementation of the technology disclosed...”; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶184: “...The goal of training deep neural networks is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data...”; ¶196: “...our deep learning network learns to extract features directly from the primary sequence. To incorporate information about protein structure, we trained separate networks to predict the secondary structure and solvent accessibility from the sequence alone, and then included these as subnetworks in the full model (FIG. 19 and FIG. 20) ...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”)
the third plurality of layers including at least one million parameters; (¶53: “...The network includes multiple layers of feature-detecting neurons. Each layer has many neurons that respond to different combinations of inputs from the previous layers. These layers are constructed so that the first layer detects a set of primitive patterns in the input image data, the second layer detects patterns of patterns and the third layer detects patterns of those patterns...”; ¶132: “...The convolutional neural network uses a various number of convolution layers, each with different convolving parameters such as kernel size, strides, padding, number of feature maps and weights...”; ¶145: “... the convolutional neural network uses different numbers of convolution layers, sub-sampling layers, non-linear layers and fully connected layers...”; see at least Figs 4-7 showing usage of a plurality of convolution layers; Fig 43 illustrating a semi-supervised learner including an ensemble of deep convolutional neural networks being iteratively trained; Figs 43-48 showing various cycles of semi-supervised learning with over one million parameters with a plurality of convolutional neural networks; ¶163: “...deep convolutional neural networks (CNNs) can be easily trained and improved accuracy has been achieved for image classification and object detection...”; ¶184: “...The goal of training deep neural networks is optimization of the weight parameters in each layer, which gradually combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data...”; ¶200: “...fewer or additional models can be used in the ensemble, ranging from 2 to 500...”)
Thomas teaches techniques for applying hierarchical cascading models to an image of an environment to determine a context and/or description of the environment. The context information may provide environmental intelligence related to the location type or people or objects in the environment depicted in the image. Gao discloses image classification models which may be machine learning models that receive an image as input and output information about an environmental description associated with the image. Gao further discloses various neural network architectures with multiple layers and various techniques for training deep convolutional neural networks including subsampling layers (e.g., pooling) and fully-connected layers. Gao also teaches an augmenter for augmenting training images based on one or more transformations via neural network technology. Thomas and Gao are directed to the same field of endeavor since they are related to analyzing images utilizing neural network technology. One of ordinary skill in the art would have been motivated to combine the known method/system for training various architectures of deep neural networks with subsampling layers as taught by Gao with the techniques for determining a context of an environment based on hierarchical cascading models of Thomas to achieve the claimed invention, and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). The level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such training features and architectures of neural networks into similar systems, thereby improving accuracy for image classification and object detection.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas with the method/system for training deep neural networks as taught by Gao since it allows for processing and training of different combinations of neural network models comprising various layers which can be easily trained to improve accuracy for image classification and object detection (¶53, ¶105, ¶107, ¶132, ¶144, ¶163, ¶183, ¶184).
Thomas and Gao disclose all of the above limitations. The combination of Thomas and Gao does not distinctly describe the following limitations; however, Ranzato, as shown, discloses,
the first image having a first resolution, a second image of the first room having a second resolution lower than the first resolution (col 1, lines 23-27: “...one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an input image having a first resolution; down-sampling the input image to generate a second image having a second, lower resolution...”; col 6, lines 26-29: “...image classification system 100 includes an image down-sampler 106 that down-samples input images to generate low-resolution images, e.g., a low-resolution image 108 generated by down-sampling the input image 104...”)
Thomas teaches techniques for applying hierarchical cascading models to an image of an environment to determine a context and/or description of the environment. The context information may provide environmental intelligence related to the location type or people or objects in the environment depicted in the image. Gao discloses image classification models which may be machine learning models that receive an image as input and output information about an environmental description associated with the image. Gao further discloses various neural network architectures with multiple layers and various techniques for training deep convolutional neural networks including subsampling layers (e.g., pooling) and fully-connected layers. Gao also teaches that sub-sampling layers reduce the resolution of features extracted by the convolution layers. Ranzato teaches various methods/systems for receiving and identifying objects in images, and classifying images utilizing an image classification system and neural network technology. Thomas, Gao and Ranzato are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas with the method/system for training deep neural networks as taught by Gao and the image classification techniques as taught by Ranzato since it allows for identifying objects in images, and classifying an input image via a trained image classification system (claim 1, claim 4, col 3 lines 52-61, col 6, lines 26-29).
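For illustration only, the relationship between a first image having a first resolution and a second image having a lower resolution can be sketched as block averaging. This sketch is not taken from Ranzato or from the claims; the function name `downsample`, the averaging scheme, and the sample pixel values are assumptions chosen for clarity.

```python
def downsample(image, factor):
    """Average each non-overlapping factor x factor block of a 2-D grayscale image.

    `image` is a list of equal-length rows of numbers. Block averaging is an
    assumed down-sampling scheme, not the one used by any cited reference.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    result = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# Hypothetical 4x4 "first image" and its 2x2 lower-resolution "second image".
first_image = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
second_image = downsample(first_image, 2)
```

Under these assumptions the second image has half the height and width of the first, which is the sense in which its resolution is lower.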
Thomas, Gao and Ranzato disclose all of the above limitations. The combination of Thomas, Gao and Ranzato does not distinctly describe the following limitations; however, Kwak, as shown, discloses,
determining a value of the home at least in part by using the at least one first feature as input to a machine learning model different from the first neural network model and the second neural network model. (col 5, lines 5-10: “...The new information could further be analyzed and used to provide predictive information such as estimated property values, estimated utility costs, estimated property taxes, estimated cost of ownership, estimated property insurance payments as well as other kinds of predictive information...”; col 11, line 60- col 12 line 3: “...analyze image information and identify any appliances or other kinds of property structures... the image information may be processed by one or more machine learning and/or machine vision algorithms to detect and classify the state of one or more physical structures... remote device 404 captures images in a closet 902 of a home. This image is fed into a machine learning module configured to detect and classify property structures. After analyzing the image (in step 804), the system identifies water heater 906...”; col 12 lines 37-57: “...techniques include various kinds of deep neural networks. In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...Structure information about one or more property structures can be passed from remote to device 404 to server 403 during step 814. This information can then be used to predicting various outputs. By identifying appliances in a property and details of the current state of the appliances, a property information system can make more accurate estimates of a property's value and/or the cost of ownership of the property... the process described in FIG. 
8 can apply to any kind of property structure, including fixed or built-in structures like walls, roofs, floors or other structures that a system can collect information about in order to provide an estimated property value...”; col 14, lines 3-39: “...a property information system may include a prediction system 1100 that can output an estimated property value 1102 for a property... The estimated property value 1102 can be determined according to various inputs. These inputs may include user collected images 1110 (for example, images of rooms, built-in structures and appliances) ...inputs may also include property structure data... regional property data ...”; col 14, line 58- col 15 line 3: “... To detect and classify property structures, and/or to predict estimated property values and/or cost of ownership, the embodiments may utilize a machine learning system. As used herein, the term “machine learning system” refers to any collection of one or more machine learning algorithms. Some machine learning systems may incorporate various different kinds of algorithms, as different tasks may require different types of machine learning algorithms. Generally, a machine learning system will take input data and output one or more kinds of predicted values. The input data could take any form including image data, text data, audio data or various other kinds of data...can be used for training, testing and deployment...  techniques include various kinds of deep neural networks. In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...”; col 15, lines 26-30: “...Embodiments may also use known techniques in deep learning to help process and classify objects within image data. These techniques include various kinds of deep neural networks. 
In some cases, embodiments may use one or more kinds of convolutional deep neural networks (CNNs) that are commonly used in image recognition and other areas of machine vision...”)
Kwak discloses a method/system for determining an estimated property value utilizing one or more kinds of convolutional deep neural networks for image recognition to estimate a property’s value. Kwak further discloses receiving/collecting image data as input to a machine learning system which incorporates various algorithms and outputs one or more kinds of predicted values to facilitate predicting property values. Thomas, Gao, Ranzato and Kwak are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. A person of ordinary skill in the art would have been motivated to combine the known machine learning system for predicting property values as taught by Kwak with the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao and the image classification techniques as taught by Ranzato to achieve the claimed invention, and there would have been a reasonable expectation of success in doing so. See DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). The level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such machine learning features into similar systems, resulting in improved efficiency of the assessment process for predicting estimated property values.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao and the image classification techniques of Ranzato with the machine learning system for predicting property values as taught by Kwak since it allows for analyzing image information for detecting and classifying property structures to determine an estimated property value (col 5, lines 5-10; col 11, line 60-col 12, line 57; col 14, line 3-col 15, line 3).


With respect to Claim 31,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. Thomas further discloses,
wherein: the second neural network model was trained using a plurality of training images of rooms of a same type as the first room (¶16-¶19; ¶16: “...The first model application to image instructions 103 may include instructions to apply the first model 108 to an image to determine a context associated with the environment of the image...”; ¶19: “…The selected model application to image instructions 105 may include instructions to apply the selected model to the image. For example, if the second model 109 is selected, the processor 101 may apply the second model 109 to the image. The second model 109 may be applied to the entire image or a segment of the image tailored to the second model 109. The models may have any suitable level of hierarchy. For example, the output of the second model 109 may be used to select a fourth or fifth model to apply...”;¶25: “…The image may be any suitable image...There may be multiple images to be input into the model, such as multiple images of the same location at different time periods or images of the same location from multiple angles…”;¶27:“…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image...”; ¶28: “…The processor may select any suitable level of hierarchical models. For example, an additional model may be selected based on the output from the selected second model. 
There may be a stored cascade of hierarchical models including information about a relationship between models in the hierarchy such that the output of a first model is used to select a second model in the hierarchy.…”)
the plurality of training images including training images augmented by one or more transformations (¶418: “…augmenter, running on at least one of the processors, that progressively augments a set size of the pathogenic training set (first and second training images) based on the trained ensemble's evaluation of a synthetic set (one or more transformations) …”)
Thomas, Gao, Ranzato and Kwak are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the machine learning system for predicting property values of Kwak and image classification techniques of Ranzato with the method/system for training deep neural networks of Gao since it allows for processing and training of different combinations of neural network models comprising various layers which can be easily trained to improve accuracy for image classification and object detection (¶53, ¶105, ¶107, ¶132, ¶144, ¶163, ¶183, ¶184).
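For illustration only, a plurality of training images that includes images augmented by one or more transformations can be sketched as follows. This is a generic sketch and is not the augmenter described in Gao; the transformations chosen (horizontal flip and 90-degree rotation) and all function names are assumptions.

```python
def flip_horizontal(image):
    """Mirror each row of a 2-D image (list of rows)."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate a 2-D image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(training_images, transforms):
    """Return the original images followed by every transformed copy."""
    augmented = list(training_images)
    for image in training_images:
        for transform in transforms:
            augmented.append(transform(image))
    return augmented


# Hypothetical single 2x2 training image.
training_images = [[[1, 2], [3, 4]]]
augmented_images = augment(training_images, [flip_horizontal, rotate_90_clockwise])
```

Under these assumptions, one original image yields three training images: the original, its mirror image, and its rotated copy.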

Claims 11 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas, Gao, Ranzato, Kwak in further view of Yu et al., US Patent Application Publication No US 2020/0311871 A1.
With respect to Claim 11, 
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Yu, as shown, discloses,
wherein the processor-executable instructions further cause the at least one computer hardware processor to perform: processing multiple images using the first neural network model to identify images for which the first neural network model output differs from labels produced by manual classification; obtaining new labels for at least some of the multiple images; (Fig 5, Fig 6, ¶17: “...an embodiment of this application provides an image reconstruction device, including a processor and a memory. The memory is configured to store a program instruction, and the processor is configured to invoke the program instruction...”; ¶31; ¶48: “...The neural network is a network constituted by joining many single neural units together, to be specific, an output of a neural unit may be an input of another neural unit. An input of each neural unit may be connected to a local receptive field of a previous layer to extract a feature of the local receptive field. The local receptive field may be a region including several neural units...”; ¶49: “...The convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor including a convolutional layer and a sub-sampling layer...”; ¶61: “... In the human-computer interaction-based method, a system usually uses the low-level feature, and a user adds high-level knowledge. The extraction method mainly includes two aspects: image preprocessing and feedback learning. An image preprocessing manner may be manually labeling images in an image library, or may be some automatic or semi-automatic image semantic labeling methods. Feedback learning is to add manual intervention to the process of extracting image semantics, extract semantic features of the image through repeated interactions between the user and the system, and establish and correct high-level semantic concepts associated with image content...”)
updating one or more parameters of the first neural network model by using the at least some of the multiple images with the new labels (¶59, ¶61: “...An image preprocessing manner may be manually labeling images in an image library, or may be some automatic or semi-automatic image semantic labeling methods. Feedback learning is to add manual intervention to the process of extracting image semantics, extract semantic features of the image through repeated interactions between the user and the system, and establish and correct high-level semantic concepts associated with image content.…”)
Yu teaches a method/system for image reconstruction. Yu further teaches techniques for converting an input image into a text-like language expression that can be intuitively understood via various methods for extracting image high-level semantic features including but not limited to a human-computer interaction-based method. Thomas, Gao, Ranzato, Kwak and Yu are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato, and the machine learning system for predicting property values of Kwak with the human-computer interaction based method for extracting image features as taught by Yu since it allows for establishing and correcting high-level semantic concepts associated with image content via manually labelling images and feedback learning (¶58-61).
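For illustration only, the Claim 11 steps of identifying images whose model output differs from manual labels and obtaining new labels for at least some of them can be sketched as follows. The sketch is not taken from Yu or from the application; the image identifiers, room labels, and function names are hypothetical, and the parameter update itself is left out.

```python
def find_disagreements(model_outputs, manual_labels):
    """Return ids of images whose model output differs from the manual label."""
    return [image_id
            for image_id, predicted in model_outputs.items()
            if manual_labels.get(image_id) != predicted]


def build_retraining_set(flagged_ids, new_labels):
    """Pair each flagged image that received a new label with that label."""
    return [(image_id, new_labels[image_id])
            for image_id in flagged_ids
            if image_id in new_labels]


# Hypothetical outputs of the first neural network model and manual labels.
model_outputs = {"img_a": "kitchen", "img_b": "bathroom", "img_c": "bedroom"}
manual_labels = {"img_a": "kitchen", "img_b": "kitchen", "img_c": "bedroom"}

flagged = find_disagreements(model_outputs, manual_labels)

# Hypothetical new label supplied on review of the flagged image.
new_labels = {"img_b": "bathroom"}
retraining_set = build_retraining_set(flagged, new_labels)
```

The resulting pairs are the images with new labels that a training routine would then use to update one or more parameters of the model.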



With respect to Claim 14, 
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Yu, as shown, discloses,
wherein the second neural network model uses a bank of convolution kernels having different resolutions (¶16: “...each convolution layer in the three-layer fully convolutional deep neural network includes at least one convolution kernel, and a weight matrix W of the convolution kernel is a parameter in the initial super-resolution model...”; ¶50: “...A convolution kernel may be initialized in a form of a random-size matrix. A proper weight may be obtained by a convolution kernel through learning in a training process of the convolutional neural network. In addition, a direct benefit brought by weight sharing is to reduce a connection between layers of the convolutional neural network, and further reduce an overfitting risk...”; ¶99: “... the super-resolution submodel 1021 may be a three-layer fully convolutional deep neural network. In the three-layer convolutional deep neural network, the first convolution layer may be an input layer, and is used to extract image information by region. The input layer may include a plurality of convolution kernels, used to extract different image information ...reconstruction process may be performing reconstruction by performing a convolution operation on an image by using the plurality of convolution kernels. An output of the output layer may be a 3-channel (color) image or a single-channel (grayscale) image...”; ¶100: “...If the super-resolution model includes a plurality of super-resolution sub models that are cascaded, in the foregoing three-layer fully convolutional deep neural network, the first convolution layer and the second convolution layer are used to extract image information from a low-resolution image, that is, obtain information that can be used for super-resolution reconstruction. 
The third convolution layer reconstructs a high-resolution image by using the image information extracted and transformed by the first two layers...”; ¶101: “...In an embodiment, the weight vector W of the convolution kernel may be a parameter in the super-resolution model...”;  ¶114: “...It should be noted that a size of each convolution kernel at the first convolution layer may be different from a size of a convolution kernel at the third convolution layer, and a size of a convolution kernel at the second convolution layer may be 1. A quantity of convolution kernels at the first convolution layer, a quantity of convolution kernels at the second convolution layer, and a quantity of convolution kernels at the third convolution layer may be the same or different. k is a positive integer satisfying 1≤k≤n−1...”)
Yu teaches a method/system for image reconstruction. Yu further teaches techniques for utilizing a deep neural network with a convolutional structure including a feature extractor, convolutional layer and sub-sampling layer whereby a plurality of convolution kernels may be used to extract different image information. Yu also teaches that the size and quantity of convolution kernel at a convolutional layer may be the same or different. Thomas, Gao, Ranzato, Kwak and Yu are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato, and the machine learning system for predicting property values of Kwak with the image reconstruction training techniques as taught by Yu since it allows for reconstructing a high-quality image by extracting more precise image information using fewer calculations  (¶50, ¶88, ¶98-¶101, ¶114).
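For illustration only, applying a set of convolution kernels of different sizes to the same input can be sketched in one dimension as follows. The sketch is not taken from Yu; the kernel sizes (1, 3 and 5), the averaging weights, and the zero padding are assumptions, and the kernels are symmetric so that sliding-window correlation and convolution give the same result.

```python
def convolve_same(signal, kernel):
    """Apply an odd-length kernel with zero padding so output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal))]


# Hypothetical bank of averaging kernels keyed by kernel size.
kernel_bank = {
    1: [1.0],
    3: [1.0 / 3.0] * 3,
    5: [0.2] * 5,
}

signal = [0.0, 0.0, 3.0, 0.0, 0.0]
responses = {size: convolve_same(signal, kernel)
             for size, kernel in kernel_bank.items()}
```

Under these assumptions, the size-1 kernel reproduces the input, while the larger kernels spread the single non-zero sample over progressively wider neighborhoods.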

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Thomas, Gao, Ranzato, Kwak in further view of Hellman et al., US Patent Application Publication No US 2019/0259293 A1.

With respect to Claim 19, 
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Hellman, as shown, discloses,
wherein the machine learning model is a random forest model (¶301: “…exemplary models can include, for example, a logistic regression model, a random forest model, a decision tree model, a probabilistic model, deep learning model, a neural network, a Bayesian network, or the like. In some embodiments, for example, a random forest model, and/or a logistic regression model may be the easy to fully train, whereas, a deep learning model, a neural network, and/or Bayesian network may better address issues of high complexity but may also be more difficult to train…”)
Hellman discloses a system for customizing an evaluation model to an evaluation style utilizing a model database including a plurality of evaluation models (including trained machine learning models). Hellman further discloses a random forest model for training purposes. Thomas, Gao, Ranzato, Kwak and Hellman are directed to the same field of endeavor since they are related to data processing techniques utilizing machine learning technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with machine learning model customization features as taught by Hellman since it allows for selecting/determining model types for training data (Abstract, ¶301).

Claims 22-26, 32 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas, Gao, Ranzato, Kwak in further view of Ho et al., US Patent Application Publication No US 2019/0236440 A1.



With respect to Claims 22 and 34,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the second plurality of layers comprises first deep neural network layers, a reduction layer, second deep neural network layers, a fully connected layer, a dropout layer, and a softmax layer (Abstract: “...a plurality of pooled convolutional layers connected sequentially...”; Figs 3, 4A, 4B; ¶28: “...FIG. 4B is a diagram of a cascading deep convolutional neural network architecture...”; ¶33: “…CNN usually consists of several cascaded convolutional layers, comprising fully-connected artificial neurons. In some cases, it can also include pooling layers (average pooling or max pooling). In some cases, it can also include activation layers. In some cases, a final layer can be a softmax layer for classification and/or detection tasks. The convolutional layers are generally utilized to learn the spatial local-connectivity of input data for feature extraction. The pooling layer is generally for reduction of receptive field and hence to protect against overfitting. Activations, for example nonlinear activations, are generally used for boosting of learned features. Various variants to the standard CNN architecture can use deeper (more layers) and wider (larger layer size) architectures. To avoid overfitting for deep neural networks, some regularization methods can be used, such as dropout or dropconnect; which turn off neurons learned with a certain probability in training and prevent the co-adaptation of neurons during the training phase…”)
Ho teaches a method/system for building a deep convolutional neural network architecture. Thomas, Gao, Ranzato, Kwak and Ho are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the system/method for building a deep convolutional neural network architecture as taught by Ho since it allows for an improved performance in achieving object recognition via a plurality of pooled convolutional layers (Abstract, ¶13, ¶33, ¶43, ¶82, Fig 4B).

With respect to Claim 23, 
Thomas, Gao, Ranzato, Kwak and Ho disclose all of the above limitations, Ho further discloses,
processing the first image with the first deep neural network layers to obtain first results; providing the first results as input to the reduction layer to obtain second results; providing the second results as input to the second deep neural network layers to obtain third results; providing the third results as input to the average pooling layer to obtain fourth results; providing the fourth results as input to the fully connected layer to obtain fifth results; providing the fifth results as input to the dropout layer to obtain sixth results; and providing the sixth results as input to the softmax layer to obtain an output result for the second neural network model. (Figs 3, 4A, 4B; ¶42: “…the CNN module 122 is able to build and use an embodiment of a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer. FIG. 4A illustrates an embodiment of the GC-Net CNN architecture where inputs (X) 402 are fed into plurality of pooled convolutional layers connected sequentially. Each pooled convolutional layer includes a hidden block and a pooling layer…In addition to this cascading structure, this embodiment of the GC-Net CNN architecture also includes connecting the output of each hidden block 404 to a respective global average pooling (GAP) layer, which, for example, takes an average of each feature map from the last convolutional layer. Each GAP layer is then fed to the final hidden block 408. A softmax classifier 412 can then be used, the output of which can form the output (Y) 414 of the CNN…”; ¶44: “…As shown in FIG. 4A, the GC-Net architecture consists of n blocks 404 in total, a fully-connected final hidden layer 408 and a softmax classifier 412. 
each block 404 can have several convolutional layers, each followed by normalization layers and activation layers. The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Ho teaches a method/system for building a deep convolutional neural network architecture. Thomas, Gao, Ranzato, Kwak and Ho are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the system/method for building a deep convolutional neural network architecture as taught by Ho since it allows for an improved performance in achieving object recognition via a plurality of pooled convolutional layers connected sequentially (Abstract, ¶13, ¶33, ¶43, ¶82, Fig 4B, claim 1).
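For illustration only, the order of operations recited in Claim 23 (first deep layers, reduction layer, second deep layers, average pooling layer, fully connected layer, dropout layer, softmax layer) can be sketched on a small feature vector as follows. The sketch is not taken from Ho or from the application; every layer is a toy stand-in, and the weights, the pairwise-max reduction, and the function names are assumptions.

```python
import math
import random


def relu_layer(values, weight):
    """Toy stand-in for a stack of deep layers: scale, then ReLU."""
    return [max(0.0, weight * v) for v in values]


def reduction(values):
    """Toy reduction layer: maximum over adjacent pairs, halving the length."""
    return [max(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]


def average_pool(values):
    """Global average pooling over the whole vector."""
    return [sum(values) / len(values)]


def fully_connected(values, weights, biases):
    """One dense layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def dropout(values, rate, training=False):
    """Inverted dropout; acts as the identity when not training."""
    if not training:
        return list(values)
    return [0.0 if random.random() < rate else v / (1.0 - rate) for v in values]


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def second_model_forward(features):
    first = relu_layer(features, 1.0)                              # first deep layers
    second = reduction(first)                                      # reduction layer
    third = relu_layer(second, 0.5)                                # second deep layers
    fourth = average_pool(third)                                   # average pooling layer
    fifth = fully_connected(fourth, [[1.0], [-1.0]], [0.0, 0.0])   # fully connected layer
    sixth = dropout(fifth, rate=0.5, training=False)               # dropout layer
    return softmax(sixth)                                          # softmax layer


output = second_model_forward([1.0, -2.0, 3.0, 4.0])
```

Under these assumptions the output is a two-class probability vector, and each intermediate result is consumed only by the next stage, mirroring the first-through-sixth results recited in the claim.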
With respect to Claim 24,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the first plurality of layers of the first neural network sub-model comprises deep neural network layers, an average pooling layer, a fully connected layer, and a softmax layer; and the second plurality of layers of the second neural network sub-model comprises deep neural network layers, a max pooling layer, a fully connected layer, and a softmax layer. (¶42: “…a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer…”; ¶44: “…The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Thomas, Gao, Ranzato, Kwak and Ho are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the system/method for building a deep convolutional neural network architecture as taught by Ho since it allows for an improved performance in achieving object recognition via a plurality of pooled convolutional layers connected sequentially (Abstract, ¶13, ¶33, ¶43, ¶82, Fig 4B, claim 1).

With respect to Claim 25,
Thomas, Gao, Ranzato, Kwak and Ho disclose all of the above limitations, Thomas further discloses, 
wherein processing the second image of the first room with the first neural network model comprises: processing the second image using the first neural network sub-model to obtain first output results; (¶14: “…The first model 108, second model 109, and third model 110 may be image classification models. The second model 109 and the third model 110 may be sub-models of the first model 108 in a hierarchy…”; ¶17: “…The first model 108 may be a convolutional neural network trained for scene recognition. For example, the first model 108 may be trained on a set of input images associated with different context types. The first model 108 may output information about context and confidence level associated with the context…”;¶18: “…the second model 109 may be a model to determine information about a home location… if the output from the first model 108 indicates that the location is in a home, the processor 101 may select the second model 109 to apply to the image…”; ¶23: “…subsequent model may be selected in a hierarchical manner based on the output from a previously applied model. The models may be machine learning models that receive an image as input and output information about an environmental description associated with the image…”; ¶24: “…a processor applies a first model to an image of an environment to select second model… The first model may be trained on images of different environment types, and the first model may be trained and updated with new training images. The output of the first model may be related to a description associated with the environment in which the image was taken. The output of the first model may be related to a location type associated with the image. For example, the first model may output information related to a location type and confidence level. The location type may be a room type…”; ¶26)
processing the second image using the second neural network sub-model to obtain second output results; and combining the first output results and second output results to obtain an output result for the first neural network model (Fig 2, ¶23: “…subsequent model may be selected in a hierarchical manner based on the output from a previously applied model. The models may be machine learning models that receive an image as input and output information about an environmental description associated with the image…”; ¶27: “…the processor applies the selected second model to the image. The second model may be any suitable model. The second model may be a machine learning model that classifies images. The second model may classify subjects or objects within the image, such as by segmenting and identifying an object or identifying an object provided in a segment of the image to the second model. In one implementation, the model is related to a particular object type. For example, the second model may be provided an image of a couch, and the second model determines information about the couch, such as brand, in one implementation, the output of the second model is related to attributes of objects in the image…”; ¶30: “...The processor may create the environmental description representation based on the output of models in addition to the second model, such as models above and below the second model in a hierarchy, in one implementation, the environmental description representation is created with different levels or types of details on the same object or person where the different details are provided from different models. The objects recognized in the image may be stored to create searchable environmental description information. The output from a model may include sets of data including object type, object position, and confidence level for each identified object in the image, and the environmental description representation may include objects or people recognized in the image from multiple models…”)
With respect to Claim 26, 
Thomas, Gao, Ranzato, Kwak and Ho disclose all of the above limitations. Ho further discloses,
processing the second image with the deep neural network layers to obtain first results; providing the first results as input to the average pooling layer to obtain second results; providing the second results as input to the fully connected layer to obtain third results; and providing the third results as input to the softmax layer to obtain the first output results. (Figs 3, 4A, 4B; ¶42: “…the CNN module 122 is able to build and use an embodiment of a deep convolutional neural network architecture (referred to herein as a Global-Connected Net or a GC-Net…”; ¶43: “…CNN architecture with cascaded connected layers; where hidden blocks are pooled and then fed into a subsequent hidden block, and so on until a final hidden block followed by an output or softmax layer. FIG. 4A illustrates an embodiment of the GC-Net CNN architecture where inputs (X) 402 are fed into plurality of pooled convolutional layers connected sequentially. Each pooled convolutional layer includes a hidden block and a pooling layer…In addition to this cascading structure, this embodiment of the GC-Net CNN architecture also includes connecting the output of each hidden block 404 to a respective global average pooling (GAP) layer, which, for example, takes an average of each feature map from the last convolutional layer. Each GAP layer is then fed to the final hidden block 408. A softmax classifier 412 can then be used, the output of which can form the output (Y) 414 of the CNN…”; ¶44: “…As shown in FIG. 4A, the GC-Net architecture consists of n blocks 404 in total, a fully-connected final hidden layer 408 and a softmax classifier 412. each block 404 can have several convolutional layers, each followed by normalization layers and activation layers. The pooling layers 406 can include max-pooling or average pooling layers to be applied between connected blocks to reduce feature map sizes… which is fed as input into the last fully-connected hidden layer 408 and then to the softmax classifier 412”; ¶45)
Ho teaches a method/system for building a deep convolutional neural network architecture. Thomas, Gao, Ranzato, Kwak and Ho are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the system/method for building a deep convolutional neural network architecture as taught by Ho since it allows for an improved performance in achieving object recognition via a plurality of pooled convolutional layers (Abstract, ¶13, ¶33, ¶43, ¶82, Fig 4B).
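For illustration only, the ordering of operations recited in Claim 26 (deep layers, then average pooling, then a fully connected layer, then softmax) can be sketched as below. This is a minimal sketch and not Ho's implementation: the function names, the toy feature maps standing in for the "first results", and the weights and biases are all hypothetical.

```python
import math

def global_average_pool(feature_maps):
    """Reduce each 2-D feature map to the mean of its values."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

def fully_connected(inputs, weights, biases):
    """One dense layer: a weighted sum of the inputs plus a bias per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_maps, weights, biases):
    """Chain the stages in the claimed order.

    feature_maps plays the role of the "first results" produced by the
    deep neural network layers (hypothetical values, not computed here).
    """
    pooled = global_average_pool(feature_maps)          # "second results"
    logits = fully_connected(pooled, weights, biases)   # "third results"
    return softmax(logits)                              # "first output results"

# Hypothetical toy inputs: two 2x2 feature maps and an identity dense layer.
example_maps = [[[1, 1], [1, 1]], [[0, 2], [2, 0]]]
example_probs = classify(example_maps, [[1, 0], [0, 1]], [0, 0])
```

Both toy feature maps average to the same value, so the two class probabilities come out equal; real weights would be learned during training.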
With respect to Claim 32,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Ho, as shown, discloses,
wherein the second plurality of layers comprises first deep neural network layers, a reduction layer, second deep neural network layers, a fully connected layer, a dropout layer, and a softmax layer (¶33: “…CNN usually consists of several cascaded convolutional layers, comprising fully-connected artificial neurons. In some cases, it can also include pooling layers (average pooling or max pooling). In some cases, it can also include activation layers. In some cases, a final layer can be a softmax layer for classification and/or detection tasks. The convolutional layers are generally utilized to learn the spatial local-connectivity of input data for feature extraction. The pooling layer is generally for reduction of receptive field and hence to protect against overfitting. Activations, for example nonlinear activations, are generally used for boosting of learned features. Various variants to the standard CNN architecture can use deeper (more layers) and wider (larger layer size) architectures. To avoid overfitting for deep neural networks, some regularization methods can be used, such as dropout or dropconnect; which turn off neurons learned with a certain probability in training and prevent the co-adaptation of neurons during the training phase.”)
Ho teaches a method/system for building a deep convolutional neural network architecture. Thomas, Gao, Ranzato, Kwak and Ho are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the system/method for building a deep convolutional neural network architecture as taught by Ho since it allows for an improved performance in achieving object recognition via a plurality of pooled convolutional layers (Abstract, ¶13, ¶33, ¶43, ¶82, Fig 4B).
Claims 27 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Thomas, Gao, Ranzato and Kwak, and further in view of Turkelson et al., US Patent Application Publication No US 2020/0193206 A1.
With respect to Claim 27,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Turkelson, as shown, discloses,
wherein the first resolution is 600x600 pixels and the second resolution is 300x300 pixels (¶4: “..., trained computer-vision models ingest an image, detect an object from among an ontology of objects in the image, and indicate a bounding area in pixel coordinates of the object along with a confidence score...”; ¶28: “...the object recognition model is responsive to both pixel values and context classifications when recognizing objects...”; ¶63: “...one or more post-image analysis processes may be performed to the image to enhance the image and perform additional, or subsequent, object recognition analysis to the enhanced image. For example, if an image is determined to include a first object at a first location within the image, the image may be cropped about a region of interest (ROI) centered about the first location, the region of interest may have its resolution, clarity, or prominence increased, or portions of the image not included within the region of interest may be compressed or otherwise have their resolution downscaled. The enhanced may then be provided as an input to the object recognition model to determine whether a second (or other) object is recognized within the enhanced image, and if so, an object identifier of the second object may be assigned to the second object...”;¶64: “...context classification subsystem 112 and object recognition subsystem 114 may extract visual features describing an image to determine a context of the image and objects depicted by the image. In some embodiments, the process of extracting features from an image represents a technique for reducing the dimensionality of an image, which may allow for simplified and expedited processing of the image, such as in the case of object recognition. An example of this concept is an N×M pixel red-blue-green (RBG) image being reduced from NxMx3 features to NxM features using a mean pixel value process of each pixel in the image from all three-color channels....”)
Turkelson teaches a context classification subsystem and object recognition subsystem for extracting visual features describing an image to determine a context of the image and objects depicted by the image utilizing neural network architecture. Turkelson further teaches that the object recognition model is responsive to both pixel values and context classifications when recognizing objects. Thomas, Gao, Ranzato, Kwak and Turkelson are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. A person of ordinary skill in the art would have been motivated to combine the known techniques for extracting features from an image as taught by Turkelson to achieve the claimed invention and there would have been a reasonable expectation of success in doing so. DyStar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick Co., 464 F.3d 1356, 1360, 80 USPQ2d 1641, 1645 (Fed. Cir. 2006). Moreover, the claimed invention would have been obvious since the context classification subsystem and object recognition subsystem techniques could have prompted one of ordinary skill in the art to vary the prior art in a predictable manner to result in the claimed invention, hence allowing for a simplified and expedited processing of an image via object recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the object recognition model and context classification subsystem techniques as taught by Turkelson since it allows for extracting visual features describing an image to determine a context of the image and objects depicted by the image (¶4, ¶28, ¶63, ¶64).
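For illustration only, the dimensionality reduction quoted from Turkelson ¶64 (an N×M three-channel image reduced from N×M×3 to N×M features by a mean over the color channels) and a resolution downscale of the kind mentioned in ¶63 can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the 2x2 block-averaging method is an assumption chosen only because it halves each dimension (for example, 600x600 to 300x300); Turkelson does not specify a downscaling method.

```python
def mean_over_channels(image):
    """Reduce an N x M x 3 image to N x M by averaging each pixel's channels."""
    return [[sum(pixel) / len(pixel) for pixel in row] for row in image]

def downscale_by_two(gray):
    """Halve each dimension by averaging non-overlapping 2x2 blocks.

    Assumed method for illustration; a trailing odd row or column is dropped.
    """
    height, width = len(gray), len(gray[0])
    return [
        [
            (gray[r][c] + gray[r][c + 1] + gray[r + 1][c] + gray[r + 1][c + 1]) / 4
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]

# Hypothetical 4x4 image in which every pixel has channel values (30, 60, 90).
example_image = [[(30, 60, 90) for _ in range(4)] for _ in range(4)]
example_gray = mean_over_channels(example_image)   # 4 x 4 values
example_small = downscale_by_two(example_gray)     # 2 x 2 values
```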
With respect to Claim 33,
Thomas, Gao, Ranzato and Kwak disclose all of the above limitations. The combination of Thomas, Gao, Ranzato and Kwak does not distinctly describe the following limitations; however, Turkelson, as shown, discloses,
wherein the first plurality of training images and the second plurality of training images each comprise at least 10,000 training images (¶25: “... Many examples are trained with sets of images including tens of thousands of images of each object the model is capable of detecting. Various approaches have been developed for use cases with smaller sets of training images, while candidate objects in an ontology are relatively large. For example, some training data sets may include less than 100 example images of each object, less than 10 example images of each object, or even a single image of each object, while the number of objects in the ontology may be more than 1,000, more than 10,000, more than 100,000, or more than 1,000,000...”; ¶66: “...model subsystem 116 may be configured to retrieve models stored within model database 138, provide the retrieved models to one or more subsystems for analyzing an image or set of images (e.g., to context classification subsystem 112, object recognition subsystem 114, etc.), as well as to train one or more models and generate training data for training the one or more models...”; ¶67: “...the context classification model may include a scene classification model, which may be trained on a training data set including a plurality of images depicting various scenes, where each image includes a label of the scene depicted by that image... An example set of images depicting various scenes labeled with scene identifiers of those scenes include the Places365-Standard data set, which includes over 10 million images having over 400 different categories.…”; ¶68: “...model subsystem 116 may train an object recognition model based on a training set including a plurality of images depicting different objects, where each image is labeled with an object identifier of the object from an object ontology depicted by the image. In some embodiments, the computer-vision object recognition model may be generated to specifically recognize the objects depicted by the images within a training data set...”)
Turkelson discloses a computer-vision object recognition model trained using training data sets including images and objects represented in the images. Turkelson further discloses a neural network architecture for determining a context of an image and an object depicted by the image based on the context utilizing a model subsystem, trained context classification model and a trained object recognition model. Thomas, Gao, Ranzato, Kwak and Turkelson are directed to the same field of endeavor since they are related to analyzing and processing images utilizing neural network technology. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the hierarchical cascading model techniques of Thomas, the method/system for training deep neural networks of Gao, the image classification techniques of Ranzato and the machine learning system for predicting property values of Kwak with the model subsystem features as taught by Turkelson since it allows for training sets of images depicting various scenes and objects within a training data set (¶65-¶69).

Conclusion
References cited but not used:
Simard et al., US 7,286,699 B2, “System and Method Facilitating Pattern Recognition”, relating to method/system for facilitating pattern recognition utilizing a convolutional neural network employing feature extraction layers and classifier layers.
Luciw et al., US 2018/0330238 A1, “SYSTEMS AND METHODS TO ENABLE CONTINUAL, MEMORY-BOUNDED LEARNING IN ARTIFICIAL INTELLIGENCE AND DEEP LEARNING CONTINUOUSLY OPERATING APPLICATIONS ACROSS NETWORKED COMPUTE EDGES”, relating to continually and optimally learning and training of images and/or object detection using deep neural network technology.
Tang et al., US 2018/0300855 A1, “Method and a System for Image Processing”, relating to method/system for processing an image by filtering a first image.
Any inquiry of a general nature or relating to the status of this application or concerning this communication or earlier communications from the Examiner should be directed to Kimberly L. Evans whose telephone number is 571.270.3929. The Examiner can normally be reached on Monday-Friday, 9:30am-5:00pm. If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s supervisor, Lynda Jasmin, can be reached at 571.272.6782.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal/pair <http://pair-direct.uspto.gov>. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866.217.9197 (toll-free). Any response to this action should be mailed to: Commissioner of Patents and Trademarks, P.O. Box 1450, Alexandria, VA 22313-1450 or faxed to 571-273-8300. Hand delivered responses should be brought to the United States Patent and Trademark Office Customer Service Window: Randolph Building 401 Dulany Street, Alexandria, VA 22314.

/KIMBERLY L EVANS/Examiner, Art Unit 3629
/LYNDA JASMIN/Supervisory Patent Examiner, Art Unit 3629