DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Appeal Brief
In view of the appeal brief filed on 27 May 2021, PROSECUTION IS HEREBY REOPENED. A new ground of rejection is set forth below.
To avoid abandonment of the application, appellant must exercise one of the following two options:
(1) file a reply under 37 CFR 1.111 (if this Office action is non-final) or a reply under 37 CFR 1.113 (if this Office action is final); or,
(2) initiate a new appeal by filing a notice of appeal under 37 CFR 41.31 followed by an appeal brief under 37 CFR 41.37. The previously paid notice of appeal fee and appeal brief fee can be applied to the new appeal. If, however, the appeal fees set forth in 37 CFR 41.20 have been increased since they were previously paid, then appellant must pay the difference between the increased fees and the amount previously paid.
A Supervisory Patent Examiner (SPE) has approved of reopening prosecution by signing below:
/VINCENT RUDOLPH/ Supervisory Patent Examiner, Art Unit 2661

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-4, 6-7, 9-11, 13-14, 16-17, 19-21, 23-24, 26-27, and 29-30 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Independent claims 1, 11, and 21 have been amended to recite, “… wherein the entity processing pipeline operates on the whole image and uses a convolutional neural (CNN) which scans the whole image as a single, whole image to identify a number and type of entities in the whole image …”. Reference is made to the originally filed specification paragraph [00060] and Fig. 4 for support of the amended subject matter. Specification paragraph [00060] recites,
“As shown in FIG. 4, the entity processing pipeline scans the image 300, identifies and segments potential object locations 400, and assigns an entity class label 402 to each potential object. The entity classifier can also be a CNN whose last layer has been trained on the entity types of interest. A non-limiting example of such an entity processing pipeline was described by the inventors in Literature Reference Nos. 4 through 9, which are incorporated herein by reference. Thus, for each image 300, the entity processing pipeline produces a list of all entities in the image and their types (i.e., class labels 402). In one example embodiment, this is encoded into a bag-of-words histogram feature 404. This feature 404 has a number of dimensions equal to the number of entity classes, and the value at each dimension is the frequency (number) of such entities detected”. 
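For illustration only, the bag-of-words histogram feature described in the quoted paragraph (one dimension per entity class, each value being the number of detected entities of that class) may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the specification: the function name entity_histogram, the class list, and the sample detections are hypothetical.

```python
from collections import Counter


def entity_histogram(detected_labels, entity_classes):
    """Return a histogram with one dimension per entity class.

    Each value is the number of detections carrying that class label.
    """
    counts = Counter(detected_labels)
    return [counts.get(cls, 0) for cls in entity_classes]


# Hypothetical entity classes and detections, for illustration only.
ENTITY_CLASSES = ["person", "car", "tree"]
detections = ["car", "person", "car"]

histogram = entity_histogram(detections, ENTITY_CLASSES)
print(histogram)  # [1, 2, 0]
```

The sketch takes the list of class labels as given; it is silent on how the image is scanned to produce that list.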
While specification paragraph [00060] provides support for scanning a whole image by the entity processing pipeline, the originally filed specification does not describe how scanning the whole image is implemented, and does not describe that the convolutional neural network scans the image as a single, whole image. That is, the originally filed specification does not specify whether the image is scanned in one instance as a single, whole image by the CNN of the entity processing pipeline, or whether the whole image is scanned by the entity processing pipeline in segments. Furthermore, Literature Reference Nos. 4 through 9 (i.e., US 8,885,887, US 8,965,115, US 9,008,366, US 9,111,355, US 9,165,208, and Khosla et al., “A Neuromorphic System for Video Object Recognition”; see specification [00040]), which are indicated as incorporated by reference, have been reviewed and also fail to provide sufficient description to support that the convolutional neural network scans the image as a single, whole image.
Thus, the amended independent claims 1, 11, and 21 recite the amended subject matter of, “uses a convolutional neural (CNN) which scans the whole image as a single, whole image”, which comprises an unsupported interpretation that a CNN scans the whole image in one instance as a single, whole image. 
Dependent claims 3-4, 6-7, 9-10, 13-14, 16-17, 19-20, 23-24, 26-27, and 29-30 depend upon their respective independent claims 1, 11, and 21, and thus incorporate the claimed subject matter at issue and are rejected under a similar rationale.
The claims thus contain the subject matter of “uses a convolutional neural (CNN) which scans the whole image as a single, whole image” which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Response to Arguments
Applicant's arguments filed in the Appeal Brief of 27 May 2021 have been fully considered but they are not persuasive. 
In response to Applicant’s remarks on pp. 7-9 of the Appeal Brief, that the combined teachings of Lin in view of Wang, He, and Shen fail to disclose each and every element of independent claims 1, 11, and 21, the Examiner respectfully disagrees.
Examiner notes the claims are treated with their broadest reasonable interpretations consistent with the specification. See MPEP 2111. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Furthermore, the test for obviousness is what the combined teachings of the references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).
Independent claims 1, 11, and 21 recite the subject matter, “wherein the entity processing pipeline operates on the whole image and uses a convolutional neural (CNN) which scans the whole image as a single, whole image to identify a number and type of entities in the whole image”.
Lin and Wang are relied upon to teach a system for assessing images using a deep convolutional neural network (DCNN) which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image, where the fine-grained input DCNN classifier classifies objects based on sampling the entire image (see Lin [0025], [0033]-[0034], [0053], and Fig. 5; and see Wang [0015] and [0035]). He is relied upon to teach a comparable neural network for object detection which uses a spatial pyramid pooling layer, which allows for arbitrary size and scale input images to be processed by the neural network and processes the entire input image once (see He sect. 2.2 The Spatial Pyramid Pooling Layer, sect. 4 SPP-Net for Object Detection, and Fig. 6). One of ordinary skill in the art could have applied He’s known spatial pyramid pooling technique to Lin and Wang’s fine-grained input DCNN classifier, predictably resulting in an improved fine-grained input to the DCNN classifier which would allow for processing input images of arbitrary size and scale by the neural network and processing the entire input image once.
Applicant asserts that the He reference does not use a deep neural network to scan the whole image as a single, whole image, and instead operates by generating candidate windows, resizing the image, and processing, with a deep neural network, the candidate windows to extract the feature maps from the image. Examiner disagrees with Applicant’s characterization of the He reference teachings.
He recites in sect. 4. SPP-NET for Object Detection, “Our SPP-net can also be used for object detection. We extract the feature maps from the entire image only once (possibly at multiple scales). Then we apply the spatial pyramid pooling on each candidate window of the feature maps to pool a fixed-length representation of this window (see Fig. 5). Because the time-consuming convolutions are only applied once, our method can run orders of magnitude faster.”
He depicts in Fig. 5 and further describes in sect. 3.1.5 Multi-View Testing on Feature Maps, that an entire, resized input image is processed by convolutional layers to produce feature maps, a window is applied to a portion of the feature maps, and spatial pyramid pooling is applied to the windowed portion of the feature maps to produce pooled fixed-length representation features, in which the pooled fixed-length representation features are fed into fully connected layers to compute a softmax score of the window.
As such, the SPP-Net network, taught by He, comprises convolutional neural network layers, spatial pyramid pooling layers, and fully connected layers, where the SPP-Net takes an entire image as input where the entire image is resized and processed by convolutional layers to extract feature maps from the entire image, applies candidate windows to the extracted feature maps generated from processing the entire input image by the convolutional layers, applies spatial pyramid pooling on each candidate window of the feature maps to pool a fixed-length representation of the window, and feeds the pooled fixed-length representation features into fully connected layers. 
With regard to the teachings of He in sect. 4.1 Detection Algorithm, the section recites, “We use the “fast” mode of selective search [20] to generate about 2,000 candidate windows per image. Then we resize the image such that min(w,h) = s, and extract the feature maps from the entire image. We use the SPP-net model of ZF-5 (single-size trained) for the time being. In each candidate window, we use a four-level spatial pyramid (1 x 1, 2 x 2, 3 x 3, 6 x 6, totally 50 bins) to pool the features. This generates a 12,800-d (256 x 50) representation for each window. These representations are provided to the fully-connected layers of the network. Then we train a binary linear SVM classifier for each category on these features.” That is, the SPP-net model, using a ZF-5 based neural network architecture which uses five convolutional layers (see He sect. 3.1.1. Baseline Network Architectures), is used to extract feature maps from the entire image. Although He describes using a selective search technique to generate about 2,000 candidate windows per image and resizing the image, He describes that the SPP-net model is used to extract feature maps from the image, and not from the candidate windows of the image. He further describes using a four-level spatial pyramid in each candidate window to pool the features from the feature maps, where the feature maps are understood to be extracted from the entire image. He does not teach extracting the feature maps from the 2,000 candidate windows.
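For illustration only, the pooling step quoted from He sect. 4.1 may be sketched as follows: the feature maps are computed once for the image, and each candidate window is then pooled from those same feature maps into a fixed-length vector. This is a minimal sketch under stated assumptions and not He's implementation: a small toy array stands in for the convolutional feature maps, the bin-boundary rule (floor for the start, ceiling for the end) is an assumption, and the names spp_pool_window and bins_per_channel are hypothetical.

```python
# Pyramid levels 1x1, 2x2, 3x3, 6x6, as quoted from He sect. 4.1.
PYRAMID_LEVELS = (1, 2, 3, 6)


def bins_per_channel(levels=PYRAMID_LEVELS):
    """Number of spatial bins per channel: 1 + 4 + 9 + 36 = 50."""
    return sum(n * n for n in levels)


def spp_pool_window(feature_maps, window, levels=PYRAMID_LEVELS):
    """Max-pool one window of precomputed feature maps to a fixed length.

    feature_maps: list of channels, each a 2-D list (rows x cols).
    window: (top, left, bottom, right) in feature-map coordinates,
            with bottom and right exclusive.
    """
    top, left, bottom, right = window
    h, w = bottom - top, right - left
    pooled = []
    for channel in feature_maps:
        for n in levels:
            for i in range(n):
                # Assumed bin rule: floor for the start, ceiling for the end.
                r0 = top + (i * h) // n
                r1 = top + ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0 = left + (j * w) // n
                    c1 = left + ((j + 1) * w + n - 1) // n
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


# Toy stand-in for feature maps computed once: 2 channels, 8 x 10.
feature_maps = [[[(ch + 1) * (r * 10 + c) for c in range(10)]
                 for r in range(8)] for ch in range(2)]

# Two hypothetical candidate windows of different sizes reuse the same maps.
windows = [(0, 0, 8, 10), (2, 3, 6, 9)]
vectors = [spp_pool_window(feature_maps, win) for win in windows]

print([len(v) for v in vectors])  # [100, 100]
print(256 * bins_per_channel())   # 12800
```

In the sketch, feature_maps is built once and is only read, never recomputed, when each window is pooled; both windows yield vectors of the same length (channels x 50) regardless of window size, and with 256 channels the length is the 12,800 stated in the quoted passage.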
He further recites in sect. 4.1 Detection Algorithm, “Our method can be improved by multi-scale feature extraction. We resize the image such that min(w, h) = s ∈ S = {480, 576, 688, 864, 1,200}, and compute the feature maps of conv5 for each scale. One strategy of combining the features from these scales is to pool them channel-by-channel. But we empirically find that another strategy provides better results. For each candidate window, we choose a single scale s ∈ S such that the scaled candidate window has a number of pixels closest to 224 x 224. Then we only use the feature maps extracted from this scale to compute the feature of this window. If the pre-defined scales are dense enough and the window is approximately square, our method is roughly equivalent to resizing the window to 224 x 224 and then extracting features from it. Nevertheless, our method only requires computing the feature maps once (at each scale) from the entire image, regardless of the number of candidate windows.” He’s additional teachings further support that the feature maps are computed once from the entire image for each scale, and not for each candidate window.
Thus, the combined teachings of Lin, Wang, and He would have suggested to one of ordinary skill in the art applying He’s known spatial pyramid pooling technique to Lin and Wang’s fine-grained input DCNN classifier, where, by using a spatial pyramid pooling layer, input images of arbitrary size and scale may be processed by the neural network and the entire input image may be processed by convolutional neural network layers once to extract feature maps from the entire input image, in which the feature maps are further processed to identify a number of local visual features and classify the visual features in the entire image. The combined teachings of Lin, Wang, and He therefore suggest, under the broadest reasonable interpretation, the claimed limitations of “wherein the entity processing pipeline operates on the whole image and uses a convolutional neural (CNN) which scans the whole image as a single, whole image to identify a number and type of entities in the whole image”, where He’s teachings of extracting feature maps from the entire input image by processing the entire input image once using the convolutional neural network layers suggest “uses a convolutional neural (CNN) which scans the whole image as a single, whole image”.
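For illustration only, the per-window scale selection quoted above may be sketched as follows. This is a minimal sketch under stated assumptions and not He's implementation: the function name choose_scale and the example image and window sizes are hypothetical, and the window is assumed to scale by the same factor as the image when the image is resized such that min(w, h) = s.

```python
# Candidate scales s and the target size, as quoted from He sect. 4.1.
SCALES = (480, 576, 688, 864, 1200)
TARGET_PIXELS = 224 * 224


def choose_scale(image_w, image_h, window_w, window_h, scales=SCALES):
    """Pick the scale s at which the window's pixel count is closest to 224 x 224.

    The image is assumed to be resized so that its shorter side equals s,
    and the window is assumed to scale by the same factor.
    """
    short_side = min(image_w, image_h)

    def scaled_area(s):
        factor = s / short_side
        return (window_w * factor) * (window_h * factor)

    return min(scales, key=lambda s: abs(scaled_area(s) - TARGET_PIXELS))


# Hypothetical 1000 x 800 image with two candidate windows.
print(choose_scale(1000, 800, 400, 400))  # 480
print(choose_scale(1000, 800, 200, 200))  # 864
```

The selection only determines which of the per-scale feature maps is read for a given window; it does not trigger any additional computation of feature maps for that window.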

In response to Applicant’s remarks on p. 10 of the Appeal Brief, which assert that the dependent claims 3-4, 6-7, 9-10, 13-14, 16-17, 19-20, 23-24, 26-27, and 29-30 are allowable over the cited prior art due to their dependence from independent claims 1, 11, and 21 and which rely on the comments directed towards the independent claims, the Examiner respectfully disagrees and refers to the above comments responding to Applicant’s remarks towards the independent claims in view of the combined teachings of Lin, Wang, He, and Shen.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 3, 6, 7, 9, 11, 13, 16, 17, 19, 21, 23, 26, 27, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 2016/0035078), herein Lin, in view of Wang et al. (US 2016/0140424), herein Wang, He et al. (“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”), herein He, and Shen et al. (US 2016/0140408), herein Shen.
Regarding claim 1, Lin discloses a system for scene classification, the system comprising: 
one or more processors and a memory, the memory being a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions (see Lin [0066]-[0068], where a computing device including a memory and one or more processors are used to execute instructions to implement the disclosed invention), the one or more processors perform operations of:
operating at least two parallel, independent processing pipelines on an image to generate independent results (see Lin [0025], [0033]-[0034], [0053], and Fig. 3, where in a DCNN network, networks in different columns are independent, and the inputs of the two columns may be for a global view and fine-grained view input), wherein the at least two parallel, independent processing pipelines includes both an entity processing pipeline and a whole image processing pipeline, wherein the entity processing pipeline uses a convolutional neural network (CNN) to identify a number and type of entities in the image, resulting in an entity feature space (see Lin [0025], [0034], [0053] and Fig. 3, where in a deep convolutional neural network (DCNN), extracted local image representation is used as a fine-grained view input to a column of a DCNN; see also Lin [0021], where columns of the CNN are trained to identify visual features of the inputs from an image), wherein whole image processing pipeline uses a CNN to extract visual features from the whole image, resulting in a visual feature space (see Lin [0025], [0033], [0053] and Fig. 3, where in a deep convolutional neural network (DCNN), extracted global view is used as a global image representation of an image to be used as a global component input to a column of a DCNN; see also Lin [0021], where columns of the CNN are trained to identify visual features of the inputs from an image); and
fusing the independent results of the entity and whole image processing pipelines to generate a fused scene class (see Lin [0053], where at least one layer of the first column and the second column are merged into a fully connected layer and the fully connected layer is jointly trained to classify at least one image feature; and see Lin [0029] and [0046], where the feature may be a scene of the image).
Lin does not explicitly disclose operating the at least two parallel, independent processing pipelines on a whole image, wherein the entity processing pipeline operates on the whole image and that a convolutional neural network is used to identify the number and type of entities in the whole image.
Wang teaches related and pertinent systems and methods for classifying vehicles by performing scale aware detection and performing deep CNN fine-grained image classification to classify the vehicle type (see Wang Abstract), where a uniform sampling strategy is suggested in which training images are uniformly sampled from the entire image (see Wang [0015] and [0035]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Wang’s technique of performing uniform sampling from an entire image classified by a neural network to the teachings of Lin. This modification is a use of a known technique to improve a similar system in the same way. In this instance, Lin teaches a “base” system for assessing images using a DCNN which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image; Wang teaches a “comparable” object-centric fine-grained convolutional neural network for classifying vehicle types, where a uniform sampling strategy is used which uniformly samples training images from the entire image; and one of ordinary skill in the art could have applied Wang’s known sampling technique to Lin’s fine-grained input DCNN classifier, predictably resulting in sampling from the entire image for the fine-grained input to the DCNN classifier.
Lin and Wang do not explicitly disclose wherein the entity processing pipeline uses a CNN which scans the whole image as a single, whole image. 
He teaches a related and pertinent deep neural network implementing spatial pyramid pooling for object detection (see He Abstract, and sect. 4 SPP-Net for Object Detection), where a spatial pyramid pooling layer is included in the deep neural network (see He sect. 4 SPP-Net for Object Detection, and Fig. 5), which allows for the input image to be any size and allows for arbitrary scales (see He sect. 2.2 The Spatial Pyramid Pooling Layer), and the SPP-net used for object detection extracts the feature maps from the entire image by processing the entire image using the convolutional layers only once, windows are applied to portions of the feature maps and spatial pyramid pooling is applied to the windowed portions of the feature maps to produce pooled fixed-length representation features, in which the pooled fixed-length representation features are fed into fully connected layers to compute a softmax score of the window (see He sect. 4 SPP-Net for Object Detection and Fig. 5, “We extract the feature maps from the entire image only once”; see also He sect. 3.1.5 Multi-View Testing on Feature Maps, sect. 4.1 Detection Algorithm, and Fig. 6).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply He’s technique of using a spatial pyramid pooling layer in the neural network, which allows for arbitrarily sized and scaled input images, to the teachings of Lin and Wang for performing scale aware fine-grained object classification using a neural network, where the entire image can be processed only once for the object detection. This modification is a use of a known technique to improve a similar system in the same way. In this instance, Lin and Wang teach a “base” system for assessing images using a DCNN which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image, where the fine-grained input DCNN classifier classifies objects based on sampling the entire image. He teaches a “comparable” neural network for object detection which uses a spatial pyramid pooling layer, allowing for arbitrary size and scale input images to be processed by the neural network, and processes the entire input image once. One of ordinary skill in the art could have applied He’s known spatial pyramid pooling technique to Lin and Wang’s fine-grained input DCNN classifier, where, by using a spatial pyramid pooling layer, input images of arbitrary size and scale may be processed by the neural network and the entire input image may be processed by convolutional neural network layers once to extract feature maps from the entire input image, in which the feature maps are further processed to identify a number of local visual features and classify the visual features in the entire image, predictably resulting in an improved fine-grained input to the DCNN classifier which allows for arbitrary size and scale input images to be processed by the neural network where the entire input image is processed once by the convolutional layers.
Lin, Wang, and He do not explicitly disclose electronically controlling machine behavior based on the fused scene class of the image or video.
Shen teaches a related and pertinent neural network patch aggregation technique (see Shen Abstract), where a label is generated and applied to an image for an image attribute that a neural network has output for the whole image (see Shen [0033], [0039], [0063]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Shen’s technique of applying a label corresponding to a feature of an image classified by a neural network to the teachings of Lin, Wang, and He. This modification is a use of a known technique to improve a similar system in the same way. In this instance, Lin, Wang, and He teach a “base” system for assessing images using a DCNN which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image. Shen teaches a “comparable” neural network based image attribute classification method, where a label for the corresponding classified image attribute is generated and applied to the corresponding image. One of ordinary skill in the art could have applied Shen’s known technique of generating and applying a corresponding label to a classified image in the same way to Lin, Wang, and He’s classified image using a DCNN classifier, predictably resulting in applying a label corresponding to the classified scene feature to the classified image.

Regarding claim 3, please see the above rejection of claim 1. Lin, Wang, He, and Shen disclose the system as set forth in Claim 1, wherein the entity processing pipeline identifies and segments potential object locations within the image or video and assigns a class label to each identified and segmented potential object within the image or video (see Lin [0036] and [0038], where for each input patch in the fine-grained component, a feature representation and label may be extracted).

Regarding claim 6, please see the above rejection of claim 1. Lin, Wang, He, and Shen disclose the system as set forth in Claim 1, wherein in fusing the independent results to generate the fused scene class, the visual feature space and entity feature space are combined into a single multi-dimensional combined feature, with a classifier trained on the combined feature generating the fused scene class (see Lin [0053]-[0054], where the merging of the first and second column, corresponding to the global and fine-grained view representations, into a fully connected layer, joint training the weights associated with the fully connected layer and classifying the at least one feature suggests the combination of the global and fine-grained view representations into a single combined feature and generating a fused classification).
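For illustration only, the feature-level fusion recited in claim 6 (combining the visual feature space and the entity feature space into a single combined feature that is then classified) may be sketched as follows. This is a minimal sketch under stated assumptions and reflects neither Lin's nor the Applicant's implementation: the function names, the feature values, and the weight matrix are hypothetical, and the fixed weights merely stand in for a classifier trained on the combined feature.

```python
def fuse_features(visual_features, entity_features):
    """Concatenate the two feature vectors into one combined feature."""
    return list(visual_features) + list(entity_features)


def classify(combined, weights, labels):
    """Score each scene class with a linear model and return the best label."""
    scores = [sum(w * x for w, x in zip(row, combined)) for row in weights]
    return labels[scores.index(max(scores))]


# Hypothetical inputs: a 2-d visual feature and a 3-d entity histogram.
visual = [0.2, 0.9]
entity = [1, 2, 0]
combined = fuse_features(visual, entity)

# Hypothetical weights standing in for a trained classifier (one row per class).
SCENE_LABELS = ["street", "park"]
WEIGHTS = [
    [0.0, 0.5, 0.1, 1.0, -1.0],  # "street"
    [1.0, 0.0, 0.5, 0.0, 1.0],   # "park"
]

print(len(combined))                               # 5
print(classify(combined, WEIGHTS, SCENE_LABELS))   # street
```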

Regarding claim 7, please see the above rejection of claim 1. Lin, Wang, He, and Shen disclose the system as set forth in Claim 1, wherein in fusing the independent results to generate the fused scene class, two classifiers are trained separately for each of the visual feature space and entity feature space to generate independent class probability distributions over scene types, with the independent class probability distributions being combined to generate the fused scene class (see Lin [0051], where a probability of each input being assigned a class for a particular feature and the results being averaged to determine the highest class to be selected suggests a classifier for each global and fine-grained view representations are trained to generate a probability distribution and the corresponding probability results for a class for a feature are combined to determine a fused feature classification).
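For illustration only, the decision-level fusion recited in claim 7 (two independent class probability distributions over scene types being combined) may be sketched as follows. This is a minimal sketch under stated assumptions: averaging is used as the combination rule, consistent with the averaging of results noted from Lin [0051], and the function name, scene types, and probability values are hypothetical.

```python
def fuse_distributions(p_visual, p_entity):
    """Average two probability distributions defined over the same scene types."""
    return [(a + b) / 2 for a, b in zip(p_visual, p_entity)]


# Hypothetical outputs of two separately trained classifiers.
SCENE_TYPES = ["street", "park", "indoor"]
p_visual = [0.5, 0.3, 0.2]
p_entity = [0.2, 0.6, 0.2]

fused = fuse_distributions(p_visual, p_entity)
fused_label = SCENE_TYPES[fused.index(max(fused))]
print(fused_label)  # park
```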

Regarding claim 9, please see the above rejection of claim 1. Lin, Wang, He, and Shen disclose the system as set forth in Claim 1, wherein electronically controlling machine behavior includes at least one of labeling data associated with the image or video with the fused scene class, displaying the fused scene class with the image or video, controlling vehicle performance, or controlling processor performance (see Shen [0033], [0039], [0063], where a label is generated and applied to an image for an image attribute that a neural network has output for the image).

Regarding claim 11, it recites a computer program product performing the system functions of claim 1. Lin, Wang, He, and Shen teach a computer program product performing the system functions of claim 1 (see Lin [0066]-[0069]). Please see above for detailed claim analysis.

Regarding claim 13, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 3. Please see above claim 3 for detailed claim analysis as the limitations of claim 13 are similarly rejected.

Regarding claim 16, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 6. Please see above claim 6 for detailed claim analysis as the limitations of claim 16 are similarly rejected.

Regarding claim 17, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 7. Please see above claim 7 for detailed claim analysis as the limitations of claim 17 are similarly rejected.

Regarding claim 19, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 9. Please see above claim 9 for detailed claim analysis as the limitations of claim 19 are similarly rejected.

Regarding claim 21, it recites a computer implemented method performing the system functions of claim 1. Lin, Wang, He, and Shen teach the method by performing the system functions of claim 1. Please see above for detailed claim analysis.

Regarding claim 23, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 3. Please see above claim 3 for detailed claim analysis as the limitations of claim 23 are similarly rejected.

Regarding claim 26, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 6. Please see above claim 6 for detailed claim analysis as the limitations of claim 26 are similarly rejected.

Regarding claim 27, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 7. Please see above claim 7 for detailed claim analysis as the limitations of claim 27 are similarly rejected.

Regarding claim 29, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 9. Please see above claim 9 for detailed claim analysis as the limitations of claim 29 are similarly rejected.

Claims 4, 14, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Lin, Wang, He, and Shen as applied to claims 1, 11, and 21 above, and further in view of Karpathy et al. (“Large-scale Video Classification with Convolutional Neural Networks”), herein Karpath.
Regarding claim 4, please see the above rejection of claim 1. Lin, Wang, He, and Shen do not explicitly disclose the system as set forth in Claim 1, wherein the entity feature space includes a bag of words histogram feature. 
Karpath teaches a related and pertinent technique for video classification with convolutional neural networks (see Karpath Abstract), where bag-of-words histograms are known to be used as features for convolutional neural network based video classification (see Karpath sect. 4.1 Experiments on Sports-1M, Feature histogram baselines).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Karpath’s technique of extracting bag-of-words histogram features from the image to be classified by a neural network to the teachings of Lin, Wang, He, and Shen. This modification is a use of a known technique to improve a similar system in the same way. In this instance, Lin, Wang, He, and Shen teach a “base” system for assessing images using a DCNN which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image; Karpath teaches a “comparable” neural network based video classification method, where bag-of-words histogram features are extracted from video images to be classified; and one of ordinary skill in the art could have applied Karpath’s known technique of extracting and using bag-of-words histogram features in the same way to Lin, Wang, He, and Shen’s DCNN classifier, predictably resulting in extracting bag-of-words histogram features for the image patches of the fine-grained view input to classify a scene feature.

Regarding claim 14, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 4. Please see above claim 4 for detailed claim analysis as the limitations of claim 14 are similarly rejected.

Regarding claim 24, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 4. Please see above claim 4 for detailed claim analysis as the limitations of claim 24 are similarly rejected.

Claims 10, 20, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Lin, Wang, He, and Shen as applied to claims 1, 11, and 21 above, and further in view of Tasdizen et al. (US 2017/0228616), herein Tasdizen.
Regarding claim 10, please see the above rejection of claim 1. Lin, Wang, He, and Shen do not explicitly disclose the system as set forth in Claim 1, further comprising an operation of displaying the image or video with a label that includes the fused scene class.
Tasdizen teaches, in a related and pertinent image classification system and method (see Tasdizen Abstract), a display module that displays label annotations on a display device corresponding to classification outputs for an input image generated by the classifiers (see Tasdizen [0032] and [0072]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply Tasdizen’s technique of displaying an applied label corresponding to a feature of an image classified by an image classifier to the teachings of Lin, Wang, He, and Shen. This modification is a use of a known technique to improve a similar system in the same way. In this instance, Lin, Wang, He, and Shen teach a “base” system for assessing images using a DCNN which uses two independent columns for a global and fine-grained view input to classify a scene feature for a corresponding image and applying a corresponding scene feature label to the image; Tasdizen teaches a “comparable” image classification method in which an image is displayed with an applied label for a corresponding classified image attribute; and one of ordinary skill in the art could have applied Tasdizen’s known technique of displaying an image with an applied label for a corresponding classified image attribute in the same way to the image classified by Lin, Wang, He, and Shen’s DCNN classifier, predictably resulting in displaying the classified image together with the applied scene feature label corresponding to the classified scene feature.

Regarding claim 20, see above rejection for claim 11. It is a computer program product claim reciting similar subject matter as claim 10. Please see above claim 10 for detailed claim analysis as the limitations of claim 20 are similarly rejected.
Regarding claim 30, see above rejection for claim 21. It is a method claim reciting similar subject matter as claim 10. Please see above claim 10 for detailed claim analysis as the limitations of claim 30 are similarly rejected.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI, whose telephone number is (571) 270-3814. The examiner can normally be reached from 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/TIMOTHY CHOI/

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661