DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 12, and 23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 9, 10, 11, 12, 19, 20, 21, 22, 23, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US-20160098844-A1; hereinafter Shaji) in view of Dhua et al. (US-10176198-B1; hereinafter Dhua), Lu et al. ("Rapid: …"; hereinafter Lu), and Gupta et al. (US-20160335120-A1; hereinafter Gupta).
Regarding Claim 1,
Shaji (US 20160098844 A1) teaches a system for training a machine learning model, the system comprising: one or more hardware processors configured by machine-readable instructions to: 
select a set of training images for a machine learning model (para [0059] In some embodiments of the present invention, the neural network is trained by a process that comprises sampling a neural-net training set having a plurality of training images, wherein the sampling includes forming triplets (Im1+, Im2+, Im3-) where Im1+ and Im2+ are members of the neural-net training set that are annotated as positive aesthetic images and Im3- is a member of the neural-net training set that is annotated as a negative aesthetic image.); 
extract stylistic features from each training image to generate a stylistic feature tensor for each training image (para [0010] In some embodiments, the method further comprises extracting manually selected features from the image, wherein the manually selected features are indicative of aesthetic quality. In these embodiments, the method further comprises encoding the extracted manually selected features into a high-dimensional feature vector. In some embodiments, the neural network includes a fully connected layer which takes the encoded manually selected feature vector as an additional input; and para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.); 
determine an engagement metric for each training image, the engagement metric corresponding to a performance score (para [0017] The processor is further configured to apply a machine-learned model to assign an aesthetic score to the image, wherein a more aesthetically-pleasing image is given a higher aesthetic score and a less aesthetically-pleasing image is given a lower aesthetic score. The learned features are inputs to the machine-learned model.); 
train a neural network comprising a plurality of nodes arranged in a plurality of sequential layers including an input layer (fig. 8; para [0072] In one embodiment, the stack of convolution filters 802 includes two layers of convolution filters, with the first layer being the input of the second layer.) and an output layer downstream from the input layer (fig. 8; para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.), wherein training the neural network includes: 
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network … (para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.), 
Shaji does not explicitly disclose:
extract object features from each training image to generate an object tensor for each training image; 
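propagating information included in the object tensor for each training image through each layer of the neural network including the input layer; and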
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer;
… wherein the layers of the neural network comprise at least a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image.
However, Dhua teaches:
extract object features from each training image to generate an object tensor for each training image (Col. 7 lines 26-39 Embodiments of the present invention can use the penultimate layer of the CNN as the feature vector. As discussed above, the CNN can be trained for object recognition, that is, this network is trained to recognize specific objects, types of scenes, or similar subject matter. Examples of objects that this network is trained to recognize may include people, faces, cars, boats, airplanes, buildings, fruits, vases, birds, animals, furniture, clothing etc. As discussed herein, a subject may include one or more objects which define a particular type of scene. For example, subjects may include landscapes, cityscapes, portraits, night skies, or other subject matter The object feature vector may indicate the object affinity of a given image (e.g., how similar an object depicted in an image is to a trained object)); 
propagating information included in the object tensor for each training image through each layer of the neural network including the input layer (Col. 6 lines 7-21; There is an input layer which along with a set of adjacent layers forms the convolution portion of the network. The bottom layer of the convolution layer along with a lower layer and an output layer make up the fully connected portion of the network. From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options. CNN is trained on a similar data set (which includes people, faces, cars, boats, airplanes, buildings, landscapes, fruits, vases, birds, animals, furniture, clothing etc.), so it learns the best feature representation of a desired object represented for this type of image. The trained CNN is used as a feature extractor: input image is passed through the network and intermediate outputs of layers can be used as feature descriptors of the input image.); and 
… wherein the neural network comprises a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image (Col. 7 lines 49-59 Embodiments of the present invention can use a classification score generated by the classification layer of the CNN to generate a local feature weight and an object recognition weight. The classification score generated by the CNN indicates how close the subject of the query image is to an object the CNN has been trained to identify. As such, high scores correspond to a high likelihood that the subject of the query image is one or more specific objects, whereas low scores indicate that the subject of the query image is likely not an object or is an object that the CNN has not been trained to identify. Col. 10 line 44 – Col. 11 line 32 As shown above, when the CNN classifier calculates a high confidence score (indicating a high likelihood of a specific object or subject being depicted in the query image)…For example, a very high confidence score (such as a score that is greater than 0.95) may cause the results to be automatically filtered, whereas a lower confidence score, such as 0.75 or 0.8, may cause the filter option to be displayed to the user.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of predicting an appeal of an image of Shaji et al. with the method of object detection of Dhua et al.
Doing so would allow for identifying objects in images a user is interested in (Col. 3 lines 11-15 In determining which recommendations to provide, it can be desirable in at least some embodiments to determine content that is likely to be viewed and/or objects that are likely to be consumed by a user based at least in part upon information known for the user.).
Lu teaches:
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer (Fig. 2; Pg. 460, section 2.1; The second convolutional layer filters the output of the first convolutional layer with 64 kernels of the size 5 × 5 × 64. Each of the third and forth convolutional layers has 64 kernels of the size 3×3×64, and the two fully-connected layers have 1000 and 256 neurons respectively. Suppose for the input patch Ip of the i-th image, we have the feature representation xi extracted from layer fc256 (the outcome of the convolutional layers and the fc1000 layers)… The size of the output features is based on the size of the kernel. The outputs of the convolutional layers are known as feature maps, which are not explicitly disclosed by Lu but are taught by Gupta. See Fig. 2. Pg. 461, section 2.2; We take the two 256 × 1 vectors from each of the fc256 layer and jointly train the weights of the final fully-connected layer. Figure 4 shows the style attribute and Figure 3 shows the structure of the CNN corresponding to the style attribute.); and 
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the implementation of a CNN of Shaji with the implementation of a CNN of Lu.
	Doing so would allow for improving the accuracy of the aesthetic categorization (pg. 457; In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy.).
	Gupta (US 20160335120 A1) teaches:
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network... (para [0076] "oclBase" specifies the environment to be used in order to reach the FPGA platform and the kernel to be used. "conv_tensor_in" is a handle to the input tensor. For the first convolution layer operation, the input tensor includes the image to be classified. "conv_filter0" specifies the weights of the convolution filter kernel. "conv_tensor_out0" is a handle to the output tensor. The output tensor, after completion of the first convolution layer operation, contains the convolution layer operation output, i.e., the feature maps resulting from the convolution layer operation performed on the input image.).
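	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the implementation of a CNN of Shaji with the implementation of a CNN of Gupta.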
	Doing so would allow for the implementation of CNNs on FPGAs (para [0026] Once programmed into the FPGA (152) the kernel may be run-time configurable, thus allowing the parameterization of kernel behavior at run-time, without requiring reprogramming of the kernel into the FPGA. The library of kernels may include various kernels configured to perform different operations. One kernel may, for example, be configured to implement one or more operations of a convolutional neural network, whereas another kernel may be configured to perform matrix multiplications, recurrent neural networks, convolutions or de-convolutions, etc. Those skilled in the art will recognize that the kernel library is not limited to the above operations.).
Regarding Claim 8,
Shaji et al., Dhua et al., Gupta, and Lu teach the system of claim 1. Dhua et al. further teaches wherein the machine-readable instructions that cause the one or more hardware processors to extract the object features from each training image further comprise instructions to: 
propagate data corresponding to each training image through an object detection neural network comprising an input layer, a plurality of intermediate layers, and an output layer (Col. 6 lines 5-14 Different layers of the network can be composed for different purposes, such as convolution and sub-sampling. There is an input layer which along with a set of adjacent layers forms the convolution portion of the network. The bottom layer of the convolution layer along with a lower layer and an output layer make up the fully connected portion of the network. From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options.); and 
extract outputs from at least one of the plurality of intermediate layers of the object detection neural network (Col. 6 lines 11-22 From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options. CNN is trained on a similar data set (which includes people, faces, cars, boats, airplanes, buildings, landscapes, fruits, vases, birds, animals, furniture, clothing etc.), so it learns the best feature representation of a desired object represented for this type of image. The trained CNN is used as a feature extractor: input image is passed through the network and intermediate outputs of layers can be used as feature descriptors of the input image.).
Regarding Claim 9,
Shaji et al., Dhua et al., Gupta, and Lu teach the system of claim 1. Dhua et al. further teaches wherein the one or more hardware processors are further configured by machine-readable instructions to: 
extract scene features from each training image to generate a scene tensor for each training image (Fig. 3A and 3C; Col. 7 lines 26-39 Embodiments of the present invention can use the penultimate layer of the CNN as the feature vector. As discussed above, the CNN can be trained for object recognition, that is, this network is trained to recognize specific objects, types of scenes, or similar subject matter. Examples of objects that this network is trained to recognize may include people, faces, cars, boats, airplanes, buildings, fruits, vases, birds, animals, furniture, clothing etc. As discussed herein, a subject may include one or more objects which define a particular type of scene. For example, subjects may include landscapes, cityscapes, portraits, night skies, or other subject matter The object feature vector may indicate the object affinity of a given image (e.g., how similar an object depicted in an image is to a trained object).); 
propagate information included in the concatenated tensor for each training image through each layer of the neural network including the input layer of the neural network (Col. 6 lines 5-14 and Col. 6 lines 11-22 From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options. CNN is trained on a similar data set (which includes people, faces, cars, boats, airplanes, buildings, landscapes, fruits, vases, birds, animals, furniture, clothing etc.), so it learns the best feature representation of a desired object represented for this type of image. The trained CNN is used as a feature extractor: input image is passed through the network and intermediate outputs of layers can be used as feature descriptors of the input image.).
Shaji et al. teaches:
concatenate the object tensor and the scene tensor for each training image to generate a concatenated tensor for each training image (para [0061] In other embodiments, the high-dimensional feature vector is reduced and then concatenated with the learned features generated from executing the neural network. In this embodiment, the concatenated features are then fed as input into the machine-learned model.); 
Regarding Claim 10,
Shaji et al., Dhua et al., Gupta, and Lu teach the system of claim 1, wherein the one or more hardware processors are further configured by machine-readable instructions to: 
Dhua et al. further teaches: extract at least one of a set of intensity features, a set of color features, a set of composition features, a set of contrast features, and a set of blurriness features from each training image (Col. 14 lines 33-41 For a given apparel item, a color classifier and a pattern classifier can be trained to produce classification confidences. If a confidence score associated with the pattern classifier is higher than a confidence score associated with the color classifier, then the pattern features can be more highly weighted and the weight applied to the color features can be reduced, resulting, for example, with related items that include color variations for the same or similar pattern.); 
Shen et al. further teaches propagate the at least one of the set features from each training image through a second subset of the layers of the neural network not including the input layer to further train the neural network (para [0063]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of predicting an appeal of an image of Shen et al. with the color features of Dhua et al.
Doing so would allow visually similar images to be identified across a variety of different visual characteristics (Col. 3 lines 45-49 The similarity of different types of features may be weighted differently to provide visually similar images that are similar across a variety of different visual characteristics, such as color theme and distribution, brushwork, etc.).
Regarding Claim 11,
Shaji et al., Dhua et al., Gupta, and Lu teach the system of claim 1. Shaji et al. further teaches wherein the one or more hardware processors are further configured by machine-readable instructions to: 
identify a candidate image (para [0013] In some embodiments, the neural network is trained by a process comprising sampling a neural-net training set having a plurality of training images, wherein the sampling includes forming triplets (Im1+, Im2+, Im3-) where Im1+ and Im2+ are members of the neural-net training set that are annotated as positive aesthetic images and Im1- is a member of the neural-net training set that is annotated as a negative aesthetic image.); 
propagate data corresponding to the candidate image through the neural network to determine a performance score for the candidate image, subsequent to training the neural network (para [0019] The method also includes, for each image in the set of images, assigning a score to the image, wherein the score is based on user interactions with the image and an aesthetic score of the image.).
Regarding Claim 12,
Shaji teaches a method for training a machine learning model, the method comprising:
selecting a set of training images for a machine learning model (para [0059] In some embodiments of the present invention, the neural network is trained by a process that comprises sampling a neural-net training set having a plurality of training images, wherein the sampling includes forming triplets (Im1+, Im2+, Im3-) where Im1+ and Im2+ are members of the neural-net training set that are annotated as positive aesthetic images and Im3- is a member of the neural-net training set that is annotated as a negative aesthetic image.); 
extracting stylistic features from each training image to generate a stylistic feature tensor for each training image (para [0010] In some embodiments, the method further comprises extracting manually selected features from the image, wherein the manually selected features are indicative of aesthetic quality. In these embodiments, the method further comprises encoding the extracted manually selected features into a high-dimensional feature vector. In some embodiments, the neural network includes a fully connected layer which takes the encoded manually selected feature vector as an additional input; and para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.); 
determining an engagement metric for each training image, the engagement metric corresponding to a performance score (para [0017] The processor is further configured to apply a machine-learned model to assign an aesthetic score to the image, wherein a more aesthetically-pleasing image is given a higher aesthetic score and a less aesthetically-pleasing image is given a lower aesthetic score. The learned features are inputs to the machine-learned model.); 
training a neural network comprising a plurality of nodes arranged in a plurality of sequential layers including an input layer (fig. 8; para [0072] In one embodiment, the stack of convolution filters 802 includes two layers of convolution filters, with the first layer being the input of the second layer.) and an output layer downstream from the input layer (fig. 8; para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.), wherein training the neural network includes: 
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network, … (para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.), 
Shaji does not explicitly disclose:
extracting object features from each training image to generate an object tensor for each training image; 
propagating information included in the object tensor for each training image through each layer of the neural network including the input layer; and 
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer;
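wherein the neural network comprises a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image.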
However, Dhua teaches:
extracting object features from each training image to generate an object tensor for each training image (Col. 7 lines 26-39 Embodiments of the present invention can use the penultimate layer of the CNN as the feature vector. As discussed above, the CNN can be trained for object recognition, that is, this network is trained to recognize specific objects, types of scenes, or similar subject matter. Examples of objects that this network is trained to recognize may include people, faces, cars, boats, airplanes, buildings, fruits, vases, birds, animals, furniture, clothing etc. As discussed herein, a subject may include one or more objects which define a particular type of scene. For example, subjects may include landscapes, cityscapes, portraits, night skies, or other subject matter The object feature vector may indicate the object affinity of a given image (e.g., how similar an object depicted in an image is to a trained object)); 
propagating information included in the object tensor for each training image through each layer of the neural network including the input layer (Col. 6 lines 7-21; There is an input layer which along with a set of adjacent layers forms the convolution portion of the network. The bottom layer of the convolution layer along with a lower layer and an output layer make up the fully connected portion of the network. From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options. CNN is trained on a similar data set (which includes people, faces, cars, boats, airplanes, buildings, landscapes, fruits, vases, birds, animals, furniture, clothing etc.), so it learns the best feature representation of a desired object represented for this type of image. The trained CNN is used as a feature extractor: input image is passed through the network and intermediate outputs of layers can be used as feature descriptors of the input image.); and 
wherein the neural network comprises a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image (Col. 7 lines 49-59 Embodiments of the present invention can use a classification score generated by the classification layer of the CNN to generate a local feature weight and an object recognition weight. The classification score generated by the CNN indicates how close the subject of the query image is to an object the CNN has been trained to identify. As such, high scores correspond to a high likelihood that the subject of the query image is one or more specific objects, whereas low scores indicate that the subject of the query image is likely not an object or is an object that the CNN has not been trained to identify. Col. 10 lines 44 – Col. 11 lines 32 As shown above, when the CNN classifier calculates a high confidence score (indicating a high likelihood of a specific object or subject being depicted in the query image)…For example, a very high confidence score (such as a score that is greater than 0.95) may cause the results to be automatically filtered, whereas a lower confidence score, such as 0.75 or 0.8, may cause the filter option to be displayed to the user.).
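It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of predicting an appeal of an image of Shaji et al. with the method of object detection of Dhua et al.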
Doing so would allow for identifying objects in images a user is interested in (Col. 3 lines 11-15 In determining which recommendations to provide, it can be desirable in at least some embodiments to determine content that is likely to be viewed and/or objects that are likely to be consumed by a user based at least in part upon information known for the user.).
Lu teaches:
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer (Fig. 2; Pg. 460, section 2.1; The second convolutional layer filters the output of the first convolutional layer with 64 kernels of the size 5 × 5 × 64. Each of the third and forth convolutional layers has 64 kernels of the size 3×3×64, and the two fully-connected layers have 1000 and 256 neurons respectively. Suppose for the input patch Ip of the i-th image, we have the feature representation xi extracted from layer fc256 (the outcome of the convolutional layers and the fc1000 layers)… The size of the output features is based on the size of the kernel. The outputs of the convolutional layers are known as feature maps, which are not explicitly disclosed by Lu but are taught by Gupta. See Fig. 2. Pg. 461, section 2.2; We take the two 256 × 1 vectors from each of the fc256 layer and jointly train the weights of the final fully-connected layer. Figure 4 shows the style attribute and Figure 3 shows the structure of the CNN corresponding to the style attribute.); and 
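	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the implementation of a CNN of Shaji with the implementation of a CNN of Lu.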
	Doing so would allow for improving the accuracy of the aesthetic categorization (pg. 457; In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy.).
	Gupta (US 20160335120 A1) teaches:
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network... (para [0076] "oclBase" specifies the environment to be used in order to reach the FPGA platform and the kernel to be used. "conv_tensor_in" is a handle to the input tensor. For the first convolution layer operation, the input tensor includes the image to be classified. "conv_filter0" specifies the weights of the convolution filter kernel. "conv_tensor_out0" is a handle to the output tensor. The output tensor, after completion of the first convolution layer operation, contains the convolution layer operation output, i.e., the feature maps resulting from the convolution layer operation performed on the input image.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the implementation of a CNN of Shaji with the implementation of a CNN of Gupta.
	Doing so would allow for the implementation of CNNs on FPGAs (para [0026] Once programmed into the FPGA (152) the kernel may be run-time configurable, thus allowing the parameterization of kernel behavior at run-time, without requiring reprogramming of the kernel into the FPGA. The library of kernels may include various kernels configured to perform different operations. One kernel may, for example, be configured to implement one or more operations of a convolutional neural network, whereas another kernel may be configured to perform matrix multiplications, recurrent neural networks, convolutions or de-convolutions, etc. Those skilled in the art will recognize that the kernel library is not limited to the above operations.).
Regarding Claim 19,
Claim 19 is the method corresponding to the system of claim 1. Claim 19 is substantially similar to claim 8 and is rejected on the same grounds.
Regarding Claim 20,
Claim 20 is the method corresponding to the system of claim 1. Claim 20 is substantially similar to claim 9 and is rejected on the same grounds.
Regarding Claim 21,
Claim 21 is the method corresponding to the system of claim 1. Claim 21 is substantially similar to claim 10 and is rejected on the same grounds.
Regarding Claim 22,
Claim 22 is the method corresponding to the system of claim 1. Claim 22 is substantially similar to claim 11 and is rejected on the same grounds.
Regarding Claim 23,
Shaji teaches a non-transitory computer-readable storage medium having instructions embodied thereon, the instructions being executable by one or more processors to perform a method for training a machine learning model, the method comprising: 
selecting a set of training images for a machine learning model (para [0059] In some embodiments of the present invention, the neural network is trained by a process that comprises sampling a neural-net training set having a plurality of training images, wherein the sampling includes forming triplets (Im1+, Im2+, Im3-) where Im1+ and Im2+ are members of the neural-net training set that are annotated as positive aesthetic images and Im3- is a member of the neural-net training set that is annotated as a negative aesthetic image.); 
extracting stylistic features from each training image to generate a stylistic feature tensor for each training image (para [0010] In some embodiments, the method further comprises extracting manually selected features from the image, wherein the manually selected features are indicative of aesthetic quality. In these embodiments, the method further comprises encoding the extracted manually selected features into a high-dimensional feature vector. In some embodiments, the neural network includes a fully connected layer which takes the encoded manually selected feature vector as an additional input; and para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.); 
determining an engagement metric for each training image, the engagement metric corresponding to a performance score (para [0017] The processor is further configured to apply a machine-learned model to assign an aesthetic score to the image, wherein a more aesthetically-pleasing image is given a higher aesthetic score and a less aesthetically-pleasing image is given a lower aesthetic score. The learned features are inputs to the machine-learned model.); 
training a neural network comprising a plurality of nodes arranged in a plurality of sequential layers including an input layer (fig. 8; para [0072] In one embodiment, the stack of convolution filters 802 includes two layers of convolution filters, with the first layer being the input of the second layer.) and an output layer downstream from the input layer (fig. 8; para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.), wherein training the neural network includes: 
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network,… (para [0072] In a preferred embodiment, manually selected aesthetic features are inputs to the last layer of the fully connected stack 822.).
Shaji does not explicitly disclose:
extracting object features from each training image to generate an object tensor for each training image;
propagating information included in the object tensor for each training image through each layer of the neural network including the input layer; and
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer;
wherein the neural network comprises a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image.
However, Dhua teaches
extracting object features from each training image to generate an object tensor for each training image (Col. 7 lines 26-39 Embodiments of the present invention can use the penultimate layer of the CNN as the feature vector. As discussed above, the CNN can be trained for object recognition, that is, this network is trained to recognize specific objects, types of scenes, or similar subject matter. Examples of objects that this network is trained to recognize may include people, faces, cars, boats, airplanes, buildings, fruits, vases, birds, animals, furniture, clothing etc. As discussed herein, a subject may include one or more objects which define a particular type of scene. For example, subjects may include landscapes, cityscapes, portraits, night skies, or other subject matter The object feature vector may indicate the object affinity of a given image (e.g., how similar an object depicted in an image is to a trained object)); 
	propagating information included in the object tensor for each training image through each layer of the neural network including the input layer (Col. 6 lines 7-21; There is an input layer which along with a set of adjacent layers forms the convolution portion of the network. The bottom layer of the convolution layer along with a lower layer and an output layer make up the fully connected portion of the network. From the input layer, a number of output values can be determined from the output layer, which can include several items determined to be related to an input item, among other such options. CNN is trained on a similar data set (which includes people, faces, cars, boats, airplanes, buildings, landscapes, fruits, vases, birds, animals, furniture, clothing etc.), so it learns the best feature representation of a desired object represented for this type of image. The trained CNN is used as a feature extractor: input image is passed through the network and intermediate outputs of layers can be used as feature descriptors of the input image.); and 
…wherein the neural network comprises a classification layer to determine probabilities for each of a plurality of ranges of performance scores for a candidate image (Col. 7 lines 49-59 Embodiments of the present invention can use a classification score generated by the classification layer of the CNN to generate a local feature weight and an object recognition weight. The classification score generated by the CNN indicates how close the subject of the query image is to an object the CNN has been trained to identify. As such, high scores correspond to a high likelihood that the subject of the query image is one or more specific objects, whereas low scores indicate that the subject of the query image is likely not an object or is an object that the CNN has not been trained to identify. Col. 10 lines 44 – Col. 11 lines 32 As shown above, when the CNN classifier calculates a high confidence score (indicating a high likelihood of a specific object or subject being depicted in the query image)…For example, a very high confidence score (such as a score that is greater than 0.95) may cause the results to be automatically filtered, whereas a lower confidence score, such as 0.75 or 0.8, may cause the filter option to be displayed to the user.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of predicting the appeal of an image of Shaji et al. with the method of object detection of Dhua et al.
Doing so would allow for identifying objects in images a user is interested in (Col. 3 lines 11-15 In determining which recommendations to provide, it can be desirable in at least some embodiments to determine content that is likely to be viewed and/or objects that are likely to be consumed by a user based at least in part upon information known for the user.).
Lu teaches
selecting, based on a size of the stylistic feature tensor for the set of training images, a subset of the plurality of sequential layers not including the input layer (Fig. 2; Pg. 460, section 2.1; The second convolutional layer filters the output of the first convolutional layer with 64 kernels of the size 5 × 5 × 64. Each of the third and forth convolutional layers has 64 kernels of the size 3×3×64, and the two fully-connected layers have 1000 and 256 neurons respectively. Suppose for the input patch Ip of the i-th image, we have the feature representation xi extracted from layer fc256 (the outcome of the convolutional layers and the fc1000 layers)… The size of the output features is based on the size of the kernel. The outputs of the convolutional layers are known as feature maps, which are not explicitly disclosed by Lu; however, they are taught by Gupta. See figure 2. Pg. 461, section 2.2; We take the two 256 × 1 vectors from each of the fc256 layer and jointly train the weights of the final fully-connected layer. Figure 4 shows the style attribute and figure 3 shows the structure of the CNN corresponding to the style attribute.); and
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the implementation of a CNN of Shaji with the implementation of a CNN of Lu.
	Doing so would allow for improving the accuracy of the aesthetic categorization (pg. 457; In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy.).
	Gupta (US 20160335120 A1) teaches
propagating information included in the stylistic feature tensor for each training image through the subset of the plurality of sequential layers of the neural network... (para [0076] "oclBase" specifies the environment to be used in order to reach the FPGA platform and the kernel to be used. "conv_tensor_in" is a handle to the input tensor. For the first convolution layer operation, the input tensor includes the image to be classified. "conv_filter0" specifies the weights of the convolution filter kernel. "conv_tensor_out0" is a handle to the output tensor. The output tensor, after completion of the first convolution layer operation, contains the convolution layer operation output, i.e., the feature maps resulting from the convolution layer operation performed on the input image.).

Doing so would allow for the implementation of CNNs on FPGAs (para [0026] Once programmed into the FPGA (152) the kernel may be run-time configurable, thus allowing the parameterization of kernel behavior at run-time, without requiring reprogramming of the kernel into the FPGA. The library of kernels may include various kernels configured to perform different operations. One kernel may, for example, be configured to implement one or more operations of a convolutional neural network, whereas another kernel may be configured to perform matrix multiplications, recurrent neural networks, convolutions or de-convolutions, etc. Those skilled in the art will recognize that the kernel library is not limited to the above operations.).
Regarding Claim 30,
Claim 30 is the computer-readable storage medium corresponding to the system of claim 1. Claim 30 is substantially similar to claim 8 and is rejected on the same grounds.

Claims 2, 13, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US-20160098844-A1; hereinafter Shaji) in view of Dhua et al. (US-10176198-B1; hereinafter Dhua), Lu et al. ("Rapid: Rating pictorial aesthetics using deep learning."; hereinafter Lu), Gupta et al. (US 20160335120 A1; Gupta), and Lan et al. (US-20190080176-A1; hereinafter Lan).
Regarding Claim 2,

However, Lan et al. (US 20190080176 A1) teaches 
wherein the layers of the neural network further comprise a regression layer, downstream from the classification layer (para [0038] The classification layer sometimes may also be referred to as a SoftMax layer in the neural network for the classification task.), to determine a first performance score based on the probabilities determined by the classification layer (para [0063] As shown, thirty-five features 712 from the last layer 318 of the feature learning sub-network 310 or the FC layer 332 of the regression sub-network 330 (if being included) are arranged in a matrix 710 with each row including seven features 712. Each row of features is multiplied by one of the five probabilities).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network for image classification of Shaji with the neural network for image classification of Lan et al.
Doing so would allow for improved accuracy and efficiency (para [0019] Instead, a smaller amount of information characterizing the entity or entities in the frames may be extracted to train the model, which will help improve the accuracy and efficiency of the training process.).
Regarding Claim 13,

Regarding Claim 24,
Claim 24 is the computer-readable storage medium corresponding to the system of claim 1. Claim 24 is substantially similar to claim 2 and is rejected on the same grounds.
Claims 3, 14, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US-20160098844-A1; hereinafter Shaji) in view of Dhua et al. (US-10176198-B1; hereinafter Dhua), Lu et al. ("Rapid: Rating pictorial aesthetics using deep learning."; hereinafter Lu), Gupta et al. (US 20160335120 A1; Gupta), Rae et al. (US-20120030711-A1), and Dawson (US-8494897-B1).
Regarding Claim 3,
Shaji et al., Dhua et al., Gupta, and Lu teach the system of claim 1. 
Shaji et al., Dhua et al., Gupta, and Lu do not explicitly disclose identify a target audience; determine a first web-based property associated with the target audience, the first web-based property including a plurality of images, based on engagement of the target audience with the plurality of images; determine a plurality of additional web-based properties associated with the target audience, the plurality of additional web-based properties each including a respective additional plurality of images; identify a subset of web-based properties from among the first web-based property and the additional web-based properties that are most uniquely visited by the target audience; select the set of training images from among the images included in the subset of web-based properties.
However, Rae et al. teaches
identify a target audience (para [0021] In one embodiment, a system or method to predict media preferences may be employed to predict photos of interest to one or more users. This may, for example, be of assistance in navigating large media repositories, which are available via the Internet, for example. In accordance with claimed subject matter, in at least one embodiment, a decision process that is at least partially based on one or more statistical processes may be applied to determine a classification for media content, such as an image or photo.);
determine a first web-based property associated with the target audience, the first web-based property including a plurality of images, based on engagement of the target audience with the plurality of images (para [0020] Social media sharing websites typically allow individuals to share a photo or image collection with others having similar interests. Users may spend hours searching, exploring or viewing photos of interest. Users might, for example, post photos to various groups, tag photos of others, provide ratings, comment on photos of interest, or mark a photo as a "Favorite." Marking a photo as a Favorite may operate as a bookmark, allowing for fast access in the future, for example. Millions of photos are typically uploaded to social media sharing websites every day. Well-known examples of social media sharing sites include the following: the Flickr.RTM. website, the Picasa Web.RTM. website and the Facebook.RTM. website. Flickr is a web-based property.);
determine a plurality of additional web-based properties associated with the target audience, the plurality of additional web-based properties each including a respective additional plurality of images (para [0020] Social media sharing websites typically allow individuals to share a photo or image collection with others having similar interests. Users may spend hours searching, exploring or viewing photos of interest. Users might, for example, post photos to various groups, tag photos of others, provide ratings, comment on photos of interest, or mark a photo as a "Favorite." Marking a photo as a Favorite may operate as a bookmark, allowing for fast access in the future, for example. Millions of photos are typically uploaded to social media sharing websites every day. Well-known examples of social media sharing sites include the following: the Flickr.RTM. website, the Picasa Web.RTM. website and the Facebook.RTM. website. Picasa and Facebook are additional web-based properties.);
identify a subset of web-based properties from among the first web-based property and the additional web-based properties that are most uniquely visited by the target audience (para [0059] A graph provided in FIG. 3 illustrates a sampled distribution of the number of Favorites per user for users of the Flickr.RTM. website who use Favorites, for example. The graph is provided on a log-log scale. The x-axis represents ten thousand users, equally sampled from an ordered list of Flickr.RTM. website users that have collected Favorite photos, sorted by descending number of Favorites. Flickr is a subset of the web-based properties Picasa, Facebook, and Flickr.);
select the set of training images from among the images included in the subset of web-based properties (para [0059] Rather than employing all or most images within a social image website to train a classifier, a representative subset may be employed.).

Doing so would allow for classification of interested users (para [0056] Likewise, a classifier may be applied to a group of users, rather than to individual users. A classifier intended to reflect preferences for an individual may more conveniently weigh recent information more heavily than older information, for example, so that passage of time may be included in making predictions.).
Dawson (US 8494897 B1) teaches
identify a subset of web-based properties from among the first web-based property and the additional web-based properties that are most uniquely visited by the target audience (col. 39 lines 55-59; That means the list of important categories is chosen by analyzing user trails and identifying all websites which are most visited by toolbar users.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of classifying websites of Rae with the method of classifying websites of Dawson.
	Doing so would allow for categorization of websites (col. 39 lines 21-25; For classification of an uncategorized website, the system may use a similarity distance between the uncategorized website and its related websites, if any related site data exists for the website in a related links database.).
Regarding Claim 14,

Regarding Claim 25,
Claim 25 is the computer-readable storage medium corresponding to the system of claim 1. Claim 25 is substantially similar to claim 3 and is rejected on the same grounds.
Claims 4, 5, 15, 16, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US-20160098844-A1; hereinafter Shaji) in view of Dhua et al. (US-10176198-B1; hereinafter Dhua), Lu et al. ("Rapid: Rating pictorial aesthetics using deep learning."; hereinafter Lu), Gupta et al. (US 20160335120 A1; Gupta), and Rae et al. (US-20120030711-A1).
Regarding Claim 4,
Shaji et al., Dhua et al., Gupta, Lu, and Rae et al. teach the system of claim 3. Rae et al. further teaches wherein the one or more hardware processors are further configured by machine-readable instructions to: 
determine a respective visual influence metric for each of the first web-based property and the plurality of additional web-based properties (para [0030] Favorite images may tend to be visually pleasing, e.g., sharp, high quality, etc. Also, users may focus on a small number of topics of interest, like children, cityscape, nature, or portraits. Likewise, favorite photos may be posted in a group a user is subscribed to, or was taken by a contact in a user's social network. Therefore, visual features may be dependent at least in part on the photo.); 
identify the subset of web-based properties based on the respective visual influence metrics (para [0031] Images may be obtained by mining records of previous behavior in which images, for example, are tagged in some way to indicate some as having been of interest. Information to be used in this matter may, for example, be stored at an online social or photo sharing website. A user need not give any explanation as to why a particular image is preferred, typically. Extracted features may be related to textual or social context, for example. Likewise, visual or image-related features may also be extracted. Visual features may include: color layout descriptor, dominant color descriptor, color or edge directivity descriptor, or edge histogram, for example. Other categories of visual features may include temporal or geographical features, for example, may also be employed).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of predicting engaging images of Shaji et al. with the visual influence metrics of Rae et al.
Doing so would allow for determining images users are not interested in (para [0032] Features of non-favored images may also be employed for training. Images may, for example, be identified by a user, for example, as disfavored images. Gradations in favored or non-favored status are possible and intended to be included with the scope of claimed subject matter.).
Regarding Claim 5,
Shaji et al., Dhua et al., Gupta, Lu, and Rae et al. teach the system of claim 3. Rae et al. further teaches wherein at least one web-based property of the subset of web-based properties comprises a social media account (para [0020] Social media sharing websites typically allow individuals to share a photo or image collection with others having similar interests… Well-known examples of social media sharing sites include the following: the Flickr.RTM. website, the Picasa Web.RTM. website and the Facebook.RTM. website).
Regarding Claim 15,
Claim 15 is the method corresponding to the system of claim 3. Claim 15 is substantially similar to claim 4 and is rejected on the same grounds.
Regarding Claim 16,
Claim 16 is the method corresponding to the system of claim 3. Claim 16 is substantially similar to claim 5 and is rejected on the same grounds.
Regarding Claim 26,
Claim 26 is the computer-readable storage medium corresponding to the system of claim 3. Claim 26 is substantially similar to claim 4 and is rejected on the same grounds.
Regarding Claim 27,
Claim 27 is the computer-readable storage medium corresponding to the system of claim 3. Claim 27 is substantially similar to claim 5 and is rejected on the same grounds.
Claims 6, 7, 17, 18, 28, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US-20160098844-A1; hereinafter Shaji) in view of Dhua et al. (US-10176198-B1; hereinafter Dhua), Lu et al. ("Rapid: Rating pictorial aesthetics using deep learning."; hereinafter Lu), Gupta et al. (US 20160335120 A1; Gupta), Rae et al. (US-20120030711-A1), and Yamamoto et al. (US-20190245925-A1).
Regarding Claim 6,
Shaji et al., Dhua et al., Gupta, Lu, and Rae et al. teach the system of claim 3.
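Shaji et al., Dhua et al., Gupta, Lu, and Rae et al. do not explicitly disclose wherein the one or more hardware processors are further configured by machine-readable instructions to: normalize the engagement metric for each training image based on a size of its audience.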
However, Yamamoto teaches 
normalize the engagement metric for each training image based on a size of its audience (para [0256] When the total page views of the blog and the number of unique users who have accessed the blog are obtained from the managing DB 53 as the information for calculating the popularity degree index, a value that becomes a source for calculating the popularity degree index is obtained for each of the total number of page views of the blog and the number of unique users who have accessed the blog. For example, a value (a normalized value) obtained by normalizing the total number of page views of the blog into 1 (low popularity) to 100 (high popularity) may be calculated, and a value (a normalized value) obtained by normalizing the number of unique users who have accessed the blog into 1 to 100 may be calculated. That is, executed in the process in the step S403 are a process of adding the normalized value of the total number of page views of the blog and the normalized value of the number of unique users who have accessed the blog, and a process of calculating an average value.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the engagement metric of Shaji et al. with the normalization of an engagement metric of Yamamoto et al.
Doing so would allow for the popularity of a web-based property to be calculated (para [0256] Accordingly, the popularity degree index of the blog subjected to the process is calculated.).
Regarding Claim 7,
Shaji et al., Dhua et al., Gupta, Lu, Rae et al., and Yamamoto teach the system of claim 6. Rae et al. further teaches wherein the one or more hardware processors are further configured by machine-readable instructions to normalize the engagement metric for each training image based on engagement of the audience with the plurality of images included in the respective web-based property over time (para [0056] A classifier intended to reflect preferences for an individual may more conveniently weigh recent information more heavily than older information, for example, so that passage of time may be included in making predictions. Additionally, a user's change in taste over time may be accommodated. While a unique classifier per user may yield greater accuracy in predicting which candidate images might be of interest, likewise, a classifier for a group of individuals may be employed to adjust as time passes as well.).
Regarding Claim 17,
Claim 17 is the method corresponding to the system of claim 3. Claim 17 is substantially similar to claim 6 and is rejected on the same grounds.
Regarding Claim 18,
Claim 18 is the method corresponding to the system of claim 6. Claim 18 is substantially similar to claim 7 and is rejected on the same grounds.
Regarding Claim 28,
Claim 28 is the computer-readable storage medium corresponding to the system of claim 3. Claim 28 is substantially similar to claim 6 and is rejected on the same grounds.
Regarding Claim 29,

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY NGUYEN/Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121