DETAILED ACTION
Response to Amendment
Claims 1-20 are pending. Claims 1-20 are amended.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. In particular, the examiner agrees that Zhao and Lalonde do not disclose the limitation “semantically correlated”; however, this limitation is now taught by Lee et al. It is noted that semantic tags are also taught by the art listed in the references cited section of the conclusion. The examiner therefore recommends another route to distinguish over the known prior art.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 8, 9, 11, 12, and 15-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“Compositing-aware image search”) in view of Lalonde et al. (“Using color compatibility for assessing image realism”) and Lee et al. (US 20200074707 A1).

Regarding claim 1, Zhao et al. disclose a non-transitory computer-readable storage medium having instructions stored thereon for generating context-aware images, which, when executed by a processor of a computing device cause the computing device to perform actions comprising (carry our experiments on the public platform Caffe [20], part 5): receiving an object (bottom stream takes the foreground image, part 3.1, Given an image with object masks, part 4, we insert every possible foreground candidate at that location to generate the composite, 100-400 candidate foreground objects, part 5.1); determining a scene compatibility score for the object and a scene image, wherein the scene compatibility score for the object and the scene image is based on an object classification of the object and a set of image tags associated with the scene image (compatibility of a foreground and background image can be easily measured using the cosine similarity between their corresponding feature vectors, part 1, encourage the feature vectors from compatible foreground and background images to be more similar than those from incompatible pairs, learned features can be directly used to calculate the similarity in terms of compatibility for image compositing, part 3, use image mean values to fill the rectangle that indicates the location where the object to be inserted, so that the information regarding desired object location, size and aspect ratio can be provided to the network., encode the category information into the foreground and background streams, top stream focuses more on scene context, part 3.1, Utilizing those mask annotations, we can decompose the image into background scenes and foreground objects, part 4); determining a color compatibility score for the object and the scene image, wherein the color compatibility score for the object and the scene image is based on a color theme of an embedded image that 



Lalonde et al. teach receiving an object (replacing one of its objects by another one of the same semantic type and shape, select object, part 2); determining a scene compatibility score for the object and a scene image, wherein the scene compatibility score for the object and the scene image is based on an object classification of the object and a set of image tags associated with the scene image (texture matching between images, if a tree matches a particular forest scene, it might be more important to look for similar forest images, which typically have very consistent color palettes, than for green buildings which might exhibit different shades of green and still look realistic, part 3.3); determining a color compatibility score for the object and the scene image, wherein the color compatibility score for the object and the scene image is based on a color theme of an embedded image that includes the object embedded in the scene image (modeling the palette that is likely to co-occur together with a particular color in a real image, part 3.2, find a set of k most similar-looking objects based on color (k-NN), and approximate its expected co-occurring palette by the best- matching background in this k set, part 3.3); determining an overall compatibility score for the object and the scene image, the overall compatibility score for the object and the scene image being based on a combination of the scene compatibility score and the color compatibility score (To evaluate the distance between 

Zhao et al. and Lalonde et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract). The combination of Lalonde et al. with Zhao et al. will enable use of a color compatibility score. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the color score of Lalonde et al. with the invention of Zhao et al. because Zhao et al. indicate color is known to be an important parameter in the composition (abstract), this would have been known at the time of the invention, the combination would have predictable results, and Lalonde et al. indicate “our goal is to use color information to automatically predict whether a composite image such as the ones in Figure 1 will look realistic or not to a human observer” and “Finally, from the intuitions gathered while evaluating the global and local approaches, we suggest a way of combining them into a single classifier. Section 5 presents this combined approach and compares it to using either technique by itself. As an additional application, we show in Section 6 how to automatically shift the colors of an unrealistic object to make it look more realistic in its new scene” (part 1.3), which indicates a further adaptation to Zhao et al. to achieve the goal of detecting and creating realistic embedded images.

Zhao et al. and Lalonde et al. do not disclose each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image.

Lee et al. teach determining a scene compatibility score for the object and a scene image, wherein the scene compatibility score for the object and the scene image is based on an object classification of the object and a set of image tags associated with the scene image, wherein each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image (“In one embodiment, training engine 122 generates machine learning models for inserting objects into scenes.  The scenes may include semantic representations of images, such as segmentation maps that associate individual pixels in the images with semantic labels.  For example, a segmentation map of an outdoor scene may include regions of pixels that are assigned to labels such as "road," "sky," "building," "bridge," "tree," "ground," "car," and "pedestrian." In turn, machine learning models created by training engine 122 may be used to identify plausible locations of an object in the scene, as well as a plausible sizes and shapes of the objects at the locations.  In various embodiments, the machine learning models may learn "where" in a scene objects can be inserted, as well as "what" the objects look like, such that the object maintains contextual coherence with the scene.”, [0018], “Joint synthesis and placement of objects into scenes may involve combined learning of the location and scale of each object in a given scene, as well as the shapes of each object given the corresponding location and scale.  Continuing with the above example, execution engine 124 may apply a first generator model generated by training engine 122 to a semantic representation of the outdoor scene to identify plausible locations in the scene into which cars, pedestrians, and/or other types of objects can be inserted”, [0019], “For example, a 

Zhao et al. and Lalonde et al. and Lee et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract; Lee et al., [0019]). The combination of Lee et al. with Zhao et al. and Lalonde et al. will enable use of semantic tags. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the tags of Lee et al. with the invention of Zhao et al. and Lalonde et al. because this was known at the time of the invention, the combination would have predictable results, and Lee et al. indicate this allows for inserting objects into images in a realistic and/or semantically plausible way with applications in image synthesis, augmented reality, virtual reality, and/or domain randomization in machine learning ([0002], [0020]), which will further create visual congruity in the final images when used in conjunction with Zhao et al. and Lalonde et al.
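
For illustration only, the scoring flow discussed for claim 1 (a scene compatibility score measured by cosine similarity between feature vectors, a color compatibility score, and an overall score based on a combination of the two) could be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Zhao et al., Lalonde et al., Lee et al., or the application: the function names, the plain-list feature vectors, the precomputed color scores, and the weighted-sum combination with `scene_weight` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as incompatible
    return dot / (norm_u * norm_v)


def overall_score(scene_score, color_score, scene_weight=0.5):
    """Hypothetical combination: a weighted sum of the two scores."""
    return scene_weight * scene_score + (1.0 - scene_weight) * color_score


def rank_candidates(background_vec, candidates, scene_weight=0.5):
    """Rank candidate objects for one scene image, best first.

    candidates: list of (name, foreground_vec, color_score) tuples, where
    color_score is assumed to be computed elsewhere.
    """
    scored = []
    for name, foreground_vec, color_score in candidates:
        scene_score = cosine_similarity(background_vec, foreground_vec)
        scored.append((name, overall_score(scene_score, color_score, scene_weight)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under these assumptions, calling `rank_candidates` with one background vector and several candidate objects returns the candidates ordered by overall score, so the first entry corresponds to the object whose overall compatibility score is at least as large as the others.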

Regarding claim 2, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium of claim 1. Zhao et al. and Lalonde et al. further indicate the actions further comprising: generating a plurality of embedded images that includes a visual representation of the object embedded in a plurality of scene images (Zhao et al., Fig. 1, inserting a person on the lawn, part 4, evaluation the search results in terms of the compositing quality, part 5.4; Lalonde et al. automatically generate composite images that will look right semantically, Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, retrieve either a nearest scene, recolor the object to match the colors of similar objects in that nearest scene, automatically improve the realism of composite images, part 6); determining a ranking of the plurality of embedded images based on respective overall compatibility scores for the plurality 

Regarding claims 4 and 11, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium and method of claims 1 and 8. Zhao et al. and Lalonde et al. further indicate the actions further comprise: determining a position, within the scene image, for the object; determining an orientation, within the scene image, for the object; determining a scale, within the scene image, for the object; generating the embedded image by embedding a visual representation of the object at the position within the scene image, wherein the visual representation of the object is based on the orientation and the scale (By including the filled rectangle in the background image, the learned background features can respond to the location, size and aspect ratio of the object to be inserted when measuring compatibility, part 4, We draw a bounding box on each of the background image in appropriate position that is suitable for object insertion, part 5.1, search results are tuned to location and aspect ratio of the bounding box, Fig. 6).

Regarding claims 5 and 12 and 18, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium and method and system of claims 4 and 11 and 15. Zhao et al. and Lee et al. further indicate a trained reinforcement learning agent is employed to determine the position, the orientation, or the scale for the object (Zhao, training data, triplet preparation, introduce the data augmentation process to relax the size and scale constraints between paired foreground and background images to a limited extent. augmenting with more positive foreground samples, For semantic context information, since those foreground images are generated from the ones with background scenes, we can fill in the background of those 

Regarding claim 8, Zhao et al. disclose a method (abstract) for generating context-aware images, comprising: steps for receiving a scene image (given a background image, returns compatible foreground objects, abstract, Given a background image as a query, Fig. 1, top stream takes the background scene as input, part 3.1) steps for determining a scene compatibility score for the scene image and an object, wherein the scene compatibility score for the scene image and the object is based on a set of image tags associated with the scene image (compatibility of a foreground and background image can be easily measured using the cosine similarity between their corresponding feature vectors, part 1, encourage the feature vectors from compatible foreground and background images to be more similar than those from incompatible pairs, learned features can be directly used to calculate the similarity in terms of compatibility for image compositing, part 3, use image mean values to fill the rectangle that indicates the location where the object to be inserted, so that the information regarding desired object location, size and aspect ratio can be provided to the network., encode the category information into the foreground and background streams, top stream focuses more on scene 

While Zhao et al. indicate using color as a possible parameter for the matching and multiclass ranking and performing a ranking, and show in Fig. 1 example resulting embedded images, Zhao et al. do not explicitly disclose a color compatibility score and steps for providing the embedded image, in response to determining that the overall compatibility score is at least as large as other overall compatibility scores for the scene image and other objects. Zhao et al. do not explicitly disclose wherein each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image.

Lalonde et al. teach steps for receiving a scene image (forest scene, part 3.3, real scenes, part 4, retrieve either a nearest scene (if the global method is used), or the determination that no matching global scene is available, part 6) steps for determining a scene compatibility score for the scene image and an object, wherein the scene compatibility score for the scene image and the object is based on a set of image tags associated with the scene image (texture matching between images, if a tree matches a particular forest scene, it might be more important to look for similar forest images, which typically have very consistent color palettes, than for green buildings which might exhibit different shades of green and still look realistic, part 3.3);  steps for determining a color compatibility score for the scene image and the object, wherein the color compatibility score for the scene image and the object is based on a color theme of an embedded image that includes the object embedded in the scene image (modeling the palette that is likely to co-occur together with a particular color in a real image, part 3.2, find a set of k most similar-looking objects based on color (k-NN), and approximate its expected co-occurring 

Zhao et al. and Lalonde et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract). The combination of Lalonde et al. with Zhao et al. will enable use of a color compatibility score. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the color score of Lalonde et al. with the invention of Zhao et al. because Zhao et al. indicate color is known to be an important parameter in the composition (abstract), this would have been known at the time of the invention, the combination would have predictable results, and Lalonde et al. indicate “our goal is to use color information to automatically predict whether a composite image such as the ones in Figure 1 will look realistic or not to a human observer” and “Finally, from the intuitions gathered while evaluating the global and local approaches, we suggest a way of combining them into a single classifier. Section 5 presents this 

Zhao et al. and Lalonde et al. do not disclose each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image.

Lee et al. teach the scene compatibility score for the scene image and the object is based on a set of image tags associated with the scene image, wherein each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image (“In one embodiment, training engine 122 generates machine learning models for inserting objects into scenes.  The scenes may include semantic representations of images, such as segmentation maps that associate individual pixels in the images with semantic labels.  For example, a segmentation map of an outdoor scene may include regions of pixels that are assigned to labels such as "road," "sky," "building," "bridge," "tree," "ground," "car," and "pedestrian." In turn, machine learning models created by training engine 122 may be used to identify plausible locations of an object in the scene, as well as a plausible sizes and shapes of the objects at the locations.  In various embodiments, the machine learning models may learn "where" in a scene objects can be inserted, as well as "what" the objects look like, such that the object maintains contextual coherence with the scene.”, [0018], “Joint synthesis and placement of objects into scenes may involve combined learning of the location and scale of each object in a given scene, as well as the shapes of each object given the corresponding 

Zhao et al. and Lalonde et al. and Lee et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract; Lee et al., [0019]). The combination of Lee et al. with Zhao et al. and Lalonde et al. will enable use of semantic tags. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the tags of Lee et al. with the invention of Zhao et al. and Lalonde et al. because this was known at the time of the invention, the combination would have predictable results, and Lee et al. indicate this allows for inserting objects into images in a realistic and/or semantically plausible way with applications in image synthesis, augmented reality, virtual reality, and/or domain randomization in machine learning ([0002], [0020]), which will further create visual congruity in the final images when used in conjunction with Zhao et al. and Lalonde et al.

Regarding claim 9, Zhao et al. and Lalonde et al. and Lee et al. disclose the method of claim 8. Zhao et al. and Lalonde et al. further indicate generating candidate embedded images with each candidate embedded image including a visual representation of one object of the plurality of objects and the scene image (Zhao et al., Fig. 1, inserting a person on the lawn, part 4, evaluation the search results in terms of the compositing quality, part 5.4; Lalonde et al. automatically generate composite images that will look right semantically, Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, retrieve either a nearest scene, recolor the object to match the colors of similar objects in that nearest scene, automatically improve the realism of composite images, part 6); determining respective overall compatibility scores of the candidate embedded images; determining a ranking of the candidate embedded images based on the respective overall compatibility scores (Zhao et al., foreground is considered compatible with the background if they roughly match in terms of semantics, viewpoint, style, color, etc., the learned features can adaptively encode the most important compatibility factors, project the 

Regarding claim 15, Zhao et al. disclose a computing system for generating context-aware images (carry our experiments on the public platform Caffe [20], part 5), comprising: a processor device; and a computer-readable storage medium, coupled with the processor device, having instructions stored thereon, which, when executed by the processor device, perform actions comprising: receiving an object (bottom stream takes the foreground image, part 3.1, Given an image with object masks, part 4, we insert every possible foreground candidate at that location to generate the composite, 100-400 candidate foreground objects, part 5.1) and a scene image (given a background image, returns compatible foreground objects, abstract, Given a background image as a query, Fig. 1, top stream takes the background scene as input, part 3.1); determining a scene compatibility score for the object and the scene image, wherein the scene compatibility score for the object and the scene image is based on an object classification of the object and a set of image tags associated with the scene image (compatibility of a foreground and background image can be easily measured using the cosine similarity between their corresponding feature vectors, part 1, encourage the feature vectors from compatible foreground and background images to be more similar than those from incompatible pairs, learned features can be directly used to calculate the similarity in terms of compatibility for image compositing, part 3, use image mean values to fill the rectangle that indicates the location where the object to be inserted, so that the information regarding desired object location, size and aspect ratio can be provided to the network, encode the category information into the foreground and background streams, top stream focuses more on scene context, part 3.1, Utilizing those mask annotations, we can decompose the image into background scenes and foreground objects, part 4); determining a color compatibility score for the object and the scene image, wherein the color compatibility score for the object and the scene image is based on an embedded image that includes a color-based visual representation 

While Zhao et al. indicate using color as a possible parameter for the matching and multiclass ranking and performing a ranking, and show in Fig. 1 example resulting embedded images, Zhao et al. do not make explicit providing the embedded image based on a combination of the scene compatibility score and the color compatibility score. Zhao et al. do not disclose each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image.

Lalonde et al. teach receiving an object (replacing one of its objects by another one of the same semantic type and shape, select object, part 2) and a scene image (forest scene, part 3.3, real scenes, part 4, retrieve either a nearest scene (if the global method is used), or the determination that no matching global scene is available, part 6); determining a color compatibility score for the object and the scene image, wherein the color compatibility score for the object and the scene image is based on a color of the plurality of colors and an embedded image that includes a visual representation of the object embedded in the scene image, 

Zhao et al. and Lalonde et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract). The combination of Lalonde et al. with Zhao et al. will enable use of a color compatibility score. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the color score of Lalonde et al. with the invention of Zhao et al. because Zhao et al. indicate color is known to be an important parameter in the composition (abstract), this would have been known at the time of the invention, the combination would have predictable results, and Lalonde et al. indicate “our goal is to use color information to automatically predict whether a composite image such as the ones in Figure 1 will look realistic or not to a human observer” and “Finally, from the intuitions gathered while evaluating the global and local approaches, we suggest a way of combining them into a single classifier. Section 5 presents this combined approach and compares it to using either technique by itself. As an additional application, we show in Section 6 how to automatically shift the colors of an unrealistic object 

Zhao et al. and Lalonde et al. do not disclose each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image.

Lee et al. teach determining a scene compatibility score for the object and the scene image, wherein the scene compatibility score for the object and the scene image is based on an object classification of the object and a set of image tags associated with the scene image, wherein each tag of the set of image tags is semantically correlated with physical or virtual items that are visually depicted within the scene image (“In one embodiment, training engine 122 generates machine learning models for inserting objects into scenes.  The scenes may include semantic representations of images, such as segmentation maps that associate individual pixels in the images with semantic labels.  For example, a segmentation map of an outdoor scene may include regions of pixels that are assigned to labels such as "road," "sky," "building," "bridge," "tree," "ground," "car," and "pedestrian." In turn, machine learning models created by training engine 122 may be used to identify plausible locations of an object in the scene, as well as a plausible sizes and shapes of the objects at the locations.  In various embodiments, the machine learning models may learn "where" in a scene objects can be inserted, as well as "what" the objects look like, such that the object maintains contextual coherence with the scene.”, [0018], “Joint synthesis and placement of objects into scenes may involve combined learning of the location and scale of each object in a given scene, as well as the shapes of each object given the corresponding location and scale.  Continuing with the above example, execution engine 124 may apply a first generator model generated by training engine 122 to a 

Zhao et al. and Lalonde et al. and Lee et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract; Lee et al., [0019]). The combination of Lee et al. with Zhao et al. and Lalonde et al. will enable use of semantic tags. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the tags of Lee et al. with the invention of Zhao et al. and Lalonde et al. because this was known at the time of the invention, the combination would have predictable results, and Lee et al. indicate this allows for inserting objects into images in a realistic and/or semantically plausible way with applications in image synthesis, augmented reality, virtual reality, and/or domain randomization in machine learning ([0002], [0020]), which will further create visual congruity in the final images when used in conjunction with Zhao et al. and Lalonde et al.

Regarding claim 16, Zhao et al. and Lalonde et al. and Lee et al. disclose the system of claim 15. Zhao et al. and Lalonde et al. further indicate the actions further comprising: determining a plurality of colors in the scene image, generating based on the respective colors of the plurality of colors respective embedded images (Zhao et al., Fig. 1, inserting a person on the lawn, part 4, evaluation the search results in terms of the compositing quality, part 5.4; Lalonde et al., automatically generate composite images that will look right semantically, Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, expected color palette, part 3.2, local color statistics, part 4, retrieve either a nearest scene, recolor the object to match the colors of similar objects in that nearest scene, automatically improve the realism of composite images, part 6); determining respective color compatibility scores of the respective embedded images, determining a ranking of the respective embedded images based on the respective color compatibility scores, and providing a plurality of top-ranked embedded images of the respective embedded images (Zhao et al., foreground is considered compatible with the background if they roughly match in terms of semantics, viewpoint, style, color, etc., the learned features can adaptively encode the most important compatibility factors, project the features to a common 

Regarding claim 17, Zhao et al. and Lalonde et al. and Lee et al. disclose the computing system of claim 15. Zhao et al. further indicate the actions further comprising: determining a position, within the scene image, for the object; determining an orientation, within the scene image, for the object; determining a scale, within the scene image, for the object; generating the embedded image by embedding the color-based visual representation of the object at the determined position within the scene image, wherein the color-based visual representation of the object is based on the orientation and the scale (By including the filled rectangle in the background image, the learned background features can respond to the location, size and aspect ratio of the object to be inserted when measuring compatibility, part 4, We draw a bounding box on each of the background image in appropriate position that is suitable for object insertion, part 5.1, search results are tuned to location and aspect ratio of the bounding box, Fig. 6) [a color-based representation taught in claim 15 by Lalonde et al.]

Claims 3 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“Compositing-aware image search”) and Lalonde et al. (“Using color compatibility for assessing image realism”) and Lee et al. (US 20200074707 A1) as applied to claims 1 and 8 above, further in view of van Zwol et al. (US 20110072025 A1).

Regarding claims 3 and 10, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium and method of claims 1 and 8, and Lalonde et al. further partially teach the scene compatibility score is further based on a co-occurrence probability for the object classification and each image tag included in the set of image tags (estimate color palette co-occurences, part 1.3, manually group objects that have similar labels, and end up with the following 15 most frequently-occurring objects in the dataset, part 2, modeling the 

van Zwol et al. teach the scene compatibility score is further based on a co-occurrence probability for the object classification and each image tag included in the set of image tags (For a particular content entity, then, related topic entities may be ranked using the probability of co-occurrence of the pairs. Returning to the above examples, if a probability of co-occurrence of a content entity-topic entity pair "London-Big Ben" (or "london,bigben" as a pair of tags in the normalized form) in the vocabulary of a photo annotation corpus Flickr® is higher that a probability of co-occurrence of a content entity-topic entity pair "London-Buckingham Palace" (or "london,buckinghampalace" as a pair of tags in the normalized form), then the topic entity "Big Ben" may be ranked higher than the topic entity "Buckingham Palace" in the listing of the returned search results for the query "London.", [0042]).
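
For illustration only, ranking related tags by a probability of co-occurrence, as in the "london,bigben" example quoted above, could be sketched as follows. This is a minimal sketch under stated assumptions and is not van Zwol et al.'s implementation: the function names, the toy corpus, and the estimate used (pair count divided by the count of items carrying the query tag) are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_counts(tagged_items):
    """Count single tags and ordered tag pairs over a corpus of tag lists."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_items:
        unique = set(tags)  # ignore duplicate tags on a single item
        tag_counts.update(unique)
        pair_counts.update(permutations(unique, 2))
    return tag_counts, pair_counts


def rank_related(tag, tag_counts, pair_counts):
    """Rank other tags by the estimated P(other | tag), highest first."""
    total = tag_counts[tag]
    if total == 0:
        return []
    probabilities = {
        other: count / total
        for (first, other), count in pair_counts.items()
        if first == tag
    }
    # Sort by descending probability, then alphabetically for stable ties.
    return sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus of normalized tags.
corpus = [
    ["london", "bigben"],
    ["london", "bigben"],
    ["london", "buckinghampalace"],
]
tag_counts, pair_counts = cooccurrence_counts(corpus)
related = rank_related("london", tag_counts, pair_counts)
```

In this toy corpus "bigben" co-occurs with "london" more often than "buckinghampalace" does, so it is ranked first, mirroring the ordering described in the quoted passage.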

Zhao et al. and Lalonde et al. and van Zwol et al. are in the same art of searching images/labels (Zhao et al., abstract; Lalonde et al., part 2; van Zwol et al., abstract). The combination of van Zwol et al. with Zhao et al. and Lalonde et al. and Lee et al. will enable use of a probability of co-occurrence. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the co-occurrence probability of van Zwol et al. with the invention of Zhao et al. and Lalonde et al. and Lee et al. as this was known at the time of the invention, the combination would have predictable results, and as van Zwol et al. indicate “it may be desirable for search engine systems to employ one or more processes to rank web documents, files, or search results to assist a user in presenting .

Claims 6 and 13 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“Compositing-aware image search”) and Lalonde et al. (“Using color compatibility for assessing image realism”) and Lee et al. (US 20200074707 A1) as applied to claims 1 and 8 and 15 above, further in view of Ludwigsen (US 9514381 B1).

Regarding claims 6 and 13, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium and method of claims 1 and 8. Lalonde et al. further indicate the actions further comprise: determining a color effect, based on one or more colors visually depicted within the scene image, to apply to a visual representation of the object; determining a texture effect, based on one or more textures visually depicted within the scene image, to apply to the visual representation of the object; determining a lighting effect, based on one or more lighting conditions visually depicted within the scene image, to apply to the visual representation of the object; and generating the embedded image by embedding the visual representation of the object within the scene image, wherein the color effect, the texture effect, and the lighting effect are applied to the visual representation of the object (Difference in scene lighting between the source and destination images is one important consideration, Lalonde et al. [10] presented a datadriven technique for image compositing that uses a coarse illumination context descriptor to find scenes with similar lighting in a large database, part 1.2, Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, making an object match its background by shifting its colors to make them closer to the background colors, The same 

Ludwigsen et al. teach embedding the visual representation of the object within the scene image, wherein the determined lighting effect is applied to the visual representation of the object (area detection and replacement in an image includes identifying an object or area in one or more sequential images that form a moving image sequence and replacing some or all of the identified object or areas with another image such that the image looks to be part of the original composition of the original image including lighting, shadows, placement, occlusion, orientation, position, and deformation, abstract, Once the new image or video is chosen, the image 51 snaps to the size of the identified object or area 52, and the system and method makes adjustments such as lighting, shadows, placement, occlusion, orientation, position, and deformation, but not limited to, that have been previously identified to the inserted image, col. 5, lines 45-55).

Zhao et al. and Lalonde et al. and Ludwigsen et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract; Ludwigsen et al., abstract). The combination of Ludwigsen et al. with Zhao et al. and Lalonde et al. and Lee et al. will enable use of adding a lighting effect. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the lighting effect of Ludwigsen et al. with the invention of Zhao et al. and Lalonde et al. and Lee et al. as this was known at the time of the invention, the combination would have predictable results, and as Ludwigsen et al. indicate this makes the composite image look like the original composition 

Regarding claim 19, Zhao et al. and Lalonde et al. and Lee et al. disclose the computing system of claim 15. Lalonde et al. further indicate determining a texture effect, based on one or more textures visually depicted within the scene image, to apply to the color-based visual representation of the object; determining a lighting effect, based on one or more lighting conditions visually depicted within the scene image, to apply to the color-based visual representation of the object; and generating the embedded image by embedding the color-based visual representation of the object within the scene image, wherein the determined texture effect and the lighting effect are applied to the color-based visual representation of the object (Difference in scene lighting between the source and destination images is one important consideration, Lalonde et al. [10] presented a datadriven technique for image compositing that uses a coarse illumination context descriptor to find scenes with similar lighting in a large database, part 1.2, Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, modeling the palette that is likely to co-occur together with a particular color in a real image, part 3.2, making an object match its background by shifting its colors to make them closer to the background colors, The same intuition of using texture as mentioned in the previous section also applies here, and improves performance as well, part 4, Automatic Image Recoloring, find a set of k most similar-looking objects based on color (k-NN), and approximate its expected co-occurring palette by the best-matching background in this k set, part 3.3, local color statistics, part 4, Automatic Image Recoloring to match the colors, part 6). However, as Lalonde et al. do not describe how the 

Ludwigsen et al. teach embedding the color-based visual representation of the object within the scene image, wherein the determined lighting effect is applied to the color-based visual representation of the object (area detection and replacement in an image includes identifying an object or area in one or more sequential images that form a moving image sequence and replacing some or all of the identified object or areas with another image such that the image looks to be part of the original composition of the original image including lighting, shadows, placement, occlusion, orientation, position, and deformation, abstract, Once the new image or video is chosen, the image 51 snaps to the size of the identified object or area 52, and the system and method makes adjustments such as lighting, shadows, placement, occlusion, orientation, position, and deformation, but not limited to, that have been previously identified to the inserted image, col. 5, lines 45-55).

Zhao et al. and Lalonde et al. and Ludwigsen et al. are in the same art of composite images (Zhao et al., abstract; Lalonde et al., abstract; Ludwigsen et al., abstract). The combination of Ludwigsen et al. with Zhao et al. and Lalonde et al. and Lee et al. will enable use of adding a lighting effect. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the lighting effect of Ludwigsen et al. with the invention of Zhao et al. and Lalonde et al. and Lee et al. as this was known at the time of the invention, the combination would have predictable results, and as Ludwigsen et al. indicate this makes the composite image look like the original composition (abstract), indicating a way to achieve even more realistic images when combined into the image embedding process of Zhao et al. and Lalonde et al. and Lee et al.

Claims 7 and 14 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“Compositing-aware image search”) and Lalonde et al. (“Using color compatibility for assessing image realism”) and Lee et al. (US 20200074707 A1) as applied to claims 1 and 8 and 15 above, further in view of Slaney et al. (US 20110029561 A1).

Regarding claims 7 and 14, Zhao et al. and Lalonde et al. and Lee et al. disclose the computer-readable storage medium and method of claims 1 and 8. Zhao et al. and Lalonde et al. further indicate the actions further comprise: determining a feature vector of the color theme; determining a rating of the color theme based on the feature vector; and determining the color compatibility score based on the rating of the color theme (Zhao et al., compatibility scores can be easily measured using the cosine similarity, abstract, compatibility of a foreground and background image can be easily measured using the cosine similarity between their corresponding feature vectors, part 1, multiclass fine-grained ranking problem, encourage the feature vectors from compatible foreground and background images to be more similar than those from incompatible pairs, part 3, unit feature vector for background and foreground respectively, which encodes both the category information and image content, part 3.1, Since the feature vectors have unit length after ℓ2 normalization, we can easily calculate their similarity using squared ℓ2 distance, part 3.2, MAP scores shown in the tables are all in percentage, part 5.1, use the realism score predicted by the Realism-CNN to rank all the candidates, part 5.3; Lalonde et al., Once the best matching object is found, we paste it onto the original image and apply linear feathering along the border to mask out potential seams, part 2, find a set of k most similar-looking objects based on color (k-NN), and approximate its expected co-occurring palette by the best-matching background in this k set, part 3.3, local 

Slaney et al. teach determining a feature vector of the color theme; determining a rating of the color theme based on the feature vector; and determining the color compatibility score based on the rating of the color theme (in one embodiment of the invention, an image's feature vector includes a subsection that indicates numerical values that are indicative of visual qualities of the image (e.g., number of colors, color palette), [0059], in order to rank a set of images in their degree of similarity to a user-chosen image, the feature vectors of each of those images is normalized, and then those images are similarity-ranked based on the Euclidean distances of their normalized feature vectors from the normalized feature vector of the user-chosen image, [0062]).
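For illustration only, the similarity ranking quoted above from Slaney et al. at [0062] (normalize each feature vector, then rank by Euclidean distance to the normalized vector of the chosen image) can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function names, the three-element palette vectors, and the use of L2 normalization are assumptions, since the quoted passage does not specify the normalization.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed normalization; zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to `query`."""
    q = l2_normalize(query)
    scored = [(euclidean(q, l2_normalize(vec)), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored)]


# Hypothetical color-theme feature vectors (e.g., mean R, G, B of a palette).
palettes = {
    "warm": [0.9, 0.4, 0.1],
    "cool": [0.1, 0.4, 0.9],
    "neutral": [0.5, 0.5, 0.5],
}
order = rank_by_similarity([0.8, 0.5, 0.2], palettes)
```

A smaller distance corresponds to a higher rating of the candidate color theme relative to the chosen one, which is the sense in which a rating is derived from the feature vector in this sketch.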

Zhao et al. and Lalonde et al. and Slaney et al. are in the same art of searching images/labels (Zhao et al., abstract; Lalonde et al., part 2; Slaney et al., abstract). The combination of Slaney et al. with Zhao et al. and Lalonde et al. and Lee et al. will enable use of a feature vector of a color theme. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the feature vector of Slaney et al. with the invention of Zhao et al. and Lalonde et al. and Lee et al. as this was known at the time of the invention, the combination would have predictable results, as Zhao et al. indicate the use of various types of feature vectors and color theme feature vectors would be one of a limited number of types of feature vectors that can be found for an image, and as Slaney et al. indicate, “Techniques described herein similarly involve an early fusion approach.  

Regarding claim 20, Zhao et al. and Lalonde et al. and Lee et al. disclose the computing system of claim 15. Zhao et al. and Lalonde et al. further indicate the actions further comprising: determining a color theme of the embedded image; determining a feature vector of the color theme; determining a rating of the color theme based on the feature vector; and determining the color compatibility score based on the rating of the color theme (Zhao et al., compatibility scores can be easily measured using the cosine similarity, abstract, compatibility of a foreground and background image can be easily measured using the cosine similarity between their corresponding feature vectors, part 1, multiclass fine-grained ranking problem, encourage the feature vectors from compatible foreground and background images to be more similar than those from incompatible pairs, part 3, unit feature vector for background and foreground respectively, which encodes both the category information and image content, part 3.1, Since the feature vectors have unit length after ℓ2 normalization, we can easily calculate their similarity using squared ℓ2 distance, part 3.2, MAP scores shown in the tables are all in percentage, part 5.1, use the realism score predicted by the Realism-CNN to rank all the candidates, part 5.3; Lalonde et al., Once the best matching object is found, we paste it onto 

Slaney et al. teach the actions further comprising: determining a color theme of the embedded image; determining a feature vector of the color theme; determining a rating of the color theme based on the feature vector; and determining the color compatibility score based on the rating of the color theme (e.g., number of colors, color palette, [0059], in order to rank a set of images in their degree of similarity to a user-chosen image, the feature vectors of each of those images is normalized, and then those images are similarity-ranked based on the Euclidean distances of their normalized feature vectors from the normalized feature vector of the user-chosen image, [0062]).

Zhao et al. and Lalonde et al. and Slaney et al. are in the same art of searching images/labels (Zhao et al., abstract; Lalonde et al., part 2; Slaney et al., abstract). The combination of Slaney et al. with Zhao et al. and Lalonde et al. and Lee et al. will enable use of a feature vector of a color theme. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the feature vector of Slaney et al. with the invention of Zhao et al. and Lalonde et al. and Lee et al. as this was known at the time of the invention, the combination would have predictable results, as Zhao et al. indicate the use of various types of feature vectors and color theme feature vectors would .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 20200097764 A1 (The present disclosure generally relates to systems, methods, medium, and other implementations directed to learning embeddings for visual scenes via visual semantics represented based on collocated annotations of visual objects “FIGS. 7A-7B show concept clusters represented by embeddings generated by learning from visual semantics of various images, according to embodiments of the present teaching.  As can be seen, the learned concept clusters appropriately cluster similar concepts together and present hierarchical structures.  For example, in FIG. 7A, arm, hand, human body, leg are clustered nearby, hair, mouth, nose, head, eye are clustered and together they form higher and higher level concepts such as mammal.  Similarly, in FIG. 7B, concepts of tire, wheel, auto parts are grouped together and they are under the concept of vehicle which includes cars, land vehicles”); US 20180293313 A1 (Semantic concepts share complex relationships such as US 20200192389 A1 (The image generator 103 is configured to insert the second image into the images of the real-world scene based on the pose of the real-world object in the second image. The segmentation model can produce a semantically labeled mask for the real-world scene).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084.  The examiner can normally be reached on 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661