DETAILED ACTION
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 5-6 and 8-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kwon et al. (US PAP 2016/0342863), “Kwon”.
As per claim 1, as best understood and in light of the rejections, Kwon teaches a computer readable medium with instructions see for example fig. 2;
based on identifying a digital image comprising a plurality of detectable objects, receiving, user input from a client device comprising a selection query indicating a target query object to be selected within the digital image (i.e., the client device 115 sends a query image to the recognition server 101 and receive as input a query image of one product or a scene of shelf images with many products.  For example, the hybrid detection recognition application 103 may receive an image of a single box of toothpaste, or an image of a shelving unit displaying a variety of boxes of toothpaste and other types of products in a retail supermarket.  The hybrid detection recognition application 103 may determine one or more objects depicted in the image and identify the depicted objects.  For example, the hybrid detection recognition application 103 may identify the depicted objects by classifying one or more regions of interest in the query image into product classes using convolutional neural network (CNN)) see for example [0026 and 29];
analyzing the selection query to identify a first detection input comprising a query object and a second detection input comprising an object position attribute that corresponds to the query object (i.e., an image of a product may be analyzed to identify a set of image features and to determine a location, an orientation, and an image description for each feature detected in the image of the product.  The indexer 239 may then map the image of the product with a set of product metadata associated with the product, the set of image features identified for that image of the product, and the location in the image where each feature occurs) see for example [0034]; 
based on the first detection input, detecting multiple instances of the query object in the digital image from the plurality of detectable objects utilizing an object detection model utilizing an object detection neural network (i.e., the hybrid detection recognition application 103 may additionally identify the depicted objects by matching the regions of interest in the query image to indexed images using model-based features) see for example [0029]; 
based on the second detection input, select an object attribute detection model specific to the object attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object attribute (i.e., the hybrid detection recognition application 103 may additionally identify (clearly corresponding to the second detection input) the depicted objects by matching the regions of interest in the query image to indexed images using model-based features, wherein the hybrid detection recognition application 103 “combine(s) the classification results from the convolutional neural network with the matching results using model-based features to generate the ultimate product recognition results for the detected objects” in [0029]); [0059, 68, 75-77, 81-84, and 86-87] also teach similar limitation;
detect that a first query object instance from the multiple query object instances is the target query object by utilizing, based on the object attribute, the determined object attribute detection model to determine that the first query object instance reflects the object attribute; and providing, to the client device, the digital image with the first query object instance selected for editing within the digital image in response to the selection query (i.e., examples of product metadata include packaging dimension, packaging identifier, price of the product as sold in the retailer store, the number of product facing (e.g., one facing for one box of a brand or one stack of more than one identical products, two facings for two boxes of the same brand sitting side by side or two stacks of more than one identical products sitting side by side), shelf identifier, width, height, depth, area, diagonal length, color, product attributes such as product name, product identifier, product weight, product volume, product description, product size, ingredients, nutritional information, manufacturer brand, model number, and material, among other things.  
In some embodiments, the results of matching may also include the product identifier (e.g., the UPC code) of the product associated with the matching index image) see for example [0070]; [0075] further teaches “merge the classification results of one or more ROIs determined by the convolutional neural network with the matching results of the one or more ROIs determined using model based features to identify the product depicted in the query image”; [0035 and 54] respectively disclose object position/location information in the image, and an absolute position of the object with its x-y coordinates in the query image; [0083] further discloses “the region detector 205 may group the determined ROIs based on spatial locations in the query image, wherein the region detector 205 may aggregate two or more ROIs that share a similar spatial location in the query image into a group of ROIs and rank the ROIs based on one or more ranking criteria.  For example, for each group of ROIs corresponding to a spatial location in the query image, the region detector 205 may rank the ROIs in the group based on the size of the ROI, the location of the ROI, a degree of match between the matching features in the ROI and in the indexed image, a level of geometrical consistency between the shapes formed by those two sets of matching features, etc. to generate a ranked list of ROIs.  At 608, the region detector 205 may select the top-k in the ranked list of ROIs”.
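For illustration only, the grouping and ranking of ROIs quoted from [0083] can be sketched as follows. This is a minimal sketch and not Kwon's implementation: the `ROI` type, the use of intersection-over-union with a 0.5 threshold to decide that two ROIs "share a similar spatial location", and the use of a single `match_score` as the ranking criterion are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    """Hypothetical region of interest: axis-aligned box plus one ranking score."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    match_score: float  # assumed stand-in for the "degree of match" criterion


def iou(a: ROI, b: ROI) -> float:
    """Intersection-over-union of two boxes (assumed spatial-similarity measure)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def group_by_location(rois, iou_threshold=0.5):
    """Greedily place each ROI in the first group whose seed ROI it overlaps."""
    groups = []
    for roi in rois:
        for group in groups:
            if iou(roi, group[0]) >= iou_threshold:
                group.append(roi)
                break
        else:
            groups.append([roi])
    return groups


def top_k_per_group(rois, k=1, iou_threshold=0.5):
    """Rank the ROIs of each spatial group by score and keep the best k."""
    selected = []
    for group in group_by_location(rois, iou_threshold):
        ranked = sorted(group, key=lambda r: r.match_score, reverse=True)
        selected.append(ranked[:k])
    return selected
```

In this sketch, two heavily overlapping boxes fall into one group and only the higher-scoring one survives with `k=1`, while a distant box forms its own group.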
As per claim 2, Kwon teaches parse the selection query from the client device to determine: a noun that identifies the query object; and an adjective that identifies the object attribute (i.e., product metadata include packaging dimension, packaging identifier, price of the product as sold in the retailer store, the number of product facing, shelf identifier, width, height, depth, area, diagonal length, color, product attributes such as product name, product identifier, product weight, product volume, product description, product size, ingredients, nutritional information, manufacturer brand, model number, and material, among other things.  In some embodiments, the results of 
As per claim 3, Kwon teaches determine, from the object attribute of the selection query, an indicated position of the query object within the digital image; detect that the first query object instance is the target query object instance by: determining positions within the digital image for each of the multiple query object instances utilizing an object position attribute detection model; and detecting the first query object instance is the target query object based on the first query object instance having a position in the digital image that is closest to the indicated position (i.e., the region detector 205b may include a region segmentation module 311 for localizing and partitioning the query image into one or more ROIs and align the query image with a corresponding planogram to locate multiple packaged products, price labels, and other objects of interest, wherein the determined location may be an absolute position of the object with its x-y coordinates in the query image, wherein the determined location may be a relative location of the object, for example, a relative distance(s) from the object to one or more points of reference (e.g., a light source, a sign, a bottom shelf of the shelving unit, other packaged products appear in the scene, etc.), wherein the region segmentation module 311 may determine the image area covered by the located object in the query image as a detected ROI, wherein the detected ROI may be represented by a bounding box surrounding the located object and may be identified by a location (absolute location, e.g., x-y coordinates, or relative location) of the bounding box in the query image) see for example [0054].
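For illustration only, the "closest to the indicated position" limitation recited above can be sketched as follows. This is a minimal sketch under stated assumptions: each instance is represented by an `(x, y, w, h)` bounding box, its position is taken to be the box center, and closeness is Euclidean distance. None of these choices is drawn from Kwon or from the claims, and the function names are hypothetical.

```python
import math


def box_center(box):
    """Center of an (x, y, w, h) bounding box; box layout is an assumption."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def closest_instance(boxes, indicated_xy):
    """Return the index of the box whose center is nearest to indicated_xy."""
    if not boxes:
        raise ValueError("at least one bounding box is required")
    return min(
        range(len(boxes)),
        key=lambda i: math.dist(box_center(boxes[i]), indicated_xy),
    )
```

For example, with one box near the left edge and one near the right edge of an image, an indicated position on the right side selects the second box.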
map the synthetically modified images with these extracted features and with product metadata of the corresponding product; examples of product metadata include product name, product identifier (e.g., dimensions (e.g., width, height, depth, etc.)) see for example [0034].
As per claim 6 and in light of the rejections, Kwon teaches the selection query is received after identifying the digital image (i.e., the hybrid detection recognition application 103 may additionally identify the depicted objects by matching the regions of interest in the query image to indexed images using model-based features) see for example [0029]; [0047 and 85] disclose “a ROI in a query image of multiple soda cans on a shelf may be a rectangular polygon with its bounding box encircling a label on a soda can.  Another ROI in that query image may be a combination of a symbolic brand name and a nearby label indicating type of the product (e.g., diet, organic cane sugar, etc.) on another soda can.  In some examples, a query image may include multiple ROIs”.

As per claim 9, Kwon teaches identify the color for the first query object instance utilizing the object color attribute detection model by comparing one or more pixels of the first query object instance to the indicated color in a multi-dimensional color space 
As per claim 10, Kwon teaches determine that the object attribute does not correspond to a known object attribute type; and detect, based on determining that the object attribute does not correspond to a known object attribute type, the first query object instance is the target query object by: generating tags for one or more of the multiple query object instances utilizing a tagging neural network; and matching the object attribute with a tag generated for the first query object instance (i.e., if the class for which the ROI is assigned the highest classification score is a product class, the CNN classification module 207 may return the class identifier (e.g., class label), the product identifier (e.g., the UPC code) and the representative image of the product class assigned to the ROI, and the classification score of the ROI corresponding to that assigned product class.  If the class for which the ROI is assigned the highest classification score is the non-product class, the CNN classification module 207 may return the class identifier of the non-product class and the classification score of the ROI corresponding to the non-product class.  In this situation, the ROI may be provided to the image matching module 209 to be interpreted using modeled-feature-based matching) see for example [0064 and 66].
As per claim 11, Kwon teaches detect the first query object instance is the target query object by filtering out one or more other query object instances of the multiple query object instances based on the one or more other instances of the multiple query object instances having tags not corresponding to the object attribute (i.e., assign a ROI one or more classification scores corresponding to one or more product classes (and/or 
As per claim 12, Kwon teaches an object material detection neural network (i.e., product metadata to include material) see for example [0070]; [0026, 76, 81 and 83] disclose similar limitation.
As per claim 13, Kwon teaches an object size detection model see for example [0055 and 83].
As per claim 14 and in light of the rejections, Kwon teaches based on the second detection input indicating the object color attribute from the selection query, select an object attribute detection model specific to the object color attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object color attribute; detect that a first query object instance from the multiple query object instances is the target query object by utilizing the object attribute detection model specific to the object color attribute to determine that the first query object instance reflects the object color attribute (i.e., a variety of matching scores, e.g., an area matching score, a color matching score, a number of inliers, etc. In these embodiments, the number of inliers is a number of geometrically consistent matching sets of features identified by the image matching module 209.  The color matching score may describe the similarity of color between the matching features in the index image and in the ROI.  The area matching score may indicate a ratio between a convex hull of the matching feature points in the index image and the bounding box of the ROI and product metadata to include color) see for example [0069-70].
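For illustration only, the area matching score described in the passage quoted from [0069-70] can be sketched as follows. This is a minimal sketch and not Kwon's implementation: reading the "ratio between a convex hull ... and the bounding box" as a ratio of areas, the `(x, y, w, h)` box layout, and all function names are assumptions introduced here. The hull is computed with the standard monotone-chain method and its area with the shoelace formula.

```python
def convex_hull(points):
    """Convex hull of 2-D points via the monotone-chain method (CCW order)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula (0 for < 3 vertices)."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_matching_score(matched_points, roi_box):
    """Assumed score: hull area of matched points divided by ROI box area."""
    _, _, w, h = roi_box
    box_area = w * h
    if box_area <= 0:
        raise ValueError("ROI bounding box must have positive area")
    return polygon_area(convex_hull(matched_points)) / box_area
```

Under these assumptions, matched points spanning a 4-by-4 square inside an 8-by-4 ROI yield a score of 0.5.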
As per claim 15 and in light of the rejections, Kwon teaches detect multiple query object instances in the digital image by: generating approximate boundaries for the multiple query object instances utilizing the object detection neural network; and generating an object mask for each of the multiple query object instances from the approximate boundaries utilizing an object mask neural network (i.e., training dataset may include positive examples and negative examples.  In some embodiments, positive examples may be training samples that include highly visible product images (e.g., more than 90% of the product, or a stack of products, is visible in the image).  Negative examples may be training samples that include no product images or include insufficiently visible (e.g., masked) product images (e.g., less than 40% of the product, or a stack of products, is visible in the image) and CNN classification module 207 trained to classify a ROI into the non-product class when the ROI is incorrectly detected) see for example [0059 and 62].
As per claim 16 and in light of the rejections, Kwon teaches determine that the object color attribute comprises a first color see for example [0069-70].

As per claim 18 and in light of the rejections, Kwon teaches based on the second detection input, selecting an object attribute detection model specific to the object position attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object position attribute; determining that a first query object instance from the multiple query object instances is the target query object by utilizing the object attribute detection model specific to the object position attribute to determine that the first query object instance comprises a first position that corresponds to the object position attribute in the selection query (i.e., an image including a plurality of objects to determine one or more objects and/or the location of one or more objects represented in an image) see for example [0035]; [0054, 70, 75, and 83] also disclose similar limitation.
As per claim 19 and in light of the rejections, Kwon teaches generating bounding boxes for each of the multiple query object instances; and wherein identifying the positions for each of the multiple query object instances in the digital image is based on the bounding boxes generated for each of the multiple query object instances see for example [0047 and 74].
As per claim 20 and in light of the rejections, Kwon teaches generating, utilizing an object mask neural network, a first object mask for the first query object instance without generating additional object masks for other query object instances of the .
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kwon in view of Cohen et al (US Patent 9129191), “Cohen” (IDS).
As per claim 4, Kwon teaches a ROI in the query image may be indicated by a bounding box enclosing the image area it covers and can be of any shape, for example, a polygon, a circle with a center point and a diameter, a rectangular shape of a width, a height and one or more reference points (e.g., a center point, one or more corner points) of the region, etc. In some embodiments, a reference point may be specified by a first coordinate value (e.g., the x coordinate) and a second coordinate value (e.g., the y coordinate).  As an example, the ROI may cover a packaged product or a group of packaged products in its entirety see for example [0047].
However, Kwon does not explicitly teach determine, from the object attribute of the selection query, an indicated relative object position of the query object within the digital image; and detect the first query object instance is the target query object by: identifying center positions for each of the multiple query object instances within the digital image utilizing an object position attribute detection model; comparing a first center position of the first query object instance to a second center position of a second query object instance of the multiple query object instances to identify a first relative position for the first query object instance; and detecting the first query object instance is the target query object based on the first relative position for the first query object instance corresponding to the indicated relative object position.
Cohen explicitly teaches determine, from the object attribute of the selection query, an indicated relative object position of the query object within the digital image; and detect the first query object instance is the target query object by: identifying center positions for each of the multiple query object instances within the digital image utilizing an object position attribute detection model; comparing a first center position of the first query object instance to a second center position of a second query object instance of the multiple query object instances to identify a first relative position for the first query object instance; and detecting the first query object instance is the target query object based on the first relative position for the first query object instance corresponding to the indicated relative object position (i.e., if the similarity measure does fall within the predetermined range of acceptable similarity, it can then be determined whether the center of bounding box 28 is within a predetermined range of the center of t'th matched exemplar image 24' (see reference numeral 1142d in FIG. 4C).  If the center of 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art at the time the invention was made to incorporate the teachings of Cohen into Kwon such that the input image is divided into a number of "object proposals", wherein an object proposal is a region of the input image that is likely to contain a single recognizable object and object proposals provide reasonable estimates of where the target data may be located within the input image, thereby helping the system to avoid selecting background clutter and enabling the system to more robustly segment general photos of the real world, wherein the exemplar retrieval database is then queried for exemplars which correspond to the various object proposals, wherein object proposals which sufficiently correspond to positive exemplars in the exemplar retrieval database can be validated as potential matches to the semantic input and object proposals which sufficiently correspond to negative exemplars in the exemplar retrieval database, or which do not correspond to any exemplars, can be rejected as being unlikely to match the semantic input, wherein . 
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kwon in view of Cinnamon et al. (US PAP 2019/0019318), “Cinnamon” (IDS).
As per claim 7, Kwon does not teach detect the multiple query object instances in the digital image by identifying a separate object mask within the digital image for each of the multiple query object instances.
However, Cinnamon explicitly teaches detect the multiple query object instances in the digital image by identifying a separate object mask within the digital image for each of the multiple query object instances (i.e., the graphical output generator may generate a graphical representation that obscures or masks some identified items in a captured image and in order to emphasize other identified items.  As a more particular example of this implementation, the graphical output generator may determine that a first identified item has a high threat level and that other identified items have relatively low threat levels.  Based on the received threat data, the graphical output generator 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art at the time the invention was made to incorporate the teachings of Cinnamon into Kwon to provide a graphical output generator which may receive information from a classification engine related to the confidence that the classification engine has correctly identified an item.  Based on the received classification confidence information, the graphical output generator may generate a graphical/ visual representation of the classification confidence level for a given identified item, wherein the visual representation of the confidence level may comprise a textual indication of the confidence, such as a percentage value or a level such as "high," "medium," or "low", wherein the confidence value may be represented using coloration or other non-textual indication, such as a color overlay (e.g., mask), wherein the color determined for the overlay reflects the determined classification confidence level, thus offer assistance to humans at sporadic visual search/ check, which is one of the biggest slowdowns in the security lines and is caused by an operator manually pausing the belt, re-scanning bags, and otherwise taking their time while scanning for threats and therefore improve efficiency at check points, avoid economic 

Response to Arguments
Applicant's arguments filed 1/13/2022 have been fully considered but they are not persuasive. 
Regarding applicant’s remarks dated 1/13/2022 examiner responds as follows:
1) Applicant argues “As discussed during the interview, and as the Examiner appeared to agree, however, the technique described in Kwon for identifying regions of interest using convolutional neural networks to detect objects does not constitute selecting an object attribute detection model specific to the object attribute, as more particularly recited above. Indeed, while Kwon describes classifying objects using neural networks in one sense, Kwon lacks details regarding selecting an object attribute detection model that is specific to a particular object attribute (of an object) indicated by a selection query. Thus, Kwon's description regarding object classification is not sufficient to read on the concept of selecting "an object attribute detection model specific to the object attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object attribute" and “Additionally, Kwon's description of object classification utilizing convolutional neural networks includes no mention of, nor allusion to, utilizing an object attribute detection model that is specific to the object attribute indicated by the selection query. Indeed, Kwon fails to disclose utilizing an object attribute detection model to 
In response, the examiner initially notes that no agreement was reached during the telephonic interview dated 1/18/22. 
Applicant should submit an argument pointing out disagreements with the examiner’s contentions.  Applicant must also discuss the reference(s) applied against the claims, explaining how the claims avoid the references or distinguish from them. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
As per the rejection of record, Kwon explicitly teaches based on the second detection input, select an object attribute detection model specific to the object attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object attribute (i.e., the hybrid detection recognition application 103 may additionally identify (clearly corresponding to the second detection input) the depicted objects by matching the regions of interest in the query image to indexed images using model-based features, wherein the hybrid detection recognition application 103 “combine(s) the classification results from the convolutional neural network with the matching results using model-based features to generate the ultimate product recognition results for the detected objects” in [0029]; [0059, 68, 75-77, 81-84, and 86-87] also teach similar limitation;
product metadata include packaging dimension, packaging identifier, price of the product as sold in the retailer store, the number of product facing (e.g., one facing for one box of a brand or one stack of more than one identical products, two facings for two boxes of the same brand sitting side by side or two stacks of more than one identical products sitting side by side), shelf identifier, width, height, depth, area, diagonal length, color, product attributes such as product name, product identifier, product weight, product volume, product description, product size, ingredients, nutritional information, manufacturer brand, model number, and material, among other things.  In some embodiments, the results of matching may also include the product identifier (e.g., the UPC code) of the product associated with the matching index image) see for example [0070]; [0075] further teaches “merge the classification results of one or more ROIs determined by the convolutional neural network with the matching results of the one or more ROIs determined using model based features to identify the product depicted in the query image”; [0035 and 54] respectively disclose object position/location information in the image, and an absolute position of the object with its x-y coordinates in the query image; [0083] further discloses “the region detector 205 may group the determined ROIs based on spatial locations in the query image, wherein the region detector 205 may aggregate two or more ROIs that share a similar spatial location in the query image into a group of ROIs and rank the ROIs based on one or more ranking criteria.  For example, for each group of ROIs corresponding to a spatial location in the query image, the region detector 205 may rank the ROIs in the group based on the size of the ROI, the location of the ROI, a degree of match between the matching features in the ROI and in the indexed image, a level of geometrical consistency between the shapes formed by those two sets of matching features, etc. to generate a ranked list of ROIs.  At 608, the region detector 205 may select the top-k in the ranked list of ROIs”.
2) Applicant argues “However, whether considered singly or in combination, the cited art fails to describe, teach, or suggest each limitation recited by independent claim 1. For example, the cited art fails to describe, teach, or suggest "based on the second detection input, select an object attribute detection model specific to the object attribute from among a plurality of object attribute detection models by identifying at least one attribute detection model trained to identify the object attribute" and "detect that a first query object instance from the multiple query object instances is the target query object by utilizing the object attribute detection model to determine that the first query object instance reflects the object attribute," as recited by currently amended independent claim 1. Independent claims 14 and 18 recite limitations where similar arguments apply”. Remarks at 16.
In response, the examiner notes that applicant merely repeats similar arguments that amount to unsupported allegations and arrive at subjective conclusions (see prongs 1 and 2 of the arguments supra). Applicant further points to similar limitations as missing from the Kwon reference. Accordingly, the examiner responds using the same rationale.

3) Applicant argues the Cohen and Cinnamon references, and the other dependent claims, using arguments similar to those applied with reference to the independent claims. Remarks at 16-17.
Accordingly, the examiner responds using the same rationale.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 

Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Manuchehr Rahmjoo whose telephone number is 571-272-7789.  The examiner can normally be reached on 8 AM-5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Bella, can be reached on 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

Manuchehr Rahmjoo
/Manuchehr Rahmjoo/
Primary Examiner, AU 2667
Manuchehr.Rahmjoo@uspto.gov