DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant's submission filed on 31 January 2022 has been entered.  Claims 1-17 are currently pending and have been considered below.

Response to Arguments
Applicant's arguments filed 31 January 2022 have been fully and carefully considered but they are not persuasive. Applicant argues on pages 2-3 of the Remarks that the combination of references fails to disclose the limitations “setting initial position coordinates of the to-be-identified item on the item image; and inputting the item image and the initial position coordinates into a pre-trained attention module to output an item feature of the to-be-identified item; inputting the item feature into a pre-trained long short-term memory network to output a predicted category and predicted position coordinates of the to-be-identified item; determining whether a preset condition is satisfied; and determining, in response to the preset condition being satisfied, a predicted category of the to-be-identified item outputted by the long short-term memory network a last time for use as a final category of the to-be-identified item” as recited in the independent claims.  The Examiner respectfully disagrees.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Tonioni discloses on page 26, Introduction, “Given a shelf image, we first perform a class-agnostic object detection to extract region proposals enclosing individual items. This stage relies on a deep learning object detector trained to localize product items within images taken in the store; we will refer to this network as Detector. In the second stage, we perform product recognition separately on each of the region proposal provided by the Detector ... our approach needs samples of annotated in-store images only to train the product-agnostic Detector, which, however, does not require product-specific labels but just bounding boxes drawn around items” and on pages 26-27, III. Proposed Approach, “each region proposal is encoded by an Embedder into ad-hoc image descriptors, which will be used for recognition … train a CNN (i.e., the Embedder) to learn a function E: I → D that maps an input image i ∈ I into a k-dimensional descriptor.”
Tonioni also discloses on page 25, “Figure 1: Illustration of query images (a) and reference images (b) for the product recognition task. Bounding boxes overlapped to (a) shows correct detection colored according to recognized class” and on page 27, B. Recognition, “Starting from the candidate regions delivered by the Detector, we perform recognition by means of K-NN similarity search between a global descriptor computed on each candidate region and a database of similar descriptors (one for each product).”
Tonioni further discloses on page 28, C. Refinement, “The aim of the final refinement is to remove false detections and re-rank the first K-NN found in the previous step in order to fix possible recognition mistakes … both the Query and each of the first K-NN reference images are described by a set of local features F1, F2, ..., Fk, each consisting in a spatial position (xi, yi) within the image and a compact descriptor fi. Given these features, we look for similarities between descriptors extracted from query and reference images, to compute a set of matches. Matches are then weighted based on the distance in the descriptor space, d(fi; fj) and a geometric consistency criterion relying on the unit-norm vector from the spatial location of a feature to the image center ... Finally, the first K-NN are re-ranked according to the sum of the weights Wij computed for the matches between the local features ... A simple additional refinement step consists in filtering out wrong recognitions by the distance ratio criterion (i.e., by thresholding the ratio of the distances in feature space between the query descriptor and its 1-NN and 2-NN). If the ratio is above a threshold, the recognition is deemed as ambiguous and discarded … given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly” and on page 27, Figure 2, “Final output.”
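For illustration only, the K-NN recognition and the distance ratio criterion quoted above may be sketched as follows. The sketch assumes Euclidean distance between descriptors and a small in-memory reference database; the function names, the toy descriptors and the 0.8 ratio threshold are hypothetical and are not taken from Tonioni.

```python
import math


def euclidean(a, b):
    """Distance d(., .) between two descriptors in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, references, k):
    """Return the k (distance, label) pairs nearest to the query descriptor,
    sorted by increasing distance."""
    scored = sorted((euclidean(query, desc), label) for label, desc in references.items())
    return scored[:k]


def recognize(query, references, ratio_threshold=0.8):
    """Assign the 1-NN label unless the 1-NN/2-NN distance ratio is above the
    (hypothetical) threshold, in which case the recognition is deemed
    ambiguous and discarded (None is returned)."""
    if len(references) < 2:
        raise ValueError("the ratio test needs at least two reference descriptors")
    (d1, label1), (d2, _label2) = knn(query, references, 2)
    if d2 == 0.0 or d1 / d2 > ratio_threshold:
        return None
    return label1


# Toy reference database: one descriptor per product (hypothetical values).
refs = {"cola": [1.0, 0.0], "juice": [0.0, 1.0], "water": [0.9, 0.1]}
unambiguous = recognize([0.1, 0.9], refs)   # clearly closest to "juice"
ambiguous = recognize([0.95, 0.05], refs)   # equidistant from "cola" and "water"
```

In the second query the 1-NN and 2-NN are equally distant, so the ratio is close to 1 and the recognition is discarded, mirroring the quoted passage.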
In Tonioni, bounding boxes are drawn around the items to be identified, and the query and reference images are described by a set of local features F1, F2, ..., Fk, each consisting of a spatial position (xi, yi) within the image, corresponding to “setting initial position coordinates of the to-be-identified item on the item image.” The image enclosed by the bounding box, including the bounding box itself, is input into an Embedder [pre-trained attention module], which outputs a feature descriptor of the to-be-identified item, corresponding to “output an item feature of the to-be-identified item.” Tonioni inputs the feature descriptor into a pre-trained machine learning network, which outputs a category and predicted position coordinates of the to-be-identified item [page 27, Figure 2, Final output]. Tonioni also refines the detections to remove possible false detections and recognition mistakes (corresponding to “determining whether a preset condition is satisfied”) and outputs the final category of the to-be-identified item.
Tonioni does not disclose a long short-term memory network.  Liu was cited for its disclosure of a long short-term memory network.
Liu discloses in paragraph [0005], “obtaining at least two images that are time sequentially related and show a detected article at different angles; and inputting the images to a detection model in time order, to determine a damage detection result, where the detection model includes a first sub-model and a second sub-model, the first sub-model identifies respective features of each image, a feature processing result of each image is input to the second sub-model, the second sub-model performs time series analysis on the feature processing result to determine the damage detection result” and in paragraph [0017], “the first sub-model uses images of a detected article that are obtained at different angles and generated in time order as inputs, to obtain feature processing results of the images, and outputs the feature processing results to the second sub-model; and the second sub-model performs time series analysis on the feature processing results of the images to determine a damage detection result.” 
Liu further discloses in paragraph [0020], “The first sub-model can be any machine learning model, and an advantageous result usually can be achieved by using an algorithm that is suitable for feature extraction and processing, for example, a deep convolutional neural network (DCNN). The second sub-model can be any machine learning model that can perform time series analysis, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network” and in paragraphs [0038] - [0042], “the first sub-model is a deep convolutional neural network, and the second sub-model is a long short-term memory (LSTM) network ... Optionally, the second sub-model is an LSTM network based on an attention mechanism ... the damage detection result includes a classification result of each of one or more types of damage… the first sub-model performs feature extraction.”
Liu inputs images of the article taken at different angles [the item image and the initial position coordinates] (paragraph [0005]) into a first sub-model such as a deep convolutional neural network [pre-trained attention module] (paragraphs [0005], [0020]), which identifies the respective features of each image [an item feature of the to-be-identified item] (paragraph [0005]). The features are input into a second sub-model that can be a long short-term memory (LSTM) network (paragraph [0020]), which outputs a damage detection result including a classification result [predicted category and position] (paragraphs [0017], [0038] - [0042]).
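For illustration only, the arrangement Liu describes, per-image features analysed in time order by an LSTM sub-model that outputs a classification, may be sketched as a minimal LSTM forward pass. The sketch assumes the standard LSTM gate equations, random untrained weights in place of a pre-trained network, and toy dimensions; the class and method names are hypothetical, and this is not Liu's model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinyLSTM:
    """Single-layer LSTM with standard gates:
    i, f, o = sigmoid(Wx + Uh + b); g = tanh(Wx + Uh + b);
    c' = f*c + i*g; h' = o*tanh(c'). Weights are random stand-ins."""

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.gates = {
            name: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for name in "ifog"
        }
        # Linear read-out from the final hidden state to one score per class.
        self.readout = mat(num_classes, hidden_size)

    def _pre(self, name, x, h):
        w, u, b = self.gates[name]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(v) for v in self._pre("i", x, h)]
        f = [sigmoid(v) for v in self._pre("f", x, h)]
        o = [sigmoid(v) for v in self._pre("o", x, h)]
        g = [math.tanh(v) for v in self._pre("g", x, h)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def classify(self, feature_sequence):
        """Feed per-image features in time order; return (class index, final h)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in feature_sequence:
            h, c = self.step(x, h, c)
        scores = matvec(self.readout, h)
        return max(range(len(scores)), key=scores.__getitem__), h


# Two hypothetical feature vectors, e.g. from images taken at two angles.
model = TinyLSTM(input_size=3, hidden_size=4, num_classes=2)
label, h = model.classify([[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]])
```

The recurrent state (h, c) is what allows the second sub-model to perform the time series analysis over the sequence of feature processing results, as opposed to classifying each image independently.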
Liu discloses the limitation “inputting the item feature into a pre-trained long short-term memory network to output a predicted category.”
Therefore, the combination of references discloses the limitations “setting initial position coordinates of the to-be-identified item on the item image (Tonioni, bounding boxes are drawn around items, query and reference images are described by a set of local features F1, F2, ..., Fk, each consisting of a spatial position (xi, yi) within the image); and inputting the item image and the initial position coordinates into a pre-trained attention module to output an item feature of the to-be-identified item (Tonioni, the Embedder outputs a feature descriptor); inputting the item feature into a pre-trained long short-term memory network to output a predicted category and predicted position coordinates of the to-be-identified item (Tonioni, recognition by means of K-NN similarity search, Figure 2, Final output; Liu, second sub-model is a long short-term memory (LSTM) network); determining whether a preset condition is satisfied (Tonioni, false detections removed); and determining, in response to the preset condition being satisfied, a predicted category of the to-be-identified item outputted by the long short-term memory network a last time for use as a final category of the to-be-identified item (Tonioni, C. Refinement, final output category; Liu, second sub-model outputs damage detection results including scratches, damage, adhesives)” as recited in the independent claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 2, 5, 9, 10, 13 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tonioni, Alessio, Eugenio Serra, and Luigi Di Stefano. "A deep learning pipeline for product recognition on store shelves." 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS). IEEE, 2018, hereinafter, “Tonioni”, and further in view of Liu et al., U.S. Publication No. 2020/0293830, hereinafter, “Liu”.

As per claim 1, Tonioni discloses a method for identifying an item, comprising: 
acquiring an item image of a to-be-identified item (Tonioni, page 25, Introduction, Query images for product recognition are taken in the store with cheap equipment (e.g., a smartphone)); 
setting initial position coordinates of the to-be-identified item on the item image (Tonioni, page 25, Fig. 1: Illustration of query images (a) and reference images (b) for the product recognition task. Bounding boxes overlapped to (a) shows correct detection colored according to recognized class; Tonioni, page 25, Introduction, Given a shelf image, we first perform a class-agnostic object detection to extract region proposals enclosing individual items; Tonioni, page 26, A. Detection, Given a query image featuring several items displayed in a store shelf, the first stage of our pipeline aims at obtaining a set of bounding boxes to be used as region proposals); and 
executing following identifying: 
inputting the item image and the initial position coordinates into a pre-trained attention module to output an item feature of the to-be-identified item (Tonioni, page 26, Section III. Proposed Approach, Fig. 2 shows an overview of our proposed pipeline. In the first step ... a Detector extracts region proposals from the query image. Then ... each region proposal is encoded by an Embedder into ad-hoc image descriptors); 
inputting the item feature into a pre-trained machine learning network to output a predicted category and predicted position coordinates of the to-be-identified item (Tonioni, page 27, B. Recognition, Starting from the candidate regions delivered by the Detector, we perform recognition by means of K-NN similarity search between a global descriptor computed on each candidate region and a database of similar descriptors (one for each product); Tonioni, page 28, C. Refinement, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly);
determining whether a preset condition is satisfied (Tonioni, page 28, C. Refinement, The aim of the final refinement is to remove false detections and re-rank the first K-NN found in the previous step in order to fix possible recognition mistakes … both the Query and each of the first K-NN reference images are described by a set of local features F1, F2, ..., Fk, each consisting in a spatial position (xi, yi) within the image and a compact descriptor fi. Given these features, we look for similarities between descriptors extracted from query and reference images, to compute a set of matches. Matches are then weighted based on the distance in the descriptor space, d(fi; fj) and a geometric consistency criterion relying on the unit-norm vector from the spatial location of a feature to the image center ... Finally, the first K-NN are re-ranked according to the sum of the weights Wij computed for the matches between the local features ... A simple additional refinement step consists in filtering out wrong recognitions by the distance ratio criterion (i.e., by thresholding the ratio of the distances in feature space between the query descriptor and its 1-NN and 2-NN). If the ratio is above a threshold, the recognition is deemed as ambiguous and discarded); and 
determining, in response to the preset condition being satisfied, a predicted category of the to-be-identified item outputted by the machine learning network a last time for use as a final category of the to-be-identified item (Tonioni, page 28, C. Refinement, Finally, we propose a re-ranking and filtering method specific to the grocery domain where ... products belonging to the same macro category are typically displayed close one to another on the shelf. In particular, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly).
Tonioni does not explicitly disclose the following limitation as further recited; however, Liu discloses:
inputting the item feature into a pre-trained long short-term memory network to output a predicted category (Liu, ¶0017, the first sub-model uses images of a detected article that are obtained at different angles and generated in time order as inputs, to obtain feature processing results of the images, and outputs the feature processing results to the second sub-model; and the second sub-model performs time series analysis on the feature processing results of the images to determine a damage detection result; Liu, ¶0020, The first sub-model can be any machine learning model, and an advantageous result usually can be achieved by using an algorithm that is suitable for feature extraction and processing, for example, a deep convolutional neural network (DCNN). The second sub-model can be any machine learning model that can perform time series analysis, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network; Liu, ¶0042, the first sub-model performs feature extraction; Liu, ¶0063, The deep convolutional neural network sub-model first performs feature extraction on each image).
Tonioni and Liu are analogous art, as both are concerned with image processing and recognition via extraction of features from images, in which the extracted features are input into a machine learning model in order to output a predicted category. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the LSTM of Liu for the machine learning algorithm of Tonioni in order to provide an alternate means to output the predicted category of the features extracted from the input image (Tonioni, page 28, C. Refinement; Liu, ¶0017).
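For illustration only, the macro-category filtering quoted from Tonioni's C. Refinement for the final limitation of claim 1 may be sketched as follows. The 0.1 confidence threshold is taken from the quoted passage; reading "the majority of detections" as a strict majority of the confident detections, the data structures and the function name are assumptions and are hypothetical.

```python
from collections import Counter


def macro_category_filter(regions, macro_of, confidence_threshold=0.1):
    """regions: list of dicts with 'confidence' (Detector score) and 'knn'
    (reference labels ordered nearest first); macro_of maps a product label
    to its macro category. Votes with the 1-NN of confident detections and,
    if a strict majority (assumed reading) agrees on one macro category,
    filters every region's K-NN list to that category."""
    votes = Counter(
        macro_of[r["knn"][0]]
        for r in regions
        if r["confidence"] > confidence_threshold and r["knn"]
    )
    total = sum(votes.values())
    if total == 0:
        return [list(r["knn"]) for r in regions]
    macro, count = votes.most_common(1)[0]
    if count * 2 <= total:  # no strict majority: leave the K-NN lists unchanged
        return [list(r["knn"]) for r in regions]
    return [[label for label in r["knn"] if macro_of[label] == macro] for r in regions]


# Hypothetical shelf: two confident drink detections and one low-confidence region.
macro_of = {"cola": "drinks", "juice": "drinks", "chips": "snacks"}
regions = [
    {"confidence": 0.9, "knn": ["cola", "chips"]},
    {"confidence": 0.8, "knn": ["juice", "cola"]},
    {"confidence": 0.05, "knn": ["chips", "juice"]},
]
filtered = macro_category_filter(regions, macro_of)
```

Here the low-confidence third region does not vote, the two confident regions agree on "drinks", and its K-NN list is filtered so that "juice" becomes its first candidate.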

As per claim 2, Tonioni and Liu disclose the method according to claim 1, wherein the method further comprises: using, in response to determining the preset condition not being satisfied, the predicted position coordinates of the to-be-identified item as the initial position coordinates, and continuing executing the identifying (Tonioni, page 28, C. Refinement, The aim of the final refinement is to remove false detections and re-rank the first K-NN found in the previous step in order to fix possible recognition mistakes … re-ranking of the first K-NN may be achieved by looking at peculiar image details that may ... be crucial to differentiate a product from others looking very similar. Thus, both the Query and each of the first K-NN reference images are described by a set of local features F1, F2, ..., Fk, each consisting in a spatial position (xi, yi) within the image and a compact descriptor fi. Given these features, we look for similarities between descriptors extracted from query and reference images, to compute a set of matches). 

As per claim 5, Tonioni and Liu disclose the method according to claim 1, wherein the preset condition comprises: a number of iterations of executing the identifying being greater than or equal to a preset number of iterations (Tonioni, page 28, B. Recognition, when a query image is processed, the same embedding is computed on each of the candidate regions, ipq, cropped from the query image, iq, so to get E(ipq). Finally, for each ipq we compute the distance in the embedding space with respect to each reference descriptor, denoted as d(E(ipq), E(ir)), in order to sift-out the first K-NN of E(ipq) in the reference database). 
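For illustration only, the iterative identifying recited in claims 1, 2 and 5 may be sketched as a loop: the attention module produces an item feature from the image and the current coordinates, the LSTM outputs a predicted category and predicted coordinates, the predicted coordinates become the next initial coordinates (claim 2), and the loop stops once the number of iterations reaches a preset number (claim 5). The callables, the box format and the stub functions are hypothetical stand-ins used only to show the control flow; they are not taken from Tonioni, Liu or the present application.

```python
from typing import Any, Callable, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) coordinates


def identify_item(image: Any, initial_box: Box,
                  attention: Callable[[Any, Box], Any],
                  lstm: Callable[[Any, Optional[Any]], Tuple[str, Box, Any]],
                  max_iterations: int) -> str:
    """Repeat the identifying step until the preset condition (iterations >=
    max_iterations) is satisfied; return the category output the last time."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    box, state, iterations = initial_box, None, 0
    while True:
        feature = attention(image, box)               # item feature for the current box
        category, box, state = lstm(feature, state)   # predicted category and coordinates
        iterations += 1
        if iterations >= max_iterations:              # preset condition satisfied
            return category
        # Otherwise the predicted box is reused as the initial coordinates.


# Hypothetical stubs that record their inputs and shrink the box each step.
calls = []


def toy_attention(image, box):
    calls.append(box)
    return box


def toy_lstm(feature, state):
    step = 0 if state is None else state + 1
    x1, y1, x2, y2 = feature
    return f"guess-{step}", (x1 + 1, y1 + 1, x2 - 1, y2 - 1), step


result = identify_item("img", (0, 0, 10, 10), toy_attention, toy_lstm, max_iterations=3)
```

With three iterations the stub attention module sees the initial box followed by the two predicted boxes, and the category from the third (last) LSTM output is returned.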

As per claim 9, Tonioni discloses an apparatus for identifying an item, comprising: 
acquiring an item image of a to-be-identified item (Tonioni, page 25, Introduction, Query images for product recognition are taken in the store with cheap equipment (e.g., a smartphone)); 
setting initial position coordinates of the to-be-identified item on the item image (Tonioni, page 25, Fig. 1: Illustration of query images (a) and reference images (b) for the product recognition task. Bounding boxes overlapped to (a) shows correct detection colored according to recognized class; Tonioni, page 25, Introduction, Given a shelf image, we first perform a class-agnostic object detection to extract region proposals enclosing individual items; Tonioni, page 26, A. Detection, Given a query image featuring several items displayed in a store shelf, the first stage of our pipeline aims at obtaining a set of bounding boxes to be used as region proposals); and 
executing following identifying: 
inputting the item image and the initial position coordinates into a pre-trained attention module to output an item feature of the to-be-identified item (Tonioni, page 26, Section III. Proposed Approach, Fig. 2 shows an overview of our proposed pipeline. In the first step ... a Detector extracts region proposals from the query image. Then ... each region proposal is encoded by an Embedder into ad-hoc image descriptors); 
inputting the item feature into a pre-trained machine learning network to output a predicted category and predicted position coordinates of the to-be-identified item (Tonioni, page 27, B. Recognition, Starting from the candidate regions delivered by the Detector, we perform recognition by means of K-NN similarity search between a global descriptor computed on each candidate region and a database of similar descriptors (one for each product); Tonioni, page 28, C. Refinement, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly);
determining whether a preset condition is satisfied (Tonioni, page 28, C. Refinement, The aim of the final refinement is to remove false detections and re-rank the first K-NN found in the previous step in order to fix possible recognition mistakes … both the Query and each of the first K-NN reference images are described by a set of local features F1, F2, ..., Fk, each consisting in a spatial position (xi, yi) within the image and a compact descriptor fi. Given these features, we look for similarities between descriptors extracted from query and reference images, to compute a set of matches. Matches are then weighted based on the distance in the descriptor space, d(fi; fj) and a geometric consistency criterion relying on the unit-norm vector from the spatial location of a feature to the image center ... Finally, the first K-NN are re-ranked according to the sum of the weights Wij computed for the matches between the local features ... A simple additional refinement step consists in filtering out wrong recognitions by the distance ratio criterion (i.e., by thresholding the ratio of the distances in feature space between the query descriptor and its 1-NN and 2-NN). If the ratio is above a threshold, the recognition is deemed as ambiguous and discarded); and
determining, in response to the preset condition being satisfied, a predicted category of the to-be-identified item outputted by the machine learning network a last time for use as a final category of the to-be-identified item (Tonioni, page 28, C. Refinement, Finally, we propose a re-ranking and filtering method specific to the grocery domain where ... products belonging to the same macro category are typically displayed close one to another on the shelf. In particular, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly).
Tonioni does not explicitly disclose the following limitations as further recited; however, Liu discloses:
at least one processor; and a memory storing instructions, wherein the instructions when executed by the at least one processor, cause the at least one processor to perform operations (Liu, ¶0071, a computing device includes one or more central processing units (CPUs), input/output interfaces, network interfaces, and memories), the operations comprising:
inputting the item feature into a pre-trained long short-term memory network to output a predicted category (Liu, ¶0017, the first sub-model uses images of a detected article that are obtained at different angles and generated in time order as inputs, to obtain feature processing results of the images, and outputs the feature processing results to the second sub-model; and the second sub-model performs time series analysis on the feature processing results of the images to determine a damage detection result; Liu, ¶0020, The first sub-model can be any machine learning model, and an advantageous result usually can be achieved by using an algorithm that is suitable for feature extraction and processing, for example, a deep convolutional neural network (DCNN). The second sub-model can be any machine learning model that can perform time series analysis, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network; Liu, ¶0042, the first sub-model performs feature extraction; Liu, ¶0063, The deep convolutional neural network sub-model first performs feature extraction on each image).
Tonioni and Liu are analogous art, as both are concerned with image processing and recognition via extraction of features from images, in which the extracted features are input into a machine learning model in order to output a predicted category. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the LSTM of Liu for the machine learning algorithm of Tonioni in order to provide an alternate means to output the predicted category of the features extracted from the input image (Tonioni, page 28, C. Refinement; Liu, ¶0017).

As per claim 10, Tonioni and Liu disclose the apparatus according to claim 9, wherein the operations further comprise: using, in response to determining the preset condition not being satisfied, the predicted position coordinates of the to-be-identified item as the initial position coordinates, and continuing executing the identifying (Tonioni, page 28, C. Refinement, The aim of the final refinement is to remove false detections and re-rank the first K-NN found in the previous step in order to fix possible recognition mistakes … re-ranking of the first K-NN may be achieved by looking at peculiar image details that may ... be crucial to differentiate a product from others looking very similar. Thus, both the Query and each of the first K-NN reference images are described by a set of local features F1, F2, ..., Fk, each consisting in a spatial position (xi, yi) within the image and a compact descriptor fi. Given these features, we look for similarities between descriptors extracted from query and reference images, to compute a set of matches). 

As per claim 13, Tonioni and Liu disclose the apparatus according to claim 9, wherein the preset condition comprises: a number of iterations of executing the identifying being greater than or equal to a preset number of iterations (Tonioni, page 28, B. Recognition, when a query image is processed, the same embedding is computed on each of the candidate regions, ipq, cropped from the query image, iq, so to get E(ipq). Finally, for each ipq we compute the distance in the embedding space with respect to each reference descriptor, denoted as d(E(ipq), E(ir)), in order to sift-out the first K-NN of E(ipq) in the reference database). 

As per claim 17, Tonioni and Liu disclose a non-transitory computer readable medium, storing a computer program thereon, wherein the computer program, when executed by a processor, implements the method according to claim 1 (Liu, ¶0073, The computer storage medium can be configured to store information that can be accessed by the computing device).

Claims 3, 4, 6-8, 11, 12 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Tonioni, Alessio, Eugenio Serra, and Luigi Di Stefano. "A deep learning pipeline for product recognition on store shelves." 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS). IEEE, 2018, hereinafter, “Tonioni”, in view of Liu et al., U.S. Publication No. 2020/0293830, hereinafter, “Liu” as applied to claims 1, 5, 9 and 13 above, and further in view of Dugar et al., U.S. Publication No. 2021/0012272, hereinafter, “Dugar”.

As per claim 3, Tonioni and Liu disclose the method according to claim 1, but do not explicitly disclose the following limitations as further recited; however, Dugar discloses wherein the acquiring an item image of a to-be-identified item comprises:
acquiring a shelf image before a user takes or places the to-be-identified item from or on a shelf, and a shelf image after the user takes or places the to-be-identified item from or on the shelf (Dugar, ¶0040, FIG. 3 shows a ground truth (GT) image 300 of a retail facility shelf location, such as may be stored in planogram 122 … Locations for each of the items is annotated on GT image 300, for example showing locations (1,1) through (3,7); Dugar, ¶0041, FIG. 4 shows a real time (RT) image 400 corresponding to GT image 300 that is collected for the anomaly detection. RT image 400 has an annotated empty location 402. In some examples, RT image 400 is captured by CV component 126); and 
comparing the shelf image before the user takes or places the to-be-identified item from or on the shelf, and the shelf image after the user takes or places the to-be-identified item from or on the shelf, to segment the item image of the to-be-identified item (Dugar, ¶0041, initial anomaly detection is performed that identifies any overall anomalous behavior using a comparison of RT image 400 with GT image 300 … The image embedding is extracted from the current planogram image for which the anomalous condition (if present) is to be detected. Some examples use transfer learning with a pre-trained CNN-based architecture in order to compare the image embedding between RT image 400 with GT image 300. If there is a sufficient difference from majority of the planogram images (e.g., GT image 300 and other planogram images corresponding to the same shelf unit location), such as a difference exceeding a threshold, an overall anomalous indicator value is set; Dugar, ¶0042, This permits detection of first level anomalies such as empty (blank) shelf space).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to include the comparison of shelf images as taught by Dugar in the system of Tonioni and Liu in order to provide a means to detect anomalous conditions such as empty shelves, broken items, overcrowding and items in incorrect locations (Dugar, ¶0036).

As per claim 4, Tonioni, Liu and Dugar disclose the method according to claim 3, wherein the comparing the shelf image before the user takes or places the to-be-identified item from or on the shelf, and the shelf image after the user takes or places the to-be-identified item from or on the shelf, to segment the item image of the to-be-identified item comprises: 
inputting the shelf image before the user takes or places the to-be-identified item from or on the shelf, and the shelf image after the user takes or places the to-be-identified item from or on the shelf into a pre-trained target detection model, to output position information of the to-be-identified item (Dugar, ¶0041, initial anomaly detection is performed that identifies any overall anomalous behavior using a comparison of RT image 400 with GT image 300 … The image embedding is extracted from the current planogram image for which the anomalous condition (if present) is to be detected. Some examples use transfer learning with a pre-trained CNN-based architecture in order to compare the image embedding between RT image 400 with GT image 300); and
segmenting the item image of the to-be-identified item from the shelf image before the user takes or places the to-be-identified item from or on the shelf, or the shelf image after the user takes or places the to-be-identified item from or on the shelf based on the position information of the to-be-identified item (Dugar, Figure 3, ground truth image, Figure 4, real time image, item 402, annotated empty location; Dugar, ¶0045, FIG. 6 shows a detected edge image 600 corresponding to RT image 400 … In some examples, a neural net architecture is created and deployed to identify crossing points in an image (e.g., RT image 400), which will become aid in marking boundaries around the items … crossing point detection algorithm assists with segmenting the planogram image (e.g., RT image 400) into various items). 

As per claim 6, Tonioni and Liu disclose the method according to claim 5, wherein the preset number of iterations is determined by: 
inputting the sample item feature into the long short-term memory network, to output a predicted sample category and predicted sample position coordinates of the sample item (Tonioni, page 27, B. Recognition, Starting from the candidate regions delivered by the Detector, we perform recognition by means of K-NN similarity search between a global descriptor computed on each candidate region and a database of similar descriptors (one for each product); Tonioni, page 28, C. Refinement, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly; Liu, ¶0017, the first sub-model uses images of a detected article that are obtained at different angles and generated in time order as inputs, to obtain feature processing results of the images, and outputs the feature processing results to the second sub-model; and the second sub-model performs time series analysis on the feature processing results of the images to determine a damage detection result; Liu, ¶0020, The first sub-model can be any machine learning model, and an advantageous result usually can be achieved by using an algorithm that is suitable for feature extraction and processing, for example, a deep convolutional neural network (DCNN). The second sub-model can be any machine learning model that can perform time series analysis, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network); 
determining whether a duration of executing the determining exceeds a preset duration (Tonioni, page 28, C. Refinement, A simple additional refinement step consists in filtering out wrong recognitions by the distance ratio criterion (i.e., by thresholding the ratio of the distances in feature space between the query descriptor and its 1-NN and 2-NN). If the ratio is above a threshold, d, the recognition is deemed as ambiguous and discarded); and
statisticizing, in response to the identification accuracy rate being not lower than the preset accuracy rate, a number of iterations of the determining, for use as the preset number of iterations (Tonioni, page 28, B. Recognition, when a query image is processed, the same embedding is computed on each of the candidate regions, ipq, cropped from the query image, iq, so to get E(ipq). Finally, for each ipq we compute the distance in the embedding space with respect to each reference descriptor, denoted as d(E(ipq), E(ir)), in order to sift-out the first K-NN of E(ipq) in the reference database). 
Tonioni and Liu do not explicitly disclose the following limitations, as further recited; however, Dugar discloses:
acquiring a sample, wherein the sample includes a sample item image and a sample category tag of a sample item (Dugar, ¶0004, receive a real time (RT) image of a shelf unit corresponding to at least a first portion of a planogram; detect, within the RT image, item boundaries for a plurality of items on the shelf unit and tag boundaries for a plurality of tags associated with the shelf unit); 
setting initial sample position coordinates of the sample item on the sample item image (Dugar, ¶0004, receive a real time (RT) image of a shelf unit corresponding to at least a first portion of a planogram; detect, within the RT image, item boundaries for a plurality of items on the shelf unit); and executing following determining: 
inputting the sample item image and the initial sample position coordinates into the attention module, to output a sample item feature of the sample item (Dugar, ¶0032, An attribute extraction component 128 is operable to extract attributes, from RT image 400, for at least one of tags 108a-108h and at least of items 106a-106h. Some examples of attribute extraction component 128 use long short-term memory (LSTM) processes, Tesseract LSTM optical character recognition (OCR) processes, and convolutional neural networks (CNNs)); and 
determining an identification accuracy rate based on the predicted sample category and the sample category tag, in response to the duration of executing the determining failing to exceed the preset duration (Dugar, ¶0041, examples use transfer learning with a pre-trained CNN-based architecture in order to compare the image embedding between RT image 400 with GT image 300. If there is a sufficient difference from majority of the planogram images (e.g., GT image 300 and other planogram images corresponding to the same shelf unit location), such as a difference exceeding a threshold, an overall anomalous indicator value is set); 
determining whether the identification accuracy rate is not lower than a preset accuracy rate (Dugar, ¶0041, examples use transfer learning with a pre-trained CNN-based architecture in order to compare the image embedding between RT image 400 with GT image 300. If there is a sufficient difference from majority of the planogram images (e.g., GT image 300 and other planogram images corresponding to the same shelf unit location), such as a difference exceeding a threshold, an overall anomalous indicator value is set).
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify the teachings of Tonioni and Liu to include the sample image and sample tag as taught by Dugar in order to provide an additional means to detect anomalous conditions such as empty shelves, broken items, overcrowding and items in incorrect locations and to validate the determined category via the correspondence between the image and the tag (Dugar, ¶0029; ¶0036).

As per claim 7, Tonioni, Liu and Dugar disclose the method according to claim 6, wherein the determining the preset number of iterations further comprises: using, in response to determining the identification accuracy rate being lower than the preset accuracy rate, the predicted sample position coordinates as the initial sample position coordinates, and continuing executing the determining (Tonioni, page 27, B. Recognition, Starting from the candidate regions delivered by the Detector, we perform recognition by means of K-NN similarity search between a global descriptor computed on each candidate region and a database of similar descriptors (one for each product); Tonioni, page 28, C. Refinement, given the candidate regions extracted from the query image and their corresponding sets of K-NN, we consider the 1-NN of the region proposals extracted with a high confidence (> 0.1) by the Detector in order to find the main macro category of the image. Then, in case the majority of detections votes for the same macro category, it is safe to assume that the pictured shelf contains almost exclusively items of that category thus filter the K-NN for all candidate regions accordingly; Liu, ¶0017, the first sub-model uses images of a detected article that are obtained at different angles and generated in time order as inputs, to obtain feature processing results of the images, and outputs the feature processing results to the second sub-model; and the second sub-model performs time series analysis on the feature processing results of the images to determine a damage detection result; Liu, ¶0020, The first sub-model can be any machine learning model, and an advantageous result usually can be achieved by using an algorithm that is suitable for feature extraction and processing, for example, a deep convolutional neural network (DCNN). The second sub-model can be any machine learning model that can perform time series analysis, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network). 

As per claim 8, Tonioni, Liu and Dugar disclose the method according to claim 7, wherein the determining the preset number of iterations further comprises: statisticizing, in response to determining the duration of executing the determining exceeding the preset duration, a number of iterations of executing the determining, for use as the preset number of iterations (Tonioni, page 28, B. Recognition, when a query image is processed, the same embedding is computed on each of the candidate regions, ipq, cropped from the query image, iq, so to get E(ipq). Finally, for each ipq we compute the distance in the embedding space with respect to each reference descriptor, denoted as d(E(ipq), E(ir)), in order to sift-out the first K-NN of E(ipq) in the reference database). 

Regarding claim 11: 
A corresponding reasoning as given earlier (see rejection of claim 3) applies, mutatis mutandis, to the subject matter of claim 11, which is therefore rejected on the grounds given in the rejection of claim 3.

Regarding claim 12: 
A corresponding reasoning as given earlier (see rejection of claim 4) applies, mutatis mutandis, to the subject matter of claim 12, which is therefore rejected on the grounds given in the rejection of claim 4.

Regarding claim 14: 
A corresponding reasoning as given earlier (see rejection of claim 6) applies, mutatis mutandis, to the subject matter of claim 14, which is therefore rejected on the grounds given in the rejection of claim 6.

Regarding claims 15 and 16: 
A corresponding reasoning as given earlier (see rejection of claims 7 and 8) applies, mutatis mutandis, to the subject matter of claims 15 and 16, which are therefore rejected on the grounds given in the rejection of claims 7 and 8.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY MANGIALASCHI whose telephone number is (571)270-5189. The examiner can normally be reached M-F, 9:30AM TO 6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TRACY MANGIALASCHI/Examiner, Art Unit 2668                 
/VU LE/Supervisory Patent Examiner, Art Unit 2668