Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1 and 9 are objected to because of the following informalities:  in the first substantive limitation of each claim, “generate video features related a shopper” should be ‘generate video features related to a shopper’.  Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-8, 10, 11 and 14-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 2 and 10 recite the limitation “a processing entity associated with the store”.  It is unclear and indefinite whether this processing entity is the same processing entity invoked in claims 1 and 9, from which claims 2 and 10 depend, respectively.  To expedite prosecution, Examiner assumes the processing entity can be either the same as or different from the processing entity invoked in claims 1 and 9.
Claims 6 and 14 recite the limitations “a shopper” and “a processing entity”.  It is unclear and indefinite whether this shopper and processing entity are the same shopper and processing entity invoked in claims 1 and 9, from which claims 6 and 14 depend, respectively.  To expedite prosecution, Examiner assumes the shopper and processing entity can be either the same as or different from the shopper and processing entity invoked in claims 1 and 9.
Claims 2-7, 14, 15, 18 and 19 recite the limitation "the deep learning model".  There is insufficient antecedent basis for this limitation in the claims.  To expedite prosecution, Examiner assumes Applicant intends to reference “the trained deep learning model” of claims 1 and 9.
Claims 8, 11, 16 and 17 are rejected as being dependent upon a rejected base claim.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 5, 7-11, 13, 15, 16 and 20 are rejected under 35 USC 103 as being unpatentable over US Pat. Pub. No. 2015/0039458 to Reid in view of Visual Object Tracking via Deep Neural Network to Xu et al. (hereinafter Xu).
Per claim 1, Reid discloses a method (see figures) of identifying actions (fig. 6 and ¶109-114 ... each step shows identifying different scenarios associated with actions of a shopper and associated items for purchase in a retail store, the scenarios including actions of picking up a product, moving the product, placing the product into a container, keeping or removing the product from container and checkout; fig. 3: 48-52 and ¶14-21 ... distinct action scenarios by customer gesture/motion are processed by applying machine learning of features from sensor data, including: taking a product from the shelf (Pick (P) gesture), returning a product to a shelf (Return (R) gesture) and adding product to an electronic shopping cart (Commit-to-Container (C) gesture); fig. 4:66-68…tracking shopper movement and shopper gestures) of a shopper (fig. 1:16…shopper) to account for taken items by the shopper (fig. 3:48-54…accounting for take/return items of shopper) in a cashierless checkout (fig. 3:58, fig. 4:72 and fig. 5:84…automatic checkout, without need for human cashier), comprising:
sampling a shopping environment (fig. 1 and ¶37-38…store is shopping environment, such as grocery store) using one or more video cameras (fig. 1:20 and ¶27,35,39,46,93 ... multiple sensors can be used, including a system of cameras, all sensors outputting captured sensor data; ¶39…capture data with video cameras, ”High Definition (HD) cameras with 3D sensors are placed at the entrance to the store collect image data including video and images data of the customer upon entry. This sensor system communicates data wirelessly with the nearest panel”; ¶93… capture data with video cameras, ”The video camera 20 includes wireless communication hardware and automatically observes movement of the shopper 16 and communicates the shopper's movement, including gestures, to the geo-context panel 12”; pg. 8, left column, lines 18-20: “the step of recognizing implicit gestures includes 3D depth sensor/video recognition of movement of the product into a shopping container”) to generate video features related a shopper in connection to an item (fig. 3:42 and ¶97…shopper identity verification module 42 uses a trained machine learning model with camera captured images to identify shopper; fig. 3:46 and ¶98…item verification module 46 requires another trained machine learning model with camera captured images to recognize item being interacted with), the item being initially located within a zone and the zone being associated with at least the item (¶21…items/products can be in product bins, construed as zones, with these zones having sensors identifying items/products contained therein, “Product bins can have sensors…to add certainty to product identification”; ¶34…”For bulk items, shoppers' gestures and motion are tracked from the Pick event at the bin to the drop event at the IGS. The IGS contains a miniature 3D depth sensor, camera and a wireless radio (such as an 802.11x radio) allowing the product to be identified as it is being weighed. Both product identity and weight are sent to the panel, where the item total is tallied and added to the shopper's electronic cart”; ¶52…specific bin zones can be set up for particular types of gesture detection, “In the case of P, a level of tailoring and personalization can be achieved by treating the retail space itself like a large 3DIS. A gesture used to pick from a horizontal, waist high produce bin is different from a gesture used to pick from a high vertical shelf (Note: the panels over the horizontal bins in produce would run gesture patterns appropriate to that context. panels along the refrigerated aisle would run gesture patterns appropriate to that context, etc.)”; fig. 6:98…specific zones set for items to be checked out, e.g., paid for);
sampling a shopping environment using one or more supplemental sensors to generate supplemental sensor feature data (¶21…additional sensors can be used including weight sensors to aid in product identification, “Product bins can have sensors including weight and optical sensors to add certainty to product identification”; ¶43…tracking shopper can be accomplished through fusion of at least two different camera output readings of the shopper in the store, e.g., one camera is supplemental to the other, “Voxel data from the 3D sensors, front facing remote cameras and overhead cameras within the panel is co-registered (fused), allowing a correspondence to be made between the pixels representing the face of the shopper and the pixels representing the top/back/side of the shopper's head (overhead view). This allows the shopper to be tracked as they move about the store by the front/top/back/side of their head by the panel”; ¶91…other supplemental sensors can be used, “The electronic device 24 may also include an RFID reader, bar code reader, or optical sensors to further facilitate product identification and confirmation of purchase decisions by the customer 16”), the supplemental sensor having a sensing range capable of producing supplemental feature data associated with the zone in which the item is initially located (¶21…additional sensors can be used including weight sensors to aid in product identification, weight sensor sensing the bin/zone where item/product is, “Product bins can have sensors including weight and optical sensors to add certainty to product identification”; ¶91…other supplemental sensors can be used, “The electronic device 24 may also include an RFID reader, bar code reader, or optical sensors to further facilitate product identification and confirmation of purchase decisions by the customer 16”);
receiving output of the sampled video and supplemental sensor features as feature inputs to a trained learning model (¶31-33 ...sensor/camera output data is vectorized, e.g., made into feature data for input to machine learning models to classify/label, ”Generally, sensor data collected by the…SSoC is sent to the GPU/NoC to be de-noised and vectorized, and prepared for pattern recognition and machine learning”; ¶40...customer biometric feature data, such as key facial feature data is extracted and vectorized, e.g., produced input feature data and sent to the Neural Network ASIC (NNA) to classify/label; ¶46,50-52 ... vectorization, extracting essential characteristics/features of object categories including M (manufactured objects), G (grown objects), P (physical gestures), F (faces)), the trained learning model generating one or more labels related to a state of a scenario (fig. 3 :48-52 and ¶14-21... distinct scenarios determined by customer gesture/motion are identified/inferred by trained machine learning models as specific behavior states of the shopper in connection with a state of the item/product, namely: taking a product from the shelf (Pick (P) gesture), returning a product to a shelf (Return (R) gesture) and adding product to an electronic shopping cart (Commit-to-Container (C) gesture); ¶45-62...different trained machine learning models to recognize four object categories M, G, P, and F, “Vectorization, extracting the essential characteristic of M, G, P, F, will employ algorithms specific to each type”), the scenario including the shopper action of taking the item (fig. 3:48…pick recognizer, e.g., taking a product from the shelf (Pick (P)) gesture) and moving outside the zone initially associated with the item (fig. 6:98 and ¶108,114…”Checkout is enabled when the customer passes through the demarcation zone with products selected by gesture”); and 
processing the labels to infer the shopper action of taking the item outside the zone initially associated with the item (¶78…”the server detects a gesture indicating an intention to check out including movement of the customer within a pre-defined geography, such as a retail store exit, and automatically consummates a purchase transaction for the items in the electronic shopping cart”);
wherein at least one processing entity associated with a store (¶31-33... collected sensor data is sent to GPU/NoC, part of processing entity/panel of fig. 2:28 which is in the store; ¶9…panels/servers in the store, “the servers are attached to the ceiling of the retail store”) sets the state of the item to change from one as item handled by said shopper (fig. 3:48…item/pick) to item queued for purchase (fig. 3:52,56…commit-to-cart and add to electronic cart is item/product being queued for purchase);
wherein at least one processing entity associated with the store detects the state of the shopper as having finished shopping, causing a charge to an account associated with the shopper (fig. 6:98 and ¶108,114…item chargeable to an electronic shopping cart and shopper account upon shopper exit from the store, e.g., shopper finished shopping, ”Checkout is enabled when the customer passes through the demarcation zone with products selected by gesture”; ¶54…”Once authenticated, and the transaction is committed, the shopper’s checking account is electronically debited, directly”).
Reid does not expressly disclose, but Xu does teach: the trained learning model for visual object tracking being a trained deep learning model (Abstract and Section III…trained deep neural network).
Reid and Xu are analogous art because they are both directed to the same problem-solving area, namely visual object tracking using a trained neural network.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use the trained deep learning model of Xu as the trained learning model of Reid.
The suggestion/motivation for doing so would have been to improve accuracy with more layers and nodes in the neural network.  Furthermore, object tracking with deep neural networks can achieve more robustness against variations (Xu:Abstract and Section V).
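For illustration only, the label-processing limitations addressed in the rejection of claim 1 (labels for pick, return and commit gestures, an item state that changes from handled to queued for purchase, and a charge once the shopper has finished shopping) can be sketched as a small state machine. This sketch is not taken from Reid, Xu, or the application; the names ItemState, TRANSITIONS, apply_label and process_labels, and the label strings, are hypothetical.

```python
from enum import Enum


class ItemState(Enum):
    """Hypothetical item states, loosely following the claim 1 mapping."""
    ON_SHELF = "on_shelf"
    HANDLED = "handled"             # item handled by the shopper (pick)
    QUEUED = "queued_for_purchase"  # item committed to the electronic cart
    CHARGED = "charged"             # shopper finished shopping, account charged


# Hypothetical label strings standing in for the Pick (P), Return (R) and
# Commit-to-Container (C) gestures, plus an "exit" label for leaving the store.
TRANSITIONS = {
    (ItemState.ON_SHELF, "pick"): ItemState.HANDLED,
    (ItemState.HANDLED, "return"): ItemState.ON_SHELF,
    (ItemState.HANDLED, "commit"): ItemState.QUEUED,
    (ItemState.QUEUED, "return"): ItemState.ON_SHELF,
    (ItemState.QUEUED, "exit"): ItemState.CHARGED,
}


def apply_label(state, label):
    """Return the next state; labels with no defined transition leave it unchanged."""
    return TRANSITIONS.get((state, label), state)


def process_labels(labels, state=ItemState.ON_SHELF):
    """Fold a sequence of model-generated labels into a final item state."""
    for label in labels:
        state = apply_label(state, label)
    return state
```

Under these assumptions, the label sequence pick, commit, exit ends in the charged state, while pick followed by return leaves the item on the shelf.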
Per claim 2, Reid combined with Xu discloses claim 1, Reid further disclosing object detection is performed by a processing entity associated with the store (¶31-33... collected sensor data is sent to GPU/NoC, part of processing entity/panel of fig. 2:28 which is in the store; ¶9…panels/servers in the store, “the servers are attached to the ceiling of the retail store”; fig. 1:24 and ¶90…electronic device 24 attached to cart can be another processing entity) to track a location of the shopper and create shopper location features that are added as additional inputs to the deep learning model used for generating labels relating to the scenario (¶27…”Output from the 3D depth sensor and camera are calibrated to a centimeter resolution, extended UTM based coordinate system for the store, allowing image pixels to be co-registered to 3D depth sensor voxels”; ¶90…“the electronic device 24 is attached to the cart 26, or to a shopping basket, and maintain electronic communication with the geo context panel 12. This enables the exact location of the shopper and the electronic device 24 to within centimeter accuracy, which facilitates item identification by the server to be rapid, and nearly error free”, item identification being performed by trained machine learning model).
Per claim 3, Reid combined with Xu discloses claim 2, Reid further disclosing feature engineering is performed (fig. 2:36…trained neural networks use training data to establish a set of nodes/neurons with a set of determined weights, construed as engineered feature data; fig. 3:48-52 and ¶14-21... distinct scenarios determined by customer gesture/motion are identified by trained machine learning models, which have engineered feature data since neural nets are nodes/neurons with engineered weights thru training, that recognize: taking a product from the shelf (Pick (P) gesture), returning a product to a shelf (Return (R) gesture) and adding product to an electronic shopping cart (Commit-to-Container (C) gesture)) to generate skeletal data features of the shopper and the skeletal data features or additional engineered features based thereon are added as additional inputs of the deep learning model for generating labels relating to the scenario (¶14…vectorized camera data associated with skeletal movement/gesture of shopper will be processed by trained machine learning model to predict a take of item, e.g., pick, ”Pick (P) gesture, which includes taking a object (i.e. product) from its shelf”; ¶50..."For P, algorithms focused on the human skeleton, specifically the angles made by the joints, will be employed for vectorization").
Per claim 5, Reid combined with Xu discloses claim 1, Reid further disclosing weight sensor features or engineered features derived therefrom are added as additional inputs to the input of the deep learning model for generating labels relating to the scenario (¶38…”scales are provided that communicate wirelessly, or via a wired connection, with the panel to automatically communicate weight of products to the panel. A scale can be located in the produce section of a grocery store, or incorporated into each product bin. In this way, when a product is removed from a bin or shelf, the product weight can be determined when the customer uses a scale in proximity to the bin or shelf. The panel calculates a price based on the weight and tallies this price with other items collected by the shopper, in the shopper's electronic cart”).
Per claim 7, Reid combined with Xu discloses claim 1, Reid further disclosing a shopper-aware display is provided, the shopper-aware display coupled to a processing entity of the store that generates display information triggered by processing labels output from the deep learning model (¶73-78…shopper has a user device, e.g., smart phone, with a display and user interface, that displays the electronic cart with the items added thru gesture detection).
Per claim 8, Reid combined with Xu discloses claim 7, Reid further disclosing the display is triggered to automatically display pricing information for an item held by the shopper (¶16…”Commit-to-Container gesture, which is termed herein as a (C) gesture. C gestures cause the price/description associated with the committed item to be added to an electronic shopping cart”), or is triggered to display a current status of take or return inference drawn by the processing entity (¶74…”user device enable the customer to selectively reject the choice, which assures that mistaken charges will not appear on the electronic shopping cart”). 
Claim 9 is substantially the same as claim 1, where all the limitations of claim 9 are covered in claim 1.  Therefore, the rejection of claim 1 is applied accordingly.
Per claim 10, Reid combined with Xu discloses claim 9, Reid further disclosing object detection is performed by a processing entity associated with the store (¶31-33... collected sensor data is sent to GPU/NoC, part of processing entity/panel of fig. 2:28 which is in the store; ¶9…panels/servers in the store, “the servers are attached to the ceiling of the retail store”; fig. 1:24 and ¶90…electronic device 24 attached to cart can be another processing entity) to track a location of the shopper and create shopper location features that are an input to a machine learning model for generating labels relating to the scenario (¶27…”Output from the 3D depth sensor and camera are calibrated to a centimeter resolution, extended UTM based coordinate system for the store, allowing image pixels to be co-registered to 3D depth sensor voxels”; ¶90…“the electronic device 24 is attached to the cart 26, or to a shopping basket, and maintain electronic communication with the geo context panel 12. This enables the exact location of the shopper and the electronic device 24 to within centimeter accuracy, which facilitates item identification by the server to be rapid, and nearly error free”, item identification being performed by a trained machine learning model).
Per claim 11, Reid combined with Xu discloses claim 10, Reid further disclosing feature engineering is performed (fig. 2:36…trained neural networks use training data to establish a set of nodes/neurons with a set of determined weights, construed as engineered feature data; fig. 3:48-52 and ¶14-21... distinct scenarios determined by customer gesture/motion are identified by trained machine learning models, which have engineered feature data since neural nets are nodes/neurons with engineered weights thru training, that recognize: taking a product from the shelf (Pick (P) gesture), returning a product to a shelf (Return (R) gesture) and adding product to an electronic shopping cart (Commit-to-Container (C) gesture)) to generate skeletal data features of the shopper and the skeletal data features or additional engineered features based thereon are added as an input to a machine learning model for generating labels relating to the scenario (¶14…vectorized camera data associated with skeletal movement/gesture of shopper will be processed by trained machine learning model to predict a take of item, e.g., pick, ”Pick (P) gesture, which includes taking a object (i.e. product) from its shelf”; ¶50..."For P, algorithms focused on the human skeleton, specifically the angles made by the joints, will be employed for vectorization").
Per claim 13, Reid combined with Xu discloses claim 9, Reid further disclosing weight sensor features or engineered features derived therefrom are added as additional inputs to the input to a machine learning model for generating labels relating to the scenario (¶38…”scales are provided that communicate wirelessly, or via a wired connection, with the panel to automatically communicate weight of products to the panel. A scale can be located in the produce section of a grocery store, or incorporated into each product bin. In this way, when a product is removed from a bin or shelf, the product weight can be determined when the customer uses a scale in proximity to the bin or shelf. The panel calculates a price based on the weight and tallies this price with other items collected by the shopper, in the shopper's electronic cart”).
Claims 15 and 16 are substantially similar in scope and spirit to claims 7 and 8, respectively.  Therefore, the rejections of claims 7 and 8 are applied accordingly.
Per claim 20, Reid combined with Xu discloses claim 9, Reid further disclosing the zone is defined by or related to volume of space, or one or more planes, or a distance, or a region, or a shelf, or an item, or a section, or a lane, or a value, or a variable, or a condition, or a magnitude, or a time period associated with one or more items in this list, or a combination of one or more items in this list (fig. 6:98…demarcation zone is a checkout region for items in the store).
Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Reid combined with Xu as applied to claims 1 and 9 above, respectively, and further in view of US Pat. Pub. No. 2004/0056907 to Sharma et al. (hereinafter Sharma).
Per claim 4, Reid combined with Xu discloses claim 1.
Reid combined with Xu does not expressly disclose, but Sharma does teach: audio microphone features or engineered features derived from audio (fig. 2:208…audio features 208 extracted from audio signal captured 206 from a microphone) are added as additional inputs (fig. 2 and ¶40…audio features 208 are additional features that are combined with visual features 204) to the deep learning model (fig. 1…training of model; fig. 2…apply trained model to analyze the acquired visual and audio data/features) for generating labels relating to the scenario (fig. 2:214…classification of gesture type based on co-analysis of input visual and audio data/features using the trained model).
Reid combined with Xu and Sharma are analogous art because they are from a similar problem-solving area, namely classifying/labeling human gestures using visual signals from cameras (Reid: fig. 4:68).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include in Reid combined with Xu, multi-modal input of both visual and audio signals/features into the trained deep learning model for classification/labeling of shopper gestures.
The suggestion/motivation for doing so would have been to improve the accuracy in recognizing human gestures in a video sequence (Sharma: ¶13).  The co-analysis helps in detecting and identifying small hand movements, which subsequently improves the rate of continuous gesture recognition (Sharma: ¶15).
Per claim 12, Reid combined with Xu discloses claim 9.
Reid combined with Xu does not expressly disclose, but Sharma does teach: audio microphone features or engineered features derived from audio (fig. 2:208…audio features 208 extracted from audio signal captured 206 from a microphone) are added as additional inputs (fig. 2 and ¶40…audio features 208 are additional features that are combined with visual features 204) to a machine learning model (fig. 1…training of model; fig. 2…apply trained model to analyze the acquired visual and audio data/features) for generating labels relating to the scenario (fig. 2:214…classification of gesture type based on co-analysis of input visual and audio data/features using the trained model).
Reid combined with Xu and Sharma are analogous art because they are from a similar problem-solving area, namely classifying/labeling human gestures using visual signals from cameras (Reid: fig. 4:68).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include in Reid combined with Xu, multi-modal input of both visual and audio signals/features into the trained deep learning model for classification/labeling of shopper gestures.
The suggestion/motivation for doing so would have been to improve the accuracy in recognizing human gestures in a video sequence (Sharma: ¶13).  The co-analysis helps in detecting and identifying small hand movements, which subsequently improves the rate of continuous gesture recognition (Sharma: ¶15).
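For illustration only, the multi-modal input discussed in the rejections of claims 4 and 12 (audio features combined with visual features before gesture classification) can be sketched as a simple feature concatenation followed by a toy linear scorer. This is a minimal sketch under stated assumptions, not the method of Reid, Xu, or Sharma; the functions fuse_features and classify, the weights, and the label names are hypothetical.

```python
def fuse_features(visual, audio):
    """Concatenate visual and audio feature vectors into one model input."""
    return list(visual) + list(audio)


def classify(features, weights, labels):
    """Toy linear scorer: return the label whose weight row scores highest.

    A stand-in for a trained model; each row of `weights` is assumed to have
    the same length as `features`.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return labels[scores.index(max(scores))]


# Hypothetical example: two visual features and one audio feature.
fused = fuse_features([1.0, 0.0], [0.5])
label = classify(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], ["pick", "return"])
```

The design point is only that the audio-derived values become additional entries of the same input vector, so the classifier can weigh both modalities together.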
Allowable Subject Matter
Claims 6, 14 and 17-19 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter:  The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of the independent and intervening claims (claims 1&4 and 9&12), further including the particular notable limitations provided below: the microphone captures a question from a shopper and a processing entity associated with the store generates a response to the question using the generated labels from the deep learning model to provide the identity of the item and at least one label describing the scenario when formulating the response.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALAN CHEN/Primary Examiner, Art Unit 2125