DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 11423657. Although the claims at issue are not identical, they are not patentably distinct from each other.
Claims of instant application # 17/847880 (claims 1-21, listed first) compared with claims of Patent # 11423657 (claims 1-21, listed second):
1. A system, comprising: an object configured to store items; an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the image sensor is configured to generate images of the stored items; and a tracking subsystem coupled to the image sensor, the tracking subsystem comprising at least one processor configured to: determine, using a set of images generated by the image sensor, a pixel position of a body part of a person in each image of the set of images, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determine an aggregated body part position based on the set of pixel positions determined for the timeframe; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred.

2. The system of Claim 1, wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions (claim 2 of the instant application is rejected as being obvious over claim 1 of the Patent).


3. The system of Claim 1, wherein: the system further comprises a second image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; and the processor is communicatively coupled to the second image sensor and further configured to: receive a top-view image feed comprising top-view images from the second image sensor; determine, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving an image feed comprising the set of images generated by the image sensor.

4. The system of Claim 1, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.


5. The system of Claim 1, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.

6. The system of Claim 1, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image associated with the person removing a first item from the object; determine, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.


7. The system of claim 6, wherein the processor is further configured to: determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.


8. A method, comprising: determining, using a set of images generated by an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of an object configured to store items, a pixel position of a body part of a person in each image, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determining an aggregated body part position based on the set of pixel positions determined for the timeframe; determining that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, providing a trigger signal indicating an interaction event has occurred.

9. The method of Claim 8, further comprising determining that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions. (claim 9 of the instant application is rejected over claim 8 of the patent).




10. The method of Claim 8, further comprising: receiving a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; determining, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, beginning to receive an image feed comprising the set of images generated by the image sensor.

11. The method of Claim 8, further comprising determining the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.


12. The method of Claim 8, wherein: the object includes a visible marker located at a predefined location; and the method further comprises: detecting the visible marker; and determining the predefined position associated with the object based on the detected markers.

13. The method of Claim 8, further comprising, in response to providing the trigger signal: determining at least one item-selection image associated with the person removing a first item from the object; determining, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identifying, using an object detection algorithm, the first item in the selected region-of-interest; and assigning the identified first item to the person.

14. The method of claim 13, further comprising: determining, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determining, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identifying the first item as the candidate item with the largest probability value.



15. A tracking subsystem comprising at least one processor configured to: determine, using a set of images generated by an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of an object configured to store items, a pixel position of a body part of a person in each image, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determine an aggregated body part position based on the set of pixel positions determined for the timeframe; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred.

16. The tracking subsystem of Claim 15, wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions (claim 16 is rejected as being obvious over claim 15 of the patent).






17. The tracking subsystem of Claim 15, wherein the processor is further configured to: receive a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate the top-down images of a region around the object; determine, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving an image feed comprising the set of images generated by the image sensor.

18. The tracking subsystem of Claim 15, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.


19. The tracking subsystem of Claim 15, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.
20. The tracking subsystem of Claim 15, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image associated with the person removing a first item from the object; determine, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

21. The tracking subsystem of claim 20, wherein the processor is further configured to: determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.
1. A system, comprising: an object configured to store items; an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the image sensor is configured to generate angled-view images of the stored items; and a tracking subsystem coupled to the image sensor, the tracking subsystem comprising at least one processor configured to: determine that a person is within a threshold distance of the object; receive an image feed comprising frames of the angled-view images generated by the image sensor after the person is within the threshold distance of the object; for each image frame of at least a portion of the image feed, determine a pixel position of a body part of the person in the image frame, thereby determining a set of pixel positions of the body part during a timeframe associated with the image feed; determine an aggregated body part position based on the set of pixel positions determined for the image frames of the portion of the image feed; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred; wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.

2. The system of claim 1, wherein: the system further comprises a second image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; and the processor is communicatively coupled to the second image sensor and further configured to: receive a top-view image feed comprising top-view images from the second image sensor; determine, based on the received top-view image feed, that the person is within the threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving the image feed comprising the frames of the angled-view images.

3. The system of claim 1, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the image frames of the portion of the image feed.

4. The system of claim 1, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.

5. The system of claim 1, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image frame associated with the person removing a first item from the object; determine, in the at least one item-selection image frame, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the image frame; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

6. The system of claim 5, wherein the processor is further configured to determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position.
7. The system of claim 6, wherein the processor is further configured to: for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.

8. A method, comprising: determining that a person is within a threshold distance of an object configured to store items; receiving an image feed comprising frames of angled-view images generated by an image sensor after the person is within the threshold distance of the object, wherein the image sensor is positioned such that a field-of-view of the image sensor encompasses at least a portion of the object; for each image frame of at least a portion of the image feed, determine a pixel position of a body part of the person in the image frame, thereby determining a set of pixel positions of the body part during a timeframe associated with the image feed; determining an aggregated body part position based on the set of pixel positions determined for the image frames of the portion of the image feed; determining that the aggregated body part position corresponds to a position associated with the object; in response to determining that the aggregated body part position corresponds to a position associated with the object, providing a trigger signal indicating an interaction event has occurred; and determining that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.

9. The method of claim 8, further comprising: receiving a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; determining, based on the received top-view image feed, that the person is within the threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, beginning to receive the image feed comprising the frames of the angled-view images.

10. The method of claim 8, further comprising determining the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the image frames of the portion of the image feed.

11. The method of claim 8, wherein: the object includes a visible marker located at a predefined location; and the method further comprises: detecting the visible marker; and determining the predefined position associated with the object based on the detected markers.

12. The method of claim 8, further comprising, in response to providing the trigger signal: determining at least one item-selection image frame associated with the person removing a first item from the object; determining, in the at least one item-selection image frame, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the image frame; identifying, using an object detection algorithm, the first item in the selected region-of-interest; and assigning the identified first item to the person.

13. The method of claim 12, further comprising determining, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position.
14. The method of claim 13, further comprising: for each candidate item, determining, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identifying the first item as the candidate item with the largest probability value.

15. A tracking subsystem comprising at least one processor configured to: determine that a person is within a threshold distance of an object configured to store items; receive an image feed comprising frames of angled-view images generated by an image sensor after the person is within the threshold distance of the object, wherein the image sensor is positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the image sensor is configured to generate the angled-view images of the stored items; for each image frame of at least a portion of the image feed, determine a pixel position of a body part of the person in the image frame, thereby determining a set of pixel positions of the body part during a timeframe associated with the image feed; determine an aggregated body part position based on the set of pixel positions determined for the image frames of the portion of the image feed; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred; wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.

16. The tracking subsystem of claim 15, wherein the processor is further configured to: receive a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate the top-down images of a region around the object; determine, based on the received top-view image feed, that the person is within the threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving the image feed comprising the frames of the angled-view images.

17. The tracking subsystem of claim 15, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the image frames of the portion of the image feed.

18. The tracking subsystem of claim 15, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.
19. The tracking subsystem of claim 15, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image frame associated with the person removing a first item from the object; determine, in the at least one item-selection image frame, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the image frame; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

20. The tracking subsystem of claim 19, wherein the processor is further configured to determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position.
21. The tracking subsystem of claim 20, wherein the processor is further configured to: for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.
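For illustration only, the logic recited in the system claims listed above (per-image pixel positions, an aggregated body part position, a threshold comparison against predefined object positions, and selection of the candidate item with the largest probability value) can be sketched in Python as follows. The claims do not specify how the aggregation or the probability value is computed; the mean and the inverse-distance weighting below, along with all function and variable names, are assumptions made for this sketch and are not taken from the application or the patent.

```python
from math import dist
from statistics import mean

Point = tuple[float, float]


def aggregate_position(pixel_positions: list[Point]) -> Point:
    """Combine per-image pixel positions into one position.

    Assumption: the aggregate is the mean; the claims leave this open.
    """
    xs, ys = zip(*pixel_positions)
    return (mean(xs), mean(ys))


def interaction_triggered(aggregated: Point,
                          object_positions: list[Point],
                          threshold: float) -> bool:
    """True if the aggregated position is within the threshold distance
    of at least one predefined object position."""
    return any(dist(aggregated, p) <= threshold for p in object_positions)


def candidate_probabilities(aggregated: Point,
                            item_positions: dict[str, Point],
                            threshold: float) -> dict[str, float]:
    """Probability per candidate item within the threshold distance.

    Assumption: inverse-distance weights normalized to sum to 1.
    """
    nearby = {name: dist(aggregated, pos)
              for name, pos in item_positions.items()
              if dist(aggregated, pos) <= threshold}
    if not nearby:
        return {}
    weights = {name: 1.0 / (1.0 + d) for name, d in nearby.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical data: three image frames and three stored items.
positions = [(10, 10), (12, 14), (14, 12)]
aggregated = aggregate_position(positions)
triggered = interaction_triggered(aggregated, [(15, 16)], threshold=5)
probs = candidate_probabilities(
    aggregated,
    {"soda": (12, 13), "chips": (12, 16), "gum": (100, 100)},
    threshold=10,
)
first_item = max(probs, key=probs.get) if probs else None
```

In this hypothetical data the aggregated position lies within the threshold of the predefined object position, so a trigger would be raised, and the nearest candidate item receives the largest probability value.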


Claim 1 of the instant application is unpatentable under the judicially created doctrine of “obviousness-type” double patenting with respect to claim 1 of U.S. Patent No. 11423657.  
Application claim 1 defines an obvious variation of the invention claimed in U.S. Patent No. 11423657.
The assignee of all applications is the same.
Claim 1 of the instant application is anticipated by patent claim 1 in that claim 1 of the patent contains all the limitations of claim 1 of the instant application. Claim 1 of the instant application therefore is not patentably distinct from the earlier patent claim and as such is unpatentable for obviousness-type double patenting.
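The containment reasoning above can be illustrated, in simplified form, as a subset check over claim limitations. This is a sketch rather than a legal test: the limitation strings are paraphrases written for the example, the function name is hypothetical, and the sketch treats a narrower patent limitation (for example, angled-view images) as satisfying the broader application limitation (images).

```python
# Paraphrased limitations of claim 1 of the instant application (assumption:
# wording is condensed for illustration, not quoted from the claim).
application_claim_1 = {
    "object configured to store items",
    "image sensor with field-of-view encompassing the object",
    "pixel position of a body part in each image",
    "aggregated body part position",
    "aggregated position corresponds to object position",
    "trigger signal indicating an interaction event",
}

# Patent claim 1 recites the same limitations plus additional ones.
patent_claim_1 = application_claim_1 | {
    "person within threshold distance of the object",
    "image feed received after person is within threshold distance",
    "aggregated position within threshold distance of a predefined object position",
}


def contains_all_limitations(examined: set[str], reference: set[str]) -> bool:
    """True if every limitation of the examined claim appears in the reference claim."""
    return examined <= reference


application_anticipated = contains_all_limitations(application_claim_1, patent_claim_1)
reverse_holds = contains_all_limitations(patent_claim_1, application_claim_1)
```

Under these paraphrases the check holds in one direction only: the narrower patent claim contains every limitation of the broader application claim, but not the reverse.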

4.	Claims 1-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of U.S. Patent No. 11003918. Although the claims at issue are not identical, they are not patentably distinct from each other. 
Instant application # 17/857880
Patent # 11003918
1. A system, comprising: an object configured to store items; an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the image sensor is configured to generate images of the stored items; and a tracking subsystem coupled to the image sensor, the tracking subsystem comprising at least one processor configured to: determine, using a set of images generated by the image sensor, a pixel position of a body part of a person in each image of the set of images, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determine an aggregated body part position based on the set of pixel positions determined for the timeframe; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred.

2. The system of Claim 1, wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.




3. The system of Claim 1, wherein: the system further comprises a second image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; and the processor is communicatively coupled to the second image sensor and further configured to: receive a top-view image feed comprising top-view images from the second image sensor; determine, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving an image feed comprising the set of images generated by the image sensor.
4. The system of Claim 1, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.


5. The system of Claim 1, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.
6. The system of Claim 1, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image associated with the person removing a first item from the object; determine, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.
7. The system of claim 6, wherein the processor is further configured to: determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.





8. A method, comprising: determining, using a set of images generated by an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of an object configured to store items, a pixel position of a body part of a person in each image, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determining an aggregated body part position based on the set of pixel positions determined for the timeframe; determining that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, providing a trigger signal indicating an interaction event has occurred.

9. The method of Claim 8, further comprising determining that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.





10. The method of Claim 8, further comprising: receiving a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate top-down images of a region around the object; determining, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, beginning to receive an image feed comprising the set of images generated by the image sensor.

11. The method of Claim 8, further comprising determining the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.


12. The method of Claim 8, wherein: the object includes a visible marker located at a predefined location; and the method further comprises: detecting the visible marker; and determining the predefined position associated with the object based on the detected markers.

13. The method of Claim 8, further comprising, in response to providing the trigger signal: determining at least one item-selection image associated with the person removing a first item from the object; determining, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identifying, using an object detection algorithm, the first item in the selected region-of-interest; and assigning the identified first item to the person.

14. The method of claim 13, further comprising: determining, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determining, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identifying the first item as the candidate item with the largest probability value.




15. A tracking subsystem comprising at least one processor configured to: determine, using a set of images generated by an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of an object configured to store items, a pixel position of a body part of a person in each image, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images; determine an aggregated body part position based on the set of pixel positions determined for the timeframe; determine that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, provide a trigger signal indicating an interaction event has occurred.

16. The tracking subsystem of Claim 15, wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions.








17. The tracking subsystem of Claim 15, wherein the processor is further configured to: receive a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the object, wherein the second image sensor is configured to generate the top-down images of a region around the object; determine, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object, begin receiving an image feed comprising the set of images generated by the image sensor.

18. The tracking subsystem of Claim 15, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images.

19. The tracking subsystem of Claim 15, wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker; and determine the predefined position associated with the object based on the detected markers.


20. The tracking subsystem of Claim 15, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image associated with the person removing a first item from the object; determine, in the at least one item-selection image, a region-of-interest based on the aggregated body part position, wherein the region-of-interest includes a subset of the pixels of the item-selection image; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

21. The tracking subsystem of claim 20, wherein the processor is further configured to: determine, based on the aggregated body part position, candidate items that may have been removed from the object by the person, wherein the candidate items include a subset of all stored items, wherein the subset comprises the items located within a threshold distance of the aggregated body part position; for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated body part position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.
1. A system, comprising: a rack comprising shelves configured to store items; an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the rack, wherein the image sensor is configured to generate angled-view images of the items stored on the shelves of the rack; and a tracking subsystem coupled to the image sensor, the tracking subsystem comprising at least one processor configured to: determine that a person is within a threshold distance of the rack; receive an image feed comprising frames of the angled-view images generated by the image sensor after the person is within the threshold distance of the rack; for each image frame of at least a portion of the image feed, determine a pixel position of a wrist of the person in the image frame, thereby determining a set of pixel positions of the wrist during a timeframe associated with the image feed; determine an aggregated wrist position based on the set of pixel positions determined for the image frames of the portion of the image feed; determine that the aggregated wrist position corresponds to a position on a shelf of the rack; and in response to determining that the aggregated wrist position corresponds to a position on a shelf of the rack, provide a trigger signal indicating a shelf-interaction event has occurred; wherein the processor is further configured to determine that the aggregated wrist position corresponds to the position on the shelf of the rack by: comparing the aggregated wrist position to a set of one or more predefined shelf positions; and determining, based on the comparison of the aggregated wrist position to the set of one or more predefined shelf positions, that the aggregated wrist position is within a threshold distance of at least one of the set of predefined shelf positions.

2. The system of claim 1, wherein: the system further comprises a second image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the rack, wherein the second image sensor is configured to generate top-down images of a region around the rack; and the processor is communicatively coupled to the second image sensor and further configured to: receive a top-view image feed comprising top-view images from the second image sensor; determine, based on the received top-view image feed, that the person is within the threshold distance from the rack; and in response to determining that the person is within the threshold distance of the rack, begin receiving the image feed comprising the frames of the angled-view images.

3. The system of claim 1, wherein the processor is further configured to determine the aggregated wrist position by determining a maximum depth within the rack to which the pixel position wrist position extends in the image frames of the portion of the image feed.

4. The system of claim 1, wherein: each shelf of the rack includes a visible marker at a predefined location on the shelf; and the processor is further configured to: detect the visible marker of each shelf of the rack; and determine the predefined shelf position of each shelf based on the detected markers.
5. The system of claim 1, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image frame associated with the person removing a first item from the rack; determine, in the at least one item-selection image frame, a region-of-interest based on the aggregated wrist position, wherein the region-of-interest includes a subset of the pixels of the image frame; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

6. The system of claim 5, wherein the processor is further configured to determine, based on the aggregated wrist position, candidate items that may have been removed from the rack by the person, wherein the candidate items include a subset of all items stored on the shelves of the rack, wherein the subset comprises the items located within a threshold distance of the aggregated wrist position.
7. The system of claim 6, wherein the processor is further configured to: for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated wrist position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.

8. A method, comprising: determining that a person is within a threshold distance of a rack comprising shelves configured to store items; receiving an image feed comprising frames of angled-view images generated by an image sensor after the person is within the threshold distance of the rack, wherein the image sensor is positioned such that a field-of-view of the image sensor encompasses at least a portion of the rack; for each image frame of at least a portion of the image feed, determine a pixel position of a wrist of the person in the image frame, thereby determining a set of pixel positions of the wrist during a timeframe associated with the image feed; determining an aggregated wrist position based on the set of pixel positions determined for the image frames of the portion of the image feed; determining that the aggregated wrist position corresponds to a position on a shelf of the rack; in response to determining that the aggregated wrist position corresponds to a position on a shelf of the rack, providing a trigger signal indicating a shelf-interaction event has occurred; and determining that the aggregated wrist position corresponds to the position on the shelf of the rack by: comparing the aggregated wrist position to a set of one or more predefined shelf positions; and determining, based on the comparison of the aggregated wrist position to the set of one or more predefined shelf positions, that the aggregated wrist position is within a threshold distance of at least one of the set of predefined shelf positions.

9. The method of claim 8, further comprising: receiving a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the rack, wherein the second image sensor is configured to generate top-down images of a region around the rack; determining, based on the received top-view image feed, that the person is within the threshold distance from the rack; and in response to determining that the person is within the threshold distance of the rack, beginning to receive the image feed comprising the frames of the angled-view images.

10. The method of claim 8, further comprising determining the aggregated wrist position by determining a maximum depth within the rack to which the pixel position wrist position extends in the image frames of the portion of the image feed.

11. The method of claim 8, wherein: each shelf of the rack includes a visible marker at a predefined location on the shelf; and the method further comprises: detecting the visible marker of each shelf of the rack; and determining the predefined shelf position of each shelf based on the detected markers.

12. The method of claim 8, further comprising, in response to providing the trigger signal: determining at least one item-selection image frame associated with the person removing a first item from the rack; determining, in the at least one item-selection image frame, a region-of-interest based on the aggregated wrist position, wherein the region-of-interest includes a subset of the pixels of the image frame; identifying, using an object detection algorithm, the first item in the selected region-of-interest; and assigning the identified first item to the person.

13. The method of claim 12, further comprising determining, based on the aggregated wrist position, candidate items that may have been removed from the rack by the person, wherein the candidate items include a subset of all items stored on the shelves of the rack, wherein the subset comprises the items located within a threshold distance of the aggregated wrist position.
14. The method of claim 13, further comprising: for each candidate item, determining, based on a comparison of a predefined position associated with the candidate items to the aggregated wrist position, a probability value that the candidate item was interacted with by the person; and identifying the first item as the candidate item with the largest probability value.

15. A tracking subsystem comprising at least one processor configured to: determine that a person is within a threshold distance of a rack comprising shelves configured to store items; receive an image feed comprising frames of angled-view images generated by an image sensor after the person is within the threshold distance of the rack, wherein the image sensor is positioned such that a field-of-view of the image sensor encompasses at least a portion of the rack, wherein the image sensor is configured to generate the angled-view images of the items stored on the shelves of the rack; for each image frame of at least a portion of the image feed, determine a pixel position of a wrist of the person in the image frame, thereby determining a set of pixel positions of the wrist during a timeframe associated with the image feed; determine an aggregated wrist position based on the set of pixel positions determined for the image frames of the portion of the image feed; determine that the aggregated wrist position corresponds to a position on a shelf of the rack; and in response to determining that the aggregated wrist position corresponds to a position on a shelf of the rack, provide a trigger signal indicating a shelf-interaction event has occurred; wherein the processor is further configured to determine that the aggregated wrist position corresponds to the position on the shelf of the rack by: comparing the aggregated wrist position to a set of one or more predefined shelf positions; and determining, based on the comparison of the aggregated wrist position to the set of one or more predefined shelf positions, that the aggregated wrist position is within a threshold distance of at least one of the set of predefined shelf positions.

16. The tracking subsystem of claim 15, wherein the processor is further configured to: receive a top-view image feed comprising top-view images from a second image sensor, wherein the second image sensor is positioned such that a field-of-view of the second image sensor encompasses at least a portion of the rack, wherein the second image sensor is configured to generate the top-down images of a region around the rack; determine, based on the received top-view image feed, that the person is within the threshold distance from the rack; and in response to determining that the person is within the threshold distance of the rack, begin receiving the image feed comprising the frames of the angled-view images.

17. The tracking subsystem of claim 15, wherein the processor is further configured to determine the aggregated wrist position by determining a maximum depth within the rack to which the pixel position wrist position extends in the image frames of the portion of the image feed.
18. The tracking subsystem of claim 15, wherein: each shelf of the rack includes a visible marker at a predefined location on the shelf; and the processor is further configured to: detect the visible marker of each shelf of the rack; and determine the predefined shelf position of each shelf based on the detected markers.

19. The tracking subsystem of claim 15, wherein the processor is further configured to, in response to providing the trigger signal: determine at least one item-selection image frame associated with the person removing a first item from the rack; determine, in the at least one item-selection image frame, a region-of-interest based on the aggregated wrist position, wherein the region-of-interest includes a subset of the pixels of the image frame; identify, using an object detection algorithm, the first item in the selected region-of-interest; and assign the identified first item to the person.

20. The tracking subsystem of claim 19, wherein the processor is further configured to determine, based on the aggregated wrist position, candidate items that may have been removed from the rack by the person, wherein the candidate items include a subset of all items stored on the shelves of the rack, wherein the subset comprises the items located within a threshold distance of the aggregated wrist position.
21. The tracking subsystem of claim 20, wherein the processor is further configured to: for each candidate item, determine, based on a comparison of a predefined position associated with the candidate items to the aggregated wrist position, a probability value that the candidate item was interacted with by the person; and identify the first item as the candidate item with the largest probability value.
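For illustration only, the candidate-item selection recited in claims 7, 14 and 21 of both claim sets above can be sketched as follows. The sketch is not drawn from the application, the patent, or any cited reference: the function name `rank_candidates`, the use of Euclidean pixel distance, and the exponential weighting used to turn distances into probability values are all assumptions chosen for brevity.

```python
from math import exp, hypot


def rank_candidates(aggregated_pos, item_positions, threshold):
    """Pick the most likely item near an aggregated body part position.

    aggregated_pos: (x, y) aggregated position (hypothetical pixel units).
    item_positions: dict mapping item name -> predefined (x, y) position.
    threshold:      maximum distance for an item to count as a candidate.
    Returns (best_item, probabilities); best_item is None if no candidates.
    """
    # Distance from the aggregated position to each item's predefined position.
    distances = {
        name: hypot(aggregated_pos[0] - pos[0], aggregated_pos[1] - pos[1])
        for name, pos in item_positions.items()
    }
    # Candidate items are the subset located within the threshold distance.
    candidates = {n: d for n, d in distances.items() if d <= threshold}
    if not candidates:
        return None, {}
    # Assumed scoring: closer items get exponentially larger weight,
    # normalized so the probability values sum to 1.
    weights = {n: exp(-d) for n, d in candidates.items()}
    total = sum(weights.values())
    probabilities = {n: w / total for n, w in weights.items()}
    # The identified item is the candidate with the largest probability value.
    best_item = max(probabilities, key=probabilities.get)
    return best_item, probabilities
```

Any monotonically decreasing function of distance would serve equally well for the weighting step; the claims require only that a probability value be determined from the position comparison and that the largest value be selected.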


Claim 1 of the instant application is unpatentable under the judicially created doctrine of “obviousness-type” double patenting with respect to claim 1 of U.S. Patent No. 11003918.
Application claim 1 defines an obvious variation of the invention claimed in U.S. Patent No. 11003918.
The assignee of all applications is the same.
Claim 1 of the instant application is anticipated by patent claim 1 in that claim 1 of the patent contains all the limitations of claim 1 of the instant application. Claim 1 of the instant application therefore is not patentably distinct from the earlier patent claim and as such is unpatentable for obviousness-type double patenting.

 5.	Claims 1, 2, 4, 8, 9, 11, 15, 16 and 18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, 9, 13, 17, 21 of U.S. Patent No. 11113541. Although the claims at issue are not identical, they are not patentably distinct from each other.

Claim Rejections - 35 USC § 103
6.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7.	Claim(s) 1-5, 8-12, 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zalewski et al. (US 9911290)(hereafter Zalewski) in view of Buibas et al. (US 10282852) (hereafter Buibas).
 	Regarding claims 1, 8 and 15, Zalewski discloses a system, comprising: 
an object configured to store items (see Fig. 1A - overhead camera and shelves having items stored thereon); 
an image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object, wherein the image sensor is configured to generate images of the stored items (see Fig. 1A - overhead cameras and shelves having items stored thereon; col. 11 lines 35-38 and col. 26 lines 15-20, some sensors may be image sensors, e.g., cameras); and 
a tracking subsystem coupled to the image sensor (see col. 12 lines 1-30, which discloses a tracking system configured to track a person, items and the interaction thereof based on input from one or more sensors), the tracking subsystem comprising at least one processor configured to:
 determine, using a set of images generated by the image sensor, a pixel position of a body part of a person in each image of the set of images, thereby determining a set of pixel positions of the body part during a timeframe associated with the set of images (see col. 13 lines 1-7, which discloses detection of arms extending into one or more shelves to interact with the retail space, and hence into the zone/region in which the object to be interacted with is placed. Furthermore, said image processing of a hand and associated arm/limb extending into the retail shelving implicitly requires the detection of one or more pixels of a wrist and associated arm of the shopper/customer (see wrist of user detected in Fig. 5A-51B of Zalewski)); 

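Zalewski does not explicitly disclose the remaining limitations of the claim: determining an aggregated body part position based on the set of pixel positions determined for the timeframe; determining that the aggregated body part position corresponds to a position associated with the object; and in response to determining that the aggregated body part position corresponds to a position associated with the object, providing a trigger signal indicating an interaction event has occurred.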
However, in the same field of endeavor, Buibas teaches determining, based on the tracked pixel positions, an aggregated wrist position corresponding to a depth within the rack to which the wrist position extends over the period of time. Buibas (col. 15 line 55 through col. 16 line 30, col. 18 lines 26-55) discloses determining one or more landmarks in a 3D field of the person in the image frame (see Figs. 6A-6E), wherein said 3D field and landmarks correspond to at least pixel positions of a wrist of said person, said 3D field encompassing at least a depth of an aggregate field of body positions of the person’s wrist and other components. Buibas (col. 15 lines 45-55, col. 18 lines 26-55) discloses that the 3D field of influence volume of the wrist and body consists of a plurality of pixels of the human body, including a wrist position of said person. Based on the 3D field of influence of the wrist and corresponding user, when a threshold “near” distance to an item that has moved is determined (col. 16 lines 7-15 of Buibas), said person is associated with said change in state/position of said item. Col. 29 lines 39-50 teaches that processor 130 analyzes data from one or more of cameras 2102, 2101, 1913a and sensor 2103 to determine the item that was taken and to associate that item with person 1901 (based, for example, on the 3D influence volume of the person being located near the item at the time the item was moved). Because authorization information 1933 is also associated with the person at the time the item is taken, processor 130 may transmit message 2111 to charge the account associated with the user for the item.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Buibas with those of Zalewski, as a whole, so as to use the aggregated pixel positions to determine the user interacting with the items on the shelf. The motivation is to track the people in the store and to detect the interactions of these people with items in the store. 

 	Regarding claims 2, 9 and 16, the combined teachings further disclose the system, wherein the processor is further configured to determine that the aggregated body part position corresponds to the position associated with the object by: comparing the aggregated body part position to a set of one or more predefined object positions; and determining, based on the comparison of the aggregated body part position to the set of one or more predefined object positions, that the aggregated body part position is within a threshold distance of at least one of the set of predefined object positions (Buibas, col. 15 lines 45-55, col. 18 lines 26-55, discloses that the 3D field of influence volume of the wrist and body consists of a plurality of pixels of the human body, including a wrist position of said person. Based on the 3D field of influence of the wrist and the corresponding user (see Figs. 6A-6E), when a threshold “near” distance to an item that has moved is determined (col. 16 lines 7-11 of Buibas), said person is associated with said change in state/position of said item; here, applying the threshold “near” distance amounts to comparing the body part position to the object on the shelf. Col. 7 lines 20-34 discloses that tracking may determine when the person is near an item storage area, and analysis of two or more images of the item storage area may determine that an item has moved. Combining these analyses allows the system to attribute motion of an item to the person, and to charge the item to the person's account if the authorization is linked to a payment account. Again, as described with respect to an automated store, tracking and determining when a person is at or near an item storage area may include calculating a 3D field of influence volume around the person; determining when an item is moved or taken may use a neural network that inputs two or more images (such as before and after images) of the item storage area and outputs a probability that an item is moved).
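
For illustration only, the comparison recited in claims 2, 9 and 16 can be sketched as follows. The sketch is not taken from Zalewski, Buibas, or the application. It assumes that the aggregated body part position is the arithmetic mean of the per-image pixel positions and that distance is Euclidean distance in pixels; the claims do not fix either choice, and the names `aggregate_position` and `interaction_trigger` are hypothetical.

```python
from math import hypot
from statistics import mean


def aggregate_position(pixel_positions):
    """Aggregate per-image (x, y) pixel positions into a single position.

    Averaging is an assumption; the claims only require that the aggregated
    position be based on the set of pixel positions for the timeframe.
    """
    xs, ys = zip(*pixel_positions)
    return (mean(xs), mean(ys))


def interaction_trigger(pixel_positions, object_positions, threshold):
    """Return (triggered, aggregated_position).

    triggered is True when the aggregated position lies within `threshold`
    (inclusive, by assumption) of at least one predefined object position.
    """
    agg = aggregate_position(pixel_positions)
    for pos in object_positions:
        if hypot(agg[0] - pos[0], agg[1] - pos[1]) <= threshold:
            return True, agg
    return False, agg
```

With wrist positions (10, 10), (12, 14) and (14, 12), one predefined position at (12, 12), and a threshold of 5 pixels, the aggregated position coincides with the predefined position and the trigger condition is met.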

 	Regarding claims 3, 10 and 17, Zalewski further discloses the system, wherein: the system further comprises a second image sensor positioned such that a field-of-view of the image sensor encompasses at least a portion of the object (see Fig. 32; the cameras 3201-3203 are cameras taking field-of-view images), wherein the second image sensor is configured to generate top-down images of a region around the object (see Fig. 32); and the processor is communicatively coupled to the second image sensor and further configured to: receive a top-view image feed comprising top-view images from the second image sensor (see Fig. 32, the multiple cameras, as shown, taking multiple top-view images); determine, based on the received top-view image feed, that the person is within a threshold distance from the object; and in response to determining that the person is within the threshold distance of the object (col. 3 lines 58-67 and col. 4 lines 1-10), begin receiving an image feed comprising the set of images generated by the image sensor (see col. 7 lines 17-34, which discloses that tracking of the person may also occur in the secured environment, using cameras in the secured environment. As described above with respect to an automated store, tracking may determine when the person is near an item storage area, and analysis of two or more images of the item storage area may determine that an item has moved. Combining these analyses allows the system to attribute motion of an item to the person, and to charge the item to the person's account if the authorization is linked to a payment account. 
Again, as described with respect to an automated store, tracking and determining when a person is at or near an item storage area may include calculating a 3D field of influence volume around the person; determining when an item is moved or taken may use a neural network that inputs two or more images (such as before and after images) of the item storage area and outputs a probability that an item is moved).

 	Regarding claims 4, 11 and 18, the combined teachings further disclose the system, wherein the processor is further configured to determine the aggregated body part position by determining a maximum depth associated with the object to which the pixel position body part position extends in the set of images (see Buibas, col. 15 line 55 through col. 16 line 30 and col. 18 lines 26-55, which discloses determining landmarks in the 3D field of the person in the image frame (see Figs. 6A-6E), wherein said 3D field and landmarks correspond to pixel positions of a wrist of said person, said 3D field encompassing at least a depth of an aggregated field of body positions of the person’s wrist and other components. Col. 20 lines 26-45 discloses, with reference to FIG. 9, looking for item movements only in item storage areas that intersect a person's 3D field of influence volume. FIG. 10 illustrates this process. At a point in time 141 or over a time interval, the tracked 3D field of influence volume 1001 of person 103 is calculated to be near item storage area 102. The system therefore calculates an intersection 1011 of the item storage area 102 and the 3D field of influence volume 1001 around person 1032 and locates camera images that contain views of this region, such as image 1011. At a subsequent time 142, for example when person 103 is determined to have moved away from item storage area 102, an image 1012 (or multiple such images) is obtained of the same intersected region. These two images are then fed as inputs to neural network 300, which may for example detect whether any item was moved, which item was moved (if any) and the type of action that was performed).

 	Regarding claims 5, 12 and 19, Zalewski further discloses the system wherein: the object includes a visible marker located at a predefined location; and the processor is further configured to: detect the visible marker (see col. 40 lines 65-67 and col. 41 lines 1-3); and determine the predefined position associated with the object based on the detected markers (see col. 27 lines 50-60 and col. 62 lines 50-60; see also col. 12 lines 1-30, which discloses a tracking system configured to track a person, items and the interaction thereof based on input from one or more sensors, and also discloses that the user is holding the item in his hand and appears to be reading the label. In one configuration, which is optional, an ID, RFID tag, code, or WCC is integrated with the product).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DHAVAL V PATEL whose telephone number is (571)270-1818. The examiner can normally be reached Monday to Friday (8:00am-4:30pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sam Ahn, can be reached at 571-272-3044. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DHAVAL V PATEL/
Primary Examiner, Art Unit 2631