DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 23 June, 2022 has been entered.
Response to Amendment
Claims 1-19 are pending. Claims 1-19 are amended directly or by dependency on an amended claim.
Response to Arguments
Applicant's arguments filed 23 June, 2022 have been fully considered but they are not persuasive. 
Applicant argues on pages 18-20 that “Gururajan does not disclose a state of the gaming object or the target object including a motion state and an occlusion state, let alone determining the quality level of the image according to the states of the target objects”. Examiner disagrees. Gururajan indicates “The output of the thresholded image will ideally show the playing cards as independent blobs 110. This may not always be the case due to issues of motion or occlusion” ([0093]), “Motion detected on or right beside an object positioning feature (such as a contour) of a card or card hand can be an indication that the card or card hand may be occluded and an appropriate motion flag can be set to record this potential occlusion” ([0122]), “The values of the flags set by the points in contour test, motion detection, skin detection and contour analysis can be utilized to detect potential occlusion of a card or card hand” ([0124]), “One way to detect an overlap of card hands is to utilize object motion tracking, as described in a foregoing section, to track identified card corners (or contours or other position features) gradually as they move and end up overlapping another card hand” ([0125]), and “No cards in the current frame and no motion on the table could also indicate a game has ended” ([0148]). Because Gururajan describes that an “appropriate motion flag can be set to record this potential occlusion”, both an occlusion state and a motion state (motion or no motion) are described by the motion flag.
The argument that Gururajan does not disclose the further concept of using motion and occlusion to determine quality state has been considered but is moot because the new ground of rejection does not rely on Gururajan for any teaching or matter specifically challenged in the argument with respect to the new limitation “determining the quality level of the image in the bounding box of the second target object with the to-be-determined state according to the state of the second target object with the to-be-determined state comprises: in response to that the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition, and the second target object with the to-be-determined state is in the unoccluded state, determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image”.
Applicant argues on page 20 that “However, Divakaran does not disclose a state of the gaming object or the target object including a motion state and an occlusion state, let alone determining the quality level of the image according to the states of the target objects”. Divakaran teaches accounting for extraneous motions such as trees and flags swaying, “Hypothesized regions of foreground 326, such as those corresponding to humans and vehicles, are detected by the background modeling module 310 as outliers with respect to the background model. The background modeling module 310 may generate foreground masks based on these outliers (hypothesized regions of foreground), which may be used to restrict subsequent feature computations to only those masked regions (in order to improve computational performance or for other reasons)” (col. 9, lines 55-68), “When an occlusion occurs, either the tracked object or the occluding object, or both, may be moving” (col. 12, line 65 - col. 13, line 30), “For example, if an object having a similar appearance or motion is detected in a sequence of multiple frames of the video stream 216, the tracking manager 612 may initiate a track for the detected object” (col. 15, lines 20-30), “When a track enters a static or dynamic occlusion zone, the tracking manager 612 propagates the track for a short time based on a motion prediction that it generates using the motion model 618. If the track does not emerge from the occlusion zone within a predetermined period of time, the tracking manager 612 temporarily suspends the track” (col. 15, lines 30-40), the tracking solution is generated globally as a joint decision using all of the objects involved in the tracking (col. 16, lines 15-45), tracking module 222 can apply a tracking algorithm that utilizes a comprehensive set of measurements and available constraints, such as appearance, shape, kinematics and occlusions (col. 17, lines 35-55), “Statistical analysis and correlation of tracks and movement for people and/or vehicles in and around a scene of interest facilitates automated detection of anomalies/threats” (col. 23, lines 35-55) [outlier and similar language are interpreted as requiring that a motion condition exist that the outlier falls outside of]. Because Divakaran therefore describes the concepts of occlusion and motion, this is interpreted as teaching “the state of the second target object with the to-be-determined state comprises an occlusion state and a motion state, the occlusion state of the second target object with the to-be-determined state comprises an unoccluded state and an occluded state, and the motion state of the second target object with the to-be-determined state comprises satisfying a preset motion state condition and dissatisfying the preset motion state condition”. The argument that Divakaran does not disclose the further concept of using motion and occlusion to determine quality state has been considered but is moot because the new ground of rejection does not rely on Divakaran for any teaching or matter specifically challenged in the argument with respect to the new limitation “determining the quality level of the image in the bounding box of the second target object with the to-be-determined state according to the state of the second target object with the to-be-determined state comprises: in response to that the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition, and the second target object with the to-be-determined state is in the unoccluded state, determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image”.
All other arguments are by similarity or dependency and are therefore addressed by the above. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 10, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gururajan et al. (US 20060252521 A1) in view of Divakaran et al. (IDS: US 9904852 B2) and further in view of Zheng et al. (US 20210142097 A1).

Regarding claims 1, 10, and 19, Gururajan et al. disclose a method of filtering images (filtering out playing cards from a darker table background, [0091], If a contour 112 does not match the expected dimensions of a card or card hand it can be discarded, [0095], perform recognition only on cards or card hands that are not occluded, [0118]), an electronic device comprising a memory and a processor, wherein the memory is configured to store computer instructions executed by the processor, and a non-volatile computer-readable storage medium having a computer program stored thereon, wherein the program is executable by a processor, each comprising: obtaining a first image, wherein the first image is an image frame in a video stream obtained by collecting images for a target area (“Imaging system 32 comprises overhead imaging system 40 and optional lateral imaging system 42. Imaging system 32 can be located on or beside the gaming table 12 to image a gaming region from a top view and/or from a lateral view. Overhead imaging system 40 can periodically image a gaming region from a planar overhead perspective”, [0078]); obtaining a first detection result of a first target object in the first image by detecting the first image, wherein the first target object in the first image comprises a second target object with a to-be-determined state (“A card or card hand is first identified by an image from the imaging system 32 as a blob 110. A blob may be any object in the image of a gaming area but for the purposes of this introduction we will refer to blobs 110 that are cards and card hands”, [0089], detect the presence or absence of playing cards (or other gaming objects) on the surface of gaming table, [0102]) [from following description and observing Fig. 
7 can see there are at least 2 cards in the hand, the larger blob shape can be interpreted as the first target object which is comprised of second target objects which are the regions of interest of the individual cards themselves]; determining a state of the second target object with the to-be-determined state according to the first detection result of the first target object in the first image and a second detection result of the second target object with the to-be-determined state, wherein the second detection result of the second target object with the to-be-determined state is a detection result of the second target object with the to-be-determined state in a second image obtained by detecting the second image (identifying and tracking gaming objects and game states, abstract, A contour 112 is then examined for regions of interest (ROI) 118, which identify a specific card, rank and suit of a card, [0089], Optical Character Recognition (OCR) algorithms, process these overhead images of a gaming region to determine the identity and position of playing cards on the gaming table 12, [0090], card corners 116 are utilized to obtain a Region of Interest (ROI) 18 encompassing a card identifying symbol, such as the number of the card, and the suit, [0098], identify the value of the card, [0099], [0100]), the second image is at least one image frame in N image frames adjacent to the first image in the video stream, and N is a positive integer (Object motion tracking generally refers to tracking an object that is moving from frame to frame in a temporal sequence of image frames. The position and/or other parameters of the object are being tracked through consecutive or periodic frames. In case of card tracking, the objects in question are cards or card hands. 
Object motion tracking matches positioning features of objects over consecutive frames, [0109], Motion detection (different from object motion tracking) in the vicinity of the card or card hand position feature can be performed by comparing consecutive frames, [0120] [see at least N frames from seeing Fig. 15, images 232, 234, 242]); and determining a quality level of an image in a bounding box of the second target object with the to-be-determined state according to the state of the second target object with the to-be-determined state (“The outer boundary of blob 110 is then traced to determine a contour 112 which is a sequence of boundary points forming the outer boundary of a card or a card hand. In determining a contour, digital imaging thresholding is used to establish thresholds of grey. In the case of a card or card hand, the blob 110 would be white and bright on a table. From the blob 110 a path is traced around its boundary until the contour 112 is established” [0089], line segments 114 forming the card or card hand boundaries are obtained, [0096], Corner points 116, and line segments 114 are then utilized to create a position profile for cards and card hands, i.e. where they reside in the gaming region, [0097] [see Fig. 7, corners and sides 114 and 116 form a box shape], bounding contour, [0107]; “The output of the thresholded image will ideally show the playing cards as independent blobs 110. This may not always be the case due to issues of motion or occlusion” [0093], A contour can become partially occluded. For example, a dealer's hand may partially obstruct the overhead view and occlude a part of a card hand contour, [0107], If the superimposition was successful processing moves to step 210 as discussed above. If it was not successful a flag is set at step 222 to indicate the matching of a corner failed and processing moves to step 212. 
The flags set at step 222 can be used by game tracking module 86 to determine if a card hand is occluded, [0118], If the points in contour method fails for one or more corners, it could mean that the contour does not belong to a card/card hand or that the contour may be occluded, [0119], “Analysis of a contour 112 of a card 18 or card hand 120 can be utilized to determine if it is occluded. The same is true for any gaming object. The values of the flags set by the points in contour test, motion detection, skin detection and contour analysis can be utilized to detect potential occlusion of a card or card hand”, [0124] [note: see applicant’s specification for the definition of quality as it pertains to motion or occlusion; this is how examiner interprets this term] [as contour analysis is used first, this shows occlusion is only looked for after tracing the card shape]), and filtering, for the second target object with the to-be-determined state, the first image according to the quality level, wherein the bounding box of the second target object with the to-be-determined state is determined according to the first detection result of the first target object in the first image (If a contour 112 does not match the expected dimensions of a card or card hand it can be discarded, [0095], “Some features can still be representative of a card hand, even if only part of it is visible. In the case of contours, such features include the portions of the contour, which are unique in shape, such as a corner. Since under partial occlusion some of these distinguishing features would likely still be visible, the partially occluded hand could likely be matched using a subset of card hand features. For low resolution data, features such as the curvature of the bounding contour could be used for tracking”, [0107], The flags set at step 222 can also be used by a recognition method of the IP module to perform recognition only on cards or card hands that are not occluded, [0118]).

Gururajan et al. further disclose wherein the state of the second target object with the to-be-determined state comprises an occlusion state and a motion state, the occlusion state of the second target object with the to-be-determined state comprises an unoccluded state and an occluded state, and the motion state of the second target object with the to-be-determined state comprises satisfying a preset motion state condition and dissatisfying the preset motion state condition (The output of the thresholded image will ideally show the playing cards as independent blobs 110. This may not always be the case due to issues of motion or occlusion, [0093], “Motion detected on or right beside an object positioning feature (such as a contour) of a card or card hand can be an indication that the card or card hand may be occluded and an appropriate motion flag can be set to record this potential occlusion.”, [0122], The values of the flags set by the points in contour test, motion detection, skin detection and contour analysis can be utilized to detect potential occlusion of a card or card hand, [0124], One way to detect an overlap of card hands is to utilize object motion tracking, as described in a foregoing section, to track identified card corners (or contours or other position features) gradually as they move and end up overlapping another card hand, [0125], No cards in the current frame and no motion on the table could also indicate a game has ended, [0148]) [motion flags interpreted as indicating motion state].

Gururajan et al. disclose all of the limitations of claims 1, 10 and 19; however, for completeness of record, a second reference is provided to further teach the actual phrase “bounding box”.

Divakaran et al. teach obtaining a first image, wherein the first image is an image frame in a video stream obtained by collecting images for a target area (“field of view” may refer to, among other things, the extent of the observable real world that is visible through a camera at any given moment in time, video stream, col. 3, line 60 - col. 4, line 10, “The illustrative scene awareness module 210 maintains static and dynamic occlusion maps 212, 214 for the FOV of the camera with which the OT node 200 is associated. As such, the static and dynamic occlusion maps 212, 214 may be specific to a particular camera's FOV. The illustrative human detection module 218 executes one or more human detection computer vision algorithms on each frame of the video stream 216. The video stream 216 depicts the real-world scene in the FOV of the camera associated with the OT node 200, as mentioned above”, col. 7, line 60 - col. 8, line 5) [FOV interpreted as target area]; obtaining a first detection result of a target object in the first image by detecting the first image (output a different local track for each object that is detected and tracked in the video stream, col. 3, lines 30-50, “The illustrative human detection module 218 executes one or more human detection computer vision algorithms on each frame of the video stream 216. The video stream 216 depicts the real-world scene in the FOV of the camera associated with the OT node 200, as mentioned above. The human detection module 218 relays the geo-positions of any detected persons as well as any regions-of-interest (ROIs) of the detected person(s) in each of the individual image/frames to the real-time tracking module 222, as a detection stream 220”, col. 7, line 65 – col. 
8, line 5); determining a state of a target object with to-be-determined state according to the first detection result of the target object in the first image and a second detection result of the target object with to-be-determined state, wherein the target object with to-be-determined state is a target object in the first image, the second detection result of the target object with to-be-determined state is a detection result of the target object with to-be-determined state in a second image obtained by detecting the second image (Motion constraints can be used to filter tracks that are potential candidates for association, col. 5, lines 30-65, feature computation module 312 executes multiple types of object detection feature descriptors, including, for example, motion, col. 10, lines 5-35, Using these techniques, the human detection module 218 can deal with partial and temporary occlusions. For instance, when a person's legs are temporarily occluded in an image of the video stream 216, the person can still be detected and tracked from the upper-body (e.g., based on the responses of a torso and/or head-shoulder detector) until the legs are no longer occluded, col. 13, lines 30-40, For example, if an object having a similar appearance or motion is detected in a sequence of multiple frames of the video stream 216, the tracking manager 612 may initiate a track for the detected object, col. 15, lines 20-30, information about the location, motion, and appearance of the detected objects and/or object parts, col. 20, line 55 - col. 21, line 10) [motion and occlusion interpreted as state, tracking over frames of video interpreted as second detection result and second image], the second image is at least one image frame in N image frames adjacent to the first image in the video stream, and N is a positive integer (each time instant of the video stream e.g., for each frame/image in the video stream, col. 
3, lines 30-50, analyzes temporal sequences of frames in the video stream, analyzing the next frame of the video stream col. 8, lines 5-20, estimates the dynamic masks for every frame of the video stream, col. 8, line 50 - col. 9, line 10) [each time instant and sequence of frames indicates adjacent frames, N being a positive integer implied by the sequence as there will not be negative frames possible for analysis]; and determining a quality level of an image in a bounding box of the target object with to-be-determined state according to the state of the target object with to-be-determined state, wherein the bounding box of the target object with to-be-determined state is determined according to the first detection result of the target object with to-be-determined state (inference may have associated with it a degree of certainty, such as a statistical or probabilistic likelihood that the hypothesis is correct, col. 5, lines 55-60, static and dynamic occlusion maps, “During static modeling, static occluders (such as poles and trees) detected in the scene are marked with static masks using a distinguishing color (e.g., orange), and dynamic occluders (such as vehicles) are marked with dynamic masks in a different distinguishing color (e.g., cyan), col. 8, lines 30-35,  “mask” may refer to, among other things, a computer-synthesized graphic or other marking, such as a differently-colored block or region, which can be applied to or overlayed on an image in a video stream, col. 8, lines 40-50, restrict feature computation to masked regions, col. 9, lines 60-68, This allows the tracking module 222 to maintain tracks for heavily occluded persons as long as the tracks can be assumed to be in the occlusion zone of another tracked person. In this way, the tracking module 222 can prevent the uncertainty of the occluded track from disrupting the tracking. Rather, the uncertainty of the occluded track is limited to the size of the POZ of the occluding track, col. 
15, line 60 - col. 16, line 15, “As used herein, “detection window,” “bounding box,” or “marking” may be used to refer to a region within an image in which an object or a portion of an object is detected or is expected to be detected. As an example, a graphical marking (such as a rectangular box) may be applied to an image to define a detection window. In some cases, the marking used to define the bounding box may indicate the degree of confidence that the system 100 has in the detection. For instance, a bounding box that is drawn with dashed lines may indicate a lower level of confidence in the detection hypothesis than a bounding box drawn with a solid line”, col. 11, lines 55-68).

Divakaran et al. further teach wherein the state of the second target object with the to-be-determined state comprises an occlusion state and a motion state, the occlusion state of the second target object with the to-be-determined state comprises an unoccluded state and an occluded state, and the motion state of the second target object with the to-be-determined state comprises satisfying a preset motion state condition and dissatisfying the preset motion state condition (account for extraneous motions such as trees and flags swaying, “Hypothesized regions of foreground 326, such as those corresponding to humans and vehicles, are detected by the background modeling module 310 as outliers with respect to the background model. The background modeling module 310 may generate foreground masks based on these outliers (hypothesized regions of foreground), which may be used to restrict subsequent feature computations to only those masked regions (in order to improve computational performance or for other reasons)”, col. 9, lines 55-68, When an occlusion occurs, either the tracked object or the occluding object, or both, may be moving, col. 12, line 65 - col. 13, line 30, “For example, if an object having a similar appearance or motion is detected in a sequence of multiple frames of the video stream 216, the tracking manager 612 may initiate a track for the detected object”, col. 15, lines 20-30, When a track enters a static or dynamic occlusion zone, the tracking manager 612 propagates the track for a short time based on a motion prediction that it generates using the motion model 618. If the track does not emerge from the occlusion zone within a predetermined period of time, the tracking manager 612 temporarily suspends the track, col. 15, lines 30-40, the tracking solution is generated globally as a joint decision using all of the objects involved in the tracking, col. 
16, lines 15-45, tracking module 222 can apply a tracking algorithm that utilizes a comprehensive set of measurements and available constraints, such as appearance, shape, kinematics and occlusions, col. 17, lines 35-55, Statistical analysis and correlation of tracks and movement for people and/or vehicles in and around a scene of interest facilitates automated detection of anomalies/threats, col. 23, lines 35-55) [outlier and similar language are interpreted as having a motion condition must exist that the outlier is outside of].

Gururajan et al. and Divakaran et al. are in the same art of tracking (Gururajan et al., abstract; Divakaran et al., col. 5, lines 30-35). The combination of Divakaran et al. with Gururajan et al. enables use of a defined bounding box. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the bounding box of Divakaran et al. with the invention of Gururajan et al. as this was known at the time of filing, the combination would have predictable results, and as Divakaran et al. indicate “The background modeling module 310 may generate foreground masks based on these outliers (hypothesized regions of foreground), which may be used to restrict subsequent feature computations to only those masked regions (in order to improve computational performance or for other reasons)” (col. 9, line 40 - col. 10, line 5), and that the system is designed to be modular and easily expandable to cover a large geographic area, and provides better person disambiguation (col. 21, line 50 - col. 22, line 20), indicating the computational benefit of using a mask of bounding boxes as described by Divakaran et al. in a busy casino environment such as that described by Gururajan et al.

Gururajan et al. and Divakaran et al. do not explicitly disclose determining the quality level of the image in the bounding box of the second target object with the to-be-determined state according to the state of the second target object with the to-be-determined state comprises: in response to that the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition, and the second target object with the to-be-determined state is in the unoccluded state, determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image.
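For illustration only, the limitation quoted above can be read as a simple decision rule. The following is a minimal sketch and is not drawn from the claims or from any cited reference: the names ObjectState, Quality and quality_level are hypothetical, and the label used for images that are not first quality is an assumption, since the quoted limitation only specifies when an image is a first quality image.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    FIRST = "first quality"
    OTHER = "not first quality"  # assumed label; the quoted limitation does not name it


@dataclass
class ObjectState:
    # Hypothetical flags standing in for the claimed motion and occlusion states.
    motion_condition_satisfied: bool  # preset motion state condition met?
    occluded: bool                    # True = occluded state, False = unoccluded state


def quality_level(state: ObjectState) -> Quality:
    """First quality only when the motion condition holds AND the object is unoccluded."""
    if state.motion_condition_satisfied and not state.occluded:
        return Quality.FIRST
    return Quality.OTHER
```

Under this reading, both conditions must hold together; an object that satisfies the motion condition but is occluded does not yield a first quality image.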

Zheng et al. teach determining the quality level of the image in the bounding box of the second target object with the to-be-determined state according to the state of the second target object with the to-be-determined state comprises: in response to that the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition, and the second target object with the to-be-determined state is in the unoccluded state, determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image (In an embodiment, the detection modules and extraction modules are image-based models. The extraction model serves as a feature extractor, and may also serve as an input item image quality checker. The extraction module is able to determine, for the patch inside each bounding box predicted by the detector, how good the feature representation of that patch is for the retrieval task. If the bounding box is not regressed well, then the quality is determined to be low. In some examples, the level of regression may be determined by a threshold value set by an administrator and stored in a memory of the system. If the bounding box is accurate, but the patch content is not suitable for retrieval (for example due to an occlusion or motion blur) then the quality will also be low. A quality score threshold is used to remove obvious bad detections before they are fed into the association module to form tracklets. However, in some examples, quality thresholding may not be able to filter out false positives from detections, as some of the detected false positive items can have high patch quality. Therefore, in such situations, false positives are removed in the association module. In addition, the selected patches of each tracklet and the corresponding quality scores are passed to the fusion module to get a fused feature for the item that corresponds to the tracklet. 
Quality scores may be used as weights to fuse the tracklet features. The fused features are then used to query the database for retrieval. Since product images are usually high-quality images captured in controlled environments with clean backgrounds. The fusion module in the product domain can be an average fusion technique, [0207]). While Zheng et al. do not describe a motion state condition, Zheng et al. indicate a state of whether or not there is a motion blur; therefore, in combination with the Gururajan et al. and Divakaran et al. references, which find a motion state (as indicated by parameters such as a motion flag), Zheng et al. teach the limitation as a whole.
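As a hedged illustration of the passage of Zheng et al. cited above ([0207]), quality scores can first be compared against a threshold and the surviving scores then used as fusion weights. The sketch below is an assumption-laden reading, not the reference's implementation: the function name fuse_tracklet, the threshold parameter, the use of plain lists as feature vectors, and the normalized weighted average are all hypothetical choices that Zheng et al. do not specify.

```python
from typing import List, Optional


def fuse_tracklet(
    features: List[List[float]],
    scores: List[float],
    threshold: float,
) -> Optional[List[float]]:
    """Drop patches whose quality score is below the threshold, then fuse the
    remaining feature vectors using their quality scores as weights."""
    # Quality thresholding: keep only patches scored at or above the threshold.
    kept = [(f, s) for f, s in zip(features, scores) if s >= threshold]
    if not kept:
        return None  # no patch passed the threshold
    total = sum(s for _, s in kept)
    if total == 0:
        return None  # weights cannot be normalized
    dim = len(kept[0][0])
    # Assumed fusion rule: weighted average normalized by the sum of weights.
    return [sum(f[i] * s for f, s in kept) / total for i in range(dim)]
```

For example, with features [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]], scores [0.5, 0.5, 0.1] and a threshold of 0.3, the third patch is discarded and the two remaining patches contribute equally to the fused feature.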

Gururajan et al., Divakaran et al., and Zheng et al. are in the same art of tracking (Gururajan et al., abstract; Divakaran et al., col. 5, lines 30-35; Zheng et al., abstract). The combination of Zheng et al. with Gururajan et al. and Divakaran et al. enables determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the determining that the image in the bounding box of the second target object with the to-be-determined state is a first quality image of Zheng et al. with the invention of Gururajan et al. and Divakaran et al. as this was known at the time of filing, the combination would have predictable results, and as Zheng et al. indicate some detections may have poor quality and are thus not suitable to pass to the extractor for feature fusion, therefore, in some examples, the video product retrieval system filters the available image to select only good detections to be used for fusion ([0202]), thereby improving the tracking accuracy achieved by Gururajan et al. and Divakaran et al. by using only quality tracklets for fusion and tracking.

Regarding claims 2 and 11, Gururajan et al. and Divakaran et al. and Zheng et al. disclose the method and device according to claims 1 and 10. Gururajan et al. and Divakaran et al. further indicate determining the state of the second target object with to-be-determined state according to the first detection result of the first target object in the first image and the second detection result of the target object with to-be-determined state comprises: determining the motion state of the target object with the to-be-determined state according to the first detection result of the first target object in the first image and the second detection result of the second target object with to-be-determined state; determining whether the motion state of the second target object with to-be-determined state satisfies a preset motion state condition; and in response to that the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition, determining the occlusion state of the second target object with the to-be-determined state according to the first detection result of the first target object in the first image and a first detection result of one or more other target objects in the first image except the second target object with to-be-determined state (Gururajan et al., The output of the thresholded image will ideally show the playing cards as independent blobs 110. 
This may not always be the case due to issues of motion or occlusion, [0093], “Motion detected on or right beside an object positioning feature (such as a contour) of a card or card hand can be an indication that the card or card hand may be occluded and an appropriate motion flag can be set to record this potential occlusion.”, [0122], The values of the flags set by the points in contour test, motion detection, skin detection and contour analysis can be utilized to detect potential occlusion of a card or card hand, [0124], One way to detect an overlap of card hands is to utilize object motion tracking, as described in a foregoing section, to track identified card corners (or contours or other position features) gradually as they move and end up overlapping another card hand, [0125], No cards in the current frame and no motion on the table could also indicate a game has ended, [0148] [flags interpreted as indicating conditions such as motion/occlusion yes/no] [only does recognition on that ROI/second target area when appropriate in terms of motion and occlusion]; Divakaran et al., account for extraneous motions such as trees and flags swaying, “Hypothesized regions of foreground 326, such as those corresponding to humans and vehicles, are detected by the background modeling module 310 as outliers with respect to the background model. The background modeling module 310 may generate foreground masks based on these outliers (hypothesized regions of foreground), which may be used to restrict subsequent feature computations to only those masked regions (in order to improve computational performance or for other reasons)”, col. 9, lines 55-68, When an occlusion occurs, either the tracked object or the occluding object, or both, may be moving, col. 12, line 65 - col. 13, line 30, “For example, if an object having a similar appearance or motion is detected in a sequence of multiple frames of the video stream 216, the tracking manager 612 may initiate a track for the detected object”, col. 15, lines 20-30, When a track enters a static or dynamic occlusion zone, the tracking manager 612 propagates the track for a short time based on a motion prediction that it generates using the motion model 618. If the track does not emerge from the occlusion zone within a predetermined period of time, the tracking manager 612 temporarily suspends the track, col. 15, lines 30-40, the tracking solution is generated globally as a joint decision using all of the objects involved in the tracking, col. 16, lines 15-45, tracking module 222 can apply a tracking algorithm that utilizes a comprehensive set of measurements and available constraints, such as appearance, shape, kinematics and occlusions, col. 17, lines 35-55, Statistical analysis and correlation of tracks and movement for people and/or vehicles in and around a scene of interest facilitates automated detection of anomalies/threats, col. 23, lines 35-55) [outlier and similar language are interpreted as having a motion condition must exist that the outlier is outside of].

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gururajan et al. (US 20060252521 A1) and Divakaran et al. (IDS: US 9904852 B2) and Zheng et al. (US 20210142097 A1) as applied to claims 2 and 11 above, further in view of Wu et al. (US 20180211113 A1).

Regarding claims 7 and 16, Gururajan et al. and Divakaran et al. and Zheng et al. disclose the method and device according to claims 2 and 11. Gururajan et al. and Divakaran et al. and Zheng et al. do not explicitly disclose determining the motion state of the second target object with the to-be-determined state according to the first detection result of the first target object in the first image and the second detection result of the second target object with the to-be-determined state comprises: determining a first position of the second target object with the to-be-determined state in the first image according to the first detection result of the first target object in the first image; determining a second position of the second target object with the to-be-determined state in the second image according to the second detection result of the second target object with the to-be-determined state; determining a motion speed of the second target object with the to-be-determined state according to the first position, the second position, a time when the first image is collected, and a time when the second image is collected; and determining the motion state of the second target object with the to-be-determined state according to the motion speed of the second target object with the to-be-determined state; and determining whether the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition comprises: determining whether the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition according to the motion speed of the second target object with the to-be-determined state and an image collection frame rate of an image collection device for collecting the video stream.
 
Wu et al. teach determining the motion state of the second target object with the to-be-determined state according to the first detection result of the first target object in the first image and the second detection result of the second target object with the to-be-determined state comprises: determining a first position of the second target object with the to-be-determined state in the first image according to the first detection result of the first target object in the first image; determining a second position of the second target object with the to-be-determined state in the second image according to the second detection result of the second target object with the to-be-determined state; determining a motion speed of the second target object with the to-be-determined state according to the first position, the second position, a time when the first image is collected, and a time when the second image is collected; and determining the motion state of the second target object with the to-be-determined state according to the motion speed of the second target object with the to-be-determined state; and determining whether the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition comprises: determining whether the motion state of the second target object with the to-be-determined state satisfies the preset motion state condition according to the motion speed of the second target object with the to-be-determined state and an image collection frame rate of an image collection device for collecting the video stream (video system for automatically detecting an occurrence of an interaction event of two or more objects concurrently present in a surveilled area, “b) detect and track the two or more objects within a first common temporal sequence of video frames included in the video stream, and generate a trajectory of each object tracked within the first common temporal sequence of video frames; c) process the trajectories of the tracked objects to extract one or more trajectory interaction features (TIFs) associated with the trajectories of the two or more objects tracked within the first common temporal sequence of video frames, the TIFs including one or more of a position, a velocity, and a relative distance associated with the two or more objects within the first common temporal sequence of video frames; and d) apply predefined heuristics to the extracted TIFs to detect an interaction event has occurred between at least two objects of the two or more objects tracked within the first common temporal sequence of video frames, the predefined heuristics including a velocity threshold and a proximity threshold associated with the two or more objects tracked within the first common temporal sequence of video frames, wherein steps b)-d) are repeated for a second common temporal sequence of video frames, distinct from the first common temporal sequence of video frames, to determine if the interaction even has occurred between at least two objects of the two or more objects tracked within the second common temporal sequence of video frame”, [0015], video surveillance camera operating at conventional frame rates, [0032], trained classifier is then applied to features extracted from frames of interest and outputs the parameters of bounding boxes (e.g., location, width and height) surrounding the matching candidates, [0037], TIFs are the positions and velocities of both persons and the distance between them during the time periods that both are being tracked, [0039]) [first and second objects indicated by Wu].

Divakaran et al. and Wu et al. are in the same art of tracking (Divakaran et al., col. 5, lines 30-35; Wu et al., [0039]). The combination of Wu et al. with Gururajan et al. and Divakaran et al. and Zheng et al. enables a speed determination. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the speed calculation of Wu et al. with the invention of Gururajan et al. and Divakaran et al. and Zheng et al. as this was known at the time of filing, the combination would have predictable results, and as Wu et al. indicate this can be used to “determine if an interaction event has occurred, such as a potential illegal drug deal involving at least one pedestrian and at least one vehicle” (abstract), thereby indicating a law enforcement application when combined with Gururajan et al. and Divakaran et al. and Zheng et al.

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gururajan et al. (US 20060252521 A1) and Divakaran et al. (IDS: US 9904852 B2) and Zheng et al. (US 20210142097 A1) as applied to claims 1 and 10 above, further in view of Joshi et al. (US 20180121733 A1).

Regarding claims 9 and 18, Gururajan et al. and Divakaran et al. and Zheng et al. disclose the method and device according to claims 1 and 10. Gururajan et al. and Divakaran et al. and Zheng et al. do not explicitly disclose determining a quality classification result of the image in the bounding box of the second target object with the to-be-determined state in the first image by a neural network, wherein the neural network is trained with sample images annotated with quality levels, and one sample image comprises at least one second target object with a to-be-determined state; and in response to that the quality classification result of the image in the bounding box of the second target object with the to-be-determined state determined by the neural network is consistent with the quality level of the image in the bounding box of the second target object with the to-be-determined state determined according to the state of the second target object with the to-be-determined state, taking the quality level of the image in the bounding box of the second target object with the to-be-determined state as a target quality level of the image in the bounding box of the second target object with the to-be-determined state. 

Joshi et al. teach determining a quality classification result of the image in the bounding box of the second target object with the to-be-determined state in the first image by a neural network, wherein the neural network is trained with sample images annotated with quality levels, and one sample image comprises at least one second target object with a to-be-determined state; and in response to that the quality classification result of the image in the bounding box of the second target object with the to-be-determined state determined by the neural network is consistent with the quality level of the image in the bounding box of the second target object with the to-be-determined state determined according to the state of the second target object with the to-be-determined state, taking the quality level of the image in the bounding box of the second target object with the to-be-determined state as a target quality level of the image in the bounding box of the second target object with the to-be-determined state (“Examples of these features include, but are not limited to, low level features such as blur, noise, luminance, color, contrast, etc., mid-level features such as salient objects, “rule of thirds” analysis, depth of field, landmarks and bounding boxes of detected faces or objects, etc., and high-level or semantic features such as facial expressions, motions of bounding boxes and landmarks, people, animals, objects, vehicles, etc. Image feature extraction techniques are known to those skilled in the art and will not be described in detail herein”, [0025], “The training features 120, which may include camera and/or object motion features, are then provided to a Quality Model Construction Module 125 along with the corresponding human quality ratings 140. 
The Quality Model Construction Module 125 then generates a quality model 145 by applying various machine-learning techniques to a combination of: (1) human quality ratings of output videos generated by the Video Processing Module 130, and (2) training features 120 extracted from the training sets used to generate corresponding output videos”, [0030], “For example, assuming that the image sequence processing algorithm is a video looping generation algorithm, the features extracted for model training (and subsequent quality scoring of candidate sets), may include, but are not limited to, features such as mean, max and median of images, blur, spatial pyramid blurriness features (e.g., global and spatially local blur estimation values), noise, luminance, luminance gradient histograms, brightness, brightness histograms, color, hue, saturation, contrast, salient objects, “rule of thirds” analysis, depth of field, number of faces, face size, ratio and location, face landmark features, such as eyes open or closed features, mouth open or closed, etc., facial expressions, image content motions, camera motions, composition features, etc. Multiple techniques for extracting such image features from image sequences are known to those skilled in the art and will not be described in detail herein”, [0048], neural networks, [0056], estimating motions for each training set and providing the estimated motions as one of the extracted features for use in training the quality model, [0084], processes or techniques further comprising estimating motions for each training set and providing the estimated motions as one of the extracted features for use in training the quality model, [0091]).                                                                               

Divakaran et al. and Joshi et al. are in the same art of tracking (Divakaran et al., col. 5, lines 30-35; Joshi et al., [0025]). The combination of Joshi et al. with Gururajan et al. and Divakaran et al. and Zheng et al. enables training a neural network. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the training of Joshi et al. with the invention of Gururajan et al. and Divakaran et al. and Zheng et al. as this was known at the time of filing, the combination would have predictable results, and as Joshi et al. indicate “the Quality Predictor reduces computational overhead by eliminating unnecessary processing of candidate sets when the image sequence processing algorithm is not expected to produce acceptable results” (abstract), thereby indicating a processing time benefit when combined with the invention of Gururajan et al. and Divakaran et al. and Zheng et al.

Allowable Subject Matter
Claims 3, 4, 5, 6, 8, 12, 13, 14, 15, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Relevant art was cited in the Office action dated 21 December, 2021.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH, can be reached at (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661