DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed on 4/15/2021 has been entered and considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 6, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fujimura (20050271279) in view of Mao (20180024641), Iwamura (20020057383), and Fang (20130188836).
As to claim 1, Fujimura (Figs. 2-16) discloses a method for recognizing a dynamic gesture (dynamic and static human gesture recognition [0034,0035,0122]), comprising: 
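positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box (video is generated of a scene area 400 with dynamic regions detected relative to determined static head and torso box region at a stream of 14-30 frames per second [0039,0040,0078-0080]); 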
capturing an image block corresponding to the dynamic gesture box from each of multiple image frames of the video stream (a sequence of image frames includes the dynamic regions 702a-h including and surrounding the defined head and torso box regions [0039,0040,0078-0080]); 
generating a detection sequence based on the captured image blocks (shape, trajectory, and other matching techniques are performed to objects in the detection regions for each of a plurality of frames in the video stream sequence [0039,0090-0092,0099,0114]); and 
performing dynamic gesture recognition according to the detection sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]),
wherein the performing dynamic gesture recognition according to the detection sequence comprises: 
determining multiple inter-frame image differences in the detection sequence (differences between a plurality of sequenced frames are determined including trajectory, orientation, and number changes of fingers, palms, etc to detect a sequence of gestural movements [0090-0093,0122]); 
generating an image difference sequence based on the multiple inter-frame image differences (sequence of trajectory and orientation changes of the target object are logged between the plurality of frames [0122,0124]); and 
performing the dynamic gesture recognition according to the detection sequence and the image difference sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]).
Fujimura does not explicitly disclose wherein respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed or wherein the detection sequence is a sequence of images different from the multiple image frames of the video stream.
Mao (Figs. 2-5) discloses respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed (the original acquired image frame including the gesture is cropped according to the bounding box around the gesture in refined detection models [0023,0040,0065]) and 
the detection sequence is a sequence of images different from the multiple image frames of the video stream (gesture bounding box defines a region cropped and analyzed across consecutive frames which is smaller and different from the original larger full frame [0023,0040,0065]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have removed parts of the image frames as taught by Mao in the device of Fujimura. The suggestion/motivation would have been to limit the analyzed area and thus optimize position tracking [0019].
Fujimura in view of Mao does not explicitly disclose wherein each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames.
Iwamura (Figs. 2-22) discloses each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two image frames (motion detection steps within successive image frames used to detect hand gesture movements is based on comparing the pixel value of each pixel location within an image frame with that of a previous image frame and the difference indicates motion at the location [0008,0014,0032]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences by calculating differences between pixels as taught by Iwamura in the device of Fujimura as modified by Mao. The suggestion/motivation would have been to improve motion detection in a small scale while minimizing detection errors [0005,0006,0014].
Fujimura in view of Mao and Iwamura does not explicitly disclose that the difference is between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames.
Fang (Figs. 3A-3L,5,6) discloses gesture detection by calculating the difference between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames (in order to perform asymmetrical gesture detection, a difference is calculated between each of frames 1, 2, and 3 and that of a reference frame 4 and therefore frames 1 and 2 are fixed distances from 4 respectively [0041,0042,0043,0045,0047,0048]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences between non-adjacent frames as taught by Fang in the device of Fujimura as modified by Mao and Iwamura. The suggestion/motivation would have been to detect movement more accurately due to better sensitivity in situations where small changes are experienced and prevent delays and reduce complexity and computational cost [0041,0042,0043,0045,0048].
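For illustration only, the claimed combination of cropping to a gesture box and taking pixel-wise differences between adjacent or fixed-spacing frames can be sketched as follows. This is a minimal sketch and not a characterization of the implementation of any cited reference; the function names, the box format, and the sample frames are assumptions introduced here.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # assumed format: (left, top, right, bottom), exclusive right/bottom


def crop(frame: Frame, box: Box) -> Frame:
    """Keep only the pixels inside the box; parts outside the box are removed."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


def pixel_difference(a: Frame, b: Frame) -> Frame:
    """Difference image: absolute difference of pixels at each same position."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def difference_sequence(blocks: List[Frame], spacing: int = 1) -> List[Frame]:
    """Difference images between blocks that are `spacing` frames apart.

    spacing=1 compares adjacent frames; spacing>1 compares non-adjacent
    frames separated by a fixed number of frames.
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    return [pixel_difference(blocks[i], blocks[i + spacing])
            for i in range(len(blocks) - spacing)]


# Hypothetical 3x3 frames: one bright pixel moves left to right along row 1.
frames = [
    [[0, 0, 0], [9, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 9], [0, 0, 0]],
]
box = (0, 1, 3, 2)  # assumed gesture box covering row 1 only
detection_sequence = [crop(f, box) for f in frames]
adjacent = difference_sequence(detection_sequence, spacing=1)
non_adjacent = difference_sequence(detection_sequence, spacing=2)
```

In this sketch the detection sequence holds cropped image blocks, which are smaller than and therefore different from the original frames, and the image difference sequence is computed from those blocks rather than from the full frames.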
As to claim 14, Fujimura (Figs. 2-16) discloses an electronic device (computing device 103 [0034]), comprising: 
a memory storing processor-executable instructions (media storing instructions to be performed by the computing device [0034]); and 

a processor, configured to execute the stored processor-executable instructions to perform operations (computer processor performs instructions stored on media to perform a gesture recognition method [0034]) of: 
positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box (video is generated of a scene area 400 with dynamic regions detected relative to determined static head and torso box region at a stream of 14-30 frames per second [0039,0040,0078-0080]); 
capturing an image block corresponding to the dynamic gesture box from each of multiple image frames of the video stream (a sequence of image frames includes the dynamic regions 702a-h including and surrounding the defined head and torso box regions [0039,0040,0078-0080]); 
generating a detection sequence based on the captured image blocks (shape, trajectory, and other matching techniques are performed to objects in the detection regions for each of a plurality of frames in the video stream sequence [0039,0090-0092,0099,0114]); and 
performing dynamic gesture recognition according to the detection sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]),
wherein the performing dynamic gesture recognition according to the detection sequence comprises: 
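determining multiple inter-frame image differences in the detection sequence (differences between a plurality of sequenced frames are determined including trajectory, orientation, and number changes of fingers, palms, etc to detect a sequence of gestural movements [0090-0093,0122]); 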
generating an image difference sequence based on the multiple inter-frame image differences (sequence of trajectory and orientation changes of the target object are logged between the plurality of frames [0122,0124]); and 
performing the dynamic gesture recognition according to the detection sequence and the image difference sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]).
Fujimura does not explicitly disclose wherein respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed or wherein the detection sequence is a sequence of images different from the multiple image frames of the video stream.
Mao (Figs. 2-5) discloses respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed (the original acquired image frame including the gesture is cropped according to the bounding box around the gesture in refined detection models [0023,0040,0065]) and 
the detection sequence is a sequence of images different from the multiple image frames of the video stream (gesture bounding box defines a region cropped and analyzed across consecutive frames which is smaller and different from the original larger full frame [0023,0040,0065]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have removed parts of the image frames as taught by Mao in the device of Fujimura. The suggestion/motivation would have been to limit the analyzed area and thus optimize position tracking [0019].
Fujimura in view of Mao does not explicitly disclose wherein each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames.
Iwamura (Figs. 2-22) discloses each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames (motion detection steps within successive image frames used to detect hand gesture movements is based on comparing the pixel value of each pixel location within an image frame with that of a previous image frame and the difference indicates motion at the location [0008,0014,0032]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences by calculating differences between pixels as taught by Iwamura in the device of Fujimura as modified by Mao. The suggestion/motivation would have been to improve motion detection in a small scale while minimizing detection errors [0005,0006,0014].
Fujimura in view of Mao and Iwamura does not explicitly disclose that the difference is between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames.
Fang (Figs. 3A-3L,5,6) discloses gesture detection by calculating the difference between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames (in order to perform asymmetrical gesture detection, a difference is calculated between each of frames 1, 2, and 3 and that of a reference frame 4 and therefore frames 1 and 2 are fixed distances from 4 respectively [0041,0042,0043,0045,0047,0048]).
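At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences between non-adjacent frames as taught by Fang in the device of Fujimura as modified by Mao and Iwamura. The suggestion/motivation would have been to detect movement more accurately due to better sensitivity in situations where small changes are experienced and prevent delays and reduce complexity and computational cost [0041,0042,0043,0045,0048].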
As to claim 15, Fujimura (Figs. 2-16) discloses a control method using gesture interaction (device or computer system control through gesture recognition [0035,0036]), comprising: 
obtaining a video stream (image capturing system 102 captures a video stream of the detection region [0033,0038]); 
determining a dynamic gesture recognition result of the video stream by the method according to claim 1 (see rejection of claim 1); and 
controlling a device to execute an operation corresponding to the dynamic gesture recognition result (dynamic gesture recognition is used to control many different computer gesture-based interaction systems such as vehicle applications, sign language translation, robots, etc [0035,0036,0046]). 
As to claim 20, Fujimura (Figs. 2-16) discloses a non-transitory computer-readable storage medium having stored thereon computer-readable instructions (media storing instructions to be performed by the computing device [0034]) that, when executed by a processor, cause the processor to perform operations of a method for recognizing a dynamic gesture (computer processor performs instructions stored on media to perform a gesture recognition method [0034]), the method comprising: 
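positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box (video is generated of a scene area 400 with dynamic regions detected relative to determined static head and torso box region at a stream of 14-30 frames per second [0039,0040,0078-0080]); 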
capturing an image block corresponding to the dynamic gesture box from each of multiple image frames of the video stream (a sequence of image frames includes the dynamic regions 702a-h including and surrounding the defined head and torso box regions [0039,0040,0078-0080]); 
generating a detection sequence based on the captured image blocks (shape, trajectory, and other matching techniques are performed to objects in the detection regions for each of a plurality of frames in the video stream sequence [0039,0090-0092,0099,0114]); and 
performing dynamic gesture recognition according to the detection sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]),
wherein the performing dynamic gesture recognition according to the detection sequence comprises: 
determining multiple inter-frame image differences in the detection sequence (differences between a plurality of sequenced frames are determined including trajectory, orientation, and number changes of fingers, palms, etc to detect a sequence of gestural movements [0090-0093,0122]); 
generating an image difference sequence based on the multiple inter-frame image differences (sequence of trajectory and orientation changes of the target object are logged between the plurality of frames [0122,0124]); and 
performing the dynamic gesture recognition according to the detection sequence and the image difference sequence (gesture recognition is performed according to the identified sequence of movements within the sequence of frames [0046,0125]).
Fujimura does not explicitly disclose wherein respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed or wherein the detection sequence is a sequence of images different from the multiple image frames of the video stream.
Mao (Figs. 2-5) discloses respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed (the original acquired image frame including the gesture is cropped according to the bounding box around the gesture in refined detection models [0023,0040,0065]) and 
the detection sequence is a sequence of images different from the multiple image frames of the video stream (gesture bounding box defines a region cropped and analyzed across consecutive frames which is smaller and different from the original larger full frame [0023,0040,0065]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have removed parts of the image frames as taught by Mao in the device of Fujimura. The suggestion/motivation would have been to limit the analyzed area and thus optimize position tracking [0019].
Fujimura in view of Mao does not explicitly disclose wherein each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames.
Iwamura (Figs. 2-22) discloses each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames (motion detection steps within successive image frames used to detect hand gesture movements is based on comparing the pixel value of each pixel location within an image frame with that of a previous image frame and the difference indicates motion at the location [0008,0014,0032]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences by calculating differences between pixels as taught by Iwamura in the device of Fujimura as modified by Mao. The suggestion/motivation would have been to improve motion detection in a small scale while minimizing detection errors [0005,0006,0014].
Fujimura in view of Mao and Iwamura does not explicitly disclose that the difference is between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames.
Fang (Figs. 3A-3L,5,6) discloses gesture detection by calculating the difference between non-adjacent frames and wherein the inter-frame image difference of the two non-adjacent image frames is an image difference between frames spaced by a fixed number of frames or between random frames (in order to perform asymmetrical gesture detection, a difference is calculated between each of frames 1, 2, and 3 and that of a reference frame 4 and therefore frames 1 and 2 are fixed distances from 4 respectively [0041,0042,0043,0045,0047,0048]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined differences between non-adjacent frames as taught by Fang in the device of Fujimura as modified by Mao and Iwamura. The suggestion/motivation would have been to detect movement more accurately due to better sensitivity in situations where small changes are experienced and prevent delays and reduce complexity and computational cost [0041,0042,0043,0045,0048].
As to claim 2, Fujimura (Figs. 2-16) discloses the positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box comprises: positioning a static gesture in at least one image frame of the multiple image frames of the video stream to obtain a static 
determining the dynamic gesture box according to the static gesture box of the at least one image frame (dynamic detection regions detected relative to determined static head and torso box region [0039,0040,0078-0080]). 
As to claim 3, Fujimura (Figs. 2-16) discloses the determining the dynamic gesture box according to the static gesture box of the at least one image frame comprises: enlarging the static gesture box of the at least one image frame to obtain the dynamic gesture box (dynamic gesture detection region include plural regions surrounding the statically defined head and torso box regions [0049,0077,0078]). 
As to claim 4, Fujimura (Figs. 2-16) discloses the static gesture box of the at least one image frame of the multiple image frames of the video stream meets the following condition: the static gesture box is located within the dynamic gesture box, or the static gesture box is as same as the dynamic gesture box (dynamic gesture detection region include plural regions surrounding the statically defined head and torso box regions as well as the head and box regions themselves [0049,0077,0078]). 
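For illustration only, the relationship between the static gesture box and the dynamic gesture box recited in claims 3 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the box format, the uniform margin, and the frame size are hypothetical and are not taken from Fujimura or from the application.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed format: (left, top, right, bottom)


def enlarge(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clamped to the frame."""
    left, top, right, bottom = box
    return (max(0, left - margin), max(0, top - margin),
            min(width, right + margin), min(height, bottom + margin))


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies within `outer` (identical boxes also qualify)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


# Hypothetical static gesture box in a 160x90 frame.
static_box = (40, 30, 80, 70)
dynamic_box = enlarge(static_box, margin=20, width=160, height=90)
```

Under these assumptions the static gesture box always satisfies the condition of claim 4, because enlarging never shrinks any side and a margin of zero yields an identical box.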
As to claim 6, Fujimura (Figs. 2-16) discloses one of the multiple inter-frame image differences is an image difference between two adjacent reference frames in the detection sequence (each of the frames of the sequence are analyzed and therefore adjacent as well as the start and end reference points [0122]). 
As to claim 16, Fujimura (Figs. 2-16) discloses the controlling a device to execute an operation corresponding to the dynamic gesture recognition result comprises: obtaining the operation instruction corresponding to the dynamic gesture recognition result according to a predetermined correspondence between the dynamic gesture recognition result and the 
As to claim 17, Fujimura (Figs. 2-16) discloses the controlling the device to execute a corresponding operation according to the operation instruction comprises: controlling a window, a door, or a vehicle-mounted system of a vehicle according to the operation instruction (input action provides instructions to a vehicle-mounted system to control driver assistance features or other vehicular safety features [0036,0046]). 
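For illustration only, a predetermined correspondence between a dynamic gesture recognition result and an operation instruction, as recited in claims 16 and 17, can be sketched as a simple lookup. The gesture labels and instruction names below are hypothetical and do not appear in Fujimura or in the application.

```python
from typing import Dict, Optional

# Assumed correspondence table; every key and value here is illustrative only.
OPERATIONS: Dict[str, str] = {
    "palm_left_swing": "open_window",
    "palm_right_swing": "close_window",
    "single_finger_clockwise_rotation": "volume_up",
}


def operation_for(recognition_result: str) -> Optional[str]:
    """Return the operation instruction for a recognized gesture, if any."""
    return OPERATIONS.get(recognition_result)


instruction = operation_for("palm_left_swing")
unmapped = operation_for("unknown_gesture")
```

A result with no entry in the table returns `None` in this sketch, so the device would execute no operation for an unrecognized gesture.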
As to claim 18, Fujimura (Figs. 2-16) discloses the predefined dynamic action comprises a dynamic gesture comprising at least one of: single-finger clockwise/counterclockwise rotation, palm left/right swing, two-finger poke, extending the thumb and pinky finger, press-down with the palm downward, lift with the palm upward, fanning to the left/right with the palm, left/right movement with the thumb extended, long slide to the left/right with the palm, changing a fist into a palm with the palm upward, changing a palm into a fist with the palm upward, changing a palm into a fist with the palm downward, changing a fist into a palm with the palm downward, single-finger slide, pinch-in with multiple fingers, single-finger double click, single-finger single click, multi-finger double click, or multi-finger single click (hand shape detection such as number of hands, fingers, palms, etc. is used with orientation and trajectory to determine finger and palm movements including mouse clicks or any other possible body part gesture [0090-0093]); and 

As to claim 19, Fujimura (Figs. 2-16) discloses an electronic device (computing device 103 [0034]), comprising: 
a memory storing processor-executable instructions (media storing instructions to be performed by the computing device [0034]); and 
a processor, configured to execute the stored processor-executable instructions to perform operations (computer processor performs instructions stored on media to perform a gesture recognition method [0034]) of the control method using gesture interaction according to claim 16 (see rejection of claim 16). 
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Fujimura (20050271279) in view of Mao (20180024641), Iwamura (20020057383), Fang (20130188836), and Sinha (20170168586).
As to claim 7, Fujimura (Figs. 2-16) discloses the performing the dynamic gesture recognition according to the detection sequence and the image difference sequence comprises: inputting the detection sequence into a first dynamic gesture recognition model to obtain a first dynamic gesture category by the first dynamic gesture recognition model (detection sequences are compared to a collection of profiles including hand to body part location information and hand shape information where the first model is a shape matching model to determine a matching shape profile from the shape database [0044,0045,0094,0096,0099]);

determining a dynamic gesture recognition result according to the first dynamic gesture category and the second dynamic gesture category (gesture recognition is performed according to the identified shape and gesture profile matching within the sequence of frames [0046,0125]).
Fujimura in view of Mao, Iwamura, and Fang does not expressly disclose obtaining a first dynamic gesture category prediction probability output; or obtaining a second dynamic gesture category prediction probability output; and determining a dynamic gesture recognition result according to the prediction probabilities. 
Sinha (Figs. 1-7) discloses obtaining a first dynamic gesture category prediction probability output; or obtaining a second dynamic gesture category prediction probability output; and determining a dynamic gesture recognition result according to the prediction probabilities (gesture recognition is based on a plurality of dynamic gesture recognition models defined for specific discrete regions and scenarios where the matching of each is determined according to a nearest neighbor if the match is not exact and therefore based on the highest probability in order to provide each output and further where the overall gesture recognition is based on a combination of the resulting highest probable matches [0003,0023,0028,0030,0034,0035,0047]).
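At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined gestures according to prediction probabilities as taught by Sinha in the gesture detection of Fujimura as modified by Mao, Iwamura, and Fang. The suggestion/motivation would have been to improve the operation of automated systems to recognize hand gestures that require minimalistic hardware, unconstrained physical setup and more broadly influence future deep learning constructs and improve overall robustness [0006].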
As to claim 8, Fujimura in view of Mao, Iwamura, and Fang does not expressly disclose the first dynamic gesture recognition model is a first neural network, the second dynamic gesture recognition model is a second neural network, and the first neural network and the second neural network have a same structure or different structures.
Sinha (Figs. 1-7) discloses the first dynamic gesture recognition model is a first neural network, the second dynamic gesture recognition model is a second neural network, and the first neural network and the second neural network have a same structure or different structures (each recognition model is a different neural network layer for each discrete detection portion [0020,0030,0032,0034]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have used neural networks as taught by Sinha in the gesture detection of Fujimura as modified by Mao, Iwamura, and Fang. The suggestion/motivation would have been to improve the operation of automated systems to recognize hand gestures that require minimalistic hardware, unconstrained physical setup and more broadly influence future deep learning constructs and improve overall robustness [0006].
As to claim 9, Fujimura (Figs. 2-16) discloses performing the capturing a plurality of times to obtain the detection sequence, generating the image difference sequence a plurality of times, and performing the dynamic gesture recognition a plurality of times according to the detection sequence and the image difference sequence ( differences between a stream of 14-30 sequenced frames per second are determined including trajectory, orientation, and number 
Fujimura in view of Mao, Iwamura, and Fang does not expressly disclose determining the dynamic gesture recognition result according to a probability of a dynamic gesture category, the probability being obtained by dynamic gesture recognition each time. 
Sinha (Figs. 1-7) discloses determining the dynamic gesture recognition result according to a probability of a dynamic gesture category, the probability being obtained by dynamic gesture recognition each time (gesture recognition is based on a plurality of dynamic gesture recognition models defined for specific discrete regions and scenarios where the matching of each is determined according to a nearest neighbor if the match is not exact and therefore based on the highest probability in order to provide each output and further where the overall gesture recognition is based on a combination of the resulting highest probable matches [0003,0023,0028,0030,0034,0035,0047]).
At the time the invention was effectively filed, it would have been obvious for a person of ordinary skill in the art to have determined gestures according to prediction probabilities as taught by Sinha in the gesture detection of Fujimura as modified by Mao, Iwamura, and Fang. The suggestion/motivation would have been to improve the operation of automated systems to recognize hand gestures that require minimalistic hardware, unconstrained physical setup and more broadly influence future deep learning constructs and improve overall robustness [0006].
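For illustration only, determining a recognition result from several category prediction probability outputs (whether from a first and a second model, as in claim 7, or from repeated recognition passes, as in claim 9) can be sketched as follows. Neither the claims nor Sinha, as quoted in this action, specifies how the probabilities are combined; averaging and taking the highest mean is an assumption made here, and the category names are hypothetical.

```python
from typing import Dict, List


def fuse(outputs: List[Dict[str, float]]) -> str:
    """Average per-category probabilities and return the most probable category.

    Averaging is an assumed fusion rule chosen for this sketch only.
    """
    if not outputs:
        raise ValueError("at least one probability output is required")
    categories = outputs[0].keys()
    mean = {c: sum(o[c] for o in outputs) / len(outputs) for c in categories}
    return max(mean, key=lambda c: mean[c])


# Hypothetical outputs: one from the detection sequence, one from the
# image difference sequence.
first_model = {"swipe": 0.6, "click": 0.4}
second_model = {"swipe": 0.3, "click": 0.7}
result = fuse([first_model, second_model])
```

With these sample values the mean probability of "click" exceeds that of "swipe", so the fused result differs from what the first model alone would report.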

Allowable Subject Matter
Claims 10-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
Applicant's arguments with respect to amended claims 1, 14, and 20 and claims dependent thereon have been considered but are moot in view of the new ground(s) of rejection. 


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT M STONE whose telephone number is (571)270-5310.  The examiner can normally be reached on 9:30am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ROBERT M STONE/Examiner, Art Unit 2628              

/NITIN PATEL/Supervisory Patent Examiner, Art Unit 2628