DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 4 - 10 and 24 - 30 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: 
With regard to claims 4 and 24, several of the features of these claims were known in the art as evidenced by Molchanov et al., “Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks”, which renders obvious the limitations of parent claim 1. Molchanov further discloses obtaining a combined gesture recognition result by performing averaging on the gesture recognition results of the N consecutive video segments at p. 4209, col. 1, par. 2. Molchanov alternatively discloses obtaining a combined gesture recognition result by summing probabilities along an overall gesture motion trend (e.g., “path π”) composed of the input N consecutive gesture recognition results (e.g., “p(π/x)”), and outputting a gesture corresponding to the overall gesture motion trend as the combined gesture recognition result at pp. 4209-10, sec. 3.2, in the subsec. entitled “Connectionist temporal classification”. But, in neither embodiment does Molchanov disclose performing weighted averaging on the gesture recognition results of the N consecutive video segments based on the weight coefficients respectively corresponding to the gesture recognition results of the N consecutive video segments, to obtain the combined gesture recognition result, where the weight coefficients were preset.
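The distinction drawn above between plain averaging and weighted averaging with preset coefficients can be illustrated with a minimal Python sketch. The function names, the probability values, and the weight values are all hypothetical and are not taken from the claims or from Molchanov; the sketch only shows how the two combination rules differ.

```python
def combine_average(results):
    """Plain average of per-segment class-probability vectors."""
    n = len(results)
    return [sum(col) / n for col in zip(*results)]


def combine_weighted(results, weights):
    """Weighted average using one preset weight per segment."""
    if len(results) != len(weights):
        raise ValueError("need exactly one weight per segment")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, col)) / total
        for col in zip(*results)
    ]


# Three consecutive segments, two gesture classes (hypothetical values).
segment_results = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
# Hypothetical preset weights: later segments count more.
preset_weights = [1.0, 2.0, 3.0]

plain = combine_average(segment_results)
weighted = combine_weighted(segment_results, preset_weights)
```

With these illustrative inputs the plain average leaves the two classes tied, while the preset weights shift the combined result toward the class favored by the later segments.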
With regard to claims 5 and 25, several of the features of these claims were known in the art as evidenced by Molchanov et al., “Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks”, which renders obvious the limitations of parent claim 1. Molchanov further discloses obtaining a combined gesture recognition result by performing averaging on the gesture recognition results of the N consecutive video segments at p. 4209, col. 1, par. 2. But, Molchanov does not disclose obtaining an optical flow information image corresponding to the first video segment, wherein the optical flow information image comprises optical flow information between a first image in the M images and a pth image before the first image, the first image is one of the M images, and the optical flow information comprises instantaneous speed vector information of a pixel in the image, and performing gesture recognition on the optical flow information image by using a first deep learning algorithm, to obtain a first recognition result, wherein p is an integer greater than or equal to 1. Nor does Molchanov disclose combining the first recognition result with a second recognition result, to obtain the gesture recognition result of the first video segment.
With regard to claims 6 - 7, 10, and 26 - 27, 30, these claims depend from claims 5 and 25, respectively, and therefore incorporate the features of those claims that were found allowable. These claims are found allowable for the same reasons as were provided with respect to their parent claim(s).
With regard to claims 8 and 28, several of the features of these claims were known in the art as evidenced by Molchanov et al., “Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks”, which renders obvious the limitations of parent claim 1. Molchanov further discloses determining a time window with a preset time length in the video stream, wherein an end moment of the time window is within a time period corresponding to the first video segment at pp. 4208-9, sec. 3.1. But, Molchanov does not disclose conditioning the step of obtaining M images upon a determination that an a reference image in the time window.
With regard to claims 9 and 29, these claims depend from claims 8 and 28, respectively, and therefore incorporate the features of those claims that were found allowable. These claims are found allowable for the same reasons as were provided with respect to their parent claim(s).
Other prior art considered and hereby made of record includes:
Dai et al (U.S. PG Pub. No. 2018/0088679), which discloses obtaining a combined gesture recognition result by combining shape (“hand contours”) recognition results of N consecutive video segments in a video stream at ¶ [0049], but does not disclose obtaining gesture recognition results of N consecutive video segments.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 - 3 and 21 - 23 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al, “Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks.”
With regard to claim 1, Molchanov discloses a gesture recognition device wherein, because the device comprises a trained deep 3D-CNN, the device implicitly comprises a processor and a memory at pp. 4208-9, secs. 3.1-3.2.
Molchanov discloses obtaining M images, wherein the M images (“clips of length m”) are extracted from a first video segment (“splitting the entire video V into T clips”) in a video stream, the first video segment is a video segment in the video stream at pp. 4208-9, sec. 3.1; to wit: “We perform classification by splitting the entire video V into T clips of length m ...” But, Molchanov discloses that M is an integer greater than or equal to 1, rather than greater than or equal to 2, at pp. 4208-9, sec. 3.1: “We define a video clip as … of m ≥ 1 sequential frames with c channels of size k×ℓ pixels ending at time t.” However, the claimed range lies inside the range of the prior art and is an obvious variant of the prior art. In the case where the claimed ranges "overlap or lie inside ranges disclosed by the prior art" a prima facie case of obviousness exists. In re Wertheim, 541 F.2d 257, 191 USPQ 90 (CCPA 1976); In re Woodruff, 919 F.2d 1575, 16 USPQ2d 1934 (Fed. Cir. 1990). See, also, MPEP 2144.05(I). "[A] prior art reference that discloses a range encompassing a somewhat narrower claimed range is sufficient to establish a prima facie case of obviousness." In re Peterson, 315 F.3d 1325, 1330, 65 USPQ2d 1379, 1382-83 (Fed. Cir. 2003). See also In re Harris, 409 F.3d 1339, 74 USPQ2d 1951 (Fed. Cir. 2005). Applicants can rebut a prima facie case of obviousness by showing the criticality of the range. "The law is replete with cases in which the difference between the claimed invention and the prior art is some range or other variable within the claims. . . . In such a situation, the applicant must show that the particular range is critical, generally by showing that the claimed range achieves unexpected results relative to the prior art range." In re Woodruff, 919 F.2d 1575, 16 USPQ2d 1934 (Fed. Cir. 1990).
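The quoted step of splitting a video into T clips of length m can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the clips are taken as consecutive and non-overlapping, and trailing frames that do not fill a whole clip are dropped; the reference may handle these details differently.

```python
def split_into_clips(frames, m):
    """Split a frame sequence into consecutive, non-overlapping clips of length m.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return [frames[i:i + m] for i in range(0, len(frames) - m + 1, m)]


video = list(range(10))  # stand-in for a 10-frame video
clips = split_into_clips(video, 4)  # m = 4 falls in both ranges (m >= 1, m >= 2)
single_frame_clips = split_into_clips(video, 1)  # m = 1 falls only in m >= 1
```

Any clip length that satisfies m ≥ 2 also satisfies m ≥ 1, which is the sense in which the claimed range lies inside the disclosed range.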
Molchanov discloses performing gesture recognition on the M images by using a deep learning algorithm, to obtain a gesture recognition result corresponding to the first video segment at pp. 4208-9, sec. 3.1; to wit: “We perform classification by splitting the entire video V into T clips of length m and computing the set of class-conditional probabilities S = {s0, s1, ..., sT−1} for each individual clip. For offline gesture classification, we average the probabilities of all the clips belonging to a pre-segmented gesture $s^{avg} = \frac{1}{T} \sum_{s \in S} s$, and the predicted class is $\hat{y} = \operatorname{argmax}_i \, [s^{avg}]_i$, across all gesture classes i.”
Molchanov discloses, after obtaining gesture recognition results of N consecutive video segments in the video stream that comprise the first video segment, combining the gesture recognition results of the N consecutive video segments, to obtain a combined gesture recognition result, wherein N is an integer greater than or equal to 2 at pp. 4208-9, sec. 3.1; to wit: “We perform classification by splitting the entire video V into T clips of length m and computing the set of class-conditional probabilities S = {s0, s1, ..., sT−1} for each individual clip. For offline gesture classification, we average the probabilities of all the clips belonging to a pre-segmented gesture $s^{avg} = \frac{1}{T} \sum_{s \in S} s$, and the predicted class is $\hat{y} = \operatorname{argmax}_i \, [s^{avg}]_i$, across all gesture classes i.”
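The averaging and class-selection steps quoted above can be sketched in a few lines of Python. The function name and the probability values are hypothetical; the sketch assumes each clip yields one probability per gesture class and simply averages those vectors over the T clips before taking the highest-scoring class.

```python
def classify_gesture(clip_probs):
    """Average per-clip class probabilities and pick the highest-scoring class."""
    t = len(clip_probs)
    # Element-wise mean over the T clips: (1/T) * sum of s for s in S.
    s_avg = [sum(col) / t for col in zip(*clip_probs)]
    # Index i of the largest averaged probability.
    y_hat = max(range(len(s_avg)), key=lambda i: s_avg[i])
    return s_avg, y_hat


# T = 3 clips, 3 gesture classes (hypothetical values).
S = [[0.2, 0.7, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
s_avg, y_hat = classify_gesture(S)
```

Here the third clip on its own favors class 2, but the average over all three clips selects class 1, which shows how the combined result can differ from an individual segment's result.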
With regard to claim 2, Molchanov discloses inputting the gesture recognition results of the N consecutive video segments (“input sequence X”) into a pre-trained first machine learning model to obtain the combined gesture recognition result (e.g., “p(y/X)”), wherein the first machine learning model is used to determine an overall gesture motion trend (e.g., “path π”) composed of the input N consecutive gesture recognition results (e.g., “p(π/x)”), and to output a gesture corresponding to the overall gesture motion trend as the combined gesture recognition result at pp. 4209-10, sec. 3.2, in the subsec. entitled “Connectionist temporal classification”; to wit: “[T]he network computes the probability of observing a particular gesture (or no gesture) k at time t in an input sequence X: [equation image: media_image1.png]. We define a path π as a possible mapping of the input sequence X into a sequence of class labels y. The probability of observing path π is [equation image: media_image2.png], where πt is the class label predicted at time t in path π… Under B, many paths π result in the same event sequence y. The probability of observing a particular […] B^{-1}(y) = {π : B(π) = y}: [equation image: media_image3.png]” See, also, p. 4209, sec. 3.2 (“Let X = {V0, V1, ..., VP−1} be a mini-batch of training examples in the form of weakly-segmented gesture videos Vi.”).
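The path-based combination described above can be illustrated with a brute-force Python sketch. It assumes the standard connectionist temporal classification formulation, in which the probability of a path is the product of the per-step label probabilities and the mapping B merges repeated labels and then removes a blank label; the exact mapping used in the reference may differ. The names `BLANK`, `collapse`, `path_probability`, and `sequence_probability`, and all probability values, are hypothetical.

```python
from itertools import groupby, product

BLANK = 0  # hypothetical index standing in for the "no gesture" label


def collapse(path):
    """Assumed mapping B: merge repeated labels, then drop blanks."""
    return tuple(k for k, _ in groupby(path) if k != BLANK)


def path_probability(path, s):
    """p(path | X) as the product of per-step probabilities s[t][label]."""
    p = 1.0
    for t, k in enumerate(path):
        p *= s[t][k]
    return p


def sequence_probability(y, s):
    """p(y | X): sum over every path whose collapse equals y (brute force)."""
    num_labels = len(s[0])
    return sum(
        path_probability(path, s)
        for path in product(range(num_labels), repeat=len(s))
        if collapse(path) == tuple(y)
    )


# Two time steps, labels {blank, gesture 1}, uniform probabilities.
s = [[0.5, 0.5], [0.5, 0.5]]
p_one = sequence_probability([1], s)
p_none = sequence_probability([], s)
```

In this toy case three of the four possible paths collapse to the single-gesture sequence, so it receives three quarters of the probability mass. Enumerating every path is exponential in the number of time steps and is shown only for clarity; practical implementations use dynamic programming.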
With regard to claim 3, Molchanov discloses the first machine learning model is a neural network model (e.g., “3D-CNN”) at pp. 4208-9, sec. 3.1. As a matter of claim construction, applicant has set out in claim 1 that “N is an integer greater than or equal to 2.” Molchanov does not specify the number of neurons in its network, but it speaks of them in the plural at p. 4210, col. 2, par. 1 (“…the activations of the others neurons…”). Thus, the 3D-CNN neural network model taught by Molchanov has N neurons. Moreover, Molchanov discloses the first machine learning model may comprise a support vector machine (SVM) model at pp. 4208-9, sec. 3.1.
With regard to claim 21, the steps performed by the method of this claim are obvious over Molchanov for the same reasons as were presented with respect to claim 1, which recites an apparatus configured to perform these same steps.
With regard to claim 22, the steps performed by the method of this claim are obvious over Molchanov for the same reasons as were presented with respect to claim 2, which recites an apparatus configured to perform these same steps.
With regard to claim 23, the steps performed by the method of this claim are obvious over Molchanov for the same reasons as were presented with respect to claim 3, which recites an apparatus configured to perform these same steps.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID F DUNPHY whose telephone number is (571)270-1230. The examiner can normally be reached 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DAVID F DUNPHY/Primary Examiner, Art Unit 2668