DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by D1 (see footnote 1).
With regard to claim 1, D1 teaches a method for anti-spoofing detection, comprising: obtaining at least one image subsequence from an image sequence, wherein the image sequence is acquired by an image acquisition apparatus after a user is prompted to read a specified content, and the image subsequence comprises at least one image in the image sequence (see abstract, claims 1-3: video of user reading verification content); performing lipreading on the at least one image subsequence to obtain a lipreading result of the at least one image subsequence (see abstract, claims 1-2: lip feature sequence extracted); and determining an anti-spoofing detection result based on the lipreading result of the at least one image subsequence (see abstract, claims 1-3: determining that a live person/user is reading the verification text).
With regard to claim 19, see discussion of claim 1. See p. 8 ¶ 7: computer processor and memory for implementing the method of claim 1. 
With regard to claim 20, see discussion of claim 1. See p. 8 ¶ 7: computer readable medium or memory. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over D1 in view of D2 (see footnote 2).
With regard to claim 2, D1 teaches the method according to claim 1, but fails to explicitly teach wherein obtaining the at least one image subsequence from the image sequence comprises: obtaining the at least one image subsequence from the image sequence according to a segmentation result of audio corresponding to the image sequence. However, D2 teaches the missing feature. See figs. 2, 3, ¶¶ 31-33: segmenting a video sequence into image segments based on the corresponding audio.
One skilled in the art before the effective filing date would have found it obvious to combine the teachings to arrive at the claimed invention. In particular, it would have been obvious to identify the image sequence segment corresponding to a word, character, or phoneme before extracting the lip reading from the image sequence, yielding predictable and enhanced results by limiting the lipreading search for a particular character or word to a smaller segment of the video image sequence.
With regard to claim 3, D2 teaches the method according to claim 2, wherein the segmentation result of the audio comprises: an audio segment corresponding to each of at least one character included in the specified content (see figs. 2, 3, ¶¶ 31-33: audio segment corresponding to each phoneme or character of a word); and obtaining the at least one image subsequence from the image sequence according to the segmentation result of the audio corresponding to the image sequence comprises: obtaining from the image sequence the image subsequence corresponding to each character according to time information of the audio segment corresponding to each character in the specified content (see figs. 2, 3, ¶¶ 32-33: segmenting the image sequence into groups or segments based on the audio timing corresponding to each phoneme or character of a word). The motivation for combining the references is the same as stated above.
With regard to claim 4, D2 teaches the method according to claim 2, further comprising: obtaining the audio corresponding to the image sequence; and segmenting the audio to obtain at least one audio segment, wherein each of the at least one audio segment corresponds to one character in the specified content (see figs. 2-3, ¶ 31: audio speech segmented into respective words and phonemes, each of which is read as a character).
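For illustration only, the audio-timed segmentation discussed with regard to claims 2-4 can be sketched as follows. The sketch assumes a constant frame rate and hypothetical per-character audio time spans; the function name and data layout are assumptions and are not taken from D1, D2, or the application.

```python
import math
from typing import List, Tuple


def frames_for_segments(
    num_frames: int,
    fps: float,
    segments: List[Tuple[str, float, float]],
) -> List[Tuple[str, List[int]]]:
    """Map each (character, start_sec, end_sec) audio segment to frame indices.

    Assumes frame i covers the time interval [i / fps, (i + 1) / fps).
    """
    result = []
    for char, start_s, end_s in segments:
        first = max(0, math.floor(start_s * fps))
        last = min(num_frames, math.ceil(end_s * fps))  # exclusive bound
        result.append((char, list(range(first, last))))
    return result


# Hypothetical example: a 2-second clip at 10 frames per second.
subsequences = frames_for_segments(
    num_frames=20, fps=10.0, segments=[("a", 0.0, 0.5), ("b", 0.5, 1.5)]
)
```

Each returned entry pairs one character with the indices of the image subsequence that overlaps its audio segment in time.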
With regard to claim 5, D1 teaches the method according to claim 1, wherein performing lipreading on the at least one image subsequence to obtain the lipreading result of the at least one image subsequence comprises: obtaining lip region images from at least two target images included in the image subsequence (see abstract, claims 1-3: mouth/lip reading results extracted); and obtaining the lipreading result of the image subsequence based on the lip region images of the at least two target images (see abstract, claims 1-3: lip reading results are extracted from the lip or mouth region of the image sequence).
With regard to claim 6, D1 teaches the method according to claim 5, but fails to explicitly teach wherein obtaining lip region images from at least two target images included in the image subsequence comprises: performing key point detection on the at least two target images to obtain information of face key points, wherein the information of the face key points comprises position information of lip key points; and obtaining the lip region images from the at least two target images based on the position information of the lip key points. However, Examiner takes Official Notice of the fact that extracting a lip region from a sequence of images based on face or lip key points was extremely well known in the art before the effective filing date, and it would have been particularly obvious for one skilled in the art to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced results by limiting the search for the lip region based on key points and further enhancing the results of lip reading.
With regard to claim 7, D1 teaches the method according to claim 5, but fails to explicitly teach wherein obtaining the lipreading result of the image subsequence based on the lip region images of the at least two target images comprises: performing recognition processing on the input lip region images of the at least two target images by using a first neural network model to output the lipreading result of the image subsequence. However, Examiner takes Official Notice of the fact that lip reading and lip region identification based on neural networks was well known in the art before the effective filing date, and one skilled in the art would have found it obvious to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced lip recognition and lip reading results based on trained neural networks.
With regard to claim 8, D1 teaches the method according to claim 1, but fails to explicitly teach wherein performing lipreading on the at least one image subsequence to obtain the lipreading result of the at least one image subsequence comprises: obtaining lip morphology information of the at least two target images included in the image subsequence; and obtaining the lipreading result of the image subsequence based on the lip morphology information of the at least two target images. However, Examiner takes Official Notice of the fact that lip reading and localization based on morphology operators was well known in the art before the effective filing date, and one skilled in the art would have found it obvious to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced lip recognition and reading results by using the lip shape information.
With regard to claim 9, D1 teaches the method according to claim 8, but fails to explicitly teach wherein the obtaining lip morphology information of the at least two target images included in the image subsequence comprises: performing feature extraction processing on a lip region image obtained from each of the at least two target images to obtain a lip morphology feature of the each target image, wherein the lip morphology information of the target image comprises the lip morphology feature. However, Examiner takes Official Notice of the fact that lip reading and localization based on morphology operators for extracting lip features was well known in the art before the effective filing date, and one skilled in the art would have found it obvious to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced lip recognition and reading results by using the lip shape information.
With regard to claim 10, D1 teaches the method according to claim 5, but fails to explicitly teach further comprising: selecting a first image that satisfies a predetermined quality standard, from the image subsequence; and determining the first image and at least one second image adjacent to the first image as the at least two target images. However, Examiner takes Official Notice of the fact that evaluating the quality of images prior to processing was well known in the art before the effective filing date, and one skilled in the art would have found it obvious to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced lip recognition and reading results by eliminating degraded or defective images from processing.
With regard to claim 11, D1 teaches the method according to claim 10, wherein the at least one second image comprises at least one image that is before the first image and adjacent to the first image, and comprises at least one image that is after the first image and adjacent to the first image (see abstract, claims 1-3: video of user reading verification content; implicitly evaluating a sequence of images in the video comprising adjacent frames).
With regard to claim 12, D1 teaches the method according to claim 1, wherein each of the at least one image subsequence corresponds to one character in the specified content (see abstract, claims 1-3: verification content read by user comprises characters).
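For illustration only, the target image selection recited in claims 10 and 11 can be sketched as follows. The per-frame quality scores, the threshold, and the function name are hypothetical; no reference of record specifies how the quality standard is computed.

```python
from typing import List


def select_target_frames(
    quality: List[float], threshold: float, neighbors: int = 1
) -> List[int]:
    """Return the index of the first frame meeting the quality threshold,
    together with up to `neighbors` adjacent frames before and after it.

    Returns an empty list when no frame meets the threshold.
    """
    for i, score in enumerate(quality):
        if score >= threshold:
            lo = max(0, i - neighbors)
            hi = min(len(quality), i + neighbors + 1)
            return list(range(lo, hi))
    return []


# Hypothetical quality scores for a five-frame image subsequence.
targets = select_target_frames([0.2, 0.3, 0.9, 0.4, 0.8], threshold=0.5)
```

Here frame 2 is the first image satisfying the assumed standard, so frames 1 and 3 are taken as the adjacent second images.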

With regard to claim 13, D1 teaches the method according to claim 1, wherein determining the anti-spoofing detection result based on the lipreading result of the at least one image subsequence comprises: fusing the lipreading result of the at least one image subsequence to obtain a fusion recognition result (see abstract, claims 1-3: mouth/lip features extracted from a sequence of images are combined or fused to identify a character result); but fails to teach determining whether the fusion recognition result matches a voice recognition result of the audio corresponding to the image sequence; and determining the anti-spoofing detection result based on a matching result between the fusion recognition result and the voice recognition result of the audio. However, Examiner takes Official Notice of the fact that multimodal speech recognition based on audio and visual information such as lip reading was well known in the art before the effective filing date, and it would have been particularly obvious for one skilled in the art to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced results by increasing the confidence in the recognition results by fusing multiple modalities.
With regard to claim 14, D2 teaches the method according to claim 13, wherein fusing the lipreading result of the at least one image subsequence to obtain the fusion recognition result comprises: fusing, based on the voice recognition result of the audio corresponding to the image sequence, the lipreading result of the at least one image subsequence to obtain the fusion recognition result (see figs. 2, 3, ¶¶ 31-33: segmenting a video sequence into image segments based on the corresponding audio). In other words, the lip reading performed in D1 by fusing a sequence of images from the video is in turn based on the voice recognition results, which are used for segmenting the image sequence into subsequences, as noted above in the combination of D1 and D2.
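For illustration only, the fusing and matching steps recited in claim 13 can be sketched as follows. The sketch assumes that each image subsequence yields one recognized character, that fusion is simple in-order concatenation, and that a match means exact string equality; these are assumptions and are not drawn from D1 or the application.

```python
from typing import List


def fuse_lipreading(per_subsequence_chars: List[str]) -> str:
    """Fuse per-subsequence lipreading results by concatenating them in order."""
    return "".join(per_subsequence_chars)


def passes_anti_spoofing(per_subsequence_chars: List[str], voice_text: str) -> bool:
    """Pass only when the fused lipreading result equals the voice recognition result."""
    return fuse_lipreading(per_subsequence_chars) == voice_text


# Hypothetical lipreading results for a user prompted to read "371".
genuine = passes_anti_spoofing(["3", "7", "1"], voice_text="371")
spoofed = passes_anti_spoofing(["3", "7", "1"], voice_text="372")
```

A practical system would likely tolerate partial mismatches; exact equality is used here only to keep the sketch short.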

With regard to claim 18, D1 teaches the method according to claim 1, but fails to explicitly teach wherein the lipreading result of the image subsequence comprises: probabilities that the image subsequence is classified as each of multiple predetermined characters corresponding to the specified content. However, Examiner takes Official Notice of the fact that assigning probabilities to recognition results was extremely well known in the art before the effective filing date, and it would have been particularly obvious to incorporate the known teachings into the configuration of D1, yielding predictable and enhanced results by increasing the confidence of the results based on probabilities.
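For illustration only, a lipreading result expressed as per-character probabilities, as recited in claim 18, can be sketched as follows. The use of a softmax over raw classifier scores is an assumption chosen for the example; neither D1 nor the application is cited as using it, and the scores shown are hypothetical.

```python
import math
from typing import Dict


def character_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-character scores into probabilities with a softmax."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {char: math.exp(s - peak) for char, s in scores.items()}
    total = sum(exps.values())
    return {char: e / total for char, e in exps.items()}


# Hypothetical scores for one image subsequence over three predetermined characters.
probs = character_probabilities({"3": 1.0, "7": 3.0, "1": 0.5})
most_likely = max(probs, key=probs.get)
```

The probabilities sum to one, and the character with the highest raw score receives the highest probability.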

	 
Claims 15-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AVINASH YENTRAPATI whose telephone number is (571)270-7982.  The examiner can normally be reached on 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AVINASH YENTRAPATI/
Primary Examiner, Art Unit 2662

1 Machine translation of CN106529379.
2 US Publication No. 20020/0161582.