DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim is directed to a "computer program product" that includes "code instructions", which broadly encompasses a computer program per se. Such computer programs, per se, are not, in and of themselves, methods or machines, nor are they physical products of manufacture or compositions of matter. Therefore, such programs do not fall into any of the categories of eligible subject matter defined in 35 U.S.C. § 101 and are not, by themselves, eligible for patent protection. Such programs can be eligible for patent protection if claimed as embodied on or in a computer readable storage device or medium, but only if the claim clearly and unambiguously excludes transitory, propagating signals from the full scope of the claimed subject matter, as such signals are also not eligible under 35 U.S.C. § 101. It is suggested that amending the claim language to define the computer program product as having the code instructions embodied on a "non-transitory computer-readable medium" would satisfy these requirements and would limit the claimed invention to eligible subject matter.

Claim Objections
Claim 14 is objected to because of the following informalities:  the claim recites in line 5 "extract at least one feature vector from the at least some images, the at least one feature vector expresses movement of at least some of the plurality of wrinkles and/or if the other dynamic facial features". It appears that “if” is a typographical error and the applicant meant to type “of”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-5, 7-8, 10-11, 13-14, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (US PG-Pub. 20200036528) in view of Preuss (US PG-Pub. 20210233031).
Regarding claim 1: 
Ortiz teaches: computer implemented method of determining whether a user engaged in an interactive video session is genuine (FIGS. 1-2 and 5-6; ¶ [0209] “System 100 may be able to process the video 202 and determine in real time or near real time that the user in the video 202 is a real human being, as opposed to a robot, an image, or a spoofing attack.”; ¶ [0228] “FIG. 5 shows two example processes 510, 520 for user verification”; Also see FIG. 25), comprising:
using at least one processor (FIG. 1, processor 101) for:
receiving a sequence of consecutive images captured by at least one imaging sensor configured to depict at least a face of a user engaged in a video session while the user moves his lips (¶ [0208] “In some embodiments, the user may, during the capturing of video 202, speak a word or phrase. The word may be provided by a vendor or a third party, and may include alphabets, numbers, and/or words. For example, the word may be “1A3CB” or “hello world.””; also see ¶ [0230] - ¶ [0231]);
analyzing at least some of the images of the sequence to identify at least one dynamic facial pattern of the user while his lips are moving, the at least one dynamic facial pattern expressing a movement of  (¶ [0211] “…System 100 can also capture a user's facial movements, including lip movements and store in data storage 108. In some embodiments, extracted features of images depicting a user's facial movements including lip movements may be stored in data storage 108.”; Also see ¶ [0231], ¶ [0125] and FIG. 12);
determining whether the user is genuine or an impersonator based on a comparison between the at least one identified dynamic facial pattern of the user and at least one reference dynamic facial pattern (¶ [0213] “In some embodiments, system 100 may execute algorithms in machine-learning unit 113 to analyze a user's lip movements in the video 202 when speaking the provided word. The output from the machine-learning unit 113 may be stored as a feature in the database in data storage 108.”; ¶ [0215] “Next, system 100 may process the 3D video and compare it to the video data stored in data storage 108 based on a user profile that was created or updated at the time user registered himself with a 2D image or video. The system may compare various features of the two videos (i.e., the 2D video and the 3D video), such as facial features, facial movements, lip movements, eye movements, and so on. The system can at this stage, based on the comparison of the two videos, determine that the person appearing in the first video (i.e., the 2D video) is the same person appearing in the second video (i.e., the 3D video).”; Also see ¶ [0232]);
outputting an indication of whether the user is genuine or not based on the determination (¶ [0220] “Once successfully registered, the user is able to make payments using facial recognition through a POS or through a mobile device, for example, in accordance with the process described below.”; ¶ [0227] “Once system 100 has verified and authenticated the identity of the user in the 2D video, system 100 may proceed to trigger or request a payment in accordance with the user request.”; also see ¶ [0232]; In order to authorize a payment, an indication of a user being genuine is implied to be outputted).
Ortiz does not specifically teach: that the dynamic facial pattern expressing a movement of at least one of a plurality of wrinkles in the face of the user.
However, a person of ordinary skill in the art understands that extracting various features from the 2D video, such as facial movements, facial features, nose, skin, and so on, and analyzing lip movements of the user during the time the user is heard speaking a word as disclosed by Ortiz in ¶ [0215] “The system may compare various features of the two videos (i.e., the 2D video and the 3D video), such as facial features, facial movements, lip movements, eye movements, and so on” includes analyzing wrinkles.
Nonetheless, in a related field, Preuss teaches: dynamic facial pattern expressing a movement of at least one of a plurality of wrinkles in the face of the user (¶ [0043] “…In some examples, the classification engine 138 can be trained to detect facial features associated with one or more emotional expressions such as happiness, sadness, surprise, neutral, anger, contempt, and disgust. The features associated with each of the detectable expressions can include shape and movement of eyes, pupil size, facial wrinkles (e.g., around eyes and mouth), lips/mouth, and/or cheeks.” Also see ¶ [0062]).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz to incorporate the teachings of Preuss by including: dynamic facial pattern expressing a movement of at least one of a plurality of wrinkles in the face of the user in order to detect facial features associated with one or more emotional expressions.


Regarding claim 4: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: wherein the user moves his lips for the purpose of at least one of: speak, smile, laugh, cry and yawn (¶ [0213] “In some embodiments, system 100 may execute algorithms in machine-learning unit 113 to analyze a user's lip movements in the video 202 when speaking the provided word.” Also see ¶ [0231]).

Regarding claim 5: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: further comprising instructing the user to say at least one word selected to stimulate an increased movement of the at least one dynamic facial feature of the user (¶ [0208] “In some embodiments, the user may, during the capturing of video 202, speak a word or phrase. The word may be provided by a vendor or a third party, and may include alphabets, numbers, and/or words. For example, the word may be “1A3CB” or “hello world.””; ¶ [0217] “In some embodiments, a user may be requested to speak a word provided to the user when the 3D video is being captured. The provided word may be the same word…”; also see ¶ [0231]; stimulating an increased movement is a conclusory statement caused by speaking/moving the lips).


Regarding claim 7: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: further comprising determining whether the user is genuine or not based on comparison between at least one another dynamic facial pattern of the user identified by analyzing the sequence of consecutive images of the user captured while moving his lips and at least one another reference dynamic facial pattern, the at least one another dynamic facial pattern expressing a movement of at least one of a plurality of dynamic facial features in the face of the user (¶ [0213] “In some embodiments, system 100 may execute algorithms in machine-learning unit 113 to analyze a user's lip movements in the video 202 when speaking the provided word. The output from the machine-learning unit 113 may be stored as a feature in the database in data storage 108.”; ¶ [0215] “Next, system 100 may process the 3D video and compare it to the video data stored in data storage 108 based on a user profile that was created or updated at the time user registered himself with a 2D image or video. The system may compare various features of the two videos (i.e., the 2D video and the 3D video), such as facial features, facial movements, lip movements, eye movements, and so on. The system can at this stage, based on the comparison of the two videos, determine that the person appearing in the first video (i.e., the 2D video) is the same person appearing in the second video (i.e., the 3D video).”; Also see ¶ [0232]; and see FIG. 14 for registering dynamic facial patterns).


Regarding claim 8: 
Ortiz in view of Preuss teaches the limitations of claim 7 as applied above. 
Ortiz further teaches: wherein the plurality of dynamic facial features comprising: a nostril, a distance between nostrils, a facial skin portion, an eyelid, an ear and a facial muscle (¶ [0231] “…At step B4, system 100 may be configured to extract various features from the 2D video, such as facial movements, facial features, nose, skin, and so on. At step B5, system 100 tracks and analyzes lip movements of the user during the time the user is heard speaking the code on video.”).

Regarding claim 10: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: wherein the at least one reference dynamic facial pattern is defined based on analysis of a plurality of dynamic facial patterns identified for a plurality of users (¶ [0207] “In some embodiments, instead of or in addition to images or videos of users, storage 108 may be configured to save extracted features from the images or videos.”; ¶ [0229] “…At step A5, system 100 may search in the database of video data and return the top 5 results (i.e. 5 users) that are the best matches based on the extracted features of the 3D video”; ¶ [0232] “…At step B7, system 100 may match the identified code with the code that has been previously provided to the user and if the match of codes is successful, system 100 may at step B8 search in the database of video data and return the top 5 results (i.e. 5 users) that are the best matches based on the extracted facial features of the 2D video.”).

Regarding claim 11: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: wherein the at least one reference dynamic facial pattern is specifically defined for the user based on at least one previously captured dynamic facial pattern of the user (¶ [0215] “…Next, system 100 may process the 3D video and compare it to the video data stored in data storage 108 based on a user profile that was created or updated at the time user registered himself with a 2D image or video”; ¶ [0229] “…At step A5, system 100 may search in the database of video data and return the top 5 results (i.e. 5 users) that are the best matches based on the extracted features of the 3D video”; ¶ [0232] “…At step B7, system 100 may match the identified code with the code that has been previously provided to the user and if the match of codes is successful, system 100 may at step B8 search in the database of video data and return the top 5 results (i.e. 5 users) that are the best matches based on the extracted facial features of the 2D video.”).

Regarding claim 13: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: wherein the at least one reference dynamic facial pattern is learned by at least one model created using at least one Machine Learning (ML) model trained with a plurality of dynamic facial features of a plurality of users (¶ [0130] “A machine-learning unit 113 may be configured to process one or more data sets representative of one or more images or videos for training one or more models for generating predictions regarding new images or videos.”; ¶ [0213] “In some embodiments, system 100 may execute algorithms in machine-learning unit 113 to analyze a user's lip movements in the video 202 when speaking the provided word. The output from the machine-learning unit 113 may be stored as a feature in the database in data storage 108.”; also see FIG. 26, 2603 and ¶ [0171]).

Regarding claim 14: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: further comprising the comparison is done by applying the at least one trained ML model to:- extract at least one feature vector from the at least some images, the at least one feature vector expresses movement of at least some of the plurality of wrinkles and/or if the other dynamic facial features (¶ [0131] “In some embodiments, the machine-learning unit 113 may be configured to process images or videos of a user speaking a specific word and to derive models representing a user's lip movements when speaking the specific word. In some embodiments, unit 113 may be a deep learning unit.”; ¶ [0132] “There may be two machine learning units—a first unit adapted for extracting, for example, using an encoder neural network, the data sets into a data subset that represents a constrained set of features identifying an individual.”; ¶ [0134] “A constrained set of features can be represented as a floating point latent vector extracted from raw image data, and the floating point vector generated from an encoder neural network can be adapted to learn a compression of the raw image data into the floating point latent vector defined by the feature set representing speech motions of the individual.” Also see ¶ [0213] – ¶ [0215] and ¶ [0225]),
and classify the at least one extracted feature vector according to the at least one learned reference dynamic facial pattern (¶ [0155] “A verification unit 115 may be configured to receive processed images and videos from video processing unit 111 and verify a user's identity based on a user profile stored in data storage 108.”; ¶ [0217] “…This way, system 100 can, through the machine-learning unit 113, further validate that the person appearing in the 2D video is the same person appearing in the 3D video, based on analysis of the lip movements of the user speaking the same word.” Also see FIG. 30).

Regarding claim 16: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz further teaches: further comprising biometrically authenticating the user based on the at least one dynamic facial pattern compared to a biometric face signature of the user (¶ [0164] “In some embodiments, verification unit 115 may retrieve stored data from a corresponding user profile, and use the stored data to determine if a person presented in an image or video has the same identity associated with the corresponding user profile. The stored data may relate to one or more biometric features of the user associated with the corresponding user profile. The one or more biometric features may include, for example, a user's facial movement such as lip movements. In some example, the stored data may relate to the user's lip movements when the user speaks a specific word or phrase.”).

Regarding claims 17-18: the limitations of the claims are similar to those of claim 1; therefore, they are rejected in the same manner as applied above.


Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (US PG-Pub. 20200036528) in view of Preuss (US PG-Pub. 20210233031) and Faridul (US PG-Pub. 20180260643).
Regarding claim 2: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz in view of Preuss does not explicitly teach: wherein the impersonator uses a mask to impersonate as the user during the video session.
However, in a related field, Faridul teaches: wherein the impersonator uses a mask to impersonate as the user during the video session (¶ [0005] “Such systems may be vulnerable to ‘spoof’ or ‘presentation’ attacks, in which an attacker claims an authorised user's identity by presenting a falsified face of the authorised user to the system, for example by use of a mask, a photograph, a video, or a virtual reality representation of the authorised user's face.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz in view of Preuss to incorporate the teachings of Faridul by including: wherein the impersonator uses a mask to impersonate as the user during the video session in order to determine whether a live human face is present by generating a stimulus; predicting, using a model, human face movement in response to said generated stimulus; presenting the stimulus to a face of a person; tracking a movement of the face in response to the stimulus using a camera; and determining whether a live human face is present by comparing the movement of the face against said prediction, taking into consideration the types of techniques used to impersonate a genuine user.

Regarding claim 3: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz in view of Preuss does not explicitly teach: wherein the impersonator is applied using synthetic media generated to simulate the user during the video session.
However, in a related field, Faridul teaches: wherein the impersonator is applied using synthetic media generated to simulate the user during the video session (¶ [0005] “Such systems may be vulnerable to ‘spoof’ or ‘presentation’ attacks, in which an attacker claims an authorised user's identity by presenting a falsified face of the authorised user to the system, for example by use of a mask, a photograph, a video, or a virtual reality representation of the authorised user's face.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz in view of Preuss to incorporate the teachings of Faridul by including: wherein the impersonator is applied using synthetic media generated to simulate the user during the video session in order to determine whether a live human face is present by generating a stimulus; predicting, using a model, human face movement in response to said generated stimulus; presenting the stimulus to a face of a person; tracking a movement of the face in response to the stimulus using a camera; and determining whether a live human face is present by comparing the movement of the face against said prediction.


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (US PG-Pub. 20200036528) in view of Preuss (US PG-Pub. 20210233031) and Zhang (US PG-Pub. 20200110863).
Regarding claim 6: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz in view of Preuss does not explicitly teach: wherein the at least one selected word is directed to stimulate the user to smile and/or to laugh thus accentuating the movement of the at least one wrinkle.
However, in a related field, Zhang teaches: wherein the at least one selected word is directed to stimulate the user to smile and/or to laugh thus accentuating the movement of the at least one wrinkle (¶ [0036] “…For example, if an embodiment determines that the facial expression does not correlate with an accepted facial expression but does share a predetermined level of similarity (e.g., 50% similarity, 75% similarity, etc.) with an accepted facial expression, then an embodiment may request the user to provide the facial expression again or may provide the user with guidance on how to complete the expression (e.g., an embodiment may advise the user to smile more or to contort their face in a specific way to better match the accepted expression, etc.).”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz in view of Preuss to incorporate the teachings of Zhang by including: wherein the at least one selected word is directed to stimulate the user to smile and/or to laugh thus accentuating the movement of the at least one wrinkle in order to match the facial expression provided by the user with an accepted facial expression for the prompted emotion when the facial features match the features of an authorized user but the facial expression fails to match the required expression.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (US PG-Pub. 20200036528) in view of Preuss (US PG-Pub. 20210233031) and Min (US PG-Pub. 20220163650).
Regarding claim 12: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz in view of Preuss does not explicitly teach: wherein the at least one reference dynamic facial pattern is defined by at least one rule based model.
However, in a related field, Min teaches: wherein the at least one reference dynamic facial pattern is defined by at least one rule based model (¶ [0160] “…The electronic device 101 may perform training by applying the above-described data to a trained model (e.g., a rule-based model or an artificial intelligence model trained according to at least one of machine learning, a neural network, or a deep learning algorithm). The electronic device 101 may generate authentication templates (in other words, authentication reference data) from the gathered data based on the training and may store the generated authentication templates in the memory 130.”; ¶ [0163] “According to various embodiments, the electronic device 101 may compare the obtained data with the authentication templates stored in the memory 130, identifying whether the object 205 on which authentication is performed is the user's face.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz in view of Preuss to incorporate the teachings of Min by including: wherein the at least one reference dynamic facial pattern is defined by at least one rule based model in order to utilize training models to generate enrollment templates.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Ortiz (US PG-Pub. 20200036528) in view of Preuss (US PG-Pub. 20210233031) and Lau (US PG-Pub. 20130015946).
Regarding claim 15: 
Ortiz in view of Preuss teaches the limitations of claim 1 as applied above. 
Ortiz in view of Preuss does not explicitly teach: further comprising illuminating the face of the user to accentuate the movement of the at least one dynamic facial feature of the user.
However, in a related field, Lau teaches: further comprising illuminating the face of the user to accentuate the movement of the at least one dynamic facial feature of the user (¶ [0079] “In any of the example image capture screens and methods described above with respect to FIGS. 7-10, additional lighting can be provided by the mobile device itself to help illuminate the user when the image capture occurs in dark or poorly lit environments (e.g., at a night, in a dark restaurant, in a night club, in a car at night, and the like).”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Ortiz in view of Preuss to incorporate the teachings of Lau by including: further comprising illuminating the face of the user to accentuate the movement of the at least one dynamic facial feature of the user in order to help illuminate the user when the image capture occurs in dark or poorly lit environments.

Allowable Subject Matter
Claim 9 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Ortiz in view of Preuss teaches the limitations of claim 1. However, the prior art either alone or in combination fails to disclose, teach, or suggest “determining whether the user is genuine or not based on comparison of at least one dynamic neck pattern of the user and at least one reference dynamic neck pattern, the at least one dynamic neck pattern identified by analyzing a sequence of consecutive images of a neck of the user captured by the at least one imaging sensor while the user says at least one word expresses a movement of at least one of a plurality of dynamic neck features of the user, the plurality of dynamic neck features comprising: a wrinkle, a neck skin portion, a neck muscle and the laryngeal prominence” in the context of the claim as a whole. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Alon (US 20180060680) teaches: a device to provide a spoofing or no spoofing indication. The device may comprise a processor and a sensor. The sensor may receive multiple facial frames of a face of a user. The processor coupled to the sensor may be configured to: perform a function based upon components of the face of the user relative to one another to determine measured facial features for two adjacent frames of the multiple facial frames; and determine whether the measured facial features for the two adjacent frames are sufficiently different to indicate liveness of the user and no spoofing attempt (see ¶ [0022]).
Wang (US 20170308739) teaches: a human face recognition request is acquired, and a statement is randomly generated according to the human face recognition request; audio data and video data returned by a user in response to the statement are acquired; corresponding voice information is acquired according to the audio data; corresponding lip movement information is acquired according to the video data; and when the lip movement information and the voice information satisfy a preset rule, the human face recognition request is permitted. By performing fit goodness matching between the lip movement information and voice information in a video for dynamic human face recognition, an attack by human face recognition with a real photo may be effectively avoided, and higher security is achieved.
Rodriguez (US 10546183) teaches: a video input is configured to receive a moving image of the entity captured by a camera over the interval of time. The feature recognition module is configured to process the moving image to detect at least one human feature of the entity. The liveness detection module is configured to compare with the randomized outputs a behaviour exhibited by the detected human feature over the interval of time to determine whether the behaviour is an expected reaction to the randomized outputs, thereby determining whether the entity is a living being. 
Fourati (Anti-spoofing in face recognition-based biometric authentication using Image Quality Assessment) teaches: a fast and non-intrusive anti-spoofing solution based on Image Quality Assessment (IQA) and motion cues to distinguish between genuine and fake face-appearances.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WASSIM MAHROUKA whose telephone number is (571)272-2945. The examiner can normally be reached Monday-Thursday 7:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward Urban, can be reached on (571)272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/WASSIM MAHROUKA/ Examiner, Art Unit 2665
/EDWARD F URBAN/Supervisory Patent Examiner, Art Unit 2665