DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 12/16/2020 and 01/28/2021 have been considered by the examiner and have been placed of record in the file.

Claim Objections
Claim 5 is objected to because of the following informalities: claim 5 ends with a semicolon, but a claim must end with a period. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4-5, 7-9, 11-12, 14-16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lillo et al. (A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets) in view of Weinzaepfel et al. (EP 3792821 A1).

Claim 1. Lillo et al. disclose a system (read as computer (section 4)) for learning actions for image-based action recognition (read as model for human action recognition using body joint locations…categorize complex actions in videos, and perform spatio-temporal annotations of the atomic actions that compose the complex action being performed (Abstract)) in an autonomous vehicle, the system comprising: 
one or more processors (read as CPU (Section 4)) and a non-transitory computer-readable medium (read as computer (section 4)) having executable instructions encoded thereon such that when executed  (read as execution of the cutting plane algorithm (Section 4)), the one or more processors  (read as CPU (Section 4)) perform an operation of: 
separating a set of labeled action image data from a source domain into components (read as body poses, atomic actions, and complex actions (Introduction)); 
mapping the components onto a set of action patterns, thereby creating a dictionary of action patterns (read as model learns a dictionary that captures relevant spatio-temporal configurations of body parts. We refer to the components of this dictionary as motion poselets. At the level of atomic actions, our model learns a dictionary that captures the main modes of variation in the execution of each action. We refer to the components of this dictionary as actionlets (Introduction)); 
for each action in the set of labeled action data (read as body poses, atomic actions, and complex actions (Introduction)), learning a mapping from the action pattern representing the action onto a class label (sections 3.3-3.4 provide details about labeling different actions to be used by the model) for the action (read as We refer to the components of this dictionary as motion poselets. At the level of atomic actions... We refer to the components of this dictionary as actionlets …focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); 
mapping a set of new unlabeled target action image data (read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)) onto a shared embedding feature space in which action patterns can be discriminated (this limitation is considered a design choice, because any new action must be represented in a shared feature space before it is more clearly labeled and associated with a particular class, i.e., the features must first be recognized as actions/poses by a human); 
for each target action in the set of new unlabeled target action image data, identifying a class label for the target action (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6) ); and 
Lillo et al. do not explicitly disclose: 
based on the identified class label, causing the autonomous vehicle to perform a vehicle maneuver corresponding to the identified class label.
However, in the related field of endeavor, Weinzaepfel et al. disclose that an autonomous vehicle can detect dangerous human behavior, comprising an action that is classified as dangerous, and the vehicle can adapt its speed accordingly or perform emergency braking (Weinzaepfel et al. [0081]). The idea of an autonomous vehicle making decisions based on detected human behavior is clearly disclosed by Weinzaepfel et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Lillo et al. with the teaching of Weinzaepfel et al. in order to adapt the speed of an autonomous vehicle, or perform emergency braking, in the presence of dangerous human behavior.

Claim 2. The system as set forth in Claim 1, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the one or more processors further perform an operation of learning the shared embedding feature space using self-supervised learning (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)). Features must be included in a shared feature space before being categorized (i.e., the features must first be recognized as actions/poses by a human).


Claim 4. The system as set forth in Claim 1, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein in mapping the set of new unlabeled target action image data (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)), the one or more processors further perform operations of: 
computing a skeletal pose map (Lillo et al. FIG. 3) comprising a set of joint angles (Lillo et al.: read as each body joint… depth coordinate with a value z = d to joints corresponding to wrist and knees, z = -d to elbows, and z = 0 to other joints, so we can compute angles between segments (Section 4.4)) from the set of new unlabeled target action image data (Lillo et al.: read as weakly supervised model (Section 4.6)); 
generating a feature representation of poses based on the set of joint angles, resulting in a set of action pose features (Lillo et al.: read as Our body pose feature vector consists of the concatenation of two descriptors. At frame t and region r, a descriptor x^g_{t,r} encodes geometric information about the spatial configuration of body joints, and a descriptor x^m_{t,r} encodes local motion information around each body joint position (Section 3); FIG. 3); and 
encoding the set of action pose features to the shared embedding feature space with a uniform distribution (this is a design choice for the model).

Claim 5. The system as set forth in Claim 4, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the one or more processors further perform operations of: 
generating a target action pattern representing the target action in the shared embedding feature space from the encoded set of action pose features (Lillo et al.: read as … body poses, atomic actions, and complex actions…here we focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); and 
classifying the target action pattern (Lillo et al.: read as … body poses, atomic actions, and complex actions… (Introduction)).

Claim 7. The system as set forth in Claim 1, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the identified class label corresponds to a pedestrian's actions (Lillo et al.: running (Introduction)), and the vehicle maneuver is a maneuver around the pedestrian (Weinzaepfel et al.: read as the vehicle can adapt its speed accordingly or perform emergency braking ([0081])).

Claim 8. Lillo et al. disclose a computer (read as computer (section 4)) implemented method for learning actions for image-based action recognition (read as model for human action recognition using body joint locations…categorize complex actions in videos, and perform spatio-temporal annotations of the atomic actions that compose the complex action being performed (Abstract)) in an autonomous vehicle, the method comprising an act of: 
causing one or more processors (read as CPU (Section 4)) to execute instructions encoded (read as execution of the cutting plane algorithm (Section 4)) on a non-transitory computer-readable medium, such that upon execution (read as CPU (Section 4); there must be memory storing instructions to be executed by the CPU in order to perform the proposed learning algorithm), the one or more processors perform operations of: 
separating a set of labeled action image data from a source domain into components (read as body poses, atomic actions, and complex actions (Introduction)); 
mapping the components onto a set of action patterns, thereby creating a dictionary of action patterns (read as model learns a dictionary that captures relevant spatio-temporal configurations of body parts. We refer to the components of this dictionary as motion poselets. At the level of atomic actions, our model learns a dictionary that captures the main modes of variation in the execution of each action. We refer to the components of this dictionary as actionlets (Introduction)); 
for each action in the set of labeled action data (read as body poses, atomic actions, and complex actions (Introduction)), learning a mapping from the action pattern representing the action onto a class label for the action (read as We refer to the components of this dictionary as motion poselets. At the level of atomic actions... We refer to the components of this dictionary as actionlets …focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); 
mapping a set of new unlabeled target action image data (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)) onto a shared embedding feature space in which action patterns can be discriminated (read as then perform k-means clustering (Section 4.6); this limitation is considered a design choice, because any new action must be represented in a shared feature space before it is more clearly labeled and associated with a particular class, i.e., the features must first be recognized as actions/poses by a human); 
for each target action in the set of new unlabeled target action image data, identifying a class label for the target action (read as then perform k-means clustering (Section 4.6)); and 
Lillo et al. do not explicitly disclose: 
based on the identified class label, causing the autonomous vehicle to perform a vehicle maneuver corresponding to the identified class label.
However, in the related field of endeavor, Weinzaepfel et al. disclose that an autonomous vehicle can detect dangerous human behavior, comprising an action that is classified as dangerous, and the vehicle can adapt its speed accordingly or perform emergency braking (Weinzaepfel et al. [0081]). The idea of an autonomous vehicle making decisions based on detected human behavior is clearly disclosed by Weinzaepfel et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Lillo et al. with the teaching of Weinzaepfel et al. in order to adapt the speed of an autonomous vehicle, or perform emergency braking, in the presence of dangerous human behavior.

Claim 9. The method as set forth in Claim 8, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the one or more processors further perform an operation of learning the shared embedding feature space using self-supervised learning (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)).

Claim 11. The method as set forth in Claim 8, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein in mapping the set of new unlabeled target action image data (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)), the one or more processors further perform operations of: 
computing a skeletal pose map (Lillo et al. FIG. 3) comprising a set of joint angles from the set of new unlabeled target action image data (Lillo et al.: read as each body joint… depth coordinate with a value z = d to joints corresponding to wrist and knees, z = -d to elbows, and z = 0 to other joints, so we can compute angles between segments (Section 4.4)); 
generating a feature representation of poses based on the set of joint angles, resulting in a set of action pose features (Lillo et al.: read as Our body pose feature vector consists of the concatenation of two descriptors. At frame t and region r, a descriptor x^g_{t,r} encodes geometric information about the spatial configuration of body joints, and a descriptor x^m_{t,r} encodes local motion information around each body joint position (Section 3); FIG. 3); and 
encoding the set of action pose features to the shared embedding feature space with a uniform distribution (this is a design choice for the model).

Claim 12. The method as set forth in Claim 11, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the one or more processors further perform operations of: 
generating a target action pattern representing the target action in the shared embedding feature space from the encoded set of action pose features (Lillo et al.: read as … body poses, atomic actions, and complex actions…here we focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); and 
classifying the target action pattern (Lillo et al.: read as … body poses, atomic actions, and complex actions… (Introduction)).

Claim 14. The method as set forth in Claim 8, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein the identified class label corresponds to a pedestrian's actions (Lillo et al.: running (Introduction)), and the vehicle maneuver is a maneuver around the pedestrian (Weinzaepfel et al.: read as the vehicle can adapt its speed accordingly or perform emergency braking ([0081])).

Claim 15. Lillo et al. disclose a computer program product (read as computer… execution of the cutting plane algorithm (section 4)) for learning actions for image-based action recognition (read as model for human action recognition using body joint locations…categorize complex actions in videos, and perform spatio-temporal annotations of the atomic actions that compose the complex action being performed (Abstract)) in an autonomous vehicle, the computer program product comprising: 
computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors for causing the processor (read as CPU (Section 4); there must be memory storing instructions to be executed by the CPU in order to perform the proposed learning algorithm) to perform operations of: 
separating a set of labeled action image data from a source domain into components (read as body poses, atomic actions, and complex actions (Introduction)); 
mapping the components onto a set of action patterns, thereby creating a dictionary of action patterns (read as model learns a dictionary that captures relevant spatio-temporal configurations of body parts. We refer to the components of this dictionary as motion poselets. At the level of atomic actions, our model learns a dictionary that captures the main modes of variation in the execution of each action. We refer to the components of this dictionary as actionlets (Introduction)); 
for each action in the set of labeled action data, learning a mapping from the action pattern representing the action onto a class label for the action (sections 3.3-3.4 provide details about labeling different actions to be used by the model) for the action (read as We refer to the components of this dictionary as motion poselets. At the level of atomic actions... We refer to the components of this dictionary as actionlets …focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); 
mapping a set of new unlabeled target action image data (read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)) onto a shared embedding feature space in which action patterns can be discriminated (read as perform k-means clustering (Section 4.6)); 
for each target action in the set of new unlabeled target action image data, identifying a class label for the target action (read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)); and 
Lillo et al. do not explicitly disclose: 
based on the identified class label, causing the autonomous vehicle to perform a vehicle maneuver corresponding to the identified class label.
However, in the related field of endeavor, Weinzaepfel et al. disclose that an autonomous vehicle can detect dangerous human behavior, comprising an action that is classified as dangerous, and the vehicle can adapt its speed accordingly or perform emergency braking (Weinzaepfel et al. [0081]). The idea of an autonomous vehicle making decisions based on detected human behavior is clearly disclosed by Weinzaepfel et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Lillo et al. with the teaching of Weinzaepfel et al. in order to adapt the speed of an autonomous vehicle, or perform emergency braking, in the presence of dangerous human behavior.

Claim 16. The computer program product as set forth in Claim 15, the combination of Lillo et al. and Weinzaepfel et al. teaches,
further comprising instructions for causing the one or more processors to further perform an operation of learning the shared embedding feature space using self-supervised learning (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)).


Claim 18. The computer program product as set forth in Claim 15, the combination of Lillo et al. and Weinzaepfel et al. teaches,
wherein in mapping the set of new unlabeled target action image data, the computer program product further comprises instructions for causing the one or more processors (Lillo et al.: read as weakly supervised model can recover some of the information that is missing while performing well at the activity categorization task (Section 4.3)… random initialization of latent variables v, ii) Clustering: initialize v by first computing a BoW descriptor for the atomic action intervals and then perform k-means clustering (Section 4.6)) to further perform operations of: 
computing a skeletal pose map (Lillo et al. FIG. 3) comprising a set of joint angles from the set of new unlabeled target action image data (Lillo et al.: read as each body joint… depth coordinate with a value z = d to joints corresponding to wrist and knees, z = -d to elbows, and z = 0 to other joints, so we can compute angles between segments (Section 4.4)); 
generating a feature representation of poses based on the set of joint angles, resulting in a set of action pose features (Lillo et al.: read as Our body pose feature vector consists of the concatenation of two descriptors. At frame t and region r, a descriptor x^g_{t,r} encodes geometric information about the spatial configuration of body joints, and a descriptor x^m_{t,r} encodes local motion information around each body joint position (Section 3); FIG. 3); and 
encoding the set of action pose features to the shared embedding feature space with a uniform distribution (this is a design choice for the model).

Claim 19. The computer program product as set forth in Claim 18, the combination of Lillo et al. and Weinzaepfel et al. teaches,
further comprising instructions for causing the one or more processors to further perform operations of: 
generating a target action pattern representing the target action in the shared embedding feature space from the encoded set of action pose features (Lillo et al.: read as … body poses, atomic actions, and complex actions…here we focus on modeling atomic actions that can be characterized by the body motions of a single actor, such as running, drinking, or eating (Introduction)); and 
classifying the target action pattern (Lillo et al.: read as … body poses, atomic actions, and complex actions… (Introduction)).

Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lillo et al. (A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets) and Weinzaepfel et al. (EP 3792821 A1) in view of Yuan et al. (CN 112446244 A).

Claim 3. The system as set forth in Claim 1, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
wherein the one or more processors further perform an operation of mapping a data point from the set of new unlabeled target action image data onto a probability distribution of action patterns.
However, in the related field of endeavor, Yuan et al. disclose: the to-be-identified image are respectively input to the target human body action identification model, can obtain the probability of each type of human action (English translation). The idea of mapping human action onto a probability distribution is clearly disclosed by Yuan et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Yuan et al. in order to select the type of human action corresponding to the maximum probability as the identification result (Yuan et al. (English translation)).

Claim 10. The method as set forth in Claim 8, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
wherein the one or more processors further perform an operation of mapping a data point from the set of new unlabeled target action image data onto a probability distribution of action patterns.
However, in the related field of endeavor, Yuan et al. disclose: the to-be-identified image are respectively input to the target human body action identification model, can obtain the probability of each type of human action (English translation). The idea of mapping human action onto a probability distribution is clearly disclosed by Yuan et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Yuan et al. in order to select the type of human action corresponding to the maximum probability as the identification result (Yuan et al. (English translation)).

Claim 17. The computer program product as set forth in Claim 15, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
further comprising instructions for causing the one or more processors to perform an operation of mapping a data point from the set of new unlabeled target action image data onto a probability distribution of action patterns.
However, in the related field of endeavor, Yuan et al. disclose: the to-be-identified image are respectively input to the target human body action identification model, can obtain the probability of each type of human action (English translation). The idea of mapping human action onto a probability distribution is clearly disclosed by Yuan et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Yuan et al. in order to select the type of human action corresponding to the maximum probability as the identification result (Yuan et al. (English translation)).


Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lillo et al. (A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets) and Weinzaepfel et al. (EP 3792821 A1) in view of Xu et al. (CN 102214309 A – English translation).

Claim 6. The system as set forth in Claim 1, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
wherein the target action pattern is classified by a transductive K-Nearest Neighbor classifier.
However, in the related field of endeavor, Xu et al. disclose: The KNN classifier corresponding to the above classification of the head and shoulders model is used to realize the recognition of a specific human body [0009]. The idea of using a K-nearest neighbor classifier is clearly disclosed by Xu et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Xu et al. in order to improve recognition accuracy (Xu et al. [0009]).

Claim 13. The method as set forth in Claim 8, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
wherein the target action pattern is classified by a transductive K-Nearest Neighbor classifier.
However, in the related field of endeavor, Xu et al. disclose: The KNN classifier corresponding to the above classification of the head and shoulders model is used to realize the recognition of a specific human body [0009]. The idea of using a K-nearest neighbor classifier is clearly disclosed by Xu et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Xu et al. in order to improve recognition accuracy (Xu et al. [0009]).

Claim 20. The computer program product as set forth in Claim 15, the combination of Lillo et al. and Weinzaepfel et al. does not explicitly disclose,
wherein the target action pattern is classified by a transductive K-Nearest Neighbor classifier.
However, in the related field of endeavor, Xu et al. disclose: The KNN classifier corresponding to the above classification of the head and shoulders model is used to realize the recognition of a specific human body [0009]. The idea of using a K-nearest neighbor classifier is clearly disclosed by Xu et al.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of the combination of Lillo et al. and Weinzaepfel et al. with the teaching of Xu et al. in order to improve recognition accuracy (Xu et al. [0009]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED RACHEDINE whose telephone number is (571)272-9249.  The examiner can normally be reached on Mon-Fri 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lester Kincaid can be reached on (571)272-7922.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MOHAMMED RACHEDINE
Examiner
Art Unit 2649



/MOHAMMED RACHEDINE/Primary Examiner, Art Unit 2646