Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED OFFICE ACTION

Status of Claims

Claims 1-20 are pending in this Office Action.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

1.	Claims 1, 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over HAYAKAWA et al. (USPUB 20210271866) in view of DENG et al. (USPUB 20210312321).

As per Claim 1, HAYAKAWA et al. teaches a computer-implemented method (Computer medium taught within Paragraphs [0015] and [0042]) for interactive activity recognition (action recognition and interactive relationship taught within Paragraphs [0018-0019]), comprising: retrieving, by one or more processors, a temporal sequence of image frames (Paragraph [0011]- “…The processing system, and processor-readable medium encodes the identified key points for each frame. A second convolutional neural network, including a third temporal dimension, is used to process the data structures corresponding to the temporal sequence of frames to identify human behavior in the sequence of frames of the digital video….”) from a video recording (Paragraph [0036]- “…a neural network designed to learn and reason about temporal dependencies between video frames (e.g., images in a sequence of images), …”); identifying, by the one or more processors (Paragraph [0020]- “…neural networks 108 can be executed on a remote server (e.g., with one or more processors and memory)…”), second keypoints in each of the image frames in the temporal sequence, the second keypoints are associated with an individual interacting with the object (Paragraph [0019]- “…the one or more cameras can be configured as RGB cameras that can capture RGB bands that are configured to capture rich information about object appearance, as well as relationships and interactions between the vehicle 102 and objects (e.g., pedestrians) within the surrounding environment of the vehicle 102….”); combining, by the one or more processors (Paragraph [0043]- “…one or more processors; and memory storing instructions, which when executed by the one or more processors,…”), the first keypoints with the second keypoints (Paragraphs [0041-0042]- “… combining the processed first set of image data and the processed second set of image data, and providing the combined processed first set of image data and the processed second set of image data as an input to the first neural network to determine the recognized action of the person. …”);
HAYAKAWA et al. does not explicitly teach identifying, by the one or more processors, first keypoints in each of the image frames in the temporal sequence, the first keypoints are associated with an object in the temporal sequence of image frames; extracting, by the one or more processors, spatial-temporal features from the combined first keypoints and second keypoints; and based on the extracted spatial-temporal features, training, by the one or more processors, a classification model for recognition of interactive activities between the individual and the object.  
	However, within analogous art, DENG et al. teaches identifying, by the one or more processors (one or more processors taught within Paragraph [0101]), first keypoints in each of the image frames in the temporal sequence (Paragraphs [0014] and [0057]), the first keypoints are associated with an object in the temporal sequence of image frames (Paragraphs [0006] and [0014]); extracting, by the one or more processors (one or more processors taught within Paragraph [0078]), spatial-temporal features (Paragraphs [0014] and [0057]) from the combined first keypoints and second keypoints (combining of keypoint positions taught within Paragraphs [0021-0022]); and based on the extracted spatial-temporal features (spatial temporal sequence taught within Paragraphs [0006] and [0057]), training, by the one or more processors (one or more processors taught within Paragraph [0101]), a classification model for recognition of interactive activities between the individual and the object (Paragraph [0048]- “… identify the behavior of the human body detected and tracked in the sequence of frames (e.g., to identify a class or category of human behavior from a set of classes or categories of human behavior). Two different human body position and movement encodings are described herein as two different encoded representations,…” AND Paragraph [0071]).
One of ordinary skill in the art would have been motivated to combine the teaching of DENG et al. within the modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al.   because the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  provides a system and method for implementing the identification of body movement within digital video utilizing convolutional neural network. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. within the modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. for implementation of a system and method for the identification of body movement within digital video utilizing a convolutional neural network.

As per Claim 9, HAYAKAWA et al. teaches a computer system (Computer medium taught within Paragraphs [0015] and [0042]) for interactive activity recognition (action recognition and interactive relationship taught within Paragraphs [0018-0019]), comprising: one or more processors (Paragraph [0041]- “…one or more processors and memory…”), one or more computer-readable memories (non-transitory computer readable storage taught within Paragraph [0042]), one or more computer-readable tangible storage devices (Paragraph [0042]), and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories (Paragraphs [0015] and [0043]), wherein the computer system is capable of performing a method comprising: retrieving, by one or more processors, a temporal sequence of image frames (Paragraph [0011]- “…The processing system, and processor-readable medium encodes the identified key points for each frame. A second convolutional neural network, including a third temporal dimension, is used to process the data structures corresponding to the temporal sequence of frames to identify human behavior in the sequence of frames of the digital video….”) from a video recording (Paragraph [0036]- “…a neural network designed to learn and reason about temporal dependencies between video frames (e.g., images in a sequence of images), …”); identifying, by the one or more processors (Paragraph [0020]- “…neural networks 108 can be executed on a remote server (e.g., with one or more processors and memory)…”), second keypoints in each of the image frames in the temporal sequence, the second keypoints are associated with an individual interacting with the object (Paragraph [0019]- “…the one or more cameras can be configured as RGB cameras that can capture RGB bands that are configured to capture rich information about object appearance, as well as relationships and interactions between the vehicle 102 and objects (e.g., pedestrians) within the surrounding environment of the vehicle 102….”); combining, by the one or more processors (Paragraph [0043]- “…one or more processors; and memory storing instructions, which when executed by the one or more processors,…”), the first keypoints with the second keypoints (Paragraphs [0041-0042]- “… combining the processed first set of image data and the processed second set of image data, and providing the combined processed first set of image data and the processed second set of image data as an input to the first neural network to determine the recognized action of the person. …”);
HAYAKAWA et al. does not explicitly teach  identifying, by the one or more processors, first keypoints in each of the image frames in the temporal sequence, the first keypoints are associated with an object in the temporal sequence of image frames; extracting, by the one or more processors, spatial-temporal features from the combined first keypoints and second keypoints; and based on the extracted spatial-temporal features, training, by the one or more processors, a classification model for recognition of interactive activities between the individual and the object.  
However, within analogous art, DENG et al. teaches identifying, by the one or more processors (one or more processors taught within Paragraph [0101]), first keypoints in each of the image frames in the temporal sequence (Paragraphs [0014] and [0057]), the first keypoints are associated with an object in the temporal sequence of image frames (Paragraphs [0006] and [0014]); extracting, by the one or more processors (one or more processors taught within Paragraph [0078]), spatial-temporal features (Paragraphs [0014] and [0057]) from the combined first keypoints and second keypoints (combining of keypoint positions taught within Paragraphs [0021-0022]); and based on the extracted spatial-temporal features (spatial temporal sequence taught within Paragraphs [0006] and [0057]), training, by the one or more processors (one or more processors taught within Paragraph [0101]), a classification model for recognition of interactive activities between the individual and the object (Paragraph [0048]- “… identify the behavior of the human body detected and tracked in the sequence of frames (e.g., to identify a class or category of human behavior from a set of classes or categories of human behavior). Two different human body position and movement encodings are described herein as two different encoded representations,…” AND Paragraph [0071]).
One of ordinary skill in the art would have been motivated to combine the teaching of DENG et al. within the modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al.   because the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  provides a system and method for implementing the identification of body movement within digital video utilizing convolutional neural network. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. within the modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. for implementation of a system and method for the identification of body movement within digital video utilizing a convolutional neural network.

As per Claim 17, the limitations within claim 17 are similar to the limitations within claims 1 and 9; therefore, the prior art of record applied to claims 1 and 9 teaches the limitations within claim 17.

2.	Claims 2, 3, 4, 10, 11, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over HAYAKAWA et al. (USPUB 20210271866) in view of DENG et al. (USPUB 20210312321) in further view of MANGALAM et al. (USPUB 20210097266).

As per Claim 2, Combination of HAYAKAWA et al. and DENG et al. teach claim 1, 
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach wherein identifying the first keypoints further comprises: using, by the one or more processors, a Convolutional Neural Network (CNN) based detection model trained for identifying the first keypoints.
Within analogous art, MANGALAM et al. teaches  wherein identifying the first keypoints further comprises: using, by the one or more processors ( one or more processors taught within Paragraph [0068]) , a Convolutional Neural Network (CNN) ( Paragraph [0053])  based detection model trained for identifying the first keypoints ( Detection of the key point taught within Paragraphs [0063-0064])  .  
One of ordinary skill in the art would have been motivated to combine the teaching of MANGALAM et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. provides a system and method for implementing  spatial position of key points of pedestrian interaction with vehicle. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for determining the spatial position of key points of pedestrian interaction with a vehicle.

As per Claim 3,  Combination of HAYAKAWA et al. and DENG et al. and  MANGALAM et al. teach claim 2,
Within analogous art, MANGALAM et al. teaches further comprising: using, by the one or more processors, pre-labeled data to train the CNN based detection model for identifying the first keypoints ( Paragraph [0063]- “… These masked frames are then processed through a pre-trained pose detection model (e.g., OpenPose) to generate pose labels for every pedestrian in the frame…” AND convolutional neural network taught within Paragraph [0053]) .  

As per Claim 4, Combination of HAYAKAWA et al. and DENG et al. teach claim 1,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach wherein identifying the second keypoints further comprises: using, by the one or more processors, a real-time method for multi-person pose detection in images and videos.  
Within analogous art, MANGALAM et al. teaches wherein identifying the second keypoints further comprises: using, by the one or more processors( one or more processors taught within Paragraph [0068]), a real-time method for multi-person pose detection in images and videos ( FIG. 3 showing multi pose detection within image video AND Paragraphs [0031-0034]) .  

As per Claim 10,  Combination of HAYAKAWA et al. and DENG et al. teach claim 9,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach wherein identifying the first keypoints further comprises: using, by the one or more processors, a Convolutional Neural Network (CNN) based detection model trained for identifying the first keypoints.
Within analogous art, MANGALAM et al. teaches  wherein identifying the first keypoints further comprises: using, by the one or more processors ( one or more processors taught within Paragraph [0068]) , a Convolutional Neural Network (CNN) ( Paragraph [0053])  based detection model trained for identifying the first keypoints( Detection of the key point taught within Paragraphs [0063-0064])  .  
One of ordinary skill in the art would have been motivated to combine the teaching of MANGALAM et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. provides a system and method for implementing  spatial position of key points of pedestrian interaction with vehicle. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for determining the spatial position of key points of pedestrian interaction with a vehicle.


As per Claim 11,  Combination of HAYAKAWA et al. and DENG et al. and  MANGALAM et al. teach claim 10,
Within analogous art, MANGALAM et al. teaches further comprising: using, by the one or more processors, pre-labeled data to train the CNN based detection model for identifying the first keypoints( Paragraph [0063]- “… These masked frames are then processed through a pre-trained pose detection model (e.g., OpenPose) to generate pose labels for every pedestrian in the frame…” AND convolutional neural network taught within Paragraph [0053]) .  

As per Claim 12,  Combination of HAYAKAWA et al. and DENG et al. teach claim 9,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach wherein identifying the second keypoints further comprises: using, by the one or more processors, a real-time method for multi-person pose detection in images and videos.
Within analogous art, MANGALAM et al. teaches wherein identifying the second keypoints further comprises: using, by the one or more processors ( one or more processors taught within Paragraph [0068]), a real-time method for multi-person pose detection in images and videos ( FIG. 3 showing multi pose detection within image video AND Paragraphs [0031-0034]).  

As per Claim 18, Combination of HAYAKAWA et al. and DENG et al. teach claim 17,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach wherein identifying the first keypoints and identifying the second keypoints further comprises: using, by the one or more processors, a Convolutional Neural Network (CNN) based detection model trained for identifying the first keypoints; and using, by the one or more processors, a real-time method for multi-person pose detection in images and videos for identifying the second keypoints.
Within analogous art, MANGALAM et al. teaches wherein identifying the first keypoints and identifying the second keypoints( Paragraphs [0063-0064])  further comprises: using, by the one or more processors, a Convolutional Neural Network (CNN) ( Paragraph [0053])  based detection model trained for identifying the first keypoints( Detection of the key point taught within Paragraphs [0063-0064]); and using, by the one or more processors( one or more processors taught within Paragraph [0068]), a real-time method for multi-person pose detection in images and videos( FIG. 3 showing multi pose detection within image video AND Paragraphs [0031-0034]) for identifying the second keypoints (keypoint detection scheme taught within Paragraphs [0063-0064]).
One of ordinary skill in the art would have been motivated to combine the teaching of MANGALAM et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. provides a system and method for implementing  spatial position of key points of pedestrian interaction with vehicle. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision mentioned by MANGALAM et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for determining the spatial position of key points of pedestrian interaction with a vehicle.

3.	Claims 6, 7, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over HAYAKAWA et al. (USPUB 20210271866) in view of DENG et al. (USPUB 20210312321) in further view of Zhang et al. (USPUB 20200184846).

As per Claim 6, Combination of HAYAKAWA et al. and DENG et al. teach claim 1,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach further comprising: feeding, by the one or more processors  , the spatial-temporal features  to the classification model.
 Within analogous art, Zhang et al. teaches  further comprising: feeding, by the one or more processors ( one or more processors within Paragraph [0080]) , the spatial-temporal features ( Paragraphs [0083-0084] and [0167])  to the classification model ( classifying of objects and human interactive taught within Paragraph [0151]) .  
One of ordinary skill in the art would have been motivated to combine the teaching of Zhang et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al.  provides a system and method for implementing  3D trajectory detection and estimation of object  movement prediction. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for 3D trajectory detection and estimation of object movement prediction.

As per Claim 7, Combination of HAYAKAWA et al. and DENG et al. teach claim 1,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach further comprising: identifying, by the one or more processors, a third keypoint corresponding to a sheltered keypoint.
Within analogous art, Zhang et al. teaches further comprising: identifying, by the one or more processors( one or more processors within Paragraph [0080]) , a third keypoint corresponding to a sheltered keypoint ( Paragraphs [0133] and [0139]) .  
One of ordinary skill in the art would have been motivated to combine the teaching of Zhang et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al.  provides a system and method for implementing  3D trajectory detection and estimation of object  movement prediction. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for 3D trajectory detection and estimation of object movement prediction.

As per Claim 14, Combination of HAYAKAWA et al. and DENG et al. teach claim 9,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach further comprising: feeding, by the one or more processors, the spatial-temporal features to the classification model.
Within analogous art, Zhang et al. teaches  further comprising: feeding, by the one or more processors ( one or more processors within Paragraph [0080]) , the spatial-temporal features ( Paragraphs [0083-0084] and [0167])  to the classification model ( classifying of objects and human interactive taught within Paragraph [0151]) .  
One of ordinary skill in the art would have been motivated to combine the teaching of Zhang et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al.  provides a system and method for implementing  3D trajectory detection and estimation of object  movement prediction. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for 3D trajectory detection and estimation of object movement prediction.

As per Claim 15, Combination of HAYAKAWA et al. and DENG et al. teach claim 9,
Combination of HAYAKAWA et al. and DENG et al. does not explicitly teach further comprising: identifying, by the one or more processors, a third keypoint corresponding to a sheltered keypoint. 
Within analogous art, Zhang et al. teaches further comprising: identifying, by the one or more processors( one or more processors within Paragraph [0080]), a third keypoint corresponding to a sheltered keypoint( Paragraphs [0133] and [0139])  .  
One of ordinary skill in the art would have been motivated to combine the teaching of Zhang et al. within the combined  modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al.  because the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al.  provides a system and method for implementing  3D trajectory detection and estimation of object  movement prediction. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Methods and systems for facilitating interactive training of body-eye coordination and reaction time mentioned by Zhang et al. within the combined modified teaching of the Pedestrian action recognition and localization using rgb images mentioned by HAYAKAWA et al. and the Method, system, and medium for identifying human behavior in a digital video using convolutional neural networks mentioned by DENG et al. for implementation of a system and method for 3D trajectory detection and estimation of object movement prediction.

It is noted that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. See MPEP 2123.

Allowable Subject Matter

4.	Claims 5, 8, 13, 16, 19 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

5.	The following is an examiner’s statement of reasons for the indication of allowable subject matter:
As to claims 5, 13 and 19, the prior art of record does not teach or suggest the limitation recited in claims 5, 13 and 19: “… extracting the spatial-temporal features from the combined first keypoints and second keypoints further comprises: using, by the one or more processors, the combined first keypoints and second keypoints as input for a Graph Convolutional Neural Network (GCN) model to extract the spatial-temporal features, wherein a result of the GCN model comprises final first keypoints and final second keypoints.”

As to claims 8, 16 and 20, the prior art of record does not teach or suggest the limitation recited in claims 8, 16 and 20: “…a relationship between an nth frame and an (n+1)th frame from the temporal sequence of image frames using tracking algorithms, wherein a position of the third keypoint in the (n+1)th frame is determined based on a position of the third keypoint in the nth frame in which the third keypoint is visible.”



Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
6. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S. ISMAIL whose telephone number is (571)272-9799 and Fax # (571)273-9799. The examiner can normally be reached on M-F: 9:00 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http:/ If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David C. Payne can be reached on (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OMAR S ISMAIL/
Primary Examiner, Art Unit 2637