DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/15/2022 was filed after the mailing date of 6/15/2022.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 7-9, 12, 13, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jagannathan et al (US10842415) in view of Sato (US20210225029).
Regarding claim 1, Jagannathan teaches a system (fig. 1), comprising: 
a computing system including at least one computing device (120 in fig. 1, col. 8 lines 25-28); and 
a camera communicatively coupled to the computing system via a network (110 in fig. 1, col. 7 lines 21-24, col. 8 lines 25-28 and 58-61,  Such a sensor system may include one or more of: a gyroscope, an accelerometer, a magnetometer, and an image sensor, for example to measure movement of one or more body portions in nine degrees of freedom); 
wherein the computing system is configured to: 
generate a set of features (input to 820 in fig. 8, 1110 in fig. 11, col. 20 lines 22-27, The data in table 1110 are fed into the joint-vision model described elsewhere herein) based on three-dimensional joint location data representing postures of a subject (fig. 4, col. 15 lines 32-37, using an image sensor to detect range of motion, the angle of joints observed on the limbs, location of various body portions, etc) over a plurality of time steps depicted in video data captured by the camera (TIME in 1110 in fig. 11, col. 8 lines 44-48, col. 20 lines 57-59, Applying the joint-vision model to the incoming video (subsequently parsed into individual images or frames)); 
provide the set of features to a first machine learning model (820 in fig. 8, col. 18 lines 51-57, joint-vision model) trained to identify a start time step, an end time step, and an action identity for each action of the plurality of actions (1130 in fig. 11, col. 20 lines 22-27, The data in table 1110 are fed into the joint-vision model described elsewhere herein, which outputs the data shown in table 1130 showing no directional movement of the shoulders at times 12.10 and 12.20 but directional movement of the shoulders towards the right (opposite the hips) at times 12.30 and 12.40); and
provide a set of features to a second machine learning model (830 in fig. 8, col. 24 lines 54-59, col. 26 lines 51-59, The third machine learning model or posture master model may select to use the data output from one or both of the joint-vision model and the joint-sensor model to output a gait, movement, stability, etc. of a user; third machine learning model is further configured to perform a method including: selecting the sensor signal) trained to determine a postural assessment score for each time step (col. 25 lines 5-14, Based on these same inputs, in the deployment phase, the posture master model assigns a score).

Jagannathan fails to teach providing the same set of features to a second machine learning model and determining an action score for each action of the plurality of actions based on the start time steps, the end time steps, the action identities, and the postural assessment scores for each time step.
However, Sato teaches providing a set of features (output of 32 in fig. 5, para. [0074], annotation information) to a first machine learning model and a second machine learning model (joint position estimation model and posture class estimation model in fig. 5, para. [0076], the learner 33 creates the joint position estimation model and the posture class estimation model, trained with video recordings for training and annotation information acquired by the annotator 32) and determining an action score for each action (23 in fig. 9, para. [0094], [0121], [0123], The analyzer 23 is configured to generate analytical information on a work efficiency status of a worker; That is, when the difference between the working time and the standard time is less than the first threshold value, the analyzer 23 determines that the skill level is “good” (A). When the difference is equal to or greater than the first threshold value and less than the second threshold value, the analyzer 23 determines that the skill level is “passing” (B). When the difference is equal to or greater than the second threshold value, the analyzer 23 determines that the skill level is “failing” (C)) of a plurality of actions (para. [0039], different predetermined reference body postures for respective steps of work) based on start time steps (62 in fig. 9, para. [0096], Specifically, the working time measurer 62 acquires a working time as a time period from a work start time to the present time, where the work start time is the time at which the trigger posture is detected when the work is not in progress), end time steps (62 in fig. 9, para. [0096], the working time measurer 62 acquires a working time as a time period from the work start time to a work end time, where the work end time is the time at which the trigger posture is detected when the work is in progress), action identities (para. [0038], [0114], then the analyzer 23 determines whether or not the worker is working (ST208)), and postural assessment scores for each time step (61 in fig. 9, para. [0094], [0098], based on the posture detection information acquired by the posture detector 22; posture class label).

Therefore, taking the combined teachings of Jagannathan and Sato as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Sato into the apparatus of Jagannathan. The motivation to combine Sato and Jagannathan would be to analyze the work efficiency statuses of workers efficiently without placing a greater burden on an administrator to perform setting operations for the analysis (para. [0132] of Sato).
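
For illustration only, the three-level skill determination quoted from para. [0123] of Sato can be sketched as follows. The function name, parameter names, and the use of a signed difference (working time minus standard time) are assumptions made for this sketch; Sato as quoted above does not specify an implementation or the sign convention of the difference.

```python
def skill_level(working_time: float, standard_time: float,
                first_threshold: float, second_threshold: float) -> str:
    """Return "A" (good), "B" (passing), or "C" (failing).

    Hypothetical helper following the thresholds quoted from Sato,
    para. [0123]. Assumes first_threshold <= second_threshold.
    """
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")

    # Assumption: the "difference" is working time minus standard time.
    difference = working_time - standard_time

    if difference < first_threshold:
        return "A"  # less than the first threshold: "good"
    if difference < second_threshold:
        return "B"  # at least the first, less than the second: "passing"
    return "C"      # at least the second threshold: "failing"
```

Under these assumptions, a difference exactly equal to a threshold falls into the lower grade, which matches the "equal to or greater than" wording of the quoted passage.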


Regarding claim 2, the modified invention of Jagannathan teaches a system wherein the computing system is further configured to: 
receive the video data (col. 8 lines 44-48 of Jagannathan, Applying the joint-vision model to the incoming video (subsequently parsed into individual images or frames)) from the camera (110 in fig. 1, col. 7 lines 21-24 of Jagannathan, image sensor); and 
generate the three-dimensional joint location data based on the video data (fig. 4, col. 15 lines 32-37 of Jagannathan, using an image sensor to detect range of motion, the angle of joints observed on the limbs, location of various body portions, etc).


Regarding claim 3, the modified invention of Jagannathan teaches a system wherein each time step of the plurality of time steps is represented by a frame of the video data (TIME in 1110 in fig. 11, each time instance would correspond to a single frame).


Regarding claim 7, the modified invention of Jagannathan teaches a system wherein the computing system is further configured to: 
receive a set of training data (para. [0074] of Sato, each frame image of the video recordings for training), wherein each instance of training data in the set of training data is labeled with a plurality of training activities that each take place over a series of time steps (para. [0039], [0074] of Sato, different predetermined reference body postures for respective steps of work; to thereby generate annotation information. The annotation information includes joint positions and a posture class label which are associated with the ID (frame number) of each frame image), and wherein each training activity of the plurality of training activities is labeled with a postural assessment score (para. [0074] of Sato, posture class label).


Regarding claim 8, the modified invention of Jagannathan teaches a system wherein the computing system is further configured to: 
train the first machine learning model and the second machine learning model using the set of training data (para. [0028] of Sato, by using machine learning, create machine-learned information for the joint position estimation and machine-learned information for the posture match determination, trained with video recordings for training, information records of joint positions of the worker in the video recordings, and information indicating whether or not to achieve the posture match).


Regarding claim 9, the claim recites limitations which are similar to those claimed in claim 1 and is therefore rejected for the same reasons as stated above.


Regarding claim 12, the claim recites limitations which are similar to those claimed in claim 2 and is therefore rejected for the same reasons as stated above.


Regarding claim 13, the claim recites limitations which are similar to those claimed in claim 3 and is therefore rejected for the same reasons as stated above.


Regarding claim 16, Jagannathan teaches a computer-implemented method of training machine learning models (col. 20 lines 34-38) to provide biomechanical analyses of movements of subjects (col. 20 lines 47-54, As such, the model was modified to use sequences of images (e.g., up to 32 images per second) to monitor and analyze movement of an individual over time. The joint-vision model is trained to classify a plurality of image frames having one or more body portions therein, output a discrete label for the one or more body portions, and localize the one or more body portions in each image frame), the method comprising: 
receiving, by a computing device, a set of training data (col. 20 lines 34-38); 
generating, by the computing device, features for each instance of training data (input to 820 in fig. 8, 1110 in fig. 11, col. 20 lines 22-27, The data in table 1110 are fed into the joint-vision model described elsewhere herein); 
training, by the computing device, a first machine learning model (820 in fig. 8, col. 20 lines 34-38, train a joint-vision model) to accept features of an instance of training data as input and to provide a plurality of action identities, start time steps, and end time steps as output (1130 in fig. 11, col. 20 lines 22-27, The data in table 1110 are fed into the joint-vision model described elsewhere herein, which outputs the data shown in table 1130 showing no directional movement of the shoulders at times 12.10 and 12.20 but directional movement of the shoulders towards the right (opposite the hips) at times 12.30 and 12.40); and
training (col. 8 lines 48-50, In some embodiments, computing device or server 130, learns or trains each of the machine learning models described elsewhere herein), by the computing device, a second machine learning model (830 in fig. 8, 1140 in fig. 11, col. 25 lines 16-18) to accept features of an instance of training data as input (col. 8 lines 48-50, In some embodiments, computing device or server 130, learns or trains each of the machine learning models described elsewhere herein) and to provide a postural assessment score as output for each time step (col. 25 lines 5-14, Based on these same inputs, in the deployment phase, the posture master model assigns a score).

Jagannathan fails to teach wherein each instance of training data in the set of training data is labeled with a plurality of activities that each take place over a series of time steps, and wherein each activity of the plurality of activities is labeled with a postural assessment score; and 
storing, by the computing device, the first machine learning model and the second machine learning model in a model data store for processing new data.

However Sato teaches training data (para. [0028] of Sato, by using machine learning, create machine-learned information for the joint position estimation and machine-learned information for the posture match determination, trained with video recordings for training, information records of joint positions of the worker in the video recordings, and information indicating whether or not to achieve the posture match) is labeled with a plurality of activities (para. [0039], different predetermined reference body postures for respective steps of work) that each take place over a series of time steps (62 in fig. 9, para. [0096]), and wherein each activity of the plurality of activities is labeled with a postural assessment score (61 in fig. 9, para. [0094], [0098], based on the posture detection information acquired by the posture detector 22; posture class label); and 
storing, by the computing device, a first machine learning model and a second machine learning model in a model data store for processing new data (para. [0065], The storage 13 stores joint position estimation models, posture class estimation models (machine-learned information trained with video recordings), posture detection information, and analytical information generated by the controller 14).

Therefore, taking the combined teachings of Jagannathan and Sato as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Sato into the apparatus of Jagannathan. The motivation to combine Sato and Jagannathan would be to analyze the work efficiency statuses of workers efficiently without placing a greater burden on an administrator to perform setting operations for the analysis (para. [0132] of Sato).


Regarding claim 17, the modified invention of Jagannathan teaches a computer-implemented method wherein each instance of training data is a video of a subject (col. 20 lines 50-56 of Jagannathan, it would be necessary to train the model on body portions of a subject to label body portions), and wherein the method further comprises generating three-dimensional joint location data based on the video of the subject (fig. 4, col. 15 lines 32-37 of Jagannathan, using an image sensor to detect range of motion, the angle of joints observed on the limbs, location of various body portions, etc).


Claim(s) 4, 10, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jagannathan et al (US10842415) and Sato (US20210225029) in view of Lea et al (“Temporal Convolutional Networks for Action Segmentation and Detection”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 156-165, retrieved from the Internet on 8/16/2022).
Regarding claim 4, the modified invention of Jagannathan fails to teach a system wherein the first machine learning model includes an encoder-decoder temporal convolutional network.
However, Lea teaches using an encoder-decoder temporal convolutional network for action segmentation (fig. 1, Our Encoder-Decoder Temporal Convolutional Network (ED-TCN) hierarchically models actions using temporal convolutions, pooling, and upsampling), wherein action segmentation predicts actions at every frame (page 157, section 2, Action segmentation methods predict what action is occurring at every frame in a video and detection methods output a sparse set of action segments, where a segment is defined by a start time, end time, and class label).
Therefore, taking the combined teachings of Jagannathan and Sato with Lea as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Lea into the apparatus of Jagannathan and Sato. The motivation to combine Lea, Sato and Jagannathan would be to efficiently capture long-range patterns (page 156, left side, last paragraph of Lea).
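
For illustration only, the relationship between the two outputs described in section 2 of Lea (an action predicted at every frame, versus segments each defined by a start time, end time, and class label) can be sketched as follows. The function name and the example labels are hypothetical and do not appear in Lea; the sketch merely groups consecutive frames that share a label and is not the ED-TCN itself.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame action labels into (start, end, label) segments.

    Hypothetical helper: start and end are inclusive frame indices, and
    each segment covers one maximal run of identical consecutive labels.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((start, start + length - 1, label))
        start += length
    return segments


# Example with made-up labels for six frames.
example = frames_to_segments(["reach", "reach", "lift", "lift", "lift", "reach"])
# example == [(0, 1, "reach"), (2, 4, "lift"), (5, 5, "reach")]
```

Each tuple produced this way carries the three items the quoted passage uses to define a segment: a start time step, an end time step, and an action identity.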


Regarding claim 10, the claim recites limitations which are similar to those claimed in claim 4 and is therefore rejected for the same reasons as stated above.


Regarding claim 19, the modified invention of Jagannathan fails to teach a computer-implemented method wherein training the first machine learning model includes training an encoder-decoder temporal convolutional network.
However, Lea teaches training an encoder-decoder temporal convolutional network (page 163, right side, first paragraph).
Therefore, taking the combined teachings of Jagannathan and Sato with Lea as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Lea into the apparatus of Jagannathan and Sato. The motivation to combine Lea, Sato and Jagannathan would be to efficiently capture long-range patterns (page 156, left side, last paragraph of Lea).


Claim(s) 5, 14, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jagannathan et al (US10842415) and Sato (US20210225029) in view of Hoffmann et al (US20210012100).
Regarding claim 5, the modified invention of Jagannathan fails to teach a system wherein the second machine learning model includes a stacked long short-term memory model.
However Hoffmann teaches using a stacked long short-term memory model for action recognition (para. [0056], With this clustering technique, a LSTM framework is enhanced by feeding the output of stacked LSTM to a DEC layer and enforcing maximum separation in the embedded space. Then, a dense layer is used to predict actions).
Therefore, taking the combined teachings of Jagannathan and Sato with Hoffmann as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Hoffmann into the apparatus of Jagannathan and Sato. The motivation to combine Hoffmann, Sato and Jagannathan would be to achieve performance similar to prior action recognition techniques using only a fraction of information (para. [0062] of Hoffmann).


Regarding claim 14, the modified invention of Jagannathan fails to teach a computer-implemented method wherein providing the set of features to the second machine learning model includes providing the set of features to a stacked long short-term memory (LSTM) model to obtain the postural assessment scores.
However Hoffmann teaches using a stacked long short-term memory model for action recognition (para. [0056], With this clustering technique, a LSTM framework is enhanced by feeding the output of stacked LSTM to a DEC layer and enforcing maximum separation in the embedded space. Then, a dense layer is used to predict actions) and generating a postural assessment score (812 in fig. 8A, LSTM_Pose accuracy value).
Therefore, taking the combined teachings of Jagannathan and Sato with Hoffmann as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Hoffmann into the apparatus of Jagannathan and Sato. The motivation to combine Hoffmann, Sato and Jagannathan would be to achieve performance similar to prior action recognition techniques using only a fraction of information (para. [0062] of Hoffmann).


Regarding claim 20, the modified invention of Jagannathan fails to teach a computer-implemented method wherein training the second machine learning model includes training a stacked long short-term memory structure to provide postural assessment scores.
However Hoffmann teaches training (para. [0071], In this evaluation, all the poses that belong to the same label are separated into separate training sets and a linear classifier, such as SVM (curve 800), is applied to recognize the actions) a stacked long short-term memory structure (para. [0056], With this clustering technique, a LSTM framework is enhanced by feeding the output of stacked LSTM to a DEC layer and enforcing maximum separation in the embedded space) to provide postural assessment scores (812 in fig. 8A, LSTM_Pose accuracy value).


Claim(s) 6, 15, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jagannathan et al (US10842415) and Sato (US20210225029) in view of Yao et al (US20200394499).
Regarding claim 6, the modified invention of Jagannathan fails to teach a system wherein generating the set of features based on the three-dimensional joint location data includes providing the three-dimensional joint location data to a sequence of stacked graph convolutional networks.
However Yao teaches providing joint location data (para. [0035], [0047], skeletal joint connections over time;  A conventional STG is based on a human skeleton and includes nodes that describe only skeletal joints, spatial connections that depend on physical adjacency of these joints in the human body, and temporal edges that connect joints of the same type (e.g., right wrist to right wrist) across one consecutive time steps) to a sequence of stacked graph convolutional networks (fig. 6, para. [0008], a machine learning system as described herein implements a plurality of stacked hourglass STGCNs (also referred to herein as “stacked STGCNs”) that process the STG, each STGCN comprising a plurality of STGCN layers).
Therefore, taking the combined teachings of Jagannathan and Sato with Yao as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Yao into the apparatus of Jagannathan and Sato. The motivation to combine Yao, Sato and Jagannathan would be to improve the accuracy of action recognition, localization, and segmentation over conventional techniques (para. [0009] of Yao).


Regarding claim 15, the claim recites limitations which are similar to those claimed in claim 6 and is therefore rejected for the same reasons as stated above.


Regarding claim 18, the claim recites limitations which are similar to those claimed in claim 6 and is therefore rejected for the same reasons as stated above.


Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jagannathan et al (US10842415), Sato (US20210225029) and Lea et al (“Temporal Convolutional Networks for Action Segmentation and Detection”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 156-165, retrieved from the Internet on 8/16/2022) in view of Hoffmann et al (US20210012100).
Regarding claim 11, the modified invention of Jagannathan fails to teach a computer-implemented method wherein providing the set of features to the second machine learning model includes providing the set of features and the action identities to a stacked long short-term memory (LSTM) model to generate the postural assessment scores.
However Hoffmann teaches using a stacked long short-term memory model for action recognition (para. [0056], With this clustering technique, a LSTM framework is enhanced by feeding the output of stacked LSTM to a DEC layer and enforcing maximum separation in the embedded space. Then, a dense layer is used to predict actions) and generating a postural assessment score (812 in fig. 8A, LSTM_Pose accuracy value).
Therefore, taking the combined teachings of Jagannathan, Lea and Sato with Hoffmann as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of Hoffmann into the apparatus of Jagannathan, Lea and Sato. The motivation to combine Hoffmann, Lea, Sato and Jagannathan would be to achieve performance similar to prior action recognition techniques using only a fraction of information (para. [0062] of Hoffmann).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEON VIET Q NGUYEN whose telephone number is (571)270-1185. The examiner can normally be reached Mon-Fri 11AM-7PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEON VIET Q NGUYEN/Primary Examiner, Art Unit 2663