DETAILED ACTION
This action is in reply to the submission filed on 9/30/2022.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Applicant’s cancellation of claims 6 and 20, amendments to claims 1, 3, 4 and 7-19, and addition of claims 21 and 22 are acknowledged.
Claims 1-5, 7-19 and 21-22 are currently pending and have been examined under the effective filing date of 3/26/2021.
Response to Arguments
Applicant's arguments filed 9/30/2022 have been fully considered but they are not persuasive in full.  
Regarding pages 25 and 26 of Applicant’s remarks, Examiner thanks Applicant for incorporating the specific limitations of the machine learning dataset into the independent claims, thereby indicating subject matter eligibility, as explained in further detail in this action’s section titled “Claim Analysis - 35 USC § 101.” In summation: The specific, meaningful limitation of training the time recording machine learning model on one or more training corpora comprising a plurality of images of distinct gestures integrates the judicial exception into a practical application. Therefore, these additional elements integrate the abstract idea into a practical application because they impose meaningful limits on practicing the abstract idea, and the claims are patent eligible.
Regarding pages 27-31 of Applicant’s remarks and the amendments to claims 1-5, 7-19 and 21, Examiner cites Sahashi to teach the limitations of using hand gestures as a way to enter an attendance or time punch event. However, Sahashi does not teach a punch out event, as recited in claim 22.

Allowable Subject Matter
Claim 22 is allowed.
	Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: 
Regarding the prior art rejections, no prior art or non-patent literature has been found that teaches digitally mapping a plurality of hand gestures recognized by machine learning to a clock-in or clock-out time recording code. The closest non-patent literature to the Application is Ganguly, Kinect Sensor Based Gesture Recognition for Surveillance Application. Ganguly teaches using hand gesture recognition for a variety of surveillance applications. The closest prior art references to the claims are: Svenson (Pub. No. US 2021/0307621 A1), Sahashi (Pub. No. US 2006/0057550 A1), Ido Nobuhiko (JP 2004220130 A), and Diamant et al. (Pub. No. US 2019/0341050 A1). Svenson teaches a machine learning system for recognizing multiple people’s faces, gestures, and movements, but not digitally mapping them to a clock-in or clock-out time code. Sahashi teaches using hand gestures to take attendance, but not using them with machine learning to clock out. Ido Nobuhiko teaches raising a hand to take attendance, but likewise does not teach using machine learning to link the gesture to a clock-out event. Diamant teaches using a variety of hand gestures to indicate an action in a machine learning environment, but not to generate a clock-in or clock-out event. None of these references, alone or in combination, discloses digitally mapping a plurality of hand gestures recognized by machine learning to a clock-in or clock-out time recording code. In summation, claim 22 is distinct from the closest prior art and non-patent literature. For these reasons, claim 22 is allowable.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Please see below for claim 22 mapping:
Regarding Claim 22, Svenson teaches the method for machine learning-based automated electronic time recording, the method comprising: 
(1) identifying, via a scene capturing device, a representation of a time recording space; (Svenson ¶0440; conversion unit 1118 firstly invokes a viewpoint fusion module 1117 which maps the tubelets into a common 3D space and fuses together tubelets which are associated with the same real world entity.)
(2) identifying, by one or more computers, a plurality of distinct bodies having a time recording pose within the time recording space based on an assessment of the representation of the time recording space: (Svenson ¶0496; another example, a low frequency of hand-raising by a student, as compared to a frequency that is indicated as normal by the student's profile, may also be detected)
 (3) extracting a plurality of distinct features from the representation of the time recording space based on identifying the plurality of distinct bodies having the time recording pose, wherein extracting the plurality of distinct features includes: (Svenson ¶0507; reporting process of the detection server 120 can also be configured to include attendance monitoring, which reports on the detection of the ingress and egress of each student in a class, for example based on facial recognition)
(3-A) extracting a first distinct portion of a body for each of the plurality of distinct bodies within the representation of the time recording space; (Svenson ¶0091; automatically detect head, face (including facial expressions), hands (including hand gestures), eyes, nose and mouth)
and (3-B) extracting a second distinct portion of the body for each of the plurality of distinct bodies within the representation of the time recording space, wherein extracting the second distinct portion of the body for each of the plurality of distinct bodies includes extracting a hand segment that is causing each of the plurality of distinct bodies to be in the time recording pose; (Svenson ¶0091; automatically detect head, face (including facial expressions), hands (including hand gestures), eyes, nose and mouth)
(4) instantiating and executing a distinct automated employee-recognition for each of the plurality of distinct bodies based on extracting the first portion of the body for each of the plurality of distinct bodies, wherein executing the automated employee-recognition includes: (Svenson ¶0225; checking their facial image against stored set of facial features associated with expected attendees, by using high level of confidence (or low level of tolerance))
(4-A) generating, by an employee-identification machine learning model, (Svenson ¶0384; object detector unit 1106 can be implemented as one of a variety of different pattern classifiers, such as for example Convolutional Neural Networks (CNNs) which are trained on large, manually annotated datasets) an employee-identification inference for each of the plurality of distinct bodies based on a model input comprising extracted features of the first portion of the body, (Svenson ¶0086; The face detection performed by the face detector or the abnormality detection server 120 may use known facial recognition techniques,) wherein the employee-identification machine learning model comprises a convolutional neural network (CNN) that is trained based on one or more training corpora comprising a plurality of distinct portrait images of a plurality of distinct employees; (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising) (Svenson ¶0385; initial base CNN is pretrained on the very large image classification … detected large objects are then cropped out from the original, high resolution camera frame and fed into a sub-component object detector 1306 which identifies the location of finer subcomponents (e.g., faces, hands, arms, etc.).)
and (4-B) identifying, by one or more computers, an employee identifier value for each body of the plurality of distinct bodies based on the user-identification inference; (Svenson ¶0225; checking their facial image against stored set of facial features associated with expected attendees, by using high level of confidence (or low level of tolerance))
(5) instantiating and executing a distinct automated time recording-recognition for each of the plurality of distinct bodies based on extracting the second distinct portion of the body for each of the plurality of distinct bodies; (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising)
(5-A) generating, by a time recording machine learning model, a time recording action inference for each of the plurality of distinct bodies based on a model input comprising extracted features of the second portion of the body, wherein: (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising) (Svenson ¶0385; initial base CNN is pretrained on the very large image classification … detected large objects are then cropped out from the original, high resolution camera frame and fed into a sub-component object detector 1306 which identifies the location of finer subcomponents (e.g., faces, hands, arms, etc.).)
Svenson does not, but Sahashi does teach:
	(2-A) a target distinct body is determined to be in a time recording pose when a hand of the target body is detected above a head of the target body, (Sahashi ¶0015; Examples of the actions that the student is requested to perform by the action request means include any action that causes changes in the acquired video, and suitable examples include moving the head, closing the eyes, moving the mouth, or raising a hand)
and (2-B) the target distinct body is determined not to be in the time recording pose when the hand of the target body is detected below the head of the target body; (Sahashi ¶0015; Examples of the actions that the student is requested to perform by the action request means include any action that causes changes in the acquired video, and suitable examples include moving the head, closing the eyes, moving the mouth, or raising a hand)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system in Svenson with the known technique of time logging in response to an action in Sahashi because applying the known technique would have yielded predictable results and resulted in an improved system by allowing a more flexible attendance taking system. (Sahashi ¶0013; providing the system for confirming that the legitimate student is in attendance)
Neither Svenson nor Sahashi teaches:
(5-A-1) generating the time recording action inference for each of the plurality of distinct bodies includes predicting a hand gesture classification inference for each of the plurality of distinct bodies, 
(5-A-2) the time recording action inference generated for a first distinct body of the plurality of distinct bodies indicates the hand segment associated with the first distinct body corresponds to a first hand gesture, 
and (5-A-3) the time recording action inference generated for a second distinct body of the plurality of distinct bodies indicates the hand segment associated with the second distinct body corresponds to a second hand gesture, different from the first hand gesture; 
and (5-B) identifying, by the one or more computers, a time recording code of a plurality of distinct time recording codes for each body of the plurality of distinct bodies based on the time recording action inference, wherein:
 (5-B-1) the one or computers identified that the time recording code associated with the first distinct body corresponds to a clock-in time recording code based on the one or more computers identifying that the first hand gesture is digitally mapped to the clock-in time recording code; 
and (5-B-2) the one or computers identified that the time recording code associated with the second distinct body corresponds to a clock-out time recording code based on the one or more computers identifying that the second hand gesture is digitally mapped to the clock-out time recording code; 
and (6) executing, via a time recording application executed by one or more computers, a distinct automated electronic time recording event for each of the plurality of distinct bodies based on time recording inputs of (i) the employee identifier value and (ii) the time recording code associated with each body of the plurality of distinct bodies within the representation of the time recording space, wherein: 
(6-A) executing the distinct automated electronic time recording event for the first distinct body includes setting a first user account of the time recording application to a clocked-in time recording state, wherein the first user account is associated with the first distinct body;
 and (6-B) executing the distinct automated electronic time recording event for the second distinct body includes setting a second user account of the time recording application to a clocked-out time recording state, wherein the second user account is associated with the second distinct body.
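For illustration only, the distinguishing limitations (5-B) through (6-B) can be sketched as below. This is a minimal sketch and not the Applicant’s implementation: the gesture labels ("open_palm", "closed_fist"), the TimeCode names, and the dictionary-based account store are hypothetical and do not appear in the application or in any cited reference.

```python
from enum import Enum


class TimeCode(Enum):
    """Hypothetical time recording codes."""
    CLOCK_IN = "clock_in"
    CLOCK_OUT = "clock_out"


# Hypothetical digital mapping of hand gesture labels to time recording codes.
GESTURE_TO_CODE = {
    "open_palm": TimeCode.CLOCK_IN,
    "closed_fist": TimeCode.CLOCK_OUT,
}


def record_time_events(inferences, accounts):
    """Set each account's state from (employee_id, gesture_label) pairs.

    `accounts` maps an employee identifier value to a state string.
    Gestures with no mapped code leave the account unchanged.
    """
    for employee_id, gesture in inferences:
        code = GESTURE_TO_CODE.get(gesture)
        if code is None:
            continue  # unmapped gesture: no time recording event
        if code is TimeCode.CLOCK_IN:
            accounts[employee_id] = "clocked_in"
        else:
            accounts[employee_id] = "clocked_out"
    return accounts
```

In this sketch two bodies in the same scene showing different gestures produce different account states in a single pass, which is the combination not found in the cited references.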

Claim Analysis - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5, 7-19 and 21-22 are not rejected under 35 U.S.C. 101 because the claimed invention is directed to a practical application of an abstract idea. 
Step 1: the claims fall under statutory categories of processes and/or machines.
Step 2A Prong 1: the claims recite: identifying a representation of a time recording space that includes a plurality of distinct bodies, detecting a pose for each body, determining that a first subset of bodies are in a time recording pose and a second subset are not, wherein the pose involves a position of a body’s hand, and either executing or foregoing executing a time punch for each body based on the pose.  These limitations, as drafted, are a process that, under its broadest reasonable interpretation, covers mental processes, specifically concepts performed in the human mind (including an observation, evaluation, judgement or opinion).
Step 2A Prong 2: While the benefits of computing technology applied to methods of organizing human activity are recognized, said judicial exception is integrated into a practical application because the claims as a whole, looking at the additional elements of: a scene capturing device, a pose detection model, an employee-identification machine learning model with employee-identification inferences and employee identifier values, a time recording machine learning model, wherein the time recording machine learning model comprises a convolutional neural network (CNN) that is trained based on one or more training corpora comprising a plurality of images of distinct hand-based time recording gestures, one or more computers, and a time recording application executed by the one or more computers, individually and in combination, do not merely use a computer as a tool to perform the abstract idea (see MPEP 2106.05(f)). These limitations are not recited at a high level of generality (i.e., as a general purpose computer performing the claimed abstract ideas) such that they amount to no more than mere instructions to apply the exception using a general purpose computer component. The specific, meaningful limitation of training the time recording machine learning model on one or more training corpora comprising a plurality of images of distinct gestures integrates the judicial exception into a practical application. Therefore, these additional elements integrate the abstract idea into a practical application because they impose meaningful limits on practicing the abstract idea, and the claims are patent eligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Svenson et al. (Pub. No. US 2021/0307621 A1) in view of Sahashi (Pub. No. US 2006/0057550 A1).
Regarding Claim 1, Svenson discloses a method for a machine learning-based automated electronic time recording, the method comprising: 
identifying, via a scene capturing device, a representation of a time recording space located at a facility, wherein the representation of the time recording space includes a plurality of distinct bodies; (Svenson ¶0440; conversion unit 1118 firstly invokes a viewpoint fusion module 1117 which maps the tubelets into a common 3D space and fuses together tubelets which are associated with the same real world entity.)
detecting, via a pose detection model, a pose for each of the plurality of distinct bodies based on the representation of the time recording space; (Svenson ¶0496; another example, a low frequency of hand-raising by a student, as compared to a frequency that is indicated as normal by the student's profile, may also be detected)
wherein executing the distinct digital time punch for each distinct body included in the first subset includes:
(1) extracting a plurality of distinct features from the representation of the time recording space, wherein extracting the plurality of distinct features includes: (Svenson ¶0507; reporting process of the detection server 120 can also be configured to include attendance monitoring, which reports on the detection of the ingress and egress of each student in a class, for example based on facial recognition)
 		(1-a) extracting, from the representation of the time recording space, a first distinct portion of a body for each of distinct body included in the first subset of the plurality of distinct bodies; and (Svenson ¶0091; automatically detect head, face (including facial expressions), hands (including hand gestures), eyes, nose and mouth)
(1-b) extracting, from the representation of the time recording space, a second distinct portion of the body for each of distinct body included in the first subset of the plurality of distinct bodies; (Svenson ¶0091; automatically detect head, face (including facial expressions), hands (including hand gestures), eyes, nose and mouth)
(2) instantiating and executing a distinct automated employee-recognition for each distinct body included in the first subset of the plurality of distinct bodies based on extracting the first portion of the body for each distinct body included in the first subset of the plurality of distinct bodies, wherein executing the automated employee-recognition includes: (Svenson ¶0225; checking their facial image against stored set of facial features associated with expected attendees, by using high level of confidence (or low level of tolerance))
(2-a) generating, by an employee-identification machine learning model, (Svenson ¶0384; object detector unit 1106 can be implemented as one of a variety of different pattern classifiers, such as for example Convolutional Neural Networks (CNNs) which are trained on large, manually annotated datasets) an employee-identification inference for each distinct body included in the first subset of the plurality of distinct bodies based on a model input comprising extracted features of the first portion of the body for each distinct body included in the first subset of the plurality of distinct bodies; and (Svenson ¶0086; The face detection performed by the face detector or the abnormality detection server 120 may use known facial recognition techniques,)
(2-b) identifying, by one or more computers, an employee identifier value for each distinct body included in the first subset of the plurality of distinct bodies based on the employee-identification inference generated for each distinct body included in the first subset of the plurality of distinct bodies; (Svenson ¶0225; checking their facial image against stored set of facial features associated with expected attendees, by using high level of confidence (or low level of tolerance))
(3) instantiating and executing a distinct automated time recording-recognition for each distinct body included in the first subset of the plurality of distinct bodies based on extracting the second distinct portion of the body for each distinct body included in the first subset of the plurality of distinct bodies, wherein executing the distinct time recording-recognition includes; (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising)
(3-a) generating, by a time recording machine learning model, a time recording action inference for each distinct body included in the first subset of the plurality of distinct bodies based on a model input comprising extracted features of the second portion of the body for each distinct body included in the first subset of the plurality of distinct bodies, wherein the time recording machine learning model comprises a convolutional neural network (CNN) that is trained based on one or more training corpora comprising a plurality of images of distinct hand-based time recording gestures; (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising) (Svenson ¶0385; initial base CNN is pretrained on the very large image classification … detected large objects are then cropped out from the original, high resolution camera frame and fed into a sub-component object detector 1306 which identifies the location of finer subcomponents (e.g., faces, hands, arms, etc.).)
(3-b) identifying, by one or more computers, a time recording code of a plurality of distinct time recording codes for each distinct body included in the first subset of the plurality of distinct bodies based on the time recording action inference generated for each distinct body included in the first subset of the plurality of distinct bodies; (Svenson ¶0508; The attendance module 134 can be configured to process the attendance event data to generate logging event data)
(4) and executing, via a time recording application executed by the one or more computers, a distinct automated electronic time recording event for each distinct body included in the first subset of the plurality of distinct bodies based on time recording inputs of (4-a) the employee identifier value associated with each distinct body included in the first subset of the plurality of distinct bodies and (4-b) the time recording code associated with each distinct body included in the first subset of the plurality of distinct bodies. (Svenson ¶0291; the attendance data transmitted to the BPMS include an identifier of a student detected to be entering or leaving the monitored location (e.g., a unique student ID value), and an indication of the time when the student enters or leaves the location)
Svenson does not, but Sahashi does teach:
determining, based on the pose detected for each of the plurality of distinct bodies, that a first subset of the plurality of distinct bodies are in a time recording pose and that a second subset of the plurality of distinct bodies are not in a time recording pose, wherein:
 a target body of the plurality of distinct bodies is determined to be in a time recording pose when a hand of the target body is detected above a predetermined portion of the target body, and (Sahashi ¶0015; Examples of the actions that the student is requested to perform by the action request means include any action that causes changes in the acquired video, and suitable examples include moving the head, closing the eyes, moving the mouth, or raising a hand)
 the target body is determined not to be in the time recording pose when the hand of the target body is detected below the predetermined portion of the target body; and (Sahashi ¶0015; Examples of the actions that the student is requested to perform by the action request means include any action that causes changes in the acquired video, and suitable examples include moving the head, closing the eyes, moving the mouth, or raising a hand)
based on determining that the first subset of the plurality of distinct bodies are in the time recording pose and that the second subset of the plurality of distinct bodies are not in the time recording pose:
	forgoing executing a distinct digital time punch for each distinct body included in the second subset of the plurality of distinct bodies; and (Sahashi ¶0014; it is possible to detect a substitute attendee, because the image changes corresponding to the action request does not occur if the legitimate student asks a third person to be a substitute attendee using prerecorded video, and thus, the attendance of the legitimate student can be confirmed.)
executing a distinct digital time punch for each distinct body included in the first subset of the plurality of distinct bodies. (Sahashi ¶0014; it is possible to detect a substitute attendee, because the image changes corresponding to the action request does not occur if the legitimate student asks a third person to be a substitute attendee using prerecorded video, and thus, the attendance of the legitimate student can be confirmed.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system in Svenson with the known technique of time logging in response to an action in Sahashi because applying the known technique would have yielded predictable results and resulted in an improved system by allowing a more flexible attendance taking system. (Sahashi ¶0013; providing the system for confirming that the legitimate student is in attendance)
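For illustration only, the determining and forgoing/executing limitations of claim 1 can be sketched as below. This is a minimal sketch under stated assumptions: the DetectedBody structure, its field names, and the use of image-row coordinates (where a smaller value is higher in the frame) are hypothetical and are not taken from the application, Svenson, or Sahashi.

```python
from dataclasses import dataclass


@dataclass
class DetectedBody:
    """Hypothetical per-body output of a pose detection model."""
    body_id: str
    hand_y: float       # image row of the hand keypoint (smaller = higher)
    reference_y: float  # image row of the predetermined portion, e.g. the head


def in_time_recording_pose(body):
    """True when the hand is detected above the predetermined portion."""
    return body.hand_y < body.reference_y


def partition_bodies(bodies):
    """Split bodies into (first subset: in pose, second subset: not in pose)."""
    first_subset = [b for b in bodies if in_time_recording_pose(b)]
    second_subset = [b for b in bodies if not in_time_recording_pose(b)]
    return first_subset, second_subset
```

A time punch would then be executed only for bodies in the first subset and forgone for bodies in the second subset.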

Regarding Claim 2, Svenson as modified by Sahashi teaches the method according to claim 1, wherein the scene capturing device comprises an image capturing device configured to capture one or more representations of the time recording space. (Svenson ¶0068; video cameras 112 and the thermal imaging devices 114 are in communication with the abnormality detection server 120 via one or more communication networks)

Regarding Claim 3, Svenson as modified by Sahashi teaches the method according to claim 1, wherein at least one body of the plurality of bodies is determined to have the time recording pose in response to detecting that a hand of the at least one body is above a predetermined position of the at least one body. (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising)

Regarding Claim 4, Svenson as modified by Sahashi teaches the method according to claim 1, wherein: 
extracting the first distinct portion of the body for each of the plurality of distinct bodies includes extracting at least a head segment or a facial segment from each of the plurality of distinct bodies; (Svenson ¶0351; Facial recognition is performed by extracting a set of facial features from the stored frame data)
and extracting the second distinct portion of the body for each of the plurality of distinct bodies includes extracting at least a hand segment in a requisite pose from each of the plurality of distinct bodies. (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following:… hand raising)

Regarding Claim 5, Svenson as modified by Sahashi teaches the method according to claim 1, wherein the employee-identification machine learning model comprises a convolutional neural network (CNN) that is trained based on one or more training corpora comprising a plurality of distinct portrait images of a plurality of distinct employees. (Svenson ¶0384; The object detector unit 1106 can be implemented as one of a variety of different pattern classifiers, such as for example Convolutional Neural Networks (CNNs) which are trained on large, manually annotated datasets.)

Regarding Claim 6, the claim has been cancelled. Its limitation, wherein the time recording machine learning model comprises a convolutional neural network (CNN) that is trained based on one or more training corpora comprising a plurality of images of distinct hand-based time recording gestures, is now recited in claim 1 and is addressed in the rejection of claim 1 above. (Svenson ¶0385; initial base CNN is pretrained on the very large image classification … detected large objects are then cropped out from the original, high resolution camera frame and fed into a sub-component object detector 1306 which identifies the location of finer subcomponents (e.g., faces, hands, arms, etc.).)

Regarding Claim 7, Svenson as modified by Sahashi teaches the method according to claim 1, wherein: 
generating the employee-identification inference for each of the plurality of distinct bodies includes predicting a facial classification for each of the plurality of distinct bodies, (Svenson ¶0400; detection hypothesis is a single bounding box and classification from a single frame generated by the object detector 1106.)
the facial classification for each of the plurality of distinct bodies includes a distinct facial image value and an associated degree of confidence in the prediction, and (Svenson ¶0413; input to the face recognition module consists of tubelets classified as containing human faces. A confidence measure for each individual image frame in the face tubelet is computed using a conventional face detector algorithm with support for confidence estimates.)
identifying the employee identifier value for each of the of the plurality of distinct bodies includes: 
performing a search, using the employee-identification inference for each of the plurality of distinct bodies, of a data structure comprising a plurality of distinct employee identifier values digitally associated a plurality of employee-identification data; (Svenson ¶0442; behavioural event data includes: i) an identifier (ID) value which identifies the individual recognised within the snippet)
returning a distinct employee identifier value for each of the plurality of distinct bodies based on the search; (Svenson ¶0442; ii) an indication of an action class representing the action performed by the identified individual)
and digitally linking each of the plurality of distinct bodies within the representation of the time recording space to each respective distinct employee identifier for each of the plurality of distinct bodies. (Svenson ¶0443; The records of the database 1120 are used to construct behavioural profile data for each individual monitored (i.e. during a behavioural profile training process, as described below).)
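For illustration only, the search, return, and linking steps of claim 7 can be sketched as below. This is a minimal sketch under stated assumptions: the directory contents, the identifier formats, and the min_confidence threshold are hypothetical; the claim recites a degree of confidence but does not recite a threshold.

```python
# Hypothetical data structure: facial image value -> employee identifier value.
EMPLOYEE_DIRECTORY = {
    "face_017": "EMP-1001",
    "face_042": "EMP-1002",
}


def link_bodies_to_employees(classifications, directory, min_confidence=0.8):
    """Link each body to an employee identifier value.

    `classifications` maps body_id -> (facial_image_value, confidence).
    Bodies whose prediction falls below `min_confidence` (an assumed
    threshold) or whose facial image value is not found are left unlinked.
    """
    links = {}
    for body_id, (face_value, confidence) in classifications.items():
        if confidence < min_confidence:
            continue
        employee_id = directory.get(face_value)  # the search step
        if employee_id is not None:
            links[body_id] = employee_id          # the digital link
    return links
```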

Regarding Claim 8, Svenson as modified by Sahashi teaches the method according to claim 1, further comprising: 
implementing a time recording ensemble of machine learning models comprising: 
(1) the employee-identification machine learning model, and (Svenson ¶0364; comparison is performed between the extracted features and one or more facial expression models of the individual)
(2) the time recording machine learning model, (Svenson ¶0365; each student may have their own corresponding set of trained gesture models to represent the behaviour of the student when performing the gesture)
wherein the time recording ensemble of machine learning models output each of the user-identification inference and the time recording gesture inference for each body of the plurality of distinct bodies in concert to the time recording application. (Svenson Table 2; TABLE 2 Example behavioural event records stored in database 1120. No. Date Timestamp Duration(s) Identity ID Action Expression)

Regarding Claim 9, Svenson as modified by Sahashi teaches the method according to claim 1, wherein: 
the employee identifier value identified for each of the plurality of distinct bodies is associated with a distinct employee account accessible to the time recording application,  (Svenson ¶0224; exemplary attendance recording process may include: [0225] (a) for each person captured in the stored frames, checking their facial image against stored set of facial features associated with expected attendees)
before executing the plurality of distinct time recording events for each of the plurality distinct bodies, the distinct employee account associated with each identified employee identifier value is in a first time recording state, (Svenson ¶0227; if all expected attendees are recognised or all frames have been recognised, the process is complete)
and after executing the plurality of distinct time recording events for each of the plurality distinct bodies, the distinct employee account associated with each identified employee identifier value is changed to a second time recording state, distinct from the first time recording state. (Svenson ¶0228; flagging a recognition anomaly if there are unrecognised frames and absentees left)

Regarding Claim 10, Svenson as modified by Sahashi teaches the method according to claim 1, wherein: 
generating the time recording action inference for each of the plurality of distinct bodies includes predicting a hand gesture classification for each of the plurality of distinct bodies based on the model input comprising the second portion of the body, (Svenson ¶0394; affinity is simply the distance from the predicted position to the detection hypothesis position)
the hand gesture classification for the second portion of the body for each of the plurality of distinct bodies includes a distinct gesture image value and an associated degree of confidence in the prediction, and (Svenson ¶0145; higher level of confidence in skeletal tracking)
identifying the time recording code for each of the plurality of distinct bodies based on the time recording action inference includes: 
performing a search, using the time recording action inference for each of the plurality of distinct bodies, of a data structure comprising a plurality of distinct gesture image values associated with a plurality of distinct time recording codes; (Svenson ¶0442; behavioural event data includes: i) an identifier (ID) value which identifies the individual recognised within the snippet)
returning a distinct time recording code for each of the plurality of distinct bodies based on the search; (Svenson ¶0442; ii) an indication of an action class representing the action performed by the identified individual)
and digitally linking each of the plurality of distinct bodies within the representation of the time recording space to each respective time recording code for each of the plurality of distinct bodies. (Svenson ¶0443; The records of the database 1120 are used to construct behavioural profile data for each individual monitored (i.e. during a behavioural profile training process, as described below).)
	
Claim 11 is rejected on the same basis as claim 1, with the additional limitations of:
instantiating and executing a distinct automated user-recognition for each of the plurality of distinct bodies, (Svenson ¶0087; enable faces to be identified from a distance and the face image to be stored) wherein executing the automated user-recognition includes: 
(i-a) generating, by a user-identification machine learning model, (Svenson ¶0462; each activity state model is represented by a conventional multilayer perceptron neural network) a user-identification inference for each of the plurality of distinct bodies based on a model input comprising the first portion of the body; (Svenson ¶0447; activity states represent high-level behavioural categorisations of an individual, as inferred from the behavioural events that are observed from monitoring the individual over time.) and
(i-b) identifying a user identifier value for each body of the plurality of distinct bodies based on the user-identification inference; (Svenson ¶0221; facial features may be stored in a separate document, with each set of facial features associated with a unique identifier,)
executing a plurality of distinct automated electronic time recording events, via a time recording application, for each of the plurality of distinct bodies based on inputs of (1) the user identifier value and (2) the time recording code associated with each of the plurality of distinct bodies. (Svenson ¶0291; the attendance data transmitted to the BPMS include an identifier of a student detected to be entering or leaving the monitored location (e.g., a unique student ID value), and an indication of the time when the student enters or leaves the location)

Regarding Claim 12, Svenson as modified by Sahashi teaches the method according to claim 11, wherein the time recording code identified for a first body of the plurality of bodies is different from the time recording code identified for a second body of the plurality of bodies. (Svenson ¶0291; unique student ID value)

Regarding Claim 13, Svenson as modified by Sahashi teaches the method according to claim 11, wherein at least one of the plurality of distinct bodies is in a still position or moving through the time recording space while directing attention towards the scene capturing device. (Svenson ¶0417; recognise action classes including: … Body movements (e.g. standing up, sitting down, turning back etc.).)

Regarding Claim 14, Svenson as modified by Sahashi teaches the method according to claim 11, wherein: 
the employee identifier value for each of the plurality of distinct bodies is identifiable when the plurality of distinct bodies have previously been enrolled into an automated electronic time recording system, (Svenson ¶0412; could involve, for example, students having their photo taken and added to the database of known individuals and the execution of a training algorithm on this database) and 
the employee identifier value for each of the plurality of distinct bodies is not identifiable when the plurality of distinct bodies have not previously been enrolled into an automated electronic time recording system. (Svenson ¶0220; If a face is identified in the frame but no matching set of facial features is found, the system may create a new biometric profile based on the detected biometric characteristics, and store the facial features in this biometric profile)

Regarding Claim 15, Svenson as modified by Sahashi teaches the method according to claim 11, wherein the distinct automated user-recognition and the distinct automated time recording-recognition for each of the plurality of distinct bodies are simultaneously instantiated and executed. (Svenson ¶0366; detection server 120 is configured to allow the detection of gesture and facial expression behaviours that are simultaneously exhibited by an individual)

Regarding Claim 16, Svenson as modified by Sahashi teaches the method according to claim 11, wherein the time recording space includes a respective distinct body, different from the plurality of distinct bodies, (Svenson ¶0089; may detect from one to at least 120 faces per frame (to allow monitoring a group of people simultaneously)) the method further comprising:
identifying that each of the plurality of distinct bodies have the time recording pose within the time recording space and that the respective distinct body does not have the time recording pose within the time recording space; and (Svenson ¶0357; processing the behavioural detection data to determine that the behavioural characteristic is one of a behaviour type including: a gesture; and a facial expression; comparing the behavioural detection data to one or more behavioural characteristic models of the determined behaviour type; and selecting a particular behavioural characteristic based on a result of the comparison.)
after identifying that the plurality of distinct bodies have the time recording pose and that the respective distinct body does not have the time recording pose:
extracting the plurality of distinct features from the representation of the time recording space based on identifying the plurality of distinct bodies having the time recording pose; and (Svenson ¶0366; detection server 120 is configured to allow the detection of gesture and facial expression behaviours that are simultaneously exhibited by an individual)
 forgoing extracting a plurality of distinct features from the representation of the time recording space corresponding to the respective distinct body based on identifying the respective distinct body does not have the time recording pose. (Svenson ¶0361; a “spinning” gesture can be defined as the spinning of an individual's forefinger in a circular motion. This spinning gesture can be interpreted as an indication that the individual desires the teacher to speak up (i.e. raise his or her voice), and allows the individual to express this desire without needing to raise their hand) Examiner notes the system has the capability to use gestures as a trigger to initiate certain actions, and the capability to recognize raising hands and taking attendance. (Svenson ¶0259; addition to the database systems storing the biometric profiles, the DIC 506 may also provide access to other databases, for example, a database for maintaining records of the abnormal detection history, and/or a database for storing the received raw data for a predetermined period of time) 

Regarding Claim 17, Svenson as modified by Sahashi teaches the method according to claim 11, wherein: 
a first distinct body of the plurality of distinct bodies comprises a first hand and a second hand, (Svenson ¶0364; features are extracted from particular regions of the image data which are identified during a pre-processing stage (e.g. the areas corresponding to the hands))
the first distinct body is determined to be in the time recording pose when a respective hand of the first distinct body is detected above a pre-determined body part of the first distinct body, and (Svenson ¶0191; predetermined gestures may include, for example, one or more of the following: [0192] a) scratching; [0193] b) hitting/striking; [0194] c) nodding; [0195] d) hand raising;)
extracting the second distinct portion for the first distinct body includes:
 in accordance with a determination that the first hand of the first distinct body is detected above the pre-determined body part, extracting the first hand of the first distinct body without extracting the second hand of the first distinct body; and (Svenson ¶0363; in the case where the individual raises their hand, the features can include the trajectories of particular regions of the individual's hand.)
in accordance with a determination that the second hand of the first distinct body is detected above the pre-determined body part, extracting the second distinct portion for the first distinct body includes extracting the second hand of the first distinct body without extracting the first hand of the first distinct body. (Svenson ¶0363; in the case where the individual raises their hand, the features can include the trajectories of particular regions of the individual's hand.)

Regarding Claim 18, Svenson as modified by Sahashi teaches the method according to claim 11, wherein the time recording space includes a plurality of time recording zones, the method further comprising: 
before executing the plurality of distinct automated electronic time recording events: 
identifying a location for each of the plurality of distinct bodies within the time recording space based on the assessment of the representation of the time recording space; (Svenson ¶0403; Following human pose estimation for each detected person, the predicted location of the various body parts (faces, hands, arms etc) are matched to the location of nearby tubelets representing the corresponding classes (body parts))
determining a time recording zone from the plurality of time recording zones associated with each of the plurality of distinct bodies based on identifying the location for each of the plurality of distinct bodies; (Svenson ¶0287; context data includes environmental attributes representing the properties of the physical environment of the school, and specifically of the location in which the monitoring apparatus is deployed) and 
after determining the time recording zone associated with each of the plurality of distinct bodies, executing the plurality of distinct automated electronic time recording events based on inputs of (1) the user identifier value, (2) the time recording code associated with each of the plurality of distinct bodies, and (3) the recording zone associated with each of the plurality of distinct bodies. (Svenson ¶0291; the attendance data transmitted to the BPMS include an identifier of a student detected to be entering or leaving the monitored location (e.g., a unique student ID value), and an indication of the time when the student enters or leaves the location.)

Regarding Claim 19, Svenson as modified by Sahashi teaches the method according to claim 11, further comprising: 
after extracting the first portion or the second portion of the body for a first distinct body of the plurality of distinct bodies: 
identifying that the first portion or the second portion of the body for the first distinct body does not satisfy an image resolution threshold; (Svenson ¶0228; flagging a recognition anomaly if there are unrecognised frames and absentees left … ¶0434; The accuracy of the action recognition module, as represented by a Mean Average Precision (MAP) value, can be influenced by a variety of factors including, the resolution of each camera 111-11N)
in response to identifying that the first portion or the second portion of the body for the first distinct body does not satisfy the image resolution threshold: forgoing instantiating and executing the distinct automated user-recognition for the first distinct body or forgoing instantiating and executing the distinct automated time recording-recognition for the first distinct body; (Svenson ¶0307; DAPRC performs identification, and if recognition is under 46%, the DAPRC raises a manual-identification-required event to the biometric profile management system server 130)
forgoing executing the distinct automated electronic time recording event for the first distinct body; and (Svenson ¶0229; if the flagged anomaly has been manually matched by a supervisor, updating the recognition parameters with a strongly weighted moving average)
providing at least a portion of the representation of the time recording space identified via the scene capturing device to a predetermined entity to assess a time recording intent of the first distinct body. (Svenson ¶0229; if the flagged anomaly has been manually matched by a supervisor, updating the recognition parameters with a strongly weighted moving average)

Regarding Claim 21, Svenson as modified by Sahashi teaches the method of claim 1, wherein the first subset of the plurality of distinct bodies includes a plurality of distinct bodies. (Svenson ¶0122; biometric reference data may be obtained by: [0123] a) detecting the biometric characteristic of a plurality of individuals)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, this action is made final.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Aaron Tutor, whose telephone number is 571-272-3662.  The examiner can normally be reached Monday through Friday, 9 AM to 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nathan Uber, can be reached at 571-270-3923.  The fax number for the organization where this application or proceeding is assigned is 571-273-5266.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/ANT/ Examiner, Art Unit 3687

/SANGEETA BAHL/ Primary Examiner, Art Unit 3629