DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/19/2021 has been entered.
 
Response to Amendment
3.	The Amendment filed 7/19/2021 has been entered. Claims 2 and 9 have been canceled. Claims 1, 5, 8 and 12 have been amended. Claims 1, 5-8 and 12-14 remain pending in the application. 

Response to Arguments
4.	Applicant’s arguments with respect to Claims 1, 5-8 and 12-14 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
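A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.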

5.	Claims 1, 7-8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Peter et al. (US 20210076002 A1) in view of Schlesinger et al. (US 20160227277 A1) and further in view of Von et al. (US 20160080835 A1).
Regarding claim 1, Peter teaches a method for evaluating social intelligence (method of Fig. 2B, see [0320]), comprising: 
segmenting, based on behavior recognition, an observation video sequence that captures social interaction behavior of a target to be evaluated, thereby creating multiple segmented video clips (For example, instead of analyzing video frames at 30 frames per second (fps), the client software can analyze video data at 10 fps (e.g., using only every third frame for 30 fps capture). As another example, the system could forgo the micro-expression analysis on certain device types (e.g., mobile phones), so that either the micro-expression analysis is performed by the server based on compressed video or is omitted altogether, see [0172]); and 
 calculating an evaluation score based on automatically calculated similarities between a reference video clip group of ground truth reference video clips (The moderator module 20 can also determine the similarity between the vector of collaboration factor scores 140 for the current participant at the current time relative to the different reference vectors, see [0077] and the table 2053 includes scores that indicate the different effects that different types of content e.g., images, video clips, text, etc., see [0371]), which are created based on social interaction analysis, and an observed video clip group of the multiple segmented video clips, thereby evaluating social intelligence of the target (The system can maintain profiles that represent different complex emotions or mental states, where each profile indicates a corresponding combination of emotion scores and potentially a pattern in which the scores change are maintained over time. The system compares the series of emotion data e.g., a time series of emotion score vectors, occurrence or sequence of micro-expressions detected, etc. with the profiles to determine whether and to what degree each person matches the profile, see [0119]), 
wherein the evaluation score is calculated by applying a score for each Evaluation of Social Interaction (ESI) item (The system can produce a vector having a score for each of various different emotions. For example, for the seven basic emotions, each can be scored on a scale of 0 to 100 where 100 is the most intense, resulting in a vector with a score of 20 for happiness, 40 for disgust, 15 for anger, and so on, see [0191]) and a weight for specific behavior to each of the automatically calculated similarities, the score for each ESI item being set based on an ESI scenario, and the weight for specific behavior being set based on specific behavior items (The composite score may be a selective combination of one or more of the raw trait scores. Each raw trait score may be equally or differently weighted depending on the overall group composite score and/or scenario, see [0090]).
However, Peter does not teach wherein the automatically calculated similarities are calculated by sequentially comparing the multiple segmented video clips to the ground truth reference video clips through comparison of content of the video clips and comparison of a context of content that precedes and follows the video clips.
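In an analogous art, Schlesinger teaches wherein the automatically calculated similarities are calculated by sequentially comparing the multiple segmented video clips to the ground truth reference video clips (A baseline may be measured so as to determine the success rate of the set of rules in comparison to the success rate of other sets of rules, see [0080] and such a multivariate test may be used to, for example, compare the result of applying specific focus testing rules to the identified category to a baseline to determine the successfulness of the applied focus testing rules, see [0084]) through comparison of content of the video clips and comparison of a context of content that precedes and follows the video clips (The determination of close categories may be based on matching between the identified category and a plurality of categories associated with existing sets of focus rules. The category matching may include comparing video clips of the identified category with video clips of the categories having existing focus rules. Comparing video clips may include, but is not limited to, comparing file names, metadata, audio, and/or video content contained therein, see [0081]).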
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter with the video clip attention of Schlesinger to provide a system and a method for optimizing content to be inserted into a web object as suggested, see Schlesinger [0010].
However, Peter and Schlesinger do not teach wherein the ground truth reference video clips correspond to multiple verification video clips that are created by classifying an input video sequence pertaining to social interaction based on the specific behavior items of the ESI scenario.
In an analogous art, Von teaches wherein the ground truth (e.g. baseline feature set, see [0143]) reference video clips correspond to multiple verification video clips that are created by classifying an input video sequence (The processing logic may train an algorithm e.g., create the baseline feature set based on the selected features. To train the algorithm, the processing logic may iteratively analyze the video data to identify which types of video clips are most likely to be interesting for inclusion in a compilation video, see [0143], create and validate a machine-learned algorithm, select an algorithm, perform final classification, see steps 1015 to 1025 of Fig. 10) pertaining to social interaction based on the specific behavior items of the ESI scenario (Thus, the baseline feature set specifies that video clips that include the face of that particular person are to be included in the compilation video, see [0081]).
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video clip attention of Schlesinger with the baseline creation of Von to provide a system and a method for generating a baseline feature set based on the received indicia of interestingness for better analysis of video content as suggested, see Von [0007].
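For illustration only, the scoring limitation of claim 1 (applying a score for each ESI item and a weight for specific behavior to each automatically calculated similarity) can be read as a weighted combination of per-clip similarities. The sketch below assumes a simple weighted sum; the function name, the tuple layout, and the dictionaries `esi_scores` and `behavior_weights` are hypothetical, and neither the claim nor the cited references are represented as specifying this exact formula.

```python
from typing import Dict, List, Tuple

# Hypothetical record for one observed clip:
# (ESI item, specific behavior item, similarity to its ground truth reference clip).
ClipResult = Tuple[str, str, float]


def evaluation_score(
    results: List[ClipResult],
    esi_scores: Dict[str, float],
    behavior_weights: Dict[str, float],
) -> float:
    """Combine per-clip similarities into one score (weighted-sum form is an assumption)."""
    total = 0.0
    for esi_item, behavior_item, similarity in results:
        # ESI item score (set per ESI scenario) times behavior weight (set per behavior item),
        # applied to the similarity computed for this clip.
        total += esi_scores[esi_item] * behavior_weights[behavior_item] * similarity
    return total
```

For example, two clips with similarities 0.8 and 0.5, ESI item scores 2.0 and 3.0, and behavior weights 1.5 and 1.0 yield a combined score of approximately 3.9 under this assumed form.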
Regarding claim 7, Peter as modified by Schlesinger and Von teaches the method of claim 1, wherein creating the multiple segmented video clips is configured to segment the observation video sequence into the multiple segmented video clips by performing behavior recognition based on at least one of an object detection function, an object-tracking function, and a gesture recognition function (For example, the system can calculate during the communication session and store, for each participant, data such as: … time-stamped data indicating the detected occurrence of gestures, specific facial expressions, micro-expressions, vocal properties, speech recognition results, etc.; extracted features from images or video, such as scores for the facial action coding system; and so on, see Peter [0162]).
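As a hedged illustration of the segmentation recited in claims 7 and 14, assume a recognizer (object detection, object tracking, or gesture recognition) has already assigned a behavior label to each frame, with None where no behavior is recognized. Consecutive frames sharing a label can then be grouped into clips. The function name and label values are hypothetical and are not drawn from Peter or the application.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def segment_by_behavior(frame_labels: List[Optional[str]]) -> List[Tuple[str, int, int]]:
    """Split per-frame behavior labels into (label, start_frame, end_frame_exclusive) clips.

    Runs of frames labeled None (no recognized behavior) are skipped.
    """
    clips = []
    index = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label is not None:
            clips.append((label, index, index + length))
        index += length
    return clips
```

Under these assumptions, segment_by_behavior(["wave", "wave", None, "nod", "nod", "nod"]) returns [("wave", 0, 2), ("nod", 3, 6)].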

Regarding claim 8, Peter teaches an apparatus for evaluating social intelligence (system of Fig. 2A), comprising: 
a processor (e.g. processor 30) for creating multiple segmented video clips by segmenting, based on behavior recognition, an observation video sequence that captures social interaction behavior of a target to be evaluated (For example, instead of analyzing video frames at 30 frames per second (fps), the client software can analyze video data at 10 fps (e.g., using only every third frame for 30 fps capture). As another example, the system could forgo the micro-expression analysis on certain device types (e.g., mobile phones), so that either the micro-expression analysis is performed by the server based on compressed video or is omitted altogether, see [0172]), for calculating an evaluation score based on automatically calculated similarities between a reference video clip group of ground truth reference video clips (The moderator module 20 can also determine the similarity between the vector of collaboration factor scores 140 for the current participant at the current time relative to the different reference vectors, see [0077]; the table 2053 includes scores that indicate the different effects that different types of content e.g., images, video clips, text, etc., see [0371]), which are created based on social interaction analysis, and an observed video clip group of the multiple segmented video clips, and for evaluating social intelligence of the target (The system can maintain profiles that represent different complex emotions or mental states, where each profile indicates a corresponding combination of emotion scores and potentially a pattern in which the scores change are maintained over time. The system compares the series of emotion data e.g., a time series of emotion score vectors, occurrence or sequence of micro-expressions detected, etc. with the profiles to determine whether and to what degree each person matches the profile, see [0119]; the table 2053 includes scores that indicate the different effects that different types of content e.g., images, video clips, text, etc., see [0371]); and
 memory (e.g. memory 32) for storing the ground truth reference video clips (The system can maintain profiles that represent different complex emotions or mental states, where each profile indicates a corresponding combination of emotion scores and potentially a pattern in which the scores change are maintained over time, see [0119]), 
wherein the evaluation score is calculated by applying a score for each Evaluation of Social Interaction (ESI) item (The system can produce a vector having a score for each of various different emotions. For example, for the seven basic emotions, each can be scored on a scale of 0 to 100 where 100 is the most intense, resulting in a vector with a score of 20 for happiness, 40 for disgust, 15 for anger, and so on, see [0191]) and a weight for specific behavior to each of the automatically calculated similarities, the score for each ESI item being set based on an ESI scenario, and the weight for specific behavior being set based on specific behavior items (The composite score may be a selective combination of one or more of the raw trait scores. Each raw trait score may be equally or differently weighted depending on the overall group composite score and/or scenario, see [0090]).
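However, Peter does not teach wherein the automatically calculated similarities are calculated by sequentially comparing the multiple segmented video clips to the ground truth reference video clips through comparison of content of the video clips and comparison of a context of content that precedes and follows the video clips.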
In an analogous art, Schlesinger teaches wherein the automatically calculated similarities are calculated by sequentially comparing the multiple segmented video clips to the ground truth reference video clips (A baseline may be measured so as to determine the success rate of the set of rules in comparison to the success rate of other sets of rules, see [0080] and such a multivariate test may be used to, for example, compare the result of applying specific focus testing rules to the identified category to a baseline to determine the successfulness of the applied focus testing rules, see [0084]) through comparison of content of the video clips and comparison of a context of content that precedes and follows the video clips (The determination of close categories may be based on matching between the identified category and a plurality of categories associated with existing sets of focus rules. The category matching may include comparing video clips of the identified category with video clips of the categories having existing focus rules. Comparing video clips may include, but is not limited to, comparing file names, metadata, audio, and/or video content contained therein, see [0081]).
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter with the video clip attention of Schlesinger to provide a system and a method for optimizing content to be inserted into a web object as suggested, see Schlesinger [0010].
However, Peter and Schlesinger do not teach wherein the ground truth reference video clips correspond to multiple verification video clips that are created by classifying an input video sequence pertaining to social interaction based on the specific behavior items of the ESI scenario.
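In an analogous art, Von teaches wherein the ground truth (e.g. baseline feature set, see [0143]) reference video clips correspond to multiple verification video clips that are created by classifying an input video sequence (The processing logic may train an algorithm e.g., create the baseline feature set based on the selected features. To train the algorithm, the processing logic may iteratively analyze the video data to identify which types of video clips are most likely to be interesting for inclusion in a compilation video, see [0143], create and validate a machine-learned algorithm, select an algorithm, perform final classification, see steps 1015 to 1025 of Fig. 10) pertaining to social interaction based on the specific behavior items of the ESI scenario (Thus, the baseline feature set specifies that video clips that include the face of that particular person are to be included in the compilation video, see [0081]).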
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video clip attention of Schlesinger with the baseline creation of Von to provide a system and a method for generating a baseline feature set based on the received indicia of interestingness for better analysis of video content as suggested, see Von [0007].

Regarding claim 14, Peter as modified by Schlesinger and Von teaches the apparatus of claim 8, wherein the processor segments the observation video sequence into the multiple segmented video clips by performing behavior recognition based on at least one of an object detection function, an object-tracking function, and a gesture recognition function (For example, the system can calculate during the communication session and store, for each participant, data such as: … time-stamped data indicating the detected occurrence of gestures, specific facial expressions, micro-expressions, vocal properties, speech recognition results, etc.; extracted features from images or video, such as scores for the facial action coding system; and so on, see Peter [0162]).

6.	Claims 5-6 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Peter in view of Schlesinger and further in view of Von and Inoue (US 20160012248 A1).
Regarding claim 5, Peter as modified by Schlesinger and Von teaches the method of claim 1.
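However, Peter, Schlesinger and Von do not clearly teach wherein the similarities are measured using cosine similarity between feature information extracted from the multiple segmented video clips and feature information extracted from the multiple verification video clips.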
In an analogous field of endeavor, Inoue teaches wherein the similarities are measured using cosine similarity between feature information extracted from the multiple segmented video clips and feature information extracted from the multiple verification video clips (the comparison unit 204 compares the face feature quantity read in step S2 to the face feature quantity selected in step S3, and calculates similarity therebetween (step S4). For example, when the face feature quantity is represented by a multidimensional vector, the calculation of the similarity can be represented using its cosine distance, see [0054] and the feature quantity extraction unit 202 extracts a face region from the image data, see [0036]).
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video analysis of Schlesinger and the baseline of Von with the image verification of Inoue to provide a method for acquiring an activity situation of a user using a sensor of a terminal and displaying the activity situation on a terminal of another user as suggested.
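For reference only, claims 5 and 12 recite cosine similarity between feature vectors, while Inoue [0054] refers to the cosine distance between face feature vectors. Cosine similarity between vectors a and b is dot(a, b) / (norm(a) * norm(b)), and cosine distance is commonly defined as one minus that value. A minimal sketch with a hypothetical function name:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle, and therefore the cosine, is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Parallel vectors such as [1.0, 2.0] and [2.0, 4.0] give a similarity of approximately 1.0 (cosine distance approximately 0), and orthogonal vectors give 0.0 (cosine distance 1).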

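Regarding claim 6, Peter as modified by Schlesinger, Von and Inoue teaches the method of claim 5, and Inoue further teaches wherein the feature information is behavior recognition information and facial expression recognition information, which are extracted from image data (The feature quantity extraction unit 202 extracts a face region from the image data received from the information communication terminal 100 and extracts a feature quantity from the extracted face region, see Inoue [0036] and a type of information to be disclosed such as contact information, schedule information, or hobby information among personal information may be controlled based on clothing or a facial expression included in the image data, see [0069]), and conversation information and emotion recognition information, which are extracted from sound data (The feature quantity extraction unit 202 extracts a voice feature quantity from voice data, see [0078] and Specifically, there are provided a feature quantity extraction unit configured to extract a feature quantity representing the emotion from the voice data, see [0096]).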
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video analysis of Schlesinger and the baseline of Von with the image verification of Inoue to provide a method for acquiring an activity situation of a user using a sensor of a terminal and displaying the activity situation on a terminal of another user as suggested.

Regarding claim 12, Peter as modified by Schlesinger and Von teaches the apparatus of claim 8.
However, Peter, Schlesinger and Von do not clearly teach wherein the similarities are measured using cosine similarity between feature information extracted from the multiple segmented video clips and feature information extracted from the multiple verification video clips.
In an analogous field of endeavor, Inoue teaches wherein the similarities are measured using cosine similarity between feature information extracted from the multiple segmented video clips and feature information extracted from the multiple verification video clips (the comparison unit 204 compares the face feature quantity read in step S2 to the face feature quantity selected in step S3, and calculates similarity therebetween (step S4). For example, when the face feature quantity is represented by a multidimensional vector, the calculation of the similarity can be represented using its cosine distance, see [0054] and the feature quantity extraction unit 202 extracts a face region from the image data, see [0036]).
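Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video analysis of Schlesinger and the baseline of Von with the image verification of Inoue to provide a method for acquiring an activity situation of a user using a sensor of a terminal and displaying the activity situation on a terminal of another user as suggested.
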
Regarding claim 13, Peter as modified by Schlesinger, Von and Inoue teaches the apparatus of claim 12, and Inoue further teaches wherein the feature information is behavior recognition information and facial expression recognition information, which are extracted from image data (The feature quantity extraction unit 202 extracts a face region from the image data received from the information communication terminal 100 and extracts a feature quantity from the extracted face region, see Inoue [0036] and a type of information to be disclosed such as contact information, schedule information, or hobby information among personal information may be controlled based on clothing or a facial expression included in the image data, see [0069]), and conversation information and emotion recognition information, which are extracted from sound data (The feature quantity extraction unit 202 extracts a voice feature quantity from voice data, see [0078] and Specifically, there are provided a feature quantity extraction unit configured to extract a feature quantity representing the emotion from the voice data, see [0096]).
Therefore, it would have been obvious to one of ordinary skill in the art, at the time of the claimed invention, to have modified the behavior detection of Peter and the video analysis of Schlesinger and the baseline of Von with the image verification of Inoue to provide a method for acquiring an activity situation of a user using a sensor of a terminal and displaying the activity situation on a terminal of another user as suggested.

Conclusion
Tiranoff et al. (US 20180279936 A1) discloses a method of assessing a behavioral abnormality in the test subject. The normal and abnormal patterns for a given behavior can be compared to a test subject performing the same behavior depicted in the video library, to permit a medical professional to assess the extent of a behavioral abnormality in the test subject.
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE M LOUIS-FILS whose telephone number is (571)270-0671.  The examiner can normally be reached on Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah can be reached on 571-272-7904.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.