DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in China on 10/28/2019. It is noted, however, that applicant has not filed a certified copy of the 201911033267 application as required by 37 CFR 1.55.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5, 8-9, 12, 15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Amer U.S. Patent Application 20190304157 in view of He U.S. Patent Application 20140192212, and further in view of Ai U.S. Patent Application 20170256288.
Regarding claim 8, Amer discloses a system of clipping a video, comprising: 
at least one processor (processor 313); and 
at least one memory (storage devices 320) communicatively coupled to the at least one processor and storing instructions that upon execution by the at least one processor cause the system to perform operations (paragraph [0123]: One or more processors 313 may execute instructions and one or more storage devices 320 may store instructions and/or data of one or 
obtaining a video, the video comprising a plurality of frames (paragraph [0127]: Graph module 322 may perform a number of instantiations such as… manipulating multiple cameras and capturing video); 
performing object detection on each of the plurality of frames; identifying objects contained in each of the plurality of frames, wherein a region where each object is located is selected through a detection box (paragraph [0162]: FIG. 6A and FIG. 6B, may operate on videos that have been processed by extracting visual abstractions such as objects, parts and their spatial configurations. Processing of videos may be performed by detectors for objects, parts and relations resulting in a set of video abstractions where human body joints are accurately localized in 3D using motion capture devices; paragraph [0097]: bounding boxes and polygons can be created to mark different on-screen entities, while landmark points can be used to represent skeletal and facial feature data); 
classifying and recognizing the objects identified in each of the plurality of frames using a classification model, the classification model being pre-trained (paragraph [0152]: The RNN is a skipthought vectors RNN pre-trained using a data set (e.g., the BookCorpus dataset.) In some examples, the CNN-RNN discriminative ranking model is trained jointly for action classification and retrieval of human activity videos; paragraph [0162]: Such a system may pre-process the joint angles into an exponential map representation and filter out activity that spans less than 8 seconds, and may be trained to spot query activity in a sliding window of 8 seconds); 
selecting human body region images based on the classifying and recognizing the objects; determining a similarity between each of the human body region images selected from the plurality of frames and a target character image (paragraph [0157]: To find closest matches in the video database, comparison module 630 may use a CNN-CNN similarity function 
Amer discloses all the features with respect to claim 8 as outlined above. However, Amer fails to disclose in response to determining that a similarity between a human body region image among the human body region images and the target character image is greater than a predetermined threshold, identifying the human body region image as a clipping image; and synthesizing clipping images identified in the plurality of frames in order of time to obtain a clipping video.
He discloses in response to determining that a similarity between a human body region image among the human body region images and the target character image is greater than a predetermined threshold, identifying the human body region image as a clipping image (paragraph [0155]: a calculating unit configured to obtain similarity between an edge area of the second image and a position that accommodates the second image in the first image; a synthesizing unit configured to synthesize the second image with the first image if the similarity is greater than or equal to a threshold; paragraph [0094]: a human face image of the user is obtained by using the camera, and then the area of the human face image is classified as a prominent area; paragraph [0145]: The embodiment of the present invention is applicable to a digital camera, a digital video camera… and may be applied in scenarios such as television, movies).

Amer as modified by He discloses all the features with respect to claim 8 as outlined above. However, Amer as modified by He fails to disclose synthesizing clipping images identified in the plurality of frames in order of time to obtain a clipping video. 
Ai discloses synthesizing clipping images identified in the plurality of frames in order of time to obtain a clipping video (paragraph [0238]: At step S206, the selected video clips may be synthesized into a video file. When synthesizing the video file, the video clips may be sorted in a sequence of a time in the target video).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Amer and He to synthesize video as taught by Ai, in order to generate good quality video.
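For illustration only, the two limitations mapped to He and Ai above (keeping a human body region image when its similarity exceeds a predetermined threshold, then ordering the kept images by time) may be sketched as follows. The sketch is not taken from any cited reference; the names Candidate, select_clipping_images, frame_index and image_id, and the threshold value 0.8, are hypothetical and assumed purely for the example.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    frame_index: int   # position of the source frame in the video (assumed field)
    similarity: float  # similarity to the target character image
    image_id: str      # placeholder standing in for the cropped body region

def select_clipping_images(candidates, threshold):
    """Keep candidates whose similarity exceeds the threshold, in time order."""
    kept = [c for c in candidates if c.similarity > threshold]
    return sorted(kept, key=lambda c: c.frame_index)

# Hypothetical per-frame results, deliberately out of order.
candidates = [
    Candidate(2, 0.91, "f2_box0"),
    Candidate(0, 0.40, "f0_box0"),
    Candidate(1, 0.85, "f1_box1"),
]
clip = select_clipping_images(candidates, threshold=0.8)
print([c.image_id for c in clip])
```

In this sketch the frame-0 candidate is dropped because its similarity does not exceed the threshold, and the remaining two are returned in frame order.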

Regarding claim 9, Amer as modified by He and Ai discloses the system of claim 8, the operations further comprising: 
performing the object detection on each of the plurality of frames using a pre-trained object detection model to identify the objects contained in each of the plurality of frames (Amer’s paragraph [0152]: The RNN is a skipthought vectors RNN pre-trained using a data set (e.g., the BookCorpus dataset.) In some examples, the CNN-RNN discriminative ranking model is trained jointly for action classification and retrieval of human activity videos; paragraph [0162]: Such a system may pre-process the joint angles into an exponential map representation and filter out activity that spans less than 8 seconds, and may be trained to spot query activity in a sliding window of 8 seconds; paragraph [0187]: graph module 322 may identify an actor (901A) mapped to the “who” portion of the event frame, identify an action (901B) mapped to the “did 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Amer to determine similarity based on a threshold as taught by He, in order to synthesize images smoothly; and to modify the combination of Amer and He to synthesize video as taught by Ai, in order to generate good quality video.

Regarding claim 12, Amer as modified by He and Ai discloses the system of claim 8, wherein the identifying the human body region image as a clipping image in response to determining that a similarity between a human body region image and the target character image is greater than a predetermined threshold further comprises: 
setting a clipping box based on a detection box corresponding to the human body region image, wherein the clipping box includes the human body region image and the similarity correspond to the human body region image (He’s paragraph [0094]: a human face image of the user is obtained by using the camera, and then the area of the human face image is classified as a prominent area (H), and the remaining area is classified as a non-prominent area (L), as shown in FIG. 6; paragraph [0022]: cropping the image obtained by photographing according to a set scale; Amer’s paragraph [0157]: To find closest matches in the video database, comparison module 630 may use a CNN-CNN similarity function between videos 506 in video data store 505 and one or more animations 620); and 
identifying the clipping box including the human body region image and the corresponding similarity, and selecting the human body region image with the similarity being greater than the predetermined threshold in the clipping box as the clipping image (He’s paragraph [0155]: a calculating unit configured to obtain similarity between an edge area of the second image and a position that accommodates the second image in the first image; a synthesizing unit configured to synthesize the second image with the first image if the similarity 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Amer to determine similarity based on a threshold as taught by He, in order to synthesize images smoothly; and to modify the combination of Amer and He to synthesize video as taught by Ai, in order to generate good quality video.

Claim 1 recites the functions of the apparatus recited in claim 8 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 8 applies to the method steps of claim 1. 
Claim 2 recites the functions of the apparatus recited in claim 9 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 9 applies to the method steps of claim 2.
Claim 5 recites the functions of the apparatus recited in claim 12 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 12 applies to the method steps of claim 5.

Claim 15 recites the functions of the apparatus recited in claim 8 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 8 applies to the medium steps of claim 15. 
Claim 18 recites the functions of the apparatus recited in claim 12 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 12 applies to the medium steps of claim 18.

Claims 3, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Amer U.S. Patent Application 20190304157 in view of He U.S. Patent Application 20140192212, in view of Ai U.S. Patent Application 20170256288, and further in view of Chen U.S. Patent Application 20160188715 and Zhang U.S. Patent Application 20180260665.
Regarding claim 10, Amer as modified by He and Ai discloses classifying images to be processed using a sample character image as a reference object (Amer's paragraph [0157]: To find closest matches in the video database, comparison module 630 may use a CNN-CNN similarity function between videos 506 in video data store 505 and one or more animations 620; paragraph [0160]: video clips from a data store of videos (e.g., data store 505) are matched with each of the generated exemplar video clips using a CNN-CNN similarity function, thereby producing a number (e.g., 30) scores representative of the degree to which each video matches the exemplars; paragraph [0187]: graph module 322 may identify an actor (901A) mapped to the “who” portion of the event frame, identify an action (901B) mapped to the “did what” portion of the event frame, identify an object (901C) mapped to the “to whom” portion of the event frame, identify a location (901D) mapped to the “where” portion of the event frame). However, Amer as modified by He and Ai fails to disclose identifying an image among the images with a same category as the sample character image as positive sample data, and identifying another image among the images with a different category from the sample character image as negative sample data; and adjusting an inter-class distance between the positive sample data and the negative sample data based on Triplet loss to enlarge a difference between the positive sample data and the negative sample data.
Chen discloses identifying an image among the images with a same category as the sample character image as positive sample data, and identifying another image among the images with a different category from the sample character image as negative sample data (paragraph [0028]: the searching module calculates a similarity between a frame of the divided 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Amer, He and Ai to identify images as taught by Chen, in order to facilitate searching for video clips.
Amer as modified by He, Ai and Chen discloses all the features with respect to claim 10 as outlined above. However, Amer as modified by He, Ai and Chen fails to disclose adjusting an inter-class distance between the positive sample data and the negative sample data based on Triplet loss to enlarge a difference between the positive sample data and the negative sample data.
Zhang discloses adjusting an inter-class distance between the positive sample data and the negative sample data based on Triplet loss to enlarge a difference between the positive sample data and the negative sample data (paragraph [0042]: triplet loss is used to optimize the training process for the Teacher multi-CNN network, such that features are recognized that minimize the differences between consumer images and reference images of the same type of pill, while maximizing the differences between consumer images and reference images of different types of pills).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Amer, He, Ai and Chen to use triplet loss as taught by Zhang, in order to provide a simple-to-use, highly accurate recognition system and to reduce cost.
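For illustration only, the general form of a triplet loss, as the term is commonly used, may be sketched as follows: the loss is zero once the anchor is closer to the positive sample than to the negative sample by at least a margin, and positive otherwise. This is a generic sketch and is not asserted to be the formulation used in Zhang or in the application; the function names and the margin value 0.2 are assumptions.

```python
import math

def euclidean(u, v):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Penalize the triplet unless the negative is farther from the anchor
    # than the positive by at least `margin` (assumed hyperparameter).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
same_class = [0.0, 1.0]    # distance 1 from the anchor
other_class = [3.0, 4.0]   # distance 5 from the anchor

print(triplet_loss(anchor, same_class, other_class))  # well separated
print(triplet_loss(anchor, other_class, same_class))  # badly separated
```

Minimizing such a loss over many triplets tends to pull same-category samples together and push different-category samples apart, which corresponds to enlarging the inter-class distance recited in the claim.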

Claim 3 recites the functions of the apparatus recited in claim 10 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 10 applies to the method steps of claim 3.

Claim 16 recites the functions of the apparatus recited in claim 10 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 10 applies to the medium steps of claim 16. 

Claims 4, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Amer U.S. Patent Application 20190304157 in view of He U.S. Patent Application 20140192212, in view of Ai U.S. Patent Application 20170256288, and further in view of Vemulapalli U.S. Patent Application 20190005313.
Regarding claim 11, Amer as modified by He and Ai discloses extracting multiple first feature vectors of each of the human body region images to obtain an n-dimensional first feature vector; extracting multiple second feature vectors of the target character image to obtain an m-dimensional second feature vector; wherein n≤m, and both n and m are positive integers (Amer’s paragraph [0119]: Animation module 324 may, for example, project three-dimensional deictic vectors from the deixis and gaze onto a display (e.g., output device 317) illustrating an animation. Animation module 324 may enable identification of objects of reference and objects of attention within the animation or within the subject's space, thereby enabling efficient resolution of references to specific objects and locations within the actual space near the subject or within an animation to place actors, props, and/or other objects where the subject or user wants the objects to be placed). However, Amer as modified by He and Ai fails to disclose determining a Euclidean distance between the first feature vector and the second feature vector, the Euclidean distance being indicative of the similarity between each of the human body region images and a target character image.

Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Amer, He and Ai to calculate a Euclidean distance as taught by Vemulapalli, in order to determine similarity.
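For illustration only, using a Euclidean distance between feature vectors as an indication of similarity may be sketched as follows, where a smaller distance indicates a closer match. The sketch assumes feature vectors of equal length; the claim recites n≤m, and how vectors of differing dimension would be reconciled is not addressed here. The names feature_distance and closest_region and the sample values are hypothetical.

```python
import math

def feature_distance(first, second):
    # Assumption of this sketch: both feature vectors have the same length.
    if len(first) != len(second):
        raise ValueError("sketch assumes feature vectors of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))

def closest_region(region_features, target_feature):
    """Return the index of the region whose feature vector is nearest the target."""
    distances = [feature_distance(f, target_feature) for f in region_features]
    return min(range(len(distances)), key=distances.__getitem__)

# Hypothetical feature vectors for three human body region images.
regions = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
target = [1.0, 1.0]
print(closest_region(regions, target))
```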

Claim 4 recites the functions of the apparatus recited in claim 11 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 11 applies to the method steps of claim 4.

Claim 17 recites the functions of the apparatus recited in claim 11 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 11 applies to the medium steps of claim 17. 

Claims 6, 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Amer U.S. Patent Application 20190304157 in view of He U.S. Patent Application 20140192212, in view of Ai U.S. Patent Application 20170256288, and further in view of Kawano U.S. Patent Application 20140341427.
Regarding claim 13, Amer as modified by He and Ai discloses all the features with respect to claim 12 as outlined above. However, Amer as modified by He and Ai fails to disclose determining a moving speed of the detection box, wherein the moving speed of the detection 
Kawano discloses determining a moving speed of the detection box, wherein the moving speed of the detection box is an average speed of the detection box in a unit frame; and identifying the moving speed of the detection box as a moving speed of the clipping box (paragraph [0036]: The human body detection/tracking unit 202 performs pattern matching processing to detect a human body from the captured image of the surveillance region created by the image capturing unit 201, and adds unique human body tracking ID to a human body identified from a positional relationship between frames to perform human body tracking processing; paragraph [0037]: tracking information including human body tracking ID unique to each human body, center point coordinates in an image, a width/height and a size of a bounding box, and a moving speed).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Amer, He and Ai to determine a moving speed as taught by Kawano, in order to efficiently recognize a specific person from an image.
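For illustration only, an average speed of a detection box per frame may be sketched as the total displacement of the box center divided by the number of frame-to-frame intervals. This is one plausible reading of the claim language and is not asserted to be the computation in Kawano or in the application; the (x, y, width, height) box layout and the function names are assumptions.

```python
import math

def box_center(box):
    # Assumed box layout: (x, y, width, height) with (x, y) the top-left corner.
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def average_speed_per_frame(boxes):
    """Mean center displacement, in pixels per frame, over consecutive frames."""
    centers = [box_center(b) for b in boxes]
    if len(centers) < 2:
        return 0.0  # no frame-to-frame interval to measure
    total = sum(math.dist(p, q) for p, q in zip(centers, centers[1:]))
    return total / (len(centers) - 1)

# Hypothetical track: the same box shifted by (3, 4) pixels in each frame.
track = [(0, 0, 10, 10), (3, 4, 10, 10), (6, 8, 10, 10)]
print(average_speed_per_frame(track))
```

Under the claim, the speed so obtained for the detection box would then be adopted as the moving speed of the clipping box.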

Claim 6 recites the functions of the apparatus recited in claim 13 as method steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 13 applies to the method steps of claim 6.

Claim 19 recites the functions of the apparatus recited in claim 13 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the apparatus in claim 13 applies to the medium steps of claim 19. 

Allowable Subject Matter

Claims 7, 14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Claims 7, 14 and 20 recite that the determining of a moving speed of the detection box further comprises:
determining whether a distance between center points of the detection box in adjacent frames among the plurality of frames is greater than a predetermined distance value; and identifying the average speed of the detection box in the unit frame as the moving speed of the detection box in response to determining that the distance between the center points of the detection box in the adjacent frames is greater than the predetermined distance value.
Amer 20190304157, He 20140192212, Ai 20170256288, Kawano 20140341427 and Mannino 20160328856, even in combination, do not teach these features. These limitations, when read in light of the remaining limitations of each claim and of the claims from which it depends, render the claims allowable subject matter.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Yi Yang whose telephone number is (571)272-9589.  The examiner can normally be reached on Monday-Friday 9:00 AM-6:00 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/YI YANG/
Examiner, Art Unit 2616