DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
Examiner acknowledges applicants’ reply dated September 15, 2021, including arguments and amendments.

Examiner acknowledges applicants’ amendment to claim 1, addressing the typographical error (“to identity”) and the objection that it raised. That objection is withdrawn.

Claims 1 – 5 and 7 – 20 are currently pending.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 5, 7 – 13 and 18 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yokoi, U.S. PG-Pub. No. 2009/0190804 (hereafter, “Yokoi”), in view of Cheung, et al., U.S. PG-Pub. No. 2019/0198057 (hereafter, “Cheung”).

As to Claim 1, Yokoi discloses: a computer-implemented method, comprising:
obtaining, by one or more processors, from a user, via a client, a video to upload to a repository accessible to the one or more processors ([0022], referring to the capability of the system to record and play back video content data, and [0023] having access to video data stored in a storage device);
segmenting, by the one or more processors, the video into temporal shots, wherein the temporal shots comprise a timeline of the video ([0034], “… it is also possible to divide the moving image data into a plurality of partial sections, and each human face image that appears in this partial section may be extracted, for every partial section.”);
cognitively analyzing, by the one or more processors, the video, by applying an image recognition algorithm, to the video, to identify image entities in each temporal shot of the video ([0031] – [0032], referring to facial recognition processing);
cognitively analyzing, by the one or more processors, by applying a data structure to the temporal shots, to identify personal entities in each temporal shot of the video ([0041], referring to the use of a database storing a pair of the face image and e.g. a human name to identify faces extracted from the moving image);
generating, by the one or more processors, a search index for the video, utilizing user entities, wherein the user entities comprise the image entities and the personal entities, wherein each entry of the search index comprises a given user entity, wherein the given user entity is selected from the user entities, and a linkage to a given temporal shot of the temporal shots, wherein the linkage indicates a location of the given user entity in the timeline of the video ([0047] – [0048], referring to the generation of a search index, which includes each instance of a particular face appearing in the video, and a “time zone” corresponding to the time in the video at which that particular face appears in the video, in association with the metadata of that particular face);
searching, by the one or more processors, the video for relevant user entities, wherein the searching comprises utilizing the search index to locate the relevant user entities in the video ([0050], “…when the search index information includes the human name that appears in the scene for every scene, the moving image data search module 203 can search the moving image data associated with the human name input by typing, from the moving image data group to be searched. In addition, the moving image data search module 203 can search each scene associated with the human name input by typing, from each moving image data to be searched.”); and
formulating, responsive to the searching, by the one or more processors, search results, where the search results comprise the relevant user entities and for each relevant user entity, a location of the relevant user entity in the timeline, wherein the location comprises a start time and an end time ([0048], “…as shown in FIG. 4, the search index information #1 includes the information showing that the persons having names N1, N1, N2, N2 appear in the scenes 1, 2, 5, 10 respectively. A data structure of the search index information #1 is not particularly limited, and any data structure can be taken, for example, only by including time information showing the time zone of each scene in which a certain reference face image appears, and by including the human name corresponding to the reference face image that appears in the aforementioned each scene.”).

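Yokoi does not appear to explicitly disclose: the data structure comprising a user profile of the user, to automatically add context that assists the image recognition algorithm in identifying the image entities; wherein the search results comprise the automatically added context for the relevant user entities.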

Cheung discloses: the data structure comprising a user profile of the user ([0002], “The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of relationships of the user with other users of the social-networking system…”), to automatically add context that assists the image recognition algorithm in identifying the image entities ([0022], referring to the use of user activity to generate a user relevance score associated with an object, said user relevance reading on the claimed “context”);
wherein the search results comprise the automatically added context for the relevant user entities ([0048], “social-networking system 560 may perform particular actions with respect to a user based on coefficient information. Coefficients may be used to predict whether a user will perform a particular action based on the user's interest in the action. A coefficient may be used when generating or presenting any type of objects to a user, such as … search results…”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Yokoi and Cheung before him/her, to have modified the data structure application of Yokoi with the user profile from Cheung, in order to populate the face database (data structure) of Yokoi.

As to Claim 2, Yokoi, as modified, discloses: monitoring, by the one or more processors, computing activities performed by the user, via the client, based on the client connecting (Cheung, [0036], referring to tracking users’ interest in a particular concept, possibly across third-party websites);
analyzing, by the one or more processors, the computing activities performed by the user, in the one or more applications, to identify data comprising elements relevant to the user and relationships between the elements and the user (Cheung, [0013], “A relevancy of the identified objects to a particular user may be determined (e.g., calculating a relevance score of the identified objects with respect to the user).”); and
generating, by the one or more processors, based on the analyzing, the data structure, wherein the data structure comprises the user profile (Cheung, [0002], “The social-networking system may, with input from a user, create and store in the social-networking system a user profile associated with the user. The user profile may include demographic information, communication-channel information, and information on personal interests of the user.”).

As to Claim 3, Yokoi, as modified, discloses: applying the user profile further comprises: converting, by the one or more processors, non-textual elements in the video to textual content, for each temporal shot of the temporal shots (Cheung, [0017], “In particular embodiments, identification of an object may comprise information identifying a location of an object in a frame, a description of an object, a label associated with an object…”); and identifying, by the one or more processors, in the textual content of each temporal shot, the elements relevant to the user and the relationships between the elements and the user, wherein the elements comprise the personal entities (Cheung, [0017], “… may identify object 130 as user ‘Jacques,’ object 140 as a tea pot, and object 160 as user ‘Cecile.’”).

As to Claim 4, Yokoi, as modified, discloses: storing, by the one or more processors, the search index in an indexed repository (Yokoi, Fig. 2, showing the Face Database 111A in communication with the matching processing module).

As to Claim 5, Yokoi, as modified, discloses: obtaining, by the one or more processors, search parameters identifying one or more relevant user entities of the user entities in the search index; identifying, by the one or more processors, the relevant user entities; and wherein the searching comprises accessing the index repository (Yokoi, [0050], “…when the search index information includes the human name that appears in the scene for every scene, the moving image data search module 203 can search the moving image data associated with the human name input by typing, from the moving image data group to be searched. In addition, the moving image data search module 203 can search each scene associated with the human name input by typing, from each moving image data to be searched.”).

As to Claim 7, Yokoi, as modified, discloses: wherein formulating the search results comprises ranking the search results based on relevance to the search parameters (Cheung, [0048], “The coefficient may also be utilized to rank and order such objects, as appropriate. In this way, social-networking system 560 may provide information that is relevant to user's interests and current circumstances, increasing the likelihood that they will find such information of interest.”).

As to Claim 8, Yokoi, as modified, discloses: generating, by the one or more processors, a search deliverable, the generating comprising: obtaining, by the one or more processors, a portion of the temporal shots from the video, wherein each temporal shot of the portion comprises the location of the relevant user entity in the timeline for each relevant user entity (Cheung, [0013], “…a highlight may be generated or selected from a video based on object recognition. A highlight of a video may include frames or portions of a video that have a higher likelihood to be interesting, enticing, appealing, or relevant to a particular user.”).

As to Claim 9, Yokoi, as modified, discloses: providing, by the one or more processors, the search deliverable to the user, via the client (Cheung, [0027], referring to presentation of the highlight video to the user).

As to Claim 10, Yokoi, as modified, discloses: assembling the portion of the temporal shots according to the ranking of the search results based on the relevance to the search parameters (Cheung, [0013], “…a highlight may be generated or selected from a video based on object recognition. A highlight of a video may include frames or portions of a video that have a higher likelihood to be interesting, enticing, appealing, or relevant to a particular user.” and [0048], “The coefficient may also be utilized to rank and order such objects, as appropriate.”).

As to Claim 11, Yokoi, as modified, discloses: wherein the new video comprises more than one individual new videos, and where the providing of the search deliverable comprises providing links to each of the individual new videos (Cheung, [0027], referring to the varying ways in which the selected portions of the highlight video may be presented to the user).

As to Claim 12, Yokoi, as modified, discloses: wherein a format of the search parameters are selected from the group consisting of: text, voice, image, and video (Yokoi, [0050], referring to the user typing the search parameter).

As to Claim 13, Yokoi, as modified, discloses: wherein applying the image recognition algorithm comprises accessing an image metadata repository accessible to the one or more processors (Yokoi, Fig. 1, showing the Face Database 111A).

As to Claim 18, Yokoi, as modified, discloses: prior to generating the search index, generating, by the one or more processors, in a user interface of the client, an interface displaying the personal entities and respective linkages of the personal entities (Yokoi, [0041] – [0046], referring to the generation of the face database 111a by the database registration tool), wherein the interface comprises a point of entry by which the user can provide feedback; obtaining, by the one or more processors, the feedback from the user, provided via the interface; and updating, by the one or more processors, the user entities based on the feedback (Cheung, [0044], referring to implementation of user feedback).

As to Claim 19, Yokoi discloses: a computer program product comprising: a computer readable storage medium readable by one or more processors and storing instructions for execution by the one or more processors (Fig. 1, showing a CPU 101 in communication with Main Memory 103) for performing a method comprising:
obtaining, by the one or more processors, from a user, via a client, a video to upload to a repository accessible to the one or more processors ([0022], referring to the capability of the system to record and play back video content data, and [0023] having access to video data stored in a storage device);
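segmenting, by the one or more processors, the video into temporal shots, wherein the temporal shots comprise a timeline of the video ([0034], “… it is also possible to divide the moving image data into a plurality of partial sections, and each human face image that appears in this partial section may be extracted, for every partial section.”);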
cognitively analyzing, by the one or more processors, the video, by applying an image recognition algorithm, to the video, to identify image entities in each temporal shot of the video ([0031] – [0032], referring to facial recognition processing);
cognitively analyzing, by the one or more processors, by applying a data structure to the temporal shots, to identify personal entities in each temporal shot of the video ([0041], referring to the use of a database storing a pair of the face image and e.g. a human name to identify faces extracted from the moving image);
generating, by the one or more processors, a search index for the video, utilizing user entities, wherein the user entities comprise the image entities and the personal entities, wherein each entry of the search index comprises a given user entity, wherein the given user entity is selected from the user entities, and a linkage to a given temporal shot of the temporal shots, wherein the linkage indicates a location of the given user entity in the timeline of the video ([0047] – [0048], referring to the generation of a search index, which includes each instance of a particular face appearing in the video, and a “time zone” corresponding to the time in the video at which that particular face appears in the video, in association with the metadata of that particular face);
searching, by the one or more processors, the video for relevant user entities, wherein the searching comprises utilizing the search index to locate the relevant user entities in the video ([0050], “…when the search index information includes the human name that appears in the scene for every scene, the moving image data search module 203 can search the moving image data associated with the human name input by typing, from the moving image data group to be searched. In addition, the moving image data search module 203 can search each scene associated with the human name input by typing, from each moving image data to be searched.”); and
formulating, responsive to the searching, by the one or more processors, search results, where the search results comprise the relevant user entities and for each relevant user entity, a location of the relevant user entity in the timeline, wherein the location comprises a start time and an end time ([0048], “…as shown in FIG. 4, the search index information #1 includes the information showing that the persons having names N1, N1, N2, N2 appear in the scenes 1, 2, 5, 10 respectively. A data structure of the search index information #1 is not particularly limited, and any data structure can be taken, for example, only by including time information showing the time zone of each scene in which a certain reference face image appears, and by including the human name corresponding to the reference face image that appears in the aforementioned each scene.”).

Yokoi does not appear to explicitly disclose: the data structure comprising a user profile of the user, to automatically add context that assists the image recognition algorithm in identifying the image entities; wherein the search results comprise the automatically added context for the relevant user entities.

Cheung discloses: the data structure comprising a user profile of the user ([0002], “The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of relationships of the user with other users of the social-networking system…”), to automatically add context that assists the image recognition algorithm in identifying the image entities ([0022], referring to the use of user activity to generate a user relevance score associated with an object, said user relevance reading on the claimed “context”);
wherein the search results comprise the automatically added context for the relevant user entities ([0048], “social-networking system 560 may perform particular actions with respect to a user based on coefficient information. Coefficients may be used to predict whether a user will perform a particular action based on the user's interest in the action. A coefficient may be used when generating or presenting any type of objects to a user, such as … search results…”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Yokoi and Cheung before him/her, to have modified the data structure application of Yokoi with the user profile from Cheung, in order to populate the face database (data structure) of Yokoi.

As to Claim 20, Yokoi discloses: a system comprising: a memory; one or more processors in communication with the memory; program instructions executable by the one or more processors via the memory to perform a method (Fig. 1, showing a CPU 101 in communication with Main Memory 103), the method comprising: 
obtaining, by the one or more processors, from a user, via a client, a video to upload to a repository accessible to the one or more processors ([0022], referring to the capability of the system to record and play back video content data, and [0023] having access to video data stored in a storage device);
segmenting, by the one or more processors, the video into temporal shots, wherein the temporal shots comprise a timeline of the video ([0034], “… it is also possible to divide the moving image data into a plurality of partial sections, and each human face image that appears in this partial section may be extracted, for every partial section.”);
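cognitively analyzing, by the one or more processors, the video, by applying an image recognition algorithm, to the video, to identify image entities in each temporal shot of the video ([0031] – [0032], referring to facial recognition processing);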
cognitively analyzing, by the one or more processors, by applying a data structure to the temporal shots, to identify personal entities in each temporal shot of the video ([0041], referring to the use of a database storing a pair of the face image and e.g. a human name to identify faces extracted from the moving image);
generating, by the one or more processors, a search index for the video, utilizing user entities, wherein the user entities comprise the image entities and the personal entities, wherein each entry of the search index comprises a given user entity, wherein the given user entity is selected from the user entities, and a linkage to a given temporal shot of the temporal shots, wherein the linkage indicates a location of the given user entity in the timeline of the video ([0047] – [0048], referring to the generation of a search index, which includes each instance of a particular face appearing in the video, and a “time zone” corresponding to the time in the video at which that particular face appears in the video, in association with the metadata of that particular face);
searching, by the one or more processors, the video for relevant user entities, wherein the searching comprises utilizing the search index to locate the relevant user entities in the video ([0050], “…when the search index information includes the human name that appears in the scene for every scene, the moving image data search module 203 can search the moving image data associated with the human name input by typing, from the moving image data group to be searched. In addition, the moving image data search module 203 can search each scene associated with the human name input by typing, from each moving image data to be searched.”); and
formulating, responsive to the searching, by the one or more processors, search results, where the search results comprise the relevant user entities and for each relevant user entity, a location of the relevant user entity in the timeline, wherein the location comprises a start time and an end time ([0048], “…as shown in FIG. 4, the search index information #1 includes the information showing that the persons having names N1, N1, N2, N2 appear in the scenes 1, 2, 5, 10 respectively. A data structure of the search index information #1 is not particularly limited, and any data structure can be taken, for example, only by including time information showing the time zone of each scene in which a certain reference face image appears, and by including the human name corresponding to the reference face image that appears in the aforementioned each scene.”).

Yokoi does not appear to explicitly disclose: the data structure comprising a user profile of the user, to automatically add context that assists the image recognition algorithm in identifying the image entities; wherein the search results comprise the automatically added context for the relevant user entities.

Cheung discloses: the data structure comprising a user profile of the user ([0002], “The user profile may include demographic information, communication-channel information, and information on personal interests of the user. The social-networking system may also, with input from a user, create and store a record of relationships of the user with other users of the social-networking system…”), to automatically add context that assists the image recognition algorithm in identifying the image entities ([0022], referring to the use of user activity to generate a user relevance score associated with an object, said user relevance reading on the claimed “context”);
wherein the search results comprise the automatically added context for the relevant user entities ([0048], “social-networking system 560 may perform particular actions with respect to a user based on coefficient information. Coefficients may be used to predict whether a user will perform a particular action based on the user's interest in the action. A coefficient may be used when generating or presenting any type of objects to a user, such as … search results…”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Yokoi and Cheung before him/her, to have modified the data structure application of Yokoi with the user profile from Cheung, in order to populate the face database (data structure) of Yokoi.

Claims 14 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yokoi, as modified by Cheung and applied to claim 1, further in view of Cheng, et al., U.S. PG-Pub. No. 2016/0004911 (hereafter, “Cheng”).

As to Claim 14, Yokoi, as modified by Cheung, does not appear to explicitly disclose: wherein the non-textual elements comprise speech and audio, and wherein converting the elements comprises applying a speech to text processing algorithm to produce the textual content.

Cheng discloses: wherein the non-textual elements comprise speech and audio, and wherein converting the elements comprises applying a speech to text processing algorithm to produce the textual content ([0017], “Any video of the input 102 may include or have associated therewith an audio soundtrack (which may include speech and/or non-speech audio), and/or a speech transcript, where the speech transcript may be generated by, for example, an automated speech recognition (ASR) module of the computing system 100.”).



As to Claim 15, Yokoi, as modified by Cheung, does not appear to explicitly disclose: wherein the non-textual elements comprise embedded text in images comprising the video, wherein converting the elements comprises executing an optical character recognition process on the embedded text to convert the embedded text to the textual content, wherein the one or more applications comprise a social media website, and wherein the elements relevant to the user comprise images posted by the user on a social media website and tags associated with the images.

Cheng discloses: wherein the non-textual elements comprise embedded text in images comprising the video, wherein converting the elements comprises executing an optical character recognition process on the embedded text to convert the embedded text to the textual content, wherein the one or more applications comprise a social media website, and wherein the elements relevant to the user comprise images posted by the user on a social media website and tags associated with the images ([0047], referring to the use of OCR to recognize text present in a visual scene of a video and provide the recognized text to the text feature detection module).



As to Claim 16, Yokoi, as modified by Cheung, discloses: extracting, by the one or more processors, from a search index of the other video, user entities comprising the search index of the other video (Yokoi, [0047] – [0048], referring to the generation of a search index, which includes each instance of a particular face appearing in the video, and a “time zone” corresponding to the time in the video at which that particular face appears in the video, in association with the metadata of that particular face);
searching, by the one or more processors, the video, for the user entities comprising the search index of the other video; and locating, by the one or more processors, a portion of the user entities comprising the search index of the other video in the video (Yokoi, [0050], “…when the search index information includes the human name that appears in the scene for every scene, the moving image data search module 203 can search the moving image data associated with the human name input by typing, from the moving image data group to be searched. In addition, the moving image data search module 203 can search each scene associated with the human name input by typing, from each moving image data to be searched.”).

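Yokoi, as modified by Cheung, does not appear to explicitly disclose: prior to generating the search index, determining, by the one or more processors, a classification for the video, wherein obtaining the video from the user, via the client, further comprises obtaining the classification, from the user via the client; and identifying, by the one or more processors, in the repository, another video uploaded by the user, wherein the classification of the other video is equivalent to the classification of the video.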

Cheng discloses: prior to generating the search index, determining, by the one or more processors, a classification for the video, wherein obtaining the video from the user, via the client, further comprises obtaining the classification, from the user via the client ([0019], referring to the function of the multimedia content understanding module 104, which categorizes content);
identifying, by the one or more processors, in the repository, another video uploaded by the user, wherein the classification of the other video is equivalent to the classification of the video ([0019] – [0032], referring to the system using categorization to further identify salient activities that are prevalent in content of that particular category in other videos of that category);

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Yokoi, Cheung, and Cheng before him/her, to have further modified the combination of Yokoi and Cheung, by modifying the relevance and interest tracking features from Cheung with the video categorization from Cheng, in order to better populate the user profile, as suggested in Cheung at [0045]: “In particular embodiments, social-networking system 560 may calculate a coefficient based on the user's actions with particular types of content. The content may be associated with the online social network, a third-party system 570, or another suitable system. The content may include users, profile pages, posts, news stories, headlines, instant messages, chat room conversations, emails, advertisements, pictures, video, music, other suitable objects, or any combination thereof. Social-networking system 560 may analyze a user's actions to determine whether one or more of the actions indicate an affinity for subject matter, content, other users, and so forth.”

As to Claim 17, Yokoi, as further modified, discloses: wherein the user entities further comprise the portion of the user entities (Cheung, [0045], “Social-networking system 560 may analyze a user's actions to determine whether one or more of the actions indicate an affinity for subject matter, content, other users, and so forth.”).

Response to Arguments
Applicant's arguments filed September 15, 2021, have been fully considered but they are not persuasive. Accordingly, THIS ACTION IS MADE FINAL.

	Applicants argue that Yokoi and Cheung do not adequately render obvious the limitation of the independent claims directed to using user profile data to add context that assists the image recognition algorithm. Examiner respectfully disagrees. The Specification, at [0007], describes that applying the user profile comprises monitoring user activities to identify data elements relevant to the user. This aspect of the invention is disclosed by Cheung at [0022], in reference to the user relevance score. For this reason, examiner maintains the rejection of the independent claims.

	Applicants’ arguments regarding the dependent claims rely on the preceding arguments, and are considered addressed by the above.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIRAV K KHAKHAR whose telephone number is (571)270-1004. The examiner can normally be reached Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert W Beausoliel, Jr. can be reached on 571-272-3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NIRAV K KHAKHAR/Examiner, Art Unit 2167

/ROBERT W BEAUSOLIEL JR/Supervisory Patent Examiner, Art Unit 2167