DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/21/2022 has been entered.
Response to Amendment
The amendments, filed 3/21/2022, have been entered and made of record. Claims 1, 8, and 15 have been amended. Claims 1-20 are pending.
Response to Arguments
Applicant’s arguments in the Remarks filed on 3/21/2022 have been considered but are moot in view of the new ground(s) of rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Cooper in view of Brauckmann and Anders
Claims 1-3, 8, 9, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Cooper et al.(USPubN 2007/0153125; hereinafter Cooper) in view of Brauckmann et al.(USPubN 2017/0294213; hereinafter Brauckmann) further in view of Anders et al.(USPubN 2020/0090067; hereinafter Anders).
As per claim 1, Cooper teaches a method comprising: retrieving, by one or more processors, a video(“receiving audio video information 201” in Para.[0052], “a dedicated processor” in Para.[0065]); 
extracting, by the one or more processors, an audio portion of the video to create an audio file; extracting, by the one or more processors, a video portion of the video to create a video file(“separately extracting the audio information and the video information 203” in Para.[0052], Fig. 4); 
analyzing, by the one or more processors, the audio file to create an audio analysis; analyzing, by the one or more processors, the video file to create a video analysis(“analyzing the audio information 205 and the video information 207” in Para.[0052]); 
determining, by the one or more processors, one or more segments in the video based on the audio analysis and the video analysis(“Determining and associating a dominant audio class in a video frame, locating matching locations, and estimating offset of audio and video” in Para.[0061]);
Cooper is silent about determining, by a machine learning algorithm being executed by the one or more processors, one or more segments in the video based on the audio analysis and the video analysis; wherein the machine learning algorithm predicts where a change in topic occurs in the video; adding, by the one or more processors, one or more chapter markers to the video to create a segmented video, wherein individual chapter markers of the one or more chapter markers correspond to a starting position of individual segments of the one or more segments; associating, by the one or more processors, a hyperlink of one or more hyperlinks with individual chapter markers of the one or more chapter markers; creating, by the one or more processors, an index that includes the one or more hyperlinks; and providing, by the one or more processors, access to the index via a network.
Brauckmann teaches determining, by a machine learning algorithm being executed by the one or more processors, one or more segments in the video based on the audio analysis and the video analysis(“The invention refers to a method for processing and analyzing forensic video data using a computer program” in Para.[0001], “executing an algorithm to identify the flag or trigger and tagging the video data with a tag depending on the identified flag or trigger” in Para.[0010], “Every single video needs to be analyzed sequence by sequence. There are algorithms available searching for persons, faces, license plates and movements” in Para.[0046], The algorithm is executed by a computer program, and the computer program is executed by a processor, as is well known in the art.); 
adding, by the one or more processors, one or more chapter markers to the video to create a segmented video, wherein individual chapter markers of the one or more chapter markers correspond to a starting position of individual segments of the one or more segments(“executing an algorithm to identify the flag or trigger and tagging the video data with a tag depending on the identified flag or trigger” in Para.[0039], “Adding the notification (e.g. the tagging) itself can be done for example by a special gesture or movement or any other optical or acoustical signal or by manipulating (e.g. activating a key on/pressing a button on) the recording device. A special investigation algorithm looking for such notifications (triggers) may be executed on the video material which than adds tags on the video material” in Para.[0040], “A tag cloud is created in order to show content statistic of one or more videos. Clicking on an item in the tag cloud may show a list of hyperlinks to the different videos (incl. timestamp) allowing to jump directly into the video scene” in Para.[0053], The tag can be interpreted as a chapter marker. The video scene comprises video segments.); 
associating, by the one or more processors, a hyperlink of one or more hyperlinks with individual chapter markers of the one or more chapter markers; creating, by the one or more processors, an index that includes the one or more hyperlinks; and providing, by the one or more processors, access to the index via a network(“providing a tag related to an object of interest in the video data the step of analyzing may comprise creating a text file for each of the video data, the text file including a listing of tags and/or identified objects, in particular further comprising hyperlinks pointing to positions in the video data related to the tags and/or objects” in Para.[0045], “identifying an object of interest in the video data and selecting a frame out of the video data including the object of interest, in particular a frame with the first occurrence of the object of interest, and assigning a tag to the selected frame. Such a selected frame may be used as a representation of the object of interest in the displaying step. This step may further comprise the step of displaying a tag cloud with a plurality of tags for a plurality of video data based on the number of occurrences of the respective tags in the plurality of video data, in particular wherein the size and/or colour of the displayed tags in the tag cloud may depend of the number of occurrences of the respective tag; wherein preferably each tag may be associated with hyperlinks to the respective positions in the plurality of video data related to the tag” in Para.[0051], “the public may be included for video analysis by providing tools for video analysis to the public before they upload their material for investigation. Use can be made of existing public services from the internet, by asking the public to get the material analyzed by their “social media tools” and upload the material extended with the “knowledge” of their cloud services” in Para.[0036], A tag cloud can be interpreted as an index. The tag cloud can be shared through public internet services, as is well known in the art.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper with the above teachings of Brauckmann in order to more efficiently present relevant video material to a user of the computer program.
Anders teaches wherein the machine learning algorithm predicts where a change in topic occurs in the video(“The emotion time series model is built per environmental factor for prediction of future changes in the state of emotion of the subject person or group and for recommending certain activities to attain a desired state of emotion in the future. … Certain embodiments of the present invention, receives feedback and recommendation and trains the emotion time series models respective to environmental factors, for individual users or for a group, by machine learning” in Para.[0081]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper and Brauckmann with the above teachings of Anders in order to improve the effectiveness of the video analysis with fewer computing resources.
As per claim 2, Cooper, Brauckmann and Anders teach all of the limitations of claim 1. 
Cooper is silent about further comprising: receiving from a computing device, via the network, a command selecting a particular hyperlink of the one or more hyperlinks; determining a particular segment of the one or more segments corresponding to the particular hyperlink; and initiating streaming the particular segment, via the network, to the computing device.
Brauckmann teaches further comprising: receiving from a computing device, via the network, a command selecting a particular hyperlink of the one or more hyperlinks; determining a particular segment of the one or more segments corresponding to the particular hyperlink; and initiating streaming the particular segment, via the network, to the computing device(“displaying a tag cloud with a plurality of tags for a plurality of video data based on the number of occurrences of the respective tags in the plurality of video data, in particular wherein the size and/or colour of the displayed tags in the tag cloud may depend of the number of occurrences of the respective tag; wherein preferably each tag may be associated with hyperlinks to the respective positions in the plurality of video data related to the tag” in Para.[0051], “A tag cloud is created in order to show content statistic of one or more videos. Clicking on an item in the tag cloud may show a list of hyperlinks to the different videos (incl. timestamp) allowing to jump directly into the video scene. Text based data is created from video material to present this in a tag cloud in order to allow quick and easy navigation through a big number of different video materials. A tag cloud is created for example based on the number of occurrence of each tag (best shot) and on the duration of a tagged object being visible in the scenes” in Para.[0053]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper with the above teachings of Brauckmann in order to more efficiently present relevant video material to a user of the computer program.
As per claim 3, Cooper, Brauckmann and Anders teach all of the limitations of claim 2. 
Cooper is silent about further comprising: receiving from the computing device, via the network, a second command selecting a second particular hyperlink of the one or more hyperlinks; determining a second particular segment of the one or more segments corresponding to the second particular hyperlink; and initiating streaming of the second particular segment, via the network, to the computing device.
Brauckmann teaches further comprising: receiving from the computing device, via the network, a second command selecting a second particular hyperlink of the one or more hyperlinks; determining a second particular segment of the one or more segments corresponding to the second particular hyperlink; and initiating streaming of the second particular segment, via the network, to the computing device(“displaying a tag cloud with a plurality of tags for a plurality of video data based on the number of occurrences of the respective tags in the plurality of video data, in particular wherein the size and/or colour of the displayed tags in the tag cloud may depend of the number of occurrences of the respective tag; wherein preferably each tag may be associated with hyperlinks to the respective positions in the plurality of video data related to the tag” in Para.[0051], “A tag cloud is created in order to show content statistic of one or more videos. Clicking on an item in the tag cloud may show a list of hyperlinks to the different videos (incl. timestamp) allowing to jump directly into the video scene. Text based data is created from video material to present this in a tag cloud in order to allow quick and easy navigation through a big number of different video materials. A tag cloud is created for example based on the number of occurrence of each tag (best shot) and on the duration of a tagged object being visible in the scenes” in Para.[0053], This is repeated process as claim 2 so rejected under the same rationale.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper with the above teachings of Brauckmann in order to more efficiently present relevant video material to a user of the computer program.
As per claim 8, Cooper teaches a server comprising: one or more processors; and one or more non-transitory computer-readable storage media to store instructions executable by the one or more processors(“The invention may be implemented, for example, by having the various means of receiving video signals and associated signals, identifying Audio-visual events and comparing video signal and associated signal Audio-visual events to determine relative timing as a software application (as an operating system element), a dedicated processor, or a dedicated processor with dedicated code. The software executes a sequence of machine-readable instructions, which can also be referred to as code. These instructions may reside in various types of signal-bearing media. In this respect, one aspect of the present invention concerns a program product, comprising a signal-bearing medium or signal-bearing media tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for receiving video signals and associated signals, identifying Audio-visual events and comparing video signal and associated signal Audio-visual events to determine relative timing” in Para.[0065], “This signal-bearing medium may comprise, for example, memory in server. The memory in the server may be non-volatile storage, a data disc, or even memory on a vendor server for downloading to a processor for installation. Alternatively, the instructions may be embodied in a signal-bearing medium such as the optical data storage disc. 
Alternatively, the instructions may be stored on any of a variety of machine-readable data storage mediums or media, which may include, for example, a "hard drive", a RAID array, a RAMAC, a magnetic data storage diskette (such as a floppy disk), magnetic tape, digital optical tape, RAM, ROM, EPROM, EEPROM, flash memory, magneto-optical storage, paper punch cards, or any other suitable signal-bearing media including transmission media such as digital and/or analog communications links, which may be electrical, optical, and/or wireless” in Para.[0066]). The other limitations of claim 8 have been discussed in the rejection of claim 1 and are rejected under the same rationale.
As per claim 9, the limitations of claim 9 have been discussed in the rejection of claim 2 and are rejected under the same rationale.
As per claim 15, the limitations of claim 15 have been discussed in the rejection of claim 8 and are rejected under the same rationale.
As per claim 16, the limitations of claim 16 have been discussed in the rejection of claim 2 and are rejected under the same rationale.

Cooper in view of Brauckmann, Anders and Shen
Claims 4-7, 10, 11, 13, 14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cooper et al.(USPubN 2007/0153125; hereinafter Cooper) in view of Brauckmann et al.(USPubN 2017/0294213; hereinafter Brauckmann) further in view of Anders et al.(USPubN 2020/0090067; hereinafter Anders) further in view of Shen et al.(USPubN 2019/0311743; hereinafter Shen).
As per claim 4, Cooper, Brauckmann and Anders teach all of the limitations of claim 1. 
Cooper, Brauckmann and Anders are silent about wherein analyzing the audio file to create the audio analysis comprises: performing natural language processing to identify, in the audio file, a set of one or more words indicative of the start of a segment; and indicating in the audio analysis, a time from a start of the video where the set of one or more words were spoken.
Shen teaches wherein analyzing the audio file to create the audio analysis comprises: performing natural language processing to identify, in the audio file, a set of one or more words indicative of the start of a segment; and indicating in the audio analysis, a time from a start of the video where the set of one or more words were spoken(“identify video content, or an exact frame or a collection frames from the video content relating to a word, sentence, paragraph, object, action, presence of people, a particular person, or section of text from the script. For example, the script or a portion of the script may be entered into the Search Engine 22 (e.g., via a user device 44 with a user interface). The search engine 22 may parse the script into keywords (referred to herein a “script keywords”) and search the media assets in the content database 32 and associated indexed information (such as the metadata, keywords, features, text (converted from speech), and the like stored with the media asset) using natural language processing techniques to locate video content or one or more frames of video content relevant to the script or portions” in Para.[0050]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper, Brauckmann and Anders with the above teachings of Shen in order to reduce production costs and to automate some or all of the video production processes, removing at least some of the human involvement.
As per claim 5, Cooper, Brauckmann and Anders teach all of the limitations of claim 1. 
Cooper, Brauckmann and Anders are silent about wherein analyzing the audio file to create the audio analysis comprises: performing text-to-speech analysis of contents of the audio file to create a first text file; performing natural language processing to identify, in the first text file, a set of one or more words indicative of the start of a segment; and indicating in the audio analysis, a time from a start of the video where the set of one or more words are located.
Shen teaches wherein analyzing the audio file to create the audio analysis comprises: performing text-to-speech analysis of contents of the audio file to create a first text file; performing natural language processing to identify, in the first text file, a set of one or more words indicative of the start of a segment; and indicating in the audio analysis, a time from a start of the video where the set of one or more words are located(“identify video content, or an exact frame or a collection frames from the video content relating to a word, sentence, paragraph, object, action, presence of people, a particular person, or section of text from the script. For example, the script or a portion of the script may be entered into the Search Engine 22 (e.g., via a user device 44 with a user interface). The search engine 22 may parse the script into keywords (referred to herein a “script keywords”) and search the media assets in the content database 32 and associated indexed information (such as the metadata, keywords, features, text (converted from speech), and the like stored with the media asset) using natural language processing techniques to locate video content or one or more frames of video content relevant to the script or portions” in Para.[0050]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper, Brauckmann and Anders with the above teachings of Shen in order to reduce production costs and to automate some or all of the video production processes, removing at least some of the human involvement.
As per claim 6, Cooper, Brauckmann and Anders teach all of the limitations of claim 1. 
Cooper, Brauckmann and Anders are silent about wherein analyzing the video file to create the video analysis comprises: performing, using a convolutional neural network, a frame analysis of the video file; determining, using the frame analysis, a time of a start of at least one segment in the video file; and indicating in the video analysis, the time of the start of at least one segment in the video file.
Shen teaches wherein analyzing the video file to create the video analysis comprises: performing, using a convolutional neural network, a frame analysis of the video file; determining, using the frame analysis, a time of a start of at least one segment in the video file; and indicating in the video analysis, the time of the start of at least one segment in the video file(“identify video content, or an exact frame or a collection frames from the video content relating to a word, sentence, paragraph, object, action, presence of people, a particular person, or section of text from the script. For example, the script or a portion of the script may be entered into the Search Engine 22 (e.g., via a user device 44 with a user interface). The search engine 22 may parse the script into keywords (referred to herein a “script keywords”) and search the media assets in the content database 32 and associated indexed information (such as the metadata, keywords, features, text (converted from speech), and the like stored with the media asset) using natural language processing techniques to locate video content or one or more frames of video content relevant to the script or portions” in Para.[0050]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper, Brauckmann and Anders with the above teachings of Shen in order to reduce production costs and to automate some or all of the video production processes, removing at least some of the human involvement.
As per claim 7, Cooper, Brauckmann and Anders teach all of the limitations of claim 1. 
Cooper, Brauckmann and Anders are silent about wherein analyzing the video file to create the video analysis comprises: performing optical character recognition of one or more frames of the video to create a second text file; performing natural language processing to identify, in the second text file, a set of one or more words indicative of the start of a segment; and indicating in the video analysis, a time from a start of the video where the set of one or more words are located.
Shen teaches wherein analyzing the video file to create the video analysis comprises: performing optical character recognition of one or more frames of the video to create a second text file; performing natural language processing to identify, in the second text file, a set of one or more words indicative of the start of a segment; and indicating in the video analysis, a time from a start of the video where the set of one or more words are located(“identify video content, or an exact frame or a collection frames from the video content relating to a word, sentence, paragraph, object, action, presence of people, a particular person, or section of text from the script. For example, the script or a portion of the script may be entered into the Search Engine 22 (e.g., via a user device 44 with a user interface). The search engine 22 may parse the script into keywords (referred to herein a “script keywords”) and search the media assets in the content database 32 and associated indexed information (such as the metadata, keywords, features, text (converted from speech), and the like stored with the media asset) using natural language processing techniques to locate video content or one or more frames of video content relevant to the script or portions” in Para.[0050]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper, Brauckmann and Anders with the above teachings of Shen in order to reduce production costs and to automate some or all of the video production processes, removing at least some of the human involvement.
As per claim 10, the limitations of claim 10 have been discussed in the rejection of claim 4 and are rejected under the same rationale.
As per claim 11, the limitations of claim 11 have been discussed in the rejection of claim 5 and are rejected under the same rationale.
As per claim 13, the limitations of claim 13 have been discussed in the rejection of claim 6 and are rejected under the same rationale.
As per claim 14, the limitations of claim 14 have been discussed in the rejection of claim 7 and are rejected under the same rationale.
As per claim 17, the limitations of claim 17 have been discussed in the rejection of claim 4 and are rejected under the same rationale.
As per claim 18, the limitations of claim 18 have been discussed in the rejection of claim 5 and are rejected under the same rationale.
As per claim 19, the limitations of claim 19 have been discussed in the rejection of claim 6 and are rejected under the same rationale.
As per claim 20, the limitations of claim 20 have been discussed in the rejection of claim 7 and are rejected under the same rationale.

Cooper in view of Brauckmann, Anders and Moehrle
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Cooper et al.(USPubN 2007/0153125; hereinafter Cooper) in view of Brauckmann et al.(USPubN 2017/0294213; hereinafter Brauckmann) further in view of Anders et al.(USPubN 2020/0090067; hereinafter Anders) further in view of Moehrle(USPubN 2011/0137753; hereinafter Moehrle).
As per claim 12, Cooper, Brauckmann and Anders teach all of the limitations of claim 8. 
Cooper, Brauckmann and Anders are silent about wherein analyzing the video file to create the video analysis comprises: determining using a micro-expression analyzer one or more micro-expressions of a presenter in the video; determining a sentiment corresponding to the micro-expression; wherein the sentiment comprises one of happy or unhappy; and indicating in the video analysis, a time of a start of the micro-expression and the corresponding sentiment.
Moehrle teaches wherein analyzing the video file to create the video analysis comprises: determining using a micro-expression analyzer one or more micro-expressions of a presenter in the video; determining a sentiment corresponding to the micro-expression; wherein the sentiment comprises one of happy or unhappy; and indicating in the video analysis, a time of a start of the micro-expression and the corresponding sentiment(“the states of objects will be indexed such as by means of facial expression algorithms already known in the art which extract the state of a person in a video such as happy or sad. According one embodiment a user may search for objects in videos by submitting a video object of an image of an object as the search input” in Para.[0057m]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cooper, Brauckmann and Anders with the above teachings of Moehrle in order to enhance an end user's experience by indexing a variety of features in the video.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNGHYOUN PARK whose telephone number is (571)270-1333. The examiner can normally be reached M - Thur 6:00 am - 4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THAI Q TRAN, can be reached at (571)272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SUNGHYOUN PARK/Examiner, Art Unit 2484