Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to communications: Application filed on 05/17/2019. Claims 1, 8 and 15 are independent claims. Claims 1-20 have been examined and rejected in the current patent application. 
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 12/01/2021. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
Applicant presents the following arguments in the August 30, 2021 amendment.
Applicant's arguments with respect to claims 1, 8 and 15 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection. 
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 07/30/2021 has been entered. 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 8-9 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (US 2018/0254070 A1, hereinafter Song) in view of Nelson et al. (US 2019/0273767 A1, hereinafter Nelson).  
Regarding independent claim(s) 1, Song discloses a computer system comprising: a hardware processor operatively coupled to memory; a knowledge engine in communication with the hardware processor, the knowledge engine configured to implement one or more tools to support identification of duplicate media content, the one or more tools comprising (Song discloses Execution of the sequences of instructions contained in memory 704 causes processing unit 712 to perform one or more of the method steps described herein. In alternative embodiments, hardware, such as ASIC, may be used in place of or in combination with software.  A module can include sub-modules. Software and hardware components of a module may be stored on a computer readable medium for execution by a processor. A web search engine algorithms or related applications by analyzing the returned results against a graded relevance scale of content items in a search engine result set (knowledge engine). A thumbnail engine 300 can be embodied as a stand-alone application that executes on a user device. The identification of a video file for which automatic thumbnail analysis can be performed, as discussed herein, such as, for example, downloading a video, capturing a video, receiving a video (e.g., via sharing), posting a video (e.g., to a social networking platform), and the like, without departing from the scope of the instant disclosure (identification of a quality value/metric/determination of the frames as well as the types of frames in the video). The duplicate images are identified and discarded. The remaining frames are then subject to keyframe extraction where duplicate frames are removed from the frames remaining in the set, (see Song: Para. 0034-0043, 0080-0092, 0094-0105, 0107-0111 and 0143-0150). 
This reads on the claim concepts of a computer system comprising: a hardware processor operatively coupled to memory; a knowledge engine in communication with the hardware processor, the knowledge engine configured to implement one or more tools to support identification of duplicate media content, the one or more tools comprising): 
an assessment manager configured to conduct a first similarity assessment between the first and second data streams, the first similarity assessment to produce a first distance measurement between the first audio representation and the second audio representation, the first distance measurement to quantify similarity between the first and second sequence of events (Song discloses the disclosed systems and methods automatically select frames of a video file as thumbnails by analyzing various visual quality and aesthetic metrics of video frames, and performing computerized clustering analysis to determine the relevance to video content, thus making resulting thumbnails more representative of the video. automatically generated based upon an ordered or weighted, computationally determined combination of both relevance and visual aesthetic quality analysis of the frames of the video. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster. The disclosed systems and methods exploit two important characteristics commonly associated with meaningful thumbnails: high relevance to video content and superior aesthetic quality. The threshold can correspond to a predetermined distance between features in the representation/vector (distance between the original frame and its contrast normalized version/between the first and second data streams). Examples of content may include videos, text, audio and images. Automatic extraction of a single most representative frame from a video sequence. A video sequence, by nature, has many near duplicate frames. The module 308 selects a thumbnail from each cluster by finding the smallest frame difference value within each cluster. 
A clustering analysis is performed to determine the relevance to video content, thus making the resulting thumbnails more representative of the video (first distance measurement between the first audio representation and the second audio representation), (see Song: Para. 0037-0047, 0054-0055, 0061-0070, 0072-0095, 0108-0112 and 0125-0135). This reads on the claim concepts of an assessment manager configured to conduct a first similarity assessment between the first and second data streams, the first similarity assessment to produce a first distance measurement between the first audio representation and the second audio representation, the first distance measurement to quantify similarity between the first and second sequence of events); 
selectively identify duplicate data in the first and second sequence of events responsive to the first similarity assessment and produced distance measurement (Song discloses the goal of the keyframe analysis of Step 406 is to identify duplicate frames using the least computational energy as possible, keyframe extraction module 306 can employ de-duplication software that employs, for example, a color and edge histogram technique. Automatic extraction of a single most representative frame from a video sequence. For example, one known system uses a sparse dictionary selection approach which aims to reconstruct a video sequence from only a few "basis frames" from the video using a machine learning statistical/regression analysis algorithm, such as, group LASSO (least absolute shrinkage and selection operator). Analysis technique or algorithm. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster (distance measurement). The data associated with the content produces a resolution, focus, pixel quality, size, dimension, color scheme, exposure, sharpness, stillness and white balance. For example, a shot can correspond to a set of features in the feature representation, or set of features within (or across) a dimension of the feature representation. Similarly, a subshot can be identified from a set of features whereby the features correspond to one another at or within a threshold value (first and second sequence of events responsive). The threshold can correspond to a predetermined distance between features in the representation/vector, (see Song: Para. 0034-0047, 0079, 0107-0110 and 0111-0130). 
This reads on the claim concepts of selectively identify duplicate data in the first and second sequence of events responsive to the first similarity assessment and produced distance measurement); and
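For illustration only, the kind of histogram-based de-duplication with a distance threshold that the cited passages describe can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Song: the function names, the four-bin intensity histogram, the Euclidean distance, and the 0.1 threshold are all hypothetical choices, and frames are assumed to be non-empty flat lists of 0-255 pixel values.

```python
from math import sqrt


def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a non-empty frame (flat list of pixel values)."""
    counts = [0] * bins
    width = max_value / bins
    for value in frame:
        # Clamp to the last bin so a value equal to max_value cannot overflow.
        counts[min(int(value / width), bins - 1)] += 1
    total = len(frame)
    return [count / total for count in counts]


def distance(hist_a, hist_b):
    """Euclidean distance between two histograms (hypothetical choice of metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)))


def drop_duplicates(frames, threshold=0.1):
    """Return indices of frames whose histogram is farther than `threshold`
    from every frame already kept; closer frames are treated as duplicates."""
    kept_indices = []
    kept_histograms = []
    for index, frame in enumerate(frames):
        current = histogram(frame)
        if all(distance(current, other) > threshold for other in kept_histograms):
            kept_indices.append(index)
            kept_histograms.append(current)
    return kept_indices


# Hypothetical toy frames: the second is a near duplicate of the first.
frames = [
    [10, 20, 200, 210],
    [12, 22, 198, 215],
    [100, 110, 120, 130],
]
kept_indices = drop_duplicates(frames)
```

With these toy frames the first two fall into identical histogram bins, so only the first and third frames are kept.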
the assessment manager configured to conduct a second similarity assessment between one of the first and second data streams and a source, the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement (Song discloses that a 52-dimensional vector of visual features can be constructed by the thumbnail selection module 308, which captures a set of visual aesthetic properties of a frame(s). A set of visual features designed to capture various aesthetic properties is extracted, and a random forest regression model is trained on a set of images annotated with subjective aesthetic scores. A variety of possible tasks, such as browsing, searching, playing, streaming or displaying various forms of content, including locally stored or uploaded images and/or video, or games (not limited to, video, text, audio, images, and/or any other type of known or to be known multi-media item or object). The disclosed systems and methods automatically select frames of a video file as thumbnails by analyzing various visual quality and aesthetic metrics of video frames, and performing computerized clustering analysis to determine the relevance to video content, thus making resulting thumbnails more representative of the video. Automatically generated based upon an ordered or weighted, computationally determined combination of both relevance and visual aesthetic quality analysis of the frames of the video. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster. K-means clustering groups the unlabeled dataset into different clusters. 
Here K defines the number of pre-defined clusters that need to be created in the process; for example, if K=2, there will be two clusters (the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement). The disclosed systems and methods exploit two important characteristics commonly associated with meaningful thumbnails: high relevance to video content and superior aesthetic quality. The threshold can correspond to a predetermined distance between features in the representation/vector (distance between the original frame and its contrast normalized version/between the first and second data streams). Examples of content may include videos, text, audio and images. Automatic extraction of a single most representative frame from a video sequence. A video sequence, by nature, has many near duplicate frames. The module 308 selects a thumbnail from each cluster by finding the smallest frame difference value within each cluster. A clustering analysis is performed to determine the relevance to video content, thus making the resulting thumbnails more representative of the video (first distance measurement between the first audio representation and the second audio representation), (see Song: Para. 0037-0047, 0054-0055, 0061-0070, 0072-0095, 0108-0112 and 0121-0135). This reads on the claim concepts of the assessment manager configured to conduct a second similarity assessment between one of the first and second data streams and a source, the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement). 
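For illustration only, the cluster-then-select step described in the cited passages (group frames by feature similarity, then pick the frame closest to each cluster centroid) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Song's implementation: the function names, the two-dimensional toy feature vectors, the deterministic initialization from the first K points, and the fixed iteration count are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(points)
    return [sum(p[d] for p in points) / count for d in range(len(points[0]))]


def k_means(points, k, iterations=20):
    """Plain k-means. Centroids start at the first k points (an assumption
    made here for determinism; practical systems usually randomize)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assignment


def representatives(points, k):
    """Index of the point closest to each cluster centroid, one per cluster."""
    centroids, assignment = k_means(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(
                min(members, key=lambda i: squared_distance(points[i], centroids[c]))
            )
    return chosen


# Hypothetical per-frame feature vectors forming two well-separated groups.
frame_features = [
    [0.0, 0.0], [5.0, 5.0], [0.2, 0.0],
    [0.1, 0.1], [5.2, 5.0], [5.1, 5.1],
]
representative_frames = representatives(frame_features, 2)
```

With K=2 and these toy vectors, the sketch selects one frame index per cluster, namely the frame nearest each centroid.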
However, Song does not appear to specifically disclose a data manager configured to utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events, the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are interpreted as natural language text. 
In the same field of endeavor, Nelson discloses a data manager configured to utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events (Nelson discloses the ultimate objective of NLP is to read, decipher, understand, and make sense of the human languages in a manner that is valuable. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important. The use of speech and/or text recognition provides a more favorable user experience by allowing users to manage various aspects of electronic meetings using voice commands and/ or text commands. A data stream is a countably infinite sequence of elements and is used to represent data elements that are made available over time, such as data streams, electronic documents, etc. Audio/video data 300 may be one or more data packets, a data stream, and/or any other form of data that includes audio and/or video information related to an electronic meeting. In the example depicted in FIG. 3, audio/video data 300 includes first meeting content data 302 which, in turn, includes cue 304. Second meeting content data 312 includes JSON that can be used by an electronic meeting application to make decisions about a current electronic meeting. A sequence of events or things is a number of events or things that come one after another in a particular order. The input may be a cue for meeting intelligence apparatus to convert the input into a different format, (see Nelson: Para. 0133, 0197-0206, 0227-0232 and 0237-0240). This reads on the claim concept of a data manager configured to utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events), 
the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are interpreted as natural language text (Nelson discloses the ultimate objective of NLP is to read, decipher, understand, and make sense of the human languages in a manner that is valuable. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important. The use of speech and/or text recognition provides a more favorable user experience by allowing users to manage various aspects of electronic meetings using voice commands and/or text commands. A data stream is a countably infinite sequence of elements and is used to represent data elements that are made available over time, such as data streams, electronic documents, etc. Audio/video data 300 may be one or more data packets, a data stream, and/or any other form of data that includes audio and/or video information related to an electronic meeting. In the example depicted in FIG. 3, audio/video data 300 includes first meeting content data 302 which, in turn, includes cue 304. Second meeting content data 312 includes JSON that can be used by an electronic meeting application to make decisions about a current electronic meeting. A sequence of events or things is a number of events or things that come one after another in a particular order. The input may be a cue for meeting intelligence apparatus to convert the input into a different format. Speech or text recognition logic 400 parses and interprets meeting content data 302 to detect natural language request 406, which is a cue 304 for meeting intelligence apparatus 102 to generate intervention data 310 to be sent to at least node 104A during an electronic meeting. 
Meeting intelligence apparatus may analyze meeting content data using any of a number of tools, such as speech or text recognition, voice or face identification, sentiment analysis, object detection, gestural analysis, thermal imaging, etc. (see Nelson: Para. 0133, 0197-0206, 0227-0232 and 0237-0240). This reads on the claim concept of the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are interpreted as natural language text); and 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the duplicate-frame identification for multimedia streaming audio or video of Song to incorporate natural language processing, as disclosed by Nelson, since both of these mechanisms are directed to speech recognition, a technology that enables a machine or program to identify and understand words or phrases from spoken language and convert them into machine readable format. It is a subfield of computational linguistics that deals with technologies to allow spoken input into systems. Natural Language Processing (NLP), on the other hand, is a branch of artificial intelligence that investigates the use of computers to process or to understand human languages for the purpose of performing useful tasks. NLP is a technology used to simplify speech recognition processes to make them less time consuming. Voice recognition, also referred to as speech recognition, is a technology that offers great advantages for many types of human-machine communication. With speech recognition, computers can understand and interpret spoken words or phrases and convert them into text. It is used primarily for dictation, interface and security. NLP, on the other hand, is a technology that develops methodologies and algorithms that take as input or produce as output unstructured, natural language data. NLP and speech recognition are sometimes used in conjunction in applications such as voice assistants, ASR engines, and speech analytics tools. Speech recognition basically means talking to a computer and getting it to understand and interpret your spoken words. Speech recognition software uses different algorithms to identify spoken languages and convert them into text. As a dictation device, voice recognition can be used to pick up the words you say and type them on a computer. It is also used as an interface and control system for computers. 
The best example of natural language processing is machine translation, which automatically translates text or speech from one language to another. NLP is used to perform tasks such as automatic summarization, topic segmentation, relationship extraction, information retrieval, and speech recognition. Speech recognition identifies and interprets words and phrases in spoken language and converts them into text. Natural Language Processing simply deals with the interaction between humans and computers using a natural language such as English. NLP technology applies machine learning algorithms to text and speech. NLP and speech recognition are often used in conjunction in applications such as voice assistants, engines, and speech analytics tools. Incorporating the teachings of Nelson into Song would produce an approach that includes a user-friendly way for users to join electronic meetings using mobile devices. The approach also allows participants to command and control an electronic meeting using their mobile device, and to receive individualized output, such as meeting transcripts, real-time language translation, messages, prompts, meeting information, and personalized audio streams, as disclosed by Nelson (see Abstract).
Regarding dependent claim(s) 2, the combination of Song and Nelson discloses the system as in claim 1. However, Song does not appear to specifically disclose further comprising the data manager configured to identify the at least one first audio representation in the first data stream and the at least one second audio representation in the second data stream, and the data manager to utilize natural language processing to process the identified first audio representation into a first natural language text representation and to process the identified second audio representation into a second natural language text representation. 
In the same field of endeavor, Nelson discloses further comprising the data manager configured to identify the at least one first audio representation in the first data stream and the at least one second audio representation in the second data stream, and the data manager to utilize natural language processing to process the identified first audio representation into a first natural language text representation and to process the identified second audio representation into a second natural language text representation (Nelson discloses the ultimate objective of NLP is to read, decipher, understand, and make sense of the human languages in a manner that is valuable. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important. The use of speech and/or text recognition provides a more favorable user experience by allowing users to manage various aspects of electronic meetings using voice commands and/ or text commands. A data stream is a countably infinite sequence of elements and is used to represent data elements that are made available over time, such as data streams, electronic documents, etc. Audio/video data 300 may be one or more data packets, a data stream, and/or any other form of data that includes audio and/or video information related to an electronic meeting. In the example depicted in FIG. 3, audio/video data 300 includes first meeting content data 302 which, in turn, includes cue 304. Second meeting content data 312 includes JSON that can be used by an electronic meeting application to make decisions about a current electronic meeting. A sequence of events or things is a number of events or things that come one after another in a particular order. The input may be a cue for meeting intelligence apparatus to convert the input into a different format. 
Speech or text recognition logic 400 parses and interprets meeting content data 302 to detect natural language request 406, which is a cue 304 for meeting intelligence apparatus 102 to generate intervention data 310 to be sent to at least node 104A during an electronic meeting. Meeting intelligence apparatus may analyze meeting content data using any of a number of tools, such as speech or text recognition, voice or face identification, sentiment analysis, object detection, gestural analysis, thermal imaging, etc. (see Nelson: Para. 0133, 0197-0206, 0227-0232 and 0237-0240). This reads on the claim concept of further comprising the data manager configured to identify the at least one first audio representation in the first data stream and the at least one second audio representation in the second data stream, and the data manager to utilize natural language processing to process the identified first audio representation into a first natural language text representation and to process the identified second audio representation into a second natural language text representation). 
Regarding independent claim(s) 8, Song discloses a computer program product for similarity assessment, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by a processor to (Song discloses  computer readable medium (or computer-readable storage medium/ media) stores computer data, which data can include computer program code (or computer-executable instructions) that is executable by a computer. Execution of the sequences of instructions contained in memory 704 causes processing unit 712 to perform one or more of the method steps described herein. In alternative embodiments, hardware, such as ASIC, may be used in place of or in combination with software.  A module can include sub-modules. Software and hardware components of a module may be stored on a computer readable medium for execution by a processor. A web search engine algorithms or related applications by analyzing the returned results against a graded relevance scale of content items in a search engine result set (knowledge engine). A thumbnail engine 300 can be embodied as a stand-alone application that executes on a user device. The identification of a video file for which automatic thumbnail analysis can be performed, as discussed herein, such as, for example, downloading a video, capturing a video, receiving a video (e.g., via sharing), posting a video (e.g., to a social networking platform), and the like, without departing from the scope of the instant disclosure (identification of a quality value/metric/determination of the frames as well as the types of frames in the video). The duplicate images are identified and discarded. The remaining frames are then subject to keyframe extraction where duplicate frames are removed from the frames remaining in the set, (see Song: Para. 0023-0043, 0080-0092, 0094-0105, 0107-0111 and 0143-0150). 
This reads on the claim concepts of a computer program product for similarity assessment, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by a processor to):
conduct a first similarity assessment between the first and second data streams the first similarity assessment to produce a first distance measurement between the first audio representation and the second audio representation, the first distance measurement, the distance measurement to quantify similarity between the first and second sequence of events (Song discloses the disclosed systems and methods automatically select frames of a video file as thumbnails by analyzing various visual quality and aesthetic metrics of video frames, and performing computerized clustering analysis to determine the relevance to video content, thus making resulting thumbnails more representative of the video. automatically generated based upon an ordered or weighted, computationally determined combination of both relevance and visual aesthetic quality analysis of the frames of the video. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster. The disclosed systems and methods exploit two important characteristics commonly associated with meaningful thumbnails: high relevance to video content and superior aesthetic quality. The threshold can correspond to a predetermined distance between features in the representation/vector (distance between the original frame and its contrast normalized version/between the first and second data streams). Examples of content may include videos, text, audio and images. Automatic extraction of a single most representative frame from a video sequence. A video sequence, by nature, has many near duplicate frames. The module 308 selects a thumbnail from each cluster by finding the smallest frame difference value within each cluster. 
A clustering analysis is performed to determine the relevance to video content, thus making the resulting thumbnails more representative of the video (first distance measurement between the first audio representation and the second audio representation), (see Song: Para. 0037-0047, 0054-0055, 0061-0070, 0072-0095, 0108-0112 and 0125-0135). This reads on the claim concepts of conduct a first similarity assessment between the first and second data streams the first similarity assessment to produce a first distance measurement between the first audio representation and the second audio representation, the first distance measurement, the distance measurement to quantify similarity between the first and second sequence of events); 
selectively identify duplicate data in the first and second sequence of events responsive to the first similarity assessment and produced distance measurement (Song discloses the goal of the keyframe analysis of Step 406 is to identify duplicate frames using the least computational energy as possible, keyframe extraction module 306 can employ de-duplication software that employs, for example, a color and edge histogram technique. Automatic extraction of a single most representative frame from a video sequence. For example, one known system uses a sparse dictionary selection approach which aims to reconstruct a video sequence from only a few "basis frames" from the video using a machine learning statistical/regression analysis algorithm, such as, group LASSO (least absolute shrinkage and selection operator). Analysis technique or algorithm. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster (distance measurement). The data associated with the content produces a resolution, focus, pixel quality, size, dimension, color scheme, exposure, sharpness, stillness and white balance. For example, a shot can correspond to a set of features in the feature representation, or set of features within (or across) a dimension of the feature representation. Similarly, a subshot can be identified from a set of features whereby the features correspond to one another at or within a threshold value (first and second sequence of events responsive). The threshold can correspond to a predetermined distance between features in the representation/vector, (see Song: Para. 0034-0047, 0079, 0107-0110 and 0111-0130). 
This reads on the claim concepts of selectively identify duplicate data in the first and second sequence of events responsive to the first similarity assessment and produced distance measurement); 
conduct a second similarity assessment between one of the first and second data streams and a source, the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement (Song discloses that a 52-dimensional vector of visual features can be constructed by the thumbnail selection module 308, which captures a set of visual aesthetic properties of a frame(s). A set of visual features designed to capture various aesthetic properties is extracted, and a random forest regression model is trained on a set of images annotated with subjective aesthetic scores. A variety of possible tasks, such as browsing, searching, playing, streaming or displaying various forms of content, including locally stored or uploaded images and/or video, or games (not limited to, video, text, audio, images, and/or any other type of known or to be known multi-media item or object). The disclosed systems and methods automatically select frames of a video file as thumbnails by analyzing various visual quality and aesthetic metrics of video frames, and performing computerized clustering analysis to determine the relevance to video content, thus making resulting thumbnails more representative of the video. Automatically generated based upon an ordered or weighted, computationally determined combination of both relevance and visual aesthetic quality analysis of the frames of the video. Cluster analysis clusters frames by their visual similarity, and selects the most representative frames, one per cluster, by selecting a frame closest to the centroid (e.g., using the k-means algorithm) or the medoid (using the k-medoids algorithm) of samples within each cluster. K-means clustering groups the unlabeled dataset into different clusters. 
Here K defines the number of pre-defined clusters that need to be created in the process, as if K=2, there will be two clusters (the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement). The disclosed systems and methods exploit two important characteristics commonly associated with meaningful thumbnails: high relevance to video content and superior aesthetic quality. The threshold can correspond to a predetermined distance between features in the representation/vector (distance between the original frame and its contrast normalized version/between the first and second data streams). Examples of content may include videos, text, audio and images. Automatic extraction of a single most representative frame from a video sequence. A video sequence, by nature, has many near duplicate frames. The module 308 selects a thumbnail from each cluster by finding the smallest frame difference value within each cluster. A clustering analysis to determine the relevance to video content, thus making the resulting thumbnails more representative of the video (first distance measurement between the first audio representation and the second audio representation), (see Song: Para. 0037-0047, 054-0055 0061-0070, 0072-0095, 0108-0112 and 0121-0135), This reads on the claim concepts of conduct a second similarity assessment between one of the first and second data streams and a source, the second similarity assessment to produce a second measurement of similarity between the assessed data stream and the source and to selectively present the assessed data stream to the source based on the second measurement). 
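For illustration only, the clustering step described above (group frames by similarity with k-means, then keep the frame closest to each centroid) can be sketched as follows. This is a minimal sketch and not Song's implementation: the function names, the two-dimensional toy feature vectors, the fixed iteration count, and the choice to seed the centroids with the first K frames are all assumptions made for the example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means over a list of equal-length feature vectors."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representative_frames(points, k):
    """Return, per cluster, the index of the frame closest to the centroid."""
    labels, centroids = kmeans(points, k)
    best = {}
    for index, (p, label) in enumerate(zip(points, labels)):
        distance = math.dist(p, centroids[label])
        if label not in best or distance < best[label][1]:
            best[label] = (index, distance)
    return sorted(index for index, _ in best.values())


# Hypothetical per-frame feature vectors forming two visually similar groups.
frames = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.0), (5.1, 5.1)]
print(representative_frames(frames, 2))
```

With K=2 the six toy frames fall into two clusters, and one representative frame index is reported for each cluster.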
However, Song does not appear to specifically disclose utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events, the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are conveyed as natural language text. 
In the same field of endeavor, Nelson discloses utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events, the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are conveyed as natural language text (Nelson discloses the ultimate objective of NLP is to read, decipher, understand, and make sense of the human languages in a manner that is valuable. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important. The use of speech and/or text recognition provides a more favorable user experience by allowing users to manage various aspects of electronic meetings using voice commands and/ or text commands. A data stream is a countably infinite sequence of elements and is used to represent data elements that are made available over time, such as data streams, electronic documents, etc. Audio/video data 300 may be one or more data packets, a data stream, and/or any other form of data that includes audio and/or video information related to an electronic meeting. In the example depicted in FIG. 3, audio/video data 300 includes first meeting content data 302 which, in turn, includes cue 304. Second meeting content data 312 includes JSON that can be used by an electronic meeting application to make decisions about a current electronic meeting. A sequence of events or things is a number of events or things that come one after another in a particular order. The input may be a cue for meeting intelligence apparatus to convert the input into a different format. 
Speech or text recognition logic 400 parses and interprets meeting content data 302 to detect natural language request 406, which is a cue 304 for meeting intelligence apparatus 102 to generate intervention data 310 to be sent to at least node 104A during an electronic meeting. Meeting intelligence apparatus may analyze meeting content data using any of a number of tools, such as speech or text recognition, voice or face identification, sentiment analysis, object detection, gestural analysis, thermal imaging, etc. (see Nelson: Para. 0133, 0197-0206, 0227-0232 and 0237-0240). This reads on the claim concept of utilize natural language processing to convert a first data stream into a first sequence of events and to convert a second data stream into a second sequence of events, the first data stream having at least one first audio representation and the second data stream having at least one second audio representation, wherein the first and second audio representations are conveyed as natural language text);
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the duplicate frames multimedia streaming audio or video of Song in order to have incorporated natural language processing, as disclosed by Nelson, since both of these mechanisms are directed to speech recognition, a technology that enables a machine or program to identify and understand words or phrases from spoken language and convert them into machine readable format. It is a subfield of computational linguistics that deals with technologies to allow spoken input into systems. Natural Language Processing (NLP), on the other hand, is a branch of artificial intelligence that investigates the use of computers to process or to understand human languages for the purpose of performing useful tasks. NLP is a technology used to simplify speech recognition processes to make them less time consuming. Voice recognition, also referred to as speech recognition, is a technology that offers great advantages for many types of human-machine communication. With speech recognition, computers can understand and interpret spoken words or phrases and convert them into text. It is used primarily for dictation, interface and security. NLP, on the other hand, is a technology that develops methodologies and algorithms that take as input or produce as output unstructured, natural language data. NLP and speech recognition are sometimes used in conjunction in applications such as voice assistants, ASR engines, and speech analytics tools. Speech recognition basically means talking to a computer and getting it to understand and interpret your spoken words. Speech recognition software uses different algorithms to identify spoken language and convert it into text. As a dictation device, voice recognition can be used to pick up the words you say and type them on a computer. It is also used as an interface and control system for computers. 
The best example of natural language processing is machine translation, which automatically translates text or speech from one language to another. NLP is used to perform tasks such as automatic summarization, topic segmentation, relationship extraction, information retrieval, and speech recognition. Speech recognition basically means talking to a computer and getting it to understand and interpret your spoken words. It identifies and interprets words and phrases in spoken language and converts them into text by computer. Natural Language Processing simply deals with the interaction between humans and computers using a natural language such as English. NLP technology applies machine learning algorithms to text and speech. NLP and speech recognition are often used in conjunction in applications such as voice assistants, engines, and speech analytics tools. Incorporating the teachings of Nelson into Song would produce an approach that includes a user-friendly way for users to join electronic meetings using mobile devices. The approach also allows participants to command and control an electronic meeting using their mobile device, and to receive individualized output, such as meeting transcripts, real-time language translation, messages, prompts, meeting information, and personalized audio streams, as disclosed by Nelson, (see Abstract).
Regarding claim 9 (drawn to a computer program product): claim 9 is a computer program product claim that corresponds to the system of claim 2. Therefore, claim 9 is rejected for at least the same reasons as the system of claim 2.
Regarding claim 15 (drawn to a method): claim 15 is a method claim that corresponds to the system of claim 1. Therefore, claim 15 is rejected for at least the same reasons as the system of claim 1.
Regarding claim 16 (drawn to a method): claim 16 is a method claim that corresponds to the system of claim 2. Therefore, claim 16 is rejected for at least the same reasons as the system of claim 2.
Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (US 2018/0254070 A1, hereinafter Song) in view of Nelson et al. (US 2019/0273767 A1, hereinafter Nelson) and in view of Tjalve et al. (US 2015/0127347 A1, hereinafter Tjalve). 
Regarding dependent claim(s) 3, the combination of Song and Nelson discloses the system as in claim 2. However, the combination of Song and Nelson does not appear to specifically disclose further comprising the assessment manager configured to leverage a phonetic confusion matrix to identify confusable phrases within the first and second text representations, including the assessment manager to identify a first confusable phrase from one of the first and second phrase representations and to replace the identified first confusable phrase with an equivalent phrase from the matrix. 
In the same field of endeavor, Tjalve discloses further comprising the assessment manager configured to leverage a phonetic confusion matrix to identify confusable phrases within the first and second text representations, including the assessment manager to identify a first confusable phrase from one of the first and second phrase representations and to replace the identified first confusable phrase with an equivalent phrase from the matrix (Tjalve discloses configured to detect recognized speech segments from audio data received via a microphone 136 or other suitable acoustic input device. Recognized speech segments may be provided by the speech recognition system 134 to programs on the end user computing devices, represented by program 1138 and program n 140, based upon the speech grammars of those programs. Receiving an input of text representations of proposed speech grammar terms for a program under development and converting each text representation to a phonetic representation to allow for the identification of potentially confusable speech grammar terms. Speech recognition is the process of converting audio into text. Where a risk of confusion is identified, the speech grammar development tool may recommend a replacement phrase. The suggested replacement phrase may be selected based on data related to localization, synonyms, and/or any other suitable information, (see Tjalve: Para. 0013-0027). This reads on the claim concept of further comprising the assessment manager configured to leverage a phonetic confusion matrix to identify confusable phrases within the first and second text representations, including the assessment manager to identify a first confusable phrase from one of the first and second phrase representations and to replace the identified first confusable phrase with an equivalent phrase from the matrix). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of repetitions in transmitted multimedia streaming audio or video, which includes extracting a plurality of samples from the information stream and representing and identifying similarity in data streams, of Song and Nelson in order to have incorporated speech grammar terms using a weighted similarity matrix for confusion risk, as disclosed by Tjalve, since both of these mechanisms are directed to speech recognition, which works by breaking down the audio of a speech recording into individual sounds, analyzing each sound, using algorithms to find the most probable word fit in that language, and transcribing those sounds into text. Speech is the most natural form of human communication and speech processing has been one of the most exciting research areas of signal processing. The classic way of building a speech recognition system is to build a generative model of language. Then, for each word, there is a pronunciation model that says how this particular word is spoken. Typically it is written out as a sequence of phonemes, which are basic units of sound, but for a given vocabulary it can be treated as a sequence of tokens which represent a cluster of things that have been defined by linguistics experts. Then, the pronunciation models are fed into an acoustic model, which basically defines how a given token sounds. These acoustic models are then used to describe the data itself. Here the data would be x, which is the sequence of frames of audio. Incorporating the teachings of Tjalve into Song and Nelson would produce embodiments that relate to identifying phonetically similar speech grammar terms during computer program development. 
For example, one disclosed embodiment provides a method including providing a speech grammar development tool configured to receive input of a text representation of each of a plurality of proposed speech grammar terms, convert each text representation to a phonetic representation of the speech grammar term, compare the phonetic representation of the speech grammar term to the phonetic representations of other speech grammar terms using a weighted similarity matrix, and provide an output regarding risk of confusion between two proposed speech grammar terms based upon a comparison of the phonetic representations of the two proposed speech grammar terms, as disclosed by Tjalve, (see Abstract).  
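For illustration only, the comparison summarized above (convert each term to a phonetic representation, compare the representations using weighted similarity values, and report a risk of confusion) can be sketched as follows. This is a minimal sketch under stated assumptions and not Tjalve's implementation: the pronunciation table `PHONES`, the substitution weights in `SIMILAR`, the use of a weighted edit distance, and the threshold value are all hypothetical.

```python
# Hypothetical pronunciations; a real tool would derive these from a lexicon.
PHONES = {
    "bye": ["B", "AY"],
    "buy": ["B", "AY"],
    "pie": ["P", "AY"],
    "stop": ["S", "T", "AA", "P"],
}

# Hypothetical weights: a low cost means two phonemes are easily confused.
SIMILAR = {frozenset(("B", "P")): 0.3}


def substitution_cost(a, b):
    """Cost of replacing phoneme a with phoneme b (0.0 when identical)."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset((a, b)), 1.0)


def phonetic_distance(x, y):
    """Weighted edit distance between two phoneme sequences."""
    previous = [float(j) for j in range(len(y) + 1)]
    for i, a in enumerate(x, 1):
        current = [float(i)]
        for j, b in enumerate(y, 1):
            current.append(min(
                previous[j] + 1.0,                          # delete a
                current[j - 1] + 1.0,                       # insert b
                previous[j - 1] + substitution_cost(a, b),  # substitute
            ))
        previous = current
    return previous[-1]


def confusable_pairs(terms, threshold=0.5):
    """List term pairs whose phonetic distance is at or below the threshold."""
    pairs = []
    for i, first in enumerate(terms):
        for second in terms[i + 1:]:
            if phonetic_distance(PHONES[first], PHONES[second]) <= threshold:
                pairs.append((first, second))
    return pairs


print(confusable_pairs(["bye", "pie", "stop"]))
```

In this toy vocabulary only "bye" and "pie" fall under the threshold, so they are the pair flagged as a confusion risk.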
Regarding claim 10 (drawn to a computer program product): claim 10 is a computer program product claim that corresponds to the system of claim 3. Therefore, claim 10 is rejected for at least the same reasons as the system of claim 3. 
Regarding claim 17 (drawn to a method): claim 17 is a method claim that corresponds to the system of claim 3. Therefore, claim 17 is rejected for at least the same reasons as the system of claim 3. 
Claims 4, 5, 11, 12, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (US 2018/0254070 A1, hereinafter Song) in view of Nelson et al. (US 2019/0273767 A1, hereinafter Nelson) and in view of Kali et al. (US 2015/0161214 A1, hereinafter Kali). 
Regarding dependent claim(s) 4, the combination of Song and Nelson discloses the system as in claim 2. However, the combination of Song and Nelson does not appear to specifically disclose further comprising a representation manager configured to form a first sequence of tuple representations for the first data stream and to form a second sequence of tuple representations for the second data stream, and the representation manager to sequentially order the first and second sequences of tuple representations based on time stamp metadata associated with each tuple representation. 
In the same field of endeavor, Kali discloses further comprising a representation manager configured to form a first sequence of tuple representations for the first data stream and to form a second sequence of tuple representations for the second data stream, and the representation manager to sequentially order the first and second sequences of tuple representations based on time stamp metadata associated with each tuple representation (Kali discloses that stream analytics supports compression across all data stream input sources. Data is stored in one or more databases, usually in the form of tables. An event stream may thus be a sequence of timestamped tuples or events. The system includes multiple sequences of events, with each data element having an associated timestamp. The homogeneous schema 128 may include a representation of one or more attributes of the first input data stream and the second input data stream including the common attribute and the first dynamic data type and the second dynamic data type. A stream may be a sequence of timestamped tuples. In some cases, there may be more than one tuple with the same timestamp, (see Kali: Para. 0023-0037, 0044, 0087 and 0124). This reads on the claim concept of further comprising a representation manager configured to form a first sequence of tuple representations for the first data stream and to form a second sequence of tuple representations for the second data stream, and the representation manager to sequentially order the first and second sequences of tuple representations based on time stamp metadata associated with each tuple representation). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of repetitions in transmitted multimedia streaming audio or video, which includes extracting a plurality of samples from the information stream and representing and identifying similarity in data streams, of Song and Nelson in order to have incorporated tuples, a tuple being a finite ordered list (sequence) of elements, as disclosed by Kali, since both of these mechanisms are directed to tuples, a tuple being a collection of objects which is ordered and immutable. Tuples are sequences, just like lists. Stream query processing engines achieve scalability via the notion of promising tuples. Tuples limit the attention of the query processor to a smaller subset of the input tuples that preserve the output features with respect to a specific query preference. This is particularly useful when the output of the models provides data for multiple purposes or the model requires data from multiple data sources to run. An example might be that a model produces a score that will be consumed by a downstream application and also generates additional data that describes the logic that was used to produce the score for auditing purposes. A data stream is an unbounded sequence of events over time. Stream Analytics jobs must include at least one data stream input. Stream Analytics also supports input known as reference data. Reference data is either completely static or changes slowly. It is typically used to perform correlation and lookups. The audio stream is taken from the output of the audio mixer. Thus, the stream received over the network contains the same data as the data sent over HDMI and back to the audio codec for conversion to the analog signal that is available on the DIN connector. The ID of this stream is 1. The format of the audio stream is simpler than the video stream. 
The only two bytes that are sent as a header in front of the raw audio data are the sequence number of the packet since the stream was enabled. This allows for detection of missing packets. Incorporating the teachings of Kali into Song and Nelson would produce a method for detecting patterns across multiple input data streams related to one or more applications. The method includes receiving multiple input data streams and generating one or more dynamic data types for one or more attributes of the input data streams, as disclosed by Kali, (see Abstract). 
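For illustration only, the notion of forming a sequence of tuples for each data stream and ordering the tuples by time stamp can be sketched as follows. This is a minimal sketch and not Kali's implementation: the `Event` tuple layout, the helper names, and the sample records are assumptions made for the example.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    """One tuple of a stream: a time stamp plus hypothetical metadata."""
    timestamp: float
    stream_id: int
    payload: str


def to_ordered_sequence(stream_id, records):
    """Form a sequence of tuples for one stream, ordered by time stamp.

    sorted() is stable, so tuples sharing a time stamp keep arrival order.
    """
    events = (Event(ts, stream_id, payload) for ts, payload in records)
    return sorted(events, key=lambda event: event.timestamp)


def merge_streams(first, second):
    """Interleave two time-ordered sequences into one time-ordered sequence."""
    return list(heapq.merge(first, second, key=lambda event: event.timestamp))


first = to_ordered_sequence(1, [(2.0, "b"), (1.0, "a"), (2.0, "c")])
second = to_ordered_sequence(2, [(1.5, "x"), (2.0, "y")])
for event in merge_streams(first, second):
    print(event)
```

Each stream keeps its own ordered sequence, and the merged view preserves time order even when several tuples carry the same time stamp.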
Regarding dependent claim(s) 5, the combination of Song, Nelson and Kali discloses the system as in claim 4. However, the combination of Song and Nelson does not appear to specifically disclose wherein the conversion of the first data stream into a first sequence of events and the conversion of the second data stream into a second sequence of events further comprising the data manager configured to identify one or more images in each of the first and second data streams and to process the identified one or more images into one or more object representations. 
In the same field of endeavor, Kali discloses wherein the conversion of the first data stream into a first sequence of events and the conversion of the second data stream into a second sequence of events further comprising the data manager configured to identify one or more images in each of the first and second data streams and to process the identified one or more images into one or more object representations (Kali discloses that the dynamic data types may refer to a composite data type identified for one or more attributes of the input data streams. A data stream is a sequence of digitally encoded coherent signals (packets of data or data packets) used to transmit or receive information that is in the process of being transmitted. A data stream is a set of extracted information from a data provider. It contains raw data that was gathered out of users' browser behavior from websites, where a dedicated pixel is placed. Data streams are useful to data scientists as a supply for big data and AI algorithms. These data sets can involve structured data, such as that organized in a database or otherwise according to a structured model, and/or unstructured data (e.g., emails, images, data blobs (binary large objects), web pages, complex event processing), (see Kali: Para. 0050, 0083-0095, 0158 and 0184). Stream analytics supports compression across all data stream input sources. Data is stored in one or more databases, usually in the form of tables. An event stream may thus be a sequence of timestamped tuples or events. The system includes multiple sequences of events, with each data element having an associated timestamp. The homogeneous schema 128 may include a representation of one or more attributes of the first input data stream and the second input data stream including the common attribute and the first dynamic data type and the second dynamic data type. A stream may be a sequence of timestamped tuples. 
In some cases, there may be more than one tuple with the same timestamp, (see Kali: Para. 0023-0037, 0044, 0087 and 0124). This reads on the claim concept of wherein the conversion of the first data stream into a first sequence of events and the conversion of the second data stream into a second sequence of events further comprising the data manager configured to identify one or more images in each of the first and second data streams and to process the identified one or more images into one or more object representations). 
Regarding claim 11 (drawn to a computer program product): claim 11 is a computer program product claim that corresponds to the system of claim 4. Therefore, claim 11 is rejected for at least the same reasons as the system of claim 4. 
Regarding claim 12 (drawn to a computer program product): claim 12 is a computer program product claim that corresponds to the system of claim 5. Therefore, claim 12 is rejected for at least the same reasons as the system of claim 5. 
Regarding claim 18 (drawn to a method): claim 18 is a method claim that corresponds to the system of claim 4. Therefore, claim 18 is rejected for at least the same reasons as the system of claim 4. 
Regarding claim 19 (drawn to a method): claim 19 is a method claim that corresponds to the system of claim 5. Therefore, claim 19 is rejected for at least the same reasons as the system of claim 5. 
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (US 2018/0254070 A1, hereinafter Song) in view of Nelson et al. (US 2019/0273767 A1, hereinafter Nelson), in view of Kali et al. (US 2015/0161214 A1, hereinafter Kali) and in view of Teerlink (US 2012/0233177 A1, hereinafter Teerlink). 
Regarding dependent claim(s) 6, the combination of Song, Nelson and Kali discloses the system as in claim 5. However, the combination of Song, Nelson and Kali does not appear to specifically disclose wherein at least a first subset of tuple representations includes a set of elements selected from the group consisting of: multiple words per object and multiple objects per word. 
In the same field of endeavor, Teerlink discloses wherein at least a first subset of tuple representations includes a set of elements selected from the group consisting of: multiple words per object and multiple objects per word (Teerlink discloses the Huffman coding scheme assigns shorter code words to the more frequent symbols, which helps reduce the size length of the encoded data. Sequences of repeated symbols create a special case that must be considered when tallying tuples. The Huffman code for a symbol is defined as the string of values associated with each path transition from the root to the symbol terminal node. Identifying a subset of symbols which characterize the group. The subset of symbols is used to find similar files from a general population of files to the files in the group of interest, (see Teerlink: Para. 0037-0054, 0063-0079 and 0141). This reads on the claim concept of wherein at least a first subset of tuple representations includes a set of elements selected from the group consisting of: multiple words per object and multiple objects per word). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of repetitions in transmitted multimedia streaming audio or video, which includes extracting a plurality of samples from the information stream and representing and identifying similarity in data streams using tuples as ordered lists (sequences), of Song, Nelson and Kali in order to have incorporated the subset of symbols used to find similar files from a general population of files to the files in the group of interest, as disclosed by Teerlink, since both of these mechanisms are directed to searching one or more input files for lines containing an identical match to a specified pattern. Three, "Beyond Compare," and similar algorithms, are line-by-line comparisons of multiple documents that highlight differences between them. Four, block level data de-duplication has no application in compliance contexts, data relocation, or business intelligence. A multi-way tree is built based on the prefixes of strings. Its nodes store the letters of an alphabet and point to multiple child nodes. Every node consists of multiple branches. Each branch represents a possible character of keys. Every character of the input key is inserted as an individual node. Searching for a prefix in an ordered vocabulary is fast with the help of a binary search algorithm. 
The binary search compares a query prefix against a middle element in the list to see if the prefix comes before or after the middle element. The binary search is then repeated recursively on the correct half of the list while the other half is ignored. Because every step of the binary search halves the range still to be searched, the total search time is proportional to the logarithm of the number of words in the vocabulary. A prefix tree is a data structure that exploits the shared prefixes to speed up the completion. A prefix tree arranges a set of words in a tree of nodes. The words are stored along paths from the root to leaf nodes. The edges correspond to the letters, and the level on the tree corresponds to the letter position of a prefix. The first letter of the word defines which edge from the root to follow. The second letter defines which edge to take from the child node. Incorporating the teachings of Teerlink into Song, Nelson and Kali would produce a method to identify groups of files based on symbols corresponding to an underlying data stream of original bits of data that are determined to be informationally important. The resulting symbols of a selected group are ordered according to how effectively each symbol characterizes the selected group of interest. The subset of symbols is used to find similar files from a general population of files to the files in the group of interest, as disclosed by Teerlink, (see Abstract). 
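For illustration only, the binary search over an ordered vocabulary discussed above can be sketched as follows. This is a minimal sketch, not taken from Teerlink: the function name `complete` and the sample vocabulary are assumptions, and the standard-library `bisect_left` is used to perform the halving search rather than a hand-written recursion.

```python
from bisect import bisect_left


def complete(sorted_words, prefix):
    """Return every word in a sorted vocabulary that starts with prefix.

    bisect_left finds the first position at which prefix could be inserted
    while keeping the list sorted, using a binary search that halves the
    remaining range at each step (logarithmic in the vocabulary size).
    """
    start = bisect_left(sorted_words, prefix)
    matches = []
    # All words sharing the prefix sit next to each other from that position.
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


# Hypothetical vocabulary; it must be sorted for the binary search to be valid.
vocabulary = sorted(["stream", "string", "strip", "tuple", "token", "audio"])
print(complete(vocabulary, "str"))  # ['stream', 'string', 'strip']
```

The search locates the start of the matching range in logarithmic time; collecting the matches then costs time proportional to the number of completions returned.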
Regarding claim 13 (drawn to a computer program product): claim 13 is a computer program product claim that corresponds to the system of claim 6. Therefore, claim 13 is rejected for at least the same reasons as the system of claim 6. 
Regarding claim 20 (drawn to a method): claim 20 is a method claim that corresponds to the system of claim 6. Therefore, claim 20 is rejected for at least the same reasons as the system of claim 6. 
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. (US 2018/0254070 A1, hereinafter Song) in view of Nelson et al. (US 2019/0273767 A1, hereinafter Nelson) and in view of Moreno (US 2003/0101144 A1, hereinafter Moreno). 
Regarding dependent claim(s) 7, the combination of Song and Nelson discloses the system as in claim 1. However, the combination of Song and Nelson does not appear to specifically disclose wherein the distance measurement reflects a quantity of edits required to create equivalency between the first and second sequence of events. 
In the same field of endeavor, Moreno discloses wherein the distance measurement reflects a quantity of edits required to create equivalency between the first and second sequence of events (The distance processor 42 determines a measure of similarity between signatures 40, and generates distance matrices 44, described further below. Vectors indicative of samples in respective segments are generated, and each of the vectors in the segments is correlated to generate a covariance matrix corresponding to the segment. Each graphed point (i,j) indicates the distance between element i in the sequence to element j in the sequence, which is distance measurement reflects a quantity of edits. The distance matrices 44 contain entries of the distance between signatures covariance matrices, generated from the transmission stream. Each of the segments includes a plurality of samples represented as a sequence vector set of respective vectors, each vector being generated from a sample of the transmission stream. An event is any occurrence that your application program is designed to handle. The signatures 40 are received by a distance processor 42, which determines signatures that are similar by comparing them to other signatures. Similarity is determined by computing a distance between signatures in the multidimensional space corresponding to the vectors. Measures are used to show how closely two data sets are related to each other. The main difference between them is the units in which they are measured. The correlation measure is defined to assume between values. An event is any occurrence that your application program is designed to handle, (see Moreno: Para. 0019-0030). This reads on the claim concept of wherein the distance measurement reflects a quantity of edits required to create equivalency between the first and second sequence of events). 
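For illustration only, a distance measurement that reflects a quantity of edits between two sequences of events can be sketched with the well-known Levenshtein edit distance. This is an assumption chosen to illustrate the claim language; it is not the covariance-matrix distance described in Moreno, and the function name and sample event sequences are hypothetical.

```python
def edit_distance(first, second):
    """Count the insertions, deletions and substitutions needed to turn
    one sequence of events into the other (Levenshtein distance)."""
    previous = list(range(len(second) + 1))
    for i, a in enumerate(first, 1):
        current = [i]
        for j, b in enumerate(second, 1):
            current.append(min(
                previous[j] + 1,             # delete a from the first sequence
                current[j - 1] + 1,          # insert b into the first sequence
                previous[j - 1] + (a != b),  # substitute a with b (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical event sequences extracted from two data streams.
first_events = ["open", "speak", "close"]
second_events = ["open", "close"]
print(edit_distance(first_events, second_events))  # 1
```

A distance of zero indicates that the two sequences are already equivalent, and larger values indicate that more edits are required.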
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of repetitions in transmitted multimedia streaming audio or video, which includes extracting a plurality of samples from the information stream and representing and identifying similarity in data streams using tuples as ordered lists (sequences), of Song and Nelson in order to have incorporated edits required to create equivalency, as disclosed by Moreno, since both of these mechanisms are directed to aggregating covariance matrices into a sequence of covariance matrices and comparing each to the other covariance matrices in the sequence to generate a distance matrix. The distance matrix includes a distance value, indicative of the similarity between the covariance matrices, as a result of the comparing of each covariance matrix. The distance matrix is then traversed to determine similar sequences of covariance matrices, wherein determining similar sequences comprises searching for diagonals of similar distance values. The distance matrix, therefore, contains a distance value for each pair of covariance matrices compared. A relatively low distance value between the two covariance matrices is indicative of a high degree of similarity. Incorporating the teachings of Moreno into Song and Nelson would produce detection of repetitions in a transmitted signal such as streaming audio or video, which includes extracting a plurality of samples from the information stream and accumulating the samples into segments comprising an interval of the transmitted signal. A vector indicative of the samples in each of the segments is generated, and each of the vectors in the segments is correlated to generate a covariance matrix, or signature, corresponding to the segment. 
The covariance matrices are aggregated into a sequence of covariance matrices and compared to other covariance matrices to generate a distance matrix. The distance matrix includes a distance value, indicative of the similarity between the covariance matrices, as a result of the comparing of each matrix. The distance matrix is then traversed to determine similar sequences of covariance matrices, as disclosed by Moreno (see Abstract).
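The pipeline summarized above (segments of sample vectors, a covariance matrix per segment as its signature, a distance matrix over all pairs of signatures, and a search for diagonals of low distance values) can be sketched as follows. This is a minimal illustration under stated assumptions, not Moreno's implementation: the passages cited above do not specify the distance metric, so the Frobenius norm is assumed here, the covariance is the population covariance, and all function names, the threshold, and the sample data are hypothetical.

```python
import math


def covariance_matrix(segment):
    """Signature of a segment: population covariance of its sample vectors."""
    n, dim = len(segment), len(segment[0])
    mean = [sum(v[k] for v in segment) / n for k in range(dim)]
    return [[sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in segment) / n
             for b in range(dim)] for a in range(dim)]


def frobenius_distance(m1, m2):
    """Assumed metric: Frobenius norm of the difference of two matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2)))


def distance_matrix(segments):
    """Entry (i, j) is the distance between the signatures of segments i and j."""
    sigs = [covariance_matrix(s) for s in segments]
    return [[frobenius_distance(a, b) for b in sigs] for a in sigs]


def low_distance_diagonals(dist, threshold, min_len):
    """Return (row, column, length) for each run of at least min_len entries
    at or below threshold along a diagonal above the main diagonal."""
    n, runs = len(dist), []
    for offset in range(1, n):
        start = None
        for i in range(n - offset + 1):  # the final index only closes an open run
            low = i < n - offset and dist[i][i + offset] <= threshold
            if low and start is None:
                start = i
            elif not low and start is not None:
                if i - start >= min_len:
                    runs.append((start, start + offset, i - start))
                start = None
    return runs


# Hypothetical stream of six segments in which the pattern A, B, C repeats.
seg_a = [[0, 0], [1, 1], [2, 2]]
seg_b = [[0, 0], [1, -1], [2, -2]]
seg_c = [[0, 0], [3, 0], [6, 0]]
dist = distance_matrix([seg_a, seg_b, seg_c, seg_a, seg_b, seg_c])
repeats = low_distance_diagonals(dist, threshold=1e-9, min_len=2)
```

In this sketch a run starting at (0, 3) with length 3 indicates that segments 0 through 2 repeat as segments 3 through 5, consistent with a low distance value indicating a high degree of similarity.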
Regarding claim 14 (drawn to a computer program product): claim 14 is a computer program product claim that corresponds to the system of claim 7. Therefore, claim 14 is rejected for at least the same reasons as the system of claim 7.
Examiner's Notes
Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner, and the additional related prior art made of record that is considered pertinent to applicant's disclosure to further show the general state of the art.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOHANES Demiss KELEMEWORK whose telephone number is (571)272-8772. The examiner can normally be reached Monday-Friday 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOHANES D KELEMEWORK/Examiner, Art Unit 2164

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164