Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the communication filed on April 5, 2021.
Claims 1-19 are pending in this action. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-7 and 12-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saggi et al. (US 2020/0372066) in view of Khan et al. (WO 2021/030915 A1).
Regarding claim 1, Saggi discloses a method, comprising:  
decoding, by one or more processors, digital audio signals from a set of digital video files (para [0019] "A) receiving multimedia content, wherein the multimedia content includes one or more frames and each of the one or more frames includes one or more audio elements, one or more visual elements, and metadata; B) extracting of the one or more audio elements...");
transcribing, by the one or more processors, the digital audio signals into corresponding digital text data (para [0019] "C) retrieving or generating a transcript of the multimedia content based on the one or more audio elements and the one or more visual elements;"); 
extracting, by the one or more processors, attributes from the digital text data (para [0019] "D) determining a plurality of keywords from the transcript;"); 
Saggi does not disclose determining, by the one or more processors, a mapping between at least a portion of the attributes and an aspect, and generating, by the one or more processors, a set of mapping data that includes the mapping and embedding data corresponding to the attributes and the aspect, wherein the digital text data comprises different words, and each of the different words is assigned an embedding vector, and wherein the aspect is a descriptor associated with a collection of related attributes.
However, Khan does disclose determining, by the one or more processors, a mapping between at least a portion of the attributes and an aspect (para [0024] "...classifying the utterances comprising classifying as one of a question utterance, a statement utterance, a positive answer utterance, a negative answer utterance, a backchannel utterance, and an excluded utterance."), and generating, by the one or more processors, a set of mapping data that includes the mapping and embedding data corresponding to the attributes and the aspect (para [0078] "...the historical data in the corpora can be mapped to the set of labels; for example, question, statement, positive answer, negative answer, backchannel or excluded."), wherein the digital text data comprises different words (para [0078] "The first layer of the GRU network can treat each utterance as a sequence of words..."), and each of the different words is assigned an embedding vector (para [0026] "...each utterance can be represented as a multi-dimensional vector using a word embedding model."), and wherein the aspect is a descriptor associated with a collection of related attributes (para [0024]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Saggi by including "a mapping between at least a portion of the attributes and an aspect, and generating, by the one or more processors, a set of mapping data that includes the mapping and embedding data corresponding to the attributes and the aspect, wherein the digital text data comprises different words, and each of the different words is assigned an embedding vector, and wherein the aspect is a descriptor associated with a collection of related attributes" as taught by Khan, because Khan teaches that performing natural language processing on an audio transcript by classifying extracted attributes can drastically reduce the amount of human labor required to document relevant information contained in said transcript (see para [0002]).

Regarding claim 2, Saggi and Khan disclose the method of claim 1. Khan further discloses further comprising generating, by a client computer system, a visual representation of a mapping between the attributes and the aspect based on at least one of the mapping data and the embedding data (para [0099] "The user interface can display the transcribed word/phrase associated with the removed edit, and each word/phrase's associated contextual linguistic entities...").
Regarding claim 3, Saggi and Khan disclose the method of claim 1. Saggi further discloses further comprising selecting, by the one or more processors, the set of digital video files (para [0055] "...selecting a plurality of the one or more segments...”).
Regarding claim 4, Saggi and Khan disclose the method of claim 1. Khan further discloses wherein extracting the attributes comprises classifying, by a natural language sequence tagging algorithm, the different words within the digital text data (para [0024]).
Regarding claim 5, Saggi and Khan disclose the method of claim 1. Saggi further discloses by a bidirectional encoder representations from transformers language model (para [0120] "...the BERT analysis outputs a plurality of boundaries (e.g., timestamps or transcript markers) that segment the transcript into a plurality of semantically coherent units..."). Khan further discloses wherein extracting the attributes comprises classifying the different words within the digital text data (para [0024]).
Regarding claim 6, Saggi and Khan disclose the method of claim 1. Khan further discloses wherein determining the mapping comprises receiving input commands from a user interface, wherein each input command associates a specific attribute with a specific aspect (para [0096] "With the API, the clinician inputs a location and documentation of different kinds of EMR fields with a specific EMR action type. In this way, local EMR actions can be mapped to a set of generic EMR actions.").
Regarding claim 7, Saggi and Khan disclose the method of claim 1. Khan further discloses wherein determining the mapping comprises identifying clusters of attributes (para [0088] "...can use topic modeling using a topic machine learning model; for example, by performing unsupervised machine learning to form k number of topics (clusters of words) occurring together..."), and assigning each cluster of attributes to a specific aspect (para [0089] "...the system 200 can use topic modelling, for example, to keep track of the focus of each visit, the distribution of word usage, categorization...").

Regarding claim 12, Saggi discloses a method comprising: 
directing, via a client computer system, a cloud server system (para [0084] "...the analysis engine 207 employs a plurality of computing devices that may be arranged, for example, in one or more server banks or computer banks or other arrangements. Such computing devices can be located in a single installation or may be distributed among many different geographical locations. For example, the analysis engine 207 can include a plurality of computing devices that together may include a hosted computing resource...") to decode digital audio signals from a set of digital video files (para [0019]), wherein the cloud server system is connected to the client computer system via a data network (para [0077] "According to one embodiment, the summarization system 100 is operative to transmit and receive transmissions from one or more users 202 via a network 218. In at least one embodiment, access to functions of the system 201 is provided and secured through an application programming interface 220.");
directing, via the client computer system, the cloud server system (para [0084]) to transcribe the digital audio signals into corresponding digital text data (para [0019]); 
directing, via the client computer system, the cloud server system (para [0084]) to extract attributes from the digital text data (para [0019]). 
Saggi does not disclose directing, via the client computer system, the cloud server system to determine a mapping between the attributes and an aspect, directing, via the client computer system, the cloud server system to generate a set of mapping data and embedding data corresponding to the attributes and the aspect, and generating, via the client computer system, a visual representation of a mapping between the attributes and the aspect based on at least one of the mapping data and the embedding data, wherein the digital text data comprises different words, and each of the different words is assigned an embedding vector. 
Khan does disclose directing, via the client computer system, the cloud server system (para [0074] "In some cases, functions of the above modules can be executed on remote computing devices, such as centralized servers and cloud computing resources communicating over the network module 276.") to determine a mapping between the attributes and an aspect (para [0024]), directing, via the client computer system, the cloud server system (para [0074]) to generate a set of mapping data and embedding data corresponding to the attributes and the aspect (para [0078]), and generating, via the client computer system, a visual representation of a mapping between the attributes and the aspect based on at least one of the mapping data and the embedding data (para [0099]), wherein the digital text data comprises different words (para [0078]), and each of the different words is assigned an embedding vector (para [0026]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Saggi by including "determine a mapping between the attributes and an aspect, directing, via the client computer system, the cloud server system to generate a set of mapping data and embedding data corresponding to the attributes and the aspect, and generating, via the client computer system, a visual representation of a mapping between the attributes and the aspect based on at least one of the mapping data and the embedding data, wherein the digital text data comprises different words, and each of the different words is assigned an embedding vector" as taught by Khan, because Khan teaches that performing natural language processing on an audio transcript by classifying extracted attributes can drastically reduce the amount of human labor required to document relevant information contained in said transcript (see para [00077).
Regarding claim 13, Saggi and Khan disclose the method of claim 12. Khan further discloses wherein to transcribe the digital audio signals (para [0074]), the cloud server system performs an on-demand function (para [0074]).
Regarding claim 14, Saggi and Khan disclose the method of claim 12. Khan further discloses wherein to extract attributes (para [0075]), the cloud server system performs an on-demand function (para [0074]).

Claim(s) 8 and 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saggi et al. (US 2020/0372066) in view of Khan et al. (WO 2021/030915 A1) as applied to claim 7 above, and further in view of Janakiraman et al. (US 10,628,264).
Regarding claim 8, Saggi and Khan disclose the method of claim 7. Khan further discloses on a set of word embedding vectors corresponding to the attributes (para [0026]). Neither Saggi nor Khan disclose wherein identifying the clusters comprises performing a density based spatial clustering of applications with noise operation. Janakiraman does disclose wherein identifying the clusters comprises performing a density based spatial clustering of applications with noise operation (col 7, ln 23-26 "...cluster analysis engine 225 can implement other cluster analysis techniques such as Density-based Spatial Clustering of Applications with Noise (DBSCAN)...").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Saggi in view of Khan by including identifying the clusters comprises performing a density based spatial clustering of applications with noise operation as taught by Janakiraman, because Janakiraman teaches that DBSCAN is one of multiple techniques commonly used in clustering analysis (see col 7, ln 18-30).
Regarding claim 9, Saggi, Khan, and Janakiraman disclose the method of claim 8. Khan further discloses wherein the word embedding vectors are standardized embedding vectors (para [0078] "...each word/utterance can be represented as a multi-dimensional (for example, 200-dimensional) vector using a word embedding model...").
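For illustration only, the operation recited in claims 8 and 9 (a DBSCAN operation performed on standardized word embedding vectors) can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference: the two-dimensional toy vectors, the z-score standardization, and the parameter names `eps` and `min_pts` are all assumptions chosen for readability.

```python
import math

def standardize(vectors):
    """Scale every dimension to zero mean and unit variance (z-score)."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero on constant dimensions
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reach)
    return labels

# Hypothetical 2-D "embedding vectors" for seven attributes (real ones would have many more dimensions).
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
           [10.0, -10.0]]
labels = dbscan(standardize(vectors), eps=0.5, min_pts=2)
print(labels)
```

In this sketch each resulting cluster id would correspond to one candidate aspect, and a label of -1 marks an attribute that was not assigned to any cluster.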

Claim(s) 10-11 and 15-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saggi et al. (US 2020/0372066) in view of Khan et al. (WO 2021/030915 A1) as applied to claims 1 and 12 above, and further in view of Benkreira et al. (US 2020/0251102).
Regarding claim 10, Saggi and Khan disclose the method of claim 1. Saggi further discloses according to a bidirectional encoder representations from transformers language model (para [0120]). Neither Saggi nor Khan disclose further comprising generating, by the one or more processors, a sentiment score for the aspect. However, Benkreira does disclose further comprising generating, by the one or more processors, a sentiment score for the aspect (para [0003] "a sentiment analysis model to analyze a sentiment of the user in relation to the topic... sentiment analysis model determines one or more second scores... based on the communication").
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi and Khan with those of Benkreira because sentiment analysis could be used to make decisions based on a user's satisfaction with regard to a specific topic or subject of interest (see para [0060]).
Regarding claim 11, Saggi, Khan, and Benkreira disclose the method of claim 10. Benkreira further discloses further comprising generating, by a client computer system connected to the one or more processors via a data network, a sentiment contour based on the sentiment score (para [0059] "...the sentiment analysis model may track one or more scores during a duration of the communication (from time To to time Te) to determine a representative performance score... based on a change in one or more sentiment scores during the duration of the communication... Accordingly, the graph may indicate an overall sentiment of the user during the communication."). 
Regarding claim 15, Saggi and Khan disclose the method of claim 12. Saggi further discloses according to a bidirectional encoder representations from transformers language model (para [0120]). Neither Saggi nor Khan disclose further comprising directing the cloud server system to generate a sentiment score for the aspect. Benkreira does disclose further comprising directing the cloud server system to generate a sentiment score for the aspect (para [0003]).
It would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi and Khan with those of Benkreira because sentiment analysis could be used to make decisions based on a user's satisfaction with regard to a specific topic or subject of interest (see para [0060]).
Regarding claim 16, Saggi, Khan, and Benkreira disclose the method of claim 15. Benkreira further discloses the cloud server performs an on-demand function (para [0074]). Benkreira further discloses wherein to generate the sentiment score (para [0003]).

Claim(s) 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saggi et al. (US 2020/0372066) in view of Khan et al. (WO 2021/030915 A1) and further in view of Shum et al. (US 2020/0380980) and Bao et al. (US 2018/0121443).
Regarding claim 17, Saggi discloses a device, comprising: 
a non-transitory memory storing instructions (para [0130] "...such computer-readable media can comprise various forms of data storage devices or media such as... any other medium which can be used to carry or store computer program code in the form of computer-executable instructions or data structures and which can be accessed by a computer."); and
one or more processors in communication with the non-transitory memory (para [0084] "...the analysis engine 207 includes one or more processors..."), wherein the one or more processors execute the instructions to: receive a user interface request (para [0054] "... The system or process may accept an input requesting one or more salient moments from original content.”); and
generate a visual display responsive to the user interface request (para [0128] "...the moments are merged into a final summarization and the final summarization (or a visual display thereof)...”). Saggi does not disclose receive mapping data that associates a set of mapped attributes with an aspect, receive word embedding vectors corresponding to the set of mapped attributes and word embedding vectors corresponding to a set of unmapped attributes, by: calculating a vector space distance between the aspect and each unmapped attribute, sorting the unmapped attributes according to each vector space distance, and rendering a display image depicting the unmapped attributes in a sorted order. 
However, Khan does disclose receive mapping data that associates a set of mapped attributes with an aspect (para [0037] "...machine learning model trained using one or more corpora of historical data comprising previous textual data labelled with attributes..."), receive word embedding vectors corresponding to the set of mapped attributes and word embedding vectors corresponding to a set of unmapped attributes (para [0078] "...each word/utterance can be represented as a multi-dimensional (for example, 200-dimensional) vector using a word embedding model (for example, the Wikipedia-PubMed word embedding model)... The GRU neural network can be trained using one or more suitable corpora of historical data... mapped to the set of labels; for example, question, statement, positive answer, negative answer, backchannel or excluded."). 
It would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi and Khan because mapping data and word embedding data generated by the system could be used to generate a display for the user which they could use to assess the validity of mapping decisions made by the system (see para [0099]). Neither Saggi nor Khan disclose by: calculating a vector space distance between the aspect and each unmapped attribute, sorting the unmapped attributes according to each vector space distance, and rendering a display image depicting the unmapped attributes in a sorted order. However, Shum does disclose by: calculating a vector space distance between the aspect and each unmapped attribute (para [0268] "...device 800 determines a representation (e.g., embedding) of the speech input and compares the determined embedding to each of a plurality of (e.g., 40) embeddings included in the user's speaker profile. For example, for each of the plurality of embeddings, device 800 computes a distance metric (e.g., normalized cosine distance) between the respective embedding and the determined embedding.").
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi, Khan, and Shum because calculating a distance between vector representations can be used to determine similarity between an unmapped attribute and attributes which have previously been mapped (see para [0268]). Neither Saggi, Khan, nor Shum disclose sorting the unmapped attributes according to each vector space distance, and rendering a display image depicting the unmapped attributes in a sorted order. However, Bao does disclose sorting the unmapped attributes according to each vector space distance (para [0098] "the NLPS may form an ordered list 450 of words or phrases. The word or phrase associated with corpus vector 522 may be ranked first within the list 450 because corpus vector 522 was determined to be the most similar vector to query vector 440."), and rendering a display image depicting the unmapped attributes in a sorted order (para [0125] "The ranked list 700 may be displayed upon a display of the client device... The ranked list 700 includes one or more words or phrase that were included within information corpus 331 and were deemed similar"). 
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi, Khan, Shum, and Bao because sorting and displaying attributes according to semantic distances would provide a user with further insight regarding the text being processed (see para [0002]).
Regarding claim 18, Saggi, Khan, Shum, and Bao disclose the device of claim 17. Shum further discloses wherein each vector space distance is calculated according to a cosine distance function and scaled according to a standardization function (para [0268]).
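For illustration only, the limitations discussed for claims 17 and 18 (calculating a vector space distance between the aspect and each unmapped attribute, scaling it, sorting, and displaying in sorted order) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the application or from Shum or Bao: the attribute names, the toy two-dimensional vectors, the use of a single `aspect_vector`, and the z-score form of the standardization function are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def standardize(values):
    """Z-score scaling: subtract the mean and divide by the standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values)) or 1.0
    return [(x - mean) / std for x in values]

# Hypothetical embedding for the aspect and for three unmapped attributes.
aspect_vector = [1.0, 0.0]
unmapped = {
    "price": [0.9, 0.1],
    "shipping": [0.0, 1.0],
    "cost": [1.0, 0.0],
}

names = list(unmapped)
raw = [cosine_distance(aspect_vector, unmapped[name]) for name in names]
scaled = standardize(raw)

# Sort the unmapped attributes from nearest to farthest from the aspect.
ranked = sorted(zip(names, scaled), key=lambda pair: pair[1])

# A plain-text stand-in for "rendering a display image" in sorted order.
for name, score in ranked:
    print(f"{name}\t{score:+.2f}")
```

Because standardization is a monotonic rescaling, it changes the displayed scores but not the sorted order of the attributes.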

Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Saggi et al. (US 2020/0372066) in view of Khan et al. (WO 2021/030915 A1) and further in view of Shum et al. (US 2020/0380980) and Bao et al. (US 2018/0121443) as applied to claim 17 above, and further in view of Larcheveque et al. (US 2011/0238409).
Regarding claim 19, Saggi, Khan, Shum, and Bao disclose the device of claim 17, but do not disclose wherein each vector space distance is calculated to be a minimum distance between word embedding vectors associated with the mapped attributes and a word embedding vector associated with an unmapped attribute. Larcheveque does disclose wherein each vector space distance is calculated to be a minimum distance between word embedding vectors associated with the mapped attributes and a word embedding vector associated with an unmapped attribute (para [0082] "A number of metrics may be used to measure a distance between a graph of an utterance and a matching graph pattern in the conversational agent's knowledge. These metrics may combine one or more of the following quantities algebraically... The semantic distance between the trait values in a matching pair.").
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Saggi, Khan, Shum, Bao, and Larcheveque because the semantic distance between two attributes may more accurately be calculated by comparing unclassified attributes with known attributes previously classified (see para [0075]).
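For illustration only, the limitation of claim 19 (a vector space distance calculated as the minimum distance between the word embedding vectors of the mapped attributes and the word embedding vector of an unmapped attribute) can be sketched as follows. This is a minimal sketch and is not drawn from the application or from Larcheveque: the toy vectors are hypothetical, and cosine distance is assumed as the underlying metric because claim 19 itself does not name one.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def min_distance_to_aspect(unmapped_vector, mapped_vectors):
    """Distance from an unmapped attribute to its nearest mapped attribute."""
    return min(cosine_distance(unmapped_vector, m) for m in mapped_vectors)

# Hypothetical embeddings: two attributes already mapped to the aspect, one unmapped.
mapped_vectors = [[1.0, 0.0], [0.0, 1.0]]
unmapped_vector = [0.6, 0.8]

distance = min_distance_to_aspect(unmapped_vector, mapped_vectors)
print(round(distance, 3))
```

Under this reading the aspect is represented by the set of its mapped attributes rather than by a single vector, so an unmapped attribute is as close to the aspect as it is to the nearest attribute already mapped to it.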



Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Abul K. Azad whose telephone number is (571) 272-7599. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453.
Any response to this action should be mailed to:
Commissioner for Patents 
P.O. Box 1450
Alexandria, VA 22313-1450
Or faxed to: (571) 273-8300.
Hand-delivered responses should be brought to 401 Dulany Street, Alexandria, VA 22314 (Customer Service Window).
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
					
December 17, 2022
/ABUL K AZAD/ Primary Examiner, Art Unit 2656