Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on March 30, 2021, April 14, 2022, and April 28, 2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Solomon (U.S. Publication No. 20180233141).
Regarding claim 1, Solomon discloses a method for mining an entity focus in a text ([0032] - The present disclosure relates generally to systems, methods and logical constructs for providing intelligent assistance to users), the method comprising:
performing word and phrase feature extraction on an input text ([0047] - Audio data from the audio processor 134 may be transformed by feature extractor 136 into data for processing by a speech recognition engine 140 of the speech recognition program 120);
inputting an extracted word and phrase feature into a text coding network for coding, to obtain a coding sequence of the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words);
processing the coding sequence of the input text using a core entity labeling network to predict a position of a core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100);
extracting a subsequence corresponding to the core entity in the input text from the coding sequence of the input text, based on the position of the core entity in the input text ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
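For illustration only, the data flow recited in claim 1 (coding, core entity labeling, subsequence extraction, focus labeling) can be sketched in Python as below. The sketch is hypothetical and is not drawn from the application or from Solomon; the function name mine_entity_focus and the callables standing in for the three networks are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def mine_entity_focus(
    features: Sequence[Vector],
    encode: Callable[[Sequence[Vector]], List[Vector]],
    label_core_entity: Callable[[List[Vector]], Span],
    label_focus: Callable[[List[Vector], List[Vector]], Span],
) -> Tuple[Span, Span]:
    """Hypothetical data flow of the steps recited in claim 1."""
    # Text coding network: word and phrase features -> coding sequence.
    coding_sequence = encode(features)
    # Core entity labeling network: predict the core entity position.
    core_span = label_core_entity(coding_sequence)
    start, end = core_span
    # Extract the part of the coding sequence covering the core entity.
    entity_subsequence = coding_sequence[start:end + 1]
    # Focus labeling network: conditioned on the full coding sequence
    # and on the core entity subsequence.
    focus_span = label_focus(coding_sequence, entity_subsequence)
    return core_span, focus_span


if __name__ == "__main__":
    feats = [[0.1], [0.9], [0.2], [0.8]]
    core, focus = mine_entity_focus(
        feats,
        encode=lambda f: [list(v) for v in f],  # identity stand-in
        label_core_entity=lambda seq: (1, 1),   # fixed stand-in
        label_focus=lambda seq, ent: (3, 3),    # fixed stand-in
    )
    print(core, focus)  # (1, 1) (3, 3)
```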
Regarding claim 2, Solomon discloses the method, wherein, the performing word and phrase feature extraction on an input text, comprises:
performing word and phrase hybrid embedding on a word sequence corresponding to the input text to obtain a corresponding word feature vector ([0048] - Feature extractor 136 may utilize any suitable dimensionality reduction techniques to process the audio data and generate feature vectors 142. Example techniques include using mel-frequency cepstral coefficients (MFCCs), linear discriminant analysis, deep neural network techniques, etc.);
performing position embedding on characters in the word sequence corresponding to the input text to obtain a corresponding position feature vector ([0155] - The position identifier 106 may be configured to output an entity position (i.e., location) 114 of a detected entity. In other words, the position identifier 106 may predict the current position of a given entity based on collected sensor data, and output such information as entity position 114);
performing named entity recognition on the input text, and generating a named entity type feature vector representing a type of a named entity based on a result of the named entity recognition ([0051] - In some examples, the speech recognition engine 140 may utilize Hidden Markov models (HMMs) to match feature vectors 142 with phonemes and/or other speech components. [0152] - The entity identifier 104 may output an entity identity 112 of a detected entity, and such entity identity may have any suitable degree of specificity. In other words, based on received sensor data, the entity tracker 100 may predict the identity of a given entity, and output such information as entity identity 112);
and splicing the word feature vector, the position feature vector and the named entity type feature vector corresponding to the input text to form a word and phrase feature vector of the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words).
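As a hypothetical illustration of the splicing step of claim 2, the per-token concatenation of a word feature vector, a position feature vector, and a named entity type feature vector might look as follows. The sinusoidal position embedding and the four-type tag set are assumed stand-ins, not details taken from the application or from Solomon.

```python
import math
from typing import List, Sequence

# Hypothetical named entity type inventory; "O" marks tokens outside any entity.
NE_TYPES = ["PER", "LOC", "ORG", "O"]


def position_vector(index: int, dim: int = 4) -> List[float]:
    """Sinusoidal (Transformer-style) position embedding, used as a stand-in."""
    vec = []
    for j in range(dim):
        angle = index / (10000 ** ((j - j % 2) / dim))
        vec.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return vec


def ne_type_vector(tag: str) -> List[float]:
    """One-hot encoding of the named entity type."""
    return [1.0 if t == tag else 0.0 for t in NE_TYPES]


def splice_features(word_vectors: Sequence[List[float]],
                    ne_tags: Sequence[str]) -> List[List[float]]:
    """Concatenate word, position, and named entity type vectors per token."""
    return [list(w) + position_vector(i) + ne_type_vector(tag)
            for i, (w, tag) in enumerate(zip(word_vectors, ne_tags))]
```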
Regarding claim 3, Solomon discloses the method, wherein, the generating a named entity type feature vector representing a type of a named entity based on a result of the named entity recognition, comprises: performing part of speech labeling on a target phrase, in response to not recognizing a type of a named entity of the target phrase in the input text, and generating the named entity type feature vector based on the type of the named entity recognized from the input text and a part of speech labeling result of the target phrase ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language. [0132] - the intent handler 50 may be unable to recognize a particular surface form it receives. The intent handler 50 may clarify this surface form via one or more grounding and repairing techniques. In this manner and going forward, the unrecognized surface form subsequently may be correlated with the clarified surface form, whereby the intent handler 50 now may recognize the previously-unrecognized surface form).
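The fallback recited in claim 3 (using a part of speech label when no named entity type is recognized for a phrase) can be sketched as follows. The tag strings, the POS_ prefix, and the pos_tagger callable are hypothetical; the resulting labels could then be one-hot encoded into the named entity type feature vector.

```python
from typing import Callable, List, Sequence


def entity_type_labels(tokens: Sequence[str],
                       ner_tags: Sequence[str],
                       pos_tagger: Callable[[str], str]) -> List[str]:
    """Use the recognized NE type where available; otherwise fall back to a
    part of speech label for the unrecognized phrase."""
    labels = []
    for token, tag in zip(tokens, ner_tags):
        if tag != "O":
            labels.append(tag)  # named entity type was recognized
        else:
            labels.append("POS_" + pos_tagger(token))  # fallback: POS label
    return labels
```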
Regarding claim 4, Solomon discloses the method, wherein the processing the coding sequence of the input text using a core entity labeling network to predict a position of a core entity in the input text, comprises: inputting the coding sequence of the input text into the core entity labeling network to predict a probability of each word string in the input text being the core entity, and labeling a starting position and an ending position of the core entity respectively using a double pointer based on the probability of each word string in the input text being the core entity ([0218] - The threshold data 820 may include an entity identification threshold 822, an entity position/location threshold 824, and an entity status threshold 826. Each of these thresholds may be defined as a probability. When an entity identity, location, or status is determined to have a detection probability that exceeds the threshold probability for that entity identity, location, or status, a detection of that entity identity, location, or status may be indicated and/or recorded).
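For illustration, a common way to decode double-pointer (start/end) labels from per-token probabilities, as recited in claim 4 (and, for the focus, in claim 6), is sketched below. The 0.5 threshold and the nearest-following-end pairing rule are assumptions, not features taken from the application or from Solomon.

```python
from typing import List, Sequence, Tuple


def decode_double_pointer(start_probs: Sequence[float],
                          end_probs: Sequence[float],
                          threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pair start/end pointers whose probabilities reach the threshold.

    Each start pointer is matched to the nearest end pointer at or after it.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans
```

For example, start probabilities [0.1, 0.9, 0.2, 0.1] and end probabilities [0.1, 0.2, 0.8, 0.1] decode to the single span (1, 2).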
Regarding claim 5, Solomon discloses the method, wherein, the predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text, comprises:
acquiring a priori feature of the focus of the core entity constructed based on a focus repository of the core entity ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and generating a first focus feature of the input text, based on the coding sequence of the input text, the priori feature of the focus of the core entity, and the subsequence corresponding to the core entity in the input text, and inputting the first focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language. [0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
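A hypothetical sketch of the first focus feature of claim 5 follows, assuming the focus repository is a mapping from an entity to its known focus words and that the core entity subsequence is mean-pooled into one vector. None of these choices is taken from the application or from Solomon.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def prior_focus_feature(tokens: Sequence[str], entity: str,
                        focus_repository: Dict[str, Set[str]]) -> List[float]:
    """1.0 for tokens listed as known foci of the entity in the repository."""
    known = focus_repository.get(entity, set())
    return [1.0 if tok in known else 0.0 for tok in tokens]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average the core entity subsequence into a single vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def first_focus_feature(coding_sequence: Sequence[Vector],
                        prior: Sequence[float],
                        entity_subsequence: Sequence[Vector]) -> List[Vector]:
    """Per token: [coding vector | prior score | pooled entity vector]."""
    entity_vec = mean_pool(entity_subsequence)
    return [list(code) + [p] + entity_vec
            for code, p in zip(coding_sequence, prior)]
```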
Regarding claim 6, Solomon discloses the method, wherein, the inputting the first focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network, comprises: inputting the first focus feature of the input text into the focus labeling network to predict a probability of each word string in the input text being the focus of the core entity, and labeling a starting position and an ending position of the focus of the core entity respectively using a double pointer based on the probability of each word string in the input text being the focus of the core entity ([0218] - The threshold data 820 may include an entity identification threshold 822, an entity position/location threshold 824, and an entity status threshold 826. Each of these thresholds may be defined as a probability. When an entity identity, location, or status is determined to have a detection probability that exceeds the threshold probability for that entity identity, location, or status, a detection of that entity identity, location, or status may be indicated and/or recorded).
Regarding claim 7, Solomon discloses the method, wherein, the input text comprises:
labeling information of the core entity and the corresponding focus ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and wherein the method further comprises:

determining prediction errors of positions of the core entity and the corresponding focus in the input text, based on the labeling information of the core entity and the corresponding focus in the input text, and iteratively adjusting parameters in a model for labeling the focus of the core entity using a backpropagation method to obtain a trained model for labeling the focus of the core entity, the model for labeling the focus of the core entity comprising the text coding network, the core entity labeling network, and the focus labeling network ([0132] - training the intent handler 50 in a real-time or batch-mode manner to correlate an unrecognized surface form with a newly resolved surface form. [0151] - Each of the entity identifier 104, person identifier 105, position identifier 106, and status identifier 108 is configured to interpret and evaluate sensor data received from the plurality of sensors 102, and to output context information 110 based on the sensor data. Context information 110 may include the entity tracker's guesses/predictions as to an identity, position, and/or status of one or more detected entities based on received sensor data. As will be described in more detail below, each of the entity identifier 104, person identifier 105, position identifier 106, and status identifier 108 may output their predictions/identifications along with a confidence value).
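The training recited in claim 7 adjusts model parameters by backpropagating position prediction errors. The sketch below reduces this to a single linear pointer scorer trained by gradient descent on binary cross-entropy; the single-layer model, learning rate, and epoch count are assumptions, and a full model would backpropagate through the text coding, core entity labeling, and focus labeling networks together.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_pointer_scorer(codes: Sequence[List[float]], labels: Sequence[int],
                         lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit w so that sigmoid(w . code) predicts labels (1 = boundary token).

    Gradient descent on binary cross-entropy; for a single linear layer the
    backpropagated gradient with respect to the logit is simply (p - y).
    """
    w = [0.0] * len(codes[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(codes, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # prediction error against the labeling information
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * g / len(codes) for wi, g in zip(w, grad)]
    return w
```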
Regarding claim 8, Solomon discloses the method, wherein the method further comprises:
extracting a relative position feature of the core entity and the corresponding focus based on the labeling information of the core entity and the corresponding focus in the input text and coding the relative position feature to obtain a relative position feature sequence ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100);
and the predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language), comprises:
acquiring a priori feature of the focus of the core entity constructed based on a focus repository of the core entity ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
splicing a result of coding the subsequence corresponding to the core entity in the input text with the relative position feature sequence to obtain a coding sequence of the core entity in the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words);
and generating a second focus feature of the input text, based on the coding sequence of the input text, the priori feature of the focus of the core entity, and the coding sequence corresponding to the core entity in the input text, and inputting the second focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language. [0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
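One hypothetical way to form the relative position feature sequence of claim 8 and splice it onto the coded core entity subsequence is sketched below, assuming a clipped signed token distance to the focus span encoded as a one-hot vector. The clipping distance and the encoding are assumptions, not taken from the application or from Solomon.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def relative_position_sequence(entity_span: Span, focus_span: Span,
                               max_dist: int = 3) -> List[List[float]]:
    """One-hot, clipped signed distance from each entity token to the focus."""
    f_start, f_end = focus_span
    sequence = []
    for i in range(entity_span[0], entity_span[1] + 1):
        if i < f_start:
            d = i - f_start
        elif i > f_end:
            d = i - f_end
        else:
            d = 0  # token lies inside the focus span
        d = max(-max_dist, min(max_dist, d))
        sequence.append([1.0 if k == d + max_dist else 0.0
                         for k in range(2 * max_dist + 1)])
    return sequence


def splice_entity_coding(entity_codes: Sequence[List[float]],
                         rel_sequence: Sequence[List[float]]) -> List[List[float]]:
    """Concatenate each coded entity token with its relative position vector."""
    return [list(c) + r for c, r in zip(entity_codes, rel_sequence)]
```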
Regarding claim 9, Solomon discloses an electronic device ([0032] - The present disclosure relates generally to systems, methods and logical constructs for providing intelligent assistance to users), comprising:
one or more processors (Figure 3 - Processor 128);
and a storage configured to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform operations (Figure 3 - Memory 126), the operations comprising:
performing word and phrase feature extraction on an input text ([0047] - Audio data from the audio processor 134 may be transformed by feature extractor 136 into data for processing by a speech recognition engine 140 of the speech recognition program 120);
inputting an extracted word and phrase feature into a text coding network for coding, to obtain a coding sequence of the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words);
processing the coding sequence of the input text using a core entity labeling network to predict a position of a core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100);
extracting a subsequence corresponding to the core entity in the input text from the coding sequence of the input text, based on the position of the core entity in the input text ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
Regarding claim 10, Solomon discloses the electronic device, wherein, the performing word and phrase feature extraction on an input text, comprises:
performing word and phrase hybrid embedding on a word sequence corresponding to the input text to obtain a corresponding word feature vector ([0048] - Feature extractor 136 may utilize any suitable dimensionality reduction techniques to process the audio data and generate feature vectors 142. Example techniques include using mel-frequency cepstral coefficients (MFCCs), linear discriminant analysis, deep neural network techniques, etc.);
performing position embedding on characters in the word sequence corresponding to the input text to obtain a corresponding position feature vector ([0155] - The position identifier 106 may be configured to output an entity position (i.e., location) 114 of a detected entity. In other words, the position identifier 106 may predict the current position of a given entity based on collected sensor data, and output such information as entity position 114);
performing named entity recognition on the input text, and generating a named entity type feature vector representing a type of a named entity based on a result of the named entity recognition ([0051] - In some examples, the speech recognition engine 140 may utilize Hidden Markov models (HMMs) to match feature vectors 142 with phonemes and/or other speech components. [0152] - The entity identifier 104 may output an entity identity 112 of a detected entity, and such entity identity may have any suitable degree of specificity. In other words, based on received sensor data, the entity tracker 100 may predict the identity of a given entity, and output such information as entity identity 112);
and splicing the word feature vector, the position feature vector and the named entity type feature vector corresponding to the input text to form a word and phrase feature vector of the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words).
Regarding claim 11, Solomon discloses the electronic device, wherein, the generating a named entity type feature vector representing a type of a named entity based on a result of the named entity recognition, comprises: performing part of speech labeling on a target phrase, in response to not recognizing a type of a named entity of the target phrase in the input text, and generating the named entity type feature vector based on the type of the named entity recognized from the input text and a part of speech labeling result of the target phrase ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language. [0132] - the intent handler 50 may be unable to recognize a particular surface form it receives. The intent handler 50 may clarify this surface form via one or more grounding and repairing techniques. In this manner and going forward, the unrecognized surface form subsequently may be correlated with the clarified surface form, whereby the intent handler 50 now may recognize the previously-unrecognized surface form).
Regarding claim 12, Solomon discloses the electronic device, wherein the processing the coding sequence of the input text using a core entity labeling network to predict a position of a core entity in the input text, comprises: inputting the coding sequence of the input text into the core entity labeling network to predict a probability of each word string in the input text being the core entity, and labeling a starting position and an ending position of the core entity respectively using a double pointer based on the probability of each word string in the input text being the core entity ([0218] - The threshold data 820 may include an entity identification threshold 822, an entity position/location threshold 824, and an entity status threshold 826. Each of these thresholds may be defined as a probability. When an entity identity, location, or status is determined to have a detection probability that exceeds the threshold probability for that entity identity, location, or status, a detection of that entity identity, location, or status may be indicated and/or recorded).
Regarding claim 13, Solomon discloses the electronic device, wherein, the predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text, comprises:
acquiring a priori feature of the focus of the core entity constructed based on a focus repository of the core entity ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and generating a first focus feature of the input text, based on the coding sequence of the input text, the priori feature of the focus of the core entity, and the subsequence corresponding to the core entity in the input text, and inputting the first focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language. [0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
Regarding claim 14, Solomon discloses the electronic device, wherein, the inputting the first focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network, comprises: inputting the first focus feature of the input text into the focus labeling network to predict a probability of each word string in the input text being the focus of the core entity, and labeling a starting position and an ending position of the focus of the core entity respectively using a double pointer based on the probability of each word string in the input text being the focus of the core entity ([0218] - The threshold data 820 may include an entity identification threshold 822, an entity position/location threshold 824, and an entity status threshold 826. Each of these thresholds may be defined as a probability. When an entity identity, location, or status is determined to have a detection probability that exceeds the threshold probability for that entity identity, location, or status, a detection of that entity identity, location, or status may be indicated and/or recorded).
Regarding claim 15, Solomon discloses the electronic device, wherein, the input text comprises:
labeling information of the core entity and the corresponding focus ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and wherein the method further comprises:
determining prediction errors of positions of the core entity and the corresponding focus in the input text, based on the labeling information of the core entity and the corresponding focus in the input text, and iteratively adjusting parameters in a model for labeling the focus of the core entity using a backpropagation method to obtain a trained model for labeling the focus of the core entity, the model for labeling the focus of the core entity comprising the text coding network, the core entity labeling network, and the focus labeling network ([0132] - training the intent handler 50 in a real-time or batch-mode manner to correlate an unrecognized surface form with a newly resolved surface form. [0151] - Each of the entity identifier 104, person identifier 105, position identifier 106, and status identifier 108 is configured to interpret and evaluate sensor data received from the plurality of sensors 102, and to output context information 110 based on the sensor data. Context information 110 may include the entity tracker's guesses/predictions as to an identity, position, and/or status of one or more detected entities based on received sensor data. As will be described in more detail below, each of the entity identifier 104, person identifier 105, position identifier 106, and status identifier 108 may output their predictions/identifications along with a confidence value).
Regarding claim 16, Solomon discloses the electronic device, wherein the method further comprises:
extracting a relative position feature of the core entity and the corresponding focus based on the labeling information of the core entity and the corresponding focus in the input text and coding the relative position feature to obtain a relative position feature sequence ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100);
and the predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language), comprises:
acquiring a priori feature of the focus of the core entity constructed based on a focus repository of the core entity ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
splicing a result of coding the subsequence corresponding to the core entity in the input text with the relative position feature sequence to obtain a coding sequence of the core entity in the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words);
and generating a second focus feature of the input text, based on the coding sequence of the input text, the priori feature of the focus of the core entity, and the coding sequence corresponding to the core entity in the input text, and inputting the second focus feature of the input text into the focus labeling network, to predict the position of the focus corresponding to the core entity predicted by the core entity labeling network ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language; [0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
Regarding claim 17, Solomon discloses a non-transitory computer readable medium, storing a computer program thereon, wherein the program, when executed by a processor, causes the processor to perform operations ([0032] - The present disclosure relates generally to systems, methods and logical constructs for providing intelligent assistance to users), the operations comprising:
performing word and phrase feature extraction on an input text ([0047] - Audio data from the audio processor 134 may be transformed by feature extractor 136 into data for processing by a speech recognition engine 140 of the speech recognition program 120);
inputting an extracted word and phrase feature into a text coding network for coding, to obtain a coding sequence of the input text ([0052] - Individual HMMs for separate phonemes and words may be combined to create an HMM for a sequence of phonemes or words);
processing the coding sequence of the input text using a core entity labeling network to predict a position of a core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100);
extracting a subsequence corresponding to the core entity in the input text from the coding sequence of the input text, based on the position of the core entity in the input text ([0042] - The entity tracker 100 is configured to detect entities and their activities, including people, animals, or other living things, as well as non-living objects. Entity tracker 100 includes an entity identifier 104 that is configured to recognize individual users and/or non-living objects. Voice listener 30 receives audio data and utilizes speech recognition functionality to translate spoken utterances into text. Voice listener also may assign confidence value(s) to the translated text, and may perform speaker recognition to determine an identity of the person speaking, as well as assign probabilities to the accuracy of such identifications. Parser 40 analyzes text and confidence values received from voice listener 30 to derive user intentions and generate corresponding machine-executable language);
and predicting a position of a focus corresponding to the core entity in the input text using a focus labeling network, based on the coding sequence of the input text and the subsequence corresponding to the core entity in the input text ([0142] - the commitment engine 60 may receive context information 110, such as entity identity, entity position, and entity status information, from the entity tracker 100).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Alvelda, VII (U.S. Publication No. 20200104641) teaches machine learning using semantic concepts represented with temporal and spatial data. Armstrong (U.S. Patent No. 8478420) teaches implantable medical device charge balance assessment. Sundaram (U.S. Publication No. 20190114544) teaches semi-supervised learning for training an ensemble of deep convolutional neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658