DETAILED ACTION
The preliminary amendment filed on March 3rd, 2020, has been entered.
Claims 11, 12, 18, and 19 have been amended.
Claim 17 has been cancelled.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. CN201811005796.4, filed on August 30th, 2018.
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 21st, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
It appears to the examiner that citation number 3 under “Foreign Patent Documents” contains a typographical error: the listed document number, CN 1042008004, appears to be intended as CN 104200804. For the purposes of consideration, CN 104200804 is the document being considered.
Claim Objections
Claim 8 is objected to because of the following informalities: 
In line 6, the word “phase” should apparently read “phrase”.
Appropriate correction is required. For the purposes of expediting prosecution, the claims will be assumed to read as though they were corrected as above.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because it recites: 
A computer-readable storage medium, characterized in having computer-readable instructions stored thereon and when the instructions are executed by a computer…
	Paragraphs 96-99 of the specification describe the computer-readable medium, but do not specifically limit it to non-transitory media, so that the claim may encompass transitory propagating signals. Examiner suggests that the applicant narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. 101 by adding the limitation “non-transitory” to the claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
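3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.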

Claims 1-2, 4, 6-8, 13, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Printz (U.S. Patent Publication: 2018/0068661 A1).
In regards to claim 1, Printz teaches:
A named entity recognition method (Abstract, lines 1-5: “high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities” is construed as named entity recognition), comprising:
acquiring a voice signal (Paragraph 114, lines 1-4);
extracting a voice feature vector in the voice signal (Paragraph 124: Printz notes that the components of both the primary and secondary recognizers are capable of various internal structures for the purpose of speech recognition. Notably, on Page 6, Paragraph 124, lines 15-16, Printz specifically teaches the use of Mel-frequency cepstral coefficients, which are cited by the instant application as an example of a prosodic feature that may be found in a voice feature vector);
extracting, based on a literalness result after voice recognition is performed on the voice signal, a literalness feature vector in the literalness result (Paragraph 116: Printz describes a “primary recognizer” that may create a textual transcription from an audio signal input, where the textual transcription consists of words that belong to its vocabulary. This output is construed as a literalness feature vector; see also Paragraph 122: Printz further describes a “secondary recognizer” that is similarly capable of generating an output that may be construed as a literalness feature vector);
splicing the voice feature vector and the literalness feature vector to obtain a composite feature vector of each word in the voice signal (Either the primary recognizer or the secondary 
processing the composite feature vector of each word in the voice signal to obtain a named entity recognition result (Paragraph 124: The composite feature vector being processed is the combination of the data described above (the data which contained both Mel-frequency cepstral coefficients and the textual transcription). Paragraphs 129-139 describe how a named entity recognition result may be obtained through the use of the described recognizer components. See specifically paragraphs 129, 138, and 139).
Printz does not explicitly teach processing the composite feature vector through a deep learning model to obtain a named entity recognition result. However, Printz does teach various internal structures without description as to their implementation or combination, which nonetheless render the claimed invention obvious. Page 7, Paragraph 124, lines 1-3 specifically teach the use of a recurrent neural network in an automated speech recognizer, which is a type of deep learning model specifically disclosed in the instant application. In addition, Printz teaches that the primary and secondary recognizers may share significant internal operating details and may thus share the results of certain computations (Paragraphs 366 and 367). Printz notes specifically that the shared internal operating details may include “feature vectors or other intermediate representations of the speech signal” as well as “an acoustic model, neural network, or other computational device for evaluating the quality of a given acoustic match” (Paragraph 366, lines 7-11). The sharing of these results is advantageous because it may reduce the computational workload and may reduce the usage of one or both of RAM and non-volatile memory, which may yield significant reductions in overall system latency and resource requirements (Paragraph 367).
With these in mind, it would have been obvious to one of ordinary skill in the art at the time of filing to create an embodiment wherein the data from the primary recognizer (comprising both the Mel-frequency cepstral coefficients and the textual transcription, i.e. the composite feature vector) is shared with the secondary recognizer, where the data is processed using a neural network (which may be a recurrent neural network, i.e. a deep learning model) to obtain a named entity recognition result. Such a structure could be expected to reduce the computational workload and RAM or non-volatile memory usage in the secondary recognizer. Thus, Printz further teaches:
processing the composite feature vector (Paragraph 124: The composite feature vector being processed is the combination of the data described above (the data which contained both Mel-frequency cepstral coefficients and the textual transcription) as it is shared by the primary recognizer with the secondary recognizer) of each word in the voice signal through a deep learning model (Page 7, Paragraph 124, lines 1-3 specifically teach the use of a recurrent neural network in an automated speech recognizer, which is a type of deep learning model specifically disclosed in the instant application) to obtain a named entity recognition result (Paragraphs 129-139 describe how a named entity recognition result may be obtained through the use of the described recognizer components. See specifically paragraphs 129, 138, and 139).
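For illustration only, and not as a characterization of the implementation of Printz or of the instant application, the claimed splicing of a per-word voice feature vector with a per-word literalness feature vector may be sketched as a simple concatenation. The function names and all numeric values below are hypothetical and are assumptions of this sketch.

```python
from typing import List


def splice(voice_vec: List[float], text_vec: List[float]) -> List[float]:
    """Concatenate one word's voice features with its text features."""
    return list(voice_vec) + list(text_vec)


def composite_vectors(
    voice_feats: List[List[float]], text_feats: List[List[float]]
) -> List[List[float]]:
    """Build one composite feature vector per word of the utterance."""
    if len(voice_feats) != len(text_feats):
        raise ValueError("one voice vector and one text vector are needed per word")
    return [splice(v, t) for v, t in zip(voice_feats, text_feats)]


# Hypothetical values for a two-word utterance: three acoustic
# coefficients and a two-dimensional text encoding per word.
voice = [[0.12, -0.40, 0.88], [0.05, 0.31, -0.27]]
text = [[1.0, 0.0], [0.0, 1.0]]
composite = composite_vectors(voice, text)
```

Under these assumptions each word yields a single five-dimensional vector, which is the form of input contemplated for the subsequent processing step.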
In regards to claim 2, Printz further teaches:	
The named entity recognition method according to claim 1, wherein extracting the voice feature vector in the voice signal comprises:
extracting a voice sentence feature (Paragraph 124: Printz notes that the components of both the primary and secondary recognizers are capable of various internal structures for the purpose of speech recognition. Notably, on Page 6, Paragraph 124, lines 15-16, Printz 
In regards to claim 4, Printz further teaches:
The named entity recognition method according to claim 2, wherein extracting a voice word feature vector in the voice signal comprises:
obtaining a voice word feature vector (Paragraph 116, lines 4-7: The primary recognizer may produce a textual transcription of an audio input, labeled with the start time and end time of each transcribed word. Extracting a start time and an end time of each word is cited by the instant application as an example of a feature that may be found in a voice word feature vector) in the voice signal by performing voice analysis on the voice signal (Paragraph 116: Automatic speech recognition is construed as voice analysis).
In regards to claim 6, Printz further teaches:
The named entity recognition method according to claim 1, wherein extracting, based on the literalness result after the voice recognition is performed on the voice signal, the literalness feature vector in the literalness result comprises:
extracting a word feature vector (Paragraph 186: The primary recognizer may identify words corresponding to the phonetic components) and extracting a word segmentation embedding feature vector (Paragraph 137: a natural language understanding module is able to identify “acoustic spans”, 
In regards to claim 7, Printz further teaches:
The named entity recognition method according to claim 6, wherein extracting the word feature vector in the literalness result comprises:
converting, according to a literalness-vector value comparison table in a preset word database (Paragraph 186: The primary recognizer may employ a lexicon associating energy patterns in a waveform with phonetic components (i.e. a literalness-vector value comparison table in a preset word database) to identify words corresponding to the phonetic components), the literalness into a corresponding word feature vector (Paragraph 187, lines 1-4: The result of the above operation is sent as a transcription to the natural language understanding module. This transcription is construed as a word feature vector).
In regards to claim 8, Printz further teaches:
The named entity recognition method according to claim 6, wherein extracting the word segmentation embedding feature vector in the literalness result comprises:
dividing, according to a phrase comparison table in a preset phrase database (Figure 2: A phrase comparison table in a preset phrase database), a phrase and an individual word in the literalness result (Paragraph 137: The natural language understanding module determines that a part of the transcription probably contains a spoken rendering of one of the phrases in the phrase comparison table, and identifies the acoustic span (i.e. the phrase) to be distinct from the other words in the transcription (i.e. individual words)); 
converting, according to a preset transform rule (Conversion of the textual transcription data to a vector containing an acoustic span is done through the natural language understanding module, which was preset to perform this task; that is, the words in the phrase and the 
In regards to claim 13, Printz does not explicitly teach processing the composite feature vector through a deep learning model to obtain a named entity recognition result. However, Printz does teach various internal structures without description as to their implementation or combination, which nonetheless render the claimed invention obvious. Page 7, Paragraph 124, lines 1-3 specifically teach the use of a recurrent neural network in an automated speech recognizer, which is a type of deep learning model specifically disclosed in the instant application. In addition, Printz teaches that the primary and secondary recognizers may share significant internal operating details and may thus share the results of certain computations (Paragraphs 366 and 367). Printz notes specifically that the shared internal operating details may include “feature vectors or other intermediate representations of the speech signal” as well as “an acoustic model, neural network, or other computational device for evaluating the quality of a given acoustic match” (Paragraph 366, lines 7-11). The sharing of these results is advantageous because it may reduce the computational workload and may reduce the usage of one or both of RAM and non-volatile memory, which may yield significant reductions in overall system latency and resource requirements (Paragraph 367).
With these in mind, it would have been obvious to one of ordinary skill in the art at the time of filing to create an embodiment wherein the data from the primary recognizer (comprising both the Mel-frequency cepstral coefficients and the textual transcription, i.e. the composite feature vector) is shared with the secondary recognizer, where the data is processed using a neural network (which may be a recurrent neural network, i.e. a deep learning model) to obtain a named entity recognition result. 
The named entity recognition method according to claim 1, wherein processing the composite feature vector of each word in the voice signal (Paragraph 124: The composite feature vector being processed is the combination of the data described above (the data which contained both Mel-frequency cepstral coefficients and the textual transcription) as it is shared by the primary recognizer with the secondary recognizer) through the deep learning model (Page 7, Paragraph 124, lines 1-3 specifically teach the use of a recurrent neural network in an automated speech recognizer, which is a type of deep learning model specifically disclosed in the instant application) to obtain the named entity recognition result (Paragraphs 129-139 describe how a named entity recognition result may be obtained through the use of the described recognizer components. See specifically paragraphs 129, 138, and 139) comprises:
sending the composite feature vector to an input terminal of a selected deep learning model (The composite feature vector is shared by the primary recognizer with the secondary recognizer, which must inherently feed the composite feature vector into the input terminal of the selected deep learning model in order to process the composite feature vector);
processing the composite feature vector through respective layers in the selected deep learning model (processing an input vector through respective layers of deep learning model is an inherent function of deep learning models);
obtaining a named entity recognition result at an output terminal of the deep learning model (Paragraph 138: The secondary recognizer obtains a named entity recognition result, and as explained above, may use a deep learning model to do so; thus it is construed that the named entity recognition result may be obtained at the output of the deep learning model).
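For illustration only, the three steps recited above (input terminal, processing through respective layers, output terminal) may be sketched as a minimal recurrent forward pass. This is not the model of Printz or of the instant application; the weights, the two labels, and the names `matvec` and `rnn_tag` are hypothetical and are chosen only so that the example is small enough to follow by hand.

```python
import math
from typing import List


def matvec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rnn_tag(sequence, w_in, w_rec, w_out, labels):
    """Feed per-word vectors through a recurrent layer and an output layer."""
    hidden = [0.0] * len(w_rec)
    tags = []
    for x in sequence:  # input terminal: one composite vector per word
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]  # recurrent layer
        scores = matvec(w_out, hidden)  # output layer
        tags.append(labels[scores.index(max(scores))])  # output terminal
    return tags


# Hypothetical weights and inputs (assumptions of this sketch).
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
w_out = [[1.0, 0.0], [0.0, 1.0]]
labels = ["O", "ENTITY"]
tags = rnn_tag([[2.0, 0.0], [0.0, 2.0]], w_in, w_rec, w_out, labels)
```

A trained model would learn its weights from labeled data; the fixed values here serve only to show the flow of data from input to output.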
In regards to claim 18, claim 18 is a device claim corresponding to the method of claim 1. In addition to the elements of claim 1, Printz also teaches:
A named entity recognition equipment, wherein the equipment comprises:
A voice acquisition device (Paragraph 376, lines 3-5: Printz teaches that the system may include “input/output devices”. In addition, in Paragraph 114, Printz notes that the embodiments accept, as input, an “audio signal comprising fluent, natural human speech”, which is construed as voice. See also Claim 68, which recites “a processor configured from receiving an utterance from a user as input”; that is, a device that can acquire a voice), a processor (Paragraph 376, lines 3-4), and a memory (Paragraph 376, lines 3-4), the memory contains a set of instructions that, when executed by the processor, cause the named entity recognition equipment to execute the operations of claim 1 (Paragraph 378).
In regards to claim 19, claim 19 is a storage medium claim corresponding to the method of claim 1. In addition to the elements of claim 1, Printz also teaches:
A computer-readable storage medium, characterized in having computer-readable instructions stored thereon (Paragraph 376, lines 3-4 and Paragraph 377: Printz describes a device which may contain memory and storage devices that are computer-readable storage media that may store instructions), and when the instructions are executed by a computer, executing the operations of claim 1 (Paragraph 378: the instructions stored in the memory can be implemented to carry out the actions described above).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Printz as applied to claim 1 above, and further in view of Garman et al. (U.S. Patent Publication: 2021/0256961 A1, hereinafter “Garman”).
In regards to claim 3, Printz teaches all the elements of claim 1 as above. However, Printz does not teach the use of a voice parameter comparison table in a preset voice sentence database. In a 
Together, Printz and Garman teach:
The named entity recognition method according to claim 2, wherein extracting the voice sentence feature vector in the voice signal comprises:
converting (Paragraph 32, lines 10-12: Pitch and loudness may be encoded (i.e. converted) in a prosody block 218), according to a voice parameter comparison table in a preset voice sentence database (Paragraph 32: Pitch/loudness values for the variables may be extracted from lexicon 216; prosodic value may be relative (i.e. compared) to an average or predictable value for that variable. Paragraph 50: Lexicon is noted to have “entries”, which can be construed as the claimed “table”. See also Paragraph 32, lines 15-17: Garman discusses determining a focus word in a sentence, and Paragraph 73: Garman describes an embodiment in which prosodic values are stored in the trained model data 1018 (i.e. in a database)), a voice sentence feature in the voice signal into a corresponding voice sentence feature vector (Paragraph 33 and Fig. 2: prosody 218 is embedded in feature vector 222).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Printz to incorporate the teachings of Garman to include this system of using embedding. Doing so could have helped a deep recurrent neural network find an optimal balance between factorization and adaptation during the learning process, as taught by Garman (Paragraph 36), which is desirable in the art.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Printz as applied to claim 4 above, and further in view of Austin et al. (U.S. Patent 5598505 A, hereinafter “Austin”).
In regards to claim 5, Printz teaches all the elements of claim 4 as above. Printz further teaches:
The named entity recognition method according to claim 4, wherein performing voice analysis on the voice signal comprises:
performing discrete sampling (Paragraph 138, lines 8-10: Printz refers to the data corresponding to the audio input signal as a “sequence of samples”) on the voice signal in a time domain (Fig. 3 shows discrete samples of a voice signal in a time domain) to obtain a digital voice signal (Paragraph 376: Printz notes that the system may be implemented using computers or computer processors, which would necessitate digital signals);
processing each word in the digital voice signal in the time domain (extracting a start time and an end time of each word would constitute processing in the time domain) and the frequency domain (extracting mel frequency cepstral coefficients would constitute processing in the frequency domain) respectively to obtain a time domain feature vector and a frequency domain feature vector thereof;
splicing, for each word in the voice signal, the time domain feature vector and the frequency domain feature vector thereof to obtain a voice word feature vector corresponding to each word (Either the primary recognizer or the secondary recognizer taught by Printz is capable of extracting both time domain features and frequency domain features as above; thus, the Mel-frequency cepstral coefficients (i.e. the frequency domain feature vector) are being combined (i.e. spliced) with the start and end times of the words (i.e. the time domain feature vector) for processing in pursuit of the goal of automatic speech recognition).
	However, Printz fails to explicitly disclose discrete sampling on the voice signal in a frequency domain to obtain a digital voice signal, despite teaching processing in the frequency domain.
	Austin teaches a system for speech recognition that performs discrete sampling on an input signal in a time domain and a frequency domain to obtain a digital voice signal (Column 7, lines 5-10).

Printz teaches a system for named entity recognition that is capable of performing processing in both the time domain and the frequency domain for the purposes of analyzing an input voice signal, but does not explicitly describe how a frequency domain voice signal is obtained prior to analysis; thus, while the system may perform discrete sampling in the frequency domain, it may instead perform discrete sampling in the time domain and transform the signal into the frequency domain prior to processing and thus may differ from the claimed device.
Austin teaches a system for improving speech recognition that does perform discrete sampling in both the frequency and time domain; thus, such a technique is known in the art of speech recognition.
One of ordinary skill in the art could have substituted the discrete sampling element described by Austin for the discrete sampling element of Printz, wherein the discrete sampling was performed in both the time and frequency domain. The elements described by Printz that utilize frequency domain analysis (e.g. Mel frequency cepstrum coefficients) would have predictably achieved similar results whether the frequency domain signal was sampled from the input voice signal or was transformed from the time domain signal.
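For illustration only, the alternative noted above, in which a signal is discretely sampled in the time domain and then transformed into the frequency domain, may be sketched with a direct discrete Fourier transform. This is neither the implementation of Printz nor that of Austin; the 8 Hz sampling rate, the 2 Hz test tone, and the function names are hypothetical and are assumptions of this sketch.

```python
import cmath
import math
from typing import Callable, List


def sample(signal: Callable[[float], float], rate_hz: int, count: int) -> List[float]:
    """Discretely sample a continuous-time signal in the time domain."""
    return [signal(k / rate_hz) for k in range(count)]


def dft_magnitudes(samples: List[float]) -> List[float]:
    """Transform time-domain samples into frequency-domain magnitudes."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n)
    ]


def tone(t: float) -> float:
    """Hypothetical 2 Hz test tone standing in for a voice signal."""
    return math.sin(2 * math.pi * 2 * t)


rate_hz = 8
time_samples = sample(tone, rate_hz, 8)
spectrum = dft_magnitudes(time_samples)
# With 8 samples at 8 Hz, bin k corresponds to k Hz; only bins up to
# half the sampling rate are inspected.
peak_bin = max(range(len(spectrum) // 2 + 1), key=lambda k: spectrum[k])
```

With these assumed values the energy concentrates in the bin corresponding to the 2 Hz tone, which shows that a frequency domain representation can be derived from time domain samples alone.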
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Printz as applied to claim 1 above, and further in view of Sapugay et al. (U.S. Patent Publication: 2019/0295537, hereinafter “Sapugay 1”).
In regards to claim 9, Printz teaches all the elements of claim 1 as above. However, Printz does not teach performing normalization processing on the extracted feature vectors.
In a related art, Sapugay 1 teaches a system for natural language understanding (Abstract, lines 1-4) using a deep learning model (Paragraph 11: Sapugay 1 teaches that the natural language understanding may utilize machine learning-based methods; Paragraph 36, Sapugay 1 notes that machine learning may refer to e.g. a recurrent neural network, which is a type of deep learning model specifically disclosed in the instant application). In particular, Sapugay 1 teaches the generation of “subtree vectors”, which are a component of “meaning representations”, which are processed using machine learning based methods for a natural language understanding task (Paragraphs 82, 87, and 11). Sapugay 1 notes that the subtree vector is the combination of multiple word vectors (Paragraph 88, lines 1-9) and that the resulting vector after addition is subsequently normalized (Paragraph 88, lines 9-11). Sapugay 1 teaches that doing so ensures that the dimensions of the combined subtree vector are each within a suitable range (Paragraph 88, lines 9-13). 
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Printz to incorporate the teachings of Sapugay 1. Such a combination would have been an example of known work in one field of endeavor prompting variations of it for use in a different one based on design incentives when the variations are predictable to one of ordinary skill in the art (See MPEP 2143(I)(F)).
Though the instant application is directed towards specifically automatic speech recognition, both the system taught by Sapugay 1 and the instant application are in the broader field of natural language understanding.
The vector normalization taught by Sapugay 1 is noted by Sapugay 1 to be desirable, in that it ensures the dimensions of the combined vector are within a suitable range (Paragraph 88, lines 9-13).
In this way, Sapugay 1 teaches the normalization of feature vectors that Printz fails to teach. Notably, the instant claim recites normalization of the vectors prior to their splicing, while Sapugay 1 teaches the normalization of the vectors after their combination. However, one of ordinary skill in the art would appreciate that normalization before and after the combination of vectors would result in a mathematically equivalent result.
Thus, one of ordinary skill in the art could have implemented the claimed variation of the prior art, as the normalization process being performed before or after would have predictably generated an equivalent result.
Thus, Printz and Sapugay 1 teach:
The named entity recognition method according to claim 1, wherein splicing the voice feature vector and the literalness feature vector to obtain the composite feature vector of each word in the voice signal comprises (Paragraph 88, lines 1-9: Sapugay 1 teaches the combination of the word vectors):
performing normalization processing on the extracted voice feature vector and the extracted literalness feature vector respectively (Paragraph 88, lines 9-11: Sapugay 1 teaches the normalization of the vectors);
subjecting a dense literalness feature vector and a dense voice feature vector (the term “dense” is given its broadest reasonable interpretation consistent with the specification as describing a vector with few zero values; Fig. 3 of Printz shows both a dense literalness feature vector (e.g. the word transcription) and a dense voice feature vector (e.g. the start times of each of the words)) obtained for each word in the voice signal after the normalization processing to vector-splicing so as to obtain the composite feature vector for each word in the voice signal (Paragraph 82, 87, 88, 11: The subtree vector is combined from word vectors, and though 
In regards to claim 10, Printz teaches all the elements of claim 1 as above. However, Printz does not teach performing normalization processing on the extracted feature vectors.
In a related art, Sapugay 1 teaches a system for natural language understanding (Abstract, lines 1-4) using a deep learning model (Paragraph 11: Sapugay 1 teaches that the natural language understanding may utilize machine learning-based methods; Paragraph 36, Sapugay 1 notes that machine learning may refer to e.g. a recurrent neural network, which is a type of deep learning model specifically disclosed in the instant application). In particular, Sapugay 1 teaches the generation of “subtree vectors”, which are a component of “meaning representations”, which are processed using machine learning based methods for a natural language understanding task (Paragraphs 82, 87, and 11). Sapugay 1 notes that the subtree vector is the combination of multiple word vectors (Paragraph 88, lines 1-9) and that the resulting vector after addition is subsequently normalized (Paragraph 88, lines 9-11). Sapugay 1 teaches that doing so ensures that the dimensions of the combined subtree vector are each within a suitable range (Paragraph 88, lines 9-13). 
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Printz to incorporate the teachings of Sapugay 1. Such a combination would have been an example of known work in one field of endeavor prompting variations of it for use in a different one based on design incentives when the variations are predictable to one of ordinary skill in the art (See MPEP 2143(I)(F)).
Though the instant application is directed towards specifically automatic speech recognition, both the system taught by Sapugay 1 and the instant application are in the broader field of natural language understanding.
The vector normalization taught by Sapugay 1 is noted by Sapugay 1 to be desirable, in that it ensures the dimensions of the combined vector are within a suitable range (Paragraph 88, lines 9-13).
Sapugay 1 teaches the normalization of feature vectors after their combination, which Printz fails to teach.
Thus, one of ordinary skill in the art could have implemented the claimed variation of the prior art, as the normalization process would have predictably generated an equivalent result.
Thus, Printz and Sapugay 1 teach:
The named entity recognition method according to claim 1, wherein splicing the voice feature vector and the literalness feature vector to obtain the composite feature vector of each word in the voice signal comprises (Paragraph 88, lines 1-9: Sapugay 1 teaches the combination of the word vectors):
vector-splicing a dense literalness feature vector and a dense voice feature vector (The term “dense” is given its broadest reasonable interpretation consistent with the specification as describing a vector with few zero values; Fig. 3 of Printz shows both a dense literalness feature vector (e.g. the word transcription) and a dense voice feature vector (e.g. the start times of each of the words)) obtained for each word in the voice signal to obtain the composite feature vector for each word in the voice signal (Paragraph 88, lines 1-9: Sapugay 1 notes that the subtree vector is the combination of multiple word vectors);
performing normalization processing on the voice feature vector and the literalness feature vector in the obtained composite feature vector respectively (Paragraph 88, lines 9-11: The resulting vector after addition is subsequently normalized).
Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Printz and Sapugay 1 as applied to claim 9 above, and further in view of Almaliki (2018, "Standardization VS Normalization").
In regards to claim 11, Printz and Sapugay 1 teach all the elements of claim 9 as above. However, neither discloses the use of linear function normalization specifically. Linear function normalization is nonetheless a method of normalization well known in the field of data processing for machine learning (Almaliki, Section “Normalization”: Almaliki teaches “Normalization” or “Min-Max scaling” and cites an equation that is equivalent to the “linear function” disclosed in the instant application). Thus, such a variation would have been a simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Printz and Sapugay 1 combine to teach all the elements of claim 9 as above, but do not disclose any specific normalization method.
As taught by Almaliki, “Min-max scaling” (i.e. linear function normalization) is a known method for normalization to build features that have similar ranges to each other.
One of ordinary skill in the art could have substituted the undisclosed normalization method of Sapugay 1 with the well-known Min-max scaling taught by Almaliki to achieve a similar result; both Sapugay 1 and Almaliki teach a similar motivation for normalizing data prior to processing in e.g. a neural network (Almaliki, Section “Use Cases”, bullet 2: Almaliki lists neural networks as an area where feature scaling is important; Almaliki, Section “Conclusion”: Almaliki notes that the goal of normalization (i.e. linear function normalization) is to build features that have similar ranges to each other. Compare this to Sapugay 1, Paragraph 88: Sapugay 1 teaches normalization is desirable to ensure the dimensions of the combined vector are within a suitable range.)
Thus, Printz, Sapugay 1, and Almaliki teach:

performing linear function normalization (Almaliki, Section “Normalization”: Almaliki teaches “Normalization” or “Min-Max scaling” and cites an equation that is equivalent to the “linear function” disclosed in the instant application) processing on the voice feature vector and the literalness feature vector respectively.
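For illustration only, min-max scaling is commonly written as x' = (x - min) / (max - min), which maps each feature vector onto the range [0, 1]. The following is a minimal sketch of that commonly known form. It does not reproduce the equation in Almaliki or in the instant application, neither of which is quoted in this action. The function name `min_max_scale`, the sample values, and the treatment of a constant vector are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant vector has no spread, so map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical voice and literalness (text) feature vectors, scaled separately
# ("respectively"), so each ends up in the same [0, 1] range.
voice_features = [2.0, 4.0, 6.0, 10.0]
text_features = [-1.0, 0.0, 3.0]

scaled_voice = min_max_scale(voice_features)
scaled_text = min_max_scale(text_features)
```

Scaling the two vectors separately reflects the word “respectively” in the claim language; whether the instant application computes the minimum and maximum per vector or per dimension is not stated in this action.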
In regards to claim 12, Printz and Sapugay 1 teach all the elements of claim 9 as above. However, neither discloses the use of zero-mean standardization specifically. Zero-mean standardization is nonetheless a method of normalization well known in the field of data processing for machine learning (Almaliki, Section “Standardization”: Almaliki teaches “standardization” or “z-score normalization” and cites an equation that is equivalent to the zero-mean standardization processing formula disclosed in the instant application). Thus, such a variation would have been a simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Printz and Sapugay 1 combine to teach all the elements of claim 9 as above, but do not disclose any specific normalization method.
As taught by Almaliki, “z-score normalization” (i.e. zero-mean standardization) is a known method for normalization to build features that have similar ranges to each other.
One of ordinary skill in the art could have substituted the undisclosed normalization method of Sapugay 1 with the well-known z-score normalization taught by Almaliki to achieve a similar result; both Sapugay 1 and Almaliki teach a similar motivation for normalizing data prior to processing in e.g. a neural network (Almaliki, Section “Use Cases”, bullet 2: Almaliki lists neural networks as an area where feature scaling is important; Almaliki, Section “Conclusion”: Almaliki notes that the goal of standardization (i.e. zero-mean 
Thus, Printz, Sapugay 1, and Almaliki teach:
The named entity recognition method according to claim 9, wherein performing normalization (Paragraph 88, lines 9-11: Sapugay 1 teaches the normalization of the vectors) processing comprises:
performing zero-mean standardization (Almaliki, Section “Standardization”: Almaliki teaches “standardization” or “z-score normalization” and cites an equation that is equivalent to the zero-mean standardization processing formula disclosed in the instant application) processing on the voice feature vector and the literalness feature vector respectively.
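For illustration only, z-score normalization is commonly written as z = (x - mean) / (standard deviation), which yields values with zero mean and unit variance. The following is a minimal sketch of that commonly known form. It does not reproduce the formula in Almaliki or in the instant application, neither of which is quoted in this action. The function name `z_score_standardize`, the use of the population (rather than sample) standard deviation, and the treatment of a constant vector are assumptions made for the example.

```python
import math


def z_score_standardize(values):
    """Shift a list of numbers to zero mean and scale it to unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by n), not sample variance (n - 1).
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # Assumption: a constant vector has no spread, so map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical voice and literalness (text) feature vectors, standardized separately.
voice_features = [1.0, 3.0]
text_features = [2.0, 4.0, 6.0]

standardized_voice = z_score_standardize(voice_features)
standardized_text = z_score_standardize(text_features)
```

Unlike min-max scaling, this form does not bound the output to a fixed interval; it only centers and rescales each vector.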
Allowable Subject Matter
Claims 14-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
In regards to claim 14, Printz does not teach truncating according to a sentence length feature value in combination with the other elements of the claim.
Notably, however, Sapugay et al. (U.S. Patent Publication: 2019/0294676 A1, hereinafter “Sapugay 2”) teaches a natural language understanding system that analyzes text based on prosodic cues (Paragraph 9) and is capable of dividing utterances into multiple sentences (Paragraph 89). In addition, Sapugay 2 discusses how a prosody subsystem may use rules stored in a database to digest a conversation log (Paragraph 121). Such rules may include a number of messages (Paragraph 121, lines 12-18), according to which an episode may be demarcated (i.e. a sentence length feature value 
The named entity recognition method according to claim 1, wherein in a case where the voice signal contains multiple sentences, before processing the vector of each word in the voice signal through a deep learning model (Sapugay 2, Paragraph 9: Intent segments extracted by the prosody subsystem may be consumed by a training process for a machine learning-based structure) to obtain the named entity recognition result, the method further comprises:
truncating, according to a sentence length feature value (Sapugay 2, Paragraph 121, lines 12-18) corresponding to a current sentence in the voice signal (Sapugay 2, Paragraph 66: the input text may be derived from a voice utterance), all obtained feature vectors (Sapugay 2, Paragraph 66: the textual transcription of the voice utterance is similar to the literalness feature vector of the instant application and is construed as such) of the voice signal to obtain multiple feature vector sequences (Sapugay 2, Paragraph 89: Each segment is construed as a feature vector sequence);
wherein the number of the feature vector sequences is equal to the number of sentences contained in the voice signal (Sapugay 2, Paragraph 89: Each utterance can be divided into a number of different intent segments (i.e. sentences); thus, the system is capable of dividing into a number of intent segments equal to the number of sentences contained in the voice signal),
and the number of the feature vectors possessed by each of the multiple feature vector sequences is equal to the sentence length feature value corresponding to the current sentence in the voice signal (Sapugay 2, Paragraph 121, lines 12-18: The number of feature vectors 
	However, Sapugay 2 fails to teach dividing a composite feature vector into composite feature vector segments. While Sapugay 2, as noted above, does teach deriving an input text from a voice utterance (i.e. extracting a literalness feature vector), there is no explicit teaching for combining this input text with any information that may be construed as a voice feature vector before performing division.
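For illustration only, the truncation limitation of claim 14 can be pictured as cutting one flat list of per-word feature vectors into one sequence per sentence, where each sequence holds as many vectors as the sentence length feature value of that sentence. The following is a minimal sketch under that reading and is not taken from the instant application or from Sapugay 2. The function name `split_by_sentence_length`, the placeholder vectors, and the check that the lengths sum to the number of vectors are assumptions made for the example.

```python
def split_by_sentence_length(feature_vectors, sentence_lengths):
    """Cut a flat list of feature vectors into consecutive per-sentence sequences."""
    # Assumption: the sentence lengths must account for every feature vector.
    if sum(sentence_lengths) != len(feature_vectors):
        raise ValueError("sentence lengths do not add up to the number of feature vectors")
    sequences = []
    start = 0
    for length in sentence_lengths:
        sequences.append(feature_vectors[start:start + length])
        start += length
    return sequences


# Hypothetical composite feature vectors for five words spread over two sentences.
feature_vectors = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
sentence_lengths = [2, 3]  # sentence length feature value of each sentence

sequences = split_by_sentence_length(feature_vectors, sentence_lengths)
```

In this sketch the number of sequences equals the number of sentences, and the number of vectors in each sequence equals the corresponding sentence length feature value, mirroring the two “wherein” conditions recited above.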
In regards to claim 15, claim 15 is dependent on a claim that contains allowable subject matter. However, it should be noted that a combination of Printz and Sapugay 2 teaches:
wherein the sentence length feature value of the current sentence in the voice signal is obtained from a voice feature vector in the voice signal (Paragraph 121: the “number of messages” (i.e. the sentence length feature value) may be defined by the rules 114 in database 106. Paragraph 83 notes that the rules can be generated based on e.g. a cadence of written conversation, which is construed as a voice feature).
In regards to claim 16, claim 16 is dependent on a claim that contains allowable subject matter. However, it should be noted that a combination of Printz and Sapugay 2 teaches:
wherein the sentence length feature value of the current sentence in the voice signal is obtained from the literalness result after voice recognition (Paragraph 66) is performed on the voice signal (Paragraph 121: the “number of messages” (i.e. the sentence length feature value) may be defined by the rules 114 in database 106. Paragraph 83 notes that the rules can be generated based on e.g. learned cue words surrounding breakpoint contexts, which is construed as a literalness result).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER J KIM whose telephone number is (571)272-4442. The examiner can normally be reached M-F 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





 /ALEXANDER JOONGIE KIM/               Examiner, Art Unit 2655            
      /ANDREW C FLANDERS/                     Supervisory Patent Examiner, Art Unit 2655