DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed March 26, 2021 (herein “Amendment”), with respect to the rejection(s) of claims 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made at least for claims 1-2 and 5-20 in view of the DaCiDian Github Project repository, 662cbe52a6 Branch dated April 14, 2018, available at “https://github.com/aishell-foundation/DaCiDian/tree/662cbe52a6688647094ed0b7871a334f0e6cb7a1”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
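A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.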

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., "Multiple Character Embeddings for Chinese Word Segmentation," arXiv:1808.04963v1 [cs.CL], 10 pages, August 15, 2018 (herein “Wang NPL”) in view of Yuan et al., “A Neural Network for Disambiguating Pinyin Chinese Input,” Proceedings of the CALICO '94 Annual Symposium, March 14-18, 1994, Northern Arizona University (herein “Yuan NPL”) further in view of Xianyun, "Chinese Pinyin Conversion Tool (Python Version) - pypinyin 0.31.0 documentation," including the Usage section, July 11, 2018, [http://pypinyin.readthedocs.io/zh_CN/master/ and /usage.html], Internet Archive [http://web.archive.org/web/20180711063235/http://pypinyin.readthedocs.io/zh_CN/master/ and /usage.html]  (Chrome Machine English translation) (herein “Xianyun NPL”), further in view of DaCiDian Github Project repository, 662cbe52a6 Branch dated April 14, 2018, available at https://github.com/aishell-foundation/DaCiDian/tree/662cbe52a6688647094ed0b7871a334f0e6cb7a1 (herein “DaCiDian NPL”).
Regarding claim 1, Wang teaches a computer-implemented method, comprising (Wang NPL Abstract, footnote 1, code and results of the disclosed process of multiple character embeddings for Chinese word segmentation are documented in the github repository (programming code repository – thus computer implemented, also the code in the github repository listed (https://github.com/wangjksjtu/multi-embedding-cws) is in the python language)): 
determining, by one or more processors, respective pinyin codes for respective Chinese characters comprised in a string that is to be processed (Wang NPL sections 2.2 and 2.4, Chinese characters are preprocessed, then the pypinyin library (an open-source python based Chinese to pinyin library, thus “by one or more processors”) is used to annotate pinyin codes), the pinyin codes including an initial portion and a final portion (Wang NPL, figure 1 and section 2.2, the pinyin formed with letters, where the first letters are an initial portion, and the last letters are a final portion, for example, Fig. 1, pinyin input layer includes chang, which has multiple letters, thus initial letters and final letters); 
generating, by one or more processors, respective pinyin features from the respective pinyin codes (Wang NPL sections 2.4 and 4, word2vec (another python library hence by one or more processors) is used to retrieve multiple embeddings where an input vector is generated xp(t) for the pinyin embedding, where a vector is comprised of weighted features), the pinyin features including the initial portion, the final portion (Wang NPL, figure 1, section 2.4, the pinyin which is vectorized in the word2vec, includes the initial letters and final letters thus the resulting vector including the pinyin features of initial and final letters/portion); and 
identifying, by one or more processors, a candidate language entity representing a word from the string based on the respective pinyin features and a mapping (Wang NPL sections 2.4, 3.4 and 4, figures 4 and 5, Bi-LSTM architectures (Models I, II and III) which, using a neural network, output a tag that labels the input Chinese character as one of {B, M, E, S} indicating the B= beginning of a word, M=middle of a word, E= end of a word, and S= word with a single character, these classifications identifying the input Chinese character as part of a word (candidate language entity), where the identifying includes an entire word also when the word is a single character S), stored in computer memory (Wang NPL section 3.1, the fundamental unit of the neural network is the LSTM unit which is a memory unit), describing an association between pinyin features and language entities (Wang NPL section 3.1 and figure 4, the embedding layer (including the pinyin embedding (pinyin features)) is input to the Bi-LSTM neural network and output as a label to the part of a word the input symbol and its associated pinyin belong to, see also the tagging layer example given in figure 1).
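For illustration only, the relationship between the {B, M, E, S} tags and the resulting word segmentation characterized above can be sketched as follows. This is a minimal sketch and is not code taken from Wang NPL or its repository; the function name decode_bmes and the sample characters and tags are assumptions chosen for the example.

```python
def decode_bmes(chars, tags):
    """Group characters into words according to B/M/E/S tags.

    Hypothetical helper for illustration; not part of Wang NPL.
    B = beginning of a word, M = middle, E = end, S = single-character word.
    """
    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            # Flush any unfinished word, then emit the single-character word.
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            # Start a new word; flush an unfinished one defensively.
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    if current:
        words.append(current)
    return words


# Sample input (assumed for the example): two single-character words
# followed by one two-character word.
print(decode_bmes(list("我爱北京"), ["S", "S", "B", "E"]))
```

Under this sketch, a character tagged S is itself a complete word, while B, M and E tags each mark a character as part of a multi-character word.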
Wang NPL does not explicitly teach the candidate language entity being a homophone of another language entity representing the word.
Wang NPL further does not explicitly teach the pinyin features including tone marks associated with the Chinese characters.
Wang NPL further does not explicitly teach with a padding symbol between the initial portion and the final portion.
Yuan NPL teaches the candidate language entity being a homophone of another language entity representing the word (Yuan NPL, Operation section, the WinCALIS dictionary of pinyin-Chinese character correspondences is searched via a neural network to find homophones for a word in pinyin transcription, where multiple matches in the WinCALIS dictionary are found, and then probabilities for which homophone (where found homophones will be homophones to each other and their Chinese character representation) is likely the correct word are provided).
Xianyun NPL teaches the pinyin features including tone marks associated with the Chinese characters (Xianyun NPL pages 1 and 4 of the Usage section, in using the pinyin function which receives as input Chinese characters, the function outputs a two dimensional array [['zho1ng', 'zho4ng'], ['xi1n']] (as an example) including the tone marks 1, 4 and 1, therefore, the tone marks are features of the pinyin codes, and also since the output is in character arrays/strings, the placement of the tone marks are indexed, for example 'zho4ng' has the tone mark 4 in the fourth character position).
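For illustration only, the indexed placement of a tone mark within an output string such as 'zho4ng' can be sketched as follows. This is a minimal sketch operating on plain strings; it does not call the pypinyin library, and the function name split_tone and the assumption that tone marks are the digits 1 through 5 are hypothetical choices made for the example.

```python
def split_tone(syllable):
    """Return (toneless pinyin, tone number, 1-based position of the tone mark).

    Hypothetical helper for illustration. Assumes the tone mark is a single
    digit from 1 to 5 embedded in the string, as in 'zho4ng'.
    """
    for i, ch in enumerate(syllable):
        if ch in "12345":
            base = syllable[:i] + syllable[i + 1:]
            return base, int(ch), i + 1
    # No tone digit present in the string.
    return syllable, None, None


# Strings of the form cited from the Usage section example.
for s in ["zho1ng", "zho4ng", "xi1n"]:
    print(s, split_tone(s))
```

In this sketch 'zho4ng' yields the tone mark 4 at the fourth character position, consistent with the characterization of the array output above.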
DaCiDian NPL teaches with a padding symbol between the initial portion and the final portion (DaCiDian NPL pages 1-2 and 4-5, the documented code maps pinyin syllables to phonemes (thus a form with an initial portion (first phoneme) and a final portion (ending phoneme)), where the code uses a user defined mapping (page 2) and in the code itself on page 4 is given an example where pinyin ZHENG is mapped to an initial portion zh, then a space between (padding symbol) and then a final portion eng).
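For illustration only, a mapping of the kind characterized above, in which the pinyin ZHENG becomes an initial portion zh, a space, and a final portion eng, can be sketched as follows. This is a minimal sketch and is not the DaCiDian code; the INITIALS tuple, the function name split_syllable and the handling of syllables without an initial are assumptions made for the example.

```python
# Assumed list of pinyin initials; two-letter initials come first so that
# "zh" is matched before "z". The letters y and w are deliberately omitted.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(pinyin, pad=" "):
    """Return 'initial<pad>final' for a toneless pinyin syllable.

    Hypothetical helper for illustration. A syllable with no recognized
    initial is returned unchanged (lowercased), which is an assumption.
    """
    s = pinyin.lower()
    for ini in INITIALS:
        if s.startswith(ini):
            return ini + pad + s[len(ini):]
    return s


print(split_syllable("ZHENG"))
```

The pad argument stands in for the padding symbol; a space is used here only because the cited example separates zh and eng with a space.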
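Therefore, taking the teachings of Wang NPL and Yuan NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the homophone disambiguation teachings specifically cited to above in Yuan NPL at least because doing so would provide an efficient way to determine which homophone word is the desired word when converting language input with the same spelling/pronunciation (Yuan NPL Introduction section, using neural networks to disambiguate pinyin homophones without needing frequency lists that need maintenance or additional information to be entered).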
Further, taking the teachings of Wang NPL and Xianyun NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin tone array output specifically cited to above in Xianyun NPL at least because doing so would intelligently match the most correct pinyin according to the phrase (Xianyun NPL, characteristics section page 1).
Still further, taking the teachings of Wang NPL and DaCiDian NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin to phoneme mapping as cited to above in DaCiDian NPL at least because doing so would provide a core component of automatic speech recognition, a lexicon, in an easily adaptable form (DaCiDian NPL, page 1).
Claims 2, 10, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 1 from which claim 2 depends, further in view of Qian et al., (US 2016/0019201 A1, herein “Qian”).
Regarding claims 2 and 11, Wang NPL teaches wherein the determination of respective pinyin codes further comprises: with respect to a Chinese character comprised in the string (Wang NPL fig. 1, section 2.4, Chinese characters from an input layer comprised of a string of Chinese characters, are pre-processed into annotated pinyin codes).
Wang NPL does not explicitly teach determining, by one or more processors, the tone mark associated with the Chinese character; and updating, by the one or more processors, the pinyin code for the Chinese character based on the determined tone mark.
Qian teaches determining, by the one or more processors, the tone mark associated with the Chinese character (Qian fig. 6, paras. [0120], [0123]-[0125], method steps of method 600 performed by a processor, identify a tone mark with an input Chinese character); and 
updating, by one or more processors, the pinyin code for the Chinese character based on the determined tone mark (Qian fig. 6, paras. [0120], [0126]-[0127], method steps of method 600 performed by a processor, determining whether the identified tone mark matches at least one probable tone mark, where if one tone mark is selected as being the corrected tone mark, this selected tone mark is associated with the input text and replaces (updating) the identified tone mark, where para. [0131] teaches a tone library is updated each time a new tone mark is associated with an identified character of the input text).
Therefore, taking the teachings of Wang NPL and Qian together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the tone mark detection and updating disclosed in Qian at least because doing so would provide another way to distinguish between words in “tonal 
Regarding claim 10, Wang teaches a computer-implemented system, comprising a method comprising (Wang NPL Abstract, footnote 1, code and results of the disclosed process of multiple character embeddings for Chinese word segmentation are documented in the github repository (programming code repository – thus computer implemented, also the code in the github repository listed (https://github.com/wangjksjtu/multi-embedding-cws) is in the python language)): 
determining, by the one or more processors, respective pinyin codes for respective Chinese characters comprised in a string that is to be processed (Wang NPL sections 2.2 and  2.4, Chinese characters are preprocessed, then the pypinyin library (an open-source python based Chinese to pinyin library, thus “by one or more processors”) is used to annotate pinyin codes), the pinyin codes including an initial portion and a final portion (Wang NPL, figure 1 and section 2.2, the pinyin formed with letters, where the first letters are an initial portion, and the last letters are a final portion, for example, Fig. 1, pinyin input layer includes chang, which has multiple letters, thus initial letters and final letters); 
generating, by the one or more processors, respective pinyin features from the respective pinyin codes (Wang NPL sections 2.4 and 4, word2vec (another python library hence by one or more processors) is used to retrieve multiple embeddings where an input vector is generated xp(t) for the pinyin embedding, where a vector is comprised of weighted features), the pinyin features including the initial portion, the final portion (Wang NPL, figure 1, section 2.4, the pinyin which is vectorized in the word2vec, includes the initial letters and final letters thus the resulting vector including the pinyin features of initial and final letters/portion); and 
identifying, by the one or more processors, a candidate language entity representing a word from the string based on the respective pinyin features and a mapping (Wang NPL sections 2.4, 3.4 and 4, figures 4 and 5, Bi-LSTM architectures (Models I, II and III) which, using a neural network, output a tag that labels the input Chinese character as one of {B, M, E, S} indicating the B= beginning of a word, M=middle of a word, E= end of a word, and S= word with a single character, these classifications identifying the input Chinese character as part of a word (candidate language entity), where the identifying includes an entire word also when the word is a single character S), stored in computer memory (Wang NPL section 3.1, the fundamental unit of the neural network is the LSTM unit which is a memory unit), describing an association between pinyin features and language entities (Wang section 3.1 and figure 4, embedding layer (including the pinyin embedding (pinyin features) are input to the Bi-LSTM neural network and output as a label to the part of a word the input symbol and its associated pinyin belong to, see also the tagging layer example given in figure 1).
Wang NPL does not explicitly teach one or more processors coupled to a computer-readable memory unit, the memory unit comprising instructions that when executed by the computer processor implements.
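Wang NPL further does not explicitly teach the candidate language entity being a homophone of another language entity representing the word.
Wang NPL further does not explicitly teach the pinyin features including tone marks associated with the Chinese characters.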
Wang NPL further does not explicitly teach with a padding symbol between the initial portion and the final portion.
Qian teaches one or more processors coupled to a computer-readable memory unit, the memory unit comprising instructions that when executed by the computer processor implements (Qian fig. 1, paras. [0039]-[0040], code for implementing disclosed methods stored in a storage device that can direct a processing apparatus such that the instructions when executed by the processor implement the functions disclosed).
Yuan NPL teaches the candidate language entity being a homophone of another language entity representing the word (Yuan NPL, Operation section, the WinCALIS dictionary of pinyin-Chinese character correspondences is searched via a neural network to find homophones for a word in pinyin transcription, where multiple matches in the WinCALIS dictionary are found, and then probabilities for which homophone (where found homophones will be homophones to each other and their Chinese character representation) is likely the correct word are provided).
Xianyun NPL teaches the pinyin features including tone marks associated with the Chinese characters (Xianyun NPL pages 1 and 4 of the Usage section, in using the pinyin function which receives as input Chinese characters, the function outputs a two dimensional array [['zho1ng', 'zho4ng'], ['xi1n']] (as an example) including the tone marks 1, 4 and 1, therefore, the tone marks are features of the pinyin codes, and also since the output is in character arrays/strings, the placement of the tone marks are indexed, for example 'zho4ng' has the tone mark 4 in the fourth character position).
DaCiDian NPL teaches with a padding symbol between the initial portion and the final portion (DaCiDian NPL pages 1-2 and 4-5, the documented code maps pinyin syllables to phonemes (thus a form with an initial portion (first phoneme) and a final portion (ending phoneme)), where the code uses a user defined mapping (page 2) and in the code itself on page 4 is given an example where pinyin ZHENG is mapped to an initial portion zh, then a space between (padding symbol) and then a final portion eng).
Therefore, taking the teachings of Wang NPL and Qian together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the code stored on a storage device and executed by a processor disclosed in Qian at least because implementing python code such as that disclosed by Wang NPL using a processor is common, in fact, that is why it is called a “programming language” so that computers can execute the program. Accordingly, such a modification would be (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results. see MPEP 2143(I)(D).
Further, taking the teachings of Wang NPL and Yuan NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the homophone disambiguation teachings specifically cited to above in Yuan NPL at least because doing so would provide an efficient way to determine which homophone word is the desired word when converting language input with the same spelling/pronunciation (Yuan NPL Introduction section, using neural networks to disambiguate pinyin homophones without needing frequency lists that need maintenance or additional information to be entered).
Still further, taking the teachings of Wang NPL and Xianyun NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin tone array output specifically cited to above in Xianyun NPL at least because doing so would intelligently match the most correct pinyin according to the phrase (Xianyun NPL, characteristics section page 1).
Yet still further, taking the teachings of Wang NPL and DaCiDian NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin to phoneme mapping as cited to above in DaCiDian NPL at least because doing so would provide a core component of automatic speech recognition, a lexicon, in an easily adaptable form (DaCiDian NPL, page 1).
Regarding claim 19, Wang NPL teaches a computer program product, the computer program product comprising, the program instructions executable by an electronic device to cause the electronic device to perform actions of: (Wang NPL Abstract, footnote 1, code and results of the disclosed process of multiple character embeddings for Chinese word segmentation are documented in the github repository (programming code repository – thus computer implemented, also the code in the github repository listed (https://github.com/wangjksjtu/multi-embedding-cws) is in the python language)): 
determine respective pinyin codes for respective Chinese characters comprised in a string that is to be processed (Wang NPL sections 2.2 and  2.4, Chinese characters are preprocessed, then the pypinyin library (an open-source python based Chinese to pinyin library, thus “by one or more processors”) is used to annotate pinyin codes), the pinyin codes including an initial portion and a final portion (Wang NPL, figure 1 and section 2.2, the pinyin formed with letters, where the first letters are an initial portion, and the last letters are a final portion, for example, Fig. 1, pinyin input layer includes chang, which has multiple letters, thus initial letters and final letters); 
generating respective pinyin features from the respective pinyin codes (Wang NPL sections 2.4 and 4, word2vec (another python library hence by one or more processors) is used to retrieve multiple embeddings where an input vector is generated xp(t) for the pinyin embedding, where a vector is comprised of weighted features), the pinyin features including the initial portion, the final portion (Wang NPL, figure 1, section 2.4, the pinyin which is vectorized in the word2vec, includes the initial letters and final letters thus the resulting vector including the pinyin features of initial and final letters/portion); and 
identifying a candidate language entity representing a word from the string based on the respective pinyin features and a mapping (Wang NPL sections 2.4, 3.4 and 4, figures 4 and 5, Bi-LSTM architectures (Models I, II and III) which, using a neural network, output a tag that labels the input Chinese character as one of {B, M, E, S} indicating the B= beginning of a word, M=middle of a word, E= end of a word, and S= word with a single character, these classifications identifying the input Chinese character as part of a word (candidate language entity), where the identifying includes an entire word also when the word is a single character S) describing an association between pinyin features and language entities (Wang section 3.1 and figure 4, embedding layer (including the pinyin embedding (pinyin features) are input to the Bi-LSTM neural network and output as a label to the part of a word the input symbol and its associated pinyin belong to, see also the tagging layer example given in figure 1).
Wang NPL does not explicitly teach a computer readable storage medium having program instructions embodied therewith.
Wang NPL also does not explicitly teach the candidate language entity being a homophone of another language entity representing the word.
Wang NPL further does not explicitly teach the pinyin features including tone marks associated with the Chinese characters.
Wang NPL further does not explicitly teach with a padding symbol between the initial portion and the final portion.
Qian teaches a computer readable storage medium having program instructions embodied therewith (Qian paras. [0039]-[0040], code for implementing disclosed methods stored in a storage device that can direct a processing apparatus such that the instructions when executed by the processor implement the functions disclosed).
Yuan NPL teaches the candidate language entity being a homophone of another language entity representing the word (Yuan NPL, Operation section, the WinCALIS dictionary of pinyin-Chinese character correspondences is searched via a neural network to find homophones for a word in pinyin transcription, where multiple matches in the WinCALIS dictionary are found, and then probabilities for which homophone (where found homophones will be homophones to each other and their Chinese character representation) is likely the correct word are provided).
Xianyun NPL teaches the pinyin features including tone marks associated with the Chinese characters (Xianyun NPL pages 1 and 4 of the Usage section, in using the pinyin function which receives as input Chinese characters, the function outputs a two dimensional array [['zho1ng', 'zho4ng'], ['xi1n']] (as an example) including the tone marks 1, 4 and 1, therefore, the tone marks are features of the pinyin codes, and also since the output is in character arrays/strings, the placement of the tone marks are indexed, for example 'zho4ng' has the tone mark 4 in the fourth character position).
DaCiDian NPL teaches with a padding symbol between the initial portion and the final portion (DaCiDian NPL pages 1-2 and 4-5, the documented code maps pinyin syllables to phonemes (thus a form with an initial portion (first phoneme) and a final portion (ending phoneme)), where the code uses a user defined mapping (page 2) and in the code itself on page 4 is given an example where pinyin ZHENG is mapped to an initial portion zh, then a space between (padding symbol) and then a final portion eng).
Therefore, taking the teachings of Wang NPL and Qian together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the code stored on a storage device disclosed in Qian at least because implementing python code such as that disclosed by Wang NPL by storing it on a computer readable storage medium, is in fact, why it is called a “programming language” so that computers can execute the program from reading it off of a storage 
Further, taking the teachings of Wang NPL and Yuan NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the homophone disambiguation teachings specifically cited to above in Yuan NPL at least because doing so would provide an efficient way to determine which homophone word is the desired word when converting language input with the same spelling/pronunciation (Yuan NPL Introduction section, using neural networks to disambiguate pinyin homophones without needing frequency lists that need maintenance or additional information to be entered).
Further, taking the teachings of Wang NPL and Xianyun NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin tone array output specifically cited to above in Xianyun NPL at least because doing so would intelligently match the most correct pinyin according to the phrase (Xianyun NPL, characteristics section page 1).
Still further, taking the teachings of Wang NPL and DaCiDian NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL with the pinyin to phoneme mapping as cited to above in DaCiDian NPL at least because doing so would provide a core component of automatic speech recognition, a lexicon, in an easily adaptable form (DaCiDian NPL, page 1).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 1 from which claim 5 depends, further in view of Zhang et al., “Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?” arXiv:1708.02657v2 [cs.CL], 24 pages, August 17, 2017 (herein “Zhang NPL”).
Regarding claim 5, Wang NPL teaches wherein the generation of the respective pinyin features comprises (Wang NPL sections 2.4 and 4, word2vec (another python library hence by one or more processors) is used to retrieve multiple embeddings where an input vector is generated xp(t) for the pinyin embedding, where a vector is comprised of weighted features), and Wang teaches by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process of multiple character embeddings for Chinese word segmentation are documented in the github repository (programming code repository – thus computer implemented via one or more processors), but does not explicitly teach the remainder of the limitations of claim 5.
Zhang NPL teaches with respect to a pinyin code, obtaining a predefined length for generating a pinyin feature from the pinyin code (Zhang NPL sections 2.3 and 4.2, the embedding process associates each entity with a fixed size (predetermined length) vector, where the embedding is at the level of Romanization character, and where the Chinese language is Romanized into Pinyin using the pypinyin package (same package as used by Wang NPL)); 
Zhang NPL section 2.3, for character level encoding (generating the pinyin feature), one additional entry is added to the encoding vocabulary to include a padding symbol for shorter texts (pinyin code being below the predefined length)).
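For illustration only, bringing a pinyin code that is shorter than a predefined length up to that length with a padding symbol can be sketched as follows. This is a minimal sketch and is not code from Zhang NPL; the function name to_fixed_length, the underscore used as the padding symbol, and the truncation of codes longer than the predefined length are assumptions made for the example.

```python
PAD = "_"  # assumed padding symbol for the example


def to_fixed_length(code, length, pad=PAD):
    """Pad (or, by assumption, truncate) a pinyin code to a fixed length.

    Hypothetical helper for illustration only.
    """
    if len(code) >= length:
        # Truncation of over-long codes is an assumption of this sketch.
        return code[:length]
    return code + pad * (length - len(code))


# Codes of different lengths all come out with the same length.
for code in ["xin", "zhong", "zhuang"]:
    print(to_fixed_length(code, 6))
```

Every output string has the same length, so each code can be encoded into a vector of one fixed size.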
Therefore, taking the teachings of Wang NPL and Zhang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the pinyin vector encoding as disclosed in Zhang NPL at least because doing so would allow for a smaller memory footprint in language processing (Zhang NPL section 2.3).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 1 from which claim 6 depends, further in view of Wang et al., “Multi-Embedding-CWS,” https://github.com/wangjksjtu/multi-embedding-cws, May 2018 (selected portions of the repository have been saved as PDF text and provided with this action, herein “Wang Github Repo”, the entire repository is available at https://github.com/wangjksjtu/multi-embedding-cws, last accessed May 18, 2020).
Regarding claim 6, Wang NPL teaches further comprising: obtaining, by the one or more processors, a plurality of sample language (Wang NPL section 5.1, corpora used for training the machine learning models disclosed include SIGHAN 2005 and Chinese Treebank 6.0 datasets); 
Wang NPL section 5.1, mapping performed for characters (corresponding to sample language) from the corpora to embeddings),
determining, by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process documented in the github repository (programming code repository – thus computer (one or more processors) implemented), respective sample pinyin codes for respective Chinese characters comprised in the sample language (Wang NPL section 5.1, in the preprocessing an embedding is obtained for the Chinese characters, where under the Embedding Ablation section is taught that the models are trained with Pinyin embeddings); 
generating, by the one or more processors, respective sample pinyin features from the respective sample pinyin codes (Wang NPL section 5.1, Embedding Ablation section, Pinyin embeddings are generated from the training data, where section 2.4 teaches that the embedding annotates pinyin codes and converts them to a vector (features)  via the word2vec tool); and 
training, by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process documented in the github repository (programming code repository – thus computer (one or more processors) implemented), the mapping based on the respective sample pinyin features and the sample language (Wang NPL section 5.1, preprocessing includes obtaining the embeddings for characters (mapping to sample language) using word2vec where section I Introduction teaches word2vec generates vector representations of words or characters (features)), where the embedding ablation teaches the embeddings are for pinyin), such that the trained Wang NPL fig. 1 and section 3.4, the trained neural network outputs a label of B, M, E, and S indicating parts of a word or a single word itself, where fig. 1 shows how the labels are used to determine the word segmentation, with either/both the parts of a word or the word itself being a language entity).
While Wang NPL teaches using the Chinese Treebank corpora for training its disclosed neural network, Wang NPL does not explicitly teach that the Chinese Treebank contains entities.
Wang Github Repo teaches that the Chinese Treebank (and all training corpora for that matter) contains language entities (Wang Github Repo page 10 teaching a sample page from the Chinese Treebank corpora which shows the characters separated by spaces which indicate word separations (entities), and also see page 5, with code from the pre_train.py with a function that reads in a corpora file and assumes that words are separated by spaces).
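For illustration only, and not as a reproduction of the pre_train.py code in Wang Github Repo, the following minimal Python sketch shows how a corpus line whose words are separated by spaces can be converted into per-character B, M, E, and S labels. The function name words_to_bmes and the sample sentence are hypothetical and are not taken from any cited reference.

```python
def words_to_bmes(line):
    """Label each character of a space-separated line with B, M, E, or S.

    Hypothetical helper: assumes words are delimited by whitespace,
    as in the sample corpus page discussed above.
    """
    pairs = []
    for word in line.split():
        if len(word) == 1:
            # A single-character word is labeled S (single).
            pairs.append((word, "S"))
        else:
            # First character begins the word, last character ends it,
            # and any characters in between are labeled M (middle).
            pairs.append((word[0], "B"))
            for ch in word[1:-1]:
                pairs.append((ch, "M"))
            pairs.append((word[-1], "E"))
    return pairs


# Illustrative input only; not drawn from the Chinese Treebank.
sample = "我 喜欢 自然语言"
tagged = words_to_bmes(sample)
```

Under this sketch, each whitespace-delimited token is treated as one language entity, which is the assumption attributed above to the corpus-reading function.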
Therefore, taking the teachings of Wang NPL and Wang Github Repo together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the sample training data containing language entities as disclosed in Wang Github Repo at least because the Wang NPL reference explicitly mentions that the code and results of the process disclosed therein are available at the listed repo (Wang Github Repo), and therefore this teaching, in Wang NPL, would have led one of ordinary skill to consider the Wang Github Repo for .
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Xianyun NPL in view of DaCiDian NPL in view of Wang Github Repo, as set forth above regarding claim 6 from which claims 7-9 depend, further in view of Zhao et al., “Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition,” Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing, 2008, pp. 106-111 (available at https://www.aclweb.org/anthology/I08-4017.pdf, last accessed May 18, 2020, herein “Zhao NPL”).
Regarding claim 7, Wang NPL does not explicitly teach the limitations of claim 7.
Zhao NPL teaches wherein one of the sample language entities is labeled with a name type (Zhao NPL sections 2 and 2.1, character tags applied to characters in the training data including three types of named entities, person, location and organization names), and the training of the mapping further comprises training, by the one or more processors, the mapping based on the name type, such that the trained mapping identifies the sample language entity as the name type (Zhao NPL Abstract, sections 2.1-2.3, in a machine learning based algorithm (thus by one or more processors), NE (named entity) recognizers are trained with a 6-tag set (including the named entity tags) and n-gram feature templates, where section 3 teaches effective performance of the named entities as an output from the recognizers).

Regarding claim 8, Wang NPL does not explicitly teach the limitations of claim 8.
Zhao NPL teaches further comprising providing, by the one or more processors, a candidate name type associated with the candidate language entity (Zhao NPL, Abstract and section 2.1, the tag set used with the disclosed named entity recognizer includes PER, LOC and ORG designating three types of names – person, location or organization).
Therefore, taking the teachings of Wang NPL and Zhao NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the named entity training and recognition as disclosed in Zhao NPL at least because doing so would provide further performance enhancement on both word segmentation and named entity recognition (Zhao NPL Abstract and section 1, Introduction).
Regarding claim 9, Wang NPL teaches wherein the obtaining of the plurality of sample language entities comprises selecting a sample language entity that is (Wang NPL section 5.1, fig. 1, training corpora is Chinese character based (another language) which is translated to determine English word boundaries).
Wang NPL does not explicitly teach wherein the name type comprises at least one of: a name of a person, a name of place, and a name of a drug.
Zhao NPL teaches wherein the name type comprises at least one of: a name of a person, a name of place, and a name of a drug (Zhao NPL section 2.1, tag set includes a person’s name and location).
Therefore, taking the teachings of Wang NPL and Zhao NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the named entity training and recognition as disclosed in Zhao NPL at least because doing so would provide further performance enhancement on both word segmentation and named entity recognition (Zhao NPL Abstract and section 1, Introduction).
Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Qian in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 10 from which claim 12 depends, further in view of Yang et al., (US 6,562,078 B1, herein “Yang”).
Regarding claim 12, Wang NPL teaches wherein the determination of respective pinyin codes further comprises: with respect to a Chinese character comprised in the string (Wang NPL fig. 1, section 2.4, Chinese characters from an input layer comprised of a string of Chinese characters, are pre-processed into annotated pinyin codes). 
Wang NPL does not explicitly teach determining, by the one or more processors, an initial portion in a pinyin code for the Chinese character; and updating, by the one or more processors, the pinyin code based on the determined initial portion.
Yang teaches determining, by the one or more processors (Yang col. 12, lines 26-37, invention as described implemented as a combination of hardware and software, the hardware including a processing unit shown in fig. 12), an initial portion in a pinyin code for the Chinese character (Yang col. 7, fig. 8, lines 30-61, and col. 8, lines 19-23, an initial sound for a pinyin representation is selected, then entered (determined) by the system in order to reference a table which lists all the potential final sounds that can follow the initial sound);
updating, by the one or more processors, the pinyin code based on the determined initial portion (Yang col. 8, lines 15-24, the pinyin string is constructed by the initial and final sound and then output (updating) to a prediction module).
Therefore, taking the teachings of Wang NPL and Yang together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the formation of the pinyin string considering its initial part as disclosed in Yang at least because doing so would provide accuracy in Pinyin representations (Yang col. 11, lines 33-34).
Regarding claim 13, Wang NPL teaches wherein the determination of respective pinyin codes further comprises: with respect to a Chinese character comprised in the string (Wang NPL fig. 1, section 2.4, Chinese characters from an input layer comprised of a string of Chinese characters, are pre-processed into annotated pinyin codes). 
Wang NPL does not explicitly teach determining, by the one or more processors, a final portion in a pinyin code for the Chinese character; and updating, by the one or more processors, the pinyin code based on the determined final portion.
Yang teaches determining, by the one or more processors (Yang col. 12, lines 26-37, invention as described implemented as a combination of hardware and software, the hardware including a processing unit shown in fig. 12), a final portion in a pinyin code for the Chinese character (Yang col. 7, fig. 8, lines 30-61, and col. 8, lines 19-23, possible valid final sounds are presented to the user in view of the input initial sound, and a final sound is selected, then a string representing the selected final sound is appended by the system to the Pinyin string); and 
updating, by the one or more processors, the pinyin code based on the determined final portion (Yang col. 8, lines 15-24, the pinyin string is constructed by the initial and final sound and then output (updating) to a prediction module).
Therefore, taking the teachings of Wang NPL and Yang together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the formation of the pinyin string considering its final part as disclosed in Yang at least because doing so would provide accuracy in Pinyin representations (Yang col. 11, lines 33-34).
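For illustration only, the relationship between a pinyin code and its initial and final portions can be sketched in Python as follows. This is not Yang's table-driven selection process; it is a simplified sketch that assumes toneless, lowercase pinyin and that treats "y" and "w" as initials. The names INITIALS, split_pinyin, and build_pinyin are hypothetical.

```python
# Assumed list of pinyin initials; two-letter initials come first so that
# "zh", "ch", and "sh" are matched before "z", "c", and "s".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
    "y", "w",
)


def split_pinyin(syllable):
    """Return (initial, final) for a toneless, lowercase pinyin syllable."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    # Syllables such as "an" or "er" have no initial portion.
    return "", syllable


def build_pinyin(initial, final):
    """Construct the pinyin string from its initial and final portions."""
    return initial + final
```

For example, split_pinyin("zhong") separates the syllable into the initial "zh" and the final "ong", and build_pinyin reassembles the pinyin string from the two portions.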
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Qian in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 10 from which claim 14 depends, further in view of Zhang et al., “Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?” arXiv:1708.02657v2 [cs.CL], 24 pages, August 17, 2017 (herein “Zhang NPL”).
Regarding claim 14, Wang NPL teaches wherein the generation of the respective pinyin features comprises (Wang NPL sections 2.4 and 4, word2vec (another python library hence by one or more processors) is used to retrieve multiple embeddings where an input vector xp(t) is generated for the pinyin embedding, where a vector is comprised of weighted features), and Wang NPL teaches by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process of multiple character embeddings for Chinese word segmentation are documented in the github repository (programming code repository – thus computer implemented via one or more processors)), but does not explicitly teach the remainder of the limitations of claim 14.
Zhang NPL teaches with respect to a pinyin code, obtaining a predefined length for generating a pinyin feature from the pinyin code (Zhang NPL sections 2.3 and 4.2, the embedding process associates each entity with a fixed size (predetermined length) vector, where the embedding is at the level of Romanization character, and where the Chinese language is Romanized into Pinyin using the pypinyin package (same package as used by Wang NPL)); 
(Zhang NPL section 2.3, for character level encoding (generating the pinyin feature), one additional entry is added to the encoding vocabulary to include a padding symbol for shorter texts (pinyin code being below the predefined length)).
Therefore, taking the teachings of Wang NPL and Zhang NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the pinyin vector encoding as disclosed in Zhang NPL at least because doing so would allow for a smaller memory footprint in language processing (Zhang NPL section 2.3).
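For illustration only, padding a pinyin code to a predefined length with an additional vocabulary entry can be sketched in Python as follows. This is not the encoding code of Zhang NPL; the padding symbol "_", the names PAD, VOCAB, and encode_pinyin, and the truncation of codes longer than the predefined length are all assumptions made for this sketch.

```python
PAD = "_"  # hypothetical padding symbol, added as one extra vocabulary entry

# Index 0 is the padding symbol; indices 1-26 are the letters a-z.
VOCAB = {ch: i for i, ch in enumerate(PAD + "abcdefghijklmnopqrstuvwxyz")}


def encode_pinyin(code, length):
    """Encode a toneless, lowercase pinyin code as a fixed-length index list.

    Codes shorter than `length` are right-padded with PAD; codes longer
    than `length` are truncated (an assumption of this sketch).
    """
    padded = code[:length].ljust(length, PAD)
    return [VOCAB[ch] for ch in padded]
```

Under these assumptions, encode_pinyin("ma", 4) yields a four-element list whose last two entries are the padding index, so every pinyin code maps to a feature of the same predefined length.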
Claims 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Qian in view of Xianyun NPL in view of DaCiDian NPL as set forth above regarding claim 10 from which claim 15 depends, and as set forth above regarding claim 19 from which claim 20 depends, further in view of Wang et al., “Multi-Embedding-CWS,” https://github.com/wangjksjtu/multi-embedding-cws, May 2018 (selected portions of the repository have been saved as PDF text and provided with this action, herein “Wang Github Repo”, the entire repository is available at https://github.com/wangjksjtu/multi-embedding-cws, last accessed May 18, 2020).
Regarding claim 15, Wang NPL teaches further comprising: obtaining, by the one or more processors, a plurality of sample language entities (Wang NPL section 5.1, corpora used for training the machine learning models disclosed include SIGHAN 2005 and Chinese Treebank 6.0 datasets); 
with respect to one of the plurality of sample language entities (Wang NPL section 5.1, mapping performed for characters (corresponding to sample language) from the corpora to embeddings),
determining, by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process documented in the github repository (programming code repository – thus computer (one or more processors) implemented)), respective sample pinyin codes for respective Chinese characters comprised in the sample language entity (Wang NPL section 5.1, in the preprocessing an embedding is obtained for the Chinese characters, where under the Embedding Ablation section is taught that the models are trained with Pinyin embeddings); 
generating, by the one or more processors, respective sample pinyin features from the respective sample pinyin codes (Wang NPL section 5.1, Embedding Ablation section, Pinyin embeddings are generated from the training data, where section 2.4 teaches that the embedding annotates pinyin codes and converts them to a vector (features) via the word2vec tool); and 
training, by the one or more processors (Wang NPL Abstract, footnote 1, code and results of the disclosed process documented in the github repository (programming code repository – thus computer (one or more processors) implemented)), the mapping based on the respective sample pinyin features and the sample language entity (Wang NPL section 5.1, preprocessing includes obtaining the embeddings for characters (mapping to sample language) using word2vec where section I Introduction teaches word2vec generates vector representations of words or characters (features), where the embedding ablation teaches the embeddings are for pinyin), such that the trained mapping identifies the sample language entity (Wang NPL fig. 1 and section 3.4, the trained neural network outputs a label of B, M, E, and S indicating parts of a word or a single word itself, where fig. 1 shows how the labels are used to determine the word segmentation, with either/both the parts of a word or the word itself being a language entity).
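For illustration only, the manner in which per-character B, M, E, and S labels determine a word segmentation can be sketched in Python as follows. This is not the decoding code of Wang NPL or Wang Github Repo; the function name bmes_to_words and the sample input are hypothetical.

```python
def bmes_to_words(chars, labels):
    """Group characters into words according to B, M, E, and S labels."""
    words = []
    current = ""
    for ch, label in zip(chars, labels):
        if label == "S":
            # A single-character word; flush any unfinished word first.
            if current:
                words.append(current)
                current = ""
            words.append(ch)
        elif label == "B":
            # Start a new word; flush any unfinished word first.
            if current:
                words.append(current)
            current = ch
        elif label == "M":
            current += ch
        elif label == "E":
            # The word ends with this character.
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unexpected label: {label!r}")
    if current:
        # Tolerate a label sequence that ends without an E label.
        words.append(current)
    return words


# Illustrative input only; not drawn from the cited corpora.
segmented = bmes_to_words("我喜欢自然语言", list("SBEBMME"))
```

In this sketch, each recovered word, whether a single character labeled S or a run of characters labeled B through E, corresponds to one language entity.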
While Wang NPL teaches using the Chinese Treebank corpora for training its disclosed neural network, Wang NPL does not explicitly teach that the Chinese Treebank contains entities.
Wang Github Repo teaches that the Chinese Treebank (and all training corpora for that matter) contains language entities (Wang Github Repo page 10 teaching a sample page from the Chinese Treebank corpora which shows the characters separated by spaces which indicate word separations (entities), and also see page 5, with code from the pre_train.py with a function that reads in a corpora file and assumes that words are separated by spaces).
Therefore, taking the teachings of Wang NPL and Wang Github Repo together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the sample training data containing language entities as disclosed in Wang Github Repo at least because the Wang NPL reference explicitly mentions that the code and results of the process disclosed therein are available at the listed repo (Wang Github Repo), and therefore this teaching, in 
Regarding claim 20, Wang NPL teaches further comprising: obtaining a plurality of sample language entities (Wang NPL section 5.1, corpora used for training the machine learning models disclosed include SIGHAN 2005 and Chinese Treebank 6.0 datasets); 
with respect to one of the plurality of sample language entities (Wang NPL section 5.1, mapping performed for characters (corresponding to sample language) from the corpora to embeddings),
determining respective sample pinyin codes for respective Chinese characters comprised in the sample language entity (Wang NPL section 5.1, in the preprocessing an embedding is obtained for the Chinese characters, where under the Embedding Ablation section is taught that the models are trained with Pinyin embeddings); 
generating respective sample pinyin features from the respective sample pinyin codes (Wang NPL section 5.1, Embedding Ablation section, Pinyin embeddings are generated from the training data, where section 2.4 teaches that the embedding annotates pinyin codes and converts them to a vector (features) via the word2vec tool); and 
training the mapping based on the respective sample pinyin features and the sample language entity (Wang NPL section 5.1, preprocessing includes obtaining the embeddings for characters (mapping to sample language) using word2vec where section I Introduction teaches word2vec generates vector representations of words or characters (features), where the embedding ablation teaches the embeddings are for pinyin), such that the trained mapping identifies the sample language entity (Wang NPL fig. 1 and section 3.4, the trained neural network outputs a label of B, M, E, and S indicating parts of a word or a single word itself, where fig. 1 shows how the labels are used to determine the word segmentation, with either/both the parts of a word or the word itself being a language entity).
While Wang NPL teaches using the Chinese Treebank corpora for training its disclosed neural network, Wang NPL does not explicitly teach that the Chinese Treebank contains entities.
Wang Github Repo teaches that the Chinese Treebank (and all training corpora for that matter) contains language entities (Wang Github Repo page 10 teaching a sample page from the Chinese Treebank corpora which shows the characters separated by spaces which indicate word separations (entities), and also see page 5, with code from the pre_train.py with a function that reads in a corpora file and assumes that words are separated by spaces).
Therefore, taking the teachings of Wang NPL and Wang Github Repo together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the sample training data containing language entities as disclosed in Wang Github Repo at least because the Wang NPL reference explicitly mentions that the code and results of the process disclosed therein are available at the listed repo (Wang Github Repo), and therefore this teaching, in Wang NPL, would have led one of ordinary skill to consider the Wang Github Repo for .
Claims 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang NPL in view of Yuan NPL in view of Qian in view of Xianyun NPL in view of DaCiDian NPL in view of Wang Github Repo, as set forth above regarding claim 15 from which claims 16-18 depend, further in view of Zhao et al., “Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition,” Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing, 2008, pp. 106-111 (available at https://www.aclweb.org/anthology/I08-4017.pdf, last accessed May 18, 2020, herein “Zhao NPL”).
Regarding claim 16, Wang NPL does not explicitly teach the limitations of claim 16.
Zhao NPL teaches wherein one of the sample language entities is labeled with a name type (Zhao NPL sections 2 and 2.1, character tags applied to characters in the training data including three types of named entities, person, location and organization names), and the training of the mapping further comprises training, by the one or more processors, the mapping based on the name type, such that the trained mapping identifies the sample language entity as the name type (Zhao NPL Abstract, sections 2.1-2.3, in a machine learning based algorithm (thus by one or more processors), NE (named entity) recognizers are trained with a 6-tag set (including the named entity tags) and n-gram feature templates, where section 3 teaches effective performance of the named entities as an output from the recognizers).

Regarding claim 17, Wang NPL does not explicitly teach the limitations of claim 17.
Zhao NPL teaches further comprising providing, by the one or more processors, a candidate name type associated with the candidate language entity (Zhao NPL, Abstract and section 2.1, the tag set used with the disclosed named entity recognizer includes PER, LOC and ORG designating three types of names – person, location or organization).
Therefore, taking the teachings of Wang NPL and Zhao NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the named entity training and recognition as disclosed in Zhao NPL at least because doing so would provide further performance enhancement on both word segmentation and named entity recognition (Zhao NPL Abstract and section 1, Introduction).
Regarding claim 18, Wang NPL teaches wherein the obtaining of the plurality of sample language entities comprises selecting a sample language entity that is (Wang NPL section 5.1, fig. 1, training corpora is Chinese character based (foreign language) which is translated to determine English word boundaries).
Wang NPL does not explicitly teach wherein the name type comprises at least one of: a name of a person, a name of place, and a name of a drug.
Zhao NPL teaches wherein the name type comprises at least one of: a name of a person, a name of place, and a name of a drug (Zhao NPL section 2.1, tag set includes a person’s name and location).
Therefore, taking the teachings of Wang NPL and Zhao NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Chinese word segmentation disclosed in Wang NPL to include the named entity training and recognition as disclosed in Zhao NPL at least because doing so would provide further performance enhancement on both word segmentation and named entity recognition (Zhao NPL Abstract and section 1, Introduction).

Allowable Subject Matter
Claims 3-4 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The closest cited art of record includes Wang NPL, Yang and DaCiDian NPL. While Yang teaches the claimed determining and updating initial and final portion in a .


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on Monday-Friday, 9:30am-6:30pm, Eastern time zone.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656


