DETAILED ACTION
This communication is in response to the Amendment and Arguments filed on 01/05/2021.  Claims 1-8, 11-18, and 20 are pending and have been examined.  Claims 9, 10, and 19 have been canceled.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments and Amendments
With respect to the 35 USC 112(f) claim interpretation, Applicant’s amendments moot the application of the 35 USC 112(f) interpretations, which are withdrawn.
With respect to the objection to the Specification/Abstract, Applicant’s amendments moot the objection, which is withdrawn.
With respect to the objection to Claim 1, Applicant’s amendment moots the objection, which is withdrawn.
With respect to the 35 USC 101 rejections, Applicant’s amendments moot the rejections, which are withdrawn.
With respect to the 35 USC 102 rejections, Applicant’s amendments moot the rejections, which are withdrawn.
With respect to the 35 USC 103 rejections for claims 9 and 19, Applicant’s amendments incorporate the subject matter of claims 9 and 19 into independent claims 1 and 11 respectively (and cancel claims 9 and 19).  
Argument 1: With respect to the 35 USC 103 rejections of claims 9 and 19 (as stated in the Non-Final Rejection of October 5, 2020), Applicant argues, in the paragraph spanning pages 14 and 15, that Liu teaches a similarity calculator 126 that “computes text similarity … based on sentence embedding,” “represents the words in each text ..., where semantically similar or semantically related words come closer depending on the training model,” and “considers the meaning of the words in the text, but also the relationship of the words in the texts, especially the sequence of the words in the text.”  In the first full paragraph of page 15, Applicant notes that paragraph [0046] of the (as-filed) instant application describes how an example language (English) that relates to the document can be used to determine character coordinates in multidimensional space, where the character encoding is limited to the number of unique characters of the English language, where each character coordinate for a sentence is represented by a dimension size of [1, 10, 50, 256], and wherein the sentence has a maximum of 10 words, each word has a maximum of 50 characters, and each character is identified from the 256 unique characters in English.  Applicant concludes, based on the passage(s) cited from Liu and the description of paragraph [0046] of the instant application, that Liu fails to disclose or provide any teaching about utilizing the language relating to the document to determine the character coordinates corresponding to the plurality of tokens in the multi-dimensional hierarchical space in the character based embeddings.
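The dimension size [1, 10, 50, 256] summarized in Argument 1 can be pictured with a minimal sketch. The one-hot layout, the whitespace tokenization, the use of ord() as the character index, and the names encode_sentence and shape are all assumptions made for illustration; the record states only the dimension size and what each dimension bounds.

```python
# Illustrative sketch only: the record gives the dimension size [1, 10, 50, 256];
# one-hot encoding, whitespace splitting, and ord() indexing are assumptions.
MAX_WORDS = 10    # maximum words per sentence
MAX_CHARS = 50    # maximum characters per word
ALPHABET = 256    # number of unique characters

def encode_sentence(sentence):
    """Return a nested list of shape [1][10][50][256] holding one-hot character codes."""
    words = sentence.split()[:MAX_WORDS]
    grid = [[[0] * ALPHABET for _ in range(MAX_CHARS)] for _ in range(MAX_WORDS)]
    for w, word in enumerate(words):
        for c, ch in enumerate(word[:MAX_CHARS]):
            code = ord(ch)
            if code < ALPHABET:          # characters outside the alphabet are skipped
                grid[w][c][code] = 1
    return [grid]                        # leading dimension of 1: a single sentence

def shape(nested):
    """Report the size of each nesting level of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims

encoded = encode_sentence("the color is cool")
print(shape(encoded))
```

Under these assumptions, each character of each word occupies one position in the grid, so a coordinate such as (sentence 0, word 1, character 0) identifies the "c" of "color".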
Examiner Response 1:  Applicant's arguments filed January 5, 2021 have been fully considered but they are not persuasive.  In response to applicant's argument that Liu fails to show a language related to the document, it is noted that the features upon which Applicant relies (e.g., a language related to the document) are indeed taught by Liu (US 20200134058 A1).  
Liu [0120] recites, “At procedure 508, the similarity calculator 126, upon receiving the tokenized data entries, computes the text similarity between any two of the tokenized data entries based on sentence embedding.”  This sentence embedding does not clearly exclude using character coordinates in n-dimensional vectors to determine the text similarity (e.g., note that sentences include words, which include characters).  Liu [0120] recites next, “Specifically, the similarity calculator 126 represents the words in each text (i.e., each cleaned and tokenized data entry) by an n-dimensional vector space, where semantically similar or semantically related words come closer depending on the training model.”  The words in each text being represented by an n-dimensional space does not clearly exclude using character coordinates in n-dimensional vectors to represent words in each text.  Liu [0120] then recites, “After representation of the texts by vectors, the similarity calculator 126 calculates the similarity between any two of the texts.”  This citation does not clearly exclude the use of character coordinates in n-dimensional vectors in calculating the similarity between texts.  Liu [0120] further recites, “In certain embodiments, for calculating the similarity, the similarity calculate [sic] 126 not only considers the meaning of the words in the text, but also the relationship of the words in the texts, especially the sequence of the words in the text.” The calculating the similarity considering the meaning of the words in the text, the relationship of the words in the text, and especially the sequence of the words in the text does not clearly exclude the use of 
Moreover, Liu [0120] notes tokenized texts/documents having language-related words, representation of tokenized text by vectors/character coordinates in the n-dimensional vector space, language-related semantic relationships between words, language-related meanings/concepts of words (e.g., “color,” which is an English word), and language-related syntactic structures, where the words, meanings, and syntactic structures are language objectifications per se.  Liu [0116] notes the process of FIG. 5 (a portion of which is described in Liu [0120]) is implemented by the computing device of FIG. 1 (which includes the similarity calculator 126 described in Liu [0120] as well as the data cleaning and tokenizer 122, which is described in Liu [0073]).  Liu [0073] notes the data cleaning and tokenizer 122 provides a list of stop words, and in operation removes the listed stop words from the data entries. In an example where the language related to the document is English, where a user 1 submits a feedback message in English, “the color of this under armor T-shirt is cool,” and where the 
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., number of unique characters of the English language, where each character coordinate for a sentence is represented by a dimension size of [1, 10, 50, 256], where the sentence has a maximum of 10 words, and each word has a maximum of 50 characters) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Hence, Applicant’s arguments are not persuasive.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention 


Claims 1-3, 5, 7, 11-13, 15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong (US 20190130248 A1) in view of Liu (US 20200134058 A1).
As per claim 1, Zhong teaches
A system for character based contextual embedding of entities in a document, the document (Zhong discloses contextual embedding of entities in a document, where the entities can be character based, ¶[0015], Although some dual sequence inference models receive text input sequences, such as the QA models described above, it is to be understood that the dual sequence inference models may operate on a wide variety of types of input sequences, including but not limited to text sequences, audio sequences, image sequences (e.g., video), and/or the like) comprising a plurality of sentences (Zhong discloses tokenizing text sequences from documents that are sentences, ¶[0023], Tokenizing the text sequences may include identifying tokens within the text sequences, where examples of tokens include characters, character n-grams, words, word n-grams, lemmas, phrases (e.g., noun phrases), sentences, paragraphs, and/or the like), wherein the system comprises:
a database (Zhong discloses a database such as model description file 142 and/or model parameters file 144 of FIG. 1, ¶[0021], In general, model description file 142 and/or model parameters file 144 may be store information associated with model 140 in any suitable format, including but not limited to structured, unstructured, serialized, and/or database formats); and 
a processor (Zhong discloses a processing arrangement such as processor 120 of FIG. 1, ¶[0017], Although processor 120 may include one or more general purpose central processing units (CPUs), processor 120 may additionally or alternately include at least one processor that provides accelerated performance when evaluating neural network models) communicably coupled, via one or more data communication networks, to the database (Zhong discloses processor 120 communicably coupled, via one or more data communication networks, to memory 130 of FIG. 1, which includes model description file 142 and/or model parameters file 144, ¶[0019], In some embodiments, processor 120 and/or memory 130 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 120 and/or memory 130 may be located in one or more data centers and/or cloud computing facilities), wherein the processor is configured to:
tokenize each of the plurality of sentences of the document (Zhong discloses the model 200 of FIG. 2, which is similar to model 140, and which includes any 200-series elements cited herein, ¶[0022],  According to some embodiments consistent with FIG. 1, model 200 may be used to implement model 140. Zhong further discloses tokenizing each of the plurality of sentences of the document such as text sequences 202 and/or 204, ¶[0023], For example, when sequences 202 and/or 204 correspond to text sequences, input stages 212 and/or 214 may generate the corresponding vector representations by (1) tokenizing the text sequences and (2) embedding the tokenized text sequences in a vector space. Tokenizing the text sequences may include identifying tokens within the text sequences, where examples of tokens include characters, character n-grams, words, word n-grams, lemmas, phrases (e.g., noun phrases), sentences, paragraphs, and/or the like) to obtain a plurality of tokens (Zhong discloses to obtain a plurality of tokens such as input representations 216 and/or 218, ¶[0023], Input stage 212 generates an input representation 216 of sequence 202, and input stage 214 generates an input representation 218 of sequence 204);
determine at least one character coordinate corresponding to each of the plurality of tokens, wherein each of the at least one character coordinates corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space (Zhong teaches the input stages 212 and/or 214 determine at least one character coordinate such as a vector corresponding to each of the plurality of tokens, wherein the character coordinates corresponding to the plurality of tokens occur in a multi-dimensional hierarchical space, ¶[0023], Embedding the tokenized text sequences may include mapping each token to a vector representation in a multidimensional vector space. For example, a token corresponding to a word may be mapped to a 300-dimensional vector representation of the word using pre-trained GloVe vectors);
process the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space (Zhong discloses the model 300 of FIG. 3A, which is similar to encoder stage 220 of model 200, and which includes any 300-series elements cited herein, ¶[0027], According to some embodiments consistent with FIGS. 1-2, deep coattention encoder 300 may be used to implement encoder stage 220.  Zhong further discloses the model 400 of FIG. 4, which is similar to the model 300, and which includes any 400-series elements cited herein, ¶[0039], In some embodiments consistent with FIGS. 1-3B, deep coattention encoder 400 may be used to implement deep coattention encoder 300.  Zhong further discloses processing, by a two-layer BiLSTM encoder 412 and/or 414, the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, pretrained on a text corpus, such as the WMT machine translation corpus) by implementing a plurality of transmutation layers (Zhong discloses a context vector encoder and a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)); and
memorize sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens (Zhong discloses a coattention sublayer 416 for memorizing sequential information (such as the context representation C1D) pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens, ¶[0044], Based on affinity matrix A and summary representations S1D and S1Q, coattention sub-layer 416 determines document context representation C1D according to the following equation [Eq. 14], where the determined context representation C1D is inherently memorized as being an instantiation of model 140 in memory 130 as cited above) by implementing a plurality of prediction layers (Zhong discloses a context vector encoder and a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)).
However, Zhong does not teach utilizing a language relating to the document to determine the character coordinates corresponding to the plurality of tokens in the multi-dimensional hierarchical space.
Liu does teach to determine at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document (see Liu [0120], which notes at procedure 508, the similarity calculator 126, upon receiving the tokenized data entries, computes the text similarity between any two of the tokenized data entries based on sentence embedding. Specifically, the similarity calculator 126 represents the words in each text (i.e., each cleaned and tokenized data entry) by an n-dimensional vector space, where semantically similar or semantically related words come closer depending on the training model. After representation of the texts by vectors, the similarity calculator 126 calculates the similarity between any two of the texts. In certain embodiments, for calculating the similarity [the similarity score], the similarity calculate [sic] 126 not only considers the meaning of the words in the text, but also the relationship of the words in the texts, especially the sequence of the words in the text).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Zhong to use a language as taught by Liu in order to tune a model according to a needed trade-off between high precision and high recall (see Liu [0126], which notes at procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score. The value of the threshold score may be determined based on the subject matter of the data entries, the required recall, and the required precision. In certain embodiments, a small threshold value is given when high recall is needed. In certain embodiments, a large threshold value is given when high precision is needed).
The combination of Zhong in view of Liu includes predictable results, such as a selectable improvement in either recall or precision.
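The threshold behavior cited from Liu [0126] can be sketched as follows. This is a hedged illustration, not Liu's implementation: cosine similarity, the greedy assignment order, the toy two-dimensional vectors, and the names cosine and cluster are assumptions. The only property taken from the cited passage is that any two entries in the same cluster have a similarity score greater than the threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(vectors, threshold):
    """Greedily group entries so every pair inside a cluster scores above the threshold."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Join the first cluster whose every member is similar enough.
            if all(cosine(vec, vectors[j]) > threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])         # no compatible cluster: start a new one
    return clusters

# Hypothetical 2-dimensional embeddings of four data entries.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(cluster(vectors, 0.9))   # large threshold: tighter clusters (precision)
print(cluster(vectors, 0.5))   # small threshold: broader clusters (recall)
```

With these toy vectors, the larger threshold keeps the fourth entry in its own cluster, while the smaller threshold merges it with the first two, which mirrors the precision and recall trade-off described in the cited passage.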

As per claims 2 and 12, Zhong in view of Liu teaches all the limitations of claims 1 and 11, respectively, and further teaches wherein the plurality of transmutation layers and the plurality of prediction layers employ machine learning algorithms (Zhong discloses the transmutation layers and the prediction layers employ machine learning algorithms, ¶[0036], In some embodiments, deep coattention encoder 300 may include a plurality of model parameters learned according to a machine learning process, such as a supervised learning process, a reinforcement learning process, an unsupervised learning process, and/or the like).

As per claims 3 and 13, Zhong in view of Liu teaches all the limitations of claims 2 and 12, respectively, and further teaches wherein the plurality of transmutation layers and the plurality of prediction layers, employing the machine learning algorithms, are trained using unsupervised learning techniques (Zhong discloses the transmutation layers and the prediction layers employ unsupervised learning techniques, ¶[0036], In some embodiments, deep coattention encoder 300 may include a plurality of model parameters learned according to a machine learning process, such as a supervised learning process, a reinforcement learning process, an unsupervised learning process, and/or the like).

As per claims 5 and 15, Zhong in view of Liu teaches all the limitations of claim 2 and further teaches wherein the system further: determines a loss score (Zhong discloses determining an F1 score accuracy, see ¶[0014], where the F1 score accuracy includes a loss evaluation, ¶[0054], Accordingly, learning objective 520 may include a reinforcement learning objective 550 based on a non-binary evaluation metric, such as the F1 score. In some embodiments, reinforcement learning objective 550 may use the non-binary evaluation metric to define a loss and/or reward function for a reinforcement learning process…, reinforcement learning objective 550 may be evaluated as follows… [Eq. 21] … where … lossrl(Θ) is the reinforcement learning loss for a given set of model parameters Θ) relating to the plurality of transmutation layers and the plurality of prediction layers (Zhong discloses the model 510 of FIG. 5, which is similar to models 140, 200, 300, and 400, ¶[0047], In some embodiments consistent with FIGS. 1-4, model 510 may be used to implement model 200.  In some embodiments, training configuration 500 may be used to reduce the amount of time and/or training data used to train model 510. In some embodiments, model 510 may include a deep coattention encoder, such as deep coattention encoders 300 and/or 400); and re-trains the plurality of transmutation layers and the plurality of prediction layers, for determining optimum character based contextual embedding of entities in the document (Zhong further discloses that the model 510 is trained as a whole, so that the elements of the model (such as the transmutation module and the prediction module) are both re-trained for determining optimum character based contextual embedding of entities in the document, ¶[0050], In some embodiments, model 510 may iteratively generate a series of inferences for a given pair of input sequences. For example, model 510 may include a coattention encoder, such as deep coattention encoder 300, that generates a codependent representation of the pair of input sequences and a dynamic decoder that iteratively generates inferences based on the codependent representation until the inferences converge (e.g., when the inferences change by less than a threshold amount during consecutive iterations)).
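The convergence condition quoted from Zhong ¶[0050] (stop when inferences change by less than a threshold amount during consecutive iterations) can be illustrated with a generic loop. This is a sketch under stated assumptions, not Zhong's decoder: the scalar inference, the toy update rule, the iteration cap, and the name iterate_until_converged are hypothetical.

```python
def iterate_until_converged(step, initial, threshold=1e-3, max_iters=100):
    """Apply `step` repeatedly until consecutive results differ by less than `threshold`.

    Returns the final value and the number of iterations performed.
    """
    current = initial
    for iteration in range(1, max_iters + 1):
        updated = step(current)
        if abs(updated - current) < threshold:   # change between consecutive iterations
            return updated, iteration
        current = updated
    return current, max_iters                    # cap reached without convergence

# Toy update rule (an assumption) whose fixed point is 2.0.
value, iterations = iterate_until_converged(lambda x: 0.5 * x + 1.0, initial=0.0)
print(value, iterations)
```

The iteration cap is a design choice added here so the loop terminates even if the update rule never settles; the cited passage describes only the threshold test.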

As per claims 7 and 17, Zhong in view of Liu teaches all the limitations of claims 1 and 11, above.
However, Zhong does not specifically teach wherein the database includes at least one ontology therein.  
Liu does teach a database that includes at least one ontology therein (Liu discloses the database includes at least one ontology therein, ¶[0054], In certain aspects, to utilize the large volume and diverse user-generated content on the web, the present disclosure provide an ontology structure for such dataset, so as to improve the efficiency of a lot of downstream semantic analysis work).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Zhong with the ontology as taught by Liu in order to improve the efficiency of a lot of downstream semantic analysis work (see Liu ¶[0054]).
The combination of Zhong and Liu includes predictable results, such as a reduction in the amount of downstream semantic analyses in a model using the ontology of the database.

As per claim 11, Zhong teaches 
a method for character based contextual embedding of entities in a document (Zhong discloses character based contextual embedding of entities in a document, where the entities are character based, ¶[0015], Although some dual sequence inference models receive text input sequences, such as the QA models described above, it is to be understood that the dual sequence inference models may operate on a wide variety of types of input sequences, including but not limited to text sequences, audio sequences, image sequences (e.g., video), and/or the like),
wherein the method (Zhong abstract) is implemented via a system (see claim 1 above) comprising
a processor (Zhong discloses a processor such as processor 120 of FIG. 1, ¶[0017], Although processor 120 may include one or more general purpose central processing units (CPUs), processor 120 may additionally or alternately include at least one processor that provides accelerated performance when evaluating neural network models) communicably coupled, via one or more data communication networks (Zhong discloses processor 120 communicably coupled, via one or more data communication networks, to memory 130 of FIG. 1, which includes model description file 142 and/or model parameters file 144, ¶[0019], In some embodiments, processor 120 and/or memory 130 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 120 and/or memory 130 may be located in one or more data centers and/or cloud computing facilities),
to a database (Zhong discloses a database such as model description file 142 and/or model parameters file 144 of FIG. 1, ¶[0021], In general, model description file 142 and/or model parameters file 144 may be store information associated with model 140 in any suitable format, including but not limited to structured, unstructured, serialized, and/or database formats), the method comprising 
tokenizing each of the plurality of sentences of the document (Zhong discloses the model 200 of FIG. 2, which is similar to model 140, and which includes any 200-series elements cited herein, ¶[0022], According to some embodiments consistent with FIG. 1, model 200 may be used to implement model 140. Zhong further discloses tokenizing each of the plurality of sentences of the document such as text sequences 202 and/or 204, ¶[0023], For example, when sequences 202 and/or 204 correspond to text sequences, input stages 212 and/or 214 may generate the corresponding vector representations by (1) tokenizing the text sequences and (2) embedding the tokenized text sequences in a vector space. Tokenizing the text sequences may include identifying tokens within the text sequences, where examples of tokens include characters, character n-grams, words, word n-grams, lemmas, phrases (e.g., noun phrases), sentences, paragraphs, and/or the like), to obtain a plurality of tokens (Zhong discloses to obtain a plurality of tokens such as input representations 216 and/or 218, ¶[0023], Input stage 212 generates an input representation 216 of sequence 202, and input stage 214 generates an input representation 218 of sequence 204);
determining at least one character coordinate corresponding to each of the plurality of tokens, wherein each of the character coordinates corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space (Zhong teaches the input stages 212 and/or 214 determine at least one character coordinate such as a vector corresponding to each of the plurality of tokens, wherein the character coordinates corresponding to the plurality of tokens occur in a multi-dimensional hierarchical space, ¶[0023], Embedding the tokenized text sequences may include mapping each token to a vector representation in a multidimensional vector space. For example, a token corresponding to a word may be mapped to a 300-dimensional vector representation of the word using pre-trained GloVe vectors);
processing the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space (Zhong discloses the model 300 of FIG. 3A, which is similar to encoder stage 220 of model 200, and which includes any 300-series elements cited herein, ¶[0027], According to some embodiments consistent with FIGS. 1-2, deep coattention encoder 300 may be used to implement encoder stage 220.  Zhong further discloses the model 400 of FIG. 4, which is similar to the model 300, and which includes any 400-series elements cited herein, ¶[0039], In some embodiments consistent with FIGS. 1-3B, deep coattention encoder 400 may be used to implement deep coattention encoder 300.  Zhong further discloses processing, by a two-layer BiLSTM encoder 412 and/or 414, the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, pretrained on a text corpus, such as the WMT machine translation corpus) by implementing a plurality of transmutation layers (Zhong discloses a context vector encoder and a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)); and
memorizing sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens (Zhong discloses a prediction module such as coattention sublayer 416 for memorizing sequential information (such as the context representation C1D) pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens, ¶[0044], Based on affinity matrix A and summary representations S1D and S1Q, coattention sub-layer 416 determines document context representation C1D according to the following equation [Eq. 14], where the determined context representation C1D is inherently memorized as being an instantiation of model 140 in memory 130 as cited above) by implementing a plurality of prediction layers (Zhong discloses a context vector encoder and a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)).

However, Zhong does not teach utilizing a language relating to the document to determine the character coordinates corresponding to the plurality of tokens in the multi-dimensional hierarchical space.
Liu does teach determining at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document (see Liu [0120], which notes at procedure 508, the similarity calculator 126, upon receiving the tokenized data entries, computes the text similarity between any two of the tokenized data entries based on sentence embedding. Specifically, the similarity calculator 126 represents the words in each text (i.e., each cleaned and tokenized data entry) by an n-dimensional vector space, where semantically similar or semantically related words come closer depending on the training model. After representation of the texts by vectors, the similarity calculator 126 calculates the similarity between any two of the texts. In certain embodiments, for calculating the similarity [the similarity score], the similarity calculate [sic] 126 not only considers the meaning of the words in the text, but also the relationship of the words in the texts, especially the sequence of the words in the text).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the systems and methods as taught by Zhong to use a language as taught by Liu in order to tune a model according to a needed trade-off between high precision and high recall (see Liu [0126], which notes at procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score. The value of the threshold score may be determined based on the subject matter of the data entries, the required recall, and the required precision. In certain embodiments, a small threshold value is given when high recall is needed. In certain embodiments, a large threshold value is given when high precision is needed).
The combination of Zhong in view of Liu includes predictable results, such as a selectable improvement in either recall or precision.

As per claim 20, Zhong teaches
a computer program product comprising non-transitory computer-readable storage media having computer-readable instructions stored thereon (Zhong discloses non-transitory computer-readable storage media having computer-readable instructions stored thereon, ¶[0018], Memory 130 may include various types of short-term and/or long-term storage modules including cache memory, static random access memory (SRAM), dynamic random access memory (DRAM), non-volatile memory (NVM), flash memory, solid state drives (SSD), hard disk drives (HDD), optical storage media, magnetic tape, and/or the like), the computer-readable instructions being executable by a computerized device comprising processing hardware to execute a method of claim 11 (Zhong discloses computer-readable instructions being executable by a computerized device comprising processing hardware to execute a method of claim 11, ¶[0018], In some embodiments, memory 130 may store instructions that are executable by processor 120 to cause processor 120 to perform operations corresponding to processes disclosed herein and described in more detail below).
Claim 20 incorporates by reference the limitations of the method of claim 11, for which, as per claim 11, Zhong in view of Liu teaches:
a method for character based contextual embedding of entities in a document (Zhong discloses character based contextual embedding of entities in a document, where the entities are character based, ¶[0015], Although some dual sequence inference models receive text input sequences, such as the QA models described above, it is to be understood that the dual sequence inference models may operate on a wide variety of types of input sequences, including but not limited to text sequences, audio sequences, image sequences (e.g., video), and/or the like),
wherein the method (Zhong abstract) is implemented via a system (see claim 1 above) comprising
a processor (Zhong discloses a processor such as processor 120 of FIG. 1, ¶[0017], Although processor 120 may include one or more general purpose central processing units (CPUs), processor 120 may additionally or alternately include at least one processor that provides accelerated performance when evaluating neural network models) communicably coupled, via one or more data communication networks (Zhong discloses processor 120 communicably coupled, via one or more data communication networks, to memory 130 of FIG. 1, which includes model description file 142 and/or model parameters file 144, ¶[0019], In some embodiments, processor 120 and/or memory 130 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 120 and/or memory 130 may be located in one or more data centers and/or cloud computing facilities),
to a database (Zhong discloses a database such as model description file 142 and/or model parameters file 144 of FIG. 1, ¶[0021], In general, model description file 142 and/or model parameters file 144 may be store information associated with model 140 in any suitable format, including but not limited to structured, unstructured, serialized, and/or database formats), the method comprising 
tokenizing each of the plurality of sentences of the document (Zhong discloses the model 200 of FIG. 2, which is similar to model 140, and which includes any 200-series elements cited herein, ¶[0022], According to some embodiments consistent with FIG. 1, model 200 may be used to implement model 140. Zhong further discloses tokenizing each of the plurality of sentences of the document such as text sequences 202 and/or 204, ¶[0023], For example, when sequences 202 and/or 204 correspond to text sequences, input stages 212 and/or 214 may generate the corresponding vector representations by (1) tokenizing the text sequences and (2) embedding the tokenized text sequences in a vector space. Tokenizing the text sequences may include identifying tokens within the text sequences, where examples of tokens include characters, character n-grams, words, word n-grams, lemmas, phrases (e.g., noun phrases), sentences, paragraphs, and/or the like), to obtain a plurality of tokens (Zhong discloses obtaining a plurality of tokens such as input representations 216 and/or 218, ¶[0023], Input stage 212 generates an input representation 216 of sequence 202, and input stage 214 generates an input representation 218 of sequence 204);
determining at least one character coordinate corresponding to each of the plurality of tokens, wherein each of the character coordinates corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space (Zhong teaches the input stages 212 and/or 214 determine at least one character coordinate such as a vector corresponding to each of the plurality of tokens, wherein the character coordinates corresponding to the plurality of tokens occur in a multi-dimensional hierarchical space, ¶[0023], Embedding the tokenized text sequences may include mapping each token to a vector representation in a multidimensional vector space. For example, a token corresponding to a word may be mapped to a 300-dimensional vector representation of the word using pre-trained GloVe vectors);
processing the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space (Zhong discloses the model 300 of FIG. 3A, which is similar to encoder stage 220 of model 200, and which includes any 300-series elements cited herein, ¶[0027], According to some embodiments consistent with FIGS. 1-2, deep coattention encoder 300 may be used to implement encoder stage 220. Zhong further discloses the model 400 of FIG. 4, which is similar to the model 300, and which includes any 400-series elements cited herein, ¶[0039], In some embodiments consistent with FIGS. 1-3B, deep coattention encoder 400 may be used to implement deep coattention encoder 300. Zhong further discloses processing, by a two-layer BiLSTM encoder 412 and/or 414, the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, pretrained on a text corpus, such as the WMT machine translation corpus) by implementing a plurality of transmutation layers (Zhong discloses a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)); and
memorizing sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens (Zhong discloses a prediction module such as coattention sublayer 416 for memorizing sequential information (such as the context representation C1D) pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens, ¶[0044], Based on affinity matrix A and summary representations S1D and S1Q, coattention sub-layer 416 determines document context representation C1D according to the following equation [Eq. 14], where the determined context representation C1D is inherently memorized as being an instantiation of model 140 in memory 130 as cited above) by implementing a plurality of prediction layers (Zhong discloses a coattention encoder implemented as a plurality of transmutation layers and a plurality of prediction layers, respectively, see FIGS. 3A and 3B, ¶[0039], In some embodiments, the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, ¶[0027], Deep coattention encoder 300 may include a plurality of coattention layers 310a-n arranged sequentially (e.g., in a pipelined fashion)).
However, Zhong does not teach utilizing a language relating to the document to determine the character coordinates corresponding to the plurality of tokens in the multi-dimensional hierarchical space.
Liu does teach determining at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document (see Liu [0120], which notes at procedure 508, the similarity calculator 126, upon receiving the tokenized data entries, computes the text similarity between any two of the tokenized data entries based on sentence embedding. Specifically, the similarity calculator 126 represents the words in each text (i.e., each cleaned and tokenized data entry) by an n-dimensional vector space, where semantically similar or semantically related words come closer depending on the training model. After representation of the texts by vectors, the similarity calculator 126 calculates the similarity between any two of the texts. In certain embodiments, for calculating the similarity [the similarity score], the similarity calculate [sic] 126 not only considers the meaning of the words in the text, but also the relationship of the words in the texts, especially the sequence of the words in the text).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the systems and methods as taught by Zhong to use a language as taught by Liu in order to tune a model according to a need between high precision and high recall (see Liu [0126], which notes at procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score. The value of the threshold score may be determined based on the subject matter of the data entries, the required recall, and the required precision. In certain embodiments, a small threshold value is given when high recall is needed. In certain embodiments, a large threshold value is given when high precision is needed).
The combination of Zhong in view of Liu includes predictable results, such as a selectable improvement in either recall or precision.


Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong in view of Liu and in further view of Zhang US20180165554.  
As per claims 4 and 14, Zhong in view of Liu teaches all the limitations as in claims 3 and 13, above, and Zhong further teaches a training dataset for the plurality of transmutation layers includes a first set of existing publications and a training dataset for the plurality of prediction layers includes a second set of existing publications (Zhong discloses a training dataset for the transmutation module includes a first set of existing publications such as a WMT corpus, ¶[0039], the context vector embeddings are generated by a context vector encoder, such as a two-layer BiLSTM encoder, pretrained on a text corpus, such as the WMT machine translation corpus, and a training dataset for the prediction module includes a second set of existing publications, such as a SQuAD training set, ¶[0066], FIGS. 8A-8D are simplified diagrams of an experimental evaluation of a QA model according to some embodiments. The QA model being evaluated includes a deep coattention encoder, configured as depicted in FIG. 4, and is trained on the Stanford Question Answering Dataset (SQuAD) using a mixed learning objective, with a training configuration as depicted in FIG. 5).
However, Zhong in view of Liu does not specifically teach wherein an unlabeled training dataset for the transmutation module includes a first set of existing publications and an unlabeled training dataset for the prediction module includes a second set of existing publications.
Zhang does teach that the training sets can be unlabeled datasets.  (Zhang discloses a training dataset can be unlabeled ¶[0070], Note also that Equation (4) on the one hand uses label information (θ has been trained with labeled data), on the other hand no explicit labels are directly referred to (only requires xi). Thus one is able to train an autoencoder on both labeled and unlabeled data with the loss function in Equation (4). This subtlety distinguishes our method from pure supervised or unsupervised learning, and allows us to enjoy the benefit from both worlds).
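Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Zhong in view of Liu to use unlabeled training datasets as taught by Zhang in order to train on both labeled and unlabeled data (see Zhang ¶[0070]).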
The combination of Zhong and Liu with Zhang includes predictable results, such as a training of the transmutation module and the prediction module.

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong in view of Liu as applied to claims 2 and 12 above, and further in view of non-patent literature Gu, Shixiang, and Luca Rigazio, "Towards deep neural network architectures robust to adversarial examples." arXiv preprint arXiv:1412.5068 (2014).
As per claims 6 and 16, Zhong in view of Liu teaches all the limitations of claims 2 and 12, above.
However, Zhong in view of Liu does not specifically teach wherein the plurality of prediction layers is trained by employing a generator-adversarial network.
Gu does teach wherein the plurality of prediction layers is trained by employing a generator-adversarial network (Gu, see Abstract, § 2.1 Generating Adversarial Examples, page 3, first paragraph, “we propose Deep Contractive Network, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE). This increases the network robustness to adversarial examples, without a significant performance penalty,” for training using a generator-adversarial network to generate deep neural network architectures that are robust to adversarial examples).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Zhong in view of Liu to use the training as taught by Gu in order to train more robust neural networks (see Gu, § I Introduction, page 2, last paragraph, “We propose Deep Contractive Networks (DCNs), which incorporate a layer-wise contractive penalty, and show that adversarials generated from such networks have significantly higher distortion. We believe our initial results could serve as the basis for training more robust neural networks that can only be misdirected by a substantial noise, in a way that is more attuned to how human perception performs”).
The combination of Zhong and Liu with Gu includes predictable results, such as a training of the deep neural network architectures robust to adversarials.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong in view of Liu as applied to claims 7 and 17 above, and further in view of Saha (US 20200073787 A1).  
As per claims 8 and 18, Zhong in view of Liu teaches all the limitations of claims 7 and 17, above.
However, Zhong in view of Liu does not specifically teach the processor employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens.
Saha does teach wherein the processor employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens (Saha discloses the transmutation module and the prediction module employ machine learning algorithms, see FIG. 1, ¶[0019], the evidence mapping component 102, which maps tokens into ontology elements, considers head phrases versus ontology elements and aggregation, how to match phrases (longer versus shorter matches, for example), index matches (value matches), metadata (ontology element) matches, and/or implicit matches (time and/or sentiment), boundaries of nested queries, implicit versus explicit relationships, relationship versus concept and/or property mapping, etc…. As also noted above, "evidence mapping," as used herein, refers to a process wherein tokens and/or words in the queries are mapped to the domain ontology. This mapping can be dictionary-based and/or machine learning-based, and an objective of such mapping is to map the words as closely and correctly to domain terms represented through the ontology).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the systems and methods as taught by Zhong in view of Liu to use an ontology as taught by Saha in order to improve the mapping of words closely and correctly to domain terms represented through the ontology (see Saha ¶[0019]).
The combination of Zhong and Liu in further view of Saha includes predictable results, such as an improvement in the mapping of words and the accuracy of the output of a model using the ontology of the database.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R HENNINGS whose telephone number is (571) 272-9676. The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll- 

/MARK HENNINGS/
Examiner, Art Unit 2659


/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659