DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed July 8, 2022. Claims 1, 2, and 14-18 have been amended. Claims 1-20 remain pending.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed inventions are directed to non-statutory subject matter, namely an abstract idea without significantly more.  
Independent claims 1 and 14 recite “an apparatus comprising processing circuitry configured to pre-process […]” and “a method of pre-processing […],” respectively, “text data for inputting to a trained model, the pre-processing comprising: receiving […].” Independent claims 15 and 16 recite “an apparatus comprising processing circuitry […]” and “a method […],” respectively, comprising “receiving a vector representation of a set of text data, wherein the vector representation has been obtained by pre-processing the set of text data, the pre-processing comprising: receiving the set of text data […].” Independent claims 1, 14, 15, and 16 recite a “set of text data including numerical information, the set of text data comprising a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information; transforming each of the plurality of tokens into a respective encoding vector, wherein for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token, and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset; assigning a respective numerical vector to each of the plurality of tokens, wherein each token in the second subset is assigned a respective numerical vector in dependence on the numerical information in said token; and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data.”
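Solely to illustrate the claim language, the pre-processing recited in claims 1 and 14-16 may be pictured with the following minimal Python sketch. It is hypothetical and is not the applicant's implementation or that of any cited reference; the whitespace tokenizer, the regular expression used to identify tokens comprising numerical information, the "[NUM]" placeholder, the random embedding table, and the one-element numerical vectors are all assumptions.

```python
import random
import re

# Hypothetical illustration of the claimed pre-processing. The regex, the
# whitespace tokenizer, EMBED_DIM, and the "[NUM]" placeholder are assumptions.
NUMBER_PATTERN = re.compile(r"^[-+]?\d+(?:\.\d+)?$")
EMBED_DIM = 4
NUM_TOKEN = "[NUM]"  # every numeric token shares this token's encoding vector


def is_numeric(token):
    """Return True if the token carries numerical information (second subset)."""
    return NUMBER_PATTERN.match(token) is not None


def build_embeddings(tokens, seed=0):
    """Toy lookup table: one random encoding vector per distinct non-numeric
    token, plus one common encoding vector for NUM_TOKEN."""
    rng = random.Random(seed)
    vocab = sorted({t for t in tokens if not is_numeric(t)} | {NUM_TOKEN})
    return {t: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for t in vocab}


def preprocess(text, embeddings=None):
    """Return one extended embedding vector (a list of floats) per token."""
    tokens = text.split()
    if embeddings is None:
        embeddings = build_embeddings(tokens)
    extended = []
    for token in tokens:
        if is_numeric(token):
            encoding = embeddings[NUM_TOKEN]  # common, value-independent encoding
            numerical = [float(token)]        # numerical vector depends on the value
        else:
            encoding = embeddings[token]      # encoding of the token itself
            numerical = [0.0]                 # shared default numerical vector (cf. claim 2)
        extended.append(encoding + numerical)  # extend encoding by numerical vector
    return extended


vectors = preprocess("potassium 5.1 sodium 140")
```

For the input "potassium 5.1 sodium 140," the two numeric tokens share the same first four entries (the common encoding vector) and differ only in the appended numerical entry, while the two non-numeric tokens carry distinct encodings followed by the default value 0.0.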
Independent claims 17 and 18 recite “an apparatus comprising processing circuitry configured to receive […],” and “a method comprising: receiving […],” respectively, “a vector representation of training text data, the vector representation comprising a respective encoding vector and numerical vector for each of a plurality of tokens in the training text data, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information; wherein for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token, and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset; and using the vector representation of the training text data to train a model to produce a desired output when provided with a vector representation of a target text, wherein the desired output is dependent on numerical information in the target text.”
Independent claims 19 and 20 recite “an apparatus comprising processing circuitry configured to: receive […]” and a “method comprising: receiving […],” respectively, “training text data; and train a model on the training text data, such that the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers and to pass values for the numbers to an inference system.”
As drafted, the following limitations all describe abstract ideas, in particular mental processes: “pre-processing text data for inputting to a trained model […], receiving a set […], comprising a plurality of tokens […], comprises tokens that do not comprise […], comprises tokens that each comprise […], transforming […] tokens into […], having a common encoding vector, assigning a […] vector […], is assigned a […] vector […], and extending the […] vectors, to obtain a vector representation”; “receiving a vector representation […], obtained by pre-processing […]”; “receiving a vector representation […], comprising a respective encoding vector […], using the vector representation […], to train a model to produce a desired output […], output is dependent on numerical information […]”; “receiving training text data […], train a model on the training text data […], the model is trained to learn contextualized semantic meaning […]”; and “pass values for the numbers to an inference system.”
The limitation from claims 1 and 14-16  of “pre-process text data for inputting to a trained model” reads on a human proof-reading, categorizing, sorting, compiling, and encoding text.  The limitation of “receiving a set of text data including numerical information” reads on a human reading text.  The limitation of “the set of text data comprising a plurality of tokens” reads on a human categorizing text.  The limitation of “wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information” reads on a human categorizing and sorting text.  The limitation of “transforming each of the plurality of tokens into a respective encoding vector, wherein for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token, and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset” reads on a human compiling, sorting text into portions of text with numerical information and text without numerical information, and encoding text.  The limitation of “assigning a respective numerical vector to each of the plurality of tokens” reads on a human categorizing, sorting, and compiling text.  The limitations of “wherein each token in the second subset is assigned a respective numerical vector in dependence on the numerical information in said token” and “extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data” read on a human compiling, sorting, combining/merging and encoding text.  
The limitations from claims 15 and 16 of “receiving a vector representation of a set of text data, wherein the vector representation has been obtained by pre-processing the set of text data, the pre-processing comprising: receiving the set of text data” read on a human reading, proof-reading, categorizing, sorting, compiling, and encoding text; and the limitation of applying a trained model to obtain an output reads on a human applying rules to organized text data to mentally derive an output.
The limitation from claims 17 and 18 of “a vector representation of training text data” reads on human-encoded text. The limitation of “the vector representation comprising a respective encoding vector and numerical vector for each of a plurality of tokens in the training text data” reads on human proof-read, categorized, sorted, compiled, and encoded text. The limitation of “wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information” reads on a human categorizing and sorting text. The limitation of “wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset” reads on a human compiling, sorting text into portions of text with numerical information and text without numerical information, and encoding text. The limitation of “using the vector representation of the training text data to train a model to produce a desired output when provided with a vector representation of a target text” reads on a human using human-encoded text to create a model with other human-encoded text. The limitation of “the desired output is dependent on numerical information in the target text” reads on numerical information in human-encoded text.
The limitations from claims 19 and 20 of “receiving training text data and train a model on the training text data” read on a human creating a model after reading, compiling, and sorting text. The limitations of “such that the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers and to pass values for the numbers to an inference system” read on a human reading, sorting, and compiling text, discerning the semantic context of the numerical information in the text, and making a decision, determination, or prediction based on the text contents.
All of the limitations listed above for claims 1 and 14-20 encompass a process a human can perform mentally, with the aid of pen and paper, or using a general-purpose computer as a tool, regardless of the fact that the claim limitations may recite a computer or computing apparatus to perform the process (see MPEP 2106.04(a)(2)(III)(C)).
The limitation of “comprising processing circuitry” relates to insignificant extra-solution activity (see MPEP 2106.05(g)); in particular, the post-solution activity of insignificant computer implementation serving as mere instructions to apply a judicial exception to general “processing circuitry.”  The addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood or conventional.  
Furthermore, the judicial exceptions identified above are not integrated into a practical application. Pages 6:20-35 and 7:1-9 of the as-filed specification describe use of a general-purpose computer or computing apparatus. As mentioned above, the additional element of performing the process using a computer or general-purpose processor (i.e., applying a judicial exception using a computer component) cannot provide an inventive concept. Also, the claimed invention does not claim a particular solution to a problem or a particular way to achieve a desired outcome (see MPEP 2106.05(f)(1)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the as-drafted claim limitations do not recite additional elements that are sufficient to amount to significantly more than the judicial exception, thereby making the claimed invention(s) ineligible for patenting.
Regarding dependent claim 2, the limitation of “wherein each token in the first subset is assigned the same numerical vector” reads on a human encoding sorted and categorized text. Regarding dependent claim 3, the limitation of “wherein each token comprises a respective word, word-piece, group of words, number, or symbol” reads on a human categorizing, sorting, and compiling text. Regarding dependent claim 4, the limitation of “wherein the combining of the encoding vectors and numerical vectors comprises, for each token, appending the numerical vector for said token to the encoding vector for said token” reads on a human encoding text. Regarding dependent claim 5, the limitation of “wherein the processing circuitry is further configured to input the vector representation to the trained model and to use the trained model to process the vector representation to obtain a desired output” reads on a human using encoded text in a model to reach a generic output or conclusion. The inclusion of “processing circuitry” relates to insignificant extra-solution activity; in particular, the post-solution activity of insignificant computer implementation serving as mere instructions to apply a judicial exception to general “processing circuitry.” Regarding dependent claim 6, the limitation of “wherein the trained model comprises a neural network” relates to insignificant extra-solution activity; in particular, the post-solution activity of insignificant computer implementation serving as mere instructions to apply a judicial exception to a generic “neural network.” Regarding dependent claim 7, the limitation of “wherein the trained model is trained to approximate at least one mathematical operation” reads on a human-created model doing math, something a human can perform mentally or with pen and paper.
Regarding dependent claim 8, the limitations of “wherein the desired output comprises at least one of a comparison of two or more numbers in the text data; a determination of whether a number in the text data belongs to an interval; a comparison of a number in the text data to a target value” read on a human comparing numbers in text and a human discerning the characteristics of text. Regarding dependent claim 9, the limitations of “wherein the desired output comprises at least one of an assessment of a subject's condition; a chance of survival, an assessment of contraindications; a diagnosis; a prediction; a likelihood of an outcome; a summary of the text data; a determination of whether a number in the text data is normal; a determination of whether a number in the text data is abnormal” read on a physician assessing, calculating a probability, assessing, diagnosing, predicting, calculating a probability, summarizing, or determining normality or abnormality, respectively, of various medical outcomes of a subject based on text contents and context. Regarding dependent claim 10, the limitation of “wherein the text data comprises clinical notes and wherein the numerical information comprises at least one of laboratory data, symptom data, vital signs data, dosage data, measurement data, genomic data” reads on a human categorizing the content of medically-related text, in particular the numerical information. Regarding dependent claim 11, the limitation of “wherein the processing circuitry is further configured to display at least part of the text data to a user, and to highlight at least one number in the display of the text data in dependence on at least one output of the trained model” reads on a human showing text to a user after highlighting numbers based on a generic output or conclusion.
The inclusion of “processing circuitry” relates to insignificant extra-solution activity; in particular, the post-solution activity of insignificant computer implementation serving as mere instructions to apply a judicial exception to general “processing circuitry.” Regarding dependent claim 12, the limitation of “wherein the vector representation is configured to capture a natural ordering of numbers” reads on a human encoding text using a designated encoding scheme. Regarding dependent claim 13, the limitation of “wherein each numerical vector comprises at least one of: a fixed length vector comprising or representing the numerical information; a vector of length 1 comprising or representing the numerical information; a vector of length 2 comprising a mantissa and exponent that comprise or represent the numerical information; a vector of length k formed by applying k weakly monotonic functions with overlapping dynamic ranges to the numerical information” reads on a human encoding text using a designated encoding scheme, in particular the length, numerical content, format, and mathematical characteristics of the encoded text.
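Again only as a hypothetical illustration, the numerical-vector forms enumerated in claim 13 can be sketched in Python. The decimal mantissa/exponent split, the use of tanh as the weakly monotonic function, and the particular scale values are assumptions; claim 13 does not specify them.

```python
import math

# Hypothetical encoders for the numerical-vector forms listed in claim 13.
# The base-10 split, tanh, and the scale values are illustrative assumptions.


def length_one(x):
    """Vector of length 1 containing the number itself."""
    return [float(x)]


def mantissa_exponent(x):
    """Vector of length 2: decimal mantissa m and exponent e, x == m * 10**e."""
    if x == 0:
        return [0.0, 0.0]
    e = math.floor(math.log10(abs(x)))
    return [x / 10 ** e, float(e)]


def monotonic_bank(x, scales=(1.0, 10.0, 100.0, 1000.0)):
    """Vector of length k = len(scales): k increasing (hence weakly monotonic)
    functions tanh(x / s), each most sensitive over a different, overlapping
    range of magnitudes."""
    return [math.tanh(x / s) for s in scales]


small, large = monotonic_bank(5.1), monotonic_bank(140)
```

Because each tanh(x / s) is increasing in x, a smaller number always yields a componentwise smaller or equal vector from monotonic_bank, which is one way a representation could capture the natural ordering of numbers referred to in claim 12. The mantissa/exponent form, by contrast, does not preserve ordering componentwise.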
The as-claimed inventions in claims 2-13 listed above relate to abstract ideas, namely mental processes, regardless of the fact that the claim limitations may use a computer or processor-based “circuitry” to perform the process(es). All of the limitations encompass a process a human can perform mentally or with pen and paper. Furthermore, these claims do not integrate the judicial exception into a practical application and fail to include additional elements that are sufficient to amount to significantly more than the judicial exception, thereby making the claimed invention(s) ineligible for patenting.


Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-7, 9, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (CN 110968564; hereafter Zhang) in view of Peters et al. (US 20060288284; hereafter Peters) and further in view of Kuang et al. (S. Kuang and B. D. Davison, "Numeric-Attribute-Powered Sentence Embedding," 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), 2018, pp. 623-626; hereafter Kuang).
Regarding independent claim 1, Zhang teaches an apparatus comprising processing circuitry (see [0052] computing device) configured to pre-process text data (see [0002] data processing device and method, see [0079] word embedding) for inputting to a trained model (see [0002] training method; see [0052] training method, prediction model).
Furthermore, regarding claim 1, Zhang teaches transforming each of the plurality of tokens into a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), each of the plurality of tokens in the second subset (see [0085] split training data set into multiple subsets) having a common encoding vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set); assigning a respective numerical vector to each of the plurality of tokens (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set), wherein each token is assigned a respective numerical vector in dependence on the numerical information in said token (see [0103] static feature is a numeric type, corresponding static feature vector is formed according to the value of the static feature); and combining vectors to obtain a vector representation of the text data (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set of the data).
Furthermore, regarding claim 1, Zhang does not teach “receiving a set of text data including numerical information, the set of text data comprising a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, Section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”); that for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”). Zhang specifically teaches that the system improves classification performance (Abstract). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.
Regarding claim 2, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.   Zhang further teaches each token in the first subset is assigned a common default numerical vector (see [0103] static feature is a non-numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set). 
Regarding claim 3, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.  Zhang further teaches each token comprises a respective word, word-piece, group of words, number, or symbol (see [0103] value of the static feature is a numeric type).
Regarding claim 4, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above. Zhang further teaches the combining of the encoding vectors and numerical vectors comprises, for each token, appending the numerical vector for said token to the encoding vector for said token (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set).
Regarding claim 5, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above. Zhang further teaches the processing circuitry is further configured to input the vector representation to the trained model and to use the trained model to process the vector representation to obtain a desired output (see [0104] prediction module inputs feature vector set into pre-trained data state prediction model to predict the state corresponding to the data).
Regarding claim 6, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above. Zhang further teaches the trained model comprises a neural network (see [0071] data state prediction model constructed through Deep Neural Network).
Regarding claim 7, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above. Zhang further teaches the trained model is trained to approximate at least one mathematical operation (see [0057] data state prediction model implemented by GBDT regression trees, see [0058] meaningful to add and subtract based on values obtained by regression tree).  
Regarding claim 9, Zhang in view of Peters and Kuang teach all of the limitations of claims 1 and 5 above. Zhang further teaches the desired output comprises at least one of an assessment of a subject's condition; a chance of survival, an assessment of contraindications; a diagnosis; a prediction (see [0104] prediction module inputs feature vector set into pre-trained data state prediction model to predict the state corresponding to the data); a likelihood of an outcome; a summary of the text data; a determination of whether a number in the text data is normal; a determination of whether a number in the text data is abnormal.
Regarding independent claim 14, Zhang teaches a method of pre-processing text data (see [0002] data processing device and method, see [0079] word embedding) for inputting to a trained model (see [0002] training method; see [0052] training method, prediction model).
Furthermore, regarding claim 14, Zhang teaches transforming each of the plurality of tokens into a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), each of the plurality of tokens in the second subset (see [0085] split training data set into multiple subsets) having a common encoding vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set); assigning a respective numerical vector to each of the plurality of tokens (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set), wherein each token is assigned a respective numerical vector in dependence on the numerical information in said token (see [0103] static feature is a numeric type, corresponding static feature vector is formed according to the value of the static feature); and combining the encoding vectors and numerical vectors to obtain a vector representation of the text data (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set of the data).
Furthermore, regarding claim 14, Zhang does not teach “receiving a set of text data including numerical information, the set of text data comprising a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, Section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”); that for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”). Zhang specifically teaches that the system improves classification performance (Abstract). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.
Regarding independent claim 15, Zhang teaches an apparatus comprising processing circuitry (see [0052] computing device) configured to receive a vector representation of a set of text data (see [0009] input the feature vector set into pre-trained data state prediction model), wherein the vector representation has been obtained by pre-processing the set of text data (see [0009] extracting mixed features of the data to be processed), the pre-processing comprising receiving the set of text data (see [0009] input the feature vector set into pre-trained data state prediction model).
Furthermore, regarding claim 15, Zhang teaches transforming each of the plurality of tokens into a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), each of the plurality of tokens in the second subset (see [0085] split training data set into multiple subsets) having a common encoding vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set); assigning a respective numerical vector to each of the plurality of tokens (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set), wherein each token is assigned a respective numerical vector in dependence on the numerical information in said token (see [0103] static feature is a numeric type, corresponding static feature vector is formed according to the value of the static feature); and combining the encoding vectors and numerical vectors to obtain a vector representation of the text data (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set of the data); and apply a trained model to the vector representation to obtain a desired output (see [0009] input feature vector set into pre-trained data state prediction model to predict state corresponding to data).
Furthermore, regarding claim 15, Zhang does not teach “receiving the set of text data, wherein the set of text data includes numerical information, the set of text data comprises a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”), and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”).  Zhang specifically teaches the system improves classification performance (Abstract).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.
Regarding independent claim 16, Zhang teaches a method comprising receiving a vector representation of a set of text data (see [0009] input the feature vector set into pre-trained data state prediction model), wherein the vector representation has been obtained by pre-processing the set of text data (see [0009] extracting mixed features of the data to be processed), the pre-processing comprising receiving the set of text data (see [0009] input the feature vector set into pre-trained data state prediction model).
Furthermore, regarding claim 16, Zhang teaches transforming each of the plurality of tokens into a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), each of the plurality of tokens in the second subset (see [0085] split training data set into multiple subsets) having a common encoding vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set); assigning a respective numerical vector to each of the plurality of tokens (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set), wherein each token is assigned a respective numerical vector in dependence on the numerical information in said token (see [0103] static feature is a numeric type, corresponding static feature vector is formed according to the value of the static feature); and combining the encoding vectors and numerical vectors to obtain a vector representation of the text data (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set of the data); and apply a trained model to the vector representation to obtain a desired output (see [0009] input feature vector set into pre-trained data state prediction model to predict state corresponding to data).
Furthermore, regarding claim 16, Zhang does not teach “receiving the set of text data, wherein the set of text data includes numerical information, the set of text data comprises a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”), and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”).  Zhang specifically teaches the system improves classification performance (Abstract).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. in view of Peters and Kuang, as applied to claim 1 above, and further in view of Cerchiello et al. (Cerchiello, Paola and Nicola, Giancarlo and Rönnqvist, Samuel and Sarlin, Peter, Deep Learning for Assessing Banks’ Distress from News and Numerical Financial Data (November 28, 2018). Michael J. Brennan Irish Finance Working Paper Series Research Paper No. 18-15; hereafter Cerchiello).
Regarding claim 8, Zhang in view of Peters and Kuang teach all of the limitations of claims 1 and 5 above.
Zhang in view of Peters do not teach the desired output comprises at least one of a comparison of two or more numbers in the text data; a determination of whether a number in the text data belongs to an interval; a comparison of a number in the text data to a target value.
Cerchiello discloses the desired output comprises at least one of a comparison of two or more numbers in the text data (see [Fig 2] comparison of Numerical and Textual-and-Numerical data sets); a determination of whether a number in the text data belongs to an interval; a comparison of a number in the text data to a target value.
Zhang in view of Peters and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang in view of Peters to incorporate the disclosure of Cerchiello in order to positively enhance the prediction capability of a predictive model (see Cerchiello [page 7, paragraph 4] positively enhances prediction capability of the model). 

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Peters and Kuang as applied to claim 1 above, and further in view of Cohen et al. (WO 2016094330; hereafter Cohen).
Regarding claim 10, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.
Zhang in view of Peters do not teach “the text data comprises clinical notes and wherein the numerical information comprises at least one of laboratory data, symptom data, vital signs data, dosage data, measurement data, genomic data.”
Cohen discloses the text data comprises clinical notes (see [00359] data separated into text-based data including physician notes) and wherein the numerical information (see [00359] clinical and numeric data is provided to master neural net) comprises at least one of laboratory data, symptom data (see [00359] data separated into clinical and numeric data (e.g., symptoms)), vital signs data, dosage data, measurement data, genomic data.
Zhang in view of Peters and Cohen are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang in view of Peters to incorporate the disclosure of Cohen in order to determine a level of risk of a patient developing a medical condition or disease (see Cohen [00201] artificial intelligence computing system NACS determines risk of having cancer; see [00360] NACS master neural net NN12 analyzes various inputs including clinical and numeric data, generates risk categories corresponding to level of risk for patient to develop cancer).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Peters and Kuang as applied to claim 1 above, and further in view of Goloubew et al. (US 20210027167; hereafter Goloubew).
Regarding claim 11, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.
Zhang in view of Peters do not teach “the processing circuitry is further configured to display at least part of the text data to a user, and to highlight at least one number in the display of the text data in dependence on at least one output of the trained model.”
Goloubew discloses the processing circuitry is further configured to display at least part of the text data to a user (see [0053] examples of highlighting added to unstructured text for display), and to highlight at least one number in the display of the text data (see Fig 4A highlighted text 402 is a numerical value, see [0123] sequence of text may correspond to numbers or other tokens) in dependence on at least one output of the trained model (see [0052] highlighting engine comprises trained machine learning-based models to predict portions of unstructured text).
Zhang in view of Peters and Goloubew are considered to be analogous because they are from the field of data processing/analysis.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang in view of Peters to incorporate the disclosure of Goloubew in order to accurately identify, predict, and display anomalies/deviations from an expected output in a portion of text, thereby aiding in error detection (see Goloubew [0039] performance of model evaluated based on the number of true positives, false positives, true negatives, and/or false negatives of model; see [0052] predict portions of unstructured text, highlighting engine identifies anomalies/deviations from the expected text; see [0093] highlighting anomalous portions of text allows reader to focus on portions of text that may signify errors or other problems).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Peters and Kuang as applied to claim 1 above, and further in view of Corrado et al. (US 9141916; hereafter Corrado).
Regarding claim 12, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.
Zhang in view of Peters does not teach “the vector representation is configured to capture a natural ordering of numbers.”
Corrado discloses the vector representation is configured to capture a natural ordering of numbers (see Col 4:66 – Col 5:8).
Zhang in view of Peters and Corrado are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have combined Zhang in view of Peters according to the known methods of Corrado to yield the predictable result of reducing the number of classifiers required to be trained in order to obtain a satisfactory output, thereby saving time and computer processing power.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Peters and Kuang as applied to claim 1 above, and further in view of Elkind et al. (US 20190273510; hereafter Elkind).
Regarding claim 13, Zhang in view of Peters and Kuang teach all of the limitations of claim 1 above.
Zhang in view of Peters do not teach “each numerical vector comprises at least one of: a fixed length vector comprising or representing the numerical information, a vector of length 1 comprising or representing the numerical information, a vector of length 2 comprising a mantissa and exponent that comprise or represent the numerical information, or a vector of length k formed by applying k weakly monotonic functions with overlapping dynamic ranges to the numerical information.”
Elkind discloses each numerical vector comprises (see [0026] source data is typically numerical values with arbitrary length) at least one of: a fixed length vector comprising or representing the numerical information (see [0025] system allows representations of source data to be mapped to set of values of a fixed dimensionality; see [0028] embeddings provide fixed length representation of input data, embeddings combined with data represented as fixed vectors of floating point numbers; see [0076] tokenized inputs are first vectorized by mapping a token to a vectorized input, see [0140] output is fixed-length representation of tokenized inputs, fixed-length representation of tokenized inputs is example embedding of source data), a vector of length 1 comprising or representing the numerical information, a vector of length 2 comprising a mantissa and exponent that comprise or represent the numerical information, or a vector of length k formed by applying k weakly monotonic functions with overlapping dynamic ranges to the numerical information.
Zhang in view of Peters and Elkind are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang in view of Peters to incorporate the disclosure of Elkind in order to facilitate a more efficient representation of input data for various machine learning algorithms (see Elkind [0025] more efficient representation for machine learning algorithms).

Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (CN 110968564; hereafter Zhang) in view of Peters et al. (US 20060288284; hereafter Peters), in view of Kuang et al. (S. Kuang and B. D. Davison, "Numeric-Attribute-Powered Sentence Embedding," 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), 2018, pp. 623-626; hereafter Kuang), and in view of Cerchiello et al. (Cerchiello, Paola and Nicola, Giancarlo and Rönnqvist, Samuel and Sarlin, Peter, Deep Learning for Assessing Banks’ Distress from News and Numerical Financial Data (November 28, 2018). Michael J. Brennan Irish Finance Working Paper Series Research Paper No. 18-15; hereafter Cerchiello).
Regarding independent claim 17, Zhang teaches an apparatus comprising processing circuitry (see [0052] computing device).  Zhang further teaches the vector representation comprising a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), and numerical vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set) for each of a plurality of tokens in the training text data (see [0016] model training based on a pre-acquired training data set).  Zhang further teaches wherein the desired output is dependent on numerical information in the target text (see [0016] prediction model performs model training, output of prediction model indicates state corresponding to input feature vector set).
Zhang does not teach “receive a vector representation of a set of training text data.”
Cerchiello discloses receive a vector representation of a set of training text data (see [page 2, paragraph 5] for learning vector representations of sequences of words, dense vector representations of sentences mentioning target banks are learned, method creates a multidimensional space in which words are positioned according to their semantic meaning, see [page 3, paragraph 2] semantic vector representation is learned by training a neural network to predict a word using its word context).
Zhang and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello to learn vector representations of sequences of words (see [page 2, paragraph 5] learn vector representations of sequences of words).
Furthermore, regarding claim 17, Zhang does not teach “and use the vector representation of the training text data to train a model to produce a desired output when provided with a vector representation of a target text.”
Cerchiello discloses using the vector representation of the training text data (see [page 2, paragraph 5] for learning vector representations of sequences of words, dense vector representations of sentences mentioning target banks are learned, method creates a multidimensional space in which words are positioned according to their semantic meaning, see [page 3, paragraph 2] semantic vector representation is learned by training a neural network to predict a word using its word context) to train a model to produce a desired output when provided with a vector representation of a target text (see [page 3, paragraph 2] semantic vector representation is learned by training a feed forward neural network to predict a certain word using its word context, word contexts used as features to predict target words sampled over the sentence), wherein the desired output (see [page 3, paragraph 3] the network is trained to predict distress events) is dependent on numerical information in the target text (see [page 3, paragraph 3] second step receives news textual data in form of multidimensional semantic vectors and numerical financial data loaded from a database, see [page 3, paragraph 4] leverage textual and numerical data to predict events of interest; see [page 4, paragraph 1] model trained in a supervised framework to associate specific language and financial figures with the target event type).
Zhang and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello so that vector representations of text positively contribute in predicting the output of a predictive model (see Cerchiello [page 3, paragraph 2] semantic vector updated by training algorithm so its representation positively contributes in predicting the next word).
Zhang does not teach “receiving a set of text data including numerical information, the set of text data comprising a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”), and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”).  Zhang specifically teaches the system improves classification performance (Abstract).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.
Regarding independent claim 18, Zhang teaches the vector representation comprising a respective encoding vector (see [0079] non-numeric type, word embedding, continuous vector space), and numerical vector (see [0103] static feature is a numeric type, static feature vector and the dynamic feature vector are combined to form the feature vector set) for each of a plurality of tokens in the training text data (see [0016] model training based on a pre-acquired training data set); and using the vector representation of the training text data to train a model to produce a desired output when provided with a vector representation of a target text (see [0016] prediction model performs model training, output of prediction model indicates state corresponding to input feature vector set), wherein the desired output is dependent on numerical information in the target text (see [0016] prediction model performs model training, output of prediction model indicates state corresponding to input feature vector set).
Zhang does not teach “receiving a vector representation of a set of training text.”
Cerchiello discloses receiving a vector representation of a set of training text (see [page 2, paragraph 5] for learning vector representations of sequences of words, dense vector representations of sentences mentioning target banks are learned, method creates a multidimensional space in which words are positioned according to their semantic meaning, see [page 3, paragraph 2] semantic vector representation is learned by training a neural network to predict a word using its word context).
Zhang and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello to learn vector representations of sequences of words (see [page 2, paragraph 5] learn vector representations of sequences of words).
Furthermore, regarding claim 18, Zhang does not teach “and using the vector representation of the training text data to train a model to produce a desired output when provided with a vector representation of a target text.”
Cerchiello discloses using the vector representation of the training text data (see [page 2, paragraph 5] for learning vector representations of sequences of words, dense vector representations of sentences mentioning target banks are learned, method creates a multidimensional space in which words are positioned according to their semantic meaning, see [page 3, paragraph 2] semantic vector representation is learned by training a neural network to predict a word using its word context) to train a model to produce a desired output when provided with a vector representation of a target text (see [page 3, paragraph 2] semantic vector representation is learned by training a feed forward neural network to predict a certain word using its word context, word contexts used as features to predict target words sampled over the sentence), wherein the desired output (see [page 3, paragraph 3] the network is trained to predict distress events) is dependent on numerical information in the target text (see [page 3, paragraph 3] second step receives news textual data in form of multidimensional semantic vectors and numerical financial data loaded from a database, see [page 3, paragraph 4] leverage textual and numerical data to predict events of interest; see [page 4, paragraph 1] model trained in a supervised framework to associate specific language and financial figures with the target event type).
Zhang and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello so that vector representations of text positively contribute in predicting the output of a predictive model (see Cerchiello [page 3, paragraph 2] semantic vector updated by training algorithm so its representation positively contributes in predicting the next word).
Zhang does not teach “receiving a set of text data including numerical information, the set of text data comprising a plurality of tokens, wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information.”
Peters discloses receiving a set of text data including numerical information (see [0005] portion of source data, plurality of numerical elements), the set of text data comprising a plurality of tokens (see [0103] parses document into chunks, [0104] delimiters), wherein a first subset of the plurality of tokens comprises tokens that do not comprise numerical information, and a second subset of the plurality of tokens comprises tokens that each comprise respective numerical information (see [0050] differentiation between numeric and non-numeric elements and [0051] – [0055] differentiation process).
Zhang and Peters are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Peters in order to yield the predictable result of correlating numerical and nonnumerical data in a textual data set to increase the accuracy of a prediction model.
Zhang fails to specifically teach, but Kuang teaches (pages 624-625, section IV, Numeric-attribute powered sentence embedding), that for each of the tokens in the first subset that do not comprise numerical information, the respective encoding vector is an encoding of the token (“text vector representation”), and wherein for each of the tokens in the second subset that do comprise numerical information, the respective encoding vector is a common encoding vector that does not depend on the numerical information and is common to all tokens in the second subset (“determines an average numeric attribute representation based on rankings”); and extending the encoding vector for each token by the numerical vector for said token to obtain a respective extended embedding vector for each token, thereby obtaining a vector representation of the text data (“concatenates text vector representation and numeric attribute representation”).  Zhang specifically teaches the system improves classification performance (Abstract).  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to combine the disclosure of Kuang in order to yield predictable results and improved classification processing performance.


Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (CN 110968564; hereafter Zhang) in view of Cerchiello et al. (Cerchiello, Paola and Nicola, Giancarlo and Rönnqvist, Samuel and Sarlin, Peter, Deep Learning for Assessing Banks’ Distress from News and Numerical Financial Data (November 28, 2018). Michael J. Brennan Irish Finance Working Paper Series Research Paper No. 18-15; hereafter Cerchiello).

Regarding independent claim 19, Zhang teaches an apparatus comprising processing circuitry (see [0052] computing device) configured to receive training text data (see [0009] input the feature vector set into pre-trained data state prediction model); and train a model on the training text data (see [0016] prediction model performs model training based on training data set). 
Furthermore, regarding claim 19, Zhang teaches to pass values for the numbers to an inference system (see [0030] data set input into data state prediction model to predict state corresponding to data).
Zhang does not teach “such that the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers.”
Cerchiello discloses the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers (see [page 5, paragraph 3] numerical data aligned with sentences to match bank and numerical financial data; see [page 6, paragraph 1] numerical data normalized to improve classifier training time due to easier convergence of the model). 
Zhang and Cerchiello are considered to be analogous because they are from the field of data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello in order to better correlate related data in the textual and numerical data sets, thereby improving the accuracy of a prediction model (see [page 5, paragraph 3] match each and every mention of a bank with the corresponding numerical financial data).

Regarding independent claim 20, Zhang teaches a method comprising receiving training text data (see [0009] input the feature vector set into pre-trained data state prediction model); and training a model on the training text data (see [0016] prediction model performs model training based on training data set).
Furthermore, regarding claim 20, Zhang teaches passing values for the numbers to an inference system (see [0030] data set input into data state prediction model to predict state corresponding to data).
Zhang does not teach “such that the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers.”
Cerchiello discloses that the model is trained to learn contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers (see [page 5, paragraph 3] numerical data aligned with sentences to match bank and numerical financial data; see [page 6, paragraph 1] numerical data normalized to improve classifier training time due to easier convergence of the model).
Zhang and Cerchiello are considered to be analogous art because they are from the same field of endeavor, namely data processing.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Zhang to incorporate the disclosure of Cerchiello in order to better correlate related data in the textual and numerical data sets, thereby improving the accuracy of a prediction model (see [page 5, paragraph 3] match each and every mention of a bank with the corresponding numerical financial data).

Response to Arguments
Applicant's arguments filed July 8, 2022 with respect to the rejections under 35 USC 101 have been fully considered but they are not persuasive. 
Applicant argues “the amended claims "incorporate particular rules" by which the text data comprising a plurality of tokens is transformed into respective encoding vectors, etc. to obtain a vector representation of the text data which can then be input to a trained model to obtain a desired output (see Claim 5, for example). The "rules" clarified in amended Claims 1 and 14-18 are clearly not "mental processes" and should not be considered as such even under the "broadest reasonable interpretation" standard, which requires consideration "in light of the specification.”  The Examiner respectfully disagrees.  As indicated in the rejection above, the recited steps of the claims relate to the activities of a human organizing, categorizing, sorting, and encoding text, and the rules can be implemented mentally and/or using pen and paper.  With respect to Applicant’s reference to McRO, as noted by Applicant, McRO was directed to rules that a human animator would not use.  That is not the case for the claims of the instant application.  Parsing text to identify the types of information it contains, sorting the text into categories or types, and encoding that text are mental process steps that humans use when processing text.  The claims are therefore directed to an abstract idea.
Applicant argues “the pending claims are "directed to a specific implementation of a solution to a problem in the software arts, 'such as an improvement in the functioning of a computer'," do not merely invoke generic processes and machinery, and, and are NOT directed to a result or effect that itself is the abstract idea.”  The Examiner respectfully disagrees.  The recited computer components are recited at a high level of generality and amount to no more than mere instructions to apply the exception using generic computer components.
Applicant’s arguments with respect to claim(s) 1-18 have been considered but are moot because the arguments are not specifically based on the new combination of references cited in the new grounds of rejection.
Applicant's arguments filed July 8, 2022 with respect to claims 19-20 have been fully considered but they are not persuasive.  Applicant argues “There is no description in Cerchiello of numbers in the text data, nor of the presence of tokens around such numbers. Therefore, Cerchiello does not describe learning contextualized semantic meaning of a plurality of numbers in the text data from contextual information in tokens around the numbers.”  The Examiner respectfully disagrees.  The Examiner notes that matching and aligning the text data with the numerical data to generate a dataset provides a form of data in which numbers are present in the text data.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659