Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application is 07/29/2016.
This action is in response to amendments and/or remarks filed on 09/17/2020. In the current amendments, claims 1-17 and 19 have been amended. A new claim 20 has been introduced. Claims 1-20 are pending and have been examined. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/17/2020 has been entered.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 10/16/2020 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 6, 13 and 19 are objected to because of the following informalities: “the selected new character” may be preferred instead of “the selected character” because “selecting a next character” is recited before.  Appropriate correction is required.
Claim 19 appears to erroneously recite “produce the new predicted structured” instead of “produce the new predicted structured events by the data generator” as amended in claim 6.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 9 and 16 recite the limitation “the textual characters” in line 16, line 18 and line 20, respectively. There is insufficient antecedent basis for this limitation in the claims.
Claim 18 recites the limitation “the deep-learning technique” in line 3.  There is insufficient antecedent basis for this limitation in the claim.
Claim 19 recites the limitation “the deep-learning technique” in line 3.  There is insufficient antecedent basis for this limitation in the claim.
Claims 1, 9, 16 and 18-19 each recite limitations that raise issues of indefiniteness as set forth above, and dependent claims 2-8, 10-15, 17 and 20 are rejected at least based on their direct and/or indirect dependency from independent claims 1, 9 and 16. Appropriate explanation and/or amendment is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Leeman-Munk et al. (US 2016/0350646 A1) in view of Suh et al. (Suh, B, “Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network”).

Regarding claim 1, 
Leeman-Munk teaches 
A computer-implemented method comprising: 
obtaining a training corpus that contains example structured events ([fig 15]; [par 157] "In block 1506, the processor can include the canonical term and the noncanonical version of the canonical term in a database for use in training a neural network. For example, the processor can include the canonical term "hello" mapped to its corresponding noncanonical version "hzlly" in the database. The processor can repeat steps 1504-1506 for all of the terms in the dataset, thereby generating a database of annotated training data. Thereafter, the processor can use the database to train the neural network (e.g., according to a supervised or a semi-supervised training method)."; Fig 15 generates a training corpus and it creates example structured events since canonical terms and the noncanonical version of the canonical terms are used.), 
each structured event including a portion of raw machine data that reflects activity in an information technology environment and that is produced by a component of that information technology environment ([par 41] “For example, network-attached data stores 110 may hold unstructured (e.g., raw) data, such as data from a website (e.g., a forum post, a Twitter™ tweet, a Facebook™ post, a blog post, an online review), a text message, an e-mail, or any combination of these.”; [par 53] “Although network devices 204-209 are shown in FIG. 2 as a mobile phone, laptop computer, tablet computer, temperature sensor, motion sensor, and audio sensor respectively, the network devices may be or include sensors that are sensitive to detecting aspects of their environment.”; “data from a website (e.g., a forum post, a Twitter™ tweet, a Facebook™ post, a blog post, an online review), a text message, an e-mail” read on “activity in an information technology environment”. In addition, “mobile phone, laptop computer, tablet computer” read on “a component of that information technology environment”.);

training a data generator to generate new predicted structured events based on the example structured events of the training corpus, a new predicted structured event of the new predicted structured events having a data field defined by an … rule from the training corpus and predicted content in the data field ([figs 13-15]; [pars 90-93] "In block 502, a processor trains a neural network. The neural network can include one or more computer-implemented algorithms or models. … In some examples, the neural network is, or includes, a deep neural network."; [pars 41-55] “For example, network-attached data stores 110 may hold unstructured (e.g., raw) data, such as data from a website (e.g., a forum post, a Twitter™ tweet, a Facebook™ post, a blog post, an online review), a text message, an e-mail, or any combination of these. … The computing environment 214 may also include storage devices that include one or more databases of structured data, such as data organized in one or more hierarchies,”; [pars 130-147] "Because each row can correspond to a particular character position in the normalized version of the term, the determined characters can be arranged in the correct order to generate the normalized version of the term. For example, the neural network or the processor can arrange the determined characters into the normalized version of the term, "you." … A desired output from the neural network is designated as “Gold” (i.e., the “gold standard”) in FIG. 13.”; “normalized version”, “structured data”, “@ez_doessit”, and the “desired output” of each sentence of figs 13-14 read on “a new predicted structured event of the new predicted structured events having a data field defined by an … rule from the training corpus and predicted content in the data field”. Furthermore, fig 15 reads on “example structured events of the training corpus”.),
wherein the predicted content is generated via the data generator by generating the predicted content, one new character at a time, as a sequence of new characters based upon calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus ([figs 9-12]; [pars 112-130] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below). … Because each row can correspond to a particular character position in the normalized version of the term, the determined characters can be arranged in the correct order to generate the normalized version of the term. For example, the neural network or the processor can arrange the determined characters into the normalized version of the term, "you.""; [pars 135-141] “The neural network 1100 can, in some examples, operate simultaneously at the character level and the word level, performing both character-level corrections and word level corrections to generate the normalized version of the noncanonical term. For example, the neural network 1100 can construct the normalized version of the term character-by-character, while also using the BGRNN 1106 to perform context analysis, thereby increasing the accuracy that the normalized version of the term is correct.”; “generate the normalized version of the term” and “generate a prediction of the most likely character in each position of the normalized version of the term” read on “generating the predicted content, one new character at a time, as a sequence of new characters”, and the generating is based on the statistical predictions with the neural network as described in the paragraphs above. In addition, “multiple probabilities of particular characters” and “prediction of the most likely character” read on “calculated statistical predictions of the sequence of the textual characters”.),
wherein the predicted content generated via the data generator comprises a textual indication of a system error or a system performance ([figs 13-14]; [pars 142-147] “FIG. 14 shows the output from four different text-normalizers, with the neural network of the present application being designated as “DeepNorm.” As shown, DeepNorm provides a more accurate result than the Skip-Bigram approach, which is specifically designed for use in tweet normalization applications. And both DeepNorm and CLSTM by Word have one incorrect word, while the CLSTM by Sentence approach performs slightly better than DeepNorm.”; “incorrect word” reads on “a textual indication of a system error or a system performance”.).

However, Leeman-Munk does not explicitly teach 

a data field defined by an extraction rule.

Suh teaches 
each structured event further having at least one data field defined by an extraction rule for extracting a subportion of text from the portion of raw machine data in a corresponding structured event to produce a value for the at least one data field ([table 1]; [sec IV, C, par 3] “The first eight features in Table 1 were extracted for all the tweets in the two data sets – 10K and 74M data set. The content features were extracted using regular expressions from text of tweets. The contextual features were directly acquired through calls to the Twitter API.”; [sec IV, C, par 4, “Regular Expression Method”, line 3] “For example, to identify the number of retweets containing URLs, we scanned for text markers such as “RT @”, “RT:@”, “retweeting @”, “retweet @”, “via @”, “thx @”, “HT @”, and “r @”. ”; [sec I, par 2] “First, one can retweet by preceding it with RT and addressing the original author with @. For example, “RT @userA: my experience with the new #iPad is great!””; “regular expressions” read on the “extraction rule”. In addition, “the content features were extracted using regular expressions from text of tweets” reads on “extracting a subportion of text from the portion of raw machine data”. Furthermore, “RT @userA” and “content features were extracted using regular expressions” read on “produce a value for the at least one data field”.);

a data field defined by an extraction rule ([table 1]; [sec IV, C, par 3] “The first eight features in Table 1 were extracted for all the tweets in the two data sets – 10K and 74M data set. The content features were extracted using regular expressions from text of tweets. The contextual features were directly acquired through calls to the Twitter API.”; [sec IV, C, par 4, “Regular Expression Method”] “For example, to identify the number of retweets containing URLs, we scanned for text markers such as “RT @”, “RT:@”, “retweeting @”, “retweet @”, “via @”, “thx @”, “HT @”, and “r @”. ”; [sec I, par 2] “First, one can retweet by preceding it with RT and addressing the original author with @. For example, “RT @userA: my experience with the new #iPad is great!””; “regular expressions” read on the “extraction rule”. In addition, “RT @userA” and “content features were extracted using regular expressions from text of tweets” read on “a data field”.).

Leeman-Munk and Suh are both in the same field of endeavor of processing characters with a neural network and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data generation operations of Leeman-Munk with the extraction rule of Suh. Doing so would make it possible to extract a set of content features using regular expressions based on a set of markers to identify specific activities (e.g., retweets) (Suh, [sec IV, C]).
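For illustration only, the regular-expression extraction described in the cited Suh passage can be sketched as follows. This is a minimal sketch and not Suh's actual implementation: the field names `retweet_of` and `hashtags`, the function `extract_fields`, and the specific patterns are assumptions introduced here, and only a few of the quoted text markers are covered.

```python
import re

# Hypothetical extraction rules: each data field is defined by a regular expression.
# The field names and patterns are illustrative assumptions, not taken from Suh.
EXTRACTION_RULES = {
    # Captures the user name that follows a retweet marker such as "RT @" or "via @".
    "retweet_of": re.compile(r"\b(?:RT|via|HT)\s*:?\s*@(\w+)", re.IGNORECASE),
    # Captures each hashtag without the leading "#".
    "hashtags": re.compile(r"#(\w+)"),
}

def extract_fields(text):
    """Apply every extraction rule to the raw text and return the field values."""
    return {name: rule.findall(text) for name, rule in EXTRACTION_RULES.items()}

# Example tweet quoted from Suh, sec I, par 2.
fields = extract_fields("RT @userA: my experience with the new #iPad is great!")
print(fields)
```

Each rule extracts a subportion of the raw text, and the extracted subportion becomes the value of the corresponding field.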

In the alternative, Leeman-Munk can also be interpreted to teach this limitation:
each structured event further having at least one data field defined by an extraction rule for extracting a subportion of text from the portion of raw machine data in a corresponding structured event to produce a value for the at least one data field; a data field defined by an extraction rule ([figs 5-9]; [pars 27-34] “the computing device can use the neural network to analyze the noncanonical term and determine if the noncanonical term should be normalized. If the neural network determines that the noncanonical term should be normalized, the computing device can output the normalized version of the noncanonical term. If the neural network determines that the noncanonical term should not be normalized, the computing device can output the noncanonical term”; see also [pars 89-134]; “use the neural network to analyze the noncanonical term and determine if the noncanonical term should be normalized” reads on “extraction rule for extracting a subportion of text from the portion of raw machine data” since the neural network is used for identifying (i.e., extracting) noncanonical terms to determine whether the noncanonical terms should be normalized).

Regarding claim 2, 
Leeman-Munk and Suh teach claim 1. 

Leeman-Munk further teaches 
the data generator is a recurrent neural network ([fig 12]; [par 5] "The method can include preprocessing the noncanonical communication by generating a vector comprising a plurality of characters from a term of the multiple terms. … The last character of the substring can be the same as the last character in the term. The method can include transmitting the vector to a neural network comprising at least two bidirectional gated recurrent neural network (BGRNN) layers.”; [par 141] “The neural network 1100 can, in some examples, operate simultaneously at the character level and the word level, performing both character-level corrections and word level corrections to generate the normalized version of the noncanonical term. For example, the neural network 1100 can construct the normalized version of the term character-by-character, while also using the BGRNN 1106 to perform context analysis, thereby increasing the accuracy that the normalized version of the term is correct.”; Fig 12 shows that a recurrent neural network is implemented with a deep-learning technique based on characters.).

Regarding claim 3, 
Leeman-Munk and Suh teach claim 1.

Leeman-Munk further teaches 
([par 130] "Because each row can correspond to a particular character position in the normalized version of the term, the determined characters can be arranged in the correct order to generate the normalized version of the term. For example, the neural network or the processor can arrange the determined characters into the normalized version of the term, "you.""; “normalized version of the term” reads on “predicted structured events”.).

Regarding claim 4, 
Leeman-Munk and Suh teach claim 1.

Leeman-Munk further teaches 
the example structured events of the training corpus comprise a sequence of textual characters ([fig 15]; [par 157] "In block 1506, the processor can include the canonical term and the noncanonical version of the canonical term in a database for use in training a neural network. For example, the processor can include the canonical term "hello" mapped to its corresponding noncanonical version "hzlly" in the database. The processor can repeat steps 1504-1506 for all of the terms in the dataset, thereby generating a database of annotated training data. Thereafter, the processor can use the database to train the neural network (e.g., according to a supervised or a semi-supervised training method)."; Note that the canonical term and its noncanonical version are composed of textual characters.) and 
the training of the data generator includes calculating statistical predictions based upon the sequence of the textual characters of the example structured events of the training corpus ([par 92] "The neural network can determine the cost between a result from the neural network and a desired result and back propagate to reduce the cost. For example, a noncanonical term "u" (an erroneous version of the word "you") can be input into the neural network. During training, the neural network can determine that the letter "y" is 75% likely for a first character in the desired result, the letter "o" is 95% likely for a second character in the desired result, and the letter "u" is 89% likely for a third character in the desired result. A negative log likelihood for each letter can be determined to be (0.29, 0.5, 0.12). The negative log likelihoods can be summed together to determine the cost, which can be back propagated through the neural network to train the numeric weights."; [par 121] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below)."; As shown in par 92, statistical predictions are calculated during the training, and block 916 is one of the possible implementations of the statistical predictions based on textual characters.).
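For illustration only, the cost computation described in the quoted par 92 can be sketched as below. This is a sketch under stated assumptions: the natural logarithm is assumed (the quoted passage does not name the base), and the variable names are introduced here. Under that assumption the per-character values round to about 0.29, 0.05 and 0.12, so the middle value differs from the 0.5 appearing in the quotation above.

```python
import math

# Per-position probabilities quoted from Leeman-Munk par 92 for the target "you".
probs = {"y": 0.75, "o": 0.95, "u": 0.89}

# Negative log likelihood of each character (natural log is an assumption).
nll = {ch: -math.log(p) for ch, p in probs.items()}

# The cost is the sum of the per-character negative log likelihoods.
cost = sum(nll.values())

for ch, value in nll.items():
    print(ch, round(value, 2))
print("cost:", round(cost, 2))
```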

Regarding claim 5, 
Leeman-Munk and Suh teach claim 1.

Leeman-Munk further teaches 
the example structured events of the training corpus comprise a sequence of textual characters and the training of the data generator includes calculating statistical predictions based upon the sequence of the textual characters of the example structured events of the training corpus, the method further comprising (See claim 4):

[par 130] "Because each row can correspond to a particular character position in the normalized version of the term, the determined characters can be arranged in the correct order to generate the normalized version of the term. For example, the neural network or the processor can arrange the determined characters into the normalized version of the term, "you.""; [par 121] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below)."; [fig 12]; [par 141] “The neural network 1100 can, in some examples, operate simultaneously at the character level and the word level, performing both character-level corrections and word level corrections to generate the normalized version of the noncanonical term. For example, the neural network 1100 can construct the normalized version of the term character-by-character, while also using the BGRNN 1106 to perform context analysis, thereby increasing the accuracy that the normalized version of the term is correct.”; “generate the normalized version of the term” reads on “producing the new predicted structured events”, and the generating is based on the statistical predictions with the neural network as described in par 121.).

Regarding claim 6, 
Leeman-Munk and Suh teach claim 1.

Leeman-Munk further teaches 
(See claim 5): 

producing the predicted structured events by the data generator (See claim 5),

wherein the generating the predicted content includes selecting a next character of a predicted event based upon greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus ([par 92] "The neural network can determine the cost between a result from the neural network and a desired result and back propagate to reduce the cost. For example, a noncanonical term "u" (an erroneous version of the word "you") can be input into the neural network. During training, the neural network can determine that the letter "y" is 75% likely for a first character in the desired result, the letter "o" is 95% likely for a second character in the desired result, and the letter "u" is 89% likely for a third character in the desired result. A negative log likelihood for each letter can be determined to be (0.29, 0.5, 0.12). The negative log likelihoods can be summed together to determine the cost, which can be back propagated through the neural network to train the numeric weights."; [par 121] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below)."; “most likely” and the likeliness of each letter read on “greatest likelihood” when selecting/predicting a new character and “calculated statistical predictions”. 
In addition, “generate a prediction of the most likely character in each position” reads on “selecting a next character of a predicted event based upon greatest likelihood of the selected character appearing next in the generated sequence” since each character is predicted in each position based on its greatest likelihood.).
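For illustration only, selecting the most likely character at each position can be sketched as below. The function name `most_likely_chars` is hypothetical, and every probability other than the 0.75, 0.95 and 0.89 quoted from par 92 is an invented placeholder; the sketch is not the network of Leeman-Munk.

```python
def most_likely_chars(position_probs):
    """Pick the highest-probability character at each position and join them."""
    return "".join(max(dist, key=dist.get) for dist in position_probs)

# One probability distribution per output position. Only the largest value in
# each row comes from the quoted passage; the remaining values are assumptions.
position_probs = [
    {"y": 0.75, "u": 0.20, "o": 0.05},
    {"o": 0.95, "u": 0.03, "y": 0.02},
    {"u": 0.89, "o": 0.08, "y": 0.03},
]

normalized = most_likely_chars(position_probs)
print(normalized)
```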

Regarding claim 7, 
Leeman-Munk and Suh teach claim 1.

Leeman-Munk further teaches 
obtaining a dataset of machine-generated data ([par 57], "Notably, various other devices can further be used to influence communication routing or processing between devices within computing environment 214 and with devices outside of computing environment 214. For example, as shown in FIG. 2, computing environment 214 may include a machine 240 that is a web server. Computing environment 214 can retrieve data of interest, such as client information (e.g., product information, client rules, etc.), technical product details, news, blog posts, e-mails, forum posts, electronic documents, social media posts (e.g., Twitter™ posts or Facebook™ posts), and so on."); 

generating, by the data generator and in accordance with the example structured events of the training corpus, multiple predicted events from the obtained dataset of machine-generated data ([par 134] "Although the steps of FIGS. 5, 8, and 9 are described sequentially for simplicity, it should be understood that many of the steps can be performed concurrently. Additionally, although the steps of FIGS. 5, 8, and 9 are described with respect to analyzing a single noncanonical term for simplicity, in some examples, the neural network can analyze multiple terms concurrently. For example, the neural network may simultaneously analyze an entire noncanonical communication, such as an entire tweet, and output normalized versions of any noncanonical terms as necessary."; “analyze multiple terms concurrently” and “output normalized versions of any noncanonical terms as necessary” read on “generating … multiple predicted events”.).

Regarding claim 9,
Claim 9 is a computer-readable media claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Leeman-Munk teaches computer-readable media and processors ([pars 40-41, 133]).

Regarding claim 10,
Leeman-Munk and Suh teach claim 9.
Claim 10 is a computer-readable media claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2. 

Regarding claim 11,
Leeman-Munk and Suh teach claim 9.
Claim 11 is a computer-readable media claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3. 

Regarding claim 12,
Leeman-Munk and Suh teach claim 9.


Regarding claim 13,
Leeman-Munk and Suh teach claim 9.
Claim 13 is a computer-readable media claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 6.

Regarding claim 14,
Leeman-Munk and Suh teach claim 9.
Claim 14 is a computer-readable media claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 7.

Regarding claim 16,
Claim 16 is a system claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Leeman-Munk teaches computer-readable media and processors ([pars 40-41, 133]).

Regarding claim 17, 
Leeman-Munk and Suh teach claim 16.


Regarding claim 18, 
Leeman-Munk and Suh teach claim 16.
Claim 18 is a system claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 4.

Regarding claim 19, 
Leeman-Munk and Suh teach claim 16.
Claim 19 is a system claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 6.

Regarding claim 20, 
Leeman-Munk and Suh teach claim 1. 

Leeman-Munk further teaches 
the predicted content in the data field generated using the data generator includes textual content derived from error logs associated with a same model of a computer system as used to produce the example structured events ([figs 1-2, 13-15]; [pars 148-157] “In block 1504, the processor replaces a character in a canonical term with a random character to generate a noncanonical version of the canonical term. … In block 1506, the processor can include the canonical term and the noncanonical version of the canonical term in a database for use in training a neural network. For example, the processor can include the canonical term “hello” mapped to its corresponding noncanonical version “hzlly” in the database. The processor can repeat steps 1504-1506 for all of the terms in the dataset, thereby generating a database of annotated training data. Thereafter, the processor can use the database to train the neural network (e.g., according to a supervised or a semi-supervised training method).”; [pars 142-147] “FIG. 14 shows the output from four different text-normalizers, with the neural network of the present application being designated as “DeepNorm.” As shown, DeepNorm provides a more accurate result than the Skip-Bigram approach, which is specifically designed for use in tweet normalization applications. And both DeepNorm and CLSTM by Word have one incorrect word, while the CLSTM by Sentence approach performs slightly better than DeepNorm.”; “the processor … to generate a noncanonical version of the canonical term” and “the processor can use the database to train the neural network” in combination read on “textual content derived from error logs associated with a same model of a computer system as used to produce the example structured events” since the processor is used for producing example structured events and is also associated with error logs (i.e., incorrect estimations in the training)).
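For illustration only, the training-data generation quoted from blocks 1504-1506 (replacing characters of a canonical term with random characters and storing the resulting pair) can be sketched as below. The function name `make_noncanonical`, the number of replacements, the lowercase alphabet, and the fixed seed are assumptions made for this sketch.

```python
import random
import string

def make_noncanonical(term, n_replacements=1, rng=None):
    """Replace up to n_replacements characters of term with random lowercase letters."""
    rng = rng or random.Random()
    chars = list(term)
    # Choose distinct positions so that each replacement hits a different character.
    positions = rng.sample(range(len(chars)), k=min(n_replacements, len(chars)))
    for pos in positions:
        chars[pos] = rng.choice(string.ascii_lowercase)
    return "".join(chars)

# A seeded generator keeps the sketch reproducible; the seed value is arbitrary.
rng = random.Random(0)
canonical_terms = ["hello", "world"]

# Each entry maps a canonical term to a generated noncanonical version.
training_pairs = [(term, make_noncanonical(term, 2, rng)) for term in canonical_terms]
print(training_pairs)
```

A randomly chosen replacement can coincide with the original character, so a generated version may differ from its canonical term in fewer positions than requested.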

Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Leeman-Munk et al. (US 2016/0350646 A1) in view of Suh et al. (Suh, B, “Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network”), further in view of Andrzejewski et al. (Inferring compositional style in the neo-plastic paintings of Piet Mondrian by machine learning), further in view of REI et al. (WO 2017/006104 A1).

Regarding claim 8, 
Leeman-Munk and Suh teach claim 1. 

Leeman-Munk further teaches 
obtaining a dataset of machine-generated data, wherein the dataset of machine-generated data includes a sequence of textual characters ([par 57], "Notably, various other devices can further be used to influence communication routing or processing between devices within computing environment 214 and with devices outside of computing environment 214. For example, as shown in FIG. 2, computing environment 214 may include a machine 240 that is a web server. Computing environment 214 can retrieve data of interest, such as client information (e.g., product information, client rules, etc.), technical product details, news, blog posts, e-mails, forum posts, electronic documents, social media posts (e.g., Twitter™ posts or Facebook™ posts), and so on."; Leeman-Munk describes different machine-generated data which include textual characters.);

processing the obtained dataset of machine-generated data by the data generator in order of the sequence of textual characters ([par 105] "Referring to FIG. 8, in block 802, the processor generates a vector having a predetermined length from a term in the noncanonical communication. The vector can include the characters in the term."; Note that the vector including characters in a term in order of the character sequence is inputted to the neural network for further processing.);

during the processing of the obtained dataset, calculating a statistical prediction of a next yet-to-be-processed group of one or more textual characters of the sequence of the textual characters ([par 121] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below).").
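For illustration only, the per-position prediction described in the quoted paragraph 121 can be sketched as picking the highest-probability character in each row of a probability table. This is a minimal sketch under stated assumptions: the function name and the data layout are hypothetical, the probabilities 0.75, 0.95 and 0.89 are borrowed from the example in paragraph 92 of Leeman-Munk, and the remaining probabilities are made up so that each row sums to one.

```python
def most_likely_characters(position_probs):
    """Return the string formed by the most likely character at each position.

    `position_probs` is a list with one dict per character position,
    mapping a candidate character to its probability (assumed layout).
    """
    return "".join(max(row, key=row.get) for row in position_probs)


# One row per position in the normalized version of the term.
rows = [
    {"y": 0.75, "u": 0.20, "a": 0.05},
    {"o": 0.95, "u": 0.05},
    {"u": 0.89, "o": 0.11},
]
normalized = most_likely_characters(rows)
```

Because each row corresponds to one character position, joining the selected characters in row order yields the normalized term.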

However, Leeman-Munk and Suh do not teach 
generating, by the data generator, a predicted event from the obtained dataset of machine-generated data one textual character at a time, wherein each generated character is generated based on the calculated statistical prediction of the next yet-to-be-processed group of one or more textual characters.

REI teaches 
generating, by the data generator, a predicted event from the obtained dataset of machine-generated data one textual character at a time, wherein each generated character is generated based on the calculated statistical prediction of the next yet-to-be-processed group of one or more textual characters ([pg 11, ln 4] "The ANN 400 generates one or more predicted next items in a sequence of items based on an input sequence item, for example predicting the next word that a user may wish to include in a sentence based on the previous word that the user has input to the system. The following description is presented with respect to the specific embodiment of predicting the next word in a sequence of words, but it will be appreciated that the disclosure can be readily generalised to other sequences of items with no changes to the architecture of the ANN 400 by training the ANN 400 on different sets of data. For example, the same ANN 400 could be used to predict the next item in a sequence of items, for example: words, characters, logogram character strokes, e.g. Hanzi, morphemes, word segments, punctuation, emoticons, emoji, stickers, and hashtags, or optical character recognition or user intention prediction."; [pg 1, ln 21] "An artificial neural network is a statistical learning algorithm, the architecture of which is derived from the networks of neurons and synapses found in the central nervous systems of animals.").
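For illustration only, generating a sequence one character at a time from a statistical prediction of the next character can be sketched as below. This sketch deliberately substitutes a simple bigram frequency table for the artificial neural network of REI. The function names, the toy corpus, and the greedy selection rule are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram_counts(corpus):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for line in corpus:
        for prev, nxt in zip(line, line[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_len):
    """Extend `start` one character at a time, up to `max_len` characters.

    Each new character is the most frequent follower of the last
    character generated so far (greedy selection, an assumption).
    """
    out = start
    while len(out) < max_len:
        followers = counts.get(out[-1])
        if not followers:
            break  # no statistics for this character, so stop early
        out += followers.most_common(1)[0][0]
    return out


# Toy corpus standing in for machine-generated data (hypothetical).
counts = train_bigram_counts(["abab", "abab"])
generated = generate(counts, "a", 4)
```

A trained network would condition on more context than the single preceding character, but the generation loop has the same shape: predict, append, repeat.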

Leeman-Munk, Suh, Andrzejewski and REI are all directed to processing inputs with machine learning and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data generation method of Leeman-Munk and Suh with the generation of predicted items one at a time, while maintaining the context of previously predicted items between predictions, as taught by REI. Doing so would improve the accuracy of the system when used in an inherently context-based application such as language modelling (REI, pg 9, ln 34 – pg 10, ln 2).

Regarding claim 15,
Leeman-Munk and Suh teach claim 9.
Claim 15 is a computer-readable media claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 8.

Response to Arguments
Applicant's arguments filed on 09/17/2020 have been fully considered but they are not persuasive.
Applicant asserts 
“Applicant respectfully submits that the cited references fail to teach or suggest "train a data generator to generate new predicted structured events based on the example structured events of the training corpus, a new predicted structured event of the new predicted structured events having a data field defined by an extraction rule from the training corpus and predicted content in the data field, wherein the predicted content 
The Leeman-Munk reference was cited to for ostensibly teaching this claim aspect. In contrast, the Leeman-Munk reference is directed to normalizing electronic communications using a neural network. See Abstract. In particular, Leeman-Munk discusses generating canonical word forms from noncanonical word forms using a neural network. For example, "a computing device can determine a normalized version of a noncanonical term using a neural network. The computing device can use the neural network to determine a canonical form of the noncanonical term. For example, the computing device can use the neural network to determine that a normalized version of the noncanonical term 'u' is 'you.' Additionally, the computing device can use the neural network to analyze the noncanonical term and determine if the noncanonical term should be normalized. If the neural network determines that the noncanonical term should be normalized, the computing device can output the normalized version of the noncanonical term." Leeman Munk, Para. [0034]. "[A] word can be in a noncanonical form if the word is misspelled according to an accepted and standardized spelling of the word or does not comport with one or more standardized grammatical rules." Id. at Para. [0032]. 
Although Leeman-Munk discuses generating canonical word forms from noncaonical word forms, there is no teaching or suggestion of generating new predicted structured events having predicted content comprising a textual indication of a system error or a system performance, as in claim 1. Instead, Leeman-Munk discusses (Remarks, pg 13)

Examiner’s response:
The examiner respectfully disagrees.

Leeman-Munk still teaches the limitation below since the predicted content comprises a textual indication of a system error or a system performance as follows:

the predicted content generated via the data generator comprises a textual indication of a system error or a system performance ([figs 13-14]; [pars 142-147] “FIG. 14 shows the output from four different text-normalizers, with the neural network of the present application being designated as “DeepNorm.” As shown, DeepNorm provides a more accurate result than the Skip-Bigram approach, which is specifically designed for use in tweet normalization applications. And both DeepNorm and CLSTM by Word have one incorrect word, while the CLSTM by Sentence approach performs slightly better than DeepNorm.”; “incorrect word” reads on “a textual indication of a system error or a system performance”.).

For more details, see the rejections. Thus, the examiner’s rejections are reasonable and proper.

Applicant asserts 
“Further, Applicant submits that the cited references fail to teach or suggest that the predicted content is generated via the data generator by generating the predicted content, one new character at a time, as a sequence of new characters based upon calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus, as recited in claim 1. The Leeman-Munk reference was generally cited to for this claim aspect (e.g., with respect to previously presented dependent claim 6). Leeman-Munk discusses that the "neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding." Para. [0121]. "[A] normalized version of the term is determined based on the multiple probabilities." Para. [0130]. "The neural network 1100 can, in some examples, operate simultaneously at the character level and the word level, performing both character-level corrections and word-level corrections to generate the normalized version of the noncanonical term. For example, the neural network 1100 can construct the normalized version of the term character-by-character, while also using the BGRNN 1106 to perform context analysis..." Para. [0141]. 
Although Leeman-Munk mentions probabilities of particular characters, there is no teaching or suggestion of generating the predicted content, one new character at a time, as a sequence of characters based upon the calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus, as in claim 1. Instead, Leeman-Munk merely mentions "determin[ing] multiple (Remarks, pg 14)

Examiner’s response:
The examiner respectfully disagrees.

Leeman-Munk still teaches the limitation below since the most likely character in each position is estimated one character at a time based on statistical predictions of textual characters of the training corpus, as follows. Note that, as shown in fig 15, canonical terms and the noncanonical versions of the canonical terms are used for training the neural network based on statistical predictions.

the predicted content is generated via the data generator by generating the predicted content, one new character at a time, as a sequence of new characters based upon calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus ([figs 9-12]; [pars 112-130] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below). … Because each row can correspond to a particular character position in the normalized version of the term, the determined characters can be arranged in the correct order to generate the normalized version of the term. For example, the neural network or the processor can arrange the determined characters into the normalized version of the term, "you.""; [pars 135-141] “The neural network 1100 can, in some examples, operate simultaneously at the character level and the word level, performing both character-level corrections and word level corrections to generate the normalized version of the noncanonical term. 
For example, the neural network 1100 can construct the normalized version of the term character-by-character, while also using the BGRNN 1106 to perform context analysis, thereby increasing the accuracy that the normalized version of the term is correct.”; “generate the normalized version of the term” and “generate a prediction of the most likely character in each position of the normalized version of the term” read on “generating the predicted content, one new character at a time, as a sequence of new characters”, and the generating is based on the statistical predictions with the neural network as described in the paragraphs above. In addition, “multiple probabilities of particular characters” and “prediction of the most likely character” read on “calculated statistical predictions of the sequence of the textual characters”.).

For more details, see the rejections. Thus, the examiner’s rejections are reasonable and proper.

Applicant asserts 
“Further, the dependent claims also independently recite novel subject matter. For example, with respect to dependent claim 6, claim 6 recites: "wherein the generating 
However, there is no teaching or suggestion of selecting a next character of a predicted event based upon greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus, as recited in claim 6. Instead, Leeman-Munk merely mentions "character-level corrections" and "character-by- character," but not that selection of a next character of a predicted event is based on a greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus, as in claim 6.” (Remarks, pg 16)

Examiner’s response:
The examiner respectfully disagrees.

Leeman-Munk still teaches the limitation below since a next character of a predicted event is selected based on the prediction of the most likely character in each position as determined by the calculated statistical predictions with the training corpus as follows:

the generating the predicted content includes selecting a next character of a predicted event based upon greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the example structured events of the training corpus ([par 92] "The neural network can determine the cost between a result from the neural network and a desired result and back propagate to reduce the cost. For example, a noncanonical term "u" (an erroneous version of the word "you") can be input into the neural network. During training, the neural network can determine that the letter "y" is 75% likely for a first character in the desired result, the letter "o" is 95% likely for a second character in the desired result, and the letter "u" is 89% likely for a third character in the desired result. A negative log likelihood for each letter can be determined to be (0.29, 0.5, 0.12). The negative log likelihoods can be summed together to determine the cost, which can be back propagated through the neural network to train the numeric weights."; [par 121] "In block 916, the neural network determines multiple probabilities of particular characters being in particular positions in a normalized version of the term based on the context embedding. For example, the context embedding can be transmitted to an output layer (e.g., a multi-softmax layer) of the neural network. The output layer can use a feed-forward layer to generate a prediction of the most likely character in each position of the normalized version of the term, as well as a flag indicating whether the term should be normalized or left as-is (as discussed in greater detail below)."; “most likely” and the likeliness of each letter read on “greatest likelihood” when selecting/predicting a new character and “calculated statistical predictions”. 
In addition, “generate a prediction of the most likely character in each position” reads on “selecting a next character of a predicted event based upon greatest likelihood of the selected character appearing next in the generated sequence” since each character is predicted in each position based on its greatest likelihood.).
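For illustration only, the cost computation described in the quoted paragraph 92 (a negative log likelihood per letter, summed into a cost) can be sketched as follows. This is a minimal sketch that assumes the natural logarithm, which the quoted passage does not specify, and the function name is hypothetical.

```python
import math


def negative_log_likelihoods(probs):
    """Return -ln(p) for each per-character probability p (natural log assumed)."""
    return [-math.log(p) for p in probs]


# Likelihoods of "y", "o", "u" in positions 1-3, taken from the quoted example.
probs = [0.75, 0.95, 0.89]
nll = negative_log_likelihoods(probs)

# The summed negative log likelihoods form the cost to be back propagated.
cost = sum(nll)
```

A higher likelihood for the desired character gives a smaller negative log likelihood, so reducing the cost pushes the network toward predicting the desired character in each position.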



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409.  The examiner can normally be reached on Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.K./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123