Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2021-06-24 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
The amendment filed 2021-06-24 has been entered. Claims 1-15 remain pending in the application. Applicant’s amendments to the claims overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Office Action mailed 2021-02-24.
Response to Arguments
Applicant's arguments in response to rejections under 35 USC 103 have been fully considered but are not persuasive.
In response to Applicant’s argument that “Heaton teaches that, during manufacturing, sets of sequences of sensory input can be saved into the memory of the device. This does not refer to observed sensory sequence information”, Examiner respectfully disagrees.  Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”  Examiner asserts that “sequences of sensory input” are “sensory sequence information”.  Examiner also asserts that the term “observed” can be very broadly interpreted, and that in order to be “saved in memory”, the sequences must first be 
In response to Applicant’s argument that there is no motivation to apply Wang to Heaton absent an attempt to reconstruct the instant invention, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
In response to Applicant’s argument that Heaton and Wang are not analogous art as they are not both in the field of endeavor of machine learning because Heaton does not recite machine learning, Examiner respectfully disagrees.  Examiner points out that Heaton, Col 7 Lines 29-34 discloses “The user can also execute learning mode software routines”, which are carried out by a machine, and can thus be interpreted as machine learning.  The user is carrying out a form of training when they “perform a sequence of actions typical of the event the user wishes the device to detect”.
In response to Applicant’s argument that Holliday does not teach that “size of the plurality of history windows increase exponentially from a last observed time step” because the windows are overlapping, Examiner notes that the sizes still increase exponentially, regardless of whether or not the windows are overlapping.  The sizes of the windows and whether or not they overlap are two distinct properties.  Applicant’s amendment that specifies the windows as 
In response to Applicant’s argument that there is no motivation to apply Holliday to Wang, the Examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, Holliday states that its method “provides more accurate past data”.
In response to Applicant’s argument that there is no motivation to apply Holliday to Wang, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
In response to Applicant’s argument that Thorhallsson does not teach hyperparameters chosen for each of the plurality of history windows, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In response to Applicant’s arguments that combining six or seven references relies on improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
Claim Objections
Claims 2, 10, and 15 are objected to because of the following informalities:  “for each of the plurality of history window” should be changed to read “for each of the plurality of history windows”.  Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Heaton et al. (US Patent 9,997,039 B1; hereinafter “Heaton”) in view of Wang et al. (“genCNN: A Convolutional Architecture for Word Sequence Prediction”; hereinafter “Wang”), Holliday (US PGPub 2008/0059274 A1; hereinafter “Holliday”), and Begleiter et al. (“On Prediction Using Variable Order Markov Models”; hereinafter “Begleiter”).
As per claim 1, Heaton teaches an artificial intelligence system, comprising: a computing device including at least one processor, one or more data storage devices, and a non-transitory data storage medium interfaced with the at least one processor, the non-transitory data storage medium containing instructions that, when executed cause the at least one processor to (Heaton, Col 7 Lines 29-34 discloses “learning mode”.  Heaton, Col 12 Lines 35-38, discloses a “computer system” with a “dynamic storage device” and a “processor”.  Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods”).
save observed sensory sequence information (Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”)
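Heaton fails to teach that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows. Heaton also fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step; apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence.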
Wang teaches that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2 Last Sentence, discloses “Also distinct from RNN, genCNN gains most of its processing power from the heavy-duty processing units (i.e., CNN and CNNs), which follow a bottom-up information flow and yet can adequately capture the temporal structure in word sequence with its convolutional-gating architecture.”  Examiner’s Note:  Here, alphaCNN and betaCNN are the plurality of history windows, and they are saving word sequence information. These windows go back in time progressively further, and are thus reverse chronological history windows.)
Heaton and Wang are analogous art because they are both in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with Heaton to include saving sequence information in a plurality of history windows.  One would have been motivated to do so to “handle sentences with arbitrary length” (Wang, Section 3.3).
The combination of Heaton and Wang fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step. The combination of Heaton and Wang also fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence.
Holliday teaches wherein a size of the plurality of history windows increase exponentially from a last observed time step. (Holliday Para [0093]-[0094] discloses “The numbers are placed into a Binary Exponential History. This takes the numbers for each minute, and then averages them over 12 different time periods (2 minutes, 4 minutes, 8 minutes . . . up to 4096 minutes).  Multiple Linear Regression is then used to create an approximate linear function for the number of arrivals that there will be over the next (m) minutes, given the 36 different averages for what has happened previously.”  Examiner’s Note:  Here “what has happened previously” is analogous to “from a last observed time step”).
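For illustration only, the window arrangement discussed above can be sketched in Python. This is a minimal sketch under stated assumptions, not code from Holliday, Wang, or the instant application: the function names are hypothetical, window i is assumed to span 2**(i + 1) time steps counted back from the last observed time step (matching Holliday's 2, 4, 8 ... 4096 minute periods when 12 windows are used), and the `overlapping` flag is an assumption added to show both layouts. In either layout the window sizes grow exponentially.

```python
from typing import List, Sequence


def exponential_windows(history: Sequence[float], num_windows: int,
                        overlapping: bool = True) -> List[List[float]]:
    """Return reverse-chronological windows of 2, 4, 8, ... time steps.

    history[-1] is the last observed time step; window i spans 2**(i + 1) steps.
    overlapping=True:  every window ends at the last observed step (cumulative).
    overlapping=False: each window starts where the previous (more recent) one ended.
    Windows are truncated if the history is too short.
    """
    windows = []
    end = len(history)  # exclusive end index of the next non-overlapping window
    for i in range(num_windows):
        size = 2 ** (i + 1)
        if overlapping:
            start = max(0, len(history) - size)
            windows.append(list(history[start:]))
        else:
            start = max(0, end - size)
            windows.append(list(history[start:end]))
            end = start
    return windows


def window_averages(history: Sequence[float], num_windows: int,
                    overlapping: bool = True) -> List[float]:
    """Average each window, as in a binary exponential history; empty windows give 0.0."""
    return [sum(w) / len(w) if w else 0.0
            for w in exponential_windows(history, num_windows, overlapping)]
```

With `history = list(range(1, 15))` and three windows, both layouts give window sizes of 2, 4, and 8; only the time steps each window covers differ.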
Heaton, Wang, and Holliday are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Holliday with the combination of Heaton and Wang to include saving sequence information in exponentially increasing history windows.  One would have been motivated to do so in order to 
The combination of Heaton, Wang, and Holliday fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence.
Begleiter teaches apply a function to the observed [sensory] sequence information [in each history window], wherein the function maps the observed [sensory] sequence information into a fixed set of discrete classes (Begleiter, Section 3.4 Paragraph 2, discloses, for sequence information (“each alphabet symbol”), a function to map the information into a fixed set of discrete classes (“concatenating binary words of size k, one for each alphabet symbol”)).  *Heaton discloses that the sequence information is sensory:  Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”  *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”
and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes [for each of the plurality of history windows] to predict a future discrete sequence (Begleiter, Section 3.4 Paragraph 2, discloses a “standard binary ctw algorithm over a binary representation of the sequence”.  Begleiter, in the Abstract, identifies CTW as a sequence prediction algorithm: “prediction algorithms, including Context Tree Weighting (CTW)”).  *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”
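The binary CTW prediction relied on above (Begleiter, Sections 3.1 and 3.4) can be illustrated with the following minimal Python sketch. It is an illustration under assumptions, not Begleiter's code: it uses the Krichevsky-Trofimov (KT) estimator at each context node, a fixed maximum context depth `depth`, treats the first `depth` bits as the initial context, and predicts the next bit as the ratio of two weighted sequence probabilities. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_kt(zeros: int, ones: int) -> float:
    """Natural log of the Krichevsky-Trofimov estimate for the given counts."""
    return (math.lgamma(zeros + 0.5) + math.lgamma(ones + 0.5)
            - 2 * math.lgamma(0.5) - math.lgamma(zeros + ones + 1))


def log_ctw(bits: Sequence[int], depth: int) -> float:
    """Log CTW probability of bits[depth:], with bits[:depth] as the initial context."""
    if len(bits) < depth:
        raise ValueError("need at least `depth` bits of initial context")
    counts: Dict[Tuple[int, ...], List[int]] = {}
    for t in range(depth, len(bits)):
        context = tuple(bits[t - 1 - j] for j in range(depth))  # most recent bit first
        for d in range(depth + 1):  # update every node on the path root -> leaf
            counts.setdefault(context[:d], [0, 0])[bits[t]] += 1

    def log_pw(node: Tuple[int, ...]) -> float:
        if node not in counts:       # unvisited subtree has probability 1
            return 0.0
        zeros, ones = counts[node]
        pe = log_kt(zeros, ones)
        if len(node) == depth:       # leaf: use the KT estimate alone
            return pe
        pc = log_pw(node + (0,)) + log_pw(node + (1,))
        m = max(pe, pc)              # log(0.5*e^pe + 0.5*e^pc), computed stably
        return m + math.log(0.5 * math.exp(pe - m) + 0.5 * math.exp(pc - m))

    return log_pw(())


def predict_next_one(bits: Sequence[int], depth: int) -> float:
    """P(next bit = 1 | bits) as a ratio of CTW probabilities."""
    return math.exp(log_ctw(list(bits) + [1], depth) - log_ctw(bits, depth))
```

Because the weighted probabilities of the two possible continuations sum to the probability of the observed sequence, `predict_next_one` returns a proper probability; on an alternating sequence ending in 1 it assigns a low probability to another 1.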
Heaton, Wang, Holliday, and Begleiter are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Begleiter with the combination of Heaton, Wang, and Holliday to include applying a CTW algorithm to a fixed set of discrete classes mapped from an alphabet.  One would have been motivated to do so for the purpose of “extending the ctw algorithm for large alphabets” (Begleiter, Section 3.4, Paragraph 1).
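The mapping of observed sensory readings into a fixed set of discrete classes, followed by the fixed-length binary words described in Begleiter, Section 3.4, can be sketched as below. This is a hypothetical illustration: uniform binning over an assumed value range [lo, hi) is only one possible mapping function, and the names are not taken from any cited reference. Using the same bins for every history window keeps the alphabet fixed across windows, and each class index is encoded as k = ceil(log2 |alphabet|) bits so that a binary CTW predictor could consume the result.

```python
import math
from typing import List, Sequence


def quantize(values: Sequence[float], lo: float, hi: float, num_classes: int) -> List[int]:
    """Map readings to class indices 0..num_classes-1 using uniform bins on [lo, hi).

    The same bins are used for every history window, so the alphabet stays fixed.
    Readings outside the range are clamped to the first or last class."""
    width = (hi - lo) / num_classes
    return [min(max(int((v - lo) // width), 0), num_classes - 1) for v in values]


def to_binary_words(symbols: Sequence[int], alphabet_size: int) -> List[int]:
    """Concatenate one k-bit word per symbol, k = ceil(log2(alphabet_size)), MSB first."""
    k = max(1, math.ceil(math.log2(alphabet_size)))
    bits: List[int] = []
    for s in symbols:
        bits.extend((s >> (k - 1 - j)) & 1 for j in range(k))
    return bits
```

For example, four classes need k = 2 bits, so the class sequence 0, 1, 2, 3 becomes the bit sequence 00 01 10 11.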

As per Claim 3, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 1 as shown above, as well as wherein the function is a feature-wise maximum over time steps in one or more of the plurality of history windows. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 3.2 Line 1, discloses another embodiment in which “Previous CNNs, including those for NLP tasks (Hu et al., 2014; Kalchbrenner et al., 2014), take a straightforward convolution-pooling strategy, in which the ‘fusion’ decisions (e.g., selecting the largest one in max-pooling) are made based on the values of feature-maps”.  Examiner’s Note:  Here, Wang is describing a plurality of history windows alphaCNN and betaCNN that comprise time steps.  Each of these history windows comprises a convolutional neural network (CNN).  These CNNs comprise a “convolution-pooling strategy” (i.e., function), such as “selecting the largest one” (i.e., maximum) “based on the values of feature-maps” (i.e., feature-wise). This function is done in each CNN, and is thus applied over time steps in the plurality of history windows). 
Heaton, Holliday, Begleiter, and Wang are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine this teaching of Wang with the existing combination of Heaton, Holliday, Begleiter, and the primary teaching of Wang to include max-pooling (selecting the largest one) based on the values of feature maps.  One would have been motivated to do so because it is a “straightforward” strategy (Wang, Section 3.2 Line 1).

As per Claim 4, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 3 as shown above, as well as wherein the observed sensory sequence information is a binary event (Begleiter, Section 3.1, Paragraph 1, discloses “In this section we consider the original ctw algorithm for binary alphabets”.  Heaton, Col 7 Lines 22-25, discloses that the information is sensory: “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”).
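A feature-wise maximum over the time steps of a history window (claim 3), applied to binary events (claim 4), can be sketched as follows. This is an assumed illustration rather than Wang's max-pooling layer: each window is taken to be a list of time steps, each time step a list of per-event values, and the helper `window_symbol` is hypothetical. For binary events the maximum reduces to a logical OR, so a window of any length collapses to one n-bit pattern, i.e., one of 2**n discrete classes.

```python
from typing import List, Sequence


def featurewise_max(window: Sequence[Sequence[float]]) -> List[float]:
    """Maximum over the time steps of one window, taken separately per feature.

    window[t][f] is the value of feature (event) f at time step t. The result has
    one entry per feature no matter how many time steps the window contains."""
    if not window:
        raise ValueError("window must contain at least one time step")
    return [max(column) for column in zip(*window)]


def window_symbol(window: Sequence[Sequence[int]]) -> int:
    """For binary events, encode the window's OR-pattern as a single class index."""
    return sum(int(v) << f for f, v in enumerate(featurewise_max(window)))
```

For the binary window [[0, 1, 0], [1, 0, 0], [0, 0, 0]], events 0 and 1 each occur at least once, so the feature-wise maximum is [1, 1, 0] and the window maps to class 3.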

As per Claim 7, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 1 as shown above, as well as wherein the instructions cause the at least one processor to perform a temporal convolution in a deep neural network to map observed sensory sequence information from the plurality of history windows to symbols. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2, defines each of the history windows as Convolutional Neural Networks (alpha-CNN and beta-CNN), which “capture the temporal structure” of the sequence.  Wang Figures 2 and 4 illustrate that each CNN makes a prediction of the next word (i.e., symbol).  Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods.”  Heaton, Col 7 Lines 22-25, discloses that the information is sensory: “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”)

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Heaton, Wang, Holliday, and Begleiter, further in view of Thorhallsson et al. (“Visualizing the Bias-Variance Tradeoff”; hereinafter, “Thorhallsson”).
As per Claim 2, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 1 as shown above.  However, the combination of Heaton, Wang, Holliday, and Begleiter fails to teach wherein the instructions cause the at least one processor to choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance.
Thorhallsson teaches wherein [the instructions cause the at least one processor] to choose at least one hyperparameter [for each of the plurality of history windows] to allow the system to trade off bias-variance.  (Thorhallsson, Intro Para 2, discloses that “This leads to a fundamental tradeoff known as the bias-variance tradeoff which is of paramount importance for optimal choice of the hyperparameters for the learning algorithms”).  *Heaton, Col 11 Lines 64-67, discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods”.  *Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history” (i.e., a plurality of history windows comprising CNNs, and CNNs are a machine learning model and therefore have a bias-variance tradeoff).
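Choosing one hyperparameter per history window to trade off bias and variance can be illustrated with a small, hypothetical Python sketch. The hyperparameter assumed here is an additive-smoothing constant `alpha` for a per-window class-frequency estimator, selected from an assumed grid by held-out log-loss; none of this is taken from Thorhallsson or the instant application. A small `alpha` follows each window's counts closely (lower bias, higher variance), while a large `alpha` pulls the estimate toward uniform (higher bias, lower variance).

```python
import math
from typing import List, Sequence, Tuple


def smoothed_probs(train: Sequence[int], num_classes: int, alpha: float) -> List[float]:
    """Class frequencies with additive smoothing: (count + alpha) / (N + alpha * C)."""
    counts = [0] * num_classes
    for s in train:
        counts[s] += 1
    total = len(train) + alpha * num_classes
    return [(c + alpha) / total for c in counts]


def choose_alpha(train: Sequence[int], held_out: Sequence[int], num_classes: int,
                 grid: Sequence[float] = (0.01, 0.1, 0.5, 1.0, 5.0)) -> float:
    """Pick the smoothing constant with the lowest mean held-out log-loss."""
    def loss(alpha: float) -> float:
        p = smoothed_probs(train, num_classes, alpha)
        return -sum(math.log(p[s]) for s in held_out) / len(held_out)
    return min(grid, key=loss)


def choose_alpha_per_window(windows: Sequence[Tuple[Sequence[int], Sequence[int]]],
                            num_classes: int) -> List[float]:
    """One hyperparameter per history window; each element is (train, held_out)."""
    return [choose_alpha(train, held_out, num_classes) for train, held_out in windows]
```

A window whose held-out symbols match its training symbols favors the smallest `alpha`, while a window whose held-out data contains an unseen class favors heavier smoothing, so different windows can end up with different settings.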
 Heaton, Wang, Holliday, Begleiter, and Thorhallsson are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary .

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Heaton, Wang, Holliday, and Begleiter, further in view of Campbell et al. (US PGPub US 2019/0012371 A1; hereinafter, “Campbell”).

As per Claim 5, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 1 as shown above.  However, the combination of Heaton, Wang, Holliday, and Begleiter fails to teach wherein the instructions cause the at least one processor to use a deep neural network classifier to map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.
Campbell teaches wherein [the instructions cause the at least one processor] to use a deep neural network classifier to map [arbitrary length] histories to a second alphabet having smaller length than the alphabet of the [arbitrary length] histories [as an input sequence for the context tree weighting algorithm]. (Campbell Para [0077] discloses “The belief tracker component 408 is configured to identify what table(s) of the database and what column(s) of the tables of the database are being implicated by the user's utterance. In particular, belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances. The user goal is related to: one or more tables of the database and their respective columns of metadata, such as names and data types; and vocabulary of columns (e.g., slots). The belief tracker component is configured to receive the feature vector as an input from the feature extractor component 404, concatenate the feature vector with the encoded dialog history that was generated by the context encoding component 406, and produce a probability distribution vector over the columns of the multiple tables of the database 410.”  Examiner’s Note:  A recurrent neural network is a type of deep neural network.  Campbell is using this to “map” (i.e., classify) “dialog history” to “belief states”.  The “belief states” are a smaller alphabet than the “user utterances” that comprise the “dialog history”, as they are “columns of the multiple tables of the database”).  
*Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods.”  *Wang discloses arbitrary length histories with the repeating betaCNN windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  *Begleiter discloses mapping an alphabet before using it as input to a CTW algorithm:  Begleiter, Section 3.4 Paragraph 2, discloses “concatenating binary words of size k, one for each alphabet symbol” and uses this for “application of the standard binary ctw algorithm over a binary representation of the sequence”.  
Heaton, Wang, Holliday, Begleiter, and Campbell are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Campbell with the combination of Heaton, Wang, Holliday, and Begleiter to include mapping sequence histories to a smaller set of states.  One would have been motivated to do so to extract underlying context or meaning from a sequence, as Campbell states:  “map dialog history to belief states” (Campbell Para [0077]).

As per Claim 6, the combination of Heaton, Wang, Holliday, Begleiter, and Campbell teaches the artificial intelligence system of claim 5 as shown above, as well as wherein a long short-term memory-based sequence to symbol method is used to map the arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories. (Campbell Para [0077] discloses “The belief tracker component 408 is configured to identify what table(s) of the database and what column(s) of the tables of the database are being implicated by the user's utterance. In particular, belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances. The user goal is related to: one or more tables of the database and their respective columns of metadata, such as names and data types; and vocabulary of columns (e.g., slots). The belief tracker component is configured to receive the feature vector as an input from the feature extractor component 404, concatenate the feature vector with the encoded dialog history that was generated by the context encoding component 406, and produce a probability distribution vector over the columns of the multiple tables of the database 410.”  Examiner’s Note:  A recurrent neural network is a type of deep neural network.  Campbell is using this to “map” (i.e., classify) “dialog history” to “belief states”.  
The “belief states” are a smaller alphabet than the “user utterances” that comprise the “dialog history”, as they are “columns of the multiple tables of the database.”  Furthermore, Campbell, Para [0071] first sentence, discloses that “dialog manager component 412 and belief tracker component 408 are trained via the supervised learning component 602.” Campbell Para [0074] then discloses that “In some embodiments of the present invention, the supervised learning component 602 can be represented using a variety of suitable techniques, such as for example, multiplayer perceptron (MLP) representation, gated recurrent unit (GRU) representation, long-short term memory (LSTM) representation, and/or a memory network representation”).
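A long short-term memory (LSTM) sequence-to-symbol mapping of the kind discussed for claims 5 and 6 can be sketched in plain Python. This is an illustrative assumption, not Campbell's belief tracker: the class name is hypothetical, the weights are random rather than trained, and the standard LSTM gate equations are used. The sketch shows only the data flow by which a history of any length is reduced to one symbol from a smaller alphabet of `num_symbols` classes.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Single-layer LSTM that reads a whole history and emits one symbol.

    Weights are random (untrained); only the sequence-to-symbol data flow is shown."""

    def __init__(self, input_size: int, hidden_size: int, num_symbols: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One matrix per gate (input, forget, output) plus the cell candidate,
        # each acting on the concatenation [x_t, h_{t-1}, 1] (the 1 is a bias term).
        self.gates = {name: mat(hidden_size, input_size + hidden_size + 1)
                      for name in ("input", "forget", "output", "candidate")}
        self.readout = mat(num_symbols, hidden_size + 1)

    def encode(self, history: Sequence[Sequence[float]]) -> List[float]:
        """Run the LSTM over a history of any length; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in history:
            z = list(x) + h + [1.0]
            pre = {name: [sum(w * v for w, v in zip(row, z)) for row in W]
                   for name, W in self.gates.items()}
            i = [_sigmoid(a) for a in pre["input"]]
            f = [_sigmoid(a) for a in pre["forget"]]
            o = [_sigmoid(a) for a in pre["output"]]
            g = [math.tanh(a) for a in pre["candidate"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

    def predict_symbol(self, history: Sequence[Sequence[float]]) -> int:
        """Map the history to the highest-scoring symbol in 0..num_symbols-1."""
        z = self.encode(history) + [1.0]
        scores = [sum(w * v for w, v in zip(row, z)) for row in self.readout]
        return max(range(len(scores)), key=lambda k: scores[k])
```

Histories of 3 and 17 time steps both produce a hidden state of the same size and a single output symbol, which is the property the claims rely on; training the weights is outside the scope of this sketch.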

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Heaton, Wang, Holliday, and Begleiter, further in view of Pan et al. (US PGPub US 2020/0211106 A1; hereinafter, “Pan”).
As per Claim 8, the combination of Heaton, Wang, Holliday, and Begleiter teaches the artificial intelligence system of claim 7 as shown above.  However, the combination of Heaton, Wang, Holliday, and Begleiter fails to teach wherein the temporal convolution includes defining each of the plurality of history windows of events as a (2^k)-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.
Pan teaches wherein the [temporal] convolution includes defining each [of the plurality of history windows] of events as a (2^k)-by-n matrix, where 2^k is a number of [time steps in each of the plurality of history windows] and n is a number of events [at each of the time steps], and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events. (Pan Para [0054] discloses an “n*m feature matrix” in which one dimension “m is the quantity of sub time periods” (i.e., time steps) and the other dimension “n is the quantity of feature types” (i.e., events).  Pan Para [0060] discloses that “The input layer inputs each sample (n*m feature matrix) to a convolutional layer for convolution.”  Pan further discloses “convolution kernel quantity and size can be specified as needed” (Examiner’s note:  a convolution kernel is a matrix) wherein “at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types”, which is analogous to the instant application, where “n” is the column dimension for both the 2^k-by-n matrix and the l-by-n matrix (Pan describes an “n*m feature matrix”, and “a size of each convolution kernel can be n*j”).  Pan further discloses that the remaining dimension of the convolutional kernel must be less than the corresponding dimension of the feature matrix (“size of each convolution kernel can be n*j, where j is a positive integer less than m”).  This is analogous to the instant application, where l must be less than 2^k.  Pan Para [0060] further states that “convolutional layer can output 100s feature graphs” (i.e., the output is a new set of events).  
*Wang teaches temporal convolutions in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2, defines each of the history windows, which comprise time steps, as Convolutional Neural Networks (alpha-CNN and beta-CNN), which “capture the temporal structure” of the sequence. 
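The claimed temporal convolution, a (2^k)-by-n window convolved with an l-by-n kernel where l < 2^k, can be sketched as below. This is an assumed illustration, not Pan's implementation: it uses cross-correlation (as most CNN libraries do), "valid" padding, kernels that all share the same l, and a hypothetical threshold that turns each kernel's output into a new binary event. Because the kernel spans all n event columns, it slides only along the time axis, consistent with Pan's observation that the kernel need not scan across feature types.

```python
from typing import List, Sequence


def temporal_conv(window: Sequence[Sequence[float]],
                  kernel: Sequence[Sequence[float]]) -> List[float]:
    """'Valid' convolution of a T-by-n window with an l-by-n kernel, l < T.

    Implemented as cross-correlation (no kernel flip), as in most CNN libraries.
    Because the kernel covers all n event columns, it slides only along time and
    yields T - l + 1 outputs."""
    T, n, l = len(window), len(window[0]), len(kernel)
    if any(len(row) != n for row in kernel) or not 0 < l < T:
        raise ValueError("kernel must be l-by-n with 0 < l < number of time steps")
    return [sum(kernel[i][j] * window[t + i][j] for i in range(l) for j in range(n))
            for t in range(T - l + 1)]


def new_events(window: Sequence[Sequence[float]],
               kernels: Sequence[Sequence[Sequence[float]]],
               threshold: float = 0.0) -> List[List[int]]:
    """Apply a set of same-size kernels; kernel m defines new binary event m.

    The result is a (T - l + 1)-by-len(kernels) matrix: time steps by new events."""
    maps = [temporal_conv(window, K) for K in kernels]
    return [[int(v > threshold) for v in step] for step in zip(*maps)]
```

For a 4-by-2 window (k = 2, n = 2) and 2-by-2 kernels, each kernel yields 2^k - l + 1 = 3 outputs, so a set of two kernels produces a 3-by-2 matrix of new events.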
Heaton, Wang, Holliday, Begleiter, and Pan are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Pan with the combination of Heaton, Wang, Holliday, and Begleiter to include a convolution kernel wherein at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types (i.e., events).  One would have been motivated to do so “because feature types [i.e., events] are not continuous, the convolution kernel does not need to be scanned in a distribution direction of each feature type” (Pan, Para [0060], Sentence 3).

As per Claim 9, the combination of Heaton, Wang, Holliday, Begleiter, and Pan teaches the artificial intelligence system of claim 8 as shown above.  The combination of Heaton, Wang, Holliday, Begleiter, and Pan further teaches wherein the convolution is applied to each (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang Section 2 describes each of the history windows as Convolutional Neural Networks (CNNs)).
 
Claims 10, 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Heaton in view of Wang, Holliday, Begleiter, Thorhallsson, and Campbell.  
As per Claim 10, Heaton teaches an artificial intelligence system, comprising: a computing device including at least one processor, one or more data storage devices, and a non-transitory data storage medium interfaced with the at least one processor, the non-transitory data storage medium containing instructions that, when executed cause the at least one processor to (Heaton, Col 7 Lines 29-34 discloses “learning mode”.  Heaton, Col 12 Lines 35-38, discloses a “computer system” with a “dynamic storage device” and a “processor”.  Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods”).
save observed sensory sequence information (Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”)
Heaton fails to teach that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows. Heaton also fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step; apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence; and use a deep neural network classifier map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.
Wang teaches that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2 Last Sentence, discloses “Also distinct from RNN, genCNN gains most of its processing power from the heavy-duty processing units (i.e., alphaCNN and betaCNNs), which follow a bottom-up information flow and yet can adequately capture the temporal structure in word sequence with its convolutional-gating architecture.”  Examiner’s Note:  Here, alphaCNN and betaCNN are the plurality of history windows, and they are saving word sequence information. These windows go back in time progressively further, and are thus reverse chronological history windows).
Heaton and Wang are analogous art because they are both in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with Heaton to include saving sequence information in a plurality of history windows.  One would have been motivated to do so to “handle sentences with arbitrary length” (Wang, Section 3.3).
The combination of Heaton and Wang fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step. The combination of Heaton and Wang also fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence; and use a deep neural network classifier map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.
Holliday teaches wherein a size of the plurality of history windows increase exponentially from a last observed time step. (Holliday Para [0093]-[0094] discloses “The numbers are placed into a Binary Exponential History. This takes the numbers for each minute, and then averages them over 12 different time periods (2 minutes, 4 minutes, 8 minutes . . . up to 4096 minutes).  Multiple Linear Regression is then used to create an approximate linear function for the number of arrivals that there will be over the next (m) minutes, given the 36 different averages for what has happened previously.”  Examiner’s Note:  Here “what has happened previously” is analogous to “from a last observed time step”). 
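For illustration only, the Binary Exponential History quoted from Holliday can be expressed as averages over trailing windows of 2, 4, 8, ... up to 4096 time steps, each window ending at the last observed time step and reaching back twice as far as the one before it. The function name binary_exponential_history, the oldest-first list layout, and the use of a simple arithmetic mean are assumptions made for this sketch; they are not taken from Holliday.

```python
def binary_exponential_history(counts, num_windows=12):
    """Average the trailing 2**i values, for i = 1..num_windows.

    `counts` holds one value per time step, oldest first, so counts[-1]
    is the last observed time step. Every window ends at that step and
    reaches back twice as far as the previous one (2, 4, ..., 4096 steps
    for the default of 12 windows). A window longer than the available
    history simply averages everything that is present.
    """
    if not counts:
        raise ValueError("counts must contain at least one observation")
    averages = []
    for i in range(1, num_windows + 1):
        window = counts[-(2 ** i):]  # the most recent 2**i values (or fewer)
        averages.append(sum(window) / len(window))
    return averages
```

For example, binary_exponential_history(list(range(1, 17)), 4) returns [15.5, 14.5, 12.5, 8.5]: each successive window extends twice as far back from the last observed value.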
Heaton, Wang, and Holliday are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Holliday with the combination of Heaton and Wang to include saving sequence information in exponentially increasing history windows.  One would have been motivated to do so in order to include a more comprehensive data set for analysis, as stated by Holliday: “provides more accurate past data” (Holliday Para [0027]).
The combination of Heaton, Wang, and Holliday fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence. The combination of Heaton, Wang, and Holliday also fails to teach choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance; and use a deep neural network classifier map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.
Begleiter teaches apply a function to the observed [sensory] sequence information [in each history window], wherein the function maps the observed [sensory] sequence information into a fixed set of discrete classes, fixed [for all of the plurality of history windows] (Begleiter, Section 3.4 Paragraph 2, discloses, for sequence information (“each alphabet symbol”), a function to map the information into a fixed set of discrete classes (“concatenating binary words of size k, one for each alphabet symbol”).  *Heaton discloses that the sequence information is sensory:  Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.” *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”

and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes [for each of the plurality of history windows] to predict a future discrete sequence (Begleiter, Section 3.4 Paragraph 2, discloses a “standard binary ctw algorithm over a binary representation of the sequence”.  Begleiter, Abstract, identifies CTW as a sequence prediction algorithm: “prediction algorithms, including Context Tree Weighting (CTW)”).  *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”
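As an illustrative sketch only, the approach quoted from Begleiter Section 3.4 (a fixed-length binary word for each alphabet symbol, followed by the standard binary CTW algorithm over the concatenated bits) might look as follows. The sketch assumes symbols are numbered in the order given, uses the Krichevsky-Trofimov estimator at every context node with the usual 1/2-1/2 weighting between a node's own estimate and its children, and treats the first `depth` bits as initial context. All function names are hypothetical, and Begleiter's own refinements for large alphabets are not reproduced.

```python
import math
from collections import defaultdict


def binary_codebook(alphabet):
    """Give each symbol a binary word of fixed size k = ceil(log2 |alphabet|)."""
    k = max(1, math.ceil(math.log2(len(alphabet))))
    return {sym: [int(b) for b in format(i, f"0{k}b")] for i, sym in enumerate(alphabet)}


def to_bits(sequence, codebook):
    """Concatenate the binary words of the symbols in `sequence`."""
    return [bit for sym in sequence for bit in codebook[sym]]


def _log_kt(a, b):
    """Log Krichevsky-Trofimov block probability of a zeros and b ones."""
    return (math.lgamma(a + 0.5) + math.lgamma(b + 0.5)
            - math.log(math.pi) - math.lgamma(a + b + 1))


def _log_add(x, y):
    """Compute log(exp(x) + exp(y)) without overflow."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def ctw_log_prob(bits, depth):
    """Log CTW probability of bits[depth:], given the first `depth` bits as context."""
    counts = defaultdict(lambda: [0, 0])  # context (most recent bit first) -> [zeros, ones]
    for t in range(depth, len(bits)):
        past = bits[t - depth:t][::-1]
        for d in range(depth + 1):
            counts[tuple(past[:d])][bits[t]] += 1

    def log_pw(context):
        if context not in counts:
            return 0.0  # unvisited subtree: every estimate equals 1
        log_pe = _log_kt(*counts[context])
        if len(context) == depth:
            return log_pe  # leaf node uses its own estimate only
        children = log_pw(context + (0,)) + log_pw(context + (1,))
        return _log_add(math.log(0.5) + log_pe, math.log(0.5) + children)

    return log_pw(())


def ctw_prob_next_one(bits, depth):
    """Predictive probability that the next bit is 1: P_w(bits + [1]) / P_w(bits)."""
    if len(bits) < depth:
        raise ValueError("need at least `depth` bits of initial context")
    return math.exp(ctw_log_prob(bits + [1], depth) - ctw_log_prob(bits, depth))
```

Recomputing the whole tree on each call keeps the sketch short; a practical implementation would instead update counts and weighted probabilities incrementally along the current context path.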
Heaton, Wang, Holliday, and Begleiter are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Begleiter with the combination of Heaton, Wang, and Holliday to include applying a CTW algorithm to a fixed set of discrete classes mapped from an alphabet.  One would have been motivated to do so for the purpose of “extending the ctw algorithm for large alphabets” (Begleiter, Section 3.4, Paragraph 1).
The combination of Heaton, Wang, Holliday, and Begleiter fails to teach choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance.  The combination of Heaton, Wang, Holliday, and Begleiter also fails to teach and use a deep neural network classifier map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.
Thorhallsson teaches choose at least one hyperparameter [for each of the plurality of history windows] to allow the system to trade off bias-variance.  (Thorhallsson, Intro Para 2, discloses that “This leads to a fundamental tradeoff known as the bias-variance tradeoff which is of paramount importance for optimal choice of the hyperparameters for the learning algorithms”).  *Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history” (i.e., a plurality of history windows comprising CNNs, and CNNs are a machine learning model and therefore have a bias-variance tradeoff).
Heaton, Wang, Holliday, Begleiter, and Thorhallsson are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Thorhallsson with the combination of Heaton, Wang, Holliday, and Begleiter to include selecting the optimal choice of hyperparameters, for which the bias-variance tradeoff is of paramount importance.  One would have been motivated to do so to “capture sophisticated relationships in the data while keeping it simple to prevent noise from affecting the outcome” (Thorhallsson, Intro, Paragraph 2).
The combination of Heaton, Wang, Holliday, Begleiter, and Thorhallsson fails to teach and use a deep neural network classifier map arbitrary length histories to a second alphabet having smaller length than the alphabet of the arbitrary length histories as an input sequence for the context tree weighting algorithm.  
Campbell teaches and use a deep neural network classifier to map [arbitrary length] histories to a second alphabet having smaller length than the alphabet of the [arbitrary length] histories [as an input sequence for the context tree weighting algorithm]. (Campbell Para [0077] discloses “The belief tracker component 408 is configured to identify what table(s) of the database and what column(s) of the tables of the database are being implicated by the user's utterance. In particular, belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances. The user goal is related to: one or more tables of the database and their respective columns of metadata, such as names and data types; and vocabulary of columns (e.g., slots). The belief tracker component is configured to receive the feature vector as an input from the feature extractor component 404, concatenate the feature vector with the encoded dialog history that was generated by the context encoding component 406, and produce a probability distribution vector over the columns of the multiple tables of the database 410.”  Examiner’s Note:  A recurrent neural network is a type of deep neural network.  Campbell is using this to “map” (i.e., classify) “dialog history” to “belief states”.  The “belief states” are a smaller alphabet than the “user utterances” that comprise the “dialog history”, as they are “columns of the multiple tables of the database”).  
*Wang discloses arbitrary length histories with the repeating betaCNN windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.” *Begleiter discloses mapping an alphabet before using as input to a CTW algorithm:  Begleiter, Section 3.4 Paragraph 2, discloses “concatenating binary words of size k, one for each alphabet symbol” and uses this for “application of the standard binary ctw algorithm over a binary representation of the sequence”.  
 


As per Claim 11, the combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell teaches the artificial intelligence system of claim 10 as shown above, as well as wherein a long short-term memory-based sequence to symbol method is used to map the arbitrary length histories to the minimal output symbol alphabet.  (Campbell Para [0077] discloses “The belief tracker component 408 is configured to identify what table(s) of the database and what column(s) of the tables of the database are being implicated by the user's utterance. In particular, belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances. The user goal is related to: one or more tables of the database and their respective columns of metadata, such as names and data types; and vocabulary of columns (e.g., slots). The belief tracker component is configured to receive the feature vector as an input from the feature extractor component 404, concatenate the feature vector with the encoded dialog history that was generated by the context encoding component 406, and produce a probability distribution vector over the columns of the multiple tables of the database 410.”  Examiner’s Note:  A recurrent neural network is a type of deep neural network.  Campbell is using this to “map” (i.e., classify) “dialog history” to “belief states”.  
The “belief states” are a smaller alphabet than the “user utterances” that comprise the “dialog history”, as they are “columns of the multiple tables of the database.”  Furthermore, Campbell, Para [0071] first sentence, discloses that “dialog manager component 412 and belief tracker component 408 are trained via the supervised learning component 602.” Campbell Para [0074] then discloses that “In some embodiments of the present invention, the supervised learning component 602 can be represented using a variety of suitable techniques, such as for example, multiplayer perceptron (MLP) representation, gated recurrent unit (GRU) representation, long-short term memory (LSTM) representation, and/or a memory network representation”). 

As per Claim 12, the combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell teaches the artificial intelligence system of claim 10 as shown above, as well as wherein the observed sensory sequence information is a binary event. (Begleiter, Section 3.1, Paragraph 1, discloses “In this section we consider the original ctw algorithm for binary alphabets”.  Heaton, Col 7 Lines 22-25, discloses that the information is sensory: “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”).

As per Claim 13, the combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell teaches the artificial intelligence system of claim 10 as shown above, as well as wherein the instructions cause the at least one processor to perform a temporal convolution in a deep neural network to map observed sensory sequence information from the plurality of history windows to symbols. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2, defines each of the history windows as Convolutional Neural Networks (alpha-CNN and beta-CNN), which “capture the temporal structure” of the sequence.  Wang Figures 2 and 4 illustrate that each CNN makes a prediction of the next word (i.e., symbol).  Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods.”  Heaton, Col 7 Lines 22-25, discloses that the information is sensory: “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Heaton in view of Wang, Holliday, Begleiter, Thorhallsson, and Campbell, and further in view of Pan.
As per Claim 14, the combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell teaches the artificial intelligence system of claim 10 as shown above.  The combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell fails to teach wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.
Pan teaches wherein the [temporal] convolution includes defining each [of the plurality of history windows] of events as a (2^k)-by-n matrix, where 2^k is a number of [time steps in each of the plurality of history windows] and n is a number of events [at each of the time steps], and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events. (Pan Para [0054] discloses an “n*m feature matrix” in which one dimension “m is the quantity of sub time periods” (i.e., time steps) and the other dimension “n is the quantity of feature types” (i.e., events).  Pan Para [0060] discloses that “The input layer inputs each sample (n*m feature matrix) to a convolutional layer for convolution.”  Pan further discloses “convolution kernel quantity and size can be specified as needed” (Examiner’s note:  a convolution kernel is a matrix) wherein “at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types”, which is analogous to the instant application, where “n” is the column dimension for both the 2^k-by-n matrix and the l-by-n matrix (Pan describes an “n*m feature matrix”, and “a size of each convolution kernel can be n*j”).  Pan further discloses that the remaining dimension of the convolution kernel must be less than the corresponding dimension of the feature matrix (“size of each convolution kernel can be n*j, where j is a positive integer less than m”).  This is analogous to the instant application, where l must be less than 2^k.  Pan Para [0060] further states that “convolutional layer can output 100s feature graphs” (i.e., the output is a new set of events)).
*Wang teaches temporal convolutions in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2, defines each of the history windows, which comprise time steps, as Convolutional Neural Networks (alpha-CNN and beta-CNN), which “capture the temporal structure” of the sequence.  
Heaton, Wang, Holliday, Begleiter, Thorhallsson, Campbell, and Pan are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Pan with the combination of Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Campbell to include a convolution kernel wherein at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types.  One would have been motivated to do so “because feature types (i.e. events) are not continuous, the convolution kernel does not need to be scanned in a distribution direction of each feature type” (Pan, Para [0060], Sentence 3).
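To illustrate the kernel geometry relied on from Pan (a sketch under stated assumptions, not Pan's implementation): when a kernel spans all n feature-type rows of the n*m feature matrix, it can slide only along the m sub-time-period columns, consistent with Pan's statement that the kernel need not be scanned in the feature-type direction. The sketch assumes a plain valid cross-correlation with stride 1 and no padding, bias, or activation, as is conventional in CNN layers; the function name conv_over_time is hypothetical.

```python
def conv_over_time(features, kernels):
    """Slide n-by-j kernels across an n-by-m feature matrix along time only.

    features: n rows (feature types) by m columns (sub time periods).
    Each kernel has n rows, so it covers every feature type at once and
    moves only across the m columns, giving m - j + 1 outputs per kernel.
    Returns one output row per kernel (valid cross-correlation, stride 1).
    """
    n, m = len(features), len(features[0])
    outputs = []
    for kernel in kernels:
        if len(kernel) != n:
            raise ValueError("each kernel must span all n feature rows")
        j = len(kernel[0])
        if not 0 < j < m:
            raise ValueError("kernel width j must satisfy 0 < j < m")
        row = []
        for start in range(m - j + 1):
            total = 0
            for r in range(n):
                for c in range(j):
                    total += features[r][start + c] * kernel[r][c]
            row.append(total)
        outputs.append(row)
    return outputs
```

With s kernels the layer yields s output rows of m - j + 1 values each. For example, conv_over_time([[1, 0, 1, 0], [0, 1, 1, 0]], [[[1, 1], [1, 1]], [[1, 0], [0, 0]]]) returns [[2, 3, 2], [1, 0, 1]].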

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Heaton in view of Wang, Holliday, Begleiter, Thorhallsson, and Pan.  
As per Claim 15, Heaton teaches an artificial intelligence system, comprising: a computing device including at least one processor, one or more data storage devices, and a non-transitory data storage medium interfaced with the at least one processor, the non-transitory data storage medium containing instructions that, when executed cause the at least one processor to (Heaton, Col 7 Lines 29-34, discloses “learning mode”.  Heaton, Col 12 Lines 35-38, discloses a “computer system” with a “dynamic storage device” and a “processor”.  Heaton, Col 11 Lines 64-67, also discloses a “computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of any one of the foregoing methods”).
save observed sensory sequence information (Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”)
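Heaton fails to teach that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows, and perform a temporal convolution in a deep neural network to map observed sensory sequence information from the plurality of history windows to symbols.  Heaton also fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step; apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; choose at least one parameter of the exponentially increasing history window size as a hyperparameter to allow the system to trade off bias-variance; apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence; and wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.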
Wang teaches that the sequence information is saved in a plurality of history windows, the plurality of history windows being reverse chronological history windows. (Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2 Last Sentence, discloses “Also distinct from RNN, genCNN gains most of its processing power from the heavy-duty processing units (i.e., alphaCNN and betaCNNs), which follow a bottom-up information flow and yet can adequately capture the temporal structure in word sequence with its convolutional-gating architecture.”  Examiner’s Note:  Here, alphaCNN and betaCNN are the plurality of history windows, and they are saving word sequence information. These windows go back in time progressively further, and are thus reverse chronological history windows).
and perform a temporal convolution in a deep neural network to map observed sensory sequence information from the plurality of history windows to symbols (Wang, Section 2, defines each of the history windows as Convolutional Neural Networks (alpha-CNN and beta-CNN), which “capture the temporal structure” of the sequence.  Figures 2 and 4 illustrate that each CNN makes a prediction of the next word (i.e., symbol).  Heaton discloses that the sequence information is sensory:  Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.”)
Heaton and Wang are analogous art because they are both in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang with Heaton to include saving sequence information in a plurality of history windows.  One would have been motivated to do so to “handle sentences with arbitrary length” (Wang, Section 3.3).
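The combination of Heaton and Wang fails to teach wherein a size of the plurality of history windows increase exponentially from a last observed time step. The combination of Heaton and Wang also fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; choose at least one parameter of the exponentially increasing history window size as a hyperparameter to allow the system to trade off bias-variance; apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence; and wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.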
Holliday teaches wherein a size of the plurality of history windows increase exponentially from a last observed time step. (Holliday Para [0093]-[0094] discloses “The numbers are placed into a Binary Exponential History. This takes the numbers for each minute, and then averages them over 12 different time periods (2 minutes, 4 minutes, 8 minutes . . . up to 4096 minutes).  Multiple Linear Regression is then used to create an approximate linear function for the number of arrivals that there will be over the next (m) minutes, given the 36 different averages for what has happened previously.”  Examiner’s Note:  Here “what has happened previously” is analogous to “from a last observed time step”). 
Heaton, Wang, and Holliday are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Holliday with the combination of Heaton and Wang to include saving sequence information in exponentially increasing history windows.  One would have been motivated to do so in order to include a more comprehensive data set for analysis, as stated by Holliday: “provides more accurate past data” (Holliday Para [0027]).
The combination of Heaton, Wang, and Holliday fails to teach apply a function to the observed sensory sequence information in each history window, wherein the function maps the observed sensory sequence information into a fixed set of discrete classes, fixed for all of the plurality of history windows; apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes for each of the plurality of history windows to predict a future discrete sequence. The combination of Heaton, Wang, and Holliday also fails to teach choose at least one parameter of the exponentially increasing history window size as a hyperparameter to allow the system to trade off bias-variance; wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.
Begleiter teaches apply a function to the observed [sensory] sequence information [in each history window], wherein the function maps the observed [sensory] sequence information into a fixed set of discrete classes, fixed [for all of the plurality of history windows] (Begleiter, Section 3.4 Paragraph 2, discloses, for sequence information (“each alphabet symbol”), a function to map the information into a fixed set of discrete classes (“concatenating binary words of size k, one for each alphabet symbol”).  *Heaton discloses that the sequence information is sensory:  Heaton, Col 7 Lines 22-25, discloses that “During the manufacturing process, various sets of sequences of sensory input (e.g., default sensory sequence patterns, etc.) to detect one or more events that may occur may be saved in memory.” *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”
and apply a context tree weighting algorithm to an alphabet resulting from the fixed set of discrete classes [for each of the plurality of history windows] to predict a future discrete sequence (Begleiter, Section 3.4 Paragraph 2, discloses a “standard binary ctw algorithm over a binary representation of the sequence”.  Begleiter, Abstract, identifies CTW as a sequence prediction algorithm: “prediction algorithms, including Context Tree Weighting (CTW)”).  *Wang discloses saving sequence information in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”
Heaton, Wang, Holliday, and Begleiter are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Begleiter with the combination of Heaton, Wang, and Holliday to include applying a CTW algorithm to a fixed set of discrete classes mapped from an alphabet.  One would have been motivated to do so for the purpose of “extending the ctw algorithm for large alphabets” (Begleiter, Section 3.4, Paragraph 1).
The combination of Heaton, Wang, Holliday, and Begleiter fails to teach choose at least one hyperparameter for each of the plurality of history windows to allow the system to trade off bias-variance.  The combination of Heaton, Wang, Holliday, and Begleiter also fails to teach wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.
Thorhallsson teaches choose at least one hyperparameter [for each of the plurality of history windows] to allow the system to trade off bias-variance.  (Thorhallsson, Intro Para 2, discloses that “This leads to a fundamental tradeoff known as the bias-variance tradeoff which is of paramount importance for optimal choice of the hyperparameters for the learning algorithms”).  *Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history” (i.e., a plurality of history windows comprising CNNs, and CNNs are a machine learning model and therefore have a bias-variance tradeoff).
Heaton, Wang, Holliday, Begleiter, and Thorhallsson are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings 
The combination of Heaton, Wang, Holliday, Begleiter, and Thorhallsson fails to teach wherein the temporal convolution includes defining each of the plurality of history windows of events as a 2^k-by-n matrix, where 2^k is a number of time steps in each of the plurality of history windows and n is a number of events at each of the time steps, and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events.
Pan teaches wherein the [temporal] convolution includes defining each [of the plurality of history windows] of events as a (2^k)-by-n matrix, where 2^k is a number of [time steps in each of the plurality of history windows] and n is a number of events [at each of the time steps], and applying a convolution that is an l-by-n matrix, where l is less than 2^k, wherein a set of the convolutions produces a new set of events. (Pan Para [0054] discloses an “n*m feature matrix” in which one dimension “m is the quantity of sub time periods” (i.e., time steps) and the other dimension “n is the quantity of feature types” (i.e., events).  Pan Para [0060] discloses that “The input layer inputs each sample (n*m feature matrix) to a convolutional layer for convolution.”  Pan further discloses “convolution kernel quantity and size can be specified as needed” (Examiner’s note:  a convolution kernel is a matrix) wherein “at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types”. This is analogous to the instant application, where “n” is the column dimension for both the 2^k-by-n matrix and the l-by-n matrix (Pan describes an “n*m feature matrix”, and “a size of each convolution kernel can be n*j”).  Pan further discloses that the remaining dimension of the convolution kernel is less than the corresponding dimension of the feature matrix (“size of each convolution kernel can be n*j, where j is a positive integer less than m”).  This is analogous to the instant application, where l must be less than 2^k.  Pan Para [0060] further states that “convolutional layer can output 100s feature graphs.” (i.e., the output is a new set of events).
*Wang teaches temporal convolutions in a plurality of history windows:  Wang, Section 2 Sentence 1, discloses “As shown in Figure 1, genCNN is overall recursive, consisting of CNN-based processing units of two types:  alphaCNN as the ‘front-end’, dealing with the history that is closest to the prediction; betaCNNs (which can repeat), in charge of more ‘ancient’ history.”  Wang, Section 2, defines each of the history windows, which comprise time steps, as Convolutional Neural Networks (alphaCNN and betaCNN), which “capture the temporal structure” of the sequence.
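The dimensional correspondence relied on above can be illustrated with a minimal, hypothetical sketch. It is not drawn from Pan, Wang, or the instant application. The sketch makes these choices:
- A history window is a 2^k-by-n list of rows (time steps by events).
- Each kernel is l-by-n with l less than 2^k, so it spans all n event columns and slides along the time axis only.
- The assumptions are stride 1, no padding, no bias or activation, and cross-correlation as computed by common CNN layers.
- `temporal_conv` and `conv_layer` are illustrative names.

```python
from typing import List

Matrix = List[List[float]]


def temporal_conv(window: Matrix, kernel: Matrix) -> List[float]:
    """Slide an l-by-n kernel down the time axis of a (2**k)-by-n window.

    Stride 1, no padding: returns 2**k - l + 1 values, one per offset.
    """
    steps, n = len(window), len(window[0])
    l = len(kernel)
    if len(kernel[0]) != n:
        raise ValueError("kernel must span all n event columns")
    if l >= steps:
        raise ValueError("kernel height l must be less than 2**k")
    return [
        sum(window[t + i][j] * kernel[i][j] for i in range(l) for j in range(n))
        for t in range(steps - l + 1)
    ]


def conv_layer(window: Matrix, kernels: List[Matrix]) -> Matrix:
    """Apply a set of kernels; column c of the result is kernel c's output,
    so each output time step carries len(kernels) new values ("events")."""
    per_kernel = [temporal_conv(window, k) for k in kernels]
    return [list(row) for row in zip(*per_kernel)]


if __name__ == "__main__":
    # k = 2 -> 4 time steps; n = 3 events (one-hot or multi-hot rows).
    window = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    k1 = [[1, 0, 0], [0, 1, 0]]  # l = 2: event 0 followed by event 1
    k2 = [[0, 0, 1], [1, 0, 0]]  # l = 2: event 2 followed by event 0
    print(conv_layer(window, [k1, k2]))  # [[2, 0], [0, 0], [1, 2]]
```

Here each of the 2^k - l + 1 = 3 output time steps carries one value per kernel. That output loosely corresponds to the "new set of events" language. Applying `conv_layer` separately to each history window would loosely mirror Wang's use of separate CNN units for recent and older history.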
Heaton, Wang, Holliday, Begleiter, Thorhallsson, and Pan are analogous art because they are all in the field of machine learning.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Pan with the combination of Heaton, Wang, Holliday, Begleiter, and Thorhallsson to include a convolution kernel wherein at least one of a row quantity or a column quantity of the convolution kernel can be a predetermined quantity of feature types.  One would have been motivated to do so “because feature types (i.e. events) are not continuous, the 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.A.S./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126