DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Acknowledgement is made of Applicant’s claim amendments on 12/09/2019. The claim amendments are entered. Presently, claims 1-20 remain pending. Claims 1, 8, and 15 have been amended.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 8, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant argues: The instant claims provide a methodology for reducing memory consumption related to machine learning processing of time series data. The Examiner suggests that consuming a portion of memory storage, where a size of the portion is based on the generated feature matrix, is an insignificant extra-solution activity. However, as stated in the specification, the reduced memory consumption is an improvement over related systems (see, for example, paragraphs [0052]-[0054] and [0072]). The claims describe an improvement over related technology and are therefore integrated into a practical application and thus patent eligible.
Examiner response: Examiner respectfully disagrees. Consuming a portion of memory is directed to mere data gathering (storing data), which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The limitation “where a size of the portion is based on a product of a number of time series in the set and a number of random series” is understood to be a field of use limitation. See MPEP 2106.05(h).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 8, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims 1, 8, and 15 recite the limitation “training, by the processor system, one or more machine learning models to use the feature matrix as an input for predicting a relationship”. Previously, the limitation recited training one or more machine learning models using the feature matrix. The current amendment suggests that the machine learning models are trained to use the feature matrix to make a prediction. It is unclear how a machine learning model is trained to use the feature matrix for predicting. Claims 2-7, 9-14, and 16-20 are dependent claims that do not cure the deficiencies and are rejected for the same reasons.

Claim Rejections - 35 USC § 101
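35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.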
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined in Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG) for more details of the analysis.
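Step 1
Claims 1-7 are directed to a method (process), claims 8-14 are directed to a computer program product comprising a computer readable storage medium (manufacture), and claims 15-20 are directed to a system (machine). Each claim therefore falls within one of the four statutory categories.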
Claim 1 recites: 
Step 2A, Prong 1
“A computer-implemented method for performing unsupervised time-series feature learning comprising: generating, by a processor system, a set of reference time-series data of random lengths, wherein each random length is uniformly sampled from a predetermined minimum length to a predetermined maximum length, and wherein values of each reference time-series in the set are drawn from a distribution;” Save for the recitation of generic computer equipment (“processor”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can assign sequences of random lengths within a given range to a reference time-series set, and the values of the reference time-series can be taken from the mean or median of a previous time-series data set.
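For illustration only, the generating step quoted above can be sketched in Python. The sketch is not taken from the application: the function name, the use of integer lengths, and the caller-supplied sampler (here a standard normal) standing in for the claimed “distribution” are assumptions.

```python
import random


def generate_reference_series(num_series, d_min, d_max, sampler, seed=None):
    """Generate reference time-series of random lengths (hypothetical helper).

    Each length is drawn uniformly from the integers d_min..d_max (inclusive).
    Each value is drawn by `sampler`, a callable taking a random.Random
    instance; it stands in for the claimed "distribution" (assumption).
    """
    rng = random.Random(seed)
    series = []
    for _ in range(num_series):
        length = rng.randint(d_min, d_max)  # uniform over [d_min, d_max]
        series.append([sampler(rng) for _ in range(length)])
    return series


# Usage: five reference series of length 3 to 7 with standard-normal values.
refs = generate_reference_series(5, 3, 7, lambda r: r.gauss(0.0, 1.0), seed=0)
```

Passing the sampler as a parameter keeps the sketch neutral between claim 2 (a predetermined random distribution) and claim 3 (a distribution of the raw time-series data).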
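“generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data;” Save for the recitation of generic computer equipment (“processor”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can compute a distance between two time series data and store the values in a matrix with the help of paper and pen.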
“wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity” This step appears to recite mathematical computations and is understood to be a mathematical concept that can be accomplished by a human with the aid of paper and pencil.
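The randomized feature map recited above can be written as follows. This is a sketch under assumptions not stated in the claim: d is a time-series distance (for example, dynamic time warping), similarities take the form exp(-γd) with a hypothetical bandwidth γ > 0, and ω_1, ..., ω_R are the R random series. Under these assumptions the kernel is an expectation over random series, and the feature map gives its Monte Carlo estimate:

```latex
\phi(x) = \frac{1}{\sqrt{R}}
  \begin{bmatrix}
    e^{-\gamma\, d(x,\omega_1)} & \cdots & e^{-\gamma\, d(x,\omega_R)}
  \end{bmatrix}^{\top},
\qquad
k(x,y) = \mathbb{E}_{\omega}\!\left[ e^{-\gamma\, d(x,\omega)}\, e^{-\gamma\, d(y,\omega)} \right]
  \approx \langle \phi(x), \phi(y) \rangle
  = \frac{1}{R} \sum_{i=1}^{R} e^{-\gamma\, d(x,\omega_i)}\, e^{-\gamma\, d(y,\omega_i)}
```

Stacking φ(x_j)^T for N raw series gives an N × R feature matrix Z, and Z Z^T then approximates the N × N kernel matrix.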
Step 2A, Prong 2
“a processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The specification “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
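“training, by the processor system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The specification “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h).)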
Step 2B
“a processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“training, by the processor system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The specification “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h). This claim element is a well-understood, routine, and conventional activity of training a generic machine learning model using a feature matrix, as evidenced by Liu et al., “Multi-Task Feature Learning Via Efficient 2,1-Norm Minimization” (2012). Liu teaches “Multi-task learning [1, 2, 3, 7, 17, 18, 19, 26, 28] has recently received increasing attention in machine learning, artificial intelligence, and computer vision… Argyriou et al. [1] assumed that the tasks share a small subset of features (via the feature matrix), and formulated the problem as the squared 2,1-norm regularized non-convex optimization problem. If the feature matrix is not learned (set to an identity matrix), the problem reduces to the squared 2,1-norm regularized convex optimization problem, which is similar to the one proposed in [17, 18].”)
 “consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The specification “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
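As a worked illustration of the “product” recited in the memory-storage limitation of claim 1, the sketch below assumes the feature matrix is stored densely with one fixed-size entry per (time series, random series) pair. The function name, the 8-byte (float64) entry size, and the example counts are assumptions, not taken from the claims.

```python
def feature_matrix_bytes(num_series, num_random_series, bytes_per_entry=8):
    """Storage for a dense N x R feature matrix (hypothetical helper).

    num_series: N, the number of time series.
    num_random_series: R, the number of random series.
    bytes_per_entry: assumed fixed entry size (8 bytes for float64).
    """
    return num_series * num_random_series * bytes_per_entry


size = feature_matrix_bytes(10_000, 256)
```

For example, N = 10,000 time series and R = 256 random series give 10,000 × 256 × 8 = 20,480,000 bytes (about 20.5 MB); the size grows linearly in each of N and R.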
Claim 2 recites:
Step 2A, Prong 1
“wherein the distribution is a predetermined random distribution.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 3 recites:
Step 2A, Prong 1
“wherein the distribution is a probability distribution of the raw time-series data.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 4 recites: 
Step 2A, Prong 1
“wherein the processor system is a two-party protocol system comprising a first-party component and a second-party component, wherein the first-party component is configured to generate the probability disruption from the raw time-series data” This claim appears to recite a mathematical concept.
“generate the feature matrix based on the set of generated set of reference time-series” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
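“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).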
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 5 recites:
Step 2A, Prong 1
“generate the set of reference time-series” This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.
Step 2A, Prong 2
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component, receive the generated feature matrix from the first party-component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component, receive the generated feature matrix from the first party-component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
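The two-party exchange recited in claims 4 and 5 can be sketched as follows. Everything in the sketch is hypothetical: the class and function names, summarizing the “probability distribution” as a (mean, standard deviation) pair, the Gaussian generator, and the placeholder distance are assumptions, and the second party's model training and return of results are omitted.

```python
import random
import statistics


class FirstParty:
    """Hypothetical holder of the raw time-series data."""

    def __init__(self, raw_series):
        self.raw_series = raw_series

    def probability_distribution(self):
        # Summarize raw values as (mean, population std); only this is sent.
        values = [v for series in self.raw_series for v in series]
        return statistics.mean(values), statistics.pstdev(values)

    def feature_matrix(self, references, distance):
        # One row per raw series, one column per received reference series.
        return [[distance(x, w) for w in references] for x in self.raw_series]


class SecondParty:
    """Hypothetical party that generates the reference time-series."""

    def generate_references(self, distribution, num_series, d_min, d_max, seed=0):
        mean, std = distribution
        rng = random.Random(seed)
        return [[rng.gauss(mean, std) for _ in range(rng.randint(d_min, d_max))]
                for _ in range(num_series)]


def placeholder_distance(x, w):
    # Stand-in distance for series of unequal length (assumption).
    return abs(statistics.mean(x) - statistics.mean(w))


def run_protocol(raw_series, num_refs=4, d_min=2, d_max=5):
    first, second = FirstParty(raw_series), SecondParty()
    dist = first.probability_distribution()                          # first -> second
    refs = second.generate_references(dist, num_refs, d_min, d_max)  # second -> first
    return first.feature_matrix(refs, placeholder_distance)          # first -> second
```

In this sketch the raw time-series never leave the first party; only the distribution summary and the feature matrix cross the boundary.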
Claim 6 recites:
Step 2A, Prong 1
	“wherein generating the feature matrix includes: computing, by the processor system, a set of distance vectors between the raw time-series data and the set of generated reference time-series;” Save for the recitation of generic computer equipment (“processor”), this claim appears to recite a mathematical concept.
	“translating, by the processor system, the distance vectors into similarity vectors;” Save for the recitation of generic computer equipment (“processor”), this claim appears to recite a mathematical concept.
	“and concatenating, by the processor system, the similarity vectors to generate the feature matrix.” Save for the recitation of generic computer equipment (“processor”), this claim appears to recite a mathematical concept.
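The three steps of claim 6 (distance vectors, translation into similarity vectors, concatenation) can be sketched as below, starting from already-computed distance vectors. The function name and the Gaussian-style translation exp(-γd) with a hypothetical parameter `gamma` are assumptions; the claim does not specify how distances are translated.

```python
import math


def distances_to_feature_matrix(distance_vectors, gamma=1.0):
    """Translate distance vectors into similarity vectors and stack them.

    distance_vectors: one list of R distances per raw time series.
    gamma: assumed bandwidth of the exp(-gamma * d) translation.
    Returns an N x R feature matrix as a list of rows.
    """
    feature_matrix = []
    for distances in distance_vectors:
        # Smaller distance -> similarity closer to 1; large distance -> near 0.
        similarity = [math.exp(-gamma * d) for d in distances]
        feature_matrix.append(similarity)  # concatenate row-wise
    return feature_matrix


m = distances_to_feature_matrix([[0.0, 1.0], [2.0, 0.5]], gamma=2.0)
```

Here m[0][0] is exactly 1.0 because a zero distance maps to exp(0).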
Step 2A, Prong 2
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
Step 2B
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 7 recites:
Step 2A, Prong 1
“wherein generating the feature matrix includes: computing, by the processor system, a set of feature vectors between the raw time-series data and the set of generated reference time-series using dynamic time warping” Save for the recitation of generic computer equipment (“processor”), this claim appears to recite a mathematical concept.
“and concatenating, by the system, the feature vectors to generate the feature matrix.” Save for the recitation of generic computer equipment (“processor”), this claim appears to recite a mathematical concept.
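Claim 7 names dynamic time warping (DTW) as the technique for computing the feature vectors. A minimal sketch of classic DTW with an absolute-difference local cost, and of stacking the per-series DTW vectors into a feature matrix, follows; the function names and the choice of raw DTW distances as feature entries are assumptions.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping, |a - b| local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance x only
                                 cost[i][j - 1],      # advance y only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def dtw_feature_matrix(raw_series, references):
    # One feature vector (row) per raw series; one DTW distance per reference.
    return [[dtw_distance(x, w) for w in references] for x in raw_series]
```

Because DTW aligns sequences of different lengths, it accommodates the random-length reference series recited in claim 1; for example, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0.0.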
Step 2A, Prong 2
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
Step 2B
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 8 recites:
Step 2A, Prong 1
“A computer program product for performing unsupervised time-series feature learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method comprising: generating, by the processor system, a set of reference time-series of random lengths, wherein each length is uniformly sampled from a predetermined minimum length to a predetermined maximum length, and wherein values of each reference time-series in the set are drawn from a distribution;” Save for the recitation of generic computer equipment (“processor system”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can assign sequences of random lengths within a given range to a reference time-series set, and the values of the reference time-series can be taken from the mean or median of a previous time-series data set.
“generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data;” Save for the recitation of generic computer equipment (“processor”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can compute a distance between two time series data and store the values in a matrix with the help of paper and pen.
“wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity” This step appears to recite mathematical computations and is understood to be a mathematical concept that can be accomplished by a human with the aid of paper and pencil.
Step 2A, Prong 2
“A computer program product for performing unsupervised time-series feature learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method comprising” (The “computer program product”, “computer readable storage medium having program instructions”, and “processor system” are understood to be generic computer equipment. See MPEP 2106.05(f).).
 “consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The specification “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
“training, by the processor system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The specification “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h).)
Step 2B
“A computer program product for performing unsupervised time-series feature learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method comprising” (The “computer program product”, “computer readable storage medium having program instructions”, and “processor system” are understood to be generic computer equipment. See MPEP 2106.05(f).).
 “consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The specification “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
“training, by the processor system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The specification “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h). This claim element is a well-understood, routine, and conventional activity of training a generic machine learning model using a feature matrix, as evidenced by Liu et al., “Multi-Task Feature Learning Via Efficient 2,1-Norm Minimization” (2012). Liu teaches “Multi-task learning [1, 2, 3, 7, 17, 18, 19, 26, 28] has recently received increasing attention in machine learning, artificial intelligence, and computer vision… Argyriou et al. [1] assumed that the tasks share a small subset of features (via the feature matrix), and formulated the problem as the squared 2,1-norm regularized non-convex optimization problem. If the feature matrix is not learned (set to an identity matrix), the problem reduces to the squared 2,1-norm regularized convex optimization problem, which is similar to the one proposed in [17, 18].”)
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 9 recites:
Step 2A, Prong 1
“wherein the distribution is a predetermined random distribution.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
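Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.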
Claim 10 recites:
Step 2A, Prong 1
           “wherein the distribution is a probability distribution of the raw time-series data.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 11 recites:
Step 2A, Prong 1
“wherein the processor system is a two-party protocol system comprising a first-party component and a second-party component, wherein the first-party component is configured to generate the probability disruption from the raw time-series data,” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
“generate the feature matrix based on the set of generated set of reference time-series,” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
Step 2A, Prong 2
“processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 12 recites:
Step 2A, Prong 1
“generate the set of reference time-series” This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.
Step 2A, Prong 2
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
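“receive the generated feature matrix from the first party-component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).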
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated feature matrix from the first party-component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 13 recites:
Step 2A, Prong 1
“computing, by the processor system, a set of distance vectors between the raw time-series data and the set of generated reference time-series;” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
“translating, by the processor system, the distance vectors into similarity vectors;” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
“and concatenating, by the processor system, the similarity vectors to generate the feature matrix.” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
Step 2A, Prong 2
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
Step 2B
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 14 recites:
Step 2A, Prong 1
“computing, by the processor system, a set of feature vectors between the raw time-series data and the set of generated reference time-series using dynamic time warping;” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
“and concatenating, by the processor system, the feature vectors to generate the feature matrix.” Save for the recitation of generic computer equipment (“processor system”), this claim appears to recite a mathematical concept.
Step 2A, Prong 2
“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
Step 2B
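“the processor system” (The “processor system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.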
Claim 15 recites:
Step 2A, Prong 1
“A system for performing unsupervised time-series feature learning, the system comprising one or more processors configured to perform a method comprising: generating, by the system, a set of reference time-series of random lengths, wherein each length is uniformly sampled from a predetermined minimum length to a predetermined maximum length, and wherein values of each reference time-series in the set are drawn from a distribution;” Save for the recitation of generic computer equipment (“system and processor”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can assign a sequence from a time series with random lengths within a certain range to a time series data reference set. The values of the time series data reference set can come from a mean or median of previous time series data set.
“generating, by the system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data;” Save for the recitation of generic computer equipment (“system”), this step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process. A human can compute a distance between two time series and record the values in a matrix with the help of pen and paper.
“wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity” This step appears to recite mathematical computations and is understood to be a mathematical concept that can be accomplished by a human with the aid of paper and pencil.
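For illustration only, the generating step quoted above for claim 15 can be sketched as follows. Each reference series receives a length drawn uniformly from a minimum to a maximum length, and its values are drawn from a distribution. This is a minimal sketch under stated assumptions. The distribution is passed in as a hypothetical `sample_value` callable. It could represent a predetermined random distribution (cf. claim 16) or one estimated from the raw data (cf. claim 17). The names `generate_reference_series`, `d_min`, and `d_max` are hypothetical.

```python
import random

def generate_reference_series(num_series, d_min, d_max, sample_value, seed=None):
    """Generate num_series reference series. Each length is drawn uniformly
    from the integers d_min..d_max (inclusive). Each value is drawn by
    calling sample_value(rng), an assumed stand-in for 'a distribution'."""
    if not 1 <= d_min <= d_max:
        raise ValueError("require 1 <= d_min <= d_max")
    rng = random.Random(seed)  # seeded generator for reproducibility
    series = []
    for _ in range(num_series):
        length = rng.randint(d_min, d_max)  # uniform over [d_min, d_max]
        series.append([sample_value(rng) for _ in range(length)])
    return series

# Example: five series of length 3 to 8, values from a standard normal.
refs = generate_reference_series(5, 3, 8, lambda rng: rng.gauss(0.0, 1.0), seed=0)
```

Passing the distribution as a callable keeps the length-sampling logic independent of where the values come from.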
Step 2A, Prong 2
“A system for performing unsupervised time-series feature learning, the system comprising one or more processors configured to perform a method comprising:” (The “processor and system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“consuming, by the system, a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The limitation “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
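“training, by the system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The limitation “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h).).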
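For illustration only, the memory limitation quoted above for claim 15 can be made concrete with a short sketch. It shows how storage for a dense feature matrix scales with the product of the number of time series (N) and the number of random series (R). The sketch assumes a row-major buffer of C doubles built with Python's standard `array` module. The helper names are hypothetical and are not drawn from the application.

```python
from array import array

def allocate_feature_matrix(num_time_series, num_random_series):
    """Allocate a zero-filled N x R feature matrix as one flat, row-major
    buffer of C doubles, so its length is exactly N * R."""
    return array("d", [0.0]) * (num_time_series * num_random_series)

def cell_index(i, r, num_random_series):
    """Flat position of row i (time series), column r (random series)."""
    return i * num_random_series + r

def feature_matrix_bytes(buffer):
    """Bytes consumed by the buffer: element count times element size."""
    return len(buffer) * buffer.itemsize

matrix = allocate_feature_matrix(1000, 64)  # N = 1000 series, R = 64 random series
```

In this layout the memory consumed is N × R × `itemsize` bytes. That is 512,000 bytes in the example when a C double is 8 bytes, as on common platforms.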
Step 2B
“A system for performing unsupervised time-series feature learning, the system comprising one or more processors configured to perform a method comprising:” (The “processor and system” is understood to be generic computer equipment. See MPEP 2106.05(f).).
“consuming, by the system, a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series” (This step appears to be directed to storing data, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g). The limitation “where a size of the portion is based on a product” is understood to be a field of use limitation. See MPEP 2106.05(h).);
“training, by the system, one or more machine learning models to use the feature matrix as an input for predicting a relationship between reference time-series data and the raw time series data” (This claim element is considered a form of insignificant computer implementation, which is directed to post-solution activity for use in a claimed process. See MPEP 2106.05(g). The additional element of training a machine learning model does not add a meaningful limitation to the claim, and hence does not integrate the mental process into a practical application. The limitation “to use the feature matrix for prediction” is understood to be a field of use limitation. See MPEP 2106.05(h).).
Liu et al., “Multi-Task Feature Learning Via Efficient ℓ2,1-Norm Minimization” (2012). Liu teaches “Multi-task learning [1, 2, 3, 7, 17, 18, 19, 26, 28] has recently received increasing attention in machine learning, artificial intelligence, and computer vision… Argyriou et al. [1] assumed that the tasks share a small subset of features (via the feature matrix), and formulated the problem as the squared ℓ2,1-norm regularized non-convex optimization problem. If the feature matrix is not learned (set to an identity matrix), the problem reduces to the squared ℓ2,1-norm regularized convex optimization problem, which is similar to the one proposed in [17, 18].”
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 16 recites:
Step 2A, Prong 1
“wherein the distribution is a predetermined random distribution.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 17 recites:
Step 2A, Prong 1
“wherein the distribution is a probability distribution of the raw time-series data.” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
This claim does not appear to recite any additional elements.
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 18 recites:
Step 2A, Prong 1
“wherein the system is a two-party protocol system comprising a first-party component and a second-party component, wherein the first-party component is configured to generate the probability distribution from the raw time-series data,” This claim appears to recite a mathematical concept.
“generate the feature matrix based on the set of generated set of reference time-series,” This claim appears to recite a mathematical concept.
Step 2A, Prong 2
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
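“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).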
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“transmit the probability distribution of the raw time-series data to the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated set of reference time-series from the second-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit the generated feature matrix to the second-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 19 recites:
Step 2A, Prong 1
“generate the set of reference time-series,” This step appears to be practically implementable in the human mind and is understood to be a recitation of a mental process.
Step 2A, Prong 2
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated feature matrix from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
Step 2B
“wherein the second-party component is configured to receive the probability distribution from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“transmit the set of reference time-series to the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“receive the generated feature matrix from the first-party component,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“provide the feature matrix as the input to the one or more machine learning models,” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
“and transmit results from the machine learning models to the first-party component.” (This step appears to be directed to transmitting or receiving information, which is understood to be insignificant extra-solution activity. See MPEP 2106.05(g).).
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim 20 recites:
Step 2A, Prong 1
“computing, by the system, a set of feature vectors between the raw time-series data and the set of generated reference time-series using dynamic time warping;” Save for the recitation of generic computer equipment (“system comprising a processor”), this claim appears to recite a mathematical concept.
“concatenating, by the system, the feature vectors to generate the feature matrix.” Save for the recitation of generic computer equipment (“system comprising a processor”), this claim appears to recite a mathematical concept.
Step 2A, Prong 2
“the system” (The “system comprising a processor” is understood to be generic computer equipment. See MPEP 2106.05(f).).
Step 2B
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
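A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.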

Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Baydogan et al. ("A bag-of-features framework to classify time series.") in view of Lei et al. ("A study on the dynamic time warping in kernel machines."; hereinafter Lei), Avron et al. (US-20150331835-A1; hereinafter Avron), ANGUERA MIRO (US-20140195474-A1; hereinafter ANGUERA MIRO), and Ko et al. ("Using dynamic time warping for online temporal fusion in multisensor systems.").
Regarding Claim 1,
Baydogan teaches a computer-implemented method for performing unsupervised time-series feature learning comprising: 
generating, by a processor system, a set of reference time-series data (pg. 2797 col. 1 paragraph 6; A univariate time series, $x^n = (x^n_1, x^n_2, \ldots, x^n_T)$ is an ordered set of $T$ values.) of random lengths (pg. 2797 col. 2 paragraph 2; Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence.), wherein each random length is uniformly sampled from a predetermined minimum length (pg. 2797 col. 2 paragraph 3; We set a lower bound on the subsequence length $l_{min}$ as a proportion $z$ $(0 < z \le 1)$ of the length of the time series. Thus, $l_s \ge l_{min} = z \times T$.)…, and wherein values of each reference time-series in the set are drawn from a distribution (pg. 2797 col. 2 paragraph 4; Interval features $f_k(t_1, t_2)$, $(0 < t_1 \le t_2 \le T)$ for $k = 1, 2, \ldots, K$, are extracted and combined to represent a subsequence. For each interval, the slope of the fitted regression line, mean of the values, and variance of the values are extracted. These features provide information about the shape, level, and distribution of the values.); 
training, by the processor system, one or more machine learning models (pg. 2796 col. 2 paragraph 4; Moreover, we learn the bag-of-features representation by training a classifier on the interval feature,) to use the feature matrix (pg. 2797 col. 1 paragraph 2; We form a matrix of these features, but the value in row i and row j of the same column may be calculated from subsequences that differ in location and/or length. We further partition subsequences into intervals to detect patterns represented by a series of values over shorter time segments.) as an input for predicting a relationship between reference time-series data and the raw time series data (pg. 2797, section 2.1; We consider subsequences of fixed and random lengths. The random length subsequences can potentially detect patterns that appear with different lengths and be split across the time points [43]. Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence. This allows splits in tree-based models to be based on the features from different length intervals $w_s = l_s / d$. Therefore, the relationships of patterns with different lengths can be better captured.);
Baydogan does not explicitly disclose
…to a predetermined maximum length
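generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data, wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity; and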
consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series;
However, Lei (“A Study on the Dynamic Time Warping in Kernel Machines”) teaches
…predetermined maximum length (pg. 840, col. 1; However, the time complexity for a single DTW calculation is O(w ∗ n) where n is the length of sequence and w is the width of band restriction.)
generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data (pg. 840; To compute the DTW distance Dι(x, y) with $x = [x_1, x_2, \cdots, x_n]$ and $y = [y_1, y_2, \cdots, y_m]$, we can first construct an n-by-m matrix, as shown in Fig. 1.), wherein generating the feature matrix comprises approximating a positive definite kernel (pg. 841; Let $\mathcal{X}$ be a non-empty set. A kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a Positive Definite Symmetric (PDS)).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine Baydogan’s method of dynamic time warping with Lei’s method of dynamic time warping.
Doing so would allow for improved performance (pg. 839; The Dynamic Time Warping (DTW) is state-of-the-art distance measure widely used in sequential pattern matching and it outperforms Euclidean distance in most cases because its matching is elastic and robust.).
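For illustration only, the complexity quoted from Lei, O(w ∗ n) for a single DTW calculation with band width w, can be made concrete. The sketch below uses a band-constrained DTW in the style of the Sakoe-Chiba band, which is a common way of imposing such a restriction. It is an assumption-based sketch, not Lei's code. It also assumes a squared-difference local cost. The function name `banded_dtw` is hypothetical. To keep the sketch short it still allocates the full table. Only the number of cells actually computed is limited to roughly (2w + 1) · n.

```python
import math

def banded_dtw(x, y, w):
    """DTW restricted to cells with |i - j| <= w (a Sakoe-Chiba-style band).
    Returns (distance, cells_evaluated)."""
    n, m = len(x), len(y)
    w = max(w, abs(n - m))  # band must be wide enough to reach cell (n, m)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]  # full table, for simplicity
    cost[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        # only columns inside the band are evaluated for row i
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cells += 1
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m], cells

series = list(range(100))
distance, evaluated = banded_dtw(series, series, 5)  # evaluated == 1070, versus 10,000 unconstrained
```

With n = 100 and w = 5, only 1,070 of the 10,000 cells are computed. This reflects the O(w ∗ n) scaling quoted above.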
Avron (US 20150331835 A1) teaches
by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map (para [0062]; The goal in random features maps is to construct randomized feature maps $\Psi(\cdot)$ so that the Euclidean inner product $\langle \Psi(u), \Psi \ldots$).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of measuring a Euclidean distance of Baydogan with the method of measuring a Euclidean distance of Avron.
Doing so would allow for constructing randomized feature maps (para [0062]; The goal in random features maps is to construct randomized feature maps).
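For illustration only, the general principle quoted from Avron can be shown with a well-known example. A randomized feature map Ψ is built so that the Euclidean inner product ⟨Ψ(u), Ψ(v)⟩ approximates a kernel value. The sketch uses the random Fourier features construction for the Gaussian kernel k(u, v) = exp(−γ‖u − v‖²). This is a swapped-in, textbook technique. It is neither Avron's specific construction nor the DTW-based random feature map recited in the claims. The names `make_rff_map`, `rbf`, and `inner` are hypothetical.

```python
import math
import random

def make_rff_map(dim, num_features, gamma, seed=0):
    """Random Fourier features for k(u, v) = exp(-gamma * ||u - v||^2).
    Frequencies w ~ N(0, 2*gamma*I), phases b ~ Uniform[0, 2*pi)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # std of each frequency component
    ws = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def psi(u):
        # each coordinate is sqrt(2/D) * cos(w . u + b)
        return [scale * math.cos(sum(wk * uk for wk, uk in zip(w, u)) + b)
                for w, b in zip(ws, bs)]
    return psi

def rbf(u, v, gamma):
    """Exact Gaussian kernel value, for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def inner(a, b):
    """Euclidean inner product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

psi = make_rff_map(dim=3, num_features=4000, gamma=0.5, seed=1)
u, v = [0.1, 0.2, 0.3], [0.3, 0.1, 0.0]
approx, exact = inner(psi(u), psi(v)), rbf(u, v, 0.5)  # approx is close to exact
```

The approximation error shrinks roughly as 1/√D in the number of random features D. This is why a modest, fixed-size map can stand in for an exact kernel evaluation.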
ANGUERA MIRO (US 20140195474 A1) teaches 
transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of … series of length D on a … feature map (para [0088]; Any well known distance definition could be used, for example the Euclidean distance or the inner product similarity, but other distances are possible (for example, a possible distance definition is $d(q_i, r_j) = -\log\left(\frac{q_i \cdot r_j}{\|q_i\| \, \|r_j\|}\right)$ where the dot is the inner product between the vectors, and the $\|\cdot\|$ is the module).) to reduce computational complexity (Abstract; They use an improved algorithm partially based in Dynamic Time Warping and Information Retrieval techniques, but solving the problems (as computational complexity, memory requirements . . . ) observed in these matching techniques.);
consuming a portion of memory storage, where a size of the portion is based on the generated feature matrix (para [0023]; Matrix memory requirements: All DTW-inspired algorithms that we are aware of need to store the similarities/distances between all points in both time series in a matrix structure in order to later apply some matching techniques to it in order to find possible matching paths. This limits the total amount of data that can be processed at once in the system as a matrix structure requires a minimum of N×M memory locations (where N and M are the number of points in both time series).);
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan with the method of dynamic time warping of ANGUERA MIRO.
Doing so would allow for less memory consumption (para [0016] These techniques require much less memory than DTW to process (as they work at the vector level, not with matrices of similarities between points) but can not find matching patterns that are related through non-linear alignments (i.e. warping) between the series.).
While Baydogan discusses computational complexity (pg. 2799-2800, section 3.2; The overall computational complexity of our algorithm is mainly due to RFsub. The time complexity of building a single tree in RFsub is $O(\ldots)$, where $\ldots = K \times d + L$ is the number of features extracted from each subsequence and … is the number of training instances for RFsub (equals $N \times (r \ldots d)$). The smaller $z$ is, the more subsequences are generated, but with fewer features for each.) using random time series (pg. 2797 col. 2 paragraph 2; Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence.), Baydogan does not explicitly disclose consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series;
However, Ko teaches
consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of … series (pg. 372, section 2; Suppose we have a class sequence $(C(i))_{i=1}^{I}$ of length $I$ and a test sequence $(T(j))_{j=1}^{J}$ of length $J$, with $C(i) \in \mathbb{R}$ and $T(j) \in \mathbb{R}$. To measure the similarity between these two sequences, an $I \times J$ distance table $D$ is constructed, where $d(i,j)$ is the local distance between $C(i)$ and $T(j)$ as depicted in Fig. 1. Typically, the Euclidean distance is used to measure these local distances, thus $d(i,j) = (C(i) - T(j))^2$… Finally, the recursion terminates when $i = I$ and $j = J$, and the time and space complexity of this dynamic programming approach is $O(IJ)$.);
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the dynamic time warping method of Baydogan with the dynamic time warping method of Ko.
	Doing so would allow for improved computational speed (pg. 373, section 3.2; The DTW recognizer can perform temporal fusion on raw data, but by performing fusion on features extracted from the data, both classification performance and computational speed may be improved.).
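For illustration only, the passage quoted from Ko states that DTW builds an I × J table of local distances d(i, j) = (C(i) − T(j))², so that space grows as O(IJ). The sketch below makes that explicit. It assumes scalar sequences, as in the quote. It also assumes the basic symmetric step pattern for the cumulative recursion, because Ko's exact recursion is elided in the quotation and is not reproduced here. The function name `dtw_with_table` is hypothetical.

```python
def dtw_with_table(C, T):
    """Build the I x J local-distance table d(i, j) = (C(i) - T(j))**2, then
    accumulate it with the standard DTW recursion. Both tables hold I * J
    entries, illustrating O(IJ) space."""
    if not C or not T:
        raise ValueError("sequences must be non-empty")
    I, J = len(C), len(T)
    d = [[(C[i] - T[j]) ** 2 for j in range(J)] for i in range(I)]  # local distances
    g = [[0.0] * J for _ in range(I)]  # cumulative costs
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = g[0][j - 1]
            elif j == 0:
                best = g[i - 1][0]
            else:
                best = min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
            g[i][j] = d[i][j] + best
    return g[I - 1][J - 1], d

distance, table = dtw_with_table([1.0, 2.0, 3.0], [1.0, 3.0])  # table is 3 x 2
```

Here `table` is `[[0.0, 4.0], [1.0, 1.0], [4.0, 0.0]]` and `distance` is `1.0`. Doubling both sequence lengths would quadruple the number of stored entries.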

Regarding Claim 8, 
Baydogan teaches a computer program product for performing unsupervised time-series feature learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method comprising:
generating, by the processor system, a set of reference time-series (pg. 2796; A univariate time series, $x^n = (x^n_1, x^n_2, \ldots, x^n_T)$ is an ordered set of $T$ values.) of random lengths (pg. 2797; Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence.), wherein each length is uniformly sampled from a predetermined minimum length (pg. 2797; We set a lower bound on the subsequence length $l_{min}$ as a proportion $z$ $(0 < z \le 1)$ of the length of the time series. Thus, $l_s \ge l_{min} = z \times T$.)…, and wherein values of each reference time-series in the set are drawn from a distribution (pg. 2797; Interval features $f_k(t_1, t_2)$, $(0 < t_1 \le t_2 \le T)$ for $k = 1, 2, \ldots, K$, are extracted and combined to represent a subsequence. For each interval, the slope of the fitted regression line, mean of the values, and variance of the values are extracted. These features provide information about the shape, level, and distribution of the values.); 
training, by the processor system, one or more machine learning models (pg. 2796 col. 2 paragraph 4; Moreover, we learn the bag-of-features representation by training a classifier on the interval feature,) to use the feature matrix (pg. 2797 col. 1 paragraph 2; We form a matrix of these features, but the value in row i and row j of the same column may be calculated from subsequences that differ in location and/or length. We further partition subsequences into intervals to detect patterns represented by a series of values over shorter time segments.) as an input for predicting a relationship between reference time-series data and the raw time series data (pg. 2797, section 2.1; We consider subsequences of fixed and random lengths. The random length subsequences can potentially detect patterns that appear with different lengths and be split across the time points [43]. Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence. This allows splits in tree-based models to be based on the features from different length intervals $w_s = l_s / d$. Therefore, the relationships of patterns with different lengths can be better captured.).
Baydogan does not explicitly disclose
…to a predetermined maximum length
generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data, wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity;
consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series;
However, Lei teaches 
…to a predetermined maximum length (pg. 840, col. 1; However, the time complexity for a single DTW calculation is O(w ∗ n) where n is the length of sequence and w is the width of band restriction.).
generating, by the processor system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data (pg. 840; To compute the DTW distance Dι(x, y) with $x = [x_1, x_2, \cdots, x_n]$ and $y = [y_1, y_2, \cdots, y_m]$, we can first construct an n-by-m matrix, as shown in Fig. 1.), 
wherein generating the feature matrix comprises approximating a positive definite kernel (pg. 841; Let $\mathcal{X}$ be a non-empty set. A kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a Positive Definite Symmetric (PDS)).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine Baydogan’s method of dynamic time warping with Lei’s method of dynamic time warping.
Doing so would allow for improved performance (pg. 839; The Dynamic Time Warping (DTW) is state-of-the-art distance measure widely used in sequential pattern matching and it outperforms Euclidean distance in most cases because its matching is elastic and robust.).
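Avron teaches
by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map (para [0062]; The goal in random features maps is to construct randomized feature maps $\Psi(\cdot)$ so that the Euclidean inner product $\langle \Psi(u), \Psi \ldots$).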
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of measuring a Euclidean distance of Baydogan with the method of measuring a Euclidean distance of Avron.
Doing so would allow for constructing randomized feature maps (para [0062] The goal in random features maps is to construct randomized feature maps).
ANGUERA MIRO (US 20140195474 A1) teaches 
transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of … series of length D on a … feature map (para [0088]; Any well known distance definition could be used, for example the Euclidean distance or the inner product similarity, but other distances are possible (for example, a possible distance definition is $d(q_i, r_j) = -\log\left(\frac{q_i \cdot r_j}{\|q_i\| \, \|r_j\|}\right)$ where the dot is the inner product between the vectors, and the $\|\cdot\|$ is the module).) to reduce computational complexity (Abstract; They use an improved algorithm partially based in Dynamic Time Warping and Information Retrieval techniques, but solving the problems (as computational complexity, memory requirements . . . ) observed in these matching techniques.);
consuming a portion of memory storage, where a size of the portion is based on the generated feature matrix (para [0023]; Matrix memory requirements: All DTW-inspired algorithms that we are aware of need to store the similarities/distances between all points in both time series in a matrix structure in order to later apply some matching techniques to it in order to find possible matching paths. This limits the total amount of data that can be processed at once in the system as a matrix structure requires a minimum of N×M memory locations (where N and M are the number of points in both time series).);
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan with the method of dynamic time warping of ANGUERA MIRO.
Doing so would allow for less memory consumption (para [0016] These techniques require much less memory than DTW to process (as they work at the vector level, not with matrices of similarities between points) but can not find matching patterns that are related through non-linear alignments (i.e. warping) between the series.).
While Baydogan discusses computational complexity (pg. 2799-2800, section 3.2; The overall computational complexity of our algorithm is mainly due to RFsub. The time complexity of building a single tree in RFsub is $O(\ldots)$, where $\ldots = K \times d + L$ is the number of features extracted from each subsequence and … is the number of training instances for RFsub (equals $N \times (r \ldots d)$). The smaller $z$ is, the more subsequences are generated, but with fewer features for each.) using random time series (pg. 2797 col. 2 paragraph 2; Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence.), Baydogan does not explicitly disclose consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series;
However, Ko teaches
consuming a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of … series (pg. 372, section 2; Suppose we have a class sequence $(C(i))_{i=1}^{I}$ of length $I$ and a test sequence $(T(j))_{j=1}^{J}$ of length $J$, with $C(i) \in \mathbb{R}$ and $T(j) \in \mathbb{R}$. To measure the similarity between these two sequences, an $I \times J$ distance table $D$ is constructed, where $d(i,j)$ is the local distance between $C(i)$ and $T(j)$ as depicted in Fig. 1. Typically, the Euclidean distance is used to measure these local distances, thus $d(i,j) = (C(i) - T(j))^2$… Finally, the recursion terminates when $i = I$ and $j = J$, and the time and space complexity of this dynamic programming approach is $O(IJ)$.);
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the dynamic time warping method of Baydogan with the dynamic time warping method of Ko.
	Doing so would allow for improved computational speed (pg. 373, section 3.2; The DTW recognizer can perform temporal fusion on raw data, but by performing fusion on features extracted from the data, both classification performance and computational speed may be improved.).

Regarding Claim 15,
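Baydogan teaches a system for performing unsupervised time-series feature learning, the system comprising one or more processors configured to perform a method comprising: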
generating, by the system, a set of reference time-series (pg. 2797 col. 1 paragraph 6; A univariate time series, $x^n = (x^n_1, x^n_2, \ldots, x^n_T)$ is an ordered set of $T$ values.) of random lengths (pg. 2797 col. 2 paragraph 2; Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence.), wherein each length is uniformly sampled from a predetermined minimum length (pg. 2797 col. 2 paragraph 3; We set a lower bound on the subsequence length $l_{min}$ as a proportion $z$ $(0 < z \le 1)$ of the length of the time series. Thus, $l_s \ge l_{min} = z \times T$.)…, and wherein values of each reference time-series in the set are drawn from a distribution (pg. 2797 col. 2 paragraph 4; Interval features $f_k(t_1, t_2)$, $(0 < t_1 \le t_2 \le T)$ for $k = 1, 2, \ldots, K$, are extracted and combined to represent a subsequence. For each interval, the slope of the fitted regression line, mean of the values, and variance of the values are extracted. These features provide information about the shape, level, and distribution of the values.); 
training, by the system, one or more machine learning models (pg. 2796 col. 2 paragraph 4; Moreover, we learn the bag-of-features representation by training a classifier on the interval feature,) to use the feature matrix (pg. 2797 col. 1 paragraph 2; We form a matrix of these features, but the value in row i and row j of the same column may be calculated from subsequences that differ in location and/or length. We further partition subsequences into intervals to detect patterns represented by a series of values over shorter time segments.) as an input for predicting a relationship between reference time-series data and the raw time series data (pg. 2797, section 2.1; We consider subsequences of fixed and random lengths. The random length subsequences can potentially detect patterns that appear with different lengths and be split across the time points [43]. Thus, we generate subsequences of random length $l_s$ and segment them using the same number of intervals to preserve the same number of intervals $d$ for each subsequence. This allows splits in tree-based models to be based on the features from different length intervals $w_s = l_s / d$. Therefore, the relationships of patterns with different lengths can be better captured.);
Baydogan does not explicitly disclose 
…to a predetermined maximum length,
generating, by the system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data, wherein generating the feature matrix comprises approximating a positive definite kernel by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map to reduce computational complexity; and
 consuming, by the system, a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of random series;
However, Lei teaches
(pg. 840, col. 1; However, the time complexity for a single DTW calculation is O(w ∗ n) where n is the length of sequence and w is the width of band restriction.).
generating, by the system, a feature matrix for raw time-series data based on a set of computed distances between the generated set of reference time-series and the raw time-series data (pg. 840; To compute the DTW distance Dι(x, y) with x = [x1, x2, ··· , xn] and y = [y1, y2, ··· , ym], we can first construct an n-by-m matrix, as shown in Fig. 1.), wherein generating the feature matrix comprises approximating a positive definite kernel (pg. 841; Let X be a non-empty set. A kernel function k : X × X → ℝ is a Positive Definite Symmetric (PDS))
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine Baydogan’s method of dynamic time warping with Lei’s method of dynamic time warping.
Doing so would allow for improved performance (pg. 839; The Dynamic Time Warping (DTW) is state-of-the-art distance measure widely used in sequential pattern matching and it outperforms Euclidean distance in most cases because its matching is elastic and robust.).
Avron teaches 
by transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of random series of length D on a randomized feature map (para [0062] The goal in random features maps is to construct randomized feature maps Ψ(·) so that the Euclidean inner product ⟨Ψ(u), Ψ…).

Doing so would allow for constructing randomized feature maps (para [0062] The goal in random features maps is to construct randomized feature maps).
ANGUERA MIRO (US 20140195474 A1) teaches 
transforming the raw time-series data into a low-dimensional Euclidean inner product space using a number R of … series of length D on a … feature map (para [0088] Any well known distance definition could be used, for example the Euclidean distance or the inner product similarity, but other distances are possible (for example, a possible distance definition is d(q_i, r_j) = −log((q_i · r_j) / (‖q_i‖ ‖r_j‖)), where the dot is the inner product between the vectors, and the ‖·‖ is the module).) to reduce computational complexity (Abs. They use an improved algorithm partially based in Dynamic Time Warping and Information Retrieval techniques, but solving the problems (as computational complexity, memory requirements . . . ) observed in these matching techniques.);
consuming a portion of memory storage, where a size of the portion is based on the generated feature matrix (para [0023] Matrix memory requirements: All DTW-inspired algorithms that we are aware of need to store the similarities/distances between all points in both time series in a matrix structure in order to later apply some matching techniques to it in order to find possible matching paths. This limits the total amount of data that can be processed at once in the system as a matrix structure requires a minimum of N×M memory locations (where N and M are the number of points in both time series).);
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan with the method of dynamic time warping of ANGUERA MIRO.
Doing so would allow for less memory consumption (para [0016] These techniques require much less memory than DTW to process (as they work at the vector level, not with matrices of similarities between points) but can not find matching patterns that are related through non-linear alignments (i.e. warping) between the series.).
While Baydogan discloses a computational complexity (pg. 2799-2800, section 3.2; The overall computational complexity of our algorithm is mainly due to RF sub. The time complexity of building a single tree in RF sub is O(√… log …), where … = K × d + L is the number of features extracted from each subsequence and … is the number of training instances for RF sub (equals N × (r d)). The smaller z is, the more subsequences are generated, but with fewer features for each.) using random time series (pg. 2797 col. 2 paragraph 2; Thus, we generate subsequences of random length l_s and segment them using the same number of intervals to preserve the same number of intervals d for each subsequence.), Baydogan does not explicitly disclose consuming, by the system, a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of … series.
However, Ko teaches
consuming, by the system, a portion of memory storage, where a size of the portion is based on a product of a number of time series in the set and a number of … series (pg. 372, section 2; Suppose we have a class sequence (C(i)), i = 1, ..., I, of length I and a test sequence (T(j)), j = 1, ..., J, of length J, with C(i) ∈ ℝ and T(j) ∈ ℝ. To measure the similarity between these two sequences, an I · J distance table D is constructed, where d(i,j) is the local distance between C(i) and T(j) as depicted in Fig. 1. Typically, the Euclidean distance is used to measure these local distances, thus d(i,j) = (C(i) − T(j))²… Finally, the recursion terminates when i = I and j = J, and the time and space complexity of this dynamic programming approach is O(IJ).);
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the dynamic time warping method of Baydogan with the dynamic time warping method of Ko.
	Doing so would allow for improved computational speed (pg. 373, section 3.2; The DTW recognizer can perform temporal fusion on raw data, but by performing fusion on features extracted from the data, both classification performance and computational speed may be improved.)
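For illustration only, the claim 1 limitations discussed above (reference series whose lengths are uniformly sampled between a minimum and a maximum, a feature matrix built from distances between those series and the raw data, and memory whose size is the product of the number of time series and the number of random series) can be sketched as follows. This is a minimal hypothetical sketch assuming a uniform value distribution on [−1, 1), a squared-difference DTW local cost of the form quoted from Ko and Chu, and illustrative parameter names (d_min, d_max, R); it is not Applicant's implementation or that of any cited reference.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW using an (n+1)-by-(m+1) cumulative cost table: O(nm) time and space."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # local distance d(i,j) = (x_i - y_j)^2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def random_reference_series(R, d_min, d_max, rng):
    """R reference series; lengths ~ Uniform{d_min..d_max}, values ~ Uniform[-1, 1) (assumed)."""
    lengths = rng.integers(d_min, d_max + 1, size=R)  # integers() excludes the high end, hence +1
    return [rng.uniform(-1.0, 1.0, size=L) for L in lengths]

def feature_matrix(raw_series, references):
    """N-by-R matrix of DTW distances; persistent storage grows with N * R, not N * N."""
    Phi = np.empty((len(raw_series), len(references)))
    for i, x in enumerate(raw_series):
        for r, w in enumerate(references):
            Phi[i, r] = dtw_distance(x, w)
    return Phi

rng = np.random.default_rng(0)
raw = [rng.normal(size=50) for _ in range(8)]  # N = 8 raw series of length 50
refs = random_reference_series(R=4, d_min=3, d_max=10, rng=rng)
Phi = feature_matrix(raw, refs)
print(Phi.shape, Phi.nbytes)  # prints: (8, 4) 256, i.e. 8 * 4 float64 values
```

In this sketch only the N × R matrix persists; each DTW call allocates a temporary (n + 1) × (m + 1) table, which corresponds to the O(IJ) space cost described by Ko.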

Claims 2, 6, 7, 9, 13, 14, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Baydogan et al. ("A bag-of-features framework to classify time series.") in view of Lei et al. ("A study on the dynamic time warping in kernel machines."; hereinafter Lei), Avron et al. (US-20150331835-A1; hereinafter Avron), ANGUERA MIRO (US-20140195474-A1; hereinafter ANGUERA MIRO), Ko et al. ("Using dynamic time warping for online temporal fusion in multisensor systems."), and Chu et al. ("Iterative deepening dynamic time warping for time series.").
Regarding Claim 2,
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko teach the computer-implemented method of claim 1.
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko do not explicitly disclose
wherein the distribution is a predetermined random distribution.
However, Chu et al. (“Iterative Deepening Dynamic Time Warping for Time Series”) teaches
wherein the distribution is a predetermined random distribution (pg. 206; One is a homogeneous synthetic data set generated by the random walk expression, xt = xt-1 + zt , where zt (t=1,2,..) are independent, identically distributed (uniformly) random variables.).
It would have been obvious to persons having ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan with the method of dynamic time warping of Chu et al.
Doing so would allow for dimensionality reduction of time series data (pg. 201; PDTW has been shown to effectively generate a speedup of one to three orders of magnitude, compared to the classic DTW algorithm, with no significant loss of accuracy for classification and clustering tasks.).
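Chu's random-walk expression (x_t = x_{t−1} + z_t with independent, identically and uniformly distributed z_t) is one example of series values drawn from a predetermined random distribution. The sketch below is illustrative only; the step bounds [−1, 1), the starting value x_0 = 0, and the function name random_walk are assumptions, not values or names taken from Chu.

```python
import numpy as np

def random_walk(length, rng, low=-1.0, high=1.0, x0=0.0):
    """Return x_1..x_length with x_t = x_{t-1} + z_t and z_t i.i.d. Uniform[low, high)."""
    z = rng.uniform(low, high, size=length)  # independent, identically distributed steps
    return x0 + np.cumsum(z)                  # running sum realizes the recursion from x_0

rng = np.random.default_rng(42)
reference = random_walk(100, rng)  # one candidate reference series of length 100
```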
Regarding Claim 6,
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko teach the computer-implemented method of claim 1. 
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko do not explicitly disclose
wherein generating the feature matrix includes: computing, by the processor system, a set of distance vectors between the raw time-series data and the set of generated reference time-series; translating, by the processor system, the distance vectors into similarity vectors; and concatenating, by the processor system, the similarity vectors to generate the feature matrix.
However, Chu et al. teaches wherein generating the feature matrix includes: 
computing, by the processor system, a set of distance vectors between the raw time-series data and the set of generated reference time-series (pg. 199; Suppose we have two time series Q and C, of length n and m respectively, where: Q = q1, q2, …, qi, …, qn (1) C = c1, c2, …, cj, …, cm (2) To align these two sequences using DTW, we construct an n-by-m matrix where the (ith, jth) element of the matrix contains the distance d(qi, cj) between the two points qi and cj (Typically the Euclidean distance is used, so d(qi, cj) = (qi − cj)²).);
translating, by the processor system, the distance vectors into similarity vectors (pg. 195; Almost all algorithms that operate on time series data need to compute the similarity between them. Euclidean distance, or some extension or modification thereof, is typically used.); and 
concatenating, by the processor system, the similarity vectors to generate the feature matrix (pg. 199; Each matrix element (i,j) corresponds to the alignment between the points qi and cj.).
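It would have been obvious to persons having ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan with the method of dynamic time warping of Chu et al.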
Doing so would allow for dimensionality reduction of time series data (pg. 201; PDTW has been shown to effectively generate a speedup of one to three orders of magnitude, compared to the classic DTW algorithm, with no significant loss of accuracy for classification and clustering tasks.).
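The three steps recited in claim 6 (computing distance vectors, translating them into similarity vectors, and concatenating the similarity vectors into a feature matrix) can be illustrated with a short hypothetical sketch. The mapping s = exp(−γ·d) is an assumption chosen for illustration; neither the claim language quoted above nor Chu specifies that particular translation, and the distance values below are illustrative inputs, not data from any reference.

```python
import numpy as np

def distances_to_similarities(dist_vec, gamma=1.0):
    """Map non-negative distances to similarities in (0, 1]; exp(-gamma * d) is an assumed choice."""
    return np.exp(-gamma * np.asarray(dist_vec, dtype=float))

def build_feature_matrix(distance_vectors, gamma=1.0):
    """Translate each distance vector, then stack (concatenate) the results row-wise."""
    sims = [distances_to_similarities(d, gamma) for d in distance_vectors]
    return np.vstack(sims)

# Illustrative input: one distance vector per raw series, one entry per reference series.
dist = [[0.0, 1.0, 2.0],
        [0.5, 0.5, 3.0]]
F = build_feature_matrix(dist, gamma=0.5)
print(F.shape)  # (2, 3)
```

A zero distance maps to a similarity of 1, and larger distances map monotonically toward 0, so each row of F can be read directly as a similarity profile of one raw series against the reference set.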
Regarding Claim 7,
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko teach the computer-implemented method of claim 1.
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko do not explicitly disclose
wherein generating the feature matrix includes: computing, by the processor system, a set of feature vectors between the raw time-series data and the set of generated reference time-series using dynamic time warping; and concatenating, by the system, the feature vectors to generate the feature matrix.
However, Chu et al. further teaches wherein generating the feature matrix includes: 
computing, by the processor system, a set of feature vectors between the raw time-series data and the set of generated reference time-series using dynamic time warping (pg. 199; Suppose we have two time series Q and C, of length n and m respectively, where: Q = q1, q2, …, qi, …, qn (1) C = c1, c2, …, cj, …, cm (2) To align these two sequences using DTW, we construct an n-by-m matrix where the (ith, jth) element of the matrix contains the distance d(qi, cj) between the two points qi and cj (Typically the Euclidean distance is used, so d(qi, cj) = (qi − cj)²).); and
concatenating, by the system, the feature vectors to generate the feature matrix (pg. 199; Each matrix element (i,j) corresponds to the alignment between the points qi and cj.).
It would have been obvious to persons having ordinary skill in the art before the effective filing date to combine the method of dynamic time warping of Baydogan et al. with the method of dynamic time warping of Chu et al.
Doing so would allow for dimensionality reduction of time series data (pg. 201; PDTW has been shown to effectively generate a speedup of one to three orders of magnitude, compared to the classic DTW algorithm, with no significant loss of accuracy for classification and clustering tasks.).
Regarding Claim 9,
Claim 9 is the computer program product corresponding to the method of claim 1. Claim 9 is substantially similar to claim 2 and is rejected on the same grounds.
Regarding Claim 13,
Claim 13 is the computer program product corresponding to the method of claim 1. Claim 13 is substantially similar to claim 6 and is rejected on the same grounds.
Regarding Claim 14,
Claim 14 is the computer program product corresponding to the method of claim 1. Claim 14 is substantially similar to claim 7 and is rejected on the same grounds.
Regarding Claim 16,
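Claim 16 is the system corresponding to the method of claim 1. Claim 16 is substantially similar to claim 2 and is rejected on the same grounds.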
Regarding Claim 20,
Claim 20 is the system corresponding to the method of claim 1. Claim 20 is substantially similar to claim 7 and is rejected on the same grounds.
Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Baydogan et al. ("A bag-of-features framework to classify time series.") in view of Lei et al. ("A study on the dynamic time warping in kernel machines."; hereinafter Lei), Avron et al. (US-20150331835-A1; hereinafter Avron), ANGUERA MIRO (US-20140195474-A1; hereinafter ANGUERA MIRO), Ko et al. ("Using dynamic time warping for online temporal fusion in multisensor systems."), and Leonard et al. (US-20160217384-A1).
Regarding Claim 3,
Baydogan, Lei, Avron, ANGUERA MIRO, and Ko teach the computer-implemented method of claim 1.
	Baydogan, Lei, Avron, ANGUERA MIRO, and Ko do not explicitly disclose
wherein the distribution is a probability distribution of the raw time-series data.
	However, Leonard et al. teaches 
wherein the distribution is a probability distribution of the raw time-series data (para [0171] Information related to the optimal probability distribution may be provided to the time series analysis engine 2316.). 

Doing so would allow for predicting future time series data points with greater accuracy (Abs. The set of predicted future data points are adjusted using the generated set of parameters for the optimal discrete probability distribution in order to provide greater accuracy with respect to predictions of future data points.).
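Claim 3 requires that the distribution from which reference values are drawn be a probability distribution of the raw time-series data. One simple way to realize this, offered only as a hypothetical sketch, is to resample values from the empirical distribution of the pooled raw observations; Leonard instead selects an optimal discrete probability distribution from a set of candidates, which is not reproduced here. The function name reference_from_empirical is illustrative.

```python
import numpy as np

def reference_from_empirical(raw_series, length, rng):
    """Draw `length` values i.i.d. from the empirical distribution of all raw observations."""
    pool = np.concatenate([np.asarray(s, dtype=float) for s in raw_series])
    return rng.choice(pool, size=length, replace=True)  # sampling with replacement

rng = np.random.default_rng(7)
raw = [[1.0, 2.0, 2.0], [3.0, 5.0]]
ref = reference_from_empirical(raw, length=6, rng=rng)
print(ref.shape)  # (6,)
```

Because repeated raw values appear multiple times in the pool, they are proportionally more likely to be drawn, which is what makes this an empirical (data-derived) distribution rather than an arbitrary one.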
Regarding Claim 10,
Claim 10 is the computer program product corresponding to the method of claim 1. Claim 10 is substantially similar to claim 3 and is rejected on the same grounds.
Regarding Claim 17,
Claim 17 is the system corresponding to the method of claim 1. Claim 17 is substantially similar to claim 3 and is rejected on the same grounds.
Claims 4, 5, 11, 12, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Baydogan et al. ("A bag-of-features framework to classify time series.") in view of Lei et al. ("A study on the dynamic time warping in kernel machines."; hereinafter Lei), Avron et al. (US-20150331835-A1; hereinafter Avron), ANGUERA MIRO (US-20140195474-A1; hereinafter ANGUERA MIRO), Ko et al. ("Using dynamic time warping for online temporal fusion in multisensor systems."), Leonard et al. (US-20160217384-A1), and Pallath et al. (US-20180150547-A1).
Regarding Claim 4,

Lei further teaches
generate the feature matrix based on the set of generated set of reference time-series (pg. 840; To compute the DTW distance Dι(x, y) with x = [x1, x2, ··· , xn] and y = [y1, y2, ··· , ym], we can first construct an n-by-m matrix, as shown in Fig. 1.).
Leonard et al. further teaches
wherein the processor system is a two-party protocol system comprising a first-party component and a second-party component, wherein the first-party component is configured to generate the probability distribution from the raw time-series data (para [0171] The probability distribution selector engine 2318 may utilize the information corresponding to the number of probability distributions and the selection criterion to determine an optimal probability distribution for the time series data set.), transmit the probability distribution of the raw time-series data to the second-party component (para [0171] Information related to the optimal probability distribution may be provided to the time series analysis engine 2316.),
	Baydogan, Lei, Avron, ANGUERA MIRO, Ko and Leonard do not explicitly disclose
receive the generated set of reference time-series from the second-party component, 
and transmit the generated feature matrix to the second-party component.
However, Pallath et al. teaches
receive the generated set of reference time-series from the second-party component (para [0040] Time series data may be received (202).),
and transmit the generated feature matrix to the second-party component (para [0044] The time window matrix may be provided (210) for further processing.).
	It would have been obvious to persons having ordinary skill in the art before the effective filing date to combine the predictive model for time series data of Pallath et al. with the forecasting model for time series data of Leonard et al.
	Doing so would allow for a reduced representation of time series data (Abs. This new reduced-dimension representation improves the efficiency of time series data mining and forecasting.).
Regarding Claim 5,
Baydogan, Lei, Avron, ANGUERA MIRO, Ko, Leonard, and Pallath teach the computer-implemented method of claim 4 wherein the second-party component is configured to receive the probability distribution from the first-party component.
	Baydogan et al. further teaches 
generate the set of reference time-series (pg. 2797; Thus, we generate subsequences of random length l_s and segment them using the same number of intervals to preserve the same number of intervals d for each subsequence.);
Pallath et al. further teaches
transmit the set of reference time-series to the first-party component (para [0036] The data storage 106 may store the data that is input to, output from, and/or otherwise generated through the time series analysis described herein. For example, the data storage 106 may store time series data 108 for one or more time series.)
(para [0046] A time window matrix, such as that generated per FIG. 2A, may be accessed (212).)
provide the feature matrix as the input to the one or more machine learning models (para [0064] Time windows may be received (242) as input for a forecast. The future time windows to be predicted may also be received.)
transmit results from the machine learning models to the first-party component (para [0037] In some instances, the signal(s) 114 may instruct the service(s) 116 to perform various action(s), or discontinue performing various action(s), based on the prediction(s) 112. For example, based on a prediction for future power consumption in a power distribution system, such as a municipal power grid, the analysis module(s) 104 may send signal(s) 114 to instruct the service(s) 116 (e.g., power grid control process(es)), to adjust their power generation and/or power distribution to account to predicted increases or decreases in power consumption.)
Leonard et al. further teaches 
wherein the second-party component is configured to receive the probability distribution from the first-party component (para [0171] Information related to the optimal probability distribution may be provided to the time series analysis engine 2316.)
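The division of work recited in claims 4 and 5 (the first-party component derives a probability distribution from the raw data and returns the feature matrix; the second-party component generates the reference series and consumes the feature matrix) can be sketched as follows. This is a hypothetical single-process illustration: the class and method names are invented, the distribution is summarized by a Gaussian mean and standard deviation (an assumption), each "transmission" is an ordinary function call rather than a network message, and model training is reduced to a placeholder comment.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW with an (n+1)-by-(m+1) cumulative table and squared-difference local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = (x[i - 1] - y[j - 1]) ** 2 + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

class FirstParty:
    """Holds the raw time-series data."""
    def __init__(self, raw_series):
        self.raw = [np.asarray(s, dtype=float) for s in raw_series]

    def distribution(self):
        pooled = np.concatenate(self.raw)
        return {"mean": pooled.mean(), "std": pooled.std()}  # assumed Gaussian summary

    def feature_matrix(self, references):
        return np.array([[dtw_distance(x, w) for w in references] for x in self.raw])

class SecondParty:
    """Generates reference series from the received distribution and consumes features."""
    def __init__(self, R, d_min, d_max, seed=0):
        self.R, self.d_min, self.d_max = R, d_min, d_max
        self.rng = np.random.default_rng(seed)

    def references(self, dist):
        lengths = self.rng.integers(self.d_min, self.d_max + 1, size=self.R)
        return [self.rng.normal(dist["mean"], dist["std"], size=L) for L in lengths]

    def receive_features(self, Phi):
        self.Phi = Phi  # placeholder: would be provided as input to the machine learning model(s)
        return Phi.shape

first = FirstParty([[0.1, 0.4, 0.3, 0.9], [1.2, 0.8, 0.5]])
second = SecondParty(R=3, d_min=2, d_max=5)
refs = second.references(first.distribution())               # first -> second -> first
print(second.receive_features(first.feature_matrix(refs)))  # (2, 3)
```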
Regarding Claim 11,
Claim 11 is the computer program product corresponding to the method of claim 1. Claim 11 is substantially similar to claim 4 and is rejected on the same grounds.
Regarding Claim 12,
Claim 12 is the computer program product corresponding to the method of claim 1. Claim 12 is substantially similar to claim 5 and is rejected on the same grounds.
Regarding Claim 18,
Claim 18 is the system corresponding to the method of claim 1. Claim 18 is substantially similar to claim 4 and is rejected on the same grounds.
Regarding Claim 19,
Claim 19 is the system corresponding to the method of claim 1. Claim 19 is substantially similar to claim 5 and is rejected on the same grounds.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Poola et al. (US 20140372807 A1) – discloses memory usage of time series data.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
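A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.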

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HENRY NGUYEN/Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121