DETAILED ACTION
This communication is responsive to application 16/696,514 filed 11/26/2019.
The instant application has a total of 15 claims pending in the application, all of which are ready for examination. Claims 1, 6 and 11 are independent claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant’s submission of the Information Disclosure Statement dated 11/26/2019 is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.

Specification
The title of the invention is not descriptive. The present title recites “Learning Device, Learning Method, and Computer-Readable Medium” which does not suggest anything specific or substantive. A new title is required that is clearly indicative of the invention to which the claims are directed, see MPEP 606.01.
The incorporation of essential material in the specification by reference to an unpublished U.S. application, foreign application or patent, or to a publication is improper. Applicant is required to amend the disclosure to include the material incorporated by reference, if the material is relied upon to overcome any objection, rejection, or other requirement imposed by the Office. The amendment must be accompanied by a statement executed by the applicant, or a practitioner representing the applicant, stating that the material being inserted is the material previously incorporated by reference and that the amendment contains no new matter. See 37 CFR 1.57(g).
The disclosure is objected to because of the following informalities: when reviewing the specification as a whole, the lengthy disclosure appears to be a direct machine translation. Examiner requests applicant’s cooperation in correcting any minor errors of which applicant may become aware in the specification. Appropriate correction is required.

Claim Objections
Claims 1-15 are objected to because of the following informalities: All independent claims as well as dependent claims 3-4, 8-9 and 13-14 recite claim language comprising “in a case where…”. Examiner notes these limitations are considered contingent clauses which may not be further limiting, see MPEP 2111.04. Appropriate correction is suggested, replacing the language with “wherein” or some positive recitation of the claim limitations.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-15 are rejected under 35 U.S.C. 112(b), as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention. The independent claims recite first and third limitations (“generate” and “set”) that lack clarity.
In particular, the limitations recite “generate plural first subsets of time-series data by dividing”. The functionality of generating by dividing is in conflict with itself, similar to reducing by multiplying or the like. How does one generate by dividing? One must further assume that the divisor is nonzero and that the functions involved are tractable. Examiner interprets the functionality of generating by dividing as any time-series discretization such as windowing or similar.
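The windowing interpretation adopted above can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping windows and pairs every window with a single label for the whole series; the function name make_first_learning_data, the window length and the sample values are hypothetical and are not taken from the application or the cited art.

```python
from typing import List, Sequence, Tuple


def make_first_learning_data(
    series: Sequence[float], window: int, teacher: int
) -> List[Tuple[List[float], int]]:
    """Split `series` into consecutive windows and attach `teacher` to each."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Assumption: non-overlapping windows; the final window may be shorter.
    subsets = [list(series[i:i + window]) for i in range(0, len(series), window)]
    # Each subset carries the label that belongs to the whole series.
    return [(subset, teacher) for subset in subsets]


pairs = make_first_learning_data(
    [0.1, 0.4, 0.3, 0.9, 0.8, 0.2, 0.5], window=3, teacher=1
)
```

Under this reading, "dividing" produces the subsets and "generating" merely attaches the whole-series label to each of them.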
Further, independent claims recite “set the learned first parameter for the first RNN, and learn, based on data and the teacher data, parameters of the RNNs”. The limitation lacks clarity because “the RNNs” may be construed as referring back to the first RNN. One would not set a parameter and learn a parameter for the same RNN over the same operation because these actions conflict with each other. In other words, learning/training is updating a parameter and setting/testing is holding a parameter fixed, one would not perform both fixing and updating of the same parameter in the same operation. Examiner interprets “the RNNs” as hierarchical RNNs in light of the specification which describes hierarchical RNNs, and further interprets parameters that are set based on learning as simply the recurrent functionality of any RNN. Remaining dependent claims fail to cure the deficiency and inherit the rejection. Accordingly, claims 1-15 are rejected under 35 U.S.C. 112(b) as indefinite.
Dependent claims 2, 7 and 12 are further rejected under 35 U.S.C. 112(b) as indefinite because the term “higher than” is a relative term which renders the claim indefinite. Claims recite “a second layer that is one layer higher than the first layer”. Does the layer fall over if it is on its side or get crushed under the weight of other layers? In the field of machine learning, hierarchical RNNs use the concept of depth in discussing layer assignment. Examiner interprets “higher than” as simply any other layer or grouping function different from the first.
Dependent claims 3-4, 8-9 and 13-14 are further rejected under 35 U.S.C. 112(b) as indefinite because the block limitations are insolubly ambiguous. The claim language is largely incomprehensible and appears to be a direct machine translation. The specification does not provide adequate detail to ascertain the scope of what is being claimed or the significance thereof.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the examiner applies the eligibility guidance set forth in MPEP 2106.
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—all claims fall within one of the four statutory categories: claims 1-5 are a device/machine, claims 6-10 are a method/process, and claims 11-15 are a computer-readable medium/article of manufacture.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes. The claims, under the broadest reasonable interpretation, recite an abstract idea. In this case, the claims fall within the enumerated grouping of “Mathematical Calculations”, but for the recitation of generic computer components. In particular, the claims recite:
“generating plural first subsets of time-series data by dividing time-series into predetermined intervals, the time-series data including plural sets of data arranged in time series, and generating first learning data including each of the plural first subsets of time-series data associated with teacher data corresponding to the whole time-series data”
(Mathematical Calculations)
	The limitation comprises dividing as the mathematical calculation and is considered abstract because one may simply observe wall clock time and consider the whole hours as being subsets.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No—a practical application is not integrated into the judicial exception because the additional elements are as follows: 
Limitations are performed by a device comprising “processor”, “memory” and/or “computer-readable recording medium storing therein a learning program that causes a computer to execute”. It is important to note that a general purpose computer that applies a judicial exception, such as an abstract idea, by use of conventional computer functions does not qualify as a particular machine per MPEP 2106.05(b). These additional elements are recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components per MPEP 2106.05(f).
Limitations further recite “learn, based on the first learning data, a first parameter of a first RNN of recurrent neural networks (RNNs), included in plural layers, the first RNN being included in a first layer”. The additional element of RNNs with learned parameters is well-understood, routine and conventional: RNNs necessarily have parameters, and applicant’s IDS notes several such RNNs dating to the early 1990s, several decades prior to the application’s effective filing date. Describing an RNN learning of parameters as claimed amounts to no more than cursory descriptive language of a known technological environment which does not meaningfully limit the claim, see MPEP 2106.05(d), (e) and (h).
Limitations further recite “set the learned first parameter for the first RNN and learn, based on data and the teacher data, parameters of the RNNs…”. The limitation, as a whole or in part, does not recite any technical feature beyond further parameter learning, where some data is “set” loosely based on data and other data (teacher data). In fact, the elements are quite misleading because the teacher data suggests student-teacher modeling, which is a different branch of machine learning altogether and simply does not come up in the specification. Assuming that the drafted limitation carries any kind of meaning at all, there are no other elements beyond layers and input, such as observations. The element of layers does not convey any particular arrangement; layering can simply be any group of neurons in the neural network that compute recurrently. It does not require a second RNN (e.g., hierarchical) nor any particular type of layer transformation (e.g., mean-pooling). The drafted limitation simply departs so far from a technical solution that it obfuscates the claimed subject matter in a way that does not convey a particular transformation or meaningful limitation, see MPEP 2106.05(c) and (e).
Accordingly, these additional elements do not integrate the abstract idea into a practical application. The claims are directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a processor and memory to perform the abstract idea of learning methods amount to no more than mere instructions to apply an exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the recitation of learning parameters for an RNN is routine as evidenced by applicant’s IDS and/or any common textbook in machine learning, see Goodfellow et al., “Deep Learning”, which is replete with RNNs such as LSTM/GRU. Elements which are routine do not amount to significantly more per MPEP 2106.05(d). The final recitation of setting and learning parameters based on data and other data does not meaningfully limit the claim. The claims do not recite how the elements of data and other data (teacher data) are combined; therefore, considering the combination does not reveal significance. Accordingly, the additional elements do not amount to significantly more than the judicial exception.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 6 and 11, which recite a method and a computer-readable medium, respectively, as well as to dependent claims 2-5, 7-10 and 12-15.
Dependent claims 2, 7 and 12 recite setting the parameter, generating second learning data and learning a second parameter of a second RNN layer. The claims amount to performing repetitive calculations, and the performance of repetitive calculations does not amount to significantly more per MPEP 2106.05(d). Further, there is no evidence of improvement to the functioning of a computer, see MPEP 2106.05(a). There are no benchmark results or quantified demonstrations of unexpected results to substantiate any allegation along these lines, and such an allegation would not be warranted over known techniques such as Miller et al., “When Recurrent Models Don’t Need to Be Recurrent”, cited as evidence.
Dependent claims 3-4, 8-9 and 13-14 recite language that is incomprehensible and is likely a direct machine translation. As best as can be understood, the language appears to indicate updates to teacher data based on differing combinations of data that comprise some part of pairs. The specification scarcely provides any reference to pairs and sheds no light on what these claim limitations are even suggesting. The functionality might amount to embodiments of LSTM/GRU or to feature data being pooled and discretized. LSTM/GRU and pooling are well-known and routine (as evidenced by the same textbook cited above, Goodfellow), and dividing time data into subsets is considered part of the abstract idea.
Dependent claims 5, 10 and 15 recite dividing based on features of speech time-series data. The dividing is considered part of the abstract idea, and the data comprising speech amounts to insignificant extra-solution activity per MPEP 2106.05(g), such as mere data gathering or selecting the type of data to be manipulated. Performing the collective steps for speech is a use case which amounts to mere instructions to apply the exception per MPEP 2106.05(f). Accordingly, the additional elements do not amount to significantly more.
Taken alone, their additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over: 
Tao et al., “Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction” hereinafter Tao (arXiv: 1806.00685v1), in view of 
Cao et al., “BRITS: Bidirectional Recurrent Imputation for Time Series” hereinafter Cao (arXiv: 1805.10572v1), in view of 
Miller and Hardt, “When Recurrent Models Don’t Need to Be Recurrent” hereinafter Miller (arXiv: 1805.10369v2).
With respect to claim 1, Tao teaches: 
	A learning device {Tao discloses hierarchical RNN for time-series Fig- 2, implementation is by device hardware “NVIDIA Tesla GPU” [P.6 RtCol]} comprising: 
a memory; and a processor coupled to the memory {Tao [P.6 RtCol] “GPU” is processor, “LSTM” conveys memory and/or requisite devices for such computational functionality [P.4 ¶4], [P.7 ¶1]} and configured to: 
generate plural first subsets of time-series data by dividing time-series data into predetermined intervals, the time-series data including plural sets of data arranged in time series, and generate first learning data including each of the plural first subsets of time-series data associated with teacher data corresponding to the whole time-series data {Tao discloses per [P.2 ¶2] “generate the future series y’T” describing time-series where y’T is detailed by Equation 14 [P.5]. The dividing is interpreted as “window size” and notes “interval lengths” of such time series data [P.3 Sect.3 ¶1-2], [P.4 ¶2]. The learning data is “training set”, ”training samples” per [P.6 Eq.15], [P.5 Sect.5] and teacher data is yT for hierarchical recurrent neural network which is consistent with the instant specification [0029]}; 
learn, based on the first learning data, a first parameter of a first RNN of recurrent neural networks (RNNs), included in plural layers, the first RNN being included in a first layer {Tao Fig 2 illustrates RNNs among layers, described as hierarchical, parameters thereof comprise [P.3-5 Sect.4] equation variables include hidden states and weighted functions. The learning is an update e.g., “parameters to be learned” [P.4 Last¶], [P.5 ¶1-2]}; and 
set the learned first parameter for the first RNN, and learn, based on data and the teacher data, parameters of the RNNs included in the plural layers, the data being acquired by input of each of the first subsets of time-series data into the first RNN, in a case where the parameters of the RNNs included in the plural layers are learned {Tao discloses RNNs, the learning of which is recurrent. Thus, a parameter/weight is set and then learned by way of recurrence. Moreover, Equation 10 [P.5] shows how teacher data yt is used on both sides of equation, hence output is based on previous teacher data and other data such as attentional context vector to capture temporal dynamics. Attention is akin to a cognitive mapping that works by alignment and yields superior modeling results. Fig 1 illustrates the framework comprising layers, RNNs and attention mechanism. Table 1 notes testing/validation when parameters are set, i.e. “fixed parameters” where “parameters in our model include the number of time steps in the window T” [P.6 Sect5.2 ¶1]}.
However, Tao’s time-series appears to be sampled regularly having interval/size of equal length which seems to depart from the specification’s implementation when reading the claim functionality of dividing in light of the specification.
Cao teaches a “mean pooling mechanism, i.e., αi=1/T”, which is dividing, and irregularly sampled time series for recurrent networks with softmax per [P.4 Last¶]. See also iterative imputation using validated passes [P.3 Last2¶]. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to apply the mean pooling and irregular sampling of Cao in combination with Tao as applying known techniques to known methods to yield predictable results, namely simply averaging an input which may comprise missing values (Cao [P.4 Last¶]).
	Finally, Miller is cited in support of Tao and Cao specifically because it addresses stability, which is a core aspect that the application seeks to address through efficient/steady learning. Miller introduces the issue at [P.2 Sect.2] with an equation that takes the RNN form per [P.7 Sect6.2]. In the equation of Miller, teacher data is ht-1 and other data is xt, both of which appear within the same parentheses for updating hidden variables in learning an RNN. The functionality amounts to an affine transformation that is Lipschitz continuous and smooth, the significance of which is a stability criterion that yields better generalization. In other words, learning is steady/efficient as learning progresses over time.
Miller is directed to recurrent neural networks and is thus analogous art. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to utilize the teachings of Miller in combination with Tao and Cao, the motivation being that “we prove stable recurrent model have good feed-forward approximations” and “gradients of the training objective do not explode over time” (Miller [P.2 ¶1], [P.1 Last¶]).
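For illustration only, the mean pooling cited from Cao (αi=1/T) can be sketched as a uniform weighting of T hidden-state vectors. The function name mean_pool and the toy vectors are hypothetical, and the sketch does not reproduce Cao's handling of missing values.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Combine T hidden-state vectors using equal weights alpha_i = 1/T."""
    T = len(hidden_states)
    if T == 0:
        raise ValueError("need at least one hidden state")
    alpha = 1.0 / T  # the same weight for every time step
    dim = len(hidden_states[0])
    return [alpha * sum(h[d] for h in hidden_states) for d in range(dim)]


pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Because every time step receives the same weight, the pooled vector is simply the per-dimension average of the hidden states.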
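The stability point attributed to Miller can be illustrated with a minimal scalar sketch of a recurrent update in which ht-1 and xt appear inside the same nonlinearity. The tanh activation, the scalar weights w and u, and the input values are assumptions chosen for illustration and are not taken from Miller. With |w| < 1 and a 1-Lipschitz activation, two runs that start from different hidden states and receive the same inputs move closer together at every step.

```python
import math
from typing import List, Sequence


def run_rnn(w: float, u: float, h0: float, xs: Sequence[float]) -> List[float]:
    """Iterate h_t = tanh(w * h_{t-1} + u * x_t) and return every h_t."""
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w * h + u * x)  # affine map followed by tanh
        states.append(h)
    return states


xs = [0.5, -0.2, 0.1, 0.7, -0.4, 0.3]
a = run_rnn(0.5, 1.0, h0=1.0, xs=xs)
b = run_rnn(0.5, 1.0, h0=-1.0, xs=xs)
# Distance between the two trajectories after each step.
gaps = [abs(p - q) for p, q in zip(a, b)]
```

Each entry of gaps is at most |w| times the previous gap, so the influence of the initial state fades over time, which is the sense of "steady" learning relied upon above.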

With respect to claim 2, the combination of Tao, Cao and Miller teaches the learning device according to claim 1, wherein the processor is further configured to: 
	set the learned first parameter for the first RNN {Tao [P.6 Sect5.2 ¶] “fixed parameters” where “parameters in our model include the number of time steps in the window T” see Fig 1, Table 1}; 
generate second learning data including each of plural second subsets of time-series data associated with the teacher data, the plural second subsets of time-series data being acquired by input of each of the first subsets of time-series data into the first RNN; and learn, based on the second learning data, a second parameter of a second RNN included in a second layer that is one layer higher than the first layer {Tao Fig 1- second learning is RHN at second time step (i.e., sequential progression- from yt-1 to yt) as in Equation 13 each y makes computation of Eq.14 with second parameter W and/or V at the second time step. Layering to which plurality of RNNs belong is referred to as a “highway layer” [P.4 RtCol ¶1] which is simply an indicator function for the group operation or depth for hierarchical structure with each recurrent network having its own layers}.

With respect to claim 3, the combination of Tao, Cao and Miller teaches the learning device according to claim 1, wherein the processor is further configured to: 
	in a case where output data output when the first subsets of time-series data are input to the first RNN is different from the teacher data, generate the first learning data, by updating the teacher data to the output data, the teacher data corresponding to the first subsets of time-series data, {Tao updates teacher data to output data by Eqs. 14/10 [P.5] over the windowed subsets of time-series indexed with subscript T}. 
However, Tao does not prima facie disclose “pairs”.
Cao teaches: 
for a part of plural pairs of the first subsets of time-series data and the teacher data, the plural pairs being included in the first learning data {Cao [P.4 Last¶] the pairs are denoted Lout(y, y’) being an accumulated loss where y’ is teacher data and y carries the first subsets of time-series aggregated}.
	A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to accumulate loss over pairs of data per Cao with the update of Tao for the motivation of evaluating accumulated outputs and such that ground truth may be compared in relative error MRE (Cao [P.7 Sect5.2.1], [P.4 Last¶]).

With respect to claim 6, the rejection of claim 1 is incorporated. The scope differs in that claim 6 recites a method performing the limitations mirrored by claim 1. Tao discloses methods using “our model” [P.5 Eq.14] in a computational framework of Fig 2. The remainder of the claim is rejected for the same rationale as is already addressed in the rejection of claim 1.

Claims 7-8 are rejected for the same rationale as claims 2-3, respectively.

With respect to claim 11, the rejection of claim 1 is incorporated. The scope differs in that claim 11 recites a non-transitory computer-readable medium storing therein a learning program that causes a computer to execute a process with limitations mirrored by claim 1. Cao discloses software “python package” and “PyTorch” per [P.7 Sect5.2.2], [P.8 ¶1] with “GPU” hardware for LSTM memory networks [P.5 ¶1]. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to use the hardware/software elements disclosed by Cao in performing the techniques of Tao, Cao and Miller as obvious to try among routine computer elements with a reasonable expectation of success for implementing computational functionalities. The remainder of the claim is rejected for the same rationale as is already addressed in the rejection of claim 1.

Claims 12-13 are rejected for the same rationale as claims 2-3, respectively.

Claims 4, 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Tao, Cao and Miller in view of: 
Grabochka and Schmidt-Thieme, “NeuralWarp: Time-Series Similarity with Warping Networks” hereinafter Grabochka (arXiv: 1812.08306v1).
With respect to claim 4, the combination of Tao, Cao and Miller teaches the learning device according to claim 1, wherein the processor is further configured to: 
	in a case where output data output when the first subsets of time-series data are input to the first RNN is different from the teacher data, generate the first learning data, by updating the teacher data to other data that is different from the teacher data and output data, the teacher data corresponding to the first subsets of time-series data for a part of plural pairs of the first subsets of time-series data and the teacher data, the plural pairs being included in the first learning data {Grabochka [P.4] Fig 2 illustrates time-series warping, which is when a pair of two time-series are simultaneously compared by a neural network. The network is a bidirectional RNN/LSTM per Fig 4 and updating is by gradient descent [P.5 RtCol] for an optimization objective and warp function Eqs. 7-8 per [P.4-5] where the two time-series are denoted A, B and parametric learning is over Ꜫ, ф. Teacher data is updated with other data that differs from both the teacher data and the output data because it comprises a second time-series}.
	Grabochka is directed to recurrent neural networks for time-series thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to update according to Grabochka in combination for the motivation of including second time-series in pairwise evaluation (Grabochka [P.1 Sect.1 ¶2], [P.3 Sect2.4]).

Claims 9 and 14 are rejected for the same rationale as claim 4.

Claims 5, 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tao, Cao and Miller in view of: 
Thickstun et al., “Coupled Recurrent Models for Polyphonic Music Composition” hereinafter Thickstun (arXiv: 1811.08045v1).
With respect to claim 5, the combination of Tao, Cao and Miller teaches the learning device according to claim 1, wherein the processor is further configured to: 
	divide, based on features of speech data corresponding to the time-series data, the time-series data into the plural first subsets of time-series data {Thickstun discloses [P.4 ¶1] “quantize time using the finest denominator ∆” Eq.1 ∑(1/T∆)logq(S0, S∆, …, ST∆) is dividing where S is scores of music data features akin to speech e.g. singing or language modeling with word embeddings [P.7 Sect5.4], Figs 1-2}.
	Thickstun is directed to recurrent neural networks with hierarchical structure and time-series thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to divide over music features per Thickstun in combination for the motivation “We can think of ∆ as the resolution… we can discretize at this resolution without losing information” (Thickstun [P.4 ¶1-4]).
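The quantization relied upon from Thickstun can be sketched as follows, reading "finest denominator" as the least common denominator of the event durations. That reading, the function names finest_resolution and to_steps, and the sample durations are assumptions made for illustration and are not taken from Thickstun.

```python
from fractions import Fraction
from math import gcd
from typing import List, Sequence


def finest_resolution(durations: Sequence[Fraction]) -> Fraction:
    """Return delta = 1 / lcm(denominators), so every duration is a multiple of delta."""
    lcm = 1
    for d in durations:
        lcm = lcm * d.denominator // gcd(lcm, d.denominator)
    return Fraction(1, lcm)


def to_steps(durations: Sequence[Fraction], delta: Fraction) -> List[int]:
    """Express each duration as a whole number of delta-sized time steps."""
    return [int(d / delta) for d in durations]


durations = [Fraction(1, 4), Fraction(1, 8), Fraction(3, 16)]
delta = finest_resolution(durations)
steps = to_steps(durations, delta)
```

Because every duration is an exact multiple of delta, the series can be divided into steps at this resolution without losing information, consistent with the quoted motivation.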

Claims 10 and 15 are rejected for the same rationale as claim 5.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Gouk et al., “Regularisation of Neural Networks by Enforcing Lipschitz Continuity” (arXiv:1804.04368v2), Equation 4 supports the finding that Lipschitz is “affine transformation” [P.5 ¶1].
Mei et al., “Deep Diabetologist: Learning to Prescribe Hypoglycemia Medications with Hierarchical Recurrent Neural Networks” excellent example of practical application.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935. The examiner can normally be reached M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHASE P. HINCKLEY/Examiner, Art Unit 2124
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126