DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for the benefit of prior-filed U.S. Application No. 14/266,093 is acknowledged. That application is a continuation-in-part application of and claims priority to US Patent Application Number 13/725,995 filed December 21, 2012, which is a continuation-in-part application of US Patent Application Number 13/870,861 filed on April 25, 2013 and US Patent Application Number 13/749,618 filed on January 24, 2013, which claims the benefit of United States Provisional Patent Application Number 61/727,114, filed November 15, 2012.
Drawings
The drawings were received on 03/18/2020.  These drawings are acceptable.



Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
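As an illustrative aid only, the three prongs above can be read as a conjunction of three conditions. The minimal Python sketch below assumes hypothetical field names (`uses_means_or_placeholder`, `modified_by_functional_language`, `recites_sufficient_structure`) that do not appear in the MPEP. Whether each condition holds for a given limitation remains a matter of claim interpretation, which the sketch does not attempt to automate.

```python
from dataclasses import dataclass


@dataclass
class ClaimLimitation:
    # Hypothetical fields, named for illustration only.
    uses_means_or_placeholder: bool        # prong (A)
    modified_by_functional_language: bool  # prong (B)
    recites_sufficient_structure: bool     # prong (C) is met when this is False


def meets_three_prong_test(lim: ClaimLimitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    return (
        lim.uses_means_or_placeholder
        and lim.modified_by_functional_language
        and not lim.recites_sufficient_structure
    )


# A generic placeholder coupled with functional language and no structure.
placeholder_only = ClaimLimitation(True, True, False)
# The same limitation once sufficient structure is recited.
with_structure = ClaimLimitation(True, True, True)

print(meets_three_prong_test(placeholder_only))
print(meets_three_prong_test(with_structure))
```

Under these assumptions, the first limitation meets all three prongs and the second fails prong (C).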

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are listed below, where the generic placeholder or “means for” recitations are noted in bold font and the functional language is italicized:
Claim 1 limitation(s):
a receiver module configured to receive electronically transmitted training data over a data network using a network interface, the training data for forming a machine learning ensemble customized for the training data; 
 a function generator module configured to pseudo-randomly generate executable program code for a plurality of learned functions from a plurality of different machine learning classes using parallel computing on multiple processors based on the training data, the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data, a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data;
a function evaluator module configured to perform a machine learning evaluation of the plurality of learned functions using test data and to maintain evaluation metadata for the plurality of learned functions in one or more non-transitory computer readable storage media, the evaluation metadata comprising one or more of an indicator of a training data set used to generate a learned function and an indicator of one or more decisions made by a learned function during the machine learning evaluation; and
a machine learning compiler module configured to compile the executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble, the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata for the plurality of learned functions, and comprising a rule set synthesized from the evaluation metadata 
Claim 2: further comprising a feature selector module configured to in response to the function generator module generating the executable program code for the plurality of learned functions, determine a subset of features from the training data for use in the machine learning ensemble based on the evaluation metadata, the machine learning compiler module configured to form the machine learning ensemble using the selected subset of features.
Claim 3: wherein the feature selector module is configured to iteratively increase a size of the subset of features until a subsequent increase in the size fails to satisfy a feature effectiveness threshold.
Claim 4: wherein one or more of the features of the training data are selected by a user as required and the feature selector module is configured to select one or more optional features to include in the subset of features with the required one or more features.
Claim 5: wherein the function evaluator module is configured to perform the machine learning evaluation of the plurality of learned functions using the test data by inputting the test data into the plurality of learned functions to output the one or more decisions.
Claim 6: wherein the function evaluator module is configured to maintain the evaluation metadata for each evaluated learned function in a metadata library stored on the one or more non-transitory computer readable storage media, the machine learning compiler module configured to include the rule set in the machine learning ensemble, the rule set comprising at least a portion of the evaluation metadata.
Claim 7: wherein the evaluation metadata further comprises one or more of the training data, classification metadata, convergence metrics, and efficacy metrics for the plurality of learned functions.
Claim 8: wherein the machine learning compiler module is configured to combine learned functions from the plurality of learned functions to form combined learned functions, the machine learning ensemble comprising at least one combined learned function.
Claim 9: wherein the function generator module is configured to determine one or more additional learned functions in response to a learned function request, the machine learning compiler module configured to request one or more additional learned functions from the function generator to combine with learned functions from the plurality of learned functions.
Claim 10: wherein the machine learning compiler module configured to add one or more layers to at least a portion of the plurality of learned functions to form one or more extended learned functions, at least one of the one or more layers comprising one or more of a Bayes classifier and a Boltzmann machine, the at least one other learned function comprising an extended learned function extended with the one or more of the multiple learned functions.
Claim 11: wherein the machine learning compiler module is configured to form the machine learning ensemble by organizing the subset of learned functions into the machine learning ensemble, the machine learning ensemble comprising the subset of 
Claim 12: further comprising an orchestration module configured to direct workload data through the machine learning ensemble based on the evaluation metadata data to produce a classification for the workload data and a confidence metric for the classification, the evaluation metadata synthesized to form the rule set for the subset of learned functions.
Claim 13: further comprising an interface module configured to receive an analytics request from a client and to provide an analytics result to the client, the analytics request comprising workload data with similar features to the training data, the analytics result produced by the machine learning ensemble.
Claim 14 limitations: 
means for generating executable program code for a plurality of learned functions from a plurality of different machine learning classes based on training data without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data, the training data received for forming a machine learning ensemble customized for the training data;
means for evaluating the plurality of learned functions using test data to generate evaluation metadata stored in one or more non-transitory computer readable storage media, the evaluation metadata indicating an effectiveness of different learned functions at making predictions based on different subsets of the test data; and
means for compiling executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble, the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata, and comprising a rule set synthesized from the evaluation metadata to direct different subsets of the workload data through executable program code from different learned functions of the multiple learned functions based on the evaluation metadata.
Claim 15: further comprising means for synthesizing the evaluation metadata into a rule set for the subset of learned functions, wherein the means for compiling executable code to form the machine learning ensemble further comprises means for including the rule set in the machine learning ensemble.
Claim 16: wherein the means for compiling executable program code forms the machine learning ensemble by one or more of: combining learned functions from the plurality of learned functions to form a combined learned function; and adding one or more layers to a learned function from the plurality of learned functions to form an extended learned function.
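For orientation only, the iterative behavior recited in claim 3 (increasing the size of the feature subset until a subsequent increase fails to satisfy a feature effectiveness threshold) can be sketched as a greedy loop. The Python sketch below is an illustration under stated assumptions, not the algorithm of the applicant's disclosure: the `score` callback, the `threshold` parameter, and the reading of the threshold as a minimum score improvement are all hypothetical.

```python
from typing import Callable, List, Sequence


def grow_feature_subset(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    threshold: float,
) -> List[str]:
    """Greedily add features until the next addition improves the
    score by less than `threshold` (hypothetical stopping rule)."""
    selected: List[str] = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        # Pick the remaining feature that yields the highest score when added.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        new_score = score(selected + [candidate])
        if new_score - best < threshold:
            break  # the increase in size fails the effectiveness threshold
        selected.append(candidate)
        remaining.remove(candidate)
        best = new_score
    return selected


# Toy scoring function: each feature contributes a fixed, made-up weight.
weights = {"a": 0.5, "b": 0.3, "c": 0.01}
toy_score = lambda subset: sum(weights[f] for f in subset)

print(grow_feature_subset(["a", "b", "c"], toy_score, threshold=0.05))
```

With the made-up weights above, the loop keeps "a" and "b" and stops before adding "c", because adding "c" improves the toy score by less than the threshold.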
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Regarding claims 23-25, the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because the claims are directed to software per se. Specifically, the specification, in paragraphs 0026-0027, discloses that the claimed modules may be software or programmable logic executed/implemented on generic processor circuits. Therefore, under the broadest reasonable interpretation (BRI), the claims are drawn to computer instructions (often referred to as "software per se"), see MPEP 2106.03. Thus, the claims are not directed to any of the statutory categories, see MPEP 2106.03(I), and a rejection under 35 USC 101 as covering non-statutory subject matter is appropriate.

Claim Rejections - 35 USC § 112:  Written Description Requirements
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


Claims 1-6 and 8-16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Specifically, the claims include limitations that invoke 35 USC § 112(f), as noted in the claim interpretation section above, where the specification does not disclose adequate structure (or material or acts) for performing the recited function (i.e., the algorithm is not provided in the disclosure), see MPEP 2181.
For a computer-implemented 35 U.S.C. § 112(f) claim limitation that performs a specific computer function, the specification must disclose sufficient corresponding structure, for example, the computer and the algorithm, that performs the entire claimed function or functions, see MPEP 2181(II)(B). The applicant’s specification discloses, in paragraphs 0026-0027, a module may be implemented on the noted generic computer processor circuits using programmable devices or software. However, the corresponding structure for a computer-implemented § 112(f) limitation that performs a specific computer function is not simply a general purpose computer and the applicant’s disclosure has not disclosed the algorithm for performing the entire recited specialized functions noted in italicized text above for the claim limitations invoking 35 U.S.C. § 112(f).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-16 and 23-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the claim recites the limitation “a function generator module configured to pseudo-randomly generate executable program code for a plurality of learned functions from a plurality of different machine learning classes using parallel computing on multiple processors based on the training data” (emphasis added), which renders the claim indefinite because the limitation is unclear. Specifically, the claim recites “pseudo-randomly generate executable program code for a plurality of learned functions from a plurality of different machine learning classes using parallel computing on multiple processors”. It is unclear how a machine learning algorithm is generated from pseudo-randomly generated executable program code on a generic processor such that the result is a deterministic, non-random algorithm that facilitates a parallel computing process for machine learning. In addition, the phrase “pseudo-randomly generated executable program code” is not considered a term of art, and the applicant’s specification provides no information to determine the intended scope of the claim limitation. The specification merely repeats the claim language. Thus, one of ordinary skill in the art would be unable to ascertain the intended scope of the claim, and the limitation renders the claim indefinite. Examiner notes that any executable code for processing learning in a parallel computing environment is within the scope of the claimed limitation.
Regarding claim 1, the claim recites the term “pseudo-randomly suitable” in the limitation “a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data and of the different machine learning classes for the training data,” which renders the claim indefinite because the term is not a term of art and thus renders the claim incoherent. The specification does not provide a standard for ascertaining the intended scope of the term. Specifically, how is a set of learned functions determined to be pseudo-randomly suitable for the training data? Given that machine learning learns using training data, what makes this training pseudo-randomly suitable, given that the training data was used to produce the learned functions? Or is this supposed to imply that only some random fraction of the data was used to train the learned function? The applicant’s specification, in paragraph 0095, discloses “pseudo-randomly indicates that the function generator module 301 is configured to generate learned functions in an automated manner, without input or selection of learned functions, machine learning classes or models for the learned functions, or the like by a Data Scientist, expert, or other user” while the claim recites a selection process for indicating “pseudo-randomly suitable” training data. The term renders the limitation incoherent. Thus, the term and limitation are unclear and the claim is rendered indefinite. Examiner notes that any selected number of learned functions is within the scope of the claimed invention.
Regarding claim 1, the claim recites the limitation “the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data” (emphasis added), which renders the claim indefinite because the use of the phrase “without regard to a suitability” is unclear. Specifically, how are the different classes learned for the training data when there is no consideration of how the training data is associated with (e.g., suitable for) each class? What does the applicant mean by no regard to a suitability as claimed? It would appear that the learning cannot occur given what one of ordinary skill in the art would regard as machine learning using training data. The applicant appears to be using the term “training data” to refer to data not used to train the learning class while also claiming that the learning classes are for the 
Regarding claims 14 and 23, the claims recite similar limitations using the phrase “without regard to a suitability” that render the claims indefinite for the same reasons noted above for the claim 1 limitation.
Regarding claims 2-13, which depend on claim 1, the claims do not resolve the deficiencies noted in the claim 1 limitations above, and are thus appropriately rejected.
Regarding claims 15-22, which depend on claim 14, and claims 24-25, which depend on claim 23, the claims do not resolve the deficiencies noted in their respective independent claims and are therefore appropriately rejected.
Regarding claim 23, the claim recites the limitation “executable program code for multiple learned functions synthesized from executable program code for a larger plurality of learned functions from a plurality of different machine learning classes, the multiple learned functions selected and combined based on evaluation metadata for an evaluation of the larger plurality of learned functions, wherein the larger plurality of learned functions are generated based on training data without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes for the training data”, which renders the claim indefinite because it is unclear what the intended scope of the claim should be. Specifically, the term “larger plurality of learned functions”, recited as “larger plurality of learned functions from a plurality of different machine learning classes”, appears to be a relative term that is not made clear by the claim limitation. Specifically, what makes one plurality of learned functions larger than the other in the set of different classes as claimed? And how can the larger plurality of learned functions be deemed larger “without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes”? It would seem that the determination as “larger”, as recited in this context, 
Examiner interprets any selection process from a set of learned functions as being within the scope of the claim limitation.
Regarding claims 24-25, which depend on claim 23, the claims do not resolve the deficiencies noted in their respective independent claim and are therefore appropriately rejected.

Regarding claims 1-6 and 8-16, the limitations noted in the claim interpretation section above invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. For a computer-implemented 35 U.S.C. § 112(f) claim limitation that performs a specific computer function, the specification must disclose sufficient corresponding structure, for example, the computer and the algorithm, that performs the entire claimed function or functions, see MPEP 2181(II)(B). The applicant’s specification discloses, in paragraphs 0026-0027, a module may be implemented on the noted generic computer processor circuits using programmable devices or software. However, the corresponding structure for a computer-implemented § 112(f) limitation that performs a specific computer function is not simply a general purpose computer and the applicant’s disclosure has not disclosed the algorithm for performing the entire recited specialized functions noted in italicized text above for the claim limitations invoking 35 U.S.C. § 112(f).
Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;

(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-16 and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Drame (US Pat. No. 9,147,166) in view of Senior et al. (US Pat No. 8,527,276, hereinafter ‘Senior’) in further view of Burges et al. (US Pub. No. 2007/0239632, hereinafter ‘Burg’).

Regarding claim 1, Drame teaches: an apparatus for a machine learning factory, the apparatus comprising: a receiver module configured to receive electronically transmitted training data over a data network using a network interface, the training data for forming a machine learning ensemble customized for the training data; (claimed receiver module for transmitting training data as extracted feature datasets, in 3:24-30: Certain embodiments provide methods for generating dynamically controllable data composites from two or more data segments comprising the steps of: (1) building or training one or more function mappers to map between one or more extracted feature envelopes datasets from the original data and one or more analyzed dataset fitting one or more general parametric representation of the original data;…)
a function generator module configured to pseudo-randomly generate executable program code for a plurality of learned functions from a plurality of different machine learning classes using parallel computing on multiple processors based on the training data, the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different (claimed function generator module for claimed functions as mapping functions into two or more data segment training datasets, in 3:30-43: … (2) combining the extracted feature envelopes, and/or the function mappers using two or more data segments; (3) feeding the feature envelopes or combining feature envelopes to function mapper or combination of function mappers; Certain embodiments provide methods for combining extracted feature envelopes from one or more data segments (or regions) thus ensuring more realistic features parameters correlations across exemplars. Certain embodiments provide methods for training a single function mapper with a discriminant input on two or more data segments thus in effect combining the functionalities of two separate function mappers trained on two different data segments and thus obtaining a combined function mapper…; and in using the different categories to learn from the dataset, where the data of the extracted features is analyzed separately (claimed without regard to suitability) from the mapping to a category, e.g. classifier, in 19:4-21: FIG. 23B illustrates a combination of analyzed datasets in the same training set. According to one embodiment, the analyzed dataset exemplars 2308A, 2308B are combined successively in time so that each analyzed dataset exemplar matches the corresponding feature envelopes exemplar in time in a combined analyzed data set 2310. FIG. 24 is an illustration of a function mapper 2400 with 10 inputs 2408 including one discriminant input 2406. 
During the backpropagation training with discriminant input process 2404, the two segments are presented to the neural network 2400 for learning. During training, the discriminant input informs the function mapper of what type (or category) of segment it is currently learning. This process is an example of combination of two or more segments at the function mapper's body level. The neural network actually learns properties from both segments and the resulting trained MLP with discriminant input 2402 is able to discriminate and interpolate between them...; claimed plurality of learned function mapper of the neural network, in 20:40-49: …Next in FIG. 27A train a different Neural Network MLP function mapper 1602 on each different datasets, to map each extracted feature exemplar 1406 to each corresponding analyzed dataset exemplar 1500 according to training process 1606 where the outputs exemplars are spectral peaks amplitudes and obtain a trained function mapper neural network 1606 for each dataset. The training process may use a magnitude-dependent weighting or normalizing function 1700 to give equivalent weights to each value of the analyzed dataset exemplars…)
a function evaluator module configured to perform a machine learning evaluation of the plurality of learned functions using test data and to maintain evaluation metadata for the plurality of learned functions in one or more non-transitory computer readable storage media, the evaluation metadata comprising one or more of an indicator of a training data set used to generate a learned function and an indicator of one or more decisions made by a learned function during the machine learning evaluation; and (claimed function evaluator for performing claimed evaluation as backpropagation using claimed metadata as the weight parameters, in 14:53-65: According to one embodiment, during the training stage, each feature exemplar 1406 is successively presented at the input of the MLP and each corresponding analyzed dataset exemplar (or analysis frame) 1500 is presented at the output of the MLP as the desired outputs to be learned. According to one embodiment the MLP is trained according to a well known backpropagation algorithm. At this training (or design) stage a magnitude weighting (or normalizing) function [the evaluation metadata comprising one or more of an indicator of a training data set used to generate a learned function and an indicator of one or more decisions made by a learned function during the machine learning evaluation] such as 1700 in FIG. 17 can be applied to the analyzed dataset 1206 or feature envelopes 1402 or both during learning to ensure that all parts of the data are given an equivalent weight (e.g., values of same order of magnitude) at the outputs of the function mapper during training time…)
(claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700 [a machine learning compiler module configured to compile the executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble]. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments into the trained function mappers [the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata for the plurality of learned functions] one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network, the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…)
and comprising a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata and such that executable program code from one or more of the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input. (claimed rule set as the set of node learned functions learned to combine features for computing categorization output of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data] and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain threshold. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 [claimed the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input] is obtained with its parameters (e.g., weights, biases) fitted to the training data [claimed different subsets of the data based on the evaluation metadata]…; And in 7:16-44: FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames). Both the feature envelope exemplars 502 (or frames) and analyzed dataset exemplars 504 (or frames) can be one-dimensional or multi-dimensional depending on the application. Input patterns for an index i may include input frames at i-1, i-2, ... i-n and/or i+1, i+2, ... i+n. During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned [claimed executable program code from one or more of the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input.].




)
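For illustration only, the network shape quoted from Drame at 15:19-31 (3 inputs, one fully connected hidden layer of 100 neurons, 100 outputs, with optional shunting of small outputs to zero) can be sketched as a forward pass. This is a minimal sketch and not code from any cited reference: the helper names (`make_layer`, `layer_forward`, `mlp_forward`), the sigmoid nonlinearity, the random initial weights, and the threshold value 0.1 are all assumptions, and no backpropagation training is shown.

```python
import math
import random


def sigmoid(x):
    """Logistic nonlinearity applied to a node's weighted sum (assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Hypothetical helper: random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def layer_forward(inputs, weights, biases):
    """Every node sees every input: sigmoid of the weighted sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(features, hidden, output, shunt_threshold=0.0):
    """Map one feature exemplar to one output frame.

    Outputs below shunt_threshold are forced to zero, mimicking the
    optional shunting units mentioned in the quoted passage.
    """
    h = layer_forward(features, *hidden)
    y = layer_forward(h, *output)
    return [v if v >= shunt_threshold else 0.0 for v in y]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
hidden = make_layer(3, 100, rng)    # 3 feature inputs -> 100 hidden neurons
output = make_layer(100, 100, rng)  # 100 hidden neurons -> 100 outputs
frame = mlp_forward([0.2, 0.5, 0.9], hidden, output, shunt_threshold=0.1)
```

Here each hidden node's output feeds every output node, which is the sense in which one node function receives the output of another as its input.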
Examiner notes that recited modules are taught in Drame as the hardware executing computer instruction to perform claimed functions and the execution of computer instruction to receiving processing instruction and outputs, in 23:9-24:44: Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules or hardware-implemented modules. A hardware-implemented module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner… In various embodiments, a hardware-implemented module (e.g., a computer-implemented module) may be implemented mechanically or electronically. For example, a hardware-implemented module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations… Hardware-implemented modules can provide information to, and receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled… The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).

While Drame teaches a neural network that assembles a set of functions used to synthesize an output from the grouping (e.g., ensemble) of functions, as the set of nodes for computing a desired output learned by the node functions using backpropagation as disclosed above, Drame does not expressly teach the use of a neural network node ensemble where the nodes are regression classifier modes for grouping functions to perform a desired output (e.g., a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions as a node ensemble into layered sub-categories …).
Senior does expressly teach the use of a neural network node ensemble where the nodes are regression classifier modes for grouping functions to perform a desired output (e.g., a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions as a node ensemble into layered sub-categories …) (as depicted in Fig. 6 and in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type of neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515) [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers."
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs…[claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata] As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [claimed synthesized from the evaluation metadata to direct data through the multiple learned functions], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…; And using backpropagation in 17:59-18:10: As part of training (e.g., during or prior to), the associated speech samples and/or speech sample sub-segments could be processed with a signal processor (e.g. a digital signal processor) to generate target feature vectors associated with the stored sample text strings (and/or text-string sub-segments).
The neural network training module 510 may function to compare the training-time predicted feature vectors 505 output by the neural network 506 with the target feature vector 509 obtained from the speech database 512 in order to update and/or adjust parameters (e.g., weights) of the neural network 506. More specifically, the neural network 506 may perform "forward propagation" of the input training-time phonetic-context descriptors [claimed use of rule sets] 503 to generate the training-time predicted feature vectors 505, and the neural network training module 510 may perform "back propagation" the inputs the input neural network training module 510 to update the neural network 506…)




 

Alternatively, Senior teaches learning using neural networks without regard to suitability, as the use of training data that corresponds to contextual information where the learning process for discovering the node learned functions is applied to the data to learn the wide variety of information of the provided example training data, in 6:8-30: In accordance with example embodiments, a parameter generation module of a speech synthesis system (e.g., TTS system) may include a neural network that can be trained to receive sequences of phoneme labels (or other types phonetic labels) in and/or accompanied by a wide variety of contexts, and to map them to acoustic feature vectors through a process of learning to account for the wide variety of contexts… The meaning of "large number and variety of contexts" may be taken to correspond to a body large enough at least to stretch practical implementation limits of conventional techniques, necessitate at least some degree of approximation, and/or impose accuracy limits on generated feature vectors. The very challenge of such a large body of context information can be turned to an advantage in training a neural network, where a wide the variety of examples presented can lead to a wide the variety of examples learned, and correspondingly to versatility [claimed the different machine learning classes selected without regard to a suitability of the plurality of learned ] when the trained neural network is applied to data and circumstances beyond the training regime.)
The Drame and Senior references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically train node learning functions used to ensemble a neural network model for learning features and predicting a desired outcome in a multi-processor system as disclosed by Senior with the method for automated training of one or more function mappers to map input to drive a synthesis process for determining composite outputs as disclosed by Drame.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Senior and Drame in order to train a neural network implemented by one or more processes to map training-time sequences into predicted vectors that correspond to training data and its contextual properties to determine predictive outcomes (Senior, 5:37-41); doing so can help enhance the accuracy of the predicted outcomes when mapping the context information associated with the training data (Senior 5:33-41) and enable versatility when the trained functions are applied to data and circumstances beyond the training regime (Senior 6:24-30).
While Drame and Senior discuss the use of backpropagation for evaluating the learned functions by adjusting weights as disclosed above, Drame and Senior do not expressly disclose evaluating the learned functions using test data and maintaining evaluation metadata for the plurality of learned functions. Burg does expressly teach the claim 1 limitation: …learned functions using test data and to maintain evaluation metadata for the plurality of learned functions. (as testing the trained learning system as depicted in Fig., in 0035: In general, learning systems have multiple phases of operation. The initial phase is known as the training phase. During the training phase, a set of training data can be input into the learning system. The learning system learns to optimize its output for data during the processing of the training data. Next, a set of validation data can be input into the learning system. The results of processing of the validation data set [claimed test data] by the learning system can be measured using a variety of evaluation metrics to evaluate the performance of the learning system [claimed learned functions using test data and to maintain evaluation metadata for the plurality of learned functions]. The learning system can alternate between the training and validation data to optimize system performance. Once the learning system achieves a desired level of performance, the parameters of the learning system can be fixed such that performance will remain constant before the learning system enters into the operational phase. During the operational phase, which typically follows both training and validation, users can utilize the learning system to process operational data and obtain the users' desired results.)
The Burg, Senior and Drame references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically using a machine learning system for information processing task as disclosed by Burg with the method for automated training of one or more function models and synthesis process for determining desired outputs as collectively disclosed by Drame and Senior.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Burg, Senior and Drame in order to train learning models that can be used for a variety of data processing or analysis tasks through training phases using both training and validation data (Burg, 0004); doing so can enable the learning system to process operational data and obtain the users' desired result.

	Regarding claim 2, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, further comprising a feature selector module configured to in response to the function generator module generating the executable program code for the plurality of learned functions, determine a subset of features from the training data for use in the machine learning ensemble based on the evaluation metadata, the machine learning compiler module configured to form the machine learning ensemble using the selected subset of features. (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments into the trained function mappers [claimed feature selector module configured to in response to the function generator module generating the executable program code for the plurality of learned functions, determine a subset of features from the training data for use in the machine learning ensemble based on the evaluation metadata the machine learning compiler module configured to form the machine learning ensemble] one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network [claimed the machine learning ensemble using the selected subset of features], the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…)
	
	Regarding claim 3, the rejection of claim 2 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 2, wherein the feature selector module is configured to iteratively increase a size of the subset of features until a subsequent increase in the size fails to satisfy a feature effectiveness threshold. (claimed increase as size of feature mappers in subset of learned function stored using a combination of learned functions for categorizing a segment, in 9:3-46: The results of the segmentation/categorization processes 808 of FIG. 8A in the categorized feature envelopes store 806 include information defining the boundaries of the segment, and one or more labels to assign one or more categories to the segment… As discussed below in greater detail, certain embodiments include a modeling stage and a synthesis stage. The modeling stage may include operations for training and building a model including:… The synthesis stage may include: (a) operations for feeding inputs (possibly including combined inputs) to one or more function mappers (possibly including function mappers trained or built to combine properties of two or more segments) [configured to iteratively increase a size of the subset of features until a subsequent increase in the size fails to satisfy a feature effectiveness threshold as given by the combined properties]; and (b) operations for feeding output of the function mappers to a synthesis process [the feature selector module is configured to iteratively increase a size of the subset of features].
As discussed below in greater detail, a combination of two or more segments may occur at various places including the function mapper's input level, the function mapper's body level, the function mapper's output level, or some combination of these levels…; And claimed iterative process given the segment index and not adding when the picked segments fail to fit a predetermined statistic distribution as claimed effectiveness threshold, in 9:48-62: FIG. 9 illustrates an example embodiment of a synthesis stage that combines two or more segments at the function mapper's input level. The segment picking process 900 includes picking one or more feature envelope segments from one or more categorized feature envelopes datasets (or stores) 806 to determine segments 812. The rules and methods for segment picking including segment picking process (SPP) dynamic parameters 902 can be chosen or designed at design time. Examples of segment picking rules include, for example, random picking, manual picking, or rules such as "pick the first segment of one dataset and the first segment of a second dataset, then pick the second segment of the first dataset and the second segment of the second dataset etc." As another example a rule can be "pick segments that fit a predetermined statistic distribution [claimed iteratively increase a size of the subset of features until a subsequent increase in the size fails to satisfy a feature effectiveness threshold].")

	Regarding claim 4, the rejection of claim 2 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 2, wherein one or more of the features of the training data are selected by a user as required (claimed selected feature for analysis selected by means of capturing data to be modeled to obtain the training data set, in 13:21-30: FIG. 12 illustrates a parametric analysis process performed on recorded audio sample from an audio sample store 1202 to obtain an analyzed dataset 1206 (or synthesis parameters set). In accordance with one embodiment, the data to be modeled are monophonic audio recordings such as instruments like flute, saxophone, singing voice or other sounds such as animal sounds, sound effects, and the like. Such samples can be captured using a microphone and an audio recording device 1200, or by any other mean such as recorded from the output of another synthesis process.; And as manually received feature envelopes as claimed selection by a user as required, in 13:66-14:3: FIG. 14 illustrates a feature extraction process 1400 to obtain a feature envelopes dataset 1402. Feature envelopes can be computed from the analyzed dataset 1206 or directly from the captured audio data 1202 or manually assigned using a graphical interface for example.)
(in 14:4-: Feature envelopes are successions of feature envelope frames (or feature envelope exemplars). The values in the feature envelope frames (or feature envelope exemplars, or feature vectors) can be computed from analyzed dataset frames 1500 or 1502 from the audio slices by the feature extraction process 1400… As illustrated in FIG. 16A, the MLP is trained to map [claimed the feature selector module is configured to select one or more optional features to include in the subset of features with the required one or more features] between feature envelope exemplars 1406 from the feature envelopes dataset 1402 [claimed selected optional features to include in the subset of features with the required one or more features] and corresponding analyzed dataset exemplars 1500 from the analyzed dataset 1206. The dataset exemplars 1500 may include the amplitudes (or magnitudes) of the spectral peaks. Exemplars may also include other spectral information such as frequencies for example, and an MLP could be trained on the frequency trajectories to provide the frequencies information needed at synthesis time and thus model sounds that are not purely harmonic. An MLP could also be trained on the noise spectral envelopes to model noise like sounding sounds such as wind or breaths.)

Regarding claim 5, the rejection of claim 1 is incorporated. While Drame and Senior teach the training of learned functions for predicting a desired output, Drame and Senior do not expressly teach the claim 5 limitation. Burg does expressly teach the claim 5 limitation: wherein the function evaluator module is configured to perform the machine learning evaluation of the plurality of learned functions using the test data by inputting the test data into the plurality of learned functions to output the one or more decisions. (as testing the trained learning system as depicted in Fig., in 0035: In general, learning systems have multiple phases of operation. The initial phase is known as the training phase. During the training phase, a set of training data can be input into the learning system. The learning system learns to optimize its output for data during the processing of the training data. Next, a set of validation data can be input into the learning system. The results of processing of the validation data set [claimed test data] by the learning system can be measured using a variety of evaluation metrics to evaluate the performance of the learning system [claimed wherein the function evaluator module is configured to perform the machine learning evaluation of the plurality of learned functions using the test data by inputting the test data into the plurality of learned functions to output the one or more decisions]. The learning system can alternate between the training and validation data to optimize system performance. Once the learning system achieves a desired level of performance, the parameters of the learning system can be fixed such that performance will remain constant before the learning system enters into the operational phase. During the operational phase, which typically follows both training and validation, users can utilize the learning system to process operational data and obtain the users' desired results.)
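For illustration only, the training and validation phases quoted from Burg at 0035 can be sketched as follows: each candidate function is fitted on its own training set, then run on held-out validation data, and its decisions and an evaluation metric are recorded. This is a minimal sketch and not code from any cited reference: the threshold classifier, the names `train_threshold`, `evaluate`, and `metadata`, the accuracy metric, and the toy data are all assumptions.

```python
def train_threshold(training_data):
    """Hypothetical learned function: choose the cut-off that classifies the
    most training pairs (value, label) correctly when predicting value >= cut-off."""
    best_t, best_correct = None, -1
    for t, _ in training_data:
        correct = sum((x >= t) == label for x, label in training_data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def evaluate(threshold, validation_data):
    """Run the learned function on validation data; return its decisions and accuracy."""
    decisions = [x >= threshold for x, _ in validation_data]
    correct = sum(d == label for d, (_, label) in zip(decisions, validation_data))
    return decisions, correct / len(validation_data)


# Toy training sets and validation set (assumed values, for illustration).
training_sets = {
    "set_a": [(0.1, False), (0.4, False), (0.6, True), (0.9, True)],
    "set_b": [(0.2, False), (0.3, True), (0.7, True), (0.8, True)],
}
validation_data = [(0.05, False), (0.35, False), (0.65, True), (0.95, True)]

# Record, per learned function, which training set produced it and
# which decisions it made during evaluation.
metadata = {}
for name, data in training_sets.items():
    threshold = train_threshold(data)
    decisions, accuracy = evaluate(threshold, validation_data)
    metadata[name] = {
        "training_set": name,
        "threshold": threshold,
        "decisions": decisions,
        "accuracy": accuracy,
    }
```

In this sketch the function trained on `set_a` classifies all four validation points correctly, while the one trained on `set_b` misclassifies the point at 0.35.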
The Burg, Senior and Drame references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically using a machine learning system for information processing task as disclosed by Burg with the method for automated training of one or more function models and synthesis process for determining desired outputs as collectively disclosed by Drame and Senior.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Burg, Senior and Drame in order to train learning models that can be used for a variety of data processing or analysis tasks through training phases using both training and validation data (Burg, 0004); doing so can enable the learning system to process operational data and obtain the users' desired result.

Regarding claim 6, the rejection of claim 5 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 5, wherein the function evaluator module is configured to maintain the evaluation metadata for each evaluated learned function in a metadata library stored on the one or more non-transitory computer readable storage media, (weights and parameters stored in computer storage as claimed metadata library, in 14:33-40: FIG. 16A shows an example embodiment related building or training a function mapper between extracted feature envelopes 1402 and analyzed dataset 1206 (e.g. as in FIG. 14). In accordance with one embodiment, the function mapper is a Multilayer Perceptron (MLP) neural network 1602 and the training process includes adjusting the MLP's internal parameters (weights and biases) [claimed the function evaluator module is configured to maintain the evaluation metadata for each evaluated learned function in a metadata library] according to a well known backpropagation algorithm…; And in 24:6-14: … For example, one hardware-implemented module may perform an operation, and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware-implemented module may then, at a later time, access the memory device to retrieve and process [claimed metadata library stored on the one or more non-transitory computer readable storage media] the stored output. Hardware-implemented modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information)…)
the machine learning compiler module configured to include the rule set in the machine learning ensemble, the rule set comprising at least a portion of the evaluation metadata. (in 9:26-47: … The modeling stage may include operations for training and building a model including: … (c) operations for building or training one or more function mappers to map between extracted features and analyzed dataset (e.g., function mappers can possibly be trained or built to combine properties of two or more segments using a discriminant input); and (d) operations for designing and choosing generative rules and combination processes [claimed machine learning compiler module configured to include the rule set in the machine learning ensemble, the rule set comprising at least a portion of the evaluation metadata]. The synthesis stage may include: (a) operations for feeding inputs (possibly including combined inputs) to one or more function mappers (possibly including function mappers trained or built to combine properties of two or more segments); and (b) operations for feeding output of the function mappers to a synthesis process. 
As discussed below in greater detail, a combination of two or more segments may occur at various places including the function mapper's input level, the function mapper's body level, the function mapper's output level, or some combination of these levels…; And in 10:25-50: The segment picking process 900, the segment matching process 904, and the feature combination process 908 are examples of generative rules or combining rules (or methods and rules for generating new feature entries by using the extracted features and/or other information contained in the data and… As discussed above, these generative rules may have dynamic parameters to change and control their behavior, and these rules together with methods for changing or generating these dynamic parameters [the rule set comprising at least a portion of the evaluation metadata] may be designed at the design stage… As discussed above, an example of synthesis stage combining two or more segments at the function mapper's input level is described in FIG. 9. A corresponding synthesis method includes: performing a segment picking process 900 to pick two or more segments from feature envelopes stores 806; performing the segments matching process 904 to obtain matched segments 914; applying the feature combination process 908 to the time-matched segments to obtain the combined segment 918; feeding the combined segment 918 to the trained function mapper 402 [claimed machine learning  ]…)

Regarding claim 7, the rejection of claim 6 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 6, wherein the evaluation metadata further comprises one or more of the training data, classification metadata, convergence metrics, and efficacy metrics for the plurality of learned functions. (claimed evaluation data as data used during training of learned functions using backpropagation, in 7:19-: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators. Moreover, neural networks are known for their interpolation and extrapolation properties… In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)… During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 [claimed classification metadata] are presented at the output of the MLP as the desired outputs to be learned...; And learning using backpropagation in 7:18-60: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators… FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. In this embodiment the neural network is trained with a standard backpropagation-type algorithm. 
The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)... FIG. 6 is an example of a magnitude dependent weighting function that can be used for function mapper training. At this training (or design) stage and depending on the modeled data, the target application, and the type of function mapper used, a magnitude weighting (or normalizing) function F [computing claimed confidence metric for the classification as efficacy metrics] such as the one shown in FIG. 6 can be applied to analyzed dataset or feature envelopes or both during learning to ensure that all parts of the data are given an equivalent weight (i.e., values of same order of magnitude at the outputs of the function mapper)…)

Regarding claim 8, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, wherein the machine learning compiler module is configured to combine learned functions from the plurality of learned functions to form combined learned functions, the machine learning ensemble comprising at least one combined learned function. (claimed combination as the set of node learned functions learned to combine features for computing categorization output of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons [claimed machine learning compiler module is configured to combine learned functions from the plurality of learned functions to form combined learned functions, the machine learning ensemble comprising at least one combined learned function] and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain threshold. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 is obtained with its parameters (e.g., weights, biases) fitted to the training data…


    [Image: media_image1.png (greyscale)]
)

 Additionally, Senior teaches node learned functions combined as layers to form an ensemble combination as a neural network, as claimed (depicted in Fig. 6; And in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type a neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515). As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [the machine learning compiler module is configured to combine learned functions from the plurality of learned functions to form combined learned functions as learned node functions combined into a layer], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers." 
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs… As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [the machine learning ensemble comprising at least one combined learned function], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…; And using backpropagation in 17:59-18:10: As part of training (e.g., during or prior to), the associated speech samples and/or speech sample sub-segments could be processed with a signal processor (e.g. a digital signal processor) to generate target feature vectors associated with the stored sample text strings (and/or text-string sub-segments). The neural network training module 510 may function to compare the training-time predicted feature vectors 505 output by the neural network 506 with the target feature vector 509 obtained from the speech database 512 in order to update and/or adjust parameters (e.g., weights) of the neural network 506. 
More specifically, the neural network 506 may perform "forward propagation" of the input training-time phonetic-context descriptors [claimed use of rule sets] 503 to generate the training-time predicted feature vectors 505, and the neural network training module 510 may perform "back propagation" the inputs the input neural network training module 510 to update the neural network 506…)



    [Image: media_image2.png (greyscale)]
 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Drame and Senior for the same reasons disclosed above.

Regarding claim 9, the rejection of claim 8 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 8, wherein the function generator module is configured to determine one or more additional learned functions in response to a learned function request, the machine learning compiler module configured to request one or more additional learned functions from the function generator to combine with learned functions from the plurality of learned functions. (request by the synthesis process to reconstruct based on frequencies as claimed request, in 14:47-58: According to one embodiment the function mapper's outputs represent only a subset of the parameters needed for synthesis, namely the spectral peaks amplitudes A,. In that case the frequencies can be reconstructed using the pitch information from the input exemplars 1406 assuming the frequencies are harmonics of the pitch for instance. Therefore, in that case, the pitch information from the currently processed exemplar 1406 from the feature envelopes store 1402 is fed to the synthesis process 1300 [claimed function generator module is configured to determine one or more additional learned functions in response to a learned function request] in order to reconstruct the harmonic frequencies for each output frame. Alternatively, as discussed above, the MLP or a second MLP could also be trained on the frequency trajectories [the machine learning compiler module configured to request one or more additional learned functions from the function generator to combine with learned functions from the plurality of learned functions].)

Regarding claim 10, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, wherein the machine learning compiler module is configured to add one or more layers to at least a portion of the plurality of learned functions to form one or more extended learned functions, at least one of the one or more layers comprising one or more of a Bayes classifier and a Boltzmann machine, the at least one other learned function comprising an extended learned function extended with the one or more of the multiple learned functions. (claimed addition as hidden layer of nodes of neural network as claimed Boltzmann machine, in 15:16-27: According to one embodiment the architecture of the MLP [claimed the machine learning compiler module is configured to add one or more layers to at least a portion of the plurality of learned functions to form one or more extended learned functions, at least one of the one or more layers comprising one or more of … a Boltzmann machine] is characterized as having one input per extracted feature envelope to control and one output per analyzed dataset frame value to control. In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons and is fully connected. Other architectures are also suitable. 
For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected.; And claimed adding process for synthesizing a combination of the learned function, in 9:32-41: … operations for building or training one or more function mappers to map between extracted features and analyzed dataset (e.g., function mappers can possibly be trained or built to combine properties of two or more segments using a discriminant input); and (d) operations for designing and choosing generative rules and combination processes. The synthesis stage may include: (a) operations for feeding inputs (possibly including combined inputs) to one or more function mappers (possibly including function mappers trained or built to combine properties of two or more segments);…)

Regarding claim 11, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, wherein the machine learning compiler module is configured to form the machine learning ensemble by organizing the subset of learned functions into the machine learning ensemble, the machine learning ensemble comprising the subset of learned functions (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700 [the machine learning compiler module is configured to form the machine learning ensemble by organizing the subset of learned functions into the machine learning ensemble, the machine learning ensemble comprising the subset of learned functions]. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments into the trained function mappers one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain [the machine learning compiler module is configured to form the machine learning ensemble by organizing the subset of learned functions into the machine learning ensemble, the machine learning ensemble comprising the subset of learned functions] (i.e., the MLP neural network, the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…) 
and the rule set synthesized from the evaluation metadata for the subset of learned functions. (claimed function evaluator for performing claimed evaluation as backpropagation using claimed metadata as the weight parameters, in 14:53-65: According to one embodiment, during the training stage, each feature exemplar 1406 is successively presented at the input of the MLP and each corresponding analyzed dataset exemplar (or analysis frame) 1500 is presented at the output of the MLP as the desired outputs to be learned. According to one embodiment the MLP is trained according to a well known backpropagation algorithm [the rule set synthesized from the evaluation metadata for the subset of learned functions]. At this training (or design) stage a magnitude weighting (or normalizing) function such as 1700 in FIG. 17 can be applied to the analyzed dataset 1206 or feature envelopes 1402 or both during learning to ensure that all parts of the data are given an equivalent weight (e.g., values of same order of magnitude) [claimed the evaluation metadata for the subset of learned functions] at the outputs of the function mapper during training [claimed rule set synthesized from the evaluation metadata for the subset of learned functions] time…)
Additionally, Senior teaches node learned functions combined as layers to form an ensemble combination as a neural network, as claimed (depicted in Fig. 6; And in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type a neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515) [claimed rule set synthesized from the evaluation metadata for the subset of learned functions]. As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [the machine learning compiler module is configured to combine learned functions from the plurality of learned functions to form combined learned functions as learned node functions combined into a layer], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers." 
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs… As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs [evaluation metadata for the subset of learned functions using backpropagation]… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [the machine learning ensemble comprising at least one combined learned function including claimed rule set synthesized from the evaluation metadata for the subset of learned functions using backpropagation], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…; And using backpropagation in 17:59-18:10: As part of training (e.g., during or prior to), the associated speech samples and/or speech sample sub-segments could be processed with a signal processor (e.g. a digital signal processor) to generate target feature vectors associated with the stored sample text strings (and/or text-string sub-segments). The neural network training module 510 may function to compare the training-time predicted feature vectors 505 output by the neural network 506 with the target feature vector 509 obtained from the speech database 512 in order to update and/or adjust parameters (e.g., weights) of the neural network 506. 
More specifically, the neural network 506 may perform "forward propagation" of the input training-time phonetic-context descriptors [claimed use of rule sets] 503 to generate the training-time predicted feature vectors 505, and the neural network training module 510 may perform "back propagation" the inputs the input neural network training module 510 to update the neural network 506…)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Drame and Senior for the same reasons disclosed above.

Regarding claim 12, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, further comprising an orchestration module configured to direct workload data through the machine learning ensemble based on the evaluation metadata data to produce a classification for the workload data and a confidence metric for the classification, the evaluation metadata synthesized to form the rule set for the subset of learned functions. (in 9:26-47: … The modeling stage may include operations for training and building a model including: … (c) operations for building or training one or more function mappers to map between extracted features and analyzed dataset (e.g., function mappers can possibly be trained or built to combine properties of two or more segments using a discriminant input); and (d) operations for designing and choosing generative rules and combination processes [claimed orchestration module configured to direct workload data through the machine learning ensemble based on the evaluation metadata data to produce a classification for the workload data and a confidence metric for the classification, the evaluation metadata synthesized to form the rule set for the subset of learned functions]. The synthesis stage may include: (a) operations for feeding inputs (possibly including combined inputs) to one or more function mappers [claimed direct workload data through the machine learning ensemble based on the evaluation metadata data to produce a classification for the workload data…, the evaluation metadata synthesized to form the rule set for the subset of learned functions] (possibly including function mappers trained or built to combine properties of two or more segments); and (b) operations for feeding output of the function mappers to a synthesis process. 
As discussed below in greater detail, a combination of two or more segments may occur at various places including the function mapper's input level, the function mapper's body level, the function mapper's output level, or some combination of these levels…; And in 10:25-50: The segment picking process 900, the segment matching process 904, and the feature combination process 908 are examples of generative rules or combining rules (or methods and rules for generating new feature entries by using the extracted features and/or other information contained in the data and… As discussed above, these generative rules may have dynamic parameters to change and control their behavior [including a confidence metric for the classification], and these rules together with methods for changing or generating these dynamic parameters [the rule set comprising at least a portion of the evaluation metadata] may be designed at the design stage… As discussed above, an example of synthesis stage combining two or more segments at the function mapper's input level is described in FIG. 9. A corresponding synthesis method includes: performing a segment picking process 900 to pick two or more segments from feature envelopes stores 806; performing the segments matching process 904 to obtain matched segments 914; applying the feature combination process 908 to the time-matched segments to obtain the combined segment 918; feeding the combined segment 918 to the trained function mapper 402 …; using backpropagation in 7:18-60: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators… FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. 
In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)... FIG. 6 is an example of a magnitude dependent weighting function that can be used for function mapper training. At this training (or design) stage and depending on the modeled data, the target application, and the type of function mapper used, a magnitude weighting (or normalizing) function F [computing claimed confidence metric for the classification] such as the one shown in FIG. 6 can be applied to analyzed dataset or feature envelopes or both during learning to ensure that all parts of the data are given an equivalent weight (i.e., values of same order of magnitude at the outputs of the function mapper)…)

Regarding claim 13, the rejection of claim 1 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 1, further comprising an interface module configured to receive an analytics request from a client and to provide an analytics result to the client, the analytics request comprising workload data with similar features to the training data, the analytics result produced by the machine learning ensemble. (in 9:48-62: FIG. 9 illustrates an example embodiment of a synthesis stage that combines two or more segments at the function mapper's input level. The segment picking process 900 includes picking one or more feature envelope segments from one or more categorized feature envelopes datasets (or stores) 806 to determine segments 812. The rules and methods for segment picking including segment picking process (SPP) dynamic parameters 902 can be chosen or designed at design time. Examples of segment picking rules include, for example, random picking, manual picking [using claimed interface module configured to receive an analytics request from a client and to provide an analytics result to the client, the analytics request comprising workload data with similar features to the training data, the analytics result produced by the machine learning ensemble], or rules such as "pick the first segment of one dataset and the first segment of a second dataset, then pick the second segment of the first dataset and the second segment of the second dataset etc." As another example a rule can be "pick segments that fit a predetermined statistic distribution [the analytics request comprising workload data with similar features to the training data, the analytics result produced by the machine learning ensemble]…)

Regarding claim 14, Drame teaches: an apparatus for a machine learning factory, the apparatus comprising: means for generating executable program code for a plurality of learned functions from a plurality of different machine learning classes based on training data without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data, (claimed function generator means for learning claimed functions as mapping functions, in 3:30-43: … Certain embodiments provide methods for training a single function mapper with a discriminant input on two or more data segments thus in effect combining the functionalities of two separate function mappers trained on two different data segments and thus obtaining a combined function mapper…; And using the different categories to learn from the dataset, where the data for the extracted features is analyzed separately, claimed without regard to suitability, to form the mapping to a category, e.g., a classifier, in 19:4-21: FIG. 23B illustrates a combination of analyzed datasets in the same training set. According to one embodiment, the analyzed dataset exemplars 2308A, 2308B are combined successively in time so that each analyzed dataset exemplar matches the corresponding feature envelopes exemplar in time in a combined analyzed data set 2310. FIG. 24 is an illustration of a function mapper 2400 with 10 inputs 2408 including one discriminant input 2406. During the backpropagation training with discriminant input process 2404, the two segments are presented to the neural network 2400 for learning. During training, the discriminant input informs the function mapper of what type (or category) of segment it is currently learning. This process is an example of combination of two or more segments at the function mapper's body level. 
The neural network actually learns properties from both segments and the resulting trained MLP with discriminant input 2402 is able to discriminate and interpolate between them...; claimed plurality of learned function mapper of the neural network, in 20:40-49: …Next in FIG. 27A train a different Neural Network MLP function mapper 1602 on each different datasets, to map each extracted feature exemplar 1406 to each corresponding analyzed dataset exemplar 1500 according to training process 1606 where the outputs exemplars are spectral peaks amplitudes and obtain a trained function mapper neural network 1606 for each dataset. The training process may use a magnitude-dependent weighting or normalizing function 1700 to give equivalent weights to each value of the analyzed dataset exemplars…)
the training data received for forming a machine learning ensemble customized for the training data; (claimed training data as extracted feature datasets received for analysis, in 3:24-30: Certain embodiments provide methods for generating dynamically controllable data composites from two or more data segments comprising the steps of: (1) building or training one or more function mappers to map between one or more extracted feature envelopes datasets [the training data received for forming a machine learning ensemble customized for the training data] from the original data and one or more analyzed dataset fitting one or more general parametric representation of the original data;…)
means for evaluating the plurality of learned functions using test data to and generate evaluation metadata stored in one or more non-transitory computer readable storage media, the evaluation metadata indicating an effectiveness of different learned functions at making predictions based on different subsets of the test data; and (claimed function evaluator for performing claimed evaluation as backpropagation using claimed metadata as the weight parameters, in 14:53-65: According to one embodiment, during the training stage, each feature exemplar 1406 is successively presented at the input of the MLP and each corresponding analyzed dataset exemplar (or analysis frame) 1500 is presented at the output of the MLP as the desired outputs to be learned. According to one embodiment the MLP is trained according to a well known backpropagation algorithm. At this training (or design) stage a magnitude weighting (or normalizing) function [the evaluation metadata comprising one or more of an indicator of a training data set used to generate a learned function and an indicator of one or more decisions made by a learned function during the machine learning evaluation] such as 1700 in FIG. 17 can be applied to the analyzed dataset 1206 or feature envelopes 1402 or both during learning to ensure that all parts of the data are given an equivalent weight (e.g., values of same order of magnitude) at the outputs of the function mapper during training time…)
means for compiling executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble, the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata, and comprising a rule set synthesized from the evaluation metadata to direct different subsets of the workload data through executable program code from different learned functions of the multiple learned functions based on the evaluation metadata. (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments into the trained function mappers one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network, the SMS synthesizer etc) [the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata] can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…)
(claimed rule set as the set of node learned functions learned to combine features for computing categorization output of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain threshold. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 [claimed the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input] is obtained with its parameters (e.g., weights, biases) fitted to the training data [claimed different subsets of the data based on the evaluation metadata]…; And in 7:16-44: FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames).
Both the feature envelope exemplars 502 (or frames) and analyzed dataset exemplars 504 (or frames) can be one-dimensional or multi-dimensional depending on the application. Input patterns for an index i may include input frames at i-1, i-2, ... i-n and/or i+1, i+2, ... i+n. During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned [claimed executable program code from one or more of the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input.].


[Figure: media_image1.png]


)
Examiner notes that the recited modules/means are taught in Drame as hardware executing computer instructions to perform the claimed functions, including executing computer instructions to receive and process instructions and produce outputs, in 23:9-24:44: Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules or hardware-implemented modules. A hardware-implemented module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner… In various embodiments, a hardware-implemented module (e.g., a computer-implemented module) may be implemented mechanically or electronically. For example, a hardware-implemented module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations… Hardware-implemented modules can provide information to, and receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled… The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).

While Drame teaches a neural network that assembles a set of functions used to synthesize an output from the grouping (e.g. ensemble) of functions, as the set of nodes for computing a desired output learned by the node functions using backpropagation as disclosed above, Drame does not expressly teach the use of a neural network node ensemble where the nodes are regression classifier nodes for grouping functions to perform a desired output (e.g. a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions as a node ensemble into layered sub-categories …).
Senior does expressly teach the use of a neural network node ensemble where the nodes are regression classifiers for grouping functions trained using contextual data subsets to perform a desired output (e.g. evaluation metadata indicating an effectiveness of different learned functions at making predictions based on different subsets … of data) (as depicted in Fig. 6 and in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type a neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515) [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers."
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs…[claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata] As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [claimed synthesized from the evaluation metadata to direct data through the multiple learned functions], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…;


[Figure: media_image2.png]
 

Alternatively, Senior teaches learning using neural networks without regard to suitability, as the use of training data that corresponds to contextual information, where the learning process for discovering the node learned functions is applied to the data to learn the wide variety of information in the provided example training data, in 6:8-30: In accordance with example embodiments, a parameter generation module of a speech synthesis system (e.g., TTS system) may include a neural network that can be trained to receive sequences of phoneme labels (or other types phonetic labels) in and/or accompanied by a wide variety of contexts, and to map them to acoustic feature vectors through a process of learning to account for the wide variety of contexts… The meaning of "large number and variety of contexts" may be taken to correspond to a body large enough at least to stretch practical implementation limits of conventional techniques, necessitate at least some degree of approximation, and/or impose accuracy limits on generated feature vectors. The very challenge of such a large body of context information can be turned to an advantage in training a neural network, where a wide the variety of examples presented can lead to a wide the variety of examples learned, and correspondingly to versatility [claimed the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data] when the trained neural network is applied to data and circumstances beyond the training regime.)
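For illustration only, the node computation quoted from Senior above (each node computing a sigmoidal nonlinearity of a weighted sum of its inputs, with the output of each node of a given layer connected to the input of every node in the next layer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Senior or Drame: the function names, the bias term, and the list-based weight layout are hypothetical and chosen only for readability.

```python
import math

def sigmoid(x):
    """Sigmoidal nonlinearity applied by each node."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias):
    """One node: sigmoid of a weighted sum of its inputs (bias is an assumption)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

def layer_output(inputs, weight_rows, biases):
    """Fully connected layer: every node receives every output of the previous layer."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

def feed_forward(inputs, layers):
    """Pass the input through each layer in order; `layers` is a list of
    (weight_rows, biases) pairs, one pair per layer after the input layer."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer_output(activations, weight_rows, biases)
    return activations
```

In this sketch the adjustable parameters of each node are its weights and bias; training (for example by backpropagation) would adjust those values and is not shown.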
The Drame and Senior references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically train node learning functions used to ensemble a neural network model for learning features and predicting a desired outcome in a multi-processor system as disclosed by Senior with the method for automated training of one or more function mappers to map input to drive a synthesis process for determining composite outputs as disclosed by Drame.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Senior and Drame in order to train a neural network implemented by one or more processors to map training-time sequences into predicted vectors that correspond to the training data and its contextual properties to determine predictive outcomes (Senior, 5:37-41). Doing so can help enhance the accuracy of the predicted outcomes when mapping the context information associated with the training data (Senior 5:33-41) and enable versatility when the trained functions are applied to data and circumstances beyond the training regime (Senior 6:24-30).
While Drame and Senior discuss the use of backpropagation for evaluating the learned functions by adjusting weights as disclosed above, Drame and Senior do not expressly disclose evaluating the plurality of learned functions using test data to generate evaluation metadata. Burg does expressly teach evaluating learned functions using test data to generate evaluation metadata (as testing the trained learning system as depicted in Fig., in 0035: In general, learning systems have multiple phases of operation. The initial phase is known as the training phase. During the training phase, a set of training data can be input into the learning system. The learning system learns to optimize its output for data during the processing of the training data. Next, a set of validation data can be input into the learning system. The results of processing of the validation data set [claimed test data] by the learning system can be measured using a variety of evaluation metrics to evaluate the performance of the learning system [claimed learned functions using test data and to maintain evaluation metadata for the plurality of learned functions]. The learning system can alternate between the training and validation data to optimize system performance. Once the learning system achieves a desired level of performance, the parameters of the learning system can be fixed such that performance will remain constant before the learning system enters into the operational phase. During the operational phase, which typically follows both training and validation, users can utilize the learning system to process operational data and obtain the users' desired results.)
The Burg, Senior and Drame references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data by automatically using a machine learning system for information processing tasks as disclosed by Burg with the method for automated training of one or more function models and the synthesis process for determining desired outputs as collectively disclosed by Drame and Senior.


Regarding claim 15, the rejection of claim 14 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 14, further comprising means for synthesizing the evaluation metadata into a rule set for the subset of learned functions, wherein the means for compiling executable code to form the machine learning ensemble further comprises means for including the rule set in the machine learning ensemble. (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments into the trained function mappers [claimed further comprising means for synthesizing the evaluation metadata into a rule set for the subset of learned functions, wherein the means for compiling executable code to form the machine learning ensemble further comprises means for including the rule set in the machine learning ensemble] one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network [claimed the machine learning ensemble using the selected subset of features] [means for including the rule set in the machine learning ensemble], the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…)


Regarding claim 16, the rejection of claim 14 is incorporated and Drame in combination with Senior and Burg teaches the apparatus of claim 14, wherein the means for compiling executable program code forms the machine learning ensemble by one or more of: combining learned functions from the plurality of learned functions to form a combined learned function; and adding one or more layers to a learned function from the plurality of learned functions to form an extended learned function. (claimed addition as hidden layer of nodes of neural network, in 15:16-27: According to one embodiment the architecture of the MLP [claimed adding one or more layers to a learned function from the plurality of learned functions to form an extended learned function as a layer in the neural network] is characterized as having one input per extracted feature envelope to control and one output per analyzed dataset frame value to control. In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected.; And claimed combination of the learned function, in 9:32-41: … operations for building or training one or more function mappers to map between extracted features and analyzed dataset (e.g., function mappers can possibly be trained or built to combine properties of two or more segments using a discriminant input); and (d) operations for designing and choosing generative rules and combination processes [claimed combining learned functions from the plurality of learned functions to form a combined learned function].
The synthesis stage may include: (a) operations for feeding inputs (possibly including combined inputs) to one or more function mappers (possibly including function mappers trained or built to combine properties of two or more segments);…)

Regarding claims 18 and 19, the rejection of claim 17 is incorporated and Drame in combination with Senior and Burg teaches the computer program product of claim 17, wherein the operations further comprise evaluating the plurality of learned functions using … data to generate the evaluation metadata. (claimed function evaluator for performing claimed evaluation as backpropagation for generating claimed meta data as the weight parameters, in 14:53-65: According to one embodiment, during the training stage, each feature exemplar 1406 is successively presented at the input of the MLP and each corresponding analyzed dataset exemplar ( or analysis frame) 1500 is presented at the output of the MLP as the desired outputs to be learned. According to one embodiment the MLP is trained according to a well known backpropagation algorithm.  At this training ( or design) stage a magnitude weighting ( or normalizing) function [the operations further comprise evaluating the plurality of learned functions using … data to generate the evaluation metadata] such as 1700 in FIG. 17 can be applied to the analyzed dataset 1206 or feature envelopes 1402 or both during learning to ensure that all parts of the data are given an equivalent weight ( e.g., values of same order of magnitude) at 65 the outputs of the function mapper during training time… )
wherein evaluating the plurality of learned functions comprises generating a machine learning ensemble for each possible combination of features of the training data and evaluating each generated machine learning ensemble using the … data. (as depicted in Fig. 5 nodes for learning the connected nodes as claimed combination of features as the set of node learned functions learned to combine features of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data] and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected [generating a machine learning ensemble for each possible combination of features of the training data and evaluating each generated machine learning ensemble using the … data as training data]. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain threshold. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 [claimed the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input] is obtained with its parameters (e.g., weights, biases) fitted to the training data [claimed different subsets of the data based on the evaluation metadata]…; And in 7:16-44: FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment.
In this embodiment the neural network is trained with a standard backpropagation-type algorithm [claimed learning using training data]. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames). Both the feature envelope exemplars 502 (or frames) and analyzed dataset exemplars 504 (or frames) can be one-dimensional or multi-dimensional depending on the application. Input patterns for an index i may include input frames at i-1, i-2, ... i-n and/or i+1, i+2, ... i+n. During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned.


[Figure: media_image1.png]


)

While Drame and Senior discuss the use of backpropagation for evaluating the learned functions by adjusting weights as disclosed above, Drame and Senior do not expressly disclose learned functions using test data and to maintain evaluation metadata for the plurality of learned functions. Burg does expressly teach claim 1 limitation: …learned functions using test data and to maintain evaluation metadata for the plurality of learned functions. (as testing the trained learning system as depicted in  Fig., in 0035: In general, learning  systems have multiple phases of  operation.  The  initial  phase  is  known  as  the  training phase.  During the training phase,  a set of training data can be input into the learning system. The learning system learns to optimize  its output for data during the processing  of the training data. Next, a set of validation data can be input into the  learning  system.  The  results  of  processing  of the validation data set [claimed test data] by the learning system can be measured using a variety of evaluation metrics to evaluate the performance of  the  learning  system [claimed learned functions using test data and to maintain evaluation metadata for the plurality of learned functions].  The  learning  system  can  alternate between the training and validation data to optimize system performance.  Once the learning  system achieves  a desired level of performance, the parameters of the learning system can  be  fixed such  that  performance  will  remain  constant before the learning system enters into the operational phase. During the operational  phase, which typically  follows  both training and validation, users can utilize the learning system to  process  operational  data  and  obtain  the  users'  desired results. )
The Burg, Senior and Drame references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data by automatically using a machine learning system for information processing tasks as disclosed by Burg with the method for automated training of one or more function models and the synthesis process for determining desired outputs as collectively disclosed by Drame and Senior.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Burg, Senior and Drame in order to train learning models that can be used for a variety of data processing or analysis tasks, with training phases using both training and validation data (Burg, 0004). Doing so can enable the learning system to process operational data and obtain the users' desired result (Burg, 0035), and can enable machine learning systems to be trained to optimize system output, this being the "learning" aspect, by evaluating the output that the learning system generates from the training data and using it to update the learning system (Burg, 0031).
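For illustration only, the phases of operation quoted from Burg at 0035 (training, measuring the results on validation data with an evaluation metric, alternating until a desired level of performance is reached, then fixing the parameters for the operational phase) can be sketched as follows. This is a minimal sketch under stated assumptions, not Burg's implementation: the one-variable linear model, the mean-squared-error metric, the learning rate, and all names are hypothetical choices made to keep the example self-contained.

```python
def mean_squared_error(model, data):
    """Evaluation metric measured on a data set (assumed metric)."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_step(model, data, lr):
    """One gradient-descent update of the parameters on the training data."""
    w, b = model
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)

def fit(train, validation, target_error, lr=0.05, max_rounds=5000):
    """Alternate between training and validation until the validation
    error reaches the desired level, recording each measured error."""
    model = (0.0, 0.0)
    history = []  # per-round validation errors kept as evaluation records
    for _ in range(max_rounds):
        model = train_step(model, train, lr)          # training phase
        error = mean_squared_error(model, validation)  # validation phase
        history.append(error)
        if error <= target_error:
            break
    return model, history

# Hypothetical data drawn from y = 2x + 1.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
validation = [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0)]

model, history = fit(train, validation, target_error=1e-4)

def predict(x):
    """Operational phase: parameters are fixed and only applied to new data."""
    w, b = model
    return w * x + b
```

The `history` list is one possible way to retain the measured results of each validation pass; whether such records correspond to the claimed evaluation metadata is a matter of claim interpretation and is not asserted by this sketch.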

Claims 17 and 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over Drame (US Pat. No. 9,147,166) in view of Senior et al. (US Pat. No. 8,527,276, hereinafter ‘Senior’).

Regarding claim 17, Drame teaches: a computer program product comprising a non-transitory computer readable storage medium storing computer usable program code executable to perform operations for a machine learning factory, the operations comprising: determining executable program (claimed function generator module for claimed functions as mapping functions into two or more data segment training datasets, in 3:30-43: … (2) combining 30 the extracted feature envelopes, and/or the function mappers using two or more data segments [claimed the training data received for forming a machine learning ensemble customized for the training data]; (3) feeding the feature envelopes or combining feature envelopes to function mapper or combination of function mappers; Certain embodiments provide methods for combining 35 extracted feature envelopes from one or more data segments ( or regions) thus ensuring more realistic features parameters correlations across exemplars. Certain embodiments provide methods for training a single function mapper with a discriminant input on two or more 40 data segments thus in effect combining the functionalities of two separate function mappers trained on two different data segments and thus obtaining a combined function mapper…; And in using the different categories to learn from the dataset where the analysis is data to extracted features is analyzed separately, claimed without suitability, form the mapping to a category, e.g. classifier, in 19:4-21: FIG. 23B illustrates a combination of analyzed datasets in the same training set. According to one embodiment, the analyzed dataset exemplars 2308A, 2308B are combined suc­cessively in time so that each analyzed dataset exemplar matches the corresponding feature envelopes exemplar in time in a combined analyzed data set 2310. FIG. 24 is an illustration of a function mapper 2400 with 10 inputs 2408 including one discriminant input 2406. 
During the backpropagation training with discriminant input process 2404, the two segments [claimed the training data received for forming a machine learning ensemble customized for the training data] are presented to the neural network 2400 for learning. During training, the discriminant input informs the function mapper of what type (or category) of 15 segment it is currently learning. This process is an example of combination of two or more segments [claimed the training data received for forming a machine learning ensemble customized for the training data] at the function map­per's body level. The neural network actually learns proper­ties from both segments [determining executable program code for a plurality of learned functions from a plurality of different machine learning classes using training data without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data] and the resulting trained MLP with discriminant input 2402 is able to discriminate and interpolate between them...; claimed plurality of learned function mapper of the neural network, in 20:40-49: …Next in FIG. 27A train a different Neural Network MLP function mapper 1602 on each different datasets, to map each extracted feature exemplar 1406 to each corresponding ana­lyzed dataset exemplar 1500 according to training process 1606 where the outputs exemplars are spectral peaks ampli­tudes and obtain a trained function mapper neural network 1606 for each dataset. The training process may use a mag-nitude-dependent weighting or normalizing function 1700 to give equivalent weights to each value of the analyzed dataset exemplars…)
selecting a subset of the features of the training data based on evaluation metadata generated for the plurality of learned functions and stored in one or more non-transitory computer readable storage media, (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21: 3-16: Finally in FIG. 27B obtain an audio composite 2700 [selecting a subset of the features of the training data based on evaluation metadata generated for the plurality of learned functions and stored in one or more non-transitory computer readable storage media]. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxo­phone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1).  The audio embodiments presented above enable feeding the segments into the trained function mappers [selecting a subset of the features of the training data based on evaluation metadata generated for the plurality of learned functions and stored in one or more non-transitory computer readable storage media] one exemplar 10 at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network, the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of 15 the chain at synthesis stage, one frame of audio is output at the output…) 

the evaluation metadata comprising an effectiveness metric for a learned function; and (claimed evaluation data as data used during training of learned functions using backpropagation, in 7:19-: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators. Moreover, neural networks are known for their interpolation and extrapolation properties… In this embodiment the neural network is trained with a standard backpropagation-type algorithm [the evaluation metadata comprising an effectiveness metric for a learned function]. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)… During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned... And learning using backpropagation in 7:18-60: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators… FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)... FIG. 6 is an example of a magnitude dependent weighting function [computing claimed evaluation metadata comprising an effectiveness metric for a learned function] that can be used for function mapper training. At this training (or design) stage and depending on the modeled data, the target application, and the type of function mapper used, a magnitude weighting (or normalizing) function F such as the one shown in FIG. 6 can be applied to analyzed dataset or feature envelopes or both during learning to ensure that all parts of the data are given an equivalent weight (i.e., values of same order of magnitude at the outputs of the function mapper)…)
compiling executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble, the machine learning ensemble comprising at least two learned functions from the plurality of learned functions, the at least two learned functions using the selected subset of features, the at least two learned functions selected and combined based on the evaluation metadata, (claimed learning ensemble as the chain of trained function mappers to process in a chain to obtain a composite output, in 21:3-16: Finally in FIG. 27B obtain an audio composite 2700. For instance in the case where the original segments are a segment from a cow mooing and a segment a saxophone phrase, the composite could sound like "cow that has swallowed a saxophone." Note that in audio embodiment #3 segments are also combined at input levels (e.g., as in audio embodiment #1). The audio embodiments presented above enable feeding the segments [claimed at least two learned functions using the selected subset of features, the at least two learned functions selected and combined based on the evaluation metadata] into the trained function mappers [the machine learning ensemble comprising the subset of multiple learned functions selected and combined based on the evaluation metadata for the plurality of learned functions] one exemplar at a time while still taking the full segments information such as duration, into account. Furthermore, each process in the chain (i.e., the MLP neural network, the SMS synthesizer etc) can be implemented to work in real-time. Therefore every time a feature envelope exemplar is presented at the input of the chain at synthesis stage, one frame of audio is output at the output…)
the machine learning ensemble comprising a rule set synthesized from the evaluation metadata to direct data through executable program code from the at least two learned functions so that executable program code from different learned functions process different features of the selected subset of features. (claimed rule set as the set of node learned functions learned to combined features for computing categorization output of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons  and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected [claimed machine learning ensemble comprising a rule set synthesized from the evaluation metadata to direct data through executable program code from the at least two learned functions so that executable program code from different learned functions process different features of the selected subset of features]. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain thresh­old. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 [claimed the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input] is obtained with its param­eters (e.g., weights, biases) fitted to the training data [claimed different subsets of the data based on the evaluation metadata]…; And in 7:16-44: FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. 
In this embodiment the neural network is 30 trained with a standard backpropagation-type algorithm [compiling executable program code from a subset of multiple learned functions from the plurality of learned functions to form the machine learning ensemble, the machine learning ensemble comprising at least two learned functions from the plurality of learned functions, the at least two learned functions using the selected subset of features, the at least two learned functions selected and combined based on the evaluation metadata]. The input patterns are feature envelope exemplars 502 ( or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 ( or synthesis parameters frames). Both the feature envelope exemplars 502 (or frames) and analyzed 35 dataset exemplars 504 ( or frames) can be one-dimensional or multi-dimensional depending on the application. Input pat­terns for an index i may include input frames at i-1, i-2, ... i-n and/or i+l, i+2, ... i+n. During this training (or design) stage, feature envelope 40 exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned [claimed executable program code from one or more of the multiple learned functions receives output from executable program code of at least one other learned function of the multiple learned functions as an input.].




)
Examiner notes that the recited modules/means are taught in Drame as the hardware executing computer instructions to perform the claimed functions and the execution of computer instructions, in 23:9-24:44: Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules or hardware-implemented modules. A hardware-implemented module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner… In various embodiments, a hardware-implemented module (e.g., a computer-implemented module) may be implemented mechanically or electronically. For example, a hardware-implemented module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations… Hardware-implemented modules can provide information to, and receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled… The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).

While Drame teaches the neural network of assembling a set of functions used to synthesize an output from the grouping (e.g. ensemble) of functions as the set of nodes for computing a desired output learned by the node functions using backpropagation as disclosed above, Drame does not expressly teach the use of the neural network nodes ensemble where the nodes are regression classifier modes for grouping functions to perform a desired output (e.g. a rule set synthesized from the 
Senior does expressly teach the use of the neural network nodes ensemble where the nodes are regression classifiers for grouping functions trained using a contextual data subset to perform a desired output (e.g. evaluation metadata indicating an effectiveness of different learned functions at making predictions based on different subsets … of data) (as depicted in Fig. 6 and in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type of neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515) [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers." 
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs… [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata] As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [claimed synthesized from the evaluation metadata to direct data through the multiple learned functions], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…; 


 

Alternatively, Senior teaches learning using neural networks without regard to suitability, as the use of training data that corresponds to contextual information where the learning process for discovering the node learned functions is applied to the data to learn the wide variety of information of the provided example training data, in 6:8-30: In accordance with example embodiments, a parameter generation module of a speech synthesis system (e.g., TTS system) may include a neural network that can be trained to receive sequences of phoneme labels (or other types of phonetic labels) in and/or accompanied by a wide variety of contexts, and to map them to acoustic feature vectors through a process of learning to account for the wide variety of contexts… The meaning of "large number and variety of contexts" may be taken to correspond to a body large enough at least to stretch practical implementation limits of conventional techniques, necessitate at least some degree of approximation, and/or impose accuracy limits on generated feature vectors. The very challenge of such a large body of context information can be turned to an advantage in training a neural network, where a wide variety of examples presented can lead to a wide variety of examples learned, and correspondingly to versatility [claimed the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data] when the trained neural network is applied to data and circumstances beyond the training regime.)
The Drame and Senior references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose to provide an apparatus, system, method, and computer program product for information processing using automated machine learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically train node learning functions used to ensemble a neural network model for learning features and predicting a desired outcome in a multi-processor system as disclosed by Senior with the method for automated training of one or more function mappers to map input to drive a synthesis process for determining composite outputs as disclosed by Drame.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Senior and Drame in order to train a neural network implemented by one or more process to map 

Regarding claim 20, the rejection of claim 17 is incorporated. The limitation is similar to the limitation of claim 3 and is rejected under the same rationale.

Regarding claim 21, the rejection of claim 17 is incorporated and Drame in combination with Senior and Burg teaches the computer program product of claim 17, wherein the operations further comprise identifying one or more of the plurality of features as noisy and excluding the noisy features from the selected subset of features. (claimed excluding features not picked that fail the predetermined distribution, in 9:48-62: FIG. 9 illustrates an example embodiment of a synthesis stage that combines two or more segments at the function mapper's input level. The segment picking process 900 includes picking one or more feature envelope segments from one or more categorized feature envelopes datasets ( or stores) 806 to determine segments 812. The rules and methods for segment picking including segment picking process (SPP) dynamic parameters 902 can be chosen or designed at design time. Examples of segment picking rules include, for example, random picking, manual picking, or rules such as "pick the first segment of one dataset and the first segment of a second dataset, then pick the second segment of the first dataset and the second segment of the second dataset etc." As another example a rule can be "pick segments that fit a pre­determined statistic distribution [wherein the operations further comprise identifying one or more of the plurality of features as noisy and excluding the noisy features from the selected subset of features]…)

Regarding claim 22, the rejection of claim 17 is incorporated and Drame in combination with Senior teaches the computer program product of claim 17, wherein one or more of the features of the training data are selected by a user as required for inclusion in the subset of features. (in 14:4-: Feature envelopes are successions of feature envelope frames (or feature envelope exemplars). The values in the feature envelope frames (or feature envelope exemplars, or feature vectors) can be computed from analyzed dataset frames 1500 or 1502 from the audio slices by the feature extraction process 1400... As illustrated in FIG. 16A, the MLP is trained to map between feature envelope exemplars 1406 from the feature envelopes dataset 1402 [wherein one or more of the features of the training data are selected by a user as required for inclusion in the subset of features] and corresponding analyzed dataset exemplars 1500 from the analyzed dataset 1206. The dataset exemplars 1500 may include the amplitudes (or magnitudes) of the spectral peaks. Exemplars may also include other spectral information such as frequencies for example, and an MLP could be trained on the frequency trajectories to provide the frequencies information needed at synthesis time and thus model sounds that are not purely harmonic. An MLP could also be trained on the noise spectral envelopes to model noise like sounding sounds such as wind or breaths.)

Regarding claim 23, Drame teaches a machine learning ensemble comprising: executable program code for multiple learned functions synthesized from executable program code for a larger plurality of learned functions from a plurality of different machine learning classes, the multiple learned functions selected and combined based on evaluation metadata for an evaluation of the larger plurality of learned functions, wherein the larger plurality of learned functions are generated based on training data without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes for the training data; (claimed function generator module for claimed functions as mapping functions into two or more data segment training datasets, in 3:30-43: … (2) combining 30 the extracted feature envelopes, and/or the function mappers using two or more data segments; (3) feeding the feature envelopes or combining feature envelopes to function mapper or combination of function mappers; Certain embodiments provide methods for combining 35 extracted feature envelopes from one or more data segments ( or regions) thus ensuring more realistic features parameters correlations across exemplars. 
Certain embodiments provide methods for training a single function mapper with a discriminant input on two or more data segments thus in effect combining the functionalities of two separate function mappers trained on two different data segments [the multiple learned functions selected and combined based on evaluation metadata for an evaluation of the larger plurality of learned functions, wherein the larger plurality of learned functions are generated based on training data without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes for the training data] and thus obtaining a combined function mapper…; and in using the different categories to learn from the dataset, where the data analyzed to extract features is analyzed separately, claimed without regard to suitability, from the mapping to a category, e.g. classifier, in 19:4-21: FIG. 23B illustrates a combination of analyzed datasets in the same training set. According to one embodiment, the analyzed dataset exemplars 2308A, 2308B are combined successively in time so that each analyzed dataset exemplar matches the corresponding feature envelopes exemplar in time in a combined analyzed data set 2310. FIG. 24 is an illustration of a function mapper 2400 with 10 inputs 2408 including one discriminant input 2406. 
During the backpropagation training [executable program code for multiple learned functions synthesized from executable program code for a larger plurality of learned functions from a plurality of different machine learning classes, the multiple learned functions selected and combined based on evaluation metadata for an evaluation of the larger plurality of learned functions, wherein the larger plurality of learned functions are generated based on training data without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes for the training data] with discriminant input process 2404, the two segments are presented to the neural network 2400 for learning. During training, the discriminant input informs the function mapper of what type (or category) of 15 segment it is currently learning. This process is an example of combination of two or more segments at the function map­per's body level. The neural network actually learns proper­ties from both segments and the resulting trained MLP with discriminant input 2402 is able to discriminate and interpolate between them...; claimed plurality of learned function mapper of the neural network, in 20:40-49: …Next in FIG. 
27A train a different Neural Network MLP function mapper 1602 on each different datasets, to map each extracted feature exemplar 1406 to each corresponding ana­lyzed dataset exemplar 1500 according to training process [combined based on evaluation metadata for an evaluation of the larger plurality of learned functions] 1606 where the outputs exemplars are spectral peaks ampli­tudes and obtain a trained function mapper neural network 1606 for each dataset [the multiple learned functions selected and combined based on evaluation metadata for an evaluation of the larger plurality of learned functions, wherein the larger plurality of learned functions are generated based on training data without regard to a suitability of the larger plurality of learned functions and of the different machine learning classes for the training data]. The training process may use a mag-nitude-dependent weighting or normalizing function 1700 to give equivalent weights to each value of the analyzed dataset exemplars…)
a metadata rule set synthesized from the evaluation metadata for the plurality of learned functions for directing data through executable program code of different learned functions of the multiple learned functions to produce a result; (in 19:4-21: FIG. 23B illustrates a combination of analyzed datasets in the same training set. According to one embodiment, the analyzed dataset exemplars 2308A, 2308B are combined suc­cessively in time so that each analyzed dataset exemplar matches the corresponding feature envelopes exemplar in time in a combined analyzed data set 2310. FIG. 24 is an illustration of a function mapper 2400 with 10 inputs 2408 including one discriminant input 2406. During the backpropagation training  with discriminant input process 2404, the two segments are presented to the neural network 2400 for learning [a metadata rule set synthesized from the evaluation metadata for the plurality of learned functions for directing data through executable program code of different learned functions of the multiple learned functions to produce a result]. During training, the discriminant input informs the function mapper of what type (or category) of 15 segment it is currently learning. This process is an example of combination of two or more segments at the function map­per's body level. The neural network actually learns proper­ties from both segments and the resulting trained MLP with discriminant input 2402 is able to discriminate and interpolate between them...; claimed plurality of learned function mapper of the neural network, in 20:40-49: …Next in FIG. 
27A train a different Neural Network MLP function mapper 1602 on each different datasets, to map each extracted feature exemplar 1406 to each corresponding ana­lyzed dataset exemplar 1500 according to training process [learning using claimed evaluation metadata for the plurality of learned functions for directing data through executable program code of different learned functions of the multiple learned functions to produce a result] 1606 where the outputs exemplars are spectral peaks ampli­tudes and obtain a trained function mapper neural network 1606 for each dataset. The training process may use a mag-nitude-dependent weighting [the evaluation metadata for the plurality of learned functions for directing data through executable program code of different learned functions of the multiple learned functions to produce a result] or normalizing function 1700 to give equivalent weights to each value of the analyzed dataset exemplars…)

an orchestration module configured to direct the data through the executable program code of the different learned functions of the multiple learned functions based on the synthesized metadata rule set to produce the result. (claimed synthesized metadata rule set as the set of node learned functions learned to combined features for computing categorization output of the neural network, as depicted in Fig. 5 and in 15:19-31: …In one specific embodiment, for example, in the case where the number of partials peaks to be controlled is 100 and the number of feature envelopes is 3, the network has 100 outputs, 3 inputs, one hidden layer of 100 neurons and is fully connected. Other architectures are also suitable. For example, the number of hidden layer or number or hidden neurons can vary, or the MLP can be fully or partially connected [an orchestration module configured to direct the data through the executable program code of the different learned functions of the multiple learned functions based on the synthesized metadata rule set to produce the result]. Additionally, shunting units may be added to shunt the outputs to zero if the amplitude is below a certain thresh­old. At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604  is obtained with its param­eters (e.g., weights, biases) fitted to the training data [claimed an orchestration module configured to direct the data … based on the synthesized metadata rule set to produce the result]…; And in 7:16-44: FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. In this embodiment the neural network is 30 trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 ( or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 ( or synthesis parameters frames). 
Both the feature envelope exemplars 502 (or frames) and analyzed 35 dataset exemplars 504 ( or frames) can be one-dimensional or multi-dimensional depending on the application. Input pat­terns for an index i may include input frames at i-1, i-2, ... i-n and/or i+l, i+2, ... i+n. During this training (or design) stage, feature envelope 40 exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 are presented at the output of the MLP as the desired outputs to be learned [an orchestration module configured to direct the data through the executable program code of the different learned functions of the multiple learned functions based on the synthesized metadata rule set to produce the result.].




)
Examiner notes that the recited modules using the claimed programming code are taught in Drame as the hardware executing computer instructions to perform the claimed functions and the execution of computer instructions to receive processing instructions and outputs, in 23:9-24:44: Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules or hardware-implemented modules. A hardware-implemented module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner… In various embodiments, a hardware-implemented module (e.g., a computer-implemented module) may be implemented mechanically or electronically. For example, a hardware-implemented module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations… Hardware-implemented modules can provide information to, and receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled… The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).

While Drame teaches the neural network of assembling a set of functions used to synthesize an output from the grouping (e.g. ensemble) of different functions as the set of nodes for computing a desired output learned by the node functions using backpropagation as disclosed above, Drame does not expressly teach the use of the neural network nodes ensemble where the nodes are regression classifier modes for grouping functions to perform a desired output (e.g. a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions as nodes ensemble into layered sub-categorizes …).
Senior does expressly teach the use of the neural network nodes ensemble where the nodes are regression classifier modes for grouping functions to perform a desired output (e.g. a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions as nodes ensemble into layered sub-categories …) (as depicted in Fig. 6 and in 21:29-41: As described above, the acoustic parameter generation module 504 of the example speech synthesis system 500 may be implemented using a deep neural network, such as the neural network 506, in accordance with example embodiments. FIG. 6 is a schematic illustration of one type of neural network, namely a "feed-forward" neural network, that could be used for mapping phonetic transcriptions (e.g. training-time phonetic-context descriptors 503 and/or run-time phonetic-context descriptors 513) to acoustic feature vectors (e.g. training-time predicted feature vectors 505 and/or run-time predicted feature vectors 515) [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata]. As shown, a neural network 600 includes "layers" 602, 604, 606, and 608, labeled L1, L2, L3, and L4, respectively. Each layer includes a set of nodes [hidden layers having claimed a total number of the plurality of learned functions selected such that at least a subset of the plurality of learned functions are pseudo-randomly suitable for the training data], represented as circles in FIG. 6… In the example of FIG. 6, the output is one or more output acoustic feature vectors 603 (represented as vertical arrows), each corresponding to a frame of to-be-synthesized waveform data... The layers 604 (L2) and 606 (L3) may sometimes be referred to as "hidden layers."
Each node in the neural network 600 may correspond to a mathematical function [learned functions] having adjustable parameters, and from which can be computed a scalar output of one or more inputs… [claimed a rule set synthesized from the evaluation metadata to direct data through the multiple learned functions such that executable program code from different learned functions of the machine learning ensemble processes different subsets of the data based on the evaluation metadata] As shown, the output of each node of a given layer is connected to the input of every node in the next layer, except that the input layer receives its input from data presented to the neural network (e.g., phonetic-context descriptors 601 in the present example), and the output layer delivers output data from the neural network (e.g., output acoustic feature vectors 603 in the present example). Taking the example of a sigmoid function, each node could compute a sigmoidal nonlinearity of a weighted sum of its inputs… More generally, neural networks may be considered as implementations of a variety classes of regression algorithms and function approximators [claimed synthesized from the evaluation metadata to direct data through the multiple learned functions], including but not limited to conventional back-propagation neural networks, convolutional networks, time-delay neural networks, and mixture-density networks…; And using backpropagation in 17:59-18:10: As part of training (e.g., during or prior to), the associated speech samples and/or speech sample sub-segments could be processed with a signal processor (e.g. a digital signal processor) to generate target feature vectors associated with the stored sample text strings (and/or text-string sub-segments). 
The neural network training module 510 may function to compare the training-time predicted feature vectors 505 output by the neural network 506 with the target feature vector 509 obtained from the speech database 512 in order to update and/or adjust parameters (e.g., weights) of the neural network 506. More specifically, the neural network 506 may perform "forward propagation" of the input training-time phonetic-context descriptors [claimed use of rule sets] 503 to generate the training-time predicted feature vectors 505, and the neural network training module 510 may perform "back propagation" the inputs the input neural network training module 510 to update the neural network 506…)
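For illustration only, the kind of network described in the passages cited above (nodes that each compute a sigmoidal nonlinearity of a weighted sum of their inputs, with forward propagation followed by a back-propagation weight update against a target) can be sketched as below. This is a minimal sketch under stated assumptions and is not code from Senior or Drame: the function names (`make_network`, `forward`, `train_step`), the single scalar output, the squared-error loss, the learning rate, and the layer sizes are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    """Sigmoidal nonlinearity applied by each node."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_in, n_hidden, seed=0):
    """Create random weights; the last entry of each weight list is a bias.

    Hypothetical helper: sizes and the uniform initialisation are assumptions.
    """
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Forward propagation: every node takes a weighted sum of the
    previous layer's outputs and applies the sigmoid."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
              for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, out


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One back-propagation update; returns the loss measured before the update.

    Assumes a squared-error loss 0.5 * (out - target) ** 2.
    """
    hidden, out = forward(x, w_hidden, w_out)
    # Error signal at the output node (chain rule through the sigmoid).
    delta_out = (out - target) * out * (1.0 - out)
    # Error signals at the hidden nodes, using the pre-update output weights.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    # Gradient-descent updates of output-layer weights and bias.
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    w_out[-1] -= lr * delta_out
    # Gradient-descent updates of hidden-layer weights and biases.
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] -= lr * delta_hidden[j] * xi
        ws[-1] -= lr * delta_hidden[j]
    return 0.5 * (out - target) ** 2
```

Calling `train_step` repeatedly on the same input and target plays the role of comparing the predicted output with the target and adjusting the weights; a real system would iterate over many training examples and vector-valued outputs.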




[Image: media_image2.png, 844 x 662, greyscale]
 

Alternatively, Senior teaches learning using neural networks without regard to suitability, as the use of training data that correspond to contextual information where the learning process for discovering the node learned functions is applied to the data to learn the wide variety of information of the provided example training data, in 6:8-30: In accordance with example embodiments, a parameter generation module of a speech synthesis system (e.g., TTS system) may include a neural network that can be trained to receive sequences of phoneme labels (or other types phonetic labels) in and/or accompanied by a wide variety of contexts, and to map them to acoustic feature vectors through a process of learning to account for the wide variety of contexts… The meaning of "large number and variety of contexts" may be taken to correspond to a body large enough at least to stretch practical implementation limits of conventional techniques, necessitate at least some degree of approximation, and/or impose accuracy limits on generated feature vectors. The very challenge of such a large body of context information can be turned to an advantage in training a neural network, where a wide the variety of examples presented can lead to a wide the variety of examples learned, and correspondingly to versatility [claimed the different machine learning classes selected without regard to a suitability of the plurality of learned functions and of the different machine learning classes for the training data] when the trained neural network is applied to data and circumstances beyond the training regime.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing natural language data to automatically train node learning functions used to ensemble a neural network model for learning features and predicting a desired outcome in a multi-processor system as disclosed by Senior with the method for automated training of one or more function mappers to map input to drive a synthesis process for determining composite outputs as disclosed by Drame.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Senior and Drame in order to train a neural network implemented by one or more processors to map training-time sequences into predicted vectors that correspond to training data and its contextual properties to determine predictive outcomes (Senior, 5:37-41). Doing so can help enhance the accuracy of the predicted outcomes when mapping the context information associated with the training data (Senior 5:33-41) and enable versatility when the trained functions are applied to data and circumstances beyond the training regime (Senior 6:24-30).

Regarding claim 24, the rejection of claim 23 is incorporated and Drame in combination with Senior further teaches the machine learning ensemble of claim 23, further comprising a predictive correlation module configured to correlate one or more features of the multiple learned functions with a confidence metric associated with the result. (claimed predictive correlation module configured to correlate one or more features of the multiple learned functions with a confidence metric associated with the result using backpropagation, in 7:19-: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators. Moreover, neural networks are known for their interpolation and extrapolation properties… In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)… During this training (or design) stage, feature envelope exemplars 502 are presented at the input of the function mapper 500 and the corresponding analyzed data exemplars 504 [claimed classification metadata] are presented at the output of the MLP as the desired outputs to be learned...; And learning using backpropagation in 7:18-60: In accordance with one embodiment, a multilayer feed-forward perceptron neural network is used as a function mapper 400. Neural networks can be considered as representatives of a broad class of adaptive function mappers and have been shown to be universal function approximators… FIG. 5 shows a feed-forward Multilayer Perceptron (MLP) neural network used as a function mapper 500 according to one embodiment. 
In this embodiment the neural network is trained with a standard backpropagation-type algorithm. The input patterns are feature envelope exemplars 502 (or feature envelope frames) and the output patterns are analyzed dataset exemplars 504 (or synthesis parameters frames)... FIG. 6 is an example of a magnitude dependent weighting function that can be used for function mapper training. At this training (or design) stage and depending on the modeled data, the target application, and the type of function mapper used, a magnitude weighting (or normalizing) function F [predictive correlation module configured to correlate one or more features of the multiple learned functions with a confidence metric associated with the result] such as the one shown in FIG. 6 can be applied to analyzed dataset or feature envelopes or both during learning to ensure that all parts of the data are given an equivalent weight (i.e., values of same order of magnitude at the outputs of the function mapper)…)

(claimed list as indexed outputs from the trained mapper as depicted in Fig. 16: 

[Image: media_image3.png, 695 x 1187, greyscale]

In 15:29-58: At the end of the training process shown in FIG. 16A, a trained MLP neural network 1604 is obtained with its parameters (e.g., weights, biases) fitted to the training data… FIG. 16B shows an example embodiment of the process of feeding feature envelopes 1402 to a trained function mapper 1604 to obtain an approximation of the original audio data 1614. According to one embodiment original exemplars are presented, one exemplar frame 1406 at a time, to the inputs of the trained function mapper 1604. The trained function mapper outputs one output frame 1608 at a time [wherein the predictive correlation module is configured to provide a listing of the one or more features correlated with the result to a client]. For a given original input exemplar 1406, the values of an output frame 1608 are approximations of the original amplitudes A, in the original exemplar 1500. Each output frame 1608 is presented to the synthesis process 1300 to produce one frame of audio 1614 that is an approximation of the original audio data 1202. According to one embodiment the function mapper's outputs represent only a subset of the parameters needed for synthesis, namely the spectral peaks amplitudes A,… Alternatively, as discussed above, the MLP or a second MLP could also be trained on the frequency trajectories.; And claimed client as learning interface depicted in Fig. 16B, and claimed client user computer system for performing claimed process for providing claimed information, in 22:20-23:8: The computer system 2800 may further include a video display unit 2810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). 
The computer system 2800 also includes an alphanumeric input device 2812 (e.g., a keyboard), a user interface (UI)… The disk drive unit 2816 includes a machine-readable medium 2822 on which is stored one or more sets of data structures and instructions 2824 (e.g., software) embodying or utilizing any one or more of the methodologies or functions described herein... The instructions 2824 may be transmitted using the network interface device 2820 and any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, mobile telephone networks, plain old telephone (POTS) networks, and wireless data networks (e.g., WiFi and WiMax networks). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine,…)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Van Heeswijk et al. (NPL “GPU-accelerated and parallelized ELM ensembles for large-scale regression”): teaches node functions of a neural network to provide neural network ensemble classifiers for pattern recognition tasks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/O.O.A./              Examiner, Art Unit 2129                                                                                                                                                                                          
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129