DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 10/31/2017. Claims 1-20 are pending and have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 10/31/2017. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
Claim 20 recites “computer-readable storage medium.” The present Specification (see US 2019/0130261A1) in paragraph [0075] notes the following: “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.” Therefore, for examination purposes, the recitation of “computer-readable storage medium” in claim 20 has been interpreted as “non-transitory computer-readable storage medium.”



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term "best match" in each of claims 1, 13, and 20 is a relative term which renders the claim indefinite. The term "best match" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Paragraph [0022] of the present Specification (see US 2019/0130261A1) states that “the method can select a set of models (neural network models) to be combined based on the scenario descriptors of the existing models that best matches with the user requirement (new model descriptor and size).” This passage only notes that the user requirement can be a “new model descriptor and size”; it does not explain what a “best match” is or what metric or standard is used to determine a “best match.” Therefore, for examination purposes, any match is considered a “best match.”
The term "best set" in claim 7 is a relative term which renders the claim indefinite. The term "best set" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The claim recites that the selection of the “best set” is “based on a similarity amongst the neural network”; however, it is unclear whether the most similar neural network, the least similar neural network, or a neural network having some other degree of similarity would constitute the “best set.”
The term "best correlated" in claim 9 is a relative term which renders the claim indefinite. The term "best correlated" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Paragraph [0023] of the present Specification (see US 2019/0130261A1) states: “FIG. 2 exemplarily depicts a combination of models. One possible way of combining neural networks is to align the weights of two networks so that they are best correlated.” This passage only reiterates the claim language; it does not explain what “best correlated” means or what metric or standard is used to determine that “two networks” “are best correlated.” Therefore, for examination purposes, any two networks having any correlation are considered to be “best correlated.”
Claim 2 recites the limitation "the new model" in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 3 recites the limitation "the neural network" in line 1.  There is insufficient antecedent basis for this limitation in the claim.
Claim 4 recites the limitation "the new model" in line 4.  There is insufficient antecedent basis for this limitation in the claim.
Claim 4 recites the limitation "the classifiers" in line 3.  There is insufficient antecedent basis for this limitation in the claim.
Claim 7 recites the limitation "the neural network" in line 4.  There is insufficient antecedent basis for this limitation in the claim.
Claim 8 recites the limitation "the neural networks" in line 1. There is insufficient antecedent basis for this limitation in the claim.
Claim 8 recites the limitation "the neural networks" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim.
Claim 10 recites the limitation "the network" in line 1.  There is insufficient antecedent basis for this limitation in the claim.
Claim 13 recites the limitation "the neural network" in line 1.  There is insufficient antecedent basis for this limitation in the claim.
Claim 14 recites the limitation "the new model" in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 16 recites the limitation "the new model" in line 4.  There is insufficient antecedent basis for this limitation in the claim.
Each dependent claim is rejected based on the rationale of the claim from which it depends.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 2, 8-14, and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 1 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations:
neural network classification...comprising: selecting a set of models for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier, 
wherein the existing models are trained for different data features.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“using one or more hardware processors, executing instructions”), the above limitations in the context of this claim encompass neural network classification that includes selecting a set of models for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier (corresponds to evaluation and judgment with the assistance of pen and paper); and wherein the existing models are trained for different data features (corresponds to evaluation and judgment since training may be done by hand with the assistance of pen and paper).
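Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.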
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 2 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations:
wherein the requirement comprises a new model descriptor and a size of the new model.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass matching requirements in which the requirement includes new model descriptor and size (corresponds to evaluation and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 8 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
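Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations: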
further comprising combining different neural network models by factoring neural network structures, weights, and relationship between scenario classes as inputs.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass combining different neural network models by factoring neural network structures, weights, and relationship between scenario classes as inputs (corresponds to evaluation and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 9 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations:
wherein the combining the neural networks aligns weights of at least two networks so that the at least two networks are best correlated, and combines each pair of aligned weights by taking an average or a maximum value.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass combining the neural networks by aligning the weights of at least two networks so that the at least two networks are best correlated, and combining each pair of aligned weights by taking an average or a maximum value (corresponds to evaluation and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 10 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations:
wherein the combining is performed by simulating the network using random input data and finding change points of a classification result, and 
merging the neural networks based on observing different change point ranges.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass combining neural networks by simulating the network using random input data, finding change points of a classification result, and merging the neural networks based on observing different change point ranges (corresponds to observation, evaluation, and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 11 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification method. Each of the following limitations:
wherein at least two neural networks are combined into a single neural network.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass combining at least two neural networks into a single neural network (corresponds to evaluation and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 12 is directed to a computer-implemented neural network classification method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the computer-implemented neural network classification method of claim 1. Each of the limitations in claim 1, as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. Please see analysis on claim 1.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Moreover, the recitation of “embodied in a cloud-computing environment” generally links the use of a judicial exception to a particular technological environment or field of use, namely the cloud-computing technological environment. According to MPEP 2106.05(h), “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the recitation of “embodied in a cloud-computing environment” generally links the use of a judicial exception to a particular technological environment or field of use, which does not amount to significantly more than the exception itself. See MPEP 2106.05(h). The claim is not patent eligible.
Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 13 is directed to a computer-implemented neural network classification system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a neural network classification system. Each of the following limitations:
neural network classification...selecting a set of classifiers for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier, 
wherein the existing models are trained for different data features.
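as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“a processor; and a memory, the memory storing instructions to cause the processor to perform”), the above limitations in the context of this claim encompass neural network classification that includes selecting a set of classifiers for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier (corresponds to evaluation and judgment with the assistance of pen and paper); and wherein the existing models are trained for different data features (corresponds to evaluation and judgment since training may be done by hand with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a processor; and a memory, the memory storing instructions to cause the processor to perform”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.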
Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 14 is directed to a computer-implemented neural network classification system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented neural network classification system. Each of the following limitations:
wherein the requirement comprises a new model descriptor and a size of the new model.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“a processor; and a memory, the memory storing instructions to cause the processor to perform”), the above limitations in the context of this claim encompass matching requirements in which the requirement includes new model descriptor and size (corresponds to evaluation and judgment with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a processor; and a memory, the memory storing instructions to cause the processor to perform”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 19 is directed to a computer-implemented neural network classification system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the computer-implemented neural network classification system of claim 13. Each of the limitations in claim 13, as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. Please see analysis on claim 13.
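Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a processor; and a memory, the memory storing instructions to cause the processor to perform”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the recitation of “embodied in a cloud-computing environment” generally links the use of a judicial exception to a particular technological environment or field of use, which cannot integrate a judicial exception into a practical application. See MPEP 2106.05(h). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the recitation of “embodied in a cloud-computing environment” generally links the use of a judicial exception to a particular technological environment or field of use, which does not amount to significantly more than the exception itself. See MPEP 2106.05(h). The claim is not patent eligible.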
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 20 is directed to a computer program product, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer program product for neural network classification. Each of the following limitations:
selecting a set of classifiers for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier, 
wherein the existing classifiers are trained for different data features.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform”), the above limitations in the context of this claim encompass neural network classification that includes selecting a set of classifiers for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier (corresponds to evaluation and judgment with the assistance of pen and paper); and wherein the existing classifiers are trained for different data features (corresponds to evaluation and judgment since training may be done by hand with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8, 11, 13-15, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi et al. (US 10,878,318 B2) in view of Wang et al. (US 10,867,167 B2).
Regarding Claim 1,
Sharifi et al. teaches a computer-implemented neural network classification method, the method comprising:...wherein the existing models are trained for different data features (Col. 7 lines 23-27: “For some processing tasks (e.g., text-to-speech conversion), however, the server computing device 112 can select the selected ANN, perform the first portion of the processing task to obtain the intermediate processing results, and transmit the activations” and Col. 7 lines 49-52: “when the processing task is text-to-speech conversion, the output can be a sound or audio stream that is representative of the original digital media item (a string of text or a text file)” teach one model being trained for the text-to-speech conversion task in which the data feature being learned is associated with input data of a string of text or a text file; Col. 7 line 67 to Col. 8 line 6: “If the task is not text-to-speech conversion, the technique 400 can proceed to 416...At 416, the client computing device 104 can select one of the plurality of ANNs based on the set of operating parameters” and Col. 7 lines 43-45: “For example, when the processing task is speech-to-text, the output can be a text that is representative of the original digital media item (an audio or video file)” teach another model being trained for a speech-to-text task in which the data feature being learned is associated with an audio or video file (different from text file data); Fig. 2 teaches a computer-implemented method; Col. 2 lines 51-52: “FIGS. 3A-3C are diagrams of various artificial neural network (ANN) configurations” teaches implementing neural network classification).
Sharifi et al. does not appear to explicitly teach selecting a set of models for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier.
Wang et al. teaches selecting a set of models for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier (Col. 8 lines 28-32: “In Step 2), the training subsample set and the original data set obtained by the above clustering are used respectively to train a plurality of different deep network models, and the training process of each model is computed by multi-thread in parallel. The following three types of deep network models can be used here...” teaches selecting a set of models for combination to generate a new classifier (see Fig. 1) without requiring training data for the new classifier since the original data set is used; Col. 8 lines 40-47: “Step 2B) Strong deep learning detectors, such as the United Deep Model described above, which can detect pedestrians in the image more accurately and quickly than the basic deep network model, and have better countermeasures for complex scenes. The addition of a strong deep learning detector in CDN can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” teaches one example of selecting a model based on the description of the existing model being useful for learning complex scenes (corresponds to a scenario descriptor), which matches the requirement of a model that is able to “effectively ensure the detection effect of the overall model”).
Sharifi et al. and Wang et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate selecting a set of models for combination based on scenario descriptors of existing classifiers that best match with a requirement to generate a new classifier without requiring training data for the new classifier as taught by Wang et al. to the disclosed invention of Sharifi et al.

Regarding Claim 2,
Sharifi et al. in view of Wang et al. teaches the method of claim 1. 
Sharifi et al. further teaches wherein the requirement comprises a new model descriptor and a size of the new model (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches requirements including size of the layers in the model (model size) and model descriptors such as number of nodes and the amount of information in the bottleneck layer of the model).
Regarding Claim 3,
Sharifi et al. in view of Wang et al. teaches the method of claim 1. 
Sharifi et al. further teaches wherein the neural network comprises a distributed system having servers located at different physical locations (Col. 3 lines 41-51: “The client computing device 104 can communicate with a server computing device 112 via a network 116. The network 116 can be a local area network (LAN), a wide area network (WAN), e.g., the Internet, or a combination thereof. The term "server computing device" as used herein can refer to both a single server computing device and two or more server computing devices operating in a parallel or distributed architecture. For example, a machine learning model may be distributed over a plurality of server computing devices” teaches a distributed system in which there can be multiple servers working in parallel at various locations).
Regarding Claim 8,
Sharifi et al. in view of Wang et al. teaches the method of claim 1.
Wang et al. further teaches further comprising combining different neural network models by factoring neural network structures, weights, and relationship between scenario classes as inputs (Col. 8 lines 28-32: “In Step 2), the training subsample set and the original data set obtained by the above clustering are used respectively to train a plurality of different deep network models, and the training process of each model is computed by multi-thread in parallel. The following three types of deep network models can be used here...” teaches selecting a best set of models for combination to generate a new classifier (see Fig. 1) in which the relationships between different inputs are considered and analyzed using clustering; Col. 13 lines 15-23: “the present invention also proposes a re-sampling method based on a K-means clustering algorithm, in which the candidate sample frames extracted from the original data set are clustered according to certain characteristics, different types of pedestrian samples and non-pedestrian samples are obtained, and then they are input into different detection models for learning, so that the classifier can learn more concentrated sample characteristics” teaches clustering the input samples to obtain the relationship between frames for pedestrian and non-pedestrian samples (corresponding to scenario classes); Col. 8 lines 40-47: “Step 2B) Strong deep learning detectors, such as the United Deep Model described above, which can detect pedestrians in the image more accurately and quickly than the basic deep network model, and have better countermeasures for complex scenes. The addition of a strong deep learning detector in CDN can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” teaches selecting a neural network with a strong ...; Col. 4 lines 65-67 teaches that weights of the neural networks are factored into consideration).
Sharifi et al. and Wang et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising combining different neural network models by factoring neural network structures, weights, and relationship between scenario classes as inputs as taught by Wang et al. to the disclosed invention of Sharifi et al.
One of ordinary skill in the arts would have been motivated to make this modification because the combination of strong deep learning model with other deep neural network models “can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” (Wang et al. Col. 8 lines 40-47).
Regarding Claim 11,
Sharifi et al. in view of Wang et al. teaches the method of claim 1. 
Wang et al. further teaches wherein at least two neural networks are combined into a single neural network (Fig. 1 teaches combining multiple neural networks into a single neural network).
Sharifi et al. and Wang et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein at least two neural networks are combined into a single neural network as taught by Wang et al. to the disclosed invention of Sharifi et al.
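One of ordinary skill in the arts would have been motivated to make this modification because the combination of strong deep learning model with other deep neural network models “can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” (Wang et al. Col. 8 lines 40-47).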
Regarding Claim 13,
Claim 13 is substantially similar to claim 1, therefore claim 13 is rejected on the same ground as claim 1. 
Sharifi et al. further teaches a neural network classification system, said system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform (Fig. 2 and Col. 1 lines 39-42: “A computer-implemented technique and a computing system having one or more processors and a non-transitory memory storing a set of executable instructions for the technique are presented” teach a system with processor and memory storing instructions; Col. 2 lines 51-52: “FIGS. 3A-3C are diagrams of various artificial neural network (ANN) configurations” teaches implementing neural network classification).
Regarding Claim 14,
Claim 14 is substantially similar to claim 2, therefore claim 14 is rejected on the same ground as claim 2. 
Regarding Claim 15,
Claim 15 is substantially similar to claim 3, therefore claim 15 is rejected on the same ground as claim 3. 
Regarding Claim 18,
Sharifi et al. in view of Wang et al. teaches the system of claim 15.
Sharifi et al. further teaches further comprising matching a scenario descriptor with a model trained at said each server (Col. 6 lines 27-34: “In FIG. 3B, another example ANN 330 is illustrated. Again, the processing task is divided into client-side layers 334 and server-side layers 338 that are divided by a bottle-neck layer 342. This ANN 330 can be utilized, for example, when the network condition speed/bandwidth is low/poor (e.g., below the threshold) and the processing capabilities of the client computing device 104 are also low/poor (e.g., below a threshold)” teaches an example of matching a scenario descriptor of “when the network condition speed/bandwidth is low/poor” with an ANN model trained at the server to address the scenario; Col. 3 lines 41-51: “The client computing device 104 can communicate with a server computing device 112 via a network 116. The network 116 can be a local area network (LAN), a wide area network (WAN), e.g., the Internet, or a combination thereof. The term "server computing device" as used herein can refer to both a single server computing device and two or more server computing devices operating in a parallel or distributed architecture. For example, a machine learning model may be distributed over a plurality of server computing devices” teaches a distributed system in which there can be multiple servers).
Regarding Claim 20,
Claim 20 is substantially similar to claim 1, therefore claim 20 is rejected on the same ground as claim 1. 
Sharifi et al. further teaches a computer program product for terminology extraction, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform (Fig. 2 and Col. 1 lines 39-42: “A computer-implemented technique and a computing system having one or more processors and a non-transitory memory storing a set of executable instructions for the technique are presented” teach a system with processor and memory storing instructions; Col. 2 lines 16-18: “In other embodiments, the digital media item is an image file or a video file, and the processing task is image recognition or text recognition” teaches text recognition, which corresponds to terminology extraction).

Claims 4-6 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi et al. (US 10,878,318 B2) in view of Wang et al. (US 10,867,167 B2) and further in view of Aslan et al. (US 2017/0132528 A1).
Regarding Claim 4,
Sharifi et al. in view of Wang et al. teaches the method of claim 3.
Sharifi et al. further teaches further comprising:...defining a scenario descriptor for each of the classifiers (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches scenario descriptors such as limiting the amount of communication/network cost and latency for each of the ANNs; Figs. 3A-3C teach various artificial neural network (ANN) configurations); and 
receiving the requirement and a request for the new model by specifying a target scenario descriptor and a size limit of the new classifier (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches requirements including size of the layers in the model (model size) and Fig. 4 Step 404 teaches a request for a new model).
Sharifi et al. in view of Wang et al. does not appear to explicitly teach collecting labeled data at each server for training a model at said each server.
However, Aslan et al. teaches collecting labeled data at each server for training a model at said each server (pg. 8 [0065]: “FIG. 8 illustrates an exemplary computing system environment 800 for implementing the joint training techniques and systems described herein. The environment 800 can include a computing device 802, which can represent any suitable computing device, or set of computing devices (e.g., server computers)” teaches the joint training system is implemented on servers; pg. 3 [0024]: “The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102...The training data 104 can further include at least two additional components: features and labels” teaches the joint training system, which is implemented by servers, collects and stores labeled training data; pg. 2 [0023]: “the unlabeled data accessible to the second model 102 can be unlabeled data that the first model 100 uses to generate an output that is passed to the second model 102 for joint training. In this manner, information can be passed between the first model 100 and the second model 102 and the second model 102 can learn from the first model 100 as the second model 102 is trained” teaches that in one scenario, generated output from one model (labeled data) is passed to another model for joint learning).
Sharifi et al., Wang et al., and Aslan et al. are analogous art because they are directed to implementing neural network classification.
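It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate collecting labeled data at each server for training a model at said each server as taught by Aslan et al. to the disclosed invention of Sharifi et al. in view of Wang et al.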
One of ordinary skill in the arts would have been motivated to make this modification in order to implement transfer learning between two models by using labeled data (Aslan et al. pg. 2 [0023]).
Regarding Claim 5,
Sharifi et al. in view of Wang et al. in view of Aslan et al. teaches the method of claim 4.
Sharifi et al. further teaches...and wherein said each server accepts the request for the new model, by specifying the target scenario descriptor and the size limit of the new classifier (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches requirements including size of the layers in the model (model size) and target scenario descriptor such as limiting the amount of communication/network cost and latency for each of the ANNs, which is based on information about number of nodes and the amount of information in the bottleneck layer of the model; Fig. 4 Step 404 teaches a request for a new model; Figs. 3A-3C further teach a server accepting a request for a new model).
Aslan et al. further teaches wherein said each server has a method of collecting labeled data locally and training a neural network model (pg. 8 [0065]: “FIG. 8 illustrates an exemplary computing system environment 800 for implementing the joint training techniques and systems described herein. The environment 800 can include a computing device 802, which can represent any suitable computing device, or set of computing devices (e.g., server computers)” teaches the joint training system is implemented on local servers; pg. 3 [0024]: “The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102...The training data 104 can further include at least two additional components: features and labels” teaches the joint training system, which is implemented by servers, collects and stores labeled training data; pg. 2 [0023]: “the unlabeled data accessible to the second model 102 can be unlabeled data that the first model 100 uses to generate an output that is passed to the second model 102 for joint training. In this manner, information can be passed between the first model 100 and the second model 102 and the second model 102 can learn from the first model 100 as the second model 102 is trained” teaches that in one scenario, generated output from one model (labeled data) is passed to another model for joint learning; pg. 4 [0033]: “an optimization problem can be solved during joint training by optimizing an objective function jointly with respect to weight parameters of multiple models being trained in parallel” teaches training neural networks in parallel).
Sharifi et al., Wang et al., and Aslan et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein said each server has a method of collecting labeled data locally and training a neural network model as taught by Aslan et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement transfer learning between two models by using labeled data (Aslan et al. pg. 2 [0023]).

Regarding Claim 6,
Sharifi et al. in view of Wang et al. in view of Aslan et al. teaches the method of claim 4.
Sharifi et al. further teaches further comprising matching a scenario descriptor with a model trained at said each server (Col. 6 lines 27-34: “In FIG. 3B, another example ANN 330 is illustrated. Again, the processing task is divided into client-side layers 334 and server-side layers 338 that are divided by a bottle-neck layer 342. This ANN 330 can be utilized, for example, when the network condition speed/bandwidth is low/poor (e.g., below the threshold) and the processing capabilities of the client computing device 104 are also low/poor (e.g., below a threshold)” teaches an example of matching a scenario descriptor of “when the network condition speed/bandwidth is low/poor” with an ANN model trained at the server to address the scenario; Col. 3 lines 41-51: “The client computing device 104 can communicate with a server computing device 112 via a network 116. The network 116 can be a local area network (LAN), a wide area network (WAN), e.g., the Internet, or a combination thereof. The term "server computing device" as used herein can refer to both a single server computing device and two or more server computing devices operating in a parallel or distributed architecture. For example, a machine learning model may be distributed over a plurality of server computing devices” teaches a distributed system in which there can be multiple servers).
Regarding Claim 16,
Sharifi et al. in view of Wang et al. teaches the system of claim 15.
Sharifi et al. further teaches further comprising:... defining a scenario descriptor for said each of the models (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches scenario descriptors such as limiting the amount of communication/network cost and latency for each of the ANNs; Figs. 3A-3C teach various artificial neural network (ANN) configurations); and 
receiving the requirement and a request for the new model by specifying a target scenario descriptor and a size limit of the new model (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches requirements including size of the layers in the model (model size) and target scenario descriptor such as limiting the amount of communication/network cost and latency for each of the ANNs, which is based on information about number of nodes and the amount of information in the bottleneck layer of the model; Fig. 4 Step 404 teaches a request for a new model).
Sharifi et al. in view of Wang et al. does not appear to explicitly teach collecting labeled data at each server for training a classifier at each server.
However, Aslan et al. teaches collecting labeled data at each server for training a classifier at each server (pg. 8 [0065]: “FIG. 8 illustrates an exemplary computing system environment 800 for implementing the joint training techniques and systems described herein. The environment 800 can include a computing device 802, which can represent any suitable computing device, or set of computing devices (e.g., server computers)” teaches the joint training system is implemented on servers; pg. 3 [0024]: “The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102...The training data 104 can further include at least two additional components: features and labels” teaches the joint training system, which is implemented by servers, collects and stores labeled training data; pg. 2 [0023]: “the unlabeled data accessible to the second model 102 can be unlabeled data that the first model 100 uses to generate an output that is passed to the second model 102 for joint training. In this manner, information can be passed between the first model 100 and the second model 102 and the second model 102 can learn from the first model 100 as the second model 102 is trained” teaches that in one scenario, generated output from one model (labeled data) is passed to another model for joint learning).
Sharifi et al., Wang et al., and Aslan et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate collecting labeled data at each server for training a classifier at each server as taught by Aslan et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement transfer learning between two models by using labeled data (Aslan et al. pg. 2 [0023]).
Regarding Claim 17,
Sharifi et al. in view of Wang et al. in view of Aslan et al. teaches the system of claim 16. 
Sharifi et al. further teaches...and wherein said each server accepts the request for the new model, by specifying the target scenario descriptor and the size limit of the new model (Col. 4 line 57 to Col. 5 line 1: “The task can be described as training the ANNs such that they work optimally from both a quality point of view, while also balancing other objectives. For example, one other objective can be to limit client-side computation to what is feasible on the device. This may be determined, for example, by how many pre-bottleneck layers there are and the size of each layer, which corresponds to increased computation costs. Another example objective can be to limit the amount of communication/network cost and latency. This may be determined, for example, by the number of nodes and the amount of information (level of fidelity) in the bottleneck layer” teaches requirements including size of the layers in the model (model size) and target scenario descriptor such as limiting the amount of communication/network cost and latency for each of the ANNs, which is based on information about number of nodes and the amount of information in the bottleneck layer of the model; Fig. 4 Step 404 teaches a request for a new model; Figs. 3A-3C further teach a server accepting a request for a new model).
Aslan et al. further teaches wherein each server has a method of collecting labeled data locally and training a neural network model (pg. 8 [0065]: “FIG. 8 illustrates an exemplary computing system environment 800 for implementing the joint training techniques and systems described herein. The environment 800 can include a computing device 802, which can represent any suitable computing device, or set of computing devices (e.g., server computers)” teaches the joint training system is implemented on local servers; pg. 3 [0024]: “The training data 104 can be stored in a database or repository of any suitable data, such as image data, speech data, text data, video data, or any other suitable type of data that can be processed by the machine learning models 100 and 102...The training data 104 can further include at least two additional components: features and labels” teaches the joint training system, which is implemented by servers, collects and stores labeled training data; pg. 2 [0023]: “the unlabeled data accessible to the second model 102 can be unlabeled data that the first model 100 uses to generate an output that is passed to the second model 102 for joint training. In this manner, information can be passed between the first model 100 and the second model 102 and the second model 102 can learn from the first model 100 as the second model 102 is trained” teaches that in one scenario, generated output from one model (labeled data) is passed to another model for joint learning; pg. 4 [0033]: “an optimization problem can be solved during joint training by optimizing an objective function jointly with respect to weight parameters of multiple models being trained in parallel” teaches training neural networks in parallel).
Sharifi et al., Wang et al., and Aslan et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein each server has a method of collecting labeled data locally and training a neural network model as taught by Aslan et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement transfer learning between two models by using labeled data (Aslan et al. pg. 2 [0023]).

Claims 7, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi et al. (US 10,878,318 B2) in view of Wang et al. (US 10,867,167 B2) and further in view of Wesolowski et al. (US 2019/0114537 A1).
Regarding Claim 7,
Sharifi et al. in view of Wang et al. teaches the computer-implemented method of claim 1.
Wang et al. further teaches wherein the selecting selects a best set of the existing models to be combined, based on the requirement of the new model specified and the scenario descriptor of the existing classifiers (Col. 8 lines 28-32: “In Step 2), the training subsample set and the original data set obtained by the above clustering are used respectively to train a plurality of different deep network models, and the training process of each model is computed by multi-thread in parallel. The following three types of deep network models can be used here...” teaches selecting a best set of models for combination to generate a new classifier (see Fig. 1); Col. 8 lines 40-47: “Step 2B) Strong deep learning detectors, such as the United Deep Model described above, which can detect pedestrians in the image more accurately and quickly than the basic deep network model, and have better countermeasures for complex scenes. The addition of a strong deep learning detector in CDN can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” teaches one example of selecting a model based on the description of the existing model as being useful for learning complex scenes (corresponds to the scenario descriptor), which matches the requirement of a new model that is able to “effectively ensure the detection effect of the overall model”).
Sharifi et al. and Wang et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the selecting selects a best set of the existing models to be combined, based on the requirement of the new model specified and the scenario descriptor of the existing classifiers as taught by Wang et al. to the disclosed invention of Sharifi et al.
One of ordinary skill in the arts would have been motivated to make this modification because the combination of strong deep learning model with other deep neural network models “can effectively ensure the detection effect of the overall model and further improve the detection performance of the strong deep learning detector” (Wang et al. Col. 8 lines 40-47).
Sharifi et al. in view of Wang et al. does not appear to explicitly teach wherein the selection is based on a similarity amongst the neural network comprising a distributed system having servers located at different geographical locations.
However, Wesolowski et al. teaches wherein the selection is based on a similarity amongst the neural network comprising a distributed system having servers located at different geographical locations (pg. 11 [0068] “master ML control system 21 may introduce check-point handshaking, wherein it compares the configuration of a first machine wherein a check-point is generated to the configuration of a target machine (or training group) to where a neural network model (or graph-segment of the neural network ML model) is to be transferred, identifies hyper-parameters of the training technique (e.g., distributed SGD, or other gradient descent-based technique) being used to train the neural network on the first machine, and adjusts at least part of these hyper-parameters in accordance with (hardware or performance) characteristics (e.g., type) of the target machine (or training group) so that the target machine (or training group) produces training results (e.g., weights or parameters) similar to, e.g., within a predefined percentage range, of training results achievable by the first machine if its training of the neural network ML model had not been interrupted (e.g., with its original hyper-parameter settings)” teaches selecting a model based on similarity in the training results produced amongst the neural network comprising various machines; pg. 16 [0093]: “computer system 1200 may be a server...Where appropriate, computer system 1200 may include one or more computer systems 1200; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks” teaches a distributed system having servers spanning multiple geographical locations).
Sharifi et al., Wang et al., and Wesolowski et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the selection is based on a similarity amongst the neural network comprising a distributed system having servers located at different geographical locations as taught by Wesolowski et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage a distributed cloud-based implementation of a machine learning system such that the system can “monitor the performance of each computing machine, and if necessary, transfer execution of a portion of the machine learning model from one machine to a faster or slower machine, as necessary, to 
Regarding Claim 12,
Sharifi et al. in view of Wang et al. teaches the computer-implemented method of claim 1.
Sharifi et al. in view of Wang et al. does not appear to explicitly teach embodied in a cloud-computing environment.
However, Wesolowski et al. teaches embodied in a cloud-computing environment (pg. 5 [0039]: “an ML model may be distributed among multiple similar machines, e.g., machines having identical or substantially similar architectures, using various distributive techniques” and pg. 16 [0093]: “Where appropriate, computer system 1200 may include one or more computer systems 1200; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks” teach a distributed machine learning system embodied in a cloud-computing environment; pg. 8 [0055] teaches multiple instances of models).
Sharifi et al., Wang et al., and Wesolowski et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate embodied in a cloud-computing environment as taught by Wesolowski et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage a distributed cloud-based implementation of a machine learning system such that the system can “monitor the performance of each computing machine, and if necessary, transfer execution of a portion of the machine learning model from one machine to a faster or slower machine, as necessary, to 
Regarding Claim 19,
Sharifi et al. in view of Wang et al. teaches the system of claim 13.
Sharifi et al. in view of Wang et al. does not appear to explicitly teach embodied in a cloud-computing environment.
However, Wesolowski et al. teaches embodied in a cloud-computing environment (pg. 5 [0039]: “an ML model may be distributed among multiple similar machines, e.g., machines having identical or substantially similar architectures, using various distributive techniques” and pg. 16 [0093]: “Where appropriate, computer system 1200 may include one or more computer systems 1200; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks” teach a distributed machine learning system embodied in a cloud-computing environment; pg. 8 [0055] teaches multiple instances of models).
Sharifi et al., Wang et al., and Wesolowski et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate embodied in a cloud-computing environment as taught by Wesolowski et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage a distributed cloud-based implementation of a machine learning system such that the system can “monitor the performance of each computing machine, and if necessary, transfer execution of a portion of the machine learning model from one machine to a faster or slower machine, as necessary, to .

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi et al. (US 10,878,318 B2) in view of Wang et al. (US 10,867,167 B2) and further in view of Andoni et al. (US 2019/0130277 A1).
Regarding Claim 9,
Sharifi et al. in view of Wang et al. teaches the computer-implemented method of claim 8.
Sharifi et al. in view of Wang et al. does not appear to explicitly teach wherein the combining the neural networks aligns weights of at least two networks so that the at least two networks are best correlated, and combines each pair of aligned weights by taking an average or a maximum value.
However, Andoni et al. teaches wherein the combining the neural networks aligns weights of at least two networks so that the at least two networks are best correlated, and combines each pair of aligned weights by taking an average or a maximum value (Fig. 5 teaches aligning the weights of at least two neural network models selected by the ensemble (thus rendering the models correlated, in that they are now in the same ensemble) and taking an average of at least one pair of weights).
Sharifi et al., Wang et al., and Andoni et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the combining the neural networks aligns weights of at least two networks so that the at least two networks are best correlated, and combines each pair of aligned weights by taking an average or a maximum value as taught by Andoni et al. to the disclosed invention of Sharifi et al. in view of Wang et al.

Regarding Claim 10,
Sharifi et al. in view of Wang et al. teaches the computer-implemented method of claim 8.
Sharifi et al. in view of Wang et al. does not appear to explicitly teach wherein the combining is performed by simulating the network using random input data and finding change points of a classification result, and merging the neural networks based on observing different change point ranges.
However, Andoni et al. teaches wherein the combining is performed by simulating the network using random input data and finding change points of a classification result, and merging the neural networks based on observing different change point ranges (pg. 1 [0006]: “In accordance with the described techniques, a genetic algorithm may be executed to generate and train a neural network. Genetic algorithms are iterative adaptive search heuristics inspired by biological natural selection. The genetic algorithm may start with a population of random models that each define a neural network with different topology, weights and activation functions” and pg. 1 [0007]: “During at least one epoch, multiple models may be aggregated to form an ensembler. In this context, an "ensembler" is a data structure that links multiple models with an ensembling function” teach using an “ensembler” to link (combine/merge) multiple neural network models together, in which the network is trained (simulated) using a genetic algorithm; pg. 3 [0024]: “The input set 120 of an initial epoch of the genetic algorithm 110 may be randomly or pseudo-randomly generated” teaches using random input data; pg. 1 [0007]: “The ensembling function is applied to the outputs of the models (e.g., intermediate outputs) to generate an ensembler output of the ensembler. For example, models having highest overall fitness values, models having highest fitness value per species, or both, may be aggregated to form the ensembler” and pg. 7 [0055]: “a subset of models of the input set may be aggregated to form the ensembler 172. In a particular implementation, the subset of models includes the "overall elite" models 460, 462, and 464. For example, a particular number of the fittest models, or all models having a fitness that exceeds a threshold, may be selected as the subset of models” teach evaluating the fitness values (corresponding to change points, since fitness values reflect changes/differences in the accuracy of the model outputs (classification result); see pg. 6 [0049]); pg. 7 [0053]: “In a particular aspect, the genetic algorithm 110 uses species fitness to determine if a species has become stagnant and is therefore to become extinct. As an illustrative non-limiting example, the stagnation criterion 150 may indicate that a species has become stagnant if the fitness of that species remains within a particular range (e.g., +/-5%) for a particular number (e.g., 5) epochs” teaches different ranges for the fitness score).
Sharifi et al., Wang et al., and Andoni et al. are analogous art because they are directed to implementing neural network classification.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the combining is performed by simulating the network using random input data and finding change points of a classification result, and merging the neural networks based on observing different change point ranges as taught by Andoni et al. to the disclosed invention of Sharifi et al. in view of Wang et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage “automated model building systems and methods that utilize a genetic algorithm to generate and train a neural network in a manner that is applicable to multiple types of machine-learning problems and to generate an ensembler that generates an output that is based on outputs of multiple neural networks” (Andoni et al. pg. 1 [0004]).
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Markram et al. (US 2012/0323833 A1) teaches combining or merging neural networks, which is relevant to Fig. 2 of the present application.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 






/Y.C./               Examiner, Art Unit 2125            


/KAMRAN AFSHAR/               Supervisory Patent Examiner, Art Unit 2125